# Induction Motor Fault Diagnosis - Data Exploration

This notebook explores the induction motor fault dataset and prepares it for training a diagnostic model.

## 1. Environment Setup and Data Loading

First, let's import the required libraries and load our data.

In [1]:
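import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # requires seaborn (pip install seaborn)
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Set display options
pd.set_option('display.max_columns', None)
# The plain 'seaborn' Matplotlib style name no longer exists in recent Matplotlib
# releases, so let seaborn apply its default theme instead
sns.set_theme()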


In [None]:
# Load the motor fault data
df = pd.read_csv('../data/sample_motor_data.csv')
print("Dataset shape:", df.shape)
df.head()
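
Before exploring further, it can help to confirm that the loaded file actually contains the columns the later cells rely on. The helper below, `missing_columns`, is a hypothetical addition: the expected names are copied from `numeric_cols` and `fault_type` as used in this notebook, and the tiny demo frame holds made-up values for illustration only.

```python
import pandas as pd

# Columns used by the cells in this notebook; the real CSV may contain more.
EXPECTED_COLUMNS = ('temperature', 'vibration_amplitude', 'current',
                    'voltage', 'speed_rpm', 'noise_level', 'fault_type')


def missing_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that are absent from `frame`."""
    return [col for col in expected if col not in frame.columns]


# Hypothetical one-row frame, not real motor data
demo = pd.DataFrame({'temperature': [70.0], 'current': [10.0], 'fault_type': ['normal']})
print(missing_columns(demo))
# -> ['vibration_amplitude', 'voltage', 'speed_rpm', 'noise_level']
```

In the notebook itself, `missing_columns(df)` should return an empty list if the file matches what the rest of the analysis expects.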

## 2. Data Exploration and Analysis

Let's examine the basic properties of our dataset.

In [None]:
# Display basic information about the dataset
print("Dataset Info:")
print("-" * 50)
df.info()

print("\nBasic Statistics:")
print("-" * 50)
df.describe()
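
`df.info()` reports non-null counts, but a per-column missing-value summary and a duplicate-row count are often easier to act on. A minimal sketch, using a hypothetical `data_quality_summary` helper and invented demo values (the fault label `bearing` is made up):

```python
import numpy as np
import pandas as pd


def data_quality_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and percentage of missing values, most-missing first."""
    missing = frame.isna().sum()
    summary = pd.DataFrame({
        'missing': missing,
        'missing_pct': (missing / len(frame) * 100).round(2),
    })
    return summary.sort_values('missing', ascending=False)


# Invented demo data: one NaN in two columns, and rows 0 and 3 are identical
demo = pd.DataFrame({
    'temperature': [70.1, np.nan, 71.3, 70.1],
    'current': [10.2, 10.4, np.nan, 10.2],
    'fault_type': ['normal', 'bearing', 'normal', 'normal'],
})
summary = data_quality_summary(demo)
print(summary)
print("Duplicate rows:", demo.duplicated().sum())
```

Applied to the real data, this would be `data_quality_summary(df)` and `df.duplicated().sum()`.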

In [None]:
# Check unique fault types and their distribution
print("Fault Type Distribution:")
print("-" * 50)
fault_distribution = df['fault_type'].value_counts()
print(fault_distribution)

# Visualize fault distribution
plt.figure(figsize=(10, 6))
fault_distribution.plot(kind='bar')
plt.title('Distribution of Motor Fault Types')
plt.xlabel('Fault Type')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
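
The bar chart shows absolute counts. Class shares and an imbalance ratio (largest class count divided by smallest) make it easier to judge whether a stratified split or class weighting is worth considering. A sketch with hypothetical helper names and invented labels:

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Count and share (fraction of all rows) for each class label."""
    counts = labels.value_counts()
    return pd.DataFrame({'count': counts, 'share': (counts / counts.sum()).round(3)})


def imbalance_ratio(labels: pd.Series) -> float:
    """Ratio of the most frequent class count to the least frequent one."""
    counts = labels.value_counts()
    return counts.max() / counts.min()


# Invented labels for illustration, not the real fault distribution
demo_labels = pd.Series(['normal'] * 6 + ['bearing'] * 3 + ['rotor'] * 1)
print(class_balance(demo_labels))
print("Imbalance ratio:", imbalance_ratio(demo_labels))
```

On the notebook data, the equivalent calls are `class_balance(df['fault_type'])` and `imbalance_ratio(df['fault_type'])`.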

## 3. Feature Analysis

Examine how the numeric sensor features correlate with one another.

In [None]:
# Create correlation matrix
numeric_cols = ['temperature', 'vibration_amplitude', 'current', 'voltage', 'speed_rpm', 'noise_level']
correlation_matrix = df[numeric_cols].corr()

# Plot correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Feature Correlation Matrix')
plt.tight_layout()
plt.show()
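
With six features the heatmap is readable, but listing the pairs above a cutoff makes redundant features explicit. The 0.8 threshold and the helper name `strongly_correlated_pairs` are arbitrary assumptions, and the demo frame is synthetic.

```python
from itertools import combinations

import pandas as pd


def strongly_correlated_pairs(corr: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """List feature pairs whose absolute correlation is at least `threshold`."""
    rows = []
    # combinations() visits each unordered pair once and skips the diagonal
    for a, b in combinations(corr.columns, 2):
        value = corr.loc[a, b]
        if abs(value) >= threshold:
            rows.append((a, b, value))
    result = pd.DataFrame(rows, columns=['feature_a', 'feature_b', 'corr'])
    # Strongest relationships first, regardless of sign
    return result.sort_values('corr', key=lambda s: s.abs(), ascending=False, ignore_index=True)


# Synthetic demo: temperature is roughly 2 * current, voltage is unrelated
demo = pd.DataFrame({
    'current': [1.0, 2.0, 3.0, 4.0, 5.0],
    'temperature': [2.0, 4.1, 5.9, 8.0, 10.1],
    'voltage': [5.0, 1.0, 4.0, 2.0, 3.0],
})
pairs = strongly_correlated_pairs(demo.corr())
print(pairs)
```

In the notebook this would be `strongly_correlated_pairs(correlation_matrix)`.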

## 4. Feature Engineering and Preprocessing

Prepare the data for modeling by standardizing the numeric features (zero mean, unit variance). The `fault_type` label column is left unchanged.

In [None]:
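# Standardize numeric features (zero mean, unit variance).
# Fitting on the full dataset is fine for exploration, but for model
# evaluation the scaler should be fitted on the training split only.
scaler = StandardScaler()
df_scaled = df.copy()
df_scaled[numeric_cols] = scaler.fit_transform(df[numeric_cols])

# Pairwise scatter plots of the first four scaled features, colored by fault type
sns.pairplot(df_scaled, hue='fault_type', vars=numeric_cols[:4])
plt.show()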
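
`train_test_split` is imported in the first cell but never used. One way to use it, sketched below under the assumption that `fault_type` is the prediction target, is to split first and fit `StandardScaler` on the training rows only, so test-set statistics do not leak into the scaling. The helper name `split_and_scale`, the 80/20 split, and the seed are illustrative choices, and the demo data is random noise, not motor measurements.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_scale(frame, feature_cols, label_col='fault_type', test_size=0.2, seed=42):
    """Stratified split, then scale using statistics from the training rows only."""
    X = frame[feature_cols]
    y = frame[label_col]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # fit on training data
    X_test_scaled = scaler.transform(X_test)        # reuse training mean/std
    return X_train_scaled, X_test_scaled, y_train, y_test, scaler


# Synthetic stand-in data (random values, NOT real motor measurements)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=['temperature', 'current', 'voltage'])
demo['fault_type'] = ['normal'] * 50 + ['bearing'] * 50
X_tr, X_te, y_tr, y_te, fitted_scaler = split_and_scale(
    demo, ['temperature', 'current', 'voltage'])
print(X_tr.shape, X_te.shape)
```

Stratifying on the label keeps the fault-type proportions the same in both splits, which matters if some fault types are rare.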

## 5. Data Export

Save the preprocessed data for model training.

In [None]:
# Save processed data
df_scaled.to_csv('../data/processed_motor_data.csv', index=False)
print("Preprocessed data saved successfully!")
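
The exported CSV holds already-standardized values, so new data (for example a CSV uploaded through the GUI, or live readings) would need the same transformation before prediction. One option, assuming the application loads its artifacts from disk, is to persist the fitted scaler with `joblib`, which is installed as a scikit-learn dependency. The file name is hypothetical and the two-row fit is a toy example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy scaler: column means are 2.0 and 20.0
scaler = StandardScaler().fit(np.array([[1.0, 10.0], [3.0, 30.0]]))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scaler.joblib')  # hypothetical file name
    joblib.dump(scaler, path)
    restored = joblib.load(path)

# A reading equal to the training means maps to zero in every column
new_reading = np.array([[2.0, 20.0]])
print(restored.transform(new_reading))
```

In this notebook that would be something like `joblib.dump(scaler, '../data/scaler.joblib')`, where the path is a placeholder.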

## 6. Project Features & Environment Notes

- Graphical User Interface (Tkinter) for easy data entry and diagnosis
- Upload new CSV data and update the model instantly
- Export diagnosis results to PDF
- Model evaluation and confusion matrix visualization
- Data distribution plots (seaborn)
- Test model on live or simulated data
- Online learning: add manual diagnosis to training data
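
The feature list mentions model evaluation with a confusion matrix. This notebook does not say which classifier the project uses, so the sketch below assumes a scikit-learn `RandomForestClassifier` purely for illustration, trained on synthetic two-class data with made-up labels.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic data: the made-up 'overheat' class is shifted upward in every feature
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 3)), rng.normal(3.0, 1.0, size=(60, 3))])
y = np.array(['normal'] * 60 + ['overheat'] * 60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

labels = ['normal', 'overheat']
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(pd.DataFrame(cm, index=labels, columns=labels))  # rows: true, columns: predicted
print(classification_report(y_test, y_pred))
```

Passing `labels` explicitly fixes the row and column order, which keeps the matrix readable when it is later drawn as a heatmap.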

**Environment Requirements:**
- Python 3.9 (recommended)
- All dependencies in `pyproject.toml`
- If using SHAP, Python <3.10 is required due to an llvmlite limitation
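
The environment notes can also be checked at runtime. A small sketch that encodes the two requirements stated above (Python 3.9 recommended; Python below 3.10 when SHAP is used); the function name `check_python_version` is hypothetical:

```python
import sys


def check_python_version(version_info=sys.version_info, need_shap=False):
    """Return warning messages about the interpreter version (empty list if fine)."""
    messages = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) != (3, 9):
        messages.append(f"Python {major}.{minor} detected; Python 3.9 is recommended.")
    if need_shap and (major, minor) >= (3, 10):
        messages.append("SHAP needs Python < 3.10 in this project (llvmlite limitation).")
    return messages


for message in check_python_version(need_shap=True):
    print("WARNING:", message)
```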

**How to Run:**
1. Activate your Python 3.9 environment
2. Install the dependencies listed in `pyproject.toml`
3. Run `python main.py` to launch the GUI
