# Predictive Maintenance - Exploratory Data Analysis

This notebook performs exploratory data analysis on sensor and maintenance data to understand patterns associated with equipment failure.

## Objectives

1. Load and examine the dataset
2. Analyze feature distributions
3. Explore correlations between features and failures
4. Visualize key patterns
5. Identify important features for modeling

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

import warnings
warnings.filterwarnings('ignore')

## 1. Load and Examine Dataset

In [None]:
# Load data
data = pd.read_csv('../data/sensor_maintenance_data.csv')

# Display basic information
print('Dataset Shape:', data.shape)
print('\nFirst few rows:')
data.head()

In [None]:
# Data information
print('Dataset Info:')
data.info()

print('\nBasic Statistics:')
data.describe()

In [None]:
# Check for missing values
print('Missing Values:')
data.isnull().sum()
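Raw counts are easier to act on when they sit next to the share of rows they represent, because the share is what usually decides between dropping a column and imputing it. The sketch below builds that report on a small hypothetical frame (`toy` and `missing_report` are illustrative names, not part of this project). On the real data you would call `missing_report(data)`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the sensor data (illustration only)
toy = pd.DataFrame({
    "temperature": [70.1, np.nan, 72.4, 71.0],
    "vibration": [0.2, 0.3, np.nan, np.nan],
    "failure": [0, 1, 0, 0],
})

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": counts / len(df) * 100,
    })
    return report.sort_values("missing", ascending=False)

report = missing_report(toy)
print(report)
```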

In [None]:
# Target variable distribution
print('Failure Distribution:')
print(data['failure'].value_counts())
print('\nFailure Percentage:')
print(data['failure'].value_counts(normalize=True) * 100)
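If the failure percentage printed above turns out to be small (common for maintenance data, though that is an assumption here), a plain random split can leave very few failures in the test set. `train_test_split`, already imported at the top, accepts a `stratify=` argument that keeps the class ratio the same in both splits. The sketch below uses a hypothetical 5% failure rate on synthetic rows, not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: 50 failures in 1,000 rows (not the real data)
rng = np.random.default_rng(0)
n = 1000
y = pd.Series(np.r_[np.ones(50, dtype=int), np.zeros(950, dtype=int)], name="failure")
X = pd.DataFrame({"temperature": rng.normal(70, 5, n)})

# stratify=y keeps the failure rate (approximately) equal in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(f"overall: {y.mean():.3f}  train: {y_train.mean():.3f}  test: {y_test.mean():.3f}")
```

On the real data, the same call would take `data[features]` and `data['failure']`.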

## 2. Feature Distributions

In [None]:
# Plot distributions of all features
features = ['temperature', 'vibration', 'pressure', 'hours_operated', 
            'days_since_maintenance', 'equipment_age']

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for idx, feature in enumerate(features):
    axes[idx].hist(data[feature], bins=30, edgecolor='black', alpha=0.7)
    axes[idx].set_title(f'{feature.replace("_", " ").title()} Distribution')
    axes[idx].set_xlabel(feature.replace('_', ' ').title())
    axes[idx].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

In [None]:
# Box plots for feature comparison by failure status
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for idx, feature in enumerate(features):
    data.boxplot(column=feature, by='failure', ax=axes[idx])
    axes[idx].set_title(f'{feature.replace("_", " ").title()} by Failure Status')
    axes[idx].set_xlabel('Failure (0=No, 1=Yes)')
    axes[idx].set_ylabel(feature.replace('_', ' ').title())

plt.suptitle('')
plt.tight_layout()
plt.show()
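The box plots show group medians visually; a small table gives the same comparison as numbers, which is easier to quote and to track across dataset versions. The sketch below uses a hypothetical five-row frame (`toy` is illustrative); on the real data, replace `toy` with `data` and use the full `features` list defined earlier. Medians are used because that is the centre line a box plot draws.

```python
import pandas as pd

# Hypothetical data: failed units run hotter (illustration only)
toy = pd.DataFrame({
    "temperature": [68.0, 70.0, 72.0, 80.0, 84.0],
    "vibration":   [0.20, 0.25, 0.22, 0.40, 0.35],
    "failure":     [0, 0, 0, 1, 1],
})
features = ["temperature", "vibration"]

# Median of each feature within each failure group (rows: 0/1, columns: features)
medians = toy.groupby("failure")[features].median()

# Gap between failed and healthy units, in each feature's own units
gap = medians.loc[1] - medians.loc[0]
print(medians)
print(gap)
```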

## 3. Correlation Analysis

In [None]:
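# Calculate correlation matrix over numeric columns only
# (since pandas 2.0, corr() raises an error if any non-numeric column is present)
correlation_matrix = data.corr(numeric_only=True)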

# Plot correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('Feature Correlation Matrix')
plt.tight_layout()
plt.show()
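`DataFrame.corr()` computes Pearson correlation by default, which measures only *linear* association. A sensor reading whose effect on failure risk is monotone but curved (for example, one that climbs sharply past a threshold) can look weaker than it is. Spearman rank correlation, available through `method="spearman"`, captures any monotone relationship. The sketch below contrasts the two on a hypothetical exponential relationship; it is an illustration, not a claim about this dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical monotone but nonlinear relationship (illustration only)
x = np.arange(1, 11, dtype=float)
toy = pd.DataFrame({"x": x, "y": np.exp(x)})

pearson = toy.corr(method="pearson").loc["x", "y"]
spearman = toy.corr(method="spearman").loc["x", "y"]
print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Spearman comes out as exactly 1 here because the ranks of `x` and `y` are identical, while Pearson is noticeably lower. On the real data, `data.corr(method='spearman', numeric_only=True)` would give the rank-based heatmap.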

In [None]:
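# Correlation with failure (numeric columns only)
print('Correlation with Failure:')
failure_correlation = data.corr(numeric_only=True)['failure'].sort_values(ascending=False)
print(failure_correlation)

# Visualize: drop the target's self-correlation (always 1.0, and first after a
# descending sort), then sort ascending so the strongest positive correlation
# appears at the top of the horizontal bar chart
plt.figure(figsize=(8, 6))
failure_correlation.drop('failure').sort_values().plot(kind='barh')
plt.title('Feature Correlation with Failure')
plt.xlabel('Correlation Coefficient')
plt.tight_layout()
plt.show()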
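Because `failure` is coded 0/1, each Pearson coefficient in this column is a point-biserial correlation. It can be written as $r_{pb} = \frac{M_1 - M_0}{s}\sqrt{p(1-p)}$, where $M_1$ and $M_0$ are the feature means for failed and non-failed rows, $s$ is the feature's population standard deviation, and $p$ is the failure rate. This ties the bar chart back to the box plots: a large gap between group means only yields a large coefficient if it is large relative to the overall spread, and $\sqrt{p(1-p)}$ shrinks the coefficient when failures are rare. The sketch below checks the identity on hypothetical synthetic data (`point_biserial` is an illustrative helper, not a library function).

```python
import numpy as np

# Hypothetical feature values and binary failure labels (illustration only)
rng = np.random.default_rng(1)
failure = rng.integers(0, 2, size=200)
temperature = 70 + 5 * failure + rng.normal(0, 3, size=200)

def point_biserial(x: np.ndarray, y: np.ndarray) -> float:
    """Point-biserial correlation between continuous x and binary y (0/1)."""
    p = y.mean()              # fraction of failures
    m1 = x[y == 1].mean()     # mean feature value for failures
    m0 = x[y == 0].mean()     # mean feature value for non-failures
    s = x.std()               # population std (ddof=0)
    return (m1 - m0) / s * np.sqrt(p * (1 - p))

r_pb = point_biserial(temperature, failure)
r_pearson = np.corrcoef(temperature, failure)[0, 1]
print(r_pb, r_pearson)
```

The two numbers agree to floating-point precision, so the correlation column can be read as a standardized difference in group means.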

## 4. Pairwise Relationships

In [None]:
# Pair plot for key features
key_features = ['temperature', 'vibration', 'pressure', 'failure']
sns.pairplot(data[key_features], hue='failure', diag_kind='kde', 
             palette={0: 'blue', 1: 'red'}, plot_kws={'alpha': 0.6})
plt.suptitle('Pairwise Relationships of Key Features', y=1.02)
plt.show()

## 5. Key Insights

Based on the exploratory analysis:

1. **Temperature**: Higher temperatures are strongly correlated with equipment failure
2. **Vibration**: Increased vibration levels indicate potential mechanical issues
3. **Pressure**: Elevated pressure readings are associated with higher failure risk
4. **Days Since Maintenance**: Longer periods without maintenance increase failure probability
5. **Equipment Age**: Older equipment tends to have higher failure rates

These insights will inform the feature engineering and model development process.
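Claims such as "longer periods without maintenance increase failure probability" can be checked more directly than with a single correlation coefficient by binning the feature and computing the observed failure rate in each bin. The sketch below uses `pd.qcut` for equal-count bins on hypothetical synthetic data whose risk curve is an assumption made purely for illustration. On the real data, replace `toy` with `data`.

```python
import numpy as np
import pandas as pd

# Hypothetical data where failure odds rise with days since maintenance
rng = np.random.default_rng(7)
days = rng.integers(0, 365, size=2000)
prob = 0.02 + 0.2 * days / 365          # assumed risk curve, illustration only
failure = (rng.random(2000) < prob).astype(int)
toy = pd.DataFrame({"days_since_maintenance": days, "failure": failure})

# Split into 4 equal-count bins and compute the observed failure rate per bin
toy["days_bin"] = pd.qcut(toy["days_since_maintenance"], q=4)
rate_by_bin = toy.groupby("days_bin", observed=True)["failure"].agg(["mean", "size"])
print(rate_by_bin)
```

Reporting `size` alongside `mean` shows how many rows back each rate, which matters when failures are rare.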

## Next Steps

1. Feature engineering (create interaction features, polynomial features)
2. Model training and evaluation (see `../src/maintenance_model.py`)
3. Hyperparameter tuning
4. Model deployment and monitoring