# Predictive Maintenance for an IoT Device
## Project Goal
The goal of this project is to build a machine learning model that predicts device failure based on simulated sensor data. This is a common and high-value use case for AI in the IoT industry.

### Step 1: Import Libraries and Load Data

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the dataset
df = pd.read_csv('simulated_iot_device_data.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])

print('Data loaded successfully. Shape:', df.shape)
df.head()

### Step 2: Exploratory Data Analysis (EDA)
Let's visualize the sensor data over time to see if we can spot the patterns leading to a failure.

In [None]:
failures = df[df['failure'] == 1]['timestamp']

fig, axes = plt.subplots(3, 1, figsize=(15, 12), sharex=True)
fig.suptitle('Sensor Readings Over Time with Failures Highlighted', fontsize=16)

# Temperature Plot
axes[0].plot(df['timestamp'], df['temperature'], label='Temperature', color='blue', alpha=0.7)
axes[0].scatter(failures, df.loc[df['failure'] == 1, 'temperature'], color='red', label='Failure Event', zorder=5)
axes[0].set_ylabel('Temperature (°C)')
axes[0].legend()
axes[0].grid(True)

# Vibration Plot
axes[1].plot(df['timestamp'], df['vibration'], label='Vibration', color='green', alpha=0.7)
axes[1].scatter(failures, df.loc[df['failure'] == 1, 'vibration'], color='red', label='Failure Event', zorder=5)
axes[1].set_ylabel('Vibration')
axes[1].legend()
axes[1].grid(True)

# Rotation Speed Plot
axes[2].plot(df['timestamp'], df['rotation_speed'], label='Rotation Speed', color='purple', alpha=0.7)
axes[2].scatter(failures, df.loc[df['failure'] == 1, 'rotation_speed'], color='red', label='Failure Event', zorder=5)
axes[2].set_xlabel('Timestamp')
axes[2].set_ylabel('Rotation Speed (RPM)')
axes[2].legend()
axes[2].grid(True)

plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()

In the plots, temperature and vibration tend to spike in the periods just before a failure event, as designed in the simulation. This suggests that a model should be able to learn these patterns.
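The visual impression can also be checked numerically by comparing readings in the run-up to a failure against all other readings. The following is a minimal sketch on a synthetic stand-in frame called `demo` (hypothetical values, not the real dataset); `LOOKBACK` and the injected 8-degree ramp are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: one reading per minute, with a temperature
# ramp injected before each failure (synthetic values, not the real dataset).
rng = np.random.default_rng(0)
n = 600
demo = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=n, freq='min'),
    'temperature': rng.normal(60.0, 1.0, n),
    'failure': 0,
})
for idx in (200, 450):
    demo.loc[idx, 'failure'] = 1
    demo.loc[idx - 10:idx - 1, 'temperature'] += 8.0  # .loc slices are inclusive

LOOKBACK = 10  # rows before each failure to inspect (assumed value)

# Flag rows that have a failure within the next LOOKBACK rows.
# Reversing, taking a rolling max, and reversing again looks forward in time.
upcoming = (
    demo['failure'][::-1]
    .rolling(window=LOOKBACK + 1, min_periods=1)
    .max()[::-1]
    .astype(bool)
)
pre_failure = upcoming & (demo['failure'] == 0)

# Compare the mean temperature just before failures with normal operation.
pre_mean = demo.loc[pre_failure, 'temperature'].mean()
normal_mean = demo.loc[~pre_failure & (demo['failure'] == 0), 'temperature'].mean()

print('Mean temperature before failures:', round(pre_mean, 2))
print('Mean temperature otherwise:', round(normal_mean, 2))
```

On the real data, the same comparison could be run on `df` for each sensor column; a clear gap between the two means would support what the plots suggest.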

### Step 3: Feature Engineering
A single data point might not be predictive, but a trend is. We can capture trends by creating 'rolling' features, like the average or standard deviation of a sensor reading over the last 10 minutes. This gives the model context about the recent past.

In [None]:
WINDOW_SIZE = 10  # 10 rows; equals 10 minutes only if readings arrive once per minute

for sensor in ['temperature', 'vibration', 'rotation_speed']:
    df[f'{sensor}_rolling_mean'] = df[sensor].rolling(window=WINDOW_SIZE).mean()
    df[f'{sensor}_rolling_std'] = df[sensor].rolling(window=WINDOW_SIZE).std()

# Drop rows with NaN values created by the rolling window
df.dropna(inplace=True)

print('New features created.')
df.head()
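An integer window in `rolling` counts rows, not minutes, so it only matches "the last 10 minutes" when readings arrive at a fixed one-minute interval. If the data has gaps, pandas also accepts a time offset such as `'10min'` when the index is a sorted `DatetimeIndex`. The sketch below contrasts the two on a small made-up series (the `demo` frame and its values are hypothetical).

```python
import pandas as pd

# Hypothetical readings with an irregular gap (synthetic, for illustration).
timestamps = pd.to_datetime([
    '2024-01-01 00:00', '2024-01-01 00:01', '2024-01-01 00:02',
    '2024-01-01 00:30', '2024-01-01 00:31',
])
demo = pd.DataFrame({'temperature': [60.0, 61.0, 62.0, 80.0, 81.0]}, index=timestamps)

# Row-count window: averages the last 3 rows, however old they are.
by_rows = demo['temperature'].rolling(window=3, min_periods=1).mean()

# Time-based window: averages only readings from the last 10 minutes.
by_time = demo['temperature'].rolling('10min').mean()

print(pd.DataFrame({'by_rows': by_rows, 'by_time': by_time}))
```

At 00:30 the row-count window still mixes in readings from almost half an hour earlier, while the time-based window contains only the 00:30 reading. To apply this to the notebook's data, `df` would first need `timestamp` set as its index.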

### Step 4: Model Training
Now we'll train a RandomForestClassifier. This model is a reasonable choice for this task because it needs no feature scaling, handles non-linear relationships between sensors, and reports which features were most important for its predictions.

In [None]:
# Define features (X) and target (y)
features = ['temperature', 'vibration', 'rotation_speed', 'temperature_rolling_mean', 'temperature_rolling_std', 'vibration_rolling_mean', 'vibration_rolling_std', 'rotation_speed_rolling_mean', 'rotation_speed_rolling_std']
target = 'failure'

X = df[features]
y = df[target]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print('Data split into training and testing sets:')
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

# Initialize and train the model
model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')
model.fit(X_train, y_train)

print('\nModel training complete.')
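One caveat with a random split on rolling features: neighbouring rows share most of their window, so closely related rows can land in both the training and the test set and make the test score look better than it would on truly new data. A chronological split is a common alternative. Here is a minimal sketch, assuming a synthetic `demo` frame (hypothetical values) and the same 80/20 ratio used above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df (synthetic values, not the real dataset).
rng = np.random.default_rng(42)
n = 1000
demo = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=n, freq='min'),
    'temperature': rng.normal(60.0, 1.0, n),
    'failure': (rng.random(n) < 0.05).astype(int),
}).sort_values('timestamp')

# Train on the first 80% of the timeline, test on the final 20%.
split_idx = int(len(demo) * 0.8)
train, test = demo.iloc[:split_idx], demo.iloc[split_idx:]

print('Train ends:  ', train['timestamp'].max())
print('Test starts: ', test['timestamp'].min())
print('Failure rate (train / test):',
      round(train['failure'].mean(), 3), '/', round(test['failure'].mean(), 3))
```

Unlike `stratify=y`, a chronological split does not guarantee the same failure rate in both parts, so it is worth checking that the test period contains enough failure events to evaluate on.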

### Step 5: Model Evaluation
Let's see how well our model performs on the unseen test data. In failure prediction, **Recall** is often the most important metric – it tells us what percentage of actual failures we successfully caught.

In [None]:
# Make predictions
y_pred = model.predict(X_test)

# Print classification report
print('--- Classification Report ---')
print(classification_report(y_test, y_pred))

# Display confusion matrix
print('--- Confusion Matrix ---')
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['No Failure', 'Failure'], yticklabels=['No Failure', 'Failure'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
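To make the recall definition concrete, here is a small worked example that derives recall and precision from the four cells of a confusion matrix. The labels are made up for illustration and are not results from this model.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels: 4 actual failures, of which 3 are caught, plus 1 false alarm.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_hat = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0])

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

recall = tp / (tp + fn)      # share of actual failures that were caught
precision = tp / (tp + fp)   # share of failure alerts that were real

print('Recall:', recall)
print('Precision:', precision)
print('Matches sklearn:', recall == recall_score(y_true, y_hat))
```

A missed failure (false negative) lowers recall, while a false alarm (false positive) lowers precision, which is why the two are usually read together.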

### Step 6: Feature Importance
Let's check which features the model found most predictive.

In [None]:
importances = model.feature_importances_
feature_importance_df = pd.DataFrame({'feature': features, 'importance': importances}).sort_values('importance', ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x='importance', y='feature', data=feature_importance_df, palette='viridis')
plt.title('Feature Importance')
plt.tight_layout()
plt.show()

## Conclusion
This notebook demonstrates an end-to-end workflow for a predictive maintenance task. We have successfully:
1. Loaded and visualized simulated IoT sensor data.
2. Engineered features (rolling means and standard deviations) to capture time-series trends.
3. Trained a RandomForest model to predict device failures.
4. Evaluated the model on held-out test data using a classification report and confusion matrix, with recall as the key metric.

This project serves as a strong portfolio piece for freelance work in AI for IoT.