# Predictive Maintenance

## 1. Introduction
This notebook builds a predictive maintenance model using the AI4I 2020 Predictive Maintenance Dataset. The goal is to predict machine failure from sensor and process data so that maintenance can be scheduled proactively and unplanned downtime reduced.

## 2. Data Loading and Initial Exploration

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('ai4i2020.csv')

df.head()
```
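Before cleaning, it is worth checking the raw data for missing values and confirming that the target agrees with the failure-mode flags, since the dataset defines 'Machine failure' as 1 when at least one of TWF, HDF, PWF, OSF or RNF is 1. The sketch below assumes the column names of the published CSV; the helper `check_failure_labels` is hypothetical and is demonstrated on a tiny synthetic frame, not on the real data.

```python
import pandas as pd

FAILURE_MODES = ['TWF', 'HDF', 'PWF', 'OSF', 'RNF']

def check_failure_labels(df: pd.DataFrame) -> dict:
    """Hypothetical sanity check: count missing cells and label/mode mismatches."""
    # Total number of missing cells across the whole frame
    n_missing = int(df.isna().sum().sum())
    # A row should be labelled a failure if any failure-mode flag is set
    any_mode = df[FAILURE_MODES].any(axis=1)
    # Rows where the target disagrees with the OR of the failure modes
    mismatch = df['Machine failure'].astype(bool) != any_mode
    return {'missing_cells': n_missing, 'label_mismatches': int(mismatch.sum())}

# Synthetic example (not AI4I data): the last row has a failure mode but no label
demo = pd.DataFrame({
    'Machine failure': [0, 1, 0],
    'TWF': [0, 1, 0], 'HDF': [0, 0, 0], 'PWF': [0, 0, 0],
    'OSF': [0, 0, 0], 'RNF': [0, 0, 1],
})
print(check_failure_labels(demo))  # {'missing_cells': 0, 'label_mismatches': 1}
```

On the notebook's data, `check_failure_labels(df)` can be run any time before the failure-mode columns are dropped from the feature matrix.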

## 3. Data Cleaning and Preprocessing

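```python
# Drop identifier columns: UDI is a unique row identifier, and Product ID is a
# serial number whose quality letter is already captured by 'Type'
df = df.drop(['UDI', 'Product ID'], axis=1)

# One-hot encode the categorical 'Type' column (product quality variant L, M or H);
# drop_first=True removes one redundant dummy column
df = pd.get_dummies(df, columns=['Type'], drop_first=True)

df.head()
```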
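To see what `drop_first=True` does, here is the same encoding applied to a small made-up 'Type' column. The categories are sorted, so 'H' becomes the dropped baseline and a high-quality product is represented by zeros in both remaining columns. Passing `dtype=int` (not used in the notebook) gives 0/1 integers instead of the boolean columns that pandas 2.x returns by default.

```python
import pandas as pd

# Made-up product types, not rows from the dataset
types = pd.DataFrame({'Type': ['L', 'M', 'H', 'L']})

# Same call as in the notebook, plus dtype=int for readable 0/1 output
encoded = pd.get_dummies(types, columns=['Type'], drop_first=True, dtype=int)
print(encoded.columns.tolist())     # ['Type_L', 'Type_M']
print(encoded.to_numpy().tolist())  # [[1, 0], [0, 1], [0, 0], [1, 0]]
```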

## 4. Exploratory Data Analysis (EDA)

```python
# Machine failure distribution
sns.countplot(x='Machine failure', data=df)
plt.title('Distribution of Machine Failure')
plt.show()
```
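The count plot shows how lopsided the classes are. A useful number to keep next to it is the accuracy of a trivial model that always predicts the majority class ('no failure'): a trained model has to beat that figure to be worth anything. A minimal standard-library sketch, with a hypothetical helper name:

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label (hypothetical helper)."""
    counts = Counter(labels)
    # most_common(1) returns [(label, count)] for the most frequent label
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)

# Toy labels: 1 failure in 20 records
toy = [0] * 19 + [1]
print(majority_baseline_accuracy(toy))  # 0.95
```

On the notebook's data the same number is `df['Machine failure'].value_counts(normalize=True).max()`.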

```python
# Correlation matrix
plt.figure(figsize=(12, 10))
correlation_matrix = df.corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()
```
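The full heatmap is dense; often the more useful view is the single column of correlations with the target, ranked by absolute value. Two caveats apply: `df.corr()` computes Pearson correlation, which only captures linear relationships, and the failure-mode columns (TWF, HDF, PWF, OSF, RNF) are still in `df` at this point, so their correlations reflect how the label is defined rather than signal from the sensors. The sketch uses synthetic data and a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def target_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every other column with `target`, sorted by magnitude."""
    corr = df.corr()[target].drop(target)
    # key=abs ranks strong negative correlations alongside strong positive ones
    return corr.sort_values(key=abs, ascending=False)

# Synthetic data: 'signal' drives the target, 'noise' is unrelated
rng = np.random.default_rng(0)
signal = rng.normal(size=500)
demo = pd.DataFrame({
    'signal': signal,
    'noise': rng.normal(size=500),
    'target': (signal > 1).astype(int),
})
print(target_correlations(demo, 'target'))
```

With the notebook's data the call would be `target_correlations(df, 'Machine failure')`.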

## 5. Model Building and Training

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Define features and target.
# TWF, HDF, PWF, OSF and RNF are the individual failure modes behind
# 'Machine failure'; keeping them as features would leak the label into X.
X = df.drop(['Machine failure', 'TWF', 'HDF', 'PWF', 'OSF', 'RNF'], axis=1)
y = df['Machine failure']

# Split the data; stratify=y keeps the (rare) failure rate the same in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Train a Random Forest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
```
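Because failures are rare, one common option is to reweight the classes during training. `RandomForestClassifier` accepts `class_weight='balanced'`, which weights each class inversely to its frequency. The sketch below uses synthetic data from `make_classification` as a stand-in for the AI4I features; whether reweighting actually helps on this dataset has to be checked with the Section 6 evaluation, and this example does not show that it does.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the sensor features: roughly 3% positives
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=6, n_informative=4,
    weights=[0.97], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# class_weight='balanced' scales each class by n_samples / (n_classes * class_count)
balanced_model = RandomForestClassifier(
    n_estimators=100, class_weight='balanced', random_state=0
)
balanced_model.fit(X_tr, y_tr)
print(balanced_model.predict(X_te)[:10])
```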

## 6. Model Evaluation

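```python
# Make predictions on the held-out test set
y_pred = model.predict(X_test)

# Evaluate the model. Failures are rare, so accuracy alone is misleading:
# check precision and recall for class 1 in the classification report.
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.3f}')
print('\nClassification Report:')
print(classification_report(y_test, y_pred, digits=3))

# Confusion matrix: rows are true labels, columns are predicted labels
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion Matrix')
plt.show()
```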
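To read the classification report, it helps to see how its numbers follow from the confusion matrix. For a binary problem scikit-learn lays the matrix out as `[[TN, FP], [FN, TP]]` (rows are true labels, columns are predictions). The counts below are made up to show how a model can reach high accuracy while missing most failures; the helper name is hypothetical.

```python
def binary_metrics(cm):
    """Metrics for the failure class from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    # Of the predicted failures, how many were real
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Of the real failures, how many were caught
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {'accuracy': accuracy, 'precision': precision, 'recall': recall, 'f1': f1}

# Made-up counts: 100 test records, 8 actual failures, only 3 of them caught
metrics = binary_metrics([[90, 2], [5, 3]])
print({k: round(v, 3) for k, v in metrics.items()})
# {'accuracy': 0.93, 'precision': 0.6, 'recall': 0.375, 'f1': 0.462}
```

Applied to the notebook's output, `binary_metrics(confusion_matrix(y_test, y_pred))` should match the class-1 row of the classification report.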

## 7. Conclusion
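The Random Forest model performs well at predicting machine failures. Because failures make up only a small fraction of the records, the evidence for this is the precision and recall on the failure class and the confusion matrix, not overall accuracy, which a model that never predicts a failure would already make high. Strong precision and recall on the failure class indicate that the model can be a valuable tool for a predictive maintenance system, helping to anticipate failures and schedule maintenance accordingly.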
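In a maintenance setting, missing a failure is often costlier than an unnecessary inspection, so the default 0.5 cut-off used by `predict` is not necessarily the right operating point. A minimal sketch of choosing a threshold on `predict_proba` scores so that recall on the failure class reaches a chosen target; the target value and the helper `threshold_for_recall` are assumptions, and the threshold should be picked on a validation split (which this notebook does not create) rather than on the test set.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_recall(y_true, scores, min_recall):
    """Highest score threshold whose recall on the positive class is >= min_recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # recall[i] pairs with thresholds[i]; the final recall entry (0) has no threshold
    meets_target = recall[:-1] >= min_recall
    # Recall never increases with the threshold, so take the largest qualifying one
    return float(thresholds[meets_target].max())

# Toy labels and scores standing in for model.predict_proba(...)[:, 1]
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9])
t = threshold_for_recall(y_true, scores, min_recall=1.0)
y_pred_tuned = (scores >= t).astype(int)
print(t, y_pred_tuned)
```

Choosing the largest threshold that still meets the recall target keeps as many false alarms out as possible for that level of recall.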