# Model Building

To predict whether a sensor needs maintenance, we used Python scripts to simulate sensor data on which to train our models. We analyzed this data beforehand and concluded that, because it was simulated, it needed no preprocessing: it contained no typos or outliers.
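The "no preprocessing needed" conclusion can be checked with a few pandas calls. Below is a minimal sketch, assuming a conventional 1.5 × IQR rule for flagging outliers (our choice, not something the original analysis specifies). `quality_report` is a hypothetical helper and the toy DataFrame holds made-up values standing in for one sensor file.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Count missing cells, duplicate rows, and IQR outliers per numeric column."""
    report = {
        "missing": int(df.isna().sum().sum()),
        "duplicates": int(df.duplicated().sum()),
        "outliers": {},
    }
    for col in df.select_dtypes(include="number").columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        # A value is flagged if it falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
        report["outliers"][col] = int(((df[col] < lower) | (df[col] > upper)).sum())
    return report


# Hypothetical toy data: the 60.0 reading is deliberately out of range
toy = pd.DataFrame({
    "Temperature Reading (°C)": [21.2, 23.5, 22.5, 21.0, 21.5, 60.0],
    "Operational Status": ["Down", "Working", "Working", "Working", "Needs Maintenance", "Working"],
})
print(quality_report(toy))
```

Running `quality_report(temperature)` and its counterparts on the real data would confirm (or refute) the claim for each sensor file.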

In [19]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

In [20]:
temperature = pd.read_csv('./data/temperature_sensor_data.csv')

pressure = pd.read_csv('./data/pressure_sensor_data.csv')

airflow = pd.read_csv('./data/airflow_sensor_data.csv')

print("Temperature data:")
print(temperature.head(5))
print("Pressure data:")
print(pressure.head(5))
print("Airflow data:")
print(airflow.head(5))

Temperature data:
             Timestamp Sensor ID  Temperature Reading (°C) Operational Status  \
0  2023-01-01 00:00:00     TS313                 21.224431               Down   
1  2023-01-01 01:00:00     TS313                 23.510171            Working   
2  2023-01-01 02:00:00     TS313                 22.493797            Working   
3  2023-01-01 03:00:00     TS313                 21.049739            Working   
4  2023-01-01 04:00:00     TS313                 21.540939  Needs Maintenance   

  Last Maintenance Date  Ambient Humidity Heat Source Proximity Sensor Model  \
0   2022-11-22 00:00:00         59.450818                Medium        TS-T2   
1   2022-08-23 01:00:00         55.940795                   Far        TS-T2   
2   2022-08-30 02:00:00         48.724091                Medium        TS-T2   
3   2022-09-20 03:00:00         68.036250                   Far        TS-T2   
4   2022-07-30 04:00:00         44.025957                   Far        TS-T2   

  Last Calibra

We settled on an 80/20 train/test split for our models.

In [21]:
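def split_data(df, test_size=0.2):
    # Separate the features from the target label
    X = df.drop('Operational Status', axis=1)
    y = df['Operational Status']

    # One-hot encode every non-numeric column (this includes Timestamp, Sensor ID
    # and the date columns, which read_csv loads as strings)
    X = pd.get_dummies(X)

    # Hold out test_size of the rows for evaluation; a fixed seed keeps the split reproducible
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)
    return X_train, X_test, y_train, y_test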

X_train_temp, X_test_temp, y_train_temp, y_test_temp = split_data(temperature)
X_train_press, X_test_press, y_train_press, y_test_press = split_data(pressure)
X_train_airflow, X_test_airflow, y_train_airflow, y_test_airflow = split_data(airflow)
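Because `pd.read_csv` leaves `Timestamp` and `Last Maintenance Date` as strings, `pd.get_dummies` in `split_data` turns every distinct date into its own indicator column. Below is a minimal sketch of one alternative, assuming the column names shown in the printed output. `split_data_with_dates` is a hypothetical helper: it derives a numeric "days since maintenance" feature, drops the raw date and ID columns, and passes `stratify=y` so both splits keep the same class proportions. Any other date columns (the printed output is truncated) would need the same treatment.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data_with_dates(df, test_size=0.2, random_state=42):
    """Hypothetical variant of split_data that turns dates into a numeric feature."""
    df = df.copy()
    ts = pd.to_datetime(df["Timestamp"])
    last = pd.to_datetime(df["Last Maintenance Date"])
    # Whole days elapsed between the last maintenance and the reading
    df["Days Since Maintenance"] = (ts - last).dt.days

    # Drop the target, the raw date strings and the sensor identifier
    X = df.drop(columns=["Operational Status", "Timestamp", "Last Maintenance Date", "Sensor ID"])
    X = pd.get_dummies(X)
    y = df["Operational Status"]
    # stratify=y keeps the class proportions the same in the train and test splits
    return train_test_split(X, y, test_size=test_size, random_state=random_state, stratify=y)


# Hypothetical toy rows mimicking the temperature file's layout
toy = pd.DataFrame({
    "Timestamp": pd.date_range("2023-01-01", periods=10, freq="D").astype(str),
    "Sensor ID": ["TS313"] * 10,
    "Temperature Reading (°C)": [21.2, 23.5, 22.5, 21.0, 21.5, 22.1, 20.9, 23.0, 22.7, 21.8],
    "Operational Status": ["Working", "Down"] * 5,
    "Last Maintenance Date": ["2022-11-22 00:00:00"] * 10,
    "Heat Source Proximity": ["Far", "Medium"] * 5,
})
X_train, X_test, y_train, y_test = split_data_with_dates(toy)
print(X_train.columns.tolist())
```

Stratification matters most when one status (for example "Needs Maintenance") is rare, since a plain random split could leave the test set with very few examples of it.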

## Models

We compared several models and ended up using three scikit-learn classifiers: RandomForestClassifier, GradientBoostingClassifier and SVC.
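Tree ensembles such as RandomForestClassifier and GradientBoostingClassifier are insensitive to feature scale, but SVC with its default RBF kernel is not, and the one-hot encoded features mix 0/1 indicators with raw readings. Below is a minimal sketch of one common remedy, using synthetic data from `make_classification` rather than the sensor files: SVC is wrapped in a pipeline with `StandardScaler`, so the scaler is fit on the training split only and then applied to the test split.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for one sensor dataset; feature 0 is put on a much larger scale
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X[:, 0] *= 1000

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# fit() learns the scaling statistics from X_train only; predict() reuses them on X_test
scaled_svm = make_pipeline(StandardScaler(), SVC(random_state=42))
scaled_svm.fit(X_train, y_train)
accuracy = accuracy_score(y_test, scaled_svm.predict(X_test))
print(f"Scaled SVM accuracy: {accuracy * 100:.2f}%")
```

Since a pipeline behaves like a single estimator, it could be passed to `train_and_evaluate` in place of `SVC(random_state=42)` without other changes.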

In [22]:
def train_and_evaluate(X_train, X_test, y_train, y_test, model, model_name):
    # Fit on the training split and report accuracy on the held-out test split
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    print(f"Accuracy of {model_name}: {accuracy * 100:.2f}%")

rf_model = RandomForestClassifier(random_state=42)
gb_model = GradientBoostingClassifier(random_state=42)
svm_model = SVC(random_state=42)

# The same estimators are reused for every dataset; each fit() call retrains them from scratch
models = {"Random Forest": rf_model, "Gradient Boosting": gb_model, "SVM": svm_model}

datasets = {
    "Temperature": (X_train_temp, X_test_temp, y_train_temp, y_test_temp),
    "Pressure": (X_train_press, X_test_press, y_train_press, y_test_press),
    "Airflow": (X_train_airflow, X_test_airflow, y_train_airflow, y_test_airflow),
}

for i, (sensor, splits) in enumerate(datasets.items()):
    # Print a blank line between sensor blocks (but not before the first one)
    print(("\n" if i else "") + f"{sensor} Sensor Data:")
    for model_name, model in models.items():
        train_and_evaluate(*splits, model, model_name)

Temperature Sensor Data:
Accuracy of Random Forest: 80.00%
Accuracy of Gradient Boosting: 80.00%
Accuracy of SVM: 80.00%

Pressure Sensor Data:
Accuracy of Random Forest: 88.50%
Accuracy of Gradient Boosting: 85.50%
Accuracy of SVM: 88.50%

Airflow Sensor Data:
Accuracy of Random Forest: 79.50%
Accuracy of Gradient Boosting: 79.50%
Accuracy of SVM: 79.50%
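All three models score exactly 80.00% on the temperature data and 79.50% on the airflow data. Identical scores from very different models can mean that each one is simply predicting the most frequent class, so a majority-class baseline is a useful reference point. Below is a minimal sketch with scikit-learn's `DummyClassifier`, on synthetic imbalanced data standing in for a sensor file. On the real splits, `y_test_temp.value_counts(normalize=True)` would show the same reference directly.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: roughly 80% class 0 and 20% class 1
X, y = make_classification(n_samples=1000, n_features=5, weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ignores the features and always predicts the most frequent training label
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_predictions = baseline.predict(X_test)
baseline_acc = accuracy_score(y_test, baseline_predictions)

# The baseline's accuracy is just the share of the test set belonging to that class
majority_share = (y_test == baseline_predictions[0]).mean()
print(f"Majority-class baseline accuracy: {baseline_acc * 100:.2f}%")
```

If a trained model's accuracy matches this baseline, per-class metrics (for example precision and recall on "Needs Maintenance") are a better guide than accuracy alone.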
