Final Project Group 13
Aakash Raj Dhakal
Mirna Elizondo


### Expected Outcomes
The goal of this project is to build a robust predictive model for machine failures. Accurately predicting failures before they occur lets businesses schedule maintenance proactively, improving reliability and reducing costs. This data-driven approach helps decision-makers optimize operations, and it supports sustainability efforts by reducing resource consumption and waste. A successful model would therefore improve machinery management in terms of efficiency, cost, and environmental impact.

### Methods
Baseline: SVM (support vector machine)

Advanced:

1. RFC (random forest classifier)
2. GBC (gradient boosting classifier)
3. LGBM (LightGBM)
    
### Measures
 - Accuracy
 - Precision
 - Recall
 - F1 Score
 - AUC-ROC
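
For reference, the first four measures follow directly from confusion-matrix counts, and AUC-ROC can be read as the probability that a randomly chosen failure is scored above a randomly chosen non-failure. The sketch below is a minimal pure-Python illustration of those definitions. The function names and toy numbers are hypothetical and are not part of the project code.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def pairwise_auc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy numbers chosen only for illustration.
toy_metrics = classification_metrics(tp=8, fp=2, fn=4, tn=86)
toy_auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_metrics)
print(toy_auc)
```

Note that AUC-ROC needs a continuous score per sample, whereas the other four measures are computed from hard 0/1 predictions.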

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import RobustScaler
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix 
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier


In [2]:
train = pd.read_csv('train.csv', header=0)
test = pd.read_csv('test.csv', header=0)

In [3]:
train = train.reindex(columns=['id', 'Product ID', 'Type', 'Air temperature [K]',
       'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]',
       'Tool wear [min]', 'TWF', 'HDF', 'PWF', 'OSF',
       'RNF', 'Machine failure'])
test = test.reindex(columns=['id', 'Product ID', 'Type', 'Air temperature [K]',
       'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]',
       'Tool wear [min]', 'TWF', 'HDF', 'PWF', 'OSF',
       'RNF', 'Machine failure'])

In [4]:
print(train.shape)
print(test.shape)

(136429, 14)
(90954, 14)


In [5]:
train.dtypes

id                           int64
Product ID                  object
Type                        object
Air temperature [K]        float64
Process temperature [K]    float64
Rotational speed [rpm]       int64
Torque [Nm]                float64
Tool wear [min]              int64
TWF                          int64
HDF                          int64
PWF                          int64
OSF                          int64
RNF                          int64
Machine failure              int64
dtype: object

In [6]:
target = 'Machine failure'
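# 'Type' and the binary failure-mode flags are treated as categorical.
# 'Tool wear [min]' is a numeric measurement, and 'Product ID' is an
# identifier (dropped later), so neither belongs in the other list.
cat_features = ['Type', 'TWF', 'HDF',
                   'PWF', 'OSF', 'RNF']
num_features = ['Air temperature [K]', 'Process temperature [K]',
                         'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]']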

In [7]:
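# One-hot encode the categorical columns, fill missing values with 0,
# and drop the 'Product ID' identifier column.
train = pd.get_dummies(train, columns=cat_features, dtype=int).fillna(0).drop('Product ID', axis=1)
test = pd.get_dummies(test, columns=cat_features, dtype=int).fillna(0).drop('Product ID', axis=1)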
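
Because `pd.get_dummies` is applied to `train` and `test` separately, the two frames can end up with different dummy columns if a category level appears in only one of them. A minimal sketch of one way to guard against this is to reindex the encoded test frame to the training columns. The toy frames and variable names below are hypothetical stand-ins, not the project data.

```python
import pandas as pd

# Hypothetical toy frames: the test frame lacks the 'H' level of 'Type'.
toy_train = pd.DataFrame({"Type": ["L", "M", "H"],
                          "Torque [Nm]": [40.1, 38.5, 52.0]})
toy_test = pd.DataFrame({"Type": ["L", "M"],
                         "Torque [Nm]": [41.3, 36.9]})

train_enc = pd.get_dummies(toy_train, columns=["Type"], dtype=int)
test_enc = pd.get_dummies(toy_test, columns=["Type"], dtype=int)

# Reindex the test frame to the training columns; dummy columns that are
# absent from the test frame are created and filled with 0.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(list(test_aligned.columns))
print(test_aligned["Type_H"].tolist())
```

After alignment, both frames have the same columns in the same order, which is what scikit-learn estimators expect at prediction time.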


In [8]:
failure_counts = train['Machine failure'].value_counts()

# Print the counts
print("Occurrences of Machine Failure:")
print("Value 0:", failure_counts[0])
print("Value 1:", failure_counts[1])

X = train.drop(['id', 'Machine failure'], axis=1)
y = train['Machine failure'].values
X_test = test.drop(['id', 'Machine failure'], axis=1)

Occurrences of Machine Failure:
Value 0: 134281
Value 1: 2148
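
These counts show a strong class imbalance, which matters when reading the accuracy figures reported later. As a quick worked check using only the counts printed above, the sketch below computes the failure rate and the accuracy of a trivial model that always predicts "no failure". The variable names are illustrative.

```python
# Class counts copied from the output above.
n_negative = 134281  # Machine failure == 0
n_positive = 2148    # Machine failure == 1

n_total = n_negative + n_positive
failure_rate = n_positive / n_total

# A trivial model that always predicts 0 gets every negative right
# and every positive wrong.
majority_accuracy = n_negative / n_total
majority_recall = 0 / n_positive

print(f"failure rate: {failure_rate:.4f}")
print(f"always-0 accuracy: {majority_accuracy:.4f}, recall: {majority_recall:.1f}")
```

Any model should be judged against this trivial baseline, which is why precision, recall, F1 and AUC-ROC are more informative here than accuracy alone.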


In [14]:
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

In [None]:
# Baseline: SVC with default hyperparameters (RBF kernel)
model = SVC()
model.fit(X_train, y_train)

# Evaluate on the held-out 30% split
svc_predictions = model.predict(X_temp)
print(classification_report(y_temp, svc_predictions))
print(confusion_matrix(y_temp, svc_predictions))
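
SVMs with an RBF kernel are sensitive to feature scale, and the features here span very different ranges (for example rotational speed in rpm versus temperature in K). The notebook imports `StandardScaler` but does not apply it, so the following is only a sketch of how scaling could be combined with the baseline. It runs on synthetic stand-in data from `make_classification`, and all names in it are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (not the project's train.csv).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=8, n_informative=5,
    weights=[0.9, 0.1], random_state=42,
)
# Exaggerate one feature's scale to mimic mixed units.
X_demo[:, 0] *= 1000.0

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# The scaler is fitted on the training fold only; the pipeline then applies
# the same transformation to the validation fold at prediction time.
svm_pipeline = make_pipeline(StandardScaler(), SVC())
svm_pipeline.fit(X_tr, y_tr)

svm_demo_predictions = svm_pipeline.predict(X_va)
print(classification_report(y_va, svm_demo_predictions))
```

Wrapping the scaler and the classifier in one pipeline keeps validation data out of the scaling statistics.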

In [None]:
rfc = RandomForestClassifier(random_state=42)
rfc.fit(X_train, y_train)
rfc_predictions = rfc.predict(X_temp) 
print(classification_report(y_temp, rfc_predictions))
print(confusion_matrix(y_temp, rfc_predictions))

In [None]:
gbt = GradientBoostingClassifier(
    learning_rate=0.1,
    n_estimators=100,
    max_depth=3,
    min_samples_split=2,
    min_samples_leaf=1,
    subsample=1.0,
    max_features='sqrt',
    random_state=10,
)
gbt.fit(X_train,y_train)
gbt_predictions = gbt.predict(X_temp) 
print(classification_report(y_temp, gbt_predictions))
print(confusion_matrix(y_temp, gbt_predictions))
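
The classification reports above cover accuracy, precision, recall and F1, but AUC-ROC (listed under Measures) needs a continuous score rather than hard predictions. A minimal sketch of that computation follows. It uses scikit-learn's `roc_auc_score` on synthetic stand-in data, and the variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (not the project's train.csv).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=8, n_informative=5,
    weights=[0.9, 0.1], random_state=42,
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

gb_demo = GradientBoostingClassifier(random_state=10)
gb_demo.fit(X_tr, y_tr)

# Use the predicted probability of class 1 as the ranking score;
# passing predict() output would collapse the ROC curve to a single point.
failure_scores = gb_demo.predict_proba(X_va)[:, 1]
demo_auc = roc_auc_score(y_va, failure_scores)
print(f"AUC-ROC: {demo_auc:.3f}")
```

The same pattern should apply to the fitted `rfc` and `gbt` models, since both expose `predict_proba`. A default `SVC()` does not, so its `decision_function` output would be the score to pass instead.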