# WE07-Neural Networks

## Model Fitting

Let's fit logistic regression, support vector machine, decision tree, and neural network models on this dataset, analyze the results, and pick the best model.

### Importing the packages

In [1]:
import pandas as pd
import numpy as np


from imblearn.under_sampling import RandomUnderSampler
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report,confusion_matrix, accuracy_score, precision_score, recall_score, f1_score



# setting random seed to ensure that results are repeatable
np.random.seed(7026)

### Importing the train and test data set from the Data processing notebook

In [2]:
X_train=pd.read_csv("smoke_train_X.csv")
X_test=pd.read_csv("smoke_test_X.csv")
y_train=pd.read_csv("smoke_train_y.csv")
y_test=pd.read_csv("smoke_test_y.csv")

train_df=pd.read_csv("smoke_train_df.csv")
test_df=pd.read_csv("smoke_test_df.csv")  # assumed filename, mirroring smoke_train_df.csv

### Standardizing the variables

We standardize our variables to eliminate the differences in scale between the variables/attributes.

We will use scikit-learn's `StandardScaler` for this. We first fit the scaler on the training data only, and then apply the fitted scaler to standardize both the training and test sets.

In [3]:
scaler = StandardScaler()
scaler.fit(X_train)

# Transform the predictors of training and test sets
X_train = scaler.transform(X_train) 
 

X_test = scaler.transform(X_test)
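To make the fit-then-transform step concrete, here is a minimal NumPy sketch of what standardization does. The toy arrays and variable names are hypothetical and not taken from the smoke dataset; it is a hand-rolled illustration of the idea rather than the library's implementation.

```python
import numpy as np

# Toy data (hypothetical values, not from the smoke dataset)
X_train_toy = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test_toy = np.array([[2.0, 400.0]])

# "Fit": learn the per-column mean and standard deviation from the training data only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population standard deviation (ddof=0)

# "Transform": apply the training statistics to both sets
X_train_scaled = (X_train_toy - mean) / std
X_test_scaled = (X_test_toy - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately zero in every column
print(X_test_scaled)                # test values expressed in training-set units
```

Reusing the training statistics for the test set keeps information about the test data out of the preprocessing step.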


### Checking for the Imbalance in data

In [4]:
train_df.Fire_Alarm.value_counts()

1    31421
0    12420
Name: Fire_Alarm, dtype: int64

We can clearly see that the classes are imbalanced, so let's undersample the majority class.

### Undersampling the data to balance the classes

The majority class (1) has far more observations than the minority class (0). Randomly undersampling the majority class balances the two classes, at the cost of discarding some majority-class rows.

Let's use `RandomUnderSampler` to undersample the data.

In [5]:
undersample = RandomUnderSampler(sampling_strategy='majority')

In [6]:
X_train,y_train=undersample.fit_resample(X_train,y_train)

In [7]:
y_train.value_counts()

Fire_Alarm
0             12420
1             12420
dtype: int64

Now the data is balanced
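As a rough illustration of what random undersampling of the majority class does, the following NumPy sketch keeps every minority row and samples the majority class down to the same size. The labels, the seed, and the variable names are hypothetical; this is a simplified stand-in for the idea, not the `imblearn` implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical imbalanced labels: 8 positives, 3 negatives
y = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
X = np.arange(len(y)).reshape(-1, 1)  # stand-in feature column

classes, counts = np.unique(y, return_counts=True)
minority_size = counts.min()
majority_class = classes[counts.argmax()]

# Keep every minority row; sample the majority class down to the minority size
majority_idx = np.flatnonzero(y == majority_class)
minority_idx = np.flatnonzero(y != majority_class)
kept_majority = rng.choice(majority_idx, size=minority_size, replace=False)
keep = np.sort(np.concatenate([minority_idx, kept_majority]))

X_bal, y_bal = X[keep], y[keep]
print(np.bincount(y_bal))  # [3 3]
```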

### Deciding on the best evaluation metric

Our main aim is to reduce false alarms, which are false positives (smoke is detected when there is actually none). But neglecting false negatives is even more dangerous: the detector reports no smoke when smoke is actually present, which could mean an undetected fire.

This means we have to deal with both false negatives and false positives, and the best evaluation metric for this is the **F1 score**.
The F1 score is the harmonic mean of precision and recall, so it accounts for both false negatives and false positives.
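A small worked example may help show how the F1 score combines the two error types. The counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical confusion-matrix counts for a smoke detector
TP, FP, FN = 90, 10, 30

precision = TP / (TP + FP)  # share of raised alarms that were real
recall = TP / (TP + FN)     # share of real smoke events that raised an alarm
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

# Equivalent form written directly in terms of the counts
f1_counts = 2 * TP / (2 * TP + FP + FN)

print(precision, recall)
print(round(f1, 4), round(f1_counts, 4))
```

Because both FP and FN appear in the denominator, lowering either kind of error raises the score.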

## Modeling

Let's create a data frame to store the results of each model.

In [8]:
performance = pd.DataFrame({"model": [], "Accuracy": [], "Precision": [], "Recall": [], "F1": []})

### Neural Network

#### Without RandomizedSearchCV or GridSearchCV

In [9]:
%%time

ann = MLPClassifier(hidden_layer_sizes=(60,50,40), solver='adam', max_iter=200)
_ = ann.fit(X_train,np.ravel( y_train))

CPU times: total: 43.9 s
Wall time: 22.7 s


In [10]:
%%time
y_pred = ann.predict(X_test)

CPU times: total: 0 ns
Wall time: 22.5 ms


In [11]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      5453
           1       1.00      1.00      1.00     13336

    accuracy                           1.00     18789
   macro avg       1.00      1.00      1.00     18789
weighted avg       1.00      1.00      1.00     18789



#### With RandomizedSearchCV

In [12]:
%%time

score_measure = "f1"
kfolds = 5

param_grid = {
    'hidden_layer_sizes': [ (50,), (70,),(50,30), (40,20), (60,40, 20), (70,50,40)],
    'activation': ['logistic', 'tanh', 'relu'],
    'solver': ['adam', 'sgd'],
    'alpha': [0, .2, .5, .7, 1],
    'learning_rate': ['constant', 'invscaling', 'adaptive'],
    'learning_rate_init': [0.001, 0.01, 0.1, 0.2, 0.5],
    'max_iter': [3000]
}

ann = MLPClassifier()
grid_search = RandomizedSearchCV(estimator = ann, param_distributions=param_grid, cv=kfolds, n_iter=100,
                           scoring=score_measure, verbose=1, n_jobs=-1,  # n_jobs=-1 will utilize all available CPUs 
                           return_train_score=True)

_ = grid_search.fit(X_train, y_train)

bestRecallTree = grid_search.best_estimator_

print(grid_search.best_params_)

Fitting 5 folds for each of 100 candidates, totalling 500 fits


  y = column_or_1d(y, warn=True)


{'solver': 'sgd', 'max_iter': 3000, 'learning_rate_init': 0.2, 'learning_rate': 'adaptive', 'hidden_layer_sizes': (40, 20), 'alpha': 0, 'activation': 'tanh'}
CPU times: total: 1min 10s
Wall time: 19min 26s


In [13]:
%%time
y_pred = bestRecallTree.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      5453
           1       1.00      1.00      1.00     13336

    accuracy                           1.00     18789
   macro avg       1.00      1.00      1.00     18789
weighted avg       1.00      1.00      1.00     18789

CPU times: total: 93.8 ms
Wall time: 82.4 ms


#### With GridSearchCV

In [14]:
%%time

score_measure = "f1"
kfolds = 5

param_grid = {
    'hidden_layer_sizes': [ (50,30), (50,70), (50,90)],
    'activation': ['logistic', 'relu'],
    'solver': ['sgd'],
    'alpha': [0,.5 ],
    'learning_rate': ['adaptive', 'invscaling'],
    'learning_rate_init': [0.1,0.2,0.25],
    'max_iter': [5000]
}

ann = MLPClassifier()
grid_search = GridSearchCV(estimator = ann, param_grid=param_grid, cv=kfolds, 
                           scoring=score_measure, verbose=1, n_jobs=-1,  # n_jobs=-1 will utilize all available CPUs 
                           return_train_score=True)

_ = grid_search.fit(X_train, y_train)

bestRecallTree = grid_search.best_estimator_

print(grid_search.best_params_)

Fitting 5 folds for each of 72 candidates, totalling 360 fits


  y = column_or_1d(y, warn=True)


{'activation': 'relu', 'alpha': 0, 'hidden_layer_sizes': (50, 90), 'learning_rate': 'adaptive', 'learning_rate_init': 0.2, 'max_iter': 5000, 'solver': 'sgd'}
CPU times: total: 1min 16s
Wall time: 22min
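The candidate counts reported by the two searches follow from the size of each search space. The sketch below copies the two parameter dictionaries used for the neural network and multiplies the number of options per parameter; the helper name `n_combinations` is introduced here for illustration only.

```python
from math import prod

# Search space used with RandomizedSearchCV
random_space = {
    'hidden_layer_sizes': [(50,), (70,), (50, 30), (40, 20), (60, 40, 20), (70, 50, 40)],
    'activation': ['logistic', 'tanh', 'relu'],
    'solver': ['adam', 'sgd'],
    'alpha': [0, .2, .5, .7, 1],
    'learning_rate': ['constant', 'invscaling', 'adaptive'],
    'learning_rate_init': [0.001, 0.01, 0.1, 0.2, 0.5],
    'max_iter': [3000],
}

# Search space used with GridSearchCV
grid_space = {
    'hidden_layer_sizes': [(50, 30), (50, 70), (50, 90)],
    'activation': ['logistic', 'relu'],
    'solver': ['sgd'],
    'alpha': [0, .5],
    'learning_rate': ['adaptive', 'invscaling'],
    'learning_rate_init': [0.1, 0.2, 0.25],
    'max_iter': [5000],
}

def n_combinations(space):
    """Number of distinct parameter combinations in a search space."""
    return prod(len(options) for options in space.values())

print(n_combinations(random_space))  # 2700
print(n_combinations(grid_space))    # 72
```

The randomized search therefore sampled 100 of 2700 possible combinations, while the grid search evaluated all 72 combinations of its smaller space.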


In [15]:
%%time
y_pred = bestRecallTree.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      5453
           1       1.00      1.00      1.00     13336

    accuracy                           1.00     18789
   macro avg       1.00      1.00      1.00     18789
weighted avg       1.00      1.00      1.00     18789

CPU times: total: 141 ms
Wall time: 102 ms


In [16]:
c_matrix = confusion_matrix(y_test, y_pred)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
print(f"Accuracy={(TP+TN)/(TP+TN+FP+FN):.7f} Precision={TP/(TP+FP):.7f} Recall={TP/(TP+FN):.7f} F1={2*TP/(2*TP+FP+FN):.7f}")
F1_nn=2*TP/(2*TP+FP+FN)  # F1 score of the neural network

Accuracy=0.9986162 Precision=0.9989504 Recall=0.9991002 F1=0.9990253


In [17]:
print(f"The F1 score from the Neural Networks model using Random Search and Grid Search is :{F1_nn}")



The F1 score from the Neural Networks model using Random Search and Grid Search is :0.9990252680512859


In [18]:
performance = pd.concat([performance, pd.DataFrame({'model':"neural network using random & grid search", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])
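The same metric calculations are repeated for every model in this notebook, so they could be wrapped in a small helper. The following is only a sketch: the function name `metrics_row` and the example counts are hypothetical, and it assumes the confusion-matrix layout used above (rows are true labels, columns are predictions, class 0 first).

```python
import numpy as np
import pandas as pd

def metrics_row(model_name, c_matrix):
    """Return a one-row DataFrame of metrics from a 2x2 confusion matrix.

    Assumes rows are true labels and columns are predictions, class 0 first,
    so c_matrix[0][1] counts false positives.
    """
    (TN, FP), (FN, TP) = np.asarray(c_matrix)
    return pd.DataFrame({
        "model": [model_name],
        "Accuracy": [(TP + TN) / (TP + TN + FP + FN)],
        "Precision": [TP / (TP + FP)],
        "Recall": [TP / (TP + FN)],
        "F1": [2 * TP / (2 * TP + FP + FN)],
    })

# Hypothetical counts, for illustration only
results = pd.concat(
    [metrics_row("model A", [[50, 5], [10, 35]]),
     metrics_row("model B", [[52, 3], [4, 41]])],
    ignore_index=True,  # rows are numbered 0, 1, ... instead of repeating index 0
)
print(results)
```

Passing `ignore_index=True` avoids the repeated index value that results from building every row with `index=[0]`.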

### Logistic Regression Model

#### Logistic Regression model using Random Search and Grid Search

In [19]:
%%time

score_measure = "f1"
kfolds = 5

param_grid = {'C':[0.001,0.01,0.1,1,10], # C is the inverse of the regularization strength
               'penalty':['l1', 'l2','elasticnet','none'],
              'solver':['saga','liblinear'],
              'max_iter': np.arange(500,1000)
                  
}

lg = LogisticRegression()
rand_search = RandomizedSearchCV(estimator =lg, param_distributions=param_grid, cv=kfolds, n_iter=500,
                           scoring=score_measure, verbose=1, n_jobs=-1  # n_jobs=-1 will utilize all available CPUs 
                                )

_ = rand_search.fit(X_train, y_train)

print(f"The best {score_measure} score is {rand_search.best_score_}")
print(f"... with parameters: {rand_search.best_params_}")

bestlogestic = rand_search.best_estimator_

Fitting 5 folds for each of 500 candidates, totalling 2500 fits


840 fits failed out of a total of 2500.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
270 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\arihe\anaconda3_new\lib\site-packages\sklearn\model_selection\_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\arihe\anaconda3_new\lib\site-packages\sklearn\linear_model\_logistic.py", line 1162, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "C:\Users\arihe\anaconda3_new\lib\site-packages\sklearn\linear_model\_logistic.py", line 64, in _check_solver
    raise ValueError(
ValueError: Only 'saga' solver supports elasticnet penalty, got solver=liblinear.

------------

The best f1 score is 0.9066978169406108
... with parameters: {'solver': 'saga', 'penalty': 'none', 'max_iter': 711, 'C': 0.1}
CPU times: total: 11.7 s
Wall time: 14min 43s
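The failed fits above come from sampling solver and penalty combinations that `LogisticRegression` rejects, such as `liblinear` with `elasticnet`. One way to avoid them, sketched below, is to pass a list of dictionaries so that each dictionary only contains compatible settings; `RandomizedSearchCV` accepts such a list for `param_distributions`. The `SUPPORTED` table, the `l1_ratio` values, and the `all_valid` helper are illustrative assumptions, and newer scikit-learn releases expect `None` rather than the string `'none'` used in this notebook.

```python
# Solver/penalty pairs accepted for the two solvers searched in this notebook
SUPPORTED = {
    "liblinear": {"l1", "l2"},
    "saga": {"l1", "l2", "elasticnet", "none"},
}

C_values = [0.001, 0.01, 0.1, 1, 10]

# One dictionary per group of compatible settings
param_distributions = [
    {"solver": ["liblinear"], "penalty": ["l1", "l2"], "C": C_values},
    {"solver": ["saga"], "penalty": ["l1", "l2"], "C": C_values},
    # elasticnet also needs l1_ratio; the values here are illustrative
    {"solver": ["saga"], "penalty": ["elasticnet"], "C": C_values,
     "l1_ratio": [0.25, 0.5, 0.75]},
    # no regularization: C has no effect, so it is left out
    {"solver": ["saga"], "penalty": ["none"]},
]

def all_valid(grids):
    """Check every solver/penalty pair in the grids against SUPPORTED."""
    return all(
        penalty in SUPPORTED[solver]
        for grid in grids
        for solver in grid["solver"]
        for penalty in grid["penalty"]
    )

print(all_valid(param_distributions))  # True
```

The list could then be passed as `param_distributions=param_distributions` in place of the single dictionary, so no search iterations are spent on combinations that cannot be fitted.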




In [20]:
%%time

score_measure = "f1"
kfolds = 5
best_penalty = rand_search.best_params_['penalty']
best_solver = rand_search.best_params_['solver']
best_C = rand_search.best_params_['C']
best_max_iter = rand_search.best_params_['max_iter']

# Use the best parameters from the random search to define the ranges for the grid search
param_grid = {
    # np.arange defaults to a step of 1, so this range holds a single value (best_C - 0.05)
    'C': np.arange(best_C-0.05, best_C+0.05),
    'penalty': [best_penalty],
    'solver': [best_solver],
    'max_iter': np.arange(best_max_iter-300, best_max_iter+300)
}

lgr =  LogisticRegression()
grid_search = GridSearchCV(estimator = lgr, param_grid=param_grid, cv=kfolds, 
                           scoring=score_measure, verbose=1, n_jobs=-1 # n_jobs=-1 will utilize all available CPUs 
                )

_ = grid_search.fit(X_train, y_train)

print(f"The best {score_measure} score is {grid_search.best_score_}")
print(f"... with parameters: {grid_search.best_params_}")

bestlgr = grid_search.best_estimator_

Fitting 5 folds for each of 600 candidates, totalling 3000 fits


  y = column_or_1d(y, warn=True)


The best f1 score is 0.9066978169406108
... with parameters: {'C': 0.05, 'max_iter': 709, 'penalty': 'none', 'solver': 'saga'}
CPU times: total: 14.8 s
Wall time: 57min 2s
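The grid search above reports `'C': 0.05` because `np.arange` uses a step of 1 by default, so a range only 0.1 wide contains just its start value. The sketch below shows this behaviour and one possible alternative using `np.linspace`; the choice of five points is an arbitrary assumption for illustration.

```python
import numpy as np

best_C = 0.1  # value reported by the random search above

# Default step is 1, so a 0.1-wide interval holds only its start value
single = np.arange(best_C - 0.05, best_C + 0.05)
print(len(single))  # 1

# linspace takes the number of points instead of a step size
candidates = np.linspace(best_C - 0.05, best_C + 0.05, num=5)
print(candidates)
```

Since the selected penalty here is `'none'`, the value of `C` does not affect the fitted model in any case, which is consistent with the grid search reaching the same best score as the random search.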




In [21]:
c_matrix = confusion_matrix(y_test, grid_search.predict(X_test))
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
print(f"Accuracy={(TP+TN)/(TP+TN+FP+FN):.7f} Precision={TP/(TP+FP):.7f} Recall={TP/(TP+FN):.7f} F1={2*TP/(2*TP+FP+FN):.7f}")
F1_lr=2*TP/(2*TP+FP+FN)

Accuracy=0.9054234 Precision=0.9641796 Recall=0.9001950 F1=0.9310893


In [22]:
print(f"The F1 score from the Logistic Regression model using Random Search and Grid Search is :{F1_lr}")

The F1 score from the Logistic Regression model using Random Search and Grid Search is :0.9310893085663319


In [23]:
performance = pd.concat([performance, pd.DataFrame({'model':"logistic using random & grid search", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])

### SVM model

#### SVM using RandomSearch and Grid Search

In [24]:
%%time

score_measure = "f1"
kfolds = 3

param_grid = {'C':np.arange(0.1,100,10),  # regularization parameter
              'kernel':['linear', 'rbf','poly'],
              'gamma':['scale','auto'],
              'degree':np.arange(1,10),  # degree applies only to the polynomial kernel
              'coef0':np.arange(1,10)    # coef0 affects the polynomial kernel
}

svc = SVC()
rand_search = RandomizedSearchCV(estimator =svc, param_distributions=param_grid, cv=kfolds, n_iter=500,
                           scoring=score_measure, verbose=1, n_jobs=-1  # n_jobs=-1 will utilize all available CPUs 
                                )

_ = rand_search.fit(X_train, y_train)

print(f"The best {score_measure} score is {rand_search.best_score_}")
print(f"... with parameters: {rand_search.best_params_}")

bestsvc = rand_search.best_estimator_

Fitting 3 folds for each of 500 candidates, totalling 1500 fits


  y = column_or_1d(y, warn=True)


The best f1 score is 0.9995169859781171
... with parameters: {'kernel': 'poly', 'gamma': 'scale', 'degree': 4, 'coef0': 2, 'C': 50.1}
CPU times: total: 17.7 s
Wall time: 5h 5min 53s


In [25]:
%%time

score_measure = "f1"
kfolds = 3
best_kernel = rand_search.best_params_['kernel']
best_gamma = rand_search.best_params_['gamma']
best_C = rand_search.best_params_['C']
best_degree = rand_search.best_params_['degree']
best_coef0 = rand_search.best_params_['coef0']

# Use the best parameters from the random search to define the ranges for the grid search
param_grid = {
    'C': np.arange(best_C-3, best_C+3),
    'kernel': [best_kernel],
    'gamma': [best_gamma],
    'degree': np.arange(best_degree-1, best_degree+1),
    'coef0': np.arange(best_coef0-3, best_coef0+3)
}

svm_grid =  SVC()
grid_search = GridSearchCV(estimator = svm_grid, param_grid=param_grid, cv=kfolds, 
                           scoring=score_measure, verbose=1, n_jobs=-1 # n_jobs=-1 will utilize all available CPUs 
                )

_ = grid_search.fit(X_train, y_train)

print(f"The best {score_measure} score is {grid_search.best_score_}")
print(f"... with parameters: {grid_search.best_params_}")

best_svm = grid_search.best_estimator_

Fitting 3 folds for each of 72 candidates, totalling 216 fits


  y = column_or_1d(y, warn=True)


The best f1 score is 0.9995572387656226
... with parameters: {'C': 47.1, 'coef0': 2, 'degree': 4, 'gamma': 'scale', 'kernel': 'poly'}
CPU times: total: 2.72 s
Wall time: 4min 27s


In [26]:
c_matrix = confusion_matrix(y_test, grid_search.predict(X_test))
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
print(f"Accuracy={(TP+TN)/(TP+TN+FP+FN):.7f} Precision={TP/(TP+FP):.7f} Recall={TP/(TP+FN):.7f} F1={2*TP/(2*TP+FP+FN):.7f}")
F1_svm=2*TP/(2*TP+FP+FN)

Accuracy=0.9996274 Precision=0.9997750 Recall=0.9997001 F1=0.9997375


In [27]:
print(f"The f1 score from the SVM model using Random Search and Grid Search is {F1_svm}")

The f1 score from the SVM model using Random Search and Grid Search is 0.9997375426493195


In [28]:
performance = pd.concat([performance, pd.DataFrame({'model':"svm using Random & Grid search", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])

### Decision Trees

#### Decision Trees using RandomizedSearchCV combined with GridSearchCV

In [29]:
%%time

score_measure = "f1"
kfolds = 5

param_grid = {
    'min_samples_split': np.arange(2,100),  # an integer min_samples_split must be at least 2
    'min_samples_leaf': np.arange(1,100),
    'min_impurity_decrease': np.arange(0.0001, 0.0005),  # default step of 1, so only 0.0001 is tried
    'max_leaf_nodes': np.arange(5, 100), 
    'max_depth': np.arange(1,25), 
    'criterion': ['entropy', 'gini'],
}

dtree = DecisionTreeClassifier()
rand_search = RandomizedSearchCV(estimator = dtree, param_distributions=param_grid, cv=kfolds, n_iter=500,
                           scoring=score_measure, verbose=1, n_jobs=-1,  # n_jobs=-1 will utilize all available CPUs 
                           return_train_score=True)

_ = rand_search.fit(X_train, y_train)

print(f"The best {score_measure} score is {rand_search.best_score_}")
print(f"... with parameters: {rand_search.best_params_}")

bestRecallTree = rand_search.best_estimator_

Fitting 5 folds for each of 500 candidates, totalling 2500 fits
The best f1 score is 0.9993154821171236
... with parameters: {'min_samples_split': 3, 'min_samples_leaf': 4, 'min_impurity_decrease': 0.0001, 'max_leaf_nodes': 21, 'max_depth': 13, 'criterion': 'entropy'}
CPU times: total: 9.45 s
Wall time: 59.2 s


In [30]:
%%time

score_measure = "f1"
kfolds = 5
min_samples_split = rand_search.best_params_['min_samples_split']
min_samples_leaf = rand_search.best_params_['min_samples_leaf']
min_impurity_decrease = rand_search.best_params_['min_impurity_decrease']
max_leaf_nodes = rand_search.best_params_['max_leaf_nodes']
max_depth = rand_search.best_params_['max_depth']
criterion = rand_search.best_params_['criterion']
#Using the best parameters from the Random Search to use as range for the parameters to do the grid search
param_grid = {
    'min_samples_split': np.arange(min_samples_split-2,min_samples_split+2),  
    'min_samples_leaf': np.arange(min_samples_leaf-2,min_samples_leaf+2),
    'min_impurity_decrease': np.arange(min_impurity_decrease-0.0001, min_impurity_decrease+0.0001, 0.00005),
    'max_leaf_nodes': np.arange(max_leaf_nodes-2,max_leaf_nodes+2), 
    'max_depth': np.arange(max_depth-2,max_depth+2), 
    'criterion': [criterion]
}

dtree = DecisionTreeClassifier()
grid_search = GridSearchCV(estimator = dtree, param_grid=param_grid, cv=kfolds, 
                           scoring=score_measure, verbose=1, n_jobs=-1,  # n_jobs=-1 will utilize all available CPUs 
                           return_train_score=True)

_ = grid_search.fit(X_train, y_train)

print(f"The best {score_measure} score is {grid_search.best_score_}")
print(f"... with parameters: {grid_search.best_params_}")

bestRecallTree = grid_search.best_estimator_

Fitting 5 folds for each of 1024 candidates, totalling 5120 fits
The best f1 score is 0.9995973100074999
... with parameters: {'criterion': 'entropy', 'max_depth': 11, 'max_leaf_nodes': 22, 'min_impurity_decrease': 5e-05, 'min_samples_leaf': 2, 'min_samples_split': 3}
CPU times: total: 19.2 s
Wall time: 2min 8s


In [31]:
c_matrix = confusion_matrix(y_test, grid_search.predict(X_test))
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
print(f"Accuracy={(TP+TN)/(TP+TN+FP+FN):.7f} Precision={TP/(TP+FP):.7f} Recall={TP/(TP+FN):.7f} F1={2*TP/(2*TP+FP+FN):.7f}")
F1_Decisiontree=2*TP/(2*TP+FP+FN)

Accuracy=0.9994146 Precision=0.9998500 Recall=0.9993251 F1=0.9995875


In [32]:
print(f"The f1 score from the Decision Tree using Random Search and Grid Search is {F1_Decisiontree}")

The f1 score from the Decision Tree using Random Search and Grid Search is 0.9995874742171386


In [33]:
performance = pd.concat([performance, pd.DataFrame({'model':"Decision Tree", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])


## Conclusion

In [34]:
performance

| | model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | neural network using random & grid search | 0.998616 | 0.99895 | 0.9991 | 0.999025 |
| 0 | logistic using random & grid search | 0.905423 | 0.96418 | 0.900195 | 0.931089 |
| 0 | svm using Random & Grid search | 0.999627 | 0.999775 | 0.9997 | 0.999738 |
| 0 | Decision Tree | 0.999415 | 0.99985 | 0.999325 | 0.999587 |


The F1 score of the support vector machine model is the highest of the models we analyzed. However, the differences between the neural network, SVM, and decision tree are very small, so all three models are a good fit for this data.
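To put these small differences in perspective, the reported accuracies can be converted back into approximate counts of misclassified test rows. This is a back-of-the-envelope sketch that uses the printed accuracy values and the test-set size of 18789 from the classification reports; rounding to the nearest integer is an assumption to undo the seven-decimal formatting.

```python
n_test = 18789  # test-set size from the classification reports

# Accuracy values as printed for each tuned model
accuracy = {
    "neural network": 0.9986162,
    "logistic regression": 0.9054234,
    "svm": 0.9996274,
    "decision tree": 0.9994146,
}

# Misclassified rows = test rows times the error rate
errors = {name: round(n_test * (1 - acc)) for name, acc in accuracy.items()}
print(errors)
```

Under these assumptions the three strongest models differ by only a few dozen misclassified rows at most, out of nearly nineteen thousand.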

The neural network also fits the data very well, with scores almost equal to those of the SVM. Neural networks are particularly good at learning complex patterns and relationships in large datasets, and fire detection data is likely to contain many complex relationships between attributes and potentially more noise. Since SVMs can be sensitive to noise and decision trees can struggle with complex relationships, we conclude that the **Neural Network** is the best model for our dataset.