# Data preparation

To prepare the data we split it into two sets: training and validation. The training set is used to fit the MLP models, and the validation set is used to evaluate them. The test set comes from a separate data file and is used later to compare the best MLP with the fuzzy system.

In [11]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.model_selection import train_test_split

raw_data = pd.read_csv("ACI21-22_Proj1IoTGatewayCrashDataset.csv")
X = raw_data.drop('Falha', axis=1).to_numpy()
y = raw_data['Falha'].to_numpy()

# Split the data in order (no shuffling) into training (80%) and validation (20%) sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,  shuffle=False)
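
With `shuffle=False`, `train_test_split` keeps the rows in their original order, so the validation set is simply the last 20% of the sequence. A minimal sketch on toy data (the arrays `X_toy` and `y_toy` are made up for illustration and are not the project dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy sequential data: 10 time steps, 2 features (illustrative values only).
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 0, 0, 0, 1, 0, 0, 0, 1, 0])

X_tr, X_va, y_tr, y_va = train_test_split(X_toy, y_toy, test_size=0.2, shuffle=False)

# The split is a plain slice: the leading rows train, the trailing rows validate.
split = len(X_tr)
print(split, len(X_va))
print(np.array_equal(X_tr, X_toy[:split]), np.array_equal(X_va, X_toy[split:]))
```

Because nothing is shuffled, the model never sees rows that come after the validation rows.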

# Problem 3 Finding hyperparameters

The code below lists the hyperparameter combinations used during training. Unfortunately, GridSearchCV from the sklearn package relies on cross-validation, which we can't use here because the data is sequential and order matters. That's why we have to search for hyperparameters ourselves.

In previous versions, different hyperparameters were used.
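
A manual search of this kind can be sketched as a loop over parameter combinations, each scored on one fixed, ordered hold-out. The helper `manual_search`, the toy data and the grid values below are assumptions for illustration, not the settings used in this project:

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import ParameterGrid
from sklearn.neural_network import MLPClassifier

def manual_search(X_tr, y_tr, X_va, y_va, grid):
    """Fit one MLP per parameter combination and score it on a fixed hold-out."""
    results = []
    for params in ParameterGrid(grid):
        clf = MLPClassifier(max_iter=1000, random_state=1234, **params)
        clf.fit(X_tr, y_tr)
        score = f1_score(y_va, clf.predict(X_va), zero_division=0)
        results.append((score, params))
    # Highest F1 score first.
    return sorted(results, key=lambda r: r[0], reverse=True)

# Toy ordered data standing in for the real dataset (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.random((200, 2))
y_demo = (X_demo.sum(axis=1) > 1.4).astype(int)
cut = int(0.8 * len(X_demo))  # first 80% train, last 20% validate

grid = {"hidden_layer_sizes": [(8,), (16, 8)], "solver": ["adam", "lbfgs"]}
ranked = manual_search(X_demo[:cut], y_demo[:cut], X_demo[cut:], y_demo[cut:], grid)
for score, params in ranked:
    print(round(score, 3), params)
```

The hold-out is always the tail of the sequence, so every candidate is compared on the same rows.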

In [14]:
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import confusion_matrix

clfs = {
    #'adam_2': MLPClassifier(hidden_layer_sizes=(2,), max_iter=1000, random_state=1234),
    #'adam2_8': MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=1234),
    'adam3_32': MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=1234),
    #'adam1_2lay': MLPClassifier(hidden_layer_sizes=(32,16), max_iter=1000, random_state=1234),
    'adam2_2lay': MLPClassifier(hidden_layer_sizes=(40,20), max_iter=1000, random_state=1234),
    #'sgd1_2lay': MLPClassifier(hidden_layer_sizes=(2,),solver='sgd',max_iter=1000,random_state=1234),
    #'sgd2_8lay': MLPClassifier(hidden_layer_sizes=(8,),solver='sgd',max_iter=1000, momentum=0.9,random_state=1234),
    #'sgd3_32lay': MLPClassifier(hidden_layer_sizes=(32,),solver='sgd',max_iter=1000, momentum=0.9,random_state=1234),
    #'sgd1_adaptive_momentum': MLPClassifier(hidden_layer_sizes=(2,),solver='sgd',max_iter=1000, momentum=0,random_state=1234),
    #'sgd2_momentum': MLPClassifier(hidden_layer_sizes=(8,),solver='sgd',max_iter=1000, momentum=0.9,random_state=1234),
    #'sgd3_momentum': MLPClassifier(hidden_layer_sizes=(32,),solver='sgd',max_iter=1000, momentum=0.9,random_state=1234),
    #'lbfgs_no_momentum': MLPClassifier(hidden_layer_sizes=(2,), solver='lbfgs', max_iter=1000,random_state=1234),
    #'lbfgs_momentum': MLPClassifier(hidden_layer_sizes=(8,), solver='lbfgs', max_iter=1000, random_state=1234),
    'lbfgs_one_layer': MLPClassifier(hidden_layer_sizes=(32,), solver='lbfgs', max_iter=1000, random_state=1234),
    'lbfgs_two_layers': MLPClassifier(hidden_layer_sizes=(32,20), solver='lbfgs', max_iter=1000, random_state=1234),
}
def TrainModel(X_train, X_val, y_train, y_val):
    """Fit every classifier in clfs on the training data and print its scores on X_val."""
    for clf_name, clf in clfs.items():
        clf.fit(X_train, y_train)
        prediction = clf.predict(X_val)
        print(clf_name)
        # For binary labels, ravel() flattens the 2x2 matrix in the order tn, fp, fn, tp
        print("Confusion matrix(tn, fp, fn, tp):", confusion_matrix(y_val, prediction).ravel())
        # Accuracy is not reported: with so few crashes it is uninformative
        print("Precision:", precision_score(y_val, prediction, zero_division=0))
        print("Recall:", recall_score(y_val, prediction))
        print("F1 score:", f1_score(y_val, prediction))

In [15]:
TrainModel(X_train, X_val, y_train, y_val)

adam3_32
Confusion matrix(tn, fp, fn, tp): [389   0   4   7]
Precision: 1.0
Recall: 0.6363636363636364
F1 score: 0.7777777777777778
adam2_2lay
Confusion matrix(tn, fp, fn, tp): [388   1   2   9]
Precision: 0.9
Recall: 0.8181818181818182
F1 score: 0.8571428571428572
lbfgs_one_layer
Confusion matrix(tn, fp, fn, tp): [385   4   0  11]
Precision: 0.7333333333333333
Recall: 1.0
F1 score: 0.846153846153846
lbfgs_two_layers
Confusion matrix(tn, fp, fn, tp): [384   5   0  11]
Precision: 0.6875
Recall: 1.0
F1 score: 0.8148148148148148


#### Conclusion
A simple MLP is not enough. The best recall is obtained with the lbfgs solver, which uses a quasi-Newton method. Looking at accuracy doesn't make sense here, because a high accuracy can be reached without training a model at all, simply by always predicting "no crash". Instead we should keep an eye on recall and precision. Recall tells us how many of the actual crashes the system detected; a missed crash (false negative) is the error we care about most. Our goal is to detect all crashes.
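
To make the point about accuracy concrete: the confusion matrices above show 389 non-crash and 11 crash rows in the validation set, so always predicting "no crash" already reaches an accuracy of 389/400 = 0.9725 with a recall of 0. The small helper below (`scores_from_confusion` is a hypothetical name) recomputes the scores from the four counts:

```python
def scores_from_confusion(tn, fp, fn, tp):
    """Return (accuracy, precision, recall, f1) from the four confusion counts."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# adam3_32 on the validation set, counts copied from the output above.
adam_scores = scores_from_confusion(389, 0, 4, 7)
# A "model" that never predicts a crash on the same 400 validation rows.
never_scores = scores_from_confusion(389, 0, 11, 0)

print(adam_scores)
print(never_scores)
```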

# Problem 4:

To give the model information about previous requests, the function createPreviousumberOfRequestData(data, n) adds extra input columns with the number of requests at earlier time steps: Number_of_Requests(t-1), Number_of_Requests(t-2), ..., Number_of_Requests(t-n).

In [16]:
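def createPreviousumberOfRequestData(data, n):
    # Column 1 of data becomes the first feature column
    x = data.iloc[:, 1].to_numpy()
    req = data.iloc[:, :2].to_numpy()
    # Column 0 (number of requests at time t) becomes the second feature column
    x = np.vstack((x, req[:, 0]))
    for i in range(n):
        # Requests shifted by i+1 time steps; np.roll wraps around, so the
        # first i+1 rows receive values from the end of the series
        requestN = np.roll(req[:, 0], i + 1)
        x = np.vstack((x, requestN))

    # One row per time step: [column 1, requests(t), requests(t-1), ..., requests(t-n)]
    return x.T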

By varying how many past requests are included, we tune the hyperparameter n, which changes the size of the input layer.
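
Note that `np.roll` is circular: the first rows of each lagged column are filled with values from the end of the series. If that wrap-around is not wanted, one possible alternative is `pandas.Series.shift`, which pads the start instead. The sketch below is an assumption-laden variant, not the function used for the results in this notebook; the name `lagged_features` and the padding value 0 are illustrative choices:

```python
import numpy as np
import pandas as pd

def lagged_features(requests, load, n, fill_value=0):
    """Return columns [load, req(t), req(t-1), ..., req(t-n)] without wrap-around."""
    req = pd.Series(requests)
    columns = [np.asarray(load), req.to_numpy()]
    for k in range(1, n + 1):
        # shift(k) delays the series by k steps; the first k rows get fill_value.
        columns.append(req.shift(k, fill_value=fill_value).to_numpy())
    return np.column_stack(columns)

requests = [10, 20, 30, 40]
load = [0.1, 0.2, 0.3, 0.4]

feats = lagged_features(requests, load, 2)
rolled = np.roll(requests, 1)  # circular: the last value (40) moves to the first row

print(feats)
print(rolled)
```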

In [201]:
for i in range(1,6):
    print("Number of request T-",i)
    X_train, X_val, y_train, y_val = train_test_split(createPreviousumberOfRequestData(raw_data, i), y, test_size=0.2,  shuffle=False)
    TrainModel(X_train, X_val, y_train, y_val)
    print()

Number of request T- 1
adam3_32
Confusion matrix(tn, fp, fn, tp): [388   1   6   5]
Precision: 0.8333333333333334
Recall: 0.45454545454545453
F1 score: 0.5882352941176471
adam2_2lay
Confusion matrix(tn, fp, fn, tp): [385   4   3   8]
Precision: 0.6666666666666666
Recall: 0.7272727272727273
F1 score: 0.6956521739130435
lbfgs_one_layer
Confusion matrix(tn, fp, fn, tp): [386   3   0  11]
Precision: 0.7857142857142857
Recall: 1.0
F1 score: 0.88
lbfgs_two_layers
Confusion matrix(tn, fp, fn, tp): [384   5   1  10]
Precision: 0.6666666666666666
Recall: 0.9090909090909091
F1 score: 0.7692307692307692

Number of request T- 2
adam3_32
Confusion matrix(tn, fp, fn, tp): [387   2   5   6]
Precision: 0.75
Recall: 0.5454545454545454
F1 score: 0.631578947368421
adam2_2lay
Confusion matrix(tn, fp, fn, tp): [385   4   1  10]
Precision: 0.7142857142857143
Recall: 0.9090909090909091
F1 score: 0.8
lbfgs_one_layer
Confusion matrix(tn, fp, fn, tp): [383   6   3   8]
Precision: 0.5714285714285714
Recall: 0.72

#### Conclusion
Best results:

Obtained with 3 additional requests from the previous time steps:

    One hidden layer LBFGS
    Confusion matrix(tn, fp, fn, tp): [388   1   0  11]
    Precision: 0.9166666666666666
    Recall: 1.0
    F1 score: 0.9565217391304348
    
    Two layers LBFGS
    Confusion matrix(tn, fp, fn, tp): [387   2   0  11]
    Precision: 0.8461538461538461
    Recall: 1.0
    F1 score: 0.9166666666666666
    
The best results are obtained by the lbfgs solver when looking at 3 past requests. Using the advice from the second expert we were able to detect all crashes (recall = 1). Adding one more previous value (t-4) makes our models worse: recall drops to 0.9090909090909091 and 0.8181818181818182 for the lbfgs models.

# Problem 5

Following the advice from the second expert, instead of using the individual numbers of requests as inputs to the system, we try to create a new feature from them. Our X = {load, normalized sum of the requests at t, t-1, t-2 and t-3}.

In [17]:
from sklearn.preprocessing import MinMaxScaler

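scaler = MinMaxScaler()
def transform(X):
    # Sum of the requests at t, t-1, t-2 and t-3 (columns 1 to 4)
    e = np.sum(X[:, 1:5],axis=1)
    # Two feature columns: column 0 of X and the request sum
    result = np.vstack((X[:, 0],e)).T
    # Note: fit_transform returns the scaled array; its return value is not
    # assigned here, so `result` is returned unscaled
    scaler.fit_transform(result)
    return result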

X = transform(createPreviousumberOfRequestData(raw_data, 3))
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,  shuffle=False)
TrainModel(X_train, X_val, y_train, y_val)

adam3_32
Confusion matrix(tn, fp, fn, tp): [389   0   4   7]
Precision: 1.0
Recall: 0.6363636363636364
F1 score: 0.7777777777777778
adam2_2lay
Confusion matrix(tn, fp, fn, tp): [388   1   2   9]
Precision: 0.9
Recall: 0.8181818181818182
F1 score: 0.8571428571428572
lbfgs_one_layer
Confusion matrix(tn, fp, fn, tp): [385   4   0  11]
Precision: 0.7333333333333333
Recall: 1.0
F1 score: 0.846153846153846
lbfgs_two_layers
Confusion matrix(tn, fp, fn, tp): [384   5   0  11]
Precision: 0.6875
Recall: 1.0
F1 score: 0.8148148148148148


#### Conclusion
    Best results:
    
    Two layers Adam:
    Confusion matrix(tn, fp, fn, tp): [388   1   2   9]
    Precision: 0.9
    Recall: 0.8181818181818182
    F1 score: 0.8571428571428572
    
    One layer lbfgs:
    Confusion matrix(tn, fp, fn, tp): [385   4   0  11]
    Precision: 0.7333333333333333
    Recall: 1.0
    F1 score: 0.846153846153846
    
    Two layers lbfgs:
    Confusion matrix(tn, fp, fn, tp): [384   5   0  11]
    Precision: 0.6875
    Recall: 1.0
    F1 score: 0.8148148148148148

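Once again the lbfgs solver gives the best recall (1.0 with both one and two layers); of the two, the one-layer network has the higher precision and F1 score.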
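
`MinMaxScaler.fit_transform` returns the scaled array rather than modifying its argument, and a scaler fitted on the training rows can be reused on later rows with `transform`. A minimal sketch under those assumptions follows; the helper names `sum_feature` and `scale_split` and the toy values are hypothetical, and this is not the code that produced the numbers above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def sum_feature(X):
    """Build [load, req(t)+...+req(t-3)] from columns [load, req(t), req(t-1), ...]."""
    total = X[:, 1:5].sum(axis=1)
    return np.column_stack((X[:, 0], total))

def scale_split(F_train, F_other):
    """Fit the scaler on the training rows only, then apply it to both parts."""
    scaler = MinMaxScaler()
    F_train_scaled = scaler.fit_transform(F_train)  # keep the returned array
    F_other_scaled = scaler.transform(F_other)      # reuse the training min/max
    return F_train_scaled, F_other_scaled

# Toy rows laid out as [load, req(t), req(t-1), req(t-2), req(t-3)].
X_demo = np.array([
    [0.2, 10, 12, 11, 9],
    [0.5, 30, 10, 12, 11],
    [0.9, 50, 30, 10, 12],
    [0.7, 40, 50, 30, 10],
], dtype=float)

F = sum_feature(X_demo)
F_tr, F_va = scale_split(F[:3], F[3:])
print(F_tr)
print(F_va)
```

Values outside the training range map outside [0, 1] on later rows, which matters for any downstream component that expects inputs in that interval.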

# Fuzzy Rule Based Expert System

In [3]:
import numpy as np
import skfuzzy as fuzz
import matplotlib.pyplot as plt
from skfuzzy import control as ctrl

In [None]:
load = ctrl.Antecedent(np.arange(0, 1, 0.01), 'load')
sum_request = ctrl.Antecedent(np.arange(0, 1, 0.01), 'sum_request')
crash = ctrl.Consequent(np.arange(0, 2, 1), 'crash')

load['low'] = fuzz.trimf(load.universe, [0, 0, 0.5])
load['medium'] = fuzz.trimf(load.universe, [0, 0.5, 1])
load['high'] = fuzz.trimf(load.universe, [0.5, 1, 1])
sum_request['low'] = fuzz.trimf(sum_request.universe, [0, 0, 0.5])
sum_request['medium'] = fuzz.trimf(sum_request.universe, [0, 0.5, 1])
sum_request['high'] = fuzz.trimf(sum_request.universe, [0.5, 1, 1])
crash['no'] = fuzz.trimf(crash.universe, [0, 0, 0.4])
crash['yes'] = fuzz.trimf(crash.universe, [0.4, 1, 1])

rule1 = ctrl.Rule(load['high'] & sum_request['high'], crash['yes'])
rule2 = ctrl.Rule(load['low'] & sum_request['low'], crash['no'])
rule3 = ctrl.Rule(load['low'] & sum_request['medium'], crash['no'])
rule4 = ctrl.Rule(load['low'] & sum_request['high'], crash['no'])
rule5 = ctrl.Rule(load['medium'] & sum_request['low'], crash['no'])
rule6 = ctrl.Rule(load['medium'] & sum_request['medium'], crash['no'])
rule7 = ctrl.Rule(load['medium'] & sum_request['high'], crash['no'])
rule8 = ctrl.Rule(load['high'] & sum_request['low'], crash['no'])
rule9 = ctrl.Rule(load['high'] & sum_request['medium'], crash['no'])

crash_ctrl = ctrl.ControlSystem([rule1, rule2, rule3, rule4, rule5, rule6, rule7, rule8, rule9])

crashing = ctrl.ControlSystemSimulation(crash_ctrl)

#Testing 
crashing.input['load'] = 0.8
crashing.input['sum_request'] = 0.8
crashing.compute()

print(crashing.output['crash'])
crash.view(sim=crashing)
#We have a binary problem and want a binary output, that's why we have to use a crisp output. Defuzzification of 
print(fuzz.lambda_cut(crashing.output['crash'], 0.5))
X = transform(createPreviousumberOfRequestData(raw_data, 3))
y = raw_data['Falha'].to_numpy()
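
To make the rule base above concrete, the sketch below evaluates the same triangular sets by hand for the test input load = 0.8, sum_request = 0.8. It uses plain Python instead of skfuzzy; the helper `tri` is hypothetical, and taking the minimum for `&` and the maximum across rules is an assumption about the usual default operators:

```python
def tri(x, a, b, c):
    """Triangular membership with feet a and c and peak b (a <= b <= c)."""
    x = float(x)
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Same (a, b, c) triples as the trimf calls above.
sets = {"low": (0, 0, 0.5), "medium": (0, 0.5, 1), "high": (0.5, 1, 1)}

load_value, sum_value = 0.8, 0.8
load_mu = {name: tri(load_value, *abc) for name, abc in sets.items()}
sum_mu = {name: tri(sum_value, *abc) for name, abc in sets.items()}

# Rule 1 (high AND high -> crash yes): firing strength is the minimum of both memberships.
yes_strength = min(load_mu["high"], sum_mu["high"])
# Every other combination maps to "no"; take the strongest of those eight rules.
no_strength = max(
    min(load_mu[a], sum_mu[b])
    for a in sets for b in sets
    if (a, b) != ("high", "high")
)

print(load_mu)
print(yes_strength, no_strength)
```

Only one of the nine rules can ever produce "yes", so both inputs must be clearly in the "high" set for a crash to be predicted.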

# Generalization

To validate the models we use a different data set (*ACI_Proj1_TestSet.csv*) and compare the models on it.

In [27]:
test = pd.read_csv("ACI_Proj1_TestSet.csv",header=None)
X_test = test.iloc[:, :2].to_numpy()
y_test = test.iloc[:, 2].to_numpy()


X = raw_data.drop('Falha', axis=1).to_numpy()
y = raw_data['Falha'].to_numpy()

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,  shuffle=False)

# Train once again on the same training set as before, without including the new data set.
# Because the random seed is the same, the models will be the same.


In [191]:
# MLP without advice from the experts
TrainModel(X_train, X_test, y_train, y_test)

adam3_32
Confusion matrix(tn, fp, fn, tp): [188   4   7   1]
Precision: 0.2
Recall: 0.125
F1 score: 0.15384615384615385
adam2_2lay
Confusion matrix(tn, fp, fn, tp): [187   5   7   1]
Precision: 0.16666666666666666
Recall: 0.125
F1 score: 0.14285714285714288
lbfgs_one_layer
Confusion matrix(tn, fp, fn, tp): [190   2   2   6]
Precision: 0.75
Recall: 0.75
F1 score: 0.75
lbfgs_two_layers
Confusion matrix(tn, fp, fn, tp): [188   4   3   5]
Precision: 0.5555555555555556
Recall: 0.625
F1 score: 0.5882352941176471


###### Results
The model without modifications doesn't work well. Even so, the one-layer lbfgs MLP detected 75% of the crashes and the two-layer lbfgs MLP 62.5%.

In [211]:
# Problem 4
X_train, X_val, y_train, y_val = train_test_split(createPreviousumberOfRequestData(raw_data, 3), y, test_size=0.2,  shuffle=False)
TrainModel(X_train, createPreviousumberOfRequestData(test, 3), y_train, y_test)

adam3_32
Confusion matrix(tn, fp, fn, tp): [192   0   1   7]
Precision: 1.0
Recall: 0.875
F1 score: 0.9333333333333333
adam2_2lay
Confusion matrix(tn, fp, fn, tp): [191   1   0   8]
Precision: 0.8888888888888888
Recall: 1.0
F1 score: 0.9411764705882353
lbfgs_one_layer
Confusion matrix(tn, fp, fn, tp): [186   6   1   7]
Precision: 0.5384615384615384
Recall: 0.875
F1 score: 0.6666666666666667
lbfgs_two_layers
Confusion matrix(tn, fp, fn, tp): [190   2   1   7]
Precision: 0.7777777777777778
Recall: 0.875
F1 score: 0.823529411764706


###### Results
    Two layers adam:
    Confusion matrix(tn, fp, fn, tp): [191   1   0   8]
    Precision: 0.8888888888888888
    Recall: 1.0
    F1 score: 0.9411764705882353
    
    One layer lbfgs:
    Confusion matrix(tn, fp, fn, tp): [186   6   1   7]
    Precision: 0.5384615384615384
    Recall: 0.875
    F1 score: 0.6666666666666667
    
    Two layers lbfgs:
    Confusion matrix(tn, fp, fn, tp): [190   2   1   7]
    Precision: 0.7777777777777778
    Recall: 0.875
    F1 score: 0.823529411764706
Adding n new columns with the previous requests at time steps t-1 to t-n as inputs works very well. We got high recall, with at most 1 missed crash per model.

In [212]:

# Problem 5
X = transform(createPreviousumberOfRequestData(raw_data, 3))
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,  shuffle=False)
X_test = transform(createPreviousumberOfRequestData(test, 3))
TrainModel(X_train, X_test, y_train, y_test)

adam3_32
Confusion matrix(tn, fp, fn, tp): [191   1   3   5]
Precision: 0.8333333333333334
Recall: 0.625
F1 score: 0.7142857142857143
adam2_2lay
Confusion matrix(tn, fp, fn, tp): [190   2   1   7]
Precision: 0.7777777777777778
Recall: 0.875
F1 score: 0.823529411764706
lbfgs_one_layer
Confusion matrix(tn, fp, fn, tp): [190   2   1   7]
Precision: 0.7777777777777778
Recall: 0.875
F1 score: 0.823529411764706
lbfgs_two_layers
Confusion matrix(tn, fp, fn, tp): [189   3   1   7]
Precision: 0.7
Recall: 0.875
F1 score: 0.7777777777777777


###### Results
    One layer lbfgs:
    Confusion matrix(tn, fp, fn, tp): [190   2   1   7]
    Precision: 0.7777777777777778
    Recall: 0.875
    F1 score: 0.823529411764706
    Two layers lbfgs:
    Confusion matrix(tn, fp, fn, tp): [189   3   1   7]
    Precision: 0.7
    Recall: 0.875
    F1 score: 0.7777777777777777
Creating a new feature as the normalized sum of previous requests gives us a respectable model that produces only 1 false negative (the worst scenario, not detecting a crash).

In [28]:
#Fuzzy problem
X_test = transform(createPreviousumberOfRequestData(test, 3))
y_test = test.iloc[:, 2].to_numpy()

crashing.input['load'] = X_test[:,0]
crashing.input['sum_request'] = X_test[:,1]
crashing.compute()
pred=fuzz.lambda_cut(crashing.output['crash'], 0.5)
print("Confusion matrix(tn, fp, fn, tp):", confusion_matrix(y_test, pred).ravel())
print("Precision:",precision_score(y_test, pred ,zero_division=0))
print("Recall:",recall_score(y_test, pred))
print("F1 score:",f1_score(y_test, pred))

Confusion matrix(tn, fp, fn, tp): [185   7   6   2]
Precision: 0.2222222222222222
Recall: 0.25
F1 score: 0.23529411764705882


###### Results
    Confusion matrix(tn, fp, fn, tp): [185   7   6   2]
    Precision: 0.2222222222222222
    Recall: 0.25
    F1 score: 0.23529411764705882
    
Unfortunately, our fuzzy rule-based system doesn't work on the test data. The fuzzy sets and rules need to be redesigned.
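
One check worth making before redesigning the rules is whether the inputs actually fall inside the antecedent universes, which the fuzzy cell defines with `np.arange(0, 1, 0.01)`. The sketch below uses a hypothetical helper, `fraction_outside`, and made-up feature values; it only illustrates the check and says nothing about the real test set:

```python
import numpy as np

def fraction_outside(values, low=0.0, high=1.0):
    """Share of values that fall outside the closed interval [low, high]."""
    values = np.asarray(values, dtype=float)
    return float(np.mean((values < low) | (values > high)))

# Made-up feature rows: [load, sum_request]. Illustrative values only.
demo = np.array([
    [0.3, 0.8],
    [0.9, 42.0],
    [0.5, 130.0],
])

report = {
    name: fraction_outside(column)
    for name, column in zip(("load", "sum_request"), demo.T)
}
print(report)
```

A large share of out-of-range inputs would point at the feature scaling rather than at the membership functions or rules.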

# Conclusion

Creating a model for imbalanced data is challenging, but with the knowledge of domain experts we are able to get accurate predictions. A simple brute-force MLP is rarely enough. With such imbalanced data we have to add information from previous time steps as extra inputs, or create a new feature from them. Recall is more useful here than accuracy or precision, and the F1 score is also useful. Unfortunately, the fuzzy system failed this time, but that doesn't mean it wouldn't suit this problem. Probably the person creating the expert system isn't a fuzzy person.