## Mini-Project Part 4: Supervised Learning (2)
### Naive Bayes, SVM, and Neural Network

Continue using the dataset on adaptivity to online education. Encode the data as integers (code from Part 1).
###### Results at bottom

In [2]:
import pandas as pd

data = pd.read_csv(r"C:\Users\roryq\Downloads\online_adapt.csv")

# Encode `Age` to integers, 1, 2, 3, 4, 5, 6.
age_mapper = {'26-30':6, '21-25':5, '16-20':4, '11-15':3, '6-10':2, '1-5':1}
age_t = data['Age'].replace(age_mapper)

# Encode `Network Type` to integers, 2, 3, 4.
net_mapper = {'2G':2, '3G':3, '4G':4}
net_t = data['Network Type'].replace(net_mapper)

# Encode `Class Duration` to integers, 0, 1, 2.
class_mapper = {'0':0, '1-3':1, '3-6':2}
class_t = data['Class Duration'].replace(class_mapper)

# Replace `Age`, `Network Type`, `Class Duration` by their corresponding numeric versions.
data['Age'] = age_t
data['Network Type'] = net_t
data['Class Duration'] = class_t

# One-hot encode the rest of the variables except for the response variable, `Adaptivity Level`.
y = data['Adaptivity Level']
data1 = pd.get_dummies(data.drop('Adaptivity Level', axis=1), dtype=int )
data1.head(3)

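|   | Age | Network Type | Class Duration | Gender_Boy | Gender_Girl | Education Level_College | Education Level_School | Education Level_University | Institution Type_Government | Institution Type_Non Government | ... | Financial Condition_Mid | Financial Condition_Poor | Financial Condition_Rich | Internet Type_Mobile Data | Internet Type_Wifi | Self Lms_No | Self Lms_Yes | Device_Computer | Device_Mobile | Device_Tab |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 4 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | ... | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| 1 | 5 | 4 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| 2 | 4 | 4 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |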


In [3]:
data1.shape

(1205, 26)
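The encoding step can be checked on a tiny hand-made frame. This is a sketch with invented toy rows (not rows from `online_adapt.csv`); it shows that once the ordinal column is mapped to integers, `pd.get_dummies` passes the numeric column through unchanged and only expands the remaining text columns, which is why `data1` has 26 columns rather than one dummy per age band. The sketch uses `Series.map`, which turns an unexpected label into `NaN` instead of silently leaving it as a string (the cell above uses `replace`).

```python
import pandas as pd

# Hypothetical toy rows, invented for illustration
toy = pd.DataFrame({
    'Age': ['21-25', '16-20', '11-15'],
    'Gender': ['Boy', 'Girl', 'Girl'],
    'Adaptivity Level': ['Moderate', 'Low', 'High'],
})

# Ordinal encoding: the integers preserve the natural order of the age bands
age_mapper = {'26-30': 6, '21-25': 5, '16-20': 4, '11-15': 3, '6-10': 2, '1-5': 1}
toy['Age'] = toy['Age'].map(age_mapper)

# One-hot encode the remaining text columns; the numeric Age column is kept as-is
y_toy = toy['Adaptivity Level']
X_toy = pd.get_dummies(toy.drop('Adaptivity Level', axis=1), dtype=int)
print(X_toy)
```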

###  Gaussian Naive Bayes 
Fit a Gaussian Naive Bayes model on `data1` and `y`. 


1. Split data (80% training; 20% test) (set `random_state=100`)
2. Calculate the confusion matrix for the test set.
3. Report the accuracy as well as the precision, recall, and the F1 score for the test set.
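To make the model's assumption concrete, here is a minimal from-scratch sketch of Gaussian Naive Bayes on synthetic data (not the survey data): each feature is modeled as an independent normal distribution within each class, and the predicted class maximizes the log prior plus the summed per-feature log-densities. The variance floor mirrors `GaussianNB`'s `var_smoothing` default of `1e-9` times the largest feature variance; the helper name `gaussian_nb_predict` is hypothetical, and the comparison with `GaussianNB` is only a sanity check.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Synthetic data: 3 classes with shifted means, 4 features (illustrative only)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 4)) for c in range(3)])
y = np.repeat([0, 1, 2], 50)


def gaussian_nb_predict(X_train, y_train, X_new, var_smoothing=1e-9):
    """Hypothetical helper: Gaussian Naive Bayes prediction written out by hand."""
    classes = np.unique(y_train)
    # Small variance floor, scaled by the largest feature variance
    eps = var_smoothing * X_train.var(axis=0).max()
    log_post = []
    for c in classes:
        Xc = X_train[y_train == c]
        mu = Xc.mean(axis=0)
        var = Xc.var(axis=0) + eps          # per-class, per-feature variance
        log_prior = np.log(len(Xc) / len(X_train))
        # Naive independence: sum the per-feature normal log-densities
        log_lik = (-0.5 * np.sum(np.log(2 * np.pi * var))
                   - 0.5 * np.sum((X_new - mu) ** 2 / var, axis=1))
        log_post.append(log_prior + log_lik)
    # Pick the class with the largest log posterior (up to a shared constant)
    return classes[np.argmax(np.column_stack(log_post), axis=1)]


ours = gaussian_nb_predict(X, y, X)
sk = GaussianNB().fit(X, y).predict(X)
print("Agreement with GaussianNB:", (ours == sk).mean())
```

For a 0/1 dummy column, a fitted normal density is a crude description of the data, which is the concern raised in the Bernoulli section.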

In [4]:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Features and target used by each model in this part
# Target: `Adaptivity Level`
# Features: all other (encoded) variables in the dataset, described in the dataset's readme
features = data1
target = y

# Specify train split random state and target/features for model
X_train, X_test, y_train, y_test = train_test_split(
    features, target, random_state=100, test_size=.2)

# Select the Gaussian Naive Bayes classifier
classifier_NB = GaussianNB()

# Create model with classifier and train
model = classifier_NB.fit(X_train, y_train)  

# Create predictions from model with test data
ypred=model.predict(X_test)

# Evaluate on the test set; every metric function takes (y_true, y_pred) in that order
accuracy = accuracy_score(y_test, ypred)
report = classification_report(y_test, ypred)
cm = confusion_matrix(y_test, ypred)

print("Classification report:")
print(report)
print("Accuracy: ", accuracy)
print("  ")
print("Confusion matrix:")
print(cm)


Classification report:
              precision    recall  f1-score   support

        High       0.41      0.57      0.47        23
         Low       0.58      0.75      0.65        92
    Moderate       0.70      0.50      0.58       126

    accuracy                           0.60       241
   macro avg       0.56      0.61      0.57       241
weighted avg       0.63      0.60      0.60       241

Accuracy:  0.6016597510373444
  
Confusion matrix:
[[13  3  7]
 [ 3 69 20]
 [16 47 63]]
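Per-class precision and recall can be read straight off a confusion matrix. In scikit-learn's convention, rows are true labels and columns are predicted labels (here in sorted order: High, Low, Moderate), so precision divides the diagonal by the column sums and recall divides it by the row sums. The sketch below applies this to the Gaussian NB matrix printed above. It is also a quick check that `classification_report` received `(y_true, y_pred)`: the support column should equal the row sums.

```python
import numpy as np

labels = ['High', 'Low', 'Moderate']
# Confusion matrix from the Gaussian NB cell: rows = true class, columns = predicted class
cm = np.array([[13, 3, 7],
               [3, 69, 20],
               [16, 47, 63]])

tp = np.diag(cm)
precision = tp / cm.sum(axis=0)   # correct predictions / all predictions of that class
recall = tp / cm.sum(axis=1)      # correct predictions / all true members of that class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)          # number of true test rows per class

for name, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f"{name:>8}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}  support={s}")
print("accuracy:", tp.sum() / cm.sum())
```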


###  Bernoulli Naive Bayes 

The performance of the Gaussian Naive Bayes model is very poor. Let's try to figure out the reason. Check the original data again. Most of the features are categorical variables. Though we've converted them to integers or dummy variables, they are not normal in nature. Probably a Bernoulli Naive Bayes model will fit the data better.
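By contrast, a Bernoulli model treats each binarized feature as a per-class coin flip: log P(x | c) = sum over j of [x_j log p_cj + (1 - x_j) log(1 - p_cj)]. The sketch below writes that out with Laplace smoothing (`alpha=1`, the `BernoulliNB` default) on synthetic 0/1 data and checks that its predictions agree with `BernoulliNB`. The data and the helper name `bernoulli_nb_predict` are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(1)
# Synthetic binary features: class 0 switches features on with p=0.2, class 1 with p=0.7
y = np.repeat([0, 1], 100)
p_true = np.where(y[:, None] == 0, 0.2, 0.7)
X = (rng.random((200, 6)) < p_true).astype(int)


def bernoulli_nb_predict(X_train, y_train, X_new, alpha=1.0):
    """Hypothetical helper: Bernoulli Naive Bayes prediction written out by hand."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        # Smoothed probability that each feature equals 1 within class c
        p = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
        log_prior = np.log(len(Xc) / len(X_train))
        # Ones contribute log p, zeros contribute log(1 - p)
        scores.append(log_prior + X_new @ np.log(p) + (1 - X_new) @ np.log(1 - p))
    return classes[np.argmax(np.column_stack(scores), axis=1)]


ours = bernoulli_nb_predict(X, y, X)
sk = BernoulliNB(alpha=1.0).fit(X, y).predict(X)
print("Agreement with BernoulliNB:", (ours == sk).mean())
```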

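In the original CSV, all variables are `object`, i.e. categorical variables. Now, extract the `y` variable and convert the features to dummies. Because `Age`, `Network Type`, and `Class Duration` were already replaced by integers above, `pd.get_dummies` leaves those three columns as integers, and `BernoulliNB` binarizes them at its default threshold (`binarize=0.0`), so any value above 0 becomes 1.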

In [5]:
y=data['Adaptivity Level']
X_dummy = pd.get_dummies(data.drop(['Adaptivity Level'], axis=1))

Now, use a Bernoulli Naive Bayes model to fit the data, i.e. `X_dummy` and `y`.

1. Split data (80% training; 20% test) (set `random_state=100`)
2. Calculate the confusion matrix for the test set.
3. Report the accuracy as well as the precision, recall, and the F1 score for the test set.

In [6]:
from sklearn.naive_bayes import BernoulliNB

# Specify features and target for model
features = X_dummy
target = y

# Specify train split random state and target/features for model
X_train, X_test, y_train, y_test = train_test_split(
    features, target, random_state=100, test_size=.2)

# Select the Bernoulli Naive Bayes classifier
classifier_BNB = BernoulliNB()

# Create model with classifier and train
model_BNB = classifier_BNB.fit(X_train, y_train)

# Create predictions from model with test data
ypred1=model_BNB.predict(X_test)

# Evaluate on the test set; every metric function takes (y_true, y_pred) in that order
accuracy = accuracy_score(y_test, ypred1)
report = classification_report(y_test, ypred1)
cm = confusion_matrix(y_test, ypred1)

print("Classification report:")
print(report)
print("Accuracy: ", accuracy)
print("  ")
print("Confusion matrix:")
print(cm)

Classification report:
              precision    recall  f1-score   support

        High       0.67      0.70      0.68        23
         Low       0.62      0.59      0.60        92
    Moderate       0.68      0.70      0.69       126

    accuracy                           0.66       241
   macro avg       0.65      0.66      0.66       241
weighted avg       0.65      0.66      0.65       241

Accuracy:  0.6556016597510373
  
Confusion matrix:
[[16  2  5]
 [ 1 54 37]
 [ 7 31 88]]


###  Support Vector Machine
#### Linear Support Vector Machine
Fit an SVM with a linear kernel.


0. Use `data1` and `y` for this question: an SVM makes no distributional assumption about the features (they only need to be numeric), so the mix of integer-encoded and dummy columns is fine.
1. Split data (80% training; 20% test) (set `random_state=100`)
2. Standardize the features with `StandardScaler`.
3. Use `LinearSVC` from `sklearn.svm`.
4. Report the test set accuracy, precision, recall, and F1 score.
5. Print out the confusion matrix for the test set.
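The cell below fits `StandardScaler` on all 1,205 rows before splitting, so the test rows influence the scaling statistics. A minimal sketch of a leak-free alternative, assuming synthetic data from `make_classification` in place of `data1`, wraps the scaler and `LinearSVC` in a pipeline so the scaler only ever sees training rows:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for data1/y: 26 features, 3 classes
X, y = make_classification(n_samples=600, n_features=26, n_informative=8,
                           n_classes=3, random_state=100)

# Split first; the pipeline then fits the scaler on the training rows only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)
pipe = make_pipeline(StandardScaler(), LinearSVC(C=1.0, dual=False))
pipe.fit(X_train, y_train)

y_pred = pipe.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))

# The fitted scaler's means come from X_train alone
scaler = pipe.named_steps['standardscaler']
print("Scaler fit on training rows only:", np.allclose(scaler.mean_, X_train.mean(axis=0)))
```

At prediction time the pipeline applies the training-set means and standard deviations to `X_test`, which mirrors how the model would be used on new students.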


In [7]:
# Load libraries
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np


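features = data1
target = y
# Standardize each feature to mean 0 and variance 1. Fitting the scaler on all rows
# before the split lets test-set statistics leak into preprocessing; fitting it on
# X_train only (and then transforming X_test) avoids this.
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)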

# Specify train split random state and target/features for model
X_train, X_test, y_train, y_test = train_test_split(
    features_standardized, target, random_state=100, test_size=.2)

# Select the linear SVM classifier
svc = LinearSVC(C=1.0, dual=False)

# Create model with svc and train model with data
model2 = svc.fit(X_train, y_train)


# Create predictions from model with test data
ypred2=model2.predict(X_test)

# Evaluate on the test set; every metric function takes (y_true, y_pred) in that order
accuracy = accuracy_score(y_test, ypred2)
report = classification_report(y_test, ypred2)
cm = confusion_matrix(y_test, ypred2)

# Check model accuracy
print("Accuracy: ", accuracy)
print("  ")
if accuracy < 0.75:
    print("\033[1m Poor Accuracy")
else:
    print("Acceptable Accuracy")
print("  ")
print("\033[0m Confusion matrix:")
print(cm)
print("  ")
print("Classification report:")
print(report)


Accuracy:  0.7012448132780082

Poor Accuracy

Confusion matrix:
[[ 8  7  8]
 [ 0 64 28]
 [ 6 23 97]]

Classification report:
              precision    recall  f1-score   support

        High       0.57      0.35      0.43        23
         Low       0.68      0.70      0.69        92
    Moderate       0.73      0.77      0.75       126

    accuracy                           0.70       241
   macro avg       0.66      0.60      0.62       241
weighted avg       0.70      0.70      0.70       241



####  Kernel SVM
##### The performance of the linear SVM is not very good. Try to fit a nonlinear SVM.


1. `kernel` should be `rbf`.
2. Set `C=1e6`; such a large penalty on margin violations approximates a hard margin.
3. `gamma='scale'`
4. Report the confusion matrix, accuracy, precision, recall, and the F1 score for the test data set.
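With `gamma='scale'`, scikit-learn's `SVC` documentation defines gamma = 1 / (n_features * X.var()), where `X.var()` is the variance of all entries of the training matrix, and the RBF kernel is K(x, x') = exp(-gamma * ||x - x'||^2). A small sketch on synthetic data (an assumption, not the survey data) computes both by hand and checks them against `rbf_kernel` and against an `SVC` given the explicit gamma:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Synthetic stand-in data (not the survey features)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# gamma='scale' -> 1 / (n_features * variance of all entries of X)
gamma = 1.0 / (X.shape[1] * X.var())

# RBF kernel by hand: exp(-gamma * squared Euclidean distance between rows)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K_manual = np.exp(-gamma * sq_dists)
print("Matches rbf_kernel:", np.allclose(K_manual, rbf_kernel(X, gamma=gamma)))

# An SVC with gamma='scale' should behave like one given the explicit value
svc_scale = SVC(kernel='rbf', gamma='scale').fit(X, y)
svc_explicit = SVC(kernel='rbf', gamma=gamma).fit(X, y)
same = np.allclose(svc_scale.decision_function(X), svc_explicit.decision_function(X))
print("Same decision function:", same)
```

Because the features in the cell below are standardized, `X.var()` is close to 1 there, so `gamma='scale'` works out to roughly 1/26.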

In [20]:
from sklearn.svm import SVC


features= data1
target=y

# Scale features 
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Specify train split random state and target/features for model
X_train, X_test, y_train, y_test = train_test_split(
    features_standardized, target, random_state=100, test_size=.2)

# Set params (nonlinear) with gamma scale
clf = SVC(kernel='rbf', C=1E6, gamma='scale')
# Train model with data
clf.fit(X_train, y_train)

# Save predictions of model from test data
ypred3 = clf.predict(X_test)
# Metric functions take (y_true, y_pred) in that order
accuracy = accuracy_score(y_test, ypred3)
report = classification_report(y_test, ypred3)
cm = confusion_matrix(y_test, ypred3)


# Check if accuracy has improved from linear model
print("\033[1m Accuracy: ", accuracy)
if accuracy < 0.75:
    print("\033[1m Poor Accuracy")
else:
    print("\033[1m Significantly Improved Accuracy")
print(" \033[0m ")

# Print model report and confusion matrix
print("Classification report:")
print(report)
print("  ")
print("Confusion matrix:")
print(cm)

Accuracy:  0.9377593360995851
Significantly Improved Accuracy

Classification report:
              precision    recall  f1-score   support

        High       0.91      0.91      0.91        23
         Low       0.93      0.95      0.94        92
    Moderate       0.95      0.94      0.94       126

    accuracy                           0.94       241
   macro avg       0.93      0.93      0.93       241
weighted avg       0.94      0.94      0.94       241

  
Confusion matrix:
[[ 21   0   2]
 [  1  87   4]
 [  1   7 118]]


####  Search for the "Optimal" SVM Model
Use grid search and cross-validation to tune the hyperparameters `C` and `gamma`.

1. For `C`, consider these values: 1, 10, 100, 1000, 10000
2. For `gamma`, consider these values: 1, 0.1, 0.01, 0.001
3. Let `kernel='rbf'`.
4. Use 5-fold CV
5. Report the best parameters
6. Report the score (accuracy) of the test set 
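Grid search addresses pipeline parameters as `<step name>__<parameter>`, and `make_pipeline` names each step after its lowercased class (`standardscaler`, `pca`, `svc`). A minimal sketch on synthetic data, using a scaler as the preprocessing step instead of PCA (a swapped-in choice, not the step used in the cell below) and a smaller grid to keep it fast:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for data1/y
X, y = make_classification(n_samples=300, n_features=26, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The scaler is refit on the training folds of every CV split
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
param_grid = {'svc__C': [1.0, 10.0, 100.0],
              'svc__gamma': [0.1, 0.01, 0.001]}

grid = GridSearchCV(pipe, param_grid, cv=5)   # refit=True by default
grid.fit(X_train, y_train)

print("Best params:", grid.best_params_)
print("Mean CV accuracy:", grid.best_score_)
print("Test accuracy:", grid.score(X_test, y_test))
```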


In [21]:
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA


features= data1
target=y

# Set SVC and PCA specs. PCA without n_components keeps every component,
# so it only centers and rotates the features (no dimensionality reduction).
pca = PCA(random_state=0)
svc = SVC(kernel='rbf', class_weight='balanced')

# Chain PCA and the SVC with make_pipeline. GridSearchCV accepts any estimator,
# but the pipeline lets it tune the SVC's parameters ('svc__C', 'svc__gamma')
# while refitting PCA on each training fold.
# Note: this cell uses the unscaled data1, not the standardized features from earlier cells.
model = make_pipeline(pca, svc)


# Specify train split random state and target/features for model
X_train, X_test, y_train, y_test = train_test_split(data1, target,
                                                random_state=0, test_size=.2)

# Create param search space
param_grid = {'svc__C': [1.0, 1e1, 1e2, 1e3, 1e4],
              'svc__gamma': [1, 0.1, 0.01, 0.001]}

# Create search for model estimator 
# fit with training data
grid = GridSearchCV(model, param_grid, cv=5, refit=True)

%time grid.fit(X_train,y_train)

# Select model with best estimator from grid search
model_best = grid.best_estimator_

# Save model predictions from test data
yfit = model_best.predict(X_test)
print(' ')
print(' \033[1m Accuracy:',np.mean(yfit==y_test))
print('Best Params:',grid.best_params_)


CPU times: total: 29.1 s
Wall time: 8.34 s
 
Accuracy: 0.8962655601659751
Best Params: {'svc__C': 1000.0, 'svc__gamma': 0.1}


###  Neural Network
####  Neural Network with One Hidden Layer

1. Normalize `data1` before splitting the data.
2. There should be 50 nodes in the hidden layer.
3. Set `max_iter=100000`. 
4. Calculate the accuracy for the test set.
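The cell below normalizes with `Normalizer`, which works row by row: each sample is divided by its own L2 norm. That differs from `StandardScaler`, which standardizes each column. A short sketch on a made-up matrix shaped like the encoded data shows the difference:

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Made-up rows resembling the encoded data: a few integers plus 0/1 dummies
X = np.array([[5, 4, 2, 1, 0],
              [5, 4, 1, 0, 1],
              [4, 4, 1, 0, 1]], dtype=float)

X_norm = Normalizer().fit_transform(X)        # each row scaled to unit L2 norm
X_std = StandardScaler().fit_transform(X)     # each column scaled to mean 0, std 1

print(np.linalg.norm(X_norm, axis=1))   # every row norm is 1
print(X_std.mean(axis=0))               # every column mean is 0
```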

In [24]:
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import Normalizer
from sklearn.model_selection import train_test_split
import numpy as np

features= data1
target=y

# Normalizer rescales each sample (row) to unit L2 norm; it does not standardize columns
scaler = Normalizer()
features_normalized = scaler.fit_transform(features)


# Split the data, stratified on y. No test_size is given, so the default 25% test split applies
X_train, X_test, y_train, y_test = train_test_split(features_normalized, target, stratify=y,
                                                    random_state=100)


# Create the MLP (one hidden layer with 50 nodes) and train it
mlp = MLPClassifier(solver='lbfgs', random_state=100, hidden_layer_sizes=[50], max_iter=100000)
mlp.fit(X_train, y_train)


# Save the predictions of model from test data
ypred3=mlp.predict(X_test)

accuracy = accuracy_score(y_test,ypred3)
print("Accuracy: ",accuracy)


Accuracy:  0.8576158940397351
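Because `train_test_split` is called above without `test_size`, scikit-learn's default of 25% applies, and the test count is rounded up: 1,205 rows give a 302-row test set instead of the 241 rows used for the other models. A quick check with a dummy index array (the labels are invented so `stratify` has something to use):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rows = np.arange(1205)                 # stand-in for the 1,205 survey rows
labels = rows % 3                      # dummy labels for stratification

train_default, test_default = train_test_split(rows, stratify=labels, random_state=100)
train_20, test_20 = train_test_split(rows, test_size=0.2, random_state=100)

print(len(test_default), len(test_20))
```

Consistently, the accuracy above equals 259/302.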


####  Neural Network with Two Hidden Layers

1. There should be 10 nodes in the first hidden layer and 10 nodes in the second hidden layer.
2. Set `max_iter=9999`. 
3. Set `random_state=0` and `solver='lbfgs'`
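`hidden_layer_sizes` fixes the shapes of the weight matrices, which gives a rough sense of model capacity. A sketch on synthetic data with 26 features and 3 classes (matching the width of `data1` and the three adaptivity levels, but not the real data) compares the single 50-node layer with the 10 + 10 layout. Fewer weights is one possible contributor to the two-layer network's lower accuracy, though this notebook does not test that.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=26, n_informative=8,
                           n_classes=3, random_state=0)


def n_params(mlp):
    # Weights plus biases across all layers
    return sum(w.size for w in mlp.coefs_) + sum(b.size for b in mlp.intercepts_)


one_layer = MLPClassifier(hidden_layer_sizes=[50], solver='lbfgs',
                          max_iter=500, random_state=0).fit(X, y)
two_layer = MLPClassifier(hidden_layer_sizes=[10, 10], solver='lbfgs',
                          max_iter=500, random_state=0).fit(X, y)

print([w.shape for w in one_layer.coefs_], n_params(one_layer))
print([w.shape for w in two_layer.coefs_], n_params(two_layer))
```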

In [9]:
# Same code as the one-hidden-layer neural network above,
# except for a second hidden layer and small parameter tweaks

features= data1
target=y

scaler = Normalizer()

features_standardized = scaler.fit_transform(features)


X_train, X_test, y_train, y_test = train_test_split(features_standardized, target, stratify=y,
                                                    random_state=0)




mlp = MLPClassifier(solver='lbfgs', random_state=0,
    hidden_layer_sizes=[10, 10],  # 10 nodes in each of the two hidden layers
    max_iter=9999)

mlp.fit(X_train, y_train)



ypred3=mlp.predict(X_test)

accuracy = accuracy_score(y_test,ypred3)
print("Accuracy: ",accuracy)

Accuracy:  0.7582781456953642


####  Regularized Neural Network
Now consider 10 values for the regularization hyperparameter `alpha`: `np.linspace(0.001, 0.010, 10)`. Which value is optimal, i.e., which value yields the best test accuracy?
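In `MLPClassifier`, `alpha` is the strength of an L2 (weight-decay) penalty added to the mean cross-entropy loss. As we understand scikit-learn's implementation, for n training samples and weight matrices W_l (biases are not penalized) the objective is:

$$
\mathcal{L}(W) = \frac{1}{n}\sum_{i=1}^{n} \operatorname{CE}\bigl(y_i, \hat{y}_i\bigr) + \frac{\alpha}{2n}\sum_{l} \lVert W_l \rVert_F^2
$$

A larger `alpha` pulls the weights toward zero, trading some training fit for a smoother decision boundary.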


1. Use two hidden layers `[50,50]`. Set `max_iter=100000000`.
2. Write a `for` loop to fit the mlp model for each value of `alpha` and calculate the accuracy of the test set in each iteration.
3. Print out the test accuracy. Which value of `alpha` is the optimal one?
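Choosing `alpha` by test accuracy reuses the test set for model selection, which tends to overstate how well the chosen model generalizes. A minimal sketch of the usual alternative, assuming synthetic data and a shortened alpha list to keep it quick, scores each value with 3-fold cross-validation on the training split and touches the test split only once:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the normalized features and y
X, y = make_classification(n_samples=300, n_features=26, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

alphas = np.linspace(0.001, 0.010, 4)   # shortened grid for speed
cv_means = []
for a in alphas:
    mlp = MLPClassifier(solver='lbfgs', hidden_layer_sizes=[50, 50], alpha=a,
                        max_iter=500, random_state=0)
    # Mean accuracy over 3 folds of the training data only
    cv_means.append(cross_val_score(mlp, X_train, y_train, cv=3).mean())

best_alpha = alphas[int(np.argmax(cv_means))]
final = MLPClassifier(solver='lbfgs', hidden_layer_sizes=[50, 50], alpha=best_alpha,
                      max_iter=500, random_state=0).fit(X_train, y_train)
print("alpha chosen by CV:", best_alpha)
print("Test accuracy:", final.score(X_test, y_test))
```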

In [47]:
features= data1
target=y

# Normalize each sample (row) to unit L2 norm
scaler = Normalizer()
features_standardized = scaler.fit_transform(features)

# Specify train split random state and target/features for model
X_train, X_test, y_train, y_test = train_test_split(features_standardized, target, stratify=y,
                                                    random_state=0)
# Specify the possible hyperparameters for alpha
# 10 values from 0.001 to 0.01
alpha= np.linspace(0.001, 0.010, 10)

# For each possible alpha value (10 diff values) fit a neural network and print the accuracy
for x in alpha:
    mlp = MLPClassifier(solver='lbfgs', random_state=0,
                            hidden_layer_sizes=[50, 50],
                            alpha=x, max_iter=100000000)
    mlp.fit(X_train, y_train)
    
    ypred3 = mlp.predict(X_test)

    accuracy1 = accuracy_score(y_test, ypred3)
    print(accuracy1, x)

0.9271523178807947 0.001
0.9039735099337748 0.002
0.9039735099337748 0.003
0.9105960264900662 0.004
0.9271523178807947 0.005
0.9072847682119205 0.006
0.9006622516556292 0.007
0.9072847682119205 0.008
0.890728476821192 0.009000000000000001
0.9006622516556292 0.01


#### Optimal `alpha`: 0.001 or 0.005 (both reach 0.927 test accuracy)

## Results

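+ The Naive Bayes models were the worst: Gaussian NB reached 60.2% test accuracy and Bernoulli NB 65.6% (both below 70%).
+ **The nonlinear (RBF-kernel) SVM was the best model (93.8% test accuracy).**
+ Neural networks were fairly accurate:
    + The regularized network with `alpha` = 0.001 or 0.005 was second best (92.7% test accuracy).
    + The two-layer network (10 + 10 nodes, 75.8%) was worse than the single-layer network (50 nodes, 85.8%).
    + The neural networks were evaluated on a stratified 25% test split (302 rows), so their scores are not directly comparable with the 241-row test sets used for the other models.


+ Given feature data for future students, the nonlinear SVM model could predict each student's online-learning adaptivity level. These predictions could be used to adjust the teaching style, or the budget allocated to learning materials, to better accommodate students' predicted adaptivity levels.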