# CS3033/CS6405 - Data Mining - Second Assignment

### Submission

This assignment is **due on 06/04/22 at 23:59**. You should submit a single .ipynb file with your Python code and analysis electronically via Canvas.
Please note that this assignment will account for 25 Marks of your module grade.

### Declaration

By submitting this assignment, I agree to the following:

<font color="red">“I have read and understand the UCC academic policy on plagiarism, and agree to the requirements set out thereby in relation to plagiarism and referencing. I confirm that I have referenced and acknowledged properly all sources used in the preparation of this assignment.
I declare that this assignment is entirely my own work based on my personal study. I further declare that I have not engaged the services of another to either assist me in, or complete this assignment”</font>

### Objective

The Boolean satisfiability (SAT) problem consists in determining whether a Boolean formula F is satisfiable or not. F is represented by a pair (X, C), where X is a set of Boolean variables and C is a set of clauses in Conjunctive Normal Form (CNF). Each clause is a disjunction of literals (a variable or its negation). This problem is one of the most widely studied combinatorial problems in computer science. It is the classic NP-complete problem. Over the past number of decades, a significant amount of research work has focused on solving SAT problems with both complete and incomplete solvers.

Recent advances in supervised learning have provided powerful techniques for classifying problems. In this project, we see the SAT problem as a classification problem. Given a Boolean formula (represented by a vector of features), we are asked to predict if it is satisfiable or not.

In this project, we represent SAT problems with a vector of 327 features with general information about the problem, e.g., number of variables, number of clauses, fraction of Horn clauses in the problem, etc. There is no need to understand the features to be able to complete the assignment.
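
To make the terminology concrete, here is a minimal sketch of a CNF formula, a brute-force satisfiability check, and a few simple descriptive features. The clause encoding (signed integers), the function names and the `horn_fraction` key are assumptions made for illustration; the exact definitions behind the dataset's columns are not given in this document and may differ.

```python
from itertools import product

# A CNF formula as a list of clauses; each clause is a list of non-zero ints.
# Positive k stands for variable k, negative k for its negation (assumed encoding).
# This one reads: (x1 OR NOT x2) AND (NOT x1 OR x2) AND (x2 OR x3) AND (NOT x3)
formula = [[1, -2], [-1, 2], [2, 3], [-3]]


def is_satisfiable(clauses):
    """Brute-force check: try every assignment (only viable for tiny formulas)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # A clause holds if at least one of its literals is true.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


def simple_features(clauses):
    """A few illustrative features; hypothetical definitions, not the dataset's."""
    n_vars = len({abs(lit) for clause in clauses for lit in clause})
    n_clauses = len(clauses)
    # A Horn clause has at most one positive literal.
    n_horn = sum(1 for clause in clauses if sum(lit > 0 for lit in clause) <= 1)
    return {
        "c": n_clauses,
        "v": n_vars,
        "clauses_vars_ratio": n_clauses / n_vars,
        "horn_fraction": n_horn / n_clauses,
    }


print(is_satisfiable(formula))      # True  (x1=True, x2=True, x3=False)
print(is_satisfiable([[1], [-1]]))  # False (x1 AND NOT x1)
print(simple_features(formula))
```

The brute-force check is exponential in the number of variables, which is why predicting satisfiability from cheap features is an interesting alternative.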

The dataset is available at:
https://github.com/andvise/DataAnalyticsDatasets/blob/main/dm_assignment2/sat_dataset_train.csv

This is original unpublished data.

## Data Preparation

In [1]:
import pandas as pd

df = pd.read_csv("https://github.com/andvise/DataAnalyticsDatasets/blob/6d5738101d173b97c565f143f945dedb9c42a400/dm_assignment2/sat_dataset_train.csv?raw=true")
df.head()

Unnamed: 0,c,v,clauses_vars_ratio,vars_clauses_ratio,vcg_var_mean,vcg_var_coeff,vcg_var_min,vcg_var_max,vcg_var_entropy,vcg_clause_mean,...,rwh_0_max,rwh_1_mean,rwh_1_coeff,rwh_1_min,rwh_1_max,rwh_2_mean,rwh_2_coeff,rwh_2_min,rwh_2_max,target
0,420,10,42.0,0.02381,0.6,0.0,0.6,0.6,0.0,0.6,...,78750.0,8e-06,0.0,7.875e-06,8e-06,2.385082e-21,0.0,2.385082e-21,2.385082e-21,1
1,230,20,11.5,0.086957,0.137826,0.089281,0.117391,0.16087,2.180946,0.137826,...,6646875.0,17433.722184,1.0,2.981244e-12,34867.444369,17277.21,1.0,1.358551e-53,34554.42,0
2,240,16,15.0,0.066667,0.3,0.0,0.3,0.3,0.0,0.3,...,500000.0,1525.878932,0.0,1525.879,1525.878932,1525.879,0.0,1525.879,1525.879,1
3,424,30,14.133333,0.070755,0.226415,0.485913,0.056604,0.45283,2.220088,0.226415,...,87500.0,0.000122,1.0,6.535723e-14,0.000245,8.218628e-07,1.0,1.499676e-61,1.643726e-06,0
4,162,19,8.526316,0.117284,0.139701,0.121821,0.111111,0.185185,1.940843,0.139701,...,5859400.0,16591.49431,1.0,6.912725999999999e-42,33182.988621,16659.03,1.0,0.0,33318.07,1


In [2]:
df.dtypes

c                       int64
v                       int64
clauses_vars_ratio    float64
vars_clauses_ratio    float64
vcg_var_mean          float64
                       ...   
rwh_2_mean            float64
rwh_2_coeff           float64
rwh_2_min             float64
rwh_2_max             float64
target                  int64
Length: 328, dtype: object

In [3]:
df['target'].value_counts()

1    976
0    953
Name: target, dtype: int64

In [4]:
# YOUR CODE HERE
import numpy as np
import sklearn 
from sklearn import preprocessing
from sklearn import neighbors
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score

# replace any infinite vals with NaN and then replace all NaN with 0
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df = df.fillna(0)

# target labels
y = df["target"].values

# dataframe of ONLY the features
X = df.drop(columns = ["target"])

# split normalised features into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

# Use a min max scaler to normalise the data
sc = preprocessing.MinMaxScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# convert to a dataframe just to make visualisation of the data easier
X_train_df = pd.DataFrame(X_train)
X_train_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,317,318,319,320,321,322,323,324,325,326
0,0.126522,0.062500,0.333333,0.022222,0.289783,0.000000,0.297953,0.288924,0.000000,0.289783,...,0.116364,0.012800,5.989472e-03,0.000000,1.190291e-02,2.994736e-03,3.945657e-03,0.000000,1.247583e-02,1.972829e-03
1,0.394389,0.758929,0.088212,0.105112,0.005227,0.051762,0.014553,0.005964,0.336816,0.005227,...,0.000002,0.080000,4.146299e-04,1.000000,6.658070e-157,4.146299e-04,2.731445e-04,1.000000,0.000000e+00,2.731445e-04
2,0.048174,0.062500,0.120690,0.076462,0.150827,0.000000,0.160596,0.149800,0.000000,0.150827,...,0.025455,0.080000,4.629544e-02,1.000000,1.573274e-11,4.629544e-02,3.156526e-02,1.000000,2.665337e-37,3.156526e-02
3,0.061938,0.035714,0.259770,0.031366,0.214118,0.184379,0.141565,0.285481,0.489595,0.214118,...,0.027084,0.190000,3.567503e-01,1.000000,3.912742e-08,3.567503e-01,2.280460e-01,1.000000,2.888843e-19,2.280460e-01
4,0.409741,0.392857,0.186462,0.047238,0.019006,0.014643,0.029428,0.018256,0.165175,0.019006,...,0.145484,0.042000,1.371386e-04,0.999428,1.558468e-07,1.370994e-04,1.247159e-04,0.999210,3.113659e-07,1.246667e-04
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1345,0.062467,0.116071,0.086207,0.107505,0.096242,0.056781,0.098211,0.103686,0.260600,0.096242,...,0.109091,0.012080,3.115980e-06,0.008394,6.140424e-06,1.571069e-06,2.052525e-06,0.007983,6.438098e-06,1.034455e-06
1346,0.409741,0.392857,0.186462,0.047238,0.019006,0.014643,0.029428,0.018256,0.165175,0.019006,...,0.145600,0.042000,1.371083e-04,0.999428,1.557780e-07,1.370691e-04,1.247160e-04,0.999210,3.113663e-07,1.246667e-04
1347,0.131286,0.191964,0.115709,0.079906,0.055398,0.041846,0.061521,0.057610,0.259160,0.055398,...,0.145455,0.016016,6.075131e-07,0.000381,1.206854e-06,3.038724e-07,3.940541e-07,0.000264,1.245636e-06,1.970791e-07
1348,0.249338,0.125000,0.350192,0.020651,0.225009,0.388078,0.048072,0.535132,0.593980,0.225009,...,0.003258,0.000512,3.373711e-16,0.634005,2.453850e-16,2.756330e-16,1.464385e-40,1.000000,1.442707e-49,1.464385e-40


In [5]:
# non-normalised data (same split as above, without scaling)
X1_train, X1_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)


# Tasks

## Basic models and evaluation (5 Marks)

Using Scikit-learn, train and evaluate K-NN and decision tree classifiers using 70% of the dataset for training and 30% for testing. For this part of the project, we are not interested in optimising the parameters; we just want to get an idea of the dataset. Compare the results of both classifiers.

In [6]:
# YOUR CODE HERE
import sklearn.metrics  # imported explicitly so sklearn.metrics.accuracy_score resolves

# Build a k-NN classifier with k neighbours, fit it on (X1, y1)
# and print its accuracy on (X2, y2).
def knn(X1, y1, X2, y2, k):
    model = neighbors.KNeighborsClassifier(n_neighbors=k)
    model.fit(X1, y1)
    print("Accuracy of knn:", model.score(X2, y2))

# Build a decision tree with default settings, fit it on (X1, y1)
# and print its accuracy on (X2, y2).
def decision_tree(X1, y1, X2, y2):
    clf = tree.DecisionTreeClassifier()
    clf.fit(X1, y1)
    y_pred = clf.predict(X2)
    # Model accuracy: the fraction of test samples classified correctly
    print("Accuracy of Decision Tree:", sklearn.metrics.accuracy_score(y2, y_pred))


In [7]:
knn(X_train, y_train, X_test,  y_test, 5)

decision_tree(X_train, y_train, X_test,  y_test)

Accuracy of knn: 0.8911917098445595
Accuracy of Decision Tree: 0.9775474956822107


Both models achieve high accuracy, with the decision tree scoring roughly 9 percentage points higher than the nearest-neighbours model (0.978 versus 0.891). One possible explanation is that KNN generally needs many samples relative to the number of features before its distance-based votes become reliable, and with 327 features the 1,350 training rows here may fall short of that threshold, whereas the tree can split on a handful of informative features and ignore the rest.

## Robust evaluation (10 Marks)

In this section, we are interested in more rigorous techniques by implementing more sophisticated methods, for instance:
* Hold-out and cross-validation.
* Hyper-parameter tuning.
* Feature reduction.
* Feature normalisation.

Your report should provide concrete information of your reasoning; everything should be well-explained.

Do not get stressed if the things you try do not improve the accuracy. The key to getting good marks is to show that you evaluated different methods and that you correctly selected the configuration.



---


**Holdout and Cross Validation.**

Hold-out is simply when the given dataset is divided into "train" and "test" sets, with the model fitted on the former and evaluated on the latter. In this instance we use a 70%/30% split for training and testing respectively (this approach is used for both basic models in the previous section).

If the test set is used repeatedly to make modelling decisions, knowledge of it can "leak" into the model and lead to overfitting. To address this we can withhold a portion of the training data to use as a "validation set", on which the model's performance is evaluated after training and before the final evaluation on the test set. However, splitting the data into three sets reduces the amount of data on which we train the model.

k-fold cross-validation avoids this cost. It splits the training data into k groups or "folds", then trains and tests the model k times so that each fold is used as the validation set exactly once. It is therefore a more robust way of evaluating a model, as it does not waste as much data as setting aside a single fixed validation set.

As with the basic models in the previous section, the tree-based model outperformed the KNN under 10-fold cross-validation (mean accuracy 0.98 versus 0.89).
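
One detail worth illustrating is how scaling interacts with cross-validation. The sketch below, run on synthetic stand-in data rather than the SAT dataset (an assumption made so that it runs on its own), places the scaler inside a pipeline so that it is re-fitted on the training part of every fold. The names `X_demo`, `y_demo` and `model` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 300 samples, 20 features, 2 classes.
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=0)

# With the scaler inside the pipeline, each fold's validation part never
# influences the min/max values used to scale that fold's training part.
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))

# Stratified folds keep the class balance roughly equal in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X_demo, y_demo, cv=cv)

print("fold accuracies:", np.round(scores, 3))
print("mean %.3f, std %.3f" % (scores.mean(), scores.std()))
```

Reporting the standard deviation alongside the mean gives a sense of how much the estimate varies from fold to fold.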


In [8]:
# YOUR CODE HERE
# Knn function with cross validation
# variables k and f are the number of neighbours and number of folds to use

def knn_cv(X1, y1, X2, y2, k, f):
    knn_cv = neighbors.KNeighborsClassifier(n_neighbors = k)

    # train the model with k-fold cross val
    cv_scores = sklearn.model_selection.cross_val_score(knn_cv, X1, y1, cv = f)
  
    #mean and std of the cv_scores
    return(print("KNN cv_scores mean:{}".format(np.mean(cv_scores))),
           print("%0.2f accuracy with a standard deviation of %0.2f" % (cv_scores.mean(), cv_scores.std())))

# decision tree function with cross validation
def tree_cv(X1, y1, X2, y2, f):
    Tree = tree.DecisionTreeClassifier()
    cv_scores = sklearn.model_selection.cross_val_score(Tree, X1, y1, cv = f)
    return(print("decision tree cv_scores mean:{}".format(np.mean(cv_scores))),
           print("%0.2f accuracy with a standard deviation of %0.2f" % (cv_scores.mean(), cv_scores.std())))

knn_cv(X_train, y_train, X_test,  y_test, 5, 10)
print("\n")
tree_cv(X_train, y_train, X_test,  y_test, 10) 


KNN cv_scores mean:0.8881481481481481
0.89 accuracy with a standard deviation of 0.03


decision tree cv_scores mean:0.9807407407407409
0.98 accuracy with a standard deviation of 0.02


(None, None)



---

## Hyperparameter tuning using grid search CV.

The hyperparameters of a model are the parameters that are set prior to training, such as the number of neighbours in a KNN. The defaults set by scikit-learn are not always the optimal values, and it can therefore be beneficial to find the best hyperparameters for your data.

For tree-based classifiers the hyperparameters include the max depth and the max number of leaf nodes. A max depth of 11 gave a training accuracy of 1.0 and a hold-out accuracy of 0.98, so 11 was taken as the value for this hyperparameter. Note that the sweep below selects the depth with the highest training accuracy, which identifies the point at which the tree fits the training data perfectly rather than the depth that generalises best.

In the following cells I use grid search to identify good hyperparameters for my models. Grid search takes a grid of candidate hyperparameter values and evaluates every combination with cross-validation. There are other ways to tune hyperparameters, such as random search and manual search.

Grid search found that n=1 was the best number of neighbours for the KNN (mean 10-fold accuracy 0.90). For the decision tree, a manual sweep over max depth selected a depth of 11.
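
The depth of the decision tree can also be tuned with `GridSearchCV`, so that the choice rests on cross-validated accuracy and the test set is kept for the final check only. The following is a minimal sketch on synthetic stand-in data (an assumption); the grid values are illustrative, not a recommendation for the SAT dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption), split 70/30 as in the notebook.
X_demo, y_demo = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

# Illustrative grid over the two tree hyperparameters mentioned in the text.
param_grid = {
    "max_depth": list(range(1, 15)),
    "max_leaf_nodes": [None, 10, 20, 50],
}

# Every combination is scored by 5-fold cross-validation on the training set.
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_tr, y_tr)

print("best params:", search.best_params_)
print("mean CV accuracy of best params:", search.best_score_)
# The test set is touched once, after the hyperparameters are fixed.
print("hold-out accuracy:", search.score(X_te, y_te))
```

Because `refit=True` is the default, `search` ends up holding a tree retrained on the whole training set with the selected parameters.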

In [10]:
from sklearn.model_selection import GridSearchCV

# knn classifier to use in "GridSearchCV"
knn1 = neighbors.KNeighborsClassifier()

# dictionary of all values of k (neighbours) to test using grid search
param_grid = {"n_neighbors": np.arange(1, 25)}

# 10-fold cross-validated grid search, fitted to the training data
gs_cv = GridSearchCV(knn1, param_grid, cv=10)
gs_cv.fit(X_train, y_train)

gs_cv.best_params_, gs_cv.best_score_

({'n_neighbors': 1}, 0.9)

In [11]:
# manual sweep over max_depth for the decision tree
depth_TrainAcc = {}
depth_ValAcc = {}
for max_d in range(1, 25):
    model = sklearn.tree.DecisionTreeClassifier(max_depth=max_d, random_state=42)
    model.fit(X_train, y_train)
    depth_TrainAcc[max_d] = model.score(X_train, y_train)
    # note: the hold-out test set doubles as the validation set here
    depth_ValAcc[max_d] = model.score(X_test, y_test)

# max_T is the key (max depth) with the highest training accuracy;
# on ties, max() returns the smallest such depth.
# use that key to get the corresponding validation accuracy
max_T = max(depth_TrainAcc, key=depth_TrainAcc.get)
print(depth_TrainAcc[max_T], depth_ValAcc[max_T], max_T)

1.0 0.9810017271157168 11




---
## Feature Extraction/ Reduction.

From the `df.dtypes` output at the beginning of this document (Length: 328) we can see that there are 328 columns. Taking into account that one of these is our binary target variable, we have 327 different features. Because of the curse of dimensionality we can expect the performance of our model to degrade when working with a large number of features, so reducing the dimensionality could serve to improve performance.

PCA transforms the features into linear combinations (the principal components), reducing the dimensionality of the data whilst preserving most of its variance. The overall aim of this reduction technique is to identify the directions that explain the variance and structure of the data, and in doing so to filter out noise and redundancy.

In the following cells of code, PCA is used to reduce the dimensionality of the data. To decide how many principal components (PCs) to retain, we carry out PCA on all 327 features and calculate the cumulative explained variance. Using "np.argmax(cumsum >= 0.95) + 1" we can find the minimum number of PCs required to preserve 95% of the data's variance.

In the recorded run with 22 PCs, the KNN's cross-validated accuracy (0.90) was far above the decision tree's hold-out accuracy (0.58). The two figures are not directly comparable, since one is a cross-validation score on the training set and the other a score on the test set, and the tree's result depends on the test set being projected with the PCA fitted on the training data, so the gap should be read with caution.
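
The component-selection step and the train/test projection can be sketched as follows. This uses synthetic stand-in data (an assumption) so that it runs on its own; the variable names are illustrative. The key point is that both the scaler and the PCA are fitted on the training split only and then applied to the test split with `transform`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 30 features, many of them redundant.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=30, n_informative=5, n_redundant=10, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = MinMaxScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

# Cumulative explained variance over all components of the training data.
cumsum = np.cumsum(PCA().fit(X_tr_s).explained_variance_ratio_)
# Smallest number of components whose cumulative variance reaches 95%.
d = int(np.argmax(cumsum >= 0.95) + 1)

# Fit PCA with d components on the training split; the test split is only
# projected (transform), never used to refit the components.
pca = PCA(n_components=d).fit(X_tr_s)
train_reduced = pca.transform(X_tr_s)
test_reduced = pca.transform(X_te_s)

print("components kept:", d)
print("shapes:", train_reduced.shape, test_reduced.shape)
```

scikit-learn also accepts a fraction directly, as in `PCA(n_components=0.95)`, which selects the number of components needed to explain that share of the variance.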


In [12]:
X_train.shape

(1350, 327)

In [13]:
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from sklearn.pipeline import make_pipeline

pca = PCA()
pca.fit(X_train)
cumsum = np.cumsum(pca.explained_variance_ratio_)


d = np.argmax(cumsum >= 0.95) + 1
d # this is the min number of PC's required to preserve 95% of the data's variance

22

In [15]:
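pca = PCA(n_components = 22)
train_features = pca.fit_transform(X_train)
# project the test set onto the components fitted on the training set;
# calling fit_transform here would refit PCA on the test data instead
test_features = pca.transform(X_test)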
print("Training set size ", train_features.shape)

knn1 = neighbors.KNeighborsClassifier()

parameters = {'n_neighbors': [1, 3, 5, 7, 11]}
clf = sklearn.model_selection.GridSearchCV(knn1, parameters)
clf.fit(train_features, y_train)
print("The best classifier is:", clf.best_estimator_)
print("Its accuracy is:",clf.best_score_)
print("Its parameters are:",clf.best_params_)

print("\n")

decision_tree(train_features, y_train, test_features, y_test)

Training set size  (1350, 22)
The best classifier is: KNeighborsClassifier(n_neighbors=1)
Its accuracy is: 0.8977777777777778
Its parameters are: {'n_neighbors': 1}


Accuracy of Decision Tree: 0.5785837651122625




---
## Feature Normalisation.

Distance-based classifiers such as KNN are typically sensitive to a feature's range, as they use the distances between data points to assess similarity. If two features are on vastly different scales, the feature with the larger magnitude dominates the distance and is effectively assigned more importance. Methods such as min-max scaling, which rescales every feature to the range 0 to 1, can be used to address this.

In the following cell we compare models trained on the normalised datasets used in part one with models trained on non-normalised data.

Using the normalised data, the KNN performs better than with the un-scaled data (0.89 versus 0.80), which is to be expected. Tree-based algorithms are known to be rather insensitive to the scales of the features, because each split thresholds a single feature. This is evident from the output of the following cell, which prints the accuracies of the KNN and decision tree models, each trained first on the normalised data and then on the non-normalised data: the decision tree scores 0.97 in both cases.
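
A tiny worked example shows how a large-magnitude feature can dominate Euclidean distance. The four points below are hypothetical values chosen for illustration, not rows from the SAT dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical points: feature 0 spans 1..5, feature 1 spans 1000..2000.
X_demo = np.array([
    [1.0, 1000.0],   # query point
    [5.0, 1010.0],   # opposite extreme of feature 0, close on feature 1
    [1.5, 1100.0],   # close on feature 0, slightly further on feature 1
    [3.0, 2000.0],
])


def nearest_to_first(X):
    """Index of the row closest (Euclidean distance) to row 0."""
    dists = np.linalg.norm(X[1:] - X[0], axis=0 + 1)
    return int(np.argmin(dists)) + 1


# Min-max scaling: x' = (x - min) / (max - min), computed per feature.
X_scaled = MinMaxScaler().fit_transform(X_demo)

print(nearest_to_first(X_demo))    # 1: feature 1 dominates the raw distance
print(nearest_to_first(X_scaled))  # 2: both features now contribute comparably
```

On the raw values, row 1 looks closest even though it sits at the far end of feature 0, because differences in feature 1 are numerically so much larger; after scaling, row 2 is the nearest neighbour.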


In [None]:
X1_train, X1_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

knn(X_train, y_train, X_test,  y_test, 5)    # normalised data
knn(X1_train, y_train, X1_test,  y_test, 5)  # non-normalised data

decision_tree(X_train, y_train, X_test,  y_test)    # normalised data
decision_tree(X1_train, y_train, X1_test,  y_test)  # non-normalised data


Accuracy of knn: 0.8911917098445595
Accuracy of knn: 0.8013816925734024
Accuracy of basic Decision Tree: 0.9706390328151986
Accuracy of basic Decision Tree: 0.9671848013816926


## New classifier (10 Marks)

Replicate the previous task for a classifier that we did not cover in class, i.e. something other than K-NN and decision trees. Briefly describe your choice.
Try to create the best model for the given dataset.
Save your best model to your GitHub repository, and create a single code cell that loads it and evaluates it on the following test dataset:
https://github.com/andvise/DataAnalyticsDatasets/blob/main/dm_assignment2/sat_dataset_test.csv

This link currently contains a sample of the training set. The real test set will be released after the submission. I should be able to run the code cell independently, so load all the libraries you need as well.



---

A support vector machine (SVM) was chosen as the new classifier, as SVMs are good at handling high-dimensional data. The methods from the previous section were applied again; as their background has been discussed above, I will only briefly mention them here with regard to the SVM.

## Holdout and Cross Validation.
The final cell of code in this section contains the best-performing model I have made thus far. 10-fold cross-validation is carried out on the training set, and on the hold-out test set the accuracy, precision and recall are all approximately 96%. From the confusion matrix in the output we can see 267 true negatives and 287 true positives, whilst there were only 6 false negatives and 19 false positives.

## Hyperparameter Tuning.
The hyperparameter grid included the regularisation parameter (C), the kernel coefficient (gamma) and the "rbf" (default), "linear" and "sigmoid" kernel types. The following configuration performed best:
(C=10, gamma=1, kernel='linear'). Since gamma is not used by the linear kernel, only C and the kernel type matter in this configuration.
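
As a quick check of how gamma interacts with the kernel choice, the sketch below fits two linear-kernel SVCs that differ only in gamma and compares their predictions, then builds a grid as a list of dictionaries so that gamma is only varied for the kernels that use it. It runs on synthetic stand-in data (an assumption), and the grid values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# gamma is a coefficient of the rbf, poly and sigmoid kernels; the linear
# kernel does not use it, so these two models should behave identically.
pred_a = SVC(kernel="linear", C=10, gamma=1).fit(X_demo, y_demo).predict(X_demo)
pred_b = SVC(kernel="linear", C=10, gamma=0.0001).fit(X_demo, y_demo).predict(X_demo)
print("identical predictions:", np.array_equal(pred_a, pred_b))

# A list of grids avoids refitting the linear kernel once per gamma value.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf", "sigmoid"], "C": [0.1, 1, 10], "gamma": [1, 0.1, 0.01]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_demo, y_demo)

print("candidates evaluated:", len(search.cv_results_["params"]))
print("best params:", search.best_params_)
```

With a single dictionary grid, every linear-kernel candidate would be repeated for each gamma value, which adds fitting time without adding information.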

## Feature Reduction.
Using the "train_features" and "test_features" created in the previous section as part of the PCA for feature reduction, the SVM was trained and tested. However, in the recorded run these sets drastically reduced the accuracy (by roughly 30%), so the normalised X_train and X_test sets were used in the following models.

## Data Normalisation.
It is usually a good idea to normalise the data before passing them to any model, but for comparison I tried fitting the SVM on the non-normalised data, as was done in the sections above for the KNN and decision tree. The cell took an extremely long time to run, so I have no accuracy comparison to report; the only conclusion is that normalising the data greatly shortens the runtime.


In [23]:
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

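# function to make an SVM with scikit-learn's default settings
# (SVC() uses the RBF kernel, despite the function name), fit data and
# evaluate performance by printing the accuracy, confusion matrix
# and classification report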
def linear_SVM(X1, y1, X2, y2):
  svclassifier = SVC()
  svclassifier.fit(X1, y1)
  y_pred = svclassifier.predict(X2)

  conf_mat = confusion_matrix(y2,y_pred)
  classif_rep = classification_report(y2,y_pred)
  return(print("Accuracy of SVM:", sklearn.metrics.accuracy_score(y2, y_pred)), 
         print("\n"),
         print(conf_mat),
         print("\n"),
         print(classif_rep))


def linear_SVM_cv(X1, y1, X2, y2, f):
    svclassifier = SVC()

    # train the model with k-fold cross val
    cv_scores = sklearn.model_selection.cross_val_score(svclassifier, X1, y1, cv = f)
  
    #mean and std of the cv_scores
    return(print("SVM cv_scores mean:{}".format(np.mean(cv_scores))),
           print("%0.2f accuracy with a standard deviation of %0.2f" % (cv_scores.mean(), cv_scores.std())))


linear_SVM(X_train, y_train, X_test,  y_test)

# calling the function using "train_features" and "test_features" defined in the PCA portion above
# linear_SVM(train_features, y_train, test_features, y_test)

# calling the function using non-normalised data
#linear_SVM(X1_train, y_train, X1_test,  y_test)




Accuracy of SVM: 0.8963730569948186


[[258  28]
 [ 32 261]]


              precision    recall  f1-score   support

           0       0.89      0.90      0.90       286
           1       0.90      0.89      0.90       293

    accuracy                           0.90       579
   macro avg       0.90      0.90      0.90       579
weighted avg       0.90      0.90      0.90       579



(None, None, None, None, None)

In [18]:
linear_SVM_cv(X_train, y_train, X_test,  y_test, 12) # 12-fold CV with the default (RBF-kernel) SVC

SVM cv_scores mean:0.891118836915297
0.89 accuracy with a standard deviation of 0.03


(None, None)

In [19]:
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf', 'linear', 'sigmoid']}

grid = GridSearchCV(SVC(), param_grid, refit = True, verbose = 3)

# fitting the model for grid search
grid.fit(X_train, y_train)

# print best parameter after tuning
print(grid.best_params_)
 
# print how our model looks after hyper-parameter tuning
print(grid.best_estimator_)

Fitting 5 folds for each of 75 candidates, totalling 375 fits
[CV 1/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.848 total time=   0.3s
[CV 2/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.822 total time=   0.3s
[CV 3/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.904 total time=   0.3s
[CV 4/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.874 total time=   0.3s
[CV 5/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.841 total time=   0.3s
[CV 1/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.911 total time=   0.1s
[CV 2/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.885 total time=   0.1s
[CV 3/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.930 total time=   0.1s
[CV 4/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.904 total time=   0.1s
[CV 5/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.889 total time=   0.1s
[CV 1/5] END ....C=0.1, gamma=1, kernel=sigmoid;, score=0.507 total time=   0.2s
[CV 2/5] END ....C=0.1, gamma=1, kernel=sigmoid

In [20]:
grid_predictions = grid.predict(X_test)
 
# print classification report
print(sklearn.metrics.accuracy_score(y_test, grid_predictions),
      ("\n"),
      confusion_matrix(y_test, grid_predictions), 
      classification_report(y_test, grid_predictions))

0.9568221070811744 
 [[267  19]
 [  6 287]]               precision    recall  f1-score   support

           0       0.98      0.93      0.96       286
           1       0.94      0.98      0.96       293

    accuracy                           0.96       579
   macro avg       0.96      0.96      0.96       579
weighted avg       0.96      0.96      0.96       579



In [21]:
# the best model I have thus far

def best_svm(X1, y1, X2, y2, f):
  svclassifier = SVC(C=10, gamma=1, kernel='linear')
  cv_scores = sklearn.model_selection.cross_val_score(svclassifier, X1, y1, cv = f)

  mean_score = np.mean(cv_scores)

  svclassifier.fit(X1, y1)
  y_pred = svclassifier.predict(X2)

  conf_mat = confusion_matrix(y2,y_pred)
  classif_rep = classification_report(y2,y_pred)
  return(#print("Accuracy of SVM:", sklearn.metrics.accuracy_score(y2, y_pred)), "mean = ", mean_score, 
         print("\n"),
         print(conf_mat),
         print("\n"),
         print(classif_rep))
  

best_svm(X_train, y_train, X_test,  y_test, 10)



[[267  19]
 [  6 287]]


              precision    recall  f1-score   support

           0       0.98      0.93      0.96       286
           1       0.94      0.98      0.96       293

    accuracy                           0.96       579
   macro avg       0.96      0.96      0.96       579
weighted avg       0.96      0.96      0.96       579



(None, None, None, None)

# <font color="blue">FOR GRADING ONLY</font>

Save your best model to your GitHub repository, and create a single code cell that loads it and evaluates it on the following test dataset:
https://github.com/andvise/DataAnalyticsDatasets/blob/main/dm_assignment2/sat_dataset_test.csv
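
A minimal sketch of the save-and-load round trip is shown here, using an in-memory buffer in place of a GitHub download so that it runs on its own. The data are synthetic and the pipeline is an illustrative stand-in for the tuned model (both assumptions). Bundling the scaler with the classifier in one pipeline is one way to make sure the loaded model applies the same preprocessing to the unseen test set.

```python
from io import BytesIO

from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Scaler and classifier travel together, so the saved object can be
# applied directly to raw feature values.
pipeline = make_pipeline(MinMaxScaler(), SVC(kernel="linear", C=10))
pipeline.fit(X_demo, y_demo)

# Serialise to a buffer; writing to a file would instead be
# dump(pipeline, "best_svm.joblib"), where the file name is hypothetical.
buffer = BytesIO()
dump(pipeline, buffer)
buffer.seek(0)

# Loading from a file-like object mirrors the BytesIO usage in the cell below.
restored = load(buffer)
print("round trip OK:", (restored.predict(X_demo) == pipeline.predict(X_demo)).all())
```

The model should be loaded with a scikit-learn version compatible with the one used to save it, since pickled estimators are not guaranteed to work across versions.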

In [None]:
from joblib import dump, load
from io import BytesIO
import requests

# INSERT YOUR MODEL'S URL
mLink = 'URL_OF_YOUR_MODEL_SAVED_IN_YOUR_GITHUB_REPOSITORY?raw=true'
mfile = BytesIO(requests.get(mLink).content)
model = load(mfile)
# YOUR CODE HERE