# CS3033/CS6405 - Data Mining - Second Assignment

### Submission

This assignment is **due on 06/04/22 at 23:59**. You should submit a single .ipynb file with your Python code and analysis electronically via Canvas.
Please note that this assignment will account for 25 Marks of your module grade.

### Declaration

By submitting this assignment, I agree to the following:

<font color="red">“I have read and understand the UCC academic policy on plagiarism, and agree to the requirements set out thereby in relation to plagiarism and referencing. I confirm that I have referenced and acknowledged properly all sources used in the preparation of this assignment.
I declare that this assignment is entirely my own work based on my personal study. I further declare that I have not engaged the services of another to either assist me in, or complete this assignment”</font>

### Objective

The Boolean satisfiability (SAT) problem consists in determining whether a Boolean formula F is satisfiable or not. F is represented by a pair (X, C), where X is a set of Boolean variables and C is a set of clauses in Conjunctive Normal Form (CNF). Each clause is a disjunction of literals (a variable or its negation). This problem is one of the most widely studied combinatorial problems in computer science. It is the classic NP-complete problem. Over the past number of decades, a significant amount of research work has focused on solving SAT problems with both complete and incomplete solvers.
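To make the definition concrete, here is a minimal sketch of a brute-force satisfiability check. It assumes a DIMACS-style encoding (the integer `k` stands for variable x_k and `-k` for its negation); the function name `is_satisfiable` and the two example formulas are illustrative and not part of the assignment. Real solvers are far more sophisticated, since this loop tries all 2^n assignments.

```python
from itertools import product


def is_satisfiable(num_vars, clauses):
    """Brute-force SAT check over all 2**num_vars assignments.

    Each clause is a list of non-zero ints: k means x_k, -k means NOT x_k.
    """
    for bits in product([False, True], repeat=num_vars):
        # A clause holds if at least one of its literals is true;
        # the formula holds if every clause holds.
        if all(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return True
    return False


# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
sat_formula = [[1, -2], [2, 3], [-1, -3]]
# x1 AND NOT x1
unsat_formula = [[1], [-1]]

print(is_satisfiable(3, sat_formula))
print(is_satisfiable(1, unsat_formula))
```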

Recent advances in supervised learning have provided powerful techniques for classifying problems. In this project, we see the SAT problem as a classification problem. Given a Boolean formula (represented by a vector of features), we are asked to predict if it is satisfiable or not.

In this project, we represent each SAT problem as a vector of 327 features carrying general information about the problem, e.g., number of variables, number of clauses, fraction of Horn clauses in the problem, etc. There is no need to understand the features to be able to complete the assignment.
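As an illustration of what such features look like, the sketch below computes a handful of them for a small CNF formula. The keys `c`, `v`, `clauses_vars_ratio` and `vars_clauses_ratio` mirror column names in the dataset; `horn_fraction` is a hypothetical name chosen for this example, and the DIMACS-style integer encoding of literals is an assumption. A Horn clause is a clause with at most one positive literal.

```python
def basic_features(num_vars, clauses):
    """Compute a few simple structural features of a CNF formula."""
    num_clauses = len(clauses)
    # A Horn clause has at most one positive (non-negated) literal.
    horn = sum(1 for clause in clauses if sum(lit > 0 for lit in clause) <= 1)
    return {
        "c": num_clauses,
        "v": num_vars,
        "clauses_vars_ratio": num_clauses / num_vars,
        "vars_clauses_ratio": num_vars / num_clauses,
        "horn_fraction": horn / num_clauses,  # hypothetical feature name
    }


# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
features = basic_features(3, [[1, -2], [2, 3], [-1, -3]])
print(features)
```

The two ratio features are simply reciprocals of each other, which is one reason feature reduction can be worth trying on this dataset.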

The dataset is available at:
https://github.com/andvise/DataAnalyticsDatasets/blob/main/dm_assignment2/sat_dataset_train.csv

This is original unpublished data.

## Data Preparation

In [None]:
import pandas as pd

df = pd.read_csv("https://github.com/andvise/DataAnalyticsDatasets/blob/6d5738101d173b97c565f143f945dedb9c42a400/dm_assignment2/sat_dataset_train.csv?raw=true")
df.head()

Unnamed: 0,c,v,clauses_vars_ratio,vars_clauses_ratio,vcg_var_mean,vcg_var_coeff,vcg_var_min,vcg_var_max,vcg_var_entropy,vcg_clause_mean,...,rwh_0_max,rwh_1_mean,rwh_1_coeff,rwh_1_min,rwh_1_max,rwh_2_mean,rwh_2_coeff,rwh_2_min,rwh_2_max,target
0,420,10,42.0,0.02381,0.6,0.0,0.6,0.6,0.0,0.6,...,78750.0,8e-06,0.0,7.875e-06,8e-06,2.385082e-21,0.0,2.385082e-21,2.385082e-21,1
1,230,20,11.5,0.086957,0.137826,0.089281,0.117391,0.16087,2.180946,0.137826,...,6646875.0,17433.722184,1.0,2.981244e-12,34867.444369,17277.21,1.0,1.358551e-53,34554.42,0
2,240,16,15.0,0.066667,0.3,0.0,0.3,0.3,0.0,0.3,...,500000.0,1525.878932,0.0,1525.879,1525.878932,1525.879,0.0,1525.879,1525.879,1
3,424,30,14.133333,0.070755,0.226415,0.485913,0.056604,0.45283,2.220088,0.226415,...,87500.0,0.000122,1.0,6.535723e-14,0.000245,8.218628e-07,1.0,1.499676e-61,1.643726e-06,0
4,162,19,8.526316,0.117284,0.139701,0.121821,0.111111,0.185185,1.940843,0.139701,...,5859400.0,16591.49431,1.0,6.912725999999999e-42,33182.988621,16659.03,1.0,0.0,33318.07,1


In [None]:
df.dtypes

c                       int64
v                       int64
clauses_vars_ratio    float64
vars_clauses_ratio    float64
vcg_var_mean          float64
                       ...   
rwh_2_mean            float64
rwh_2_coeff           float64
rwh_2_min             float64
rwh_2_max             float64
target                  int64
Length: 328, dtype: object

In [None]:
#Data cleaning comes first:
#we check whether the data contains NaN values (and, in the next cell, infinite values)
import numpy as np

np.any(np.isnan(df))

True

In [None]:
np.all(np.isfinite(df))

False

In [None]:
#Filling NaN and +- infinity with '0'
dfs = df.fillna(0)
dfs = dfs.replace([np.inf, -np.inf],0)
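Before filling, it can help to see how many values are actually affected. The following is a small sketch on a toy DataFrame (the column names `a` and `b` are made up); it treats infinities as missing, counts the gaps per column, and then applies the same fill-with-0 policy as the cell above.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the SAT feature table (assumption: not the real data)
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.inf, 2.0, -np.inf]})

# Treat +/- infinity as missing, then count the gaps per column
as_missing = toy.replace([np.inf, -np.inf], np.nan)
missing_per_column = as_missing.isna().sum()

# Same policy as in the notebook: fill every gap with 0
cleaned = as_missing.fillna(0)

print(missing_per_column.to_dict())
print(np.isfinite(cleaned.to_numpy()).all())
```

Filling with 0 is only one option; a column median computed on the training split would be an alternative worth comparing.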

In [None]:
print(dfs.shape) #Print the shape of the cleaned DataFrame to check that no rows or columns were lost

(1929, 328)


In [None]:
dfs.describe() #to get an overview of the dataframe

Unnamed: 0,c,v,clauses_vars_ratio,vars_clauses_ratio,vcg_var_mean,vcg_var_coeff,vcg_var_min,vcg_var_max,vcg_var_entropy,vcg_clause_mean,...,rwh_0_max,rwh_1_mean,rwh_1_coeff,rwh_1_min,rwh_1_max,rwh_2_mean,rwh_2_coeff,rwh_2_min,rwh_2_max,target
count,1929.0,1929.0,1929.0,1929.0,1929.0,1929.0,1929.0,1929.0,1929.0,1929.0,...,1929.0,1929.0,1929.0,1929.0,1929.0,1929.0,1929.0,1929.0,1929.0,1929.0
mean,549.087092,57.689995,11.07295,0.134343,0.111101,0.20992,0.073509,0.161222,1.758409,0.111101,...,4391681.0,8645.001262,0.846796,1213.078,16076.9248,16377.181091,0.866337,1284.981,31469.380944,0.505962
std,446.746934,50.556307,8.141268,0.084303,0.103638,0.189588,0.090287,0.156326,0.898686,0.103638,...,5428499.0,26216.028598,0.346169,10031.12,48104.066684,43597.829247,0.327328,9951.46,84954.358986,0.500094
min,1.0,2.0,0.5,0.022727,0.014386,0.0,0.002368,0.015538,0.0,0.014386,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,222.0,27.0,5.333333,0.070755,0.039427,0.056449,0.024882,0.058182,1.091349,0.039427,...,468750.0,0.738796,0.999602,4.72406e-71,0.975101,7.152557,1.0,0.0,12.64995,0.0
50%,404.0,39.0,8.0,0.125,0.079902,0.176697,0.048246,0.103053,1.866538,0.079902,...,3125000.0,453.017692,1.0,5.1321099999999995e-20,893.758807,976.5625,1.0,6.033537e-81,1712.975661,1.0
75%,776.0,70.0,14.133333,0.1875,0.145299,0.282668,0.085714,0.185714,2.359619,0.145299,...,6328125.0,7077.674591,1.0,3.289401e-05,14155.349182,9525.920146,1.0,2.843545e-12,19051.840291,1.0
max,1890.0,226.0,44.0,2.0,1.0,1.308621,1.0,1.0,3.959478,1.0,...,40625000.0,343561.828269,1.0,173611.1,602048.86113,386723.660084,1.0,173611.1,773447.302106,1.0


In [None]:
df['target'].value_counts() #Checking the number of unsatisfiable and satisfiable instances

1    976
0    953
Name: target, dtype: int64
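The two classes are almost balanced, which fixes the baseline any classifier has to beat. A minimal sketch, using only the counts printed above, of the accuracy obtained by always predicting the majority class:

```python
# Class counts copied from the value_counts() output above
counts = {1: 976, 0: 953}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)
baseline_accuracy = counts[majority_class] / total

print(majority_class, round(baseline_accuracy, 4))
```

Because this baseline is close to 50%, plain accuracy is a reasonable metric here; on a heavily imbalanced dataset it would be much less informative.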

In [None]:
# YOUR CODE HERE

#First I will shuffle the rows of the DataFrame,
#so that any ordering in the original file cannot influence the later train/test split.
dfx = dfs.sample(frac = 1)
dfx.head()

Unnamed: 0,c,v,clauses_vars_ratio,vars_clauses_ratio,vcg_var_mean,vcg_var_coeff,vcg_var_min,vcg_var_max,vcg_var_entropy,vcg_clause_mean,...,rwh_0_max,rwh_1_mean,rwh_1_coeff,rwh_1_min,rwh_1_max,rwh_2_mean,rwh_2_coeff,rwh_2_min,rwh_2_max,target
846,249,45,5.533333,0.180723,0.069076,0.052861,0.064257,0.072289,0.970116,0.069076,...,625125.0,0.152556,8.7e-05,0.1525423,0.152569,0.1556703,5.3e-05,0.155662,0.1556786,1
857,910,50,18.2,0.054945,0.045055,0.199941,0.024176,0.065934,3.072908,0.045055,...,7031250.0,5937.494933,1.0,1.0549410000000001e-175,11874.989867,5759.32,1.0,0.0,11518.64,1
1837,352,44,8.0,0.125,0.100207,0.265633,0.045455,0.127841,1.760644,0.100207,...,156250.0,2.2e-05,0.0,2.153297e-05,2.2e-05,1.560026e-09,0.0,1.560026e-09,1.560026e-09,0
1864,435,84,5.178571,0.193103,0.034647,0.265952,0.022989,0.05977,2.206326,0.034647,...,4688750.0,488.557316,1.0,5.821882e-30,977.114632,478.3989,1.0,7.391279e-154,956.7979,1
1484,262,62,4.225806,0.236641,0.051157,0.276076,0.038168,0.103053,1.986609,0.051157,...,1955025.0,612.36224,1.0,1.265368e-19,1224.724479,602.2336,1.0,3.6522319999999996e-58,1204.467,1


In [None]:
#Next, the features are separated from the label by dropping the target column
X = dfx.drop(columns=['target'])
X.head()

Unnamed: 0,c,v,clauses_vars_ratio,vars_clauses_ratio,vcg_var_mean,vcg_var_coeff,vcg_var_min,vcg_var_max,vcg_var_entropy,vcg_clause_mean,...,rwh_0_min,rwh_0_max,rwh_1_mean,rwh_1_coeff,rwh_1_min,rwh_1_max,rwh_2_mean,rwh_2_coeff,rwh_2_min,rwh_2_max
846,249,45,5.533333,0.180723,0.069076,0.052861,0.064257,0.072289,0.970116,0.069076,...,625000.0,625125.0,0.152556,8.7e-05,0.1525423,0.152569,0.1556703,5.3e-05,0.155662,0.1556786
857,910,50,18.2,0.054945,0.045055,0.199941,0.024176,0.065934,3.072908,0.045055,...,6.0,7031250.0,5937.494933,1.0,1.0549410000000001e-175,11874.989867,5759.32,1.0,0.0,11518.64
1837,352,44,8.0,0.125,0.100207,0.265633,0.045455,0.127841,1.760644,0.100207,...,156250.0,156250.0,2.2e-05,0.0,2.153297e-05,2.2e-05,1.560026e-09,0.0,1.560026e-09,1.560026e-09
1864,435,84,5.178571,0.193103,0.034647,0.265952,0.022989,0.05977,2.206326,0.034647,...,3750.0,4688750.0,488.557316,1.0,5.821882e-30,977.114632,478.3989,1.0,7.391279e-154,956.7979
1484,262,62,4.225806,0.236641,0.051157,0.276076,0.038168,0.103053,1.986609,0.051157,...,3775.0,1955025.0,612.36224,1.0,1.265368e-19,1224.724479,602.2336,1.0,3.6522319999999996e-58,1204.467


In [None]:
#Now the target is assigned to the output variable y.
#It is kept as a 1-D array, which is the label shape scikit-learn expects.
y = dfx['target'].values
y

array([1, 1, 0, ..., 1, 1, 0])

# Tasks

## Basic models and evaluation (5 Marks)

Using Scikit-learn, train and evaluate K-NN and decision tree classifiers using 70% of the dataset for training and 30% for testing. For this part of the project, we are not interested in optimising the parameters; we just want to get an idea of the dataset. Compare the results of both classifiers.

In [None]:
#Applying a K-NN model with an arbitrary n_neighbors value
#Importing relevant libraries
import numpy as np
from sklearn import neighbors
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

#Splitting our dataset into 70% for training and 30% for testing.
#We stratify on y, so both sets keep the same ratio of 0's and 1's.

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify = y, random_state = 0, train_size = 0.7)

KNN = neighbors.KNeighborsClassifier(n_neighbors=3) #Assigning variable KNN to model
KNN.fit(X_train, np.ravel(y_train)) #Fitting the training data (labels flattened to 1-D)
y_pred = KNN.predict(X_test) #Predicting on the test set

print(accuracy_score(y_test, y_pred)) #Checking the accuracy and printing.
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

0.8031088082901554
              precision    recall  f1-score   support

           0       0.78      0.84      0.81       286
           1       0.83      0.77      0.80       293

    accuracy                           0.80       579
   macro avg       0.80      0.80      0.80       579
weighted avg       0.80      0.80      0.80       579

[[239  47]
 [ 67 226]]
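The figures in the classification report can be re-derived from the confusion matrix, which is a useful sanity check when comparing models. The sketch below uses the K-NN matrix printed above and assumes scikit-learn's layout (rows are true classes, columns are predicted classes).

```python
# K-NN confusion matrix from the output above: rows = true class, columns = predicted class
cm = [[239, 47],
      [67, 226]]

tn, fp = cm[0]  # true class 0: correctly predicted 0, wrongly predicted 1
fn, tp = cm[1]  # true class 1: wrongly predicted 0, correctly predicted 1

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision_1 = tp / (tp + fp)  # of all predicted 1's, how many are truly 1
recall_1 = tp / (tp + fn)     # of all true 1's, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(accuracy, 4), round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
```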




In [None]:
#Importing relevant libraries for the decision tree classifier
#Using arbitrary parameter values
from sklearn.tree import DecisionTreeClassifier

#As the split is already made, we can apply the decision tree model to our data

DT = DecisionTreeClassifier(random_state=0, max_depth=2)
DT.fit(X_train, y_train) #Fitting the training data
y_pred = DT.predict(X_test) #Predicting on the test set

print(accuracy_score(y_test, y_pred)) #Checking the accuracy and printing.
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

0.9689119170984456
              precision    recall  f1-score   support

           0       0.94      1.00      0.97       286
           1       1.00      0.94      0.97       293

    accuracy                           0.97       579
   macro avg       0.97      0.97      0.97       579
weighted avg       0.97      0.97      0.97       579

[[285   1]
 [ 17 276]]


## Robust evaluation (10 Marks)

In this section, we are interested in more rigorous techniques by implementing more sophisticated methods, for instance:
* Hold-out and cross-validation.
* Hyper-parameter tuning.
* Feature reduction.
* Feature normalisation.

Your report should provide concrete information about your reasoning; everything should be well-explained.

Do not get stressed if the things you try do not improve the accuracy. The key to getting good marks is to show that you evaluated different methods and that you correctly selected the configuration.
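To illustrate how cross-validation differs from a single hold-out split, here is a minimal pure-Python sketch of k-fold index generation. The helper name `k_fold_indices` is made up for this example. It neither shuffles nor stratifies; scikit-learn's `GridSearchCV` with an integer `cv` uses stratified folds for classifiers, so this is only meant to show the mechanics.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# 1350 training rows (1929 total minus the 579 test rows) split into 10 folds
folds = list(k_fold_indices(1350, 10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

Every sample is used for validation exactly once, so the averaged score depends less on one lucky or unlucky split than a single hold-out estimate does.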

In [None]:
#Grid search with 10-fold cross-validation on the training set

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

KNN = KNeighborsClassifier()
#create a dictionary of all values we want to test for n_neighbors
param_grid = {'n_neighbors': np.arange(1, 31)}
#use grid search to test all values for n_neighbors
knn_gscv = GridSearchCV(KNN, param_grid, scoring='accuracy', cv=10)
#fit model to data
knn_gscv.fit(X_train, np.ravel(y_train))
knn_gscv.best_params_


{'n_neighbors': 1}
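K-NN is distance-based, so features with very large ranges (in this dataset `rwh_0_max` reaches 40625000 while `vcg_var_mean` never exceeds 1) dominate the distance unless the features are normalised. The sketch below shows one way to tune `n_neighbors` with scaling placed inside a `Pipeline`, so that the scaler is re-fitted on the training part of every fold. It runs on synthetic data from `make_classification`, not on the SAT dataset, and the grid values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the SAT features (assumption: not the assignment data)
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.7, stratify=y_demo, random_state=0
)

# Scaling is a pipeline step, so each CV fold fits its own scaler on its training part
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
search = GridSearchCV(
    pipe, {"knn__n_neighbors": [1, 3, 5, 7, 9]}, scoring="accuracy", cv=5
)
search.fit(X_tr, y_tr)

test_accuracy = search.score(X_te, y_te)
print(search.best_params_, test_accuracy)
```

Whether scaling changes the best `n_neighbors` on the real data is an empirical question that this sketch does not answer.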

In [None]:
KNN = neighbors.KNeighborsClassifier(**knn_gscv.best_params_) #Assigning variable KNN to the model with the tuned parameters
KNN.fit(X_train, np.ravel(y_train)) #Fitting the training data (labels flattened to 1-D)
y_pred = KNN.predict(X_test) #Predicting on the test set

print(accuracy_score(y_test, y_pred)) #Checking the accuracy and printing.
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

0.8411053540587219
              precision    recall  f1-score   support

           0       0.83      0.85      0.84       286
           1       0.85      0.83      0.84       293

    accuracy                           0.84       579
   macro avg       0.84      0.84      0.84       579
weighted avg       0.84      0.84      0.84       579

[[244  42]
 [ 50 243]]




In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

model = DecisionTreeClassifier()
#Hyper-parameter values to search over.
#Note: "auto" is only accepted by older scikit-learn releases; for classification trees it meant the same as "sqrt".
parameters={"splitter":["best","random"],
            "max_depth" : [1,2,3,4,5],
            "min_samples_leaf":[1,2,3],
            "min_weight_fraction_leaf":[0.1,0.2,0.3],
            "max_features":["auto","log2","sqrt",None],
            "max_leaf_nodes":[None,10,20] }

#10-fold cross-validated grid search; refit=True retrains the best configuration on the whole training set
tuning_model=GridSearchCV(model,param_grid=parameters,refit=True, scoring='accuracy',cv=10,verbose=3)
tuning_model.fit(X_train, y_train)
tuning_model.best_params_

Streaming output truncated to the last 5000 lines.
[CV 1/10] END max_depth=3, max_features=sqrt, max_leaf_nodes=20, min_samples_leaf=1, min_weight_fraction_leaf=0.3, splitter=best;, score=0.607 total time=   0.0s
[CV 2/10] END max_depth=3, max_features=sqrt, max_leaf_nodes=20, min_samples_leaf=1, min_weight_fraction_leaf=0.3, splitter=best;, score=0.889 total time=   0.0s
[CV 3/10] END max_depth=3, max_features=sqrt, max_leaf_nodes=20, min_samples_leaf=1, min_weight_fraction_leaf=0.3, splitter=best;, score=0.570 total time=   0.0s
[CV 4/10] END max_depth=3, max_features=sqrt, max_leaf_nodes=20, min_samples_leaf=1, min_weight_fraction_leaf=0.3, splitter=best;, score=0.674 total time=   0.0s
[CV 5/10] END max_depth=3, max_features=sqrt, max_leaf_nodes=20, min_samples_leaf=1, min_weight_fraction_leaf=0.3, splitter=best;, score=0.859 total time=   0.0s
[CV 6/10] END max_depth=3, max_features=sqrt, max_leaf_nodes=20, min_samples_leaf=1, min_weight_fraction_leaf=0.3, splitter=b

{'max_depth': 2,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_samples_leaf': 1,
 'min_weight_fraction_leaf': 0.1,
 'splitter': 'best'}
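The length of the log above follows directly from the size of the grid: an exhaustive search fits one model per parameter combination and per fold. A short sketch that counts the fits for the decision tree grid used in this cell:

```python
from math import prod

# Same grid as in the decision tree search above
dt_grid = {
    "splitter": ["best", "random"],
    "max_depth": [1, 2, 3, 4, 5],
    "min_samples_leaf": [1, 2, 3],
    "min_weight_fraction_leaf": [0.1, 0.2, 0.3],
    "max_features": ["auto", "log2", "sqrt", None],
    "max_leaf_nodes": [None, 10, 20],
}

n_candidates = prod(len(values) for values in dt_grid.values())
n_fits = n_candidates * 10  # cv=10 folds per candidate

print(n_candidates, n_fits)
```

Counting the fits before launching a search is a cheap way to decide whether the grid should be trimmed or replaced by a randomised search.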

In [None]:
DT = DecisionTreeClassifier(random_state=0, **tuning_model.best_params_)
DT.fit(X_train, y_train) #Fitting the training data
y_pred = DT.predict(X_test) #Predicting on the test set

print(accuracy_score(y_test, y_pred)) #Checking the accuracy and printing.
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

0.9481865284974094
              precision    recall  f1-score   support

           0       0.97      0.93      0.95       286
           1       0.93      0.97      0.95       293

    accuracy                           0.95       579
   macro avg       0.95      0.95      0.95       579
weighted avg       0.95      0.95      0.95       579

[[265  21]
 [  9 284]]


## New classifier (10 Marks)

Replicate the previous task for a classifier that we did not cover in class, i.e. one other than K-NN and decision trees. Briefly describe your choice.
Try to create the best model for the given dataset.
Save your best model to your GitHub repository, and create a single code cell that loads it and evaluates it on the following test dataset:
https://github.com/andvise/DataAnalyticsDatasets/blob/main/dm_assignment2/sat_dataset_test.csv

This link currently contains a sample of the training set. The real test set will be released after the submission. I should be able to run the code cell independently, so load all the libraries you need in it as well.

In [None]:
#I choose the random forest classifier, because a single decision tree already reaches a high accuracy on this dataset
#and a random forest is an ensemble of decision trees whose predictions are combined.
#The algorithm can handle high-dimensional feature spaces as well as a large number of training examples.

In [None]:
from sklearn.ensemble import RandomForestClassifier
RandomForestClassifier().get_params()

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 100,
 'n_jobs': None,
 'oob_score': False,
 'random_state': None,
 'verbose': 0,
 'warm_start': False}

In [None]:
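estimator = RandomForestClassifier()
#Hyper-parameter values to search over.
#Note: "auto" is only accepted by older scikit-learn releases; for classifiers it meant the same as "sqrt".
param_grid = { "n_estimators"      : [10,20,30,40,50],
            "max_features"      : ["auto", "sqrt"],
            "min_samples_split" : [2,4,8],
            "max_depth": [1,2,4],
            "max_leaf_nodes": [2,4,6],
            "min_samples_leaf": [1,2,4],
            "bootstrap": [True, False],
            }

#5-fold cross-validated grid search, run in parallel on all cores
grid = GridSearchCV(estimator, param_grid, n_jobs=-1, cv=5)
grid.fit(X_train, np.ravel(y_train))
grid.best_params_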



{'bootstrap': True,
 'max_depth': 4,
 'max_features': 'auto',
 'max_leaf_nodes': 6,
 'min_samples_leaf': 2,
 'min_samples_split': 8,
 'n_estimators': 40}

In [None]:
from sklearn.ensemble import RandomForestClassifier

RF = RandomForestClassifier(**grid.best_params_)
RF.fit(X_train, np.ravel(y_train)) #Fitting the training data (labels flattened to 1-D)
y_pred = RF.predict(X_test) #Predicting on the test set

print(accuracy_score(y_test, y_pred)) #Checking the accuracy and printing.
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

0.9792746113989638
              precision    recall  f1-score   support

           0       0.96      1.00      0.98       286
           1       1.00      0.96      0.98       293

    accuracy                           0.98       579
   macro avg       0.98      0.98      0.98       579
weighted avg       0.98      0.98      0.98       579

[[286   0]
 [ 12 281]]




In [None]:
#As per the results, the accuracy of the random forest classifier is high.
#Now I will add feature selection and feature normalisation, to see if the results get better.

#First try: feature selection with a random forest

from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

#this is the classifier used for feature selection
clf_featr_sele = RandomForestClassifier(random_state=42)

#recursive feature elimination: drop 10 features per step and score each subset with 5-fold cross-validation
rfecv = RFECV(estimator=clf_featr_sele,
              step=10,
              cv=5,
              scoring='accuracy')

#the final classifier could be a different model; here the same random forest is reused
clf = clf_featr_sele

CV_rfc = GridSearchCV(clf,
                      param_grid={"n_estimators"      : [10,20,30,40,50],
                                  "max_features"      : ["auto", "sqrt"],
                                  "min_samples_split" : [2,4,8],
                                  "max_depth": [1,2,4],
                                  "max_leaf_nodes": [2,4,6],
                                  "min_samples_leaf": [1,2,4],
                                  "bootstrap": [True, False]},
                      cv=5, scoring='accuracy')

#feature selection first, then the grid-searched classifier on the selected features
pipeline = Pipeline([('feature_sele', rfecv),
                     ('clf_cv', CV_rfc)])

pipeline.fit(X_train, np.ravel(y_train))
yhat = pipeline.predict(X_test)

print(accuracy_score(y_test, yhat))


0.9810017271157168


In [None]:
#As feature selection increased the accuracy, the next step to try is the normalisation of features.
#I am using StandardScaler for this (zero mean and unit variance per feature).
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
#standardisation of the independent variables (features); the scaler is fitted on the training set only
 
scaled_data = scale.fit(X_train) 
X_train = scaled_data.transform(X_train)
 
X_test = scaled_data.transform(X_test) 
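What the scaler does can be reproduced by hand for a single feature. The sketch below uses made-up values and only the standard library; it relies on the fact that `StandardScaler` divides by the population standard deviation. The mean and standard deviation are estimated on the training values only and then reused for the test values, which is why the cell above calls `fit` on `X_train` and only `transform` on `X_test`.

```python
from statistics import fmean, pstdev

# Made-up values of one feature in the training and test sets
train_col = [10.0, 20.0, 30.0, 40.0]
test_col = [25.0, 50.0]

# Statistics are estimated on the training data only
mu = fmean(train_col)
sigma = pstdev(train_col)  # population standard deviation

z_train = [(x - mu) / sigma for x in train_col]
z_test = [(x - mu) / sigma for x in test_col]

print(z_train)
print(z_test)
```

The standardised test values need not have zero mean or unit variance; that is expected, because the test set must not influence the fitted statistics.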


In [None]:
#Feature selection, rerun on the standardised data
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV
import numpy as np
from sklearn.ensemble import RandomForestClassifier


from sklearn.pipeline import Pipeline

#this is the classifier used for feature selection
clf_featr_sele = RandomForestClassifier(random_state=42) 
                                                                                  
rfecv = RFECV(estimator=clf_featr_sele, 
              step=10, 
              cv=5, 
              scoring='accuracy')

#the final classifier could be a different model; here the same random forest is reused
clf = clf_featr_sele

CV_rfc = GridSearchCV(clf, 
                      param_grid={"n_estimators"      : [10,20,30,40,50],
            "max_features"      : ["auto", "sqrt"],
            "min_samples_split" : [2,4,8],
            "max_depth": [1,2,4],
            "max_leaf_nodes": [2,4,6],
            "min_samples_leaf": [1,2,4],
            "bootstrap": [True, False]},
                      cv= 5, scoring='accuracy')

pipeline  = Pipeline([('feature_sele',rfecv),
                      ('clf_cv',CV_rfc)])

pipeline.fit(X_train, np.ravel(y_train))
yhat = pipeline.predict(X_test)

print(accuracy_score(y_test, yhat))

0.9879101899827288


In [None]:
#With standardised features, the test accuracy rises from 98.10 to 98.79 percent.

# <font color="blue">FOR GRADING ONLY</font>

Save your best model to your GitHub repository, and create a single code cell that loads it and evaluates it on the following test dataset:
https://github.com/andvise/DataAnalyticsDatasets/blob/main/dm_assignment2/sat_dataset_test.csv

In [None]:
from joblib import dump, load
from io import BytesIO
import requests

# INSERT YOUR MODEL'S URL
mLink = 'URL_OF_YOUR_MODEL_SAVED_IN_YOUR_GITHUB_REPOSITORY?raw=true'
mfile = BytesIO(requests.get(mLink).content)
model = load(mfile)
# YOUR CODE HERE

MissingSchema: ignored
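The `MissingSchema` error above is what `requests` raises when the string passed to `requests.get` has no `http://` or `https://` scheme, as is the case for the unfilled placeholder. The saving and loading steps themselves can be checked offline. The sketch below trains a small random forest on synthetic data (an assumption; it is not the tuned model from this notebook), serialises it with `joblib.dump` into an in-memory buffer, and loads it back the same way the grading cell does with the downloaded bytes.

```python
from io import BytesIO

from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in data and a small model (assumption: not the assignment model)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_demo, y_demo)

# Serialise into memory; in practice dump(model, "model.joblib") would write a file
# to commit to the repository ("model.joblib" is a hypothetical file name)
buffer = BytesIO()
dump(model, buffer)
buffer.seek(0)

# Load the model back, as the grading cell does with the downloaded bytes
restored = load(buffer)
same = bool((restored.predict(X_demo) == model.predict(X_demo)).all())

print(same, accuracy_score(y_demo, restored.predict(X_demo)))
```

If preprocessing such as scaling or feature selection is part of the final model, saving the whole fitted pipeline rather than only the classifier keeps the loading cell self-contained.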