<a href="https://colab.research.google.com/github/J0AZZ/data-science-studies/blob/master/BasicBinaryClassification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Student Classification
### by J0AZZ
### https://www.github.com/J0AZZ
This exercise comes from the first chapter of the book Practical Machine Learning with Python. The author publishes the dataset and notebooks at https://github.com/dipanjanS/practical-machine-learning-with-python.

The code below is a simple example and should not be treated as a solution to a real-world problem. It only demonstrates some of the techniques used for this kind of task, namely binary classification.

Although that section of the book does not mention it, a model is properly evaluated only when it is exposed to unseen data. To gauge how well the model generalizes, we split the (already scarce) dataset into training and test sets as a complementary exercise.

#### Framework and Data

In [96]:
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
import os
import pandas as pd
import numpy as np

df = pd.read_csv("https://raw.githubusercontent.com/dipanjanS/practical-machine-learning-with-python/master/notebooks/Ch01_Machine_Learning_Basics/student_records.csv")
df



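| Unnamed: 0 | Name | OverallGrade | Obedient | ResearchScore | ProjectScore | Recommend |
|---|---|---|---|---|---|---|
| 0 | Henry | A | Y | 90 | 85 | Yes |
| 1 | John | C | N | 85 | 51 | Yes |
| 2 | David | F | N | 10 | 17 | No |
| 3 | Holmes | B | Y | 75 | 71 | No |
| 4 | Marvin | E | N | 20 | 30 | No |
| 5 | Simon | A | Y | 92 | 79 | Yes |
| 6 | Robert | B | Y | 60 | 59 | No |
| 7 | Trent | C | Y | 75 | 33 | No |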


#### Preprocessing

In [97]:
feature_names = ["OverallGrade", "Obedient", "ResearchScore", "ProjectScore"]

target = df["Recommend"]
# take an explicit copy so the column assignments below do not raise SettingWithCopyWarning
features = df[feature_names].copy()

numerical_keys = ["ResearchScore", "ProjectScore"]
categorical_keys = ["OverallGrade", "Obedient"]

In [98]:
# encode the target: 1 where Recommend == "Yes", 0 otherwise
target = np.where(target == "Yes", 1, 0)

# scaling: standardize the numerical columns to zero mean and unit variance
ss = StandardScaler()
ss.fit(df[numerical_keys])
features[numerical_keys] = ss.transform(df[numerical_keys])

# one-hot encoding of the categorical columns
features = pd.get_dummies(features, columns=categorical_keys)
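The preprocessing cell fits the scaler and builds the one-hot columns on all eight rows, before any split is made, so statistics from the held-out rows influence the training features. One possible alternative, sketched below, wraps the same two steps in scikit-learn's `ColumnTransformer` and `Pipeline` so they are fitted on the training rows only. This is not how the book or the notebook does it. The names `records`, `preprocess`, and `pipeline` are illustrative, and the data is retyped from the table above so the block runs on its own.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Same eight records as the table above, rebuilt inline (assumption: values retyped by hand).
records = pd.DataFrame({
    "OverallGrade": ["A", "C", "F", "B", "E", "A", "B", "C"],
    "Obedient": ["Y", "N", "N", "Y", "N", "Y", "Y", "Y"],
    "ResearchScore": [90, 85, 10, 75, 20, 92, 60, 75],
    "ProjectScore": [85, 51, 17, 71, 30, 79, 59, 33],
    "Recommend": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
})

numerical_keys = ["ResearchScore", "ProjectScore"]
categorical_keys = ["OverallGrade", "Obedient"]

# Scale the numerical columns and one-hot encode the categorical ones.
# handle_unknown="ignore" encodes a grade never seen during fitting as all zeros.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), numerical_keys),
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_keys),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression()),
])

# Same positional split as the notebook: first five rows train, last three test.
train, test = records.iloc[:5], records.iloc[5:]
train_target = (train["Recommend"] == "Yes").astype(int)

# fit() learns the scaling statistics and the categories from the training rows only.
pipeline.fit(train[numerical_keys + categorical_keys], train_target)
pipeline_predictions = pipeline.predict(test[numerical_keys + categorical_keys])
print(pipeline_predictions)
```

A side benefit of this design is that the fitted pipeline carries its own preprocessing, so a single object can be saved and reused at prediction time.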


#### Training

In [99]:
# test split: the last three rows are held out
test_target = target[5:]
test_features = features[5:]

# training split: the first five rows
target = target[:5]
features = features[:5]
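Slicing by position works here, but the result depends on the order of the rows in the file. A hedged alternative is scikit-learn's `train_test_split`, which shuffles the rows and can keep the class proportions similar in both parts through `stratify`. The sketch below is an illustration rather than part of the original exercise. `records`, `X`, `y`, and the `random_state` value are assumptions made for the example, and with eight rows the split remains mostly a demonstration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same eight records as the table above, rebuilt inline (assumption: values retyped by hand).
records = pd.DataFrame({
    "OverallGrade": ["A", "C", "F", "B", "E", "A", "B", "C"],
    "Obedient": ["Y", "N", "N", "Y", "N", "Y", "Y", "Y"],
    "ResearchScore": [90, 85, 10, 75, 20, 92, 60, 75],
    "ProjectScore": [85, 51, 17, 71, 30, 79, 59, 33],
    "Recommend": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
})

X = records[["OverallGrade", "Obedient", "ResearchScore", "ProjectScore"]]
y = (records["Recommend"] == "Yes").astype(int)

# Hold out three rows, as the notebook does, but pick them at random while
# keeping both classes represented in the training and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=3, stratify=y, random_state=0  # random_state is an arbitrary choice
)

print("train rows:", len(X_train), "test rows:", len(X_test))
```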

In [100]:
lr = LogisticRegression()

model = lr.fit(features, target)

model

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

#### Evaluation

In [101]:
predictions = model.predict(test_features)
print("Accuracy: ", float(accuracy_score(test_target, predictions))*100, "%")
print("Classification Stats: \n", classification_report(test_target, predictions))

Accuracy:  100.0 %
Classification Stats: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         2
           1       1.00      1.00      1.00         1

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3
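A perfect score on three held-out rows says little about generalization, because a single different prediction would move the accuracy by a third. With a dataset this small, one option is leave-one-out cross-validation, where each row is held out once and the model is refitted on the remaining seven. The sketch below is a complementary illustration and not part of the book's exercise. `records`, `cv_model`, and `scores` are names introduced here, and the preprocessing is wrapped in a pipeline so it is refitted inside every fold.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Same eight records as the table above, rebuilt inline (assumption: values retyped by hand).
records = pd.DataFrame({
    "OverallGrade": ["A", "C", "F", "B", "E", "A", "B", "C"],
    "Obedient": ["Y", "N", "N", "Y", "N", "Y", "Y", "Y"],
    "ResearchScore": [90, 85, 10, 75, 20, 92, 60, 75],
    "ProjectScore": [85, 51, 17, 71, 30, 79, 59, 33],
    "Recommend": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
})

X = records[["OverallGrade", "Obedient", "ResearchScore", "ProjectScore"]]
y = (records["Recommend"] == "Yes").astype(int)

cv_model = make_pipeline(
    ColumnTransformer([
        ("scale", StandardScaler(), ["ResearchScore", "ProjectScore"]),
        # Grades such as E and F occur once, so a fold may not see them while fitting.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["OverallGrade", "Obedient"]),
    ]),
    LogisticRegression(),
)

# One fold per row: each score is 1.0 if the held-out row was classified correctly, else 0.0.
scores = cross_val_score(cv_model, X, y, cv=LeaveOneOut(), scoring="accuracy")
print(f"Held-out rows predicted correctly: {int(scores.sum())} of {len(scores)}")
```

The mean of `scores` is still a noisy estimate on eight rows, but it uses every record for testing exactly once instead of relying on one fixed split.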



#### Deploy

In [102]:
# saving and loading the model and the scaler requires running this notebook on a local machine

# create the output directories if they do not exist yet
os.makedirs('Model', exist_ok=True)
os.makedirs('Scaler', exist_ok=True)

# create the files
joblib.dump(model, r'Model/model.pickle')
joblib.dump(ss, r'Scaler/scaler.pickle')

# load the files
model = joblib.load(r'Model/model.pickle')
scaler = joblib.load(r'Scaler/scaler.pickle')

['Scaler/scaler.pickle']
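Once the model and scaler are reloaded, a new record has to go through the same preprocessing as the training data, and its one-hot columns have to line up with the ones the model was trained on. The sketch below shows one way to do that with `DataFrame.reindex`. It mirrors the notebook's steps (scaler fitted on all eight rows, model fitted on the first five) but writes to a temporary directory instead of `Model/` and `Scaler/`. The student in `new_student` is hypothetical, and `records`, `training_columns`, `loaded_model`, and `loaded_scaler` are names introduced for the example.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Same eight records as the table above, rebuilt inline (assumption: values retyped by hand).
records = pd.DataFrame({
    "OverallGrade": ["A", "C", "F", "B", "E", "A", "B", "C"],
    "Obedient": ["Y", "N", "N", "Y", "N", "Y", "Y", "Y"],
    "ResearchScore": [90, 85, 10, 75, 20, 92, 60, 75],
    "ProjectScore": [85, 51, 17, 71, 30, 79, 59, 33],
    "Recommend": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
})
numerical_keys = ["ResearchScore", "ProjectScore"]
categorical_keys = ["OverallGrade", "Obedient"]

# Repeat the notebook's preprocessing and training steps.
features = records[categorical_keys + numerical_keys].copy()
target = np.where(records["Recommend"] == "Yes", 1, 0)
scaler = StandardScaler().fit(features[numerical_keys])
features[numerical_keys] = scaler.transform(features[numerical_keys])
features = pd.get_dummies(features, columns=categorical_keys, dtype=float)
training_columns = list(features.columns)  # column layout the model expects
model = LogisticRegression().fit(features[:5], target[:5])

# Round-trip both objects through joblib, using a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    model_path = os.path.join(workdir, "model.pickle")
    scaler_path = os.path.join(workdir, "scaler.pickle")
    joblib.dump(model, model_path)
    joblib.dump(scaler, scaler_path)
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

# Hypothetical new student, not taken from the book's data.
new_student = pd.DataFrame([
    {"OverallGrade": "A", "Obedient": "Y", "ResearchScore": 88, "ProjectScore": 80}
])

new_features = new_student.copy()
new_features[numerical_keys] = loaded_scaler.transform(new_student[numerical_keys])
new_features = pd.get_dummies(new_features, columns=categorical_keys, dtype=float)
# A single record produces only its own dummy columns; add the missing ones as zeros
# and put everything in the training order.
new_features = new_features.reindex(columns=training_columns, fill_value=0.0)

prediction = loaded_model.predict(new_features)
print("Recommend" if prediction[0] == 1 else "Do not recommend")
```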