<img src="images/cover.png">

# AI2E - [Workshop 4] - [Introduction to ML Part 2]

Understand and perform logistic regression on datasets with different numbers of features.  
  
<h3>Content</h3> 
<ol>
    <li>Logistic Regression from Scratch.</li>  
    <li>Logistic Regression with scikit-learn.</li>    
    <li>Conclusion</li>
</ol>    
 

<h3> 1. Logistic Regression From Scratch </h3>

In [None]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [None]:
df = pd.read_csv('csv/1.csv')
print(df.head())

In [None]:
print(df.isna().sum()) #count the missing values in each column

In [None]:
print(df.admit.value_counts()) #distribution of the admit target

In [None]:
target = df.pop('admit').values #separate the target from the features
df.drop('id',axis=1,inplace=True) #remove the id column
df = (df - df.mean()) / df.std() #standardize each feature to zero mean and unit variance
df['const'] = 1 #add a constant column so the model can learn an intercept
nb_variables = len(df.columns) #number of columns, including the constant
var_names = df.columns
df = df.values #convert to numpy array
x_train,x_test,y_train,y_test = train_test_split(df,target,test_size=0.2) #hold out 20% of the rows for testing
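
The cell above standardizes the whole table before splitting it, so the test rows contribute to the mean and standard deviation. A common alternative is to split first and compute the statistics on the training rows only. The sketch below illustrates that order on synthetic data; the array names and the random data are assumptions for illustration and do not come from `csv/1.csv`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature table (hypothetical data)
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

# Split first, so the test rows play no part in the statistics
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Statistics come from the training rows only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0, ddof=1)  # ddof=1 matches the pandas default for .std()

# Apply the same transformation to both splits
X_train_s = (X_train - mean) / std
X_test_s = (X_test - mean) / std

# Append the constant column used for the intercept
X_train_s = np.column_stack([X_train_s, np.ones(len(X_train_s))])
X_test_s = np.column_stack([X_test_s, np.ones(len(X_test_s))])

print(X_train_s.shape, X_test_s.shape)
```

With this order the test split is scaled with the training statistics, so its columns are only approximately centered.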

In [None]:

def sigmoid(x): #map any real number to the interval (0,1)
    return 1/(1+np.exp(-x))

def model(estimators,df): #return the predicted probabilities of the logistic regression applied on df
    return sigmoid(np.dot(df,estimators))

def gradient(estimators,df,target): #compute the gradient of the mean log loss with respect to the estimators
    return 1/len(df)*np.dot(df.transpose(),model(estimators,df) - target) 

def loss(estimators,df,target): #compute the log loss (summed over the samples)
    return -np.log(model(estimators,df[target==1])).sum() - np.log(1-model(estimators,df[target==0])).sum()

def train(df,target,epochs,lambd): #minimize the log loss with gradient descent; lambd is the learning rate
    estimators = np.random.normal(0,1,df.shape[1]) #random initial weights, one per column
    for epoch in range(epochs):
        LOSS.append(loss(estimators,df,target)) #record the loss in the global LOSS list
        gr = gradient(estimators,df,target)
        estimators -= lambd * gr #step against the gradient
    return estimators    
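
One way to gain confidence in a hand-written gradient is to compare it with a finite-difference approximation of the loss. Note that `gradient` above divides by the number of rows, so it corresponds to the mean log loss, while `loss` returns the sum; the two differ only by that constant factor. The sketch below is a standalone check on synthetic data; `mean_log_loss`, `analytic_gradient`, and `numerical_gradient` are hypothetical helper names, not part of the notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def mean_log_loss(w, X, y):
    # Average negative log-likelihood of the labels under the model
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradient(w, X, y):
    # Same formula as the notebook's gradient(): X^T (p - y) / n
    return X.T @ (sigmoid(X @ w) - y) / len(X)

def numerical_gradient(w, X, y, eps=1e-6):
    # Central differences, one coordinate at a time
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (mean_log_loss(w + step, X, y)
                   - mean_log_loss(w - step, X, y)) / (2 * eps)
    return grad

# Synthetic data (hypothetical, for the check only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.integers(0, 2, size=50).astype(float)
w = rng.normal(size=4)

max_diff = np.max(np.abs(analytic_gradient(w, X, y) - numerical_gradient(w, X, y)))
print('largest difference between the two gradients:', max_diff)
```

A very small difference suggests that the analytic formula matches the loss it is meant to differentiate.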

In [None]:
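EPOCHS = 500 #number of gradient descent steps
LOSS  = [] #filled by train() with the loss at each epoch
lambd = .01 #learning rate
estimators = train(x_train,y_train,EPOCHS,lambd) 
plt.plot(range(len(LOSS)),LOSS) #the curve should decrease as training progresses
plt.xlabel('epoch')
plt.ylabel('log loss')
plt.show()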

In [None]:
def accuracy(estimators,x_test,y_test):
    probabilities = model(estimators,x_test)
    predictions = (probabilities > .5).astype(int) #predict class 1 when the probability exceeds 0.5
    print('the accuracy on test is ',(predictions == y_test).mean()) #share of correct predictions
accuracy(estimators,x_test,y_test)    
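
Accuracy alone can be hard to interpret when the two classes are not equally frequent, which is why the notebook checks the distribution of `admit` earlier. A rough sketch of two complementary checks, a majority-class baseline and the confusion counts, is shown below. The labels and predictions are made up for illustration, and `confusion_counts` is a hypothetical helper.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    # Count each of the four outcomes of a binary prediction
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn

# Hypothetical labels and predictions, not taken from the dataset
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)

# Accuracy obtained by always predicting the most frequent class
baseline = max(np.mean(y_true), 1 - np.mean(y_true))

print('tp, tn, fp, fn =', tp, tn, fp, fn)
print('accuracy =', accuracy, 'baseline =', baseline)
```

A model is only useful if its accuracy is clearly above the baseline; the confusion counts show which kind of error dominates.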

In [None]:
plt.bar(var_names,estimators) #show the estimators
plt.show()

<h3>2. Logistic Regression with scikit-learn </h3>

In [None]:
lr = LogisticRegression(penalty='l2',max_iter=100)
lr.fit(x_train,y_train)
print('accuracy on test is ',lr.score(x_test,y_test))
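
The fitted scikit-learn model can be related back to the from-scratch version: for a binary problem, the probability of class 1 is the sigmoid of the linear score built from `coef_` and `intercept_`. The sketch below checks this on synthetic data; the data and `true_w` are assumptions for illustration, and the model uses scikit-learn's default settings apart from `max_iter`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem (hypothetical data and weights)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (rng.random(200) < 1 / (1 + np.exp(-(X @ true_w)))).astype(int)

lr = LogisticRegression(max_iter=100)
lr.fit(X, y)

# Rebuild the probabilities by hand: sigmoid(X . coef + intercept)
z = X @ lr.coef_[0] + lr.intercept_[0]
manual_proba = 1 / (1 + np.exp(-z))

# Column 1 of predict_proba holds the probability of class 1
sklearn_proba = lr.predict_proba(X)[:, 1]

print(np.allclose(manual_proba, sklearn_proba))
```

This is the same computation as `model(estimators,df)` in the first section, with the intercept stored separately instead of in a constant column.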

In [None]:

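coefs = lr.coef_[0].copy()
coefs[-1] += lr.intercept_[0] #fold the fitted intercept into the 'const' column (the last one) only
plt.bar(var_names,coefs) #show the coefficients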
plt.show()

<h3>3. Conclusion </h3>

Linear and logistic regression are the "hello world" of machine learning. These models are simple, but sometimes not expressive enough to approximate complex relationships. In the next lesson you will learn more powerful models such as boosted trees.
