# Exploration of the dataset
This dataset records, for each user, whether they purchased a particular product.

# Description
<br>
<dl>
  <dt>User ID</dt>
    <dd>Short for <strong>User Identification</strong>: a number that lets the system identify and distinguish between the users who access it.</dd>
  <dt>Gender</dt>
    <dd>The gender of the user (Male or Female)</dd>
  <dt>Age</dt>
    <dd>The age of the user</dd>
  <dt>Estimated Salary</dt>
    <dd>The approximate salary of the user</dd>
  <dt>Purchased</dt>
    <dd>A binary value indicating whether the user purchased the product</dd>
    <dd> <strong>1: </strong> means the user purchased the product</dd>
    <dd> <strong>0: </strong> means the user didn't purchase the product</dd>
    
</dl>

In [73]:
#import the important libraries
import numpy as np 

import pandas as pd 

import seaborn as sns

import matplotlib.pyplot as plt 
%matplotlib inline

In [74]:
#import the data
train_df = pd.read_csv('Social_Network_Ads.csv')

In [75]:
#Explore the data
train_df.head()

Unnamed: 0,User ID,Gender,Age,EstimatedSalary,Purchased
0,15624510,Male,19,19000,0
1,15810944,Male,35,20000,0
2,15668575,Female,26,43000,0
3,15603246,Female,27,57000,0
4,15804002,Male,19,76000,0


In [76]:
#Explore the shape of the training dataset
train_df.shape

(400, 5)

# Preparing The Dataset for Logistic Regression Model

In [77]:
#split the dataset into features and labels 
#Keep only the numeric features: drop the label, the identifier and the categorical column
x_train = train_df.drop(columns=['Purchased', 'User ID', 'Gender'])
#Select the label with a list so that y_train stays a DataFrame of shape (400, 1)
y_train = train_df[['Purchased']]

#lets print the shape of the features and labels
print("The Shape of x_train: ",x_train.shape)
print("The Shape of y_train: ",y_train.shape)

The Shape of x_train:  (400, 2)
The Shape of y_train:  (400, 1)


In [78]:
#Explore the features after removing some of them
x_train

Unnamed: 0,Age,EstimatedSalary
0,19,19000
1,35,20000
2,26,43000
3,27,57000
4,19,76000
...,...,...
395,46,41000
396,51,23000
397,50,20000
398,36,33000


## Note:

We remove the <strong>"User ID"</strong> feature because it is only an identifier and carries no information about the output.
<br>

We also remove the <strong>"Gender"</strong> feature because it is categorical and would need one-hot encoding, which is not covered in this lab.

In [79]:
#Convert the features into a NumPy array so we can feed them to our model
X_without_Xo = x_train.values

In [49]:
#Define the number of the training examples
m = X_without_Xo.shape[0]

print("The Number of Training examples: {0}".format(m))

The Number of Training examples: 400


# Define Some Helper Functions
<br>
<dl>
  <dt>sigmoid()</dt>
  <dd>Applies the sigmoid function to the product of the design matrix, X, and the parameters, which serves as our non-linear hypothesis</dd>
    <dt>featureNormalize()</dt>
  <dd>Scales the features so that gradient descent converges much more quickly</dd>
  <dt>computeCost()</dt>
  <dd>For Computing cross entropy cost function for Logistic Regression model </dd>
  <dt>gradientDescent()</dt>
  <dd>For Updating the parameters</dd>
  <dt>pred()</dt>
  <dd>For making predictions</dd>
</dl>


<br>

### The Formula of Sigmoid Function in Vectorized Form:
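\begin{equation}
 sigmoid(z) = \frac{1}{1 + e^{-z}}, \qquad z = X\theta
\end{equation}

For a single training example $x$, $z$ is the scalar $\theta^{T}x$; stacking all examples as the rows of $X$ gives the vector $X\theta$.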

In [61]:
def sigmoid(X,theta):
    '''
    Usage:
      #sigmoid --> Computes the sigmoid of z = X𝜃 element-wise.
  
    Arguments:
      #X --> The Design Matrix
      #theta --> The parameters to be learned
    
    Returns:
      #The value of the sigmoid of z
    '''
    #Compute the linear part of the hypothesis, z = X𝜃
    z = np.matmul(X,theta)
    
    #Compute the sigmoid of z
    g = np.divide(1, (np.add(1, np.exp(-z))))
    
    return g
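
A quick sanity check can make the formula concrete. The sketch below redefines the same function so that it runs on its own and evaluates it on two made-up rows (`X_demo` is illustrative, not part of the dataset). With all parameters at zero, every `z` is 0 and the sigmoid returns exactly 0.5.

```python
import numpy as np

def sigmoid(X, theta):
    # Same computation as the notebook: g(z) = 1 / (1 + e^(-z)) with z = X @ theta
    z = np.matmul(X, theta)
    return 1.0 / (1.0 + np.exp(-z))

# Two made-up examples; the first column is the bias term x_0 = 1
X_demo = np.array([[1.0, 0.0, 0.0],
                   [1.0, 2.0, -1.0]])

# With theta = 0 every z is 0, so every output is 0.5
at_zero = sigmoid(X_demo, np.zeros(3))

# With theta = [0, 1, 0] the z values are 0 and 2
at_theta = sigmoid(X_demo, np.array([0.0, 1.0, 0.0]))

print(at_zero)
print(at_theta)
```

Outputs above 0.5 correspond to positive `z`, and outputs below 0.5 to negative `z`, which is what the 0.5 threshold used for predictions relies on.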

In [83]:
def featureNormalize(X):
    '''
    Usage:
      #featureNormalize--> used for normalizing features
  
    Arguments:
      #X --> The Design Matrix
    
    Returns:
      #The Normalized Matrix
      
    Notes:
      #X is a matrix where each column is a feature and each row is an example
      #So, the normalization is performed separately for each feature
      #X_norm keeps the dtype of X, so an integer X gives truncated (integer) results;
      #pass a float array, e.g. X.astype(float), to keep the fractional part
    '''
    
    #Preallocate the outputs
    X_norm = np.copy(X)
    mu = np.zeros((1, X.shape[1]))
    sigma = np.zeros((1,X.shape[1]))

    #Compute the mean of each feature, storing it in mu,
    #and the standard deviation of each feature, storing it in sigma
    for i in range(X.shape[1]):
        mu[0, i] = np.mean(X_norm[:, i])
        sigma[0, i] = np.std(X_norm[:, i])
        
    #Subtract the mean from each feature and divide by its standard deviation, storing the result in X_norm
    for i in range(X.shape[1]):
        X_norm[:, i] = np.divide(np.subtract(X_norm[:, i], mu[0, i]), sigma[0, i])
        
    return X_norm, mu, sigma
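
The two loops can also be written with NumPy's `axis` argument. The sketch below is one possible vectorized variant, not the notebook's own function: `feature_normalize_vectorized` is a hypothetical name and the data is made up. It casts the input to float first, because `np.copy` preserves the input dtype and writing z-scores back into an integer array truncates them toward zero.

```python
import numpy as np

def feature_normalize_vectorized(X):
    # Hypothetical helper: cast to float so the z-scores are not truncated
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0, keepdims=True)     # shape (1, n_features)
    sigma = X.std(axis=0, keepdims=True)   # population std, the np.std default
    return (X - mu) / sigma, mu, sigma

# Made-up integer data in the same layout: columns are Age, EstimatedSalary
X_int = np.array([[19, 19000],
                  [35, 20000],
                  [26, 43000],
                  [27, 57000]])

X_norm, mu, sigma = feature_normalize_vectorized(X_int)

# Writing float z-scores back into an integer array truncates them toward zero
truncated = np.copy(X_int)
truncated[:, 0] = (X_int[:, 0] - mu[0, 0]) / sigma[0, 0]

print(X_norm[:, 0])      # fractional z-scores
print(truncated[:, 0])   # the same values after integer truncation
```

After normalization each column should have a mean of about 0 and a standard deviation of about 1, which is an easy property to check before training.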

### The Formula of the Cross Entropy Cost Function:

\begin{equation}
  J(\theta) = \frac{1}{m}\sum_{i=1}^{m} {Loss(y_{pred}^{(i)},y^{(i)})} =\frac{1}{m}\sum_{i=1}^{m} {\big[ -y^{(i)}\log(y_{pred}^{(i)}) - (1-y^{(i)})\log(1-y_{pred}^{(i)}) \big]}
\end{equation}

### Note: 
The type of product in this formula is <strong> element-wise multiplication</strong>

In [44]:
def computeCost(X,y,theta):
    '''
    Usage:
      #computeCost --> computes the cost for logistic regression
  
    
    Arguments:
      #X --> The Design Matrix
      #y --> The Ground Truth
      #theta --> The parameters at which the cost is evaluated
    
    Returns:
      #The cost value
    '''
    
    #Number of training examples
    m = y.shape[0]
    
    #Compute our non-linear hypothesis function 
    h = sigmoid(X,theta)
    
    #Compute the loss of every training example
    losses = np.subtract(np.multiply(-y,np.log(h)), np.multiply((1-y),np.log(1-h)))
     
    #Compute the cross entropy cost as the mean of the losses
    J = (1/m)*(np.sum(losses))
    
    return J
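
One way to check a cost implementation is to evaluate it where the answer is known. With all parameters at zero, every prediction is 0.5, so each loss equals -log(0.5) = log(2) ≈ 0.6931 whatever the labels are. The sketch below assumes made-up data (`X_demo`, `y_demo`) and a standalone `compute_cost` that mirrors the function above.

```python
import numpy as np

def sigmoid(X, theta):
    return 1.0 / (1.0 + np.exp(-np.matmul(X, theta)))

def compute_cost(X, y, theta):
    # Mean cross entropy over the m rows of X
    h = sigmoid(X, theta)
    losses = -y * np.log(h) - (1 - y) * np.log(1 - h)
    return np.sum(losses) / y.shape[0]

# Made-up data: a bias column plus two features, and arbitrary 0/1 labels
X_demo = np.array([[1.0, -1.2, 0.3],
                   [1.0, 0.4, -0.7],
                   [1.0, 1.5, 1.1],
                   [1.0, -0.6, 0.9]])
y_demo = np.array([0, 0, 1, 1])

cost_at_zero = compute_cost(X_demo, y_demo, np.zeros(3))
print(cost_at_zero, np.log(2))
```

A cost close to 0.693 at the start of a run that begins from zero parameters is therefore expected.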

## Note:
The terms cost function and loss function are often used interchangeably. Strictly speaking, the loss function measures the error on a single training example, while the cost function aggregates the losses over the whole training set.

<strong>In short, </strong> the cost function computes the error over the entire training set, while the loss function computes the error for a single sample in the training set.

<br>

### The Formula of The Gradient is:

\begin{equation}
\frac{\partial J}{\partial \theta_{j}} = \frac{1}{m} \sum_{i=1}^{m} { \big(y_{pred}^{(i)} - y^{(i)}\big) x^{(i)}_{j} }
\end{equation}


The type of product in this formula is <strong> element-wise multiplication</strong>
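
A common way to gain confidence in a gradient formula is to compare it with a finite-difference estimate of the cost. The sketch below does this on made-up data; `gradient` and `numerical_gradient` are hypothetical helper names, and the vectorized form `X.T @ (y_pred - y) / m` is the matrix version of the sum over examples.

```python
import numpy as np

def sigmoid(X, theta):
    return 1.0 / (1.0 + np.exp(-np.matmul(X, theta)))

def cost(X, y, theta):
    h = sigmoid(X, theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def gradient(X, y, theta):
    # Vectorized form of the formula: (1/m) * X^T (y_pred - y)
    return np.matmul(X.T, sigmoid(X, theta) - y) / y.shape[0]

def numerical_gradient(X, y, theta, eps=1e-6):
    # Central finite differences, one parameter at a time
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(X, y, theta + step) - cost(X, y, theta - step)) / (2 * eps)
    return grad

# Made-up data: a bias column plus two features
X_demo = np.array([[1.0, -1.2, 0.3],
                   [1.0, 0.4, -0.7],
                   [1.0, 1.5, 1.1],
                   [1.0, -0.6, 0.9]])
y_demo = np.array([0, 0, 1, 1])
theta_demo = np.array([0.1, -0.2, 0.3])

analytic = gradient(X_demo, y_demo, theta_demo)
numeric = numerical_gradient(X_demo, y_demo, theta_demo)
print(analytic)
print(numeric)
```

If the two vectors agree to several decimal places, the analytic gradient is very likely implemented correctly.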

In [1]:
def gradientDescent(X,y,theta,alpha,num_iters):
    '''
    Usage:
      #gradientDescent --> runs gradient descent for logistic regression
  
    
    Arguments:
      #X --> The Design Matrix
      #y --> The Ground Truth
      #theta --> The initial parameters, which are updated at every iteration
      #alpha --> The learning rate, which sets the size of each update step
      #num_iters--> The number of iterations to run
    
    Returns:
      #The updated parameters, theta 
      #cost_history: a list containing the value of the cost function, J, after every iteration
    '''
    #Number of training examples
    m = y.shape[0]
    
    #Define the cost history as an empty list
    cost_history = []
    
    #Initialize the gradient
    #The size of the gradient equals: (num_features (including x_0),)
    dtheta = np.zeros((X.shape[1],))

    
    #Repeat for the requested number of iterations
    for i in range(num_iters+1):
        
        #Compute the sigmoid of X𝜃 element-wise with the current parameters, theta
        h = sigmoid(X,theta)
        
        #dtheta holds the partial derivatives of the cost function with respect to the parameters, theta
        dtheta = (1/m)*(np.matmul(X.T, (np.subtract(h, y))))
        
        #Update theta
        theta = theta - alpha*dtheta
        
        #Compute the cost with the updated parameters
        cost = computeCost(X,y,theta)
        
        #Append the cost for the current value of theta to cost_history
        cost_history.append(cost)
        
        #Print the cost at every iteration to track its value step by step
        print("Reached iteration: {0}, the cost = {1}".format(i, cost))
    
    print("\n\nParameters have been trained!") 
    
    return theta, cost_history
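
The recorded cost history is useful for checking that the learning rate is reasonable: with a suitably small step, the cost should not increase from one iteration to the next. The sketch below is a quiet variant that keeps the history without printing. `gradient_descent_quiet` is a hypothetical name, and the data and learning rate are made up for illustration.

```python
import numpy as np

def sigmoid(X, theta):
    return 1.0 / (1.0 + np.exp(-np.matmul(X, theta)))

def compute_cost(X, y, theta):
    h = sigmoid(X, theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def gradient_descent_quiet(X, y, theta, alpha, num_iters):
    # Hypothetical variant of gradientDescent() that records costs without printing
    m = y.shape[0]
    cost_history = []
    for _ in range(num_iters):
        dtheta = np.matmul(X.T, sigmoid(X, theta) - y) / m
        theta = theta - alpha * dtheta
        cost_history.append(compute_cost(X, y, theta))
    return theta, cost_history

# Made-up, already-normalized data with a bias column
X_demo = np.array([[1.0, -1.2, 0.3],
                   [1.0, 0.4, -0.7],
                   [1.0, 1.5, 1.1],
                   [1.0, -0.6, 0.9]])
y_demo = np.array([0, 0, 1, 1])

theta_demo, history = gradient_descent_quiet(
    X_demo, y_demo, np.zeros(3), alpha=0.1, num_iters=200)

# A healthy run shows a cost that never increases between iterations
is_decreasing = all(b <= a for a, b in zip(history, history[1:]))
print(history[0], history[-1], is_decreasing)
```

If the cost rises or oscillates instead, lowering `alpha` is usually the first thing to try.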

In [113]:
def pred(input_pred,theta):
    '''
    Usage:
      #pred --> used to predict the output of the input
      
    Arguments:
      #input_pred --> the input whose output you want to predict;
      #               it must already be normalized and include the bias term x_0 = 1, like the rows of X
      #theta --> the trained parameters
    
    Returns:
      #The predicted output
    '''
    #Compute the prediction
    prediction = sigmoid(input_pred,theta)
    
    print("The output: {0}\n".format(prediction))
    
    #If the prediction is greater than 0.5, the user purchased the product 
    if prediction > 0.5:
        print("Result: The user purchased the product ")
        
    #if not, the user didn't purchase the product
    else:
        print("Result: The user didn't purchase the product ")
        
    return prediction
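
To predict for a user who is not in the training set, the raw features have to be brought into the same form as the rows of the design matrix: scaled with the training mean and standard deviation (not the new example's own statistics), with the bias term prepended. The sketch below shows one way to do that. `prepare_input` is a hypothetical helper, and `mu_demo` and `sigma_demo` are illustrative numbers rather than values computed from the dataset.

```python
import numpy as np

def prepare_input(raw_features, mu, sigma):
    # Hypothetical helper: scale with the training statistics, then prepend x_0 = 1
    raw = np.asarray(raw_features, dtype=float)
    scaled = (raw - mu.ravel()) / sigma.ravel()
    return np.concatenate(([1.0], scaled))

# Illustrative statistics only; in practice use the mu and sigma returned by
# featureNormalize() on the training features
mu_demo = np.array([[40.0, 70000.0]])
sigma_demo = np.array([[10.0, 35000.0]])

# A made-up user: Age 30, EstimatedSalary 105000
x_new = prepare_input([30, 105000], mu_demo, sigma_demo)
print(x_new)
```

The resulting vector has the same three entries as a row of the design matrix and can be passed to `pred()` together with the trained parameters.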

# Training The Model

In [84]:
#First, we normalize the features
#The two underscores discard mu and sigma, so only the normalized features are kept
X_normalized,_,_ = featureNormalize(X_without_Xo)

In [96]:
#Then we concatenate x_0 with X_normalized 

#Create an array of ones, which represents x_0, to combine it with the rest of the features
ones = np.ones((m,1))

#Combine them, so the shape of the features will be (400,3) 
X = np.concatenate((ones, X_normalized), axis = 1)

#(Optional) --> reduce the rank of the labels, y, from (400,1) to (400,)
#by converting them to a NumPy array and reshaping it
y = y_train.values.reshape(m,)

In [97]:
#Print the shape of the final features matrix  and the labels
print("The Shape of x: ",X.shape)
print("The Shape of y: ",y.shape)

The Shape of x:  (400, 3)
The Shape of y:  (400,)


In [100]:
#Train the model
theta, cost_history =  gradientDescent(X,y,theta=np.array([0,0,0]),alpha = 0.001,num_iters = 2000)

Reached iteration: 0, the cost = 0.6930879513485616
Reached iteration: 1, the cost = 0.6930287324644099
Reached iteration: 2, the cost = 0.6929695239074896
Reached iteration: 3, the cost = 0.6929103256777995
Reached iteration: 4, the cost = 0.6928511377753381
Reached iteration: 5, the cost = 0.6927919602001031
Reached iteration: 6, the cost = 0.6927327929520922
Reached iteration: 7, the cost = 0.6926736360313018
Reached iteration: 8, the cost = 0.6926144894377285
Reached iteration: 9, the cost = 0.6925553531713683
Reached iteration: 10, the cost = 0.692496227232217
Reached iteration: 11, the cost = 0.6924371116202691
Reached iteration: 12, the cost = 0.6923780063355196
Reached iteration: 13, the cost = 0.692318911377962
Reached iteration: 14, the cost = 0.6922598267475902
Reached iteration: 15, the cost = 0.6922007524443976
Reached iteration: 16, the cost = 0.6921416884683764
Reached iteration: 17, the cost = 0.6920826348195189
Reached iteration: 18, the cost = 0.692023591497817
...

In [101]:
#Explore the trained parameters
theta

array([-0.2851425 ,  0.33266625,  0.21260625])

# Make Prediction

In [114]:
#Let's make a prediction; in this case we predict the output for one of the training examples
print("The Ground Truth: {0}".format(y[0]))
prediction = pred(X[0], theta)

The Ground Truth: 0
The output: 0.303557327759583

Result: The user didn't purchase the product 
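
A single example says little about overall quality, so a natural next step is to threshold the predictions for many rows at once and compare them with the labels. The sketch below shows the mechanics on made-up rows, labels and parameters. `predict_labels` is a hypothetical helper, and the numbers are not results from this dataset.

```python
import numpy as np

def sigmoid(X, theta):
    return 1.0 / (1.0 + np.exp(-np.matmul(X, theta)))

def predict_labels(X, theta, threshold=0.5):
    # Hypothetical helper: 1 where the predicted probability exceeds the threshold
    return (sigmoid(X, theta) > threshold).astype(int)

# Made-up rows (bias column first) and labels, used only to show the mechanics
X_demo = np.array([[1.0, -1.0, -1.0],
                   [1.0, 2.0, 1.0],
                   [1.0, 0.0, 3.0],
                   [1.0, -2.0, 0.0]])
y_demo = np.array([0, 1, 0, 0])
theta_demo = np.array([-0.3, 0.3, 0.2])

labels = predict_labels(X_demo, theta_demo)

# Accuracy is the fraction of rows where the predicted label matches the truth
accuracy = np.mean(labels == y_demo)
print(labels, accuracy)
```

Applied to the full design matrix and the trained parameters, the same two lines would give the training accuracy of the model.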


# Congratulations!