<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Hyperparameter Tuning

## *Data Science Unit 4 Sprint 2 Assignment 4*

## Your Mission, should you choose to accept it...

To hyperparameter tune and extract every ounce of accuracy out of this telecom customer churn dataset: <https://drive.google.com/file/d/1dfbAsM9DwA7tYhInyflIpZnYs7VT-0AQ/view> 

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
 You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters
 
 Try and get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?
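
As a rough illustration of what "Grid Search and Cross Validation" means mechanically, here is a minimal NumPy-only sketch. It is not the Keras workflow the assignment asks for: `fit_and_score` is a hypothetical stand-in (a small logistic regression trained by gradient descent) for building and fitting a Keras model, and the data, the helper names, and the two hyperparameters in `param_grid` are all assumptions made for the example.

```python
import itertools
import numpy as np

def k_fold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for i in range(n_splits):
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != i])
        yield train_idx, folds[i]

def fit_and_score(params, X_train, y_train, X_val, y_val):
    """Hypothetical stand-in for fitting a Keras model: logistic regression."""
    w = np.zeros(X_train.shape[1])
    b = 0.0
    for _ in range(params["epochs"]):
        p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
        grad = p - y_train
        w -= params["learning_rate"] * X_train.T @ grad / len(y_train)
        b -= params["learning_rate"] * grad.mean()
    preds = (X_val @ w + b >= 0).astype(int)
    return float((preds == y_val).mean())

def grid_search_cv(param_grid, X, y, n_splits=3):
    """Score every combination in param_grid with k-fold cross validation."""
    names = list(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [
            fit_and_score(params, X[tr], y[tr], X[va], y[va])
            for tr, va in k_fold_indices(len(X), n_splits)
        ]
        results.append((float(np.mean(scores)), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_score, best_params, results

# Synthetic, linearly separable data used only for this illustration
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

param_grid = {"learning_rate": [0.01, 0.1, 1.0], "epochs": [10, 100]}
best_score, best_params, results = grid_search_cv(param_grid, X, y)
print(best_score, best_params)
```

Every combination in the grid is trained once per fold, so the cost grows with the product of all the list lengths. That is why the stretch goals suggest Random Search once the grid gets large.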


# Day 1

### Learning Objectives
- Describe the foundational components of a neural network
- Implement a Perceptron from scratch in Python

#### Input Layer:

The input layer is where the feature data from the dataframe enters the network. More generally, a neuron's inputs are either these raw feature values or the outputs received from neurons in the previous layer.

#### Hidden Layer:

These are the layers that exist between the input layer and the output layer. You can have one hidden layer or many hidden layers.

#### Output Layer:

This is the answer/result produced by the neurons in our neural network. The outputs of a layer can be used as inputs for the next layer of neurons or, in the last layer, be the final output(s) of the neural network.

#### Neuron:

The neuron receives inputs, multiplies the inputs by their weights, sums everything up (together with a bias), and then applies the activation function to the sum. A neuron usually uses a continuous activation function.
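
A minimal sketch of a single neuron, assuming a sigmoid activation. The function name `neuron` and the example values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def sigmoid(z):
    """Continuous activation that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    weighted_sum = np.dot(inputs, weights) + bias
    return sigmoid(weighted_sum)

# Hypothetical example values
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.0])

# Weighted sum: 1*0.5 + 2*(-0.25) + 3*0.0 = 0.0, and sigmoid(0) = 0.5
output = neuron(x, w)
print(output)  # 0.5
```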

#### Weight:

This is the amount of positive or negative effect an input has on the neuron's output.

#### Activation Function:

The activation function transforms the neuron's result after the inputs, weights, and bias have been applied. It typically squashes the weighted sum into a fixed range (for example, 0 to 1 for the sigmoid).
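
To make this concrete, here is a small sketch comparing three common activation functions on the same weighted sums. The choice of functions and the input values are assumptions for illustration. Only the sigmoid is used later in this notebook.

```python
import numpy as np

def step(z):
    """Binary activation: 1 when the weighted sum is non-negative, else 0."""
    return np.where(z >= 0, 1, 0)

def sigmoid(z):
    """Smoothly squashes the weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Passes positive values through unchanged and clips negatives to 0."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])  # hypothetical weighted sums

step_out = step(z)        # [0, 1, 1]
sigmoid_out = sigmoid(z)  # approximately [0.119, 0.5, 0.881]
relu_out = relu(z)        # [0., 0., 2.]
print(step_out, sigmoid_out, relu_out)
```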

#### Node Map:

Node maps show how the features of the dataframe, or the outputs of earlier-layer neurons, are processed throughout the neural network. They visualize the inputs, hidden layers, and outputs at a high level.

#### Perceptron:

Simply, a perceptron consists of four distinct parts. Unlike a general neuron, it uses a binary (step) activation function that is either activated or not.

    Inputs
    Weights
    Weighted Sum
    Activation Function (Output)

Perceptrons classify data into two classes (0 or 1) most of the time, which is why they are also known as linear binary classifiers.
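
The four parts listed above can be sketched as a small class. This is one possible from-scratch version using the classic perceptron learning rule, trained on the AND gate as a toy problem. The class name, the `learning_rate` and `epochs` defaults, and the added bias term are assumptions, not part of the notebook's own code.

```python
import numpy as np

class Perceptron:
    def __init__(self, n_inputs, learning_rate=0.1, epochs=20):
        self.weights = np.zeros(n_inputs)
        self.bias = 0.0
        self.learning_rate = learning_rate
        self.epochs = epochs

    def predict(self, X):
        # Weighted sum followed by a binary step activation
        weighted_sum = np.dot(X, self.weights) + self.bias
        return np.where(weighted_sum > 0, 1, 0)

    def fit(self, X, y):
        # Perceptron rule: nudge weights only when a prediction is wrong
        for _ in range(self.epochs):
            for xi, target in zip(X, y):
                error = target - self.predict(xi)
                self.weights += self.learning_rate * error * xi
                self.bias += self.learning_rate * error
        return self

# AND gate: the output is 1 only when both inputs are 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

model = Perceptron(n_inputs=2).fit(X, y)
predictions = model.predict(X)
print(predictions)  # [0 0 0 1]
```

Because the AND gate is linearly separable, the learning rule settles on a separating line. A problem such as XOR is not linearly separable, which is one reason hidden layers are needed.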


#### Inputs -> Outputs
Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

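Depending on your network, the number of inputs and outputs can vary arbitrarily. Each input comes either from the initial values in the dataframe or from a neuron in the previous layer. Every input is multiplied by its weight, which can be negative or positive depending on whether that input should push the neuron's activation down or up. The weighted inputs are summed together with the bias, which shifts the activation curve left or right, and the activation function turns that sum into the neuron's output. That output then becomes an input to the next layer, and so on until the output layer produces the final prediction.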
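
The flow described above can be sketched in a few lines of NumPy. The layer sizes (3 features, 4 hidden neurons, 1 output), the random data, and the zero-initialized biases are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical shapes: 5 rows, 3 input features, 4 hidden neurons, 1 output
X = rng.normal(size=(5, 3))
W1 = rng.normal(size=(3, 4))   # input -> hidden weights
b1 = np.zeros(4)               # one bias per hidden neuron
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros(1)               # one bias for the output neuron

# Inputs -> weighted sum + bias -> activation, repeated once per layer
hidden = sigmoid(X @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)

print(hidden.shape, output.shape)  # (5, 4) (5, 1)
```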


### Imports

In [1]:
!pip install category-encoders



In [18]:
import numpy as np
import pandas as pd
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

from tensorflow.keras.datasets import mnist

import category_encoders as ce

In [3]:
#Load Data
df = sns.load_dataset('tips')
df.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


In [4]:
print(df.shape)
df.head()

(244, 7)


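| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |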


In [79]:
def prep(df, target):
    
    """
    This function will:
    1. Change "size" into a categorical to be one-hot encoded
    2. Add total_bill and tip, then put the sum into 3 bins
    3. Split data into training and testing sets
    4. Create X and y train/test
    5. Process X train/test data by one-hot encoding categoricals
    6. Map the target ('sex') to a binary column
    7. Return 4 objects: X_train, y_train, X_test, y_test
    """
    df['size'] = df['size'].astype(str)
    df['bill_tip_sum'] = pd.qcut(df['total_bill']+df['tip'], 3, labels=['low', 'medium', 'high'])
    
    training, testing = train_test_split(df, test_size=.2)
    
    X_train = training.drop(columns=target)
    y_train = training[target]
    X_test = testing.drop(columns=target)
    y_test = testing[target]
    
    processor = make_pipeline(
        ce.OneHotEncoder(use_cat_names=True),  
#        SimpleImputer(strategy='median'),
#        StandardScaler()
    )
    
    gender = {'Female': 0, 'Male': 1}
    y_train = y_train.map(gender)
    y_test = y_test.map(gender)
    
    X_process_train = processor.fit_transform(X_train)
    X_process_test = processor.transform(X_test)
    
    return X_process_train,y_train, X_process_test, y_test

In [80]:
X_train, y_train, X_test, y_test = prep(df, 'sex')
print(X_train.shape) 
print(X_test.shape) 
print(y_train.shape) 
print(y_test.shape)
X_train.head()

(195, 19)
(49, 19)
(195,)
(49,)


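| | total_bill | tip | smoker_Yes | smoker_No | day_Thur | day_Fri | day_Sat | day_Sun | time_Lunch | time_Dinner | size_2 | size_4 | size_3 | size_5 | size_1 | size_6 | bill_tip_sum_low | bill_tip_sum_medium | bill_tip_sum_high |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 69 | 15.01 | 2.09 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 60 | 20.29 | 3.21 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 11 | 35.26 | 5.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 154 | 19.77 | 2.0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 56 | 38.01 | 3.0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |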


In [7]:
class NNet:
    def __init__(self):
        
        # Inputs must be == to number of features
        self.inputs = 19
        # Only one output node b/c only trying to predict one thing
        self.outputNodes = 1
        
        self.weights = np.random.rand(self.inputs, self.outputNodes)
     
    # Squishify
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    # Create 0 or 1 from predicted activated output
    def binary(self, X):
        binary = self.feed_forward(X)
        binary = [1 if x > .9999 else 0 for x in binary]
        return binary
    
     
    def feed_forward(self, X):
        """Calculate the NNet inference using the feed forward, aka predict """
        
        # Combining  inputs and weights in a weighted sum
        self.input_sum = np.dot(X, self.weights)
        
        # Apply activation function to the weighted sum
        self.output_activated = self.sigmoid(self.input_sum)
        
        return self.output_activated

In [8]:
nn = NNet()

In [9]:
y_pred1 = nn.binary(X_train)
score = accuracy_score(y_train, y_pred1)

y_pred2 = nn.binary(X_test)
score2 = accuracy_score(y_test, y_pred2)

print(f"Mean baseline for our target(Males) is {round(df['sex'].value_counts(normalize=True)['Male']*100, 2)}%")
print(f"The accuracy of the train is {round(score*100, 2)}%")
print(f"The accuracy of the test is {round(score2*100, 2)}%")

Mean baseline for our target(Males) is 64.34%
The accuracy of the train is 65.13%
The accuracy of the test is 67.35%


# Day 2

### Learning Objectives
- Explain the intuition behind backpropagation
- Implement gradient descent + backpropagation on a feedforward neural network

In [99]:
# I want activations that correspond to negative weights to be lower
# and activations that correspond to positive weights to be higher

class NNetbackprop:
    def __init__(self):
        self.inputs = 19
        
        # Hidden Nodes is arbitrary Number
        self.hiddenNodes = 4
        
        # Only one output node b/c only trying to predict one thing
        self.outputNodes = 1
        
        self.weights1 = np.random.rand(self.inputs, self.hiddenNodes)
        self.weights2 = np.random.rand(self.hiddenNodes, self.outputNodes)
        
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    def sigmoidPrime(self, s):
        # Expects s to already be a sigmoid output, so sigmoid'(z) = s * (1 - s)
        return s * (1 - s)
    
    def feed_forward(self, X):
        """Calculate the NNet inference using the feed forward, aka predict """
        
        # Combining  inputs and weights in a weighted sum
        self.hidden_sum = np.dot(X, self.weights1)
        
        # Applying sigmoid to weighted sums
        # Activated Values
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        # Weight sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        # Apply activation function to the weighted sums
        self.activated_output = self.sigmoid(self.output_sum)
    
        return self.activated_output
    
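    def backward(self, X, y, o):
        """Back Prop through Network"""
        
        # y must be a column vector shaped like o, e.g. (195, 1);
        # a flat (195,) array would broadcast y - o to (195, 195)
        y = np.asarray(y).reshape(-1, 1)
        
        self.o_error = y - o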
        
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        self.z2_error = self.o_delta.dot(self.weights2.T)
        
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        
        self.weights1 += X.T.dot(self.z2_delta)
        
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)
        
    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X,y,o)
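
The update rule in `backward` can be tried end to end on a toy problem. The sketch below is a self-contained re-implementation of the same steps (output error, output delta, hidden delta, weight updates) on XOR data. The data, the constant bias column, the `uniform(-1, 1)` initialization, the `learning_rate` variable, and the iteration count are assumptions made for this example and differ from the class above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(a):
    """Sigmoid derivative written in terms of its output a = sigmoid(z)."""
    return a * (1.0 - a)

rng = np.random.default_rng(0)

# Toy XOR data; the third column is a constant 1 that acts as a bias input
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # column vector, shape (4, 1)

W1 = rng.uniform(-1, 1, size=(3, 4))  # input -> hidden
W2 = rng.uniform(-1, 1, size=(4, 1))  # hidden -> output
learning_rate = 1.0                   # 1.0 matches the unscaled updates above

def forward(X):
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)
    return hidden, output

_, initial_output = forward(X)
initial_loss = float(np.mean((y - initial_output) ** 2))

for _ in range(5000):
    hidden, output = forward(X)
    # How wrong the output is, scaled by the slope of the activation
    output_delta = (y - output) * sigmoid_prime(output)
    # Share of that error attributed to each hidden neuron
    hidden_delta = (output_delta @ W2.T) * sigmoid_prime(hidden)
    # Move each weight in the direction that reduces the squared error
    W2 += learning_rate * hidden.T @ output_delta
    W1 += learning_rate * X.T @ hidden_delta

_, final_output = forward(X)
final_loss = float(np.mean((y - final_output) ** 2))
print(initial_loss, final_loss)
```

Comparing `initial_loss` with `final_loss` is a quick sanity check that the gradients point the right way. If the loss grows instead, the usual suspects are a sign error in the deltas or a shape mismatch between `y` and the output.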

### Load Data

In [100]:
print(X_train.values.shape)
print(y_train.values.shape)

(195, 19)
(195,)


#### What is Backpropagation?

[0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0]

## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?
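
For the Random Search stretch goal, the core idea is to sample hyperparameters from distributions instead of enumerating a fixed grid. Below is a minimal NumPy-only sketch under stated assumptions: `validation_accuracy` is a hypothetical stand-in (a small logistic regression) for fitting a Keras model, the data is synthetic, and the sampling ranges are arbitrary choices for illustration.

```python
import numpy as np

def validation_accuracy(params, X_train, y_train, X_val, y_val):
    """Hypothetical stand-in for fitting and scoring a Keras model."""
    w = np.zeros(X_train.shape[1])
    for _ in range(params["epochs"]):
        p = 1.0 / (1.0 + np.exp(-(X_train @ w)))
        w -= params["learning_rate"] * X_train.T @ (p - y_train) / len(y_train)
    preds = (X_val @ w >= 0).astype(int)
    return float((preds == y_val).mean())

def sample_params(rng):
    """Draw one random hyperparameter combination."""
    return {
        # Log-uniform between 0.001 and 1, common for learning rates
        "learning_rate": float(10 ** rng.uniform(-3, 0)),
        # Integer between 10 and 200 inclusive
        "epochs": int(rng.integers(10, 201)),
    }

rng = np.random.default_rng(0)

# Synthetic data with a simple train/validation split, for illustration only
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

trials = []
for _ in range(10):
    params = sample_params(rng)
    score = validation_accuracy(params, X_train, y_train, X_val, y_val)
    trials.append((score, params))

best_score, best_params = max(trials, key=lambda t: t[0])
print(best_score, best_params)
```

The number of trials is fixed up front, so the budget does not grow as more hyperparameters are added, unlike a full grid.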