<!-- 
Author: Brian Thomas Ross <ml@brianthomasross.com>
License: BSD-3-Clause
-->

# Neural Network Foundations

----

This study guide should reinforce and provide practice for all the concepts you have seen in the past week. There are a mix of written question and coding exercises, both are equally important to prepare you for the sprint challenge, as well as being able to comfortably speak on these topics in interviews and on the job.

If you get stuck or unsure of something remember the 20 minute rule. If that doesn't help then research a solution with [Google](https://www.google.com/) or [StackOverflow](https://wwww.stackoverflow.com/). Only once you have truly exhausted these methods should you turn to your Team Lead. They wont be there during the sprint challenge or during an interview. That being said, don't hesitate to ask for help if you truly are stuck.

Have fun!

----

## Definitions

Use your own words to define the following terms

### Input Layer

Input layer — initial data for the neural network.

### Hidden Layer

Hidden layers — intermediate layer between input and output layer and place where all the computation is done.

### Output Layer

Output layer — produce the result for given inputs.

In [None]:
https://towardsdatascience.com/everything-you-need-to-know-about-neural-networks-and-backpropagation-machine-learning-made-easy-e5285bc2be3a

### Neuron

Within an artificial neural network, a neuron is a mathematical function that mimics a biological neuron. Typically, a neuron computes the weighted sum of its input, and this sum is passed through a nonlinear function, often called activation function, such as the sigmoid. 
https://i.stack.imgur.com/wXL9A.png
https://stats.stackexchange.com/questions/241888/what-are-neurons-in-neural-networks-how-do-they-work/241904#:~:text=Within%20an%20artificial%20neural%20network,function%2C%20such%20as%20the%20sigmoid.

![alt text](wXL9A.png "Title")

![](https://i.stack.imgur.com/wXL9A.png)

### Weight

 Weights, on the other hand, can be thought of as the strength of the connection. Weight affects the amount of influence a change in the input will have upon the output. A low weight value will have no change on the input, and alternatively a larger weight value will more significantly change the output.

### Bias

Simply, bias represents how far off the predictions are from their intended value. Biases make up the difference between the function's output and its intended output. A low bias suggest that the network is making more assumptions about the form of the output, whereas a high bias value makes less assumptions about the form of the output.
https://deepai.org/machine-learning-glossary-and-terms/weight-artificial-neural-network#:~:text=Weight%20is%20the%20parameter%20within,weight%2C%20and%20a%20bias%20value.&text=Often%20the%20weights%20of%20a,hidden%20layers%20of%20the%20network.

### Acitivation Function


In Neural Network the activation function defines if given node should be “activated” or not based on the weighted sum. Let’s define this weighted sum value as z. In this section I would explain why “Step Function” and “Linear Function” won’t work and talk about “Sigmoid Function” one of the most popular activation functions. There are also other functions which I will leave aside for now.https://towardsdatascience.com/everything-you-need-to-know-about-neural-networks-and-backpropagation-machine-learning-made-easy-e5285bc2be3a

![](https://miro.medium.com/max/939/1*uz3wd5YeVYlU2JR8rE9VDA.png)

### Node Map

a map showing the different neurons and then reslatioships betweeeen different nodes based on weights

### Perceptron

Perceptron is a single layer neural network (not including the inputs) and a multi-layer perceptron is called Neural Networks.; It consists of Input values,
Weights and Bias,the Net sum, and an Activation Function.
https://towardsdatascience.com/what-the-hell-is-perceptron-626217814f53

### Epoch

One Epoch is when an ENTIRE dataset is passed forward and backward through the neural network only ONCE.https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9

### Feed Forward Neural Network

They are called feedforward because information only travels forward in the network (no loops), first through the input nodes, then through the hidden nodes (if present), and finally through the output nodes.

### Back Propogation

In machine learning, backpropagation (backprop,[1] BP) is a widely used algorithm in training feedforward neural networks for supervised learning. Generalizations of backpropagation exist for other artificial neural networks (ANNs), and for functions generally – a class of algorithms referred to generically as "backpropagation".[2] In fitting a neural network, backpropagation computes the gradient of the loss function with respect to the weights of the network for a single input–output example, and does so efficiently, unlike a naive direct computation of the gradient with respect to each weight individually. This efficiency makes it feasible to use gradient methods for training multilayer networks, updating weights to minimize loss;https://en.wikipedia.org/wiki/Backpropagation

----

## Concepts

Answer the following questions using your own words

### Casually explain the steps involved to go from input to output in a simple neural network. How are predictions generated?

As I understand it, we are essentialy pluggin in our our inputs and we create a nerural network that gets train on the data. From there depending on our hyperparameters, batch, optimizer. layers, weights, bias and others the model then is able to output a prediction. Ideally our loss function is low and our accuracy is high, if this is not case then we can always go back to the drawing board. Neuronetworks learn by a  "feedback process called backpropagation (sometimes abbreviated as "backprop"). This involves comparing the output a network produces with the output it was meant to produce, and using the difference between them to modify the weights of the connections between the units in the network, working from the output units through the hidden units to the input units—going backward, in other words. In time, backpropagation causes the network to learn, reducing the difference between actual and intended output to the point where the two exactly coincide, so the network figures things out exactly as it should."https://www.explainthatstuff.com/introduction-to-neural-networks.html#:~:text=Information%20flows%20through%20a%20neural,arrive%20at%20the%20output%20units.

### What kind of use cases exist for Neural Networks?

Some cases but not limited include: olving many business problems such as sales forecasting, customer research, data validation, and risk management. For example, at Statsbot we apply neural networks for time-series predictions, anomaly detection in data, and natural language understanding.https://blog.statsbot.co/neural-networks-for-beginners-d99f2235efca ; 

### How does a neural network address the curse of dimensionality

one possible explanation :  high dimensional data there is some underlying pattern in lower level dimensions that deep learning methods are good at exploiting. So given a high dimensional matrix that represents images, neural networks excel at finding low dimensional features that are not apparent in the high dimensional representation. https://hackernoon.com/what-killed-the-curse-of-dimensionality-8dbfad265bbe ;  traditional Machine Learning algorithms will certainly reach a level, where more data doesn’t improve their performance. The chart below illustrates that perfectly:https://www.experfy.com/blog/pros-and-cons-of-neural-networks/

### What are some potential pro / cons of using neural netowrks versus more traditional statistical models such as logistic regression or a decision tree.

https://www.marktechpost.com/2019/04/18/introduction-to-neural-networks-advantages-and-applications/#:~:text=Advantages%20of%20Neural%20Networks%3A,does%20not%20affect%20its%20working.
Main Advantages are Neural Networks have the ability to learn by themselves and produce the output that is not limited to the input provided to them.
The input is stored in its own networks instead of a database, hence the loss of data does not affect its working.
These networks can learn from examples and apply them when a similar event arises, making them able to work through real-time events.
Even if a neuron is not responding or a piece of information is missing, the network can detect the fault and still produce the output.
They can perform multiple tasks in parallel without affecting the system performance. 
https://www.experfy.com/blog/pros-and-cons-of-neural-networks/ The probably best-known disadvantage of Neural Networks is their “black box” nature, meaning that you don’t know how and why your NN came up with a certain output. For example, when you put in an image of a cat into a neural network and it predicts it to be a car, it is very hard to understand what caused it to came up with this prediction. When you have features that are human interpretable, it is much easier to understand the cause of its mistake. In Comparison, algorithms like Decision trees are very interpretable. This is important because in some domains, interpretability is quite important.

### How would you determine the size of the input layer?

Every network has a single input layer and a single output layer. The number of neurons in the input layer equals the number of input variables in the data being processed. The number of neurons in the output layer equals the number of outputs associated with each input. But the challenge is knowing the number of hidden layers and their neurons.https://towardsdatascience.com/beginners-ask-how-many-hidden-layers-neurons-to-use-in-artificial-neural-networks-51466afa0d3e

----

## Code

This is an open ended challenge using the titanic dataset.

In [83]:
import numpy as np
import pandas as pd
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

import time
from datetime import timedelta

In [96]:
# Load data
train_csv = 'train.csv'
test_csv = 'test.csv'

train_df = pd.read_csv(train_csv, index_col='PassengerId')
test_df = pd.read_csv(test_csv, index_col='PassengerId')

In [97]:
#Exploring data by looking at df.info and df.head()
train_df.info(), test_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 891 entries, 1 to 891
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  891 non-null    int64  
 1   Pclass    891 non-null    int64  
 2   Name      891 non-null    object 
 3   Sex       891 non-null    object 
 4   Age       714 non-null    float64
 5   SibSp     891 non-null    int64  
 6   Parch     891 non-null    int64  
 7   Ticket    891 non-null    object 
 8   Fare      891 non-null    float64
 9   Cabin     204 non-null    object 
 10  Embarked  889 non-null    object 
dtypes: float64(2), int64(4), object(5)
memory usage: 83.5+ KB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 418 entries, 892 to 1309
Data columns (total 10 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Pclass    418 non-null    int64  
 1   Name      418 non-null    object 
 2   Sex       418 non-null    object 
 3   Age       332 non

(None, None)

In [98]:
train_df.head()

Unnamed: 0_level_0,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
PassengerId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [199]:

# Create wrangle function
def wrangle(df):
  df.drop(['Name', 'Ticket', 'Cabin',], axis=1, inplace=True)
  df = df.dropna()
     
   
  return df

# Wrangle feature matrix
X = wrangle(train_df.copy())

# Separate labels from feature matrix
y = X.pop('Survived')


In [200]:
X, y

(             Pclass     Sex   Age  SibSp  Parch     Fare Embarked
 PassengerId                                                      
 1                 3    male  22.0      1      0   7.2500        S
 2                 1  female  38.0      1      0  71.2833        C
 3                 3  female  26.0      0      0   7.9250        S
 4                 1  female  35.0      1      0  53.1000        S
 5                 3    male  35.0      0      0   8.0500        S
 ...             ...     ...   ...    ...    ...      ...      ...
 886               3  female  39.0      0      5  29.1250        Q
 887               2    male  27.0      0      0  13.0000        S
 888               1  female  19.0      0      0  30.0000        S
 890               1    male  26.0      0      0  30.0000        C
 891               3    male  32.0      0      0   7.7500        Q
 
 [712 rows x 7 columns],
 PassengerId
 1      0
 2      1
 3      1
 4      1
 5      0
       ..
 886    0
 887    0
 888    1

In [201]:
# Split and scale the data
import numpy as np
from sklearn.model_selection import train_test_split
X_train, X_val,y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)
#Check shape of train val split
X_train.shape, X_val.shape, y_train.shape, y_val.shape

((569, 7), (143, 7), (569,), (143,))

In [202]:
X_train

Unnamed: 0_level_0,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
PassengerId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
473,2,female,33.0,1,2,27.7500,S
433,2,female,42.0,1,0,26.0000,S
667,2,male,25.0,0,0,13.0000,S
31,1,male,40.0,0,0,27.7208,C
292,1,female,19.0,1,0,91.0792,C
...,...,...,...,...,...,...,...
94,3,male,26.0,1,2,20.5750,S
136,2,male,23.0,0,0,15.0458,C
339,3,male,45.0,0,0,8.0500,S
550,2,male,8.0,1,1,36.7500,S


In [203]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 569 entries, 473 to 132
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Pclass    569 non-null    int64  
 1   Sex       569 non-null    object 
 2   Age       569 non-null    float64
 3   SibSp     569 non-null    int64  
 4   Parch     569 non-null    int64  
 5   Fare      569 non-null    float64
 6   Embarked  569 non-null    object 
dtypes: float64(2), int64(3), object(2)
memory usage: 35.6+ KB


In [204]:
from sklearn.preprocessing import OneHotEncoder,OrdinalEncoder
enc = OrdinalEncoder()
ss = StandardScaler()
#encode columns known to have string values using ordinal encoding
X_train[['Sex','Embarked']] = enc.fit_transform(X_train[['Sex','Embarked']])
X_val[['Sex','Embarked']] = enc.fit_transform(X_val[['Sex','Embarked']])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_train[['Sex','Embarked']] = enc.fit_transform(X_train[['Sex','Embarked']])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  isetter(loc, value[:, i].tolist())
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_val[['Sex','Embarked']] = enc.fit_transform(X_val[['Sex','Embarked']])
A value is trying to

In [205]:
X_train.shape, X_val.shape

((569, 7), (143, 7))

In [206]:
X_train = ss.fit_transform(X_train)
X_val = ss.fit_transform(X_val)

In [207]:
X_train.shape, X_val.shape

((569, 7), (143, 7))

In [212]:
 #helps display tensoryboard
%load_ext tensorboard

In [231]:

from tensorflow.keras.callbacks import EarlyStopping, TensorBoard
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.layers import ReLU
import tensorflow as tf
import os

logdir = os.path.join("logs", "EarlyStopping-Loss")

tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
stop = EarlyStopping(monitor='val_loss', min_delta=0.01, patience=3)

def create_model(units=320, activation='relu', learning_rate=.001):
    model = tf.keras.Sequential([Dense(units=units,input_shape=(569,7), activation=activation),
       Dense(units=150,activation = 'softmax'),
       Dense(100, activation ='softmax')])
    model.compile(
      optimizer= tf.keras.optimizers.Adamax(learning_rate=learning_rate),
      loss='sparse_categorical_crossentropy',
      metrics=['accuracy'])
    return model
model = create_model()

model.fit(X_train, y_train, epochs=99, 
          validation_data=(X_val,y_val),
          callbacks=[tensorboard_callback, stop])

Epoch 1/99
Epoch 2/99
Epoch 3/99
Epoch 4/99
Epoch 5/99
Epoch 6/99
Epoch 7/99
Epoch 8/99
Epoch 9/99
Epoch 10/99
Epoch 11/99
Epoch 12/99
Epoch 13/99
Epoch 14/99
Epoch 15/99
Epoch 16/99
Epoch 17/99
Epoch 18/99
Epoch 19/99
Epoch 20/99
Epoch 21/99
Epoch 22/99
Epoch 23/99
Epoch 24/99
Epoch 25/99
Epoch 26/99
Epoch 27/99
Epoch 28/99
Epoch 29/99
Epoch 30/99
Epoch 31/99
Epoch 32/99
Epoch 33/99
Epoch 34/99
Epoch 35/99
Epoch 36/99
Epoch 37/99
Epoch 38/99
Epoch 39/99
Epoch 40/99
Epoch 41/99
Epoch 42/99
Epoch 43/99
Epoch 44/99
Epoch 45/99
Epoch 46/99
Epoch 47/99
Epoch 48/99
Epoch 49/99
Epoch 50/99
Epoch 51/99
Epoch 52/99
Epoch 53/99
Epoch 54/99
Epoch 55/99
Epoch 56/99
Epoch 57/99
Epoch 58/99
Epoch 59/99
Epoch 60/99
Epoch 61/99
Epoch 62/99
Epoch 63/99
Epoch 64/99
Epoch 65/99
Epoch 66/99
Epoch 67/99
Epoch 68/99
Epoch 69/99
Epoch 70/99
Epoch 71/99
Epoch 72/99
Epoch 73/99
Epoch 74/99
Epoch 75/99
Epoch 76/99
Epoch 77/99
Epoch 78/99
Epoch 79/99
Epoch 80/99
Epoch 81/99
Epoch 82/99
Epoch 83/99
Epoch 84/99
E

<tensorflow.python.keras.callbacks.History at 0x145652be0>

In [213]:
%tensorboard --logdir logs

In [225]:
#Using Grid Search to fit and get score of model
logdir = os.path.join("logs", "EarlyStopping-Loss")

tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

In [228]:
# create model
model = KerasClassifier(build_fn=create_model, verbose=1)

# define the grid search parameters
param_grid = {'learning_rate':[.01,.001],
              'units': [10,20],
              'activation':['relu','sigmoid'],
              'epochs': [5,10],
              # paramswrapper --we use scikit learn conforms the model scikit learn api
              }

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1,cv=2)

grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Best: 0.6957684755325317 using {'activation': 'relu', 'epochs': 10, 'learning_rate': 0.01, 'units': 20}
Means: 0.6045404076576233, Stdev: 0.01651221513748169 with: {'activation': 'relu', 'epochs': 5, 'learning_rate': 0.01, 'units': 10}
Means: 0.6238386631011963, Stdev: 0.03581047058105469 with: {'activation': 'relu', 'epochs': 5, 'learning_rate': 0.01, 'units': 20}
Means: 0.6466456651687622, Stdev: 0.058617472648620605 with: {'activation': 'relu', 'epochs': 5, 'learning_rate': 0.001, 'units': 10}
Means: 0.6589263677597046, Stdev: 0.07089817523956299 with: {'activation': 'relu', 'epochs': 5, 'learning_rate': 0.001, 'units': 20}
Means: 0.6080491840839386, Stdev: 0.020020991563796997 with: {'activation': 'relu', 'epochs': 10, 'learning_rate': 0.01, 'units': 10}
Means: 0.6957684755325317, Stdev: 0.10774028301239014 with: {'activation': 'relu', 'epochs': 10, 'learning_rate': 0.01, 