<font size="+5">#04. Why Do Neural Networks Deeply Learn a Mathematical Formula?</font>

- Book + Private Lessons [Here ↗](https://sotastica.com/reservar)
- Subscribe to my [Blog ↗](https://blog.pythonassembly.com/)
- Let's keep in touch on [LinkedIn ↗](https://www.linkedin.com/in/jsulopz) 😄

# Machine Learning, what does it mean?

> - The Machine Learns...
>
> But, **what does it learn?**

In [2]:
%%HTML
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Machine Learning, what does it mean? ⏯<br><br>· The machine learns...<br><br>Ha ha, not funny! 🤨 What does it learn?<br><br>· A mathematical equation. For example: <a href="https://t.co/sjtq9F2pq7">pic.twitter.com/sjtq9F2pq7</a></p>&mdash; Jesús López (@sotastica) <a href="https://twitter.com/sotastica/status/1449735653328031745?ref_src=twsrc%5Etfw">October 17, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

# How does the Machine Learn?

## In a Linear Regression

In [1]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Ht3rYS-JilE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## In a Neural Network

In [2]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/IHZwWFHWa-w?start=329" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

A Practical Example → [Tesla Autopilot](https://www.tesla.com/AI)

An Example Where It Fails → [Tesla Confuses Moon with Traffic Light](https://twitter.com/Carnage4Life/status/1418920100086784000?s=20)

# Load the Data

> - Simply execute the following lines of code to load the data.
> - This dataset contains **statistics about Car Accidents** (columns)
> - In each one of **USA States** (rows)

https://www.kaggle.com/fivethirtyeight/fivethirtyeight-bad-drivers-dataset/

In [3]:
import seaborn as sns

df = sns.load_dataset(name='car_crashes', index_col='abbrev')
df.sample(5)

| abbrev | total | speeding | alcohol | not_distracted | no_previous | ins_premium | ins_losses |
|---|---|---|---|---|---|---|---|
| MO | 16.1 | 6.923 | 5.474 | 14.812 | 13.524 | 790.32 | 144.45 |
| NE | 14.9 | 1.937 | 5.215 | 13.857 | 13.41 | 732.28 | 114.82 |
| RI | 11.1 | 3.774 | 4.218 | 10.212 | 8.769 | 1148.99 | 148.58 |
| AR | 22.4 | 4.032 | 5.824 | 21.056 | 21.28 | 827.34 | 142.39 |
| WY | 17.4 | 7.308 | 5.568 | 14.094 | 15.66 | 791.14 | 122.04 |


# Neural Network Concepts in Python

## Initializing the `Weights`

> - https://keras.io/api/layers/initializers/

### How to `kernel_initializer` the weights?

$$
accidents = speeding \cdot w_1 + alcohol \cdot w_2 + \dots + ins\_losses \cdot w_6
$$
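To make the formula concrete, here is a minimal sketch in plain Python (no Keras involved). It assumes one weight per explanatory column, six once `total` is set aside as the target, and uses the Alabama (`AL`) row of this dataset. The helper `weighted_sum` and the two weight lists are hypothetical names chosen for illustration only.

```python
# Feature values for Alabama (AL), copied from the car_crashes dataset
al = {
    'speeding': 7.332,
    'alcohol': 5.64,
    'not_distracted': 18.048,
    'no_previous': 15.04,
    'ins_premium': 784.55,
    'ins_losses': 145.08,
}

def weighted_sum(features, weights, bias=0.0):
    """Multiply each feature by its weight and add everything up."""
    return sum(x * w for x, w in zip(features.values(), weights)) + bias

zero_weights = [0.0] * len(al)  # what kernel_initializer='zeros' would give
one_weights = [1.0] * len(al)   # what kernel_initializer='ones' would give

print(weighted_sum(al, zero_weights))  # every term vanishes
print(weighted_sum(al, one_weights))   # plain sum of the six features
```

With all weights at zero the formula can only return the bias, which is why the starting values matter before any training happens.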

In [4]:
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense

In [5]:
df.shape

(51, 7)

In [6]:
model = Sequential()
model.add(layer=Input(shape=(6,)))
model.add(layer=Dense(units=3, kernel_initializer='zeros'))
model.add(layer=Dense(units=1))

#### Make a Prediction with the Neural Network

> - Can we make a prediction for `Alabama` (`AL`) accidents
> - With the already initialized Mathematical Equation?

In [7]:
X = df.drop(columns='total')
y = df.total

In [8]:
AL = X[:1]

In [9]:
AL

| abbrev | speeding | alcohol | not_distracted | no_previous | ins_premium | ins_losses |
|---|---|---|---|---|---|---|
| AL | 7.332 | 5.64 | 18.048 | 15.04 | 784.55 | 145.08 |


#### Observe the numbers for the `weights`

In [12]:
model.get_weights()

[array([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]], dtype=float32),
 array([0., 0., 0.], dtype=float32),
 array([[-1.198971 ],
        [-0.8542514],
        [-0.5962893]], dtype=float32),
 array([0.], dtype=float32)]

#### Predictions vs Reality

> 1. Calculate the Predicted Accidents and
> 2. Compare it with the Real Total Accidents
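One way to see what the untrained network predicts is to reproduce its forward pass by hand. The sketch below is plain Python and assumes the weights printed by `model.get_weights()` above, the `AL` feature row, and the default (linear) activation of `Dense`. The helper `dense` is a hypothetical stand-in for what a `Dense` layer computes, not Keras code.

```python
# Alabama (AL) features, in the same column order as X
x = [7.332, 5.64, 18.048, 15.04, 784.55, 145.08]

# Weights as printed by model.get_weights() before training
W1 = [[0.0, 0.0, 0.0] for _ in range(6)]          # hidden kernel, all zeros
b1 = [0.0, 0.0, 0.0]                              # hidden bias
W2 = [[-1.198971], [-0.8542514], [-0.5962893]]    # output kernel (random init)
b2 = [0.0]                                        # output bias

def dense(inputs, kernel, bias):
    """Linear Dense layer: inputs @ kernel + bias, with no activation."""
    return [
        sum(inputs[i] * kernel[i][j] for i in range(len(inputs))) + bias[j]
        for j in range(len(bias))
    ]

hidden = dense(x, W1, b1)         # zero kernel -> every hidden neuron outputs 0
pred = dense(hidden, W2, b2)[0]   # anything times 0 is still 0

real = 18.8                       # real `total` for AL in the dataset
print(hidden, pred, real - pred)
```

Because the hidden kernel is all zeros, the random output weights never get a chance to contribute, so the prediction misses the real value by the full 18.8.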

#### `fit()` the `model` and compare again

In [10]:
model.compile(loss='mse', metrics=['mse'])

In [11]:
model.fit(X, y, epochs=500, verbose=1)

Epoch 1/500
...
Epoch 500/500


<keras.callbacks.History at 0x28d48be86d0>

##### Observe the numbers for the `weights`

##### Predictions vs Reality

> 1. Calculate the Predicted Accidents and
> 2. Compare it with the Real Total Accidents

In [12]:
y_pred = model.predict(X)

In [13]:
dfsel = df[['total']].copy()
dfsel['pred_zeros_after_fit'] = y_pred
dfsel.head()

| abbrev | total | pred_zeros_after_fit |
|---|---|---|
| AL | 18.8 | 18.841764 |
| AK | 18.1 | 18.193848 |
| AZ | 18.6 | 18.252439 |
| AR | 22.4 | 22.01379 |
| CA | 12.0 | 13.098405 |


In [14]:
mse = ((dfsel.total - dfsel.pred_zeros_after_fit)**2).mean()
mse

1.19102670622752

### How to `kernel_initializer` the weights to 1?

In [15]:
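# Caution: `y_pred` still holds the predictions of the zeros-initialized model.
# Rebuild and refit the model with kernel_initializer='ones', then call predict() again first.
dfsel['pred_ones_after_fit'] = y_pred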
dfsel.head()

| abbrev | total | pred_zeros_after_fit | pred_ones_after_fit |
|---|---|---|---|
| AL | 18.8 | 18.841764 | 18.841764 |
| AK | 18.1 | 18.193848 | 18.193848 |
| AZ | 18.6 | 18.252439 | 18.252439 |
| AR | 22.4 | 22.01379 | 22.01379 |
| CA | 12.0 | 13.098405 | 13.098405 |


In [16]:
mse = ((dfsel.total - dfsel.pred_ones_after_fit)**2).mean()
mse

1.19102670622752

### How to `kernel_initializer` the weights to `glorot_uniform` (default)?
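As a rough illustration of what `glorot_uniform` does, the sketch below draws weights from a uniform distribution in `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`, which is the rule the Keras documentation gives for this initializer. The function names and the fixed seed are assumptions made for this example; Keras uses its own random generator, so the actual numbers will differ.

```python
import math
import random

def glorot_uniform_limit(fan_in, fan_out):
    """Bound of the uniform distribution used by Glorot (Xavier) uniform init."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def sample_glorot_uniform(fan_in, fan_out, seed=0):
    """Hypothetical helper: a fan_in x fan_out kernel drawn from U(-limit, limit)."""
    rng = random.Random(seed)
    limit = glorot_uniform_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

hidden_limit = glorot_uniform_limit(6, 3)   # Dense(3) fed by 6 inputs
output_limit = glorot_uniform_limit(3, 1)   # Dense(1) fed by 3 hidden neurons
print(hidden_limit, output_limit)

# Output-layer kernel printed by model.get_weights() earlier in this notebook
printed_output_kernel = [-1.198971, -0.8542514, -0.5962893]
print(all(abs(w) <= output_limit for w in printed_output_kernel))

hidden_kernel = sample_glorot_uniform(6, 3)
```

The bound shrinks as layers get wider, which keeps the size of each neuron's weighted sum in a similar range regardless of how many inputs feed it.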

## Play with the Activation Function

> - https://keras.io/api/layers/activations/

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/IHZwWFHWa-w?start=558" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

### Use `sigmoid` activation in last layer

In [17]:
model = Sequential()
model.add(layer=Input(shape=(6,)))
model.add(layer=Dense(units=3, kernel_initializer='glorot_uniform'))
model.add(layer=Dense(units=1, activation='sigmoid'))

In [18]:
model.compile(loss='mse', metrics=['mse'])

#### `fit()` the Model

In [19]:
model.fit(X, y, epochs=500, verbose=0)

<keras.callbacks.History at 0x28d4a0120d0>

#### Predictions vs Reality

> 1. Calculate the Predicted Accidents and
> 2. Compare it with the Real Total Accidents

In [20]:
y_pred = model.predict(X)

In [21]:
dfsel['pred_sigmoid'] = y_pred
dfsel.head()

| abbrev | total | pred_zeros_after_fit | pred_ones_after_fit | pred_sigmoid |
|---|---|---|---|---|
| AL | 18.8 | 18.841764 | 18.841764 | 1.0 |
| AK | 18.1 | 18.193848 | 18.193848 | 1.0 |
| AZ | 18.6 | 18.252439 | 18.252439 | 1.0 |
| AR | 22.4 | 22.01379 | 22.01379 | 1.0 |
| CA | 12.0 | 13.098405 | 13.098405 | 1.0 |


In [22]:
mse = ((dfsel.total - dfsel.pred_sigmoid)**2).mean()
mse

235.40764705882347

#### Observe the numbers for the `weights`

> - Have they changed?

In [None]:
model.get_weights()

### Use `linear` activation in last layer

### Use `tanh` activation in last layer

### Use `relu` activation in last layer

### How are the predictions changing? Why?
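A quick way to reason about this question is to look at the range of values each activation can output. The sketch below is plain Python with hand-written versions of the four activations. It is an illustration under that assumption, not the Keras implementation.

```python
import math

def linear(z):
    return z                             # unbounded: any real number

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))    # squeezed into (0, 1)

def tanh(z):
    return math.tanh(z)                  # squeezed into (-1, 1)

def relu(z):
    return max(0.0, z)                   # negatives are cut to 0

# The same pre-activation values pushed through each function
for z in (-5.0, 0.0, 5.0, 40.0):
    print(z, linear(z), sigmoid(z), tanh(z), relu(z))
```

The `total` column holds values far above 1, so a `sigmoid` or `tanh` output neuron cannot reach them. The closest it can get is its upper bound, which matches the `pred_sigmoid` column of 1.0 shown above. `linear` and `relu` have no upper bound, so they can output values on the scale of `total`.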

## Optimizer

> - https://keras.io/api/optimizers/#available-optimizers

Optimizers comparison in GIF → https://mlfromscratch.com/optimizers-explained/#adam

Tesla's neural network system is composed of 48 models trained in 70,000 GPU hours → https://tesla.com/ai

About 1 year on an 8-GPU computer → https://twitter.com/thirdrowtesla/status/1252723358342377472

### Use Gradient Descent `SGD`
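Before handing the job to Keras, it may help to see the idea behind gradient descent on a tiny problem. The sketch below fits `y = w * x + b` to four made-up points by repeatedly moving `w` and `b` against the gradient of the mean squared error. The data, the learning rate, and the function name are assumptions for illustration. It uses the whole dataset at every step, whereas the Keras `'sgd'` optimizer updates the weights on mini-batches.

```python
# Toy data generated from y = 2x + 1 (made up for this example)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by full-batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0            # start from zeros, like kernel_initializer='zeros'
    n = len(xs)
    history = []               # loss recorded before each update
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / n)
        # Partial derivatives of the MSE with respect to w and b
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history

w, b, loss_history = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
print(loss_history[0], loss_history[-1])
```

The loss starts high and shrinks with every epoch as `w` and `b` approach the values that generated the data, which is the same behaviour the `history` object records for the Keras model.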

In [None]:
model = Sequential()
model.add(layer=Input(shape=(6,)))
model.add(layer=Dense(units=3, kernel_initializer='glorot_uniform'))
model.add(layer=Dense(units=1, activation='sigmoid'))

#### `compile()` the model

In [None]:
model.compile(optimizer='sgd', loss='mse', metrics=['mse'])

#### `fit()` the Model

In [None]:
history = model.fit(X, y, epochs=500, verbose=0)

#### Predictions vs Reality

> 1. Calculate the Predicted Accidents and
> 2. Compare it with the Real Total Accidents

In [None]:
y_pred = model.predict(X)

In [None]:
dfsel['pred_sgd'] = y_pred
dfsel.head()

In [None]:
mse = ((dfsel.total - dfsel.pred_sgd)**2).mean()
mse

#### Observe the numbers for the `weights`

> - Have they changed?

In [None]:
model.get_weights()


#### View History

In [None]:
import matplotlib.pyplot as plt

In [None]:
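plt.plot(history.history['loss'], label='train')
# 'val_loss' only exists if fit() received validation data (e.g. validation_split=0.2)
if 'val_loss' in history.history:
    plt.plot(history.history['val_loss'], label='val')
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(loc='upper left')
plt.show()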

### Use `ADAM`

### Use `RMSPROP`

### Do the optimizers take different times to reach the lowest loss? Why?

## Loss Functions

> - https://keras.io/api/losses/

### `binary_crossentropy`

### `sparse_categorical_crossentropy`

### `mean_absolute_error`

### `mean_squared_error`
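To compare the two regression losses listed here, the sketch below computes `mean_absolute_error` and `mean_squared_error` by hand in plain Python. It assumes only the five rows printed by `dfsel.head()` earlier (real `total` against `pred_zeros_after_fit`), so the results describe that small sample and not the whole dataset.

```python
# First five rows shown by dfsel.head(): real totals and fitted predictions
real = [18.8, 18.1, 18.6, 22.4, 12.0]
pred = [18.841764, 18.193848, 18.252439, 22.01379, 13.098405]

def mean_absolute_error(y_true, y_pred):
    """Average size of the errors, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared error: large misses weigh much more than small ones."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

mae = mean_absolute_error(real, pred)
mse = mean_squared_error(real, pred)
print(mae, mse)
```

In this sample the California row has by far the largest miss, and squaring makes it dominate the MSE, while the MAE treats every unit of error the same.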

## In the end, what should be a feasible configuration of the Neural Network for this data?

# Common Errors

## The `kernel_initializer` Matters

## The `activation` Function Matters

## The `optimizer` Matters

## The Number of `epochs` Matters

## The `loss` Function Matters

# The Importance of Neural Networks for Finding **Non-Linear Patterns** in the Data

> - The number of Neurons & Hidden Layers

https://towardsdatascience.com/beginners-ask-how-many-hidden-layers-neurons-to-use-in-artificial-neural-networks-51466afa0d3e

https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.87287&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false
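One reason hidden layers need non-linear activations to find non-linear patterns is that stacked linear layers collapse into a single linear formula. The sketch below checks this on a tiny network with made-up weights (3 inputs, 2 hidden neurons, 1 output, no biases). All names and numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical weights: 3 inputs -> 2 hidden neurons -> 1 output, no activation
W1 = [[0.5, -1.0],
      [2.0, 0.0],
      [1.0, 1.0]]
W2 = [[1.0],
      [-2.0]]

def matvec(x, W):
    """Row vector times matrix: what a bias-free linear Dense layer computes."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

x = [1.0, 2.0, 3.0]

# Forward pass through both linear layers
two_layers = matvec(matvec(x, W1), W2)[0]

# Collapse the two kernels into one: W = W1 @ W2
W = [[sum(W1[i][k] * W2[k][j] for k in range(len(W2)))
      for j in range(len(W2[0]))]
     for i in range(len(W1))]
one_layer = matvec(x, W)[0]

print(two_layers, one_layer)
```

Both paths give the same number, so without a non-linear activation the extra layer adds no expressive power. Putting `tanh` or `relu` between the layers breaks this equivalence.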

## Summary

- Mathematical Formula
- Weights / Kernel Initializer
- Loss Function
- Activation Function
- Optimizers

## What Can't You Change Arbitrarily in a Neural Network?

- Input Neurons
- Output Neurons
- Loss Functions
- Activation Functions