# Neural Networks & Deep Learning
### Marc Pomar

## 1. Network types and concept definitions
Here are some basic network architectures you must know before starting. However, in most cases, architectures are combined to make the most of each network type.

* **Multilayer perceptrons (MLPs):** Fully connected networks, also known as deep feedforward networks. They are common in simple logistic and linear regression problems. They are not an optimal choice for sequential or multi-dimensional data patterns: because every neuron is connected to every input, they need a very large number of parameters to work even modestly well on such data.

<img src="https://raw.githubusercontent.com/ledell/sldm4-h2o/master/mlp_network.png" width="300">

* **Recurrent neural networks (RNNs):** Best suited for sequential input data (time series, audio classification, speech recognition, etc.). These networks carry a hidden state from one time step to the next, so they remember past inputs; some variants keep both a short-term and a long-term memory.

<img src="https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_folder_6/recurrent.jpg" width="300">
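The "memory" of an RNN comes from feeding its own hidden state back in at every time step. Below is a minimal sketch of a single-unit (scalar) Elman-style recurrent cell in plain Python. The weights `w_x`, `w_h` and `b` are hypothetical, hand-picked values for illustration, not trained parameters, and real RNN layers use vectors and matrices instead of scalars.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a scalar Elman-style RNN over a sequence.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    Returns the list of hidden states, one per time step.
    """
    h = h0
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Hypothetical weights, chosen only for illustration.
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
```

Even though the second and third inputs are zero, the hidden states stay non-zero because they still carry information about the first input; with `|w_h| < 1` this memory fades over time.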

* **Convolutional Neural Networks (CNNs):** Best suited for image tasks and other multi-dimensional data (images, videos, stereo depth images, etc.). These networks use the [convolution](https://en.wikipedia.org/wiki/Convolution) operation to learn feature kernels (filters) from the data, especially images. Think of it as decomposing a big image classification problem into small parts: recognizing a face can be decomposed into recognizing eyes, a mouth, a nose, etc.

<img src="https://i0.wp.com/vinodsblog.com/wp-content/uploads/2018/10/CNN-2.png?resize=1300%2C479&ssl=1" width="300">
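To make "learning feature kernels" concrete, here is a minimal sketch of the sliding-window operation a convolutional layer applies, in plain Python. Like most deep learning libraries, it actually computes cross-correlation (the kernel is not flipped), with no padding and a stride of 1. The vertical-edge kernel is a hand-picked example; a CNN would learn its kernel values during training.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation of two lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the image patch under it and sum.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output

# A 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge kernel (a CNN would learn these values).
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
```

The resulting 3x3 feature map is non-zero only in the column where the edge is. Small local detectors like this are what stacked convolutional layers combine into larger parts (eyes, a mouth, a face).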

### 1.1 Concepts to know

* **Loss function / Objective:** The goal when training a neural network is to minimize the value of the loss function. A decreasing loss is a strong indicator that your network is learning, but remember to evaluate the network on held-out test data to make sure it generalizes.

Examples of loss functions:
- mean_squared_error, mean_absolute_error, mean_absolute_percentage_error, mean_squared_logarithmic_error, squared_hinge, hinge, logcosh, binary_crossentropy, categorical_crossentropy, cosine_proximity, etc.

- https://keras.io/losses/

<div style="text-align:center">
<img src="https://www.researchgate.net/profile/Victor_Suarez-Paniagua/publication/334643403/figure/fig3/AS:783985458302977@1563928107841/The-training-stage-of-a-Neural-Network-where-the-loss-function-is-decreasing-in-each.png" width="230" style="display:inline-block">

<img src="https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2018/11/Line-Plots-of-Sparse-Cross-Entropy-Loss-and-Classification-Accuracy-over-Training-Epochs-on-the-Blobs-Multi-Class-Classification-Problem.png" width="300" style="display:inline-block">
</div>
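A lower loss means predictions closer to the targets. Below is a minimal plain-Python sketch of two of the losses listed above, following their standard formulas. It is not the Keras implementation (which operates on tensors); the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences (typical for regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average negative log-likelihood for 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

mse = mean_squared_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
good = binary_crossentropy([1, 0], [0.9, 0.1])  # confident and correct
bad = binary_crossentropy([1, 0], [0.1, 0.9])   # confident and wrong
```

Confident wrong predictions are penalized much more heavily by cross-entropy than confident correct ones, which is what pushes a classifier toward the right labels during training.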


* **Optimizer:** An iterative method for minimizing the objective/loss function. In practice, it is the algorithm that takes the network's current parameters (weights and biases) and updates them, typically using the gradient of the loss, so that the network's output improves.
    - Stochastic Gradient Descent: https://en.wikipedia.org/wiki/Stochastic_gradient_descent
    - AdaGrad: http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
    - Adam/Adamax: https://arxiv.org/abs/1412.6980v8
    
Optimizer algorithms visualized: https://bl.ocks.org/EmilienDupont/aaf429be5705b219aaaf8d691e27ca87

More information on optimizer implementations:
- https://keras.io/optimizers/
- https://towardsdatascience.com/why-visualize-gradient-descent-optimization-algorithms-a393806eee2
- https://www.pyimagesearch.com/2016/10/17/stochastic-gradient-descent-sgd-with-python/

<img src="https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2017/05/Comparison-of-Adam-to-Other-Optimization-Algorithms-Training-a-Multilayer-Perceptron.png" width="300">
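Most of the optimizers above are variations on the same loop: compute the gradient of the loss with respect to each parameter and move the parameter a small step in the opposite direction. Below is a minimal sketch of plain stochastic gradient descent fitting a one-feature linear model in Python. The learning rate, number of epochs and synthetic data are arbitrary choices for illustration.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.02, epochs=500, seed=0):
    """Fit y ~ w * x + b with plain stochastic gradient descent.

    Loss per sample: (w * x + b - y) ** 2
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # start from arbitrary parameters
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)  # visit samples in a random order each epoch
        for x, y in data:
            error = (w * x + b) - y
            # Gradients of the squared error with respect to w and b.
            grad_w = 2 * error * x
            grad_b = 2 * error
            # Step against the gradient, scaled by the learning rate.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # synthetic data from y = 2x + 1
w, b = sgd_linear_regression(xs, ys)
```

On this noise-free data, `w` and `b` end up very close to 2 and 1. Adaptive methods such as AdaGrad and Adam keep this loop but scale the step for each parameter using statistics of its past gradients.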


* **Regularizer:** The role of a regularizer is to help the trained model generalize to new data by keeping the network from overfitting the training data.
    - https://towardsdatascience.com/over-fitting-and-regularization-64d16100f45c

<img src="https://www.bogotobogo.com/python/scikit-learn/images/NeuralNetwork7-Overfitting/Overfitting.png" width="400">
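One common regularizer is the L2 penalty (also called weight decay): the sum of squared weights, scaled by a strength `lam`, is added to the loss so that large weights are discouraged. Below is a minimal plain-Python sketch. The value of `lam` and the example numbers are arbitrary, and other regularizers (L1, dropout) work differently.

```python
def l2_penalty(weights, lam):
    """L2 regularization term: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam=0.01):
    """Total objective = data loss + L2 penalty on the weights."""
    return data_loss + l2_penalty(weights, lam)

def l2_gradient(weights, lam):
    """Gradient of the penalty, 2 * lam * w: it pulls each weight toward zero."""
    return [2 * lam * w for w in weights]

weights = [3.0, -0.5, 0.0]
total = regularized_loss(data_loss=0.2, weights=weights, lam=0.01)
grads = l2_gradient(weights, lam=0.01)
```

The largest weight (`3.0`) receives the strongest pull toward zero, which tends to produce smoother models that overfit less.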


## 2. Starting small: a neuron

<img src="https://upload.wikimedia.org/wikipedia/commons/1/10/Blausen_0657_MultipolarNeuron.png" width="400"/>

<img src="https://draftin.com/images/34832?token=vQiHNdPnUSPiPJcJcgobMGedDJRvgguccVapCN76gZnxqVQIKczfq4BqUQ06bWdVXnabb3tScv_04nigKqMZjS4" width="300"/>

**This is the simplified model of a neuron used in artificial neural networks:**
<img src="https://miro.medium.com/max/880/1*vGj29ZBD1kH1kDlGQspPxA.png" width="400"/>

1. It multiplies each input by its weight,
2. sums the weighted inputs (usually adding a bias term),
3. and applies the activation function to that sum.

<img src="https://miro.medium.com/max/1066/1*7bStIbUZ3vOEYFx92MOnMg.png" width="300"/>
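The three steps above fit in a few lines of Python. Below is a minimal sketch assuming a sigmoid activation and a bias term (common choices; other activations such as ReLU or tanh are equally valid). The weights are arbitrary example values.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # 1. Multiply each input by its weight,
    # 2. sum the products (plus a bias term),
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # 3. apply the activation function to the sum.
    return sigmoid(z)

out = neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```

Here the weighted sum is `1.0 * 0.5 + 2.0 * -0.25 = 0`, so the neuron outputs `sigmoid(0) = 0.5`.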

By connecting many of these neurons in layers, we can extend the model to **neural networks**:

<img src="https://miro.medium.com/max/1306/1*ex8Dh_kowIrI-UPun6RvVw.png" width="300"/>
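Stacking neurons into layers, and feeding each layer's outputs into the next, gives the fully connected network (MLP) described in section 1. Below is a minimal forward-pass sketch in plain Python. The 2-3-1 layer sizes, the sigmoid activation and the parameter values are arbitrary assumptions, and training (adjusting the weights with an optimizer) is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: every neuron sees every input.

    weights[j] holds the input weights of neuron j; biases[j] is its bias.
    """
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, neuron_w)) + b)
        for neuron_w, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical 2-3-1 network with arbitrary (untrained) parameters.
layers = [
    ([[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]], [0.0, 0.1, -0.1]),  # hidden: 3 neurons
    ([[0.5, -0.3, 0.2]], [0.05]),                                # output: 1 neuron
]
output = forward([1.0, 0.5], layers)
```

Each layer is just the single-neuron computation repeated once per neuron, so the whole network is a chain of weighted sums and activations.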

## References
- https://www.youtube.com/watch?v=tIeHLnjs5U8&feature=emb_logo
- https://becominghuman.ai/understanding-neural-networks-1-the-concept-of-neurons-287be36d40f
- https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6