
<img src="https://s3.amazonaws.com/keras.io/img/keras-logo-2018-large-1200.png" width="20%" />

## Keras: Deep Learning library for Theano and TensorFlow

>Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. 

>It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
ref: https://keras.io/

<a name="kaggle"></a>
### Kaggle Challenge Data

>The Otto Group is one of the world’s biggest e-commerce companies. A consistent analysis of the performance of products is crucial. However, due to diverse global infrastructure, many identical products get classified differently.
For this competition, we have provided a dataset with 93 features for more than 200,000 products. The objective is to build a predictive model which is able to distinguish between our main product categories. 
Each row corresponds to a single product. There are a total of 93 numerical features, which represent counts of different events. All features have been obfuscated and will not be defined any further.

https://www.kaggle.com/c/otto-group-product-classification-challenge/data

## Logistic Regression

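Logistic regression is a supervised learning algorithm for **classification** problems.

Rather than predicting the dependent variable directly with a linear function, it passes a linear combination of the features through the so-called **logistic function** or **sigmoid**, σ(z) = 1 / (1 + e^(-z)), which maps any real number into the interval (0, 1), so that the output can be read as the probability of the positive class. With more than two classes, as in the Otto challenge, the sigmoid is generalised by the **softmax** function.

It is precisely because of this function that the algorithm is called logistic regression.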

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Logistic-curve.svg/1200px-Logistic-curve.svg.png" width="50%" />
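A minimal NumPy sketch of the logistic (sigmoid) function and of its multi-class generalisation, the softmax, which the Keras models below apply to the 9 Otto classes. The names `sigmoid` and `softmax` are our own helpers, not a library API; the last check illustrates that, for two classes with scores `[z, 0]`, the softmax reduces to the sigmoid of `z`.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    """Row-wise softmax: turns a vector of scores into probabilities summing to 1."""
    z = z - np.max(z, axis=-1, keepdims=True)  # shift for numerical stability (result unchanged)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # close to 0, exactly 0.5, close to 1

z = 1.3
p_two_classes = softmax(np.array([z, 0.0]))
print(p_two_classes[0], sigmoid(z))  # the two values coincide
```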

## Data Preparation

## Utility functions

Utility functions to load and preprocess the Kaggle Otto Group Challenge data.

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical


def load_data(path, train=True):
    """Load data from a CSV file
    
    Parameters
    ----------
    path: str
        The path (or URL) to the CSV file
        
    train: bool (default True)
        Whether the file contains *training data*.
        If True, the rows are randomly shuffled and the
        last column is returned as the labels.
        
    Returns
    -------
    X: numpy.ndarray 
        The data as a multi dimensional array of floats
    labels (if train) or ids (otherwise): numpy.ndarray
        The class label, or the id, of each sample
    """
    df = pd.read_csv(path)
    X = df.values.copy()
    if train:
        np.random.shuffle(X)  # shuffle the rows in place
        # drop the id column (first) and split off the label column (last)
        X, labels = X[:, 1:-1].astype(np.float32), X[:, -1]
        return X, labels
    else:
        # the test set has no label column: the first column holds the ids
        X, ids = X[:, 1:].astype(np.float32), X[:, 0].astype(str)
        return X, ids
        
        
def preprocess_data(X, scaler=None):
    """Preprocess input data by standardise features 
    by removing the mean and scaling to unit variance"""
    if scaler is None:
        # fit a new scaler (on the training data only)
        scaler = StandardScaler()
        scaler.fit(X)
    X = scaler.transform(X)
    return X, scaler


def preprocess_labels(labels, encoder=None, categorical=True):
    """Encode labels with values among 0 and `n-classes-1`"""
    if encoder is None:
        encoder = LabelEncoder()
        encoder.fit(labels)
    y = encoder.transform(labels).astype(np.int32)
    if categorical:
        # one-hot encode: shape (n_samples, n_classes)
        y = to_categorical(y)
    return y, encoder

In [2]:
import numpy as np
import matplotlib.pyplot as plt

## Import data

In [3]:
url_train = 'https://raw.githubusercontent.com/leriomaggio/deep-learning-keras-tensorflow/master/data/kaggle_ottogroup/train.csv'
url_test = 'https://raw.githubusercontent.com/leriomaggio/deep-learning-keras-tensorflow/master/data/kaggle_ottogroup/test.csv'
X_train, labels = load_data(url_train, train=True)

print("Training set data")
print(X_train.shape)

print("Training set labels")
print(labels)

Training set data
(61878, 93)
Training set labels
['Class_6' 'Class_2' 'Class_3' ... 'Class_2' 'Class_3' 'Class_6']


## Preprocess data

In [4]:
# note: this reloads (and reshuffles) the training data loaded above
X_train, labels = load_data(url_train, train=True)
X_train, scaler = preprocess_data(X_train)
Y_train, encoder = preprocess_labels(labels)

X_test, ids = load_data(url_test, train=False)
X_test, _ = preprocess_data(X_test, scaler)

nb_classes = Y_train.shape[1]
print(nb_classes, 'classes')

dims = X_train.shape[1]
print(dims, 'dims')

9 classes
93 dims


In [None]:
help(preprocess_data)
help(preprocess_labels)

Help on function preprocess_data in module __main__:

preprocess_data(X, scaler=None)
    Preprocess input data by standardise features 
    by removing the mean and scaling to unit variance

Help on function preprocess_labels in module __main__:

preprocess_labels(labels, encoder=None, categorical=True)
    Encode labels with values among 0 and `n-classes-1`
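What `StandardScaler` does can be sketched in a few lines of NumPy: subtract the per-feature mean and divide by the per-feature standard deviation, both computed on the training data and then reused for the test data (which is why `preprocess_data` returns the fitted `scaler`). `standardise` is a hypothetical helper for illustration; as far as we know, `StandardScaler` uses the population standard deviation and leaves zero-variance features unscaled, which is what this sketch assumes.

```python
import numpy as np


def standardise(X, mean=None, std=None):
    """Hypothetical NumPy stand-in for StandardScaler: fit (if needed), then transform."""
    if mean is None or std is None:
        mean = X.mean(axis=0)
        std = X.std(axis=0)                  # population std (ddof=0)
        std = np.where(std == 0, 1.0, std)   # leave constant features unscaled
    return (X - mean) / std, mean, std


X_toy_train = np.array([[0.0, 10.0],
                        [2.0, 10.0],
                        [4.0, 10.0]])
X_toy_test = np.array([[6.0, 10.0]])

X_toy_train_std, mean, std = standardise(X_toy_train)        # fit on the training data
X_toy_test_std, _, _ = standardise(X_toy_test, mean, std)    # reuse the training statistics

print(X_toy_train_std.mean(axis=0), X_toy_train_std.std(axis=0))  # first feature: mean 0, std 1
print(X_toy_test_std)
```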



In [5]:
np.unique(labels)

array(['Class_1', 'Class_2', 'Class_3', 'Class_4', 'Class_5', 'Class_6',
       'Class_7', 'Class_8', 'Class_9'], dtype=object)

In [6]:
Y_train  # one-hot encoding

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.],
       ...,
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
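The encoding can be reproduced on a toy example with NumPy alone: `np.unique(..., return_inverse=True)` assigns sorted integer codes, as `LabelEncoder` does, and indexing an identity matrix with those codes yields one-hot rows, like `to_categorical` when every class index up to the maximum is present. The `toy_*` names are purely illustrative.

```python
import numpy as np

toy_labels = np.array(['Class_3', 'Class_1', 'Class_9', 'Class_3'])

# Integer codes 0..n_classes-1, with the class names sorted alphabetically
toy_classes, toy_codes = np.unique(toy_labels, return_inverse=True)
print(toy_classes)   # ['Class_1' 'Class_3' 'Class_9']
print(toy_codes)     # [1 0 2 1]

# One-hot encoding: row i has a single 1 in column toy_codes[i]
toy_one_hot = np.eye(len(toy_classes), dtype=np.float32)[toy_codes]
print(toy_one_hot)
```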

---

# Using Keras

In [7]:
from keras.models import Sequential, Model
from keras.layers import Input, Dense, Activation

In [8]:
# Sequential API: multinomial logistic regression
dims = X_train.shape[1]
print(dims, 'dims')
print("Building model...")

nb_classes = Y_train.shape[1]
print(nb_classes, 'classes')

model = Sequential()
# linear Dense layer (no activation): one weight per feature and class, plus one bias per class
model.add(Dense(nb_classes, input_shape=(dims,)))
model.add(Activation('softmax'))

model.summary()

model.compile(optimizer='adam', loss='categorical_crossentropy')

model.fit(X_train, Y_train)

93 dims
Building model...
9 classes
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 9)                 846       
_________________________________________________________________
activation (Activation)      (None, 9)                 0         
Total params: 846
Trainable params: 846
Non-trainable params: 0
_________________________________________________________________


<keras.callbacks.History at 0x7f79db1c32d0>

Simplicity is pretty impressive, right? :)

Now let's understand what we did:
<pre>The core data structure of Keras is a <b>model</b>, a way to organize layers. The main type of model is the <b>Sequential</b> model, a linear stack of layers.</pre>


What we did here is stack a fully connected (<b>Dense</b>) layer of trainable weights, mapping the input features to one score per class, and an <b>Activation</b> layer (here softmax) on top of it, which turns those scores into class probabilities.
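In NumPy terms, the Dense layer computes `X @ W + b` and the softmax Activation normalises each row into probabilities. The sketch below uses random weights and random inputs as stand-ins (it is not the trained model) and recovers the 846 parameters reported by `model.summary()`: a 93 × 9 kernel plus 9 biases.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 5, 93, 9   # 93 features and 9 classes, as in the Otto data

X_demo = rng.normal(size=(n_samples, n_features))         # random stand-in for a batch of X_train
W = rng.normal(scale=0.01, size=(n_features, n_classes))  # Dense kernel
b = np.zeros(n_classes)                                   # Dense bias

scores = X_demo @ W + b                                   # Dense layer, linear activation
scores -= scores.max(axis=1, keepdims=True)               # numerical stability only
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax Activation

print(probs.shape)        # (5, 9): one probability per class for each sample
print(probs.sum(axis=1))  # every row sums to 1
print(W.size + b.size)    # 846 trainable parameters
```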

In [9]:
# Functional ("Model") API: layers are called on tensors
dims = X_train.shape[1]
print(dims, 'dims')
print("Building model...")

nb_classes = Y_train.shape[1]
print(nb_classes, 'classes')

inputs = Input(shape=(dims,))
x = Dense(nb_classes)(inputs)  # linear Dense layer; the softmax below yields class probabilities
output = Activation('softmax')(x)
model = Model(inputs, output)

model.summary()

model.compile(optimizer='sgd', loss='categorical_crossentropy')
model.fit(X_train, Y_train)

93 dims
Building model...
9 classes
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 93)]              0         
_________________________________________________________________
dense_1 (Dense)              (None, 9)                 846       
_________________________________________________________________
activation_1 (Activation)    (None, 9)                 0         
Total params: 846
Trainable params: 846
Non-trainable params: 0
_________________________________________________________________


<keras.callbacks.History at 0x7f79d5ce28d0>

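>The **Model** groups layers into an object with training and inference features.

In the functional API, each layer is called on the output tensor of the previous one, and the `Model` is defined by its input and output tensors. Unlike `Sequential`, this also allows non-linear topologies, for example models where part or all of the inputs are connected directly to the output layer.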

#### Dense Layer

```python
from keras.layers import Dense

Dense(units, activation=None, use_bias=True, 
      kernel_initializer='glorot_uniform', bias_initializer='zeros', 
      kernel_regularizer=None, bias_regularizer=None, 
      activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

* `units`: int > 0, the number of output units (dimensionality of the output space).

* `activation`: name of the activation function to use (see activations), or a callable. If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).

* `use_bias`: whether to include a bias vector (i.e. make the layer affine rather than linear).

* `kernel_initializer`: initializer for the kernel weights matrix (see `keras.initializers`), e.g. `'glorot_uniform'`.

* `bias_initializer`: initializer for the bias vector, e.g. `'zeros'`.

* `kernel_regularizer`: regularizer (e.g. L1 or L2 regularization, see `keras.regularizers`) applied to the kernel weights matrix.

* `bias_regularizer`: regularizer applied to the bias vector.

* `activity_regularizer`: regularizer applied to the output of the layer (its "activation").

* `kernel_constraint`: constraint (e.g. `MaxNorm`, `NonNeg`, see `keras.constraints`) applied to the kernel weights matrix.

* `bias_constraint`: constraint applied to the bias vector.

The kernel has shape `(input_dim, units)` and the bias has shape `(units,)`; initial weights can also be set once the layer is built with `layer.set_weights([kernel, bias])`.
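To make `kernel_regularizer` and `kernel_constraint` concrete, the NumPy sketch below mirrors, as we understand them, the formulas behind an L2 regularizer (a penalty `l2 * sum(W**2)` added to the loss; the coefficient 0.01 is an assumption) and a max-norm constraint (after each update, every column of the kernel, i.e. the incoming weights of one unit, is rescaled so that its L2 norm does not exceed `max_value`). The function names are hypothetical, not the Keras API.

```python
import numpy as np


def l2_penalty(kernel, l2=0.01):
    """Penalty added to the training loss: l2 * sum of squared weights."""
    return l2 * np.sum(np.square(kernel))


def max_norm(kernel, max_value=2.0, axis=0, epsilon=1e-7):
    """Rescale each column whose L2 norm exceeds max_value; other columns stay (almost) unchanged."""
    norms = np.sqrt(np.sum(np.square(kernel), axis=axis, keepdims=True))
    desired = np.clip(norms, 0, max_value)
    return kernel * (desired / (epsilon + norms))


kernel = np.array([[3.0, 0.5],
                   [4.0, 0.5]])       # column norms: 5.0 and about 0.707
print(l2_penalty(kernel))             # 0.01 * (9 + 16 + 0.25 + 0.25) = 0.255
constrained = max_norm(kernel)
print(np.linalg.norm(constrained, axis=0))  # about [2.0, 0.707]: only the first column shrinks
```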

## (Some) other core layers in `keras.layers`

* `keras.layers.Flatten()`
* `keras.layers.Reshape(target_shape)`
* `keras.layers.Permute(dims)`

```python
from keras.models import Sequential
from keras.layers import Permute

model = Sequential()
model.add(Permute((2, 1), input_shape=(10, 64)))
# now: model.output_shape == (None, 64, 10)
# note: `None` is the batch dimension
```
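The shape bookkeeping of these layers can be mimicked with plain NumPy on a dummy batch (the batch size of 32 and the `Reshape` target `(20, 32)` are arbitrary choices for illustration). Note that `Permute` dims are 1-indexed and exclude the batch axis, so `Permute((2, 1))` corresponds to `np.transpose(x, (0, 2, 1))`.

```python
import numpy as np

batch = np.zeros((32, 10, 64))  # (batch, 10, 64), like input_shape=(10, 64)

permuted = np.transpose(batch, (0, 2, 1))            # Permute((2, 1)): batch axis untouched
flat = batch.reshape(batch.shape[0], -1)             # Flatten(): keep the batch axis, merge the rest
reshaped = batch.reshape((batch.shape[0], 20, 32))   # Reshape((20, 32)): needs 10 * 64 == 20 * 32

print(permuted.shape, flat.shape, reshaped.shape)    # (32, 64, 10) (32, 640) (32, 20, 32)
```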

* `keras.layers.Lambda(function, output_shape=None, arguments=None)`
* `keras.layers.ActivityRegularization(l1=0.0, l2=0.0)`
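`ActivityRegularization` passes its input through unchanged and only adds a penalty on that input to the training loss. Below is a minimal NumPy sketch of the penalty, assuming the usual L1/L2 form `l1 * sum(|x|) + l2 * sum(x**2)`; the helper name is hypothetical, and any batch-size normalisation Keras may apply is ignored.

```python
import numpy as np


def activity_regularization(x, l1=0.0, l2=0.0):
    """Hypothetical stand-in for ActivityRegularization(l1, l2).

    Returns the input unchanged together with the penalty that would be
    added to the training loss.
    """
    penalty = l1 * np.sum(np.abs(x)) + l2 * np.sum(np.square(x))
    return x, penalty


activations = np.array([[1.0, -2.0],
                        [0.0, 3.0]])
out, penalty = activity_regularization(activations, l1=0.1, l2=0.01)
print(penalty)  # 0.1 * 6 + 0.01 * 14 = 0.74
```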

<img src="https://github.com/leriomaggio/deep-learning-keras-tensorflow/blob/master/imgs/dl_overview.png?raw=true" >

Credits: Yam Peleg ([@Yampeleg](https://twitter.com/yampeleg))

#### Activation

```python
from keras.layers import Activation

Activation(activation)
```

**Supported activations**: softmax, elu, relu, tanh, sigmoid, linear, ... (see https://keras.io/api/layers/activations)

**Advanced Activations**: https://keras.io/api/layers/activations/#about-advanced-activation-layers
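For reference, a few of the element-wise activations listed above written in NumPy (hypothetical helpers, not the Keras implementations; `alpha=1.0` is assumed as the ELU default). Softmax differs from these in that it normalises across the units of a sample rather than acting element-wise.

```python
import numpy as np


def linear(x):
    return x  # a(x) = x, what you get when no activation is specified


def relu(x):
    return np.maximum(x, 0.0)


def elu(x, alpha=1.0):
    # x for positive inputs, alpha * (exp(x) - 1) otherwise; expm1 on min(x, 0) avoids overflow
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


x_demo = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x_demo))      # 0, 0, 0, 1.5
print(elu(x_demo))       # negative inputs saturate towards -alpha
print(np.tanh(x_demo))   # tanh is available directly in NumPy
```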

# Multi-Layer Fully Connected Networks

<img src="https://github.com/leriomaggio/deep-learning-keras-tensorflow/blob/master/imgs/MLP.png?raw=true" width="45%">

#### Forward and Backward Propagation

<img src="https://github.com/leriomaggio/deep-learning-keras-tensorflow/blob/master/imgs/backprop.png?raw=true" width="50%">
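To make the figure concrete, here is a minimal NumPy sketch of one forward and one backward pass through a network with one hidden ReLU layer and a softmax output, trained with categorical cross-entropy (the loss used in this notebook). The toy sizes and random data are assumptions for illustration; Keras computes these gradients automatically, so this is not how the library implements it. The finite-difference check at the end verifies one entry of the hand-derived gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, k = 4, 5, 3, 2  # toy sizes: samples, input features, hidden units, classes
X_toy = rng.normal(size=(n, d))
Y_toy = np.eye(k)[rng.integers(0, k, size=n)]  # one-hot targets

W1, b1 = rng.normal(scale=0.5, size=(d, h)), np.zeros(h)
W2, b2 = rng.normal(scale=0.5, size=(h, k)), np.zeros(k)


def forward(W1, b1, W2, b2):
    """Forward pass: Dense -> ReLU -> Dense -> softmax -> cross-entropy."""
    z1 = X_toy @ W1 + b1
    a1 = np.maximum(z1, 0.0)
    z2 = a1 @ W2 + b2
    z2 = z2 - z2.max(axis=1, keepdims=True)  # stability; softmax is shift-invariant
    p = np.exp(z2) / np.exp(z2).sum(axis=1, keepdims=True)
    loss = -np.mean(np.sum(Y_toy * np.log(p), axis=1))
    return loss, z1, a1, p


loss, z1, a1, p = forward(W1, b1, W2, b2)

# Backward pass: apply the chain rule from the loss back to the first layer
dz2 = (p - Y_toy) / n          # d(mean cross-entropy)/d(logits) with a softmax output
dW2, db2 = a1.T @ dz2, dz2.sum(axis=0)
da1 = dz2 @ W2.T
dz1 = da1 * (z1 > 0)           # ReLU derivative: 1 where the unit was active, else 0
dW1, db1 = X_toy.T @ dz1, dz1.sum(axis=0)

# Gradient check on one weight with a central finite difference
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (forward(W1_plus, b1, W2, b2)[0] - forward(W1_minus, b1, W2, b2)[0]) / (2 * eps)
print(dW1[0, 0], numeric)      # the two values should agree to several decimal places
```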

**Q:** _How hard can it be to build a Multi-Layer Fully-Connected Network with keras?_

**A:** _It is basically the same, just add more layers!_

In [10]:
model = Sequential()
model.add(Dense(1000, input_shape=(dims,), activation = "relu"))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(optimizer='sgd', loss='categorical_crossentropy')

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 1000)              94000     
_________________________________________________________________
dense_3 (Dense)              (None, 9)                 9009      
_________________________________________________________________
activation_2 (Activation)    (None, 9)                 0         
Total params: 103,009
Trainable params: 103,009
Non-trainable params: 0
_________________________________________________________________
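The `Param #` column follows directly from the layer sizes: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, while Activation layers have none. The helper below (hypothetical, for illustration only) reproduces the numbers in the summary above.

```python
def count_dense_params(layer_sizes):
    """Parameters of a stack of Dense layers with biases.

    layer_sizes = [input_dim, units_1, ..., units_n]; Activation layers add nothing.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


print(count_dense_params([93, 1000]))      # 94000
print(count_dense_params([1000, 9]))       # 9009
print(count_dense_params([93, 1000, 9]))   # 103009
```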


In [11]:
model.fit(X_train, Y_train, epochs=20, 
          batch_size=128, verbose=True)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f79d741fb50>

---

# Your Turn!

## Hands On - Keras Fully Connected


Take a couple of minutes to play with the number of layers and the number of units in each layer.

In [35]:
model = Sequential()
# only the first layer needs input_shape; the relu keeps it from being a purely linear layer
model.add(Dense(100, input_shape=(dims,), activation='relu'))

model.add(Dense(80))
model.add(Activation('relu'))

model.add(Dense(50))
model.add(Activation('relu'))

model.add(Dense(20))
model.add(Activation('relu'))

# FC@100 -> FC@80 -> FC@50 -> FC@20 -> softmax output

model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')

model.summary()

Model: "sequential_13"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_59 (Dense)             (None, 100)               9400      
_________________________________________________________________
dense_60 (Dense)             (None, 80)                8080      
_________________________________________________________________
activation_36 (Activation)   (None, 80)                0         
_________________________________________________________________
dense_61 (Dense)             (None, 50)                4050      
_________________________________________________________________
activation_37 (Activation)   (None, 50)                0         
_________________________________________________________________
dense_62 (Dense)             (None, 20)                1020      
_________________________________________________________________
activation_38 (Activation)   (None, 20)              

In [36]:
model.fit(X_train, Y_train, epochs=20, batch_size=128, verbose=True)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f79cf7d9310>

Building a question answering system, an image classification model, a Neural Turing Machine, a word2vec embedder or any other model is just as fast. The ideas behind deep learning are simple, so why should their implementation be painful?