# Lesson 5

## DL Top View

### What is DL?

<img width="80%" src="img/AI_scheme.png">

<img width="30%" src="img/NN.png">


### Why DL models now?
<img width="30%" src="img/MarkI.jpg">
<img width="80%" src="img/ML-timeline.png">

**In 2012, three fields converged:**

* **Data**
* **Computational Power**
* **Mathematical Modelling**

### Some successful cases

* **Tesla:** https://www.tesla.com/autopilot
* **AlphaZero:** https://deepmind.com/blog/alphazero-shedding-new-light-grand-games-chess-shogi-and-go/
* **BioMind:** https://biomind.ai/product-biomind/
* **FaceNet:** https://github.com/davidsandberg/facenet
* **Google Speech-to-Text** (which involves sound separation): https://cloud.google.com/speech-to-text/?hl=es

### Problem taxonomy

* **Supervised**, such as digit classification on the MNIST dataset.
    * Classification: $\mathbb{R}^m \longrightarrow \{0,1,...,n\}$
    * Regression: $\mathbb{R}^m \longrightarrow \mathbb{R}^n$
* **Unsupervised** such as client segmentation.
    * Clustering
    * Dimensionality Reduction
    * Compressing/Encoding

* **Reinforcement** such as learning how to play chess.

<img width="50%" src="img/ml_problems.png">

### Data Types
We have to deal with many different data types, all of which must be converted to numeric tensors:

* **Structured**, such as tabular data gathered from sensors.
* **Unstructured**, such as audio, images or text.
* **Graphs**, such as logistics networks.
* **Other structures**, such as time series.

Of course, these data types can be mixed: imagine, for instance, a logistics graph that gets updated every day.

## Loading and preprocessing the data with TF
### Tabular data

In [1]:
import os

import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

import warnings
warnings.filterwarnings('ignore')

In [2]:
assert tf.__version__ == '2.0.0-alpha0'

In [3]:
! rm -r data

In [4]:
! ls; mkdir data;

img  Lesson5.ipynb  mnist_data


In [5]:
! cd data; ls

In [6]:
! cd data; wget http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/adult.data

--2019-11-14 22:56:33--  http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/adult.data
Resolving mlr.cs.umass.edu (mlr.cs.umass.edu)... 128.119.246.96
Connecting to mlr.cs.umass.edu (mlr.cs.umass.edu)[128.119.246.96]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3974305 (3,8M) [text/plain]
Saving to: “adult.data”


2019-11-14 22:56:38 (813 KB/s) - “adult.data” saved [3974305/3974305]



In [7]:
! cd data; wget http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/adult.test

--2019-11-14 22:56:38--  http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/adult.test
Resolving mlr.cs.umass.edu (mlr.cs.umass.edu)... 128.119.246.96
Connecting to mlr.cs.umass.edu (mlr.cs.umass.edu)[128.119.246.96]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2003153 (1,9M) [text/plain]
Saving to: “adult.test”


2019-11-14 22:56:42 (575 KB/s) - “adult.test” saved [2003153/2003153]



Initialize column names

In [8]:
cols = ["age",
"workclass",
"fnlwgt",
"education",
"education-num",
"marital-status",
"occupation",
"relationship",
"race",
"sex",
"capital-gain",
"capital-loss",
"hours-per-week",
"native-country",
"label"]

Load Pandas DataFrame

In [9]:
train_df = pd.read_csv("data/adult.data",header=None,index_col=None,names=cols)
test_df = pd.read_csv("data/adult.test",header=None,index_col=None,names=cols)
del train_df["fnlwgt"],test_df["fnlwgt"]

test_df = test_df.drop(test_df.index[0])

train_df["label"] = train_df["label"].apply(lambda x: 0 if x == " <=50K" else 1)
test_df["label"] = test_df["label"].apply(lambda x: 0 if x == " <=50K." else 1)

test_df["age"] = test_df["age"].apply(lambda x: int(x))
test_df["education-num"] = test_df["education-num"].apply(lambda x: int(x))

In [10]:
features = [c for c in cols[:-1] if c != "fnlwgt"]  # "fnlwgt" was dropped above
label = cols[-1]

In [11]:
train_df.head()

| | age | workclass | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | 0 |
| 1 | 50 | Self-emp-not-inc | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | 0 |
| 2 | 38 | Private | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | 0 |
| 3 | 53 | Private | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | 0 |
| 4 | 28 | Private | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | 0 |


In [12]:
train_df.dtypes

age                int64
workclass         object
education         object
education-num      int64
marital-status    object
occupation        object
relationship      object
race              object
sex               object
capital-gain       int64
capital-loss       int64
hours-per-week     int64
native-country    object
label              int64
dtype: object

Convert the `DataFrame` to a `tf.data.Dataset`

In [13]:
# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, label_name, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop(label_name)
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds

In [14]:
batch_size = 5 # A small batch size is used for demonstration purposes
train_ds = df_to_dataset(train_df, "label", batch_size=batch_size)

# Batch example
batch_example = next(iter(train_ds))[0]
batch_example

{'age': <tf.Tensor: id=28, shape=(5,), dtype=int32, numpy=array([28, 39, 40, 37, 32], dtype=int32)>,
 'workclass': <tf.Tensor: id=40, shape=(5,), dtype=string, numpy=
 array([b' Private', b' Private', b' Private', b' Private',
        b' Federal-gov'], dtype=object)>,
 'education': <tf.Tensor: id=31, shape=(5,), dtype=string, numpy=
 array([b' HS-grad', b' Prof-school', b' 10th', b' Prof-school',
        b' Assoc-voc'], dtype=object)>,
 'education-num': <tf.Tensor: id=32, shape=(5,), dtype=int32, numpy=array([ 9, 15,  6, 15, 11], dtype=int32)>,
 'marital-status': <tf.Tensor: id=34, shape=(5,), dtype=string, numpy=
 array([b' Married-civ-spouse', b' Married-civ-spouse', b' Divorced',
        b' Married-civ-spouse', b' Never-married'], dtype=object)>,
 'occupation': <tf.Tensor: id=36, shape=(5,), dtype=string, numpy=
 array([b' Craft-repair', b' Prof-specialty', b' Craft-repair',
        b' Prof-specialty', b' Prof-specialty'], dtype=object)>,
 'relationship': <tf.Tensor: id=38, shape=(5

Columns of dtype `object` hold strings, so we need to convert them to numeric data before they can be fed to the network as tensors (TensorFlow arrays).

In [15]:
sex_feature_column = tf.feature_column.categorical_column_with_vocabulary_list(
        key="sex",
        vocabulary_list=[" Male", " Female"])

In [16]:
# This function is just to test our parsing or preprocessing functions
def demo(feature_column,example_batch):
    feature_layer = tf.keras.layers.DenseFeatures(feature_column)
    print(feature_layer(example_batch))

In [17]:
sex_one_hot = tf.feature_column.indicator_column(sex_feature_column)
demo(sex_one_hot,batch_example)

tf.Tensor(
[[1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]], shape=(5, 2), dtype=float32)
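To make the output above less of a black box, here is a minimal plain-Python sketch of what a one-hot (indicator) encoding over a fixed vocabulary does. The helper name `one_hot` is hypothetical and this is not the TensorFlow implementation; in particular, mapping unknown values to an all-zero row is an assumption of this sketch.

```python
def one_hot(values, vocabulary):
    """Map each value to a 0/1 row with a single 1 at its vocabulary index."""
    index = {token: i for i, token in enumerate(vocabulary)}
    rows = []
    for value in values:
        row = [0.0] * len(vocabulary)
        if value in index:  # values outside the vocabulary stay all-zero in this sketch
            row[index[value]] = 1.0
        rows.append(row)
    return rows


# The raw strings keep their leading space, exactly as they appear in the CSV file.
batch = [" Male", " Female", " Male"]
encoded = one_hot(batch, vocabulary=[" Male", " Female"])
print(encoded)
```

Note that the vocabulary entries must match the raw strings exactly, which is why `" Male"` and `" Female"` carry a leading space in the cell above.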


Now we can do the same for all the different column types:

In [18]:
feature_columns = []

# numeric cols
for header in ["age", "capital-gain","capital-loss","hours-per-week",]:
    feature_columns.append(tf.feature_column.numeric_column(header))

In [19]:
# bucketized cols
age_buckets = tf.feature_column.bucketized_column(feature_columns[0], 
                                                  boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
feature_columns.append(age_buckets)
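As a rough illustration of what bucketizing does, the following plain-Python sketch maps an age to a bucket index with the same boundaries, treating each bucket as closed on the left and open on the right. The helper names `bucket_index` and `bucket_one_hot` are hypothetical; this is a sketch of the idea, not TensorFlow's code.

```python
from bisect import bisect_right

AGE_BOUNDARIES = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]


def bucket_index(value, boundaries):
    """Index of the bucket containing value: (-inf, b0), [b0, b1), ..., [bn, +inf)."""
    return bisect_right(boundaries, value)


def bucket_one_hot(value, boundaries):
    """One-hot row with len(boundaries) + 1 entries, one per bucket."""
    row = [0.0] * (len(boundaries) + 1)
    row[bucket_index(value, boundaries)] = 1.0
    return row


for age in (17, 18, 39, 70):
    print(age, bucket_index(age, AGE_BOUNDARIES))
```

Ten boundaries give eleven buckets, so the bucketized age contributes eleven inputs to the network instead of one.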

In [20]:
cat_features = ["workclass",
"education",
"education-num",
"marital-status",
"occupation",
"relationship",
"race",
"sex",
"native-country"]

In [21]:
def generate_cat_feature_columns(cat_feature_name, train_df, emb_dim=None):
    # Build the vocabulary from the values observed in the training set
    unique_vals = train_df[cat_feature_name].unique().tolist()
    feature_column = tf.feature_column.categorical_column_with_vocabulary_list(
        key=cat_feature_name,
        vocabulary_list=unique_vals)
    one_hot = tf.feature_column.indicator_column(feature_column)
    if emb_dim is None:
        # Only the one-hot (indicator) column is requested
        return one_hot
    # One-hot column plus a dense embedding of size emb_dim
    emb_feature = tf.feature_column.embedding_column(feature_column, dimension=emb_dim)
    return one_hot, emb_feature

In [22]:
generate_cat_feature_columns("sex",train_df,emb_dim=None)

IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='sex', vocabulary_list=(' Male', ' Female'), dtype=tf.string, default_value=-1, num_oov_buckets=0))

In [23]:
# indicator and embedding cols
for cat in cat_features:
    feature_columns = feature_columns + list(generate_cat_feature_columns(cat,train_df,emb_dim=8))

In [24]:
# crossed cols
# feature_columns[19] is the one-hot (indicator) column for "sex"
crossed_feature = tf.feature_column.crossed_column([age_buckets, sex_feature_column], hash_bucket_size=1000)
crossed_feature = tf.feature_column.indicator_column(crossed_feature)
feature_columns.append(crossed_feature)
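A crossed column combines two categorical features into one by hashing each pair of values into a fixed number of buckets. The sketch below shows the idea in plain Python; `crossed_bucket` and `crossed_one_hot` are hypothetical helpers, and `zlib.crc32` is only a stand-in for the hash, since TensorFlow uses its own fingerprint function and will produce different bucket ids.

```python
import zlib


def crossed_bucket(age_bucket, sex, hash_bucket_size=1000):
    """Hash the (age bucket, sex) pair into one of hash_bucket_size buckets."""
    key = f"{age_bucket}_X_{sex}".encode("utf-8")
    return zlib.crc32(key) % hash_bucket_size  # stand-in hash, not TensorFlow's


def crossed_one_hot(age_bucket, sex, hash_bucket_size=1000):
    """Indicator row for the crossed feature."""
    row = [0.0] * hash_bucket_size
    row[crossed_bucket(age_bucket, sex, hash_bucket_size)] = 1.0
    return row


print(crossed_bucket(4, " Male"), crossed_bucket(4, " Female"))
```

Because the pair is hashed, two different combinations can collide in the same bucket; a larger `hash_bucket_size` makes that less likely at the price of a wider input.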

Create a `Feature Layer`

In [25]:
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)

In [26]:
batch_size = 1000
train_ds = df_to_dataset(train_df, "label", batch_size=batch_size)

In [27]:
test_ds = df_to_dataset(test_df, "label", shuffle=False, batch_size=batch_size)

In [28]:
model = tf.keras.Sequential([
    feature_layer, ## First Layer of the network!!!
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_ds,
          #validation_data=val_ds,
          epochs=10)

Epoch 1/10




Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fea346fbf60>

In [29]:
loss, accuracy = model.evaluate(test_ds)
print("Accuracy", accuracy)

Accuracy 0.8233524


### Try to load the Titanic dataset in the same way
https://www.kaggle.com/c/titanic

### Image Data

In [30]:
cwd = os.getcwd()

In [31]:
data_root_orig = f"{cwd}/mnist_data"

In [32]:
all_image_paths_w_labels = []
for item in os.listdir(data_root_orig):
    for item2 in os.listdir(f"{data_root_orig}/{item}"):
        for image in os.listdir(f"{data_root_orig}/{item}/{item2}"):
            all_image_paths_w_labels.append([f"{data_root_orig}/{item}/{item2}/{image}",item,item2])

In [33]:
all_paths_df = pd.DataFrame(all_image_paths_w_labels,columns=["path","model_phase","label"])

In [34]:
all_paths_df.head()

Unnamed: 0,path,model_phase,label
0,/home/josem/Escritorio/cursBS2/Daivec-ML-cours...,train,4
1,/home/josem/Escritorio/cursBS2/Daivec-ML-cours...,train,4
2,/home/josem/Escritorio/cursBS2/Daivec-ML-cours...,train,4
3,/home/josem/Escritorio/cursBS2/Daivec-ML-cours...,train,4
4,/home/josem/Escritorio/cursBS2/Daivec-ML-cours...,train,4


In [35]:
train_df = all_paths_df[all_paths_df["model_phase"] == "train"]
test_df = all_paths_df[all_paths_df["model_phase"] == "test"]

In [36]:
def preprocess_image(image):
    # Decode the raw file contents into a single-channel (grayscale) tensor
    image = tf.image.decode_jpeg(image, channels=1)
    image = tf.image.resize(image, [28, 28])
    image /= 255.0  # scale pixel values to [0, 1]
    return image

def load_and_preprocess_image(path):
    image = tf.io.read_file(path)
    return preprocess_image(image)

def preprocess_label(label):
    label = tf.cast(label,tf.int64)
    label = tf.one_hot(label, 10)
    return label

In [37]:
train_ds_images = tf.data.Dataset.from_tensor_slices(
    train_df["path"].values.tolist()).map(load_and_preprocess_image)

train_ds_labels = tf.data.Dataset.from_tensor_slices(
    train_df["label"].values.astype(int).tolist()).map(preprocess_label)

train_ds = tf.data.Dataset.zip((train_ds_images, train_ds_labels))

train_ds

<ZipDataset shapes: ((28, 28, 1), (10,)), types: (tf.float32, tf.float32)>

In [38]:
test_ds_images = tf.data.Dataset.from_tensor_slices(
    test_df["path"].values.tolist()).map(load_and_preprocess_image)

test_ds_labels = tf.data.Dataset.from_tensor_slices(
    test_df["label"].values.astype(int).tolist()).map(preprocess_label)

test_ds = tf.data.Dataset.zip((test_ds_images, test_ds_labels))

test_ds

<ZipDataset shapes: ((28, 28, 1), (10,)), types: (tf.float32, tf.float32)>

In [39]:
BATCH_SIZE = 1000
AUTOTUNE = tf.data.experimental.AUTOTUNE

# Setting a shuffle buffer size as large as the dataset ensures that the data is
# completely shuffled.
ds = train_ds.shuffle(buffer_size=60000)
ds = ds.repeat()
ds = ds.batch(BATCH_SIZE)
# `prefetch` lets the dataset fetch batches in the background while the model is training.
ds = ds.prefetch(buffer_size=AUTOTUNE)
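To see why the buffer size matters, here is a small plain-Python sketch of a buffered shuffle followed by batching. It mimics the general idea (fill a buffer, emit a random element, refill the slot with the next item); the generator names `buffered_shuffle` and `batched` are hypothetical and the details of `tf.data` may differ.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]        # emit a random buffered element
        buffer[idx] = item       # and put the new item in its slot
    rng.shuffle(buffer)          # drain what is left at the end
    yield from buffer


def batched(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


for batch in batched(buffered_shuffle(range(10), buffer_size=4, seed=0), 3):
    print(batch)
```

With `buffer_size=1` the order is unchanged, and with a small buffer an element can only move a limited distance from its original position; only a buffer as large as the dataset gives a uniform shuffle.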

In [40]:
image_batch, label_batch = next(iter(ds))

In [41]:
label_batch.numpy()[:10]

array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]], dtype=float32)

In [42]:
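model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32,3),
    tf.keras.layers.Conv2D(32,3),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(32, activation='relu'),
    # One sigmoid unit per digit; for mutually exclusive classes, 'softmax' with
    # 'categorical_crossentropy' is the more common choice.
    tf.keras.layers.Dense(10, activation='sigmoid')
])

# Caveat: paired with 'binary_crossentropy', Keras usually resolves 'accuracy' to a
# per-output binary accuracy, which can read higher than per-image accuracy.
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(ds,
          epochs=3,
          steps_per_epoch=60000 // BATCH_SIZE)  # integer number of batches per epoch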

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7fe9fc70e710>

In [43]:
# The same pipeline is reused for the test set. Shuffling and repeating are not
# needed for evaluation; `steps` in `evaluate` limits it to a single pass.
ds = test_ds.shuffle(buffer_size=60000)
ds = ds.repeat()
ds = ds.batch(BATCH_SIZE)
# `prefetch` lets the dataset fetch batches in the background while the model is evaluated.
ds = ds.prefetch(buffer_size=AUTOTUNE)

In [44]:
loss, accuracy = model.evaluate(ds, steps=10000 // BATCH_SIZE)
print("Accuracy", accuracy)

Accuracy 0.9915701
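A note on reading this number: with a 10-unit sigmoid output and `binary_crossentropy`, the `'accuracy'` metric is, as far as we know, usually computed per output unit rather than per image. The plain-Python sketch below uses two made-up predictions to show how far those two notions can drift apart; the function names and the toy values are hypothetical and do not come from the model above.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of individual outputs on the right side of the threshold."""
    hits = total = 0
    for t_row, p_row in zip(y_true, y_prob):
        for t, p in zip(t_row, p_row):
            hits += int((p > threshold) == (t == 1.0))
            total += 1
    return hits / total


def categorical_accuracy(y_true, y_prob):
    """Fraction of examples whose highest-scoring class is the true class."""
    hits = 0
    for t_row, p_row in zip(y_true, y_prob):
        hits += int(t_row.index(max(t_row)) == p_row.index(max(p_row)))
    return hits / len(y_true)


def one_hot_row(label, n=10):
    return [1.0 if i == label else 0.0 for i in range(n)]


y_true = [one_hot_row(3), one_hot_row(7)]
# First image: confident and correct. Second image: the model prefers class 2 over class 7.
y_prob = [
    [0.9 if i == 3 else 0.1 for i in range(10)],
    [0.6 if i == 2 else 0.4 if i == 7 else 0.1 for i in range(10)],
]

print("binary accuracy:     ", binary_accuracy(y_true, y_prob))
print("categorical accuracy:", categorical_accuracy(y_true, y_prob))
```

In this toy case only one of the two images is classified correctly, yet 18 of the 20 individual outputs are on the right side of 0.5, so the per-output figure looks much better than the per-image one.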


### Try to do the same with the Fashion-MNIST dataset
https://www.kaggle.com/zalando-research/fashionmnist

## TF Datasets

In [45]:
!pip install tensorflow-datasets



In [46]:
import tensorflow_datasets as tfds

# See all registered datasets
tfds.list_builders()

['abstract_reasoning',
 'bair_robot_pushing_small',
 'caltech101',
 'cats_vs_dogs',
 'celeb_a',
 'celeb_a_hq',
 'chexpert',
 'cifar10',
 'cifar100',
 'cifar10_corrupted',
 'cnn_dailymail',
 'coco2014',
 'colorectal_histology',
 'colorectal_histology_large',
 'cycle_gan',
 'diabetic_retinopathy_detection',
 'dsprites',
 'dtd',
 'dummy_dataset_shared_generator',
 'dummy_mnist',
 'emnist',
 'fashion_mnist',
 'flores',
 'glue',
 'groove',
 'higgs',
 'horses_or_humans',
 'image_label_folder',
 'imagenet2012',
 'imagenet2012_corrupted',
 'imdb_reviews',
 'iris',
 'kmnist',
 'lm1b',
 'lsun',
 'mnist',
 'moving_mnist',
 'multi_nli',
 'nsynth',
 'omniglot',
 'open_images_v4',
 'oxford_flowers102',
 'oxford_iiit_pet',
 'para_crawl',
 'quickdraw_bitmap',
 'rock_paper_scissors',
 'shapes3d',
 'smallnorb',
 'squad',
 'starcraft_video',
 'sun397',
 'svhn_cropped',
 'ted_hrlr_translate',
 'ted_multi_translate',
 'tf_flowers',
 'titanic',
 'ucf101',
 'voc2007',
 'wikipedia',
 'wmt15_translate',
 'wmt1

### Restart the kernel

In [47]:
import os

import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

import warnings
warnings.filterwarnings('ignore')

## Visualization methods for High Dimensional Data
We want to find the "latent" features in our data. There are two main families of techniques:
* **Matrix factorization**: PCA (Principal Component Analysis), linear autoencoders, etc.
* **Neighbour graphs**: Isomap, t-SNE, UMAP

We are going to use the MNIST dataset, whose images have 28×28 = 784 dimensions, as our initial example, so we can see how the following algorithms perform:

In [48]:
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

### PCA

In [49]:
from sklearn.decomposition import PCA

In [50]:
x_train.shape

(60000, 28, 28)

In [51]:
data = tf.reshape(x_train,[60000,-1]).numpy()

In [52]:
pca = PCA(n_components=2)
pca.fit(data)  

PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)
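To give an intuition of what PCA looks for, the sketch below finds the first principal component of a tiny 2-D dataset in plain Python, using power iteration on the covariance matrix. This is only an illustration under our own assumptions: the function names are hypothetical, the toy data is made up, and scikit-learn relies on an SVD rather than on this procedure.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (means, unit vector) for the direction of largest variance."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix of the centered data
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    v = [1.0] * dim  # starting guess; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows, means, component):
    """1-D coordinates of each row along the component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(means)))
            for r in rows]


points = [[float(x), 2.0 * x] for x in range(-5, 6)]  # all points lie on y = 2x
means, pc1 = first_principal_component(points)
print(pc1)
print(project(points, means, pc1)[:3])
```

Since every toy point lies on the line y = 2x, the component found is parallel to that line, and projecting onto it keeps all the variance with a single coordinate.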

In [53]:
plot_data = np.concatenate([pca.transform(data),y_train.reshape(-1,1)],axis=1)[:2000,:]

In [54]:
!pip install plotly



In [55]:
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go

In [56]:
trace0 = go.Scatter(
    x = plot_data[:,0],
    y = plot_data[:,1],
    name = "class",
    hoveron = "points",
    mode = 'markers',
#     text = Target.unique(),
    showlegend = False,
    marker = dict(
        size = 8,
        color = plot_data[:,2],
        colorscale ='Jet',
        showscale = False,
        line = dict(
            width = 2,
            color = 'rgb(255, 255, 255)'
        ),
        opacity = 0.8
    )
)
data = [trace0]

layout = dict(title = 'PCA (Principal Component Analysis)',
              hovermode= 'closest',
              yaxis = dict(zeroline = False),
              xaxis = dict(zeroline = False),
              showlegend= True
             )

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='styled-scatter')

### t-SNE

In [57]:
!pip install cmake --upgrade

Requirement already up-to-date: cmake in /home/josem/my_envs/fastai/lib/python3.6/site-packages (3.13.3)


In [58]:
!pip install MulticoreTSNE



In [59]:
data = tf.reshape(x_train,[60000,-1]).numpy()

In [60]:
try:
    # Prefer the multicore implementation when it is installed
    from MulticoreTSNE import MulticoreTSNE as TSNE
    tsne = TSNE(n_jobs=4)
except ImportError:
    # Fall back to scikit-learn's implementation
    from sklearn.manifold import TSNE
    tsne = TSNE()

Y = tsne.fit_transform(data)

In [61]:
plot_data = np.concatenate([Y,y_train.reshape(-1,1)],axis=1)[:2000,:]

In [62]:
trace0 = go.Scatter(
    x = plot_data[:,0],
    y = plot_data[:,1],
    name = "class",
    hoveron = "points",
    mode = 'markers',
#     text = Target.unique(),
    showlegend = False,
    marker = dict(
        size = 8,
        color = plot_data[:,2],
        colorscale ='Jet',
        showscale = False,
        line = dict(
            width = 2,
            color = 'rgb(255, 255, 255)'
        ),
        opacity = 0.8
    )
)
data = [trace0]

layout = dict(title = 'T-SNE (T-distributed Stochastic Neighbor Embedding)',
              hovermode= 'closest',
              yaxis = dict(zeroline = False),
              xaxis = dict(zeroline = False),
              showlegend= True
             )

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='styled-scatter')

### UMAP

In [63]:
data = tf.reshape(x_train,[60000,-1]).numpy()

In [64]:
!pip install umap-learn



In [65]:
import umap

In [66]:
embedding = umap.UMAP(n_neighbors=10,
                      min_dist=0.1,
                      metric='correlation').fit_transform(data)

In [67]:
plot_data = np.concatenate([embedding,y_train.reshape(-1,1)],axis=1)[:2000,:]

In [68]:
trace0 = go.Scatter(
    x = plot_data[:,0],
    y = plot_data[:,1],
    name = "class",
    hoveron = "points",
    mode = 'markers',
#     text = Target.unique(),
    showlegend = False,
    marker = dict(
        size = 8,
        color = plot_data[:,2],
        colorscale ='Jet',
        showscale = False,
        line = dict(
            width = 2,
            color = 'rgb(255, 255, 255)'
        ),
        opacity = 0.8
    )
)
data = [trace0]

layout = dict(title = 'UMAP (Uniform Manifold Approximation and Projection)',
              hovermode= 'closest',
              yaxis = dict(zeroline = False),
              xaxis = dict(zeroline = False),
              showlegend= True
             )

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='styled-scatter')

### Which method performs best on the Titanic dataset? And which one is the most explainable?

## Introduction to DNNs

The simplest NN is the perceptron:
<img width="50%" src="img/perceptron.png">
$$\widehat{y} = \sigma(w_0 + X^tW)$$

We can use a wide variety of activation functions, such as the sigmoid, the hyperbolic tangent or the ReLU. Activation functions introduce non-linearities into our function.
$$\sigma(z) = \frac{1}{1+e^{-z}}$$
$$\sigma'(z) = \sigma(z) (1-\sigma(z))$$
`tf.nn.sigmoid(z)`
$$\tanh(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$$
$$\tanh'(z) = 1- \tanh^2(z)$$
`tf.nn.tanh(z)`

$$\mathrm{ReLU}(z) = \max(0,z)$$
$$\mathrm{ReLU}'(z) = 0 \;\;\text{if}\;\; z < 0 \;\;\text{else}\;\; 1$$
`tf.nn.relu(z)`
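The three activations and their derivatives can be checked numerically with a few lines of plain Python. This is only a sketch for intuition (in the lesson we rely on the `tf.nn` functions listed above); the helper `numerical_derivative` is a hypothetical name for a central finite difference, and the ReLU derivative at exactly 0 follows the convention of the formula above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_prime(z):
    return 1.0 - math.tanh(z) ** 2


def relu(z):
    return max(0.0, z)


def relu_prime(z):
    return 0.0 if z < 0 else 1.0


def numerical_derivative(f, z, eps=1e-6):
    """Central finite difference, used only to check the formulas."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


for z in (-1.3, 0.7):
    print(z,
          sigmoid_prime(z) - numerical_derivative(sigmoid, z),
          tanh_prime(z) - numerical_derivative(math.tanh, z),
          relu_prime(z) - numerical_derivative(relu, z))
```

All printed differences should be close to zero, which confirms that the closed-form derivatives match the slopes of the functions.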


Stacking perceptrons both in width and in depth, we obtain Deep Neural Networks built from so-called dense layers. The forward propagation formula is the following:
<img width="50%" src="img/dnn.png">

\begin{eqnarray} 
  z^{l}_j = \sigma\left( \sum_k w^{l}_{jk} z^{l-1}_k + b^l_j \right),
\tag{23}\end{eqnarray}
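The forward propagation formula translates almost literally into code. Below is a minimal plain-Python sketch of a dense layer and of a small 2-3-1 network; the function `dense` and all the weights are made up for illustration, and `tf.keras.layers.Dense` does the same job with tensors.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def dense(inputs, weights, biases, activation=sigmoid):
    """One dense layer: z_j = activation(sum_k w_jk * inputs_k + b_j)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical 2-3-1 network with hand-picked weights
x = [1.0, 2.0]
hidden = dense(x,
               weights=[[0.5, -0.25], [0.1, 0.2], [-0.3, 0.4]],
               biases=[0.0, 0.1, -0.1])
output = dense(hidden, weights=[[1.0, -1.0, 0.5]], biases=[0.0])
print(hidden)
print(output)
```

Each row of `weights` holds the incoming weights of one neuron, so a layer with three neurons and two inputs is a 3×2 matrix, matching the indices $j$ and $k$ of the formula.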

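Since the neural network is randomly initialized, that is, the weights $w^l_{jk}$ start out random, how can we minimize the error of the network with respect to those weights? First we need a cost function. Here is an example known as the MSE, where $y(x)$ is the target for input $x$, $z^L(x)$ is the output of the last layer and $n$ is the number of training examples: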
\begin{eqnarray}
  C = \frac{1}{2n} \sum_x \|y(x)-z^L(x)\|^2,
\tag{26}\end{eqnarray}

To optimise those coefficients, we are going to use the so-called backpropagation algorithm. Writing $u^l_j = \sum_k w^{l}_{jk} z^{l-1}_k + b^l_j$ for the weighted input of neuron $j$ in layer $l$ (so that $z^l_j = \sigma(u^l_j)$) and $\delta^l_j = \partial C / \partial u^l_j$ for its error, the four equations are:

\begin{eqnarray} 
  \delta^L = \nabla_{z^L} C \odot \sigma'(u^L).
\tag{BP1}\end{eqnarray}

\begin{eqnarray} 
  \delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(u^l),
\tag{BP2}\end{eqnarray}

\begin{eqnarray}  \frac{\partial C}{\partial b^l_j} =
  \delta^l_j.
\tag{BP3}\end{eqnarray}

\begin{eqnarray}
  \frac{\partial C}{\partial w^l_{jk}} = z^{l-1}_k \delta^l_j.
\tag{BP4}\end{eqnarray}

With the following update rules we descend in the steepest direction to find the weights and biases that minimize the cost function (this is the Gradient Descent method). Here $\eta$ is the learning rate, $m$ is the number of examples $x$ in the batch, and $z^{x,l-1}$ are the activations feeding layer $l$:
$$w^l \rightarrow
  w^l-\frac{\eta}{m} \sum_x \delta^{x,l} (z^{x,l-1})^T$$
  
$$b^l \rightarrow b^l-\frac{\eta}{m}
  \sum_x \delta^{x,l}$$
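As a sanity check of the formulas, here is a minimal plain-Python sketch of backpropagation and Gradient Descent for a tiny sigmoid network trained on the logical OR. Everything in it (function names, network size, learning rate, toy data) is an assumption chosen for illustration, and it takes the layer error with respect to the weighted input of each neuron, using $\sigma' = \sigma(1-\sigma)$.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def forward(weights, biases, x):
    """Return the activations of every layer, starting with the input itself."""
    activations = [x]
    for W, b in zip(weights, biases):
        prev = activations[-1]
        activations.append([
            sigmoid(sum(w * z for w, z in zip(row, prev)) + b_j)
            for row, b_j in zip(W, b)
        ])
    return activations


def backprop(weights, biases, x, y):
    """Gradients of C = 0.5 * ||y - output||^2 for a single example."""
    acts = forward(weights, biases, x)
    # BP1, with sigma' written as z * (1 - z)
    delta = [(z - t) * z * (1.0 - z) for z, t in zip(acts[-1], y)]
    grads_w, grads_b = [], []
    for l in range(len(weights) - 1, -1, -1):
        prev = acts[l]  # activations that feed this layer
        grads_w.insert(0, [[d * z for z in prev] for d in delta])  # BP4
        grads_b.insert(0, delta)                                   # BP3
        if l > 0:
            # BP2: push the error back through this layer's weights
            delta = [
                sum(weights[l][j][k] * delta[j] for j in range(len(delta)))
                * prev[k] * (1.0 - prev[k])
                for k in range(len(prev))
            ]
    return grads_w, grads_b


def cost(weights, biases, data):
    """MSE cost C = 1/(2n) * sum over examples of ||y - output||^2."""
    total = 0.0
    for x, y in data:
        out = forward(weights, biases, x)[-1]
        total += sum((t - o) ** 2 for t, o in zip(y, out))
    return total / (2 * len(data))


def gradient_step(weights, biases, batch, eta):
    """One in-place Gradient Descent update, averaged over the batch."""
    m = len(batch)
    grads = [backprop(weights, biases, x, y) for x, y in batch]
    for grads_w, grads_b in grads:
        for l in range(len(weights)):
            for j in range(len(weights[l])):
                biases[l][j] -= eta / m * grads_b[l][j]
                for k in range(len(weights[l][j])):
                    weights[l][j][k] -= eta / m * grads_w[l][j][k]


random.seed(0)
weights = [[[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)],
           [[random.uniform(-1, 1) for _ in range(3)]]]
biases = [[0.0] * 3, [0.0]]
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]  # logical OR

cost_before = cost(weights, biases, data)
for _ in range(1000):
    gradient_step(weights, biases, data, eta=1.0)
cost_after = cost(weights, biases, data)
print(cost_before, cost_after)
```

The cost after training should be clearly lower than before. All gradients are computed before any weight is changed, so each step uses the average gradient of the whole batch, as in the update rules.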

TensorFlow computes all of these formulas for us, so we only need to define the model and train it. Here you can see what the loss function of a real NN looks like:

<img width="50%" src="img/nn_loss.png">

Since such a surface has many local minima, plain Gradient Descent may not be the best technique we can use.

This opens the door to Stochastic Gradient Descent methods, adaptive learning-rate algorithms, and regularization techniques such as Dropout or early stopping.