<a href="https://colab.research.google.com/github/axel-sirota/introduction-to-ml-course/blob/main/Day3/Neural_Nets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Networks and Deep Learning

© Data Trainers LLC. GPL v 3.0.

Author: Axel Sirota

## Basic Networks

A neural network is an algorithm that passes its input through a sequence of layers, each performing a linear operation followed by a non-linear one. The result is a high-capacity model that we will use, at first, for classification. We can later adapt it to other uses, but the main idea here will be classification.

<img src="https://www.dropbox.com/scl/fi/hkcsitf300tmvcqq4vnxw/neural.jpg?rlkey=x75rh7cdx73d9vfztlv2ed2hq&raw=1"  align="center"/>

Mathematically, it is quite simple. Each circle, or neuron, performs the following operation:

$$
z_k = f\left(\sum_i x_i w_{i}^{k} + b_k\right)
$$

Let's dissect this formula. $x_i$ is the $i$-th entry of the input tensor $X$, $w_{i}^{k}$ is the weight for dimension $i$ and neuron $k$, $b_k$ is the bias of neuron $k$, and $f$ is a non-linear activation function. If we collect all the neurons of a layer, we get a matrix multiplication of the tensor $X$ with the weights $W$, plus a bias term $b$ that is normally initialized to 0.
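To make the matrix form concrete, here is a minimal NumPy sketch of one fully connected layer. The sizes, the constant weights and the choice of ReLU as $f$ are assumptions for illustration only; they are not taken from the model we build later.

```python
import numpy as np

def relu(z):
    """A common choice for the non-linearity f."""
    return np.maximum(z, 0.0)

def dense_forward(X, W, b, f):
    """Apply one fully connected layer: f(X @ W + b)."""
    return f(X @ W + b)

# Toy sizes (hypothetical): 2 samples, 3 input dimensions, 4 neurons.
X = np.array([[1.0, 2.0, 3.0],
              [0.0, -1.0, 0.5]])
W = np.full((3, 4), 0.1)   # column k holds the weights of neuron k
b = np.zeros(4)            # biases start at 0

Z = dense_forward(X, W, b, relu)
print(Z.shape)  # one row per sample, one column per neuron
```

Each row of `Z` is the output of all four neurons for one sample, which is why a whole layer reduces to a single matrix multiplication.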

Training is the process of updating the weights $W$ of each layer to minimize the loss.
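As a rough illustration of what "updating the weights to minimize the loss" means, the sketch below runs plain gradient descent on a single weight. The data, the mean squared error loss and the learning rate are all made up for this example; Keras uses more elaborate optimizers such as Adam, but the idea is the same.

```python
# Toy data generated by y = 2x; the "model" is a single weight w, no bias.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def loss(w):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0
learning_rate = 0.05   # hypothetical value chosen for this toy problem
losses = []
for step in range(50):
    losses.append(loss(w))
    w -= learning_rate * grad(w)   # move against the gradient

print(w, losses[0], losses[-1])
```

The loss shrinks at every step and `w` approaches 2, the value that generated the data.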

<img src="https://www.dropbox.com/scl/fi/1dluzgxbb3bqiqz4fiwsi/training.jpg?rlkey=m9sdn15i8jxzutw5vr6facvey&raw=1"  align="center"/>

### Getting the data

We are going to use one of the public datasets already parsed by TensorFlow: the IMDB reviews dataset.

In [None]:
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense, Lambda, Embedding, Dropout
import keras.backend as K
from sklearn.model_selection import train_test_split
from tensorflow.nn import leaky_relu
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

max_features=15000
epochs = 25
batch_size = 256
embedding_dim = 100

In [None]:
(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=max_features)
print(len(x_train), "Training sequences")
print(len(x_val), "Validation sequences")

We first need to ensure our input tensor is rectangular, so we need to calculate its width: the length of the longest review.

In [None]:
def get_maximum_review_length(X):
    maximum = 0
    for tokenized_review in X:
        candidate = len(tokenized_review)
        if candidate > maximum:
            maximum = candidate
    return maximum


maxlen = max(get_maximum_review_length(x_train), get_maximum_review_length(x_val))

In [None]:
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen, padding='post')
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen, padding='post')

In [None]:
x_val, x_test, y_val, y_test = train_test_split(x_val, y_val, test_size=0.33, random_state=42)

In [None]:
x_train.shape, x_val.shape, x_test.shape

Notice that all our datasets have the same width, so we can feed them into TensorFlow. Most entries will contain many 0s, and that is OK: we are going to use a layer called Embedding, which can be told to treat 0 as padding (through its `mask_zero` argument).
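To see what post-padding does, here is a small pure-Python sketch. `pad_post` is a hypothetical helper written for this illustration, not the Keras function, and it assumes no sequence is longer than `maxlen` (which holds here because `maxlen` is the longest review).

```python
def pad_post(sequences, maxlen, value=0):
    """Right-pad every sequence with `value` until it has length `maxlen`."""
    padded = []
    for seq in sequences:
        if len(seq) > maxlen:
            raise ValueError("sequence longer than maxlen")
        padded.append(list(seq) + [value] * (maxlen - len(seq)))
    return padded

# Three made-up tokenized reviews of different lengths.
reviews = [[1, 14, 22], [1, 5], [1, 9, 3, 7]]
maxlen = max(len(r) for r in reviews)

padded = pad_post(reviews, maxlen)
print(padded)
```

Every row now has the same length, so the list can be turned into a rectangular tensor.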

In [None]:

x_train[0]

### Training

Our model will be very simple: an Embedding layer followed by several Dense layers, which are the fully connected layers we learned about before. An important detail is the Lambda layer: **can you guess why it is there?**

In [None]:
model = Sequential()
model.add(Embedding(input_dim=max_features, output_dim=embedding_dim, input_length=maxlen))
model.add(Dense(50, activation=leaky_relu))
model.add(Lambda(lambda x: K.mean(x, axis=1)))
model.add(Dense(25, activation='relu'))
model.add(Dropout(rate=0.15))
model.add(Dense(1))

As our model does not use an activation function in its last layer, it outputs logits, so we must set **from_logits=True**.
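The sketch below illustrates why this matters: computing binary cross-entropy directly from a logit gives the same value as first applying a sigmoid and then using the probability form. The logit formula used is a standard numerically stable variant; I am not claiming it is line for line what Keras runs internally, and the logit and label are made-up values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(y, p):
    """Binary cross-entropy when the model outputs a probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(y, z):
    """Binary cross-entropy when the model outputs a raw logit z.

    Stable form: max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

logit = 1.5   # hypothetical raw output of the last Dense(1) layer
label = 1

loss_from_probability = bce_from_probability(label, sigmoid(logit))
loss_from_logit = bce_from_logit(label, logit)
print(loss_from_probability, loss_from_logit)
```

If the loss were told to expect probabilities while the model emits logits, it would be evaluating the wrong quantity, which is why the flag must match the last layer.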

In [None]:
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer='adam', metrics=['accuracy'])
model.summary()

In [None]:
callback = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=5, min_delta=0.01, mode='max')
history = model.fit(x=x_train, y=y_train, epochs=epochs, batch_size=batch_size, validation_data=(x_val, y_val), workers=5, callbacks=[callback])


In [None]:
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.ylim([0, max(plt.ylim())])
plt.xlabel('Epoch #')
plt.ylabel('Binary cross-entropy')
plt.legend()

In [None]:
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.ylim([0, max(plt.ylim())])
plt.xlabel('Epoch #')
plt.ylabel('Accuracy')
plt.legend()

In [None]:
model.evaluate(x_test, y_test)

We did well! **How would you evaluate the model on a review of your own?** This means: write a review, convert it to the corresponding Tensor, and use the `predict` method to predict whether it is positive or not.

### Now you do it
<img src="https://www.dropbox.com/scl/fi/s9kv1dytq4qzr8g19y3r0/hands_on.jpg?rlkey=yz8kq22sfdgc7lsgmm1e0fksr&raw=1" width="100" height="100" align="right"/>

1) Do the exercise above on evaluating our model

2) Use the Reuters news dataset from Keras with `keras.datasets.reuters.load_data()` and try to replicate what we did, test a new model and see what accuracy you get!

Hint: You will find a surprise in the middle, but you know how to handle it!

Hint 2: You may want to use the `CategoricalCrossentropy` loss.

## Handling complex datasets

This time we will do as above, but instead of using an already processed dataset, which is rare in practice, we will use a free-text dataset of news headlines and their categories, and we will predict the category.

### Getting the data

In [None]:
!pip install 'gensim==4.2.0' swifter

In [None]:
import multiprocessing
import warnings
import nltk
import swifter
import gensim
from keras.initializers import Constant

embedding_dim = 300
epochs=100
batch_size = 250
corpus_size=25000

def set_session_with_gpus_and_cores():
  cores = multiprocessing.cpu_count()
  gpus = len(tf.config.list_physical_devices('GPU'))
  config = tf.compat.v1.ConfigProto( device_count = {'GPU': gpus  , 'CPU': cores} , intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
  sess = tf.compat.v1.Session(config=config)
  K.set_session(sess)

set_session_with_gpus_and_cores()
warnings.filterwarnings('ignore')
nltk.download('punkt')

In [None]:
%%writefile get_data.sh
if [ ! -f news.csv ]; then
  wget -O news.csv https://www.dropbox.com/s/352x7xzivf60zgc/news.csv?dl=0
fi


In [None]:
!bash get_data.sh

In [None]:
path = './news.csv'
news_pre = pd.read_csv(path, header=0).sample(n=corpus_size).reset_index(drop=True)

In [None]:
news_pre.head()

As you can see, this dataset consists of text, not numbers, so we need to do that mapping ourselves and be diligent about it. The first step in NLP is always to preprocess the text into tokens, in this case words.
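To get a feeling for what tokenization produces, here is a rough pure-Python stand-in. It is not gensim's implementation: `simple_tokens` is a hypothetical helper, and the length limits of 2 and 15 characters reflect my understanding of the defaults of `gensim.utils.simple_preprocess`, so treat them as an assumption.

```python
import re

def simple_tokens(text, min_len=2, max_len=15):
    """Lowercase, keep alphabetic runs, drop very short or very long tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if min_len <= len(w) <= max_len]

# A made-up headline.
tokens = simple_tokens("Oil prices hit a 3-month high!")
print(tokens)
```

Punctuation, digits and one-letter words disappear, and what remains is a list of lowercase words.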

In [None]:
def preprocess_text(text, should_join=True):
    # Here you can add more magic
    if should_join:
      return ' '.join(gensim.utils.simple_preprocess(text))
    else:
      return gensim.utils.simple_preprocess(text)

We will use swifter, since it is very useful for speeding up Pandas `apply` with multiprocessing.

In [None]:
news = news_pre.title.swifter.apply(preprocess_text)

### Creating a word2vec model and the initialization Tensor

As we said, we need to create a Tensor such that, for every sentence in a batch and for every word in that sentence, we get an ID representing that word. This will be a rectangular tensor (because we pad it), and it will be the input to the Embedding layer, which then learns the best 300-dimensional (`embedding_dim`) representation of each word.
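A toy version of that ID tensor can be sketched as follows. The vocabulary and sentences are made up for illustration; in this notebook the word-to-index mapping comes from the word2vec model instead.

```python
import numpy as np

# Hypothetical toy vocabulary standing in for the real word-to-index mapping.
key_to_index = {"the": 0, "oil": 1, "prices": 2, "team": 3, "wins": 4}

sentences = [["oil", "prices", "rise"], ["team", "wins"]]
max_len = max(len(s) for s in sentences)

# One row per sentence, one column per word position, padded with 0.
ids = np.zeros((len(sentences), max_len), dtype=np.int64)
for row, sentence in enumerate(sentences):
    for col, word in enumerate(sentence):
        ids[row, col] = key_to_index.get(word, 0)  # unknown words -> 0

print(ids)
```

Note that in this toy version 0 doubles as the padding value, the code for unknown words, and the index of the first vocabulary word. That is the case for any mapping that starts counting at 0.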

In [None]:
class MyCorpus:
    """An iterator that yields sentences (lists of str)."""

    def __iter__(self):
        corpus_path = 'news.csv'
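        # Open the file in a context manager so it is closed after each pass
        with open(corpus_path) as corpus_file:
            for line in corpus_file:
                # assume there's one document per line, tokens separated by whitespace
                yield preprocess_text(line, should_join=False)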

import gensim.models

sentences = MyCorpus()
word2vec = gensim.models.Word2Vec(sentences=sentences, vector_size=embedding_dim)
word2vec_model = word2vec.wv

That's it! gensim is very useful for quickly creating this mapping from word to index.

In [None]:
weights = tf.constant(word2vec_model.vectors)    # -> Initial weights for the Embedding layer (kept trainable below)
vocab_size = len(word2vec_model.index_to_key)

In [None]:
weights.shape


If you check the shape, you will see that for every one of the 12342 words it has seen, we get a 300-dimensional (in this case) representation.
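Conceptually, an embedding lookup is just row indexing into such a weight matrix. The NumPy sketch below uses tiny made-up sizes and random weights rather than the real word2vec vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 3                        # toy sizes, not the notebook's
weights = rng.normal(size=(vocab_size, dim))  # one row per word

# A padded batch of word IDs: 2 sentences, 3 positions each.
ids = np.array([[1, 2, 0],
                [3, 4, 0]])

embedded = weights[ids]   # pick the row of `weights` for every ID
print(embedded.shape)     # (sentences, positions, dim)
```

Every ID is replaced by its vector, so a 2D tensor of IDs becomes a 3D tensor of embeddings.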

In [None]:
news_preprocessed = pd.DataFrame()
news_preprocessed['label'] = news_pre.category.map({'Business': 0, 'Sports': 1, 'Sci/Tech': 2, 'World': 3})
news_preprocessed['title'] = news
news_preprocessed

In [None]:
def get_maximum_review_length(df):
    maximum = 0
    for ix, row in df.iterrows():
        candidate = len(preprocess_text(row.title, should_join=False))
        if candidate > maximum:
            maximum = candidate
    return maximum

In [None]:
maximum = get_maximum_review_length(news_preprocessed)
maximum

Here we do what we said above: iterate through the news DataFrame and, for every word that exists in the word2vec model, store the index of its embedding (see `key_to_index`) in X at the position of that headline and that word. Words that are not in the model get a 0.


In [None]:
X = np.zeros((len(news_preprocessed), maximum))
for index, row in news_preprocessed.iterrows():
  ix = 0
  for word in preprocess_text(row.title, should_join=False):
    if word in word2vec_model.key_to_index:    # If the word exists in the word2vec embedding
      representation = word2vec_model.key_to_index[word]    # use the index
    else:
      representation = 0    # otherwise put a 0
    X[index, ix] = representation
    ix+= 1
y = news_preprocessed.label

In [None]:
X[0]

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train = tf.constant(X_train)
X_test = tf.constant(X_test)
y_train = tf.one_hot(tf.constant(y_train), 4)  # 4 Categories
y_test = tf.one_hot(tf.constant(y_test), 4)    # 4 Categories

### Training

In [None]:
model = Sequential()
model.add(Embedding(input_dim=weights.shape[0], output_dim=embedding_dim, input_length=maximum, embeddings_initializer=Constant(weights), trainable=True))
model.add(Dense(100, activation=leaky_relu))
model.add(Dense(50, activation='relu'))
model.add(Lambda(lambda x: K.mean(x, axis=1)))  # average over the sequence axis -> (batch, 50)
model.add(Dense(50, activation=leaky_relu))
model.add(Dense(4))

Notice that we pass the weights to the Embedding layer and set all the other parameters as usual. Next we compile the model. Since we have several classes, we must use **CategoricalCrossentropy**, as you have seen in the exercise, and we set **from_logits=True**.
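As a rough sketch of what this loss computes, the NumPy code below one-hot encodes integer labels and evaluates categorical cross-entropy from logits through a softmax. The labels and logits are made up, and the helper functions are hypothetical illustrations rather than the Keras implementation.

```python
import numpy as np

def one_hot(labels, depth):
    """Turn integer labels into rows with a single 1."""
    return np.eye(depth)[labels]

def softmax(logits):
    """Convert logits to probabilities, row by row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy_from_logits(y_true, logits):
    """Mean over samples of -sum(y_true * log(softmax(logits)))."""
    probs = softmax(logits)
    return -(y_true * np.log(probs)).sum(axis=1).mean()

labels = np.array([0, 2])                    # two samples, 4 possible classes
logits = np.array([[2.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0]])    # hypothetical model outputs

y_true = one_hot(labels, 4)
loss = categorical_crossentropy_from_logits(y_true, logits)
print(y_true)
print(loss)
```

The second sample has equal logits for all four classes, so its contribution to the loss is $\log 4$, the loss of a uniform guess.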

In [None]:
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True), optimizer='adam', metrics=['accuracy'])
model.summary()

In [None]:
callback = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=5, min_delta=0.01, mode='max')
history = model.fit(x=X_train, y=y_train, epochs=epochs, batch_size=batch_size, validation_data=(X_test, y_test), workers=5, callbacks=[callback])


In [None]:
import matplotlib.pyplot as plt

# Helper for plotting a training metric and, optionally, its validation counterpart
def plot_metrics(train_metric, val_metric=None, metric_name=None, title=None, ylim=5):
    plt.title(title)
    plt.ylim(0,ylim)
    plt.plot(train_metric,color='blue',label=metric_name)
    if val_metric is not None: plt.plot(val_metric,color='green',label='val_' + metric_name)
    plt.legend(loc="upper right")

In [None]:
plot_metrics(history.history['loss'], history.history['val_loss'], "Loss", "Loss", ylim=10.0)


In [None]:
plot_metrics(history.history['accuracy'], history.history['val_accuracy'], "accuracy", "accuracy", ylim=1.0)


### Evaluation

In [None]:
x_val = np.zeros((2, maximum))
for index, row in enumerate(['supercomputer will put workers jobless soon', 'patriots goes winning super bowl']):
    ix = 0
    for word in preprocess_text(row, should_join=False):
        if word in word2vec_model.key_to_index:    # If the word exists in the word2vec embedding
            representation = word2vec_model.key_to_index[word]    # use the index
        else:
            representation = 0    # otherwise put a 0
        x_val[index, ix] = representation
        ix += 1
y_val = tf.one_hot([0,1], depth=4)

In [None]:
x_val

In [None]:
y_val

In [None]:
model.predict(x_val)
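Since the last Dense layer has no activation, `model.predict` returns raw logits here. A minimal sketch of turning such logits into category names follows; the logits are made-up numbers standing in for the real prediction, and the mapping is the inverse of the label mapping used when building `news_preprocessed`.

```python
import numpy as np

# Inverse of the label mapping used earlier in the notebook.
index_to_category = {0: 'Business', 1: 'Sports', 2: 'Sci/Tech', 3: 'World'}

# Hypothetical logits standing in for the output of model.predict(x_val).
logits = np.array([[0.2, -1.0, 2.5, 0.1],
                   [-0.5, 3.0, 0.0, 0.3]])

# The predicted class is the position of the largest logit.
predicted_ids = logits.argmax(axis=1)
predicted = [index_to_category[int(i)] for i in predicted_ids]

# A softmax turns logits into probabilities if you want confidence values.
shifted = logits - logits.max(axis=1, keepdims=True)
probabilities = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(predicted)
print(probabilities.round(3))
```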

In [None]:
model.evaluate(X_test, y_test)