# All in One Popular ML/NN/CNN/RNN Models Implementation 📊 💻🧐

<a id="cell-intro"></a>
# 1. Introduction

This cheatsheet contains:

1.   **Popular Machine Learning Models**
2.   **(Deep) Neural Networks**
3.   **Convolutional Neural Networks**
4.   **Recurrent Neural Networks**
5.   **Natural Language Processing**

Always import "[⚠️ Global Libraries for every project ⚠️](#cell-global_lib)" first to make anything else work within this notebook.

The datasets used are:
1. [Alpaca Dataset for Image Classification](https://www.kaggle.com/datasets/sid4sal/alpaca-dataset-small)
2. [Best regression model](https://www.kaggle.com/datasets/anshulkataria/best-regression-model)
3. [Breast Tumor Features](https://www.kaggle.com/datasets/ayushish12/breast-tumor-features)
4. [Car Object Detection](https://www.kaggle.com/datasets/sshikamaru/car-object-detection)
5. [IMDB Dataset of 50K Movie Reviews](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews)  
6. [Rearranged Brain Tumor dataset](https://www.kaggle.com/datasets/nsff591/rearranged-brain-tumor-dataset-from-ahmed-hamada)
7. [Restaurant Reviews](https://www.kaggle.com/datasets/nanuprasad/restaurant-reviews) 
8. [Semantic Segmentation for Self Driving Cars](https://www.kaggle.com/datasets/kumaresanmanickavelu/lyft-udacity-challenge)
9. [Ubiquant Market Prediction Pickle Dataset](https://www.kaggle.com/datasets/lonnieqin/ubiquant-market-prediction-half-precision-pickle)
10. [White Wine Quality](https://www.kaggle.com/datasets/piyushagni5/white-wine-quality)




<a id="cell-toc"></a>
# 2. Table of Contents

1 [Introduction](#cell-intro)

2 [Table of Contents](#cell-toc)

3 [Model Explanation and Implementation](#cell-models)

[⚠️ Global Libraries for every project ⚠️](#cell-global_lib)

3.1 [Basic Regression](#cell-basic_regression)

* [Importing simple regression data](#cell-basic_regression_import)
1.  [Linear Regression](#cell-basic_linear_regression)
2.  [Multi-linear regression](#cell-basic_multi_linear_regression)
3.  [Polynomial regression](#cell-basic_poly_regression)
4.  [Decision tree regression](#cell-basic_dtr)
5.  [Random forest regression](#cell-basic_rfr)
6.  [Support Vector regression](#cell-basic_svmr)

3.2 [Basic Classification](#cell-basic_c)

* [Importing Simple classification data](#cell-basic_c_data)

1.  [Logistic regression](#cell-basic_c_lr)
2.  [Naive bayes](#cell-basic_c_nb)
3.  [K nearest neighbours(KNN)](#cell-basic_c_knn)
4.  [Support vector machine (SVM)](#cell-basic_c_svm)
5.  [Kernel SVM](#cell-basic_c_ksvm)
6.  [Decision tree classification](#cell-basic_c_dtc)
7.  [Random forest classification](#cell-basic_c_rfc)
____
*(the following models can also be used for regression)*

8.  [LightGBM model](#cell-basic_c_lgbm)
9.  [XGBoost model](#cell-basic_c_xgbm)
10. [Catboost model](#cell-basic_c_cbm)

3.3 [Neural Networks (NN)](#cell-nn)

* [Importing Simple NN data](#cell-nn_data)

1. [Neural Network: Sequential() keras](#cell-nn_seq)
2. [Neural Network: Functional Keras](#cell-nn_func)
3. [Neural Network: PyTorch](#cell-nn_pytorch)
4. [Self Organizing Maps (SOM)](#cell-nn_som)
5. [Boltzmann Machine](#cell-nn_bm)
6. [AutoEncoders](#cell-nn_ae)

3.4 [Convolutional Neural Networks (CNN)](#cell-cnn)

* [Importing Simple CNN data](#cell-cnn_data)

1. [Convolutional Neural Network: Sequential() keras](#cell-cnn_seq)
2. [Convolutional Neural Network: Functional keras](#cell-cnn_func)
3. [Skip Connections/Residual blocks](#cell-cnn_skip)
4. [Inception Network](#cell-cnn_incept)
5. [Depthwise Separable Convolution](#cell-cnn_dsc)
6. [Transfer Learning](#cell-cnn_tran)
7. [Object Detection](#cell-cnn_obj)
8. [Semantic Segmentation](#cell-cnn_semantic_seg)
9. [Face Recognition](#cell-cnn_face_rec)
10. [Neural Style Transfer](#cell-cnn_style_transf)

3.5 [Recurrent Neural Networks (RNN)](#cell-rnn)

* [Importing Simple RNN data](#cell-rnn_data)
1. [Long short-term memory (LSTM)](#cell-rnn_lstm)

3.6 [Natural Language Processing (NLP)](#cell-nlp)
1. [Sentiment Analysis](#cell-nlp_sent)

4 [References](#cell-references)

<a id="cell-models"></a>
# 3. Model Explanation and Implementation

<a id="cell-global_lib"></a>
## ⚠️ Global Libraries for every project ⚠️

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

<a id="cell-basic_regression"></a>
## 3.1 Basic Regression

<a id="cell-basic_regression_import"></a>
### Importing Simple Regression data

In [None]:
regressionDataset = pd.read_csv('../input/best-regression-model/Data.csv')
x = regressionDataset.iloc[:, :-1].values
y = regressionDataset.iloc[:, -1].values

print("Dimensions: " + str(regressionDataset.shape))

In [None]:
regressionDataset.head()

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)

<a id="cell-basic_linear_regression"></a>
### 3.1.1. Linear Regression

<img src="https://www.reneshbedre.com/assets/posts/reg/reg_front.svg" width="650" align="centr"/>

<img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/8119b3ed1259aa8ff15166488548104b50a0f92e" width="150" align="centr"/>

Linear regression is a supervised machine learning model that finds the best-fitting straight line (or hyperplane, with several features) between the independent and dependent variables. Training estimates the coefficients β; ε is the error term, i.e. the part of y that the line does not explain.
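
To make the fitting step concrete, here is a minimal NumPy sketch of ordinary least squares on made-up data. The `fit_ols` helper and the `x_demo`/`y_demo` arrays are illustrative assumptions and not part of this notebook; scikit-learn's `LinearRegression` solves the same least-squares problem.

```python
import numpy as np

def fit_ols(x, y):
    """Return (intercept, coefficients) that minimise the squared error."""
    # Prepend a column of ones so the intercept is estimated with the slopes.
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]

# Made-up, noise-free data generated from y = 1 + 2*x1 + 3*x2
x_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_demo = 1.0 + 2.0 * x_demo[:, 0] + 3.0 * x_demo[:, 1]

intercept, coefs = fit_ols(x_demo, y_demo)
print(intercept, coefs)
```

Because the toy data contains no noise, the recovered intercept and coefficients should match the values used to generate it (up to floating-point error).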

In [None]:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)

In [None]:
from sklearn.metrics import r2_score
y_pred = regressor.predict(x_test)
r2_score(y_test, y_pred)

Note: R² is a good indicator for linear regression (the closer to 1, the better), but it can be misleading for nonlinear models, so treat it with caution there.
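
For reference, R² can be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up numbers; `r_squared` is a hypothetical helper written for illustration, following the same definition that `r2_score` documents.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_true_demo = [3.0, 5.0, 7.0, 9.0]                   # made-up targets, mean = 6
print(r_squared(y_true_demo, y_true_demo))           # perfect predictions
print(r_squared(y_true_demo, [6.0, 6.0, 6.0, 6.0]))  # always predicting the mean
print(r_squared(y_true_demo, [9.0, 7.0, 5.0, 3.0]))  # worse than the mean
```

A model that is worse than simply predicting the mean gets a negative score, which is why R² is not bounded below by 0.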

In [None]:
plt.scatter(y_test,y_pred)

A perfectly straight diagonal line (y_pred = y_test) would mean that R² is 1 and that the model predicts every test value exactly. That is not the case here, but the shape of the scatter plot shows that the predictions come close. We also notice some outliers, which could be tracked down with further research on the dataset.

<a id="cell-basic_multi_linear_regression"></a>
### 3.1.2. Multi-linear regression

**Difference between Linear and multi-linear regression:**

Simple linear regression has only one x and one y variable.
Multiple linear regression has one y and two or more x variables/features.

The dataset used here already has several features, which means that our previous **linear regression code is the same for the multi-linear version.**

<a id="cell-basic_poly_regression"></a>
### 3.1.3. Polynomial regression

<img src="https://static.javatpoint.com/tutorial/machine-learning/images/machine-learning-polynomial-regression.png" width="650" align="centr"/>

<img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/bc6e10cc75097fa66e7e02d6a75491d14a0c4aba" width="500" align="centr"/>

In [None]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 4) # degree = highest polynomial power generated from the features (not the number of features)
x_poly = poly_reg.fit_transform(x_train)
regressor = LinearRegression()
regressor.fit(x_poly, y_train)

In [None]:
from sklearn.metrics import r2_score
y_pred = regressor.predict(poly_reg.transform(x_test))
r2_score(y_test, y_pred)

In [None]:
plt.scatter(y_test,y_pred)

<a id="cell-basic_dtr"></a>
### 3.1.4. Decision tree regression

<img src="https://scikit-learn.org/stable/_images/sphx_glr_plot_tree_regression_001.png" width="650" align="centr"/>

In [None]:
from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(x_train, y_train)

In [None]:
from sklearn.metrics import r2_score
y_pred = regressor.predict(x_test)
r2_score(y_test, y_pred)

In [None]:
plt.scatter(y_test,y_pred)

<a id="cell-basic_rfr"></a>
### 3.1.5. Random forest regression

<img src="https://upload.wikimedia.org/wikipedia/commons/7/76/Random_forest_diagram_complete.png" width="650" align="centr"/>

In [None]:
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)
regressor.fit(x_train, y_train)

In [None]:
from sklearn.metrics import r2_score
y_pred = regressor.predict(x_test)
r2_score(y_test, y_pred)

In [None]:
plt.scatter(y_test,y_pred)

<a id="cell-basic_svmr"></a>
### 3.1.6. Support Vector regression

<img src="https://i.stack.imgur.com/29nu8.png" width="650" align="centr"/>

**Feature scaling is needed for SVR:** the RBF kernel is distance-based, so features on larger scales would otherwise dominate.
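
As a rough illustration of what standardization does, the sketch below fits the mean and standard deviation on training data only and reuses them for the test data and for mapping predictions back. `SimpleStandardizer` and the demo arrays are hypothetical stand-ins written for this example, not scikit-learn's `StandardScaler`.

```python
import numpy as np

class SimpleStandardizer:
    """Toy standard scaler (illustrative only; no handling of zero variance)."""

    def fit(self, data):
        self.mean_ = data.mean(axis=0)
        self.std_ = data.std(axis=0)  # population standard deviation (ddof=0)
        return self

    def transform(self, data):
        return (data - self.mean_) / self.std_

    def inverse_transform(self, scaled):
        return scaled * self.std_ + self.mean_

# Made-up data: the second column is on a much larger scale than the first.
train_demo = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
test_demo = np.array([[2.0, 300.0], [4.0, 700.0]])

scaler = SimpleStandardizer().fit(train_demo)  # statistics come from the training set only
train_scaled = scaler.transform(train_demo)
test_scaled = scaler.transform(test_demo)      # test data reuses the training statistics
print(train_scaled)
print(scaler.inverse_transform(test_scaled))   # back on the original scale
```

Fitting the scaler on the training split only avoids leaking information from the test split into the model.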

In [None]:
y_svm = y.reshape(len(y),1)

x_train, x_test, y_svm_train, y_svm_test = train_test_split(x, y_svm, test_size = 0.2, random_state = 0)

from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
sc_y = StandardScaler()
x_svm_train = sc_x.fit_transform(x_train)
y_svm_train = sc_y.fit_transform(y_svm_train)

In [None]:
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(x_svm_train, y_svm_train.ravel()) # SVR expects a 1-D target, so flatten the (n, 1) scaled column

In [None]:
from sklearn.metrics import r2_score
y_pred = sc_y.inverse_transform(regressor.predict(sc_x.transform(x_test)).reshape(-1,1))
r2_score(y_svm_test, y_pred)

In [None]:
plt.scatter(y_svm_test,y_pred)

<a id="cell-basic_c"></a>
## 3.2 Basic Classification

<a id="cell-basic_c_data"></a>
### Importing Simple classification data

In [None]:
dataset = pd.read_csv('../input/breast-tumor-features/breast tumor.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

print("Dimensions: " + str(dataset.shape))

In [None]:
dataset.head()

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

<a id="cell-basic_c_lr"></a>
### 3.2.1. Logistic regression

<img src="https://rajputhimanshu.files.wordpress.com/2018/03/linear_vs_logistic_regression.jpg" width="650" align="centr"/>

In [None]:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(x_train, y_train)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

<a id="cell-basic_c_nb"></a>
### 3.2.2. Naive Bayes

<img src="https://thatware.co/wp-content/uploads/2020/04/naive-bayes.png" width="650" align="centr"/>

In [None]:
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

<a id="cell-basic_c_knn"></a>
### 3.2.3. K-nearest neighbours (KNN)

<img src="https://raw.githubusercontent.com/artifabrian/dynamic-knn-gpu/master/knn.png" width="650" align="centr"/>

In [None]:
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(x_train, y_train)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

<a id="cell-basic_c_svm"></a>
### 3.2.4. Support vector machine (SVM)

<img src="https://static.javatpoint.com/tutorial/machine-learning/images/support-vector-machine-algorithm.png" width="650" align="centr"/>

In [None]:
from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(x_train, y_train)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

<a id="cell-basic_c_ksvm"></a>
### 3.2.5. Kernel SVM

<img src="https://slidetodoc.com/presentation_image_h/2f087c491fa6549a8cb25688253dec9b/image-14.jpg" width="650" align="centr"/>

In [None]:
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(x_train, y_train)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

<a id="cell-basic_c_dtc"></a>
### 3.2.6. Decision tree classification

<img src="https://static.javatpoint.com/tutorial/machine-learning/images/decision-tree-classification-algorithm2.png" width="650" align="centr"/>

Decision trees are prone to overfitting. If that happens, use a random forest to reduce it.

In [None]:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(x_train, y_train)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

<a id="cell-basic_c_rfc"></a>
### 3.2.7. Random forest classification

<img src="https://upload.wikimedia.org/wikipedia/commons/7/76/Random_forest_diagram_complete.png" width="650" align="centr"/>

In [None]:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(x_train, y_train)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

<a id="cell-basic_c_lgbm"></a>
### 3.2.8. LightGBM model

<img src="https://miro.medium.com/max/3000/1*AZsSoXb8lc5N6mnhqX5JCg.png" width="650" align="centr"/>
<img src="https://miro.medium.com/max/3000/1*whSa8rY4sgFQj1rEcWr8Ag.png" width="650" align="centr"/>

LightGBM is a gradient boosting framework that uses tree-based learning algorithms.

LightGBM grows trees leaf-wise ("vertically"), while many other algorithms grow trees level-wise ("horizontally"). At each step it chooses the leaf with the maximum delta loss to grow. For the same number of splits, a leaf-wise algorithm can reduce more loss than a level-wise algorithm.
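
The difference between the two growth strategies can be sketched with a toy example. The `GAINS` table below is made up for illustration (it is not produced by LightGBM), and the helper functions are hypothetical; the point is only to show the order in which nodes get split under each strategy.

```python
import heapq

# Hypothetical loss reduction ("gain") obtained by splitting each node.
# Nodes are indexed like a binary heap: the children of i are 2*i+1 and 2*i+2.
GAINS = {0: 10.0, 1: 8.0, 2: 1.0, 3: 5.0, 4: 4.0, 5: 0.5, 6: 0.2}

def children(node):
    return [c for c in (2 * node + 1, 2 * node + 2) if c in GAINS]

def leaf_wise(num_splits):
    """Always split the current leaf with the largest gain (best-first)."""
    heap = [(-GAINS[0], 0)]  # max-heap emulated with negated gains
    order = []
    while heap and len(order) < num_splits:
        _, node = heapq.heappop(heap)
        order.append(node)
        for child in children(node):
            heapq.heappush(heap, (-GAINS[child], child))
    return order

def level_wise(num_splits):
    """Split nodes level by level (breadth-first), regardless of gain."""
    queue = [0]
    order = []
    while queue and len(order) < num_splits:
        node = queue.pop(0)
        order.append(node)
        queue.extend(children(node))
    return order

def total_gain(order):
    return sum(GAINS[node] for node in order)

print(leaf_wise(3), total_gain(leaf_wise(3)))
print(level_wise(3), total_gain(level_wise(3)))
```

With a budget of three splits, the leaf-wise order follows the high-gain branch deeper into the tree, while the level-wise order spends a split on a low-gain node to finish the level.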

Why is it good?

*   High speed
*   Can handle large datasets
*   Low memory use
*   Supports GPU learning



In [None]:
import lightgbm as lgb

# params still need to be optimized
params = {}
params['learning_rate'] = 0.003
params['boosting_type'] = 'gbdt'
params['objective'] = 'binary'
params['metric'] = 'binary_logloss'
params['sub_feature'] = 0.5
params['num_leaves'] = 10
params['min_data'] = 50
params['max_depth'] = 10

classifier_lgb = lgb.LGBMClassifier(**params)
clf = classifier_lgb.fit(x_train, y_train)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = clf.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

Some extra explanation on the parameters if needed:
https://medium.com/@pushkarmandot/https-medium-com-pushkarmandot-what-is-lightgbm-how-to-implement-it-how-to-fine-tune-the-parameters-60347819b7fc

<a id="cell-basic_c_xgbm"></a>
### 3.2.9. XGBoost

<img src="https://www.researchgate.net/profile/Li-Mingtao-2/publication/335483097/figure/fig3/AS:934217085100032@1599746118459/A-general-architecture-of-XGBoost.ppm" width="650" align="centr"/>

In [None]:
from xgboost import XGBClassifier

import warnings # ignoring some warning here. I have to update the xgb code to no longer send deprecated warnings 
warnings.filterwarnings('ignore')

classifier = XGBClassifier(use_label_encoder=True,eval_metric='logloss')
classifier.fit(x_train, y_train)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

In [None]:
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score


accuracies = cross_val_score(estimator = classifier, X = x_train, y = y_train, cv = 10)
print("Accuracy: {:.2f} %".format(accuracies.mean()*100))
print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

<a id="cell-basic_c_cbm"></a>
### 3.2.10. CatBoost

<img src="https://miro.medium.com/max/875/1*E006sjlIjabDJ3jNixRSnA.png" width="650" align="centr"/>

source: https://medium.com/riskified-technology/xgboost-lightgbm-or-catboost-which-boosting-algorithm-should-i-use-e7fda7bb36bc

<img src="https://i1.wp.com/thaddeus-segura.com/wp-content/uploads/2020/10/cb9.png?resize=923%2C262&ssl=1" width="650" align="centr"/>

In [None]:
from catboost import CatBoostClassifier
classifier = CatBoostClassifier()
classifier.fit(x_train, y_train, logging_level='Silent')

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

<a id="cell-nn"></a>
## 3.3 Neural Networks (NN)

<a id="cell-nn_data"></a>
### Importing Simple Data

This dataset is a classification problem, so we one-hot encode the target (the wine quality) using the OneHotEncoder.
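
As a small illustration of what one-hot encoding does to a label column, and how `argmax` maps it back, here is a NumPy sketch on made-up quality labels. The `one_hot` helper is hypothetical and written only for this example.

```python
import numpy as np

def one_hot(labels):
    """Return the sorted unique classes and a one-hot matrix with one row per label."""
    classes, indices = np.unique(labels, return_inverse=True)
    encoded = np.eye(len(classes))[indices]  # pick row k of the identity for class k
    return classes, encoded

quality_demo = np.array([6, 5, 7, 5, 6])  # made-up labels
classes, encoded = one_hot(quality_demo)
print(classes)
print(encoded)

# argmax recovers the column index, which maps back to the original label
decoded = classes[np.argmax(encoded, axis=1)]
print(decoded)
```

The same `argmax` step is what turns the softmax output of a network back into a predicted class.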

In [None]:
data = pd.read_csv('../input/white-wine-quality/winequality-white.csv', sep=';')
data.head(5)

In [None]:
# splitting the features(x) with the expected result(y) (quality of wine)
x = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

In [None]:
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder()
y = ohe.fit_transform(y.reshape(-1,1)).toarray()

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size= 0.2, random_state=555)

print("Unscaled training data example:" + np.array2string(x_train[0], formatter={'float_kind':lambda x: "%.0f" % x}))

In [None]:
from sklearn.preprocessing import StandardScaler
# saving unscaled versions for testing purposes if needed
unscaled_x_train = x_train
unscaled_x_test = x_test

sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

print("Scaled training data example:" + np.array2string(x_train[0], formatter={'float_kind':lambda x: "%.3f" % x}))

<a id="cell-nn_seq"></a>
### 3.3.1 Neural Network: Sequential() keras

<img src="https://nickmccullum.com/images/python-deep-learning/how-do-neural-networks-really-work/completed-neural-network.png" width="650" align="centr"/>

This code snippet contains a simple sequential Keras model. The Sequential API suits simple models which only have one input and one output. Otherwise you should use the functional Keras API.

Activation functions commonly used:

<img src="https://www.researchgate.net/profile/Aaron-Stebner-2/publication/341310767/figure/fig7/AS:890211844255749@1589254451431/Common-activation-functions-in-artificial-neural-networks-NNs-that-introduce.ppm" width="650" align="centr"/>
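
For reference, three of the activation functions used in this notebook can be written in a few lines of NumPy. This is a minimal sketch for illustration, not the Keras implementation.

```python
import numpy as np

def relu(z):
    """Zero out negative values, keep positive values unchanged."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Squash values into (0, 1); used for binary outputs."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

scores_demo = np.array([-1.0, 0.0, 2.0])  # made-up pre-activation values
print(relu(scores_demo))
print(sigmoid(scores_demo))
print(softmax(scores_demo))
```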


In [None]:
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dropout, Dense, BatchNormalization

BATCH_SIZE = 32
VAL_SPLIT = 0.1
EPOCH_NUM = 15
LEARNING_RATE = 0.0002

model = Sequential()

model.add(Dense(units = 32, activation = 'relu', input_dim = 11))
model.add(Dense(units=64,activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(units=64,activation='relu'))
model.add(BatchNormalization())
model.add(Dense(units=64,activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(units=32,activation='relu'))
model.add(Dense(units=7,activation='softmax'))

opt = keras.optimizers.Adam(learning_rate=LEARNING_RATE)
model.compile(optimizer= opt, loss='categorical_crossentropy',metrics=['accuracy'])

fitted_model= model.fit(x_train, y_train, validation_split= VAL_SPLIT, epochs= EPOCH_NUM, batch_size= BATCH_SIZE)

In [None]:
model.summary()
tf.keras.utils.plot_model(model,  show_shapes=True,expand_nested=True)

Analysis on Training the model:

If your training curve (solid line) starts to heavily outperform the validation curve (dashed line), then you are overfitting. Possible remedies:

* Raise or lower the learning rate, OR change it to a decaying learning rate
* Regularize your model by using BatchNormalization and/or Dropout
* Engineer your dataset to make it more robust (example: binning)
* Make the neural network smaller
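
A decaying learning rate can be sketched as a plain function of the training step. The helper below is a hypothetical illustration of exponential decay with made-up settings; Keras offers ready-made schedules such as `keras.optimizers.schedules.ExponentialDecay` for real training.

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Learning rate after `step` updates: it shrinks by `decay_rate` every `decay_steps` steps."""
    return initial_lr * decay_rate ** (step / decay_steps)

# Made-up settings: start at 0.001 and halve the rate every 100 steps
for step in (0, 100, 200):
    print(step, exponential_decay(0.001, 0.5, 100, step))
```

Starting with larger steps and shrinking them later lets the model move quickly at first and settle more carefully near the end of training.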

Deciding how big your neural network should be:

* Make your neural network bigger as long as it does not overfit on the validation/test data
* The more data you have, the bigger your model can become without overfitting
* Don't make the network too big if you don't have a lot of data. Example: 1k samples with 1 million parameters will almost always create a bad model. It is normal to have more parameters than data samples; just don't go overboard, and start thinking about regularization if it does get out of hand. Regularization helps a lot with bigger models.
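
To get a feel for model size, the number of trainable parameters in a stack of Dense layers can be counted by hand: each layer has `inputs * units` weights plus `units` biases. The helper below is a hypothetical sketch; it ignores Dropout (which has no parameters) and BatchNormalization (which adds a few parameters of its own), so `model.summary()` will report a somewhat higher total for the model above.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a chain of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Input size followed by the Dense layer widths of the Sequential model above
sizes = [11, 32, 64, 64, 64, 32, 7]
print(dense_param_count(sizes))
```

Comparing this number with the number of training samples gives a quick sanity check before training.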

In [None]:
def plot_metrics(history):
  metrics = ['loss', 'accuracy']
  for n, metric in enumerate(metrics):
    try:
      name = metric.replace("_"," ").capitalize()
      plt.plot(history.epoch, history.history[metric], label='Train')
      plt.plot(history.epoch, history.history['val_'+metric], linestyle="--", label='Val')
      plt.xlabel('Epoch')
      plt.ylabel(name)
      if metric == 'loss':
        plt.ylim([0, plt.ylim()[1]])
      elif metric == 'auc':
        plt.ylim([0.8,1])
      else:
        plt.ylim([0,1])
      plt.legend()
      plt.show()  
    except:
      pass
plot_metrics(model.history)

plt.title(label='Zoomed in Accuracy plot')
plt.plot(fitted_model.history["accuracy"],label='Train')
plt.plot(fitted_model.history["val_accuracy"],linestyle="--",label='Validation')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()

Post-processing for this specific dataset. This may differ from case to case.

In [None]:
y_pred = model.predict(x_test)

# Convert each row of class probabilities / one-hot labels to its class index
pred = np.argmax(y_pred, axis=1)
test = np.argmax(y_test, axis=1)

Analysis of the result

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(test, pred)
print(cm)
print(accuracy_score(test, pred))

In [None]:
from sklearn.metrics import classification_report
report = classification_report(test,pred)
print(report)

<a id="cell-nn_func"></a>
### 3.3.2 Neural Network: Functional Keras

<img src="https://www.researchgate.net/publication/330230427/figure/fig5/AS:962670173892631@1606529863497/A-multi-layer-neural-network-with-n-inputs-at-least-two-hidden-layers-and-one-output.png" width="650" align="centr"/>

You should use the functional Keras API if you have more complex models with multiple inputs and outputs. I will use an example which was widely used during the https://www.kaggle.com/competitions/ubiquant-market-prediction competition, where 1 feature had its own sub-model and was merged with the other 300 features later in the model. The reason was that this 1 feature had a significant influence on the output value. This did not always mean better performance, but it was a great idea.
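
The two-branch idea can be sketched in plain NumPy: one branch looks up an embedding vector for an id, the other passes the remaining features through a dense layer, and the results are concatenated before the output layer. All sizes and weights below are made up for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen arbitrarily for the sketch
NUM_IDS, EMBED_DIM, NUM_FEATURES, HIDDEN, BATCH = 10, 4, 6, 5, 3

embedding_table = rng.normal(size=(NUM_IDS, EMBED_DIM))  # one vector per id
w_features = rng.normal(size=(NUM_FEATURES, HIDDEN))     # dense layer of the feature branch
w_head = rng.normal(size=(EMBED_DIM + HIDDEN, 1))        # dense layer after the merge

ids = np.array([2, 7, 2])                         # first input: one id per sample
features = rng.normal(size=(BATCH, NUM_FEATURES)) # second input: the remaining features

id_branch = embedding_table[ids]                              # (BATCH, EMBED_DIM)
feature_branch = np.maximum(0.0, features @ w_features)       # (BATCH, HIDDEN), ReLU
merged = np.concatenate([id_branch, feature_branch], axis=1)  # (BATCH, EMBED_DIM + HIDDEN)
output = merged @ w_head                                      # (BATCH, 1)

print(merged.shape, output.shape)
```

Samples that share an id get the same embedding vector, which is how the id can influence the prediction independently of the other features.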

I would also recommend using functional keras for complex Convolutional Neural Networks but more on that later.

In [None]:
%%time
train_df = pd.read_pickle('../input/ubiquant-market-prediction-half-precision-pickle/train.pkl')

In [None]:
# making the dataset smaller for example purposes
train_df = train_df[:250_000]
import gc
gc.collect()

Altering previous dataset to fit our model.

In [None]:
# Preprocessing the data
investment_id = train_df.pop("investment_id") # splitting investment_id from the dataset to be fed separately into the NN
investment_id_unique = pd.DataFrame(investment_id.unique())
investment_id_unique_size = len(investment_id_unique) + 1

train_df.pop("time_id") # deleting the time_id column
y = train_df.pop("target")

n_features = 300
features = [f'f_{i}' for i in range(n_features)]

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras.layers import Dropout, Dense, BatchNormalization

BATCH_SIZE = 64
VAL_SPLIT = 0.1
EPOCH_NUM = 3
LEARNING_RATE = 0.001

# Defining the input size
investment_id_inputs = tf.keras.Input((1, ), dtype=tf.uint16)
features_inputs = tf.keras.Input((300, ), dtype=tf.float16)

# Encoding the investment_id within the Neural Net; this is not always necessary, depending on what kind of data you have
investment_id_lookup_layer = layers.IntegerLookup(max_tokens=investment_id_unique_size)
investment_id_lookup_layer.adapt(investment_id_unique)

# Defining the model of investment_id
investment_id_x = investment_id_lookup_layer(investment_id_inputs)
investment_id_x = layers.Embedding(investment_id_unique_size, 32, input_length=1)(investment_id_x)
investment_id_x = layers.Reshape((-1, ))(investment_id_x) # Reshaping the model into the right dimensions after the embedding
investment_id_x = layers.Dense(64, activation='swish')(investment_id_x)
investment_id_x = layers.BatchNormalization()(investment_id_x)
investment_id_x = layers.Dense(32, activation='swish')(investment_id_x)

# Defining the model for all other features
feature_x = layers.Dense(256, activation='swish')(features_inputs)
feature_x = layers.Dense(256, activation='swish')(feature_x)
feature_x = layers.BatchNormalization()(feature_x)
feature_x = layers.Dense(128, activation='swish')(feature_x)
feature_x = layers.Dropout(0.25)(feature_x)

# Merging the 2 models into one
x = layers.Concatenate(axis=1)([investment_id_x, feature_x])
x = layers.Dense(64, activation='swish', kernel_regularizer="l2")(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(32, activation='swish', kernel_regularizer="l2")(x)
x = layers.BatchNormalization()(x)
x = layers.Dense(16, activation='swish', kernel_regularizer="l2")(x)

output = layers.Dense(1)(x)

model = tf.keras.Model(inputs=[investment_id_inputs, features_inputs], outputs=[output])
model.compile(optimizer=tf.optimizers.Adam(LEARNING_RATE), loss='mse', metrics=['mse']) # metrics passed as a list

In [None]:
model.summary()
tf.keras.utils.plot_model(model,  show_shapes=True,expand_nested=True)

In [None]:
with tf.device('/gpu:0'):# if no GPU is found it will run with the CPU
    model.fit([investment_id, train_df], y, validation_split= VAL_SPLIT, epochs = EPOCH_NUM, batch_size = BATCH_SIZE)

In [None]:
def plot_metrics(history):
  metrics = ['loss', 'mse']
  for n, metric in enumerate(metrics):
    try:
      name = metric.replace("_"," ").capitalize()
      plt.plot(history.epoch, history.history[metric], label='Train')
      plt.plot(history.epoch, history.history['val_'+metric], linestyle="--", label='Val')
      plt.xlabel('Epoch')
      plt.ylabel(name)
      if metric == 'loss':
        plt.ylim([0, plt.ylim()[1]])
      elif metric == 'auc':
        plt.ylim([0.8,1])
      else:
        plt.ylim([0,1])
      plt.legend()
      plt.show()  
    except:
      pass
plot_metrics(model.history)

plt.title(label='Zoomed in Mse plot')
plt.plot(model.history.history["mse"],label='Train')
plt.plot(model.history.history["val_mse"],linestyle="--",label='Validation')
plt.legend()
plt.ylabel('Mse')
plt.xlabel('Epoch')
plt.legend()
plt.show()

In [None]:
# cleaning up RAM
import gc

del train_df
del y
del investment_id
gc.collect()

<a id="cell-cnn"></a>
## 3.4 Convolutional Neural Networks (CNN)

<a id="cell-cnn_data"></a>
### Importing Simple Data

In [None]:
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing import image

train_data_dir= '../input/rearranged-brain-tumor-dataset-from-ahmed-hamada/DataSet'
test_data_dir= '../input/rearranged-brain-tumor-dataset-from-ahmed-hamada/TestDataSet'

IMG_WIDTH = 180
IMG_HEIGHT = 180
BATCH_SIZE = 32

train_datagen = ImageDataGenerator(
        rescale=1./255,
        horizontal_flip=True,
        validation_split=0.2)

training_set = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training') # set as training data



validation_datagen = ImageDataGenerator(
        rescale=1./255,
        validation_split=0.2)

# Based on the example code in the Keras data preprocessing API docs and Salik Hussaini's example code
validation_set = validation_datagen.flow_from_directory(
    train_data_dir,
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation')

<a id="cell-cnn_seq"></a>
### 3.4.1 Convolutional Neural Networks: Sequential() keras

<img src="https://miro.medium.com/max/1400/1*vkQ0hXDaQv57sALXAJquxA.jpeg" width="650" align="centr"/>

**Convolution:** Applying a filter (kernel) to an image to capture, for example, "edges" or "patterns". The filter values can be set by hand, but in a CNN they are usually parameters that are trained by the model.
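
A hand-set filter makes this concrete. The sketch below slides a small vertical-edge filter over a made-up 4x4 image without padding and with stride 1. `conv2d_valid` is a hypothetical helper; like deep learning frameworks, it does not flip the kernel (strictly speaking this is cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the window and the filter, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Made-up image: dark left half (0), bright right half (1)
image_demo = np.array([[0.0, 0.0, 1.0, 1.0]] * 4)
# Hand-set filter that responds to a dark-to-bright transition from left to right
kernel_demo = np.array([[-1.0, 1.0],
                        [-1.0, 1.0]])

edges = conv2d_valid(image_demo, kernel_demo)
print(edges)
```

The output is non-zero only in the middle column, where the window covers the boundary between the dark and bright halves.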

<img src="https://1.cms.s81c.com/sites/default/files/2021-01-06/ICLH_Diagram_Batch_02_17A-ConvolutionalNeuralNetworks-WHITEBG.png" width="650" align="centr"/>

**Padding:** Padding makes the image/input bigger by adding a border of n "empty blocks" (typically zeros) around it. This allows the convolution or pooling to capture the edges better. With "same" padding, the number of "empty blocks" is chosen so that, at stride 1, the output after convolving/pooling has the same dimensions as the input.

<img src="https://miro.medium.com/max/666/1*noYcUAa_P8nRilg3Lt_nuA.png" width="650" align="centr"/>

**Stride:** The step size with which your filter/pooling window moves across the image/input. Stride=2 means that the window moves 2 steps at a time instead of 1.

**Pooling:** There are 2 common pooling methods:

* Max pooling: Take the maximum of the elements within the pool window
* Average pooling: Average over all the elements within the pool window
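
Padding, stride, and window size together determine the output size along each dimension: floor((n + 2p - f) / s) + 1 for input size n, padding p, filter size f, and stride s. The sketch below applies that formula and implements both pooling methods on a made-up 4x4 input; `output_size` and `pool2d` are hypothetical helpers written for illustration.

```python
import numpy as np

def output_size(n, f, padding=0, stride=1):
    """Output length along one dimension for a convolution or pooling window."""
    return (n + 2 * padding - f) // stride + 1

def pool2d(x, size=2, stride=2, mode="max"):
    """Max or average pooling over a 2D array, without padding."""
    out_h = output_size(x.shape[0], size, stride=stride)
    out_w = output_size(x.shape[1], size, stride=stride)
    reduce = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce(window)
    return out

grid_demo = np.arange(16, dtype=float).reshape(4, 4)  # made-up input, values 0..15
print(pool2d(grid_demo, mode="max"))
print(pool2d(grid_demo, mode="avg"))

# A 180-pixel side through a 3x3 convolution without padding, then 2x2 pooling with stride 2
print(output_size(180, 3), output_size(178, 2, stride=2))
# "Same" padding for a 3x3 filter at stride 1 needs a 1-pixel border
print(output_size(180, 3, padding=1))
```

The same formula explains the shapes that `model.summary()` reports for the convolution and pooling layers.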

<img src="https://miro.medium.com/max/1192/1*KQIEqhxzICU7thjaQBfPBQ.png" width="650" align="centr"/>

There is a trend to lower the spatial dimensions of the image throughout the CNN while increasing the number of filters. This is based on LeNet-5's research paper, but I am not qualified to say whether this is useful or good in all CNN problems.

<img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2021/03/Screenshot-from-2021-03-18-12-52-17.png" width="650" align="centr"/>

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras.models import Sequential
from keras.layers import Dropout, Dense, BatchNormalization, Conv2D, Flatten, MaxPool2D

BATCH_SIZE = 64
VAL_SPLIT = 0.1
EPOCH_NUM = 5
LEARNING_RATE = 0.001
IMG_WIDTH = 180
IMG_HEIGHT = 180

model = Sequential()

model.add(Conv2D(filters=32,kernel_size=3,activation='relu',input_shape=[IMG_WIDTH, IMG_HEIGHT,3]))
model.add(MaxPool2D(pool_size=2,strides=2))
model.add(Conv2D(filters=32,kernel_size=3,activation='relu'))
model.add(MaxPool2D(pool_size=2,strides=2))
model.add(Flatten())

model.add(Dense(units=300,activation='relu'))
model.add(Dense(units=1,activation='sigmoid'))

model.compile(optimizer=tf.optimizers.Adam(LEARNING_RATE) ,loss='binary_crossentropy',metrics=['accuracy'])

In [None]:
model.summary()
tf.keras.utils.plot_model(model,  show_shapes=True,expand_nested=True)

In [None]:
with tf.device('/gpu:0'):# if no GPU is found it will run with the CPU
    history = model.fit(x = training_set, validation_data=validation_set, epochs = EPOCH_NUM, batch_size = BATCH_SIZE)

Evaluating the model

In [None]:
def plot_metrics(history):
  metrics = ['loss', 'accuracy']
  for n, metric in enumerate(metrics):
    try:
      name = metric.replace("_"," ").capitalize()
      plt.plot(history.epoch, history.history[metric], label='Train')
      plt.plot(history.epoch, history.history['val_'+metric], linestyle="--", label='Val')
      plt.xlabel('Epoch')
      plt.ylabel(name)
      if metric == 'loss':
        plt.ylim([0, plt.ylim()[1]])
      elif metric == 'auc':
        plt.ylim([0.8,1])
      else:
        plt.ylim([0,1])
      plt.legend()
      plt.show()  
    except KeyError:  # this metric was not recorded in the history
      pass
plot_metrics(model.history)

plt.title(label='Zoomed in Accuracy plot')
plt.plot(history.history["accuracy"],label='Train')
plt.plot(history.history["val_accuracy"],linestyle="--",label='Validation')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()

In [None]:
from os import listdir
from os.path import join

test_labels = listdir(test_data_dir)  # one sub-folder per class: 'yes' and 'no'
number_of_files = 0
correct_predictions = 0

for label in test_labels:
  for file_name in listdir(join(test_data_dir, label)):
    number_of_files += 1

    # load_img expects target_size as (height, width)
    test_image = image.load_img(path=join(test_data_dir, label, file_name), target_size=(IMG_HEIGHT, IMG_WIDTH))
    test_image = image.img_to_array(test_image)

    # scale the pixel values to the range [0, 1]
    normalized_test_image = test_image*1./255

    # add the batch dimension the model expects
    test_image = np.expand_dims(normalized_test_image, axis=0)
    probability = model.predict(test_image)[0][0]
    # changing the prediction "numbers" to classnames
    predicted_label = 'yes' if probability >= 0.5 else 'no'
    if predicted_label == label:
      correct_predictions += 1

prediction_accuracy = correct_predictions/number_of_files

print('prediction accuracy= '+ str(prediction_accuracy*100)+'%')

<a id="cell-cnn_func"></a>
## 3.4.2 Convolutional Neural Networks: Functional keras

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras.layers import Dropout, Dense, BatchNormalization, Conv2D, Flatten, MaxPool2D

BATCH_SIZE = 64
VAL_SPLIT = 0.1
EPOCH_NUM = 5
LEARNING_RATE = 0.001
IMG_WIDTH = 180
IMG_HEIGHT = 180

# Defining the input size
x_inputs = tf.keras.Input((IMG_WIDTH, IMG_HEIGHT,3))

Z1 = layers.Conv2D(filters = 32 , kernel_size= (3,3), padding='same')(x_inputs)
A1 = layers.ReLU()(Z1)
P1 = layers.MaxPool2D(pool_size = (2,2), strides= (2,2), padding='valid')(A1)
Z2 = layers.Conv2D(filters = 32 , kernel_size= (3,3), strides = 1)(P1)
A2 = layers.ReLU()(Z2)
P2 = layers.MaxPool2D(pool_size=(2,2), strides=(2,2))(A2)
F = layers.Flatten()(P2)

Z3 = tf.keras.layers.Dense(units=300, activation='relu')(F)
outputs = tf.keras.layers.Dense(units=1, activation='sigmoid')(Z3)

model = tf.keras.Model(inputs=x_inputs, outputs=outputs)
model.compile(optimizer=tf.optimizers.Adam(LEARNING_RATE) ,loss='binary_crossentropy',metrics=['accuracy'])

In [None]:
model.summary()
tf.keras.utils.plot_model(model,  show_shapes=True,expand_nested=True)

In [None]:
with tf.device('/gpu:0'):# if no GPU is found it will run with the CPU
    history = model.fit(x = training_set, validation_data=validation_set, epochs = EPOCH_NUM, batch_size = BATCH_SIZE)

In [None]:
def plot_metrics(history):
  metrics = ['loss', 'accuracy']
  for n, metric in enumerate(metrics):
    try:
      name = metric.replace("_"," ").capitalize()
      plt.plot(history.epoch, history.history[metric], label='Train')
      plt.plot(history.epoch, history.history['val_'+metric], linestyle="--", label='Val')
      plt.xlabel('Epoch')
      plt.ylabel(name)
      if metric == 'loss':
        plt.ylim([0, plt.ylim()[1]])
      elif metric == 'auc':
        plt.ylim([0.8,1])
      else:
        plt.ylim([0,1])
      plt.legend()
      plt.show()  
    except KeyError:  # this metric was not recorded in the history
      pass
plot_metrics(model.history)

plt.title(label='Zoomed in Accuracy plot')
plt.plot(history.history["accuracy"],label='Train')
plt.plot(history.history["val_accuracy"],linestyle="--",label='Validation')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()

<a id="cell-cnn_skip"></a>
## 3.4.3 Skip Connections/Residual blocks

<img src="https://miro.medium.com/max/1140/1*D0F3UitQ2l5Q0Ak-tjEdJg.png" width="650" align="centr"/>

<img src="https://theaisummer.com/static/2c373d3667071700748bf451c4e62b78/3accd/long-skip-connection.jpg" width="650" align="centr"/>

Skip connections (residual blocks) address the performance degradation problem of very deep architectures, where adding more layers makes even the training error worse. This solution was introduced in the ResNet paper. The graph below shows the problem it prevents.

<img src="https://miro.medium.com/max/1280/1*Ku0ChYxemQyF1hz348ExVA.png" width="650" align="centr"/>

It can sometimes be useful to write a helper for the **Identity Block** mentioned in the ResNet paper, so that you can easily build a 2-layer or 3-layer skip. In an identity block, the input dimensions must be the same as the output dimensions. If they are not, you can add a conv layer to the skip path to compensate for the difference. This is called a **Convolution Block**.

<img src="https://i.stack.imgur.com/37qzA.png" width="650" align="centr"/>
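The difference between the two blocks can be sketched without any deep learning library. The toy below is my own simplification: it uses small dense layers on plain Python lists instead of convolutions, and the names `residual_block`, `linear` and `shortcut_weights` are made up for this illustration. It only shows the core idea, `relu(F(x) + shortcut(x))`, and why the shortcut needs its own layer when the sizes differ.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, weights):
    """Multiply vector v by a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def residual_block(x, main_path, shortcut_weights=None):
    """Return relu(F(x) + shortcut(x)).

    shortcut_weights=None  -> identity block (sizes must already match)
    shortcut_weights given -> the shortcut is projected first, which plays the
                              role of the conv layer in a convolution block
    """
    fx = main_path(x)
    shortcut = x if shortcut_weights is None else linear(x, shortcut_weights)
    if len(fx) != len(shortcut):
        raise ValueError("main path and shortcut must have the same size")
    return relu([a + b for a, b in zip(fx, shortcut)])

# Identity block: the main path keeps the size at 2
same_size = lambda v: linear(v, [[0.5, 0.0], [0.0, -1.0]])
identity_out = residual_block([1.0, 2.0], same_size)

# Convolution-block analogue: the main path maps 2 -> 3 values,
# so the shortcut needs a projection to 3 values as well
bigger = lambda v: linear(v, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
projection = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
projected_out = residual_block([1.0, 2.0], bigger, projection)

print(identity_out)    # [1.5, 0.0]
print(projected_out)   # [2.0, 4.0, 3.0]
```

Calling `residual_block([1.0, 2.0], bigger)` without the projection raises a `ValueError`, which mirrors the shape error Keras reports when `Add()` receives tensors of different shapes.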

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras.layers import Dropout, Dense, BatchNormalization, Conv2D, Flatten, MaxPool2D, Activation, Add

BATCH_SIZE = 32
VAL_SPLIT = 0.1
EPOCH_NUM = 5
LEARNING_RATE = 0.001
IMG_WIDTH = 180
IMG_HEIGHT = 180


# Defining the input size
x_inputs = tf.keras.Input((IMG_WIDTH, IMG_HEIGHT,3))

X = Conv2D(filters = 32, kernel_size = (3,3), strides = (1,1), padding='same')(x_inputs)
X = BatchNormalization(axis = 3)(X) # Default axis
X = Activation('relu')(X)
X = MaxPool2D(pool_size = (2,2), strides= (2,2), padding='same')(X)

# Save the input value. You'll need this later to add back to the main path. 
X_shortcut = X

X = Conv2D(filters = 32, kernel_size = (3,3), strides = (1,1), padding='same')(X)
X = BatchNormalization(axis = 3)(X) # Default axis
X = Activation('relu')(X)

X = Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding='same')(X)
X = BatchNormalization(axis=3)(X)
X = Activation('relu')(X)

X = Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding='same')(X)
X = BatchNormalization(axis=3)(X)

# Shortcut
X = Add()([X, X_shortcut])
X = Activation('relu')(X)

X = layers.MaxPool2D(pool_size=(2,2), strides=(2,2))(X)
X = layers.Flatten()(X)

X = tf.keras.layers.Dense(units=300, activation='relu')(X)
outputs = tf.keras.layers.Dense(units=1, activation='sigmoid')(X)

model = tf.keras.Model(inputs=x_inputs, outputs=outputs)
model.compile(optimizer=tf.optimizers.Adam(LEARNING_RATE) ,loss='binary_crossentropy',metrics=['accuracy'])

In [None]:
model.summary()
tf.keras.utils.plot_model(model,  show_shapes=True,expand_nested=True)

In [None]:
with tf.device('/gpu:0'):# if no GPU is found it will run with the CPU
    history = model.fit(x = training_set, validation_data=validation_set, epochs = EPOCH_NUM, batch_size = BATCH_SIZE)

We have too little data to justify a model this big, hence the worse accuracy (the model is also pretty bad, to be honest). Skip connections/residual blocks mainly pay off in very deep neural networks. I hope the example code can at least help jumpstart your specific problem.

In [None]:
def plot_metrics(history):
  metrics = ['loss', 'accuracy']
  for n, metric in enumerate(metrics):
    try:
      name = metric.replace("_"," ").capitalize()
      plt.plot(history.epoch, history.history[metric], label='Train')
      plt.plot(history.epoch, history.history['val_'+metric], linestyle="--", label='Val')
      plt.xlabel('Epoch')
      plt.ylabel(name)
      if metric == 'loss':
        plt.ylim([0, plt.ylim()[1]])
      elif metric == 'auc':
        plt.ylim([0.8,1])
      else:
        plt.ylim([0,1])
      plt.legend()
      plt.show()  
    except KeyError:  # this metric was not recorded in the history
      pass
plot_metrics(model.history)

plt.title(label='Zoomed in Accuracy plot')
plt.plot(history.history["accuracy"],label='Train')
plt.plot(history.history["val_accuracy"],linestyle="--",label='Validation')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()

<a id="cell-cnn_incept"></a>
## 3.4.4 Inception Network/Network in Network

Using a 1x1 filter followed by an activation is useful to shrink or enlarge the number of channels of the input. Shrinking it lowers the amount of computation in the layers that follow. The Inception network uses this to reduce the computation needed in each branch before the different outputs are concatenated. A 1x1 conv used this way is sometimes also referred to as a "**Bottleneck**".

<img src="https://i.ytimg.com/vi/KfV8CJh7hE0/maxresdefault.jpg" width="650" align="centr"/>

<img src="https://miro.medium.com/max/1276/1*qVQbA9GYe5VKIQtAQWPM9w.jpeg" width="650" align="centr"/>
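A quick multiplication count shows how much a 1x1 bottleneck can save. The layer sizes below (a 28x28x192 input, 16 bottleneck channels, 32 output filters) are assumptions picked for illustration and the helper name `conv_multiplications` is my own. The count assumes 'same' padding, so the output keeps the 28x28 size.

```python
def conv_multiplications(out_height, out_width, kernel, in_channels, filters):
    """Number of multiplications for one convolution layer."""
    return out_height * out_width * filters * kernel * kernel * in_channels

# Direct 5x5 convolution: 28x28x192 -> 28x28x32
direct = conv_multiplications(28, 28, kernel=5, in_channels=192, filters=32)

# Bottleneck: 1x1 convolution down to 16 channels, then the 5x5 convolution
reduce_step = conv_multiplications(28, 28, kernel=1, in_channels=192, filters=16)
conv_step = conv_multiplications(28, 28, kernel=5, in_channels=16, filters=32)
bottleneck = reduce_step + conv_step

print(direct)       # 120422400
print(bottleneck)   # 12443648
```

With these numbers the bottleneck version needs roughly a tenth of the multiplications of the direct 5x5 convolution.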

The auxiliary classifiers in GoogLeNet help regularize the network and provide extra classifiers, which might be useful in ensemble models.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras.layers import Dropout, Dense, BatchNormalization, Conv2D, Flatten, MaxPool2D, Activation, Add, Conv1D

BATCH_SIZE = 32
VAL_SPLIT = 0.1
EPOCH_NUM = 5
LEARNING_RATE = 0.001
IMG_WIDTH = 180
IMG_HEIGHT = 180


# Defining the input size
x_inputs = tf.keras.Input((IMG_WIDTH, IMG_HEIGHT,3))

X = Conv2D(filters = 32, kernel_size = (3,3), strides = (2,2))(x_inputs)
X = BatchNormalization(axis = 3)(X) # Default axis
X = Activation('relu')(X)
X = MaxPool2D(pool_size = (2,2), strides= (2,2))(X)

# A 1x1 convolution on image data is a Conv2D layer with kernel_size=(1,1)
X1 = Conv2D(filters = 5, kernel_size = (1,1), padding = 'same')(X)
X1 = Activation('relu')(X1)

X2 = Conv2D(filters = 10, kernel_size = (1,1), padding = 'same')(X)
X2 = Activation('relu')(X2)

X3 = Conv2D(filters = 5, kernel_size = (1,1), padding = 'same')(X)
X3 = BatchNormalization(axis = 3)(X3) # Default axis
X3 = Activation('relu')(X3)
X3 = Conv2D(filters = 15, kernel_size = (3,3), strides = (1,1), padding='same')(X3)
X3 = BatchNormalization(axis = 3)(X3) # Default axis
X3 = Activation('relu')(X3)

# Pooling branch: 3x3 max pooling that keeps the spatial size, followed by a 1x1 convolution
# (a 1x1 pooling window would not change the input at all)
X4 = MaxPool2D(pool_size = (3,3), strides= (1,1), padding='same')(X)
X4 = Conv2D(filters = 10, kernel_size = (1,1), padding = 'same')(X4)
X4 = Activation('relu')(X4)

X = layers.Concatenate()([X1, X2, X3, X4])

X = layers.MaxPool2D(pool_size=(2,2), strides=(2,2))(X)
X = layers.Flatten()(X)

X = tf.keras.layers.Dense(units=300, activation='relu')(X)
outputs = tf.keras.layers.Dense(units=1, activation='sigmoid')(X)

model = tf.keras.Model(inputs=x_inputs, outputs=outputs)
model.compile(optimizer=tf.optimizers.Adam(LEARNING_RATE) ,loss='binary_crossentropy',metrics=['accuracy'])

In [None]:
model.summary()
tf.keras.utils.plot_model(model,  show_shapes=True,expand_nested=True)

In [None]:
with tf.device('/gpu:0'):# if no GPU is found it will run with the CPU
    history = model.fit(x = training_set, validation_data=validation_set, epochs = EPOCH_NUM, batch_size = BATCH_SIZE)

In [None]:
def plot_metrics(history):
  metrics = ['loss', 'accuracy']
  for n, metric in enumerate(metrics):
    try:
      name = metric.replace("_"," ").capitalize()
      plt.plot(history.epoch, history.history[metric], label='Train')
      plt.plot(history.epoch, history.history['val_'+metric], linestyle="--", label='Val')
      plt.xlabel('Epoch')
      plt.ylabel(name)
      if metric == 'loss':
        plt.ylim([0, plt.ylim()[1]])
      elif metric == 'auc':
        plt.ylim([0.8,1])
      else:
        plt.ylim([0,1])
      plt.legend()
      plt.show()  
    except KeyError:  # this metric was not recorded in the history
      pass
plot_metrics(model.history)

plt.title(label='Zoomed in Accuracy plot')
plt.plot(history.history["accuracy"],label='Train')
plt.plot(history.history["val_accuracy"],linestyle="--",label='Validation')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()

<a id="cell-cnn_dsc"></a>
## 3.4.5 Depthwise Separable Convolution

Combining a depthwise and a pointwise convolution gives the depthwise separable convolution used in MobileNet v1 for low-compute applications. MobileNet v2 added a pointwise conv before the depthwise conv within the block and a skip connection over the block (also known as the **bottleneck block**, see picture).

<img src="https://miro.medium.com/max/448/1*fE-1I6D8A4B9QAUgZEbLUA.png" width="650" align="centr"/>
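The saving of a depthwise separable convolution is easy to check with a multiplication count. The sizes below (4x4 output, 3x3 kernel, 3 input channels, 5 output channels) are assumptions chosen to keep the numbers small, and both function names are my own.

```python
def standard_conv_cost(out_size, kernel, in_channels, out_channels):
    """Multiplications of a normal convolution."""
    return out_size * out_size * kernel * kernel * in_channels * out_channels

def depthwise_separable_cost(out_size, kernel, in_channels, out_channels):
    """Multiplications of a depthwise conv followed by a pointwise (1x1) conv."""
    depthwise = out_size * out_size * kernel * kernel * in_channels   # one filter per input channel
    pointwise = out_size * out_size * in_channels * out_channels      # 1x1 conv mixes the channels
    return depthwise + pointwise

standard = standard_conv_cost(out_size=4, kernel=3, in_channels=3, out_channels=5)
separable = depthwise_separable_cost(out_size=4, kernel=3, in_channels=3, out_channels=5)

print(standard)    # 2160
print(separable)   # 672
```

The ratio `separable / standard` equals `1/out_channels + 1/kernel**2`, so the saving grows with the number of output channels.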

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras.layers import Dropout, Dense, BatchNormalization, Conv2D, Flatten, MaxPool2D, Activation, Add, DepthwiseConv2D

BATCH_SIZE = 32
VAL_SPLIT = 0.1
EPOCH_NUM = 5
LEARNING_RATE = 0.001
IMG_WIDTH = 180
IMG_HEIGHT = 180


# Defining the input size
x_inputs = tf.keras.Input((IMG_WIDTH, IMG_HEIGHT,3))

X = Conv2D(filters = 3, kernel_size = (5,5), strides = (1,1))(x_inputs)
X = BatchNormalization(axis = 3)(X) # Default axis
X = Activation('relu')(X)
X = MaxPool2D(pool_size = (2,2), strides= (2,2))(X)

# Save the input value. You'll need this later to add back to the main path. 
X_shortcut = X

X = Conv2D(filters = 18, kernel_size = (1,1), strides = (1,1))(X)
X = BatchNormalization(axis = 3)(X) # Default axis
X = Activation('relu')(X)

X = DepthwiseConv2D(kernel_size=(3, 3), strides=(1, 1), padding='same')(X)
X = BatchNormalization(axis=3)(X)
X = Activation('relu')(X)

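# Projection back to 3 channels so the shape matches the shortcut.
# MobileNet v2 keeps this last pointwise conv linear (no ReLU).
X = Conv2D(filters=3, kernel_size=(1, 1), strides=(1, 1), padding='same')(X)
X = BatchNormalization(axis=3)(X)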

# Shortcut
X = Add()([X, X_shortcut])

X = layers.MaxPool2D(pool_size=(2,2), strides=(2,2))(X)
X = layers.Flatten()(X)

X = tf.keras.layers.Dense(units=300, activation='relu')(X)
outputs = tf.keras.layers.Dense(units=1, activation='sigmoid')(X)

model = tf.keras.Model(inputs=x_inputs, outputs=outputs)
model.compile(optimizer=tf.optimizers.Adam(LEARNING_RATE) ,loss='binary_crossentropy',metrics=['accuracy'])

In [None]:
model.summary()
tf.keras.utils.plot_model(model,  show_shapes=True,expand_nested=True)

In [None]:
with tf.device('/gpu:0'):# if no GPU is found it will run with the CPU
    history = model.fit(x = training_set, validation_data=validation_set, epochs = EPOCH_NUM, batch_size = BATCH_SIZE)

In [None]:
def plot_metrics(history):
  metrics = ['loss', 'accuracy']
  for n, metric in enumerate(metrics):
    try:
      name = metric.replace("_"," ").capitalize()
      plt.plot(history.epoch, history.history[metric], label='Train')
      plt.plot(history.epoch, history.history['val_'+metric], linestyle="--", label='Val')
      plt.xlabel('Epoch')
      plt.ylabel(name)
      if metric == 'loss':
        plt.ylim([0, plt.ylim()[1]])
      elif metric == 'auc':
        plt.ylim([0.8,1])
      else:
        plt.ylim([0,1])
      plt.legend()
      plt.show()  
    except KeyError:  # this metric was not recorded in the history
      pass
plot_metrics(model.history)

plt.title(label='Zoomed in Accuracy plot')
plt.plot(history.history["accuracy"],label='Train')
plt.plot(history.history["val_accuracy"],linestyle="--",label='Validation')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()

<a id="cell-cnn_tran"></a>
## 3.4.6 Transfer Learning

The idea of transfer learning is to take another model, remove its output layer or several of its layers/weights, and train those again on your dataset for your specific output. This works well for CNNs because learned features such as edges and higher-level filters are generally the same for most image-based problems. This lets us benefit from other people's well-trained models.

<img src="https://miro.medium.com/max/1400/1*f2_PnaPgA9iC5bpQaTroRw.png" width="650" align="centr"/>

In [None]:
from tensorflow.keras.preprocessing import image_dataset_from_directory


BATCH_SIZE = 32
IMG_SIZE = (160, 160)
VAL_SPLIT = 0.2

directory = "../input/alpaca-dataset-small/dataset"

train_dataset = image_dataset_from_directory(directory,
                                             shuffle=True,
                                             batch_size=BATCH_SIZE,
                                             image_size=IMG_SIZE,
                                             validation_split=VAL_SPLIT,
                                             subset='training',
                                             seed=159)
validation_dataset = image_dataset_from_directory(directory,
                                             shuffle=True,
                                             batch_size=BATCH_SIZE,
                                             image_size=IMG_SIZE,
                                             validation_split=VAL_SPLIT,
                                             subset='validation',
                                             seed=159)

In [None]:
import tensorflow as tf
# I will talk more about these 2 lines of code in the Data Engineering part
AUTOTUNE = tf.data.experimental.AUTOTUNE
train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)

We are going to use the MobileNetV2 model as the base for transfer learning to classify alpacas.

In [None]:
IMG_SHAPE = IMG_SIZE + (3,)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

# If you want to see the model structure, uncomment the following 2 lines of code
# base_model.summary()
# tf.keras.utils.plot_model(base_model,  show_shapes=True,expand_nested=True)

The last 2 layers of the model are considered the top. If you had set `include_top=True` in the previous code, you could print their names with the code below (uncomment it if you want to see them).

In [None]:
# nb_layers = len(base_model.layers)
# print(base_model.layers[nb_layers - 2].name)
# print(base_model.layers[nb_layers - 1].name)

Freezing (part of) the model prevents it from relearning everything from scratch.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras.layers import Dropout, Dense, BatchNormalization, Conv2D, Flatten, MaxPool2D, Activation, Add, GlobalAveragePooling2D
from tensorflow.keras.layers.experimental.preprocessing import RandomFlip, RandomRotation

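# A frozen parent model reports no trainable weights, whatever its sub-layers say,
# so unfreeze the base model and freeze the individual layers before layer 125 instead.
# Only the layers from layer 125 onwards are then fine-tuned.
base_model.trainable = True
for layer in base_model.layers[:125]:
    layer.trainable = False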

BATCH_SIZE = 32
VAL_SPLIT = 0.1
EPOCH_NUM = 10
LEARNING_RATE = 0.001
IMG_SHAPE = IMG_SIZE + (3,)

# create the input layer (same as the MobileNetV2 input size chosen above)
inputs = tf.keras.Input(IMG_SHAPE) 

# apply data augmentation to the inputs if you want. Or this can be done outside of the model
data_augmentation = RandomFlip("horizontal")(inputs)
x = RandomRotation(0.15)(data_augmentation)

# data preprocessing using the same pixel scaling the base model was trained with
x = tf.keras.applications.mobilenet_v2.preprocess_input(x) 

# set training to False to avoid keeping track of statistics in the batch norm layer
x = base_model(x, training=False) 

# using global avg pooling to summarize the info in each channel
x = GlobalAveragePooling2D()(x) 
x = Dropout(0.3)(x)

outputs = Dense(1, activation='sigmoid')(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.optimizers.Adam(LEARNING_RATE) ,loss='binary_crossentropy',metrics=['accuracy'])

In [None]:
with tf.device('/gpu:0'):# if no GPU is found it will run with the CPU
    history = model.fit(x = train_dataset, validation_data = validation_dataset, epochs = EPOCH_NUM, batch_size = BATCH_SIZE)

In [None]:
def plot_metrics(history):
  metrics = ['loss', 'accuracy']
  for n, metric in enumerate(metrics):
    try:
      name = metric.replace("_"," ").capitalize()
      plt.plot(history.epoch, history.history[metric], label='Train')
      plt.plot(history.epoch, history.history['val_'+metric], linestyle="--", label='Val')
      plt.xlabel('Epoch')
      plt.ylabel(name)
      if metric == 'loss':
        plt.ylim([0, plt.ylim()[1]])
      elif metric == 'auc':
        plt.ylim([0.8,1])
      else:
        plt.ylim([0,1])
      plt.legend()
      plt.show()  
    except KeyError:  # this metric was not recorded in the history
      pass
plot_metrics(model.history)

plt.title(label='Zoomed in Accuracy plot')
plt.plot(history.history["accuracy"],label='Train')
plt.plot(history.history["val_accuracy"],linestyle="--",label='Validation')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()

In [None]:
# cleaning up RAM
import gc

del train_dataset
del validation_dataset
gc.collect()

<a id="cell-cnn_obj"></a>
## 3.4.7 Object Detection

To understand Object Detection, we first need to refresh a few concepts.

We first want to detect the object and then classify it. This can be done all within one model, by representing our output layer as the following:

**[Pc,bx,by,bh,bw,C1,C2,C3,...]**

Meanings:

* **Pc**(is there an object or not?)
* **bx**(bounding box X value)
* **by**(bounding box Y value)
* **bh**(bounding box height)
* **bw**(bounding box Width)
* **C1**(class 1),**C2**(class 2),**C3**,...

If we want to detect multiple objects at the same time, we can add more copies of the above group to the output layer until we are satisfied. For example: to detect 5 objects at the same time, we need 5 [Pc,bx,by,bh,bw,C1,C2,C3,...] groups concatenated together as the output layer.
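As a small illustration of this output layout, the sketch below builds the target vector for one object slot and concatenates two slots. The function name `detection_target`, the three classes and the box values are assumptions made up for the example. An empty slot is filled with zeros here, although only `Pc` matters in that case.

```python
def detection_target(box=None, class_index=None, num_classes=3):
    """Build [Pc, bx, by, bh, bw, C1, ..., Cn] for one object slot."""
    if box is None:
        # no object: Pc = 0, the remaining values are "don't care"
        return [0.0] * (5 + num_classes)
    bx, by, bh, bw = box
    one_hot = [1.0 if i == class_index else 0.0 for i in range(num_classes)]
    return [1.0, bx, by, bh, bw] + one_hot

car = detection_target(box=(0.5, 0.7, 0.3, 0.4), class_index=1)
empty = detection_target()
two_slots = car + empty   # two object slots concatenated into one output vector

print(car)              # [1.0, 0.5, 0.7, 0.3, 0.4, 0.0, 1.0, 0.0]
print(len(two_slots))   # 16
```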

If we want to detect landmarks, we remove bh and bw from the output and just store the x and y location of each landmark instead (example: facial landmarks for face recognition).

To detect the position of the object, we don't want to check every pixel for an object. This would be computationally inefficient. The solution is to split the image into smaller parts and only check those for an object. There are several ways to implement this.

**Sliding windows**

<img src="https://i.stack.imgur.com/tQZI2.jpg" width="650" align="centr"/>
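The window positions themselves are simple to generate. This is a minimal sketch with a made-up helper `sliding_windows`, assuming a square window and the same stride in both directions. In a real detector each window would be cropped from the image and passed to a classifier.

```python
def sliding_windows(image_width, image_height, window, stride):
    """Yield (left, top, right, bottom) for every window position."""
    for top in range(0, image_height - window + 1, stride):
        for left in range(0, image_width - window + 1, stride):
            yield (left, top, left + window, top + window)

windows = list(sliding_windows(image_width=8, image_height=6, window=4, stride=2))

print(len(windows))   # 6
print(windows[0])     # (0, 0, 4, 4)
print(windows[-1])    # (4, 2, 8, 6)
```

A smaller stride gives more windows and a finer search, at the cost of more classifier calls.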

**YOLO**

(Yolo also makes use of anchor boxes, more on that later)

<img src="https://miro.medium.com/max/1152/1*m8p5lhWdFDdapEFa2zUtIA.jpeg" width="650" align="centr"/>
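In YOLO the image is divided into a grid, and the grid cell that contains the centre of an object is the one responsible for predicting it. The sketch below finds that cell. It assumes the box centre is given as a fraction of the image width and height, and the function name `responsible_cell` is my own.

```python
def responsible_cell(center_x, center_y, grid_size):
    """Locate the grid cell that contains a box centre.

    center_x and center_y are fractions of the image width/height in [0, 1].
    Returns (row, col, x_in_cell, y_in_cell); the last two are relative to the cell.
    """
    scaled_x, scaled_y = center_x * grid_size, center_y * grid_size
    # clamp so that a centre on the right or bottom edge stays inside the grid
    col = min(int(scaled_x), grid_size - 1)
    row = min(int(scaled_y), grid_size - 1)
    return row, col, scaled_x - col, scaled_y - row

cell = responsible_cell(center_x=0.625, center_y=0.375, grid_size=4)
print(cell)   # (1, 2, 0.5, 0.5)
```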

**Non-max Suppression**

Non-max suppression looks at all the candidate bounding boxes around an object, keeps the one with the highest probability, and discards the remaining boxes that overlap it strongly (Intersection over Union larger than 0.5, for example).

<img src="https://2628535719-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M5-0RGo4dZhdwCIHrC1%2F-M5-0TEuN77zAmTpQjIr%2F-M5-0WMNj3vkyPeFge_4%2FNon-Max%20Suppression.JPG?generation=1586990829149087&alt=media" width="650" align="centr"/>

**Intersection over Union**

<img src="https://media5.datahacker.rs/2018/11/IoU.png" width="650" align="centr"/>
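Both Intersection over Union and the non-max suppression step that relies on it fit in a few lines of plain Python. This is a minimal sketch, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates and using a threshold of 0.5. The function names and example boxes are my own.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # keep a box only if it does not overlap too much with an already kept box
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)   # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is dropped. The third box does not overlap anything and is kept as a separate detection.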

**Anchor Boxes**

(used in YOLO)

By adding boxes of different shapes, we can detect multiple objects of different sizes within the "image fragment" we are looking at.

<img src="https://media5.datahacker.rs/2018/11/ancor_1-1.png" width="650" align="centr"/>
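One common way to decide which anchor box an object belongs to is to pick the anchor whose shape overlaps the object's shape the most. The sketch below does that by comparing widths and heights only, as if both boxes shared the same centre. The two anchors and the helper names are hypothetical values for illustration.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes that share the same centre, compared by (width, height) only."""
    intersection = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - intersection
    return intersection / union

def best_anchor(box_wh, anchors):
    """Index of the anchor whose shape matches the box best."""
    return max(range(len(anchors)), key=lambda i: shape_iou(box_wh, anchors[i]))

anchors = [(1.0, 3.0), (3.0, 1.0)]   # hypothetical: tall (pedestrian-like), wide (car-like)

print(best_anchor((0.8, 2.5), anchors))   # 0
print(best_anchor((2.8, 1.2), anchors))   # 1
```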


<a id="cell-cnn_semantic_seg"></a>
## 3.4.8 Semantic Segmentation

Semantic segmentation predicts a class for every pixel of an image (like cars or the road).

<img src="https://miro.medium.com/max/800/1*WKwbz04uLR0ds5M0xiCdjg.jpeg" width="650" align="centr"/>
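Pixel-by-pixel prediction means the model outputs one score per class for every pixel, and the predicted mask is the index of the highest score at each position. A tiny sketch with a made-up 2x2 "image" and 3 classes (the helper name `predict_mask` is my own):

```python
def predict_mask(class_scores):
    """class_scores[row][col] holds one score per class; return the class index per pixel."""
    return [[max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
            for row in class_scores]

scores = [
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
]
mask = predict_mask(scores)
print(mask)   # [[1, 0], [2, 1]]
```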

**Transposed Convolution**:

<img src="https://preview.redd.it/swxoctxkj9341.jpg?width=1210&format=pjpg&auto=webp&s=3c1e3301fae700f730b8ace194f33331ada2bdad" width="650" align="centr"/>

The transposed convolution is used to upscale the dimensions in the **U-Net** architecture.
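To see how a transposed convolution enlarges its input, here is a single-channel sketch without padding in plain Python. Every input value stamps a scaled copy of the kernel onto the output, and overlapping stamps are summed, which gives an output size of `(n - 1) * stride + k`. The function name and the example values are my own.

```python
def transposed_conv2d(inputs, kernel, stride):
    """Single-channel transposed convolution without padding.

    inputs and kernel are square 2-D lists.
    """
    n, k = len(inputs), len(kernel)
    out_size = (n - 1) * stride + k
    output = [[0] * out_size for _ in range(out_size)]
    for i in range(n):
        for j in range(n):
            # each input value stamps a scaled copy of the kernel onto the output
            for a in range(k):
                for b in range(k):
                    output[i * stride + a][j * stride + b] += inputs[i][j] * kernel[a][b]
    return output

upscaled = transposed_conv2d([[1, 2], [3, 4]], kernel=[[1, 0], [0, 1]], stride=2)
overlapping = transposed_conv2d([[1, 2], [3, 4]], kernel=[[1, 1], [1, 1]], stride=1)

print(upscaled)      # [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 4, 0], [0, 3, 0, 4]]
print(overlapping)   # [[1, 3, 2], [4, 10, 6], [3, 7, 4]]
```

In Keras, `Conv2DTranspose` with `strides=2` and `padding='same'` crops the result so that height and width are exactly doubled.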

<img src="https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/u-net-architecture.png" width="650" align="centr"/>

The following code is heavily inspired by the Coursera Deep Learning Specialization and https://www.kaggle.com/code/oluwatobiojekanmi/carla-image-semantic-segmentation-with-u-net.

In [None]:
import os
import tensorflow as tf
from sklearn.model_selection import train_test_split

# importing the data

image_path = ["../input/lyft-udacity-challenge/"+"data"+i+"/"+"data"+i+"/CameraRGB/" for i in ['A', 'B', 'C', 'D', 'E']]
mask_path = ["../input/lyft-udacity-challenge/"+"data"+i+"/"+"data"+i+"/CameraSeg/" for i in ['A', 'B', 'C', 'D', 'E']]

def list_image_paths(directory_paths):
    image_paths = []
    for directory in range(len(directory_paths)):
        image_filenames = os.listdir(directory_paths[directory])
        for image_filename in image_filenames:
            image_paths.append(directory_paths[directory] + image_filename)
    return image_paths

image_paths = list_image_paths(image_path) 
mask_paths = list_image_paths(mask_path)
number_of_images, number_of_masks = len(image_paths), len(mask_paths)
print(f"1. There are {number_of_images} images and {number_of_masks} masks in our dataset")

# First split the image paths into training and validation sets
train_image_paths, val_image_paths, train_mask_paths, val_mask_paths = train_test_split(image_paths, mask_paths, train_size=0.8, random_state=0)
# Keep part of the validation set as test set
validation_image_paths, test_image_paths, validation_mask_paths, test_mask_paths = train_test_split(val_image_paths, val_mask_paths, train_size = 0.80, random_state=0)

def read_image(image_path, mask_path):
    image = tf.io.read_file(image_path)
    image = tf.image.decode_png(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, (256, 256), method='nearest')

    mask = tf.io.read_file(mask_path)
    mask = tf.image.decode_png(mask, channels=3)
    mask = tf.math.reduce_max(mask, axis=-1, keepdims=True)
    mask = tf.image.resize(mask, (256, 256), method='nearest')
    
    return image, mask

def data_generator(image_paths, mask_paths, buffer_size, batch_size):   
    image_list = tf.constant(image_paths) 
    mask_list = tf.constant(mask_paths)
    dataset = tf.data.Dataset.from_tensor_slices((image_list, mask_list))
    dataset = dataset.map(read_image, num_parallel_calls=tf.data.AUTOTUNE)
    dataset = dataset.cache().shuffle(buffer_size).batch(batch_size)
    
    return dataset

batch_size = 32
buffer_size = 500

train_dataset = data_generator(train_image_paths, train_mask_paths, buffer_size, batch_size)
validation_dataset = data_generator(validation_image_paths, validation_mask_paths, buffer_size, batch_size)
test_dataset = data_generator(test_image_paths, test_mask_paths, buffer_size, batch_size)

Creating the model by making encoding blocks (downsampling) and decoding blocks (upsampling)
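Before building the blocks it helps to trace the tensor sizes, because every decoding block concatenates its upsampled input with the skip connection of the matching encoding block, and that only works if both have the same height and width. The sketch below does this bookkeeping in plain Python for the settings used in this section (256x256 input, 32 filters, four pooling steps). The helper `unet_shapes` is my own and not part of Keras.

```python
def unet_shapes(size=256, filters=32, depth=4):
    """Trace (stage, height/width, channels) through a U-Net with `depth` pooling steps."""
    skips, trace = [], []
    for level in range(depth):                   # encoding blocks with 2x2 max pooling
        channels = filters * 2 ** level
        skips.append((size, channels))           # the skip connection is taken before pooling
        size //= 2
        trace.append(("down", size, channels))
    trace.append(("bottleneck", size, filters * 2 ** depth))
    for level in reversed(range(depth)):         # decoding blocks
        skip_size, channels = skips[level]
        size *= 2                                # stride-2 transposed convolution
        assert size == skip_size                 # required for the concatenation
        trace.append(("up", size, channels))
    return trace

for stage in unet_shapes():
    print(stage)
```

The trace goes down to 16x16 with 512 channels in the bottleneck and comes back up to 256x256 with 32 channels, which is where the final 1x1 convolution produces the per-pixel class scores.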

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras.layers import Dropout, Dense, BatchNormalization, Conv2D, Flatten, MaxPool2D, Activation, Add, GlobalAveragePooling2D, Conv2DTranspose

def conv_block(inputs=None, filters=32, dropout_prob=0, max_pooling=True):
    """
    Convolutional downsampling block
    
    Arguments:
        inputs -- Input tensor
        filters -- Number of filters for the convolutional layers
        dropout_prob -- Dropout probability
        max_pooling -- Use MaxPooling2D to reduce the spatial dimensions of the output volume
    Returns: 
        next_layer, skip_connection --  Next layer and skip connection outputs
    """

    conv = Conv2D(filters, # Number of filters
                  3,# Kernel size   
                  padding='same',
                  kernel_initializer= 'he_normal')(inputs)
    conv = BatchNormalization()(conv)
    conv = Activation("relu")(conv)
    
    conv = Conv2D(filters, # Number of filters
                  3,# Kernel size   
                  padding='same',
                  kernel_initializer= 'he_normal')(conv)
    conv = BatchNormalization()(conv)
    conv = Activation("relu")(conv)
    
    # if dropout_prob > 0 add a dropout layer, with the variable dropout_prob as parameter
    if dropout_prob > 0:
        conv = Dropout(dropout_prob)(conv)
        
    # if max_pooling is True add a MaxPooling2D with 2x2 pool_size
    if max_pooling:
        next_layer = MaxPool2D(2,strides=2)(conv)  
    else:
        next_layer = conv
        
    skip_connection = conv
    
    return next_layer, skip_connection

def upsampling_block(expansive_input, contractive_input, filters=32):
    """
    Convolutional upsampling block
    
    Arguments:
        expansive_input -- Input tensor from previous layer
        contractive_input -- Input tensor from previous skip layer
        filters -- Number of filters for the convolutional layers
    Returns: 
        conv -- Tensor output
    """
    
    up = Conv2DTranspose(
                 filters,    # number of filters
                 3,# Kernel size
                 strides=2,
                 padding='same')(expansive_input)
    
    # Merge the previous output and the contractive_input
    merge = layers.Concatenate()([up, contractive_input])
    
    conv = Conv2D(filters, # Number of filters
                  3,# Kernel size   
                  padding='same',
                  kernel_initializer= 'he_normal')(merge)
    # normalize the convolution output (not the raw merge, which would skip the conv layer)
    conv = BatchNormalization()(conv)
    conv = Activation("relu")(conv)
    
    conv = Conv2D(filters, # Number of filters
                  3,# Kernel size   
                  padding='same',
                  kernel_initializer= 'he_normal')(conv)
    conv = BatchNormalization()(conv)
    conv = Activation("relu")(conv)
    
    return conv

def unet_model(input_size, filters=32, n_classes=23):
    """
    Unet model
    
    Arguments:
        input_size -- Input shape 
        filters -- Number of filters for the convolutional layers
        n_classes -- Number of output classes
    Returns: 
        model -- tf.keras.Model
    """
    inputs = tf.keras.Input(input_size)
    
    # Contracting Path (encoding)
    cblock1 = conv_block(inputs=inputs, filters=filters, max_pooling=True)
    # Chain the first element of the output of each block to be the input of the next conv_block. 
    # Double the number of filters at each new step
    
    cblock2 = conv_block(inputs=cblock1[0], filters=filters*2, max_pooling=True)
    cblock3 = conv_block(inputs=cblock2[0], filters=filters*4)

    cblock4 = conv_block(inputs=cblock3[0], filters=filters*8, max_pooling=True)

    cblock5 = conv_block(inputs=cblock4[0], filters=filters*16, dropout_prob=0.25, max_pooling=False) 
    
    # Expanding Path (decoding)
    # From here,at each step, use half the number of filters of the previous block 
    ublock6 = upsampling_block(cblock5[0], cblock4[1], filters*8)
    # Chain the output of the previous block as expansive_input and the corresponding contractive block output.
    # Note that you must use the second element of the contractive block i.e before the maxpooling layer. 
    
    ublock7 = upsampling_block(ublock6, cblock3[1], filters*4)
    ublock8 = upsampling_block(ublock7, cblock2[1], filters*2)
    ublock9 = upsampling_block(ublock8, cblock1[1], filters)

    conv10 = Conv2D(filters,
                 3,
                 activation='relu',
                 padding='same',
                 kernel_initializer='he_normal')(ublock9)

    output = Conv2D(n_classes, kernel_size = (1,1), activation='softmax', padding='same')(conv10)
    
    model = tf.keras.Model(inputs=inputs, outputs=output)

    return model

img_height = 256
img_width = 256
num_channels = 3
filters = 32
n_classes = 13
LEARNING_RATE = 0.01

# use the settings defined above instead of hard-coded values
model = unet_model((img_height, img_width, num_channels), filters=filters, n_classes=n_classes)
model.compile(optimizer = tf.optimizers.Adam(LEARNING_RATE), loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])

In [None]:
model.summary()
tf.keras.utils.plot_model(model,  show_shapes=True,expand_nested=True)

In [None]:
BATCH_SIZE = 32
EPOCH_NUM = 7

with tf.device('/gpu:0'):# if no GPU is found it will run with the CPU
    history = model.fit(x = train_dataset, validation_data = validation_dataset, epochs = EPOCH_NUM, batch_size = BATCH_SIZE)

In [None]:
def plot_metrics(history):
  metrics = ['loss', 'accuracy']
  for n, metric in enumerate(metrics):
    try:
      name = metric.replace("_"," ").capitalize()
      plt.plot(history.epoch, history.history[metric], label='Train')
      plt.plot(history.epoch, history.history['val_'+metric], linestyle="--", label='Val')
      plt.xlabel('Epoch')
      plt.ylabel(name)
      if metric == 'loss':
        plt.ylim([0, plt.ylim()[1]])
      elif metric == 'auc':
        plt.ylim([0.8,1])
      else:
        plt.ylim([0,1])
      plt.legend()
      plt.show()  
    except KeyError:  # this metric was not recorded in the history
      pass
plot_metrics(model.history)

plt.title(label='Zoomed in Accuracy plot')
plt.plot(history.history["accuracy"],label='Train')
plt.plot(history.history["val_accuracy"],linestyle="--",label='Validation')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()

In [None]:
# cleaning up RAM
import gc

del train_dataset
del validation_dataset
del test_dataset
gc.collect()

<a id="cell-cnn_face_rec"></a>
## 3.4.9 Facial Recognition

To recognize someone, we make use of the triplet loss function. **Triplet loss** is a loss function for machine learning algorithms where a reference input (called the anchor) is compared to a matching input (called the positive) and a non-matching input (called the negative).

During training, the loss pulls the anchor closer to the positive than to the negative. To recognize a face afterwards, the "distance" between two embeddings is compared against a threshold: if it is below the threshold the faces count as a match, otherwise they do not.

<img src="https://miro.medium.com/max/1302/1*SKWGC3ehCbGCsbJVge6kmg.png" width="650" align="centr"/>

I recommend starting with a pretrained **FaceNet** model and applying transfer learning to adapt it to your project.
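
The triplet loss and the threshold check described above can be sketched in a few lines of NumPy. This is a minimal illustration, not FaceNet's actual implementation: the function names, the `margin` of 0.2, and the `threshold` of 0.5 are assumptions chosen for the example, and the tiny 2-D vectors stand in for real face embeddings.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean triplet loss over a batch of embeddings of shape (batch, dim)."""
    # Squared Euclidean distance between the anchor and each comparison input
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    # The loss is zero once the negative is at least `margin` farther away than the positive
    return float(np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0)))

def is_match(embedding_a, embedding_b, threshold=0.5):
    """Treat two embeddings as the same person if their distance is below the threshold."""
    return bool(np.linalg.norm(embedding_a - embedding_b) < threshold)

# Toy embeddings (assumed values): the positive sits close to the anchor, the negative far away
anchor = np.array([[0.0, 1.0]])
positive = np.array([[0.1, 0.9]])
negative = np.array([[1.0, 0.0]])

easy_loss = triplet_loss(anchor, positive, negative)  # well-separated triplet
hard_loss = triplet_loss(anchor, negative, positive)  # roles swapped on purpose
print(easy_loss, hard_loss)
print(is_match(anchor[0], positive[0]), is_match(anchor[0], negative[0]))
```

A well-separated triplet contributes no loss, while the swapped triplet is penalized, which is what drives the embeddings of the same person together during training.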

<a id="cell-cnn_style_transf"></a>
## 3.4.10 Neural Style Transfer

We take an image (content) and merge it together with a specific style (another image).

<img src="https://hackernoon.com/hn-images/1*k5Q_NYr1niC-qjWMr-lUCg.png" width="650" align="centr"/>

We calculate the cost function between the style and the content based on the following formula:

<img src="https://assets.website-files.com/5ac6b7f2924c652fd013a891/5ddd8faa5ec068074fc684b0_Tcyrhi9Soqdop2OAXQoz2yg25nCsdrBNAWWTYzdNK83V8srvlMJJv9KXmQR9PC6Pa_ktiwdvdc-CBhRNX_CsaQcl0oKS92_gSjDj0q9xKigaipvuqQHWFAtEE6a3ulK_znVZ_tI.png" width="650" align="centr"/>

<img src="https://miro.medium.com/max/1294/1*ZgW520SZr1QkGoFd3xqYMw.jpeg" width="650" align="centr"/>

**Approach**: We load a pretrained model such as VGG, MobileNet, or ResNet (trained on ImageNet), remove the top, and compute the cost function on activations from the middle layers of the pretrained model. This works because these pretrained models have well-trained layers that capture the features of an image, and the middle layers let us capture both the shallow and the deeper features.
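
As a rough sketch of how such a cost could be computed, the snippet below works on raw activation arrays of shape (height, width, channels). It assumes the common formulation in which the style cost compares Gram matrices and the total cost is a weighted sum of content and style cost; the normalization constants and the weights `alpha=10` and `beta=40` are assumptions for illustration, and random arrays stand in for the activations of a real pretrained layer.

```python
import numpy as np

def content_cost(content_act, generated_act):
    """Squared difference between two activations of shape (height, width, channels)."""
    h, w, c = content_act.shape
    return float(np.sum((content_act - generated_act) ** 2) / (4 * h * w * c))

def gram_matrix(act):
    """Channel-by-channel correlations of an activation of shape (height, width, channels)."""
    h, w, c = act.shape
    flat = act.reshape(h * w, c)  # one row per spatial position
    return flat.T @ flat          # shape (channels, channels)

def style_cost(style_act, generated_act):
    """Squared difference between the Gram matrices of two activations."""
    h, w, c = style_act.shape
    diff = gram_matrix(style_act) - gram_matrix(generated_act)
    return float(np.sum(diff ** 2) / (4 * (c ** 2) * ((h * w) ** 2)))

def total_cost(j_content, j_style, alpha=10.0, beta=40.0):
    """Weighted sum of content and style cost (weights are illustrative)."""
    return alpha * j_content + beta * j_style

rng = np.random.default_rng(0)
style_act = rng.random((4, 4, 3))
# Shuffling the spatial positions changes the content but not the channel correlations
shuffled = rng.permutation(style_act.reshape(16, 3)).reshape(4, 4, 3)

print(content_cost(style_act, shuffled))  # clearly above zero
print(style_cost(style_act, shuffled))    # zero up to floating-point rounding
```

The shuffled example shows why the Gram matrix is a reasonable description of style: it ignores where a feature appears and only records which features occur together.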

<a id="cell-rnn"></a>
## 3.5 Recurrent Neural Networks (RNN)

<a id="cell-rnn_data"></a>
### Importing Simple Data

In [None]:
!pip install yfinance
import yfinance as yf

In [None]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

WINDOW_SIZE = 40

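aapl = yf.Ticker("AAPL")
X = aapl.history(period="max").loc[:, ['Close']]  # daily closing prices as a one-column DataFrame
y = X[WINDOW_SIZE:]  # target: the close that follows each window of WINDOW_SIZE days

# Scale the data to [0, 1]
# Note: the scaler is fitted on the full series, so the test range leaks into the scaling;
# fit it on the training portion only for a stricter evaluation
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
y_scaled = scaler.transform(y)

# Build rolling windows: each sample holds the WINDOW_SIZE data points that precede its target.
# These windows are the input for the neural network
X_df = []
for i in range(WINDOW_SIZE, len(X_scaled)):
    X_df.append(X_scaled[i-WINDOW_SIZE:i])

print(np.shape(X_df))  # (samples, WINDOW_SIZE, 1)
# shuffle=False keeps the chronological order, so the test set is the most recent 20 %
X_train, X_test, y_train, y_test = train_test_split(X_df, y_scaled, test_size = 0.2, random_state = 0, shuffle=False)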
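
To make the rolling-window step easier to follow, here is a small sketch on a toy series instead of the AAPL prices. The helper name `make_windows` and the six-value series are made up for illustration; the slicing mirrors the loop above.

```python
import numpy as np

def make_windows(series, window_size):
    """Return (windows, targets): each window holds the `window_size` values before its target."""
    windows = np.array([series[i - window_size:i] for i in range(window_size, len(series))])
    targets = series[window_size:]
    return windows, targets

series = np.arange(6, dtype=float).reshape(-1, 1)  # toy stand-in for the scaled closing prices
windows, targets = make_windows(series, window_size=3)

print(windows.shape)       # (3, 3, 1): samples, time steps, features
print(windows[0].ravel())  # [0. 1. 2.]
print(targets[0])          # [3.]
```

Each window of three values is paired with the value that comes right after it, which is the shape an LSTM with input `(WINDOW_SIZE, 1)` expects.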

<a id="cell-rnn_lstm"></a>
## 3.5.1 Long short-term memory (LSTM)

<img src="https://miro.medium.com/max/674/1*jikKbzFXCq-IYnFZankIMg.png" width="650" align="centr"/>

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Dropout, Dense, BatchNormalization, LSTM, Activation, Reshape  # use tf.keras consistently instead of mixing it with standalone keras

BATCH_SIZE = 32
VAL_SPLIT = 0.1
EPOCH_NUM = 5
LEARNING_RATE = 0.01
WINDOW_SIZE = 40

# Defining the input size
x_inputs = tf.keras.Input((WINDOW_SIZE, 1))

# If we are working with text, we might use Bidirectional LSTM
x = LSTM(units = 40, return_sequences=False)(x_inputs)
x = Dense(units = 16, activation = 'relu')(x)

output = Dense(units = 1, activation = 'linear')(x)

model = tf.keras.Model(inputs=x_inputs, outputs=output)
model.compile(optimizer=tf.optimizers.Adam(LEARNING_RATE), loss='mse',metrics=['mse'])

In [None]:
model.summary()
tf.keras.utils.plot_model(model,  show_shapes=True,expand_nested=True)

In [None]:
with tf.device('/gpu:0'):# if no GPU is found it will run with the CPU
    history = model.fit(x = np.array(X_train), y = np.array(y_train), validation_split = VAL_SPLIT, epochs = EPOCH_NUM, batch_size = BATCH_SIZE)

In [None]:
def plot_metrics(history):
  # Plot each training metric together with its validation counterpart
  metrics = ['loss', 'mse']
  for metric in metrics:
    # Skip metrics that were not recorded during training
    if metric not in history.history or 'val_' + metric not in history.history:
      continue
    name = metric.replace("_", " ").capitalize()
    plt.plot(history.epoch, history.history[metric], label='Train')
    plt.plot(history.epoch, history.history['val_' + metric], linestyle="--", label='Val')
    plt.xlabel('Epoch')
    plt.ylabel(name)
    if metric == 'loss':
      plt.ylim([0, plt.ylim()[1]])
    elif metric == 'auc':
      plt.ylim([0.8, 1])
    else:
      plt.ylim([0, 1])
    plt.legend()
    plt.show()

plot_metrics(history)

plt.title(label='Zoomed in mse plot')
plt.plot(history.history["mse"], label='Train')
plt.plot(history.history["val_mse"], linestyle="--", label='Validation')
plt.ylabel('mse')
plt.xlabel('Epoch')
plt.legend()
plt.show()

In [None]:
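pred = model.predict(np.array(X_test))

# Undo the MinMax scaling so both curves are plotted in actual prices
pred_price = scaler.inverse_transform(pred)
real_price = scaler.inverse_transform(y_test)

plt.title("Test data compared to prediction")
plt.plot(range(len(pred_price)), pred_price, label='Predicted')
plt.plot(range(len(real_price)), real_price, label='Real Price')
plt.legend()
plt.show()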

<a id="cell-nlp"></a>
## 3.6 Natural Language Processing (NLP)

<a id="cell-nlp_sent"></a>
## 3.6.1 Sentiment Analysis

Project approach:

* Clean the data by removing stop words and stemming the remaining words, then collect the cleaned texts in the "corpus"
* Vectorize your corpus with "bag of words" or term frequency-inverse document frequency (TF-IDF)
* Train the model

**Corpus**: A corpus is a collection of authentic text or audio organized into a dataset.

**Stemming**: Stemming is the process of reducing a word to its stem, usually by stripping suffixes, so that variants such as "loved" and "loving" both map to "love". Unlike lemmatization, which looks up the dictionary form (lemma) of a word, the resulting stem does not have to be a real word.

**Tokenization**: Tokenization is the process of splitting raw text into smaller units called tokens. These tokens can then be mapped to numbers and fed to an NLP model.
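
Since the "bag of words" option is mentioned but not implemented in this notebook, here is a minimal pure-Python sketch of the idea. The stop-word set, the helper names, and the two example sentences are made up for illustration; a real project would more likely use NLTK's stop-word list and scikit-learn's `CountVectorizer`.

```python
from collections import Counter

STOP_WORDS = {"the", "was", "and", "a", "is"}  # tiny illustrative list, not NLTK's

def tokenize(text):
    """Lowercase the text, split it on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

def bag_of_words(corpus):
    """Return the sorted vocabulary and one word-count vector per document."""
    tokenized = [tokenize(doc) for doc in corpus]
    vocabulary = sorted({word for doc in tokenized for word in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

corpus = ["The food was good", "The service was not good and the food was cold"]
vocabulary, vectors = bag_of_words(corpus)
print(vocabulary)  # ['cold', 'food', 'good', 'not', 'service']
print(vectors)     # [[0, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Every document becomes a vector of word counts over the shared vocabulary; word order is discarded, which is where the name "bag of words" comes from.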

TF-IDF: 

<img src="https://miro.medium.com/max/661/1*3K9GIOVLNu0cRvQap_KaRg.png" width="650" align="centr"/>
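
The weighting can also be written out by hand. The sketch below assumes the common definition, weight = term frequency × log(N / document frequency), where N is the number of documents; the function name and the three toy documents are hypothetical. scikit-learn's `TfidfVectorizer`, used later in this section, applies a smoothed IDF and L2-normalizes each row by default, so its numbers will differ from this sketch.

```python
import math
from collections import Counter

def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term occurs
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy documents that are already cleaned and stemmed (assumed input)
docs = [["good", "food"], ["bad", "food"], ["good", "good", "servic"]]
weights = tf_idf(docs)
for doc_weights in weights:
    print(doc_weights)
```

A term that is rare across the corpus ("bad") receives a higher weight than one that appears in several documents ("food"), and a term that occurs in every document gets a weight of zero.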

This notebook does not cover the BERT model, which can be used for a wide variety of NLP tasks. For better coverage of the topic, I recommend also checking this notebook: https://www.kaggle.com/code/andreshg/nlp-glove-bert-tf-idf-lstm-explained

In [None]:
# importing the training data
imdb_data = pd.read_csv('../input/imdb-dataset-of-50k-movie-reviews/IMDB Dataset.csv')
print(imdb_data.shape)
imdb_data.head(10)

In [None]:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from bs4 import BeautifulSoup
import re

# Remove "not" from the stop words so that negations such as "not good" survive the cleaning
all_stopwords = set(stopwords.words('english'))
all_stopwords.discard('not')

ps = PorterStemmer()

# new list with cleaned data
corpus = []
for raw_review in imdb_data['review']:
  review = BeautifulSoup(raw_review, "html.parser").get_text()  # strip HTML tags
  review = re.sub(r'\[[^]]*\]', ' ', review)  # remove text in square brackets
  review = re.sub(r'[^a-zA-Z0-9]', ' ', review)  # keep only letters and digits
  review = review.lower()
  review = review.split()

  # drop stop words and reduce the remaining words to their stems
  review = [ps.stem(word) for word in review if word not in all_stopwords]
  review = ' '.join(review)

  corpus.append(review)

print("Done cleaning the data!")

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
#Tfidf vectorizer
tv = TfidfVectorizer(max_features = 1500)
x = tv.fit_transform(corpus).toarray()
y = imdb_data.iloc[:,1].values

In [None]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state=0)

from sklearn.linear_model import LogisticRegression

classifier_LR = LogisticRegression(max_iter = 500,random_state = 0)
classifier_LR.fit(x_train, y_train)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

y_pred_LR = classifier_LR.predict(x_test)
cm = confusion_matrix(y_test, y_pred_LR)
print(cm)
print(accuracy_score(y_test, y_pred_LR))
# per-class precision, recall and F1 score
print(classification_report(y_test, y_pred_LR))

In [None]:
import gc

del x
del y
gc.collect()

<a id="cell-references"></a>
# 4. References

I would like to give credit to all dataset providers and to the Udemy/Coursera courses that taught me most of what I know. I would also like to thank the Kaggle community for providing educational projects that helped me better understand the concepts within Artificial Intelligence.