# TensorFlow Developer Exam Overview

The purpose of this notebook is a deep dive into preparing for the TensorFlow Developer Certificate exam. This includes very useful information regarding building and tuning deep learning models, along with a full overview on every section listed in the [TensorFlow Certificate Candidate Handbook](https://www.tensorflow.org/static/extras/cert/TF_Certificate_Candidate_Handbook.pdf).

In addition to specific information that is useful to know when building models, there is also a list of the resources and readings this information comes from, which go deeper into the details of specific topics.

I have listed external and personal notebooks, GitHub repos, and datasets that would be useful to go over or reference as needed.

## Notebook Information

This section goes over the details, information, Table of Contents, and a list of resources referenced in this notebook and readings that are useful to know.

### Imports

All the packages required to fully run this notebook.

In [31]:
import datetime
import os

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as hub

### Details

Information pertaining to this specific notebook.

In [2]:
print(f'Notebook last run (end-to-end): {datetime.datetime.now()}')

Notebook last run (end-to-end): 2023-10-03 07:48:55.497323


### Table of Contents

The various sections in this notebook and links to them.

#### Section-1: Guide to TensorFlow & Modeling

1. TensorFlow Inputs & Outputs Table
   * 1.1. Inputs
   * 1.2. Outputs
2. Overfitting
3. Pooling
   * 3.1. Average Pooling
   * 3.2. Max Pooling
   * 3.3. Global Pooling
4. Learning

#### Section-2: TensorFlow Certificate Candidate Handbook

1. TensorFlow Development Skills
    - 1.1. Know how to program in Python, resolve Python issues, and compile and run Python programs in PyCharm.
    - 1.2. Know how to find information about TensorFlow APIs, including how to find guides and API references on tensorflow.org.
    - 1.3. Know how to debug, investigate, and solve error messages from the TensorFlow API.
    - 1.4. Know how to search beyond tensorflow.org, as and when necessary, to solve your TensorFlow questions.
    - 1.5. Know how to create ML models using TensorFlow where the model size is reasonable for the problem being solved.
    - 1.6. Know how to save ML models and check the model file size.
    - 1.7. Understand the compatibility discrepancies between different versions of TensorFlow.
2. Building and training neural network models using TensorFlow 2.x
    - 2.1. Use TensorFlow 2.x.
    - 2.2. Build, compile and train machine learning (ML) models using TensorFlow.
    - 2.3. Preprocess data to get it ready for use in a model.
    - 2.4. Use models to predict results.
    - 2.5. Build sequential models with multiple layers.
    - 2.6. Build and train models for binary classification.
    - 2.7. Build and train models for multi-class categorization.
    - 2.8. Plot loss and accuracy of a trained model.
    - 2.9. Identify strategies to prevent overfitting, including augmentation and dropout.
    - 2.10. Use pretrained models (transfer learning).
    - 2.11. Extract features from pre-trained models.
    - 2.12. Ensure that inputs to a model are in the correct shape.
    - 2.13. Ensure that you can match test data to the input shape of a neural network.
    - 2.14. Ensure you can match output data of a neural network to specified input shape for test data.
    - 2.15. Understand batch loading of data.
    - 2.16. Use callbacks to trigger the end of training cycles.
    - 2.17. Use datasets from different sources.
    - 2.18. Use datasets in different formats, including json and csv.
    - 2.19. Use datasets from tf.data.datasets.
3. Image Classification
    - 3.1. Define Convolutional neural networks with Conv2D and pooling layers.
    - 3.2. Build and train models to process real-world image datasets.
    - 3.3. Understand how to use convolutions to improve your neural network.
    - 3.4. Use real-world images in different shapes and sizes.
    - 3.5. Use image augmentation to prevent overfitting.
    - 3.6. Use ImageDataGenerator.
    - 3.7. Understand how ImageDataGenerator labels images based on the directory structure.
4. Natural language processing (NLP)
    - 4.1. Build natural language processing systems using TensorFlow.
    - 4.2. Prepare text to use in TensorFlow models.
    - 4.3. Build models that identify the category of a piece of text using binary categorization
    - 4.4. Build models that identify the category of a piece of text using multi-class categorization
    - 4.5. Use word embeddings in your TensorFlow model.
    - 4.6. Use LSTMs in your model to classify text for either binary or multi-class categorization.
    - 4.7. Add RNN and GRU layers to your model.
    - 4.8. Use RNNs, LSTMs, GRUs and CNNs in models that work with text.
    - 4.9. Train LSTMs on existing text to generate text (such as songs and poetry)
5. Time series, sequences and predictions
    - 5.1. Train, tune and use time series, sequence and prediction models.
    - 5.2. Train models to predict values for both univariate and multivariate time series.
    - 5.3. Prepare data for time series learning.
    - 5.4. Understand Mean Absolute Error (MAE) and how it can be used to evaluate accuracy of sequence models.
    - 5.5. Use RNNs and CNNs for time series, sequence and forecasting models.
    - 5.6. Identify when to use trailing versus centred windows.
    - 5.7. Use TensorFlow for forecasting.
    - 5.8. Prepare features and labels.
    - 5.9. Identify and compensate for sequence bias.
    - 5.10. Adjust the learning rate dynamically in time series, sequence and prediction models.

### Resources/Readings

The various resources/articles used to create this notebook, along with various resources/articles that are very useful.

#### Repositories
1. [mrdbourke TensorFlow Course Repository](https://github.com/mrdbourke/tensorflow-deep-learning/tree/main)
2. [kolasniwash Tensorflow Certificate Study Guide](https://github.com/kolasniwash/tensorflow-certification-study-guide)

#### Test Taker Articles
3. [Article by Judy Traj](https://medium.com/@judytraj007/getting-the-google-tensorflow-developer-certification-51cf1e4c2bf9)
4. [Article by R. Barbero](https://medium.com/@rbarbero/tensorflow-certification-tips-d1e0385668c8)
5. [LinkedIn Article by Vivek Bombatkar](https://www.linkedin.com/pulse/tensorflow-developer-certification-vivek-bombatkar/)

#### Tutorials/Guides
6. [TensorFlow: Guide to Getting Started](https://www.kaggle.com/code/nicholasjhana/tensorflow-guide-to-getting-started/notebook)

#### Notebooks
7. [LLM Notebook in TensorFlow by FOUCARDM on Kaggle](https://www.kaggle.com/code/foucardm/tensorflow-certification-guide-text-data/notebook)
8. [Univariate Time Series Forecasting with TensorFlow](https://www.kaggle.com/code/nicholasjhana/univariate-time-series-forecasting-with-keras/notebook)
9. [Multi-Variate Time Series Forecasting with TensorFlow](https://www.kaggle.com/code/nicholasjhana/multi-variate-time-series-forecasting-tensorflow/notebook)

#### Documentation
10. [Bidirectional RNN Architecture](https://www.geeksforgeeks.org/bidirectional-lstm-in-nlp/)
11. [LSTM Architecture](https://www.geeksforgeeks.org/deep-learning-introduction-to-long-short-term-memory/)
12. [GRU Architecture](https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be)
13. [Metrics in Timeseries Forecasting](https://mlpills.dev/time-series/error-metrics-for-time-series-forecasting/)
14. [Reducing Overfitting](https://www.kdnuggets.com/2019/12/5-techniques-prevent-overfitting-neural-networks.html)
15. [Data Augmentation](https://www.tensorflow.org/tutorials/images/data_augmentation)
16. [Layer Regularizers](https://johnthas.medium.com/regularization-in-tensorflow-using-keras-api-48aba746ae21)
17. [Global Average Pooling](https://iq.opengenus.org/global-average-pooling/)
18. [Introduction to Pooling Layers in CNN](https://towardsai.net/p/l/introduction-to-pooling-layers-in-cnn)
19. [Global Pooling in Convolutional Neural Networks](https://blog.paperspace.com/global-pooling-in-convolutional-neural-networks/)
20. [A Guide to Convolution Arithmetic for Deep Learning](https://arxiv.org/pdf/1603.07285v1.pdf)
21. [Understand Transposed Convolutions and Build your own Transposed Convolution Layer from Scratch](https://towardsdatascience.com/understand-transposed-convolutions-and-build-your-own-transposed-convolution-layer-from-scratch-4f5d97b2967)
22. [Improving Performance of Convolutional Neural Network](https://medium.com/@dipti.rohan.pawar/improving-performance-of-convolutional-neural-network-2ecfe0207de7)
23. [Activation Functions](https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html)
24. [Univariate Time Series](https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc44.htm)
25. [Multivariate Time Series](https://towardsdatascience.com/a-step-by-step-guide-to-feature-engineering-for-multivariate-time-series-162ccf232e2f)
26. [Single vs. Multivariate Time Series](https://www.analyticsvidhya.com/blog/2018/09/multivariate-time-series-guide-forecasting-modeling-python-codes/#Univariate_Vs._Multivariate_Time_Series_Forecasting_Python)
27. [Trailing vs. Centered Moving Average](https://machinelearningmastery.com/moving-average-smoothing-for-time-series-forecasting-python/)
28. [Learning Curves for Diagnosing Machine Learning Performance](https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/)

#### Tools
29. [CNN Explainer](https://poloclub.github.io/cnn-explainer/)
30. [NN Playground](https://playground.tensorflow.org/)

#### Study Guides
31. [TensorFlow Exam Study Guide (In Spanish)](https://tensorflow.backprop.fr/build-and-train-neural-network-models-using-tensorflow-2-x/use-tensorflow-2-x/)
32. [Zero to Mastery - Book Version of Course](https://dev.mrdbourke.com/tensorflow-deep-learning/)

## Section-1: Guide to TensorFlow & Modeling

TensorFlow and modeling involve a lot of information, but there are certain key areas that are extremely useful to keep in mind. This section goes over the information that is crucial for creating good models with TensorFlow.

### [1. TensorFlow Inputs & Outputs Table](https://www.kaggle.com/code/nicholasjhana/tensorflow-guide-to-getting-started/notebook#Inputs)

When building models, there are certain things to remember depending on the specific problem being modeled. The following tables summarize the input/output and model configurations for each type of the various problems being solved [4].

#### Inputs

Input shapes depend on the type of problem and network architecture. The input shape can be defined in the first layer of the network, either by passing the `input_shape` argument or by using the `tf.keras.layers.Input` class.

| Data Type      | Input Shape                                      |
| :--------------| :------------------------------------------------|
| Image          | (image height, image width, number of channels)  |
| Sequence       | (number of sequence steps, number of features)   |
| Structured     | (samples/batch size, features)                   |

#### Outputs
 
| Problem Type               | Output Neurons        | Target Format   | Loss Type                                | Last Neuron Activation |
| :------------------------- | :-------------------- | :-------------- | :--------------------------------------- | :--------------------- |
| Binary Classification      | 1                     | Binary          | binary_crossentropy                      | sigmoid                |
| Multi-class Classification | Number of classes     | One-Hot Encoded | categorical_crossentropy                 | softmax                |
| Multi-class Classification | Number of classes     | Label Encoded   | sparse_categorical_crossentropy          | softmax                |
| Regression                 | Number of predictions | Numeric         | Any regression loss: MSE/RMSE/MSLE/Huber | None (linear)          |
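
To make the "Last Neuron Activation" column more concrete, here is a minimal plain-Python sketch of what sigmoid and softmax do to raw model outputs (logits). The function names and sample logits are illustrative assumptions; in an actual model the built-in Keras activations would be used instead.

```python
import math


def sigmoid(logit):
    """Squash one raw output into (0, 1), read as P(class == 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def softmax(logits):
    """Turn a list of raw outputs into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest logit avoids overflow in exp without changing the result.
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


binary_probability = sigmoid(0.0)               # a logit of 0 maps to 0.5
class_probabilities = softmax([2.0, 1.0, 0.1])  # hypothetical 3-class logits
```

Sigmoid yields one independent probability per neuron, which suits a single-neuron binary output. Softmax couples the neurons so that the class probabilities compete and sum to 1.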

### [2. Overfitting](https://www.kdnuggets.com/2019/12/5-techniques-prevent-overfitting-neural-networks.html)

Overfitting occurs when the model fits the training set so closely that its predictions become highly variable: small changes in the input produce large swings in the output, and performance on unseen data suffers. The opposite, underfitting, occurs when the model is too simple to capture the patterns even in the training data.

There are a few methods to recognize if the model overfit the training data.

* Training loss declines while validation loss is constant or rises.
* A large gap between the training accuracy/ROC AUC/etc and the validation.
* Validation score stops improving while training loss keeps declining.
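
As a rough illustration, the checks above can be applied to per-epoch loss lists such as the ones stored in `history.history`. This is only a sketch: the function name, the sample curves, and the `gap_tolerance` threshold are assumptions made for the example, not standard values.

```python
def overfitting_signs(train_loss, val_loss, gap_tolerance=0.1):
    """Return simple overfitting indicators from per-epoch loss lists.

    gap_tolerance is an arbitrary illustrative threshold.
    """
    # Epoch (0-based) at which validation loss was lowest.
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        'best_val_epoch': best_epoch,
        # Validation loss ended higher than its minimum.
        'val_loss_rising': val_loss[-1] > val_loss[best_epoch],
        # Training loss kept improving after validation loss bottomed out.
        'train_still_falling': train_loss[-1] < train_loss[best_epoch],
        # Final gap between validation and training loss.
        'large_gap': (val_loss[-1] - train_loss[-1]) > gap_tolerance,
    }


# Hypothetical loss curves for six epochs.
train = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08]
val = [0.95, 0.70, 0.55, 0.50, 0.58, 0.66]
signs = overfitting_signs(train, val)
```

Here the validation loss is lowest at epoch index 3 while the training loss keeps falling afterwards, which is the pattern described in the first bullet.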

### [3. Pooling](https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/)

Pooling layers provide an approach to down sampling feature maps by summarizing the presence of features in patches of the feature map. Pooling involves selecting a pooling operation, much like a filter to be applied to feature maps. The size of the pooling operation or filter is smaller than the size of the feature map; specifically, it is almost always 2×2 pixels applied with a stride of 2 pixels.

With a 2×2 window and a stride of 2, the pooling layer reduces each dimension of a feature map by a factor of 2, so the number of pixels or values in each feature map drops to one quarter. For example, a pooling layer applied to a feature map of 6×6 (36 pixels) will result in an output pooled feature map of 3×3 (9 pixels).

The pooling operation is specified, rather than learned. Two common functions used in the pooling operation are:

* **Average Pooling**: Calculate the average value for each patch on the feature map.
* **Maximum Pooling (or Max Pooling)**: Calculate the maximum value for each patch of the feature map.
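
The 6×6 to 3×3 example can be worked through in plain Python so the arithmetic stays visible. This is a sketch under the 2×2 window, stride 2 assumption; `pool2d` and `average` are hypothetical helpers, and in a model the Keras layers `MaxPooling2D` and `AveragePooling2D` would do this work.

```python
def pool2d(feature_map, reducer, size=2, stride=2):
    """Slide a size x size window over a 2D list and reduce each patch."""
    pooled = []
    for top in range(0, len(feature_map) - size + 1, stride):
        pooled_row = []
        for left in range(0, len(feature_map[0]) - size + 1, stride):
            patch = [feature_map[top + i][left + j]
                     for i in range(size) for j in range(size)]
            pooled_row.append(reducer(patch))
        pooled.append(pooled_row)
    return pooled


def average(values):
    return sum(values) / len(values)


# 6x6 feature map holding the values 0..35 row by row.
feature_map = [[row * 6 + col for col in range(6)] for row in range(6)]

max_pooled = pool2d(feature_map, max)          # 3x3 map of patch maxima
average_pooled = pool2d(feature_map, average)  # 3x3 map of patch means
```

The top-left patch holds 0, 1, 6 and 7, so max pooling keeps 7 while average pooling keeps 3.5.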

In addition, both max and average pooling come in two variants: regular (windowed) pooling as described above, and global pooling ([Global Average Pooling](https://iq.opengenus.org/global-average-pooling/)).

#### Average Pooling

Average pooling is used less often than max pooling. Instead of taking the maximum within each window, it takes the average. It serves the same purpose as max pooling, reducing the dimensionality of the feature maps, but it averages out the features rather than making prominent features more prominent.

<img src="screenshots/average_pooling.png" alt="Average Pooling" style="width:600px;"/>

#### Max Pooling

Maximum pooling, or max pooling, is a pooling operation that calculates the maximum, or largest, value in each patch of each feature map.

The results are down sampled or pooled feature maps that highlight the most present feature in the patch, not the average presence of the feature in the case of average pooling. This has been found to work better in practice than average pooling for computer vision tasks like image classification.

&#128273; **NOTE** This is commonly used with computer vision because it simply reduces images to smaller feature maps, highlighting prominent features within the top level image.

<img src="screenshots/max_pooling.png" alt="Max Pooling" style="width:600px;"/>

#### Global Pooling

Global pooling reduces each feature map channel to a single value. This value depends on the type of global pooling, which can be any of the previously explained pooling types. In other words, applying global pooling is similar to using a filter of the exact dimensions of the feature map.

Global pooling layers often replace the Flatten or Dense output layers.

&#128273; **NOTE** This is commonly used with language models to reduce the multi-dimensional embeddings into a single, flattened out feature vector.
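
A small plain-Python sketch of the idea, assuming a sequence of three 4-dimensional token embeddings (the values and helper names are made up for illustration). In a model, `tf.keras.layers.GlobalAveragePooling1D` or `tf.keras.layers.GlobalMaxPooling1D` would perform the equivalent reduction.

```python
def global_pool(sequence, reducer):
    """Collapse (timesteps, features) to (features,) by reducing each column."""
    return [reducer(column) for column in zip(*sequence)]


def average(values):
    return sum(values) / len(values)


# Hypothetical embeddings: 3 tokens, each a 4-dimensional vector.
embeddings = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 2.0, 2.0, 0.0],
    [2.0, 4.0, 2.0, 2.0],
]

global_average = global_pool(embeddings, average)  # one mean per feature
global_max = global_pool(embeddings, max)          # one maximum per feature
```

Whatever the sequence length, the result always has one value per feature, which is why a global pooling layer can feed directly into a `Dense` output layer.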

## Section-2: TensorFlow Certificate Candidate Handbook

The TensorFlow certificate exam provides a handbook that gives details on what to know for the exam. This section goes over each section of the Handbook, providing sample information regarding each question.

* [TensorFlow Certificate Candidate Handbook](https://www.tensorflow.org/static/extras/cert/TF_Certificate_Candidate_Handbook.pdf)

### 1. TensorFlow Development Skills
You need to demonstrate that you understand how to develop software programs using TensorFlow and that you can find the information you need to work as an ML practitioner.

#### 1.1. Know how to program in Python, resolve Python issues, and compile and run Python programs in PyCharm.
* [Downloading PyCharm](https://www.jetbrains.com/pycharm/)
* [Learn PyCharm](https://www.jetbrains.com/pycharm/learn/)

#### 1.2. Know how to find information about TensorFlow APIs, including how to find guides and API references on tensorflow.org.
* [API Documentation](https://www.tensorflow.org/api_docs/python/tf)

#### 1.3. Know how to debug, investigate, and solve error messages from the TensorFlow API.
* [Debugging Tips](https://towardsdatascience.com/debugging-in-tensorflow-392b193d0b8)
* [TensorFlow Errors](https://www.tensorflow.org/api_docs/python/tf/errors)

#### 1.4. Know how to search beyond tensorflow.org, as and when necessary, to solve your TensorFlow questions.

#### 1.5. Know how to create ML models using TensorFlow where the model size is reasonable for the problem being solved.

* Sequential Model
* Functional API Model

##### Sequential Model

* `tf.keras.models.Sequential`
* https://www.tensorflow.org/api_docs/python/tf/keras/Sequential

In [3]:
sequential_model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(2,), name='input_layer'),
    tf.keras.layers.Dense(10, activation='relu', name='hidden_layer_1'),
    tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid, name='output'),
], name='sequential_model')

##### Functional API

* `tf.keras.models.Model`
* https://www.tensorflow.org/api_docs/python/tf/keras/Model

In [4]:
inputs = tf.keras.layers.Input(shape=(2,), name='input_layer')
x = tf.keras.layers.Dense(10, activation='relu', name='hidden_layer_1')(inputs)
outputs = tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid, name='output')(x)

functional_model = tf.keras.models.Model(inputs, outputs, name='functional_model')

#### 1.6. Know how to save ML models and check the model file size.

The examples below save models in two formats: HDF5 (`h5`, a single file) and the TensorFlow SavedModel format (`tf`, a directory).

* [Saving and Loading Models Article](https://www.kdnuggets.com/2021/02/saving-loading-models-tensorflow.html)

##### Saving Models

In [5]:
sequential_model.save('./models/sequential_model')

INFO:tensorflow:Assets written to: ./models/sequential_model/assets




In [6]:
functional_model.save('./models/functional_model', save_format='h5')





##### Loading Models

In [10]:
sequential_model_loaded = tf.keras.models.load_model('./models/sequential_model')





In [13]:
functional_model_loaded = tf.keras.models.load_model('./models/functional_model')





##### Checking File Size

**TODO** Come back to check this

In [23]:
total_size = 0
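for dirpath, _, filenames in os.walk('./models/sequential_model'):
    # Sum the size of every file in this directory (avoids shadowing the builtin `dir`)
    total_size += sum(os.path.getsize(os.path.join(dirpath, name)) for name in filenames)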

total_size

50965

In [16]:
os.path.getsize('./models/functional_model')

16016
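
Since a SavedModel is a directory and an `h5` model is a single file, both checks can be folded into one helper. The sketch below is one possible way to do it; `saved_model_size` and `human_readable` are hypothetical names, not TensorFlow functions.

```python
import os


def saved_model_size(path):
    """Return the size in bytes of a saved model file or directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for dirpath, _, filenames in os.walk(path):
        for filename in filenames:
            total += os.path.getsize(os.path.join(dirpath, filename))
    return total


def human_readable(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ('B', 'KiB', 'MiB', 'GiB'):
        if size < 1024 or unit == 'GiB':
            return f'{size:.1f} {unit}'
        size /= 1024
```

For example, `human_readable(saved_model_size('./models/sequential_model'))` would report the directory total in a readable unit, assuming the model was saved to that path as in the cells above.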

#### 1.7. Understand the compatibility discrepancies between different versions of TensorFlow.

* https://www.tensorflow.org/guide/migrate/tf1_vs_tf2

### 2. Building and training neural network models using TensorFlow 2.x

You need to understand the foundational principles of machine learning (ML) and deep learning (DL) using TensorFlow 2.x.

#### 2.1. Use TensorFlow 2.x.

* https://blog.tensorflow.org/2019/09/tensorflow-20-is-now-available.html

In [7]:
tf.__version__

'2.13.0'

#### 2.2. Build, compile and train machine learning (ML) models using TensorFlow.

**Building Models**
* `tf.keras.models.Sequential`: The simplest way to create a model; layers are stacked in order, with a single input.
* `tf.keras.models.Model`: A more customizable way to create models (the Functional API), allowing multiple inputs, parallel/wide networks, and deep networks.

**Compiling Models**
* `model.compile`: Used to set how the model should learn through setting the optimizer and loss function.

**Training Models**
* `model.fit`: Used to actually train the models through passing in all the training specific information (data, epochs, validation data, batch size, etc.).

In [17]:
# Build Model (Sequential API)
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(5),
])

# Build Model (Functional API)
inputs = tf.keras.layers.Input(shape=(1,))
outputs = tf.keras.layers.Dense(5)(inputs)
model = tf.keras.models.Model(inputs, outputs)

In [18]:
# Compile Model
model.compile(loss='mae',
              optimizer=tf.keras.optimizers.legacy.Adam(),
              metrics=['mae', 'mse'])

In [19]:
# Train Model
history = model.fit(
    x=[1],
    y=[[1, 2, 3, 4, 5]],
    epochs=1,
    callbacks=[],
    verbose=0)

#### 2.3. Preprocess data to get it ready for use in a model.

* Categorical Data needs to be converted to a numerical representation
    - Encode data using `sklearn.preprocessing.LabelEncoder`
    - One-hot Encode data using `sklearn.preprocessing.OneHotEncoder` or `tf.one_hot`
* Preprocessing Layers
    - [Preprocessing Layers](https://www.tensorflow.org/guide/keras/preprocessing_layers)
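* Normalization: Scale numerical features to a common range so that no single feature dominates training, e.g. with `tf.keras.layers.Normalization` or `tf.keras.layers.Rescaling`.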
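
For normalization, the last item in the list above, one common approach is standardization: subtract the mean and divide by the standard deviation. The plain-Python sketch below uses made-up feature values and hypothetical helper names. `tf.keras.layers.Normalization` provides the same idea as a layer whose statistics are learned with `adapt`.

```python
import statistics


def fit_standardizer(train_values):
    """Learn mean and standard deviation from the training data only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def standardize(values, mean, std):
    """Shift and scale values to roughly zero mean and unit variance."""
    return [(value - mean) / std for value in values]


train_feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical column
mean, std = fit_standardizer(train_feature)
train_scaled = standardize(train_feature, mean, std)
test_scaled = standardize([5.0, 11.0], mean, std)  # reuse the training statistics
```

The statistics come from the training data and are reused for test data, so no information from the test set leaks into preprocessing.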

#### 2.4. Use models to predict results.

* `model.predict`: Used to predict the results given the corresponding inputs.

**NOTE**: `model.predict` outputs probabilities for classification models and values for regression models. For classification, the probabilities still have to be converted into a single class label from the set of options.

* `tf.argmax`: Grabs the index of the maximum value (used to grab the corresponding classification class).
* `tf.round`: Rounds the probabilities to 0 or 1. This suits binary classification, but it doesn't always work for multi-class classification, because with many classes every probability can fall below 0.5 and round to 0.

In [22]:
tf.argmax([0.1, 0.2, 0.6, 0.12, 0.14])  # Index of the highest value is 2

<tf.Tensor: shape=(), dtype=int64, numpy=2>

In [23]:
tf.round([0.1, 0.51, 0.3, 0.6, 0.12, 0.14])

<tf.Tensor: shape=(6,), dtype=float32, numpy=array([0., 1., 0., 1., 0., 0.], dtype=float32)>
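
The same conversion from probabilities to labels can be sketched in plain Python. The helper names, the 0.5 threshold, and the sample probabilities are assumptions for illustration.

```python
def binary_labels(probabilities, threshold=0.5):
    """Map sigmoid outputs to 0/1 labels using a cut-off threshold."""
    return [int(p >= threshold) for p in probabilities]


def multiclass_labels(probability_rows, class_names):
    """Pick the most probable class for each row of softmax outputs."""
    labels = []
    for row in probability_rows:
        best_index = max(range(len(row)), key=row.__getitem__)
        labels.append(class_names[best_index])
    return labels


binary_predictions = binary_labels([0.1, 0.51, 0.3, 0.6])

class_names = ['bird', 'cat', 'dog']
softmax_outputs = [[0.2, 0.3, 0.5], [0.4, 0.35, 0.25]]  # hypothetical predictions
multiclass_predictions = multiclass_labels(softmax_outputs, class_names)
```

In the second softmax row every probability is below 0.5, so rounding would produce all zeros, while taking the arg max still returns a class.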

#### 2.5. Build sequential models with multiple layers.

* `tf.keras.models.Sequential`

In [24]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(1,), name='Input'),
    tf.keras.layers.Dense(5),
    tf.keras.layers.Dense(5),
    tf.keras.layers.Dense(5),
    tf.keras.layers.Dense(1, name='Output'),
], name='SequentialModel')

#### 2.6. Build and train models for binary classification.

Binary classification is when the output is either 1 or 0 (only two choices).

* Output Shape = `(1,)`
* Output Activation Function = `sigmoid`
* Loss Function = `binary_crossentropy`

In [26]:
# Binary Model
binary_classification_model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(3,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
], name='binary_classification')

# Binary Model Compiling
binary_classification_model.compile(loss='binary_crossentropy',
                                    optimizer=tf.keras.optimizers.legacy.Adam())

#### 2.7. Build and train models for multi-class categorization.

Multi-class categorization is when the output is more than two options. This requires converting each category into a numerical representation.

* Output Shape = `(num_classes,)`
* Output Activation Function = `softmax`
* Loss Function = `categorical_crossentropy` if one-hot encoded format; `sparse_categorical_crossentropy` if label encoded format

In [97]:
labels = tf.constant(['dog', 'cat', 'bird', 'dog'])
total_categories = 3  # dog, cat, bird

In [89]:
# Encoding Labels (With sklearn)
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(labels)

encoded_labels

array([2, 1, 0, 2])

In [98]:
# One-Hot-Encoding Labels (With TensorFlow)
one_hot_encoded_labels_tf = tf.one_hot(encoded_labels, total_categories)

# One-Hot-Encoding Labels (With sklearn)
one_hot_encoder = OneHotEncoder(sparse_output=False)
one_hot_encoded_labels_sklearn = one_hot_encoder.fit_transform(encoded_labels.reshape(-1, 1))

one_hot_encoded_labels_tf, one_hot_encoded_labels_sklearn

(<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
 array([[0., 0., 1.],
        [0., 1., 0.],
        [1., 0., 0.],
        [0., 0., 1.]], dtype=float32)>,
 array([[0., 0., 1.],
        [0., 1., 0.],
        [1., 0., 0.],
        [0., 0., 1.]]))

In [100]:
# Multi-class Model (softmax output, one neuron per class)
multiclass_classification_model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(total_categories, activation='softmax'),
], name='multiclass_classification')

# Multi-class Model Compiling
# Pick ONE of the two options below; a second compile call overrides the first.

# Option 1: use with one_hot_encoded_labels_tf when fitting
multiclass_classification_model.compile(loss='categorical_crossentropy',
                                        optimizer=tf.keras.optimizers.legacy.Adam())

# Option 2: use with encoded_labels when fitting
multiclass_classification_model.compile(loss='sparse_categorical_crossentropy',
                                        optimizer=tf.keras.optimizers.legacy.Adam())

#### 2.8. Plot loss and accuracy of a trained model.

When fitting a model, a history object is returned with the designated metrics the model was compiled with, along with the loss. This can be plotted using matplotlib.

* [History Object](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/History): The History object is associated as a callback that is automatically applied to the model when fitting and is returned by the fit method.

Some things to consider for the loss curves include:
* Ideally, the validation loss curve and the training loss curve follow each other.
* The point where training loss continues to decrease but validation loss starts to stabilize implies `overfitting` (see section 2.9 below).

**NOTE** I have a personal toolbox, [py-learning-toolbox](https://github.com/bkubick/py-learning-toolbox), that plots the history curve for a chosen metric using the function `ml_toolbox.analysis.history.plot_history`.

![image](screenshots/loss_curve.png)
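
For reference, a minimal matplotlib sketch of such a plot is shown here. It assumes the dictionary layout of `history.history` (for example the keys `loss` and `val_loss`); the `plot_metric` helper and the sample values are hypothetical and are not the toolbox function mentioned above.

```python
import matplotlib.pyplot as plt


def plot_metric(history_dict, metric='loss'):
    """Plot a training metric and its validation counterpart per epoch."""
    epochs = range(1, len(history_dict[metric]) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, history_dict[metric], label=f'training {metric}')
    val_key = f'val_{metric}'
    if val_key in history_dict:  # only present when validation data was used
        ax.plot(epochs, history_dict[val_key], label=f'validation {metric}')
    ax.set_xlabel('epoch')
    ax.set_ylabel(metric)
    ax.legend()
    return ax


# Hypothetical values standing in for `history.history` returned by `model.fit`.
example_history = {
    'loss': [0.9, 0.6, 0.4, 0.3],
    'val_loss': [0.95, 0.7, 0.6, 0.62],
}
ax = plot_metric(example_history, 'loss')
```

With a real run, `plot_metric(history.history, 'accuracy')` would work the same way, provided `accuracy` was passed in `metrics` when compiling.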

#### 2.9. Identify strategies to prevent overfitting, including augmentation and dropout.

Overfitting occurs when the model learns the training data so well that it can't predict test or validation data well. There are a handful of things that can be done to mitigate or prevent overfitting ([Prevent Overfitting](https://www.kdnuggets.com/2019/12/5-techniques-prevent-overfitting-neural-networks.html)).

* **Decrease complexity of model**: Not much to this, and not much to do to determine best route of simplifying model. Ultimately, just want to reduce the number of trainable parameters in the model.
* **Early Stopping**: A callback that can be implemented to stop the model from training once the validation loss is no longer decreasing.
* **More Data**: An obvious thing to do is add more data, but this isn't always possible.
* **Data Augmentation**: Transforming data to expand the dataset, and give a model various ways of viewing data. Specifically for image classification, popular data augmentation techniques include flipping, rotating, and zooming images.
* **Regularization**: A technique used to reduce complexity of a model through adding penalty terms to the loss function. This is done by adding regularizers to the corresponding layer through `kernel_regularizer`, `bias_regularizer`, or `activity_regularizer` kwarg.
  
    | L1 Regularization                                | L2 Regularization                         |
    | :-----------------                               | :----------------                         |
    | Penalizes sum of absolute values of weights      | Penalizes sum of square values of weights |
    | Generates model that is simple and interpretable | Able to learn complex data patterns       |
    | Robust to outliers                               | Not robust to outliers                    |

* **Dropout**: A technique that deactivates random neurons in the model while training, forcing each neuron to "learn", in theory, making each neuron more robust.
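
To show what the L1 and L2 penalty terms add to the loss, here is a small worked example in plain Python. The weights, the unregularized loss, and the factor of 0.01 are made-up values, and the helper names are hypothetical.

```python
def l1_penalty(weights, factor):
    """L1 penalty: factor times the sum of absolute weight values."""
    return factor * sum(abs(weight) for weight in weights)


def l2_penalty(weights, factor):
    """L2 penalty: factor times the sum of squared weight values."""
    return factor * sum(weight * weight for weight in weights)


weights = [0.5, -1.0, 2.0]  # hypothetical layer weights
data_loss = 0.30            # hypothetical loss before regularization

loss_with_l1 = data_loss + l1_penalty(weights, factor=0.01)  # 0.30 + 0.01 * 3.5
loss_with_l2 = data_loss + l2_penalty(weights, factor=0.01)  # 0.30 + 0.01 * 5.25
```

The largest weight (2.0) contributes 4.0 of the 5.25 in the L2 sum but only 2.0 of the 3.5 in the L1 sum, which shows how L2 punishes large weights more heavily.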

In [113]:
# Early Stopping Callback
# `patience`: how many epochs without improvement before ending training
# `start_from_epoch`: which epoch to start looking at stopping
early_stopping_callback = tf.keras.callbacks.EarlyStopping(patience=10, start_from_epoch=10)
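
The patience logic described in the comments above can be imitated in plain Python to see when training would stop. This is a simplified sketch for intuition, not the Keras implementation; the function name and the validation losses are made up.

```python
def early_stopping_epoch(val_losses, patience, start_from_epoch=0):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified imitation of the patience logic, not the Keras implementation.
    """
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if epoch < start_from_epoch:
            continue  # warm-up epochs are ignored entirely
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation losses: improvement stalls after epoch 2.
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
never_stops = early_stopping_epoch(val_losses, patience=5)
```

With a patience of 3, the run stops at epoch index 5 and never sees the later improvement at index 6. A larger patience keeps training long enough to reach it, which is the trade-off to weigh when choosing the value.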

In [117]:
# Data Augmentation ImageDataGenerator (Not Preferred)
data_augmented_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=0.2,  # NOTE: measured in degrees here, unlike RandomRotation's fraction of a full circle
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Data Augmentation (Preferred Method)
data_augmentation_model = tf.keras.models.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.RandomZoom(0.2),
    tf.keras.layers.RandomHeight(0.2),
    tf.keras.layers.RandomWidth(0.2),
    tf.keras.layers.Rescaling(1./255),
], name='DataAugmentation')

In [107]:
# Regularization
regularization_model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(5,
                          kernel_regularizer=tf.keras.regularizers.l1(0.01),
                          bias_regularizer=tf.keras.regularizers.l2(0.01),
                          activity_regularizer=tf.keras.regularizers.l1_l2(0.01, 0.01)),
    tf.keras.layers.Dense(1),
])

In [110]:
# Dropout
dropout_model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(5),
    tf.keras.layers.Dropout(.2),
    tf.keras.layers.Dense(1),
])

#### 2.10. Use pretrained models (transfer learning).

Transfer learning uses state-of-the-art architectures that have already been trained on significant amounts of data, allowing for powerful models without the need for extensive training.

TensorFlow Hub provides various models that can be used for transfer learning.
* https://www.tensorflow.org/hub

In [120]:
transfer_model = hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim128/2")

#### 2.11. Extract features from pre-trained models.
Feature extraction in transfer learning is when you take the underlying patterns (also called weights) a pretrained model has learned and adjust its outputs to be more suited to your problem. [[D. Bourke, Zero to Mastery](https://dev.mrdbourke.com/tensorflow-deep-learning/04_transfer_learning_in_tensorflow_part_1_feature_extraction/)]

In short, use the pretrained model as a non-trainable base model, then pass its output to one or more layers dedicated to the specific problem (e.g. for image classification, take a model that knows 1000 foods and add an output layer that limits it to a set of only 10 foods).

In [126]:
base_model = hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim128/2")
base_model.trainable = False

inputs = tf.keras.layers.Input(shape=(), dtype=tf.string)
x = base_model(inputs)
outputs = tf.keras.layers.Dense(5)(x)  # Feature Extraction Used to dictate the outputs

model = tf.keras.models.Model(inputs, outputs)

#### 2.12. Ensure that inputs to a model are in the correct shape.

Inputs to a model are the features of the problem that are used to predict the output. In TensorFlow, the `shape` argument of `tf.keras.layers.Input` describes a single sample; the batch dimension is added automatically (shown as `None`), meaning all that has to be specified is the shape of one input.

| Data Type      | Input Shape                                      |
| :--------------| :------------------------------------------------|
| Image          | (image height, image width, number of channels)  |
| Text           | (1,) - one string per sample                     |
| Sequence       | (number of sequence steps, number of features)   |
| Feed Forward   | (number of features,)                            |

In [129]:
image_input = tf.keras.layers.Input(shape=(224, 224, 3))
text_input = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
sequence_input = tf.keras.layers.Input(shape=(7,))  # Window of 7 steps, one value per step (use shape=(7, n) for n features per step)
feed_forward = tf.keras.layers.Input(shape=(2,))  # two features used to predict

image_input.shape, text_input.shape, sequence_input.shape, feed_forward.shape  # The None is held for the batch size

(TensorShape([None, 224, 224, 3]),
 TensorShape([None, 1]),
 TensorShape([None, 7]),
 TensorShape([None, 2]))
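
Because the model always expects that leading batch dimension, a single sample has to be given one before it can be passed in. A minimal sketch, assuming a toy model and a random array standing in for a real image:

```python
import numpy as np
import tensorflow as tf

# Toy model that expects batches of 224x224 RGB images
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

single_image = np.random.rand(224, 224, 3).astype('float32')  # No batch dimension yet
batched_image = tf.expand_dims(single_image, axis=0)           # Adds a batch dimension of size 1

predictions = model(batched_image)
print(batched_image.shape, predictions.shape)
```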

#### 2.13. Ensure that you can match test data to the input shape of a neural network.

#### 2.14. Ensure you can match output data of a neural network to specified input shape for test data.

#### 2.15. Understand batch loading of data.

Batching is the process of feeding the model several samples at a time during training, which improves memory efficiency, speeds up training, and can improve generalization. In TensorFlow this is done in one of two ways: with the `batch_size` argument when fitting, or by batching the dataset itself.

**NOTE**: batch sizes that are multiples of 8 typically work better on GPUs.

In [133]:
# Fitting with batches
# Build Model (Sequential API)
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(5),
])

# Compile Model
model.compile(loss='mae', optimizer=tf.keras.optimizers.legacy.Adam())

# Train Model
model.fit(
    x=[1],
    y=[[1, 2, 3, 4, 5]],
    epochs=1,
    batch_size=1,  # Batch Size
    verbose=0)





<keras.src.callbacks.History at 0x155259a00>

In [137]:
# Batching with dataset
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 1, 33, 5, 3])
batched_dataset = dataset.batch(3)
batched_dataset

<_BatchDataset element_spec=TensorSpec(shape=(None,), dtype=tf.int32, name=None)>
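
The `None` in the element spec is there because the last batch can be smaller than the rest. Iterating over the same dataset makes that visible; `drop_remainder=True` is the option that discards the short final batch.

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 1, 33, 5, 3])

# 7 elements in batches of 3: the final batch only holds the leftover element
batches = [batch.numpy().tolist() for batch in dataset.batch(3)]

# drop_remainder=True keeps only the full batches
full_batches = [batch.numpy().tolist() for batch in dataset.batch(3, drop_remainder=True)]

print(batches)       # [[1, 2, 3], [1, 33, 5], [3]]
print(full_batches)  # [[1, 2, 3], [1, 33, 5]]
```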

#### 2.16. Use callbacks to trigger the end of training cycles.

Callbacks are hooks that Keras calls at set points during training: at the start or end of training, of each epoch, or of each batch. They can be used for saving, logging, adjusting learning rates, ending training, etc.

* https://www.tensorflow.org/api_docs/python/tf/keras/callbacks

In [139]:
# Used to stop training once model stops improving
early_stopping = tf.keras.callbacks.EarlyStopping()

# Used to store the model at the end of each epoch as a checkpoint.
# This is useful when you need to stop training early and pick up where you left off.
# Also useful for saving the best trained model
checkpoint = tf.keras.callbacks.ModelCheckpoint('./checkpoints')

# Logs the metrics for each epoch to the corresponding CSV.
csv_logger = tf.keras.callbacks.CSVLogger('./logs/history.csv')

# Reduces the learning rate by a factor when the learning stops (hits a plateau)
lr_reducer = tf.keras.callbacks.ReduceLROnPlateau()

# LR Scheduler is used to update the learning rate for each epoch
# This is useful when trying to determine the optimal learning rate
# Sweeps the rate upward: starts at 1e-4 and grows by 10x every 20 epochs
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: 1e-4 * 10 ** (epoch / 20))

# Custom callback
lambda_callback = tf.keras.callbacks.LambdaCallback(on_batch_begin=lambda batch, logs: print(batch))
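
To end training on a custom condition, a callback can subclass `tf.keras.callbacks.Callback` and set `self.model.stop_training`. The sketch below is one way to do that; `StopAtLoss`, its `threshold` argument, and the toy data are hypothetical and only there for illustration.

```python
import numpy as np
import tensorflow as tf


class StopAtLoss(tf.keras.callbacks.Callback):
    """Hypothetical helper: stops training once the epoch loss drops below a threshold."""

    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        loss = (logs or {}).get('loss')
        if loss is not None and loss < self.threshold:
            self.model.stop_training = True  # Keras checks this flag after each epoch


# Toy regression data: y = 2x
x = np.linspace(0.0, 1.0, 32, dtype='float32').reshape(-1, 1)
y = 2.0 * x

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(loss='mse', optimizer='sgd')

max_epochs = 50
history = model.fit(x, y, epochs=max_epochs, verbose=0,
                    callbacks=[StopAtLoss(threshold=0.05)])

epochs_run = len(history.history['loss'])
print(f'Stopped after {epochs_run} of {max_epochs} epochs')
```

The built-in `EarlyStopping` callback covers the common "stop when the metric stops improving" case; a custom class like this is mainly useful for fixed targets such as "stop at 95% accuracy".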

#### 2.17. Use datasets from different sources.

Datasets can come from many different places: TensorFlow Datasets, GitHub, Kaggle, PDFs, documents, raw websites, etc. The key to working with any dataset is to understand what the data is, typically through visualization.

Below are a few ways of downloading datasets in various formats.

* NOTE: for `.txt` file types, I have a file reader utility in my `py-learning-toolbox` repository: `data_toolbox.read_txt_file_from_url` or `data_toolbox.read_txt_file_from_directory`.

In [15]:
# TensorFlow Datasets
ds = tfds.load('mnist', split='train', shuffle_files=True)

Downloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /Users/brandonkubick/tensorflow_datasets/mnist/3.0.1...

Dl Completed...: 100%|████████████████████████████| 5/5 [00:01<00:00,  4.38 file/s]

Dataset mnist downloaded and prepared to /Users/brandonkubick/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.

In [20]:
# TensorFlow Keras
ds = tf.keras.datasets.fashion_mnist.load_data()

In [18]:
# Reading the dataset from the raw csv file on the public github file
insurance = pd.read_csv('https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv')
insurance.head()

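|     | age | sex    | bmi    | children | smoker | region    | charges     |
| :-- | :-- | :----- | :----- | :------- | :----- | :-------- | :---------- |
| 0   | 19  | female | 27.9   | 0        | yes    | southwest | 16884.924   |
| 1   | 18  | male   | 33.77  | 1        | no     | southeast | 1725.5523   |
| 2   | 28  | male   | 33.0   | 3        | no     | southeast | 4449.462    |
| 3   | 33  | male   | 22.705 | 0        | no     | northwest | 21984.47061 |
| 4   | 32  | male   | 28.88  | 0        | no     | northwest | 3866.8552   |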


In [21]:
currency = pd.read_json('https://api.exchangerate-api.com/v4/latest/USD')
currency.head()

|     | provider | WARNING_UPGRADE_TO_V6 | terms | base | date | time_last_updated | rates |
| :-- | :------- | :-------------------- | :---- | :--- | :--- | :---------------- | :---- |
| AED | https://www.exchangerate-api.com | https://www.exchangerate-api.com/docs/free | https://www.exchangerate-api.com/terms | USD | 2023-10-03 | 1696291201 | 3.67 |
| AFN | https://www.exchangerate-api.com | https://www.exchangerate-api.com/docs/free | https://www.exchangerate-api.com/terms | USD | 2023-10-03 | 1696291201 | 78.23 |
| ALL | https://www.exchangerate-api.com | https://www.exchangerate-api.com/docs/free | https://www.exchangerate-api.com/terms | USD | 2023-10-03 | 1696291201 | 100.95 |
| AMD | https://www.exchangerate-api.com | https://www.exchangerate-api.com/docs/free | https://www.exchangerate-api.com/terms | USD | 2023-10-03 | 1696291201 | 401.9 |
| ANG | https://www.exchangerate-api.com | https://www.exchangerate-api.com/docs/free | https://www.exchangerate-api.com/terms | USD | 2023-10-03 | 1696291201 | 1.79 |


#### 2.18. Use datasets in different formats, including json and csv.

See section 2.17 above for how to load datasets from different sources, including CSV and JSON files.
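
The same pandas readers work on in-memory text, which is handy for checking how a format will be parsed before pointing them at a URL. A minimal sketch, using a two-row sample modeled on the insurance data (the inline strings are made up for illustration):

```python
import io

import pandas as pd

# The same two records expressed as CSV and as a JSON list of objects
csv_text = 'age,smoker,charges\n19,yes,16884.924\n18,no,1725.5523\n'
json_text = ('[{"age": 19, "smoker": "yes", "charges": 16884.924}, '
             '{"age": 18, "smoker": "no", "charges": 1725.5523}]')

csv_frame = pd.read_csv(io.StringIO(csv_text))
json_frame = pd.read_json(io.StringIO(json_text))

print(csv_frame)
print(json_frame)
```

Both calls produce a `DataFrame` with the same columns, so everything downstream (encoding, splitting, converting to tensors) can stay identical regardless of the file format.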

#### 2.19. Use datasets from tf.data.datasets.

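There is no `tf.data.datasets` module. This skill most likely refers to TensorFlow Datasets (the `tensorflow_datasets` package, imported as `tfds`), which is the primary source of ready-made datasets and returns them as `tf.data.Dataset` objects.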

* https://www.tensorflow.org/datasets/performances

### 3. Image classification

You need to understand how to build image recognition and object detection models with deep neural networks and convolutional neural networks using TensorFlow 2.x.

#### 3.1. Define Convolutional neural networks with Conv2D and pooling layers.

Convolutional layers are the standard building blocks for image classification and form the core of the convolutional neural network (CNN) architecture.

* `Conv2D` is the standard convolutional layer for image classification problems and is used to **detect features**. It slides a set of learnable filters over the image and outputs one feature map per filter.
* `Conv2DTranspose` is the standard layer for deconvolution (transposed convolution) and is used to **create features** by dispersing the detected features over a broader area.
    * NOTE: This is typically used in autoencoder problems.

Pooling layers are typically used in conjunction with convolutional layers. They serve several purposes (such as shrinking the feature maps), but primarily they either smooth out or sharpen prominent features, depending on the pooling used.

* `AveragePooling2D` is a less commonly used pooling layer that smooths out features by averaging each patch (typically 2x2) down to a single cell.
* `MaxPooling2D` is the most commonly used pooling layer in CNN architectures because it sharpens prominent features by keeping only the max value of each patch (typically 2x2).
* `GlobalMaxPooling2D`/`GlobalAveragePooling2D` are also used with CNN architectures, usually just before the output layers. Instead of reducing an NxM feature map to N/2xM/2, they reduce each feature map to a single value, producing one vector with one entry per channel.

In [3]:
conv2d_layer = tf.keras.layers.Conv2D(filters=5, kernel_size=3, padding='valid')
conv2d_transpose_layer = tf.keras.layers.Conv2DTranspose(filters=5, kernel_size=3, padding='valid')

In [4]:
average_pooling_2d_layer = tf.keras.layers.AveragePooling2D()
max_pooling_2d_layer = tf.keras.layers.MaxPooling2D()
global_average_pooling_2d_layer = tf.keras.layers.GlobalAveragePooling2D()
global_max_pooling_2d_layer = tf.keras.layers.GlobalMaxPooling2D()
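
To see what each of these layers does to the tensor shape, the sketch below runs a random 28x28 single-channel "image" through them. The input size is an arbitrary choice for illustration.

```python
import tensorflow as tf

images = tf.random.uniform((1, 28, 28, 1))  # (batch, height, width, channels)

# 3x3 kernel with 'valid' padding trims one pixel from every border: 28 -> 26
conv_out = tf.keras.layers.Conv2D(filters=5, kernel_size=3, padding='valid')(images)

# Default 2x2 pooling halves height and width: 26 -> 13
max_pool_out = tf.keras.layers.MaxPooling2D()(conv_out)
avg_pool_out = tf.keras.layers.AveragePooling2D()(conv_out)

# Global pooling collapses height and width entirely, leaving one value per channel
global_out = tf.keras.layers.GlobalAveragePooling2D()(conv_out)

# The transposed convolution grows the feature map back: 26 -> 28
transpose_out = tf.keras.layers.Conv2DTranspose(filters=5, kernel_size=3, padding='valid')(conv_out)

for name, tensor in [('conv', conv_out), ('max_pool', max_pool_out), ('avg_pool', avg_pool_out),
                     ('global_avg_pool', global_out), ('conv_transpose', transpose_out)]:
    print(name, tuple(tensor.shape))
```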

#### 3.3. Understand how to use convolutions to improve your neural network.

Convolutional neural networks improve the same way as any other neural network:
1. More data (if that is not possible, augment the data)
2. Tune hyperparameters
3. Reduce overfitting/underfitting

#### 3.4. Use real-world images in different shapes and sizes.

The whole purpose of building these models is to take a real-world image and predict what it is. Doing so requires that the image match the input shape the model expects.

For instance, if an incoming image is 1400x1400 pixels but your model was trained on 224x224 images, passing it in directly leads to shape errors. To fix this, we typically preprocess (resize and rescale) an image before passing it to the model.

&#128273; **NOTE**: I have a function in my personal `py-learning-toolbox` repository that loads and preprocesses an image directly (`ml_toolbox.preprocessing.image.load_and_resize_image`).
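
A minimal sketch of that preprocessing step using only TensorFlow. `preprocess_image` is a hypothetical helper written for this example (it is not the toolbox function mentioned above), and a random tensor stands in for a decoded image.

```python
import tensorflow as tf


def preprocess_image(image, target_size=(224, 224)):
    """Hypothetical helper: resize to the model's input size and scale pixels to [0, 1]."""
    image = tf.image.resize(image, target_size)  # Bilinear resize, returns float32
    return image / 255.0


# Random uint8 "photo" that does not match the 224x224 input size
raw_image = tf.cast(tf.random.uniform((300, 500, 3), maxval=256, dtype=tf.int32), tf.uint8)

# Resize, rescale, then add the batch dimension the model expects
model_ready = tf.expand_dims(preprocess_image(raw_image), axis=0)
print(model_ready.shape, model_ready.dtype)
```

Note that a plain resize distorts the aspect ratio of non-square images; `tf.image.resize_with_pad` is an alternative when that matters.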

#### 3.5. Use image augmentation to prevent overfitting.
Data augmentation is the process of transforming existing data to expand the dataset and give a model various ways of viewing the same data. For image classification, popular techniques include flipping, rotating, and zooming the image.

&#128273; **Note**: The Keras augmentation layers are inactive at test time, so input images are only augmented during calls to `Model.fit` (not `Model.evaluate` or `Model.predict`).

In [13]:
# Data Augmentation ImageDataGenerator (Not Preferred)
data_augmented_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=0.2,  # Measured in degrees (unlike RandomRotation, which takes a fraction of a full turn)
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Data Augmentation (Preferred Method)
data_augmentation_model = tf.keras.models.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.RandomZoom(0.2),
    tf.keras.layers.RandomHeight(0.2),
    tf.keras.layers.RandomWidth(0.2),
    tf.keras.layers.Rescaling(1./255),
], name='DataAugmentation')
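
The train-only behaviour of the augmentation layers can be checked directly by calling them with the `training` flag. A small sketch with random tensors standing in for images:

```python
import numpy as np
import tensorflow as tf

augmenter = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.2),
])

images = tf.random.uniform((4, 32, 32, 3))

inference_out = augmenter(images, training=False)  # Augmentation is skipped
training_out = augmenter(images, training=True)    # Random transforms are applied

unchanged_at_inference = np.allclose(inference_out.numpy(), images.numpy())
print(unchanged_at_inference, training_out.shape)
```

`Model.fit` passes `training=True` for you, while `Model.evaluate` and `Model.predict` pass `training=False`, which is why the same model can hold the augmentation layers without affecting test results.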

#### 3.6. Use ImageDataGenerator.

`ImageDataGenerator` has since been deprecated in favor of `tf.keras.utils.image_dataset_from_directory`, but it is still worth learning because the exam lists it as a required skill.

`ImageDataGenerator` streams image files from a directory in batches. This limits how much data is held in memory at any one time and provides an easy way to augment the data directly (see section 3.5 above for details on image augmentation).

In [11]:
# Data Augmentation ImageDataGenerator (Not Preferred)
image_data_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=0.2,  # Measured in degrees (unlike RandomRotation, which takes a fraction of a full turn)
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

data = image_data_generator.flow_from_directory('./data/sample_food/train',
                                                target_size=(224, 224),
                                                batch_size=1,
                                                class_mode='categorical')

Found 4 images belonging to 2 classes.


#### 3.7. Understand how ImageDataGenerator labels images based on the directory structure.

`ImageDataGenerator` labels images based on the directory structure. Given a layout such as:

```
├── test
│   ├── cheese
│   │   ├── 000001.png
│   │   ├── ...
│   ├── carrot
│   │   ├── 000001.png
│   │   ├── ...
├── train
│   ├── cheese
│   │   ├── 000002.png
│   │   ├── ...
│   ├── carrot
│   │   ├── 000001.png
│   │   ├── ...

```

It then labels each image with the name of its parent directory (i.e. cheese or carrot), assigning class indices in alphanumeric order of the directory names.

In [10]:
# Data Augmentation ImageDataGenerator (Not Preferred)
image_data_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
data = image_data_generator.flow_from_directory('./data/sample_food/train',
                                                target_size=(224, 224),
                                                batch_size=1,
                                                class_mode='categorical')
data.class_indices

Found 4 images belonging to 2 classes.


{'carrot': 0, 'cheese': 1}
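
The non-deprecated loader, `tf.keras.utils.image_dataset_from_directory`, infers labels from the same kind of layout. The sketch below builds a throwaway directory of random-noise PNGs (placeholder data, not the food images used above) so it can run on its own.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Build a temporary <root>/<class_name>/<image>.png layout with placeholder images
root = tempfile.mkdtemp()
for class_name in ('carrot', 'cheese'):
    class_dir = os.path.join(root, class_name)
    os.makedirs(class_dir)
    for i in range(2):
        pixels = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
        tf.io.write_file(os.path.join(class_dir, f'{i:06d}.png'), tf.io.encode_png(pixels))

dataset = tf.keras.utils.image_dataset_from_directory(
    root,
    label_mode='categorical',  # One-hot labels, like class_mode='categorical'
    image_size=(224, 224),     # Images are resized while loading
    batch_size=2,
    shuffle=False)

class_names = dataset.class_names
images, labels = next(iter(dataset))
print(class_names, images.shape, labels.shape)
```

Unlike `ImageDataGenerator(rescale=1./255)`, this loader does not rescale pixel values, so a `Rescaling` layer is usually added to the model or mapped over the dataset.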

### 4. Natural language processing (NLP)

You need to understand how to use neural networks to solve natural language processing problems using TensorFlow.

#### 4.1. Build natural language processing systems using TensorFlow.

NLP takes language as input and uses it for tasks such as text generation, text classification, and text translation. These are sequence problems, so RNN or Conv1D networks can be used to build the models.

* https://www.tensorflow.org/tutorials/text

#### 4.2. Prepare text to use in TensorFlow models.

In addition to removing and formatting any characters you may not want to worry about in your data, text needs to be converted into a numerical format in order for it to be modeled.

* Text Vectorization: The process of converting text into a numerical format.
* Token Embedding: The process of mapping each token to a vector of learnable parameters. In language processing this is typically preferred over one-hot encoding because the vectors are learned during training and, given the vast number of words, it avoids working with huge one-hot matrices.
    * e.g. embedding the word `family` might look something like:
        * `family` --> `text_vectorizer('family')` --> `127` --> `token_embedding(127)` --> `[.853123, .34123, .049123, ....]`
    * The above example converts the word `family` into an n-length vector of numerical entries.
* One Hot Encoder: The process of converting text values into numerical values by creating a new input variable for each unique word. Because every unique word gets its own variable, which can result in a huge number of variables, one-hot encoding is rarely used for the inputs of language models; it is mostly used to encode the output labels.

##### Text Vectorization

In [8]:
# Test Data
sentences = ['You have done well my padawan.', "You're a wizard Harry.", "Get to the chopper, the building is on fire!"]
sample_sentence = ['Harry is a padawan wizard.']

In [21]:
# Text Vectorization
word_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation')
word_vectorizer.adapt(sentences)

word_vectorizer(sample_sentence), word_vectorizer.get_vocabulary()

(<tf.Tensor: shape=(1, 12), dtype=int64, numpy=array([[13, 11, 19,  8,  5,  0,  0,  0,  0,  0,  0,  0]])>,
 ['',
  '[UNK]',
  'the',
  'youre',
  'you',
  'wizard',
  'well',
  'to',
  'padawan',
  'on',
  'my',
  'is',
  'have',
  'harry',
  'get',
  'fire',
  'done',
  'chopper',
  'building',
  'a'])

In [22]:
char_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation',
                                                    split='character')
char_vectorizer.adapt(sentences)

char_vectorizer(sample_sentence), char_vectorizer.get_vocabulary()

(<tf.Tensor: shape=(1, 12), dtype=int64, numpy=array([[ 8,  4,  5,  5,  9,  2,  7, 20,  2,  4,  2, 15]])>,
 ['',
  '[UNK]',
  ' ',
  'e',
  'a',
  'r',
  'o',
  'i',
  'h',
  'y',
  't',
  'n',
  'd',
  'w',
  'u',
  'p',
  'l',
  'g',
  'z',
  'v',
  's',
  'm',
  'f',
  'c',
  'b'])

##### Token Embeddings

In [12]:
token_embedding = tf.keras.layers.Embedding(
    input_dim=len(word_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

token_embedding(word_vectorizer(sample_sentence))

<tf.Tensor: shape=(1, 5, 5), dtype=float32, numpy=
array([[[ 0.04119045, -0.04244716,  0.0028836 , -0.04536789,
          0.01513002],
        [-0.03763362,  0.01767519, -0.03392949, -0.0370407 ,
         -0.03699964],
        [-0.00488741,  0.00283382,  0.04399434,  0.04421136,
         -0.01885287],
        [-0.03914008,  0.02328603, -0.02744038,  0.02392345,
          0.04472652],
        [ 0.03515441,  0.04961933, -0.02426612, -0.02699971,
          0.01845104]]], dtype=float32)>

In [13]:
char_embedding = tf.keras.layers.Embedding(
    input_dim=len(char_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

char_embedding(char_vectorizer(sample_sentence))

<tf.Tensor: shape=(1, 25, 5), dtype=float32, numpy=
array([[[-0.04124928, -0.00360743,  0.04356502, -0.02659626,
          0.03048421],
        [-0.04939149,  0.01697184, -0.00858854,  0.0381493 ,
         -0.03224466],
        [-0.03248235,  0.02507995,  0.03194323,  0.03108955,
         -0.00893488],
        [-0.03248235,  0.02507995,  0.03194323,  0.03108955,
         -0.00893488],
        [ 0.00541637,  0.01810095,  0.03862338, -0.00354055,
         -0.0013557 ],
        [-0.00377619, -0.03032162,  0.01447741,  0.03560669,
         -0.02133771],
        [-0.00211342, -0.01595372, -0.03345835, -0.03977276,
         -0.02123622],
        [ 0.03083609,  0.02962705, -0.01760997,  0.0243361 ,
          0.04298132],
        [-0.00377619, -0.03032162,  0.01447741,  0.03560669,
         -0.02133771],
        [-0.04939149,  0.01697184, -0.00858854,  0.0381493 ,
         -0.03224466],
        [-0.00377619, -0.03032162,  0.01447741,  0.03560669,
         -0.02133771],
        [ 0.04196498, -0

#### 4.3. Build models that identify the category of a piece of text using binary categorization

* Binary Classification implies that the output is either 1 or 0.
* Text implies text vectorization and token embedding layers.

In [26]:
# Test Data
sentences = ['You have done well my padawan.', "You're a wizard Harry.", "Get to the chopper, the building is on fire!"]

In [25]:
# Word Vectorizer
word_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation')
word_vectorizer.adapt(sentences)

# Word Embedding
token_embedding = tf.keras.layers.Embedding(
    input_dim=len(word_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

# Model with Embeddings
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = word_vectorizer(inputs)
x = token_embedding(x)
x = tf.keras.layers.LSTM(8)(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid', name='binary_output')(x)  # Sigmoid gives a probability between 0 and 1

model = tf.keras.models.Model(inputs, outputs)
model.summary()

Model: "model_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_8 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_15 (Tex  (None, 12)                0         
 tVectorization)                                                 
                                                                 
 embedding_9 (Embedding)     (None, 12, 5)             100       
                                                                 
 lstm_7 (LSTM)               (None, 8)                 448       
                                                                 
 binary_output (Dense)       (None, 1)                 9         
                                                                 
Total params: 557 (2.18 KB)
Trainable params: 557 (2.18 KB)
Non-trainable params: 0 (0.00 Byte)
_____________________________
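
To actually train a binary text classifier, the model needs a sigmoid output and a binary cross-entropy loss. The sketch below is one minimal way to wire that up; the labels are hypothetical (1 = Star Wars quote, 0 = anything else), and the text is vectorized up front rather than inside the model purely to keep the example short.

```python
import numpy as np
import tensorflow as tf

sentences = ['You have done well my padawan.', "You're a wizard Harry.",
             'Get to the chopper, the building is on fire!']
labels = np.array([1.0, 0.0, 0.0], dtype='float32')  # Hypothetical labels for illustration

# Text -> integer token ids (padded to length 12)
vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000, output_sequence_length=12)
vectorizer.adapt(sentences)
token_ids = vectorizer(sentences).numpy()

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(12,), dtype='int64'),
    tf.keras.layers.Embedding(input_dim=len(vectorizer.get_vocabulary()), output_dim=5, mask_zero=True),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # Probability of the positive class
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(token_ids, labels, epochs=2, verbose=0)

probabilities = model.predict(token_ids, verbose=0)
predicted_classes = (probabilities > 0.5).astype('int32')  # Threshold at 0.5 to get 0 or 1
print(probabilities.shape, predicted_classes.flatten())
```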

#### 4.4. Build models that identify the category of a piece of text using multi-class categorization

* Multi-class classification implies that the output is exactly one of several possible classes.
    * The labels can be one-hot encoded (`categorical_crossentropy` as loss function) or left as integers (`sparse_categorical_crossentropy`).
* Text implies text vectorization and token embedding layers.

In [27]:
sentences = ['You have done well my padawan.', "You're a wizard Harry.", "Get to the chopper, the building is on fire!"]
output = ['Star Wars', 'Harry Potter', 'Arnold']

In [41]:
# One Hot Encoded Output (Used in Fit function)
one_hot_encoder = OneHotEncoder(sparse_output=False)
encoded_labels = one_hot_encoder.fit_transform(np.array(output).reshape((-1,1)))

# Word Vectorizer
word_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation')
word_vectorizer.adapt(sentences)

# Word Embedding
token_embedding = tf.keras.layers.Embedding(
    input_dim=len(word_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

# Model with Embeddings
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = word_vectorizer(inputs)
x = token_embedding(x)
x = tf.keras.layers.LSTM(8)(x)
outputs = tf.keras.layers.Dense(3, activation='softmax', name='multiclass_output')(x)  # There are 3 categories

model = tf.keras.models.Model(inputs, outputs)
model.summary()

Model: "model_12"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_13 (InputLayer)       [(None, 1)]               0         
                                                                 
 text_vectorization_20 (Tex  (None, 12)                0         
 tVectorization)                                                 
                                                                 
 embedding_14 (Embedding)    (None, 12, 5)             100       
                                                                 
 lstm_12 (LSTM)              (None, 8)                 448       
                                                                 
 multiclass_output (Dense)   (None, 3)                 27        
                                                                 
Total params: 575 (2.25 KB)
Trainable params: 575 (2.25 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________
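
For multi-class training, the output layer needs one unit per class with a softmax activation, and the loss must match the label format. A minimal sketch, assuming the three sentence/label pairs above; labels are encoded with NumPy here instead of scikit-learn, and the text is vectorized up front to keep the example short.

```python
import numpy as np
import tensorflow as tf

sentences = ['You have done well my padawan.', "You're a wizard Harry.",
             'Get to the chopper, the building is on fire!']
output = ['Star Wars', 'Harry Potter', 'Arnold']

# Sorted class names plus the class index of every sample
class_names, integer_labels = np.unique(output, return_inverse=True)
one_hot_labels = tf.one_hot(integer_labels, depth=len(class_names)).numpy()

vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000, output_sequence_length=12)
vectorizer.adapt(sentences)
token_ids = vectorizer(sentences).numpy()

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(12,), dtype='int64'),
    tf.keras.layers.Embedding(input_dim=len(vectorizer.get_vocabulary()), output_dim=5, mask_zero=True),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(len(class_names), activation='softmax'),  # One probability per class
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(token_ids, one_hot_labels, epochs=2, verbose=0)

probabilities = model.predict(token_ids, verbose=0)
predicted_names = class_names[probabilities.argmax(axis=1)]  # Most likely class per sentence
print(probabilities.shape, predicted_names)
```

Passing `integer_labels` directly with `loss='sparse_categorical_crossentropy'` gives the same training signal without building the one-hot matrix.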

#### 4.5. Use word embeddings in your TensorFlow model.

See section 4.2 for what embeddings are and why they are needed. See below for how to add them to a model.

In [42]:
# Test Data
sentences = ['You have done well my padawan.', "You're a wizard Harry.", "Get to the chopper, the building is on fire!"]

In [20]:
# Word Vectorizer
word_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation')
word_vectorizer.adapt(sentences)

# Word Embedding
token_embedding = tf.keras.layers.Embedding(
    input_dim=len(word_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

# Model with Embeddings
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = word_vectorizer(inputs)
x = token_embedding(x)
x = tf.keras.layers.LSTM(8)(x)
outputs = tf.keras.layers.Dense(4)(x)

model = tf.keras.models.Model(inputs, outputs)
model.summary()

Model: "model_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_6 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_11 (Tex  (None, 12)                0         
 tVectorization)                                                 
                                                                 
 embedding_7 (Embedding)     (None, 12, 5)             100       
                                                                 
 lstm_5 (LSTM)               (None, 8)                 448       
                                                                 
 dense_5 (Dense)             (None, 4)                 36        
                                                                 
Total params: 584 (2.28 KB)
Trainable params: 584 (2.28 KB)
Non-trainable params: 0 (0.00 Byte)
_____________________________
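
Under the hood, an `Embedding` layer is a lookup table: one row of weights per token id. The sketch below checks that directly, using a 20-token vocabulary (matching the 100 parameters in the summary above) and hand-written token ids as an example.

```python
import numpy as np
import tensorflow as tf

embedding = tf.keras.layers.Embedding(input_dim=20, output_dim=5, mask_zero=True)
token_ids = np.array([[13, 11, 19, 8, 5, 0, 0]])  # Trailing zeros are padding

vectors = embedding(token_ids)        # Shape: (batch, sequence length, embedding size)
weights = embedding.get_weights()[0]  # The (20, 5) lookup table

# Embedding a token id is the same as indexing the weight matrix with it
is_lookup = np.allclose(vectors.numpy(), weights[token_ids])

# mask_zero=True marks padded positions so later layers (e.g. LSTM) can skip them
mask = np.array(embedding.compute_mask(token_ids))
print(vectors.shape, is_lookup, mask.tolist())
```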

#### 4.6. Use LSTMs in your model to classify text for either binary or multi-class categorization.

See sections 4.3-4.5 for more examples of using LSTMs.

&#128273; **NOTE**: LSTM and GRU layers work best with the `tanh` activation function (the default in TensorFlow); the fast cuDNN GPU kernel is only used when the default activations are kept.

In [None]:
# Test Data
sentences = ['You have done well my padawan.', "You're a wizard Harry.", "Get to the chopper, the building is on fire!"]

##### Single LSTM Layer

In [None]:
# Word Vectorizer
word_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation')
word_vectorizer.adapt(sentences)

# Word Embedding
token_embedding = tf.keras.layers.Embedding(
    input_dim=len(word_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

# Model with Embeddings
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = word_vectorizer(inputs)
x = token_embedding(x)
x = tf.keras.layers.LSTM(8)(x)
outputs = tf.keras.layers.Dense(4)(x)

model = tf.keras.models.Model(inputs, outputs)
model.summary()

##### Stacked LSTM Layers

* `return_sequences=True` needs to be set for every stacked LSTM layer, aside from the last LSTM layer.

&#128273; **NOTE**: For simple networks, a single LSTM layer is usually sufficient, and 2 LSTM layers can typically handle more complex patterns. In most cases, this is all you will need. More than 2 LSTM layers can improve a model, but at the cost of additional resources.

In [45]:
# Word Vectorizer
word_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation')
word_vectorizer.adapt(sentences)

# Word Embedding
token_embedding = tf.keras.layers.Embedding(
    input_dim=len(word_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

# Model with Embeddings
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = word_vectorizer(inputs)
x = token_embedding(x)
x = tf.keras.layers.LSTM(8, return_sequences=True)(x)
x = tf.keras.layers.LSTM(8, return_sequences=True)(x)
x = tf.keras.layers.LSTM(8, return_sequences=True)(x)
x = tf.keras.layers.LSTM(8)(x)
outputs = tf.keras.layers.Dense(4)(x)

model = tf.keras.models.Model(inputs, outputs)
model.summary()

Model: "model_14"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_16 (InputLayer)       [(None, 1)]               0         
                                                                 
 text_vectorization_23 (Tex  (None, 12)                0         
 tVectorization)                                                 
                                                                 
 embedding_17 (Embedding)    (None, 12, 5)             100       
                                                                 
 lstm_22 (LSTM)              (None, 12, 8)             448       
                                                                 
 lstm_23 (LSTM)              (None, 12, 8)             544       
                                                                 
 lstm_24 (LSTM)              (None, 12, 8)             544       
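
The reason for `return_sequences=True` shows up in the output shapes: an LSTM needs a 3D input (batch, timesteps, features), and by default it only returns the last timestep. A small sketch with a random tensor standing in for embedded text:

```python
import tensorflow as tf

embedded = tf.random.uniform((2, 12, 5))  # (batch, timesteps, embedding size)

# Default: only the final timestep is returned -> 2D output
last_step = tf.keras.layers.LSTM(8)(embedded)

# return_sequences=True: one output per timestep -> 3D output that another LSTM can consume
all_steps = tf.keras.layers.LSTM(8, return_sequences=True)(embedded)
stacked = tf.keras.layers.LSTM(8)(all_steps)

print(last_step.shape, all_steps.shape, stacked.shape)
```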
                                                          

#### 4.7. Add RNN and GRU layers to your model.

See section 4.6 for adding LSTM layers.

In [46]:
# Test Data
sentences = ['You have done well my padawan.', "You're a wizard Harry.", "Get to the chopper, the building is on fire!"]

##### LSTM Layer

In [47]:
# Word Vectorizer
word_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation')
word_vectorizer.adapt(sentences)

# Word Embedding
token_embedding = tf.keras.layers.Embedding(
    input_dim=len(word_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

# Model with Embeddings
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = word_vectorizer(inputs)
x = token_embedding(x)
x = tf.keras.layers.LSTM(8)(x)
outputs = tf.keras.layers.Dense(1, name='binary_output')(x)

model = tf.keras.models.Model(inputs, outputs)
model.summary()

Model: "model_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_17 (InputLayer)       [(None, 1)]               0         
                                                                 
 text_vectorization_24 (Tex  (None, 12)                0         
 tVectorization)                                                 
                                                                 
 embedding_18 (Embedding)    (None, 12, 5)             100       
                                                                 
 lstm_26 (LSTM)              (None, 8)                 448       
                                                                 
 binary_output (Dense)       (None, 1)                 9         
                                                                 
Total params: 557 (2.18 KB)
Trainable params: 557 (2.18 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________

##### GRU Layer

In [48]:
# Word Vectorizer
word_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation')
word_vectorizer.adapt(sentences)

# Word Embedding
token_embedding = tf.keras.layers.Embedding(
    input_dim=len(word_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

# Model with Embeddings
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = word_vectorizer(inputs)
x = token_embedding(x)
x = tf.keras.layers.GRU(8)(x)
outputs = tf.keras.layers.Dense(1, name='binary_output')(x)

model = tf.keras.models.Model(inputs, outputs)
model.summary()

Model: "model_16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_18 (InputLayer)       [(None, 1)]               0         
                                                                 
 text_vectorization_25 (Tex  (None, 12)                0         
 tVectorization)                                                 
                                                                 
 embedding_19 (Embedding)    (None, 12, 5)             100       
                                                                 
 gru (GRU)                   (None, 8)                 360       
                                                                 
 binary_output (Dense)       (None, 1)                 9         
                                                                 
Total params: 469 (1.83 KB)
Trainable params: 469 (1.83 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________

##### Bidirectional LSTM Layer

In [49]:
# Word Vectorizer
word_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation')
word_vectorizer.adapt(sentences)

# Word Embedding
token_embedding = tf.keras.layers.Embedding(
    input_dim=len(word_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

# Model with Embeddings
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = word_vectorizer(inputs)
x = token_embedding(x)
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(8),
)(x)
outputs = tf.keras.layers.Dense(1, name='binary_output')(x)

model = tf.keras.models.Model(inputs, outputs)
model.summary()

Model: "model_17"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_19 (InputLayer)       [(None, 1)]               0         
                                                                 
 text_vectorization_26 (Tex  (None, 12)                0         
 tVectorization)                                                 
                                                                 
 embedding_20 (Embedding)    (None, 12, 5)             100       
                                                                 
 bidirectional (Bidirection  (None, 16)                896       
 al)                                                             
                                                                 
 binary_output (Dense)       (None, 1)                 17        
                                                                 
Total params: 1013 (3.96 KB)
Trainable params: 1013 (3.96 

##### RNN Layer

An `RNN` layer in TensorFlow wraps an RNN cell instance, which defines the computation for a single time step. For example, the following `Cell` layers are provided:
* `tf.keras.layers.LSTMCell`
* `tf.keras.layers.GRUCell`
* `tf.keras.layers.SimpleRNNCell`

To stack `Cell` instances when building an RNN, wrap a list of `Cell` instances in the `StackedRNNCells` class and pass the result to `RNN` (passing the list directly to `RNN` stacks the cells the same way).
* `tf.keras.layers.StackedRNNCells`

* [TensorFlow Documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RNN)

In [51]:
# Word Vectorizer
word_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation')
word_vectorizer.adapt(sentences)

# Word Embedding
token_embedding = tf.keras.layers.Embedding(
    input_dim=len(word_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

# Model with Embeddings
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = word_vectorizer(inputs)
x = token_embedding(x)
x = tf.keras.layers.RNN(tf.keras.layers.SimpleRNNCell(5))(x)
outputs = tf.keras.layers.Dense(1, name='binary_output')(x)

model = tf.keras.models.Model(inputs, outputs)
model.summary()

Model: "model_18"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_21 (InputLayer)       [(None, 1)]               0         
                                                                 
 text_vectorization_28 (Tex  (None, 12)                0         
 tVectorization)                                                 
                                                                 
 embedding_22 (Embedding)    (None, 12, 5)             100       
                                                                 
 rnn (RNN)                   (None, 5)                 55        
                                                                 
 binary_output (Dense)       (None, 1)                 6         
                                                                 
Total params: 161 (644.00 Byte)
Trainable params: 161 (644.00 Byte)
Non-trainable params: 0 (0.00 Byte)
____________________

In [53]:
# Word Vectorizer
word_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000,
                                                    output_sequence_length=12,
                                                    standardize='lower_and_strip_punctuation')
word_vectorizer.adapt(sentences)

# Word Embedding
token_embedding = tf.keras.layers.Embedding(
    input_dim=len(word_vectorizer.get_vocabulary()),
    output_dim=5,
    mask_zero=True)

# Model with Embeddings
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = word_vectorizer(inputs)
x = token_embedding(x)

rnn_cells = [tf.keras.layers.LSTMCell(5) for i in range(3)]
stacked_rnn_cells = tf.keras.layers.StackedRNNCells(rnn_cells)
x = tf.keras.layers.RNN(stacked_rnn_cells)(x)
outputs = tf.keras.layers.Dense(1, name='binary_output')(x)

model = tf.keras.models.Model(inputs, outputs)
model.summary()

Model: "model_19"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_23 (InputLayer)       [(None, 1)]               0         
                                                                 
 text_vectorization_30 (Tex  (None, 12)                0         
 tVectorization)                                                 
                                                                 
 embedding_24 (Embedding)    (None, 12, 5)             100       
                                                                 
 rnn_1 (RNN)                 (None, 5)                 660       
                                                                 
 binary_output (Dense)       (None, 1)                 6         
                                                                 
Total params: 766 (2.99 KB)
Trainable params: 766 (2.99 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________

#### 4.8. Use RNNs, LSTMs, GRUs and CNNs in models that work with text.

See the examples in section 4.7 for how to build models that work with text.

#### 4.9. Train LSTMs on existing text to generate text (such as songs and poetry)

I did a separate notebook for this.
* [My Poetry Generation Notebook](https://github.com/bkubick/deep-learning-development/blob/main/projects/poetry_generator/poetry_generator.ipynb)

### 5. Time series, sequences and predictions

You need to understand how to solve time series and forecasting problems in TensorFlow.

#### 5.1. Train, tune and use time series, sequence and prediction models.

Sequence and time series models typically use RNN or CNN architectures, because the order in which the data appears matters.

* [Train, tune, and use time series](https://tensorflow.backprop.fr/time-series-sequences-and-predictions/train-tune-and-use-timeseries-sequence-and-prediction-models/)

#### 5.2. Train models to predict values for both univariate and multivariate time series.

* **Univariate**: The term "univariate time series" refers to a time series that consists of single (scalar) observations recorded sequentially over equal time increments.
    * This is the time `windows` approach used in the Bit Predict notebook, where the previous 7 days' closing prices were used to predict the next day's closing price.
    * `[1, 2, 3, 4, 5]` --> `[6]`
* **Multivariate**: A Multivariate time series has more than one time series variable. Each variable depends not only on its past values but also has some dependency on other variables. This dependency is used for forecasting future values.
    * This is the experiment in Bit Predict, where we used both the time `windows` of closing prices and the block reward for the given day.
    * `[1, 2, 3, 4, 5, block_reward]` --> `[6]`
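
As a rough illustration of how the inputs differ, the sketch below builds one univariate and one multivariate example from toy lists. The names `prices`, `block_rewards`, and `make_window` are hypothetical, the numbers are made up, and using the block reward of the last day in the window is an assumption rather than the Bit Predict notebook's exact choice.

```python
# Toy data (made up for illustration): a price series and a second, aligned series.
prices = [1, 2, 3, 4, 5, 6]
block_rewards = [50, 50, 50, 25, 25, 25]

WINDOW_SIZE = 5


def make_window(series, start, size):
    """Return `size` consecutive values of `series` beginning at index `start`."""
    return series[start:start + size]


# Univariate: the features are only past values of the series being predicted.
univariate_features = make_window(prices, 0, WINDOW_SIZE)
label = prices[WINDOW_SIZE]

# Multivariate: add a second variable to the features.
# ASSUMPTION: we use the block reward of the last day in the window.
multivariate_features = univariate_features + [block_rewards[WINDOW_SIZE - 1]]

print(univariate_features, '-->', [label])
print(multivariate_features, '-->', [label])
```

Both examples share the same label; only the number of input features changes, which is what the model's input shape has to account for.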

#### 5.3. Prepare data for time series learning.

Preparing data for time series learning consists of windowing the series for a given window size. For instance:
* `window([1, 2, 3, 4, 5], size=3, shift=1, stride=1)` --> `[[1, 2, 3], [2, 3, 4], [3, 4, 5]]`

In [55]:
dataset = tf.data.Dataset.range(7).window(3, shift=1, drop_remainder=True, stride=1)

for window in dataset:
    print(list(window.as_numpy_iterator()))

[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
[4, 5, 6]




#### 5.4. Understand Mean Absolute Error (MAE) and how it can be used to evaluate accuracy of sequence models.

MAE is calculated by taking the absolute difference between the predicted and actual values and averaging them. [10]

* **Advantages**: MAE is simple and easy to interpret because the mean error is expressed in the same units as the original data. It is less sensitive to outliers than other error metrics such as mean squared error (MSE).
* **Disadvantages**: MAE does not distinguish between overestimation and underestimation, and it does not provide information about the direction or magnitude of individual errors. In addition, depending on your particular problem, it may be a disadvantage that it does not penalize large errors as heavily as MSE.
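
To make the outlier point concrete, here is a minimal pure-Python sketch that compares MAE and MSE on toy predictions. The helper names and the numbers are illustrative assumptions, not part of TensorFlow.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all points."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2 over all points."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred_close = [2.0, 3.0, 4.0, 5.0]     # every prediction is off by 1
y_pred_outlier = [1.0, 2.0, 3.0, 14.0]  # three exact predictions, one off by 10

mae_close = mean_absolute_error(y_true, y_pred_close)
mse_close = mean_squared_error(y_true, y_pred_close)
mae_outlier = mean_absolute_error(y_true, y_pred_outlier)
mse_outlier = mean_squared_error(y_true, y_pred_outlier)

print(f'close:   MAE={mae_close}, MSE={mse_close}')
print(f'outlier: MAE={mae_outlier}, MSE={mse_outlier}')
```

The single large miss raises MAE from 1.0 to 2.5 but raises MSE from 1.0 to 25.0, which is why MAE is described as less sensitive to outliers.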

In [6]:
tf.metrics.mae([1, 2, 3, 4], [2, 3, 4, 5])

<tf.Tensor: shape=(), dtype=int32, numpy=1>

#### 5.5. Use RNNs and CNNs for time series, sequence and forecasting models.

* [LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM): Long Short Term Memory RNN [8]
* [GRU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU): Gated Recurrent Unit RNN. This architecture was designed to mitigate the vanishing gradient problem. [9]
* [Bidirectional](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional): Layer that processes the sequence input in both directions using the corresponding RNN layer as a parameter (LSTM, GRU, etc.). [7]

In [5]:
# LSTM layers
single_rnn_layer = tf.keras.layers.LSTM(12)
# return_sequences=True outputs every time step so another RNN layer can be stacked on top
stacked_rnn_layer = tf.keras.layers.LSTM(12, return_sequences=True)

# GRU layer
gru_layer = tf.keras.layers.GRU(12)

# Bidirectional wrappers around LSTM layers
bidirectional_lstm_layer = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(12))
stacked_bidirectional_lstm_layer = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(12, return_sequences=True))

In [54]:
# Conv Layers
conv_layer = tf.keras.layers.Conv1D(filters=16, kernel_size=3)
pooling_layer = tf.keras.layers.GlobalMaxPooling1D()

#### 5.6. Identify when to use trailing versus centered windows.

* **Centered Moving Average** looks at the average around the given time, and as such requires knowledge of future values.
    * center_ma(t) = mean(obs(t-1), obs(t), obs(t+1))
    * A centered moving average can be used as a general method to remove trend and seasonal components from a time series, but it often cannot be used when forecasting because the future values are not yet known.
* **Trailing Moving Average** only uses historical observations and is used for time series forecasting.
    * trail_ma(t) = mean(obs(t-2), obs(t-1), obs(t))
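
The two formulas above can be sketched in plain Python as follows. The function names and the toy series are illustrative assumptions; each function returns `(t, mean)` pairs so the alignment with the time step is visible.

```python
def trailing_moving_average(series, window):
    """Return (t, mean) pairs using only obs(t - window + 1) ... obs(t)."""
    return [(t, sum(series[t - window + 1:t + 1]) / window)
            for t in range(window - 1, len(series))]


def centered_moving_average(series, window):
    """Return (t, mean) pairs using values on both sides of t (odd `window`)."""
    half = window // 2
    return [(t, sum(series[t - half:t + half + 1]) / window)
            for t in range(half, len(series) - half)]


observations = [1, 2, 3, 4, 5, 6, 7]  # toy series; list indices are the time steps

trailing = trailing_moving_average(observations, window=3)
centered = centered_moving_average(observations, window=3)

print('trailing:', trailing)
print('centered:', centered)
```

The last time step (`t = 6`) gets a trailing value but no centered value, because the centered average would need `obs(7)`, which has not been observed yet.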

#### 5.7. Use TensorFlow for forecasting.

Forecasting is the process of predicting future values over a given horizon, where each predicted value can be fed back in as input to predict the next one.

* NOTE: I have a custom utility to help with this in my `py-learning-toolbox` repo, `ml_toolbox.preprocessing.timeseries.make_future_forecasts` (I plan to move this later, but for now, it will stay here until I finish taking the exam).
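
A minimal sketch of that feed-back loop is shown below. It is not the `make_future_forecasts` utility; `make_forecast`, `predict_next`, and `naive_trend_model` are hypothetical names, and the stand-in "model" is a simple trend rule rather than a trained network.

```python
def make_forecast(predict_next, history, window_size, horizon_steps):
    """Forecast `horizon_steps` values by feeding each prediction back in.

    `predict_next` is a hypothetical callable standing in for a trained model:
    it receives the last `window_size` values and returns the next value.
    """
    values = list(history)
    forecasts = []
    for _ in range(horizon_steps):
        window = values[-window_size:]
        next_value = predict_next(window)
        forecasts.append(next_value)
        values.append(next_value)  # the prediction becomes part of the next window
    return forecasts


def naive_trend_model(window):
    """Stand-in model: last value plus the mean step between consecutive values."""
    steps = [b - a for a, b in zip(window, window[1:])]
    return window[-1] + sum(steps) / len(steps)


forecast = make_forecast(naive_trend_model, [1, 2, 3, 4, 5],
                         window_size=3, horizon_steps=3)
print(forecast)
```

With a Keras model, `predict_next` would presumably wrap a call to `model.predict` on a correctly shaped window. Because later predictions are built on earlier ones, errors can compound as the horizon grows.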

#### 5.8. Prepare features and labels.

Windowing is a core step in preparing time series data. To extract features and labels for a deep learning model from windowed data, split each sequence into a window and a horizon.
* Window: the number of past values used as features
* Horizon: the number of values to predict into the future

```
[0,1,2,3] --> [0,1,2] [3]
[1,2,3,4] --> [1,2,3] [4]
[2,3,4,5] --> [2,3,4] [5]
[3,4,5,6] --> [3,4,5] [6]
```

&#128273; **NOTE**: I have a function in my `py-deep-learning` repository that will sequence, window, and split data into features and labels as shown in the example above
* `ml_toolbox.preprocessing.timeseries.make_windowed_dataset`
* **[Alternatively]** `ml_toolbox.preprocessing.timeseries.make_windows` with `ml_toolbox.preprocessing.timeseries.get_labels`
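
For reference, the split shown above can be reproduced with a few lines of plain Python. This is only a sketch and not the implementation of the `ml_toolbox` functions; `sliding_windows` and `split_window` are hypothetical names.

```python
def sliding_windows(series, window_size, horizon):
    """Slide a span of length window_size + horizon over `series`, one step at a time."""
    total = window_size + horizon
    return [series[i:i + total] for i in range(len(series) - total + 1)]


def split_window(span, horizon):
    """Split one span into (features, labels); `horizon` must be at least 1."""
    return span[:-horizon], span[-horizon:]


series = list(range(7))  # [0, 1, 2, 3, 4, 5, 6]
spans = sliding_windows(series, window_size=3, horizon=1)

features = []
labels = []
for span in spans:
    window, label = split_window(span, horizon=1)
    features.append(window)
    labels.append(label)
    print(span, '-->', window, label)
```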

#### 5.9. Identify and compensate for sequence bias.

Sequence bias occurs when the order in which items are presented influences which items are selected.

TODO: Need to figure this out.
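
One commonly suggested way to compensate, offered here as an assumption since this section is still open, is to shuffle the windowed examples before training so the model does not always see them in chronological order. The order inside each window is left untouched. The sketch below uses plain Python with toy data; with `tf.data`, the analogous step would be `Dataset.shuffle(buffer_size)`.

```python
import random

# Toy windowed examples (features and their labels), in chronological order.
features = [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
labels = [[3], [4], [5], [6]]

# Keep each window paired with its label, then shuffle the pairs.
pairs = list(zip(features, labels))
rng = random.Random(42)  # fixed seed only so the example is repeatable
rng.shuffle(pairs)

shuffled_features = [window for window, _ in pairs]
shuffled_labels = [label for _, label in pairs]

for window, label in pairs:
    print(window, '-->', label)
```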

#### 5.10. Adjust the learning rate dynamically in time series, sequence and prediction models.

This can be done using the learning rate callbacks as mentioned in section 2.16.

* `ReduceLROnPlateau` callback
* `LearningRateScheduler` callback

In [57]:
# Reduces the learning rate by a factor when the learning stops (hits a plateau)
lr_reducer = tf.keras.callbacks.ReduceLROnPlateau()

# LR Scheduler is used to update the learning rate for each epoch
# This is useful when trying to determine the optimal learning rate
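# NOTE: the start value 1e-4 and the divisor 20 are illustrative; the rate grows 10x every 20 epochs
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: 1e-4 * 10 ** (epoch / 20))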