# TensorFlow Developer Professional Exam Sourcebook

The following notebook compiles notes on the terms, topics, processes, and techniques tested in the TensorFlow Developer Exam. It contains plain-text explanations, pseudo-code, and code snippets throughout, depending on the context.

The contents are roughly as follows:

***Processes***

*(1) Basic Neural Net Regression*

*(2) Basic Neural Net Classification*

*(3) Convolutional Neural Networks*

*(4) Transfer Learning - Feature Extraction*

*(5) Transfer Learning - Fine-Tuning*

*(6) Natural Language Processing*

*(7) Time Series - Compiled Process (from 2 examples)*

***Non-Problem-Specific Tasks***
- Save models in various formats
- Plotting loss & accuracy
- Preventing overfitting
- Data augmentation
- Dropout
- Batch loading of data
- Callbacks
- Dataset formats (JSON, CSV, etc.)
- Dynamically adjusting learning rates
- Visualization of various clf + other metrics
- Making a dataset performant using tf.data API
- Creating ensemble models
- Accounting for horizon variance in time series prediction
- Plotting prediction intervals


***General Vocabulary and Terms List for Flashcard Study***

Vocabulary terms can be found in bold throughout the notebook and are listed at the end of the notebook in alphabetical order with a brief definition and/or description.

# Processes

In [1]:
# Import common libraries for occasional use throughout notebook
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import sklearn
import seaborn as sns

# Common sklearn imports
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import r2_score, mean_squared_error
from sklearn import metrics
from math import sqrt

%matplotlib inline

## *(1) Basic Neural Network Regression*

#### *Create/collect features and labels*

This step involves defining the input features and the corresponding output labels for your regression problem. You should have a clear understanding of the problem you're trying to solve, and select the appropriate **features** and **labels** for your dataset. This step may also involve pre-processing and cleaning the data to **remove outliers**, **missing values**, or other anomalies.

In [None]:
# Code snippets you may find useful
ins = pd.read_csv("data/insurance_data/insurance.csv")

ins_copy = ins.copy() # create copy for split

# Check for missing values/duplicates (important to impute before splitting)
ins.isna().sum()
ins.duplicated().sum()

# Create features and labels
X = ins_copy.drop("charges", axis=1)
y = ins_copy["charges"]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) # 80% train, 20% test
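
The text above mentions removing outliers. One common way to flag them is the interquartile range (IQR) rule. The sketch below uses NumPy only; the `values` array is made-up data standing in for a numeric column such as `charges`, and the 1.5 multiplier is a conventional choice rather than something this notebook prescribes.

```python
import numpy as np

# Hypothetical values standing in for a numeric column such as `charges`
values = np.array([1200.0, 1500.0, 1700.0, 2100.0, 2500.0, 60000.0])

# IQR rule: keep points that lie within 1.5 * IQR of the quartiles
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

mask = (values >= lower) & (values <= upper)
kept = values[mask]
print(f"bounds=({lower:.1f}, {upper:.1f}), kept {kept.size} of {values.size} values")
```

Whether a flagged point should actually be dropped depends on the problem; a very large insurance charge may be a legitimate observation rather than an error.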

#### *Explore and visualize the data*

Once you have collected the data, it's important to **explore** and **visualize** it to gain insights and identify patterns. You should use techniques such as **scatter plots**, **histograms**, and **correlation matrices** to understand the relationships between the features and labels.

In [None]:
# Explore details of the specific dataset
ins.dtypes # check column data types
ins.shape
ins.info()
ins.describe()



# Visualize the dataset to maximize familiarity
ins.head() # pass the number of rows you want to see (default is 5)
ins.tail()


plt.figure(figsize=(12,10)) # plot hist chart
plt.title("`Charge` amount distribution by `sex`")
plt.xlim(ins.charges.min(), ins.charges.max())
sns.histplot(data=ins,
              x="charges",
              hue="sex",
             palette="plasma", kde=True);

plt.figure(figsize=(12,10)) # alternative hist
plt.xlim(18, 64)
sns.histplot(ins["age"], bins=10, kde=True, alpha=0.6);


plt.figure(figsize=(12,10)) # plot strip plot
plt.title("Age-Charge Correlation")
plt.xticks(rotation=60)
sns.stripplot(x="age",
              y="charges",
             data=ins,
             hue="sex",
             palette="plasma",
             dodge=False);


unique_regions = ins.region.unique() # find unique categorical values, plot

plt.style.use('ggplot') # choose a matplotlib style

by_reg = pd.pivot_table(ins, index="region", values="charges")
by_reg = by_reg.sort_values(by="charges", ascending=True)
by_reg

plot = by_reg.plot(kind="barh", figsize=(12,8),
                   title="Charges by Region", legend=True, 
                   edgecolor="black", lw=2)


sex = pd.pivot_table(ins, index="sex", values="charges") # check overall %s
sex.plot.pie(figsize=(5,5), subplots=True,
             autopct='%1.1f%%', title="Charges by sex");


plt.figure(figsize=(12,10)) # searching for correlations w/ more hist plots
plt.title("`Charges` amount distribution by `smoker`")
plt.xlim(ins.charges.min(), ins.charges.max())
sns.histplot(data=ins,
              x="charges",
              hue="smoker",
             palette="magma", kde=True);


plt.style.use('ggplot')

by_children = pd.pivot_table(ins, index="children", values="charges")
by_children = by_children.sort_values(by="children", ascending=True)
by_children

plot = by_children.plot(kind="barh", figsize=(12,8), # plotting barh 
                   title="Charges by number of Children", legend=True, 
                   edgecolor="black", lw=2)

# Pivot table: average BMI by age
bmi = pd.pivot_table(ins, index="age", values="bmi", aggfunc="mean")
bmi.plot(kind="line", figsize=(12,8), lw=4, marker="o",
         markersize=7, markerfacecolor="white",
         markeredgecolor="black", markeredgewidth=2)
plt.title("Average BMI by Age")
plt.xlim(15, 70)
plt.ylim(28, 33)
plt.axhline(y=ins.bmi.mean(), ls="--", color="black")


plt.figure(figsize=(6,5)) # plot correlation heatmap of the numerical columns only
sns.heatmap(ins.select_dtypes("number").corr(), annot=True, cmap="plasma");


cp_ins = ins.copy() # work on a copy so the original DataFrame stays untouched
cp_ins["sex"]=pd.factorize(cp_ins.sex)[0] # categorical --> numerical
cp_ins["smoker"]=pd.factorize(cp_ins.smoker)[0]
cp_ins["region"]=pd.factorize(cp_ins.region)[0]

plt.figure(figsize=(10,8)) # correlation heatmap including the encoded categorical columns
sns.heatmap(cp_ins.corr(), annot=True, cmap="icefire");
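
Each cell of a correlation heatmap is a Pearson correlation coefficient. As a rough illustration of what that number means, the sketch below computes it by hand and compares it with `np.corrcoef`. The `x` and `y` arrays are invented sample data, and `pearson_r` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical paired samples (for example, one feature and the label)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def pearson_r(a, b):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    a_c, b_c = a - a.mean(), b - b.mean()
    return float((a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum()))

r_manual = pearson_r(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])
print(round(r_manual, 4), round(r_numpy, 4))
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little linear relationship.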

#### *Check shapes*

Before building the neural network, you should ensure that the dimensions of the input and output data are consistent with the expected shapes of the model. This involves checking the shape of the input features, the number of output labels, and the number of samples in the training/validation/test data.

In [None]:
# Check shapes of data
X_train.shape, X_test.shape, y_train.shape, y_test.shape

len(X_train), len(X_test), len(y_train), len(y_test) # find lengths (samples)



# For other types of info in TensorFlow...
example = tf.constant([1, 2, 3])

example.shape
example.dtype
example.ndim
example[0] # index at some position (here, the first element)
tf.size(example)
tf.cast(example, dtype=tf.float32) # change dtype (swap in whichever dtype you need)

tf.reduce_min(example) # find the minimum
tf.reduce_max(example) # find the maximum
tf.reduce_mean(example) # find the mean
tf.reduce_sum(example) # find the sum



# Other potentially useful tf functions
ex_var = tf.math.reduce_variance(tf.cast(example, dtype=tf.float32))
ex_std = tf.math.sqrt(ex_var)

#### *Create a model*

In this step, you will define the architecture of your neural network. This involves specifying the number of layers, the number of neurons in each layer, the **activation functions**, and the type of **optimizer** and **loss function** to be used. You should also define the input and output layers, which will match the shape of your input and output data.

In [2]:
# You may need to take further steps before creating the model

# Some imports from sklearn to prepare the data
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


# One-hot encoding with pandas get_dummies()
ins_onehot = pd.get_dummies(ins_copy)

X = ins_onehot.drop(["charges"], axis=1) # split new one-hot data
y = ins_onehot["charges"]


# Create column transformer
ct = make_column_transformer(
    (MinMaxScaler(), ["age", "bmi", "children"]), # scale numerical values from 0-1
    (OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker", "region"])
)

# [ RECREATE DATA FEATURES/LABELS IF NECESSARY ]
# Note: the column transformer expects the original (non-one-hot) columns

ct.fit(X_train) # fit column transformer to the training data only


# Transform training and test data with normalization (MinMaxScaler & OneHotEncoder)
X_train_normal = ct.transform(X_train)
X_test_normal = ct.transform(X_test) # reuse the transformer fitted on the training data
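
To make the fit-on-train, transform-on-both pattern concrete, here is a minimal NumPy sketch of min-max scaling. The arrays are invented ages and `min_max_scale` is a hypothetical helper; it is a simplified stand-in for what a scaler such as `MinMaxScaler` does, not its actual implementation.

```python
import numpy as np

# Hypothetical single numeric feature (e.g. age), already split into train and test
train = np.array([18.0, 30.0, 45.0, 64.0])
test = np.array([25.0, 70.0])

# "Fit": learn the minimum and maximum from the training data only
t_min, t_max = train.min(), train.max()

def min_max_scale(x):
    """Scale using the statistics learned from the training data."""
    return (x - t_min) / (t_max - t_min)

train_scaled = min_max_scale(train)
test_scaled = min_max_scale(test)
print(train_scaled, test_scaled)
```

Because the statistics come from the training data alone, a test value larger than the training maximum scales to a number above 1. That is expected and avoids leaking test information into preprocessing.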

In [None]:
# Create a model

# If you desire reproducible results
tf.random.set_seed(42) # any number

# Define any callbacks
early_stop_cb = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)

# Build suitable model
ins_model = tf.keras.Sequential([
    tf.keras.layers.Dense(100),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1),
])

#### *Compile the model*

After creating the model, you should compile it by specifying the optimizer, the loss function, and the **metrics** to be used during training. This step prepares the model for training.

In [None]:
# Compile model
ins_model.compile(loss = tf.keras.losses.mae, # mean absolute error (lives in `losses`, not `optimizers`)
                 optimizer = tf.keras.optimizers.Adam(learning_rate=0.001),
                 metrics=['mae'])
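
For reference, mean absolute error (MAE) and the related mean squared error (MSE) can be computed by hand as below. The numbers are made up for illustration.

```python
import numpy as np

# Hypothetical true labels and model predictions
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 330.0, 400.0])

errors = y_true - y_pred

mae = np.abs(errors).mean()   # average size of the errors
mse = (errors ** 2).mean()    # squaring penalizes large errors more heavily
print(mae, mse)
```

MAE is in the same units as the label, which often makes it easier to interpret for a regression target such as insurance charges.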

#### *Fit the model*

Once the model is compiled, you can fit it to the training data by specifying the number of epochs, the **batch size**, and the validation split. This step involves training the model on the training data, evaluating its performance on the validation data, and adjusting the model parameters accordingly.

In [None]:
# Run model - capture history object to visualize performance afterward 
history = ins_model.fit(X_train_normal,
                       y_train, 
                        epochs=100,
                       callbacks=[early_stop_cb])

# Plot history (loss/training curve)
pd.DataFrame(history.history).plot()
plt.ylabel("loss")
plt.xlabel("epochs")

#### *Evaluate the model*

After training the model, you should evaluate its performance on the test data. This step involves making predictions on the test data, calculating the evaluation metrics, and comparing the results with the expected values.

In [None]:
ins_model.evaluate(X_test_normal, y_test)

In [None]:
ins_model.predict(X_test_normal)

#### *Visualize and plot models*

You can inspect a model's structure by listing its layers, retrieving the weights and biases of an individual layer, printing a summary, or plotting the architecture as a diagram.

In [None]:
# Find layers of a model
ins_model.layers # <-- can be indexed , if necessary


# Get the weights and biases (the learned patterns) of a given layer in a network
weights, biases = ins_model.layers[0].get_weights()


# Shapes
weights.shape
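
The shape of a Dense layer's weights follows from what the layer computes: inputs multiplied by a kernel matrix, plus a bias. The NumPy sketch below mimics a linear Dense layer. The sizes (4 samples, 11 input features, 100 units) are assumptions chosen for illustration and are not read from the model above.

```python
import numpy as np

rng = np.random.default_rng(42)

n_samples, n_features, units = 4, 11, 100  # assumed sizes for illustration

X_batch = rng.normal(size=(n_samples, n_features))
W = rng.normal(size=(n_features, units))  # kernel shape: (inputs, units)
b = np.zeros(units)                       # bias shape: (units,)

out = X_batch @ W + b                     # forward pass of a linear Dense layer
n_params = W.size + b.size                # trainable parameters in this layer
print(out.shape, n_params)
```

The parameter count matches the usual formula for a Dense layer: `inputs * units + units`.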

In [None]:
# Visualize models using keras
from tensorflow.keras.utils import plot_model

ins_model.summary()

# OR

plot_model(ins_model, show_shapes=True)

#### *Save the model*

Once you are satisfied with the performance of the model, you should save it for future use. You can save the model in the **SavedModel** or **HDF5**/.h5 format, which will allow you to reuse the model for inference or further training.

In [None]:
# Save in .h5 format
ins_model.save('model_name.h5')

## *(2) Basic Neural Network Classification*

### ***(a) Binary Classification***

#### *Gather, format, and split data*

This step involves gathering your data and formatting it in a way that the neural network can process it. You should also **split the data** into training, validation, and test sets.

In [None]:
from sklearn.datasets import make_circles

# Make 1000 examples
n_samples = 1000

# Create circles
X, y = make_circles(n_samples,
                    noise=0.03,
                    random_state=42)

# SEE BELOW FOR DATA SPLIT (POST VISUALIZATION/EXPLORATION)

#### *Visualize the data*

Once you have formatted the data, it's important to visualize it to gain insights and identify patterns. You should use techniques such as scatter plots, histograms, and correlation matrices to understand the relationships between the features and labels.

In [None]:
# Check out the features and labels
X[:20], y[:50]



# Visualize as a table and with matplotlib
import pandas as pd
circles = pd.DataFrame({"X0":X[:, 0],
                        "X1": X[:, 1],
                        "label": y})
circles

import matplotlib.pyplot as plt
plt.scatter(X[:, 0], X[:,1], c=y, cmap=plt.cm.RdYlBu)



# Check lengths and shapes
X.shape, y.shape

len(X), len(y)

X[0], y[0]



# Split into train and test sets
X_train, y_train = X[:800], y[:800]
X_test, y_test = X[800:], y[800:]

X_train.shape, X_test.shape, y_train.shape, y_test.shape

#### *Specify callbacks*

**Callbacks** are functions that are called during training at specific points to perform actions such as saving the model, adjusting the **learning rate**, or stopping training if the model is not improving. You should specify the appropriate callbacks for your binary classification problem.

In [None]:
# Model checkpoint callback
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint_path = "model_checkpoint.h5"

# Create a callback that saves the model's weights
model_checkpoint = ModelCheckpoint(filepath=checkpoint_path,
                                   save_weights_only=True,
                                   save_best_only=True,
                                   monitor='val_loss',
                                   mode='min',
                                   verbose=1)



# EarlyStopping callback
from tensorflow.keras.callbacks import EarlyStopping

# Create a callback that stops the training when val_loss doesn't improve for 5 epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=5)



# LearningRateScheduler
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-3 * 10**(epoch/20))


# SEE 'CALLBACKS' IN NON-PROBLEM SPECIFICS TASKS SECTION
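
The patience logic behind early stopping can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: `early_stopping_epoch` is a hypothetical helper, the losses are invented, and an epoch only counts as an improvement if its loss is strictly lower than the best seen so far.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss   # new best loss: reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: improvement stalls after epoch 2
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.50]
stop_epoch = early_stopping_epoch(losses, patience=5)
print(stop_epoch)
```

Note that training stops before the final, lower loss is ever reached. A larger patience tolerates longer plateaus at the cost of extra epochs.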

#### *Create/import model*

In this step, you will create or import the architecture of your neural network. This involves specifying the number of layers, the number of neurons in each layer, the activation functions, and the type of optimizer and loss function to be used. You should also define the input and output layers, which will match the shape of your input and output data.

In [None]:
tf.random.set_seed(42)

# Create the model
bin_model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"), # <-- non-linearity
])

#### *Compile and fit*

After creating the model, you should compile it by specifying the optimizer, the loss function, and the metrics to be used during training. This step prepares the model for training. You can then fit the model to the training data by specifying the number of epochs, the batch size, and the validation split.

In [None]:
# Compile model
bin_model.compile(loss = tf.keras.losses.BinaryCrossentropy(), # <-- binary
                 optimizer = tf.keras.optimizers.Adam(learning_rate=0.001),
                 metrics=['accuracy'])

# Fit model (capture the history object so the loss curves can be plotted)
history = bin_model.fit(X_train, y_train, epochs=100,
                        callbacks=[lr_scheduler], verbose=True)

# Plot to visualize loss curves
pd.DataFrame(history.history).plot() # <-- plot 'loss' and 'accuracy'
plt.title("Model Loss Curves")

#### *Ensure introduction of non-linearity*

For binary classification problems, it's important to introduce non-linearity in the output layer using the **sigmoid activation** function. This ensures that the output of the model is between 0 and 1, which can be interpreted as the probability of the input belonging to the positive class.

In [None]:
# This has been commented on above in the 'Create/import model' section
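
As a quick illustration of why sigmoid suits a binary output, the sketch below applies it to a few made-up raw outputs (logits) and thresholds the result at 0.5. The threshold is a common default, not a requirement.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-4.0, 0.0, 4.0])  # hypothetical raw outputs of the final layer
probs = sigmoid(logits)              # interpreted as P(class == 1)
preds = (probs > 0.5).astype(int)    # turn probabilities into class labels
print(probs, preds)
```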

#### *Learning Rate Scheduler callback*

The learning rate is an important hyperparameter that determines the step size of the optimizer during training. You can use the **Learning Rate Scheduler** callback to adjust the learning rate during training to improve the model's performance. You can also plot the loss against the learning rate using the *tf.range* and *plt.semilogx* functions to visualize the effect of the learning rate on the model's performance.

In [None]:
# See 'Specify Callbacks' section
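
The schedule used in the callback earlier, `1e-3 * 10**(epoch/20)`, multiplies the learning rate by 10 every 20 epochs. The plain-Python sketch below evaluates it at a few epochs; `lr_schedule` is a hypothetical name for the same lambda.

```python
def lr_schedule(epoch, base_lr=1e-3):
    """Same formula as the LearningRateScheduler lambda: grows 10x every 20 epochs."""
    return base_lr * 10 ** (epoch / 20)

for epoch in (0, 10, 20, 40):
    print(epoch, lr_schedule(epoch))

# Learning rates for a 40-epoch run, e.g. to use as the x-axis of a loss-vs-LR plot
lrs = [lr_schedule(e) for e in range(40)]
```

Because the rate grows exponentially, a logarithmic x-axis (as with `plt.semilogx`) spreads the epochs evenly when plotting loss against learning rate.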

#### *Evaluate using accuracy, precision/recall, F1 score, confusion matrix, and classification report*

Once the model is trained, you should evaluate its performance on the test data. Binary classification problems can be evaluated using metrics such as **accuracy**, **precision**, **recall**, **F1 score**, and the **confusion matrix**. You can also generate a **classification report** to get a detailed summary of the model's performance.

In [None]:
# SEE 'NON-PROBLEM SPECIFIC TASKS' SECTION FOR CLF METRICS CODE SNIPPETS #
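
To show how these metrics relate to one another, here is a dependency-free sketch that derives them from the four confusion-matrix counts. The labels and predictions are invented, and `binary_metrics` is a hypothetical helper; in practice a library such as scikit-learn would typically be used instead.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and common metrics for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual positives, how many were found
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: true class, columns: predicted class
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels and predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
results = binary_metrics(y_true, y_pred)
print(results)
```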

### ***(b) Multi-class Classification***

#### *Import data*

This step involves importing the data in a format that the neural network can process. You should ensure that the data is in a format that can be used by TensorFlow, such as CSV, JSON, or NumPy arrays.

In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

# In this particular example, the dataset comes pre-sorted (split)
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()

#### *Split into train/test/val*

Once you have imported the data, you should split it into training, validation, and test sets. The training set is used to train the model, the validation set is used to tune the hyperparameters, and the test set is used to evaluate the final model.

In [None]:
# See above cell, where this was done along with the data import

#### *Check shapes + some examples*

After splitting the data, you should **check the shapes** of the input and output data. You should also print out some examples of the input and output data to ensure that the data is being correctly processed.

In [None]:
# Check the shape of a single example
train_data[0].shape, train_labels[0].shape

#### *Visualize*

Visualizing the data can help you gain insights and identify patterns. For image data such as Fashion MNIST, plotting individual samples alongside their labels is a quick way to confirm that the data looks as expected.

In [None]:
# Plot a single sample
import matplotlib.pyplot as plt
plt.imshow(train_data[0]);


# Plot an example image and its label (requires `class_names`, defined in the next section)
index_of_choice = 200
plt.imshow(train_data[index_of_choice], cmap=plt.cm.binary)
plt.title(class_names[train_labels[index_of_choice]])


# Plot multiple random images of the data
import random
plt.figure(figsize=(7,7))

for i in range(4):
  ax = plt.subplot(2,2,i+1)
  rand_index = random.choice(range(len(train_data)))
  plt.imshow(train_data[rand_index], cmap=plt.cm.binary)
  plt.title(class_names[train_labels[rand_index]])
  plt.axis(False)

#### *Create list of class_names*

For multi-class classification problems, you should create a **list of class_names** to map the output classes to their corresponding names.

In [None]:
# Create list of class_names (human-readable)
class_names=["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

#### *Build NN architecture*

The architecture of the neural network for multi-class classification problems will be similar to binary classification problems, but the output layer has one neuron per class and uses the **softmax activation** function. Use the **CategoricalCrossentropy loss** function when the labels are one-hot encoded, or SparseCategoricalCrossentropy when the labels are integers (as with the Fashion MNIST labels used here). It's important to scale (normalize) the data to ensure that the inputs are between 0 and 1.

In [None]:
# Be sure that your data is scaled (normalized) before running through model
train_data_norm = train_data / 255.0
test_data_norm = test_data / 255.0

# Check the min and max values of the scaled training data
train_data_norm.min(), train_data_norm.max()

# Should be (0.0, 1.0)


# Now that your data is normalized, build a suitable model for your problem:
tf.random.set_seed(42)

cat_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])
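
The NumPy sketch below illustrates what softmax does and why the sparse and one-hot forms of categorical cross-entropy give the same loss value. The logits and labels are made up, and the functions are simplified illustrations rather than the Keras implementations.

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical raw outputs for 2 samples and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])        # integer labels
one_hot = np.eye(3)[labels]      # the same labels, one-hot encoded

probs = softmax(logits)

# Sparse form: pick out the probability of the true class by index
sparse_ce = -np.log(probs[np.arange(len(labels)), labels]).mean()

# One-hot form: the zeros in the one-hot vector cancel every other class
categorical_ce = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(probs.sum(axis=1), sparse_ce, categorical_ce)
```

The two losses differ only in the label format they expect, so the choice depends on how the labels are stored.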

#### *Train and export model*

After creating the model, you should train it using the training and validation sets. You can then export the model to a format that can be used for inference.

In [None]:
# Compile model
cat_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
               optimizer=tf.keras.optimizers.Adam(),
               metrics=["accuracy"])


# Fit & train model
cat_model_history = cat_model.fit(train_data_norm, train_labels, epochs=10,
                          validation_data=(test_data_norm, test_labels))

#### *Plot LR decay for ideal rate*

The learning rate is an important hyperparameter that determines the step size of the optimizer during training. You can use a **learning rate decay function** to adjust the learning rate during training to improve the model's performance. You can also **plot the learning rate decay** to visualize the effect of the learning rate on the model's performance.

In [None]:
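# Plot loss against learning rate to find the ideal rate
# (assumes `fit_lr_history` is the history returned by fitting for 40 epochs
# with the LearningRateScheduler callback shown earlier)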
import numpy as np
import matplotlib.pyplot as plt

lrs = 1e-3 * 10**(tf.range(40)/20)
plt.semilogx(lrs, fit_lr_history.history["loss"])
plt.xlabel("Learning rate")
plt.ylabel("Loss")
plt.title("Find the Ideal LR")

#### *Evaluate using classification metrics*

Once the model is trained, you should evaluate its performance on the test data. Multi-class classification problems can be evaluated using metrics such as accuracy, precision, recall, F1 score, and the confusion matrix. You can also generate a classification report to get a detailed summary of the model's performance.

In [None]:
# SEE 'NON-PROBLEM SPECIFIC TASKS' SECTION FOR CLF METRICS CODE SNIPPETS #

#### *Train for longer OR change NN architecture*

If the model's performance is not satisfactory, you can train the model for longer or change the architecture of the neural network. You can experiment with different hyperparameters and architectures to improve the model's performance.

In [None]:
# SEE 'PREVENT OVERFITTING' IN NON-PROBLEM SPECIFIC TASKS # 

## *(3) Convolutional Neural Networks*

### ***(a) Binary Image Classification***

#### *Import dataset or set variables for file/directory paths (train/test/validation)*

In addition to importing the dataset or setting variables for the file/directory paths, you should also check if there are any missing or corrupted files. You may also want to consider splitting your data into training, validation, and test sets to avoid overfitting.

In [None]:
import zipfile
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/pizza_steak.zip

# Unzip the downloaded file
zip_ref = zipfile.ZipFile("pizza_steak.zip")
zip_ref.extractall()
zip_ref.close()



# Navigate through dirs/files
!ls pizza_steak
!ls pizza_steak/train
!ls pizza_steak/train/steak



# Walk through dirs and list number of files
import os

for dirpath, dirnames, filenames in os.walk("pizza_steak"):
  print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")



# Another way to find out how many images are in a directory
num_steak_images_train = len(os.listdir("pizza_steak/train/steak"))
num_pizza_images_train = len(os.listdir("pizza_steak/train/pizza"))

num_steak_images_train, num_pizza_images_train



# Get the class names programmatically
import pathlib
import numpy as np
data_dir = pathlib.Path("pizza_steak/train")
class_names = np.array(sorted([item.name for item in data_dir.glob("*")]))
# create list of class names from the subdirectories
print(class_names)

#### *Data preprocessing*

 In addition to normalizing pixel values between 0 and 1, you should also consider **resizing your images** to a fixed size to ensure consistency across all images. You can use tf.image.resize() for this purpose.

In [None]:
# Visualize our images before carrying out any permanent actions
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import random

def view_random_image(target_dir, target_class):
  # Setup the target directory (view images from here)
  target_folder = target_dir+target_class

  # Get a random image path
  random_image = random.sample(os.listdir(target_folder), 1)
  print(random_image)

  # Read in the image and plot it with matplotlib
  img = mpimg.imread(target_folder+"/"+random_image[0])
  plt.imshow(img)
  plt.title(target_class)
  plt.axis("off");

  print(f"Image shape: {img.shape}") # show shape of the image

  return img



# View a random image from the training dataset
img = view_random_image(target_dir="pizza_steak/train/",
                        target_class="pizza")

img.shape

#### *Import from directories and turn into batches*

You can use the **tf.keras.preprocessing.image.ImageDataGenerator** class to import images from directories and turn them into batches. This class also allows you to perform **data augmentation** techniques such as random rotations, flips, and shifts.

In [None]:
# < WITHOUT DATA AUGMENTATION > #
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# set seed
tf.random.set_seed(42)


# Preprocess data (get all the pixel values between 0-1; also called scaling/normalization)
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)


# Set up paths to our data directories
train_dir = "/content/pizza_steak/train"
test_dir = "/content/pizza_steak/test"


# Import data from directories and turn it into batches
train_data = train_datagen.flow_from_directory(directory=train_dir,
                                             batch_size=32,
                                            target_size=(224, 224),
                                            class_mode="binary", # <---
                                            seed=42)

test_data = test_datagen.flow_from_directory(directory=test_dir,
                                               batch_size=32,
                                               target_size=(224,224),
                                               class_mode="binary", # <---
                                               seed=42)



# Get a sample of a training data batch
images, labels = next(train_data) # pull one batch from the iterator
len(images), len(labels)

# How many batches are there?
len(train_data)

1500/32 # batches in the dataset (rounded up)



# < ADDING DATA AUGMENTATION (SEE NON-SPECIFIC TASKS SECTION, TOO) > #

# Create ImageDataGenerator training instance with data augmentation
train_datagen_aug = ImageDataGenerator(rescale=1/255.,
                                       rotation_range=0.2, # specified in degrees
                                       shear_range=0.2,
                                       zoom_range=0.2,
                                       width_shift_range=0.2,
                                       height_shift_range=0.2,
                                       horizontal_flip=True)

train_datagen_aug_shuff = train_datagen_aug.flow_from_directory(train_dir,
                                                          target_size=(224,224),
                                                          class_mode="binary",
                                                          shuffle=True)
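
The batch count follows directly from the dataset size and the batch size. The short sketch below works through the arithmetic for the 1,500 images and batch size of 32 used in the cell above.

```python
import math

n_images, batch_size = 1500, 32  # figures used in the cell above

# The final, partially filled batch still counts, hence rounding up
n_batches = math.ceil(n_images / batch_size)
last_batch_size = n_images - (n_batches - 1) * batch_size

print(n_batches, last_batch_size)
```

This is why `len(train_data)` reports a whole number of batches even though 1500 is not divisible by 32.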

#### *Build CNN model - ensure shapes are correct*

 When building a CNN model, you need to make sure that the shapes of the input and output layers are correct. You should also consider using **padding** to preserve the spatial dimensions of the input image.

In [None]:
# Build a CNN model (same as the tiny VGG on the CNN explainer website)
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(filters=10,
                           kernel_size=3,
                           activation="relu",
                           input_shape=(224,224,3)),
    tf.keras.layers.Conv2D(10, 3, activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=2,
                              padding="valid"),
    tf.keras.layers.Conv2D(10, 3, activation="relu"),
    tf.keras.layers.Conv2D(10, 3, activation="relu"),
    tf.keras.layers.MaxPool2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid")
])



# Alternatively...

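# Create the model (this will be the baseline - 3 layer CNN)
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

cnn = Sequential([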
    Conv2D(filters=10,
           kernel_size=3,
           strides=1,
           padding="valid",
           activation="relu",
           input_shape=(224,224,3)),
    Conv2D(10, 3, activation="relu"),
    Conv2D(10, 3, activation="relu"),
    Flatten(),
    Dense(1, activation="sigmoid") # binary classification
])
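
One way to check that the shapes are correct is to trace the spatial size through each layer by hand. The sketch below does this for the two architectures above, assuming "valid" padding throughout and pooling with a stride equal to the pool size. `conv_out` is a hypothetical helper name.

```python
def conv_out(size, kernel=3, stride=1):
    """Output size along one spatial dimension for a 'valid' convolution or pooling layer."""
    return (size - kernel) // stride + 1

filters = 10

# First model: conv, conv, pool, conv, conv, pool
size = 224
size = conv_out(size)                      # Conv2D 3x3
size = conv_out(size)                      # Conv2D 3x3
size = conv_out(size, kernel=2, stride=2)  # MaxPool2D 2x2
size = conv_out(size)                      # Conv2D 3x3
size = conv_out(size)                      # Conv2D 3x3
size = conv_out(size, kernel=2, stride=2)  # MaxPool2D 2x2
flattened = size * size * filters          # number of inputs to the final Dense layer

# Baseline model: three Conv2D layers and no pooling
base_size = conv_out(conv_out(conv_out(224)))
base_flattened = base_size * base_size * filters

print(size, flattened, base_size, base_flattened)
```

Without pooling, the flattened vector is far larger, so the final Dense layer has many more weights. With `padding="same"` and a stride of 1, the spatial size would instead stay unchanged after each convolution.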

#### *Compile and fit model*

When compiling the model, you should specify the loss function, optimizer, and metrics to be used during training. For binary image classification, you should use **binary cross-entropy** as the loss function and **Adam optimizer**. You should also consider using **early stopping** to prevent overfitting.

In [None]:
# Compile our CNN
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])


# Fit the model
history = model.fit(train_data, # train_data Object creates X/y for us
                    epochs=5,
                    steps_per_epoch=len(train_data),
                    validation_data=test_data, # `test_data` created above serves as the validation set
                    validation_steps=len(test_data))


# Plot loss curves (SEE NON-SPECIFIC TASKS CODE)

#### *Evaluate the model*

To evaluate the performance of your model, you can use metrics such as accuracy, precision, recall, and F1-score. You should also consider using a confusion matrix and classification report to get a more detailed analysis of the model's performance.

In [None]:
# Bring in an example image
!wget [URL]

# Visualize the image
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

steak = mpimg.imread("03-steak.jpeg")
plt.imshow(steak)
plt.axis(False)

steak.shape



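# Make predictions on the custom image
# (this fails unless the image already matches the model's input shape of (224, 224, 3);
# the helper below resizes and rescales it first)
model.predict(tf.expand_dims(steak, axis=0))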



# OPTIONAL: Functionize the loading and prepping of custom images
def load_and_prep_image(filename, img_shape=224):
  """
  Reads an image from filename, turns it into a tensor and reshapes it 
  to (img_shape, img_shape, color_channels)
  """
  # Read in the image
  img = tf.io.read_file(filename)

  # Decode the read file into a tensor (force 3 colour channels)
  img = tf.image.decode_image(img, channels=3)

  # Resize the image
  img = tf.image.resize(img, size=[img_shape, img_shape])

  # Rescale to get all of the image values between 0 & 1
  img = img/255.

  return img


# Load and preprocess custom image
steak = load_and_prep_image("03-steak.jpeg")
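
A sigmoid model outputs a probability rather than a class name. The sketch below shows one way to map that probability back to a label. The `fake_pred` array is a made-up stand-in for the output of `model.predict`, and `pred_to_class` is a hypothetical helper. The class order assumes the alphabetical ordering of the `pizza` and `steak` subdirectories.

```python
import numpy as np

# Alphabetical order, matching the sorted subdirectory names (index 0 = pizza, index 1 = steak)
class_names = np.array(["pizza", "steak"])

def pred_to_class(pred_prob, threshold=0.5):
    """Map a sigmoid probability to a class name."""
    index = int(pred_prob > threshold)
    return str(class_names[index])

# Hypothetical prediction with the (1, 1) shape that a single-image batch produces
fake_pred = np.array([[0.83]])
predicted_label = pred_to_class(float(fake_pred[0][0]))
print(predicted_label)
```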

#### *Consider improvements*

You can consider adding more convolutional layers and increasing the number of filters in each layer to capture more complex features in the images. You can also add a fully connected layer after flattening to further refine the features. Additionally, you can use data augmentation techniques to generate more training data and reduce overfitting. Finally, you can add pooling layers such as **MaxPool2D**, which condense the feature maps, or **regularization** layers such as Dropout to reduce overfitting.
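
To illustrate what a MaxPool2D layer does to a feature map, here is a NumPy sketch of 2x2 max pooling with a stride of 2 on a single channel. `max_pool_2x2` is a hypothetical helper and the input is a toy 4x4 array.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window of a 2-D array."""
    h, w = feature_map.shape
    assert h % 2 == 0 and w % 2 == 0, "this sketch assumes even height and width"
    # Split rows and columns into pairs, then take the max within each 2x2 block
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16).reshape(4, 4)  # toy 4x4 feature map
pooled = max_pool_2x2(fmap)
print(pooled)
```

Each pooling step halves the height and width, so later layers see a smaller, more condensed summary of the strongest activations.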

### ***(b) Multi-class Image Classification***

*(Quite similar to the above, with a few important distinctions. The steps are explored below in slightly less depth than the other Processes.)*

(i) Import the dataset or set variables for file/directory paths for the training and testing data.

(ii) Assign the training and testing data

(iii) Check the file sizes and formats to ensure that they are compatible with the neural network architecture

(iv) Get examples and inspect them programmatically to get a better understanding of the data.

(v) Preprocess the images by **normalizing** the pixel values to be between 0 and 1

(vi) Import the images from the directories and turn them into batches.

(vii) Build the CNN model, ensuring that the input shape and output shape are correctly specified for the number of classes in the dataset, and the activation function for the output layer is *softmax*

In [None]:
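# Build out an appropriate CNN architecture
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense
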
cnn = Sequential([
    Conv2D(10, 3, activation="relu", input_shape=(224, 224,3)),
    Conv2D(10, 3, activation="relu"),
    MaxPool2D(pool_size=2),
    Conv2D(10, 3, activation="relu"),
    Conv2D(10, 3, activation="relu"),
    MaxPool2D(),
    Flatten(),
    Dense(10, activation="softmax") # <-- 'softmax' instead of 'sigmoid'
])
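
To see why the output layer uses softmax for multi-class problems, here is a small NumPy-only sketch (not part of the original notebook; the logits are made-up values). It turns raw scores into probabilities that sum to 1, and the predicted class is the index of the largest probability.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for a 3-class problem
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
predicted_class = int(np.argmax(probs))

print(probs)
print(predicted_class)
```

With sigmoid, each output would be squashed independently, so the values would not form a single distribution over the classes.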

(viii) Compile the model, specifying the loss function as *CategoricalCrossentropy*

In [None]:
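from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.optimizers import Adam

cnn.compile(loss=CategoricalCrossentropy(),
            optimizer=Adam(),
            metrics=["accuracy"])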
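
As a rough illustration of what the categorical cross-entropy loss measures, the sketch below computes it by hand with NumPy. The function name, labels, and predicted probabilities are invented for the example; Keras applies the same idea per batch.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(per_sample))

# Two hypothetical samples, three classes
y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]])

loss = categorical_crossentropy(y_true, y_pred)
print(loss)
```

Only the probability assigned to the true class contributes to each sample's loss, so confident correct predictions drive the loss towards 0.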

(ix) Fit the model and evaluate it using metrics such as accuracy, precision/recall, and F1 score

In [None]:
cnn_hist = cnn.fit(train_data, 
                    epochs=5,
                    steps_per_epoch=len(train_data),
                    validation_data=test_data,
                    validation_steps=len(test_data))

(x) Visualize the accuracy and loss curves over the epochs to ensure that the model is not overfitting or underfitting

(xi) Make and visualize predictions on the test set to assess the performance of the model

(xii) Consider using transfer learning by importing a pre-trained model, such as **VGG16** or **ResNet**, and adding additional layers to fine-tune the model for the specific classification task

## *(4) Transfer Learning - Feature Extraction*

#### *Import the data*

Load the data using standard file I/O libraries or functions. The data should be formatted in a way that is compatible with TensorFlow.

In [None]:
import zipfile

# Download the data (10% of 10 Food Classes repo from Food101 paper)
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_10_percent.zip

# Unzip the downloaded file
zip_ref = zipfile.ZipFile("10_food_classes_10_percent.zip")
zip_ref.extractall()
zip_ref.close()



import os

# Walk through directory and list number of files
for dirpath, dirnames, filenames in os.walk("10_food_classes_10_percent"):
  print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

#### *Prepare and split the data*

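Prepare the data by converting it to a format that is compatible with TensorFlow. This dataset already ships with separate train and test directories, so each one is loaded with its own generator; if you also need a validation set, split it off from the training data.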

In [None]:
# Setup data inputs
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SHAPE = (224,224)
BATCH_SIZE = 32
EPOCHS = 5

train_dir = "10_food_classes_10_percent/train/"
test_dir = "10_food_classes_10_percent/test/"

train_data_gen = ImageDataGenerator(rescale=1/255.)
test_data_gen = ImageDataGenerator(rescale=1/255.)

print("Training images:")
train_data_10_percent = train_data_gen.flow_from_directory(train_dir,
                                                           target_size=IMG_SHAPE,
                                                           batch_size=BATCH_SIZE,
                                                           class_mode="categorical")

print("Testing images:")
test_data = test_data_gen.flow_from_directory(test_dir,
                                              target_size=IMG_SHAPE,
                                              batch_size=BATCH_SIZE,
                                              class_mode="categorical")

#### *Preprocess the data*

Preprocess the data by normalizing the pixel values and resizing the images, if necessary.

#### *Setup some callbacks*

Use TensorBoard callback to monitor and visualize model training progress, **Model checkpoint callback** to save the best model during training, and Early stopping callback to stop training the model when it's no longer improving.

In [None]:
# Example TensorBoard callback (functionized for each new model)
import datetime

def create_tb_callback(dir_name, experiment_name):
  log_dir = dir_name + "/" + experiment_name + "/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")  # avoid commas/spaces in the path
  tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
  print(f"Saving TensorBoard log files to: {log_dir}")
  return tensorboard_callback
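
The early stopping behaviour mentioned above can be illustrated without Keras. The following is a simplified sketch of the patience logic only; the function name and loss values are made up for illustration, and tf.keras.callbacks.EarlyStopping has further options (such as min_delta and restore_best_weights) that are not modelled here.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop, or None if it never stops early."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1    # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses, one per epoch
val_losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.64]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch)
```

In this made-up run the best loss so far is reached at epoch 2, so a patience of 3 halts training before the later improvement at epoch 6 is ever seen. That trade-off is worth keeping in mind when choosing the patience value.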

#### *Select a model architecture from TF Hub*

Choose a pre-trained model from **TensorFlow Hub** that is compatible with the task:

https://tfhub.dev/

#### *Save model url(s) to a variable*

Save the URL of the pre-trained model to a variable that can be used to instantiate the model later.

In [None]:
# Models selected for example comparison
resnet_url = "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4"

efficientnet_url = "https://tfhub.dev/tensorflow/efficientnet/b0/feature-vector/1"

mobilenet_url = "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/5"

#### *Create a pretrained model instance with hub.KerasLayer(...)*

Instantiate the pre-trained model with the **hub.KerasLayer** class, which downloads the model from its URL and wraps it as a Keras layer. Passing trainable=False freezes the pretrained weights so that they are not updated during training.

In [None]:
# Ensure necessary features imported
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers


# Example create_model function to create models from TF Hub URLs
def create_model(model_url, num_classes=10):
  """
  Takes a TF Hub URL and creates a keras.Sequential model with it.

  Args:
    model_url (str): A TensorFlow Hub feature extraction URL.
    num_classes (int): Number of output neurons in the output layer,
      should be equal to number of target classes; default 10.

  Returns:
    An uncompiled keras.Sequential model with model_url as feature
    extractor layer and Dense output layer with num_classes output neurons.
  """

  # Download pretrained model and save it as a Keras layer
  feature_extractor_layer = hub.KerasLayer(model_url,
                                           trainable=False, # freeze the already learned patterns
                                           name="feature_extraction_layer",
                                           input_shape=IMG_SHAPE+(3,))
  # Create our own model
  model = tf.keras.Sequential([
      feature_extractor_layer,
      layers.Dense(num_classes, activation="softmax", name="output_layer")
  ])

  return model



# Create Resnet model
resnet_model = create_model(resnet_url,
                            num_classes=train_data_10_percent.num_classes)

# EfficientNet
eff_net_model = create_model(efficientnet_url,
                            num_classes=train_data_10_percent.num_classes)


# MobileNet
mobilenet_model=create_model(model_url=mobilenet_url,
                             num_classes=train_data_10_percent.num_classes)
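
Because the feature extractor is frozen, the only trainable parameters in each model come from the Dense output layer. The sketch below is a back-of-the-envelope check. It assumes the feature-vector sizes usually documented for these models (2048 for ResNet50V2, 1280 for EfficientNetB0 and MobileNetV2), so verify the numbers against the model summary.

```python
def dense_param_count(num_inputs, num_outputs):
    """Weights plus biases of a fully connected layer."""
    return num_inputs * num_outputs + num_outputs

# Assumed feature-vector sizes (not taken from the notebook; check with model.summary())
feature_dims = {"resnet50v2": 2048, "efficientnetb0": 1280, "mobilenetv2": 1280}
num_classes = 10

trainable_params = {name: dense_param_count(dim, num_classes)
                    for name, dim in feature_dims.items()}
print(trainable_params)
```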

#### *Compile and fit model with any callbacks*

Compile the model by specifying the loss function, optimizer, and metrics. Then, fit the model to the training data using the fit method, and specify any relevant callbacks.

In [None]:
# Compile and fit ResNet
resnet_model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
                     optimizer=tf.keras.optimizers.Adam(),
                     metrics=["accuracy"])

resnet_hist = resnet_model.fit(train_data_10_percent, epochs=5,
                               steps_per_epoch=len(train_data_10_percent),
                               validation_data=test_data,
                               validation_steps=len(test_data),
                               callbacks=[create_tb_callback(dir_name="tensorflow_hub",
                                                             experiment_name="resnet50v2"
                                                             )])


# Compile and fit EfficientNet
eff_net_model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
                     optimizer=tf.keras.optimizers.Adam(),
                     metrics=["accuracy"])

eff_net_hist = eff_net_model.fit(train_data_10_percent, epochs=10,
                               steps_per_epoch=len(train_data_10_percent),
                               validation_data=test_data,
                               validation_steps=len(test_data),
                               callbacks=[create_tb_callback(dir_name="tensorflow_hub",
                                                             experiment_name="efficientnetb0"
                                                             )])



# Compile and fit MobileNet
mobilenet_model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
                        optimizer=tf.keras.optimizers.Adam(),
                        metrics=["accuracy"])

mobilenet_hist = mobilenet_model.fit(train_data_10_percent, epochs=10,
                                     steps_per_epoch=len(train_data_10_percent),
                                     validation_data=test_data,
                                     validation_steps=len(test_data))

#### *Visualize results*

Visualize the results of the trained model, such as accuracy and loss, using appropriate plotting functions.

In [None]:
# See model summary (swap in any of the models created above)
resnet_model.summary()



# Plot the loss curves
def plot_loss_curves(history):
  """
  Plots the loss curves from the training and validation models from the history
  attribute of a trained model.
  """

  accuracy = history.history["accuracy"]
  val_accuracy = history.history["val_accuracy"]

  loss = history.history["loss"]
  val_loss = history.history["val_loss"]

  epochs = range(len(history.history["loss"]))

  # Plot loss
  plt.plot(epochs, loss, label="training_loss")
  plt.plot(epochs, val_loss, label="val_loss")
  plt.title("Loss")
  plt.xlabel("Epochs")
  plt.grid(False)
  plt.legend()

  # Plot the accuracy
  plt.figure()
  plt.plot(epochs, accuracy, label="training_accuracy")
  plt.plot(epochs, val_accuracy, label="val_accuracy")
  plt.title("Accuracy")
  plt.xlabel("Epochs")
  plt.grid(False)
  plt.legend();



# View specific model layers and weights (swap in any of the models created above)
resnet_model.layers[0].weights



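# View and compare model histories with TensorBoard
# (TensorBoard.dev, which the "tensorboard dev" commands below rely on, has since been shut down;
# to inspect the logs locally in a notebook use: %load_ext tensorboard, then %tensorboard --logdir ./tensorflow_hub/)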
!tensorboard dev upload --logdir ./tensorflow_hub/ \
  --name "EfficientNetB0 vs. ResNet50v2" \
  --description "Comparing two different TF Hub feature extraction architectures using 10% of the training data." \
  --one_shot

# < ADDITIONAL TENSORBOARD INFO > #

# See what TensorBoard experiments you have
!tensorboard dev list

# Deleting an experiment
!tensorboard dev delete --experiment_id [paste exp id]

# Confirm deletion by rechecking the experiments list
!tensorboard dev list

#### *Review callback data*

Review the data generated by the callbacks, such as loss and accuracy over epochs, to gain insights into the training process and identify potential issues.
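
One possible way to review the logged values programmatically is sketched below. It assumes a dictionary shaped like the history attribute returned by fit (lists of per-epoch values); the helper name and the numbers are hypothetical.

```python
def summarize_history(history_dict):
    """Find the best validation epoch and the final train/validation loss gap."""
    val_loss = history_dict["val_loss"]
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_epoch": best_epoch,
        "final_gap": val_loss[-1] - history_dict["loss"][-1],
        "val_loss_rising": val_loss[-1] > val_loss[best_epoch],
    }

# Hypothetical per-epoch losses
history_dict = {"loss": [1.2, 0.8, 0.5, 0.3, 0.2],
                "val_loss": [1.1, 0.9, 0.8, 0.85, 0.95]}

summary = summarize_history(history_dict)
print(summary)
```

A training loss that keeps falling while the validation loss rises after its best epoch, as in these made-up numbers, is the usual sign of overfitting.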

## *(5) Transfer Learning - Fine-Tuning*

#### *Import and preprocess the data*

In [None]:
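# Get the helper functions used in this section
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
from helper_functions import unzip_data, walk_through_dir, create_tensorboard_callback

# Import 10% of the training data
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_10_percent.zip

unzip_data("10_food_classes_10_percent.zip") # from helper_functions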



# Check out how many images and subdirectories are in the dataset
walk_through_dir("10_food_classes_10_percent")

# Create training and test directory paths
train_dir = "10_food_classes_10_percent/train"
test_dir = "10_food_classes_10_percent/test"

*(a) Load the images with **tf.keras.preprocessing.image_dataset_from_directory** or a similar method.*

In [None]:
IMG_SIZE = (224, 224)
BATCH_SIZE = 32

train_data_10_percent = tf.keras.preprocessing.image_dataset_from_directory(directory=train_dir,
                                                                            image_size=IMG_SIZE,
                                                                            label_mode="categorical",
                                                                            batch_size=BATCH_SIZE)

test_data = tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir,
                                                                            image_size=IMG_SIZE,
                                                                            label_mode="categorical",
                                                                            batch_size=BATCH_SIZE)

*(b) Preprocess the data by rescaling, resizing, and possibly applying data augmentation, either with Keras preprocessing layers (as in the cell that follows) or with **tf.keras.preprocessing.image.ImageDataGenerator***

In [None]:
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

# Create data augmentation stage with horizontal flipping, rotations, zooms, etc.
data_augmentation = keras.Sequential([
    preprocessing.RandomFlip("horizontal"),
    preprocessing.RandomRotation(0.2),
    preprocessing.RandomZoom(0.2),
    preprocessing.RandomHeight(0.2),
    preprocessing.RandomWidth(0.2)
    # preprocessing.Rescaling(1./255) # keep for models like ResNet50v2 (Effnet has this built in)
], name = "data_augmentation")
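
To make concrete what a horizontal flip does to the underlying pixel array, here is a tiny NumPy sketch with a made-up 2x3 single-channel image. The Keras layer applies the flip randomly and only during training; this sketch shows the deterministic operation itself.

```python
import numpy as np

# Hypothetical image: height=2, width=3, channels=1
image = np.arange(2 * 3 * 1).reshape(2, 3, 1)

# Reverse the width axis to mirror the image left-to-right
flipped = image[:, ::-1, :]

print(image[:, :, 0])
print(flipped[:, :, 0])
```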

*(c) Split the data into training and validation sets. (See Step (a) if dealing with images)*

#### *Choose a pre-trained model*

*(a) Choose a pre-trained model from tf.keras.applications or TensorFlow Hub that is suitable for the problem at hand.*

*(b) Import the pre-trained model and set its weights to be non-trainable.* 

In [None]:
# 1. Create the base model with tf.keras.applications
base_model = tf.keras.applications.EfficientNetB0(include_top=False)

# 2. Freeze the base model (so the underlying pretrained patterns aren't updated)
base_model.trainable = False

#### *Build the fine-tuning model*

*(a) Create a new **tf.keras.Sequential** or **tf.keras.Model** that includes the pre-trained model as a layer.*

*(b) Add new trainable layers on top of the pre-trained model to adapt it to the new problem.*

*(c) Compile the model with an appropriate loss function, optimizer, and metrics.*

In [None]:
# 3. Create inputs into our model
inputs = tf.keras.layers.Input(shape=(224,224,3), name="input_layer")

# 4. If using a model like ResNet50v2, you will need to normalize inputs (EfficientNet has rescaling built in, so this stays commented out here)
# x = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)(inputs)

# 5. Pass the inputs to the base_model
x = base_model(inputs)
print(f"Shape after passing inputs through base model: {x.shape}")

# 6. Average pool the outputs of the base_model (aggregate all the most important information, reduce computations)
x = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)
print(f"Shape after GlobalAveragePooling2D: {x.shape}")

# 7. Create the output activation layer
outputs = tf.keras.layers.Dense(10, activation="softmax", name="output_layer")(x)

# 8. Combine the inputs with the outputs into a model
model = tf.keras.Model(inputs, outputs)

# 9. Compile the model
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# 10. Fit the model and save its history
history = model.fit(train_data_10_percent, epochs=5,
                        steps_per_epoch=len(train_data_10_percent),
                        validation_data=test_data,
                        validation_steps=int(0.25 * len(test_data)),
                        callbacks=[create_tensorboard_callback(dir_name="transfer_learning",
                                                               experiment_name="10_percent_feature_extraction")])
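
The effect of GlobalAveragePooling2D on tensor shapes can be mimicked in NumPy. This sketch assumes the base model returns a (1, 7, 7, 1280) feature map for one 224x224 image, which is what EfficientNetB0 without its top typically produces; compare with the shapes printed by the cell above. The values here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder for the base model output: (batch, height, width, channels)
feature_maps = rng.random((1, 7, 7, 1280))

# Average over the two spatial axes, leaving one value per channel
pooled = feature_maps.mean(axis=(1, 2))

print(feature_maps.shape)
print(pooled.shape)
```

Each of the 1280 channels is collapsed to a single number, which is why the Dense output layer receives a flat feature vector per image.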

#### *Fine-tune the model*

*(a) Train the model on the new data, possibly using a learning rate scheduler, early stopping, or other callbacks.*

*(b) Gradually unfreeze some of the layers in the pre-trained model and continue training to improve performance.*

In [None]:
# View layers
model.layers



# Check if trainable
for layer in model.layers:
    print(layer, layer.trainable)
    
    
    
# To begin fine-tuning, set the desired layers of base_model to be trainable
base_model.trainable = True

# Freeze the other layers
for layer in base_model.layers[:-10]:
    layer.trainable = False
    
# Recompile after adjusting layers
model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), # typically, you should lower learning rate by 10x when fine-tuning
              metrics=["accuracy"])

# Confirm changes were successful
for layer_number, layer in enumerate(base_model.layers):
  print(layer_number, layer.name, layer.trainable)
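
The slicing used to keep only the last 10 layers trainable can be checked in isolation. The sketch below uses a hypothetical stand-in class instead of real Keras layers, so it runs without TensorFlow.

```python
class FakeLayer:
    """Hypothetical stand-in for a Keras layer with a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True  # mirrors base_model.trainable = True

# Pretend base model with 25 layers
fake_layers = [FakeLayer(f"layer_{i}") for i in range(25)]

# Freeze everything except the last 10 layers
for layer in fake_layers[:-10]:
    layer.trainable = False

trainable_names = [layer.name for layer in fake_layers if layer.trainable]
print(trainable_names)
```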

#### *Evaluate and visualize the results*

*(a) Evaluate the performance of the fine-tuned model on the validation set using accuracy, precision, recall, F1-score, or other appropriate metrics.*

*(b) Visualize the training and validation accuracy and loss over time to diagnose overfitting or underfitting.*

*(c) Visualize some of the model's predictions to gain insights into its behavior.*

#### *Consider further improvements*

*(a) Experiment with different pre-trained models or different architectures for the fine-tuning layers.*

*(b) Increase the amount of data, apply more aggressive data augmentation, or use techniques such as transfer learning with **data synthesis**.*

In [None]:
# Example of how to incorporate Data Synthesis into a model...
# NOTE: a feature extractor cannot decode feature vectors back into images, so this
# sketch synthesizes new *feature vectors* (feature-space augmentation) that can be
# used as extra training data for a classifier head.

import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import os

# Define the paths to the input and output directories
input_dir = "/path/to/input/images"
output_dir = "/path/to/output/features"
os.makedirs(output_dir, exist_ok=True)

# Load the pre-trained feature extractor from TensorFlow Hub and wrap it in a model
model_url = "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/5"
feature_extractor = tf.keras.Sequential([
    hub.KerasLayer(model_url, input_shape=(224, 224, 3), trainable=False)
])

# Define the data generator to read images from the input directory
data_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

# shuffle=False keeps the features in the same order as input_data.classes
input_data = data_generator.flow_from_directory(input_dir,
                                                 target_size=(224, 224),
                                                 batch_size=32,
                                                 shuffle=False,
                                                 class_mode="categorical")

# Generate the feature vectors for the input images
features = feature_extractor.predict(input_data)
labels = input_data.classes

# Synthesize new feature vectors by adding Gaussian noise to the real ones
noise = np.random.normal(loc=0, scale=0.1, size=features.shape)
synthetic_features = features + noise

# Save the synthesized features and their labels to the output directory
np.save(os.path.join(output_dir, "synthetic_features.npy"), synthetic_features)
np.save(os.path.join(output_dir, "synthetic_labels.npy"), labels)

*(c) Fine-tune the model for longer or with a different learning rate schedule to achieve better performance.*

*(d) Use regularization techniques such as dropout or **batch normalization** to improve the model's generalization ability.*

In [None]:
# Example of regularization using Dropout or Batch Normalization...

import tensorflow as tf

# Define the model architecture with regularization, dropout, and batch normalization
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1), kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
])

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

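# Train the model (train_images/train_labels and test_images/test_labels are assumed to be
# already-loaded 28x28x1 image arrays with integer class labels)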
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
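
For intuition about what a Dropout layer does during training, here is a minimal NumPy sketch of inverted dropout. The function name and the input values are illustrative; the Keras layer additionally switches itself off at inference time.

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero out a random fraction of values and rescale the rest by 1 / (1 - rate)."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(1000)

dropped = dropout(activations, rate=0.2, rng=rng)
print(dropped[:10])
print(dropped.mean())
```

Rescaling the surviving activations keeps their expected sum roughly unchanged, so the following layers see inputs on a similar scale with and without dropout.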


## *(6) Natural Language Processing*

#### *Import data and any helper functions*

In [None]:
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
    
from helper_functions import create_tensorboard_callback, plot_loss_curves, compare_historys, unzip_data


import pandas as pd

train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")
train_df.head(5)

# To shuffle df...
train_df_shuffled = train_df.sample(frac=1, random_state=42)

#### *Visualize and preprocess the data*

In [None]:
# How many examples of each class are there?
train_df.target.value_counts()
len(train_df), len(test_df)



# Visualize some random text training samples
import random
random_index = random.randint(0, len(train_df)-5)
for row in train_df_shuffled[["text", "target"]][random_index:random_index+5].itertuples():
  _, text, target = row
  print(f"Target: {target}", "(real disaster)" if target > 0 else "(not real disaster)")
  print(f"Text:\n{text}\n")
  print("---\n")
    
    
    
# Split into training and validation
from sklearn.model_selection import train_test_split

train_sentences, val_sentences, train_labels, val_labels = train_test_split(train_df_shuffled["text"].to_numpy(),
                                                                              train_df_shuffled["target"].to_numpy(),
                                                                              test_size=0.1, #10% for val split
                                                                              random_state=42)
# Check the lengths
len(train_sentences), len(train_labels), len(val_sentences), len(val_labels)

(a) Clean and preprocess the text data (e.g., remove **stop words**, punctuation, lowercasing, **stemming**, **lemmatization**)

In [None]:
# 1. REMOVING STOP WORDS WITH JUST PYTHON AND TENSORFLOW

import tensorflow as tf
from tensorflow.keras.preprocessing.text import text_to_word_sequence
from tensorflow.keras.preprocessing.sequence import pad_sequences

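# TensorFlow ships no stop-word list. As a rough stand-in (an assumption, tune the cutoff),
# treat the 50 most frequent words of the IMDb word index (ranked by frequency) as stop words.
word_index = tf.keras.datasets.imdb.get_word_index()
stop_words = {word for word, rank in word_index.items() if rank <= 50}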

def remove_stopwords(text):
    words = text_to_word_sequence(text)
    filtered_words = [word for word in words if word not in stop_words]
    return ' '.join(filtered_words)




# 2. REMOVING PUNCTUATION AND LOWERCASING

import string

def remove_punctuation(text):
    return text.translate(str.maketrans('', '', string.punctuation))

def lowercase(text):
    return text.lower()



# 3. STEMMING WITH NLTK LIBRARY & VANILLA TENSORFLOW

from nltk.stem import PorterStemmer # using nltk library (1)

stemmer = PorterStemmer()

def stem_text(text):
    words = text.split()
    stemmed_words = [stemmer.stem(word) for word in words]
    return ' '.join(stemmed_words)

import tensorflow as tf # TensorFlow has no built-in stemmer, so this variant only uses tf.strings for splitting (2)

def stem_text_tf(text):
    words = tf.strings.split(text) # split on whitespace
    # stemmer is the NLTK PorterStemmer created in (1)
    stemmed_words = [stemmer.stem(word.numpy().decode('utf-8')) for word in words]
    return tf.constant(' '.join(stemmed_words))



# 4. LEMMATIZATION WITH NLTK LIBRARY & TENSORFLOW

import nltk # using nltk library (1)
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

def lemmatize_text(text):
    words = text.split()
    lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
    return ' '.join(lemmatized_words)


import tensorflow as tf # combined NLTK pipeline: lowercase, strip punctuation, remove stop words, stem, lemmatize (2)
from tensorflow.keras.preprocessing.text import text_to_word_sequence
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize
import re
import string

nltk.download('stopwords') # Download stop words
nltk.download('punkt') # Tokenizer data required by word_tokenize
stop_words = set(stopwords.words('english'))

stemmer = PorterStemmer() # Initialize stemmer and lemmatizer
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    text = text.lower() # Lowercase the text
    
    # Remove punctuations
    text = text.translate(str.maketrans('', '', string.punctuation))
    words = word_tokenize(text) # Tokenize the text
    
    # Remove stop words
    words = [word for word in words if word.casefold() not in stop_words]
    
    # Stem the words
    words = [stemmer.stem(word) for word in words]
    
    # Lemmatize the words
    words = [lemmatizer.lemmatize(word) for word in words]
    
    # Join the words to form a preprocessed text
    preprocessed_text = ' '.join(words)
    
    return preprocessed_text

(b) Split the text into **tokens** using **tokenization** (e.g., using nltk.tokenize, spaCy, or TensorFlow Tokenizer)

In [None]:
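# TextVectorization with default parameters (TENSORFLOW)
from tensorflow.keras.layers import TextVectorization # TF 2.6+; older versions: tensorflow.keras.layers.experimental.preprocessing

text_vectorizer = TextVectorization(max_tokens=None, # how many words in our vocab (None = no cap; the count includes padding and OOV tokens)
                                    standardize="lower_and_strip_punctuation",
                                    split="whitespace",
                                    ngrams=None, # create groups of n words
                                    output_mode="int",
                                    output_sequence_length=None, # how long of sequences
                                    pad_to_max_tokens=False) # only relevant for multi_hot, count and tf_idf output modes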

# Setup text vectorization variables
max_vocab_length = 10000 # max words in vocabulary
max_length = 15 # max length of a sequence (words in a tweet)

text_vectorizer = TextVectorization(max_tokens=max_vocab_length,
                                    output_mode="int",
                                    output_sequence_length=max_length)

# Fit the text vectorizer to the training text
text_vectorizer.adapt(train_sentences)



# Tokenization using NLTK library
import nltk
nltk.download('punkt')

text = "This is a sample sentence for tokenization. Let's see how it works!"
tokens = nltk.word_tokenize(text)
print(tokens)
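
What an integer-output text vectorizer does can be approximated in plain Python. The sketch below is not the Keras implementation. It assumes index 0 is reserved for padding and index 1 for out-of-vocabulary words, with the remaining slots filled by the most frequent words; the helper names and sentences are made up.

```python
from collections import Counter

def build_vocab(sentences, max_tokens):
    """Padding token, OOV token, then the most frequent words."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    most_common = [word for word, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + most_common

def vectorize(sentence, vocab, sequence_length):
    """Map words to vocabulary indices, then truncate or right-pad with 0."""
    index = {word: i for i, word in enumerate(vocab)}
    ids = [index.get(word, 1) for word in sentence.lower().split()]
    ids = ids[:sequence_length]
    return ids + [0] * (sequence_length - len(ids))

sentences = ["flood in the city", "fire in the city", "the city is calm", "the storm"]
vocab = build_vocab(sentences, max_tokens=5)
encoded = vectorize("Fire in the village", vocab, sequence_length=6)

print(vocab)
print(encoded)
```

Words outside the capped vocabulary ("fire" and "village" here) all collapse to the same OOV index, which is why max_vocab_length matters for rare but informative words.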

(c) Convert the data into numerical vectors using:
* **Embedding** (e.g., using **tf.keras.layers.Embedding**)
* **One-hot encoding** (e.g., using **tf.keras.preprocessing.text.one_hot**, which despite its name hashes each word to an integer index rather than producing one-hot vectors)

In [None]:
# Embedding
from tensorflow.keras import layers

embedding = layers.Embedding(input_dim=max_vocab_length,
                             output_dim=128,
                             embeddings_initializer="uniform",
                             input_length=max_length)
embedding

# TEST: Get a random sentence from the training set
random_sentence = random.choice(train_sentences)
print(f"Original text:\n {random_sentence}\
        \n\nEmbedded version:")

# Embed the random sentence (turn it into a dense vector of fixed size)
sample_embed = embedding(text_vectorizer([random_sentence]))
sample_embed



# One-hot encoding labels
import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder
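one_hot_encoder = OneHotEncoder(sparse_output=False) # scikit-learn >= 1.2; older versions use sparse=False
train_labels_one_hot = one_hot_encoder.fit_transform(train_labels.reshape(-1, 1)) # train_labels/val_labels come from train_test_split above
val_labels_one_hot = one_hot_encoder.transform(val_labels.reshape(-1, 1)) # reuse the encoder fitted on the training labels
test_labels_one_hot = one_hot_encoder.transform(test_df["target"].to_numpy().reshape(-1, 1)) # only if test_df has a "target" column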

# check what labels look like
train_labels_one_hot



# Label encode labels
# Extract labels ("target") and encode them into integers
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
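train_labels_encoded = label_encoder.fit_transform(train_labels) # train_labels/val_labels come from train_test_split above
val_labels_encoded = label_encoder.transform(val_labels) # transform only: reuse the mapping learned from the training labels
test_labels_encoded = label_encoder.transform(test_df["target"].to_numpy()) # only if test_df has a "target" column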

# Check what training labels look like
train_labels_encoded
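
The relationship between integer labels and one-hot labels can be shown with NumPy alone (the label values below are made up). As a rule of thumb, one-hot labels pair with CategoricalCrossentropy and integer labels with SparseCategoricalCrossentropy.

```python
import numpy as np

# Hypothetical integer class labels for 4 samples and 3 classes
labels = np.array([0, 2, 1, 2])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]

# argmax converts one-hot rows back to integer labels
recovered = np.argmax(one_hot, axis=1)

print(one_hot)
print(recovered)
```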

(d) Pad the sequences to make them of the same length (e.g., using **tf.keras.preprocessing.sequence.pad_sequences**)

In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import layers

# Assume sequences is a list of integer sequences
sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# Set max_length to the desired maximum length of the sequences
max_length = 4

# Pad the sequences to make them of the same length
padded_sequences = pad_sequences(sequences, maxlen=max_length)

# Embedding
embedding = layers.Embedding(input_dim=max_vocab_length,
                             output_dim=128,
                             embeddings_initializer="uniform",
                             input_length=max_length)

# Embed the padded sequences
embedded_sequences = embedding(padded_sequences)
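
By default pad_sequences pads and truncates at the front of each sequence (padding="pre", truncating="pre"). The plain-Python sketch below mirrors that default on the same example sequences; the function name is hypothetical.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep only the last maxlen items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # truncate from the front
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
padded = pad_pre(sequences, maxlen=4)
print(padded)
```

Passing padding="post" to pad_sequences would place the zeros at the end instead, which matches how TextVectorization pads its output.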

(e) Split the data into training, validation, and test sets (if you haven't already)

#### *Build and train the model*

(a) Specify the model architecture (e.g., using tf.keras.Sequential or the Keras Functional API)

In [None]:
# Get a baseline model (NOT RECOMMENDED FOR ACTUAL USE IN EXAM)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam, SGD

from helper_functions import create_tensorboard_callback

# Create a directory to save Tensorboard logs
SAVE_DIR = "model_logs"

# Build model with the Functional API
inputs = layers.Input(shape=(1,), dtype=tf.string)
x = text_vectorizer(inputs) # turn input text into numbers
x = embedding(x) # create an embedding of the numerical inputs
#x = layers.GlobalAveragePooling1D()(x) # model didn't work w/o this layer
x = layers.GlobalMaxPool1D()(x) # performs better than average pooling
outputs = layers.Dense(1, activation="sigmoid")(x) # sigmoid for binary output

model = tf.keras.Model(inputs, outputs, name="model_1_dense")
model.summary()

(b) Compile the model (e.g., using tf.keras.Model.compile)

In [None]:
# Compile
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=Adam(),
                metrics=["accuracy"])

(c) Fit the model on the training data (e.g., using tf.keras.Model.fit)

In [None]:
# Fit the model
model_history = model.fit(x=train_sentences,
                              y=train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[create_tensorboard_callback(dir_name=SAVE_DIR,
                                                                     experiment_name="model_1_dense")])

#### *Evaluate the model*

(a) Evaluate the model on the validation and test sets (e.g., using tf.keras.Model.evaluate)

In [None]:
# Evaluate
model.evaluate(val_sentences, val_labels)

# Collect set of prediction probabilities
model_pred_probs = model.predict(val_sentences)
model_pred_probs

# Convert model prediction probabilities to same as label format
model_preds = tf.squeeze(tf.round(model_pred_probs))
model_preds[:20]

# Calculate model results
model_results = get_scores(val_labels, model_preds) # see Non-Specific Tasks for get_scores code

model_results

# Compare, if you want (other_results is the results dictionary of a previously evaluated model)
np.array(list(model_results.values())) > np.array(list(other_results.values()))

print(model_results)
print(other_results)
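
The get_scores helper is defined elsewhere in the notebook. As a hypothetical stand-in, the sketch below shows how accuracy, precision, recall, and F1 could be computed for binary labels in plain Python; the real helper may differ, for example by using scikit-learn with weighted averaging.

```python
def binary_scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels and rounded predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

scores = binary_scores(y_true, y_pred)
print(scores)
```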

(b) Visualize the model's training history (e.g., using matplotlib)

(c) Visualize learned embeddings

In [None]:
# Get the vocabulary from our text vectorizer
words_in_vocab = text_vectorizer.get_vocabulary()
len(words_in_vocab), words_in_vocab[:10]

# Get the weight matrix of embedding layer (numerical representations of each token learned during training)
embed_weights = model.get_layer("embedding").get_weights()[0] # `model` is the dense model built above
embed_weights



# Create embedding files (from TF's word embedding documentation)
import io
out_v = io.open('vectors.tsv', 'w', encoding='utf-8')
out_m = io.open('metadata.tsv', 'w', encoding='utf-8')

for index, word in enumerate(words_in_vocab):
  if index == 0:
    continue # skip 0, it's padding
  vec = embed_weights[index]
  out_v.write('\t'.join([str(x) for x in vec]) + "\n")
  out_m.write(word + "\n")
out_v.close()
out_m.close()

# Download files from colab to open in Projector
try:
  from google.colab import files
  files.download('vectors.tsv')
  files.download('metadata.tsv')
except Exception:
  pass

#### *Tune the model*

(a) Adjust the model hyperparameters (e.g., learning rate, dropout rate, number of layers, number of hidden units)

(b) Try different model architectures (e.g., using convolutional layers or recurrent layers)

In [None]:
# 1. LSTM MODEL

# Create an LSTM model
from tensorflow.keras import layers

# Build model with functional API and print shapes throughout the process
inputs = layers.Input(shape=(1,), dtype="string")

x = text_vectorizer(inputs)
x = embedding(x)

x = layers.LSTM(64, return_sequences=True)(x) # when stacking RNN cells, you must set return_sequences = True
x = layers.LSTM(64)(x)

x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

lstm_model = tf.keras.Model(inputs, outputs, name="LSTM_model")

lstm_model.compile(loss="binary_crossentropy",
                optimizer = tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

lstm_model_history = lstm_model.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences,
                                               val_labels),
                              callbacks=[create_tensorboard_callback(SAVE_DIR,
                                                                     "LSTM_model")])



# 2. GATED RECURRENT UNIT (GRU) MODEL

# Build an RNN using the GRU cell
from tensorflow.keras import layers
inputs = layers.Input(shape=(1,), dtype=tf.string)
x = text_vectorizer(inputs)
x = embedding(x)

x = layers.GRU(64)(x)
# x = layers.GRU(64, return_sequences=True)(x) # to stack RNNs on top of one another
# x = layers.LSTM(32, return_sequences=True)(x)
# x = layers.GRU(64)(x)

# x = layers.Dense(64, activation = "relu")(x)
# x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation = "sigmoid")(x)

gru_model = tf.keras.Model(inputs, outputs, name="model_3")

gru_model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

gru_model_history = gru_model.fit(train_sentences,
                              train_labels,
                              epochs=10,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[create_tensorboard_callback(SAVE_DIR,
                                                                     "GRU_model")])



# 3. BIDIRECTIONAL LSTM MODEL

# Build a bidirectional RNN in TensorFlow
inputs = layers.Input(shape=(1,), dtype="string", name="input_layer")
x = text_vectorizer(inputs)
x = embedding(x)

# x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x) # if multiple
x = layers.Bidirectional(layers.LSTM(64))(x)
outputs = layers.Dense(1, activation ="sigmoid")(x)
bi_lstm_model = tf.keras.Model(inputs, outputs, name="bidirectional_model")

# compile & fit



# 4. CONV1D MODEL

# To test out embedding layer, Conv1D and max pooling...
embedding_test = embedding(text_vectorizer(["this is a test sentence"]))
conv_1d = layers.Conv1D(filters=32,
                        kernel_size=5, # ngram of 5 (looks at 5 words at a time)
                        activation="relu",
                        padding="valid") # output is smaller than input shape
conv_1d_output = conv_1d(embedding_test)
max_pool = layers.GlobalMaxPool1D()
max_pool_output = max_pool(conv_1d_output) # equivalent to get the most important feature

embedding_test.shape, conv_1d_output.shape, max_pool_output.shape

# Build conv1d model with Functional API
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)

x = text_vectorizer(inputs)
x = embedding(x)

x = layers.Conv1D(filters=64,
                  kernel_size=5,
                  activation="relu",
                  padding="valid")(x)
x = layers.GlobalAveragePooling1D()(x)

outputs = layers.Dense(1, activation="sigmoid")(x)
conv1d_model = tf.keras.Model(inputs, outputs, name="conv1d_model")



# 5. TRANSFER LEARNING MODEL (UNIVERSAL SENTENCE ENCODER)

# Create a Keras layer using the USE pretrained layer from TF Hub
import tensorflow_hub as hub
sentence_encoder_layer = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4",
                                        input_shape=[],
                                        dtype=tf.string,
                                        trainable=False,
                                        name="USE")

# Create USE model using sequential API
use__tl_model = tf.keras.Sequential([
    sentence_encoder_layer,
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid", name="output_layer"),
], name="USE_TL_model")
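
The Conv1D experiment in this cell notes that `padding="valid"` gives an output smaller than the input. The arithmetic behind that can be sketched in plain Python. The function name and the sequence length of 15 are assumptions for illustration (the real length depends on the `output_sequence_length` of `text_vectorizer`), and dilation is assumed to be 1.

```python
import math

def conv1d_output_length(seq_len, kernel_size, stride=1, padding="valid"):
    """Length of the sequence axis after a 1D convolution (dilation rate 1)."""
    if padding == "valid":
        # Only positions where the whole kernel fits inside the input count
        return (seq_len - kernel_size) // stride + 1
    if padding in ("same", "causal"):
        # The input is zero-padded, so with stride 1 the length is unchanged
        return math.ceil(seq_len / stride)
    raise ValueError(f"unknown padding: {padding}")

valid_len = conv1d_output_length(15, kernel_size=5, padding="valid")
same_len = conv1d_output_length(15, kernel_size=5, padding="same")
print(valid_len, same_len)  # 11 15
```

The number of filters only sets the size of the last axis, so a `(1, 15, 128)` embedding passed through 32 filters with `"valid"` padding would come out as `(1, 11, 32)` under these assumptions.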

#### *Deploy the model*

(a) Save the trained model (e.g., using tf.keras.Model.save)

In [None]:
# Save your best performing model (HDF5 format)
best_model.save("best_model.h5")

# Alternatively, SavedModel format...
best_model.save("best_model_SavedModel_format")

(b) Load the saved model and make predictions (e.g., using tf.keras.models.load_model and **tf.keras.Model.predict**)

In [None]:
# Load model containing a custom Hub layer (the HDF5 format needs custom_objects)
import tensorflow_hub as hub
loaded_best_model = tf.keras.models.load_model("best_model.h5",
                                            custom_objects={"KerasLayer": hub.KerasLayer})

# How does the loaded model perform? (Confirm it's the same model)
loaded_best_model.evaluate(val_sentences, val_labels)

#### *Improvements*

(a) *Consider how you want your data to look and which format to aim for:*

* It's important to consider the format and structure of the data, especially if you have unstructured text data. You may want to preprocess the data by removing stop words or special characters, or by applying stemming or lemmatization to reduce the vocabulary size and improve model performance. Additionally, you may want to consider encoding the data in different formats, such as one-hot encoding, **bag-of-words**, or **term frequency-inverse document frequency (TF-IDF)**, to capture different aspects of the text data.

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Create tokenization and modelling pipeline
model = Pipeline([
    ("tfidf", TfidfVectorizer()), # convert words to numbers using tf-idf
    ("clf", MultinomialNB()) # model the text
])

# Fit the pipeline to the training data
model.fit(train_sentences, train_labels)
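
To make the difference between raw bag-of-words counts and TF-IDF weights concrete, here is a small pure-Python sketch. It assumes the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1`, which is what scikit-learn's `TfidfVectorizer` uses by default before L2-normalising each row. The three sentences are made up, and tokenization is a plain whitespace split, which is simpler than what `TfidfVectorizer` does.

```python
import math
from collections import Counter

# Hypothetical mini-corpus
docs = [
    "the storm hit the coast",
    "the match was a storm of goals",
    "sunny day on the coast",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents does each token appear?
doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

def tfidf_weights(doc_tokens):
    """Raw term count times smoothed IDF (no row normalisation)."""
    counts = Counter(doc_tokens)  # this alone is the bag-of-words encoding
    return {
        tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
        for tok, count in counts.items()
    }

weights = tfidf_weights(tokenized[0])
print(weights)
```

"the" appears in every document, so its IDF is 1 and its weight stays at its raw count, while "hit" appears in one document only and gets boosted relative to "storm" and "coast".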

(b) *Create char-level tokenizer:*

*  In addition to using a word-level tokenizer, you can also consider creating a **character-level tokenizer** to capture character-level information in the text. 
* This involves splitting sentences into individual characters, finding the average example char-length, and checking the distribution of character lengths to identify the range that covers around 95% of the sequences. You'll also need to create a list of all keyboard characters, create a character tokenizer instance, and test it on the data.

In [None]:
# Create a function to split sentences into characters
def split_chars(text):
  return " ".join(list(text))

# Test splitting a non-character-level sequence into characters
split_chars(random_sentence)



# Split sequence-level data splits into char-level data splits
train_chars = [split_chars(sentence) for sentence in train_sentences]
val_chars = [split_chars(sentence) for sentence in val_sentences]
test_chars = [split_chars(sentence) for sentence in test_sentences]



# What's the average character length?
char_len = [len(sentence) for sentence in train_sentences]
mean_char_len = np.mean(char_len)
mean_char_len



# Check distribution of sequences at the character level
import matplotlib.pyplot as plt
plt.hist(char_len, bins=7);



# What character length covers 95% of sequences?
output_seq_char_len = int(np.percentile(char_len, 95))
output_seq_char_len



# Get all keyboard characters with Python
import string
alphabet = string.ascii_lowercase + string.digits + string.punctuation
alphabet



# Create char-level token vectorizer instance
NUM_CHAR_TOKENS = len(alphabet) + 2 # add 2 for space and [OOV/UNK] tokens
char_vectorizer = TextVectorization(max_tokens=NUM_CHAR_TOKENS,
                                    output_sequence_length=output_seq_char_len,
                                    name="char_vectorizer")



# Adapt character vectorizer to training chars
char_vectorizer.adapt(train_chars)



# Check character vocab stats
char_vocab = char_vectorizer.get_vocabulary()
print(f"Number of different characters in character vocab: {len(char_vocab)}")
print(f"5 most common characters: {char_vocab[:5]}")
print(f"5 least common characters: {char_vocab[-5:]}")



# Test character vectorizer
import random
random_train_chars = random.choice(train_chars)
print(f"Charified text:\n {random_train_chars}")
print(f"\nLength of random_train_chars: {len(random_train_chars.split())}")

vectorized_chars = char_vectorizer([random_train_chars])
print(f"\nVectorized chars:\n {vectorized_chars}")
print(f"\nLength of vectorized chars: {len(vectorized_chars[0])}")

(c) *Create char-level embedding layer:*

* After creating a character-level tokenizer, you can create a **character-level embedding layer** to map each character to a vector representation. This can be useful in capturing morphology and spelling variations in the text.

In [None]:
# Create a character level embedding layer
char_embed = layers.Embedding(input_dim=len(char_vocab),
                                    output_dim=25,
                                    mask_zero=True,
                                    name="char_embed")



# Test character embedding layer
print(f"Charified text:\n {random_train_chars}\n")
char_embed_example = char_embed(char_vectorizer([random_train_chars]))
print(f"Embedded chars after vectorization and embedding:\n {char_embed_example}\n")
print(f"Character embedding shape: {char_embed_example.shape}")

(d) *Consider a Conv1D model for this (**see examples above**):*

* When working with character-level embeddings, you can consider using a **Conv1D** model instead of an **LSTM** or **GRU**. This is because a Conv1D model can learn to capture local patterns in the sequence, which may be more relevant for character-level information.

In [None]:
# EXAMPLE CONV1D CHAR-LEVEL EMBEDDING MODEL

# Build Conv1D model for character-level embeddings
inputs = layers.Input(shape=(1,), dtype=tf.string)

vectorizer = char_vectorizer(inputs)
embeddings = char_embed(vectorizer)

x = layers.Conv1D(filters=64,
                  kernel_size=5,
                  padding="same",
                  activation="relu")(embeddings)
x = layers.GlobalMaxPooling1D()(x)

outputs = layers.Dense(num_classes, activation="softmax")(x)
char_level_model = tf.keras.Model(inputs, outputs, name="char_level_model")

(e) *Combine token + char embedding layers with **layers.concatenate**:*

* You can combine both the token-level and character-level embeddings by using the concatenate layer in Keras. This allows the model to capture both word-level and character-level information in the text.

In [None]:
# EXAMPLE TOKEN + CHAR EMBEDDINGS MODEL

# 1. Setup token inputs/model
token_inputs = layers.Input(shape=[], dtype=tf.string, name="token_input")
token_embeddings = tf_hub_embedding_layer(token_inputs)
token_output = layers.Dense(128, activation="relu")(token_embeddings)
token_model = tf.keras.Model(inputs=token_inputs,
                             outputs=token_output)

# 2. Setup char inputs/model
char_inputs = layers.Input(shape=(1,), dtype=tf.string, name="char_input")
char_vectors = char_vectorizer(char_inputs)
char_embeddings = char_embed(char_vectors)
char_bi_lstm = layers.Bidirectional(layers.LSTM(25))(char_embeddings) # bi-LSTM shown in Figure 1 of https://arxiv.org/pdf/1612.05251.pdf
char_model = tf.keras.Model(inputs=char_inputs,
                            outputs=char_bi_lstm)

# 3. Concatenate token and char inputs (create hybrid token embedding)
token_char_concat = layers.Concatenate(name="token_char_hybrid")([token_model.output, 
                                                                  char_model.output])

# 4. Create output layers - addition of dropout discussed in 4.2 of https://arxiv.org/pdf/1612.05251.pdf
combined_dropout = layers.Dropout(0.5)(token_char_concat)
combined_dense = layers.Dense(200, activation="relu")(combined_dropout) # slightly different to Figure 1 due to different shapes of token/char embedding layers
final_dropout = layers.Dropout(0.5)(combined_dense)
output_layer = layers.Dense(num_classes, activation="softmax")(final_dropout)

# 5. Construct model with char and token inputs
model_4 = tf.keras.Model(inputs=[token_model.input, char_model.input],
                         outputs=output_layer,
                         name="model_4_token_and_char_embeddings")

(f) *Combine chars and tokens into a dataset using tf.data.Dataset (.from_tensor_slices) + Prefetch/batch:*

* After combining the token-level and character-level embeddings, you can create a dataset using the **tf.data.Dataset.from_tensor_slices** method and preprocess it by **batching** and **prefetching** the data to improve training speed and memory efficiency.

In [None]:
# Create char-level dataset
train_char_dataset = tf.data.Dataset.from_tensor_slices((train_chars, train_labels_one_hot)).batch(32).prefetch(tf.data.AUTOTUNE)
val_char_dataset = tf.data.Dataset.from_tensor_slices((val_chars, val_labels_one_hot)).batch(32).prefetch(tf.data.AUTOTUNE)
test_char_dataset = tf.data.Dataset.from_tensor_slices((test_chars, test_labels_one_hot)).batch(32).prefetch(tf.data.AUTOTUNE)

(g) *Create positional embeddings:*

* To capture the positional information of the text, you can create **positional embeddings** that represent the order of the words or characters in the sequence. This involves checking the distribution of the "line_number" column, creating one-hot tensors of the "line_number" column, and checking the distribution and coverage of the "total_lines" at different values.

In [None]:
import tensorflow as tf

# Load the dataset
dataset = ...

# Determine the maximum value for line_number
max_line_number = max(dataset['line_number'])

# Create one-hot tensors for line_number
line_number_onehot = tf.one_hot(dataset['line_number'] - 1, depth=max_line_number)

# Determine the maximum value for total_lines
max_total_lines = max(dataset['total_lines'])

# Create one-hot tensors for total_lines
total_lines_onehot = tf.one_hot(dataset['total_lines'] - 1, depth=max_total_lines)

# Concatenate the one-hot tensors
positional_embedding = tf.concat([line_number_onehot, total_lines_onehot], axis=-1)

(h) *Apply label smoothing:*

* **Label smoothing** is a regularization technique that involves replacing the hard label targets with a smoothed distribution. This can improve the generalization of the model and prevent overfitting, especially when working with small datasets or highly imbalanced classes.

***(Visit: https://www.pyimagesearch.com/2019/12/30/label-smoothing-with-keras-tensorflow-and-deep-learning/ for more info.)***

In [None]:
import tensorflow as tf

def label_smoothing_loss(y_true, y_pred, epsilon=0.1):
    """
    Computes the cross-entropy loss with label smoothing.

    Args:
        y_true: Ground truth labels, tensor of shape (batch_size, num_classes).
        y_pred: Predicted labels, tensor of shape (batch_size, num_classes).
        epsilon: Smoothing factor, float between 0 and 1.

    Returns:
        The label-smoothed cross-entropy loss.
    """
    num_classes = y_pred.shape[-1]
    y_true_smooth = (1 - epsilon) * y_true + epsilon / num_classes
    return tf.keras.losses.categorical_crossentropy(y_true_smooth, y_pred)

# Example usage
model.compile(optimizer='adam', loss=label_smoothing_loss)
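
A quick numeric check of what the smoothing step does to a one-hot label, sketched in NumPy with the same formula as `label_smoothing_loss` above. The helper name `smooth_labels` and the three-class example are assumptions for illustration.

```python
import numpy as np

def smooth_labels(y_true, epsilon=0.1):
    """Move a share `epsilon` of the probability mass evenly across all classes."""
    num_classes = y_true.shape[-1]
    return (1.0 - epsilon) * y_true + epsilon / num_classes

y_true = np.array([[0.0, 0.0, 1.0]])  # hard one-hot label for class 2
y_smooth = smooth_labels(y_true, epsilon=0.1)

print(y_smooth)               # roughly [[0.0333 0.0333 0.9333]]
print(y_smooth.sum(axis=-1))  # still sums to 1
```

Keras also exposes this directly: `tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)` applies the same kind of smoothing without a custom loss function.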

## *(7) Time Series - Compiled Process*

#### *Explore the data*

This step includes not only checking the number of samples and formatting, but also understanding the **time range**, **frequency**, and any missing values or anomalies in the data. You may need to perform data cleaning, **imputation**, or normalization to prepare the data for modeling.

In [None]:
# Importing the data

df = pd.read_csv('/content/BTC_USD_2013-10-01_2021-05-18-CoinDesk.csv',
                parse_dates=["Date"],
                index_col=["Date"]) # parse date column as datetime

df["Price (Example)"].plot(kind='line')

prices = pd.DataFrame(df["Price (Example)"]).rename(columns={"Price (Example)": "Price"})



# Alternative: importing time series data with Python's CSV module (this rebinds `prices` to a plain list)
import csv
from datetime import datetime

timesteps = []
prices = []
with open("/content/BTC_USD_2013-10-01_2021-05-18-CoinDesk.csv", "r") as f:
  csv_reader = csv.reader(f, delimiter=",")
  next(csv_reader) # skip first line (titles)
  for line in csv_reader:
    timesteps.append(datetime.strptime(line[1], "%Y-%m-%d")) # get dates from strs
    prices.append(float(line[2])) # get closing price as a float


# Visualize more clearly with matplotlib (expects `prices` to be the DataFrame from the pandas import above)
import matplotlib.pyplot as plt
prices.plot(figsize=(10, 7))
plt.ylabel("Price")
plt.title("Price of Item from XX/XX/XX to XX/XX/XX", fontsize=16)
plt.legend(fontsize=14)



# Splitting the data the RIGHT WAY (chronologically, without shuffling)
split_size = int(0.8 * len(prices)) # 80% train, 20% test

# Create train and test data splits (everything before the split point is train)
X_train, y_train = timesteps[:split_size], prices[:split_size]

X_test, y_test = timesteps[split_size:], prices[split_size:]

len(X_train), len(X_test), len(y_train), len(y_test)



# To make dataset more performant --> tf.data API
train_features_dataset = tf.data.Dataset.from_tensor_slices(X_train)
train_labels_dataset = tf.data.Dataset.from_tensor_slices(y_train)

test_features_dataset = tf.data.Dataset.from_tensor_slices(X_test)
test_labels_dataset = tf.data.Dataset.from_tensor_slices(y_test)

# Combine labels and features by zipping together -> (features, labels)
train_dataset = tf.data.Dataset.zip((train_features_dataset, train_labels_dataset))
test_dataset = tf.data.Dataset.zip((test_features_dataset, test_labels_dataset))

# Batch and prefetch
BATCH_SIZE = 1024
train_dataset = train_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
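
The exploration notes above mention missing values and **imputation**, but the cell does not show it. A minimal NumPy sketch of linear interpolation over gaps follows. The short series is made up, and it assumes evenly spaced timesteps with known values at both ends.

```python
import numpy as np

# Hypothetical price series with gaps marked as NaN
series = np.array([10.0, np.nan, 12.0, 13.0, np.nan, np.nan, 16.0])

missing = np.isnan(series)
positions = np.arange(len(series))

# Fill each gap by interpolating linearly between its known neighbours
filled = series.copy()
filled[missing] = np.interp(positions[missing], positions[~missing], series[~missing])

print(filled)  # [10. 11. 12. 13. 14. 15. 16.]
```

With a pandas Series, `interpolate()` or `ffill()` serve the same purpose. Whichever method is used, it is worth imputing before windowing so that no window contains NaN values.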

#### *Create windowed dataset(s)*

This involves creating sliding windows of fixed length and step size from the time series data. The **window size** and **step size** should be carefully chosen based on the nature of the problem and the time series characteristics, such as the **seasonality**, trend, and noise. You may also want to add additional features to the windowed data, such as **lagged values**, **moving averages**, or **Fourier transforms**.

In [None]:
# Setup global variables for WINDOW and HORIZON
HORIZON = 1
WINDOW = 7
WINDOW_SIZE = WINDOW # alias: several functions below refer to WINDOW_SIZE

# Functionize windowing of labels
def get_labelled_windows(x, horizon=HORIZON):
    """
    Creates labels for windowed dataset.

      E.g., if horizon = 1
      Input: [0, 1, 2, 3, 4, 5, 6, 7] -> Output: ([0, 1, 2, 3, 4, 5, 6], [7])
    """
    
    return x[:, :-horizon], x[:, -horizon:]

# Test out the function
test_window, test_label = get_labelled_windows(tf.expand_dims(tf.range(8)+1, axis=0))
print(f"Window: {tf.squeeze(test_window).numpy()} -> Label: {tf.squeeze(test_label).numpy()}")



# Create function to view NumPy arrays as windows
import numpy as np

def make_windows(x, window_size=WINDOW_SIZE, horizon=HORIZON):
  """ Turns a 1D array into a 2D array of sequential labelled windows of
  window_size with horizon size labels.
  """
  # 1. Create a window of specific window_size (add the horizon on the end for labelling later)
  window_step = np.expand_dims(np.arange(window_size+horizon), axis=0)

  # 2. Create a 2D array of multiple window steps (minus 1 to account for 0 indexing)
  window_indexes = window_step + np.expand_dims(np.arange(len(x)-(window_size + horizon-1)), axis=0).T
  print(f"Window indexes:\n {window_indexes, window_indexes.shape}")

  # 3. Use the index array produced above to index into the target array (the time series)
  windowed_array = x[window_indexes]

  # 4. Get the labelled windows (functionized above)
  windows, labels = get_labelled_windows(windowed_array, horizon=horizon)
  return windows, labels



# Test
full_windows, full_labels = make_windows(prices, WINDOW, HORIZON)
len(full_windows), len(full_labels)

for i in range(3):
    print(f"Window: {full_windows[i]} -> Label: {full_labels[i]}")
          
          
          
# Functionize creation of train and test splits
def make_train_test_splits(windows, labels, test_split=0.2):
  """
  Splits matching pairs of windows and labels into train and test splits.
  """
  
  split_size = int(len(windows) * (1-test_split)) # default to an 80-20 tr-ts split
  train_windows = windows[:split_size]
  train_labels = labels[:split_size]
  test_windows = windows[split_size:]
  test_labels = labels[split_size:]

  return train_windows, test_windows, train_labels, test_labels
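
The windowing notes above also suggest extra features such as **lagged values** and **moving averages**. One way to build them with NumPy is sketched below. The series, the window of 3 and the variable names are assumptions for illustration. The key point is that each feature row only uses values from before the target timestep.

```python
import numpy as np

series = np.array([3.0, 5.0, 4.0, 6.0, 8.0, 7.0])  # hypothetical values
window = 3

# Mean of every run of `window` consecutive values (length: len(series) - window + 1)
rolling_mean = np.convolve(series, np.ones(window) / window, mode="valid")

targets = series[window:]       # value to predict at time t
lag_1 = series[window - 1:-1]   # value at time t-1
ma_prev = rolling_mean[:-1]     # mean of values t-window .. t-1 (excludes the target)

features = np.column_stack([lag_1, ma_prev])
print(features)
print(targets)
```

Dropping the last rolling mean is what prevents leakage, since that final average would include the last target itself.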

#### *Consider various potential models*

Time series problems can be approached with a variety of models, including statistical models, machine learning models, and deep learning models. Some popular models for time series forecasting include **ARIMA**, **SARIMA**, **Prophet**, LSTM, and CNN.

In [None]:
# 1. SIMPLE DENSE MODEL (WINDOW = 7, HORIZON = 1 <-- change as necessary)
dense_model = tf.keras.Sequential([
    layers.Dense(128, activation="relu"),
    layers.Dense(HORIZON, activation="linear")
], name="dense_model")

dense_model.compile(loss=tf.keras.losses.mae,
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["mae", "mse"])

dense_model_history = dense_model.fit(train_windows, train_labels, epochs=100,
                              batch_size=128,
                              validation_data=(test_windows, test_labels),
                              verbose=True,
                              callbacks=[create_model_checkpoint(model_name=dense_model.name)])



# 2. CONV1D MODEL
expand_dims_layer = layers.Lambda(lambda x: tf.expand_dims(x, axis=1))

conv_model = tf.keras.Sequential([
    expand_dims_layer,
    layers.Conv1D(filters = 128,
                 kernel_size = 5,
                 padding='causal',
                 activation='relu'),
    layers.Dense(HORIZON)
], name = 'conv1d_model')



# 3. LSTM MODEL
inputs = layers.Input(shape=(WINDOW,)) # trailing comma makes the shape a 1-tuple

x = layers.Lambda(lambda x: tf.expand_dims(x, axis=1))(inputs)

#x = layers.LSTM(128, return_sequences=True)(x)
x = layers.LSTM(128, activation="relu")(x)
#x = layers.Dense(32, activation="relu")(x)

outputs = layers.Dense(HORIZON)(x)
lstm_model = tf.keras.Model(inputs, outputs, name="lstm_model")



# 4. BIDIRECTIONAL LSTM MODEL
bidir_lstm_model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(64, kernel_size = 3,
                          strides = 1, padding = 'causal',
                          activation = 'relu',
                          input_shape = [WINDOW, 1]), # (timesteps, features): one feature for a univariate series
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences = True)),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(WINDOW, activation = 'relu'),
    tf.keras.layers.Dense(10, activation = 'relu'),
    tf.keras.layers.Dense(1)
])



# 5. ENSEMBLE MODEL (SEE NON-PROBLEM-SPECIFIC TASKS SECTION)

#### *Prepare metrics + eval-pipeline function*

The evaluation metrics for time series models may differ from those used in other types of problems. For example, in forecasting problems, you may want to use metrics such as **mean absolute error (MAE)**, **mean squared error (MSE)**, **mean absolute scaled error (MASE)**, or **symmetric mean absolute percentage error (SMAPE)**. You may also need to define custom loss functions or performance measures based on the specific problem requirements. Additionally, it is important to establish an evaluation pipeline that allows you to compare and select the best model based on the validation results.

In [None]:
# MASE implementation
def mean_absolute_scaled_error(y_true, y_pred):
  """
  Implement MASE (assuming no seasonality of a given dataset).
  """
  mae = tf.reduce_mean(tf.abs(y_true-y_pred))

  # Find MAE of naive forecast (no seasonality)
  mae_naive_no_season = tf.reduce_mean(tf.abs(y_true[1:]-y_true[:-1]))

  return mae / mae_naive_no_season



# Prediction evaluation function
def evaluate_preds(y_true, y_pred):
  # Make sure float32 (for metric calculations)
  y_true = tf.cast(y_true, dtype=tf.float32)
  y_pred = tf.cast(y_pred, dtype=tf.float32)

  # Calculate various metrics
  mae = tf.keras.metrics.mean_absolute_error(y_true, y_pred)
  mse = tf.keras.metrics.mean_squared_error(y_true, y_pred) # puts an emphasis on outliers (all errors get squared)
  rmse = tf.sqrt(mse)
  mape = tf.keras.metrics.mean_absolute_percentage_error(y_true, y_pred)
  mase = mean_absolute_scaled_error(y_true, y_pred)
  
  return {"mae": mae.numpy(),
          "mse": mse.numpy(),
          "rmse": rmse.numpy(),
          "mape": mape.numpy(),
          "mase": mase.numpy()}



# Adjust evaluate_preds() fn to work for larger horizons
def evaluate_preds(y_true, y_pred):
  # Make sure float32 (for metric calculations)
  y_true = tf.cast(y_true, dtype=tf.float32)
  y_pred = tf.cast(y_pred, dtype=tf.float32)

  # Calculate various metrics
  mae = tf.keras.metrics.mean_absolute_error(y_true, y_pred)
  mse = tf.keras.metrics.mean_squared_error(y_true, y_pred) # puts an emphasis on outliers (all errors get squared)
  rmse = tf.sqrt(mse)
  mape = tf.keras.metrics.mean_absolute_percentage_error(y_true, y_pred)
  mase = mean_absolute_scaled_error(y_true, y_pred)
  
  # Account for different sized metrics (for longer horizons, reduce metrics to single value)
  if mae.ndim > 0:
    mae = tf.reduce_mean(mae)
    mse = tf.reduce_mean(mse)
    rmse = tf.reduce_mean(rmse)
    mape = tf.reduce_mean(mape)
    mase = tf.reduce_mean(mase)

  return {"mae": mae.numpy(),
          "mse": mse.numpy(),
          "rmse": rmse.numpy(),
          "mape": mape.numpy(),
          "mase": mase.numpy()}



# To compute a moving average on windowed time series data...
def moving_average_forecast(series, window_size):
    """
    Forecasts the mean of the last few values.
    
    If window_size = 1, then this is equivalent to a naive forecast
    """
    
    forecast = []
    
    for time in range(len(series) - window_size):
        forecast.append(series[time:time + window_size].mean())
        
    # Convert to NP array
    np_forecast = np.array(forecast)
    
    return np_forecast
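
SMAPE is named in the metrics list above but not implemented in this cell. Below is a NumPy sketch of one common definition (the 0 to 200% variant, where the denominator is the average of the absolute true and predicted values). Definitions vary between sources, so treat this as an assumption rather than the canonical formula. The `eps` guard against division by zero is also an addition for illustration.

```python
import numpy as np

def smape(y_true, y_pred, eps=1e-8):
    """Symmetric MAPE in percent: mean of |error| / ((|y_true| + |y_pred|) / 2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.maximum(denom, eps))

example_smape = smape([100.0, 200.0], [110.0, 180.0])
print(round(example_smape, 3))  # 10.025
```

Unlike MAPE, the error is scaled by both the actual and the predicted value, so over- and under-forecasts of the same size are treated more evenly.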

#### *Define desired callbacks*

Callbacks are functions that can be called at specific points during training to perform certain actions, such as saving model checkpoints, logging training progress, or early stopping. Some useful callbacks for time series modeling include ModelCheckpoint, EarlyStopping, TensorBoard, and **ReduceLROnPlateau**.

In [None]:
# Create modelling checkpoint to save the best model during training
import os

# Create a function to implement a ModelCheckpoint cb with a specific filename
def create_model_checkpoint(model_name, save_path="model_experiments"):
  return tf.keras.callbacks.ModelCheckpoint(filepath=os.path.join(save_path,
                                                                  model_name),
                                            monitor="val_loss",
                                            verbose=0, # limit output
                                            save_best_only=True)



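# ReduceLROnPlateau callback (multiplies the LR by `factor`, default 0.1, when val_loss stops improving)
# Note: if combined with the EarlyStopping callback below, this patience must be
# smaller than EarlyStopping's, otherwise training stops before the LR is ever reduced
tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                    patience=100,
                                    verbose=1)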



# EarlyStopping callbacks
tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                patience=25,
                                restore_best_weights = True)
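
The `patience` argument is easier to reason about with a small simulation. The sketch below mimics the counting logic of EarlyStopping in plain Python. It is a simplification that ignores options such as `min_delta`, and the function name and loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember it and reset the counter
            best, wait = loss, 0
        else:
            # No improvement: count this epoch against the patience budget
            wait += 1
            if wait >= patience:
                return epoch
    return None

val_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]  # hypothetical history
stop_at = early_stopping_epoch(val_losses, patience=3)
print(stop_at)  # 5
```

Here the best loss (0.7) occurs at epoch 2 and the next three epochs fail to beat it, so training stops at epoch 5. With `restore_best_weights=True`, the model would then be rolled back to the weights from epoch 2.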

#### *Identify an optimal Learning Rate*

Learning rate is a hyperparameter that determines the step size of gradient descent during training. Choosing an appropriate learning rate can significantly affect model performance and training time. You can compare candidate learning rates with **grid search** or **random search**, sweep the rate upward over a single training run with a LearningRateScheduler callback (as in the cell below), or rely on adaptive optimizers such as Adam or RMSprop.

In [None]:
def adjust_learning_rate(dataset):
    
    model = create_uncompiled_model() # choose and run a model
    
    lr_scheduler = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-4 * 10**(epoch / 20))
    
    opt = tf.keras.optimizers.Adam()
    
    model.compile(loss = tf.keras.losses.Huber(),
                 optimizer = opt,
                 metrics = ['mae'])
    
    history = model.fit(dataset, epochs = 100, callbacks = [lr_scheduler])
    
    return history

# Run training with dynamic LR
lr_history = adjust_learning_rate(train_set)
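
Once the sweep has run, the learning rate used at each epoch can be rebuilt from the same schedule and lined up against the recorded loss. The NumPy sketch below shows this with a synthetic loss curve standing in for `lr_history.history["loss"]`. Picking a rate about 10x below the one at the minimum loss is a common rule of thumb, not a guarantee, and it is an assumption here rather than something stated in the notes.

```python
import numpy as np

epochs = np.arange(100)

# Same schedule as the LearningRateScheduler lambda: 1e-4 at epoch 0, 10x every 20 epochs
lrs = 1e-4 * 10 ** (epochs / 20)

# Synthetic stand-in for lr_history.history["loss"]: falls, bottoms out, then climbs
losses = ((epochs - 40) / 20) ** 2 + 1.0

best_idx = int(np.argmin(losses))
suggested_lr = lrs[best_idx] / 10  # back off from the edge of instability

print(f"LR at lowest loss: {lrs[best_idx]:.4f}, suggested LR: {suggested_lr:.4f}")
```

In practice it helps to plot the curve with `plt.semilogx(lrs, losses)` and eyeball the region where the loss falls fastest, since real sweeps are much noisier than this synthetic one.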

#### *Make + plot forecasts*

After training the model, you can use it to make forecasts for future time steps. You may also want to visualize the forecasts and compare them with the actual data to assess the model accuracy and usefulness. Plotting the model residuals and error metrics can also help you diagnose any model deficiencies or anomalies.

In [None]:
# How many timesteps into the future to predict
INTO_FUTURE = 14



# 1. Create function to make predictions into the future
def make_future_forecasts(values, model, into_future, window_size=WINDOW_SIZE) -> list:
  """
  Make future forecasts into future steps after values ends.

  Returns future forecasts as a list of floats.
  """
  # 2. Create an empty list for future forecasts/prepare data to forecast on
  future_forecast = []
  last_window = values[-window_size:]

  # 3. Make into_future number of predictions, altering the data that's predicted on each time
  for _ in range(into_future):
    # Predict on the last window then append it again, again, again...
    # (our model will eventually begin to make forecasts on its own forecasts)
    future_pred = model.predict(tf.expand_dims(last_window, axis=0))
    print(f"Predicting on:\n {last_window} -> Prediction: {tf.squeeze(future_pred).numpy()}\n")

    # Append prediction to future_forecast
    future_forecast.append(tf.squeeze(future_pred).numpy())

    # Update last window with new pred and get window_size most recent preds
    # (model was trained on window_size windows)
    last_window = np.append(last_window, future_pred)[-window_size:]

  return future_forecast



# Make forecasts into the future
future_forecast = make_future_forecasts(y_all,
                                        model_9,
                                        INTO_FUTURE,
                                        WINDOW_SIZE)



def get_future_dates(start_date, into_future, offset=1):
  """
  Returns array of datetime values ranging from start_date to start_date+into_future.
  """
  start_date = start_date + np.timedelta64(offset, "D") # specify start date
  end_date = start_date + np.timedelta64(into_future, "D") # specify end date
  return np.arange(start_date, end_date, dtype="datetime64[D]")

# Last timestep of timesteps(currently in np.datetime64 format)
last_timestep = prices.index[-1]

# Get next two weeks of timesteps
next_time_steps = get_future_dates(start_date=last_timestep,
                                  into_future=INTO_FUTURE)

# Insert last timestep/final price into next timestep and future forecasts
# to prevent disjointed graph
next_time_steps = np.insert(next_time_steps, 0, last_timestep)
future_forecast = np.insert(future_forecast, 0, btc_price[-1])

# Plot future price predictions
plt.figure(figsize=(10, 7))
plot_time_series(prices.index,
                 price,
                 start=1500,
                 format="-",
                 label="Actual Price")
plot_time_series(next_time_steps,
                 future_forecast, 
                 format="-",
                 label="Predicted Price")



# ALTERNATIVE PLOTTING FUNCTION
def plot_series(time, series, format = "-", title = "", label = None, start = 0, end = None):
    """
    Plot the series.
    """
    
    plt.plot(time[start:end], series[start:end], format, label=label)
    plt.xlabel("Time")
    plt.ylabel("Value")
    plt.title(title)
    if label:
        plt.legend()
    plt.grid(True)

#### *Save + export model (.h5)*

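Once you have selected the best model, save it to disk for future use or deployment. Keras can save a model either as a SavedModel directory or as a single HDF5 (`.h5`) file. In TF 2.x releases that bundle Keras 2, the format is inferred from the path: a path without an extension (as below) produces a SavedModel directory, while a path ending in `.h5` produces an HDF5 file.

In [None]:
# No file extension -> SavedModel directory; end the path in '.h5' for an HDF5 file
model.save('saved_time_series_model/my_model')

# To compress the directory using tar...
! tar -czvf saved_model.tar.gz saved_time_series_model/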

# Non-Problem-Specific Tasks

#### *Save models in various formats*

* SavedModel format:

```
model.save("saved_trained_model_name")

```

* HDF5 format:

```
model.save('path/to/model.h5')
```

* Testing a loaded model:

```
loaded_model = tf.keras.models.load_model("saved_trained_model_name")
loaded_model.evaluate(test_data)

# The original model should give the same results
model.evaluate(test_data)
```

#### *Plotting loss & accuracy*

In [25]:
def plot_loss_curves(history):
    """
    Plots separate loss and accuracy curves for the training and validation metrics.

    `history` is the History object returned by `model.fit()`.
    """
    
    loss = history.history["loss"]
    val_loss = history.history["val_loss"]
    
    accuracy = history.history["accuracy"]
    val_accuracy = history.history["val_accuracy"]
    
    epochs = range(len(history.history["loss"]))
    
    # Plot loss curves
    plt.plot(epochs, loss, label="training_loss")
    plt.plot(epochs, val_loss, label="val_loss")
    plt.title("loss")
    plt.xlabel("epochs")
    plt.legend()
    
    # Plot accuracy
    plt.figure()
    plt.plot(epochs, accuracy, label="training_accuracy")
    plt.plot(epochs, val_accuracy, label="val_accuracy")
    plt.title("accuracy")
    plt.xlabel("epochs")
    plt.legend()

#### *Preventing overfitting*

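Some things to try if you are running into problems with a model. Not every item targets overfitting: Dropout, data augmentation, more training data, and callbacks such as early stopping reduce overfitting, while adding layers, filters, or hidden units and fitting for longer add capacity and mainly help a model that is underfitting:
* Add model layers (e.g., `Conv2D`, `MaxPool2D`, `Dense`, etc.)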
* Increase the number of filters in each Conv layer
* Increase the number of hidden units
* Try implementing Dropout
* Add data augmentation
* Change activation functions (and ensure you've selected an appropriate one to begin with)
* Change the optimization function
* Optimize the learning rate (see below)
* Fit the model on more data
* Fit the model for longer
* Implement various callbacks
* Use ***transfer learning*** to leverage what another model has learned and use it for your own use case

#### *Data augmentation*

* Create an ImageDataGenerator instance with data augmentation
```
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen_aug = ImageDataGenerator(rescale=1/255.,
                                       rotation_range=20,  # in degrees (0.2 would allow at most 0.2 degrees)
                                       shear_range=0.2,
                                       zoom_range=0.2,
                                       width_shift_range=0.2,
                                       height_shift_range=0.2,
                                       horizontal_flip=True)
```

* Import data and augment it from training directory
```
# Use a new name so the generator itself is not overwritten
train_data_aug = train_datagen_aug.flow_from_directory(train_dir,
                                                       target_size=(224, 224),
                                                       class_mode="binary",
                                                       shuffle=True)
```

Be sure to select appropriate values for each argument, especially your image `target_size` and `class_mode` (could be `categorical` instead of `binary`)
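
`ImageDataGenerator` is marked as deprecated in recent TensorFlow releases, which recommend Keras preprocessing layers instead. The following is a minimal sketch of the same idea with layers, assuming TensorFlow 2.6 or newer; the augmentation factors and the fake image batch are illustrative only.

```python
import tensorflow as tf

# Augmentation expressed as layers; the factors are illustrative, not tuned
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.2),  # fraction of a full turn: up to +/- 72 degrees
    tf.keras.layers.RandomZoom(0.2),
    tf.keras.layers.Rescaling(1/255.),
])

# Fake batch of 4 RGB images with pixel values in [0, 255]
images = tf.random.uniform(shape=(4, 224, 224, 3), minval=0, maxval=255)

# training=True switches the random transforms on (they are skipped at inference)
augmented = data_augmentation(images, training=True)
print(augmented.shape)
```

Note that the units differ between the two APIs: `ImageDataGenerator(rotation_range=...)` takes degrees, whereas `RandomRotation` takes a fraction of a full rotation.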

#### *Dropout*

* Define a NN architecture:
```
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
```

* Add a Dropout layer:
```
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),   # dropout rate: fraction of units zeroed during training
    tf.keras.layers.Dense(10, activation='softmax')
])
```
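
To see what a Dropout layer does to its inputs, here is a NumPy sketch of "inverted" dropout, the variant Keras uses: a `rate` fraction of units is zeroed during training and the survivors are scaled up so the expected activation is unchanged. The function name `dropout_forward` and the toy activations are assumptions made for illustration.

```python
import numpy as np

def dropout_forward(x, rate, rng, training=True):
    """Inverted dropout: zero a `rate` fraction of units, rescale the rest."""
    if not training or rate == 0.0:
        return x  # dropout is a no-op at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True where the unit is kept
    return x * mask / keep_prob             # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
activations = np.ones((1000, 64))
dropped = dropout_forward(activations, rate=0.5, rng=rng)

print("fraction zeroed:", np.mean(dropped == 0))
print("mean activation:", dropped.mean())
```

With `rate=0.5` roughly half the values become zero and the rest become 2.0, so the mean stays close to 1.0.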

#### *Batch loading of data*

* Map preprocessing function to training data (and parallelize):
```
train_data = train_data.map(map_func=preprocess_img, num_parallel_calls=tf.data.AUTOTUNE)
```

* Shuffle train_data and turn it into batches and prefetch it (load it faster):
```
train_data = train_data.shuffle(buffer_size=1000).batch(batch_size=32).prefetch(buffer_size=tf.data.AUTOTUNE)
```

* Map preprocessing function to test data:
```
test_data = test_data.map(map_func=preprocess_img, num_parallel_calls=tf.data.AUTOTUNE)
```

* Turn test data into batches (don't need to shuffle)
```
test_data = test_data.batch(32).prefetch(tf.data.AUTOTUNE)
```
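
The four steps above can be tried end to end on fake data. This is a sketch under stated assumptions: `preprocess_img` is a hypothetical stand-in for the notebook's preprocessing function, and the random images and labels exist only to make the batch shapes visible.

```python
import tensorflow as tf

def preprocess_img(image, label, img_size=32):
    """Hypothetical preprocessing: resize the image and cast it to float32."""
    image = tf.image.resize(image, [img_size, img_size])
    return tf.cast(image, tf.float32), label

# 100 fake 64x64 RGB images with integer labels
images = tf.random.uniform((100, 64, 64, 3), maxval=255)
labels = tf.random.uniform((100,), maxval=10, dtype=tf.int32)

train_data = tf.data.Dataset.from_tensor_slices((images, labels))
train_data = (train_data
              .map(preprocess_img, num_parallel_calls=tf.data.AUTOTUNE)
              .shuffle(buffer_size=100)
              .batch(32)
              .prefetch(tf.data.AUTOTUNE))

batch_shapes = [tuple(img_batch.shape) for img_batch, _ in train_data]
print(batch_shapes)
```

The last batch is smaller than the others because 100 is not a multiple of 32; pass `drop_remainder=True` to `batch()` if every batch must have the same size.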

*(See pg 422 of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurélien Géron)*

#### *Callbacks*

* **ModelCheckpoint**:
```
filepath = "path/to/checkpoints"
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=filepath,
                                                         save_best_only=True,
                                                         save_weights_only=True,
                                                         monitor='val_loss',
                                                         verbose=1)
```

* **EarlyStopping**:
```
earlystop_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                       patience=5,
                                                       verbose=1)
```

* **TensorBoard**:
```
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir='directory_name/logs',
                                                       histogram_freq=1,
                                                       profile_batch=0)
```

* **LearningRateScheduler**:
```
learningrate_callback = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-3 * 0.9 ** epoch)
```

* **ReduceLROnPlateau**:
```
reducelr_callback = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                                         factor=0.1,
                                                         patience=3,
                                                         verbose=1)
```

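* *Implement desired callbacks as follows* (in practice pick one learning-rate strategy: `LearningRateScheduler` sets the rate at the start of every epoch, so it overrides any reduction made by `ReduceLROnPlateau`):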
```
history = model.fit(x_train, y_train,
                    epochs=20,
                    batch_size=64,
                    validation_data=(x_test, y_test),
                    callbacks=[checkpoint_callback,
                                earlystop_callback,
                                tensorboard_callback,
                                learningrate_callback,
                                reducelr_callback])
```

#### *Dataset formats (JSON, CSV, pandas)*

* **JSON**:

```
import json
import os

!wget github.content/url

with open("file.json", "r") as f:
    example = json.load(f)
    
print(example)
```

* **CSV**:

```
import csv

with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        pass  # [ YOUR CODE HERE ] (each row is a list of strings)
        
```

* **pandas**:

```
import pandas as pd

df = pd.read_csv("data.csv")

df.head()
```
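
For a model, the parsed file usually has to be split into parallel lists of inputs and labels. The sketch below shows one way to do that with the standard library only; the in-memory strings stand in for `file.json` and `data.csv`, and the `headline` and `label` field names are hypothetical.

```python
import csv
import io
import json

# Hypothetical in-memory files standing in for file.json and data.csv
json_text = '[{"headline": "good day", "label": 1}, {"headline": "bad day", "label": 0}]'
csv_text = "headline,label\ngood day,1\nbad day,0\n"

# JSON: a list of records -> two parallel lists
records = json.load(io.StringIO(json_text))
json_sentences = [r["headline"] for r in records]
json_labels = [r["label"] for r in records]

# CSV: DictReader uses the header row as keys; every value arrives as a string
reader = csv.DictReader(io.StringIO(csv_text))
csv_sentences, csv_labels = [], []
for row in reader:
    csv_sentences.append(row["headline"])
    csv_labels.append(int(row["label"]))  # cast explicitly

print(json_sentences, json_labels)
print(csv_sentences, csv_labels)
```

JSON preserves number types, while the `csv` module returns every field as a string, which is why the CSV labels are cast with `int()`.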

#### *Dynamically adjusting learning rates*

* Find ideal learning rate:

```
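import numpy as np
import matplotlib.pyplot as plt

# Assumes fit_lr_history came from 40 epochs of training with
# LearningRateScheduler(lambda epoch: 1e-3 * 10**(epoch/20)),
# so that lrs holds the learning rate used at each epoch
lrs = 1e-3 * 10**(np.arange(40)/20)
plt.semilogx(lrs, fit_lr_history.history["loss"])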
plt.xlabel("Learning rate")
plt.ylabel("Loss")
plt.title("Find the ideal LR")
```

* Schedule learning rate for future training:

```
learningrate_callback = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-3 * 0.9 ** epoch)
```
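
The two schedules in this section behave very differently, which is easier to see from the numbers. The plain-Python sketch below evaluates both: the sweep used to find a learning rate (assumed here to start at `1e-3` and grow tenfold every 20 epochs, matching the `lrs` formula) and the `0.9 ** epoch` decay. The function names are made up for illustration.

```python
def lr_finder_schedule(epoch, start_lr=1e-3):
    """Sweep: multiply the learning rate by 10 every 20 epochs."""
    return start_lr * 10 ** (epoch / 20)

def lr_decay_schedule(epoch, start_lr=1e-3, decay=0.9):
    """Exponential decay: shrink the learning rate by 10% each epoch."""
    return start_lr * decay ** epoch

for epoch in (0, 10, 20, 39):
    print(f"epoch {epoch:2d}  sweep lr = {lr_finder_schedule(epoch):.5f}  "
          f"decay lr = {lr_decay_schedule(epoch):.6f}")
```

Either function can be passed directly to `tf.keras.callbacks.LearningRateScheduler`, since both accept the epoch index as their first argument.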

#### *Visualization of various classification + other metrics*

* **Loss & Accuracy**:

```
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Model loss on the test set: {loss}")
print(f"Model accuracy on the test set: {(accuracy*100):.2f}%")
```

* **Precision, Recall, F1 score**:

```
from sklearn.metrics import precision_score, recall_score, f1_score

# The defaults assume binary labels; for multi-class pass e.g. average="weighted"
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
```

* **Confusion matrix**:

```
from sklearn.metrics import confusion_matrix

confusion_matrix(y_true = test_labels, y_pred = y_preds)
```

* **Create a more visual confusion matrix**:

(See `make_confusion_matrix` function below)

```
make_confusion_matrix(y_true=test_labels,
                        y_pred=y_preds,
                        classes=class_names,
                        figsize=(25, 25),
                        text_size=15)
```

* **OR, Functionize an Evaluation Pipeline**:

```
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def get_scores(y_true, y_pred):
  """
  Returns accuracy (as a percentage) and weighted precision, recall, and F1 (as fractions).
  """
  model_accuracy = accuracy_score(y_true, y_pred) * 100
  model_precision, model_recall, model_f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")

  model_results = {"accuracy": model_accuracy,
                   "precision": model_precision,
                   "recall": model_recall,
                   "f1": model_f1}
  return model_results
```
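
As a sanity check on what these metrics mean, the binary case can be computed by hand from the counts of true positives, false positives, and false negatives. The labels below are made up for illustration.

```python
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # predicted 1, actually 1
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicted 1, actually 0
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # predicted 0, actually 1

accuracy = float(np.mean(y_pred == y_true))
precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here accuracy is 0.7 even though only half of the positive samples were found, which is why precision, recall, and F1 are worth reporting alongside accuracy on imbalanced data.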

In [4]:
import itertools
from sklearn.metrics import confusion_matrix

def make_confusion_matrix(y_true, y_pred, classes=None, figsize=(10, 10), text_size=15):
    
    cm = confusion_matrix(y_true, y_pred)
    cm_norm = cm.astype("float")/cm.sum(axis=1)[:, np.newaxis]  # normalize each row (true class)
    
    n_classes = cm.shape[0]
    
    fig, ax = plt.subplots(figsize = figsize)
    
    cax = ax.matshow(cm, cmap = plt.cm.Blues)
    fig.colorbar(cax)
    
    # Set labels to be classes
    if classes is not None:  # also works when classes is a NumPy array
        labels = classes
    else:
        labels = np.arange(cm.shape[0])
        
    # Label the axes
    ax.set(title = "Confusion Matrix",
          xlabel = "Predicted label",
          ylabel = "True label",
          xticks = np.arange(n_classes),
          yticks = np.arange(n_classes),
          xticklabels = labels,
          yticklabels = labels)
    
    # Set x-axis labels to bottom
    ax.xaxis.set_label_position("bottom")
    ax.xaxis.tick_bottom()
    
    # Adjust label size
    ax.yaxis.label.set_size(text_size)
    ax.xaxis.label.set_size(text_size)
    ax.title.set_size(text_size)
    
    # Set threshold for different colors
    threshold = (cm.max() + cm.min()) / 2
    
    # Plot the text on the cell
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, f"{cm[i, j]} ({cm_norm[i, j]*100:.1f}%)",
                  horizontalalignment = "center",
                  color = "white" if cm[i, j] > threshold else "black",
                  size=15)
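
The row normalization inside `make_confusion_matrix` is worth checking on a small matrix. This NumPy sketch uses made-up counts; rows are true classes and columns are predicted classes, as in scikit-learn's `confusion_matrix`.

```python
import numpy as np

# Rows are true classes, columns are predicted classes (hypothetical counts)
cm = np.array([[8, 2, 0],
               [1, 6, 3],
               [0, 0, 5]])

row_totals = cm.sum(axis=1)[:, np.newaxis]  # shape (3, 1), broadcasts across each row
cm_norm = cm.astype("float") / row_totals

print(cm_norm.round(2))
```

Each row of `cm_norm` sums to 1, and the diagonal holds the per-class recall (0.8, 0.6, and 1.0 here).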

#### *Making a dataset performant using tf.data API*

```
train_features_dataset = tf.data.Dataset.from_tensor_slices(X_train)
train_labels_dataset = tf.data.Dataset.from_tensor_slices(y_train)

test_features_dataset = tf.data.Dataset.from_tensor_slices(X_test)
test_labels_dataset = tf.data.Dataset.from_tensor_slices(y_test)

train_dataset = tf.data.Dataset.zip((train_features_dataset, train_labels_dataset))
test_dataset = tf.data.Dataset.zip((test_features_dataset, test_labels_dataset))

BATCH_SIZE = 1024
train_dataset = train_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

train_dataset, test_dataset
```

#### *Creating ensemble models*

In [8]:
from tensorflow.keras import layers

HORIZON = 7 # specific to your problem
train_dataset = [1, 2, 3, 4, 5] # placeholder: your training data
test_dataset = [6, 7, 8, 9, 10] # placeholder: your test data

def get_ensemble_models(horizon=HORIZON,
                        train_data=train_dataset,
                        test_data=test_dataset,
                        num_iter=10,
                        num_epochs=1000,
                        loss_fns=["mae", "mse", "mape"]):
  """
  Returns a list of num_iter models, each trained on MAE, MSE and MAPE loss.

  For example, if num_iter=10, a list of 30 trained models will be returned:
  10 * len(["mae", "mse", "mape"]).
  """
  # Make empty list for trained ensemble models
  ensemble_models = []

  # Create num_iter number of models per loss function
  for i in range(num_iter):
    # Build and fit a new model with a different loss function
    for loss_function in loss_fns:
      print(f"Optimizing model by reducing: {loss_function} for {num_epochs} epochs, model number: {i}")

      # Construct a simple model (similar to model_1)
      model = tf.keras.Sequential([
          layers.Dense(128, kernel_initializer="he_normal", activation="relu"),
          layers.Dense(128, kernel_initializer="he_normal", activation="relu"),
          layers.Dense(horizon)  # use the horizon argument rather than the global
      ])

      # Compile simple model with current loss_fn
      model.compile(loss=loss_function,
                    optimizer=tf.keras.optimizers.Adam(),
                    metrics=["mae", "mse"])
      
      # Fit the current model (i)
      model.fit(train_data,
                epochs=num_epochs,
                verbose=0,
                validation_data=test_data,
                callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                            patience=200,
                                                            restore_best_weights=True),
                           tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                                patience=100,
                                                                verbose=1)])
      # Append fitted model to list of ensemble models
      ensemble_models.append(model)

  return ensemble_models

#### *Accounting for horizon variance in time series prediction*

* (1) Calculate the various metrics
* (2) Use a comparison operator on each metric's number of dimensions to detect a multi-step horizon
* (3) For longer horizons the metrics come back with one value per window, so reduce them to a single value

```
def evaluate_preds(y_true, y_pred):
    y_true = tf.cast(y_true, dtype=tf.float32)
    y_pred = tf.cast(y_pred, dtype=tf.float32)

    mae = tf.keras.metrics.mean_absolute_error(y_true, y_pred)
    mse = tf.keras.metrics.mean_squared_error(y_true, y_pred)
    rmse = tf.sqrt(mse)
    mape = tf.keras.metrics.mean_absolute_percentage_error(y_true, y_pred)
    mase = mean_absolute_scaled_error(y_true, y_pred)

    if mae.ndim > 0:
        mae = tf.reduce_mean(mae)
        mse = tf.reduce_mean(mse)
        rmse = tf.reduce_mean(rmse)
        mape = tf.reduce_mean(mape)
        mase = tf.reduce_mean(mase)

    return {"mae": mae.numpy(),
            "mse": mse.numpy(),
            "rmse": rmse.numpy(),
            "mape": mape.numpy(),
            "mase": mase.numpy()}
```
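
`evaluate_preds` relies on a `mean_absolute_scaled_error` helper that is not shown in this section. Below is a NumPy sketch of one common definition, assuming no seasonality: the model's MAE divided by the MAE of a naive forecast that repeats the previous value. It also shows the dimension check on a two-step horizon. The name `mase_np` and all sample values are made up for illustration.

```python
import numpy as np

def mase_np(y_true, y_pred):
    """MAE of the forecast divided by the MAE of a naive previous-step forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    mae_naive = np.mean(np.abs(y_true[1:] - y_true[:-1]))
    return mae / mae_naive

y_true = np.array([1.0, 2.0, 4.0, 7.0])
y_pred = np.array([1.5, 2.5, 3.5, 6.0])
print(mase_np(y_true, y_pred))  # 0.3125

# Multi-step horizon: one MAE per window, then reduced to a single value
y_true_multi = np.array([[1.0, 2.0], [2.0, 4.0], [4.0, 7.0]])
y_pred_multi = np.array([[1.0, 3.0], [3.0, 4.0], [4.0, 4.0]])
mae_per_window = np.mean(np.abs(y_true_multi - y_pred_multi), axis=-1)  # shape (3,)
mae_single = mae_per_window.mean() if mae_per_window.ndim > 0 else mae_per_window
print(mae_per_window, mae_single)
```

A MASE below 1 means the model beats the naive forecast on this data; above 1 means it does worse.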

#### *Plotting prediction intervals*

`import numpy as np`

* Get the median/mean values of our ensemble preds

`ensemble_median = np.median(ensemble_preds, axis=0)`

* Plot the median of our ensemble_preds along with the prediction intervals

```
offset = 500
plt.figure(figsize=(10, 7))
plt.plot(X_test.index[offset:], y_test[offset:], "r", label="Test Data")
plt.plot(X_test.index[offset:], ensemble_median[offset:], "-", label="Ensemble Median")
plt.xlabel("Date")
plt.ylabel("BTC Price")
plt.fill_between(X_test.index[offset:],
                 (lower)[offset:],
                 (upper)[offset:], label="Prediction Intervals") # plot upper and lower bounds
plt.legend(loc="upper left", fontsize=14)
```
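
The plot above uses `lower` and `upper` bounds that are not defined in this section. One simple way to obtain them, sketched here in NumPy, is to assume the ensemble's predictions at each timestep are roughly normally distributed and take the mean plus or minus 1.96 standard deviations (about a 95% interval). The helper name `get_upper_lower` and the sample predictions are assumptions for illustration.

```python
import numpy as np

def get_upper_lower(preds, z=1.96):
    """
    preds: array of shape (n_models, n_timesteps).
    Returns (lower, upper) bounds of an approximate 95% interval per timestep.
    """
    preds = np.asarray(preds, dtype=float)
    std = preds.std(axis=0)          # spread across models at each timestep
    interval = z * std
    preds_mean = preds.mean(axis=0)
    return preds_mean - interval, preds_mean + interval

# Three hypothetical models forecasting four timesteps
ensemble_preds = np.array([[10.0, 11.0, 12.0, 13.0],
                           [12.0, 11.0, 14.0, 13.0],
                           [14.0, 11.0, 16.0, 13.0]])

ensemble_median = np.median(ensemble_preds, axis=0)
lower, upper = get_upper_lower(ensemble_preds)
print(ensemble_median)
print(lower.round(2), upper.round(2))
```

Where all models agree (the second and fourth timesteps here) the interval collapses to a single value, so the band reflects disagreement between ensemble members rather than total forecast uncertainty.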

### *References*

# General Vocabulary and Terms List for Flashcard Study

#### *Alphabetical vocabulary list:*

* accuracy
* activation functions
* Adam optimization
* ARIMA
* bag-of-words
* batching
* batch normalization
* batch size
* binary cross entropy loss
* callbacks
* categorical cross entropy loss
* character-level embedding layer
* character-level tokenizers
* checking the shapes
* classification report
* confusion matrix
* Conv1D models
* correlation matrix
* data augmentation
* data synthesis
* early stopping callback
* embedding
* exploring the data
* f1 score
* features
* Fourier transforms
* frequency
* GRU
* grid search
* HDF5 format
* histogram
* imputation
* KerasLayer
* labels
* label smoothing
* lagged values
* layer.concatenate
* learning rate
* learning rate decay function
* learning rate scheduler
* lemmatization
* list of class_names
* loss functions
* LSTM
* MaxPool2D
* mean absolute error
* mean squared error
* metrics
* missing values
* model checkpoint callback
* moving averages
* normalization
* one-hot encoding
* optimizers
* padding
* plot the learning rate decay
* positional embeddings
* precision
* prefetching
* Prophet
* random search
* recall
* ReduceLROnPlateau
* regularization
* removing outliers
* ResNet
* resizing your images
* SARIMA
* SavedModel format
* scatter plot
* seasonality
* sigmoid activation
* softmax activation
* splitting the data
* stemming
* step size
* stop words
* symmetric mean absolute percentage error
* TensorBoard callback
* TensorFlow Hub
* term frequency-inverse document frequency (TF-IDF)
* tf.data.Dataset.from_tensor_slices
* tf.keras.layers.Embedding
* tf.keras.Model
* tf.keras.Model.predict
* tf.keras.preprocessing.image.ImageDataGenerator
* tf.keras.preprocessing.image_dataset_from_directory
* tf.keras.preprocessing.sequence.pad_sequences
* tf.keras.preprocessing.text.one_hot
* time range
* tokenization
* tokens
* VGG16
* visualizing the data
* window size

#### *Definitions/Descriptions*

* **accuracy** - Accuracy is a metric used to evaluate the performance of a classification model. It measures the percentage of correct predictions made by the model

* **activation functions** - Activation functions are mathematical functions that are applied to the output of a neuron in a neural network. They introduce non-linearity into the network and help in modeling complex relationships between input and output variables. Common activation functions include `'sigmoid'`, `'relu'`, and `'tanh'`

* **Adam optimization** - Adam (Adaptive Moment Estimation) is an optimization algorithm used to train neural networks. It combines the ideas behind two other extensions of stochastic gradient descent, momentum and RMSProp: it keeps running averages of both the gradients and their squares and uses them to adapt the step size for each parameter, which typically speeds up training

* **ARIMA** - ARIMA (*Autoregressive Integrated Moving Average*) is a statistical model used for time series analysis and forecasting. It is a popular model for modeling non-stationary time series data, and is characterized by three parameters: p, d, and q, which represent the order of the autoregressive, differencing, and moving average components, respectively

* **bag-of-words** - The bag-of-words model is a simple and effective way of representing text data for machine learning models. In this model, a piece of text is represented as a bag (multiset) of its individual words, disregarding grammar and word order, but keeping track of the frequency of each word in the text. The bag-of-words model is often used as a feature extraction technique for text classification, sentiment analysis, and other natural language processing (NLP) tasks

* **batching** - Batching is a technique used in machine learning to process data in batches or groups, rather than individually. Batching can improve the efficiency of model training by reducing the number of times the model needs to be updated based on individual data points. In NLP tasks, sequences of different lengths are usually padded to a common length so that they can be stacked into a batch

* **batch normalization** - a technique used in deep learning to normalize the inputs to a layer in a neural network: each feature is standardized to zero mean and unit variance using the statistics of the current batch, then scaled and shifted by two learned parameters. This technique helps in reducing the internal covariate shift, which is the change in the distribution of inputs to a layer during training, and can improve the model's performance

* **batch size** - refers to the number of training examples used in each iteration of the training process. A larger batch size can lead to faster convergence, but it requires more memory and may lead to overfitting. A smaller batch size requires less memory, but it may converge more slowly

* **binary cross entropy loss** - a type of loss function used in neural networks for binary classification problems. It measures the difference between the predicted probability of a positive class and the actual probability of the positive class

* **callbacks** - functions that can be called during the training of a machine learning model. They can be used to perform various tasks, such as saving the model, monitoring the training progress, or adjusting the learning rate

* **categorical cross entropy loss** - a type of loss function used in neural networks for multi-class classification problems. It measures the difference between the predicted probability distribution and the actual probability distribution of the classes

* **character-level embedding layer** - A character-level embedding layer is a neural network layer that maps each character in a piece of text to a dense vector representation. This can be useful for NLP tasks where the meaning of a word is influenced by the specific characters it contains, such as misspellings, abbreviations, or slang. Character-level embedding layers can be used in conjunction with other neural network layers, such as Conv1D layers, to build machine learning models for text classification, sentiment analysis, and other tasks

* **character-level tokenizers** - NLP tools that break down text into individual characters instead of words. This can be useful for tasks where the meaning of a word is not as important as its spelling or for languages with complex character systems. Character-level tokenizers can be used in conjunction with other NLP techniques, such as embedding layers, to build machine learning models for text classification, named entity recognition, and other tasks

* **checking the shapes** - Checking the shapes refers to the process of verifying that the dimensions of the input and output tensors of a machine learning model are correct. This is important to ensure that the model is processing the data correctly and to prevent errors

* **classification report** - A classification report is a summary of the performance of a classification model. For example, scikit-learn's `classification_report` lists per-class precision, recall, F1 score, and support (the number of samples in each class), along with overall accuracy and averaged scores

* **confusion matrix** - A confusion matrix is a table used to evaluate the performance of a classification model. It shows the number of true positive, false positive, true negative, and false negative predictions made by the model

* **Conv1D models** - Conv1D models are neural network models that use one-dimensional convolutional layers to process input data. In the context of NLP, Conv1D models are often used for text classification, sentiment analysis, and other tasks where the input data is a sequence of text. The convolutional layers can learn to recognize patterns in the text, such as sequences of words or characters, that are indicative of the task at hand. Conv1D models are often used in conjunction with other neural network layers, such as pooling layers or recurrent layers, to build more complex models for NLP tasks

* **correlation matrix** - A correlation matrix is a table that shows the correlation coefficients between pairs of variables in a dataset. It is used to identify the strength and direction of the relationships between variables

* **data augmentation** - Data augmentation refers to the process of generating new training data from existing data by applying transformations such as rotation, zooming, and flipping. This can help to increase the diversity and size of the dataset, which can improve the performance of a machine learning model

* **data synthesis** - a technique used in machine learning to generate synthetic data that can be used to augment or supplement the existing data. Data synthesis is used to increase the size of the training data, improve the model's robustness, and reduce overfitting

* **early stopping callback** - It is a technique used in machine learning to prevent overfitting of the model. During the training process, the model is evaluated on a validation set, and the training is stopped when the model starts to perform poorly on the validation set. The early stopping callback is a function that monitors the model's performance on the validation set and stops the training when the model's performance starts to degrade. This helps in preventing overfitting, and the model can generalize better on unseen data

* **embedding** - a technique used in natural language processing to represent words or phrases as dense vectors in a mathematical space, with far fewer dimensions than a one-hot encoding over the vocabulary. Word embeddings are created by training a neural network on a large corpus of text data to learn the semantic relationships between words. Embeddings can be used to improve the accuracy of text classification, sentiment analysis, and information retrieval systems. They are also used in computer vision to represent images as dense vectors

* **exploring the data** - Exploring the data involves examining the dataset to understand its properties, distributions, and patterns. This is an important step in machine learning as it helps in identifying the features that are relevant for the model, detecting outliers, and understanding the relationships between the variables

* **f1 score** - F1 score is a metric used to evaluate the performance of a classification model. It is the harmonic mean of precision and recall, and it balances between the two metrics

* **features** - Features are the measurable and observable properties or characteristics of an object or phenomenon that are used to build a machine learning model. In machine learning, features are the input variables that are fed into the algorithm to predict an output. For example, in a dataset of housing prices, features could include the number of bedrooms, square footage, location, etc.

* **Fourier transforms** - A Fourier transform is a mathematical technique used to decompose a complex signal into its component frequencies. In machine learning and signal processing, Fourier transforms can be used to analyze and manipulate time series data, including speech recognition, image processing, and audio analysis

* **frequency** -  Frequency refers to the rate at which something occurs, and can be an important factor in machine learning and AI. For example, if you are analyzing a dataset of user behavior on a website, you might look at the frequency with which users visit the site or perform certain actions, such as clicking on a particular button. Frequency can also refer to the frequency of a signal or waveform, such as in audio or image processing. In these cases, frequency analysis techniques such as Fourier transforms or wavelet transforms may be used to extract useful features from the data


* **GRU** - GRU (Gated Recurrent Unit) is another type of recurrent neural network (RNN) architecture that is similar to LSTM but with fewer gates. GRUs use two gates (reset and update gates) to control the flow of information and better capture long-term dependencies in sequential data. Compared to LSTMs, GRUs are simpler and faster to train, but may not perform as well on more complex NLP tasks

* **grid search** - Grid search is a hyperparameter tuning technique used to search for the optimal combination of hyperparameters for a machine learning model. It involves defining a grid of possible hyperparameter values and training and evaluating the model on all possible combinations of hyperparameters

* **HDF5 format** - The `HDF5` format is a file format used to save and load machine learning models. It is a hierarchical format that can store large amounts of data and metadata, making it useful for deep learning models

* **histogram** - A histogram is a graphical representation of the distribution of a dataset. It shows the frequency of data points that fall within a certain range of values. The x-axis represents the range of values, while the y-axis represents the frequency

* **imputation** - Imputation is a technique used to fill in missing values in a dataset. In machine learning, missing values can cause problems during training, as the model may not have enough information to make accurate predictions. Imputation involves filling in the missing values with estimates based on the remaining data

* **KerasLayer** - `hub.KerasLayer` (from the `tensorflow_hub` package) wraps a SavedModel, typically a pre-trained model downloaded from TensorFlow Hub, so that it can be used as a layer inside your own Keras model, for example as a feature extractor in transfer learning

* **labels** - Labels are the output variables that machine learning algorithms try to predict based on the input variables or features. They represent the target or outcome of the model. For example, in a dataset of housing prices, the label could be the actual price of the house

* **label smoothing** - Label smoothing is a regularization technique used in machine learning to prevent overfitting and improve generalization. It replaces the hard 0/1 ground truth labels with slightly softened targets during training, so that a target of 1 becomes slightly less than 1 and a target of 0 becomes slightly greater than 0. This helps to encourage the model to be less confident about its predictions and avoid becoming too focused on specific examples in the training data

* **lagged values** - Lagged values refer to values of a time series that occurred at previous points in time. In time series analysis, lagged values can be used as features in a machine learning model to capture dependencies and correlations between past and present values. For example, if you are predicting the temperature tomorrow based on the temperature today, yesterday's temperature could be considered a lagged value. In TensorFlow, lagged values can be generated using functions such as `tf.keras.preprocessing.timeseries_dataset_from_array()`

* **layer.concatenate** - `tf.keras.layers.concatenate` (the functional form of the `tf.keras.layers.Concatenate` layer) joins two or more tensors along a specified axis. This function is commonly used in building neural network architectures to combine the outputs of multiple layers or different branches of a network

* **learning rate** - a hyperparameter that controls the step size during the optimization process. It determines how much the weights and biases of the model are adjusted during each iteration of the training process. A higher learning rate can lead to faster convergence, but it may also lead to instability and oscillation. A lower learning rate may take longer to converge, but it may be more stable and less likely to oscillate

* **learning rate decay function** - A learning rate decay function is a function that reduces the learning rate of a machine learning model over time. It is often used to help the model converge more efficiently and effectively

* **learning rate scheduler** - A learning rate scheduler is a function that adjusts the learning rate of a machine learning model during training. It can be used to improve the convergence of the model and prevent overfitting. Examples of learning rate schedulers include step decay, exponential decay, and cyclic learning rates

* **lemmatization** - a process of reducing a word to its base or dictionary form, which is known as the lemma. Lemmatization is a more sophisticated technique than stemming, as it considers the context of the word and maintains its grammatical and semantic meaning. This technique is commonly used in natural language processing to improve the accuracy of text analysis and information extraction

* **list of class_names** - A list of class_names is a list that contains the names of the different classes in a classification problem. It is often used in conjunction with a confusion matrix or classification report to label the rows and columns of the table

* **loss functions** - Loss functions quantify the difference between a model's predicted output and the actual output as a single number. During training, the optimizer adjusts the model's weights to minimize this value, so the choice of loss function defines what the model is optimized for. Examples of loss functions include mean squared error, categorical cross-entropy, and binary cross-entropy

* **LSTM** - LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) architecture that is commonly used for natural language processing (NLP) tasks. LSTMs are designed to address the issue of vanishing gradients in traditional RNNs by using a memory cell and various gates (input, forget, and output gates) to selectively retain or discard information over time. This allows LSTMs to better capture long-term dependencies in sequential data, such as text

* **MaxPool2D** - `tf.keras.layers.MaxPool2D` is a type of pooling layer used in convolutional neural networks (CNNs). It reduces the spatial dimensions (height and width) of the input by keeping only the maximum value in each pooling window. The layer itself has no trainable parameters, but by shrinking the feature maps it reduces computation, can reduce the number of parameters in later layers, and makes the model less sensitive to small shifts and distortions in the input

* **mean absolute error (MAE)** - MAE is a metric used to evaluate the accuracy of a regression model. It is calculated by taking the average of the absolute differences between the predicted values and the true values. Every error contributes in proportion to its size, and the result is in the same units as the target

* **mean squared error (MSE)** - MSE is a metric used to evaluate the accuracy of a regression model. It is calculated by taking the average of the squared differences between the predicted values and the true values. Because the errors are squared, large errors are penalized more heavily than in MAE, and the result is in squared units of the target
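
The two definitions (MAE and MSE) can be checked by hand with a short NumPy sketch; the arrays are made-up values for illustration.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))   # average absolute error
mse = np.mean(errors ** 2)      # average squared error

print(mae, mse)
```

The single error of size 2 dominates the MSE far more than the MAE, which is the practical difference between the two metrics.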

* **metrics** - Metrics are used to measure the performance of a machine learning model. They provide a quantitative measure of how well the model is performing. Examples of metrics include accuracy, precision, recall, and F1 score

* **missing values** - Missing values are data points that are not available in a dataset. They can occur due to a variety of reasons, such as human error, data corruption, or data loss. Missing values can be handled by either removing the data points, imputing the missing values with a suitable value, or using algorithms that can handle missing data

* **model checkpoint callback** - The `tf.keras.callbacks.ModelCheckpoint` callback saves the model, or only its weights, at intervals during training. With `save_best_only=True`, it keeps only the checkpoint with the best value of a monitored metric, such as validation loss or accuracy. This helps prevent the loss of progress in the training process and allows you to restore the best model or resume training from the last saved checkpoint

* **moving averages** - A moving average is a technique used to smooth out fluctuations in a time series by calculating the average value of a subset of neighboring data points over a sliding window. The moving average can help to identify trends and patterns in the data, and can be useful in forecasting and anomaly detection
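
A minimal sketch of a simple (unweighted) moving average in NumPy; the series and the `window` value are illustrative.

```python
import numpy as np

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
window = 3  # number of neighboring points averaged together

# Convolving with equal weights averages each run of `window` values;
# mode="valid" keeps only positions where the window fits entirely
kernel = np.ones(window) / window
smoothed = np.convolve(series, kernel, mode="valid")

print(smoothed)
```

Note that the smoothed series is shorter than the original by `window - 1` points.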

* **normalization** - It is a technique used to preprocess the data before feeding it to a machine learning model. The idea is to rescale the input features to a common range or distribution. Z-score normalization (standardization) rescales each feature to zero mean and unit variance, while min-max normalization rescales it to a fixed range such as [0, 1]. This helps in improving the convergence of the model and makes it less sensitive to the scale of the input features. Batch normalization applies a similar idea inside the network by normalizing layer activations over each mini-batch
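
The two preprocessing variants, z-score and min-max, can be sketched on a single toy feature as follows (values are illustrative):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Z-score normalization: zero mean, unit variance
z_scores = (x - x.mean()) / x.std()

# Min-max normalization: rescale to the range [0, 1]
min_max = (x - x.min()) / (x.max() - x.min())

print(z_scores, min_max)
```

In practice the statistics (mean, standard deviation, minimum, maximum) should be computed on the training set only and then reused for the validation and test sets.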

* **one-hot encoding** - a technique used to represent categorical variables as binary vectors. Each category is represented as a vector of zeros and ones, where the position corresponding to the category is set to one and all other positions are set to zero. One-hot encoding is commonly used in machine learning algorithms that require numerical inputs, such as neural networks and decision trees
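
A minimal NumPy sketch of one-hot encoding integer class labels (the labels and `num_classes` are illustrative):

```python
import numpy as np

labels = np.array([0, 2, 1])   # integer class ids
num_classes = 3

# Row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(num_classes)[labels]

print(one_hot)
```

In TensorFlow the same result is usually obtained with `tf.one_hot` or `tf.keras.utils.to_categorical`.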

* **optimizers** - Optimizers are algorithms used to optimize or adjust the weights and biases of a machine learning model during training. The goal of an optimizer is to minimize the loss function of the model by adjusting the weights and biases in the right direction. Examples of optimizers include Adam, RMSprop, and stochastic gradient descent (SGD)

* **padding** - Padding refers to the process of adding additional elements to the edge of an input image or sequence to ensure that it is the desired size for a machine learning model. This can be useful when working with datasets that contain images or sequences of different sizes

* **plot the learning rate decay** - Plotting the learning rate decay refers to the process of visualizing the learning rate decay function over time. This can help to identify any issues or anomalies with the function and ensure that it is working as intended

* **positional embeddings** - Positional embeddings are a type of feature representation used in natural language processing (NLP) tasks, such as text classification or machine translation. They are used to encode the position of each word or token in a sequence, such as a sentence or paragraph, so that the model can better understand the context of each word

* **precision** - Precision is a metric used to evaluate the performance of a classification model. It measures the percentage of true positive predictions out of all positive predictions made by the model

* **prefetching** - Prefetching is a technique used in machine learning to improve data loading performance. It involves loading the data for the next batch while the current batch is being processed. This way, the data is ready for processing as soon as the current batch is finished, reducing the wait time for data loading and potentially increasing the overall speed of training. In TensorFlow, prefetching can be implemented using the `tf.data.Dataset.prefetch()` method, whose `buffer_size` argument gives the number of elements to prefetch (batches, if the dataset is already batched). Passing `tf.data.AUTOTUNE` lets TensorFlow choose the buffer size dynamically

* **Prophet** - Prophet is a forecasting model developed by Facebook that is designed to handle time series data with strong seasonal effects and multiple periods of seasonality. It is based on a generalized additive model (GAM) and uses Bayesian methods to estimate parameters and uncertainty intervals. Prophet has gained popularity for its ease of use and flexibility in handling a wide range of time series data

* **random search** - Random search is a hyperparameter tuning technique used to search for the optimal combination of hyperparameters for a machine learning model. It involves randomly sampling hyperparameters from a defined distribution and training and evaluating the model on each sampled combination. Unlike grid search, it does not search over all possible combinations of hyperparameters, making it more computationally efficient

* **recall** - Recall is a metric used to evaluate the performance of a classification model. It measures the percentage of true positive predictions out of all actual positive cases in the dataset
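
Precision and recall are both computed from counts of true positives, false positives, and false negatives. A minimal sketch with made-up binary labels:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive, actually positive
fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive, actually negative
fn = np.sum((y_pred == 0) & (y_true == 1))  # predicted negative, actually positive

precision = tp / (tp + fp)  # share of positive predictions that were right
recall = tp / (tp + fn)     # share of actual positives that were found

print(precision, recall)
```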

* **ReduceLROnPlateau** - `tf.keras.callbacks.ReduceLROnPlateau` is a callback in TensorFlow that automatically lowers the learning rate of a model when a monitored metric (validation loss by default) has stopped improving for a set number of epochs (`patience`). It multiplies the learning rate by a user-specified `factor`, allowing the model to continue to learn at a lower rate until it converges

* **regularization** - It is a technique used to prevent overfitting in machine learning models. The idea is to add a penalty term to the loss function that the model is optimizing. The penalty term encourages the model to have smaller weights and biases, which can reduce overfitting. The two most commonly used regularization techniques are L1 and L2 regularization. L1 regularization adds a penalty term proportional to the absolute value of the weights, whereas L2 regularization adds a penalty term proportional to the square of the weights
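
The L1 and L2 penalty terms can be sketched directly in NumPy. The weight values and the strength `lam` are illustrative assumptions.

```python
import numpy as np

weights = np.array([0.5, -1.0, 2.0])
lam = 0.01  # hypothetical regularization strength

l1_penalty = lam * np.sum(np.abs(weights))  # proportional to absolute values
l2_penalty = lam * np.sum(weights ** 2)     # proportional to squared values

print(l1_penalty, l2_penalty)
```

During training the chosen penalty is added to the data loss, so larger weights make the total loss larger. In Keras this is typically configured per layer through the `kernel_regularizer` argument.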

* **removing outliers** - Outliers are extreme values in a dataset that are significantly different from the rest of the values. They can affect the accuracy of machine learning models by skewing the data. Removing outliers involves identifying and eliminating data points that are far away from the mean or median of the dataset

* **ResNet** - It is a deep convolutional neural network architecture that was developed by Microsoft Research Asia. The ResNet architecture introduced a new concept called residual connections, which allows the network to learn more easily and overcome the problem of vanishing gradients. ResNet is a popular choice for image classification, object detection, and segmentation tasks

* **resizing your images** - Resizing your images refers to the process of changing the dimensions of an image. This is often done to standardize the size of the images in a dataset or to prepare them for use in a machine learning model

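* **SavedModel format** - The SavedModel format is TensorFlow's serialization format for saving and loading complete models. A SavedModel is a directory containing the serialized computation graph (`saved_model.pb`), the trained variable values, and any additional assets. It is a portable format that can be used across different platforms and programming languages, for example with TensorFlow Serving, TensorFlow Lite, and TensorFlow.js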

* **SARIMA** - SARIMA (Seasonal Autoregressive Integrated Moving Average) is a variant of the ARIMA model that includes additional parameters to account for seasonality in the data. In addition to the p, d, and q parameters, SARIMA models also include seasonal components represented by the P, D, and Q parameters, which capture the seasonal autoregressive, differencing, and moving average components, respectively

* **scatter plot** - A scatter plot is a graphical representation of a dataset that displays the relationship between two variables. It is used to identify patterns, trends, and outliers in the data. The x-axis represents one variable, while the y-axis represents the other variable

* **seasonality** - Seasonality refers to a repeating pattern or cycle in a time series. In time series analysis, seasonality can be caused by various factors such as the time of day, day of the week, or month of the year

* **sigmoid activation** - Sigmoid activation is an activation function used in neural networks. It maps any input value to a value between 0 and 1, which makes it useful for binary classification problems. It is defined as `1 / (1 + exp(-x))`
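
The formula translates directly into NumPy; a minimal sketch:

```python
import numpy as np

def sigmoid(x):
    """Map any real input to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

outputs = sigmoid(np.array([-4.0, 0.0, 4.0]))
print(outputs)
```

An input of 0 maps to exactly 0.5, which is why 0.5 is the usual decision threshold in binary classification.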

* **splitting the data** - Splitting the data refers to the process of dividing a dataset into training, validation, and test sets. The training set is used to train the machine learning model, the validation set is used to tune the hyperparameters of the model, and the test set is used to evaluate the performance of the model on unseen data

* **softmax activation** - Softmax activation is a type of activation function used in neural networks. It maps the output of a model to a probability distribution over multiple classes. It is commonly used for multi-class classification problems
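
A minimal NumPy sketch of softmax over a vector of raw model outputs (logits). Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result; the logits are made-up values.

```python
import numpy as np

def softmax(logits):
    """Convert a 1-D array of logits into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # avoids overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

probs = softmax(np.array([2.0, 1.0, 0.1]))
predicted_class = int(np.argmax(probs))
print(probs, predicted_class)
```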

* **stemming** - a process of reducing a word to its base or root form by removing the suffixes or prefixes. Stemming is a commonly used technique in natural language processing to simplify the text data and reduce its dimensionality. It can help to improve the accuracy of text classification and information retrieval systems

* **step size** - Step size, also known as the stride, refers to the number of data points that the sliding window moves forward between consecutive windows. The step size determines the degree of overlap between consecutive windows: a stride of 1 gives the maximum overlap, while a stride equal to the window size gives no overlap

* **stop words** - a common term used in natural language processing (NLP) to refer to words that are filtered out from the text because they are considered to be of little importance or redundant. Stop words are typically very common words such as "the", "a", "an", "in", "of", etc. and can be removed from the text before processing it with machine learning algorithms. Removing stop words can help to reduce the dimensionality of the text data and improve the accuracy of NLP models

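* **symmetric mean absolute percentage error (SMAPE)** - SMAPE is a metric used to evaluate the accuracy of a machine learning model for time series data. It measures the percentage difference between the predicted values and the true values, dividing each absolute error by the average of the absolute predicted and absolute true values. Unlike MAPE, which divides by the true value alone, SMAPE is bounded (between 0% and 200%) and remains defined when a true value is zero, as long as the prediction is not also zero. Despite its name, it is not perfectly symmetric: an under-prediction is penalized slightly more than an over-prediction of the same absolute size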
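
Several variants of SMAPE exist. The sketch below uses one common definition (absolute error divided by the mean of the absolute true and predicted values) and made-up numbers:

```python
import numpy as np

def smape(y_true, y_pred):
    """SMAPE in percent; assumes no pair where both values are zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denominator = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return 100.0 * np.mean(np.abs(y_pred - y_true) / denominator)

over = smape([100.0], [110.0])   # over-prediction by 10
under = smape([100.0], [90.0])   # under-prediction by 10
print(over, under)
```

Comparing `over` and `under` shows how this definition treats errors of the same absolute size in opposite directions.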

* **TensorBoard callback** - The TensorBoard callback is a feature in TensorFlow that allows you to visualize and monitor the training of a machine learning model. It provides real-time feedback on the model's performance and allows you to monitor the loss and accuracy of the model during training

* **TensorFlow Hub** - a repository of pre-trained machine learning models, including deep learning models, together with the `tensorflow_hub` library used to load them. It allows you to easily reuse pre-trained models, for example as feature extractors in transfer learning, across different projects and applications

* **term frequency-inverse document frequency (TF-IDF)** - TF-IDF is a commonly used technique for text feature extraction in NLP. It is used to measure the importance of a word in a document or corpus by taking into account its frequency in the document and its frequency in the entire corpus. The TF-IDF score for a word increases proportionally to the number of times it appears in the document but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words are generally more common than others
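
A minimal pure-Python sketch of one common, unsmoothed TF-IDF variant. Libraries such as scikit-learn use slightly different smoothing and normalization, and the three-document corpus here is made up.

```python
import math

docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the cat ran".split(),
]

def tf_idf(term, doc, corpus):
    """TF-IDF of `term` in `doc`; assumes the term occurs somewhere in `corpus`."""
    tf = doc.count(term) / len(doc)            # term frequency in this document
    df = sum(1 for d in corpus if term in d)   # documents containing the term
    idf = math.log(len(corpus) / df)           # rarer terms get a larger weight
    return tf * idf

score_the = tf_idf("the", docs[1], docs)  # appears in every document
score_dog = tf_idf("dog", docs[1], docs)  # appears in one document only
print(score_the, score_dog)
```

The word "the" occurs in all documents, so its score is zero, while "dog" is specific to one document and receives a positive score.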

* **tf.data.Dataset.from_tensor_slices** - The tf.data.Dataset.from_tensor_slices function is a method in TensorFlow used for creating a dataset from a given tensor. This function is commonly used for processing data in machine learning models, particularly for input data that can be represented as a tensor. The from_tensor_slices function creates a dataset where each element corresponds to a slice of the input tensor along the first dimension
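
A short sketch of slicing in-memory arrays into a batched dataset. The array contents and the batch size are illustrative.

```python
import numpy as np
import tensorflow as tf

features = np.arange(12, dtype="float32").reshape(6, 2)  # 6 samples, 2 features
labels = np.array([0, 1, 0, 1, 0, 1])

# Each dataset element is one (feature_row, label) pair sliced along axis 0
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

batch_shapes = [tuple(x.shape) for x, _ in dataset]
print(batch_shapes)
```

With 6 samples and a batch size of 4, the final batch is smaller than the others; a `.shuffle(...)` call would normally be added before `.batch(...)` for training data.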

* **tf.keras.preprocessing.image_dataset_from_directory** - a function in TensorFlow that allows you to create a dataset of images from a directory. This function automatically labels the images based on their subdirectory names and returns a TensorFlow Dataset object that can be used for training, evaluation, or prediction. This function also provides options for resizing, batching, shuffling, and splitting off a validation subset. Data augmentation is not part of this function and is applied separately, for example with Keras preprocessing layers

* **tf.keras.layers.Embedding** - a layer in TensorFlow that is used to create word embeddings in a neural network. The Embedding layer maps each integer token index in the vocabulary to a dense vector of fixed size (`output_dim`), which is learned during the training process. The Embedding layer is commonly used in natural language processing tasks, such as text classification, sentiment analysis, and machine translation
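
A minimal sketch of the shapes involved. The vocabulary size, embedding width, and token ids are all illustrative assumptions.

```python
import tensorflow as tf

# Hypothetical sizes: 1000-token vocabulary, 16-dimensional embeddings
embedding = tf.keras.layers.Embedding(input_dim=1000, output_dim=16)

# A batch of 2 sequences, each 5 integer token ids long (0 used as padding)
token_ids = tf.constant([[4, 25, 7, 0, 0], [12, 3, 0, 0, 0]])

vectors = embedding(token_ids)  # one 16-d vector per token
print(vectors.shape)
```

The vectors start out random and are updated by backpropagation like any other weights.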

* **tf.keras.Model** - a class in TensorFlow that allows you to create a more complex model by defining the input and output layers and connecting them with any number of layers in between. This class is used for more advanced models that have multiple inputs or outputs or contain shared layers

* **tf.keras.Model.predict** - a function in TensorFlow that is used to make predictions using a trained model. The predict function takes a set of input data as input and returns a set of predicted output values. The input data can be in the form of a numpy array or a TensorFlow Dataset object

* **tf.keras.preprocessing.image.ImageDataGenerator** - `tf.keras.preprocessing.image.ImageDataGenerator` is a tool in TensorFlow used for data augmentation of images. It can generate new images by applying transformations such as rotation, zooming, and flipping

* **tf.keras.preprocessing.sequence.pad_sequences** - a function in TensorFlow that is used to pad sequences of variable length with zeros or truncate them to a fixed length. The function is commonly used in natural language processing to ensure that all sequences have the same length, which is required for input to many machine learning algorithms
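
A short sketch showing padding and truncation together. The sequences and `maxlen` are made up; `padding="post"` and `truncating="post"` are choices for this example (the default for both is `"pre"`).

```python
import tensorflow as tf

sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]

padded = tf.keras.preprocessing.sequence.pad_sequences(
    sequences,
    maxlen=4,           # every output row has exactly 4 entries
    padding="post",     # add zeros at the end of short sequences
    truncating="post",  # cut long sequences from the end
)
print(padded)
```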

* **tf.keras.preprocessing.text.one_hot** - a function in TensorFlow that converts a text into a list of integer word indices. Despite its name, it does not return one-hot vectors: it splits the text into words and hashes each word to an integer index within a vocabulary of size `n`. Because hashing is used, different words may be assigned the same index. The integers can then be converted to one-hot encoded vectors using the `to_categorical` function in TensorFlow

* **time range** - In the context of machine learning and AI, time range refers to the period of time over which data is collected or analyzed. For example, if you are analyzing stock market data, the time range might be a specific year or range of years. The time range can be an important factor in machine learning, as it can affect the type and amount of data available, as well as the modeling techniques that are most appropriate. For example, if you are analyzing time series data, you might use techniques such as autoregression or recurrent neural networks to model the data over time

* **tokens** - in natural language processing, a token refers to a sequence of characters that is treated as a single unit by the model. Tokens are most often words, but can also be characters, subwords, or larger units such as phrases and sentences, depending on the level of granularity required for the analysis

* **tokenization** - a process of breaking down a text into tokens or units of meaning. Tokenization is a common pre-processing step in natural language processing, which involves splitting the text into words, phrases, or sentences, depending on the requirements of the analysis

* **VGG16** - It is a deep convolutional neural network architecture that was developed by the Visual Geometry Group at the University of Oxford. The VGG16 architecture consists of 16 layers, including 13 convolutional layers and 3 fully connected layers. It has been trained on the ImageNet dataset and is widely used for image classification and object detection tasks. The VGG16 architecture has been shown to perform well on a wide range of computer vision tasks and is a popular choice for transfer learning

* **visualizing the data** - Visualizing the data involves representing the dataset in a graphical format, such as scatter plots, histograms, and correlation matrices. This helps in identifying trends, patterns, and relationships that are not easily visible in the raw data

* **window size** - Window size refers to the number of consecutive data points included in each input sequence (window) during training or inference. In time series analysis, window size defines the length of the sliding window that is moved along the time axis to create overlapping sequences of data. Choosing an appropriate window size can be important in machine learning, as it can affect the model's ability to capture patterns and dependencies in the data. In TensorFlow, window size can be set using the `sequence_length` argument of functions such as `tf.keras.preprocessing.timeseries_dataset_from_array()`
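
A minimal NumPy sketch of a sliding window over a toy series. The `window_size` and `stride` values are illustrative, and this is a hand-rolled version rather than the TensorFlow utility itself.

```python
import numpy as np

series = np.arange(10)  # toy series: 0, 1, ..., 9
window_size = 4         # points per window
stride = 2              # how far the window moves between consecutive windows

# Start positions where a full window still fits inside the series
starts = range(0, len(series) - window_size + 1, stride)
windows = np.stack([series[s : s + window_size] for s in starts])

print(windows)
```

With a stride smaller than the window size, consecutive windows share data points; here each window overlaps the previous one by two values.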