

#  <span style="color:#0b186c;">Introduction to DL: Building Networks in TensorFlow</span>

---

“Deep learning is a subset of machine learning that's based on artificial neural networks. The learning process is deep because the structure of artificial neural networks consists of multiple input, output, and hidden layers. Each layer contains units that transform the input data into information that the next layer can use for a certain predictive task. Thanks to this structure, a machine can learn through its own data processing.” - Microsoft, 2022


<br></br>

## <span style="color:#0b186c;">Table of Contents:</span>
* [Artificial Neural Networks (ANNs)](#first-bullet)
* [Hello World](#second-bullet)
* [Dataset Information](#third-bullet)
* [Sequential Model](#fourth-bullet)
* [Conclusion](#fifth-bullet)

#  <span style="color:#0b186c;">Artificial Neural Networks (ANNs)</span><a class="anchor" id="first-bullet"></a>

---


Artificial Neural Networks (ANNs) are composed of artificial neurons, or nodes, that mimic the learning paths of the human brain. The ANN can contain any number of neurons organized in the form of any number of interconnected layers. The first layer, the input layer, contains nodes representing the features in the dataset used to train the model. The final layer, or output layer, indicates the dimensionality of the target variable. The layers between the input and output are hidden layers containing connections modeled as weights, which represent the neuron's interpretation of feature importance in predicting the output value. ANNs also use a bias, a constant parameter that helps adjust the weighted sum of the inputs to the neuron for the best fit on the data. Lastly, activation functions are non-linear transformations that define the output of a neuron before sending it to the next layer of neurons or finalizing the output determination of the model.
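
To make the roles of weights, bias, and activation function concrete, here is a minimal NumPy sketch of a single neuron. The input values, weights, bias, and the choice of `tanh` as the activation are all hypothetical and chosen only for illustration; they are not taken from any trained model.

```python
import numpy as np

def neuron_output(inputs, weights, bias, activation=np.tanh):
    """Weighted sum of the inputs plus the bias, passed through an activation."""
    weighted_sum = np.dot(inputs, weights) + bias
    return activation(weighted_sum)

# Hypothetical values for one neuron with three input features
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([0.5, -1.0, 0.5])   # one weight per input feature
bias = 0.5                             # constant offset added to the weighted sum

output = neuron_output(inputs, weights, bias)
print(output)
```

Here the weighted sum is 0.5 - 2.0 + 1.5 = 0.0, the bias shifts it to 0.5, and the activation squashes that value before it would be passed to the next layer.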


There are many different libraries associated with Deep Learning in Python; however, PyTorch and TensorFlow are two of the most common open-source frameworks used to create Neural Networks. While TensorFlow is an end-to-end open-source library for ML and DL, its high-level interface for building models is a library called Keras. There are two primary ways to create a Keras model:

- The Functional API, which is the easy-to-use, fully-featured API supporting arbitrary model architectures.
- The Sequential model, which is a limited single-input, single-output stack of layers (ordered sequentially).


## <span style="color:#0b186c;">Required Imports:</span>

<div class="alert alert-warning">

<b>Note:</b> If you have not previously installed these `packages`, you can use the cell below to perform the required `pip` installs.

</div>

In [None]:
# In case you still need to perform some pip installs:
! pip install --user pandas -q
! pip install --user numpy -q
! pip install --user matplotlib seaborn -q
! pip install --user scikit-learn -q
! pip install --user tensorflow -q

In [None]:
# Dataframe and array libraries
import pandas as pd
import numpy as np

# Libraries for visualizing data
import matplotlib.pyplot as plt
import seaborn as sns

# Retrieves the dataset from Scikit-learn
from sklearn.datasets import load_iris

# Required for performing standardization
from sklearn.preprocessing import StandardScaler

# Required for training and validating a model
from sklearn.model_selection import train_test_split

# Required for instantiating and building Sequential neural networks
import tensorflow as tf
from tensorflow import keras

# Classification metrics and confusion matrix
from sklearn.metrics import confusion_matrix, accuracy_score, ConfusionMatrixDisplay

# Filters out warning messages
import warnings
warnings.filterwarnings('ignore')

#  <span style="color:#0b186c;">Hello World</span><a class="anchor" id="second-bullet"></a>

---

Neural networks learn how to appropriately weight the feature importance of input variables to be mapped to an output through an iterative learning process. The mathematical relationship between the various input variables and the output variable is approximated by the neural network model and is used to make predictions on future values. The simplest form of a Neural Network is a single neuron with only one interconnected layer. In Keras, the word for a simple, interconnected layer is `Dense`. 

In the cells below, we instantiate an array of `x` values that have a straightforward, mathematical relationship to their corresponding `y` values. Next, we build a single-layered `model` with one `unit` (neuron). We then `compile` the model by defining a `loss` and `optimizer` function. Since the model doesn't know the relationship between the x and y values, it has to guess. The loss function defines how the effectiveness of the model's guess is evaluated and scored by comparing the guessed answers with the known correct answers in the dataset. After each guess, the optimizer defines the logic used to update the weights learned by the model to minimize the loss function. This process continues iteratively for the defined number of `epochs` in the `fit` function.
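
The guess, score, and update loop described above can be sketched in plain NumPy. This is only an illustration of the idea, not what Keras runs internally: the learning rate of 0.01 and the 5000 epochs are assumed values, and the gradients are written out by hand for a single weight `w` and bias `b`.

```python
import numpy as np

# Toy data following y = 10x - 1
xs = np.arange(10, dtype=float)
ys = 10.0 * xs - 1.0

w, b = 0.0, 0.0          # initial guess for the weight and bias
learning_rate = 0.01     # assumed step size
epochs = 5000            # assumed; enough passes for the fit to settle

for _ in range(epochs):
    preds = w * xs + b                    # the model's current guess
    error = preds - ys
    loss = np.mean(error ** 2)            # mean squared error (the score)
    grad_w = 2.0 * np.mean(error * xs)    # derivative of the loss w.r.t. w
    grad_b = 2.0 * np.mean(error)         # derivative of the loss w.r.t. b
    w -= learning_rate * grad_w           # optimizer step: move against the gradient
    b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss:.6f}")
print(f"prediction for x = 10: {w * 10 + b:.3f}")
```

With enough epochs, `w` and `b` approach 10 and -1; with fewer epochs the estimates are close but visibly off, which is why a short training run gives an approximate prediction.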

In [None]:
# First, let's create some data with an easily discernible pattern
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], dtype=float)
ys = np.array([-1.0, 9.0, 19.0, 29.0, 39.0, 49.0, 59.0, 69.0, 79.0, 89.0], dtype=float)

# Build a simple Sequential model
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])

# Compile the model
model.compile(optimizer='sgd', loss='mean_squared_error')

# Train the model
model.fit(xs, ys, epochs=100)

In [None]:
# Make a prediction (pass a NumPy array; newer Keras versions reject plain Python lists)
print(model.predict(np.array([10.0])))

<div class="alert alert-warning">

It is important to remember that a Neural Network learns an approximation rather than an exact rule. Based on the input data provided, the model estimates the weight and bias that best map x to y. While we can identify the relationship between these 10 data points mathematically (y = 10x - 1, so x = 10 should give 99), the model does not deal in certainties. Therefore, the output of our prediction may be very close to 99, but not necessarily the exact answer.

</div>

#  <span style="color:#0b186c;">Dataset Information</span><a class="anchor" id="third-bullet"></a>

---

We will be using a dataset containing 3 species in the Iris genus, namely, Iris Setosa, Iris Versicolor and Iris Virginica found in the Gaspé Peninsula. For the purposes of an integral study, the collected Iris samples were, "all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus." The dataset contains 150 rows of data, 50 rows for each species of Iris flower. The four feature columns record the sepal length, sepal width, petal length, and petal width in centimeters, and the `target` column encodes the species as 0, 1, or 2.

Our target dataset can be found in the Scikit-learn library, so we will be importing it directly from the library and storing it into a Pandas dataframe.

https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html

In [None]:
# Import the iris dataset
iris = load_iris(as_frame=True)

# Place the dataset into a dataframe
df = iris.frame 

# View the first 10 records in the dataset
df.head(10)



A sepal is a part of the flower of angiosperms. Usually green, sepals typically function as protection for the flower in bud, and often as support for the petals when in bloom. Petals are modified leaves that surround the reproductive parts of flowers. They are often brightly colored or unusually shaped to attract pollinators. 



The petal and sepal measurement values allow us to look for patterns related to the specific species of Iris flower.


<div class="alert alert-info">
   
We can use the `.info()` method for our dataframe to view a concise summary of the information contained within. This includes the number of observations, columns and data types, and any missing values.
    
 </div>

In [None]:
df.info()

<div class="alert alert-info">
   
For numerical features in the dataframe, we can use the `.describe()` method to view relevant statistical information about each of the features. Understanding these values can assist in identifying the presence of outliers.
    
 </div>
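
As a rough illustration of how the quartiles reported by `.describe()` can point to outliers, the sketch below applies the common 1.5 × IQR rule of thumb. The rule itself and the sample values are assumptions made for this example; they are not part of the Iris data or of the `.describe()` output.

```python
import numpy as np

# Hypothetical measurements with one suspiciously large value
values = np.array([5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 9.7])

# The 25% and 75% rows of .describe() correspond to these quartiles
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Assumed rule of thumb: flag points more than 1.5 * IQR beyond the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
print("Flagged values:", outliers)
```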

In [None]:
df.describe()

<div class="alert alert-info">
   
Additionally, we can use `pairplot()` from the `seaborn` library to visualize a scatter matrix of the independent variables. We can color-code the plotted points based on the `target` feature to identify any discernible patterns in the measurement values.
    
 </div>

In [None]:
# Set the figure size
sns.set(rc={'figure.figsize':(12,8),'ytick.labelsize':(12)})

# Create a pairplot
sns.pairplot(df, hue = "target", palette = "Set2")

<div class="alert alert-info">
   
Our dataset contains an equal number of observations for each of the Iris flowers. We can visualize the target variable distributions with a pie chart:
    
</div>

In [None]:
# Create a pie chart for the target variable
df.target.value_counts().plot(kind='pie', figsize=(8, 8), fontsize=10, autopct='%1.0f%%')
plt.title("Target Variable Distribution", fontsize = 20)
plt.show()

<div class="alert alert-info">
   
Lastly, we can use the `.corr()` method on our dataframe to identify linear relationships between the independent variables and the dependent variable. This also helps identify collinearity that may exist amongst the independent variables. The correlation matrix can be enhanced by using `heatmap()` from the `seaborn` library, which scales the specified hue based on the strength of the linear relationship.
    
</div>
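
For intuition about what each cell of the correlation matrix contains, here is a small sketch of the Pearson correlation coefficient written out by hand and checked against `np.corrcoef`. Pearson is the default method of `.corr()`; the six measurement pairs below are hypothetical values, not rows copied from the dataset.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation: covariance of a and b scaled by their spreads."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = np.sum(a_centered * b_centered)
    denominator = np.sqrt(np.sum(a_centered ** 2) * np.sum(b_centered ** 2))
    return numerator / denominator

# Hypothetical petal measurements (cm) for six flowers
petal_length = np.array([1.4, 1.3, 4.7, 4.5, 6.0, 5.1])
petal_width = np.array([0.2, 0.2, 1.4, 1.5, 2.5, 1.9])

r = pearson_r(petal_length, petal_width)
print(f"Pearson r = {r:.2f}")
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little or none.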

In [None]:
# Set the figure size
sns.set(rc={'figure.figsize':(12,8),'ytick.labelsize':(12)})

# Use the corr method to create the correlation matrix
correlation_matrix = df.corr().round(2)

# Create a heatmap based on the strength of the linear relationship
sns.heatmap(data = correlation_matrix, annot = True, cmap = "Blues")
plt.title("Variable Correlation Heatmap\n", fontsize = 20)
plt.show()

#  <span style="color:#0b186c;">Sequential Model</span><a class="anchor" id="fourth-bullet"></a>

---

Expanding on our initial discussion of the `Sequential` model from TensorFlow, we can modify the layer architecture to create much more adaptable and robust solutions for more complex problems. While the Iris dataset is not necessarily complex, it is slightly more involved than a single input x value and a strong, linearly related y value. Therefore, we will add a couple more layers to our layer architecture and introduce activation functions.

<div class="alert alert-info">

The `ReLU` (Rectified Linear Unit) activation function is one of the most widely used activation functions in neural networks. It is a non-linear function that outputs the input value to the neuron directly if it is positive; otherwise, it outputs 0 and the neuron does not get activated.
    
The `softmax` activation function is used for neural networks that predict a multinomial probability distribution. It is suited to cases where the dependent variable contains more than 2 class labels (multi-class classification).

</div>
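
Both activation functions are short enough to write out directly. The NumPy sketch below is for illustration only (Keras provides its own implementations), and the input vectors are hypothetical.

```python
import numpy as np

def relu(z):
    """Pass positive values through unchanged and replace negatives with 0."""
    return np.maximum(0.0, z)

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)        # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical pre-activation values for a hidden layer
hidden = relu(np.array([-2.0, 0.0, 3.0]))

# Hypothetical raw scores for the three Iris classes
probs = softmax(np.array([2.0, 1.0, 0.1]))

print("ReLU output:", hidden)
print("Softmax output:", probs, "sum =", probs.sum())
```

The largest raw score receives the largest probability, which is what lets the output layer be read as a class prediction.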

In [None]:
# Split the independent (X) and dependent (y) variables
X = df.iloc[:, :-1]
y = df.iloc[:, -1].values

# Split the data into an 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

# Output the shape of the training set
X_train.shape

In [None]:
# Build Sequential model using Dense layers
model = tf.keras.models.Sequential([keras.layers.Dense(4, input_shape=[4]),
                                    keras.layers.Dense(16, activation=tf.nn.relu),
                                    keras.layers.Dense(3, activation=tf.nn.softmax)])
model.summary()

<div class="alert alert-warning">

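Since our output layer now uses `softmax` to produce class probabilities, we must also change the `loss` function to one suited to multi-class classification. `sparse_categorical_crossentropy` accepts the integer class labels stored in `target`; we also switch the `optimizer` to `adam`, a common default choice.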

</div>
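
As a sketch of what this loss measures, the NumPy function below computes the mean negative log of the probability assigned to each true class. It is a simplified illustration rather than the Keras implementation; the clipping constant `eps` and the example probabilities are assumptions.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability of the true class (integer labels)."""
    y_prob = np.clip(y_prob, eps, 1.0)                 # avoid log(0)
    picked = y_prob[np.arange(len(y_true)), y_true]    # probability of each true class
    return -np.mean(np.log(picked))

# Hypothetical integer labels and softmax outputs for two flowers
y_true = np.array([0, 2])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])

loss = sparse_categorical_crossentropy(y_true, y_prob)
print(f"loss = {loss:.4f}")
```

The loss shrinks toward 0 as the model assigns more probability to the correct class, and grows quickly when it is confidently wrong.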

In [None]:
# Compile the model with appropriate optimizer and loss functions
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the neural network for 100 epochs
model.fit(X_train, y_train, epochs=100)

In [None]:
# Plot the model performance over epochs
losses = pd.DataFrame(model.history.history)
losses.plot()

In [None]:
# Make predictions based on the X values in the test set
y_pred = model.predict(X_test)

y_pred

In [None]:
# Convert probabilities to labels, compare with true values
y_pred = np.argmax(y_pred, axis=-1) 

# Calculate the accuracy score of the test set
score = round((accuracy_score(y_test, y_pred) * 100), 2)

# Plot the confusion matrix for the predictions
cm = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(cm)
cm_display = cm_display.plot(include_values=True, cmap='Blues', ax=None, xticks_rotation='horizontal')
plt.grid(False)
plt.show()

# Print the accuracy score on the test data
print(f"Accuracy = {score}%")
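
To show what the evaluation step computes, here is a small NumPy sketch of converting probabilities to labels, scoring accuracy, and tallying a confusion matrix by hand. The probabilities and true labels are hypothetical; in the notebook these come from `model.predict(X_test)` and `y_test`.

```python
import numpy as np

# Hypothetical softmax outputs for four flowers (one row per flower)
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.3, 0.6],
                  [0.3, 0.4, 0.3]])
true_labels = np.array([0, 1, 2, 2])

# Pick the class with the highest probability in each row
pred_labels = np.argmax(probs, axis=-1)

# Accuracy is the fraction of predictions that match the true labels
accuracy = np.mean(pred_labels == true_labels)

# Confusion matrix: rows are true classes, columns are predicted classes
n_classes = probs.shape[1]
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(true_labels, pred_labels):
    cm[t, p] += 1

print("Predicted labels:", pred_labels)
print(f"Accuracy = {accuracy * 100:.2f}%")
print(cm)
```

Off-diagonal entries of the matrix show which species are being confused with one another.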

#  <span style="color:#0b186c;">Conclusion</span><a class="anchor" id="fifth-bullet"></a>

---