<center>
<table>
  <tr>
    <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/nasa-logo.svg" width="100"/> </td>
     <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/ASTG_logo.png?raw=true" width="80"/> </td>
     <td> <img src="https://www.nccs.nasa.gov/sites/default/files/NCCS_Logo_0.png" width="130"/> </td>
    </tr>
</table>
</center>

        
<center>
<h1><font color= "blue" size="+3">ASTG Python Courses</font></h1>
</center>

---

<center>
    <h1><font color="red">Introduction to TensorFlow</font></h1>
</center>

## Useful Reference

- <a href="https://www.mygreatlearning.com/blog/what-is-tensorflow-machine-learning-library-explained/">What is TensorFlow? The Machine Learning Library Explained</a>
- <a href="https://www.tensorflow.org/tutorials/keras/regression">Basic regression: Predict fuel efficiency</a>
- <a href="https://stackabuse.com/tensorflow-2-0-solving-classification-and-regression-problems/">Tensorflow 2.0: Solving Classification and Regression Problems</a>
- <a href="https://www.toptal.com/machine-learning/tensorflow-machine-learning-tutorial">Getting Started with TensorFlow: A Machine Learning Tutorial</a>
- <a href="https://sebastianraschka.com/faq/docs/tensorflow-vs-scikitlearn.html">What is the main difference between TensorFlow and scikit-learn?</a>
- <a href="https://adventuresinmachinelearning.com/python-tensorflow-tutorial/">Python TensorFlow Tutorial – Build a Neural Network</a>
- <a href="https://steadforce.com/en/first-steps-tensorflow-part-3/">A simple neural network with TensorFlow</a>

# <font color="red">What is TensorFlow?</font>
- TensorFlow is an open-source library for numerical computation and large-scale machine learning, developed by the Google Brain team. It eases the process of acquiring data, training models, serving predictions, and refining future results.
- TensorFlow bundles together Machine Learning and Deep Learning models and algorithms.
- The name `TensorFlow` is derived from the operations which neural networks perform on multidimensional data arrays or `tensors`! It’s literally a flow of tensors.
- It uses Python to provide a convenient front-end API for building applications with the framework, while executing those applications in high-performance C++.
- TensorFlow can train and run neural networks for applications such as handwritten digit classification, image recognition, word embeddings, sequence-to-sequence models for machine translation, natural language processing, and partial differential equations based simulations.
- TensorFlow supports both CPU and GPU computing devices.

#### How Does it Work?

- TensorFlow allows developers to create a graph of computations to perform. 
- Nodes in the graph represent mathematical operations (add, subtract, multiply, etc.).
- Connections (edges) represent the data, usually multidimensional data arrays (tensors), that flow between the nodes.
- Once you have the graph, the execution can be enabled either on regular CPUs or GPUs, or distributed across several of them so that the processing becomes much faster.

#### First Example of TensorFlow Graph

Consider the expression:
<center>
    a = (b + c) * (c + 2)
</center>
We can break this down into:
<center>
    d = b + c
    
    e = c + 2
    
    a = d * e
</center>
Now we can represent these operations graphically as:

![fig_gr1](https://i1.wp.com/adventuresinmachinelearning.com/wp-content/uploads/2017/03/Simple-graph-example.png)
Image Source: adventuresinmachinelearning.com

Note that the operations `d = b + c` and `e = c + 2` do not depend on each other, so they can be performed in parallel. This is what makes it possible to distribute such calculations across CPUs and GPUs.
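
As a rough illustration of the idea (plain Python rather than TensorFlow's own graph machinery), the sketch below stores the three operations as nodes of a small dictionary-based graph and evaluates `a` by first evaluating the nodes it depends on. The names `graph` and `evaluate`, and the sample inputs `b = 3` and `c = 4`, are assumptions made for this example.

```python
import operator

# Each node stores an operation and the names of its two inputs.
# "two" is a hypothetical name for the constant 2 in e = c + 2.
graph = {
    "d": (operator.add, "b", "c"),
    "e": (operator.add, "c", "two"),
    "a": (operator.mul, "d", "e"),
}

def evaluate(node, values):
    """Evaluate a node recursively, caching every result in `values`."""
    if node not in values:
        op, left, right = graph[node]
        values[node] = op(evaluate(left, values), evaluate(right, values))
    return values[node]

inputs = {"b": 3.0, "c": 4.0, "two": 2.0}   # assumed sample inputs
a = evaluate("a", dict(inputs))
print(a)  # (3 + 4) * (4 + 2) = 42.0
```

Nodes `d` and `e` only read from the inputs, which is why a real framework is free to schedule them at the same time.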

#### Second Example of TensorFlow Graph

The graph below shows the computational graph of a three-layer neural network.
The data flowing between the nodes in the animation are tensors, i.e. multidimensional data arrays.


![fig_gr2](https://i1.wp.com/adventuresinmachinelearning.com/wp-content/uploads/2017/03/TensorFlow-data-flow-graph.gif)


![fig_ml_steps](https://res.cloudinary.com/hevo/images/f_auto,q_auto/v1627535513/hevo-learn/Machine-Lerning-in-Data-Science-4/Machine-Lerning-in-Data-Science-4.png?_i=AA)
Image Source: learncloudbits.com

### Load the modules

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
import seaborn as sns

In [None]:
from sklearn.model_selection import train_test_split
from sklearn import metrics

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras import Model

In [None]:
print(f"Numpy version:      {np.__version__}")
print(f"Pandas version:     {pd.__version__}")
print(f"Seaborn version:    {sns.__version__}")
print(f"TensorFlow version: {tf.__version__}")

## <font color="blue">Problem Statement</font>

We consider the function: <br>
$$
f(x,y) = (1-(x^2 + y^3))e^{-\frac{1}{2}(x^2 + y^2)}
$$
<br>
defined in the domain $D=[-3,3] \times [-3,3]$.
<OL>
<LI> We randomly select $n$ points in the domain $D$ and evaluate the function at those points to create a (training) dataset containing $n$ point/value pairs.
<LI> We use the dataset for training a ML algorithm.
<LI> We generate a uniform set of points (testing set) in $D$ to test the algorithm.
</OL>
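
A minimal sketch of the third step, assuming a hypothetical resolution of 25 points per axis: `np.meshgrid` builds a uniform grid over $D$, and the function is evaluated on every grid point. The names `f`, `n_grid`, `grid_points`, and `grid_values` are chosen for this example only.

```python
import numpy as np

def f(x, y):
    """The function from the problem statement."""
    return (1 - (x**2 + y**3)) * np.exp(-(x**2 + y**2) / 2)

n_grid = 25                                  # assumed number of points per axis
axis = np.linspace(-3.0, 3.0, n_grid)        # uniform spacing on [-3, 3]
xx, yy = np.meshgrid(axis, axis)             # 2D coordinate arrays

# One row per grid point, columns are (x, y)
grid_points = np.column_stack([xx.ravel(), yy.ravel()])
grid_values = f(grid_points[:, 0], grid_points[:, 1])

print(grid_points.shape, grid_values.shape)  # (625, 2) (625,)
```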

## <font color="blue">Generating the Data</font>

#### Define the Function

In [None]:
def ff(x,y):
    return (1-(x**2+y**3))*np.exp(-(x**2+y**2)/2)

#### Create the Data

- We want to create $50\times 30=1500$ random points in the domain $[-3,3] \times [-3,3]$.

In [None]:
num_dims = 2
nx = 50
ny = 30
num_points = nx * ny

# Boundary of the domain
a_min = -3.0
a_max = 3.0

In [None]:
X = np.random.uniform(a_min, a_max, (num_points, num_dims))

In [None]:
X.shape

In [None]:
X[0:9,:]

We determine the value of the function:

In [None]:
z = ff(X[:,0], X[:,1])

In [None]:
z.shape

In [None]:
z[0:9]

Now we can create a Pandas DataFrame:

In [None]:
data = pd.DataFrame({"x": X[:,0], "y": X[:,1], 
                           "TargetValues": z[:]})

In [None]:
data.head(9)

## <font color="blue">Data Gathering and Basic Analyses</font>

#### Splitting the data into training and testing sets
- We split the data into training and testing sets. 
- We train the model with 80% of the samples and test with the remaining 20%. 
- We do this to assess the model’s performance on unseen data.

In [None]:
xy_df = data.drop('TargetValues', axis = 1)
z_df = data['TargetValues']

In [None]:
xy_df

In [None]:
z_df

In [None]:
X_train, X_test, y_train, y_test = train_test_split(xy_df, 
                                                    z_df, 
                                                    test_size=0.2, 
                                                    random_state=42)

In [None]:
print(f"Train features shape: {X_train.shape}")

In [None]:
print(f"Test features shape: {X_test.shape}")

In [None]:
y_train

#### Plot the data to be trained

In [None]:
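# add_subplot replaces gca(projection=...), which recent Matplotlib versions no longer accept
threedee = plt.figure().add_subplot(projection='3d');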
threedee.scatter(X_train['x'], X_train['y'], 
                 y_train);
threedee.set_xlabel('x');
threedee.set_ylabel('y');
threedee.set_zlabel('f(x,y)');
plt.show();

In [None]:
# `fill` replaces the deprecated `shade` argument
sns.kdeplot(x=X_train['x'], y=X_train['y'], 
            cmap="Blues", fill=True, bw_adjust=.5)

#### Check the overall statistics

In [None]:
stats_train = X_train.describe()
stats_train

#### Display the joint distribution of the columns from the training set

In [None]:
sns.pairplot(X_train, diag_kind="kde");

<font color="blue">Add noise in the training targets</font>

- The function `ff` is smooth.
- We want to add noise to the targets.
- We model the noise as a Gaussian (normal) distribution with mean `noise_mean` and standard deviation `noise_std`.

In [None]:
n_train = y_train.shape[0]
n_train

In [None]:
y_train

In [None]:
noise_mean = 0.0
noise_std  = 1.0e-2
noise = np.random.normal(noise_mean, noise_std, n_train)
noise.shape

In [None]:
#y_train = y_train + noise

In [None]:
y_train

## <font color="blue">Normalize the Data</font>

- In general, variables may not be on a similar scale. Features with large values would then carry more weight in any distance-based calculation. 
- It is good practice to normalize features that use different scales and ranges. 
- Although the model might converge without feature normalization, it makes training more difficult, and it makes the resulting model dependent on the choice of units used in the input.

In [None]:
stats_train = X_train.describe().transpose()
stats_train

In [None]:
def normalize_data(x):
    """
       Normalize the data
    """
    return (x - stats_train['mean']) / stats_train['std']

**Normalize the data that will be used to train the model**

In [None]:
X_train_normed = normalize_data(X_train)

In [None]:
X_train_normed

**We also need to normalize the test dataset by projecting it into the same distribution that the model has been trained on**

In [None]:
X_test_normed = normalize_data(X_test)

<font color="blue">**The same normalization will have to be applied to any other data used in this model.**</font>
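
The point about reusing the training statistics can be sketched with NumPy alone. The arrays below are randomly generated stand-ins for the training and test features (an assumption for this example), and `ddof=1` is used so that the standard deviation matches the one reported by pandas' `describe()`.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.uniform(-3.0, 3.0, size=(1200, 2))   # stand-in for X_train
test = rng.uniform(-3.0, 3.0, size=(300, 2))     # stand-in for X_test

# Statistics come from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0, ddof=1)

train_normed = (train - mean) / std
test_normed = (test - mean) / std                # same mean and std as for training

print(train_normed.mean(axis=0), train_normed.std(axis=0, ddof=1))
print(test_normed.mean(axis=0), test_normed.std(axis=0, ddof=1))
```

The normalized training set has mean 0 and standard deviation 1 by construction, while the test set is only approximately standardized, which is expected.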

## <font color="blue">Build the Model</font>

#### Determining the Parameters of the Neural Network

- We have two (2) features and one (1) target.
- The Neural Network will have $ni=2$ input neurons and $no=1$ output neuron.
- In each hidden layer:
   - We want the number of hidden neurons to be less than twice the size of the input layer.
   - We select three (3) neurons in the first hidden layer and three (3) in the second.

#### Instantiate a sequential model using `keras`
- `keras` is TensorFlow's high-level API for building and training deep learning models. It's used for fast prototyping, state-of-the-art research, and production.
- <font color="red">The sequential model is the simplest model to use, especially when getting started.</font>
- It involves defining a Sequential class and adding layers to the model one by one in a linear manner, from input to output.
- The model needs to know what input shape (`input_shape`) it should expect. The first layer of the `Sequential` model needs to receive the information.

In the model below:

- The model expects rows of data with `num_shape` variables (the `input_shape=[num_shape]` argument).
- The first hidden layer has 3 nodes and uses the `relu` activation function.
- The second hidden layer has 3 nodes and uses the `relu` activation function.
- The output layer has one node and uses no activation function.

The rectified linear activation function (`relu`) is a piecewise linear function that outputs the input directly if it is positive; otherwise, it outputs zero. 
- Because rectified linear units are nearly linear, they preserve many of the properties that make linear models easy to optimize with gradient-based methods. They also preserve many of the properties that make linear models generalize well.
- It has become the default activation function for many types of neural networks because a model that uses it is easier to train and often achieves better performance.
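
For illustration, `relu` can be written in one line of NumPy. This is a sketch of the mathematical definition, not the code TensorFlow runs internally, and the sample values in `x` are arbitrary.

```python
import numpy as np

def relu(x):
    """Return x where x is positive and 0 elsewhere."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])   # arbitrary sample inputs
print(relu(x))                               # negative entries become 0
```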

In [None]:
num_shape = len(X_train.keys())
num_nodes = 3
num_output = 1

model = keras.Sequential([
             layers.Dense(num_nodes, activation=tf.nn.relu, 
                          input_shape=[num_shape]),
             layers.Dense(num_nodes, activation=tf.nn.relu),
             layers.Dense(num_output) ])

The above model creation can also be written as:

```python
model = keras.Sequential()
model.add(layers.Dense(num_nodes, activation=tf.nn.relu, 
                       input_shape=[num_shape]))
model.add(layers.Dense(num_nodes, activation=tf.nn.relu))
model.add(layers.Dense(num_output))
```

Dense layers represent a function that maps the input tensor `x` to an output tensor `y` via the equation `y = Ax + b` where `A` (the kernel) and `b` (the bias) are parameters of the dense layer.

![nn](images/tensorflow_nn.png)
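
The dense-layer equation can be checked by hand on a tiny case. The sketch below uses made-up values for the kernel `A` and bias `b` of a layer with 2 inputs and 3 outputs. It follows the Keras convention of storing the kernel with shape `(inputs, outputs)` and one sample per row, so the product is written `x @ A`.

```python
import numpy as np

# Hypothetical parameters of a Dense layer with 2 inputs and 3 outputs
A = np.array([[1.0, 0.0, -1.0],
              [2.0, 1.0,  0.0]])     # kernel, shape (2, 3)
b = np.array([0.5, -0.5, 0.0])       # bias, shape (3,)

x = np.array([[1.0, 2.0]])           # one sample with two features

y = x @ A + b                        # affine map of the dense layer
y_relu = np.maximum(0.0, y)          # what a relu activation would then return

print(y)        # [[ 5.5  1.5 -1. ]]
print(y_relu)   # [[5.5 1.5 0. ]]
```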

#### Compile the model
- Once you have specified the architecture of the network, you need to specify the method for back-propagation by choosing an optimizer and specify the loss.
- Compiling the model configures it for training, using TensorFlow's efficient numerical libraries in the background.

Define the optimizer:

In [None]:
optimizer = tf.keras.optimizers.RMSprop(0.001)

We are required to provide a loss function and an optimizer: 
- We are asking the network to use the `rmsprop` optimizer to change weights in such a way that the loss `mse` (mean squared error) is minimized at each iteration.
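
To give a feel for how such an optimizer changes a weight, here is a sketch of the textbook RMSprop update applied to a hypothetical one-parameter loss $L(w) = w^2$. The function `rmsprop_step` and the values `lr=0.01`, `rho=0.9`, and `eps=1e-7` are assumptions for this example; the Keras implementation offers further options and may differ in detail.

```python
import numpy as np

def rmsprop_step(w, grad, v, lr=0.01, rho=0.9, eps=1e-7):
    """One textbook RMSprop update; returns the new weight and running average."""
    v = rho * v + (1.0 - rho) * grad**2      # running average of squared gradients
    w = w - lr * grad / (np.sqrt(v) + eps)   # step size scaled by gradient magnitude
    return w, v

# Hypothetical loss L(w) = w**2, whose gradient is 2 * w
w, v = 1.0, 0.0
for _ in range(500):
    w, v = rmsprop_step(w, 2.0 * w, v)

print(f"w after 500 steps: {w:.4f}")         # close to the minimizer w = 0
```

Because each step is divided by the running root-mean-square of the gradient, the step length stays on the order of the learning rate even when the raw gradient is large or small.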

In [None]:
model.compile(loss = 'mse',
              optimizer = optimizer,
              metrics = ['mae', 'mse'])
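
For reference, the two quantities named above can be computed by hand. The sketch below uses three made-up target/prediction pairs; `mse` is the mean squared error used as the loss, and `mae` is the mean absolute error tracked as an additional metric.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])   # made-up targets
y_pred = np.array([1.5, 2.0, 2.0])   # made-up predictions

errors = y_pred - y_true             # [0.5, 0.0, -1.0]
mse = np.mean(errors**2)             # (0.25 + 0 + 1) / 3
mae = np.mean(np.abs(errors))        # (0.5 + 0 + 1) / 3

print(f"mse = {mse:.4f}, mae = {mae:.4f}")
```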

#### Inspect the model

`model.summary()` is a useful method if you want to get an overview of your model and see the total number of parameters.
It prints:

- Name and type of all layers in the model.
- Output shape for each layer.
- Number of weight parameters of each layer.
- If the model has general topology, the inputs each layer receives.
- The total number of trainable and non-trainable parameters of the model.



In [None]:
model.summary()

In [None]:
import tensorflow.keras.backend as K

trainable_count = np.sum([K.count_params(w) for w in model.trainable_weights])
non_trainable_count = np.sum([K.count_params(w) for w in model.non_trainable_weights])

print('Total params: {:,}'.format(trainable_count + non_trainable_count))
print('Trainable params: {:,}'.format(trainable_count))
print('Non-trainable params: {:,}'.format(non_trainable_count))

[Let](https://towardsdatascience.com/counting-no-of-parameters-in-deep-learning-models-by-hand-8f1716241889):

- **i**: input size (2 in this case)
- **h**: size of hidden layers (3, 3 here)
- **o**: output size (1 in this case)

We have:
 
 $$
 \begin{align*}
 num\_params &= connections\_between\_layers + biases\_in\_every\_layer \\
             &= (i \times h + h \times h + h \times o) + (h + h + o) \\
             &= (2 \times 3 + 3 \times 3 + 3 \times 1) + (3 + 3 + 1) \\
             &= (2 \times 3 + 3) + (3 \times 3 + 3) + (3 \times 1 + 1) \\
             &= 9 + 12 + 4 \\
             &= 25
 \end{align*}
 $$
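
The same count can be reproduced in a few lines of Python. The helper `count_dense_params` is a hypothetical name introduced for this sketch; it assumes fully connected layers with one bias per output neuron.

```python
def count_dense_params(layer_sizes):
    """Sum (inputs * outputs + outputs) over consecutive pairs of layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# 2 inputs, two hidden layers of 3 neurons, 1 output
total = count_dense_params([2, 3, 3, 1])
print(total)  # 9 + 12 + 4 = 25
```

The result should agree with the total reported by `model.summary()`.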
 
 
 

               
   Schematically, the layers counted above are:
   <br>
   input = **Input**((None, 2))
   <br>
   dense = **Dense**(3)(input)
      <br>
   dense = **Dense**(3)(dense)
    <br>
  output = **Dense**(1)(dense)
   <br>
   model = Model(input, output)

#### Try the model

Take a batch of 10 samples from the training data and call `model.predict` on it.

In [None]:
example_batch = X_train_normed[:10]
example_result = model.predict(example_batch)
print(example_result)

It seems to be working, and it produces a result of the expected shape and type.

## <font color="blue">Train the Model</font>

Training occurs over epochs and each epoch is split into batches.

- **Epoch**: One pass through all of the rows in the training dataset.
- **Batch**: One or more samples considered by the model within an epoch before updating the internal model parameters (weights).
- One epoch consists of one or more batches, depending on the chosen batch size, and the model is fit for many epochs. 
- The model is "fit" to the training data using the `fit` method. We can also specify the `batch_size` and the number of `epochs` we want training to go on.
- The callback function is applied at given stages of the training procedure. We use it to get a view on internal states and statistics of the model during training.
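
The relation between samples, batches, and epochs can be made concrete with a little arithmetic. The sketch assumes 1200 training samples (80% of the 1500 generated points), the Keras default batch size of 32 that applies when `batch_size` is not passed to `fit`, and 50 epochs. The index slicing at the end only illustrates how one epoch is cut into batches; it does not shuffle the data as `fit` does by default.

```python
import math
import numpy as np

n_samples = 1200    # 80% of the 1500 generated points
batch_size = 32     # Keras default when batch_size is not given
epochs = 50

batches_per_epoch = math.ceil(n_samples / batch_size)   # weight updates per epoch
total_updates = batches_per_epoch * epochs              # weight updates overall

# Cut the sample indices of one epoch into consecutive batches
indices = np.arange(n_samples)
batches = [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

print(batches_per_epoch, total_updates, len(batches[-1]))  # 38 1900 16
```

The last batch is smaller than the others because 1200 is not a multiple of 32.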

Train the model for `EPOCHS` epochs (50 here), and record the training loss and metrics in the `history` object.

In [None]:
# Display training progress by printing a 
# single dot for each completed epoch
class PrintDot(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Start a new line every 100 epochs
        if epoch % 100 == 0:
            print('')
        print('.', end='')

# How many times we go through the entire dataset
EPOCHS = 50

history = model.fit(X_train_normed, y_train,    
                    epochs=EPOCHS, verbose=1, 
                    callbacks=[PrintDot()])
#epochs=EPOCHS, validation_split = 0.2, verbose=0, callbacks=[PrintDot()])

#### Visualize the model's training progress

In [None]:
# Use the stats stored in the history object.
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()

In [None]:
print(history.history.keys())

In [None]:
keys = list(history.history.keys())
keys

In [None]:
def plot_history(history):
    hist = pd.DataFrame(history.history)
    hist['epoch'] = history.epoch

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Abs Error [Target]')
    plt.plot(hist['epoch'], hist[keys[1]], label='Train Error')
    plt.legend()
    plt.ylim([min(hist[keys[1]]) ,max(hist[keys[1]])])

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Square Error [$Target^2$]')
    plt.plot(hist['epoch'], hist[keys[2]], label='Train Error')            
    plt.legend()
    plt.ylim([0,max(hist[keys[2]])])

plot_history(history)

## <font color="blue">Evaluate the Model on Test Data</font>

**Compute the Scores**

In [None]:
loss, mae, mse = model.evaluate(X_test_normed, y_test, verbose=1)
#print("Testing set Mean Abs Error: {} ".format(mae))

**Make Prediction**

In [None]:
y_test_pred = model.predict(X_test_normed).flatten()

#### Model Evaluation

In [None]:
rmse = np.sqrt(np.mean((y_test - y_test_pred) ** 2))

print(f"The model performance for test set")
print(f"----------------------------------")
print(f"Root Mean Squared Error: {rmse}")

#### Do the 45-degree plot

In [None]:
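plt.scatter(y_test, y_test_pred);
plt.xlabel('True Values');
plt.ylabel('Predictions');
plt.axis('equal');
plt.axis('square');
# The target takes negative values, so derive the axis limits
# from the data instead of clipping both axes at 0.
lims = [min(y_test.min(), y_test_pred.min()),
        max(y_test.max(), y_test_pred.max())]
plt.xlim(lims);
plt.ylim(lims);
_ = plt.plot(lims, lims);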

**Error Distribution**

In [None]:
# histplot with kde=True replaces the deprecated distplot
sns.histplot(y_test_pred - y_test, kde=True);

#### Plotting Function Using Predicted Values

In [None]:
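# add_subplot replaces gca(projection=...), which recent Matplotlib versions no longer accept
threedee = plt.figure().add_subplot(projection='3d');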
threedee.scatter(X_test['x'], X_test['y'], y_test_pred);
threedee.set_xlabel('x');
threedee.set_ylabel('y');
threedee.set_zlabel('f(x,y)');
plt.show();