<center>
<table>
  <tr>
    <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/nasa-logo.svg" width="100"/> </td>
     <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/ASTG_logo.png?raw=true" width="80"/> </td>
     <td> <img src="https://www.nccs.nasa.gov/sites/default/files/NCCS_Logo_0.png" width="130"/> </td>
    </tr>
</table>
</center>

        
<center>
<h1><font color= "blue" size="+3">ASTG Python Course Series</font></h1>
</center>

---

<center>
    <h1><font color="red">Simple Regression with PyTorch</font></h1>
</center>

## <font color="red">Objective</font> <a class="anchor" id="sec_obj"></a>

The objective of this presentation is to create an ML model with PyTorch for a simple regression problem. We show the steps involved in creating and validating the model. 


__Target audience:__

This document is meant for people who want to start building their own AI/ML models with PyTorch.

## <font color="red"> Python packages used</font>

- __NumPy__: Generate the random sample points and perform array computations.
- __Matplotlib__: Create visualizations.
- __Pandas__: Data (two-dimensional labelled array) manipulation and analysis.
- __Seaborn__: Provide a high-level interface for creating attractive and informative statistical graphics. 
- __Scikit-Learn__:  Provide supervised and unsupervised Machine Learning algorithms. Here it is used to split the data into training and test sets.
- __PyTorch__: Used to build, train, and evaluate a deep machine learning algorithm based on Neural Networks.

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
import seaborn as sns

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [None]:
print(f"Numpy version:      {np.__version__}")
print(f"Pandas version:     {pd.__version__}")
print(f"Seaborn version:    {sns.__version__}")
print(f"PyTorch version:    {torch.__version__}")

# <font color="red">Machine Learning Steps</font> <a class="anchor" id="sec_tf_ml"></a>

Here we build a Feed-Forward Neural Network (FNN) to solve the problem described below.

## <font color="blue">Problem Statement</font> <a class="anchor" id="sec_tf_pbl"></a>

We consider the function: <br>
$$
f(x,y) = (1-(x^2 + y^3))e^{-\frac{1}{2}(x^2 + y^2)}
$$
<br>
defined in the domain $D=[-3,3] \times [-3,3]$.
<OL>
<LI> We randomly select $n$ points in the domain $D$ and compute the function at those points to create a dataset containing $n$ point/value pairs.
<LI> We split the dataset into a training set and a test set.
<LI> We create an ML algorithm based on the training set.
<LI> We validate the algorithm on the test set and make adjustments as needed.
</OL>

## <font color="blue">Generating the Data</font> <a class="anchor" id="sec_tf_datagen"></a>

#### Define the Function

In [None]:
def ff(x,y):
    return (1-(x**2+y**3))*np.exp(-(x**2+y**2)/2)

#### Create the dataset

Boundary of the domain:

In [None]:
a_min = -3.0
a_max = 3.0

Number of dimensions:

In [None]:
num_dims = 2

Number of points:
    
- We want to create $40\times 40=1600$ random points in the domain $[-3,3] \times [-3,3]$.    
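
Because the points are drawn at random, every run of the notebook produces a different dataset. The following is a minimal sketch of reproducible sampling, assuming NumPy's `Generator` API is acceptable here. The seed value `42` and the names `rng` and `X_demo` are illustrative and are not part of the original notebook.

```python
import numpy as np

# Assumption: a fixed seed (42 is arbitrary) makes the sampling repeatable
rng = np.random.default_rng(42)

a_min, a_max = -3.0, 3.0
num_points, num_dims = 40 * 40, 2

# Draw num_points uniformly distributed points in the square domain
X_demo = rng.uniform(a_min, a_max, size=(num_points, num_dims))

print(X_demo.shape)
```

Re-running this cell gives the same `X_demo` each time, which makes it easier to compare models trained with different hyperparameters.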

In [None]:
nx = 40
ny = 40
num_points = nx * ny

In [None]:
X = np.random.uniform(a_min, a_max, (num_points, num_dims))

In [None]:
X.shape

In [None]:
X[0:9,:]

We determine the value of the function:

In [None]:
z = ff(X[:,0], X[:,1])

In [None]:
z.shape

In [None]:
z[0:9]

## <font color="blue">Data Gathering and Basic Analyses</font> <a class="anchor" id="sec_tf_dataga"></a>

#### Splitting the data into training and testing sets
- We split the data into training and testing sets. 
- We train the model with 80% of the samples and test with the remaining 20%. 
- We do this to assess the model’s performance on unseen data.
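
To make the 80/20 split concrete, here is a hand-rolled NumPy sketch of a shuffled split. It is only an illustration of the idea and is not scikit-learn's implementation; the seed and the names `indices`, `train_idx`, and `test_idx` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)        # assumption: arbitrary seed
n_samples, test_fraction = 1600, 0.2

# Shuffle the sample indices, then cut them into a test part and a training part
indices = rng.permutation(n_samples)
n_test = int(round(test_fraction * n_samples))
test_idx, train_idx = indices[:n_test], indices[n_test:]

print(len(train_idx), len(test_idx))
# No sample appears in both sets
print(np.intersect1d(train_idx, test_idx).size)
```

With 1600 samples this yields 1280 training samples and 320 test samples, and the two index sets are disjoint.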

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    z, 
                                                    test_size=0.2, 
                                                    random_state=42)

In [None]:
print(f"Train features shape: {X_train.shape}")

In [None]:
print(f"Test features shape: {X_test.shape}")

In [None]:
y_train

#### Plot the data to be trained

In [None]:
fig = plt.figure()#.gca(projection='3d');
threedee = fig.add_subplot(projection='3d')
threedee.scatter(X_train[:,0], X_train[:,1], y_train);
threedee.set_xlabel('x');
threedee.set_ylabel('y');
threedee.set_zlabel('f(x,y)');
plt.show();

In [None]:
# `fill=True` replaces the `shade=True` argument deprecated in recent Seaborn versions
sns.kdeplot(x=X_train[:,0], y=X_train[:,1], 
            cmap="Blues", fill=True, bw_adjust=.5);

<font color="blue">Add noise to the training targets</font>

- The function `ff` is smooth.
- We want to add noise to the targets.
  - The noise is drawn from a Gaussian (normal) distribution with mean `noise_mean` and standard deviation `noise_std`.
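
The sketch below illustrates the noise model on placeholder targets and checks that the empirical statistics of the sample are close to the requested ones. The seed, the all-zero array `y_clean`, and the name `y_noisy` are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # assumption: arbitrary seed for repeatability
noise_mean, noise_std = 0.0, 1.0e-2

y_clean = np.zeros(1280)         # placeholder targets, for illustration only
noise = rng.normal(noise_mean, noise_std, size=y_clean.shape)
y_noisy = y_clean + noise

# The empirical statistics should be close to the requested ones
print(noise.mean(), noise.std())
```

A standard deviation of `1.0e-2` is small compared with the range of `ff`, so the noisy targets stay close to the smooth surface.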

In [None]:
n_train = y_train.shape[0]
n_train

In [None]:
y_train

In [None]:
noise_mean = 0.0
noise_std  = 1.0e-2
noise = np.random.normal(noise_mean, noise_std, n_train)
noise.shape

In [None]:
# Uncomment the next line to apply the noise to the training targets
#y_train = y_train + noise

In [None]:
y_train

## <font color="blue">Normalize the Data</font> <a class="anchor" id="sec_tf_norm"></a>

- In general, variables may not be on a similar scale. Features with large values would gain more importance in any distance-based calculation. 
- It is good practice to normalize features that use different scales and ranges. 
- Although the model might converge without feature normalization, it makes training more difficult, and it makes the resulting model dependent on the choice of units used in the input.
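
As a small illustration of standardization, the sketch below computes the mean and standard deviation on a training array only and reuses them for a test array. The random arrays and the names `X_tr` and `X_te` are stand-ins invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)   # assumption: arbitrary seed
X_tr = rng.uniform(-3.0, 3.0, size=(1280, 2))
X_te = rng.uniform(-3.0, 3.0, size=(320, 2))

# Statistics are computed per feature (column) on the training data only
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)

X_tr_normed = (X_tr - mean) / std
X_te_normed = (X_te - mean) / std   # training statistics reused, not recomputed

print(X_tr_normed.mean(axis=0), X_tr_normed.std(axis=0))
print(X_te_normed.mean(axis=0), X_te_normed.std(axis=0))
```

The normalized training features have zero mean and unit standard deviation up to rounding. The test features are only approximately standardized, which is expected because their own statistics are never used.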

In [None]:
data_mean = np.mean(X_train, axis=0)
data_std = np.std(X_train, axis=0)

In [None]:
def normalize_data(x, data_mean, data_std):
    """
       Normalize the data
    """
    return (x - data_mean) / data_std

__Normalize the data that will be used to train the model__

In [None]:
X_train_normed = normalize_data(X_train, data_mean, data_std)

__We also need to normalize the test dataset with the training-set mean and standard deviation, so that it is projected onto the same distribution that the model has been trained on__

In [None]:
X_test_normed = normalize_data(X_test, data_mean, data_std)

# <font color="red">Creating the model</font>

## <font color="blue">Set the hyperparameters</font>

It is a good practice to declare the following parameters before creating the model for ease of change and understanding.

__Dataset parameters__

These parameters are defined by the dataset used:

- number of features
- number of outputs
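
For a feature array of shape `(n_samples, n_features)`, the number of features is the size of the second axis. This is not the same thing as the number of array axes (`ndim`); the two only coincide when there are exactly two features. A short sketch with a hypothetical five-feature array:

```python
import numpy as np

X_demo = np.zeros((1280, 5))   # hypothetical dataset: 1280 samples, 5 features

print(X_demo.ndim)       # number of array axes
print(X_demo.shape[1])   # number of features per sample
```

Here `ndim` is 2 while `shape[1]` is 5, so `shape[1]` is the safer way to derive the input dimension of a model.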

In [None]:
# Number of features = number of columns (X_train.ndim is the number of array axes)
input_dim = X_train.shape[1]
output_dim = 1

__Model parameters__

- batch size
- number of epochs
- learning rate (optimizer step size)
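
The batch size and the number of epochs together determine how many optimizer steps are taken. The arithmetic below uses the values from this notebook (1280 training samples, a batch size of 34, and 200 epochs) and assumes the `DataLoader` keeps the final, smaller batch, which is its default behavior.

```python
import math

n_train = 1280          # 80% of the 1600 samples
batch_size_train = 34
num_epochs = 200

# The last batch is smaller when the batch size does not divide n_train
n_batches = math.ceil(n_train / batch_size_train)
last_batch_size = n_train - (n_batches - 1) * batch_size_train
total_steps = n_batches * num_epochs

print(n_batches, last_batch_size, total_steps)
```

This gives 38 batches per epoch, a last batch of 22 samples, and 7600 optimizer steps in total.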

In [None]:
batch_size_train = 34 # X_train_normed.shape[0]
batch_size_test = X_test_normed.shape[0] #10
num_epochs = 200
learning_rate = 0.01

In [None]:
num_hidden_nodes = 16

## <font color="blue">Building the ML model with PyTorch</font>

In [None]:
class BasicNeuralNetwork(nn.Module):
    '''
    Multi-layer perceptron for non-linear regression.
    '''
    def __init__(self, input_dim=2, num_hidden_nodes=32, output_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, num_hidden_nodes),
            nn.ReLU(),
            nn.Linear(num_hidden_nodes, num_hidden_nodes),
            nn.ReLU(),
            nn.Linear(num_hidden_nodes, output_dim)
        )

    def forward(self, x):
        output = self.net(x)
        return output

In [None]:
torch.manual_seed(1)
basic_model = BasicNeuralNetwork(
    input_dim=input_dim, 
    num_hidden_nodes=num_hidden_nodes, 
    output_dim=output_dim
)

In [None]:
print(basic_model)
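
To get a feel for the size of the network, the sketch below counts the trainable parameters of an `nn.Sequential` stack with the same layer layout (2 inputs, two hidden layers of 16 nodes, 1 output) and compares the result with a hand count. The name `net` is used only for this illustration.

```python
from torch import nn

# Same layer layout as BasicNeuralNetwork with 2 inputs, 16 hidden nodes, 1 output
net = nn.Sequential(
    nn.Linear(2, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
# Hand count: weights plus biases of each linear layer
expected = (2 * 16 + 16) + (16 * 16 + 16) + (16 * 1 + 1)
print(n_params, expected)
```

Both counts equal 337, which is small compared with the 1280 training samples.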

__Basic testing of the model__

In [None]:
x = [[-1.5, 0.25], [1.2, -0.6], [0.32, 0.9]]
x = torch.tensor(x)

with torch.no_grad():
    logits = basic_model(x)

print(logits)

## <font color="blue">Define a DataLoader</font>

In [None]:
class MyDataset(Dataset):
    '''
    Custom 'Dataset' object for our regression data.
    Must implement these functions: __init__, __len__, and __getitem__.
    '''
    def __init__(self, X, y):
        self.features = torch.tensor(X, dtype=torch.float32)
        self.labels = torch.tensor(y, dtype=torch.float32)

    def __getitem__(self, index):
        x = self.features[index]
        y = self.labels[index]
        return x, y

    def __len__(self):
        return len(self.features)
        #return self.labels.shape[0]

In [None]:
def instantiate_data(Xdata, ydata, batch_size, shuffle=False):
    dataset = MyDataset(Xdata, ydata)
    dataloader = DataLoader(dataset=dataset, 
                            batch_size=batch_size, 
                            shuffle=shuffle)
    return dataloader

In [None]:
train_loader = instantiate_data(
    X_train_normed, y_train, batch_size_train, shuffle=True
)

test_loader = instantiate_data(
    X_test_normed, y_test, batch_size_test
)

__Simple check__

In [None]:
for batch, (X, y) in enumerate(train_loader):
    print(f"Batch: {batch+1}")
    print(f"\tX shape: {X.shape} X length: {len(X)}")
    print(f"\ty shape: {y.shape}")
    break

## <font color="blue">Training loop</font>

__Define the loss function__

In [None]:
loss_function = nn.MSELoss()

__Define the optimizer__

In [None]:
optimizer = torch.optim.Adam(basic_model.parameters(), lr=learning_rate)

#optimizer = torch.optim.RMSprop(basic_model.parameters(), lr=learning_rate)
#optimizer = torch.optim.SGD(basic_model.parameters(), lr=learning_rate)

__Feed train data into the model__

- During training, we iterate over the entire training dataset for a fixed number of epochs.
- At the start of every epoch, we set `loss_val` (the current loss in the epoch) to zero.
- Next, we iterate over the `DataLoader`. Recall that our data loader contains the shuffled and batched data.
- We perform some conversions (e.g. floating-point conversion and reshaping) on the `inputs` and `targets` in the current batch.
- We then zero the gradients in the optimizer. PyTorch accumulates gradients across backward passes, so this step discards the gradients left over from the previous batch. It is followed by the __forward pass__, the error computation using our loss function, the __backward pass__, and finally the optimization step.
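
The reason for zeroing the gradients can be seen on a single scalar parameter. In this minimal sketch (the tensor `w` and the loss `w**2` are made up for the illustration), calling `backward()` twice without a reset adds the second gradient to the first.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2*w
(w ** 2).backward()
first = w.grad.item()

# Second backward pass without a reset: the new gradient is added to the old one
(w ** 2).backward()
accumulated = w.grad.item()

# Reset the gradient, as optimizer.zero_grad() does for all model parameters
w.grad.zero_()
(w ** 2).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)
```

The accumulated value is twice the true gradient, so skipping the reset would make each update depend on stale gradients from earlier batches.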

In [None]:
def train_model_per_batch(inputs, targets, 
                model, loss_function, optimizer) -> float:
    #  data, target = data.to(device), target.to(device)

    # Get and prepare inputs
    inputs, targets = inputs.float(), targets.float()
    targets = targets.reshape((targets.shape[0], 1))
    
    # Zero the gradients
    optimizer.zero_grad()

    # Perform forward pass
    outputs = model(inputs)

    # Compute loss
    loss = loss_function(outputs, targets)

    # Perform backward pass
    loss.backward()

    # Perform optimization
    optimizer.step()
    
    return loss.item()

In [None]:
n_samples = len(train_loader.dataset)  # total number of training samples
n_batches = len(train_loader)          # number of batches per epoch
loss_train = list()                    # loss recorded at every training step
for epoch_idx in range(num_epochs):
    loss_val = 0.0
    for batch_idx, (data, target) in enumerate(train_loader):
        #data, target = data.to(device), target.to(device)
        n_data = len(data)
        loss_val = train_model_per_batch(data, target, basic_model, 
                                         loss_function, optimizer)
        loss_train.append(loss_val)
        if batch_idx % 10 == 0:
            print(f'Train Epoch: {epoch_idx:3d} [{str(batch_idx*n_data).zfill(4)}/{n_samples}' 
                  f'({100.*batch_idx/n_batches:2.0f}%)]\tLoss: {loss_val:.6f}')    

__Plot the loss over training steps__

In [None]:
# One loss value was recorded per batch, so the x-axis counts training steps
step = range(len(loss_train))

fig, ax = plt.subplots(figsize=(8,5))
plt.plot(step, np.array(loss_train))
plt.title("Step-wise Loss")
plt.xlabel("Training step (batch)")
plt.ylabel("Loss")
plt.show()

## <font color="blue">Evaluating the results</font>

### <font color="green">Accuracy</font>

In [None]:
def compute_accuracy(model, loss_fn, dataloader, pct_close: float):
    """
    Return the percentage of predictions that fall within
    the relative tolerance `pct_close` of the true target.
    """
    model = model.eval()

    correct = 0.0
    loss = 0.0
    total_items = 0

    for idx, (X, y) in enumerate(dataloader):
        n_items = len(y)
        # Predictions come out as a 2-d tensor of shape (n_items, 1)
        with torch.no_grad():
            logits = model(X)

        # Flatten to 1-d so that the shape matches the targets
        pred = logits.view(n_items)
        # Accumulate the loss (tracked here but not returned)
        loss += loss_fn(pred, y).item()

        n_correct = torch.sum((torch.abs(pred - y) < torch.abs(pct_close * y)))
        correct += n_correct.item()
        total_items += n_items

    return correct*100/total_items 

In [None]:
train_acc = compute_accuracy(
    basic_model, loss_function, 
    train_loader, 0.15
)

In [None]:
print(f" Train accuracy: {train_acc}% ")

In [None]:
test_acc = compute_accuracy(
    basic_model, loss_function, 
    test_loader, 0.15
)

In [None]:
print(f" Test accuracy: {test_acc}%")
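
The accuracy above counts a prediction as correct when its absolute error is smaller than `pct_close` times the magnitude of the true value. Because that tolerance shrinks with the target, values close to zero are very hard to count as correct. The two made-up data points below illustrate the effect.

```python
import numpy as np

pct_close = 0.15
y_true = np.array([1.0, 0.01])   # made-up targets: one large, one near zero
y_pred = np.array([1.1, 0.05])   # made-up predictions

abs_err = np.abs(y_pred - y_true)
within = abs_err < np.abs(pct_close * y_true)
accuracy = 100.0 * within.sum() / within.size

print(abs_err, within, accuracy)
```

The second prediction has the smaller absolute error but is still rejected, so the resulting accuracy is 50%. Since the function decays toward zero away from the origin, a low accuracy under this criterion does not necessarily mean a poor fit.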

### <font color="green">Do the 45-degree plot</font> 
- We plot (scatterplot) the true target values against the predicted ones.
- Ideally, all the points should be close to the 45-degree line (`y=x`).
- The closer the points are to the 45-degree line, the more accurate the model is. 
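
The visual check can be complemented with numbers. The sketch below defines a small helper, `regression_scores`, that returns the root-mean-square error and the coefficient of determination $R^2$. The helper and the sample values are hypothetical and are not part of the original notebook.

```python
import numpy as np

def regression_scores(y_true, y_pred):
    """Return (RMSE, R^2) for two 1-d arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return rmse, 1.0 - ss_res / ss_tot

y_demo = np.array([-0.4, 0.0, 0.5, 1.2])   # made-up values for illustration

rmse_perfect, r2_perfect = regression_scores(y_demo, y_demo)
rmse_offset, r2_offset = regression_scores(y_demo, y_demo + 0.1)
print(rmse_perfect, r2_perfect)
print(rmse_offset, r2_offset)
```

Points lying exactly on the 45-degree line give an RMSE of 0 and an $R^2$ of 1; any scatter around the line lowers $R^2$. In the notebook, the helper could be called with `y_test` and `y_test_pred`.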

In [None]:
with torch.no_grad():
   basic_model.eval()
   for X, y in test_loader:
       pred = basic_model(X)
       y_test_pred = pred.numpy()[:,0]

In [None]:
print(f"True min/max: {y_test.min()}/{y_test.max()}")
print(f"Pred min/max: {y_test_pred.min()}/{y_test_pred.max()}")

In [None]:
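plt.scatter(y_test, y_test_pred);
plt.xlabel('Test true values');
plt.ylabel('Test predictions');
# The function takes negative values, so derive the axis limits from the data
lims = [min(y_test.min(), y_test_pred.min()), max(y_test.max(), y_test_pred.max())]
plt.plot(lims, lims, color='red');  # 45-degree line (y = x)
plt.axis('square');
plt.xlim(lims);
plt.ylim(lims);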

**Error Distribution**

In [None]:
# `histplot` replaces `distplot`, which is deprecated in recent Seaborn versions
sns.histplot(y_test_pred - y_test, kde=True);

#### Plotting Function Using Predicted Values

In [None]:
fig = plt.figure()
threedee = fig.add_subplot(projection='3d')
threedee.scatter(X_test[:,0], X_test[:,1], y_test_pred);
threedee.set_xlabel('x');
threedee.set_ylabel('y');
threedee.set_zlabel('f(x,y)');
plt.show();

# <font color="red">Exercise</font> <a class="anchor" id="sec_tf_ex"></a>

The accuracy of the model can be improved:

- Recreate the model by modifying the hyperparameters (number of hidden nodes, number of hidden layers, batch size, etc.).