## Regression and Classification with Neural Networks

<a target="_blank" href="https://colab.research.google.com/github/AI4EPS/EPS88_PyEarth/blob/master/docs/lectures/09_neural_networks1.ipynb">
<img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>  

<center><img src='https://raw.githubusercontent.com/zhuwq0/images/main/1-1-ai-complete-graph.jpeg' style='width: 40%'/></center>

Created by Minh-Chien Trinh, Jeonbuk National University

## 1.1. A Brief History

In the 1940s, neural networks (NNs) were conceived.

In the 1960s, the concept of backpropagation emerged, and people learned how to train them.

In 2010, NNs started winning competitions and received much more attention than before.

Since 2010, NNs have been on a meteoric rise thanks to their ability to solve problems previously deemed unsolvable (e.g., image captioning, language translation, audio and video synthesis, and more).

One important milestone is the AlexNet architecture in 2012, which won the ImageNet competition. 

<!-- ![](https://raw.githubusercontent.com/zhuwq0/images/main/alexnet.png) -->
<img src='https://raw.githubusercontent.com/zhuwq0/images/main/alexnet.png' style='width: 30%'/>

<!-- ![](https://raw.githubusercontent.com/zhuwq0/images/main/alexnet_score.png) -->
<img src='https://raw.githubusercontent.com/zhuwq0/images/main/alexnet_score.png' style='width: 30%'/>

The ImageNet competition is a benchmark for image classification, where the goal is to classify images into one of 1,000 categories.

<!-- ![](https://raw.githubusercontent.com/zhuwq0/images/main/imagenet.png) -->
<img src='https://raw.githubusercontent.com/zhuwq0/images/main/imagenet.png' style='width: 30%'/>

You can find more information about the AlexNet model on [Wikipedia](https://en.wikipedia.org/wiki/AlexNet). We will use the AlexNet model in the next lecture to classify images of rocks.

Currently, NNs are a leading approach in many competitions and technological challenges, such as self-driving cars, risk calculation, fraud detection, and early cancer detection.

## 1.2. What is a Neural Network?

Artificial neural networks (ANNs) are inspired by the biological brain and translated to the computer.

ANNs have neurons, activations, and interconnections.

NNs are often considered “black boxes” between inputs and outputs.

<center><img src='https://raw.githubusercontent.com/zhuwq0/images/main/1-8-basic-nn.png' style='width: 60%'/></center>

Each connection between neurons has a weight associated with it. Each weight is multiplied by its corresponding input value. These products flow into the neuron and are summed, and then a bias is added. Weights and biases are trainable (tunable) parameters.

$$
\begin{aligned}
\text{output} & = \text{weight} \cdot \text{input} + \text{bias} \\
y & = a \cdot x + b
\end{aligned}
$$

The formula should look very familiar to you. It is similar to the previous linear regression and classification models.

Then, an activation function is applied to the output.

$$
\begin{aligned}
\text{output} & = \sum (\text{weight} \cdot \text{input}) + \text{bias} \\
\text{output} & = \text{activation}(\text{output})
\end{aligned}
$$

When a step function, which mimics a neuron in the brain (“firing” or not, like an on-off switch), is used as the activation function:
- If the weighted sum plus bias is greater than 0, the neuron fires (it outputs 1).
- Otherwise, the neuron does not fire and passes along a 0.
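The weighted sum and the step activation described above can be sketched in a few lines of NumPy. This is only an illustration: the input values, weights, and bias are made up, and the helper names `step` and `neuron` are hypothetical.

```python
import numpy as np

def step(z):
    """Step activation: 1 where z > 0, otherwise 0."""
    return np.where(z > 0, 1.0, 0.0)

def neuron(inputs, weights, bias, activation=step):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = np.dot(weights, inputs) + bias
    return activation(z)

# Illustrative values, not taken from any dataset
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.8, -0.5])
bias = 2.0

z = np.dot(weights, inputs) + bias   # 0.2 + 1.6 - 1.5 + 2.0 = 2.3
fired = neuron(inputs, weights, bias)
print(z, fired)                      # z is positive, so the neuron fires
```

Lowering the bias far enough (for example to -3.0) makes the sum negative, and the same neuron then outputs 0.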

The input layer represents the actual input data (e.g., pixel values from an image, temperature, …).

- The data can be “raw”, but it is usually preprocessed, for example by normalization and scaling.
- The input needs to be in numeric form.

The output layer is whatever the NN returns.
- In regression, the predicted value is a scalar, so the output layer has a single neuron.
- In classification, the class of the input is predicted, so the output layer has as many neurons as the training dataset has classes. For binary (two-class) classification, a single output neuron can also be used.
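To make the two output-layer shapes concrete, here is a small NumPy sketch. The hidden activations and weights are random placeholders and `dense` is a hypothetical helper; the only point is how the number of output neurons sets the shape of the result.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b):
    """One fully connected layer: x @ W + b."""
    return x @ W + b

hidden = rng.normal(size=(5, 16))        # 5 samples, 16 hidden activations (made up)

# Regression: one output neuron -> one scalar per sample
W_reg, b_reg = rng.normal(size=(16, 1)), np.zeros(1)
y_reg = dense(hidden, W_reg, b_reg)
print(y_reg.shape)                       # (5, 1)

# Classification with 3 classes: one output neuron per class
W_cls, b_cls = rng.normal(size=(16, 3)), np.zeros(3)
scores = dense(hidden, W_cls, b_cls)
predicted_class = scores.argmax(axis=1)  # index of the largest score per sample
print(scores.shape, predicted_class.shape)   # (5, 3) (5,)
```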

A typical NN has thousands or even millions of adjustable parameters (weights and biases).

NNs act as enormous functions with vast numbers of parameters.
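A quick way to see how the parameter count grows is to add up the weights and biases layer by layer. The sketch below assumes a fully connected network; `count_parameters` is a hypothetical helper and the layer sizes are arbitrary examples.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out   # one weight per connection
        total += n_out          # one bias per neuron in the receiving layer
    return total

print(count_parameters([1, 16, 1]))            # 1*16 + 16 + 16*1 + 1 = 49
print(count_parameters([784, 512, 512, 10]))   # 669706
```

Even the second, still modest, example already has well over half a million parameters.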

Finding the right combination of parameter (weight and bias) values is the challenging part.

The end goal for NNs is to adjust their weights and biases (the parameters) so that they produce the desired output for unseen data.

A major issue in supervised learning is overfitting, where the algorithm does not learn the underlying input-output dependencies and instead basically “memorizes” the training data.

The goal of an NN is generalization, which is checked by separating the data into training data and validation data: the model is fit on the former and evaluated on the latter.
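The role of a validation set can be illustrated without a neural network at all. The sketch below fits polynomials of two different degrees to synthetic noisy data (all values are made up) and reports the error on the training points and on held-out points. A more flexible model can always match the training points at least as well, which says nothing about how it behaves on data it has not seen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a smooth curve plus noise (purely illustrative)
x = np.linspace(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.size)

# Hold out 30% of the points for validation
idx = rng.permutation(x.size)
n_val = int(0.3 * x.size)
val_idx, train_idx = idx[:n_val], idx[n_val:]

def mse(y_true, y_pred):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_pred) ** 2))

errors = {}
for degree in (3, 12):
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    train_err = mse(y[train_idx], np.polyval(coeffs, x[train_idx]))
    val_err = mse(y[val_idx], np.polyval(coeffs, x[val_idx]))
    errors[degree] = (train_err, val_err)
    print(f"degree {degree:2d}: train MSE {train_err:.4f}, validation MSE {val_err:.4f}")
```

A gap between a low training error and a clearly higher validation error is the usual warning sign of overfitting.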

Weights and biases are adjusted based on the error (loss), which measures how “wrong” the NN is when predicting the output.
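As a rough sketch of what "adjusting based on the loss" means, the loop below trains a single linear neuron $y = a \cdot x + b$ with plain gradient descent on a mean squared error. The data are synthetic and the learning rate and number of steps are arbitrary choices; the optimizers used later in this lecture are more elaborate, but the idea is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y = 2x + 1 plus a little noise (illustrative only)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.05, size=x.size)

a, b = 0.0, 0.0          # weight and bias, starting from zero
learning_rate = 0.1
losses = []

for step in range(200):
    y_pred = a * x + b
    error = y_pred - y
    losses.append(float(np.mean(error ** 2)))   # mean squared error

    # Gradients of the MSE with respect to a and b
    grad_a = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)

    # Move each parameter a small step against its gradient
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

print(f"a = {a:.2f}, b = {b:.2f}, first loss = {losses[0]:.3f}, last loss = {losses[-1]:.4f}")
```

The recovered `a` and `b` should end up close to the values used to generate the data, and the loss should shrink along the way.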

NNs can be used for regression (predicting a scalar value), classification (predicting the class of an input), clustering (assigning unstructured data to groups), and many other tasks.


In this lecture, we will use PyTorch to build and train neural networks.
PyTorch is a flexible and efficient library for deep learning, and it is currently one of the most popular.

<!-- ![](https://raw.githubusercontent.com/zhuwq0/images/main/pytorch.png)
![](https://raw.githubusercontent.com/zhuwq0/images/main/tensorflow.png)
![](https://raw.githubusercontent.com/zhuwq0/images/main/jax.png) -->
<img src='https://raw.githubusercontent.com/zhuwq0/images/main/pytorch.png' style='width: 20%'/>
<img src='https://raw.githubusercontent.com/zhuwq0/images/main/tensorflow.png' style='width: 20%'/>
<img src='https://raw.githubusercontent.com/zhuwq0/images/main/jax.png' style='width: 15%'/>

Let's import PyTorch and the other libraries we need.

In [None]:
## First part of this semester
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

## Second part of this semester
from sklearn.impute import SimpleImputer
from sklearn.metrics import r2_score, accuracy_score, confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

## Last part of this semester
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [None]:
## Set random seed for reproducibility
torch.manual_seed(0)
torch.cuda.manual_seed(0)
np.random.seed(0)

## Applying Neural Networks for Regression

In today's lecture, we will revisit the Betoule data and apply neural networks for regression.

If you have forgotten the background of this data, please review the lecture [04 regression](https://ai4eps.github.io/EPS88_PyEarth/lectures/04_regression/#going-even-further-out-into-the-universe).

Remember that the challenge of the Betoule data is that the velocity is non-linear with respect to the distance.

In the previous lecture, we used sklearn to fit the linear regression model with high polynomial degrees.

Here we will use PyTorch to fit the Betoule data and compare the results with the linear regression model.

- Load the Betoule data

In [None]:
## Load the Betoule data
# betoule_data = pd.read_csv('data/mu_z.csv',header=1) ## reading from local file
betoule_data = pd.read_csv('https://raw.githubusercontent.com/AI4EPS/EPS88_PyEarth/refs/heads/main/docs/scripts/data/mu_z.csv',header=1) ## reading from github for running on colab
betoule_data.head()

## Apply processing to convert to distance and velocity
# speed of light in km/s
c = 2.9979e8 / 1000 

## the formula for v from z (and c)
betoule_data['velocity'] = c * (((betoule_data['z']+1.)**2-1.)/((betoule_data['z']+1.)**2+1.)) 

## convert mu to distance in Mpc (10 pc * 10**(mu/5), expressed in Mpc)
betoule_data['distance'] = 10000*(10.**((betoule_data['mu'])/5.))*1e-9
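The two conversions in the cell above can be checked on a few hand-picked values. The sketch below wraps them in hypothetical helper functions (`velocity_from_redshift` and `distance_from_modulus` are names introduced here, not part of the lecture code) and assumes the usual definition of the distance modulus, $\mu = 5 \log_{10}(d / 10\,\mathrm{pc})$.

```python
import numpy as np

C_KM_S = 2.9979e8 / 1000   # speed of light in km/s, same value as in the cell above

def velocity_from_redshift(z):
    """Relativistic Doppler relation between redshift z and recession velocity."""
    ratio = (1.0 + z) ** 2
    return C_KM_S * (ratio - 1.0) / (ratio + 1.0)

def distance_from_modulus(mu):
    """Distance in Mpc from the distance modulus mu = 5 log10(d / 10 pc)."""
    d_pc = 10.0 * 10.0 ** (mu / 5.0)   # distance in parsecs
    return d_pc * 1e-6                 # parsecs -> megaparsecs

# For small z the velocity is close to c*z; for large z it stays below c
for z in (0.01, 0.1, 1.0):
    print(z, velocity_from_redshift(z), C_KM_S * z)

print(distance_from_modulus(25.0))     # mu = 25 corresponds to about 1 Mpc
```

The factor `10000 * 1e-9` used in the cell above equals `10 * 1e-6`, so both expressions give distances in Mpc, matching the axis labels of the plots.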

In [None]:
## Review the data
plt.figure()
plt.scatter(
plt.xlabel('Distance (Mpc)')
plt.ylabel('Velocity (km s$^{-1}$)')
plt.show()


- Prepare the data into features (X) and target (y). This is the same as in the previous lecture.

In [None]:
## Define features (X) and target (y) variables using the distance as the feature and velocity as the target
X = 
y = 

## Split the data into training and test sets using 30% of the data for testing
X_train, X_test, y_train, y_test = 

- Let's start to build the first neural network model to fit the Betoule data.

In [None]:
## Convert data to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

## Normalize the data to make the training process more efficient
magnitude_X = 10**int(np.log10(X.max()))
magnitude_y = 10**int(np.log10(y.max()))
X_train_tensor = X_train_tensor / magnitude_X
y_train_tensor = y_train_tensor / magnitude_y
X_test_tensor = X_test_tensor / magnitude_X
y_test_tensor = y_test_tensor / magnitude_y

## Define the neural network model
class SimpleNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_size):
        super(SimpleNN, self).__init__()
        ## Define a linear layer with input size, hidden size
        self.fc1 = 
        ## Define an activation function using ReLU
        self.relu = 
        ## Define a linear layer with hidden size, output size
        self.fc2 =
    
    def forward(self, x):
        ## Apply the first linear layer
        out = 
        ## Apply the activation function
        out = 
        ## Apply the second linear layer
        out = 
        return out

## Initialize the model, loss function, and optimizer
input_size = X.shape[-1]
output_size = 1 # Output layer for regression (1 output neuron)
hidden_size = 16

## Define the model, loss function, and optimizer. Hint: using your defined model, MSE loss, and Adam optimizer
model = 
criterion = 
optimizer = 

## Define fit function
def fit(model, X, y, epochs=100):
    ## set the model to training
    model.
    losses = []
    for epoch in range(epochs):
        ## zero the gradients
        optimizer.zero_grad()

        ## get the outputs from the model
        outputs = 
        ## calculate the loss
        loss = 
        loss.backward()
        ## update the weights
        optimizer.

        losses.append(loss.item())
        if (epoch+1) % 10 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')
    return losses

## Define predict function
def predict(model, X):
    ## set the model to evaluation
    model.
    with torch.no_grad():
        ## get the outputs from the model
        outputs = 
    return outputs

## Train the model
losses = fit(model, X_train_tensor, y_train_tensor, epochs=100)

## Plot the loss during the training process
plt.figure()
plt.plot(
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.show()
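If it helps to see the whole pipeline in one piece before filling in the blanks, here is a minimal sketch on synthetic data. It is not the reference solution for the cell above: the data are made up, the names (`TinyRegressor`, `X_demo`, `demo_model`, ...) are hypothetical and chosen so they do not overwrite the notebook's variables, and the learning rate and number of epochs are arbitrary.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic, already-scaled data standing in for distance and velocity
X_demo = torch.linspace(0, 1, 200).unsqueeze(1)           # shape (200, 1)
y_demo = torch.sqrt(X_demo) + 0.02 * torch.randn_like(X_demo)

class TinyRegressor(nn.Module):
    def __init__(self, input_size, output_size, hidden_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)    # input -> hidden
        self.relu = nn.ReLU()                            # non-linearity
        self.fc2 = nn.Linear(hidden_size, output_size)   # hidden -> output

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

demo_model = TinyRegressor(input_size=1, output_size=1, hidden_size=16)
demo_criterion = nn.MSELoss()
demo_optimizer = optim.Adam(demo_model.parameters(), lr=0.01)

demo_model.train()
demo_losses = []
for epoch in range(300):
    demo_optimizer.zero_grad()               # clear gradients from the last step
    outputs = demo_model(X_demo)             # forward pass
    loss = demo_criterion(outputs, y_demo)   # how wrong are we?
    loss.backward()                          # backpropagate
    demo_optimizer.step()                    # update weights and biases
    demo_losses.append(loss.item())

demo_model.eval()
with torch.no_grad():
    y_hat = demo_model(X_demo)

print(f"first loss {demo_losses[0]:.4f}, last loss {demo_losses[-1]:.4f}")
```

One detail worth noticing: the target here has shape `(200, 1)`, the same as the model output. If the target is a flat vector of shape `(200,)`, `nn.MSELoss` broadcasts the two shapes against each other and the loss is no longer the one intended, so it is worth checking shapes before training.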


- Evaluate the model on the test set. This is the same as in the previous lecture.

In [None]:
## Predict on the test set
y_pred_tensor = predict(
y_pred = y_pred_tensor.numpy() * magnitude_y

## Calculate R-squared metric
r2 = 
print(f'R-squared: {r2:.4f}')
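For reference, the R-squared metric can also be written out by hand. The sketch below uses the standard definition $R^2 = 1 - SS_{res} / SS_{tot}$, which is also what `r2_score` computes; `r_squared` is a hypothetical helper and the numbers are made up.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # variation left unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return 1.0 - ss_res / ss_tot

y_true_demo = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y_true_demo, y_true_demo))            # perfect prediction -> 1.0
print(r_squared(y_true_demo, [2.5, 2.5, 2.5, 2.5]))   # always predicting the mean -> 0.0
```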

In [None]:
## Predict on the whole dataset for plotting
X_tensor = torch.tensor(X, dtype=torch.float32)
X_tensor = X_tensor / magnitude_X
y_pred_tensor = predict(model, X_tensor)
y_pred = y_pred_tensor.numpy() * magnitude_y
y_pred = y_pred.squeeze() # remove the extra dimension

## Plot the results
plt.figure(figsize=(6, 6))
plt.subplot(2,1,1)
## plot the data
plt.scatter(
## plot the fitted line
plt.plot(
plt.title('data and a neural network fit')
plt.ylabel('Velocity (km s$^{-1}$)')
plt.xlabel('Distance (Mpc)')

## plot the residuals
plt.subplot(2,1,2)
plt.scatter(
plt.title('residuals of a neural network fit')
plt.ylabel('Residual velocity (km s$^{-1}$)')
plt.xlabel('Distance (Mpc)')

plt.tight_layout()
plt.show()


- Compare the results with previous polynomial regression. How does the neural network perform?

## Applying Neural Networks for Classification

Neural networks work well for regression tasks. How about classification tasks?

Let's now apply neural networks to a binary classification task.

Again, we will reuse the basalt affinity dataset that we covered in a previous lecture.

If you have forgotten the background of this data, please review the lecture [05 classification](https://ai4eps.github.io/EPS88_PyEarth/lectures/05_classification/#classifying-volcanic-rocks).

- Load the basalt affinity data

In [None]:
## Load the basalt affinity data
# basalt_data = pd.read_csv('data/Vermeesch2006.csv') ## reading from local file
basalt_data = pd.read_csv('https://raw.githubusercontent.com/AI4EPS/EPS88_PyEarth/refs/heads/main/docs/scripts/data/Vermeesch2006.csv') ## reading from github for running on colab
basalt_data.tail()

In [None]:
## Review the data
plt.figure(figsize=(8, 6))

## plot each affinity as a different color
for affinity in basalt_data['affinity'].unique():
    subset = 
    plt.scatter(

plt.legend()
plt.xlabel('TiO2 (wt%)')
plt.ylabel('V (ppm)')
plt.show()


- Prepare the data into features (X) and target (y). This is the same as in the previous lecture.

In [None]:
## Prepare the data into features (X) and target (y)
X = 
y = 

## Encode the target variable
le = LabelEncoder()
y = 

## Impute missing values using median imputation
imputer = SimpleImputer(strategy='median')
X = 

## Split the data into training and test sets using 30% of the data for testing
X_train, X_test, y_train, y_test = 

- Let's start to build the second neural network model to fit the basalt affinity data.

In [None]:
## Convert data to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

## Normalize the data to make the training process more efficient
mu = X_train_tensor.mean(dim=0, keepdim=True)
std = X_train_tensor.std(dim=0, keepdim=True)
X_train_tensor = (X_train_tensor - mu) / std
X_test_tensor = (X_test_tensor - mu) / std

## Define the neural network model
class SimpleNN(nn.Module):
    def __init__(self, input_size,  output_size, hidden_size):
        super(SimpleNN, self).__init__()
        ## Define a linear layer with input size, hidden size
        self.fc1 = 
        ## Define an activation function using ReLU
        self.relu = 
        ## Define a linear layer with hidden size, output size
        self.fc2 = 
    
    def forward(self, x):
        ## Apply the first linear layer
        out = 
        ## Apply the activation function
        out = 
        ## Apply the second linear layer
        out = 
        return out

## Initialize the model, loss function, and optimizer
input_size = X_train.shape[-1]
output_size = len(le.classes_) # Output layer for classification (number of classes)
hidden_size = 16

## Define the model, loss function, and optimizer. Hint: using your defined model, CrossEntropy loss, and Adam optimizer
model = 
criterion = 
optimizer = 

## Define fit function
def fit(model, X_train, y_train, epochs=100):
    ## set the model to training
    model.
    losses = []
    for epoch in range(epochs):
        ## zero the gradients
        optimizer.zero_grad()

        ## get the outputs from the model
        outputs = 
        ## calculate the loss
        loss = 
        loss.backward()
        ## update the weights
        optimizer.step()

        losses.append(loss.item())
        if (epoch+1) % 10 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')
    return losses

## Define predict function
def predict(model, X):
    ## set the model to evaluation
    model.
    with torch.no_grad():
        ## get the outputs from the model
        outputs = 
        _, predicted = torch.max(outputs, 1)
    return predicted

## Train the model
losses = fit(model, X_train_tensor, y_train_tensor, epochs=100)

## Plot the loss
plt.figure()
plt.plot(
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.show()
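The classification workflow in the cell above can be sketched end to end on synthetic data. Again, this is only an illustration under stated assumptions, not the reference solution: the two clusters are made up, the `demo_` names are hypothetical, and `nn.Sequential` is used here as a compact alternative to writing a class.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Two synthetic, well-separated clusters standing in for two rock affinities
n_per_class = 100
X_demo = torch.cat([torch.randn(n_per_class, 2) + 2.0,
                    torch.randn(n_per_class, 2) - 2.0])
y_demo = torch.cat([torch.zeros(n_per_class, dtype=torch.long),
                    torch.ones(n_per_class, dtype=torch.long)])

# Standardize each feature to zero mean and unit standard deviation
mu_demo = X_demo.mean(dim=0, keepdim=True)
std_demo = X_demo.std(dim=0, keepdim=True)
X_demo = (X_demo - mu_demo) / std_demo

demo_model = nn.Sequential(
    nn.Linear(2, 16),   # input features -> hidden layer
    nn.ReLU(),
    nn.Linear(16, 2),   # hidden layer -> one raw score per class
)
demo_criterion = nn.CrossEntropyLoss()   # takes raw scores and integer class labels
demo_optimizer = optim.Adam(demo_model.parameters(), lr=0.01)

demo_model.train()
for epoch in range(200):
    demo_optimizer.zero_grad()
    loss = demo_criterion(demo_model(X_demo), y_demo)
    loss.backward()
    demo_optimizer.step()

demo_model.eval()
with torch.no_grad():
    scores = demo_model(X_demo)
    _, predicted = torch.max(scores, 1)   # class with the highest score

demo_accuracy = (predicted == y_demo).float().mean().item()
print(f"training accuracy on the synthetic clusters: {demo_accuracy:.3f}")
```

Compared with the regression sketch, only three things change: the targets are integer class labels (`torch.long`), the output layer has one neuron per class, and the loss is cross entropy instead of mean squared error.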


- Evaluate the model on the test set. This is the same as in the previous lecture.

In [None]:
## Predict on the test set
y_pred_tensor = 
y_pred = y_pred_tensor.numpy()

## Calculate accuracy
accuracy = 
print(f'Accuracy: {accuracy:.4f}')

## Confusion matrix; Hint: use confusion_matrix from sklearn.metrics
conf_matrix = 
disp = ConfusionMatrixDisplay(confusion_matrix=conf_matrix, display_labels=le.classes_)
disp.plot(cmap=plt.cm.Blues, values_format='d', colorbar=False);
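To see what the accuracy and the confusion matrix actually count, both can be built by hand. The labels below are made up and `confusion_counts` is a hypothetical helper; it follows the same convention as `confusion_matrix` in sklearn, with true classes along the rows and predicted classes along the columns.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Made-up labels for three classes
y_true_demo = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_counts(y_true_demo, y_pred_demo, n_classes=3)
accuracy_demo = np.trace(cm) / cm.sum()   # correct predictions sit on the diagonal
print(cm)
print(accuracy_demo)                      # 5 correct out of 7
```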

- Compare the results with previous classification methods. How does the neural network perform?

- Compare the two neural networks built for the regression and classification tasks. Please list the similarities and differences.

- The neural networks we built are very simple with only one hidden layer. Do you know which variable controls the complexity of the neural networks?

- If we want to build a more complex neural network, how can we do it? Think about the number of layers and neurons in each layer.
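As one possible starting point for experimenting with these questions, the sketch below builds fully connected networks of arbitrary depth and width and counts their parameters. `build_mlp` and `n_parameters` are hypothetical helpers written for this illustration, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

def build_mlp(input_size, hidden_sizes, output_size):
    """Stack Linear + ReLU blocks, one per entry in hidden_sizes."""
    layers = []
    previous = input_size
    for width in hidden_sizes:
        layers.append(nn.Linear(previous, width))
        layers.append(nn.ReLU())
        previous = width
    layers.append(nn.Linear(previous, output_size))   # no activation on the output
    return nn.Sequential(*layers)

def n_parameters(model):
    """Total number of trainable weights and biases."""
    return sum(p.numel() for p in model.parameters())

shallow = build_mlp(1, [16], 1)          # one hidden layer with 16 neurons
deeper = build_mlp(1, [64, 64, 64], 1)   # three hidden layers with 64 neurons each

print(n_parameters(shallow))             # 49
print(n_parameters(deeper))              # 8513
print(deeper(torch.zeros(5, 1)).shape)   # torch.Size([5, 1])
```

Either model can be dropped into the same training loop, since both take the same input shape and return the same output shape.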


If you are interested in building a more complex neural network, you can try the following website.

The more layers and neurons you add, the more complex the neural network becomes. It can then fit more complex data, but it also becomes more challenging to train.

There are many hyperparameters you can tune in the online playground. Explore whether you can find settings that fit all the data distributions.

[Train a neural network online](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.43783&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false)

![20241103200049](https://raw.githubusercontent.com/zhuwq0/images/main/20241103200049.png)