<a href="https://colab.research.google.com/github/AlexMitsis/climate-change-neural-network/blob/main/Neural_Networks_Project_(EN).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Intelligent Systems Project**

Regression with neural networks and the climate change dataset.

## **Loading and preparation**

Loading libraries, loading and displaying data

In [None]:
# We import all the necessary libraries
import pandas as pd
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import torch.optim as optim

from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

In [None]:
# We load the CSV file with the Pandas library for display purposes only
dataframe = pd.read_csv('climate_change.csv')
print(dataframe.head(5))
print("Dimensions:", dataframe.shape)

   Year  Month    MEI     CO2      CH4      N2O   CFC-11   CFC-12        TSI  \
0  1983      5  2.556  345.96  1638.59  303.677  191.324  350.113  1366.1024   
1  1983      6  2.167  345.52  1633.71  303.746  192.057  351.848  1366.1208   
2  1983      7  1.741  344.15  1633.22  303.795  192.818  353.725  1366.2850   
3  1983      8  1.130  342.25  1631.35  303.839  193.602  355.633  1366.4202   
4  1983      9  0.428  340.17  1648.40  303.901  194.392  357.465  1366.2335   

   Aerosols   Temp  
0    0.0863  0.109  
1    0.0794  0.118  
2    0.0731  0.137  
3    0.0673  0.176  
4    0.0619  0.149  
Dimensions: (308, 11)


## Creation of a custom Dataset object

We create a Dataset object that will feed the dataloaders, and a transform object that will modify the data before it is read by the loaders.

Transforms are commonly used for data augmentation, but here we use one for data normalization.
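
The normalization described above can be sketched on its own with NumPy. This is a minimal illustration, not the project's code: the helpers `fit_min_max`, `scale` and `unscale` and the toy arrays are hypothetical. It shows that the statistics come from the training split only, so test values outside the training range map outside [0, 1], and that the inverse mapping restores the original units.

```python
import numpy as np

def fit_min_max(train):
    # Column-wise statistics computed on the training split only
    return train.min(axis=0, keepdims=True), train.max(axis=0, keepdims=True)

def scale(x, lo, hi):
    # Map each column to [0, 1] relative to the training range
    return (x - lo) / (hi - lo)

def unscale(x_scaled, lo, hi):
    # Inverse of scale(): bring values back to their original units
    return x_scaled * (hi - lo) + lo

# Hypothetical toy data: 3 training rows and 1 test row, 2 variables each
train = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]], dtype=np.float32)
test = np.array([[2.0, 40.0]], dtype=np.float32)

lo, hi = fit_min_max(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)
print(train_scaled)
print(test_scaled)  # [[0.25 1.5 ]]: the second test value lies outside [0, 1]
print(np.allclose(unscale(test_scaled, lo, hi), test))  # True
```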

In [None]:
class ClimateDataset(torch.utils.data.Dataset):
  def __init__(self, csv_file, transform=None, train=True):
    """
    Args:
      csv_file (string): Path to the csv file
      transform (callable, optional): Optional transform to be applied
          on each sample.
      train (bool): whether to create the training set or the test set
    """

    # We define the size of the training set
    train_set_size = 250

    # Loading the csv via Pandas, deleting columns that do not help in the analysis,
    # converting Pandas Dataframes to Numpy Arrays, which have more helpful
    # properties
    dataframe = pd.read_csv(csv_file)
    targets = np.array(dataframe[['Temp']]).astype('float32')
    data = np.array(dataframe.drop(columns=['Year','Month','Temp'])).astype('float32')

    # Split the data into training and test sets.
    # This line will have to be replaced in the future if we want a
    # more realistic evaluation.
    # random_state is fixed so that the split is reproducible.
    train_data, test_data, train_targets, test_targets = train_test_split(data,
                            targets, train_size=train_set_size, random_state=665)


    # Store the transform (it may be None)
    self.transform = transform
    # Note: the transform is always fitted on the training set, whether this
    # dataset instance represents the train or the test split.
    if self.transform is not None:
      self.transform.fit(train_data, train_targets)

    # If we have been asked to create a training set, keep the train data;
    # otherwise keep the test data
    if train:
      self.data = train_data
      self.targets = train_targets
    else:
      self.data = test_data
      self.targets = test_targets
    # The initialisation ends here

  # The __len__ of a Dataset object must return the number of samples
  # it contains.
  def __len__(self):
    return len(self.targets)

  # The __getitem__ of a Dataset object takes as parameter an index and
  # returns the corresponding object. It is used by the DataLoader to
  # derive minibatches.
  def __getitem__(self, idx):
    if torch.is_tensor(idx):
      idx = idx.tolist()
    item_data = self.data[idx,:]
    item_target = self.targets[idx,:]

    # If a transform is defined, apply it to the sample before returning it.
    # Calling self.transform(...) invokes the transform's __call__() method
    # (see MinMaxScaler below)
    if self.transform:
      item_data, item_target = self.transform(item_data, item_target)
    return item_data, item_target


# This class plays the role of the transform. It subtracts each column's minimum
# and divides by the column's range, normalizing all variables (and the targets)
# to [0, 1] with respect to the training set.
# It is defined at module level (not inside ClimateDataset) so that
# MinMaxScaler() can be instantiated directly.
class MinMaxScaler():

  # fit() receives the training data and stores the column-wise minimum and
  # maximum values. The inputs are NumPy arrays, so min() and max() are built in.
  def fit(self, data, targets):
    self.data_min = data.min(0, keepdims=True)
    self.target_min = targets.min(0, keepdims=True)
    self.data_max = data.max(0, keepdims=True)
    self.target_max = targets.max(0, keepdims=True)
    return self

  # __call__() is invoked when the transform is called like a function: it takes
  # a sample (data and target) and returns it transformed, here normalized
  def __call__(self, data, target):
    data = (data - self.data_min)/(self.data_max-self.data_min)
    target = (target - self.target_min)/(self.target_max-self.target_min)
    return data, target

  # The network is trained on the normalized targets. For real-world use, its
  # outputs must be mapped back to the original range of values.
  def inverse_transform(self, data, target):
    data = data * (self.data_max-self.data_min) + self.data_min
    target = target * (self.target_max-self.target_min) + self.target_min
    return data, target

## Creation of DataLoaders

We create dataloaders for the training set and for the test set.
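
Because `MinMaxScaler` subtracts arrays of shape `(1, 8)` and `(1, 1)`, each item returned by `ClimateDataset.__getitem__` keeps an extra leading axis of size 1, and the DataLoader's default collation stacks the items into batches of shape `(batch, 1, 8)`. The sketch below uses a hypothetical `ToyDataset` with random data and 58 items (the size of the test split, 308 − 250) to illustrate those shapes and why a `batch_size` of 500 yields a single batch per epoch.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical stand-in for ClimateDataset: items are (1, 8) inputs and (1, 1) targets."""
    def __init__(self, n):
        rng = np.random.default_rng(0)
        self.data = rng.random((n, 8), dtype=np.float32)
        self.targets = rng.random((n, 1), dtype=np.float32)

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, idx):
        # Slicing with idx:idx+1 keeps a leading axis of size 1, mimicking the
        # broadcast against the (1, 8) min/max arrays in the project
        return self.data[idx:idx + 1], self.targets[idx:idx + 1]

loader = DataLoader(ToyDataset(58), batch_size=500)
X, y = next(iter(loader))
print(len(loader))       # 1: batch_size exceeds the dataset size, so one batch
print(X.shape, y.shape)  # torch.Size([58, 1, 8]) torch.Size([58, 1, 1])
```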

In [None]:
# The batch size is large enough to hold all the training data and all the
# test data in a single batch, so in practice we apply Batch Gradient Descent:
# each epoch consists of exactly one batch.
# (If the batch size is larger than the available data, PyTorch simply
# creates one batch from all the available samples.)
batch_size = 500
transform=MinMaxScaler()

train_dataset = ClimateDataset(csv_file='climate_change.csv', train=True, transform=transform)
test_dataset = ClimateDataset(csv_file='climate_change.csv', train=False, transform=transform)

train_dataloader = DataLoader(train_dataset, batch_size=batch_size)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size)


## Network architecture

We define a simple network with one neuron for linear regression
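
A single `nn.Linear(8, 1)` layer is a linear regression model: one weight per input variable plus a bias, 9 parameters in total. A quick check with random inputs (the seed and the 4 sample rows are arbitrary) confirms that the layer computes `x @ W.T + b`:

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(8, 1)   # one output neuron, 8 input variables
x = torch.rand(4, 8)      # 4 arbitrary normalized samples

# nn.Linear computes x @ W^T + b, with W of shape (1, 8) and b of shape (1,)
manual = x @ layer.weight.T + layer.bias
print(layer.weight.shape, layer.bias.shape)        # torch.Size([1, 8]) torch.Size([1])
print(torch.allclose(layer(x), manual, atol=1e-6))  # True
```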

In [None]:
# Network defining code
class NeuralNetwork1(nn.Module):

  def __init__(self):
      # I run the init of the parent class
      super(NeuralNetwork1, self).__init__()
      # The input x of the network consists of 8 types of variables, after removing the columns with year, month and temperature
      self.network_architecture_linear = nn.Linear(8, 1)

  def forward(self, x):
      out = self.network_architecture_linear(x)
      return out

## Training and Test Loop

We write the code for the training loop and the test loop

In [None]:
def train_loop(dataloader, model, loss_fn, optimizer, current_epoch):
    # enumerate(dataloader) yields the batch index and the batch itself,
    # i.e. a tensor of inputs X and a tensor of targets y holding up to batch_size samples
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # We show the loss every 20 epochs
        if((current_epoch+1)%20==0):
            loss = loss.item()
            print(f"loss: {loss:>7f} (Epoch: {current_epoch+1})")

def test_loop(dataloader, model, loss_fn, current_epoch):
    preds, targets = [], []

    with torch.no_grad():
        for X, y in dataloader:
            # Flatten the (batch, 1, 1) tensors to (batch, 1), the 2-D format
            # r2_score expects, instead of hard-coding the test-set size (58)
            preds.append(model(X).reshape(-1, 1))
            targets.append(y.reshape(-1, 1))

    # Concatenate all batches so that the metrics cover the whole test set
    pred = torch.cat(preds)
    y = torch.cat(targets)
    r2 = r2_score(y.numpy(), pred.numpy())
    # item() converts the single-element Tensor to a plain Python number
    test_loss = loss_fn(pred, y).item()

    # R2 scores between 0 and 1 give the fraction of the variance of the targets that the model explains.
    # Values below 0 mean that the model is worse than a constant estimator that always predicts the mean of the evaluated targets.

    print(f"\nTest Error for Epoch {current_epoch+1}: \nR2 score: {r2:>0.2f}, Avg loss: {test_loss:>8f} \n\n")


## Neural Network 1:

Since each epoch consists of a single batch, I display the training loss every 20 epochs,
while test_loop is called every 100 epochs.

As the epochs pass, we observe that the train/test losses decrease
while the R2 score increases. Evidently 3000 epochs are not enough to train the model, since the R2 score only reaches a value close to 0.3 (ideally it would be close to 1). Indeed, I have confirmed experimentally that with more epochs there is considerable room for improvement (higher R2 score, lower loss).
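
The R2 values printed by `test_loop` follow the definition used by `sklearn.metrics.r2_score`: R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the targets from their own mean. A small sketch with made-up numbers (the helper `r2_manual` is illustrative) shows why the early epochs score far below 0: predictions that sit far from the target mean have a larger squared error than simply predicting that mean.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([0.2, 0.4, 0.6, 0.8])
good = np.array([0.25, 0.35, 0.65, 0.75])   # close to the targets
constant = np.full_like(y_true, 0.1)         # a constant far from the target mean

print(r2_manual(y_true, good), r2_score(y_true, good))  # both about 0.95
print(r2_manual(y_true, constant))                      # negative (about -3.2)
```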

Changing torch.manual_seed(50) to torch.manual_seed(100) slightly worsens our results. A possible explanation is that with seed 100 the randomly initialized weights start farther from the weights we ideally want our model to learn. With seed 50 the run takes 14 seconds to complete, while with seed 100 it takes 12 seconds.
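
The seed only fixes the starting point of the optimization: with the same seed, `nn.Linear` draws identical initial weights, so runs are repeatable, while a different seed starts gradient descent from a different point. A minimal check (the helper `init_weights` is hypothetical):

```python
import torch
from torch import nn

def init_weights(seed):
    # Re-seeding PyTorch's global RNG before building the layer fixes its initial weights
    torch.manual_seed(seed)
    return nn.Linear(8, 1).weight.detach().clone()

w50_a = init_weights(50)
w50_b = init_weights(50)
w100 = init_weights(100)
print(torch.equal(w50_a, w50_b))  # True: same seed, same starting weights
print(torch.equal(w50_a, w100))   # False: different seed, different starting point
```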

Replacing the SGD optimizer with Adam generally seems to give a better R2 score and a lower loss on our data. The R2 score grows more steeply than with SGD until it reaches a point where it stays roughly constant. Similarly, the test loss decreases more steeply per epoch. For some reason, this small increase is not seen in the train loss, which keeps decreasing steadily, though more slowly once the initial steep phase is over. SGD might reach better final results if we ran the model for more epochs, but Adam appears much better when the available training time is limited (in this case 3000 epochs).
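
Swapping the optimizer amounts to changing a single constructor. The sketch below uses hypothetical synthetic data (not the climate dataset) and trains two copies of the same initial `nn.Linear(8, 1)` with full-batch updates, once with `optim.SGD` and once with `optim.Adam`, both at `lr=1e-3`. Which one ends lower depends on the data, the learning rate and the number of steps, so no particular outcome is claimed here.

```python
import copy
import torch
from torch import nn, optim

torch.manual_seed(0)
# Hypothetical synthetic data: 250 samples, 8 features in [0, 1], linear target plus noise
X = torch.rand(250, 8)
true_w = torch.linspace(-1.0, 1.0, 8).unsqueeze(1)
y = X @ true_w + 0.5 + 0.05 * torch.randn(250, 1)

base = nn.Linear(8, 1)
loss_fn = nn.MSELoss()

def train(optimizer_cls, steps=500, lr=1e-3):
    # Deep-copy the same initial model so both optimizers start from identical weights
    model = copy.deepcopy(base)
    opt = optimizer_cls(model.parameters(), lr=lr)
    history = []
    for _ in range(steps):
        loss = loss_fn(model(X), y)  # full batch: one update per "epoch"
        opt.zero_grad()
        loss.backward()
        opt.step()
        history.append(loss.item())
    return history

sgd_hist = train(optim.SGD)
adam_hist = train(optim.Adam)
print(f"SGD  loss: {sgd_hist[0]:.4f} -> {sgd_hist[-1]:.4f}")
print(f"Adam loss: {adam_hist[0]:.4f} -> {adam_hist[-1]:.4f}")
```

Adam scales each parameter's step using running estimates of the gradient's first and second moments, so at the same learning rate its steps stay close to `lr` in size even when gradients are small, whereas SGD's steps shrink with the gradient. This is consistent with the faster initial progress described above.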

## Neural Network 2:
In this case the output is no longer a linear function of the input, because the hidden layers apply the non-linear ReLU activation function.
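
Without the ReLU activations, the hidden layers would add parameters but no expressive power, because a composition of affine maps is itself affine. The sketch below removes the activations from an `nn.Sequential` with NeuralNetwork2's layer sizes and folds the three layers into a single weight matrix and bias (the seed and the inputs are arbitrary).

```python
import torch
from torch import nn

torch.manual_seed(0)
# Same layer sizes as NeuralNetwork2, but with the ReLU activations removed
linear_stack = nn.Sequential(nn.Linear(8, 100), nn.Linear(100, 100), nn.Linear(100, 1))

# Fold the three affine maps into one: W = W3 W2 W1, b = W3 (W2 b1 + b2) + b3
l1, l2, l3 = linear_stack
W = l3.weight @ l2.weight @ l1.weight
b = l3.weight @ (l2.weight @ l1.bias + l2.bias) + l3.bias

x = torch.rand(5, 8)
with torch.no_grad():
    collapsed = x @ W.T + b
    # True: the stack is equivalent to a single linear layer
    print(torch.allclose(linear_stack(x), collapsed, atol=1e-5))
```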

With SGD we train the model in 13 seconds and with Adam in 15 seconds.

With SGD we observe, as expected, that the R2 score increases while the train/test loss decreases as the epochs pass. With Adam, the R2 score increases and the test loss decreases until around epoch 400, after which, paradoxically, the opposite starts to happen (the R2 score decreases and the loss increases). The train loss, however, keeps decreasing normally after this point. Overall, Adam gives us better results, even though its test performance gets worse after epoch 400.

We notice that the test error does not improve if we keep running the script for a long time.
Low training error combined with high test error is the definition of overfitting. Because training in neural networks is gradual, this phenomenon can be observed as it evolves over time. This is why the practice of Early Stopping exists: it achieves regularization by stopping training before the test error starts to rise (the task did not call for Early Stopping).
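
Early Stopping with a "patience" window can be sketched as a small function over the sequence of monitored losses. This is an illustrative sketch, not part of the project: the function name, the `patience` parameter and the loss values are hypothetical. In practice the monitored loss should come from a validation split held out from training rather than from the test set, so that the final test score remains an unbiased estimate.

```python
import math

def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of per-evaluation losses.

    Training stops once the loss has not improved for `patience` consecutive
    evaluations; the weights saved at `best_epoch` would then be restored.
    """
    best_loss, best_epoch = math.inf, -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Never triggered: training ran to the end
    return best_epoch, len(val_losses) - 1

# Hypothetical losses that fall, then rise as the model starts to overfit
losses = [0.50, 0.30, 0.20, 0.18, 0.19, 0.21, 0.24, 0.28]
print(early_stopping_epoch(losses, patience=3))  # (3, 6)
```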

## Hyperparameter definition and network training

Network execution and evaluation code

## NeuralNetwork2


In [None]:
# Definition of the new network

class NeuralNetwork2(nn.Module):

  def __init__(self):
      # Run the init of the parent class
      super(NeuralNetwork2, self).__init__()
      # the input x of the network consists of 8 types of variables, after removing the columns with year, month and temperature

      self.network_architecture = nn.Sequential(
            nn.Linear(8, 100),
            nn.ReLU(),
            nn.Linear(100, 100),
            nn.ReLU(),
            nn.Linear(100, 1),
        )

  def forward(self, x):
      out = self.network_architecture(x)
      return out
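
The two architectures differ sharply in capacity, which is one plausible reason why NeuralNetwork2 can overfit the 250 training samples. A short count of trainable parameters, rebuilding the same layer sizes (the helper `count_parameters` is illustrative):

```python
from torch import nn

def count_parameters(model):
    # Total number of trainable scalars (weights and biases)
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Layer sizes copied from NeuralNetwork1 and NeuralNetwork2
linear_model = nn.Linear(8, 1)
mlp = nn.Sequential(
    nn.Linear(8, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 1),
)
print(count_parameters(linear_model))  # 9
print(count_parameters(mlp))           # 11101 = (8*100 + 100) + (100*100 + 100) + (100 + 1)
```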


In [None]:
# manual_seed determines the random initialization of the weights/network parameters
# It is just a fixed seed in pytorch's random number generator.
torch.manual_seed(50)
#torch.manual_seed(100)

model = NeuralNetwork1()
#model = NeuralNetwork2()

# appropriate loss function for regression problems
loss_fn = nn.MSELoss()
learning_rate = 1e-3

# We set the optimizer to use the SGD algorithm
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
# To replace SGD with Adam, use this line instead:
# optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training code here:
epochs = 3000
epoch_step = 100
for t in range(epochs):
    if(t==0):
      print(f"Epoch {t+1} - {t+epoch_step}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer, t)
    if (t+1) % epoch_step == 0:
      test_loop(test_dataloader, model, loss_fn, t)
    if (t+1) % epoch_step == 0 and (t+1) != epochs:
      print(f"Epoch {t+1} - {t+1+epoch_step}\n-------------------------------")
print("Done!")



Epoch 1 - 100
-------------------------------
loss: 0.922417 (Epoch: 20)
loss: 0.710118 (Epoch: 40)
loss: 0.549098 (Epoch: 60)
loss: 0.426950 (Epoch: 80)
loss: 0.334270 (Epoch: 100)

Test Error for Epoch 100: 
R2 score: -20.14, Avg loss: 0.443976 


Epoch 100 - 200
-------------------------------
loss: 0.263929 (Epoch: 120)
loss: 0.210523 (Epoch: 140)
loss: 0.169955 (Epoch: 160)
loss: 0.139120 (Epoch: 180)
loss: 0.115664 (Epoch: 200)

Test Error for Epoch 200: 
R2 score: -6.72, Avg loss: 0.162051 


Epoch 200 - 300
-------------------------------
loss: 0.097803 (Epoch: 220)
loss: 0.084183 (Epoch: 240)
loss: 0.073780 (Epoch: 260)
loss: 0.065816 (Epoch: 280)
loss: 0.059702 (Epoch: 300)

Test Error for Epoch 300: 
R2 score: -2.76, Avg loss: 0.078962 


Epoch 300 - 400
-------------------------------
loss: 0.054991 (Epoch: 320)
loss: 0.051344 (Epoch: 340)
loss: 0.048505 (Epoch: 360)
loss: 0.046280 (Epoch: 380)
loss: 0.044520 (Epoch: 400)

Test Error for Epoch 400: 
R2 score: -1.45, Avg los