In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Tarefa 3 - Neural Networks
Third assessed coursework for the course: Técnicas e Algoritmos em Ciência de Dados

This tarefa provides an exciting opportunity for students to put their knowledge acquired in class into practice, using neural networks to solve real-world problems in both classification and regression. Students will apply the concepts they have learned to build, train, and optimize neural networks, using a validation set to fine-tune hyperparameters. Students will also get used to generating important plots during training to analyse the models' behaviour. By the end of the project, students will have gained hands-on experience in implementing neural networks.

## General guidelines:

* This work must be entirely original. You are allowed to research documentation for specific libraries, but copying solutions from the internet or your classmates is strictly prohibited. Any such actions will result in a deduction of points for the coursework.
* Please enter your code in the designated areas of the notebook. You can create additional code cells to experiment with, but __make sure to place your final solutions where they are requested in the notebook.__
* Before submitting your work, make sure to rename the file to the random number that you created for the previous coursework (for example, 289479.ipynb).3

## Notebook Overview:

1. [Regression](#Regression) (50%)
2. [Classification](#Classification) (50%)

# Regression
## Dataset and Problem Description
In this exercise, you will use the Energy Efficiency Prediction dataset. This dataset contains information about the energy efficiency of buildings based on eight features, including the size of the building, the orientation, and the type of building materials used. The dataset includes two targets: heating load and cooling load, which represent the energy required to heat and cool the building, respectively.
This dataset is useful for building neural networks that predict the energy efficiency of buildings, which is an important problem in the field of sustainable energy. The dataset has been used in several machine learning research papers and provides a challenging regression problem.

## Exercise Description: Energy Efficiency Prediction with Neural Networks
In this exercise, you will use the Energy Efficiency Prediction dataset provided.
You will build and train a neural network to predict the heating load (column labelled y1 in the dataset) and the cooling load (column labelled y2) of a building based on its energy efficiency features. 

**To complete this exercise, you will write code to build and train neural networks for this problem:**

1. Split the dataset into training, validation, and test sets, using a 70:15:15 ratio.

2. Using numpy, build a neural network that takes in the energy efficiency features as input and predicts the heating load as output. You will choose the number of neurons per layers and the number of layers, but each layer will have the same number of neurons.

3. Code the forward pass and backpropagation algorithm to learn the weights of the neural network. Use the training set to train the neural network and update the weights using stochastic gradient descent. You will need to regularize your neural network using weight decay, that is, you will include a regularization term in your error function.

4. Monitor the training by plotting the training and validation losses across the epochs. 


The performance of your neural network will be different depending on the number of layers, number of neurons per layer and the value of λ that controls the amount of weight decay. You will experiment with 3 values of λ: 0 (no weight decay), 0.1 and 0.01.
To choose the best network configuration and assess its performance you will:

1. Calculate the loss for each configuration on the validation set.

2. Generate 3 [heatmaps](https://seaborn.pydata.org/generated/seaborn.heatmap.html) one for each value of the λ regularization parameter, displaying the loss on the validation set by plotting the number of layers and number of neurons in a grid. This will help you visualise the best configuration for the neural network.

3. Train your final model selecting the best combination of hyper-parameters and evaluate the final performance of the neural network using the test set and the root mean squared error as the metric and report that.

**Important:**
* Train for 50 epochs, remember that one epoch finishes when the whole training set was seen during training.
* Set the learning rate $\eta$ to $0.01$.

In [3]:
## your code goes here:

#Carregando o código:
EnergyEficciencydf = pd.read_csv('energy_efficiency.csv')
EnergyEficciencydf.head(-1)

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,Y1,Y2
0,1.000000,0.000000,0.285714,0.000000,1.0,0.000000,0.0,0.0,0.257212,0.280905
1,1.000000,0.000000,0.285714,0.000000,1.0,0.333333,0.0,0.0,0.257212,0.280905
2,1.000000,0.000000,0.285714,0.000000,1.0,0.666667,0.0,0.0,0.257212,0.280905
3,1.000000,0.000000,0.285714,0.000000,1.0,1.000000,0.0,0.0,0.257212,0.280905
4,0.777778,0.166667,0.428571,0.111111,1.0,0.000000,0.0,0.0,0.399838,0.468085
...,...,...,...,...,...,...,...,...,...,...
762,0.055556,0.916667,0.571429,1.000000,0.0,0.666667,1.0,1.0,0.327582,0.264207
763,0.055556,0.916667,0.571429,1.000000,0.0,1.000000,1.0,1.0,0.320032,0.282790
764,0.000000,1.000000,0.714286,1.000000,0.0,0.000000,1.0,1.0,0.283904,0.161056
765,0.000000,1.000000,0.714286,1.000000,0.0,0.333333,1.0,1.0,0.281208,0.167250


In [4]:
#Fazendo o Split do conjunto de dados.

randomstate = np.random.RandomState(13)


x = EnergyEficciencydf.iloc[:, 0:8]
y = EnergyEficciencydf.iloc[:, 8:10]

#O primeiro split separa o treino com proporção de 70%
x_train, x_validation_test, y_train, y_validation_test = train_test_split(x,y, test_size=0.3, random_state=randomstate)
#O segundo spplit separa o teste e a validação ambos com proporção de 15%
x_validation, x_test, y_validation, y_test = train_test_split(x_validation_test, y_validation_test, test_size=0.5, random_state=randomstate)



# Classification
## Dataset description: 
This is a dataset from the medical domain. It describes the problem of diagnosing coronary heart disease (CHD ) via Traditional Chinese Medicine approaches. Each datapoint corresponds to a patient represented by a set of 49 features corresponding to the presence or absence of different symptoms: feelings cold or warm, sweating, etc. The 6 labels represent presence or absence of specific heart conditions: deficiency of heart qi syndrome, deficiency of heart yang syndrome, deficiency of heart yin syndrome, qi stagnation syndrome, turbid phlegm syndrome, and blood stasis syndrome.

## Exercise Description: CHD49 Multi-Label Classification with Neural Networks
In this exercise, you will build and train a neural network to predict the 6 different labels of CHD (last 6 columns of the dataset). 

**To complete this exercise, follow these steps:**

1. Load the dataset and split it into training, validation, and test sets, using a 70:15:15 ratio. 

2. Build a neural network using numpy that takes in the features as input and predicts the 6 different labels. You will choose the number of neurons per layers and the number of layers, but each layer will have the same number of neurons.

3. Code the forward pass and backpropagation algorithm to learn the weights of the neural network. Use the training set to train the neural network and update the weights using batch gradient descent. You will choose the number of neurons per layers and the number of layers, but each layer will have the same number of neurons.

4. Monitor the training by plotting the training and validation losses across the epochs. 

The performance of your neural network will be different depending on the number of layers, number of neurons per layer and the value of λ that controls the amount of weight decay. You will experiment with 3 values of λ: 0 (no weight decay), 0.1 and 0.01.
To choose the best network configuration and assess its performance you will:

1. Calculate the loss for each configuration on the validation set.

2. Generate 3 heatmaps, one for each value of the λ regularization parameter, displaying the loss on the validation set by plotting the number of layers and number of neurons in a grid. This will help you visualise the best configuration for the neural network.

3. Train your final model selecting the best combination of hyper-parameters and evaluate the final performance of the neural network using the test set and by calculating the area under the ROC curve, accuracy and F1 score as metrics and report these.

**Important:**
* Train for at least 1000 epochs, remember that one epoch finishes when the whole training set was seen during training.
* Set the learning rate $\eta$ to $0.01$.


In [5]:
## your code goes here:

#Carregando o código:
CHDdf = pd.read_csv('CHD_49.csv')
CHDdf.head(-1)

Unnamed: 0,att1,att2,att3,att4,att5,att6,att7,att8,att9,att10,...,att46,att47,att48,att49,label1,label2,label3,label4,label5,label6
0,1.0,-1.0,1.0,-1.0,1.0,1.0,0.0,-0.5,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0
1,-1.0,-1.0,-1.0,-1.0,1.0,1.0,-1.0,-1.0,-1.0,-1.0,...,1.0,1.0,1.0,-1.0,1.0,0.0,0.0,1.0,0.0,0.0
2,-1.0,1.0,-1.0,-1.0,1.0,-1.0,-1.0,-1.0,-1.0,-1.0,...,-1.0,1.0,-1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0
3,1.0,1.0,-1.0,1.0,-1.0,1.0,0.0,-0.5,1.0,1.0,...,-1.0,1.0,1.0,-1.0,0.0,1.0,0.0,0.0,0.0,1.0
4,-1.0,-1.0,-1.0,1.0,1.0,1.0,0.0,-0.5,-1.0,1.0,...,-1.0,-1.0,-1.0,-1.0,0.0,0.0,1.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
549,1.0,1.0,-1.0,-1.0,-1.0,1.0,-1.0,-1.0,-1.0,-1.0,...,-1.0,1.0,1.0,-1.0,1.0,0.0,1.0,0.0,0.0,1.0
550,1.0,-1.0,-1.0,-1.0,1.0,-1.0,0.0,0.0,1.0,-1.0,...,-1.0,-1.0,-1.0,-1.0,1.0,0.0,0.0,0.0,0.0,1.0
551,-1.0,-1.0,-1.0,-1.0,1.0,1.0,-1.0,-1.0,-1.0,-1.0,...,-1.0,-1.0,-1.0,-1.0,1.0,0.0,0.0,0.0,0.0,1.0
552,-1.0,-1.0,-1.0,1.0,1.0,1.0,-1.0,-1.0,-1.0,-1.0,...,-1.0,1.0,-1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0


In [12]:
#Fazendo o Split do conjunto de dados.

randomstate = np.random.RandomState(13)


x2 = CHDdf.iloc[:, 0:49]
y2 = CHDdf.iloc[:, 49:55]

#O primeiro split separa o treino com proporção de 70%
x_train2, x_validation_test2, y_train2, y_validation_test2 = train_test_split(x2,y2, test_size=0.3, random_state=randomstate)
#O segundo spplit separa o teste e a validação ambos com proporção de 15%
x_validation2, x_test2, y_validation2, y_test2 = train_test_split(x_validation_test2, y_validation_test2, test_size=0.5, random_state=randomstate)

In [13]:
#Inicializando os nossos parâmetros de pesos e vieses

def initialize_weights(input_size, hidden_size, output_size):
    
    #Os pesos são inicializados com o random randn e os vieses com np zeros para anular qualquer enviesamento inicial

    W1 = np.random.randn(input_size, hidden_size)
    b1 = np.zeros((1, hidden_size))
    
    W2 = np.random.randn(hidden_size, hidden_size)
    b2 = np.zeros((1, hidden_size))
    
    W3 = np.random.randn(hidden_size, output_size)
    b3 = np.zeros((1, output_size))

    return W1, W2, W3, b1, b2, b3

#Criando a função foward_propagation a partir de uma função de ativação ReLu

def ReLU(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def softmax(Z):
    #Subtrai o valor máximo de x para evitar a explosão numérica
    Z -= np.max(Z)
    
    #Calcula a exponenciação de cada elemento de x e a soma desses expoentes
    exps = np.exp(Z)
    sum_exps = np.sum(exps)
    
    #Calcula as probabilidades normalizadas e retorna
    softmax_probs = exps / sum_exps
    return softmax_probs

def forward_propagation(X, W1, W2, W3, b1, b2, b3):
    Z1 = np.dot(X, W1) + b1
    A1 = ReLU(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = ReLU(Z2)
    Z3 = np.dot(A2, W3) + b3
    y_pred = softmax(Z3)
    return Z1, Z2, Z3, A1, A2, y_pred

In [28]:
#BackPropagation

def back_propagation(X, y, Z1, Z2, A1, A2, y_pred, W2, W3):
    m = X.shape[0]
    
    # Gradiente da função de perda em relação a Z3
    dZ3 = y_pred - y
    
    # Gradiente da função de perda em relação a W3, b3 e A2
    dW3 = (1 / m) * np.dot(A2.T, dZ3)
    db3 = (1 / m) * np.sum(dZ3, axis=0)
    dA2 = np.dot(dZ3, W3.T)
    
    # Gradiente da função de ativação ReLU em relação a Z2
    dZ2 = dA2 * (Z2 > 0)
    
    # Gradiente da função de perda em relação a W2, b2 e A1
    dW2 = (1 / m) * np.dot(A1.T, dZ2)
    db2 = (1 / m) * np.sum(dZ2, axis=0)
    dA1 = np.dot(dZ2, W2.T)
    
    # Gradiente da função de ativação ReLU em relação a Z1
    dZ1 = dA1 * (Z1 > 0)
    
    # Gradiente da função de perda em relação a W1 e b1
    dW1 = (1 / m) * np.dot(X.T, dZ1)
    db1 = (1 / m) * np.sum(dZ1, axis=0)
    
    return dW1, dW2, dW3, db1, db2, db3

def update_params(W1, W2, W3, b1, b2, b3, dW1, dW2, dW3, db1, db2, db3, learning_rate):
    # Atualização dos pesos e vieses
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    W3 -= learning_rate * dW3
    b3 -= learning_rate * db3

    return W1, W2, W3, b1, b2, b3

In [25]:
def calculate_loss(y, y_pred):
    # Entropia cruzada categórica
    loss = -np.mean(y * np.log(y_pred + 1e-8))  # Adicionamos 1e-8 para evitar problemas de log(0)
    return loss

def train(X, y, num_epochs, learning_rate):
    input_size = X.shape[1]
    hidden_size = 64
    output_size = y.shape[1]
    
    W1, W2, W3, b1, b2, b3 = initialize_weights(input_size, hidden_size, output_size)
    
    for epoch in range(num_epochs):
        # Forward propagation
        Z1, Z2, Z3, A1, A2, y_pred = forward_propagation(X, W1, W2, W3, b1, b2, b3)
        
        # Calcular a loss
        loss = calculate_loss(y, y_pred)
        
        # Backward propagation
        dW1, dW2, dW3, db1, db2, db3 = back_propagation(X, y, Z1, Z2, A1, A2, y_pred, W2, W3)
        
        # Atualização dos parâmetros
        W1, W2, W3, b1, b2, b3 = update_params(W1, W2, W3, b1, b2, b3, dW1, dW2, dW3, db1, db2, db3, learning_rate)
        
        # Print da loss a cada 100 épocas
        if (epoch + 1) % 50 == 0:
            print(f"Época: {epoch+1}, Loss: {loss}")
    
    return W1, W2, W3, b1, b2, b3

In [27]:
train(x2, y2, 500, 0.01)

ValueError: Length of values (1) does not match length of index (6)