# Business Flight Price Predictor

This notebook walks through the steps for building a business flight price predictor using an Artificial Neural Network (ANN), implemented with the PyTorch library. The notebook is divided into five parts:

1) **Preprocessing**: This step covers all preprocessing applied to the dataset before training the neural network: cleaning the dataset, handling missing values, and feature engineering. In addition, all numerical variables are scaled using the MinMaxScaler, and categorical variables are encoded using either the LabelEncoder or the OneHotEncoder.

2) **Creating tensors**: In this step, the preprocessed dataset is split into feature variables and a target variable, each of which is further divided into a train set and a test set. Finally, the train and test sets for both the features and the target are converted into PyTorch tensors, which are required for training a neural network with PyTorch.

3) **Training various ANNs with one hidden layer**: This includes training different ANNs with one hidden layer and varying hyperparameter settings on the train set. The goal is to see which settings work relatively better for this specific dataset. This is done by, e.g., changing the learning rate of the neural network, changing the number of epochs, and trying different activation functions.

4) **Training various ANNs with two hidden layers**: Based on the hyperparameter settings that performed best in 3), this section adds a second hidden layer to the ANN to see whether it improves performance.

5) **The chosen one⚡️🧙🏼‍♂️**: Based on 4), the best performing ANN is selected. This ANN is then evaluated on the test set.

Before proceeding, we will first install the necessary Python packages and load all relevant libraries:

In [None]:
!pip install scikit-learn -q
# itertools is part of the Python standard library and does not need to be installed with pip

In [None]:
# Load libraries
import pandas as pd # datahandling
import numpy as np # manipulating data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler, MinMaxScaler
import itertools # memory-efficient tools for working with iterators (part of the standard library)
import matplotlib.pyplot as plt  # Plotting library
from tqdm.notebook import tqdm as tqdm_notebook # For progress bar (tqdm.tqdm_notebook is deprecated)
import torch # for training ANNs using Pytorch
from sklearn.model_selection import train_test_split # to split dataset into train and test set


## Preprocessing

In [None]:
# Load dataset
df = pd.read_csv("https://raw.githubusercontent.com/imads20/BDS23/main/M1_Final_Assignment/business.csv")

# Print the first five rows:
df.head()

Unnamed: 0,date,airline,ch_code,num_code,dep_time,from,time_taken,stop,arr_time,to,price
0,11-02-2022,Air India,AI,868,18:00,Delhi,02h 00m,non-stop,20:00,Mumbai,25612
1,11-02-2022,Air India,AI,624,19:00,Delhi,02h 15m,non-stop,21:15,Mumbai,25612
2,11-02-2022,Air India,AI,531,20:00,Delhi,24h 45m,1-stop\n\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t...,20:45,Mumbai,42220
3,11-02-2022,Air India,AI,839,21:25,Delhi,26h 30m,1-stop\n\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t...,23:55,Mumbai,44450
4,11-02-2022,Air India,AI,544,17:15,Delhi,06h 40m,1-stop\n\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t...,23:55,Mumbai,46690


In [None]:
# Print columns names and types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 93487 entries, 0 to 93486
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   date        93487 non-null  object
 1   airline     93487 non-null  object
 2   ch_code     93487 non-null  object
 3   num_code    93487 non-null  int64 
 4   dep_time    93487 non-null  object
 5   from        93487 non-null  object
 6   time_taken  93487 non-null  object
 7   stop        93487 non-null  object
 8   arr_time    93487 non-null  object
 9   to          93487 non-null  object
 10  price       93487 non-null  object
dtypes: int64(1), object(10)
memory usage: 7.8+ MB


### Cleaning the data

Cleaning data is a critical step before analysing a dataset, as it involves identifying and correcting errors, inconsistencies, and inaccuracies to ensure the data's quality and reliability for further analysis. This step is crucial because real-world data is often messy or incomplete for various reasons, such as data entry errors, missing values, outliers, or inconsistencies in formatting.

From the above we can see that most of the columns are objects and that we only have one numerical column. Comparing the column types to the column names, several columns that we would expect to contain numerical values are categorized as objects, for instance 'time_taken', 'stop', and 'price'. Further, we would also like to have a variable for the distance between the cities. This means that we have some cleaning up to do with the dataset.

In [None]:
# Convert time_taken to number of minutes taken

# From the head() above, we can see that time_taken is formatted like '02h 00m'.
# Converting this format to number of minutes is a bit tedious, so we build a function to help us

# Function for converting the hour-minute format to minutes
def convert_to_minutes(time_str):
    hours, minutes_str = time_str.split('h ')   # Take the time string and split it by 'h '.
                                                # This will split the string into two parts:
                                                # The first part is the number of hours
                                                # The second part is the number of minutes followed by an 'm'

    minutes = minutes_str.replace("m", "") # Replace the 'm' in the minutes with nothing

    hours = int(hours) # Convert hours and minutes to integers to ensure that we have numeric format
    minutes = int(minutes)

    total_minutes = (hours * 60) + minutes # Find the total number of minutes

    return total_minutes

# Apply the function defined above to the column
df['duration_minutes'] = df['time_taken'].apply(convert_to_minutes)
df['duration_minutes'][:5] # Print the first 5 rows as control

0     120
1     135
2    1485
3    1590
4     400
Name: duration_minutes, dtype: int64
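
The split-based helper above assumes that every value looks exactly like '02h 00m'. As a hedged sketch (not part of the original notebook), a regular expression can validate the format before converting, so that an unexpected value raises a clear error instead of a confusing one. The function name `parse_duration` and the pattern are illustrative assumptions.

```python
import re

# Assumed format: digits, 'h', optional whitespace, digits, 'm' (e.g. '02h 15m')
DURATION_PATTERN = re.compile(r"^\s*(\d+)h\s*(\d+)m\s*$")

def parse_duration(time_str):
    """Return the duration in minutes for a string such as '02h 15m'."""
    match = DURATION_PATTERN.match(time_str)
    if match is None:
        # Fail loudly so that malformed rows can be inspected
        raise ValueError(f"Unexpected duration format: {time_str!r}")
    hours, minutes = (int(part) for part in match.groups())
    return hours * 60 + minutes

print(parse_duration("02h 15m"))  # 135
```

If it behaves as intended, this function could be passed to `apply` in the same way as `convert_to_minutes`.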

In [None]:
# Convert stop to number of stops

# Looking at the head() above, the stop column seems to be defined in an interesting manner.
# In order to get an overview of the formatting across the whole column, we can print the unique values in the column:
df['stop'].unique()

array(['non-stop ',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\tVia IDR\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\tVia IXU\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\tVia Chennai\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\tVia Lucknow\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\tVia STV\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\tVia Hyderabad\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\tVia GAY\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '2+-stop',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\tVia Guwahati\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\tVia GAU\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\tVia VTZ\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t\tVia NDC\n\t\t\t\t\t\t\t\t\t\t\t\t',
       '1-stop\n\t\t\t\t\t\t\t\t\t\t\t

In [None]:
# The column contains the number of stops as well as the location of the stop.
# The number of stops can be zero, one, or two or more.
# Once again, cleaning up this format is a bit tedious, so we build a function to help us

# Function for defining number of stops
def number_of_stops(string):
    stops_str = string.split('-stop')[0]    # Split the string by '-stop' and keep only the first part of the split
                                            # When doing this we will be left with either 'non', '1' or '2+' values

    # Define what the number of stop is based on the string
    if stops_str == 'non':
        stops = 0
    elif stops_str == '1':
        stops = 1
    elif stops_str == '2+':
        stops = 2
    else:
        stops = np.nan  # If the format of the string doesn't match the above, set it as nan

    return stops

# Apply the function defined above to the column
df['stop'] = df['stop'].apply(number_of_stops)
df['stop'][:5] # Control

0    0
1    0
2    1
3    1
4    1
Name: stop, dtype: int64

In [None]:
# Convert price to numerical value:

# The head() above shows that price contains a ','.
# In order to convert the price to a numerical value, we need to remove this
df['price'] = df['price'].str.replace(",", "")

# Afterwards, convert the string to an integer to ensure that the column gets interpreted as a numeric value
df['price'] = df['price'].astype(int)
df['price'][:5] # Control

0    25612
1    25612
2    42220
3    44450
4    46690
Name: price, dtype: int64

### Handling missing values

Missing values, denoted as NaN (Not a Number) or null values, can be present in various features of a dataset for a variety of reasons, such as data collection errors or incomplete records. Before working with any dataset, it is important to take care of any missing values. If left untreated, they can cause issues later in the analysis, as not all algorithms can handle missing values.

There are many ways to handle missing values, and the strategy depends on the nature of the variable:

- **For numerical values**: common strategies include replacing missing values with the mean or median of the non-missing values in that column, or predicting the missing values with a regression model based on other features.

- **For categorical values**: a common strategy is to replace missing values with the mode of the non-missing values in that column.

Depending on the number of missing values, it might also make sense to remove the rows or columns with missing values. This is suitable when the proportion of missing values is small and won't significantly impact the analysis.
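
The median and mode strategies described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper `impute` and the two small columns are hypothetical, and missing entries are represented by `None`. With pandas, the same idea is usually written with `fillna`.

```python
from statistics import median, mode

def impute(values, strategy):
    """Replace None entries using the median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]

durations = [120, None, 400, 135]                    # hypothetical numerical column
airlines = ["Air India", None, "Other", "Air India"]  # hypothetical categorical column

print(impute(durations, "median"))  # [120, 135, 400, 135]
print(impute(airlines, "mode"))     # ['Air India', 'Air India', 'Other', 'Air India']
```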

In [None]:
# Find the number of missing values in each column
df.isnull().sum()

date                0
airline             0
ch_code             0
num_code            0
dep_time            0
from                0
time_taken          0
stop                0
arr_time            0
to                  0
price               0
duration_minutes    0
dtype: int64

As seen from the above, there are no missing values in the dataset.

In [None]:
# We remove any duplicate values in the data set
df = df.drop_duplicates()
df.info()
# We have same number of observations so there were no duplicate values

<class 'pandas.core.frame.DataFrame'>
Int64Index: 93487 entries, 0 to 93486
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   date              93487 non-null  object
 1   airline           93487 non-null  object
 2   ch_code           93487 non-null  object
 3   num_code          93487 non-null  int64 
 4   dep_time          93487 non-null  object
 5   from              93487 non-null  object
 6   time_taken        93487 non-null  object
 7   stop              93487 non-null  int64 
 8   arr_time          93487 non-null  object
 9   to                93487 non-null  object
 10  price             93487 non-null  int64 
 11  duration_minutes  93487 non-null  int64 
dtypes: int64(4), object(8)
memory usage: 9.3+ MB


### Feature engineering

Feature engineering is the process of creating new features or modifying existing features in a dataset to improve the performance of machine learning models or to enhance the understanding of the underlying patterns in the data during exploratory data analysis (EDA). It involves selecting, transforming, and creating features that can provide relevant and valuable information to the model.

For this specific dataset, it would make sense to have a variable for the distance between the cities on the flight routes. To get this information, we need to do some feature engineering to add a column with the distance. We would also like to group the departure time and arrival time into bins.

In [None]:
# In order to make a distance column, we need to find all the unique cities in 'from' and 'to':
print("Unique cities to travel from " + str(list(df['from'].unique())))
print("Unique cities to travel to " + str(list(df['to'].unique())))

Unique cities to travel from ['Delhi', 'Mumbai', 'Bangalore', 'Kolkata', 'Hyderabad', 'Chennai']
Unique cities to travel to ['Mumbai', 'Bangalore', 'Kolkata', 'Hyderabad', 'Chennai', 'Delhi']


In [None]:
# From the above we can see that we have 6 unique cities to travel from and to.
# This means that we have 15 different city pairs overall (assuming that you cannot travel to the same place that you took off from, and ignoring direction).

# The next thing to do is to define a dictionary with the distance between the 15 different pairs.
# The distances between the cities have been found by looking them up on the internet.
city_distances = {
    ('Delhi', 'Mumbai'): 1139,
    ('Delhi', 'Bangalore'): 1710,
    ('Delhi', 'Kolkata'): 1313,
    ('Delhi', 'Hyderabad'): 1268,
    ('Delhi', 'Chennai'): 1761,
    ('Mumbai', 'Bangalore'): 835,
    ('Mumbai', 'Kolkata'): 1666,
    ('Mumbai', 'Hyderabad'): 624,
    ('Mumbai', 'Chennai'): 1034,
    ('Bangalore', 'Kolkata'): 1547,
    ('Bangalore', 'Hyderabad'): 455,
    ('Bangalore', 'Chennai'): 268,
    ('Kolkata', 'Hyderabad'): 1208,
    ('Kolkata', 'Chennai'): 1386,
    ('Hyderabad', 'Chennai'): 507
}

# Function for finding distance between 'to' and 'from' columns for each row in our dataframe
def calculate_distance(row):
    city1, city2 = row['from'], row['to'] # Define city1 and city2 based on from and to columns

    distance = city_distances.get((city1, city2),                       # Get the distance between the cities regardless of the order of 'from' and 'to'.
                                  city_distances.get((city2, city1),    # First try the (city1, city2) key.
                                                     None))             # If that key doesn't exist in the city_distances dictionary, try (city2, city1).
                                                                        # If neither key exists, the distance is None (indicating that the route
                                                                        # is missing from the dictionary).

    return distance

# Create a new column called 'distance' using the above function.
df['distance'] = df.apply(calculate_distance, axis=1)
df.head() # Control

Unnamed: 0,date,airline,ch_code,num_code,dep_time,from,time_taken,stop,arr_time,to,price,duration_minutes,distance
0,11-02-2022,Air India,AI,868,18:00,Delhi,02h 00m,0,20:00,Mumbai,25612,120,1139
1,11-02-2022,Air India,AI,624,19:00,Delhi,02h 15m,0,21:15,Mumbai,25612,135,1139
2,11-02-2022,Air India,AI,531,20:00,Delhi,24h 45m,1,20:45,Mumbai,42220,1485,1139
3,11-02-2022,Air India,AI,839,21:25,Delhi,26h 30m,1,23:55,Mumbai,44450,1590,1139
4,11-02-2022,Air India,AI,544,17:15,Delhi,06h 40m,1,23:55,Mumbai,46690,400,1139
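
As a small check of the route count, and as one possible alternative to the nested `get` calls, the sketch below counts the unordered city pairs with `itertools.combinations` and uses `frozenset` keys so that direction does not matter. The names `symmetric_distances` and `lookup_distance` are illustrative assumptions, and only two of the distances from the dictionary above are repeated to keep the example short.

```python
from itertools import combinations

cities = ['Delhi', 'Mumbai', 'Bangalore', 'Kolkata', 'Hyderabad', 'Chennai']

# Every unordered pair of distinct cities is one route: C(6, 2) = 15
routes = list(combinations(cities, 2))
print(len(routes))  # 15

# A frozenset ignores order, so Delhi-Mumbai and Mumbai-Delhi share one key
symmetric_distances = {
    frozenset(('Delhi', 'Mumbai')): 1139,
    frozenset(('Bangalore', 'Chennai')): 268,
}

def lookup_distance(origin, destination):
    """Return the distance for a route in either direction, or None if unknown."""
    return symmetric_distances.get(frozenset((origin, destination)))

print(lookup_distance('Mumbai', 'Delhi'))  # 1139
```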


In [None]:
# Finally, we create bins for the departure time and the arrival time.
# We do this to generalize the columns such that we only have a few distinct groups and not every possible combination of hour and minute of the day.
# By generalizing the columns into bins, it will be easier to cluster the data based on the time of the day.
# Likewise, it will also be easier to predict the price based on the time of the day.

# First we need to define the bins that we will use for the time of the day.
# We have decided on the following:
    #Night: 23:00-03:59
    #Early_morning: 04:00-06:59
    #Morning: 07:00-10:59
    #Midday: 11:00-13:59
    #Afternoon: 14:00-16:59
    #Evening: 17:00-19:59
    #Late_evening: 20:00-22:59


# Function for grouping into a bin the time based on the above system
def group_time(time):
    group = None

    if time < '04:00':
        group = 'Night'
    elif time >= '04:00' and time < '07:00':
        group = 'Early_morning'
    elif time >= '07:00' and time < '11:00':
        group = 'Morning'
    elif time >= '11:00' and time < '14:00':
        group = 'Midday'
    elif time >= '14:00' and time < '17:00':
        group = 'Afternoon'
    elif time >= '17:00' and time < '20:00':
        group = 'Evening'
    elif time >= '20:00' and time < '23:00':
        group = 'Late_evening'
    elif time >= '23:00':
        group = 'Night'

    return group

# Apply the function to both 'dep_time' and 'arr_time' columns:
df['dep_time'] = df['dep_time'].apply(group_time)
df['arr_time'] = df['arr_time'].apply(group_time)
df.head() # Control

Unnamed: 0,date,airline,ch_code,num_code,dep_time,from,time_taken,stop,arr_time,to,price,duration_minutes,distance
0,11-02-2022,Air India,AI,868,Evening,Delhi,02h 00m,0,Late_evening,Mumbai,25612,120,1139
1,11-02-2022,Air India,AI,624,Evening,Delhi,02h 15m,0,Late_evening,Mumbai,25612,135,1139
2,11-02-2022,Air India,AI,531,Late_evening,Delhi,24h 45m,1,Late_evening,Mumbai,42220,1485,1139
3,11-02-2022,Air India,AI,839,Late_evening,Delhi,26h 30m,1,Night,Mumbai,44450,1590,1139
4,11-02-2022,Air India,AI,544,Evening,Delhi,06h 40m,1,Night,Mumbai,46690,400,1139
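
The `group_time` function above compares strings, which works as long as every time is zero-padded in the 'HH:MM' format. A hedged alternative, assuming the same bin boundaries, converts the time to minutes after midnight and looks up the bin with `bisect`. The names `BIN_STARTS`, `BIN_LABELS`, and `group_time_numeric` are illustrative and not part of the original notebook.

```python
from bisect import bisect_right

# Minutes after midnight at which each new bin starts (04:00, 07:00, ..., 23:00)
BIN_STARTS = [4 * 60, 7 * 60, 11 * 60, 14 * 60, 17 * 60, 20 * 60, 23 * 60]
# One label more than boundaries: 'Night' covers both 00:00-03:59 and 23:00-23:59
BIN_LABELS = ['Night', 'Early_morning', 'Morning', 'Midday',
              'Afternoon', 'Evening', 'Late_evening', 'Night']

def group_time_numeric(time_str):
    """Map a time such as '18:00' to its time-of-day bin."""
    hours, minutes = (int(part) for part in time_str.split(':'))
    total = hours * 60 + minutes
    # bisect_right counts how many bin boundaries lie at or before this time
    return BIN_LABELS[bisect_right(BIN_STARTS, total)]

print(group_time_numeric('18:00'))  # Evening
print(group_time_numeric('23:55'))  # Night
```

Because the comparison is numeric, a value without zero-padding such as '9:05' would also land in the 'Morning' bin.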


In [None]:
# From the above, we can see that we have quite a few columns in our dataframe.
# Given the content of some of the columns, we will not be using all of them.
# For example, 'ch_code' and 'num_code' are unlikely to be strong indicators of the price.
# The 'date' could have played a role in determining the price if we had more data (e.g., we would expect higher flight prices during holidays,
    # or prices to trend upwards over time due to inflation), but given that we only have 50 days of data, it's not enough to
    # spot a seasonal pattern or a trend.

# So we overwrite our dataframe such that we only keep the columns we're going to use
df = df[['price', 'airline', 'from', 'to', 'dep_time', 'arr_time', 'distance', 'duration_minutes', 'stop']]
df.head()

Unnamed: 0,price,airline,from,to,dep_time,arr_time,distance,duration_minutes,stop
0,25612,Air India,Delhi,Mumbai,Evening,Late_evening,1139,120,0
1,25612,Air India,Delhi,Mumbai,Evening,Late_evening,1139,135,0
2,42220,Air India,Delhi,Mumbai,Late_evening,Late_evening,1139,1485,1
3,44450,Air India,Delhi,Mumbai,Late_evening,Night,1139,1590,1
4,46690,Air India,Delhi,Mumbai,Evening,Night,1139,400,1


### LabelEncoder, OneHotEncoder & StandardScaler

Scaling and encoding are essential preprocessing steps in preparing the data for machine learning models. They serve different purposes but are both crucial for ensuring that the input data is in a suitable format for model training and that the models can effectively learn from the data.

Encoding is the process of converting categorical variables into a numerical format. Many machine learning models require numerical input, so categorical variables (e.g., gender, places, time of the day) need to be transformed before the model can process them and be trained effectively.

Common encoding techniques include:

- **LabelEncoding**: Assigns a unique numerical value to each category.

- **OneHotEncoding**: Creates binary or dummy columns for each category and represents the presence (1) or absence (0) of the category in the original feature.

However, when using LabelEncoding, you need to be cautious when applying a machine learning model to the data, as LabelEncoding implies an order or rank among the categories, even if the original categorical variable has no inherent order. This assumption might introduce incorrect relationships in the data.
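
To make the difference concrete, here is a minimal pure-Python sketch of what the two encodings compute. It is not the scikit-learn implementation; the helper names `label_encode` and `one_hot_encode` are hypothetical, and the categories are sorted alphabetically as an assumption for the illustration.

```python
def label_encode(values):
    """Map each category to an integer, using the sorted order of the unique values."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories

def one_hot_encode(values):
    """Return one 0/1 row per value, with one column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == category else 0 for category in categories] for v in values]
    return rows, categories

cities = ['Delhi', 'Mumbai', 'Chennai', 'Delhi']
codes, label_categories = label_encode(cities)
rows, onehot_categories = one_hot_encode(cities)

print(codes)  # [1, 2, 0, 1]
print(rows)   # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

The integer codes suggest that Chennai < Delhi < Mumbai, an ordering with no real meaning, whereas the one-hot rows treat all three cities symmetrically.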

Scaling, also known as normalization, is the process of standardizing the range of independent variables or features of the dataset. This step is crucial for models that are sensitive to the scale of the variables, such as support vector machines, k-nearest neighbors, and neural networks. Scaling ensures that all variables have a similar influence on the model by bringing them to a standard scale. Variables with large scales might dominate the learning process, which could lead to biased models.

Common scaling techniques include:

- **StandardScaler**: Centers the data around 0 by giving it a mean of 0, and scales it such that it has a standard deviation of 1. This technique works best when the data approximately follows a normal distribution.

- **MinMaxScaler**: Scales the data such that all observations lie within a range from 0 to 1, making it suitable for algorithms that require features to be within this range.
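
The two scaling formulas can be written out in a few lines of plain Python. This is a sketch for intuition rather than the scikit-learn code; the function names are hypothetical. The sample values are the five durations printed earlier, so the scaled numbers differ from those obtained when the scaler is fitted on the full column.

```python
from statistics import mean, pstdev

def standard_scale(values):
    """z = (x - mean) / std, using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

def min_max_scale(values):
    """x' = (x - min) / (max - min), which maps the data onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def min_max_inverse(scaled, lo, hi):
    """Undo min-max scaling to get back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

durations = [120, 135, 1485, 1590, 400]
scaled = min_max_scale(durations)
print(scaled[0], scaled[-2])  # 0.0 1.0
```

The inverse function matters when the target is scaled as well: a predicted price on the 0 to 1 scale has to be mapped back before it can be read as an actual price.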

In [None]:
# We only have two airlines, so we will make this into a dummy variable using LabelEncoder
le = LabelEncoder()
df['airline'] = le.fit_transform(df['airline'])
df.head() #Control

Unnamed: 0,price,airline,from,to,dep_time,arr_time,distance,duration_minutes,stop
0,25612,0,Delhi,Mumbai,Evening,Late_evening,1139,120,0
1,25612,0,Delhi,Mumbai,Evening,Late_evening,1139,135,0
2,42220,0,Delhi,Mumbai,Late_evening,Late_evening,1139,1485,1
3,44450,0,Delhi,Mumbai,Late_evening,Night,1139,1590,1
4,46690,0,Delhi,Mumbai,Evening,Night,1139,400,1


In [None]:
# We will scale our numerical columns (price, airline, distance, duration_minutes, stop) using the MinMaxScaler
df_num = df[['price', 'airline', 'distance', 'duration_minutes', 'stop']]
scaler = MinMaxScaler()
scaled_values = scaler.fit_transform(df_num)
df_num = pd.DataFrame(scaled_values, columns=df_num.columns)
df_num.head() # Control

Unnamed: 0,price,airline,distance,duration_minutes,stop
0,0.122552,0.0,0.583389,0.02139,0.0
1,0.122552,0.0,0.583389,0.026738,0.0
2,0.272078,0.0,0.583389,0.508021,0.5
3,0.292155,0.0,0.583389,0.545455,0.5
4,0.312323,0.0,0.583389,0.121212,0.5


In [None]:
# We will encode our categorical columns (to, from, dep_time, arr_time) using OneHotEncoder to avoid any incorrect relationships
df_cat = df[['to', 'from', 'dep_time', 'arr_time']].copy() # .copy() avoids pandas' SettingWithCopyWarning when assigning below

# To prevent columns with the same name, we add a prefix to the variables
df_cat['to'] = 'to_' + df_cat['to']
df_cat['from'] = 'from_' + df_cat['from']
df_cat['dep_time'] = 'dep_' + df_cat['dep_time']
df_cat['arr_time'] = 'arr_' + df_cat['arr_time']

# Apply the OneHotEncoder
# Note: from scikit-learn 1.2 onwards this argument is called sparse_output; sparse was removed in 1.4
ohe_X = OneHotEncoder(sparse=False)
X_ohe = ohe_X.fit_transform(df_cat.iloc[:,:])
columns_X_ohe = list(itertools.chain(*ohe_X.categories_)) # Get the column names
columns_X_ohe


['to_Bangalore',
 'to_Chennai',
 'to_Delhi',
 'to_Hyderabad',
 'to_Kolkata',
 'to_Mumbai',
 'from_Bangalore',
 'from_Chennai',
 'from_Delhi',
 'from_Hyderabad',
 'from_Kolkata',
 'from_Mumbai',
 'dep_Afternoon',
 'dep_Early_morning',
 'dep_Evening',
 'dep_Late_evening',
 'dep_Midday',
 'dep_Morning',
 'dep_Night',
 'arr_Afternoon',
 'arr_Early_morning',
 'arr_Evening',
 'arr_Late_evening',
 'arr_Midday',
 'arr_Morning',
 'arr_Night']

In [None]:
df_cat = pd.DataFrame(X_ohe, columns = columns_X_ohe)

In [None]:
# Combine dataframe using pd.merge
df_combined = pd.merge(df_num, df_cat, left_index=True, right_index=True, how='left')
df_combined.head()

Unnamed: 0,price,airline,distance,duration_minutes,stop,to_Bangalore,to_Chennai,to_Delhi,to_Hyderabad,to_Kolkata,...,dep_Midday,dep_Morning,dep_Night,arr_Afternoon,arr_Early_morning,arr_Evening,arr_Late_evening,arr_Midday,arr_Morning,arr_Night
0,0.122552,0.0,0.583389,0.02139,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
1,0.122552,0.0,0.583389,0.026738,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
2,0.272078,0.0,0.583389,0.508021,0.5,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
3,0.292155,0.0,0.583389,0.545455,0.5,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
4,0.312323,0.0,0.583389,0.121212,0.5,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


## Creating tensors

When working with PyTorch to estimate an ANN, it is essential to follow a structured approach to prepare the data for training and evaluation.
First, the dataset should be split into input features (X) and target variables (y). This division helps the model learn the relationship between the input data and the corresponding outputs.
Subsequently, splitting the dataset into training and testing sets allows you to assess the model's performance on unseen data, ensuring its generalization ability.
Finally, converting the data into PyTorch tensors is crucial as PyTorch operates efficiently with tensors.
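
To illustrate what a reproducible train/test split does, the sketch below shuffles row indices with a fixed seed and cuts off 20% for testing. It is a simplified stand-in, not the scikit-learn algorithm, so it will not select the same rows as `train_test_split`; the function name `split_indices` is hypothetical.

```python
import random

def split_indices(n_rows, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed and split them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # same seed gives the same shuffle every time
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))  # 8 2
```

Fixing the seed is what makes results comparable between runs, since every model is then trained and evaluated on the same rows.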

In [None]:
# First we split the data into features (X) and target variable (y)
X = df_combined.drop(['price'],axis=1)
y = df_combined['price']

# We split data in train set and test set with the test set using 20% of the data
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2, # test set is 20% of the whole dataset
                                                    random_state=42) # we set a random state to always get the same sample

In [None]:
# Convert the pandas Series to a PyTorch tensor
tensor_data_Y = torch.tensor(y_train.values, dtype=torch.float32)
tensor_data_X = torch.tensor(X_train.values, dtype=torch.float32)
tensor_data_Y_test = torch.tensor(y_test.values, dtype=torch.float32)
tensor_data_X_test = torch.tensor(X_test.values, dtype=torch.float32)

In [None]:
tensor_data_X[0] #First row of X values

tensor([0.0000, 0.5834, 0.2513, 0.5000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 1.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000,
        0.0000, 0.0000, 0.0000])

In [None]:
tensor_data_X.view(-1, len(df_combined.columns)-1).shape

torch.Size([74789, 30])

## Training various ANNs with one hidden layer

Testing different hyperparameters is crucial in the process of training an ANN using PyTorch.
Hyperparameters are settings that are not learned during training but significantly impact the model's performance. Experimenting with various hyperparameter configurations is a way to fine-tune the model and find the optimal combination that maximizes performance for a specific task.

In this assignment the hyperparameters for learning rate, number of epochs during the training process and the activation function are varied to explore how they affect the performance of the model:

- **Epochs**: Epochs represent the number of times the entire dataset is passed through the ANN during training. Increasing epochs may lead to improved model convergence, but excessive epochs can result in overfitting, where the model memorizes the training data instead of learning its underlying patterns.

- **Learning rate**: The learning rate determines the step size in updating the model weights during training. A higher learning rate can speed up convergence, but if it's too high, the model might fail to converge. Conversely, a lower learning rate might lead to slow convergence or getting stuck in local minima.

- **Activation function**: Activation functions introduce non-linearity to the model, enabling it to learn complex patterns. In this notebook, we will look at three different activation functions:

    - **Identity**: Returns its input unchanged, so the layer output stays a linear transformation of the inputs. This makes it suitable for tasks where linearity is desired, but it limits the model's ability to capture intricate patterns.

    - **ReLU**: ReLU introduces non-linearity by outputting the input for positive values and zero for negative values. It helps the model learn complex features and accelerates convergence.

    - **Tanh**: Tanh squashes input values between -1 and 1, aiding in capturing complex relationships.
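
The three activation functions can be sketched as plain Python functions acting on a single number. This is only for intuition; in the notebook itself the PyTorch modules `torch.nn.Identity`, `torch.nn.ReLU`, and `torch.nn.Tanh` would be used on tensors.

```python
import math

def identity(x):
    """Return the input unchanged."""
    return x

def relu(x):
    """Return x for positive inputs and 0 otherwise."""
    return max(0.0, x)

def tanh(x):
    """Squash the input into the range between -1 and 1."""
    return math.tanh(x)

# Compare the three functions on a negative, a zero, and a positive input
for x in (-2.0, 0.0, 2.0):
    print(x, identity(x), relu(x), round(tanh(x), 4))
```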


### Epochs = 3, learning rate = 0.05, activation function = Identity




In [None]:
# Initializing Hyperparameters
epochs = 3
learning_rate = 0.05
# Initializing Parameters
w = 1

loss_set = {}

# 1. Creating a FeedForwardNetwork
# 1.1 Structure (Architecture) of NN
model_net1 = torch.nn.Linear(len(df_combined.columns)-1,1, bias=False)
model_net1_actfun = torch.nn.Identity()
model_net1.weight.data.fill_(w)

# 1.2 Loss Function
loss_mse = torch.nn.MSELoss()

# 1.3 Optimization Approach
optimizer = torch.optim.SGD(model_net1.parameters(), lr=learning_rate)

w_his = []
w_his.append(w)
# Loop over the number of epochs
for epoch in tqdm_notebook(range(epochs), desc="Epochs"):
    epoch_loss = 0.0

    # Loop over each sample in the dataset
    for i in range(tensor_data_X.size(0)):

      # 2. Forward Pass
      output = model_net1_actfun(model_net1(tensor_data_X[i].reshape(-1)))

      # 3. FeedForward Evaluation
      loss = loss_mse(output, tensor_data_Y[i].reshape(-1))
      optimizer.zero_grad()

      # 4. Backward Pass / Gradient Calculation
      loss.backward()

      # Accumulate the loss over all samples in this epoch
      epoch_loss += loss.item()

      # 5. Back Propagation / Update Weights
      optimizer.step()

      # Store the weight value for each sample of data
      w_his.append(float(model_net1.weight.data[0][0]))


    # Calculate and display average loss for the epoch
    epoch_loss /= tensor_data_X.size(0)

    # Store the average loss for this epoch
    loss_set[epoch] = epoch_loss
    print(f"\nEpoch {epoch+1} Average Loss: {epoch_loss:.4f}\n{'-'*50}\n")


Epoch 1 Average Loss: 0.0118
--------------------------------------------------


Epoch 2 Average Loss: 0.0093
--------------------------------------------------


Epoch 3 Average Loss: 0.0093
--------------------------------------------------
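
What `loss.backward()` and `optimizer.step()` do for a bias-free linear model with squared-error loss can be written out by hand. The sketch below is a plain-Python illustration under stated assumptions: the toy samples are hypothetical, generated from y = 0.5 * x1 + 0.25 * x2, and `sgd_step` is an illustrative name rather than a PyTorch function.

```python
def sgd_step(weights, x, y, learning_rate):
    """One per-sample SGD update for y_hat = sum(w_j * x_j) with squared-error loss."""
    y_hat = sum(w * xj for w, xj in zip(weights, x))
    error = y_hat - y
    loss = error ** 2
    # Gradient of the loss with respect to w_j is 2 * error * x_j
    new_weights = [w - learning_rate * 2 * error * xj for w, xj in zip(weights, x)]
    return new_weights, loss

# Hypothetical toy data that follows y = 0.5 * x1 + 0.25 * x2 exactly
samples = [([1.0, 0.0], 0.5), ([0.0, 1.0], 0.25), ([1.0, 1.0], 0.75)]
weights = [1.0, 1.0]  # all weights start at 1, as in the training cell

for epoch in range(200):
    epoch_loss = 0.0
    for x, y in samples:
        weights, loss = sgd_step(weights, x, y, learning_rate=0.05)
        epoch_loss += loss
    epoch_loss /= len(samples)  # average loss over the samples in this epoch

print([round(w, 3) for w in weights], epoch_loss)
```

On this toy problem the weights move towards 0.5 and 0.25. A much larger learning rate would make each step overshoot, which is the failure mode described in the learning rate bullet above.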



### Epochs = 5, learning rate = 0.01, activation function = Identity

In [None]:
# Initializing Hyperparameters
epochs = 5
learning_rate = 0.01
# Initializing Parameters
w = 1

loss_set = {}

# 1. Creating a FeedForwardNetwork
# 1.1 Structure (Architecture) of NN
model_net1 = torch.nn.Linear(len(df_combined.columns)-1,1, bias=False)
model_net1_actfun = torch.nn.Identity()
model_net1.weight.data.fill_(w)

# 1.2 Loss Function
loss_mse = torch.nn.MSELoss()

# 1.3 Optimization Approach
optimizer = torch.optim.SGD(model_net1.parameters(), lr=learning_rate)

w_his = []
w_his.append(w)
# Loop over the number of epochs
for epoch in tqdm_notebook(range(epochs), desc="Epochs"):
    epoch_loss = 0.0

    # Loop over each sample in the dataset
    for i in range(tensor_data_X.size(0)):

      # 2. Forward Pass
      output = model_net1_actfun.forward(model_net1.forward(tensor_data_X[i].reshape(-1)))

      # 3. FeedForward Evaluation
      loss = loss_mse(output, tensor_data_Y[i].reshape(-1))
      optimizer.zero_grad();

      # 4. Backward Pass / Gradient Calculation
      loss.backward()

      # Accumulate the per-sample loss for this epoch
      epoch_loss += loss.item()

      # 5. Update Weights
      optimizer.step()

      # Store the first weight after each sample update
      w_his.append(float(model_net1.weight.data[0][0]))


    # Calculate and display average loss for the epoch
    epoch_loss /= tensor_data_X.size(0)

    # Store the average loss for this epoch
    loss_set[epoch] = epoch_loss
    print(f"\nEpoch {epoch+1} Average Loss: {epoch_loss:.4f}\n{'-'*50}\n")



Epochs:   0%|          | 0/5 [00:00<?, ?it/s]


Epoch 1 Average Loss: 0.0169
--------------------------------------------------


Epoch 2 Average Loss: 0.0071
--------------------------------------------------


Epoch 3 Average Loss: 0.0071
--------------------------------------------------


Epoch 4 Average Loss: 0.0071
--------------------------------------------------


Epoch 5 Average Loss: 0.0071
--------------------------------------------------



### Epochs = 3, learning rate = 0.03, activation function = Identity

In [None]:
# Initializing Hyperparameters
epochs = 3
learning_rate = 0.03
# Initializing Parameters
w = 1

loss_set = {}

# 1. Creating a FeedForwardNetwork
# 1.1 Structure (Architecture) of NN
model_net1 = torch.nn.Linear(len(df_combined.columns)-1,1, bias=False)
model_net1_actfun = torch.nn.Identity()
model_net1.weight.data.fill_(w)

# 1.2 Loss Function
loss_mse = torch.nn.MSELoss()

# 1.3 Optimization Approach
optimizer = torch.optim.SGD(model_net1.parameters(), lr=learning_rate)

w_his = []
w_his.append(w)
# Loop over the number of epochs
for epoch in tqdm_notebook(range(epochs), desc="Epochs"):
    epoch_loss = 0.0

    # Loop over each sample in the dataset
    for i in range(tensor_data_X.size(0)):

      # 2. Forward Pass
      output = model_net1_actfun.forward(model_net1.forward(tensor_data_X[i].reshape(-1)))

      # 3. FeedForward Evaluation
      loss = loss_mse(output, tensor_data_Y[i].reshape(-1))
      optimizer.zero_grad();

      # 4. Backward Pass / Gradient Calculation
      loss.backward()

      # Accumulate the per-sample loss for this epoch
      epoch_loss += loss.item()

      # 5. Update Weights
      optimizer.step()

      # Store the first weight after each sample update
      w_his.append(float(model_net1.weight.data[0][0]))


    # Calculate and display average loss for the epoch
    epoch_loss /= tensor_data_X.size(0)

    # Store the average loss for this epoch
    loss_set[epoch] = epoch_loss
    print(f"\nEpoch {epoch+1} Average Loss: {epoch_loss:.4f}\n{'-'*50}\n")



Epochs:   0%|          | 0/3 [00:00<?, ?it/s]


Epoch 1 Average Loss: 0.0117
--------------------------------------------------


Epoch 2 Average Loss: 0.0081
--------------------------------------------------


Epoch 3 Average Loss: 0.0081
--------------------------------------------------



### Epochs = 5, learning rate = 0.01, activation function = ReLU

In [None]:
# Initializing Hyperparameters
epochs = 5
learning_rate = 0.01

# Initializing Parameters
w = 1

loss_set = {}

# 1. Creating a FeedForwardNetwork
# 1.1 Structure (Architecture) of NN
model_net1 = torch.nn.Linear(len(df_combined.columns)-1,1, bias=False)
model_net1_actfun = torch.nn.ReLU()
model_net1.weight.data.fill_(w)

# 1.2 Loss Function
loss_mse = torch.nn.MSELoss()

# 1.3 Optimization Approach
optimizer = torch.optim.SGD(model_net1.parameters(), lr=learning_rate)

w_his = []
w_his.append(w)
# Loop over the number of epochs
for epoch in tqdm_notebook(range(epochs), desc="Epochs"):
    epoch_loss = 0.0

    # Loop over each sample in the dataset
    for i in range(tensor_data_X.size(0)):

      # 2. Forward Pass
      output = model_net1_actfun.forward(model_net1.forward(tensor_data_X[i].reshape(-1)))

      # 3. FeedForward Evaluation
      loss = loss_mse(output, tensor_data_Y[i].reshape(-1))
      optimizer.zero_grad();

      # 4. Backward Pass / Gradient Calculation
      loss.backward()

      # Accumulate the per-sample loss for this epoch
      epoch_loss += loss.item()

      # 5. Update Weights
      optimizer.step()

      # Store the first weight after each sample update
      w_his.append(float(model_net1.weight.data[0][0]))


    # Calculate and display average loss for the epoch
    epoch_loss /= tensor_data_X.size(0)

    # Store the average loss for this epoch
    loss_set[epoch] = epoch_loss
    print(f"\nEpoch {epoch+1} Average Loss: {epoch_loss:.4f}\n{'-'*50}\n")



Epochs:   0%|          | 0/5 [00:00<?, ?it/s]


Epoch 1 Average Loss: 0.0205
--------------------------------------------------


Epoch 2 Average Loss: 0.0071
--------------------------------------------------


Epoch 3 Average Loss: 0.0071
--------------------------------------------------


Epoch 4 Average Loss: 0.0071
--------------------------------------------------


Epoch 5 Average Loss: 0.0071
--------------------------------------------------



### Epochs = 5, learning rate = 0.01, activation function = Tanh

In [None]:
# Initializing Hyperparameters
epochs = 5
learning_rate = 0.01

# Initializing Parameters
w = 1

loss_set = {}

# 1. Creating a FeedForwardNetwork
# 1.1 Structure (Architecture) of NN
model_net1 = torch.nn.Linear(len(df_combined.columns)-1,1, bias=False)
model_net1_actfun = torch.nn.Tanh()
model_net1.weight.data.fill_(w)

# 1.2 Loss Function
loss_mse = torch.nn.MSELoss()

# 1.3 Optimization Approach
optimizer = torch.optim.SGD(model_net1.parameters(), lr=learning_rate)

w_his = []
w_his.append(w)
# Loop over the number of epochs
for epoch in tqdm_notebook(range(epochs), desc="Epochs"):
    epoch_loss = 0.0

    # Loop over each sample in the dataset
    for i in range(tensor_data_X.size(0)):

      # 2. Forward Pass
      output = model_net1_actfun.forward(model_net1.forward(tensor_data_X[i].reshape(-1)))

      # 3. FeedForward Evaluation
      loss = loss_mse(output, tensor_data_Y[i].reshape(-1))
      optimizer.zero_grad();

      # 4. Backward Pass / Gradient Calculation
      loss.backward()

      # Accumulate the per-sample loss for this epoch
      epoch_loss += loss.item()

      # 5. Update Weights
      optimizer.step()

      # Store the first weight after each sample update
      w_his.append(float(model_net1.weight.data[0][0]))


    # Calculate and display average loss for the epoch
    epoch_loss /= tensor_data_X.size(0)

    # Store the average loss for this epoch
    loss_set[epoch] = epoch_loss
    print(f"\nEpoch {epoch+1} Average Loss: {epoch_loss:.4f}\n{'-'*50}\n")



Epochs:   0%|          | 0/5 [00:00<?, ?it/s]


Epoch 1 Average Loss: 0.4171
--------------------------------------------------


Epoch 2 Average Loss: 0.4171
--------------------------------------------------


Epoch 3 Average Loss: 0.4171
--------------------------------------------------


Epoch 4 Average Loss: 0.4171
--------------------------------------------------


Epoch 5 Average Loss: 0.4170
--------------------------------------------------



## Training various ANNs with two hidden layers

Transitioning from one hidden layer to two hidden layers in an ANN can offer several advantages. A single hidden layer may capture relatively simple relationships within the data, but as the complexity of the task increases, additional hidden layers allow the model to learn more intricate patterns.
Two hidden layers give the network additional capacity to model non-linear relationships and hierarchical dependencies within the data.
However, including more hidden layers may result in overfitting. The impact of this can be reduced through dropout.

Dropout is a regularization technique that improves the generalization and robustness of the model. It randomly deactivates a fraction of neurons during each training iteration, which prevents the network from relying too heavily on specific neurons and features. This stochastic element helps prevent overfitting and encourages the network to learn more robust and representative features, improving its ability to generalize to diverse input samples.
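
To make the mechanism concrete, here is a minimal plain-Python sketch of "inverted" dropout, the variant that `torch.nn.Dropout` implements: during training each activation is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, while at evaluation time the input passes through unchanged. The helper name `dropout`, the seed, and the all-ones activations are assumptions made for illustration only.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout on a list of activations.

    During training each value is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p); at evaluation time the input
    is returned unchanged.
    """
    if not training or p == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)          # fixed seed, only for reproducibility
activations = [1.0] * 10_000    # hypothetical layer output

train_out = dropout(activations, p=0.33, training=True, rng=rng)
eval_out = dropout(activations, p=0.33, training=False, rng=rng)

dropped_fraction = train_out.count(0.0) / len(train_out)
mean_train = sum(train_out) / len(train_out)
print(f"dropped: {dropped_fraction:.2f}, mean after scaling: {mean_train:.2f}")
```

Because of the rescaling, the expected value of each activation stays the same in training and evaluation, so no extra correction is needed when dropout is switched off.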

As the ReLU and Identity activation functions seem to perform equally well in the previous sections, both will be extended to contain 2 hidden layers.
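
The models in this section all reuse the same per-sample training loop. As a sketch of how that loop could be wrapped once and reused, the hypothetical helper `train_per_sample` below follows the same steps as the cells in this notebook (forward pass, MSE loss, backward pass, SGD step, average loss per epoch). The toy tensors `X_toy` and `Y_toy` merely stand in for `tensor_data_X` and `tensor_data_Y`, and the progress bar is left out to keep the example self-contained.

```python
import torch

def train_per_sample(model, X, Y, epochs, learning_rate):
    """Train `model` with per-sample SGD and return the average loss per epoch."""
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    history = {}
    model.train()  # make sure dropout layers are active during training
    for epoch in range(epochs):
        epoch_loss = 0.0
        for i in range(X.size(0)):
            optimizer.zero_grad()                     # clear old gradients
            output = model(X[i].reshape(-1))          # forward pass
            loss = loss_fn(output, Y[i].reshape(-1))  # per-sample MSE
            loss.backward()                           # gradient calculation
            optimizer.step()                          # weight update
            epoch_loss += loss.item()
        history[epoch] = epoch_loss / X.size(0)
    return history

# Toy data standing in for tensor_data_X / tensor_data_Y (assumed shapes).
torch.manual_seed(0)
X_toy = torch.rand(64, 30)
Y_toy = X_toy.mean(dim=1)

model = torch.nn.Sequential(
    torch.nn.Linear(30, 3),
    torch.nn.ReLU(),
    torch.nn.Linear(3, 1),
)
history = train_per_sample(model, X_toy, Y_toy, epochs=3, learning_rate=0.01)
print(history)
```

The sketch calls `model(x)` rather than `model.forward(x)`, which is the usual PyTorch idiom because it also runs any registered hooks.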

### Model 1

Model 1 contains two hidden layers:

- **The first hidden layer**: This consists of a linear transformation with 30 input features and 3 output features. The linear transformation involves weights and biases to map the input data to a 3-dimensional intermediate representation. Afterwards, the ReLU activation function is applied element-wise. ReLU introduces non-linearity by setting negative values to zero, enabling the network to learn complex patterns. Finally, dropout is used to deactivate 33% of the neurons randomly during each training iteration in the first hidden layer.

- **The second hidden layer**: This consists of a linear transformation with 3 input features, corresponding to the output of the first hidden layer, and 1 output feature. Similar to the first hidden layer, the ReLU activation function is applied to introduce non-linearity to the output of the second hidden layer.
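
As a quick sanity check on the layout described above, the following sketch rebuilds the same stack and inspects its parameter count and output shape. The random batch is a hypothetical stand-in for real samples. One property worth keeping in mind: because the stack ends in a ReLU, its predictions can never be negative, and whenever the final pre-activation is negative both the output and its gradient are zero.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(30, 3),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.33),
    torch.nn.Linear(3, 1),
    torch.nn.ReLU(),
)

# 30*3 weights + 3 biases in the first layer, 3*1 weights + 1 bias in the second
n_params = sum(p.numel() for p in model.parameters())
print(n_params)

batch = torch.randn(8, 30)   # hypothetical batch of 8 samples with 30 features
out = model(batch)
print(out.shape)             # one prediction per sample
print(bool((out >= 0).all()))
```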

In [None]:
# Initializing Hyperparameters
epochs = 3
learning_rate = 0.01


# Initializing Parameters
w = 1

loss_set = {}

# 1. Creating a FeedForwardNetwork
# 1.1 Structure (Architecture) of NN
model_net2 = torch.nn.Sequential(torch.nn.Linear(30,3),
                                 torch.nn.ReLU(),
                                 torch.nn.Dropout(0.33),

                                 torch.nn.Linear(3,1),
                                 torch.nn.ReLU(),
                                 );

# 1.2 Loss Function
loss_mse = torch.nn.MSELoss()

# 1.3 Optimization Approach
optimizer = torch.optim.SGD(model_net2.parameters(), lr=learning_rate)

w_his = []
w_his.append(w)
# Loop over the number of epochs
for epoch in tqdm_notebook(range(epochs), desc="Epochs"):
    epoch_loss = 0.0

    # Loop over each sample in the dataset
    for i in range(tensor_data_X.size(0)):

      # 2. Forward Pass
      output = model_net2.forward(tensor_data_X[i].reshape(-1))

      # 3. FeedForward Evaluation
      loss = loss_mse(output, tensor_data_Y[i].reshape(-1))
      optimizer.zero_grad();

      # 4. Backward Pass / Gradient Calculation
      loss.backward()

      # Accumulate the per-sample loss for this epoch
      epoch_loss += loss.item()

      # 5. Update Weights
      optimizer.step()

      # Store the first weight of the first linear layer after each sample update
      # (index 0 in the Sequential; model_net1 is no longer the model being trained)
      w_his.append(float(model_net2[0].weight.data[0][0]))


    # Calculate and display average loss for the epoch
    epoch_loss /= tensor_data_X.size(0)

    # Store the average loss for this epoch
    loss_set[epoch] = epoch_loss
    print(f"\nEpoch {epoch+1} Average Loss: {epoch_loss:.4f}\n{'-'*50}\n")



Epochs:   0%|          | 0/3 [00:00<?, ?it/s]


Epoch 1 Average Loss: 0.1467
--------------------------------------------------


Epoch 2 Average Loss: 0.1467
--------------------------------------------------


Epoch 3 Average Loss: 0.1467
--------------------------------------------------



### Model 2

Model 2 contains two hidden layers:

- **The first hidden layer:** This layer starts with the ReLU activation function, introducing non-linearity to the input data. It is followed by a linear transformation with 30 input features and 3 output features. Dropout is then applied, deactivating 33% of the neurons randomly during each training iteration in the first hidden layer.

- **The second hidden layer:** Following the first hidden layer, the ReLU activation function is again applied to introduce non-linearity to the output of the first hidden layer.  A linear transformation is performed with 3 input features and 1 output feature.
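
Placing ReLU before the first linear transformation means it acts directly on the input features. Since the numerical features were standardized during preprocessing, they can be negative, and ReLU sets those entries to zero before the linear layer sees them. The short plain-Python sketch below illustrates this on a hypothetical standardized sample; the values are made up for illustration.

```python
def relu(values):
    """Element-wise ReLU: negative entries become 0, others pass through."""
    return [max(0.0, v) for v in values]

# Hypothetical standardized feature vector (values centred around 0).
standardized_sample = [-1.2, 0.4, 0.0, 2.1, -0.3]

after_relu = relu(standardized_sample)
print(after_relu)  # [0.0, 0.4, 0.0, 2.1, 0.0]
```

Encoded categorical features that are already non-negative pass through this first ReLU unchanged.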


In [None]:
# Initializing Hyperparameters
epochs = 3
learning_rate = 0.01


# Initializing Parameters
w = 1

loss_set = {}

# 1. Creating a FeedForwardNetwork
# 1.1 Structure (Architecture) of NN
model_net2 = torch.nn.Sequential(torch.nn.ReLU(),
                                 torch.nn.Linear(30,3),
                                 torch.nn.Dropout(0.33),

                                 torch.nn.ReLU(),
                                 torch.nn.Linear(3,1),
                                 );

# 1.2 Loss Function
loss_mse = torch.nn.MSELoss()

# 1.3 Optimization Approach
optimizer = torch.optim.SGD(model_net2.parameters(), lr=learning_rate)

w_his = []
w_his.append(w)
# Loop over the number of epochs
for epoch in tqdm_notebook(range(epochs), desc="Epochs"):
    epoch_loss = 0.0

    # Loop over each sample in the dataset
    for i in range(tensor_data_X.size(0)):

      # 2. Forward Pass
      output = model_net2.forward(tensor_data_X[i].reshape(-1))

      # 3. FeedForward Evaluation
      loss = loss_mse(output, tensor_data_Y[i].reshape(-1))
      optimizer.zero_grad();

      # 4. Backward Pass / Gradient Calculation
      loss.backward()

      # Accumulate the per-sample loss for this epoch
      epoch_loss += loss.item()

      # 5. Update Weights
      optimizer.step()

      # Store the first weight of the first linear layer after each sample update
      # (index 1 in the Sequential; model_net1 is no longer the model being trained)
      w_his.append(float(model_net2[1].weight.data[0][0]))


    # Calculate and display average loss for the epoch
    epoch_loss /= tensor_data_X.size(0)

    # Store the average loss for this epoch
    loss_set[epoch] = epoch_loss
    print(f"\nEpoch {epoch+1} Average Loss: {epoch_loss:.4f}\n{'-'*50}\n")



Epochs:   0%|          | 0/3 [00:00<?, ?it/s]


Epoch 1 Average Loss: 0.0117
--------------------------------------------------


Epoch 2 Average Loss: 0.0107
--------------------------------------------------


Epoch 3 Average Loss: 0.0106
--------------------------------------------------



### Model 3

Model 3 contains three hidden layers:

- **The first hidden layer**: This layer starts with the ReLU activation function, introducing non-linearity to the input data. It is followed by a linear transformation with 30 input features and 15 output features. Dropout is then applied, deactivating 33% of the neurons.

- **The second hidden layer**: Another ReLU activation function is applied to introduce non-linearity to the output of the first hidden layer. A linear transformation is then performed with 15 input features and 3 output features. Dropout is again applied, deactivating 33% of the neurons in the second hidden layer.

- **The third hidden layer**: The ReLU activation function is applied to introduce non-linearity to the output of the second hidden layer. A linear transformation is then performed with 3 input features and 1 output feature.

In [None]:
# Initializing Hyperparameters
epochs = 3
learning_rate = 0.01


# Initializing Parameters
w = 1

loss_set = {}

# 1. Creating a FeedForwardNetwork
# 1.1 Structure (Architecture) of NN
model_net2 = torch.nn.Sequential(torch.nn.ReLU(),
                                 torch.nn.Linear(30,15),
                                 torch.nn.Dropout(0.33),

                                 torch.nn.ReLU(),
                                 torch.nn.Linear(15,3),
                                 torch.nn.Dropout(0.33),

                                 torch.nn.ReLU(),
                                 torch.nn.Linear(3,1),
                                 );

# 1.2 Loss Function
loss_mse = torch.nn.MSELoss()

# 1.3 Optimization Approach
optimizer = torch.optim.SGD(model_net2.parameters(), lr=learning_rate)

w_his = []
w_his.append(w)
# Loop over the number of epochs
for epoch in tqdm_notebook(range(epochs), desc="Epochs"):
    epoch_loss = 0.0

    # Loop over each sample in the dataset
    for i in range(tensor_data_X.size(0)):

      # 2. Forward Pass
      output = model_net2.forward(tensor_data_X[i].reshape(-1))

      # 3. FeedForward Evaluation
      loss = loss_mse(output, tensor_data_Y[i].reshape(-1))
      optimizer.zero_grad();

      # 4. Backward Pass / Gradient Calculation
      loss.backward()

      # Accumulate the per-sample loss for this epoch
      epoch_loss += loss.item()

      # 5. Update Weights
      optimizer.step()

      # Store the first weight of the first linear layer after each sample update
      # (index 1 in the Sequential; model_net1 is no longer the model being trained)
      w_his.append(float(model_net2[1].weight.data[0][0]))


    # Calculate and display average loss for the epoch
    epoch_loss /= tensor_data_X.size(0)

    # Store the average loss for this epoch
    loss_set[epoch] = epoch_loss
    print(f"\nEpoch {epoch+1} Average Loss: {epoch_loss:.4f}\n{'-'*50}\n")



Epochs:   0%|          | 0/3 [00:00<?, ?it/s]


Epoch 1 Average Loss: 0.0129
--------------------------------------------------


Epoch 2 Average Loss: 0.0114
--------------------------------------------------


Epoch 3 Average Loss: 0.0100
--------------------------------------------------



### Model 4

Model 4 contains three hidden layers:

- **The first hidden layer**: This consists of a linear transformation with 30 input features and 15 output features, followed by the ReLU activation function. Dropout is then applied, deactivating 33% of the neurons.

- **The second hidden layer**: Another linear transformation occurs with 15 input features and 3 output features. The ReLU activation function is applied. Dropout is again applied, deactivating 33% of the neurons in the second hidden layer.

- **The third hidden layer**: After the second hidden layer, a final linear transformation is performed with 3 input features and 1 output feature. The ReLU activation function is applied to introduce non-linearity to the output of the third hidden layer.

In [None]:
# Initializing Hyperparameters
epochs = 3
learning_rate = 0.01


# Initializing Parameters
w = 1

loss_set = {}

# 1. Creating a FeedForwardNetwork
# 1.1 Structure (Architecture) of NN
model_net2 = torch.nn.Sequential(torch.nn.Linear(30,15),
                                 torch.nn.ReLU(),
                                 torch.nn.Dropout(0.33),

                                 torch.nn.Linear(15,3),
                                 torch.nn.ReLU(),
                                 torch.nn.Dropout(0.33),

                                 torch.nn.Linear(3,1),
                                 torch.nn.ReLU(),

                                 );

# 1.2 Loss Function
loss_mse = torch.nn.MSELoss()

# 1.3 Optimization Approach
optimizer = torch.optim.SGD(model_net2.parameters(), lr=learning_rate)

w_his = []
w_his.append(w)
# Loop over the number of epochs
for epoch in tqdm_notebook(range(epochs), desc="Epochs"):
    epoch_loss = 0.0

    # Loop over each sample in the dataset
    for i in range(tensor_data_X.size(0)):

      # 2. Forward Pass
      output = model_net2.forward(tensor_data_X[i].reshape(-1))

      # 3. FeedForward Evaluation
      loss = loss_mse(output, tensor_data_Y[i].reshape(-1))
      optimizer.zero_grad();

      # 4. Backward Pass / Gradient Calculation
      loss.backward()

      # Accumulate the per-sample loss for this epoch
      epoch_loss += loss.item()

      # 5. Update Weights
      optimizer.step()

      # Store the first weight of the first linear layer after each sample update
      # (index 0 in the Sequential; model_net1 is no longer the model being trained)
      w_his.append(float(model_net2[0].weight.data[0][0]))


    # Calculate and display average loss for the epoch
    epoch_loss /= tensor_data_X.size(0)

    # Store the average loss for this epoch
    loss_set[epoch] = epoch_loss
    print(f"\nEpoch {epoch+1} Average Loss: {epoch_loss:.4f}\n{'-'*50}\n")



Epochs:   0%|          | 0/3 [00:00<?, ?it/s]


Epoch 1 Average Loss: 0.0120
--------------------------------------------------


Epoch 2 Average Loss: 0.0093
--------------------------------------------------


Epoch 3 Average Loss: 0.0088
--------------------------------------------------



## The chosen one ⚡️🧙🏼‍♂️

As Model 2 performs best, we select it for our price predictor.

In [None]:
# Initializing Hyperparameters
epochs = 3
learning_rate = 0.01


# Initializing Parameters
w = 1

loss_set = {}

# 1. Creating a FeedForwardNetwork
# 1.1 Structure (Architecture) of NN
model_net2 = torch.nn.Sequential(torch.nn.ReLU(),
                                 torch.nn.Linear(30,3),
                                 torch.nn.Dropout(0.33),

                                 torch.nn.ReLU(),
                                 torch.nn.Linear(3,1),

                                 );

# 1.2 Loss Function
loss_mse = torch.nn.MSELoss()

# 1.3 Optimization Approach
optimizer = torch.optim.SGD(model_net2.parameters(), lr=learning_rate)

w_his = []
w_his.append(w)
# Loop over the number of epochs
for epoch in tqdm_notebook(range(epochs), desc="Epochs"):
    epoch_loss = 0.0

    # Loop over each sample in the dataset
    for i in range(tensor_data_X.size(0)):

      # 2. Forward Pass
      output = model_net2.forward(tensor_data_X[i].reshape(-1))

      # 3. FeedForward Evaluation
      loss = loss_mse(output, tensor_data_Y[i].reshape(-1))
      optimizer.zero_grad();

      # 4. Backward Pass / Gradient Calculation
      loss.backward()

      # Accumulate the per-sample loss for this epoch
      epoch_loss += loss.item()

      # 5. Update Weights
      optimizer.step()

      # Store the first weight of the first linear layer after each sample update
      # (index 1 in the Sequential; model_net1 is no longer the model being trained)
      w_his.append(float(model_net2[1].weight.data[0][0]))


    # Calculate and display average loss for the epoch
    epoch_loss /= tensor_data_X.size(0)

    # Store the average loss for this epoch
    loss_set[epoch] = epoch_loss
    print(f"\nEpoch {epoch+1} Average Loss: {epoch_loss:.4f}\n{'-'*50}\n")



Epochs:   0%|          | 0/3 [00:00<?, ?it/s]


Epoch 1 Average Loss: 0.0120
--------------------------------------------------


Epoch 2 Average Loss: 0.0088
--------------------------------------------------


Epoch 3 Average Loss: 0.0088
--------------------------------------------------



In [None]:
# Print the parameters of all layers
for name, param in model_net2.named_parameters():
    print(f"Layer: {name}")
    print(f"Size: {param.size()}")
    print(f"Values: \n{param.data}\n")

Layer: 1.weight
Size: torch.Size([3, 30])
Values: 
tensor([[ 3.2647e-01,  4.0569e-02, -1.3147e-01,  5.7687e-01, -6.7967e-02,
         -8.1920e-02, -1.1034e-01, -1.4370e-01,  3.0453e-02,  1.3385e-02,
         -1.0031e-01, -7.1685e-02, -1.3050e-01, -1.5267e-01, -3.1148e-02,
          5.6406e-03, -5.0306e-02, -8.8394e-02,  6.1231e-03, -2.8122e-02,
         -8.7559e-02, -1.7570e-02, -2.4234e-02, -1.3775e-01,  6.8906e-03,
         -9.6063e-02, -8.0788e-02, -1.0954e-01, -1.2816e-01, -9.5831e-02],
        [-1.0305e-01,  6.0458e-02, -1.9200e-02, -7.0426e-01,  4.7527e-02,
          5.6821e-02,  1.4406e-01,  1.1310e-01,  3.4582e-02,  1.3624e-01,
         -8.3700e-03, -1.9540e-02,  7.0775e-02,  5.2553e-02, -1.6696e-02,
          7.6332e-02,  4.9231e-02,  5.0069e-02,  5.4231e-02,  5.1178e-02,
          8.4150e-02,  7.0266e-02,  1.8995e-02,  3.8505e-02, -1.8635e-02,
          1.2369e-02, -1.2670e-02,  1.1022e-02,  3.1645e-02,  3.6788e-03],
        [-1.3744e-01, -6.1222e-02,  9.5039e-02,  4.2733e-02

### Performance of the chosen ANN on the test dataset

The cells below calculate the MSE and RMSE of the chosen ANN on the test set.

In [None]:
y_true=tensor_data_Y_test.numpy()
y_true

array([0.29214647, 0.3590046 , 0.38429472, ..., 0.4087926 , 0.3775693 ,
       0.2894005 ], dtype=float32)

In [None]:
y_pred=(model_net2(tensor_data_X_test)).detach().numpy()
y_pred

array([[0.36462638],
       [0.36462638],
       [0.3621553 ],
       ...,
       [0.3325886 ],
       [0.36328015],
       [0.39127615]], dtype=float32)
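
Because the chosen model contains a dropout layer, its predictions depend on whether the module is in training or evaluation mode. A PyTorch module stays in training mode until `eval()` is called, so dropout would still be active when predicting. The sketch below shows, on a stand-in model with the same layout and hypothetical random inputs, how switching to evaluation mode makes predictions deterministic; `X_test` here is an assumption standing in for `tensor_data_X_test`.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.ReLU(),
    torch.nn.Linear(30, 3),
    torch.nn.Dropout(0.33),
    torch.nn.ReLU(),
    torch.nn.Linear(3, 1),
)
X_test = torch.rand(5, 30)   # stand-in for tensor_data_X_test

model.eval()                 # disable dropout for inference
with torch.no_grad():        # no gradients are needed for evaluation
    pred_a = model(X_test)
    pred_b = model(X_test)

# In evaluation mode, repeated forward passes give identical predictions.
print(torch.equal(pred_a, pred_b))
```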

In [None]:
MSE=((y_true-y_pred)**2).mean()
MSE

0.018280586

In [None]:
RMSE=MSE**0.5
RMSE

0.13520571817706017
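
One detail worth checking in this calculation is the shape of the two arrays. As printed above, `y_true` is one-dimensional with shape `(N,)`, while `y_pred` is a column with shape `(N, 1)`. In NumPy, subtracting arrays of these shapes broadcasts to an `(N, N)` matrix of all pairwise differences, so taking its mean would average over every pair rather than over matching samples. The sketch below illustrates the difference on three hypothetical values and flattens the predictions before computing the error.

```python
import numpy as np

# Hypothetical values mimicking the shapes printed above:
# y_true is 1-D with shape (N,), y_pred is a column with shape (N, 1).
y_true = np.array([0.2, 0.4, 0.6], dtype=np.float32)
y_pred = np.array([[0.25], [0.35], [0.65]], dtype=np.float32)

broadcast_diff = y_true - y_pred             # shape (3, 3): every pair
elementwise_diff = y_true - y_pred.ravel()   # shape (3,): matching pairs only

mse = float((elementwise_diff ** 2).mean())
rmse = mse ** 0.5
print(broadcast_diff.shape, elementwise_diff.shape)
print(f"MSE={mse:.4f}, RMSE={rmse:.4f}")
```

Checking `(y_true - y_pred).shape` before taking the mean is a cheap way to confirm that the reported MSE is the per-sample one.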