# Energy Consumption Prediction

In this project, a deep learning model is developed to predict the amount of energy consumed by a building equipped with solar panels. The model is built with the TensorFlow/Keras framework. A baseline model and a proposed (hyperparameter-tuned) model are developed, each in a *Sequential* and a *Functional* variant.

<table>
<tr>
<th>Column Name</th>
<th>Description</th>
</tr>
<tr>
<td>Month</td>
<td>The month of the year when the data was recorded.</td>
</tr>
<tr>
<td>Hour</td>
<td>The hour of the day when the data was recorded.</td>
</tr>
<tr>
<td>DayOfWeek</td>
<td>The day of the week when the data was recorded.</td>
</tr>
<tr>
<td>Holiday</td>
<td>Indicates whether the day was a holiday (Yes/No).</td>
</tr>
<tr>
<td>Temperature</td>
<td>The average daily temperature in Celsius.</td>
</tr>
<tr>
<td>Humidity</td>
<td>The average daily humidity level (%).</td>
</tr>
<tr>
<td>SquareFootage</td>
<td>The area of the building being monitored in m<sup>2</sup>.</td>
</tr>
<tr>
<td>Occupancy</td>
<td>The total number of people occupying the building.</td>
</tr>
<tr>
<td>HVACUsage</td>
<td>Indicates whether the HVAC system was in use (On/Off).</td>
</tr>
<tr>
<td>LightingUsage</td>
<td>Indicates whether the lighting system was in use (On/Off).</td>
</tr>
<tr>
<td>RenewableEnergy</td>
<td>The amount of renewable energy generated at the time of data collection (kWh).</td>
</tr>
<tr>
<td>EnergyConsumption (the goal)</td>
<td>The amount of energy consumed at the time of data collection (kWh).</td>
</tr>
</table>

## Importing the Needed Libraries

### Install Needed Libraries

In [None]:
!pip install optuna
!pip install pyarrow
!pip install pynvml
!pip install tqdm

In [None]:
# Basic python Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
from typing import Dict, Any

# Data preprocessing Libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder, OneHotEncoder, OrdinalEncoder

# Tensorflow Libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Input, Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, Embedding, Flatten, Input, Concatenate

# Model Evaluation Libraries
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hyperparameter tuning Libraries
import optuna

# Library for gpu utilization
import pynvml

# Library for cleaner notebook
from tqdm.notebook import tqdm

## GPU Check

In [None]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

In [None]:
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

# Check if TensorFlow can see a GPU (tf.test.is_gpu_available() is deprecated)
if tf.config.list_physical_devices('GPU'):
    print("TensorFlow is using the GPU")

    # Initialize the pynvml library
    pynvml.nvmlInit()

    # Get the number of GPU devices
    num_gpus = pynvml.nvmlDeviceGetCount()

    # Iterate over GPU devices
    for i in range(num_gpus):
        # Get the device identifier
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        # Get the full GPU name
        gpu_name = pynvml.nvmlDeviceGetName(handle)
        print("GPU Name:", gpu_name)

    # Shutdown the pynvml library
    pynvml.nvmlShutdown()
else:
    print("TensorFlow is not using the GPU")

## Import the Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
data = pd.read_parquet("/content/drive/MyDrive/Energy_consumption_project/Dataset_01/dataset_1A.parquet")
data.head()

## EDA

### Check the Missing Values and Duplicates

#### The Number of Missing Values

In [None]:
missing_data_count = pd.DataFrame(data.isna().sum())
missing_data_count.columns = ["Number_Of_Data_Missing"]
missing_data_count["Percentage"] = round(missing_data_count["Number_Of_Data_Missing"]/len(data) * 100, 2)
missing_data_count

#### Check for Duplicates

In [None]:
print(data.duplicated().sum())

### Check the Data Types

In [None]:
data.info()

So far the data looks fairly clean: most columns have the correct data type and there are no missing values. However, the "EnergyConsumption" column is stored as `object` when it should be `float64`, so we will convert it.

In [None]:
# Changing the data type of "EnergyConsumption"
data["EnergyConsumption"] = data["EnergyConsumption"].astype("float64")

In [None]:
# Recheck the data types
data.info()
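
The `astype("float64")` cast raises an error if any value cannot be parsed as a number. The sketch below shows, in plain Python, the more forgiving "coerce" behaviour, where unparseable entries become NaN and then show up as new missing values. In pandas, `pd.to_numeric(..., errors="coerce")` behaves this way. The sample values and the helper name `to_float_or_nan` are made up for illustration and are not taken from the dataset.

```python
import math


def to_float_or_nan(value):
    """Return value as a float, or NaN when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


# Hypothetical raw values, as they might appear in an object-typed column
raw_values = ["75.3", "81", "unknown", None, " 64.2 "]

converted = [to_float_or_nan(v) for v in raw_values]

# Entries that failed to parse are now NaN and count as missing
n_missing = sum(math.isnan(v) for v in converted)
```

This is one reason a second missing-value check after a type conversion can be worthwhile: a coerced conversion may introduce NaN values that were not visible while the column was still stored as text.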

### Check if there are more missing data after conversion

In [None]:
missing_data_count = pd.DataFrame(data.isna().sum())
missing_data_count.columns = ["Number_Of_Data_Missing"]
missing_data_count["Percentage"] = round(missing_data_count["Number_Of_Data_Missing"]/len(data) * 100, 2)
missing_data_count

In [None]:
# Since there are some missing values in the dataset, we need to handle them.
data = data.dropna(axis=0)

# Recheck the data
missing_data_count = pd.DataFrame(data.isna().sum())
missing_data_count.columns = ["Number_Of_Data_Missing"]
missing_data_count["Percentage"] = round(missing_data_count["Number_Of_Data_Missing"]/len(data) * 100, 2)
missing_data_count

### Check the Range of the "Hour" Column

Let's check the range of the "Hour" column to see whether it uses a 12-hour or a 24-hour format.

In [None]:
min(data["Hour"]), max(data["Hour"])

The data appears to use a 24-hour format. The column is kept as a numeric hour, since it is min-max scaled later during preprocessing.
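
If a readable HH:MM label is wanted for display, it can be derived without touching the numeric "Hour" column. The following is a minimal sketch, assuming the hours are whole numbers from 0 to 23; the helper name `hour_to_label` is hypothetical, and with pandas it could be applied through `data["Hour"].map(hour_to_label)` into a separate column.

```python
def hour_to_label(hour):
    """Format an integer hour in 0..23 as an HH:MM string."""
    hour = int(hour)
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    return f"{hour:02d}:00"


# Example hours covering the start, middle, and end of the day
labels = [hour_to_label(h) for h in [0, 7, 13, 23]]
```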

### Plotting Data Distribution

#### Functions for plotting the distribution

In [None]:
#setting the colors generator
def fill_color_generator():
    """This generates a color

    Returns:
        color: An R,G,B value with a range of 0 to 1
    """
    r = random.randint(0, 255)
    g = random.randint(0, 255)
    b = random.randint(0, 255)
    return (r/255, g/255, b/255)

In [None]:
#function for plotting numerical data distribution
def numeric_dist_plot(data: pd.DataFrame):
    """This function creates a plot of the distribution of the numerical data.

    Args:
        data (pd.DataFrame): Numeric pandas dataframe

    Raises:
        TypeError: The following columns are not numeric: {non_numeric_cols}
        This is due to some of the columns are not numeric.

    Returns:
        plt: The matplotlib.pyplot module, with a boxplot and a histogram drawn for each column
    """
    # Checks
    ## Check if all columns are numeric
    non_numeric_cols = [col for col in data.columns if not pd.api.types.is_numeric_dtype(data[col])]
    if non_numeric_cols:
        raise TypeError(f"The following columns are not numeric: {non_numeric_cols}")

    # Plotting the numerical data
    # Pick a random fill color for each column
    fill_color_dict = {}
    for colName in data.columns:
        fill_color_dict[colName] = fill_color_generator()

    # make subplot for each column name
    num_rows = len(data.columns)
    fig, axes = plt.subplots(nrows=num_rows, ncols=2, figsize=(12, num_rows * 3))

    # Wrap a single row of axes in a list so that axes[i][j] indexing always works
    axes = axes if num_rows > 1 else [axes]

    for i, column in enumerate(data.columns):
        color = fill_color_dict[column]

        # Boxplot
        axes[i][0].boxplot(data[column].dropna(), vert=False, patch_artist=True,
                        boxprops=dict(facecolor=color, color=color),
                        medianprops=dict(color="black"))
        axes[i][0].set_title(f"Boxplot of {column}")
        axes[i][0].set_xlabel(column)

        # Histogram
        axes[i][1].hist(data[column].dropna(), bins=20, color=color, alpha=0.7, edgecolor='black')
        axes[i][1].set_title(f"Histogram of {column}")
        axes[i][1].set_xlabel(column)

    # Overall layout
    fig.suptitle("Boxplot and Distribution Visualization for Each Numeric Column", fontsize=16)
    fig.tight_layout(rect=[0, 0, 1, 0.97])  # Adjust layout to fit title
    return plt

In [None]:
#function for plotting categorical data distribution
def categoric_dist_plot(data: pd.DataFrame):
    """This function creates a bar plot of the value counts of each categorical column.

    Args:
        data (pd.DataFrame): Categorical pandas dataframe

    Raises:
        TypeError: The following columns are not categoric: {non_categoric_cols}
        This is due to some of the columns being numeric.

    Returns:
        plt: Plot of the categorical data distribution
    """
    # Checks
    ## Check that none of the columns are numeric
    non_categoric_cols = [col for col in data.columns if pd.api.types.is_numeric_dtype(data[col])]
    if non_categoric_cols:
        raise TypeError(f"The following columns are not categoric: {non_categoric_cols}")

    # Create subplots: one row for each categorical column
    num_rows = len(data.columns)
    fig, axes = plt.subplots(nrows=num_rows, ncols=1, figsize=(10, num_rows * 3), sharex=False)

    # Wrap a single Axes in a list so that axes[i] indexing always works
    axes = axes if num_rows > 1 else [axes]

    # Plot each categorical distribution
    for i, col in enumerate(data.columns):
        counts = data[col].value_counts(dropna=False)  # Get the count values
        counts.index = counts.index.astype(str) #convert the categorical values to strings since there are numerical categories

        #Create the bar plot
        axes[i].bar(counts.index, counts)

        # Set title and labels
        axes[i].set_title(f"Distribution of {col}", fontsize=12)
        axes[i].set_ylabel("Count")
        axes[i].set_xlabel("Category")

        # Rotate x-axis labels for better readability
        axes[i].tick_params(axis='x', rotation=90)

    # Add an overall title and adjust layout
    fig.suptitle("Bar Plots for Categorical Columns", fontsize=16, y=1.02)
    fig.tight_layout(h_pad=2.0)  # Adjust spacing between rows
    return plt

#### Splitting Data to Categorical and Numeric

In [None]:
numeric_data = data.select_dtypes(include=[np.number])
categorical_data = data.select_dtypes(exclude=[np.number])

#### Numerical Data Plot

In [None]:
numeric_data.head()

In [None]:
numeric_dist_plot(numeric_data)

#### Categorical Data Plot

In [None]:
categorical_data.head()

In [None]:
categoric_dist_plot(categorical_data)

### Key Insights
#### Problems With the Data
1. **Inconsistent Data Values**:
   - The months in the data are inconsistent. There are 3 formats of month in the data (numeric, abbreviated month name, and full month name).
2. **Invalid Data Range**:
   - There is a negative Occupancy value, as seen in the graph. It is an invalid outlier, since a building cannot hold a negative number of people.
3. **Skewness**:
   - The month data is skewed: most of the records are from January, which may impact the model's performance.
   - The rest of the data is mostly uniform, except for EnergyConsumption, which has a roughly normal distribution.

#### Small Fixes
1. **Data Types**:
   - Changed the data type of "EnergyConsumption" from `object` to `float64`.
   - Removed missing values from the data.

#### Next Steps
- Further clean the data to resolve the inconsistencies.
## Data Cleaning

- Standardizing the "month" data
- Cleaning the negative number of occupancy

### Standardizing the "month" data

#### String to month number

In [None]:
def str_to_month(String: str) -> str:
    """This function converts a string to a month type. The string can be in the format of "Jan", "Feb", etc. or "January", "February", etc.
    It will return the month number as a string. If the string is not in the correct format, it will return NaT.

    Args:
        String (str): The string to be converted to a month number.

    Raises:
        TypeError: 'The following value is not a string: {String}'
        This is due to the fact that the input is not a string.

    Returns:
        str: The month number as a string or NaT if the conversion failed.
    """
    # Check
    ## Check if the text is a string
    if not isinstance(String, str):
        raise TypeError(f"The following value is not a string: {String}")

    try:
        # If the string is already in a number format, convert it to a month number
        if String.isdecimal():
            # Check if the string is a number between 1 and 12 (january to december)
            num = int(String)
            if 1 <= num <= 12:
                dt = pd.to_datetime(num, format = '%m')
                return f"{dt.month:01d}"
            else:
                return pd.NaT

        # Try full month name
        try:
            dt = pd.to_datetime(String, format = '%B')  # e.g., 'January'
        except ValueError:
            dt = pd.to_datetime(String, format = '%b')  # e.g., 'Jan'

        return f"{dt.month:01d}"

    except Exception:
        #Return NA if the string conversion failed
        return pd.NaT

#### Apply the month to number function to the dataset

In [None]:
data["Month"] = data["Month"].apply(str_to_month)
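
As an alternative sketch of the same normalization, the three month formats can be resolved with a lookup table built from the standard `calendar` module instead of date parsing. This assumes the default (English) locale for the month names and returns `None` rather than `NaT` for unrecognized values; the names `MONTH_LOOKUP` and `normalize_month` are hypothetical and not part of the notebook.

```python
import calendar

# Lookup from every accepted spelling to the month number as a string
MONTH_LOOKUP = {}
for number in range(1, 13):
    MONTH_LOOKUP[calendar.month_name[number].lower()] = str(number)
    MONTH_LOOKUP[calendar.month_abbr[number].lower()] = str(number)


def normalize_month(value):
    """Return the month number as a string, or None if it is not recognized."""
    key = str(value).strip().lower()
    if key.isdecimal():
        # Numeric input: accept 1..12 and drop any leading zero
        return str(int(key)) if 1 <= int(key) <= 12 else None
    # Text input: full or abbreviated month name, case-insensitive
    return MONTH_LOOKUP.get(key)
```

A lookup table makes the accepted spellings explicit, which can be easier to audit than format-string parsing when the set of valid inputs is small and fixed.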

### Cleaning the data with negative occupancy

#### Check if the data is present

In [None]:
data[data["Occupancy"] < 0]

There is a record with an occupancy of -5 people, which is impossible, since a building cannot hold a negative number of people. Removing it gives cleaner data for the deep learning model.

In [None]:
data = data.drop(data[data["Occupancy"] < 0].index).reset_index(drop=True)

### Rechecking the distribution of the data

In [None]:
categorical_data = data.select_dtypes(exclude=[np.number])
numeric_data = data.select_dtypes(include=[np.number])

In [None]:
numeric_dist_plot(numeric_data)

In [None]:
categoric_dist_plot(categorical_data)

### Summary Of Data Cleaning
#### Key Insights:
1. **Inconsistent Data Values**:
    - The "Month" column had inconsistent formats (e.g., numeric, full names, abbreviations), which were standardized.
2. **Outliers**:
    - Negative values in the "Occupancy" column were identified and removed.

3. **Skewness**:
    - The "EnergyConsumption" column has a roughly normal distribution, while the other numerical columns are close to uniform.

4. **Categorical Data**:
    - Some categorical columns, such as "Holiday" and "HVACUsage," have imbalanced distributions, which may impact model performance.

5. **Data Cleaning**:
    - After cleaning, the dataset is now consistent and ready for preprocessing and modeling.

#### Next Steps:
- Scale numerical data to ensure uniformity.
- Encode categorical data for compatibility with machine learning models.
- Split the data into training, validation, and testing sets for model development.

**At this point**:
- The dataset is now clean and ready for further preprocessing or modeling.
- The handling of missing values and standardization of date formats ensures consistency.

## Data Preprocessing

### Splitting data to train, test, and validation

#### Split data to the predictor and outcome

In [None]:
x_data = data.drop(columns=["EnergyConsumption"])
y_data = data["EnergyConsumption"]

In [None]:
# Check the columns
x_data.columns

In [None]:
y_data.head()

#### Split to train and test

In [None]:
train_x, test_x, train_y, test_y = train_test_split(x_data, y_data, test_size=0.2, random_state=42)
train_x.shape, test_x.shape, train_y.shape, test_y.shape

#### Split to train and validation

In [None]:
train_x, val_x, train_y, val_y = train_test_split(train_x, train_y, test_size=0.2, random_state=42)
train_x.shape, val_x.shape, train_y.shape, val_y.shape

So after splitting the data we will obtain the following:
<table>
<tr>
<th>Data</th>
<th>Size</th>
</tr>
<tr>
<td>Train</td>
<td>788</td>
</tr>
<tr>
<td>Validation</td>
<td>197</td>
</tr>
<tr>
<td>Test</td>
<td>247</td>
</tr>
</table>
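
The sizes in the table follow from applying a 20% hold-out twice. Below is a small sketch of that arithmetic, assuming the held-out share is rounded up (as `train_test_split` does for a fractional `test_size`) and a total of 1232 rows, which is simply the sum of the three sizes in the table. The helper name `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_rows, test_fraction=0.2, val_fraction=0.2):
    """Return (train, validation, test) sizes for two successive hold-outs."""
    # First split: hold out the test set from the full data
    n_test = math.ceil(n_rows * test_fraction)
    n_remaining = n_rows - n_test
    # Second split: hold out the validation set from what is left
    n_val = math.ceil(n_remaining * val_fraction)
    n_train = n_remaining - n_val
    return n_train, n_val, n_test


sizes = split_sizes(1232)
```

Because the second 20% is taken from the remaining 80%, the overall proportions are roughly 64% train, 16% validation, and 20% test.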

### Scaling the Data

#### Scaling the numerical data

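All the numerical features will use min-max scaling, since there are no outliers and the data is almost uniformly distributed. Each scaler is fit on the training data only and then applied unchanged to the validation and test data.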

In [None]:
hoursScaler = MinMaxScaler()
tempScaler = MinMaxScaler()
humidScaler = MinMaxScaler()
squareScaler = MinMaxScaler()
occupancyScaler = MinMaxScaler()
renewableScaler = MinMaxScaler()
energyScaler = MinMaxScaler()

##### Apply to train data

In [None]:
# The x features
train_x["Hour"] = hoursScaler.fit_transform(train_x[["Hour"]])
train_x["Temperature"] = tempScaler.fit_transform(train_x[["Temperature"]])
train_x["Humidity"] = humidScaler.fit_transform(train_x[["Humidity"]])
train_x["SquareFootage"] = squareScaler.fit_transform(train_x[["SquareFootage"]])
train_x["Occupancy"] = occupancyScaler.fit_transform(train_x[["Occupancy"]])
train_x["RenewableEnergy"] = renewableScaler.fit_transform(train_x[["RenewableEnergy"]])

# The y value
train_y = energyScaler.fit_transform(train_y.values.reshape(-1, 1))

##### Apply to validation data

In [None]:
# The x features
val_x["Hour"] = hoursScaler.transform(val_x[["Hour"]])
val_x["Temperature"] = tempScaler.transform(val_x[["Temperature"]])
val_x["Humidity"] = humidScaler.transform(val_x[["Humidity"]])
val_x["SquareFootage"] = squareScaler.transform(val_x[["SquareFootage"]])
val_x["Occupancy"] = occupancyScaler.transform(val_x[["Occupancy"]])
val_x["RenewableEnergy"] = renewableScaler.transform(val_x[["RenewableEnergy"]])

# The y value
val_y = energyScaler.transform(val_y.values.reshape(-1, 1))

##### Apply to test data

In [None]:
# The x features
test_x["Hour"] = hoursScaler.transform(test_x[["Hour"]])
test_x["Temperature"] = tempScaler.transform(test_x[["Temperature"]])
test_x["Humidity"] = humidScaler.transform(test_x[["Humidity"]])
test_x["SquareFootage"] = squareScaler.transform(test_x[["SquareFootage"]])
test_x["Occupancy"] = occupancyScaler.transform(test_x[["Occupancy"]])
test_x["RenewableEnergy"] = renewableScaler.transform(test_x[["RenewableEnergy"]])

# The y value
test_y = energyScaler.transform(test_y.values.reshape(-1, 1))
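
The fit-on-train, transform-elsewhere pattern used above can be sketched in plain Python. The class below is a tiny illustrative stand-in for a min-max scaler on a single feature, not scikit-learn's implementation, and the temperature values are made up.

```python
class SimpleMinMax:
    """Tiny stand-in for a min-max scaler on one feature (illustration only)."""

    def fit(self, values):
        # Learn the range from the training data only
        self.low = min(values)
        self.high = max(values)
        return self

    def transform(self, values):
        # Apply the learned range; assumes the training values were not all equal
        span = self.high - self.low
        return [(v - self.low) / span for v in values]


train_temp = [18.0, 22.0, 30.0]  # hypothetical training temperatures
val_temp = [20.0, 33.0]          # hypothetical validation temperatures

scaler = SimpleMinMax().fit(train_temp)
train_scaled = scaler.transform(train_temp)
val_scaled = scaler.transform(val_temp)
```

Since the range comes from the training data alone, validation or test values outside that range map outside [0, 1] (here 33.0 becomes 1.25). That is expected and avoids leaking information from the held-out sets into the scaler.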

### Encoding the Categorical Data

Binary Columns: `LightingUsage`, `HVACUsage`, `Holiday`

Nominal Columns: `DayOfWeek`

Ordinal Columns: `Month`

In [None]:
# The Values in the columns
for col in categorical_data.columns:
    print(f"{col}: {categorical_data[col].unique()} \n")

In [None]:
binary_cols_1 = ["Holiday"] # Use LabelEncoder (Yes or No)
binary_cols_2 = ["HVACUsage", "LightingUsage"] # Use LabelEncoder (On or Off)
nominal_cols = ["DayOfWeek"] # Use OneHotEncoder
ordinal_cols = ["Month"] # Use OrdinalEncoder
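
`LabelEncoder` assigns integer codes following the sorted order of the labels it sees, so it helps to know which label ends up as 0 and which as 1. The sketch below reproduces that ordering rule in plain Python, assuming the binary columns hold the "On"/"Off" and "Yes"/"No" values listed in the column table; `fit_label_mapping` is a hypothetical helper, not a scikit-learn function.

```python
def fit_label_mapping(values):
    """Map each distinct label to an integer code, in sorted label order."""
    classes = sorted(set(values))
    return {label: index for index, label in enumerate(classes)}


# Example values in the style of the binary columns
hvac_mapping = fit_label_mapping(["On", "Off", "On", "On"])
holiday_mapping = fit_label_mapping(["No", "Yes", "No"])
```

Under this rule "Off" and "No" are encoded as 0, while "On" and "Yes" are encoded as 1, which matches the intuitive reading of a binary flag.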

#### Encoder Setup

In [None]:
bin_enc_1 = LabelEncoder()
bin_enc_2 = LabelEncoder()
ohe = OneHotEncoder(sparse_output=False).set_output(transform="pandas")
ordinal_enc = OrdinalEncoder(categories=[["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"]], handle_unknown="use_encoded_value", unknown_value=-1).set_output(transform="pandas")

#### Transforming the Train Data

##### Binary Data

In [None]:
bin_data = pd.concat([train_x[binary_cols_1].apply(bin_enc_1.fit_transform), train_x[binary_cols_2].apply(bin_enc_2.fit_transform)], axis=1)
bin_data.head()

##### Nominal Data

In [None]:
nominal_data = ohe.fit_transform(train_x[nominal_cols])
nominal_data.head()

##### Ordinal Data

In [None]:
ordinal_data = ordinal_enc.fit_transform(train_x[ordinal_cols])
ordinal_data.head()

##### Reunite the dataset

In [None]:
train_x = pd.concat([train_x.drop(columns=binary_cols_1 + binary_cols_2 + nominal_cols + ordinal_cols), bin_data, nominal_data, ordinal_data], axis=1)
train_x.head()

#### Transforming the Validation Data

##### Binary Data

In [None]:
bin_data = pd.concat([val_x[binary_cols_1].apply(bin_enc_1.transform), val_x[binary_cols_2].apply(bin_enc_2.transform)], axis=1)
bin_data.head()

##### Nominal Data

In [None]:
nominal_data = ohe.transform(val_x[nominal_cols])
nominal_data.head()

##### Ordinal Data

In [None]:
ordinal_data = ordinal_enc.transform(val_x[ordinal_cols])
ordinal_data.head()

##### Reunite the dataset

In [None]:
val_x = pd.concat([val_x.drop(columns=binary_cols_1 + binary_cols_2 + nominal_cols + ordinal_cols), bin_data, nominal_data, ordinal_data], axis=1)
val_x.head()

#### Transforming the Test Data

##### Binary Data

In [None]:
bin_data = pd.concat([test_x[binary_cols_1].apply(bin_enc_1.transform), test_x[binary_cols_2].apply(bin_enc_2.transform)], axis=1)
bin_data.head()

##### Nominal Data

In [None]:
nominal_data = ohe.transform(test_x[nominal_cols])
nominal_data.head()

##### Ordinal Data

In [None]:
ordinal_data = ordinal_enc.transform(test_x[ordinal_cols])
ordinal_data.head()

##### Reunite the dataset

In [None]:
test_x = pd.concat([test_x.drop(columns=binary_cols_1 + binary_cols_2 + nominal_cols + ordinal_cols), bin_data, nominal_data, ordinal_data], axis=1)
test_x.head()

In [None]:
test_x.shape

## Model making

### Baseline Model

We will develop a Sequential model and a Functional model.

Requirements:
1. ReLU activation
2. The minimum number of neurons is 2x the input data dimension

#### Sequential Model

##### Train the model

In [None]:
seq_model = Sequential()
seq_model.add(Dense(64, input_dim=train_x.shape[1], activation='relu'))
seq_model.add(Dense(64, activation='relu'))
seq_model.add(Dense(1, activation='relu')) #one numeric output column
seq_model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mse'])
seq_model.fit(train_x, train_y, epochs=10, batch_size=32, validation_data=(val_x, val_y), verbose=1)

In [None]:
seq_model.summary()

##### Test the model

In [None]:
seq_pred = seq_model.predict(test_x)
print(f"R2 Score: \t{r2_score(test_y, seq_pred)}")
print(f"MSE: \t \t{mean_squared_error(test_y, seq_pred)}")
print(f"MAE: \t \t{mean_absolute_error(test_y, seq_pred)}")
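
For reference, the three metrics printed above can be written out directly from their definitions. This is a minimal plain-Python sketch with made-up values on a 0 to 1 scale; the notebook itself uses the scikit-learn functions, and `regression_metrics` is a hypothetical helper name.

```python
def regression_metrics(y_true, y_pred):
    """Compute R2, MSE, and MAE from their textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n   # mean of squared errors
    mae = sum(abs(e) for e in errors) / n  # mean of absolute errors

    # R2 compares the model's squared error with that of always predicting the mean
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "mse": mse, "mae": mae}


metrics = regression_metrics([0.2, 0.4, 0.6, 0.8], [0.25, 0.35, 0.65, 0.75])
```

An R2 of 1 means perfect predictions, 0 means no better than predicting the mean of the targets, and negative values mean worse than that.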

The model will undergo hyperparameter tuning to further optimize the model.

#### Functional Model

##### Train the Model

In [None]:
# Input layer
inputs = Input(shape=(train_x.shape[1],))

# Making hidden layers
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)

# Output layer
output = Dense(1, activation='relu')(x)  # one numeric output column

#Compile the model
func_model = Model(inputs = inputs, outputs = output)
func_model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mse'])
func_model.fit(train_x, train_y, epochs=10, batch_size=32, validation_data=(val_x, val_y), verbose=1)

In [None]:
func_model.summary()


##### Test the Model

In [None]:
func_pred = func_model.predict(test_x)
print(f"R2 Score: \t{r2_score(test_y, func_pred)}")
print(f"MSE: \t \t{mean_squared_error(test_y, func_pred)}")
print(f"MAE: \t \t{mean_absolute_error(test_y, func_pred)}")

#### Compare both model performance

In [None]:
comparison = pd.DataFrame({"Metrics": ["R2 Score", "MSE", "MAE"],
                          "Sequential Model": [r2_score(test_y, seq_pred), mean_squared_error(test_y, seq_pred), mean_absolute_error(test_y, seq_pred)],
                          "Functional Model": [r2_score(test_y, func_pred), mean_squared_error(test_y, func_pred), mean_absolute_error(test_y, func_pred)]})
comparison.set_index("Metrics", inplace=True)
comparison

There is no difference in results between the sequential and functional models, since the two share the same architecture and differ only in how the model is constructed.

### Proposed Model

We will develop a Sequential model and a Functional model, with hyperparameters tuned using Optuna.

Requirements:
1. ReLU activation
2. The minimum number of neurons is 2x the input data dimension

#### Sequential Model

We will use Optuna for the hyperparameter search. Note that Optuna's default sampler is not an exhaustive grid search.

##### Neural Architecture Search Function

In [None]:
def seq_objective(trial):
    """This function is used to optimize the hyperparameters of the sequential model using Optuna.
    It takes a trial object as input and returns the R2 score of the model on the validation set.

    Args:
        trial (_Optuna trial object_): Optuna trial object

    Returns:
        R2_score: R2 score of the model on the validation set
    """
    # Sequential model
    # Suggest hyperparameters
    ## Input Layer
    num_layers = trial.suggest_int('n_layers', 1, 10)
    model = Sequential()
    model.add(Dense(trial.suggest_int('input_l_n_neuron', 17, 256),
                    activation=trial.suggest_categorical('input_l_activation', ['relu', 'linear', 'elu', 'gelu']),
                    input_dim=train_x.shape[1]
                    ))

    ## Hidden Layer
    for i in range(num_layers):
        n_neurons = trial.suggest_int(f'l{i}_n_neuron', 16, 256)
        activation = trial.suggest_categorical(f'l{i}_activation', ['relu', 'linear', 'elu', 'gelu'])
        model.add(Dense(n_neurons, activation=activation))

    ## dropout layer
    if trial.suggest_categorical('dropout', [True, False]):
        model.add(Dropout(trial.suggest_float('dropout_rate', 0.1, 0.5)))

    ## Output Layer
    activation = trial.suggest_categorical(f'output_l_activation', ['relu', 'linear', 'elu', 'gelu'])
    model.add(Dense(1, activation=activation))  # output layer

    ## Choose optimizer
    optimizer_name = trial.suggest_categorical('optimizer', ['adam', 'rmsprop', 'sgd'])
    model.compile(
        optimizer=optimizer_name,
        loss='mean_squared_error',
        metrics=['mse']
    )

    # Train the model
    model.fit(train_x, train_y,
              validation_data=(val_x, val_y),
              epochs=10,
              batch_size= trial.suggest_categorical('batch_size', [32, 64, 128]),
              verbose=0)

    # Evaluate
    model_pred = model.predict(val_x)
    r2_score_val = r2_score(val_y, model_pred)
    return r2_score_val

##### Start the search

In [None]:
seq_study = optuna.create_study(direction="maximize") # Maximize the R2 score
seq_study.optimize(seq_objective, n_trials=5000, show_progress_bar=True) #since we are using gpu we can try more combinations due to the faster processing time

print("Number of finished trials: ", len(seq_study.trials))
print("Best seq_trial:")

seq_trial = seq_study.best_trial

print("\tValue: ", seq_trial.value)
print("\tParams: ")

for key, value in seq_trial.params.items():
    print(f"\t\t{key}: {value}")
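
The loop that a study runs (propose parameters, score them with the objective, keep the best) can be pictured with a plain random search. This is a deliberately simplified stand-in: Optuna's default sampler (TPE) chooses new candidates based on earlier results, whereas the sketch below samples uniformly at random. The toy objective and search space are invented for illustration and do not train any model.

```python
import random


def random_search(objective, search_space, n_trials, seed=0):
    """Maximize objective by sampling parameter combinations at random."""
    rng = random.Random(seed)
    best_params, best_value = None, float("-inf")
    for _ in range(n_trials):
        # Propose one value per hyperparameter
        params = {name: rng.choice(options) for name, options in search_space.items()}
        value = objective(params)
        # Keep the best-scoring combination seen so far
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(params):
    # Toy score that peaks at 64 units and a batch size of 32
    return -abs(params["units"] - 64) - abs(params["batch_size"] - 32) / 32


space = {"units": [16, 32, 64, 128, 256], "batch_size": [32, 64, 128]}
best_params, best_value = random_search(toy_objective, space, n_trials=200)
```

In the notebook, the objective plays the same role but returns the validation R2 of a freshly trained network, which is why each trial is far more expensive than in this toy example.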

#### Functional Model

We will use Optuna for the hyperparameter search, as with the Sequential model.

##### Neural Architecture Search Function

In [None]:
def func_objective(trial):
    """This function is used to optimize the hyperparameters of the functional model using Optuna.
    It takes a trial object as input and returns the R2 score of the model on the validation set.

    Args:
        trial (_Optuna trial object_): Optuna trial object

    Returns:
        R2_score: R2 score of the model on the validation set
    """
    # Functional model
    # Suggest hyperparameters
    ## Input Layer
    num_layers = trial.suggest_int('n_layers', 1, 10)
    input = Input(shape=(train_x.shape[1],))

    ## Hidden Layer
    x = Dense(trial.suggest_int(f'input_l_n_neuron', 16, 256),
              activation = trial.suggest_categorical('input_l_activation', ['relu', 'linear', 'elu', 'gelu']))(input)

    for i in range(num_layers):
        n_neurons = trial.suggest_int(f'l{i}_n_neuron', 16, 256)
        activation = trial.suggest_categorical(f'l{i}_activation', ['relu', 'linear', 'elu', 'gelu'])
        x = Dense(n_neurons, activation=activation)(x)

    ## dropout layer
    if trial.suggest_categorical('dropout', [True, False]):
        x = Dropout(trial.suggest_float('dropout_rate', 0.1, 0.5))(x)

    ## Output Layer
    activation = trial.suggest_categorical(f'output_l_activation', ['relu', 'linear', 'elu', 'gelu'])
    output = Dense(1, activation = activation)(x)  # output layer

    ## Choose optimizer
    optimizer_name = trial.suggest_categorical('optimizer', ['adam', 'rmsprop', 'sgd'])
    model = Model(inputs=input, outputs=output)
    # Compile the model
    model.compile(
        optimizer=optimizer_name,
        loss='mean_squared_error',
        metrics=['mse']
    )

    # Train the model
    model.fit(train_x, train_y,
              validation_data=(val_x, val_y),
              epochs=10,
              batch_size=trial.suggest_categorical('batch_size', [32, 64, 128]),
              verbose=0)

    # Evaluate
    model_pred = model.predict(val_x)
    r2_score_val = r2_score(val_y, model_pred)
    return r2_score_val

##### Start the search

In [None]:
func_study = optuna.create_study(direction="maximize") # Maximize the R2 score
func_study.optimize(func_objective, n_trials=5000, show_progress_bar=True) #since we are using gpu we can try more combinations due to the faster processing time

print("Number of finished trials: ", len(func_study.trials))
print("Best trial:")

func_trial = func_study.best_trial

print("\tValue: ", func_trial.value)
print("\tParams: ")

for key, value in func_trial.params.items():
    print(f"\t\t{key}: {value}")

#### Model builder function

##### Sequential model builder

In [None]:
def seq_model_builder(best_params: dict, input_dim: int):
    """This function builds a sequential model based on the best parameters found by Optuna.
       It takes the best parameters and the input dimension as input and returns the model.

    Args:
        best_params (dict): Dictionary of the best parameters found by Optuna
        input_dim (int): Input dimension of the model
        The input dimension should be greater than 0.

    Raises:
        TypeError: 'best_params should be a dictionary' try best_params.params
        TypeError: 'input_dim should be an integer'
        ValueError: 'input_dim invalid. should be greater than 0'
        Zero or negative dimensions are not allowed.

    Returns:
        model: The sequential model from the best parameters
    """
    # Type Check
    if not isinstance(best_params, dict):
        raise TypeError("best_params should be a dictionary")
    if not isinstance(input_dim, int):
        raise TypeError("input_dim should be an integer")
    if input_dim <= 0:
        raise ValueError("input_dim invalid. should be greater than 0")

    # Input Layer
    num_layers = best_params["n_layers"]
    model = Sequential()
    model.add(Dense(best_params['input_l_n_neuron'],
                    activation=best_params['input_l_activation'],
                    input_dim=input_dim
                    ))

    # Hidden Layer
    for i in range(num_layers):
        n_neurons = best_params[f'l{i}_n_neuron']
        activation = best_params[f'l{i}_activation']
        model.add(Dense(n_neurons, activation=activation))

    # dropout layer
    if best_params['dropout']:
        model.add(Dropout(best_params['dropout_rate']))

    # Output Layer
    activation = best_params['output_l_activation']
    model.add(Dense(1, activation=activation))  # output layer

    # Choose optimizer
    optimizer_name = best_params['optimizer']
    model.compile(
        optimizer=optimizer_name,
        loss='mean_squared_error',
        metrics=['mse']
    )

    return model

##### Functional model builder

In [None]:
def func_model_builder(best_params: dict, input_dim: int):
    """This function builds a functional model based on the best parameters found by Optuna.
       It takes the best parameters and the input dimension as input and returns the model.

    Args:
        best_params (dict): Dictionary of the best parameters found by Optuna
        input_dim (int): Input dimension of the model
        The input dimension should be greater than 0.

    Raises:
        TypeError: 'best_params should be a dictionary' try best_params.params
        TypeError: 'input_dim should be an integer'
        ValueError: 'input_dim invalid. should be greater than 0'
        Zero or negative dimensions are not allowed.

    Returns:
        model: The functional model from the best parameters
    """

    # Type Check
    if not isinstance(best_params, dict):
        raise TypeError("best_params should be a dictionary")
    if not isinstance(input_dim, int):
        raise TypeError("input_dim should be an integer")
    if input_dim <= 0:
        raise ValueError("input_dim invalid. should be greater than 0")

    # Input layer (named `inputs` to avoid shadowing the built-in `input`)
    inputs = Input(shape=(input_dim,))

    # First dense layer
    x = Dense(best_params['input_l_n_neuron'],
              activation=best_params['input_l_activation'])(inputs)

    # Hidden layers
    num_layers = best_params["n_layers"]
    for i in range(num_layers):
        n_neurons = best_params[f'l{i}_n_neuron']
        activation = best_params[f'l{i}_activation']
        x = Dense(n_neurons, activation=activation)(x)

    # Optional dropout layer
    if best_params['dropout']:
        x = Dropout(best_params['dropout_rate'])(x)

    # Output layer
    activation = best_params['output_l_activation']
    outputs = Dense(1, activation=activation)(x)

    # Build and compile the model; the optimizer is passed by name
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(
        optimizer=best_params['optimizer'],
        loss='mean_squared_error',
        metrics=['mse']
    )

    return model
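
Both builders read a fixed set of keys from the parameter dictionary, some of which only exist for certain trials (the per-layer keys depend on `n_layers`, and `dropout_rate` is only read when `dropout` is enabled). The sketch below lists those keys and reports any that are missing before a model is built. The helper names and the values in `example_params` are hypothetical and only serve as an illustration.

```python
def required_param_keys(params: dict) -> list:
    """List the keys that the model builders read from `params`."""
    keys = ["n_layers", "input_l_n_neuron", "input_l_activation",
            "dropout", "output_l_activation", "optimizer"]
    # One neuron count and one activation per hidden layer
    for i in range(params.get("n_layers", 0)):
        keys += [f"l{i}_n_neuron", f"l{i}_activation"]
    # The dropout rate is only read when dropout is switched on
    if params.get("dropout"):
        keys.append("dropout_rate")
    return keys


def missing_param_keys(params: dict) -> list:
    """Return the required keys that are absent from `params`."""
    return [key for key in required_param_keys(params) if key not in params]


# Hypothetical parameter set, shaped like the dictionaries used above
example_params = {
    "n_layers": 2,
    "input_l_n_neuron": 64, "input_l_activation": "relu",
    "l0_n_neuron": 32, "l0_activation": "relu",
    "l1_n_neuron": 16, "l1_activation": "tanh",
    "dropout": True, "dropout_rate": 0.2,
    "output_l_activation": "linear",
    "optimizer": "adam",
}

print(missing_param_keys(example_params))  # []
```

Running such a check first turns a `KeyError` raised deep inside the builder into an explicit list of the missing parameter names.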

#### Test the models

##### Sequential Model

In [None]:
seq_model = seq_model_builder(seq_trial.params, train_x.shape[1])
seq_model.fit(train_x, train_y, epochs=10, batch_size=32, validation_data=(val_x, val_y), verbose=1)

##### Functional Model

In [None]:
func_model = func_model_builder(func_trial.params, train_x.shape[1])
func_model.fit(train_x, train_y, epochs=10, batch_size=32, validation_data=(val_x, val_y), verbose=1)

#### Evaluate and Compare Both Models

##### R<sup>2</sup> Score, MSE, MAE

In [None]:
seq_pred = seq_model.predict(test_x)
func_pred = func_model.predict(test_x)

To evaluate the performance of the models, both a Sequential and a Functional neural network architecture were trained and fine-tuned on the same dataset. The evaluation metrics used were R² Score, Mean Squared Error (MSE), and Mean Absolute Error (MAE). The results are summarized in the table below:

In [None]:
NAS_comparison = pd.DataFrame({"Metrics": ["R2 Score", "MSE", "MAE"],
                          "Sequential Model": [r2_score(test_y, seq_pred), mean_squared_error(test_y, seq_pred), mean_absolute_error(test_y, seq_pred)],
                          "Functional Model": [r2_score(test_y, func_pred), mean_squared_error(test_y, func_pred), mean_absolute_error(test_y, func_pred)]})
NAS_comparison.set_index("Metrics", inplace=True)
NAS_comparison
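
For reference, the three metrics in this table can be written out directly. The following is a minimal NumPy sketch of the standard definitions, not the scikit-learn implementation; the function name and the toy values are made up for illustration. Predictions are flattened first because Keras returns them with shape `(n, 1)`.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute R2, MSE and MAE from their textbook definitions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()  # (n, 1) -> (n,)
    residuals = y_true - y_pred

    mse = float(np.mean(residuals ** 2))        # mean squared error
    mae = float(np.mean(np.abs(residuals)))     # mean absolute error

    # R2 = 1 - SS_res / SS_tot (assumes y_true is not constant)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    return {"R2 Score": r2, "MSE": mse, "MAE": mae}


# Toy values for illustration only
toy_true = [0.2, 0.4, 0.6, 0.8]
toy_pred = [[0.25], [0.35], [0.65], [0.70]]
metrics = regression_metrics(toy_true, toy_pred)
print(metrics)
```

An R² close to 1 means the model explains most of the variance in the target, while MSE penalises large errors more strongly than MAE does.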

In [None]:
comparison = comparison.rename(columns={"Sequential Model": "Sequential Model (Before NAS)", "Functional Model": "Functional Model (Before NAS)"})
NAS_comparison = NAS_comparison.rename(columns={"Sequential Model": "Sequential Model (After NAS)", "Functional Model": "Functional Model (After NAS)"})
pd.concat([comparison, NAS_comparison], axis=1)

##### Model Size

In [None]:
# summary() prints the layer table itself and returns None
func_model.summary()

In [None]:
# summary() prints the layer table itself and returns None
seq_model.summary()

## Summary

### Comparison before and after NAS
<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <td>Metrics</td>
      <td>Sequential Model <b style="color:red">(Before NAS)</b></td>
      <td>Functional Model <b style="color:red">(Before NAS)</b></td>
      <td>Sequential Model <b style="color:green">(After NAS)</b></td>
      <td>Functional Model <b style="color:green">(After NAS)</b></td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>R2 Score</th>
      <td>0.233671</td>
      <td>0.211576</td>
      <td>0.168453</td>
      <td>0.195303</td>
    </tr>
    <tr>
      <th>MSE</th>
      <td>0.031701</td>
      <td>0.032615</td>
      <td>0.034399</td>
      <td>0.033288</td>
    </tr>
    <tr>
      <th>MAE</th>
      <td>0.141460</td>
      <td>0.143669</td>
      <td>0.152042</td>
      <td>0.139835</td>
    </tr>
  </tbody>
</table>
</div>

After applying NAS, the Sequential model's performance declined slightly across all metrics, most notably in R² (0.2337 to 0.1685) and MAE (0.1415 to 0.1520). The Functional model improved slightly in MAE (0.1437 to 0.1398), while its R² (0.2116 to 0.1953) and MSE (0.0326 to 0.0333) worsened marginally, so it was affected less by NAS than the Sequential model. Overall, the changes were small, highlighting the consistency of both models.
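
The direction of each change can be checked directly from the table values. The sketch below recomputes the before/after differences and flags whether a metric improved, keeping in mind that a higher R² is better while lower MSE and MAE are better. The numbers are copied from the table; the dictionary layout and the function name are illustrative choices.

```python
# Metric values copied from the comparison table
before = {
    "Sequential": {"R2": 0.233671, "MSE": 0.031701, "MAE": 0.141460},
    "Functional": {"R2": 0.211576, "MSE": 0.032615, "MAE": 0.143669},
}
after = {
    "Sequential": {"R2": 0.168453, "MSE": 0.034399, "MAE": 0.152042},
    "Functional": {"R2": 0.195303, "MSE": 0.033288, "MAE": 0.139835},
}
HIGHER_IS_BETTER = {"R2": True, "MSE": False, "MAE": False}


def nas_effect(before, after):
    """Return {model: {metric: (delta, improved)}} for after minus before."""
    effect = {}
    for model, metrics in before.items():
        effect[model] = {}
        for metric, old_value in metrics.items():
            delta = after[model][metric] - old_value
            if HIGHER_IS_BETTER[metric]:
                improved = delta > 0
            else:
                improved = delta < 0
            effect[model][metric] = (round(delta, 6), improved)
    return effect


for model, metrics in nas_effect(before, after).items():
    for metric, (delta, improved) in metrics.items():
        print(f"{model:10s} {metric:3s} {delta:+.6f} improved={improved}")
```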

### Overall Model Performance
The results show that both models achieved very **similar performance** across all three metrics. Comparing the Sequential model before NAS with the Functional model after NAS, the Sequential model **slightly** outperformed the Functional model in terms of **R² Score and MSE** (0.2337 vs. 0.1953 and 0.0317 vs. 0.0333), indicating a marginally better fit and lower error variance. However, the Functional model had a slightly lower MAE (0.1398 vs. 0.1415), suggesting its predictions were closer to the true values on average.
<table>
<tr>
<th>Aspect</th>
<th>Sequential Model (Before NAS)</th>
<th>Functional Model (After NAS)</th>
</tr>
<tr>
<td>Fit</td>
<td>Slightly <b>better</b></td>
<td>Slightly <b>worse</b></td>
</tr>
<tr>
<td>Error Variance</td>
<td>Slightly <b>lower</b></td>
<td>Slightly <b>higher</b></td>
</tr>
<tr>
<td>Prediction Accuracy</td>
<td>Slightly <b>less</b></td>
<td>Slightly <b>more</b></td>
</tr>
</table>
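
To see which configuration wins on each metric, the four result columns from the before/after NAS table can be ranked side by side. This is a small sketch with the values copied from that table; the `best_column` helper is a hypothetical name.

```python
# Values copied from the "Comparison before and after NAS" table
results = {
    "Sequential Model (Before NAS)": {"R2 Score": 0.233671, "MSE": 0.031701, "MAE": 0.141460},
    "Functional Model (Before NAS)": {"R2 Score": 0.211576, "MSE": 0.032615, "MAE": 0.143669},
    "Sequential Model (After NAS)": {"R2 Score": 0.168453, "MSE": 0.034399, "MAE": 0.152042},
    "Functional Model (After NAS)": {"R2 Score": 0.195303, "MSE": 0.033288, "MAE": 0.139835},
}


def best_column(results, metric, higher_is_better):
    """Return the name of the column with the best value for `metric`."""
    pick = max if higher_is_better else min
    return pick(results, key=lambda name: results[name][metric])


print(best_column(results, "R2 Score", higher_is_better=True))
print(best_column(results, "MSE", higher_is_better=False))
print(best_column(results, "MAE", higher_is_better=False))
```

All differences between the columns are small, so a ranking like this is only indicative for a single training run.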

### Model Architecture
While the performance differences are minimal, the Functional model remains advantageous because it is much smaller: it has 4 layers (including input and output) and far fewer parameters. A comparison summary is given below:

<table>
<tr>
<th>Specification</th>
<th>Sequential Model</th>
<th>Functional Model</th>
</tr>
<tr>
<td>Number of layers</td>
<td>13</td>
<td>4</td>
</tr>
<tr>
<td>Number of hidden layers</td>
<td>11</td>
<td>2</td>
</tr>
<tr>
<td>Number of parameters</td>
<td>213,682</td>
<td>6,015</td>
</tr>
</table>
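
The parameter counts reported by `model.summary()` follow from the layer widths: a fully connected layer with `fan_in` inputs and `width` units has `(fan_in + 1) * width` trainable parameters (weights plus one bias per unit), and dropout layers add none. The sketch below applies that rule to a stack of dense layers. The widths in the example are made up and are not the architectures found by Optuna.

```python
def dense_param_count(input_dim, layer_widths):
    """Count trainable parameters of a stack of fully connected layers."""
    total = 0
    fan_in = input_dim
    for width in layer_widths:
        total += (fan_in + 1) * width  # weights + one bias per unit
        fan_in = width                 # this layer feeds the next one
    return total


# Hypothetical network: 10 input features, dense layers of 8, 4 and 1 units
print(dense_param_count(10, [8, 4, 1]))  # 129
```

Because each layer's count scales with the product of adjacent widths, a few wide layers can dominate the total, which is why a deeper and wider network ends up with far more parameters than a compact one.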