Workshop 5 Answered - Enyia Esther

# Tasks

### **Theoretical Task**

### T5.1 – Explain why it is necessary to use activation functions in neural networks. Back up your discussion with mathematical proof (5%).

### **The Necessity of Activation Functions in Neural Networks**  

### Introduction
Activation functions introduce non-linear transformations between the layers of a neural network, which is what lets the model learn complex patterns from data. Without them, every layer is an affine map, and a composition of affine maps is itself a single affine map. Even a very deep network would then be equivalent to a single-layer linear model, with severely limited expressive power.

### 1. The Core Problem Without Activation Functions

A neural network without activation functions performs only linear (affine) computations, no matter how many layers it has. It can therefore only fit linear relationships between inputs and outputs, and it fails on tasks such as image recognition or language processing, where the mapping is strongly non-linear.

### 2. Mathematical Proof of the Limitation

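Consider a simple 2-layer network with inputs $x_1, x_2$.

Hidden layer computations (no activation):

$$h_1 = w_1 x_1 + w_3 x_2 + b_1$$

$$h_2 = w_2 x_1 + w_4 x_2 + b_2$$

Final output:

$$\text{output} = w_5 h_1 + w_6 h_2 + b_3$$

Substituting the hidden layer values:

$$\text{output} = w_5 (w_1 x_1 + w_3 x_2 + b_1) + w_6 (w_2 x_1 + w_4 x_2 + b_2) + b_3$$

Expanding and collecting terms:

$$\text{output} = (w_5 w_1 + w_6 w_2)\, x_1 + (w_5 w_3 + w_6 w_4)\, x_2 + (w_5 b_1 + w_6 b_2 + b_3)$$

which is just

$$\text{output} = A x_1 + B x_2 + C$$

with $A = w_5 w_1 + w_6 w_2$, $B = w_5 w_3 + w_6 w_4$ and $C = w_5 b_1 + w_6 b_2 + b_3$.

The same argument holds for any depth. In matrix form one layer computes $W\mathbf{x} + \mathbf{b}$, and stacking two layers gives

$$W_2 (W_1 \mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2 = (W_2 W_1)\, \mathbf{x} + (W_2 \mathbf{b}_1 + \mathbf{b}_2),$$

which is again of the form $W\mathbf{x} + \mathbf{b}$. By induction, a network with any number of purely linear layers reduces to a single linear equation.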
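The collapse can also be checked numerically. The snippet below is a minimal sketch with randomly drawn weights (the variable names `W1`, `W2`, `W_eff`, `b_eff` are illustrative and not part of the workshop material). It compares the two-layer linear network with the single equivalent layer `A*x1 + B*x2 + C`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked layers with no activation: 2 inputs -> 2 hidden units -> 1 output.
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)   # hidden layer weights and biases
W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=1)   # output layer (b2 plays the role of b3)

def two_layer_linear(x):
    h = W1 @ x + b1        # h1, h2 from the derivation
    return W2 @ h + b2     # output = w5*h1 + w6*h2 + b3

# Collapsed single-layer equivalent: output = A*x1 + B*x2 + C.
W_eff = W2 @ W1            # holds [A, B]
b_eff = W2 @ b1 + b2       # holds C

x = rng.normal(size=2)
print(two_layer_linear(x), W_eff @ x + b_eff)   # the two values agree
```

Whatever weights are drawn, both expressions give the same number, so the hidden layer adds nothing a single layer could not already do.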

### 3. How Activation Functions Fix This

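The fix is to apply a non-linear activation function, such as the sigmoid, to each hidden unit before it is passed to the next layer:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

$$h_1 = \sigma(w_1 x_1 + w_3 x_2 + b_1), \qquad h_2 = \sigma(w_2 x_1 + w_4 x_2 + b_2)$$

Because $\sigma$ is not linear, the substitution from the previous section no longer collapses: the output $w_5 h_1 + w_6 h_2 + b_3$ cannot be rewritten as $A x_1 + B x_2 + C$. Applying the sigmoid only to the final output is not enough, since that yields logistic regression, whose decision boundary is still linear. With non-linear hidden units, the network can:

- Learn complex patterns
- Solve non-linear problems
- Model real-world data effectively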
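XOR is a standard small example of a non-linear problem. The sketch below is illustrative only: it uses ReLU (the activation used by the model later in this notebook) instead of sigmoid, because the weights can then be written down by hand, and those hand-picked weights are an assumption of this example, not trained values. It compares the best purely linear fit with a network that has one non-linear hidden layer.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)   # XOR targets

# Best purely linear model: least squares with a bias column.
X_bias = np.hstack([X, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
linear_pred = X_bias @ coef               # 0.5 for every input

def relu(z):
    return np.maximum(z, 0.0)

# Tiny network with a ReLU hidden layer; weights chosen by hand.
h1 = relu(X[:, 0] + X[:, 1])              # sum of the two inputs
h2 = relu(X[:, 0] + X[:, 1] - 1.0)        # non-zero only when both inputs are 1
relu_pred = h1 - 2.0 * h2                 # reproduces XOR exactly

print("linear:", linear_pred)
print("relu  :", relu_pred)
```

The linear model can do no better than predicting 0.5 everywhere, while two ReLU units are enough to reproduce the targets.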

### 4. Why This Matters

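- Without activation functions:
  * A deep network is equivalent to linear regression
  * It cannot solve non-linear problems
  * Additional layers add no expressive power
- With activation functions:
  * The network can learn hierarchical features
  * With enough hidden units, it can approximate any continuous function on a bounded input range to arbitrary accuracy (Universal Approximation Theorem)
  * This is what enables modern AI applications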

### 5. Practical Implications

Activation functions are what allow neural networks to:
- Recognize faces in photos
- Understand human speech
- Translate between languages
- Make complex predictions


## P 5.1: Download the CarSharing dataset from Canvas. Train a deep neural network to predict the 'demand' column. Tune the network hyperparameters to find the best set of hyperparameters that produce the most accurate results. Evaluate the model using a five-fold cross-validation method and calculate all regression evaluation metrics (15%).

In [None]:
############# WRITE YOUR CODE IN THIS CELL (IF APPLICABLE)  ####################
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split, KFold
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score



In [None]:
from google.colab import drive

data = pd.read_csv('CarSharing.csv')
data

Unnamed: 0,id,timestamp,season,holiday,workingday,weather,temp,temp_feel,humidity,windspeed,demand
0,1,2017-01-01 00:00:00,spring,No,No,Clear or partly cloudy,9.84,14.395,81.0,0.0000,2.772589
1,2,2017-01-01 01:00:00,spring,No,No,Clear or partly cloudy,9.02,13.635,80.0,0.0000,3.688879
2,3,2017-01-01 02:00:00,spring,No,No,Clear or partly cloudy,9.02,13.635,80.0,0.0000,3.465736
3,4,2017-01-01 03:00:00,spring,No,No,Clear or partly cloudy,9.84,14.395,75.0,0.0000,2.564949
4,5,2017-01-01 04:00:00,spring,No,No,Clear or partly cloudy,9.84,14.395,75.0,0.0000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...
8703,8704,2018-08-05 00:00:00,fall,No,No,Clear or partly cloudy,30.34,34.850,70.0,19.0012,5.030438
8704,8705,2018-08-05 01:00:00,fall,No,No,Clear or partly cloudy,30.34,34.850,70.0,16.9979,4.465908
8705,8706,2018-08-05 02:00:00,fall,No,No,Clear or partly cloudy,30.34,34.850,70.0,19.9995,4.290459
8706,8707,2018-08-05 03:00:00,fall,No,No,Clear or partly cloudy,29.52,34.850,74.0,16.9979,3.713572


In [None]:
# Check for missing values
data.isna().sum()

Unnamed: 0,0
id,0
timestamp,0
season,0
holiday,0
workingday,0
weather,0
temp,1202
temp_feel,102
humidity,39
windspeed,200


In [None]:
# Fill missing values with 0 (simple, but a missing temperature or humidity is then treated as a real reading of 0)
data.fillna(0, inplace=True)

print(data.head())

   id            timestamp  season holiday workingday                 weather  \
0   1  2017-01-01 00:00:00  spring      No         No  Clear or partly cloudy   
1   2  2017-01-01 01:00:00  spring      No         No  Clear or partly cloudy   
2   3  2017-01-01 02:00:00  spring      No         No  Clear or partly cloudy   
3   4  2017-01-01 03:00:00  spring      No         No  Clear or partly cloudy   
4   5  2017-01-01 04:00:00  spring      No         No  Clear or partly cloudy   

   temp  temp_feel  humidity  windspeed    demand  
0  9.84     14.395      81.0        0.0  2.772589  
1  9.02     13.635      80.0        0.0  3.688879  
2  9.02     13.635      80.0        0.0  3.465736  
3  9.84     14.395      75.0        0.0  2.564949  
4  9.84     14.395      75.0        0.0  0.000000  
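Zero-filling is the approach used in this notebook. As a point of comparison only, the sketch below shows median imputation on a small made-up frame (the `toy` values are invented for illustration, and this is not the method that produced the results reported later). In a full pipeline the medians would ideally be computed from the training rows only.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic two of the numeric columns with gaps.
toy = pd.DataFrame({
    "temp": [9.84, np.nan, 9.02, 30.34],
    "humidity": [81.0, 80.0, np.nan, 70.0],
})

zero_filled = toy.fillna(0)                                  # approach used in the notebook
median_filled = toy.fillna(toy.median(numeric_only=True))    # alternative: per-column median

print(zero_filled)
print(median_filled)
```

With the median, the filled values stay inside the range of observed readings instead of introducing an artificial 0.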


In [None]:
# Drop any rows that still contain missing values (none remain after the fill above, so this is only a safety check)
data.dropna(inplace=True)
data.isna().sum()

Unnamed: 0,0
id,0
timestamp,0
season,0
holiday,0
workingday,0
weather,0
temp,0
temp_feel,0
humidity,0
windspeed,0


In [None]:
# Classify Columns
# Names must match the ones used in the feature selection and preprocessing cells
categorical_cols = ["season", "holiday", "workingday", "weather"]
numerical_cols = ["temp", "temp_feel", "humidity", "windspeed"]

In [None]:
# Define features and target
X = data[categorical_cols + numerical_cols]
y = data['demand']

In [None]:
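# Standardize numeric columns and one-hot encode categorical columns
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_cols),
        # handle_unknown='ignore' avoids an error if a rare category appears only in the test split
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_cols)
    ])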

In [None]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

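# Transform features (fit on the training split only, then apply to the test split)
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)

# Refit on the full dataset for the second cross-validation run.
# Note: this overwrites the fit above, and scaling on all rows lets each
# validation fold influence the scaler statistics.
X_processed = preprocessor.fit_transform(X)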
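Fitting the preprocessor on all rows before cross-validation means each validation fold contributes to the scaling statistics. A stricter variant refits the preprocessor inside every fold. The sketch below illustrates that pattern under stated assumptions: the `toy` frame is randomly generated, and `LinearRegression` is only a fast stand-in for the neural network.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the CarSharing frame (values are made up for illustration).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "season": rng.choice(["spring", "summer", "fall", "winter"], size=40),
    "temp": rng.normal(20.0, 5.0, size=40),
})
toy["demand"] = 0.1 * toy["temp"] + rng.normal(0.0, 0.1, size=40)

X_toy, y_toy = toy[["season", "temp"]], toy["demand"]
fold_mae = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(X_toy):
    # A fresh preprocessor per fold: statistics come from the training rows only.
    pre = ColumnTransformer([
        ("num", StandardScaler(), ["temp"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["season"]),
    ])
    X_tr = pre.fit_transform(X_toy.iloc[train_idx])
    X_va = pre.transform(X_toy.iloc[val_idx])
    model = LinearRegression().fit(X_tr, y_toy.iloc[train_idx])   # stand-in for the network
    fold_mae.append(mean_absolute_error(y_toy.iloc[val_idx], model.predict(X_va)))

print(f"Mean MAE over folds: {np.mean(fold_mae):.4f}")
```

For standardization the effect of the leak is usually small, but refitting per fold keeps the validation rows fully unseen.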


In [None]:
# Neural Network Builder Function
def build_model(input_shape, learning_rate=0.001, layers=(128, 64, 32), dropout_rate=0.2):
    """Build and compile a fully connected regression network."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(input_shape,)))

    # Hidden layers: Dense + ReLU, each followed by dropout for regularization
    for units in layers:
        model.add(keras.layers.Dense(units, activation="relu"))
        model.add(keras.layers.Dropout(dropout_rate))

    model.add(keras.layers.Dense(1, activation="linear"))  # Output layer for regression
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mse",
                  metrics=["mae"])

    return model

In [None]:
# Hyperparameter configuration used below (one fixed set of values, not a search grid)
param_grid = {
    'learning_rate': 0.001,
    'neurons': [128, 64, 32],
    'dropout_rate': 0.3,
    'epochs': 100,
    'batch_size': 64
}
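The cell above fixes a single configuration. A search over several candidate values could be organized as in the sketch below. The names `grid_search`, `search_space` and `dummy_score` are hypothetical, the candidate values are examples rather than the ones actually tried, and `dummy_score` is a placeholder: in the notebook it would be replaced by a function that runs the five-fold loop with `build_model` and returns the mean validation RMSE.

```python
from itertools import product

def grid_search(search_space, score_fn):
    """Evaluate every combination in search_space; a lower score is better."""
    names = list(search_space)
    results = []
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, score_fn(params)))
    best_params, best_score = min(results, key=lambda item: item[1])
    return best_params, best_score, results

# Hypothetical candidate values (assumptions for illustration only).
search_space = {
    "learning_rate": [0.01, 0.001],
    "neurons": [(64, 32), (128, 64, 32)],
    "dropout_rate": [0.2, 0.3],
    "batch_size": [32, 64],
}

def dummy_score(params):
    # Placeholder score so the sketch runs without training a network.
    return abs(params["learning_rate"] - 0.001) + abs(params["dropout_rate"] - 0.3)

best_params, best_score, results = grid_search(search_space, dummy_score)
print(f"Tried {len(results)} combinations; best: {best_params}")
```

An exhaustive grid grows multiplicatively with every added hyperparameter, so with a real network it is common to keep each list short or to sample combinations at random.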

In [None]:
# Perform 5-fold cross-validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)
mae_scores, rmse_scores, r2_scores = [], [], []

for train_index, val_index in kf.split(X_train):
    X_train_fold, X_val_fold = X_train[train_index], X_train[val_index]
    y_train_fold, y_val_fold = y_train.iloc[train_index], y_train.iloc[val_index]

    # Build and train the model
    model = build_model(X_train.shape[1], param_grid["learning_rate"], param_grid["neurons"], param_grid["dropout_rate"])
    model.fit(X_train_fold, y_train_fold, epochs=param_grid["epochs"], batch_size=param_grid["batch_size"], verbose=0)

    y_pred = model.predict(X_val_fold).flatten()
    mae_scores.append(mean_absolute_error(y_val_fold, y_pred))
    rmse_scores.append(np.sqrt(mean_squared_error(y_val_fold, y_pred)))
    r2_scores.append(r2_score(y_val_fold, y_pred))

# Final metrics
print(f"Average MAE: {np.mean(mae_scores):.4f}")
print(f"Average RMSE: {np.mean(rmse_scores):.4f}")
print(f"Average R²: {np.mean(r2_scores):.4f}")
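The loop above records MAE, RMSE and R² through scikit-learn. For reference, the sketch below computes the same quantities, plus MSE, directly from their definitions with NumPy. The function name `regression_metrics` and the four sample values are made up for illustration. Percentage-based metrics such as MAPE are left out because the `demand` column contains zeros (see row 4 of the preview), which makes them undefined.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                    # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total variation
    return {
        "MAE": np.mean(np.abs(residuals)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }

toy_metrics = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 8.0])
print(toy_metrics)
```

MAE and RMSE are in the same units as `demand`, while R² is unitless and compares the model against always predicting the mean.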



In [None]:
from tensorflow.keras.callbacks import EarlyStopping
kf = KFold(n_splits=5, shuffle=True, random_state=42)
mae_scores, rmse_scores, r2_scores = [], [], []

for train_idx, val_idx in kf.split(X_processed):
    X_train_fold, X_val_fold = X_processed[train_idx], X_processed[val_idx]
    y_train_fold, y_val_fold = y.iloc[train_idx], y.iloc[val_idx]

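    model = build_model(X_processed.shape[1], param_grid["learning_rate"],
                        param_grid["neurons"], param_grid["dropout_rate"])

    # Stop when the validation loss has not improved for 10 epochs and keep the best weights.
    # Note: the fold used for early stopping is also the fold being scored, so the
    # reported metrics are slightly optimistic.
    early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)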

    model.fit(X_train_fold, y_train_fold,
              validation_data=(X_val_fold, y_val_fold),
              epochs=param_grid["epochs"],
              batch_size=param_grid["batch_size"],
              callbacks=[early_stop],
              verbose=0)

    y_pred = model.predict(X_val_fold).flatten()
    mae_scores.append(mean_absolute_error(y_val_fold, y_pred))
    rmse_scores.append(np.sqrt(mean_squared_error(y_val_fold, y_pred)))
    r2_scores.append(r2_score(y_val_fold, y_pred))

# Final metrics
print(f"Average MAE: {np.mean(mae_scores):.4f}")
print(f"Average RMSE: {np.mean(rmse_scores):.4f}")
print(f"Average R²: {np.mean(r2_scores):.4f}")

Average MAE: 0.3560
Average RMSE: 0.4682
Average R²: 0.9015
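The second cross-validation run relies on `EarlyStopping` with `patience=10`. The sketch below illustrates the patience idea in plain Python. It is a simplified model of the behaviour, not Keras's actual implementation, and the function name `early_stopping_epoch` and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0                       # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch   # stop; weights from best_epoch would be restored
    return len(val_losses) - 1, best_epoch  # ran through all epochs without stopping early

losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)
```

Here training halts at epoch 5, three epochs after the best loss at epoch 2, which corresponds to what `restore_best_weights=True` would roll back to.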
