Q1 - Gradient Descent & Types of Gradient Descent

Gradient Descent is an optimization algorithm for minimizing functions; in machine learning and statistics it is commonly used to fit model parameters by minimizing a loss function. Here's an overview of Gradient Descent and its different types:

Gradient Descent Overview
Gradient Descent is an iterative method for finding a (local) minimum of a function. The key idea is to repeatedly move the parameters a small step in the direction in which the function decreases most steeply at the current point, which is the direction opposite to the gradient: θ ← θ − η·∇f(θ), where η is the learning rate.

Basic Steps:
1. Initialize the parameters, randomly or using some heuristic.
2. Compute the gradient of the function with respect to the parameters.
3. Update the parameters by subtracting the gradient scaled by the learning rate.
4. Repeat steps 2 and 3 until convergence (i.e., until the updates become negligibly small).
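The basic steps above can be sketched for a single scalar parameter as follows. This is a minimal illustration, assuming a fixed learning rate and a stopping rule based on the size of each update; the function name `gradient_descent` and the example function f(θ) = (θ − 3)² are illustrative choices, not part of the original material.

```python
def gradient_descent(grad, theta0, learning_rate=0.1, tol=1e-8, max_iters=10_000):
    """Minimize a 1-D function given its gradient `grad` (hypothetical helper)."""
    theta = theta0                            # step 1: initialize the parameter
    for _ in range(max_iters):
        step = learning_rate * grad(theta)    # step 2: gradient, scaled by the learning rate
        theta = theta - step                  # step 3: move against the gradient
        if abs(step) < tol:                   # step 4: stop once updates are negligible
            break
    return theta

# f(theta) = (theta - 3)**2 has gradient 2 * (theta - 3) and its minimum at theta = 3
theta_min = gradient_descent(lambda t: 2 * (t - 3), theta0=0.0)
print(round(theta_min, 4))  # 3.0
```

With learning rate 0.1 each update shrinks the distance to the minimum by a factor of 0.8; a learning rate that is too large (here, above 1.0) would make the iterates diverge instead.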
Types of Gradient Descent
Batch Gradient Descent

Description: Uses the entire training dataset to compute the gradient of the cost function.
Pros: Stable and deterministic: every update uses the exact gradient of the full training loss, so the same starting point always produces the same sequence of updates.
Cons: Can be very slow for large datasets, as it requires computing the gradient over the whole dataset before each update.
Usage: Suitable for smaller datasets where the cost of computing gradients over the entire dataset is manageable.
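A minimal sketch of Batch Gradient Descent, assuming a linear regression model trained with mean squared error on synthetic NumPy data (the data, learning rate, and epoch count are illustrative assumptions). The key point is that the gradient is averaged over the whole dataset, so there is exactly one parameter update per epoch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                 # synthetic features (assumption)
y = X @ np.array([2.0, -1.0]) + 0.5 + rng.normal(scale=0.1, size=500)

w, b = np.zeros(2), 0.0
learning_rate = 0.1
for epoch in range(200):
    # Gradient of the MSE over the ENTIRE training set -> one update per epoch
    error = X @ w + b - y
    grad_w = 2 * X.T @ error / len(y)
    grad_b = 2 * error.mean()
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w.round(2), round(b, 2))  # close to the true values [2, -1] and 0.5
```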
Stochastic Gradient Descent (SGD)

Description: Uses a single training example to compute the gradient and update the parameters.
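Pros: Much faster for large datasets: each update is cheap (a single example), so the parameters start improving long before a full pass over the data is complete.
Cons: The updates are noisy and the path towards convergence can be erratic; with a constant learning rate the parameters tend to keep fluctuating around the minimum instead of settling on it, which is why the learning rate is often decreased over time.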
Usage: Effective for large datasets and online learning, where data is continuously fed into the model.
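For comparison, a minimal SGD sketch on the same kind of synthetic linear-regression problem (the data, learning rate, and epoch count are illustrative assumptions). Each update uses the gradient of the squared error for a single example, and the examples are visited in a new random order every epoch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                 # synthetic features (assumption)
y = X @ np.array([2.0, -1.0]) + 0.5 + rng.normal(scale=0.1, size=500)

w, b = np.zeros(2), 0.0
learning_rate = 0.01
for epoch in range(20):
    for i in rng.permutation(len(y)):         # reshuffle the example order each epoch
        error = X[i] @ w + b - y[i]           # prediction error for ONE example
        w -= learning_rate * 2 * error * X[i]
        b -= learning_rate * 2 * error

print(w.round(2), round(b, 2))  # near [2, -1] and 0.5, with some noise
```

Here there are 500 updates per epoch instead of one, which is why SGD makes progress quickly, but each individual step points only roughly downhill.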
Mini-Batch Gradient Descent

Description: A compromise between Batch and Stochastic Gradient Descent. Uses a small, randomly chosen subset (mini-batch) of the training dataset to compute the gradient and update the parameters.
Pros: Balances the efficiency of SGD and the stability of Batch Gradient Descent. Can make use of vectorized operations for faster computation.
Cons: Requires tuning of mini-batch size, which can affect performance.
Usage: Widely used in practice due to its efficiency and effectiveness, especially for large datasets.
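A minimal mini-batch sketch, again assuming linear regression with MSE on synthetic data; `minibatch_gd` is a hypothetical helper, not a library function. Because the batch size is a parameter, the same loop covers all three variants: `batch_size=len(y)` gives Batch Gradient Descent and `batch_size=1` gives SGD.

```python
import numpy as np

def minibatch_gd(X, y, batch_size=32, learning_rate=0.05, epochs=50, seed=0):
    """Fit y ~ X @ w + b by mini-batch gradient descent on the MSE loss."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        order = rng.permutation(len(y))            # reshuffle every epoch
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]  # indices of one mini-batch
            error = X[idx] @ w + b - y[idx]        # vectorized over the batch
            w -= learning_rate * 2 * X[idx].T @ error / len(idx)
            b -= learning_rate * 2 * error.mean()
    return w, b

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))                      # synthetic features (assumption)
y = X @ np.array([2.0, -1.0]) + 0.5 + rng.normal(scale=0.1, size=500)
w, b = minibatch_gd(X, y)
print(w.round(2), round(b, 2))  # close to [2, -1] and 0.5
```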

Q2 - Validation set & Validation Loss

In machine learning, a validation set and validation loss are crucial concepts for evaluating and tuning models during training. Here’s a detailed look at both:

Validation Set
What is a Validation Set?
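Definition: A validation set is a portion of the labeled data, usually carved out of the original training data, that the model is not trained on and that is used to evaluate its performance during the training process. It is kept separate from both the training set (used to fit the model's parameters) and the test set (used only to assess final model performance).
Purpose: The validation set is used to tune hyperparameters, compare model architectures, and detect overfitting, because it estimates performance on data the model has not been fitted to. Since these choices are made by looking at validation results, the validation score gradually becomes an optimistic estimate; this is why a separate test set is reserved for the final, unbiased evaluation.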
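To illustrate hyperparameter tuning with a validation set, here is a small sketch that chooses the degree of a polynomial fit. The synthetic data, the 60/20/20 split, and the use of `np.polyfit` as the "model" are assumptions made for the example. Each candidate is fitted on the training portion only, the degree is chosen by validation loss, and the test set is touched once at the end.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=x.size)   # synthetic data (assumption)

# Shuffle, then split 60% train / 20% validation / 20% test
idx = rng.permutation(x.size)
train_idx, val_idx, test_idx = idx[:120], idx[120:160], idx[160:]

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial (np.polyfit coefficients) on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

val_losses = {}
for degree in range(1, 10):
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)   # fit on training data only
    val_losses[degree] = mse(coeffs, x[val_idx], y[val_idx])  # score on validation data

best_degree = min(val_losses, key=val_losses.get)             # hyperparameter choice
final = np.polyfit(x[train_idx], y[train_idx], best_degree)
test_loss = mse(final, x[test_idx], y[test_idx])              # final check on the test set
print(best_degree, round(test_loss, 4))
```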

Validation Loss
What is Validation Loss?
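Definition: Validation loss is the value of the loss function (the error measure being minimized, e.g. mean squared error) computed on the validation set. Because the model's weights are never updated on these examples, it measures performance on data the model has not been trained on.
Purpose: Validation loss indicates how well the model generalizes to new examples. Comparing it with the training loss over the epochs is the usual way to spot overfitting: if the training loss keeps falling while the validation loss levels off or starts to rise, the model is fitting noise in the training data. It is also commonly monitored to decide when to stop training (early stopping) and which hyperparameters to keep.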
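Early stopping is one concrete decision driven by validation loss. The sketch below applies a simple patience rule to a list of per-epoch validation losses; `early_stopping_epoch` is a hypothetical helper and the loss values are made up for illustration. Keras offers a callback with a similar idea, `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=..., restore_best_weights=True)`, though its exact bookkeeping may differ from this sketch.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs. Epochs are 0-indexed.
    """
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0   # new best: reset the counter
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch                      # stop training here
    return best_epoch, len(val_losses) - 1                    # never triggered

# Made-up curve: validation loss bottoms out at epoch 4, then rises (overfitting)
val_curve = [1.00, 0.70, 0.55, 0.48, 0.45, 0.47, 0.50, 0.56, 0.63]
print(early_stopping_epoch(val_curve, patience=3))  # (4, 7)
```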


Q3 - Create an MLP model step by step, as discussed in class, using the tips dataset loaded from the Seaborn library.


import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Load tips dataset
tips = sns.load_dataset('tips')

# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(tips.head())

# Convert categorical variables to numerical using one-hot encoding
tips = pd.get_dummies(tips, columns=['sex', 'smoker', 'day', 'time'], drop_first=True)

# Separate features and target variable
X = tips.drop('tip', axis=1)
y = tips['tip']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create a Sequential model
model = Sequential()

# Declare the input shape (one value per feature), then add the first hidden layer
model.add(tf.keras.Input(shape=(X_train.shape[1],)))
model.add(Dense(units=64, activation='relu'))

# Add a second hidden layer
model.add(Dense(units=32, activation='relu'))

# Add the output layer: one linear unit, because predicting 'tip' (a continuous amount) is a regression task
model.add(Dense(units=1, activation='linear'))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')  # Using mean squared error for regression

# Train the model; validation_split=0.2 holds out the last 20% of the training rows as a validation set
history = model.fit(X_train, y_train, epochs=100, validation_split=0.2, batch_size=32, verbose=1)

# Predict tips for the test set
y_pred = model.predict(X_test)
print("\nPredictions for the first 5 test samples:")
print(y_pred[:5])

# Evaluate on the test set; with no extra metrics, evaluate() returns the loss (MSE)
test_loss = model.evaluate(X_test, y_test, verbose=0)
print(f"\nTest Loss (MSE): {test_loss}")

# Plot training & validation loss values
plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(loc='upper right')
plt.grid(True)
plt.show()
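To make the layer-by-layer construction concrete, the sketch below reproduces in plain NumPy what the three Dense layers compute: each layer applies activation(inputs @ kernel + bias). It assumes 8 input features, which is what the one-hot encoding above should produce for the tips data, but treat that count as an assumption. The weights are random placeholders rather than the trained Keras weights, and the input rows are made up.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
n_features = 8                       # assumption: feature count after one-hot encoding
layer_sizes = [n_features, 64, 32, 1]

# One (kernel, bias) pair per Dense layer; random values stand in for trained weights
params = [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(X):
    """Dense(64, relu) -> Dense(32, relu) -> Dense(1, linear)."""
    h = X
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        h = z if i == len(params) - 1 else relu(z)   # last layer stays linear
    return h

X_batch = rng.normal(size=(5, n_features))           # 5 made-up, already-standardized rows
print(forward(X_batch).shape)                         # (5, 1)
print(sum(W.size + b.size for W, b in params))        # 2689
```

The parameter total (8·64 + 64 = 576, 64·32 + 32 = 2080, 32·1 + 1 = 33) should match what `model.summary()` reports for the Keras model if `X_train` indeed has 8 columns.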
