# 🔍 Hyperparameter Optimisation: Learning Rate & Batch Size  

## What is Hyperparameter Optimisation?  
Hyperparameters are settings that define how a machine learning model learns. Unlike model parameters, which are learned from data, hyperparameters must be set before training begins. Choosing the right hyperparameters is crucial for achieving good model performance.  

## Learning Rate (lr)  
The **learning rate** determines how much the model updates its weights with each step during training:  

- **Too small**: The model learns very slowly and may get stuck in a suboptimal solution.  
- **Too large**: The model's updates are too drastic, potentially leading to unstable training and failure to converge.  
- **Optimal**: A balance that allows the model to learn efficiently while avoiding instability.  
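
The three regimes above can be seen on a toy problem. The sketch below is an illustration only: it assumes a one-dimensional loss `f(w) = w**2`, and the function name `gradient_descent`, the starting point, and the three learning rates are all made up for this example rather than taken from the model used later.

```python
def gradient_descent(lr, steps=50, w0=5.0):
    """Minimise f(w) = w**2 (gradient 2*w) from w0 with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2.0 * w  # one gradient descent update
    return w


# Hypothetical learning rates chosen to show each regime
for lr in (0.001, 0.1, 1.1):
    final_w = gradient_descent(lr)
    print(f"lr={lr}: distance from the minimum after 50 steps = {abs(final_w):.4g}")
```

With the smallest rate the weight has barely moved from its starting value, with the middle rate it ends up very close to the minimum at `w = 0`, and with the largest rate each step overshoots so the weight moves further away every update.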

## Batch Size  
The **batch size** is the number of training examples used in one update (gradient descent step):  

- **Small batch sizes**: More frequent updates with higher variance, which can help models generalise but may slow training.  
- **Large batch sizes**: More stable updates and faster training, but they may lead to poorer generalisation to new data.  
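
To make "higher variance" concrete, the sketch below measures how much a mini-batch gradient fluctuates for different batch sizes. Everything in it is an assumption for illustration: the synthetic data `y = 3*x + noise`, the one-weight model `y_hat = w*x`, and the helper names `minibatch_gradient` and `gradient_spread`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: y = 3*x + noise, fitted by a one-weight model y_hat = w*x
x = rng.normal(size=10_000)
y = 3.0 * x + rng.normal(scale=0.5, size=x.size)
w = 0.0  # current (untrained) weight


def minibatch_gradient(batch_size):
    """Gradient of the mean squared error with respect to w on one random mini-batch."""
    idx = rng.choice(x.size, size=batch_size, replace=False)
    residual = w * x[idx] - y[idx]
    return 2.0 * np.mean(residual * x[idx])


def gradient_spread(batch_size, n_trials=500):
    """Standard deviation of the mini-batch gradient over many random batches."""
    return float(np.std([minibatch_gradient(batch_size) for _ in range(n_trials)]))


for bs in (8, 64, 512):
    print(f"batch_size={bs:4d}: gradient std = {gradient_spread(bs):.3f}")
```

The spread shrinks as the batch grows, which is why large batches give more stable updates while small batches give noisier ones.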


Today we will implement a grid search to find the best combination of learning rate and batch size! 🚀  


Run the cells below to set up the environment we will need for this week. We start by importing some libraries:

In [12]:
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import RegularGridInterpolator

First, let's download the data we will need:

In [None]:
! gdown https://drive.google.com/uc?id=1S3xBmi4BIs0h3tdLBIy5nqq_wIsshX1J -O interpolated_rmse.npz

## 👾 The '_Model_'

This week we will be using a model that I have already pretrained, which will save us some time. It does mean that we load it from a file in a slightly different way to what we are used to. Don't worry too much about this, as it is probably the only time you will see it in this module!

In [None]:
class NetA:
    def __init__(self):
        
        filename='interpolated_rmse.npz'
        data = np.load(filename)

        # Load interpolated RMSE data
        self.rmse_grid = data['RMSE']
        self.lr_grid = data['lr_grid']
        self.bs_grid = data['bs_grid']
        
        # Create interpolator
        self.interpolator = RegularGridInterpolator((self.lr_grid, self.bs_grid), self.rmse_grid.T, bounds_error=False, fill_value=np.inf)
        
        # Store min/max ranges
        self.lr_min, self.lr_max = np.min(self.lr_grid), np.max(self.lr_grid)
        self.bs_min, self.bs_max = np.min(self.bs_grid), np.max(self.bs_grid)
    
    def train(self, learning_rate, batch_size):
        # Check if within bounds
        if not (self.lr_min <= learning_rate <= self.lr_max and self.bs_min <= batch_size <= self.bs_max):
            return np.inf
        
        # Query interpolator
        return self.interpolator([[learning_rate, batch_size]])[0]
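
If you are curious how the lookup inside `NetA` behaves, here is a small standalone sketch of `RegularGridInterpolator` with the same `bounds_error=False, fill_value=np.inf` settings. The grid and values (`a_grid`, `b_grid`, `values`) are invented for this example and have nothing to do with the contents of `interpolated_rmse.npz`.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical 3x2 grid of values, standing in for the RMSE surface
a_grid = np.array([0.0, 1.0, 2.0])
b_grid = np.array([0.0, 10.0])
values = np.array([[0.0, 10.0],
                   [1.0, 11.0],
                   [2.0, 12.0]])  # shape (len(a_grid), len(b_grid))

interp = RegularGridInterpolator((a_grid, b_grid), values,
                                 bounds_error=False, fill_value=np.inf)

inside = interp([[0.5, 5.0]])[0]   # between grid points: linearly interpolated
outside = interp([[5.0, 5.0]])[0]  # outside the grid: fill_value is returned
print(f"inside: {inside}, outside: {outside}")
```

A query between grid points returns an interpolated value, while a query outside the grid returns `np.inf`, which matches what `train` returns for out-of-range hyperparameters.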

We still need to create the model in the same way as before, though!

In [None]:
model = NetA()

Now, we can train and test the model using a specific `lr` (learning rate) and `batch_size`. This will give us the loss, which we are trying to minimise!

In [None]:
# Test the model
lr = 0.001
batch_size = 32
loss = model.train(lr, batch_size)

print(f'Loss: {loss:.8f}')

## 🗺️ Grid Search!

Rather than guessing randomly at values for `lr` and `batch_size`, try to methodically explore the parameter space. Use the tools available in Python to help you!

## Why Use Grid Search for Hyperparameter Optimisation?  
Manually selecting hyperparameters can be inefficient and ineffective. A **grid search** systematically tests different combinations of hyperparameters to find the best configuration.  

### Steps to Conduct a Grid Search:  
1. **Define the hyperparameter ranges**: Choose a set of possible values for learning rate and batch size.  
2. **Train models for each combination**: Evaluate model performance for every (learning rate, batch size) pair.  
3. **Compare results**: Identify which combination achieves the best validation accuracy or lowest loss.  
4. **Select the optimal hyperparameters**: Use the best settings for final model training and evaluation.  
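
The four steps above can be sketched on a stand-in problem. This is a minimal illustration, not the solution to the exercise: `toy_loss` is a hypothetical function invented here so the example runs on its own, and the candidate values are arbitrary. For the exercise you would call `model.train(lr, batch_size)` in its place and choose your own ranges.

```python
import itertools

import numpy as np


def toy_loss(learning_rate, batch_size):
    """Hypothetical stand-in for a training run; smallest at lr=1e-3, batch_size=64."""
    return (np.log10(learning_rate) + 3.0) ** 2 + (np.log2(batch_size) - 6.0) ** 2


# Step 1: define the hyperparameter ranges (log-spaced values are a common choice)
learning_rates = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
batch_sizes = [8, 16, 32, 64, 128]

# Step 2: evaluate every (learning rate, batch size) pair
results = {
    (lr, bs): toy_loss(lr, bs)
    for lr, bs in itertools.product(learning_rates, batch_sizes)
}

# Steps 3 and 4: compare the results and keep the pair with the lowest loss
best_lr, best_bs = min(results, key=results.get)
best_loss = results[(best_lr, best_bs)]
print(f"Best: lr={best_lr}, batch_size={best_bs}, loss={best_loss:.4f}")
```

Storing every result in a dictionary, rather than only tracking the best one, also makes it easy to plot the whole loss surface afterwards.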


In [None]:
# Your code here...