## Regularization

- Penalizes memorization of training examples
- Penalizes the complexity of the solution and forces solutions to be smooth
- Helps the model generalize to new data (what is the effect on training accuracy?)
- Changes the learned representation (feature space)
- Works better with larger models with multiple hidden layers; can be counterproductive with smaller models
- Works better with sufficient data

### Three kinds of regularization 

- Node regularization 
- Loss regularization 
- Data regularization

#### Node regularization: Dropout

- In essence we force the activation of some nodes at random to zero 
- We use a fixed probability of dropout per node (not collectively per layer)
- No dropout happens during testing
- Makes the model less reliant on individual nodes, i.e. forces a distributed representation across nodes
- Prevents a single node from learning too much 
- Works better with deeper networks
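
To make the train/test distinction concrete, here is a minimal sketch using `F.dropout` on a vector of ones (the vector length, the seed, and `p = 0.5` are arbitrary choices for illustration). In training mode PyTorch zeroes each element independently with probability `p` and scales the survivors by `1/(1-p)`, so the expected activation stays the same; in evaluation mode the input passes through unchanged.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

p = 0.5               # dropout probability (illustrative value)
x = torch.ones(10)

# training mode: each element is dropped independently with probability p
y_train = F.dropout(x, p=p, training=True)

# evaluation mode: dropout is a no-op
y_eval = F.dropout(x, p=p, training=False)

print(y_train)  # a mix of 0s and 1/(1-p) = 2s
print(y_eval)   # all ones
```

The rescaling is why nothing needs to be adjusted by hand when switching from training to testing.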



In [None]:
# import libraries
import torch
import torch.nn as nn
import torch.nn.functional as F

import numpy as np
import matplotlib.pyplot as plt
import matplotlib_inline.backend_inline

matplotlib_inline.backend_inline.set_matplotlib_formats("svg")

from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader


In [None]:
# import dataset
import seaborn as sns
iris = sns.load_dataset("iris")


# convert from pandas dataframe to tensor
data = torch.tensor( iris[iris.columns[0:4]].values ).float()

# transform species to number
labels = torch.zeros(len(data), dtype=torch.long)
# labels[iris.species=='setosa'] = 0 # don't need!
labels[iris.species=='versicolor'] = 1
labels[iris.species=='virginica'] = 2

# Separate the data into DataLoaders

In [None]:
# use scikit-learn to split the data
train_data,test_data, train_labels,test_labels = train_test_split(data, labels, test_size=.2)


# then convert them into PyTorch Datasets (note: already converted to tensors)
train_data = torch.utils.data.TensorDataset(train_data,train_labels)
test_data  = torch.utils.data.TensorDataset(test_data,test_labels)


# finally, translate into dataloader objects
batchsize    = 16
train_loader = DataLoader(train_data,batch_size=batchsize,shuffle=True)
test_loader  = DataLoader(test_data,batch_size=test_data.tensors[0].shape[0])

# Create the model and a training regimen

In [None]:
# A new way to create the model (OOP style)
class OurCustomModelClass(nn.Module):
  def __init__(self,dropoutRate):
    super().__init__()

    # layers
    self.input  = nn.Linear( 4,12)
    self.hidden = nn.Linear(12,12)
    self.output = nn.Linear(12, 3)

    # define a dropout rate parameter
    self.dr = dropoutRate

  # forward pass
  def forward(self,x):

    # input
    x = F.relu( self.input(x) ) # functional form F.relu, instead of the nn.ReLU() module used before
    x = F.dropout(x,p=self.dr,training=self.training) # self.training is False after .eval(), which switches dropout off

    # hidden
    x = F.relu( self.hidden(x) )
    x = F.dropout(x,p=self.dr,training=self.training)

    # output
    x = self.output(x)
    return x


# The equivalent code using the nn.Sequential class
# model = nn.Sequential(
#     nn.Linear(4, 12),
#     nn.ReLU(),
#     nn.Dropout(p=dropoutRate),
#     nn.Linear(12, 12),
#     nn.ReLU(), 
#     nn.Dropout(p=dropoutRate),
#     nn.Linear(12, 3)
# )

In [None]:
def createANewModel(dropoutrate):

  # grab an instance of the model class
  ANNiris = OurCustomModelClass(dropoutrate)

  # loss function
  lossfun = nn.CrossEntropyLoss()

  # optimizer
  optimizer = torch.optim.SGD(ANNiris.parameters(),lr=.005)

  return ANNiris,lossfun,optimizer

In [None]:
# train the model

# global parameter
numepochs = 500

def trainTheModel():

  # initialize accuracies as empties (not storing losses here)
  trainAcc = []
  testAcc  = []

  # loop over epochs
  for epochi in range(numepochs):

    # switch learning on
    ANNiris.train()

    # loop over training data batches
    batchAcc = []
    for X,y in train_loader:

      # forward pass and loss
      yHat = ANNiris(X)
      loss = lossfun(yHat,y)

      # backprop
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

      # compute training accuracy just for this batch
      batchAcc.append( 100*torch.mean((torch.argmax(yHat,axis=1) == y).float()).item() )
    # end of batch loop...

    # now that we've trained through the batches, get their average training accuracy
    trainAcc.append( np.mean(batchAcc) )

    # test accuracy
    ANNiris.eval() # very important: switches dropout off, otherwise we evaluate a randomly thinned model
    X,y = next(iter(test_loader)) # extract X,y from test dataloader
    predlabels = torch.argmax( ANNiris(X),axis=1 )
    testAcc.append( 100*torch.mean((predlabels == y).float()).item() )

  # function output
  return trainAcc,testAcc


In [None]:
# create a model
dropoutrate = .0
ANNiris,lossfun,optimizer = createANewModel(dropoutrate)

# train the model
trainAcc,testAcc = trainTheModel()

In [None]:
# plot the results
fig = plt.figure(figsize=(10,5))

# Colors
bg_color = '#24273a'
text_color = '#cad3f5'
train_color = '#8aadf4'  # Blue
test_color = '#f5a97f'   # Peach

fig.patch.set_facecolor(bg_color)
ax = plt.gca()
ax.set_facecolor(bg_color)

plt.plot(trainAcc, 'o-', color=train_color, linewidth=2, markersize=6, label='Train')
plt.plot(testAcc, 'o-', color=test_color, linewidth=2, markersize=6, label='Test')

plt.xlabel('Epochs', color=text_color, fontsize=12)
plt.ylabel('Accuracy (%)', color=text_color, fontsize=12)
plt.title('Dropout rate = %g'%dropoutrate, color=text_color, fontsize=14, fontweight='bold')

# Style the axes
ax.tick_params(colors=text_color)
ax.spines['bottom'].set_color(text_color)
ax.spines['top'].set_color(text_color)
ax.spines['right'].set_color(text_color)
ax.spines['left'].set_color(text_color)

# Style the legend
legend = plt.legend(facecolor=bg_color, edgecolor=text_color)
legend.get_frame().set_alpha(0.8)
for text in legend.get_texts():
    text.set_color(text_color)

plt.tight_layout()
plt.show()

In [None]:
# run an experiment

dropoutRates = np.arange(10)/10
results = np.zeros((len(dropoutRates),2))

for di in range(len(dropoutRates)):

  # create and train the model
  ANNiris,lossfun,optimizer = createANewModel(dropoutRates[di])
  trainAcc,testAcc = trainTheModel()

  # store accuracies
  results[di,0] = np.mean(trainAcc[-50:])
  results[di,1] = np.mean(testAcc[-50:])


In [None]:
# plot the experiment results
fig,ax = plt.subplots(1,2,figsize=(15,5))

# Catppuccin Macchiato color scheme
bg_color = '#24273a'
text_color = '#cad3f5'
train_color = '#8aadf4'  # Blue
test_color = '#f5a97f'   # Peach
grid_color = '#5b6078'   # Surface2

# Set figure and subplot backgrounds
fig.patch.set_facecolor(bg_color)
for axis in ax:
    axis.set_facecolor(bg_color)

# Left subplot: Dropout rates vs accuracy
ax[0].plot(dropoutRates, results[:,0], 'o-', color=train_color, linewidth=2, markersize=6, label='Train')
ax[0].plot(dropoutRates, results[:,1], 'o-', color=test_color, linewidth=2, markersize=6, label='Test')
ax[0].set_xlabel('Dropout proportion', color=text_color, fontsize=12)
ax[0].set_ylabel('Average accuracy', color=text_color, fontsize=12)

# Style the left subplot
ax[0].tick_params(colors=text_color)
for spine in ax[0].spines.values():
    spine.set_color(text_color)
legend1 = ax[0].legend(facecolor=bg_color, edgecolor=text_color)
legend1.get_frame().set_alpha(0.8)
for text in legend1.get_texts():
    text.set_color(text_color)

# Right subplot: Train-test difference
ax[1].plot(dropoutRates, -np.diff(results,axis=1), 'o-', color='#a6da95', linewidth=2, markersize=6)  # Green
ax[1].plot([0,.9], [0,0], '--', color=grid_color, linewidth=1.5, alpha=0.8)
ax[1].set_xlabel('Dropout proportion', color=text_color, fontsize=12)
ax[1].set_ylabel('Train-test difference (acc%)', color=text_color, fontsize=12)

# Style the right subplot
ax[1].tick_params(colors=text_color)
for spine in ax[1].spines.values():
    spine.set_color(text_color)

plt.tight_layout()
plt.show()

What happens if you increase the complexity of this model, for example by adding several additional (and wider) hidden layers?

### Loss regularization: L1/L2 regularization

- The general idea is to add a penalty term to the cost function

# $$\text{Cost}_{\text{regularized}} = \text{Cost}_{\text{original}}(W,b) + \lambda \cdot \text{RegularizationTerm}$$

- The regularization strength λ (lambda) controls the trade-off between fitting the training data well and keeping the weights small. Higher λ values lead to stronger regularization and simpler models.

There are two common types of penalties

#### L2 regularization - weight decay - ridge regularization

# $$\text{Cost}_{L2} = \text{Cost}_{\text{original}}(W,b) + \lambda \sum_{i=1}^{n} w_i^2$$

- L2 regularization adds a penalty term that is proportional to the sum of the squared weights. 
- This encourages the model to keep weights small (why, and who cares?)
- The L2 penalty term grows quadratically with the weight values, so large weights are penalized much more than small ones. It shrinks every weight toward zero in proportion to its size, without making weights exactly zero.
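
The name "weight decay" can be seen in a single optimizer step. The sketch below relies on the `weight_decay` option of `torch.optim.SGD`, which adds `weight_decay * w` to each gradient; the values of `lr`, `lam`, and the weights are arbitrary. With a dummy loss whose gradient is zero, the only thing the step does is multiply every weight by `1 - lr*lam`.

```python
import torch

w = torch.nn.Parameter(torch.tensor([2.0, -1.0, 0.5]))
lr, lam = 0.1, 0.5   # illustrative values

optimizer = torch.optim.SGD([w], lr=lr, weight_decay=lam)

# a dummy loss with zero gradient, so only the decay term acts
loss = (0.0 * w).sum()

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(w.data)  # each weight multiplied by (1 - lr*lam) = 0.95
```

Note that adding `weight_decay * w` to the gradient corresponds to a penalty of `(weight_decay/2) * sum(w**2)`, so `weight_decay` differs from the λ in the L2 cost equation by a factor of 2.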

#### L1 regularization - lasso regularization

# $$\text{Cost}_{L1} = \text{Cost}_{\text{original}}(W,b) + \lambda \sum_{i=1}^{n} |w_i|$$
 
- L1 regularization adds a penalty term that is proportional to the sum of the absolute values of weights.
- This encourages the model to keep weights small and can lead to sparse models (some weights become exactly zero).
- The L1 penalty term grows linearly with the weight values, so the pull toward zero is equally strong for small and large weights. This tends to drive some weights to exactly zero, effectively performing feature selection.
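
One way to see why L1 tends to produce exact zeros while L2 does not is to compare the gradients of the two penalty terms. A minimal sketch using autograd (the weight values are arbitrary): the L2 gradient is `2*w`, so the push toward zero fades as a weight gets small, whereas the L1 gradient is `sign(w)`, a constant-size push that does not fade.

```python
import torch

w = torch.tensor([3.0, 0.1, -0.01], requires_grad=True)

# gradient of the L2 penalty sum(w^2)
(grad_l2,) = torch.autograd.grad(torch.sum(w**2), w)

# gradient of the L1 penalty sum(|w|)
(grad_l1,) = torch.autograd.grad(torch.sum(torch.abs(w)), w)

print(grad_l2)  # 2*w
print(grad_l1)  # sign(w)
```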

 


In [None]:
def createANewModel(dropoutrate, L2lambda):

  # grab an instance of the model class
  ANNiris = OurCustomModelClass(dropoutrate)

  # loss function
  lossfun = nn.CrossEntropyLoss()

  # optimizer
  optimizer = torch.optim.SGD(ANNiris.parameters(),lr=.005, weight_decay=L2lambda)

  return ANNiris,lossfun,optimizer

In [None]:
# train the model

# global parameter
numepochs = 1000

def trainTheModel():

  # initialize accuracies as empties
  trainAcc = []
  testAcc  = []
  losses   = []

  # loop over epochs
  for epochi in range(numepochs):

    # no need to switch to train mode here: a new model starts in train mode, and we switch back at the end of each epoch

    # loop over training data batches
    batchAcc  = []
    batchLoss = []
    for X,y in train_loader:

      # forward pass and loss
      yHat = ANNiris(X)
      loss = lossfun(yHat,y)

      # backprop
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

      # compute training accuracy just for this batch
      batchAcc.append( 100*torch.mean((torch.argmax(yHat,axis=1) == y).float()).item() )
      batchLoss.append( loss.item() )
    # end of batch loop...

    # now that we've trained through the batches, get their average training accuracy
    trainAcc.append( np.mean(batchAcc) )
    losses.append( np.mean(batchLoss) )

    # test accuracy
    ANNiris.eval()
    X,y = next(iter(test_loader)) # extract X,y from test dataloader
    predlabels = torch.argmax( ANNiris(X),axis=1 )
    testAcc.append( 100*torch.mean((predlabels == y).float()).item() )

    # switch back to train mode for the next epoch
    ANNiris.train()

  # function output
  return trainAcc,testAcc,losses


In [None]:
# create a model
dropoutrate = .0
L2lambda = .01
ANNiris,lossfun,optimizer = createANewModel(dropoutrate, L2lambda)

# train the model
trainAcc,testAcc,losses = trainTheModel()


In [None]:
# plot the results
fig,ax = plt.subplots(1,2,figsize=(15,5))

# Catppuccin Macchiato color scheme
bg_color = '#24273a'
text_color = '#cad3f5'
loss_color = '#f4dbd6'    # Rosewater
train_color = '#8aadf4'   # Blue
test_color = '#f5a97f'    # Peach

# Set figure and subplot backgrounds
fig.patch.set_facecolor(bg_color)
for axis in ax:
    axis.set_facecolor(bg_color)

# Left subplot: Loss
ax[0].plot(losses, '^-', color=loss_color, linewidth=2, markersize=6)
ax[0].set_ylabel('Loss', color=text_color, fontsize=12)
ax[0].set_xlabel('Epochs', color=text_color, fontsize=12)
ax[0].set_title(r'Losses with L2 $\lambda$=' + str(L2lambda), color=text_color, fontsize=14, fontweight='bold')

# Style the left subplot
ax[0].tick_params(colors=text_color)
for spine in ax[0].spines.values():
    spine.set_color(text_color)

# Right subplot: Accuracy
ax[1].plot(trainAcc, 'o-', color=train_color, linewidth=2, markersize=6, label='Train')
ax[1].plot(testAcc, 'o-', color=test_color, linewidth=2, markersize=6, label='Test')
ax[1].set_title(r'Accuracy with L2 $\lambda$=' + str(L2lambda), color=text_color, fontsize=14, fontweight='bold')
ax[1].set_xlabel('Epochs', color=text_color, fontsize=12)
ax[1].set_ylabel('Accuracy (%)', color=text_color, fontsize=12)

# Style the right subplot
ax[1].tick_params(colors=text_color)
for spine in ax[1].spines.values():
    spine.set_color(text_color)
legend = ax[1].legend(facecolor=bg_color, edgecolor=text_color)
legend.get_frame().set_alpha(0.8)
for text in legend.get_texts():
    text.set_color(text_color)

plt.tight_layout()
plt.show()

In [None]:
# create a 1D smoothing filter
def smooth(x,k):
  return np.convolve(x,np.ones(k)/k,mode='same')
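
A quick check of what `smooth` does, on a made-up constant signal. Because `mode='same'` implicitly pads the signal with zeros, the first and last few points of the smoothed curve are pulled toward zero; this is worth keeping in mind when reading the very start and end of the smoothed accuracy curves.

```python
import numpy as np

def smooth(x, k):
  # moving average with a length-k boxcar kernel
  return np.convolve(x, np.ones(k)/k, mode='same')

x = np.full(6, 90.0)   # made-up constant "accuracy" trace
print(smooth(x, 3))    # interior points stay at 90, the two end points drop to 60
```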

## Parametric experiment to test a range of L2 regularization terms

In [None]:
# range of L2 regularization amounts
l2lambdas = np.linspace(0,.1,10)

# initialize output results matrices
accuracyResultsTrain = np.zeros((numepochs,len(l2lambdas)))
accuracyResultsTest  = np.zeros((numepochs,len(l2lambdas)))


# loop over L2 regularization amounts
for li in range(len(l2lambdas)):

  # create and train a model
  ANNiris,lossfun,optimizer = createANewModel(dropoutrate=0,  L2lambda=l2lambdas[li])
  trainAcc,testAcc,losses = trainTheModel()

  # store data
  accuracyResultsTrain[:,li] = smooth(trainAcc,10)
  accuracyResultsTest[:,li]  = smooth(testAcc,10)

In [None]:
# plot some results
fig,ax = plt.subplots(1,2,figsize=(17,7))

# Catppuccin Macchiato color scheme
bg_color = '#24273a'
text_color = '#cad3f5'
grid_color = '#5b6078'  # Surface2
# Generate a range of colors from the Catppuccin palette
colors = ['#8aadf4', '#f5a97f', '#a6da95', '#eed49f', '#f4dbd6', '#c6a0f6', '#ed8796', '#91d7e3', '#7dc4e4', '#8bd5ca']

# Set figure and subplot backgrounds
fig.patch.set_facecolor(bg_color)
for axis in ax:
    axis.set_facecolor(bg_color)

# Plot with different colors for each L2 lambda value
for i in range(accuracyResultsTrain.shape[1]):
    ax[0].plot(accuracyResultsTrain[:,i], color=colors[i % len(colors)], linewidth=2)
    ax[1].plot(accuracyResultsTest[:,i], color=colors[i % len(colors)], linewidth=2)

ax[0].set_title('Train accuracy', color=text_color, fontsize=14, fontweight='bold')
ax[1].set_title('Test accuracy', color=text_color, fontsize=14, fontweight='bold')

# make the legend easier to read
leglabels = [np.round(i,2) for i in l2lambdas]

# common features
for i in range(2):
    # Style the legend
    legend = ax[i].legend(leglabels, facecolor=bg_color, edgecolor=text_color)
    legend.get_frame().set_alpha(0.8)
    for text in legend.get_texts():
        text.set_color(text_color)
    
    # Style labels and axes
    ax[i].set_xlabel('Epoch', color=text_color, fontsize=12)
    ax[i].set_ylabel('Accuracy (%)', color=text_color, fontsize=12)
    ax[i].set_ylim([50,101])
    
    # Style grid and ticks
    ax[i].grid(True, color=grid_color, alpha=0.3, linestyle='-', linewidth=0.5)
    ax[i].tick_params(colors=text_color)
    
    # Style spines
    for spine in ax[i].spines.values():
        spine.set_color(text_color)

plt.tight_layout()
plt.show()

In [None]:
# show average accuracy by L2 rate

# average only some epochs
epoch_range = [500,950]

# Catppuccin Macchiato color scheme
bg_color = '#24273a'
text_color = '#cad3f5'
train_color = '#8aadf4'  # Blue
test_color = '#f5a97f'   # Peach

# Set figure background
fig = plt.figure(figsize=(10,6))
fig.patch.set_facecolor(bg_color)
ax = plt.gca()
ax.set_facecolor(bg_color)

plt.plot(l2lambdas,
         np.mean(accuracyResultsTrain[epoch_range[0]:epoch_range[1],:],axis=0),
         'o-', color=train_color, linewidth=2, markersize=8, label='TRAIN')

plt.plot(l2lambdas,
         np.mean(accuracyResultsTest[epoch_range[0]:epoch_range[1],:],axis=0),
         'o-', color=test_color, linewidth=2, markersize=8, label='TEST')

plt.xlabel('L2 regularization amount', color=text_color, fontsize=12)
plt.ylabel('Accuracy', color=text_color, fontsize=12)

# Style the axes
ax.tick_params(colors=text_color)
for spine in ax.spines.values():
    spine.set_color(text_color)

# Style the legend
legend = plt.legend(facecolor=bg_color, edgecolor=text_color)
legend.get_frame().set_alpha(0.8)
for text in legend.get_texts():
    text.set_color(text_color)

plt.tight_layout()
plt.show()

### L1 Regularization

In [None]:
# a function that creates the ANN model

def createANewModel():

  # model architecture
  ANNiris = nn.Sequential(
      nn.Linear(4,64),   # input layer
      nn.ReLU(),         # activation unit
      nn.Linear(64,64),  # hidden layer
      nn.ReLU(),         # activation unit
      nn.Linear(64,3),   # output units
        )

  # loss function
  lossfun = nn.CrossEntropyLoss()

  # optimizer
  optimizer = torch.optim.SGD(ANNiris.parameters(),lr=.005)

  return ANNiris,lossfun,optimizer

In [None]:
# train the model

# global parameter
numepochs = 1000

def trainTheModel(L1lambda):

  # initialize accuracies as empties
  trainAcc = []
  testAcc  = []
  losses   = []

  # count the total number of weights in the model
  nweights = 0
  for pname,weight in ANNiris.named_parameters():
    if 'bias' not in pname:
      nweights = nweights + weight.numel()


  # loop over epochs
  for epochi in range(numepochs):

    # loop over training data batches
    batchAcc  = []
    batchLoss = []
    for X,y in train_loader:

      # forward pass and loss
      yHat = ANNiris(X)
      loss = lossfun(yHat,y)



      ### add L1 term
      L1_term = torch.tensor(0.,requires_grad=True)

      # sum up all abs(weights)
      for pname,weight in ANNiris.named_parameters():
        if 'bias' not in pname:
           L1_term = L1_term + torch.sum(torch.abs(weight))

      # add to loss term
      loss = loss + L1lambda*L1_term/nweights



      # backprop
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

      # compute training accuracy just for this batch
      batchAcc.append( 100*torch.mean((torch.argmax(yHat,axis=1) == y).float()).item() )
      batchLoss.append( loss.item() )
    # end of batch loop...

    # now that we've trained through the batches, get their average training accuracy
    trainAcc.append( np.mean(batchAcc) )
    losses.append( np.mean(batchLoss) )

    # test accuracy
    X,y = next(iter(test_loader)) # extract X,y from test dataloader
    predlabels = torch.argmax( ANNiris(X),axis=1 )
    testAcc.append( 100*torch.mean((predlabels == y).float()).item() )

  # function output
  return trainAcc,testAcc,losses
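
The training function divides the L1 term by the number of weights, so the penalty is the mean absolute weight scaled by `L1lambda`. As a rough sketch of what that implies, the snippet below rebuilds the same 4-64-64-3 architecture (the name `net` is only for this illustration) and counts its weights the same way.

```python
import torch.nn as nn

# same architecture as createANewModel() in this section
net = nn.Sequential(
    nn.Linear(4,64), nn.ReLU(),
    nn.Linear(64,64), nn.ReLU(),
    nn.Linear(64,3),
)

# count weights, skipping biases (mirrors the loop in trainTheModel)
nweights = sum(p.numel() for name,p in net.named_parameters() if 'bias' not in name)
print(nweights)  # 4*64 + 64*64 + 64*3 = 4544

L1lambda = .001
print(L1lambda / nweights)  # effective coefficient on each |w|
```

Normalizing this way keeps `L1lambda` comparable across models of different sizes, but the effective coefficient on each individual weight is much smaller than `L1lambda` itself.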


In [None]:
# create a model
ANNiris,lossfun,optimizer = createANewModel()

# train the model
L1lambda = .001
trainAcc,testAcc,losses = trainTheModel(L1lambda)

In [None]:
# plot the results
fig,ax = plt.subplots(1,2,figsize=(15,5))

# Catppuccin Macchiato color scheme
bg_color = '#24273a'
text_color = '#cad3f5'
loss_color = '#f4dbd6'    # Rosewater
train_color = '#8aadf4'   # Blue
test_color = '#f5a97f'    # Peach

# Set figure and subplot backgrounds
fig.patch.set_facecolor(bg_color)
for axis in ax:
    axis.set_facecolor(bg_color)

# Left subplot: Loss
ax[0].plot(losses, '^-', color=loss_color, linewidth=2, markersize=6)
ax[0].set_ylabel('Loss', color=text_color, fontsize=12)
ax[0].set_xlabel('Epochs', color=text_color, fontsize=12)
ax[0].set_title(r'Losses with L1 $\lambda$=' + str(L1lambda), color=text_color, fontsize=14, fontweight='bold')

# Style the left subplot
ax[0].tick_params(colors=text_color)
for spine in ax[0].spines.values():
    spine.set_color(text_color)

# Right subplot: Accuracy
ax[1].plot(trainAcc, 'o-', color=train_color, linewidth=2, markersize=6, label='Train')
ax[1].plot(testAcc, 'o-', color=test_color, linewidth=2, markersize=6, label='Test')
ax[1].set_title(r'Accuracy with L1 $\lambda$=' + str(L1lambda), color=text_color, fontsize=14, fontweight='bold')
ax[1].set_xlabel('Epochs', color=text_color, fontsize=12)
ax[1].set_ylabel('Accuracy (%)', color=text_color, fontsize=12)

# Style the right subplot
ax[1].tick_params(colors=text_color)
for spine in ax[1].spines.values():
    spine.set_color(text_color)
legend = ax[1].legend(facecolor=bg_color, edgecolor=text_color)
legend.get_frame().set_alpha(0.8)
for text in legend.get_texts():
    text.set_color(text_color)

plt.tight_layout()
plt.show()

In [None]:
# create a 1D smoothing filter
def smooth(x,k):
  return np.convolve(x,np.ones(k)/k,mode='same')

## Parametric experiment to test a range of L1 regularization terms

In [None]:
# range of L1 regularization amounts
L1lambda = np.linspace(0,.005,10)

# initialize output results matrices
accuracyResultsTrain = np.zeros((numepochs,len(L1lambda)))
accuracyResultsTest  = np.zeros((numepochs,len(L1lambda)))


# loop over L1 regularization amounts
for li in range(len(L1lambda)):

  # create and train a model
  ANNiris,lossfun,optimizer = createANewModel()
  trainAcc,testAcc,losses = trainTheModel(L1lambda[li])

  # store data
  accuracyResultsTrain[:,li] = smooth(trainAcc,10)
  accuracyResultsTest[:,li]  = smooth(testAcc,10)

In [None]:
# plot some results
fig,ax = plt.subplots(1,2,figsize=(17,7))

# Catppuccin Macchiato color scheme
bg_color = '#24273a'
text_color = '#cad3f5'
grid_color = '#5b6078'  # Surface2
# Generate a range of colors from the Catppuccin palette
colors = ['#8aadf4', '#f5a97f', '#a6da95', '#eed49f', '#f4dbd6', '#c6a0f6', '#ed8796', '#91d7e3', '#7dc4e4', '#8bd5ca']

# Set figure and subplot backgrounds
fig.patch.set_facecolor(bg_color)
for axis in ax:
    axis.set_facecolor(bg_color)

# Plot with different colors for each L1 lambda value
for i in range(accuracyResultsTrain.shape[1]):
    ax[0].plot(accuracyResultsTrain[:,i], color=colors[i % len(colors)], linewidth=2)
    ax[1].plot(accuracyResultsTest[:,i], color=colors[i % len(colors)], linewidth=2)

ax[0].set_title('Train accuracy', color=text_color, fontsize=14, fontweight='bold')
ax[1].set_title('Test accuracy', color=text_color, fontsize=14, fontweight='bold')

# make the legend easier to read
leglabels = [np.round(i,4) for i in L1lambda]

# common features
for i in range(2):
    # Style the legend
    legend = ax[i].legend(leglabels, facecolor=bg_color, edgecolor=text_color)
    legend.get_frame().set_alpha(0.8)
    for text in legend.get_texts():
        text.set_color(text_color)
    
    # Style labels and axes
    ax[i].set_xlabel('Epoch', color=text_color, fontsize=12)
    ax[i].set_ylabel('Accuracy (%)', color=text_color, fontsize=12)
    ax[i].set_ylim([50,101])
    
    # Style grid and ticks
    ax[i].grid(True, color=grid_color, alpha=0.3, linestyle='-', linewidth=0.5)
    ax[i].tick_params(colors=text_color)
    
    # Style spines
    for spine in ax[i].spines.values():
        spine.set_color(text_color)

plt.tight_layout()
plt.show()

In [None]:
# show average accuracy by L1 rate

# average only some epochs
epoch_range = [500,950]

# Catppuccin Macchiato color scheme
bg_color = '#24273a'
text_color = '#cad3f5'
train_color = '#8aadf4'  # Blue
test_color = '#f5a97f'   # Peach

# Set figure background
fig = plt.figure(figsize=(10,6))
fig.patch.set_facecolor(bg_color)
ax = plt.gca()
ax.set_facecolor(bg_color)

plt.plot(L1lambda,
         np.mean(accuracyResultsTrain[epoch_range[0]:epoch_range[1],:],axis=0),
         'o-', color=train_color, linewidth=2, markersize=8, label='TRAIN')

plt.plot(L1lambda,
         np.mean(accuracyResultsTest[epoch_range[0]:epoch_range[1],:],axis=0),
         'o-', color=test_color, linewidth=2, markersize=8, label='TEST')

plt.xlabel('L1 regularization amount', color=text_color, fontsize=12)
plt.ylabel('Accuracy', color=text_color, fontsize=12)

# Style the axes
ax.tick_params(colors=text_color)
for spine in ax.spines.values():
    spine.set_color(text_color)

# Style the legend
legend = plt.legend(facecolor=bg_color, edgecolor=text_color)
legend.get_frame().set_alpha(0.8)
for text in legend.get_texts():
    text.set_color(text_color)

plt.tight_layout()
plt.show()

# Additional explorations

1) Can you modify the code here to create a manual L2 regularizer?

2) Can you take a crack at implementing elastic net regularization? Please see the equation below.

3) In the equation I provided, I specified separate parameters for the L1 and L2 penalties. What if we would like to control the overall (combined) regularization strength but keep control over the balance between L1 and L2? Can you think of a way to modify the equation to accommodate this?

#### Elastic Net regularization
 
# $$\text{Cost}_{\text{ElasticNet}} = \text{Cost}_{\text{original}}(W,b) + \lambda_1 \sum_{i=1}^{n} |w_i| + \lambda_2 \sum_{i=1}^{n} w_i^2$$
 
- Elastic Net regularization combines both L1 and L2 penalties in a single regularization term.
- This provides a balance between the sparsity-inducing properties of L1 (feature selection) and the weight shrinkage properties of L2.
- The relative importance of L1 vs L2 regularization is controlled by the ratio of λ₁ and λ₂ parameters.
- Elastic Net is particularly useful when dealing with correlated features, as it tends to select groups of correlated features together. 
