# Lecture 06

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython import display

For this in-class exercise we will experiment with optimizers that are commonly used in deep learning systems. For simplicity and visualization purposes, we will use a toy 1D function (one parameter). Note, however, that in practice the functions being optimized have millions of parameters, are very noisy, and can change each iteration. Use this assignment to gain some intuition about the optimizers and their parameters, but take any conclusions with a grain of salt.

We will perform numerical optimization on the following 1D function:

$$f(w) = w^4 - 21w^3 + 151w^2 - 435w + 550$$

In [None]:
def f(w):
    return 1*w**4 - 21*w**3 + 151*w**2 - 435*w + 550
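Because `f` is a quartic polynomial, its stationary points can be found exactly by solving `f'(w) = 4w^3 - 63w^2 + 302w - 435 = 0`. That gives a reference for judging whether an optimizer stopped at a local minimum or the global one. The following is a minimal side check using NumPy's polynomial helpers (`np.polyder`, `np.roots`, `np.polyval`). It is separate from the assignment's torch code.

```python
import numpy as np

# Coefficients of f(w) = w^4 - 21w^3 + 151w^2 - 435w + 550, highest degree first
f_coeffs = np.array([1, -21, 151, -435, 550])

# f'(w) = 4w^3 - 63w^2 + 302w - 435 and f''(w) = 12w^2 - 126w + 302
df_coeffs = np.polyder(f_coeffs)
d2f_coeffs = np.polyder(df_coeffs)

# Stationary points are the roots of f'(w); all three are real for this f
stationary = np.sort(np.roots(df_coeffs).real)

# Second-derivative test: f''(w) > 0 means a local minimum, f''(w) < 0 a local maximum
for w_star in stationary:
    kind = "min" if np.polyval(d2f_coeffs, w_star) > 0 else "max"
    print(f"w = {w_star:.4f}  f(w) = {np.polyval(f_coeffs, w_star):.4f}  ({kind})")
```

Compare the printed `f(w)` values at the two minima to see which one is global.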

The `utils.py` file provides two functions. `optimize` runs an optimization loop for a given function and optimizer. `save_gif` saves a GIF of the optimization trajectory.
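The source of `utils.py` is not shown here, so details such as the number of iterations or any stopping rule are unknown. The sketch below is only a mental model. It shows the standard PyTorch loop (zero the gradients, evaluate, backpropagate, step) that a helper called as `optimize(f, optimizer)` plausibly runs. `optimize_sketch` and its `n_steps=100` default are hypothetical, not the provided implementation.

```python
import torch

def optimize_sketch(f, optimizer, n_steps=100):
    """Hypothetical stand-in for utils.optimize (the real helper may differ).

    Assumes the optimizer was built over a single 1-element parameter w.
    Returns the visited parameter values and the function values at those points.
    """
    w = optimizer.param_groups[0]["params"][0]  # the parameter being optimized
    ws, fs = [], []
    for _ in range(n_steps):
        optimizer.zero_grad()  # clear gradients left over from the previous step
        loss = f(w)            # evaluate f at the current w (a 1-element tensor)
        loss.backward()        # compute df/dw and store it in w.grad
        ws.append(w.item())
        fs.append(loss.item())
        optimizer.step()       # update w with the optimizer's rule
    return ws, fs

# Usage on the lecture's function, starting from w = 1 like the cells below
def f(w):
    return 1*w**4 - 21*w**3 + 151*w**2 - 435*w + 550

w = torch.nn.Parameter(torch.ones(1))
ws, fs = optimize_sketch(f, torch.optim.SGD([w], lr=1e-3))
```

The same loop works unchanged for `torch.optim.Adam` or SGD with `momentum`, because only `optimizer.step()` depends on the chosen optimizer.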

In [None]:
from utils import optimize, save_gif
help(optimize)
help(save_gif)

First, perform optimization using the stochastic gradient descent (SGD) optimizer. For SGD the only parameter to tune is the learning rate (`lr`). Experiment with different learning rate values below. 
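With one parameter, an SGD step is `w <- w - lr * f'(w)`. This toy objective has no sampling noise, so the update here is really plain gradient descent. The plain-Python sketch below applies that rule with the analytic derivative to show how `lr` changes the trajectory. `df` and `gd_trajectory` are illustrative names. The early stop when `|w|` exceeds `1e6` is an arbitrary divergence threshold chosen for this sketch.

```python
def df(w):
    # Analytic derivative: f'(w) = 4w^3 - 63w^2 + 302w - 435
    return 4*w**3 - 63*w**2 + 302*w - 435

def gd_trajectory(lr, w0=1.0, n_steps=100):
    """Plain gradient descent, w <- w - lr * f'(w); returns the list of iterates."""
    ws = [w0]
    w = w0
    for _ in range(n_steps):
        w = w - lr * df(w)
        ws.append(w)
        if abs(w) > 1e6:  # iterates are blowing up: treat as diverged and stop early
            break
    return ws

# Compare a few step sizes from the same starting point w0 = 1
for lr in (1e-4, 1e-3, 1e-2, 5e-2):
    ws = gd_trajectory(lr)
    print(f"lr={lr:g}: {len(ws) - 1} steps, last w = {ws[-1]:.4g}")
```

The SGD cell below also initializes `w` to 1, so it should follow a similar path for the same `lr`. It uses float32 instead of Python floats, and `optimize` may take a different number of steps than the 100 assumed here.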

In [None]:
w = torch.nn.Parameter(torch.ones(1))
w.requires_grad = True

######################################
#######         TODO           #######
######################################
optimizer = torch.optim.SGD([w], lr = XXX)

ws,fs = optimize(f, optimizer)
anim = save_gif(f, ws, fs, "sgd.gif")

The plot above shows only the final state of the optimization. The previous cell saved a GIF of the full trajectory to the current working directory, which you can open using the file browser. Alternatively, for convenience, you can use the code below to display the GIF inline. **NOTE:** if the GIF is playing when you try to save the notebook as a PDF, you will get an error. When you are done experimenting and ready to save, right-click the cell below and select "Clear Cell Output".

In [None]:
# inline gif display
gifPath = "sgd.gif" 
with open(gifPath,'rb') as myfile:
    display.Image(data=myfile.read(), format='png')

Which values of `lr` work the best? What happens if `lr` is too small? Too large? Does SGD find the global minimum?

XXX

Next, try the SGD + momentum optimizer. SGD + momentum is implemented in the `torch.optim.SGD` class by setting the `momentum` parameter to a value greater than 0. The default is 0, which corresponds to the vanilla SGD used in the previous optimization. Experiment with different combinations of learning rate (`lr`) and `momentum`.
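PyTorch's documentation describes its momentum update, with the default `dampening=0` and `nesterov=False`, as `buf <- momentum * buf + g` followed by `w <- w - lr * buf`. Some textbooks instead fold the learning rate into the velocity. The plain-Python sketch below reproduces the PyTorch form by hand. `momentum_trajectory` is an illustrative name, and the `|w| > 1e6` early stop is an arbitrary divergence guard.

```python
def df(w):
    # Analytic derivative: f'(w) = 4w^3 - 63w^2 + 302w - 435
    return 4*w**3 - 63*w**2 + 302*w - 435

def momentum_trajectory(lr, momentum, w0=1.0, n_steps=100):
    """Heavy-ball momentum in the form torch.optim.SGD documents
    (dampening=0, nesterov=False):
        buf <- momentum * buf + f'(w)
        w   <- w - lr * buf
    """
    ws = [w0]
    w, buf = w0, 0.0  # buf = 0 makes the first buffer equal to the first gradient
    for _ in range(n_steps):
        buf = momentum * buf + df(w)  # exponentially weighted sum of past gradients
        w = w - lr * buf
        ws.append(w)
        if abs(w) > 1e6:  # stop early if the iterates blow up
            break
    return ws

# momentum = 0 recovers plain gradient descent; larger values let past steps carry over
for mu in (0.0, 0.5, 0.9):
    path = momentum_trajectory(lr=1e-3, momentum=mu)
    print(f"momentum={mu}: {len(path) - 1} steps, last w = {path[-1]:.4g}")
```

The buffer keeps pushing `w` in the direction it was already moving even where the gradient is small or has changed sign. That carry-over is what to watch for in the GIF.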

In [None]:
w = torch.nn.Parameter(torch.ones(1))
w.requires_grad = True

######################################
#######         TODO           #######
######################################
optimizer = torch.optim.SGD([w], lr = XXX, momentum = XXX)

ws,fs = optimize(f, optimizer)
anim = save_gif(f, ws, fs, "momentum.gif")

Display the inline GIF of the SGD + momentum optimization trajectory.

In [None]:
# inline gif display
gifPath = "momentum.gif" 
with open(gifPath,'rb') as myfile:
    display.Image(data=myfile.read(), format='png')

Which values of `lr` work the best? What happens if `lr` is too small or too large?
Which values of `momentum` work the best? What happens if `momentum` is too small or too large?
Does the optimization reach a global minimum?

XXX

Last, perform optimization using the Adam optimizer. Experiment with the learning rate (`lr`) and with the coefficients used to compute running averages of the gradient and its square (`betas`, a tuple of 2 elements).
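The sketch below is a minimal plain-Python Adam loop for one parameter. It follows the update in PyTorch's `torch.optim.Adam` documentation with its defaults `betas=(0.9, 0.999)` and `eps=1e-8`, and with no weight decay or AMSGrad. It shows what the two `betas` control: `beta1` averages the gradient and `beta2` averages the squared gradient. `adam_trajectory` is an illustrative name.

```python
import math

def df(w):
    # Analytic derivative: f'(w) = 4w^3 - 63w^2 + 302w - 435
    return 4*w**3 - 63*w**2 + 302*w - 435

def adam_trajectory(lr, betas=(0.9, 0.999), eps=1e-8, w0=1.0, n_steps=100):
    """Adam on f starting from w0; returns the list of iterates."""
    beta1, beta2 = betas
    ws = [w0]
    w, m, v = w0, 0.0, 0.0
    for t in range(1, n_steps + 1):
        g = df(w)
        m = beta1 * m + (1 - beta1) * g      # running average of the gradient
        v = beta2 * v + (1 - beta2) * g * g  # running average of the squared gradient
        m_hat = m / (1 - beta1 ** t)         # bias correction: m and v start at zero
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
        ws.append(w)
    return ws

for lr in (0.01, 0.1, 0.5):
    path = adam_trajectory(lr)
    print(f"lr={lr}: last w = {path[-1]:.4g}")
```

On the first step, `m_hat / sqrt(v_hat)` equals `g / |g|`, so the first move has size about `lr` regardless of how steep `f` is at the start. SGD's first move here is `lr * 192`. As a result, comparable first steps need very different `lr` values for SGD and Adam on this function.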

In [None]:
w = torch.nn.Parameter(torch.ones(1))
w.requires_grad = True

######################################
#######         TODO           #######
######################################
optimizer = torch.optim.Adam([w], lr = XXX, betas=(XXX, XXX))

ws,fs = optimize(f, optimizer)
anim = save_gif(f, ws, fs, "adam.gif")

Display the inline GIF of the Adam optimization trajectory.

In [None]:
# inline gif display
gifPath = "adam.gif" 
with open(gifPath,'rb') as myfile:
    display.Image(data=myfile.read(), format='png')

Which values of `lr` work the best? Which values of `betas` work the best? Does Adam reach a global minimum?

XXX

Compare and contrast the 3 optimizers used in this assignment. How are the trajectories the same vs. different? 

XXX

# Submit

Save a PDF and submit it to ICON. **NOTE:** the notebook cannot be exported as a PDF while the inline GIFs are playing. Clear the output of each of the 3 cells that display inline GIFs (the cells that start with `# inline gif display`), then export as PDF.

Add, commit, push to your lecture git repo.