# The Optimizer

The **loss function** tells us how poorly our model is doing. The **optimizer** is the tool that helps us improve the model's performance by adjusting the weights and biases.

## Gradient Descent

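Gradient descent uses calculus to compute the gradients of the loss function with respect to the weights and biases. A gradient points in the direction in which the loss increases fastest, so it acts as a direction sign: adjusting the weights and biases in the opposite direction decreases the loss.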
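A minimal sketch of the idea in plain Python, assuming a made-up one-parameter loss `(w - 3)^2` whose gradient can be written by hand. The names `loss_fn`, `gradient`, and `step_size` are illustrative only (the step size is what the next section calls the learning rate).

```python
# Minimize a toy loss f(w) = (w - 3)^2 with plain gradient descent.
def loss_fn(w):
    return (w - 3.0) ** 2

def gradient(w):
    # derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

w = 0.0          # arbitrary starting guess
step_size = 0.1  # arbitrary choice for this sketch

for step in range(50):
    # move against the gradient, i.e. "downhill"
    w = w - step_size * gradient(w)

print(w, loss_fn(w))  # w ends up close to 3, where the loss is smallest
```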

### Learning Rate

**Learning rate**: how far to move at each step
- Too low: slow convergence
- Too high: overshoot the minimum
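The two failure modes can be seen on a toy problem. This is a sketch only, assuming the made-up loss `w^2` (gradient `2w`, minimum at 0); the helper `run_gradient_descent` and the three rates are hypothetical choices for illustration.

```python
def run_gradient_descent(learning_rate, steps=20, start=10.0):
    """Minimize f(w) = w^2 (gradient 2w) and return the final w."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return w

too_low = run_gradient_descent(0.001)   # barely moves away from the start
reasonable = run_gradient_descent(0.1)  # gets close to the minimum at 0
too_high = run_gradient_descent(1.1)    # every step overshoots further

for name, w in [("too low", too_low), ("reasonable", reasonable), ("too high", too_high)]:
    print(name, w)
```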

The learning rate is a classic example of a **hyperparameter**: a value that ML engineers choose and adjust, rather than one the model learns from the data, in order to improve the model's performance.

**Hyperparameter tuning**: the process of adjusting hyperparameters in search of the best model performance.
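One simple form of tuning is to train once per candidate value and keep the one with the lowest final loss. The sketch below assumes toy data on the line `y = 2x + 1` and a hand-written gradient descent; the helper `train` and the candidate list are hypothetical, and real tuning would normally compare losses on held-out data.

```python
# toy data (made up): points on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

def train(learning_rate, steps=200):
    """Fit y = w*x + b with gradient descent and return the final MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / n

# try several learning rates and keep the one with the lowest final loss
candidates = [0.0001, 0.001, 0.01, 0.05]
results = {lr: train(lr) for lr in candidates}
best_lr = min(results, key=results.get)

print(results)
print("best learning rate:", best_lr)
```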


### PyTorch Optimizer

**Adam**, a popular optimizer in PyTorch, uses gradient descent with a few extra bells and whistles (like adapting the effective learning rate for each parameter during training).

- `model.parameters()` tells Adam what our current weights and biases are
- `lr=0.01` tells Adam to set the learning rate to 0.01

In [None]:
import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.01)
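To see what `model.parameters()` hands to Adam, the parameters can be listed by name. This sketch assumes a hypothetical tiny model (not the one used in this lesson) so that the output stays short.

```python
import torch.nn as nn
import torch.optim as optim

# hypothetical tiny network, used only for illustration
model = nn.Sequential(nn.Linear(3, 2), nn.ReLU(), nn.Linear(2, 1))

# each Linear layer contributes a weight matrix and a bias vector
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# 0.weight (2, 3)
# 0.bias (2,)
# 2.weight (1, 2)
# 2.bias (1,)

optimizer = optim.Adam(model.parameters(), lr=0.01)
print(optimizer.param_groups[0]["lr"])  # 0.01
```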

To apply Adam to a neural network, we perform two steps:

1. **backwards pass**: calculate the gradients of the loss function (these determine the “downward” direction)
2. **step**: use the gradients to update the weights and biases

**Note**: `backward` is called on the computed loss (the tensor returned by the loss function), not on the loss function itself. This is why printing the computed loss shows the attribute `grad_fn=<MseLossBackward0>`: it records the function PyTorch uses to perform the backwards pass.
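The note can be checked directly. The sketch below assumes a hypothetical one-layer model and two made-up data rows; it inspects `grad_fn` on the computed loss and shows that the gradients only exist after `backward()` has run.

```python
import torch
import torch.nn as nn

# hypothetical one-layer model and made-up data, for illustration only
model = nn.Linear(2, 1)
X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[1.0], [2.0]])

loss = nn.MSELoss()          # the loss function
MSE = loss(model(X), y)      # the computed loss (a tensor)

print(MSE.grad_fn)           # the backward function recorded on the tensor

grad_before = model.weight.grad   # no gradients have been computed yet
MSE.backward()                    # backwards pass fills in .grad
grad_after = model.weight.grad

print(grad_before)
print(grad_after)
```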


In [None]:
# compute the loss 
MSE = loss(predictions,y)
# backward pass to determine "downward" direction
MSE.backward()
# apply the optimizer to update weights and biases
optimizer.step()
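A quick way to confirm what each call does is to compare the weights before and after. This is a sketch under assumptions: the model and the random data are hypothetical stand-ins, not the lesson's dataset.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# hypothetical stand-in model and data
model = nn.Linear(3, 1)
X = torch.rand(10, 3)
y = torch.rand(10, 1)

loss = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

weights_start = model.weight.detach().clone()

MSE = loss(model(X), y)
MSE.backward()
# backward() only computes gradients; the weights are still unchanged
unchanged_after_backward = torch.equal(weights_start, model.weight.detach())

optimizer.step()
# step() is what actually updates the weights
changed_after_step = not torch.equal(weights_start, model.weight.detach())

print(unchanged_after_backward, changed_after_step)
```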

### Example

In [None]:
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.optim as optim 

# create neural network
model = nn.Sequential(
    nn.Linear(3,16),
    nn.ReLU(),
    nn.Linear(16,8),
    nn.ReLU(),
    nn.Linear(8,4),
    nn.ReLU(),
    nn.Linear(4,1)
)

# import the data
apartments_df = pd.read_csv("streeteasy.csv")
numerical_features = ['bedrooms', 'bathrooms', 'size_sqft']
X = torch.tensor(apartments_df[numerical_features].values, dtype=torch.float)
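# reshape the target to (N, 1) so it matches the shape of the model's predictions;
# otherwise MSELoss broadcasts (N, 1) against (N,) and computes the wrong loss
y = torch.tensor(apartments_df['rent'].values, dtype=torch.float).view(-1, 1)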

# forward pass
predictions = model(X)

# define the loss function and compute loss
loss = nn.MSELoss()
MSE = loss(predictions,y)
print('Initial loss is ' + str(MSE))

# create optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# use the loss to perform the backwards pass
MSE.backward()

# use optimizer to update the weights and biases
optimizer.step()

# feed the data through the updated model and compute the new loss
predictions = model(X)
MSE = loss(predictions,y)
print('After optimization, loss is ' + str(MSE))
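The example above takes a single optimization step. In practice the forward pass, backwards pass, and step are usually repeated many times. The sketch below shows one common way to write that loop; it assumes made-up stand-in data instead of `streeteasy.csv`, a smaller hypothetical network, and an arbitrary `num_epochs`. Because PyTorch accumulates gradients across `backward()` calls, `optimizer.zero_grad()` is called at the start of each iteration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(42)

# stand-in data (made up): 100 rows, 3 features, target of shape (100, 1)
X = torch.rand(100, 3)
y = X @ torch.tensor([[1.5], [-2.0], [3.0]]) + 0.5

# smaller hypothetical network to keep the sketch short
model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
loss = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

num_epochs = 300  # arbitrary choice for this sketch
history = []
for epoch in range(num_epochs):
    optimizer.zero_grad()        # clear gradients left over from the last step
    predictions = model(X)       # forward pass
    MSE = loss(predictions, y)   # compute the loss
    MSE.backward()               # backwards pass
    optimizer.step()             # update weights and biases
    history.append(MSE.item())   # record the loss as a plain float

print('First loss:', history[0])
print('Last loss:', history[-1])
```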