# Gradient Descent Warmup

The gradients will continue until morale improves

## Theory questions

#### 1) What is a loss function?

#### 2) What is the loss function for SSE OLS regression?

1) A loss function is a function representing the error between the actual values of a target variable
and the predictions made by a model

A loss function can be used by a machine learning model to minimize error between predicted and target values

Selection of a loss function often depends on both the specific model being used and the situation in which
one is doing analysis

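2) The loss function for OLS is the sum of squared errors (SSE), or loss = sum((y-f(x))**2). Dividing by n gives the mean squared error (MSE), loss = sum((y-f(x))**2)/n, which is minimized by the same parameters.

In practice, the loss is often calculated as sum((y-f(x))**2)/(2n), so that the factor of 2 brought down from the exponent is cancelled out when taking the derivative
and the calculations are a little easier.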
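To make the three variants concrete, here is a minimal sketch on toy numbers. The arrays `y_true` and `y_pred` are made up for illustration and are not the notebook's data.

```python
import numpy as np

# Toy values chosen only for illustration; they are not from the notebook's data.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])
n = len(y_true)

sse = np.sum((y_true - y_pred) ** 2)   # sum of squared errors
mse = sse / n                          # mean squared error
half_mse = sse / (2 * n)               # halved form used in this notebook

print(sse, mse, half_mse)
```

All three differ only by a constant factor, so they are minimized by the same parameters.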

## Calculating a loss function for a simple regression model

For this example, we'll import a series of 300 points from the data folder as an `X` feature, and another series of 300 points as a `y` target

In [1]:
#run this cell as-is

#used for testing
from test_scripts.test_class import Test
test = Test()

#data manip
import numpy as np

#data import
X = test.load_ind('X')
y = test.load_ind('y')

### Calculate the change in the loss function for a simple regression model, $y = \beta  x + b$, as we step down the gradient once from an initial guess of 3 for $\beta$ and 1 for $b$

#### First, set up
- define n as 300
- define `beta` and `b` as initial guesses of 3 and 1, respectively
- assign the loss function to `loss` using `y`, `X`, `n`, `beta` and `b`
  - note: go ahead and multiply the denominator by 2 to cancel the partial derivative exponent step coming up

In [7]:
n = 300
b = 1
beta = 3

# halved mean squared error for y = beta*X + b: sum((y-beta*X-b)**2)/(2*n)
# note: uses the X and y loaded above

def loss(beta, b, n):
    return sum((y-beta*X-b)**2)/(2*n)

### Now, calculate

- determine the partial derivatives for `beta` and `b`
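For reference, the partial derivatives can be worked out by applying the chain rule to the halved loss defined above. This is a sketch of the derivation, writing $x_i$ and $y_i$ for the individual entries of `X` and `y`:

$$L(\beta, b) = \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta x_i - b\right)^2$$

$$\frac{\partial L}{\partial \beta} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \beta x_i - b\right)(-x_i)$$

$$\frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \beta x_i - b\right)(-1)$$

The 2 from the exponent cancels the 2 in the denominator, which is why only $n$ remains.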

In [11]:
# beta_deriv = #your code here
# b_deriv = #your code here


# partial derivatives of the halved loss, using n defined above instead of a hard-coded 300
beta_deriv = sum((y-beta*X-b)*-X)/n
b_deriv = sum((y-beta*X-b)*-1)/n

In [14]:
beta_deriv

-41.44519343763108

In [15]:
b_deriv

5.399401084176953
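One way to sanity-check hand-derived partial derivatives is to compare them against a finite-difference estimate. The sketch below does this on synthetic stand-in data, because the notebook's `X` and `y` come from the test helper and are not shown. The true slope of 4, intercept of -2, and the step `eps` are all assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in data (assumption): not the notebook's X and y.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = 4.0 * x - 2.0 + rng.normal(scale=0.5, size=300)
n = len(x)

def loss(beta, b):
    # halved mean squared error, as in the notebook
    return np.sum((y - beta * x - b) ** 2) / (2 * n)

beta, b = 3.0, 1.0

# analytic partial derivatives, same form as the notebook's
beta_deriv = np.sum((y - beta * x - b) * -x) / n
b_deriv = np.sum((y - beta * x - b) * -1) / n

# central finite differences as an independent estimate
eps = 1e-6
beta_numeric = (loss(beta + eps, b) - loss(beta - eps, b)) / (2 * eps)
b_numeric = (loss(beta, b + eps) - loss(beta, b - eps)) / (2 * eps)

print(np.isclose(beta_deriv, beta_numeric), np.isclose(b_deriv, b_numeric))
```

If the analytic and numeric values disagree, the derivative formula (often a sign or a missing factor) is the first place to look.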

- re-define `beta` and `b` by "stepping down the gradient", i.e., subtract their respective partial derivatives from themselves

In [12]:
beta = beta - beta_deriv
b = b - b_deriv

###

# beta = beta - beta_deriv
# b = b - b_deriv

In [16]:
beta,b

(44.44519343763108, -4.399401084176953)

In [13]:
#run this cell to test your work!

print('test for beta:')
test.run_test(beta, "beta")
print()
print('test for b:')
test.run_test(b, "b")

test for beta:


✅ **Hey, you did it.  Good job.**


test for b:


✅ **Hey, you did it.  Good job.**

- does `y = original_beta * X + original_b` have a smaller loss function value than `y = updated_beta * X + updated_b`?

In [17]:
# loss_original = #your code here

# loss_new = #your code here

####

loss_original = loss(3,1,300)

loss_new = loss(beta, b, 300)

print(f'loss from first guess: {loss_original}, loss from updated guess: {loss_new}')


loss from first guess: 942.4909304580721, loss from updated guess: 15.147998163619848


#### Now, do this 10 times!

- Put the machinery to calculate a new `beta` and `b` you've created inside of a function
  - The function should take as inputs `beta`, `b`, `X`, and `y`
  - (`X` and `y` should have default values)
  - calculate the partial derivative of the loss function for `beta`
  - calculate the partial derivative of the loss function for `b`
  - subtract their respective partial derivatives from `beta` and `b`
  - return updated `beta` and `b`


- Put the function inside of a loop which runs 10 times

- Calculate `beta` and `b`, starting with `beta`=3 and `b`=1, after 10 "steps down the gradient"

- What does it look like the final `y = beta * X + b` equation should be?

In [None]:
#Your work here

In [None]:
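def calculate_descent(beta, b, X=X, y=y):
    # number of observations, taken from the data rather than hard-coded
    n = len(X)

    # partial derivatives of the halved mean squared error
    beta_deriv = sum((y-beta*X-b)*-X)/n
    b_deriv = sum((y-beta*X-b)*-1)/n

    # step down the gradient
    new_beta = beta - beta_deriv
    new_b = b - b_deriv

    return {'beta':new_beta, 'b': new_b}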

beta = 3
b = 1

for epoch in range(0,10):
    results = calculate_descent(beta, b)
    beta = results['beta']
    b = results['b']

print(f'beta after 10 epochs: {beta}')
print(f'b after 10 epochs: {b}')

print(f'y = {beta} * X + {b}, baby')

#used for testing
# test.save(beta, 'beta_10')
# test.save(b, 'b_10')
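As a rough cross-check on where the descent should end up, the result can be compared against a closed-form least-squares fit. The sketch below uses `np.polyfit` on synthetic stand-in data. The data, the true slope of 4, and the intercept of -2 are assumptions, since the notebook's `X` and `y` are not shown.

```python
import numpy as np

# Synthetic stand-in data (assumption): roughly zero mean and unit variance in x.
rng = np.random.default_rng(0)
x = np.linspace(-1.7, 1.7, 300)
y = 4.0 * x - 2.0 + rng.normal(scale=0.5, size=300)

# ten steps down the gradient from the notebook's starting guesses
beta, b = 3.0, 1.0
for _ in range(10):
    resid = y - beta * x - b
    beta_deriv = np.sum(resid * -x) / len(x)
    b_deriv = np.sum(resid * -1) / len(x)
    beta, b = beta - beta_deriv, b - b_deriv

# closed-form least-squares line for comparison
slope, intercept = np.polyfit(x, y, 1)

print(beta, slope)
print(b, intercept)
```

On data scaled like this, the two answers should agree closely. On other data, ten steps may not be enough, or the descent may not converge at all without a smaller step.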


In [None]:
#run this cell to test your work!

print('test for beta:')
test.run_test(beta, "beta_10")
print()
print('test for b:')
test.run_test(b, "b_10")

# Bonus Round

Look at the loss function as we move from epoch to epoch

Why do we not need a learning rate in this instance?

Why might we need a learning rate in other instances?

In [None]:
#Your work here
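One way to explore the bonus questions is to record the loss at every epoch and vary the scale of the feature. The sketch below is an illustration on synthetic stand-in data, not the notebook's `X` and `y`. The `lr` parameter, the `descend` helper, and the factor of 3 used to rescale `x` are all hypothetical choices.

```python
import numpy as np

# Synthetic stand-in data (assumption): the notebook's real X and y are not shown.
rng = np.random.default_rng(0)
x = np.linspace(-1.7, 1.7, 300)               # roughly zero mean, unit variance
y = 4.0 * x - 2.0 + rng.normal(scale=0.5, size=300)

def half_mse(beta, b, x, y):
    return np.sum((y - beta * x - b) ** 2) / (2 * len(x))

def descend(x, y, beta=3.0, b=1.0, lr=1.0, epochs=10):
    # returns the final parameters and the loss before and after every epoch
    losses = [half_mse(beta, b, x, y)]
    for _ in range(epochs):
        resid = y - beta * x - b
        beta_deriv = np.sum(resid * -x) / len(x)
        b_deriv = np.sum(resid * -1) / len(x)
        beta, b = beta - lr * beta_deriv, b - lr * b_deriv
        losses.append(half_mse(beta, b, x, y))
    return beta, b, losses

_, _, well_scaled = descend(x, y, lr=1.0)        # full step, unit-scale feature
_, _, badly_scaled = descend(3 * x, y, lr=1.0)   # full step, feature scaled by 3
_, _, damped = descend(3 * x, y, lr=0.1)         # smaller step, feature scaled by 3

print(well_scaled[0], well_scaled[-1])
print(badly_scaled[0], badly_scaled[-1])
print(damped[0], damped[-1])
```

In this sketch, a full step works when the feature has roughly unit scale, overshoots and diverges when the feature is three times larger, and converges again once the step is damped by a learning rate. Whether the same explanation holds for the notebook's data depends on how `X` is scaled.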