<h4>Regression - Gradient Descent Overview</h4>
<ul>
<li>Linear Model. Estimated Target = w<sub>0</sub> + w<sub>1</sub>x<sub>1</sub> 
+ w<sub>2</sub>x<sub>2</sub> + w<sub>3</sub>x<sub>3</sub> 
+ … + w<sub>n</sub>x<sub>n</sub><br>
where each w is a weight and each x is a feature
</li>
<li>Predicted Value: Numeric</li>
<li>Algorithm Used: Linear Regression. Objective is to find the weights w</li>
<li>Optimization: Gradient Descent. Seeks to minimize the loss/cost so that the predicted value is as close to the actual value as possible</li>
<li>Cost/Loss Calculation: Squared loss function</li>
</ul>
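
As a minimal sketch of the linear model and the squared loss listed above: the helper names `predict` and `squared_loss` and the small two-feature data set are made up for illustration and are not part of the original notebook.

```python
import numpy as np

def predict(weights, features):
    """Return w0 + w1*x1 + ... + wn*xn for every row of `features`."""
    weights = np.asarray(weights, dtype=float)
    features = np.asarray(features, dtype=float)
    return weights[0] + features @ weights[1:]

def squared_loss(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((actual - predicted) ** 2)

# Three examples with two features each (made-up numbers).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = [8.0, 5.0, 0.5]               # w0 (intercept), w1, w2
y_hat = predict(w, X)
loss = squared_loss([14.0, 25.0, 36.0], y_hat)
print(y_hat, loss)
```

Here the targets were chosen to match the weights exactly, so the loss comes out as zero; any other weight vector would give a positive loss.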

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Input Feature: x  

Target: 5*x + 8 + some noise

In [None]:
# True Function
def straight_line(x):
    return 5*x + 8

In [None]:
# Estimate predicted value for a given weight
def predicted_at_weight(weight0, weight1, x):
    return weight1*x + weight0

In [None]:
np.random.seed(5)

samples = 150
x = pd.Series(np.arange(0,150))
y = x.map(straight_line) + np.random.randn(samples)*10

In [None]:
df = pd.DataFrame({'x':x,'y':y})

In [None]:
# One Feature example
# Training Set - Contains several examples of feature 'x' and corresponding correct answer 'y'
# Objective is to find out the form y = w0 + w1*x1
df.head()

In [None]:
df.tail()

In [None]:
plt.plot(df.x,df.y,label='Target')
plt.grid(True)
plt.xlim(-1,150)
plt.ylim(0,800)
plt.xlabel('Input Feature')
plt.ylabel('Target')
plt.legend()
plt.show()

In [None]:
# Linear Regression
from sklearn.linear_model import LinearRegression

In [None]:
reg = LinearRegression()

In [None]:
reg.fit(df[['x']],df['y'])

In [None]:
print('Coefficients:',reg.coef_,'Intercept:',reg.intercept_)

<h4>Predict Y for different weights</h4>

In [None]:
# True function weights are w1 = 5 and w0 = 8, i.e. 5*x + 8
w0 = [10,3,10,15,100]
w1 = [0,19,25,6,3]

In [None]:
y_predicted = {}
for i in range(len(w1)):
    y_predicted['{0}-{1}'.format(w0[i],w1[i])] = predicted_at_weight(w0[i],w1[i], x)

In [None]:
plt.plot(x,y,label='ground truth')

for w in y_predicted.keys():
    plt.plot(x,y_predicted[w],label=w)

plt.xlim(0,100)
plt.ylim(0,700)
plt.xlabel('Feature')
plt.ylabel('Predicted')
plt.title('Predicted Output for different weights')
plt.legend()
plt.show()

<h4>Squared Loss</h4>

In [None]:
for w in y_predicted.keys():
    squared_loss = (y-y_predicted[w])**2
    print('Weight:{0}\tLoss: {1:10.2f}'.format(w, squared_loss.mean()))

<h4>Plot Loss at different weights for x</h4>

In [None]:
# For a set of weights, let's find out loss or cost
# True Function: 5x+8
# A linear regression trained with gradient descent iteratively searches for the correct weight for x.
# Let's test how the loss changes at different weights for x.

# In this example, let's see how the "loss" changes for different weights
#DWB# that is, for different values of w1 (with w0 held at the correct 8)
weight = pd.Series(np.linspace(3,7,100))

In [None]:
print(weight[:5])
print()
print(weight[-5:])

<h4>Compute Loss using Squared Loss Function</h4>
<h4>loss = average((true - predicted)^2)</h4>

In [None]:
# Cost/Loss Calculation: squared loss function, a measure of how far the predicted value is from the actual value
# Steps:

#  For every candidate weight for feature x, predict y
#  Then compute loss = average((actual - predicted)**2)

#DWB# -v-Figuring out what's going on, here.
do_see_the_guts = True
this_count = 0
this_count_max = 5

#DWB# -v- Remember:
# # Estimate predicted value for a given weight
# def predicted_at_weight(weight0, weight1, x):
#     return weight1*x + weight0

loss_at_wt = []
for w1 in weight:
    y_predicted = predicted_at_weight(8,w1,x)
    
    if do_see_the_guts:
        this_count += 1
        if this_count == this_count_max:
            break
        ##endof:  if this_count == this_count_max
        
        print()
        print(f"  this_count:\n{this_count}")
        print(f"  this_count_max:\n{this_count_max}")
        
        print()
        print(f"  w1:\n{w1}")
        print(f"  type(w1):\n{type(w1)}")
        print()
        print(f"  x:\n{x}")
        print(f"  type(x):\n{type(x)}")
        print()
        print(f"  y_predicted = predicted_at_weight(8,{w1},x) =>")
        print(f"  y_predicted =\n{y_predicted}")
        print()
        print("  Yay 1!")
        input("  Press enter to continue.")
    ##endof:  if do_see_the_guts
    
    squared_error = (y - y_predicted)**2
    
    if do_see_the_guts:
        print()
        print(f"  y:\n{y}")
        print(f"  type(y):\n{type(y)}")
        print()
        print(f"  y_predicted:\n{y_predicted}")
        print(f"  type(y_predicted):\n{type(y_predicted)}")
        print()
        print(f"  squared_error:\n{squared_error}")
        print(f"  type(squared_error):\n{type(squared_error)}")
        print()
        print("  Yay 2!")
        input("  Press enter to continue.")
    ##endof:  if do_see_the_guts
    
    # Average Squared Error at weight w1
    loss_at_wt.append(squared_error.mean())
    
    if do_see_the_guts:
        print()
        print(f"  squared_error.mean():\n{squared_error.mean()}")
        print(f"  type(squared_error.mean()):\n{type(squared_error.mean())}")
        print()
        print(f"  loss_at_wt:\n{loss_at_wt}")
        print(f"  type(loss_at_wt):\n{type(loss_at_wt)}")
        print()
        print("  Yay 3!")
        input("  Press enter to continue.")
    ##endof:  if do_see_the_guts

In [None]:
#DWB#+ This doesn't really mean anything, since we
#DWB#+ stopped after 4 iterations.
min(loss_at_wt)

In [None]:
#DWB#  This plot call should fail, once again because we stopped
#DWB#+ after 4 iterations: weight has 100 entries but loss_at_wt
#DWB#+ has only 4. Let's see the error, just for fun.

#plt.scatter(x=weight, y=loss_at_wt)
plt.plot(weight,loss_at_wt)
plt.grid(True)
plt.xlabel('Weight for feature x')
plt.ylabel('Loss')
plt.title('Loss Curve - Loss at different weight')
plt.show()

Oh, yeah. I can run the full loop again with the debugging additions switched off.

In [None]:
# Cost/Loss Calculation: squared loss function, a measure of how far the predicted value is from the actual value
# Steps:

#  For every candidate weight for feature x, predict y
#  Then compute loss = average((actual - predicted)**2)

#DWB# -v-Figuring out what's going on, here.
do_see_the_guts = False

if do_see_the_guts:
    this_count = 0
    this_count_max = 5

#DWB# -v- Remember:
# # Estimate predicted value for a given weight
# def predicted_at_weight(weight0, weight1, x):
#     return weight1*x + weight0

loss_at_wt = []
for w1 in weight:
    y_predicted = predicted_at_weight(8,w1,x)
    
    if do_see_the_guts:
        this_count += 1
        if this_count == this_count_max:
            break
        ##endof:  if this_count == this_count_max
        
        print()
        print(f"  this_count:\n{this_count}")
        print(f"  this_count_max:\n{this_count_max}")
        
        print()
        print(f"  w1:\n{w1}")
        print(f"  type(w1):\n{type(w1)}")
        print()
        print(f"  x:\n{x}")
        print(f"  type(x):\n{type(x)}")
        print()
        print(f"  y_predicted = predicted_at_weight(8,{w1},x) =>")
        print(f"  y_predicted =\n{y_predicted}")
        print()
        print("  Yay 1!")
        input("  Press enter to continue.")
    ##endof:  if do_see_the_guts
    
    squared_error = (y - y_predicted)**2
    
    if do_see_the_guts:
        print()
        print(f"  y:\n{y}")
        print(f"  type(y):\n{type(y)}")
        print()
        print(f"  y_predicted:\n{y_predicted}")
        print(f"  type(y_predicted):\n{type(y_predicted)}")
        print()
        print(f"  squared_error:\n{squared_error}")
        print(f"  type(squared_error):\n{type(squared_error)}")
        print()
        print("  Yay 2!")
        input("  Press enter to continue.")
    ##endof:  if do_see_the_guts
    
    # Average Squared Error at weight w1
    loss_at_wt.append(squared_error.mean())
    
    if do_see_the_guts:
        print()
        print(f"  squared_error.mean():\n{squared_error.mean()}")
        print(f"  type(squared_error.mean()):\n{type(squared_error.mean())}")
        print()
        print(f"  loss_at_wt:\n{loss_at_wt}")
        print(f"  type(loss_at_wt):\n{type(loss_at_wt)}")
        print()
        print("  Yay 3!")
        input("  Press enter to continue.")
    ##endof:  if do_see_the_guts

In [None]:
#DWB#  Since I ran the code a second time without the
#DWB#+ additions (I didn't want to look up the pre-
#DWB#+ additions code, so I just set the boolean for
#DWB#+ running it to False the second time), this
#DWB#+ cell and the next will mean something
#min(loss_at_wt)

#DWB#
loss_to_find = min(loss_at_wt)
print(loss_to_find)

In [None]:
#plt.scatter(x=weight, y=loss_at_wt)
plt.plot(weight,loss_at_wt)
plt.grid(True)
plt.xlabel('Weight for feature x')
plt.ylabel('Loss')
plt.title('Loss Curve - Loss at different weight')
plt.show()

In [None]:
#DWB#
plt.plot(weight,loss_at_wt)
plt.grid(True)
plt.xlim(4.9, 5.1)
plt.ylim(0, 200)
plt.xlabel('Weight for feature x')
plt.ylabel('Loss')
plt.title('Loss Curve - Loss at different weight; Zoomed to min')
plt.show()

In [None]:
#DWB#  I guess this is very similar to the original code, but I'm
#DWB#+ adding details to find out more about the minimum.

we_have_found_it = False

for w1 in weight:
    y_predicted = predicted_at_weight(8,w1,x)
    squared_error = (y - y_predicted)**2
    this_loss = squared_error.mean()
    
    if abs(this_loss - loss_to_find) < 0.1:
        we_have_found_it = True
        print( "  We found it!")
        print(f"  The minimum loss of: {this_loss}")
        print( "  came at the (numerically-found) weight of: " + \
              f"{w1}"
             )
        print("  (For 'numerically-found', you can read, 'approximate'.)")
    ##endof:  if abs(this_loss - loss_to_find) < 0.1
    
    if we_have_found_it:
        break
    
##endof:  for w1 in weight
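
The search above can also be done without an explicit loop or a tolerance. The sketch below is one possible alternative, not the notebook's own method. It assumes a noise-free target and a 101-point weight grid (instead of the 100 points used above) so that 5.0 lies exactly on the grid; the names `weights`, `losses`, and `best_w1` are illustrative.

```python
import numpy as np

x = np.arange(0, 150)
y = 5 * x + 8                      # noise-free stand-in for the target (assumption)
weights = np.linspace(3, 7, 101)   # 101 points so that 5.0 lies on the grid

# Rows are candidate weights, columns are samples; w0 is held at 8.
predictions = weights[:, None] * x[None, :] + 8
losses = ((y[None, :] - predictions) ** 2).mean(axis=1)

# np.argmin returns the position of the smallest loss on the grid.
best = np.argmin(losses)
best_w1 = weights[best]
print(f"minimum loss {losses[best]:.4f} at w1 = {best_w1:.4f}")
```

With the noisy `y` from this notebook, the same two lines (`np.argmin` followed by an index lookup) would return the grid weight with the smallest loss directly.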

<h4>Summary</h4>
<h4>Squared Loss Function</h4>
Squared loss is the average of the squared differences between the predicted and actual values. The loss function not only gives us the loss at a given weight; its slope at that weight also tells us which direction to move to reduce the loss.<br>
For a given weight, the algorithm finds the slope of the loss curve:
<ul>
<li>If the slope is negative, then increase the weight</li>
<li>If the slope is positive, then decrease the weight</li>
</ul>
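
The slope rule can be checked numerically. This is a rough sketch that assumes a noise-free target and holds w0 at 8; the helper `loss_slope` is a made-up name, and it uses the derivative of the mean squared loss with respect to w1, which is -2 * mean(x * (y - predicted)).

```python
import numpy as np

x = np.arange(0, 150)
y = 5 * x + 8          # noise-free stand-in for the target (assumption)

def loss_slope(w1, w0=8.0):
    """Derivative of mean((y - (w0 + w1*x))**2) with respect to w1."""
    residual = y - (w0 + w1 * x)
    return -2.0 * np.mean(x * residual)

for w1 in (3.0, 5.0, 7.0):
    slope = loss_slope(w1)
    if slope < 0:
        action = "increase the weight"
    elif slope > 0:
        action = "decrease the weight"
    else:
        action = "stay put"
    print(f"w1 = {w1}: slope = {slope:.1f} -> {action}")
```

Below the true weight of 5 the slope is negative, above it the slope is positive, and at 5 it vanishes.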

<h4>Learning Rate</h4>
The learning rate parameter controls how much the weight is increased or decreased at each step.<br>
If the change is too big, the algorithm can overshoot the point where the loss is minimal.<br>
If the change is too small, the algorithm needs many iterations to find the optimal weight.<br>
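
To illustrate the effect of the learning rate, here is a small sketch that takes 20 gradient steps on w1 alone, with w0 held at 8 and a noise-free target. The function `descend` and the three learning rates are assumptions chosen for this data; because the feature runs from 0 to 149, the workable learning rates are very small.

```python
import numpy as np

x = np.arange(0, 150)
y = 5 * x + 8          # noise-free stand-in for the target (assumption)

def descend(learning_rate, steps=20, w1=3.0, w0=8.0):
    """Run a fixed number of gradient steps on w1 and return the final value."""
    for _ in range(steps):
        slope = -2.0 * np.mean(x * (y - (w0 + w1 * x)))
        w1 = w1 - learning_rate * slope
    return w1

# Too small, reasonable, and too large for this data set.
for lr in (1e-6, 1e-4, 2e-4):
    print(f"learning rate {lr:g}: w1 after 20 steps = {descend(lr):.4f}")
```

The smallest rate is still far from 5 after 20 steps, the middle rate lands very close to 5, and the largest rate overshoots further on every step and moves away from the minimum.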


<h4>Gradient Descent</h4>
Gradient descent optimization computes the loss and its slope, then adjusts the weights of all the features.<br>
It repeats this process until the weights settle at values that minimize the loss.<br>
There are three flavors of gradient descent:<br>

<h4>Batch Gradient Descent</h4>
Batch gradient descent computes the loss over all examples in the training set and then adjusts the weights.<br>
It repeats this full pass on every iteration.<br>
This process can be slow to converge when the training data set is large.<br>
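
A minimal sketch of batch gradient descent for one feature follows. It is an illustration only: it assumes a noise-free target and a feature scaled to the range 0 to 1 so that a single learning rate works for both weights, and the function name, learning rate, and iteration count are made up for this example.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)   # feature scaled to [0, 1] (assumption)
y = 5 * x + 8                   # noise-free stand-in for the target

def batch_gradient_descent(x, y, learning_rate=0.5, iterations=2000):
    """Fit y = w0 + w1*x, using every example for each weight update."""
    w0, w1 = 0.0, 0.0
    for _ in range(iterations):
        residual = y - (w0 + w1 * x)            # computed over the full training set
        grad_w0 = -2.0 * np.mean(residual)      # slope of the loss with respect to w0
        grad_w1 = -2.0 * np.mean(residual * x)  # slope of the loss with respect to w1
        w0 -= learning_rate * grad_w0           # one update per full pass
        w1 -= learning_rate * grad_w1
    return w0, w1

w0, w1 = batch_gradient_descent(x, y)
print(f"w0 = {w0:.4f}, w1 = {w1:.4f}")
```

Each iteration touches all 50 examples before the weights move, which is what makes this variant expensive on large training sets.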


<h4>Stochastic Gradient Descent</h4>
With stochastic gradient descent, the algorithm computes the loss for the next training example and immediately adjusts the weights. This approach can help in converging to optimal weights for large data sets.<br>
However, one problem with this approach is that the algorithm adjusts the weights based on a single example, while the end objective is to find weights that work for all training examples and not just the current one. This can result in wild fluctuations in the weights.<br>
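
The per-example update can be sketched as below. This is a simplified illustration under stated assumptions: a noise-free target, a feature scaled to the range 0 to 1, a fixed learning rate, and a seeded shuffle of the examples on every pass. The function name and parameter values are made up for this example.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)   # feature scaled to [0, 1] (assumption)
y = 5 * x + 8                   # noise-free stand-in for the target

def stochastic_gradient_descent(x, y, learning_rate=0.1, epochs=200, seed=0):
    """Fit y = w0 + w1*x, updating the weights after every single example."""
    rng = np.random.default_rng(seed)
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):          # visit examples in random order
            residual = y[i] - (w0 + w1 * x[i])     # error on this one example
            w0 += learning_rate * 2.0 * residual
            w1 += learning_rate * 2.0 * residual * x[i]
    return w0, w1

w0, w1 = stochastic_gradient_descent(x, y)
print(f"w0 = {w0:.4f}, w1 = {w1:.4f}")
```

Because the target here has no noise, every example agrees on the same weights and the updates settle down. With a noisy target, each example pulls the weights in a slightly different direction, which is the fluctuation described above.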


<h4>Mini-Batch Gradient Descent</h4>
Mini-batch gradient descent combines the benefits of stochastic and batch gradient descent.<br>
It adjusts the weights after evaluating a small batch of samples. The number of samples is set by the mini-batch size, typically around 128.<br>
The loss for the samples within a mini-batch can be computed in parallel.<br>
This technique is prevalent in deep learning and other algorithms.<br>
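
A mini-batch version can be sketched in the same style. The assumptions here are a noise-free target, a feature scaled to the range 0 to 1, 256 examples, and a mini-batch size of 32 (smaller than the 128 mentioned above, only to suit the small data set). The function name and parameter values are illustrative.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 256)  # feature scaled to [0, 1] (assumption)
y = 5 * x + 8                   # noise-free stand-in for the target

def mini_batch_gradient_descent(x, y, batch_size=32, learning_rate=0.2,
                                epochs=300, seed=0):
    """Fit y = w0 + w1*x, updating the weights once per mini-batch."""
    rng = np.random.default_rng(seed)
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        order = rng.permutation(len(x))                 # reshuffle every pass
        for start in range(0, len(x), batch_size):
            batch = order[start:start + batch_size]     # indices of this mini-batch
            residual = y[batch] - (w0 + w1 * x[batch])  # vectorized over the batch
            w0 += learning_rate * 2.0 * np.mean(residual)
            w1 += learning_rate * 2.0 * np.mean(residual * x[batch])
    return w0, w1

w0, w1 = mini_batch_gradient_descent(x, y)
print(f"w0 = {w0:.4f}, w1 = {w1:.4f}")
```

Averaging the gradient over a batch smooths out the single-example fluctuations, while still updating the weights several times per pass over the data.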

