You are supposed to decide whether the next Salman Khan film will earn over 200 crores. To do so, you have to predict the __Net Grossing__ of a film from its __Critic Rating__.

Try to think about the following questions:

- What factors would you have to consider while predicting a movie's grossings?
- Once you have decided on them, what would your hypothesis be?
- Could you form a mathematical relation between all these parameters and the net grossing?

What you just did was solve a problem using __Linear Regression__!

TL;DR 

### Linear Regression - Fitting a Line to your dataset

# Types of Fits

<br>

<center><figure><img src='img/collage.jpg' height=500 width=500></figure></center>

# Linear Regression: the math under the hood

Let us take a mathematical look at this algorithm. For starters, as discussed above, we can use a linear relation to model the relationship between our __target values__ (the _net grossings_) and our selected __parameters__ (the _critics' ratings_).

Let us define our hypothesis function as follows:

$$ h_\theta(x) = \theta_0 + \theta_1x $$
<br>
where $\theta_0 = 1$ __(why?)__ and $x$ is the critics' rating. $h_\theta(x)$ is the hypothesis function that you will use to predict the grossings of the next movie.
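As a quick illustration of the arithmetic, here is a minimal sketch of the hypothesis function. The function name `hypothesis` and the sample ratings and parameter values are hypothetical, chosen only for illustration; the function accepts any $\theta_0$, although the text above fixes $\theta_0 = 1$.

```python
import numpy as np

def hypothesis(theta0: float, theta1: float, x: np.ndarray) -> np.ndarray:
    """Straight-line prediction h_theta(x) = theta0 + theta1 * x, applied to every x."""
    return theta0 + theta1 * x

# Hypothetical critic ratings and parameter values, for illustration only
ratings = np.array([2.0, 3.5, 4.0])
predictions = hypothesis(1.0, 2.0, ratings)
print(predictions)  # → [5. 8. 9.]
```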

## Trivia!

### But how do you judge if your predictions are right?

### Here is an example cost function:

$$ J(\theta) = h_\theta(x) - y $$

This is the crux of any machine learning algorithm: you define a cost function and try to minimise its value, so as to reach a (local) minimum.

Here we want to minimise the value of the cost function. So if you have $m$ samples in your training set, your cost function would be:

$$ J(\theta) = \sum_{i=1}^{m} (h_\theta(x_i) - y_i) $$

## Trivia!

- __Can you spot something wrong with this cost function?__
   
   - __HINT__: Try to figure out the value of the cost function for the data points plotted here:
   <br>
   <center><figure><img src='img/here.jpg' width=500 height=500></figure></center>
   <br> 
- __Can you think of a better cost function?__

Most commonly, statisticians use the following cost function when applying linear regression:

$$ J(\theta) = \frac{1}{2m}\sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2 $$
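To make the difference between the two cost functions concrete, here is a small sketch with hypothetical numbers: the predictions are off by 2 on every sample, yet the plain sum of errors comes out as zero because positive and negative errors cancel, while the halved mean squared error does not. The function and variable names are illustrative only.

```python
import numpy as np

def signed_error_cost(h, y):
    """Plain sum of residuals: positive and negative errors can cancel out."""
    return np.sum(h - y)

def mse_cost(h, y):
    """Halved mean squared error, J = (1/2m) * sum((h - y)^2)."""
    m = len(y)
    return np.sum((h - y) ** 2) / (2 * m)

# Hypothetical targets and a deliberately poor set of predictions
y_demo = np.array([1.0, 2.0, 3.0, 4.0])
h_demo = np.array([3.0, 0.0, 5.0, 2.0])   # errors: +2, -2, +2, -2

print(signed_error_cost(h_demo, y_demo))  # → 0.0 (looks "perfect")
print(mse_cost(h_demo, y_demo))           # → 2.0
```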

## Trivia!

- What are the advantages of this cost function?
- Why is the $\frac{1}{2m}$ factor there in the formula?

## Trivia Again! :/

Now, once the __cost function__ tells you how correct your hypothesis is, how do you update the hypothesis to reach the minimum cost?

# The lifeblood of Machine Learning - Gradient Descent

To arrive at the correct hypothesis, we use the method of __Gradient Descent__.

Let us go back to the hypothesis that we had selected:

$$ h_\theta(x) = \theta_0 + \theta_1x $$

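Here, $\theta_0=1$, so we would use $ h_\theta(x) = 1 + \theta_1x $ and only $\theta_1$ has to be optimised. (The code later in this notebook is more general: it also learns $\theta_0$, by multiplying it with a constant bias feature $x_0 = 1$.)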


## Trivia!

Now that we have decided __what__ to optimise, how do we go about it? __Any ideas would be awesome!__


Intuition explained on the board!

# Deeper Dive into Gradient Descent - OPTIONAL

## WARNING - Heavy high school math coming. Weak hearts not permitted to move ahead

Now that you know what __Gradient Descent__ is in theory, how do we mathematically model it?

- Clearly, we need to update our parameters so that the cost function decreases, but how do we reach a minimum?

   - __HINT__: How did you find local maxima and minima in high school?

To understand this, imagine the cost function for a hypothesis with a single parameter, i.e., $h_\theta(x) = \theta_0x$.

- What will be the required cost function?
- What would the graph of the cost function look like? Sketch it!
- How would you decide how to update the parameter $\theta_0$?

    __HINTS__:

    - What does a derivative signify?
    - How can the derivative of the cost function help you update the parameter?
    - Would you need to 'scale' the steps you take towards the local minimum?
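For the single-parameter hypothesis, one worked route through these hints, assuming the halved mean-squared-error cost from earlier and a learning rate $\alpha$ as the step "scale", is:

$$ J(\theta_0) = \frac{1}{2m}\sum_{i=1}^{m} (\theta_0 x_i - y_i)^2 $$

$$ \frac{dJ}{d\theta_0} = \frac{1}{m}\sum_{i=1}^{m} (\theta_0 x_i - y_i)\,x_i $$

$$ \theta_0 := \theta_0 - \alpha \frac{dJ}{d\theta_0} $$

The sign of the derivative tells you which direction is downhill, and the factor 2 produced by differentiating the square cancels the $\frac{1}{2}$ in $\frac{1}{2m}$, which keeps the gradient tidy.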


```
repeat until convergence {
    
    calculate_cost_function()
    
    for all parameters {
    
        update parameter towards optima
    
    }

}
```
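The pseudocode above can be sketched in Python for the single-parameter hypothesis $h_\theta(x) = \theta_0x$. This is a minimal sketch, not the notebook's own implementation: it assumes the halved mean-squared-error cost, a fixed learning rate `alpha`, and a hypothetical stopping rule (stop once the cost changes by less than `tol`, or after `max_iters` steps). The function name and the data are illustrative only.

```python
import numpy as np

def gradient_descent_1d(x, y, alpha=0.01, tol=1e-10, max_iters=10_000):
    """Fit h(x) = theta0 * x by gradient descent on J = (1/2m) * sum((theta0*x - y)^2)."""
    m = len(x)
    theta0 = 0.0                          # arbitrary starting point
    prev_cost = np.inf
    for _ in range(max_iters):            # "repeat until convergence", with a safety cap
        residuals = theta0 * x - y
        cost = np.sum(residuals ** 2) / (2 * m)   # calculate_cost_function()
        if abs(prev_cost - cost) < tol:   # cost has stopped changing: treat as converged
            break
        prev_cost = cost
        grad = np.sum(residuals * x) / m  # dJ/dtheta0
        theta0 -= alpha * grad            # step towards the minimum, scaled by alpha
    return theta0

# Hypothetical data lying exactly on y = 2x, so theta0 should approach 2
x_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_demo = 2.0 * x_demo
theta0 = gradient_descent_1d(x_demo, y_demo, alpha=0.05)
print(theta0)
```

With several parameters, the loop body would compute every partial derivative from the same old values before updating any of them.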

- Q. What problems might you face with this approach?


# Let's analyse the from-scratch code given to you!

### We have provided the code to you. Play around with it, see how the results change, and figure out why.

```python
# This code implements linear regression on a dataset from scratch

# Importing libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from time import sleep
```

```python
# Creating a small, hand-picked toy dataset
x = np.array([2, 3, 4.5, 6, 1.5, 8, 7, 5.4, 4, 6.5, 5, 2.5, 3.5, 4])
y = np.array([2.2, 2.8, 4, 5.7, 1.45, 7.9, 7, 5.8, 4.7, 6, 5, 2.8, 4, 4])

# plotting the dataset
plt.plot(x, y, 'r.')
plt.show()
```

```python
# printing dataset information and reshaping
print("Dimensions of x:", x.shape)
print("Dimensions of y:", y.shape)

# reshape into column vectors of shape (m, 1) so the matrix products below line up
x = x.reshape((x.shape[0], 1))
y = y.reshape((y.shape[0], 1))

print("\n")
print("Dimensions of x after reshaping:", x.shape)
print("Dimensions of y after reshaping:", y.shape)
print("x: ", x.T)
```

```python
# Creating a dataset by concatenating x and y column-wise (Optional)
data = np.concatenate((x, y), axis = 1)
print(data)
```

```python
# finding the correlation between x and y
# (np.corrcoef treats each row as a variable, hence the transpose)
print(np.corrcoef(np.transpose(data)))
```
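The off-diagonal entries of the matrix returned by `np.corrcoef` are the Pearson correlation coefficient between x and y. As a sketch of what that number means, it can be computed by hand: the covariance of the two variables divided by the product of their spreads. The helper `pearson_r` and the variable names `x_toy`/`y_toy` are illustrative, chosen so that the notebook's reshaped `x` and `y` are not overwritten.

```python
import numpy as np

# Same toy values as the dataset cell, under new names
x_toy = np.array([2, 3, 4.5, 6, 1.5, 8, 7, 5.4, 4, 6.5, 5, 2.5, 3.5, 4])
y_toy = np.array([2.2, 2.8, 4, 5.7, 1.45, 7.9, 7, 5.8, 4.7, 6, 5, 2.8, 4, 4])

def pearson_r(a, b):
    """Pearson correlation: centred cross-product sum over the product of centred norms."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    return np.sum(a_c * b_c) / np.sqrt(np.sum(a_c ** 2) * np.sum(b_c ** 2))

r = pearson_r(x_toy, y_toy)
print(r)
print(np.corrcoef(x_toy, y_toy)[0, 1])  # NumPy's value, for comparison
```

A value close to +1 suggests the points lie near an upward-sloping line.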

## Trivia!

### What insight does the correlation coefficient give you? Can you anticipate the results of the regression?

```python
# Defining the parameters theta (weights): theta[0, 0] is the intercept, theta[0, 1] the slope

# Initialising theta to arbitrary values (deliberately far from a good fit)
theta = np.array([[0.9, -1]])
# print(theta.shape)

# creating bias vector x0: a column of ones that multiplies the intercept
x0 = np.ones((x.shape[0], 1))

# forming input matrix X = [x0, x], shape (m, 2)
X = np.concatenate((x0, x), axis = 1)
# print(X.shape)
# print(X)

# generating hypothesis, h = X . theta^T, shape (m, 1)
h = X.dot(theta.T)

# Plotting
plt.plot(x, y, 'r.', label = 'Training set')
plt.plot(x, h, 'b--', label = 'Current hypothesis')
plt.legend()
plt.show()

# printing the per-sample errors and the mean absolute error
print("\nLOSS: ", (h - y).T)
print(np.sum(np.abs(h - y)) / len(X))
```

```python
def cost_function(X, y, theta):
    """Return the hypothesis h and the halved mean squared error J(theta)."""
    h = X.dot(theta.T)
    loss = h - y
    return h, np.sum(loss ** 2) / (2 * len(X))

# For testing the function
print(cost_function(X, y, theta))
```

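```python
def grad_descent(X, y, theta, alpha):
    """One gradient-descent step on J(theta) = (1/2m) * sum((X theta^T - y)^2)."""
    loss = X.dot(theta.T) - y       # residuals, shape (m, 1)
    dj = loss.T.dot(X) / len(X)     # gradient dJ/dtheta = (1/m) (h - y)^T X, shape (1, 2)

    theta_n = theta - alpha * dj    # move against the gradient, scaled by the learning rate
    return theta_n

# For testing the function
print("theta before: ", theta)
theta_n = grad_descent(X, y, theta, 0.005)
print("theta after: ", theta_n)
```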
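A gradient-descent step only moves downhill if `dj` really is the gradient of the cost being minimised. A standard way to check a hand-derived gradient is to compare it with central finite differences. This is a minimal, self-contained sketch assuming the halved mean-squared-error cost; the helper names and the small dataset are hypothetical, and `theta` is a 1-D array here rather than the notebook's (1, 2) row.

```python
import numpy as np

def cost(X, y, theta):
    """Halved mean squared error, J = (1/2m) * sum((X @ theta - y)^2)."""
    residuals = X @ theta - y
    return np.sum(residuals ** 2) / (2 * len(y))

def analytic_grad(X, y, theta):
    """Hand-derived gradient of the cost above: (1/m) * X^T (X @ theta - y)."""
    return X.T @ (X @ theta - y) / len(y)

def numerical_grad(X, y, theta, eps=1e-6):
    """Central finite differences, nudging one parameter at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(X, y, theta + step) - cost(X, y, theta - step)) / (2 * eps)
    return grad

# Hypothetical data: bias column of ones plus one feature
x_chk = np.array([1.0, 2.0, 4.0, 5.0])
y_chk = np.array([2.0, 2.5, 5.0, 5.5])
X_chk = np.column_stack([np.ones_like(x_chk), x_chk])
theta_chk = np.array([0.9, -1.0])

print(analytic_grad(X_chk, y_chk, theta_chk))
print(numerical_grad(X_chk, y_chk, theta_chk))  # should agree closely
```

Dropping the `/ len(y)` from `analytic_grad` would make the two printouts differ by a factor of m.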

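```python
def linear_reg(epoch, X, y, theta, alpha):
    """Run `epoch` gradient-descent steps, plotting the hypothesis after each one."""
    for ep in range(epoch):

        # calculate new theta
        theta = grad_descent(X, y, theta, alpha)

        # compute new loss
        h, loss = cost_function(X, y, theta)
        print("Cost function: ", loss)

        # plot against the feature column of X
        plt.plot(X[:, 1], y, 'r.', label = 'training data')
        plt.plot(X[:, 1], h, 'b--', label = 'current hypothesis')
        plt.legend()
        plt.show()

        sleep(3)  # pause so each plot can be inspected

    return theta
```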
        

## Change the hyper-parameters and see the outputs!

```python
# defining hyper-parameters

# epoch: the number of gradient-descent steps (each step uses the whole training set)
# alpha: the learning rate, i.e. how far each step moves theta

# Try different values of both to see how fast (or whether) the loss decreases.
# On this data, a large alpha such as 1 makes the cost blow up instead of shrinking.
epoch = 15
alpha = 0.05
linear_reg(epoch, X, y, theta, alpha)
```
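As a sanity check on where gradient descent should end up, the same least-squares problem can be solved directly. Note that this sketch swaps in a different technique, NumPy's `np.linalg.lstsq` solver, rather than gradient descent. Minimising the plain sum of squared errors gives the same $\theta$ as minimising $J(\theta)$, because the $\frac{1}{2m}$ factor only rescales the cost. New variable names are used so that the notebook's `x`, `y` and `X` are left untouched.

```python
import numpy as np

# Same toy data as the dataset cell, under new names
x_ls = np.array([2, 3, 4.5, 6, 1.5, 8, 7, 5.4, 4, 6.5, 5, 2.5, 3.5, 4])
y_ls = np.array([2.2, 2.8, 4, 5.7, 1.45, 7.9, 7, 5.8, 4.7, 6, 5, 2.8, 4, 4])
X_ls = np.column_stack([np.ones_like(x_ls), x_ls])  # bias column x0 = 1, then the feature

# Least-squares solution of X_ls @ theta ≈ y_ls
theta_ls, residuals, rank, sing_vals = np.linalg.lstsq(X_ls, y_ls, rcond=None)
print("theta (intercept, slope):", theta_ls)
```

If your chosen `epoch` and `alpha` work well, the learned `theta` should drift towards these values.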