# Simple Linear Regression

For a linear relationship, we start with a linear model.   

Recall the equation of a line:
`y = mx + c`, where m is the slope and c is the intercept.

Hypothesize that there are constants `ɑ` (alpha) and `β` (beta) such that: <br>

`y_i = β * x_i + ɑ + ɛ_i`

- y_i = number of minutes user (i) spends on the site daily <br>
- x_i = number of friends of user (i) <br>
- β (beta) = slope = how many extra minutes a user spends for each additional friend <br>
- ɑ (alpha) = intercept = how much time a user would spend even with zero friends <br>
- ɛ_i = error term representing the fact that there are other factors not accounted for by this simple model <br>

In [1]:
# Assuming we have determined such an alpha and beta, we can make predictions with:

def predict(alpha: float, beta: float, x_i: float) -> float:
    return beta * x_i + alpha

#### How Do We Pick Alpha and Beta?

We choose the values that make our predictions closest to the real data.

In [2]:
def error(alpha:float, beta:float,x_i:float,y_i:float)-> float:
    return predict(alpha,beta,x_i)-y_i         # error =  Predicted_Value - Actual_Value

We need to know the total error of the entire dataset. But we can't just add the errors. If the prediction for x_1 is too high and the prediction for x_2 is too low, the errors may just cancel out.

So instead we add up the squared errors, which are never negative and therefore cannot cancel:

In [3]:
from typing import List
Vector = List[float]
def sum_of_sqerrors(alpha:float, beta:float, x:Vector, y:Vector)-> float:
    return sum(error(alpha,beta,x_i,y_i)**2
               for x_i,y_i in zip(x,y))
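
To see why the raw errors cannot simply be added, here is a small sketch with made-up numbers. The two data points and the parameter values are illustrative assumptions, not part of the dataset. The functions above are repeated so the cell runs on its own.

```python
from typing import List

Vector = List[float]

def predict(alpha: float, beta: float, x_i: float) -> float:
    return beta * x_i + alpha

def error(alpha: float, beta: float, x_i: float, y_i: float) -> float:
    return predict(alpha, beta, x_i) - y_i   # predicted - actual

def sum_of_sqerrors(alpha: float, beta: float, x: Vector, y: Vector) -> float:
    return sum(error(alpha, beta, x_i, y_i) ** 2 for x_i, y_i in zip(x, y))

# Hypothetical data: the line y = 2x + 1 predicts 2 too high at the
# first point and 2 too low at the second.
x = [1.0, 2.0]
y = [1.0, 7.0]
alpha, beta = 1.0, 2.0

errors = [error(alpha, beta, x_i, y_i) for x_i, y_i in zip(x, y)]
raw_total = sum(errors)                              # the errors cancel
squared_total = sum_of_sqerrors(alpha, beta, x, y)   # the squares do not

print(errors, raw_total, squared_total)   # [2.0, -2.0] 0.0 8.0
```

The raw total of zero would suggest a perfect fit, while the squared total shows that both predictions are off.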

#### The code below is from Ch-5 (Statistics)

In [5]:
import math
# Helpers for covariance and correlation
# (for covariance, both variables must have the same number of elements)
def mean(xs: List[float])-> float:
    return sum(xs)/len(xs)

def de_mean(xs:List[float]) -> List[float]:
    x_bar = mean(xs)
    return[x-x_bar for x in xs]


def sum_of_squares(xs:List[float])-> float:
    return sum(x_i*x_i for x_i in xs)

def variance(xs:List[float])->float:
    assert len(xs) >= 2 , "variance requires at least two elements"

    n = len(xs)
    deviations = de_mean(xs)
    return(sum_of_squares(deviations)/(n-1))

def standard_deviation(xs:List[float])-> float:
    ''' The standard deviation is the square root of the variance '''
    return math.sqrt(variance(xs))

def dot(xs:List[float],ys:List[float]):
    return sum(x_i*y_i for x_i,y_i in zip(xs,ys))

def covariance(xs:List[float],ys:List[float])-> float:
    assert len(xs) == len(ys), "xs and ys must have same number of elements"
    n = len(xs)
    return dot(de_mean(xs), de_mean(ys)) / (n-1)


# Correlation
def correlation(xs:List[float], ys:List[float])->float:
    ''' Measures how much xs and ys vary in tandem about their means '''
    stdev_x = standard_deviation(xs)
    stdev_y = standard_deviation(ys)
    if stdev_x>0 and stdev_y>0:
        return covariance(xs,ys) / stdev_x / stdev_y
    else:
        return 0                     # if no variation, correlation is zero 

Using Calculus, the error-minimizing (alpha) and (beta) are given by:

In [6]:
from typing import Tuple, List

Vector = List[float]

def least_squares_fit(x:Vector, y:Vector)-> Tuple[float,float]:
    """
    Given two vectors x and y, 
    find the least-squares values of alpha and beta
    """
    beta = correlation(x,y) * standard_deviation(y) / standard_deviation(x)
    alpha = mean(y) - beta * mean(x)
    return alpha,beta


# Quick test
x = [i for i in range(-100,110,10)]
y = [3*i-5 for i in x]

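# Should find that y = 3x - 5 (compared with a tolerance, since the fit uses floats)
alpha, beta = least_squares_fit(x, y)
assert abs(alpha - (-5)) < 1e-9 and abs(beta - 3) < 1e-9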
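
For reference, the closed form used in `least_squares_fit` follows from setting both partial derivatives of the sum of squared errors to zero. The sketch below introduces its own notation for illustration: x̄ and ȳ are the means, s_x and s_y the standard deviations, and r the correlation.

```latex
\begin{aligned}
S(\alpha, \beta) &= \sum_i (\beta x_i + \alpha - y_i)^2 \\
\frac{\partial S}{\partial \alpha} &= 2 \sum_i (\beta x_i + \alpha - y_i) = 0
  \;\Rightarrow\; \alpha = \bar{y} - \beta \bar{x} \\
\frac{\partial S}{\partial \beta} &= 2 \sum_i (\beta x_i + \alpha - y_i)\, x_i = 0
  \;\Rightarrow\; \beta = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
  = \frac{\operatorname{cov}(x, y)}{s_x^2} = r \, \frac{s_y}{s_x}
\end{aligned}
```

The last equality uses r = cov(x, y) / (s_x s_y), which is how `correlation` is defined in the statistics code.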

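R² (the coefficient of determination) tells you how well your line explains the data. <br>

R² = 1.0 − (Unexplained Variation) / (Total Variation)  <br>

- Unexplained Variation = the sum of squared errors of the model's predictions
- Total Variation = the sum of squared deviations of the y_i's from their mean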


In [None]:
def total_sum_of_squares(y:Vector) -> float:
    """ The total squared variation of y_i's from their mean """
    return sum(v**2 for v in de_mean(y))

def r_squared(alpha:float, beta:float, x:Vector, y:Vector)-> float:
    """ The fraction of variation in y captured by the model, which equals
    1 - the fraction of variation in y not captured by the model """

    return 1.0 - (sum_of_sqerrors(alpha,beta,x,y) / total_sum_of_squares(y))
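
As a rough check on how R² behaves at its two extremes, here is a self-contained sketch. The four data points are made up for illustration, and the helper functions are compact restatements of the ones above rather than the originals.

```python
from typing import List

Vector = List[float]

def sum_of_sqerrors(alpha: float, beta: float, x: Vector, y: Vector) -> float:
    # Squared difference between prediction and actual value, summed over the data
    return sum((beta * x_i + alpha - y_i) ** 2 for x_i, y_i in zip(x, y))

def total_sum_of_squares(y: Vector) -> float:
    y_bar = sum(y) / len(y)
    return sum((y_i - y_bar) ** 2 for y_i in y)

def r_squared(alpha: float, beta: float, x: Vector, y: Vector) -> float:
    return 1.0 - sum_of_sqerrors(alpha, beta, x, y) / total_sum_of_squares(y)

# Hypothetical data lying exactly on y = 2x + 1
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

perfect = r_squared(1.0, 2.0, x, y)     # the true line: no unexplained variation
mean_only = r_squared(4.0, 0.0, x, y)   # always predict mean(y) = 4: explains nothing

print(perfect, mean_only)   # 1.0 0.0
```

A model that always predicts the mean of y has an R² of zero, so R² can be read as the improvement over that baseline.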

#### Using Gradient Descent to find alpha and beta

In [11]:
import random, tqdm

num_friends_good = list(range(0, 100, 5))
daily_minutes_good = [0.9*x + 23 for x in num_friends_good]


def scalar_multiplication(c:float, v:Vector)-> Vector:
    return[c*x for x in v]

def add(v:Vector, w:Vector)-> Vector:
    return[v_i+w_i for v_i,w_i in zip(v,w)]

def gradient_step(v:Vector, gradient:Vector, step_size: float) -> Vector:
    """ Moves 'step size' in the `gradient` direction from `v` """
    assert len(v) == len(gradient)
    step = scalar_multiplication(step_size, gradient)
    return add(v,step)
 
num_epochs = 10000
random.seed(0) 
 
guess = [random.random(), random.random()]  # choose random value to start 
 
learning_rate = 0.00001 
 
with tqdm.trange(num_epochs) as t: 
    for _ in t: 
        alpha, beta = guess 
 
        # Partial derivative of loss with respect to alpha 
        grad_a = sum(2 * error(alpha, beta, x_i, y_i) 
                     for x_i, y_i in zip(num_friends_good, 
                                         daily_minutes_good)) 
 
        # Partial derivative of loss with respect to beta 
        grad_b = sum(2 * error(alpha, beta, x_i, y_i) * x_i 
                     for x_i, y_i in zip(num_friends_good, 
                                         daily_minutes_good)) 
 
        # Compute loss to stick in the tqdm description 
        loss = sum_of_sqerrors(alpha, beta, 
                               num_friends_good, daily_minutes_good) 
        t.set_description(f"loss: {loss:.3f}") 
 
        # Finally, update the guess 
        guess = gradient_step(guess, [grad_a, grad_b], -learning_rate) 
 
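# The values used to generate the data are alpha = 23 and beta = 0.9. With this
# learning rate, 10,000 epochs only get part of the way there; more epochs get closer.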
alpha, beta = guess
print(f"alpha = {alpha} and beta = {beta}")


loss: 306.684: 100%|██████████| 10000/10000 [00:05<00:00, 1720.50it/s]


alpha = 15.45392345798616 and beta = 1.0161036071293024
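
The run above stops at alpha ≈ 15.45, still some distance from the values used to generate the data (alpha = 23, beta = 0.9). The sketch below assumes the same data, seed, starting guess, and learning rate. The helper `fit_by_gradient_descent` is hypothetical (it is not part of the original code) and repeats the same update without the progress bar, so the effect of running more epochs can be checked.

```python
import random
from typing import List, Tuple

Vector = List[float]

def fit_by_gradient_descent(x: Vector, y: Vector, num_epochs: int,
                            learning_rate: float = 0.00001,
                            seed: int = 0) -> Tuple[float, float]:
    """Hypothetical helper: plain gradient descent on the sum of squared errors."""
    random.seed(seed)
    alpha, beta = random.random(), random.random()   # random starting guess
    for _ in range(num_epochs):
        # error = predicted - actual, for every data point
        errors = [beta * x_i + alpha - y_i for x_i, y_i in zip(x, y)]
        grad_a = sum(2 * e for e in errors)                      # d(loss)/d(alpha)
        grad_b = sum(2 * e * x_i for e, x_i in zip(errors, x))   # d(loss)/d(beta)
        # Step against the gradient to reduce the loss
        alpha -= learning_rate * grad_a
        beta -= learning_rate * grad_b
    return alpha, beta

num_friends_good = list(range(0, 100, 5))
daily_minutes_good = [0.9 * x + 23 for x in num_friends_good]

alpha, beta = fit_by_gradient_descent(num_friends_good, daily_minutes_good,
                                      num_epochs=100_000)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

With ten times as many epochs, the estimates should land close to alpha = 23 and beta = 0.9. The slow convergence in alpha is a property of this small learning rate on unscaled data, not of the model.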


## Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation is a method used to estimate model parameters by choosing values that make the observed data most probable.

In short:
Pick parameters that best explain the data.

## Steps in Maximum Likelihood Estimation (MLE)

### 1. Choose a Model
Select a mathematical model that describes the relationship between variables.

Example:
y = α + βx + ε

---

### 2. Assume a Distribution for Errors
Assume how the random errors behave.  
Most commonly, errors are assumed to follow a normal distribution:

ε ~ Normal(0, σ)

- Mean = 0 → no bias  
- σ → standard deviation, the typical size of errors  

---

### 3. Build the Likelihood Function
Construct a function that measures how probable the observed data is given the parameters.

L(θ) = P(data | θ)

For independent observations, the total likelihood is the product of individual probabilities.
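
For the linear model of step 1 with the normal errors of step 2, that product can be written out explicitly. This is a sketch that assumes σ denotes the standard deviation of the errors and that the n observations are independent. Each factor is a normal probability density:

```latex
\begin{aligned}
L(\alpha, \beta, \sigma)
  &= \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
     \exp\!\left( -\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2} \right) \\
\log L(\alpha, \beta, \sigma)
  &= -n \log\!\left(\sigma\sqrt{2\pi}\right)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2
\end{aligned}
```

Taking the logarithm turns the product into a sum, and only the final sum depends on α and β.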

---

### 4. Maximize the Likelihood
Find the parameter values that make the observed data most probable.

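When errors are normally distributed, maximizing the likelihood with respect to α and β is equivalent to minimizing the sum of squared errors (SSE), so the least-squares fit is also the maximum likelihood estimate under this assumption.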
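
A quick numerical check of this equivalence, as a sketch under stated assumptions: the data points are made up, σ is held fixed at an arbitrary value of 1.0, and `log_likelihood` is a hypothetical helper written for this example. `least_squares_fit` here is a compact restatement of the closed form, not the original function.

```python
import math
from typing import List, Tuple

Vector = List[float]

def log_likelihood(alpha: float, beta: float, sigma: float,
                   x: Vector, y: Vector) -> float:
    """Log-likelihood of the data under y_i = alpha + beta * x_i + Normal(0, sigma) errors."""
    n = len(x)
    sse = sum((y_i - alpha - beta * x_i) ** 2 for x_i, y_i in zip(x, y))
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - sse / (2 * sigma ** 2)

def least_squares_fit(x: Vector, y: Vector) -> Tuple[float, float]:
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    beta = (sum((x_i - x_bar) * (y_i - y_bar) for x_i, y_i in zip(x, y))
            / sum((x_i - x_bar) ** 2 for x_i in x))
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Hypothetical noisy data, roughly following y = 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sigma = 1.0   # assumed and held fixed for the comparison

alpha_hat, beta_hat = least_squares_fit(x, y)
best = log_likelihood(alpha_hat, beta_hat, sigma, x, y)

# Nudge each parameter away from the least-squares values and compare
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
all_lower = all(
    log_likelihood(alpha_hat + d_a, beta_hat + d_b, sigma, x, y) < best
    for d_a, d_b in nudges
)
print(all_lower)   # True
```

Every nudge away from the least-squares parameters lowers the log-likelihood, which is what the equivalence predicts.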