# Linear Regression
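Linear regression is one of the most basic supervised learning algorithms: it models a target as a linear function of one or more inputs.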

Two Types:
1. Simple Linear Regression (SLR)
2. Multiple Linear Regression (MLR)

In [2]:
import numpy as np
import pandas as pd

## 1. Simple Linear Regression
In simple linear regression, we model the relationship between two variables by fitting a straight line to the data. The equation of the line is given by:

$${ y = \beta_0 + \beta_1 x }$$

- ${ y }$ is the dependent variable (target).
- ${ x }$ is the independent variable (predictor).
- ${ \beta_0 }$ is the y-intercept (the value of ${ y }$ when ${ x = 0 }$).
- ${ \beta_1 }$ is the slope of the line (the change in ${ y }$ for a one-unit change in ${ x }$).
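As a quick illustration of how the intercept and slope behave, here is a minimal sketch. The coefficient values and the `predict_line` helper are hypothetical, chosen only for this example.

```python
# Hypothetical coefficients, not fitted to any data
beta_0 = 1.0  # intercept
beta_1 = 2.0  # slope

def predict_line(x, b0=beta_0, b1=beta_1):
    """Evaluate y = b0 + b1 * x."""
    return b0 + b1 * x

at_zero = predict_line(0)                        # equals the intercept
unit_change = predict_line(4) - predict_line(3)  # equals the slope
print(at_zero, unit_change)
```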

In [5]:
# Example dataset: score obtained after studying for the given number of hours
hours = [1, 2, 3, 4, 5]
scores = [1.5, 3.2, 4.8, 8.5, 11.5]

### Simplest Algorithm

In [6]:
# Independent and dependent variables
x = np.array(hours)
y = np.array(scores)

In [14]:
# Initialize parameters

b0 = 0  # Intercept
b1 = 0  # Slope

learning_rate = 0.01
iterations = 1000

In [15]:
print("Before: b0 =", b0, "and b1 =", b1)

Before: b0 = 0 and b1 = 0


In [16]:
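# Train with per-sample gradient descent: the parameters are updated after every sample

m = len(y)  # number of samples
for _ in range(iterations): # epochs
    for i in range(m):
        # Calculate the prediction
        predicted = b0 + b1 * x[i]

        # Calculate the error
        error = predicted - y[i]

        # Update the weights (gradient of 0.5 * error**2 with respect to b0 and b1)
        b0 -= learning_rate * error
        b1 -= learning_rate * error * x[i]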
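
The update rule in the loop above follows from the squared error of a single sample. As a short derivation (the symbols ${ \hat{y}_i }$, ${ L_i }$, and ${ \alpha }$ for the learning rate are notation introduced here, not names from the notebook):

$${ L_i = \tfrac{1}{2} (\hat{y}_i - y_i)^2, \qquad \hat{y}_i = \beta_0 + \beta_1 x_i }$$

$${ \frac{\partial L_i}{\partial \beta_0} = \hat{y}_i - y_i, \qquad \frac{\partial L_i}{\partial \beta_1} = (\hat{y}_i - y_i) \, x_i }$$

$${ \beta_0 \leftarrow \beta_0 - \alpha \, (\hat{y}_i - y_i), \qquad \beta_1 \leftarrow \beta_1 - \alpha \, (\hat{y}_i - y_i) \, x_i }$$

Because the parameters change after every sample rather than once per pass over the data, this is stochastic (per-sample) gradient descent.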

In [17]:
print("After: b0 =", b0, "and b1 =", b1)

After: b0 = -1.773159591514913 and b1 = 2.573774778247447
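
For comparison, the least-squares line for the same five points can be computed in closed form. The sketch below uses `np.polyfit`, which is not part of the original notebook. Per-sample gradient descent with a fixed learning rate is expected to land near this solution rather than exactly on it.

```python
import numpy as np

hours = np.array([1, 2, 3, 4, 5])
scores = np.array([1.5, 3.2, 4.8, 8.5, 11.5])

# Degree-1 polynomial fit; coefficients are returned highest degree first
slope, intercept = np.polyfit(hours, scores, 1)
print("closed form: b0 =", intercept, "and b1 =", slope)
```

For this dataset the closed-form values are approximately b0 = -1.69 and b1 = 2.53, close to the trained parameters printed above.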


In [20]:
# Predict for one training example and compare with the actual value
predicted = b0 + b1 * x[1]
error = predicted - y[1]
print(f"input={x[1]},\npredicted={predicted}\nactual = {y[1]}\nerror = {error}\nerror %age = {round(abs(error / y[1]) * 100 ,2)}% ")

input=2,
predicted=3.374389964979981
actual = 3.2
error = 0.17438996497998094
error %age = 5.45% 


In [12]:
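def mse(X, y, b0, b1): # cost function: half the mean squared error
  c = len(y) # number of samples

  total_error = 0
  for i in range(c):
    predicted = b0 + b1 * X[i]
    total_error += (predicted - y[i]) ** 2 # squared error for sample i

  # Dividing by 2 * c instead of c is a common convention: the 2 cancels when differentiating
  return total_error / (2 * c)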
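
The loop in the cost function above can also be written without an explicit Python loop. The following is a minimal vectorized sketch that keeps the same `2 * c` scaling; the name `half_mse` is introduced here for illustration and is not part of the notebook.

```python
import numpy as np

def half_mse(x, y, b0, b1):
    """Vectorized cost: sum of squared errors divided by 2 * len(y)."""
    residuals = b0 + b1 * np.asarray(x) - np.asarray(y)
    return np.sum(residuals ** 2) / (2 * len(y))

x = np.array([1, 2, 3, 4, 5])
y = np.array([1.5, 3.2, 4.8, 8.5, 11.5])
print(half_mse(x, y, 0.0, 0.0))  # cost at the initial parameters b0 = 0, b1 = 0
```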


## 2. Multiple Linear Regression
In multiple linear regression, we model the relationship between one dependent variable and multiple independent variables. The equation is given by:

$${ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n }$$

- ${ y }$ is the dependent variable.
- ${ x_1, x_2, \ldots, x_n }$ are the independent variables.
- ${ \beta_0, \beta_1, \beta_2, \ldots, \beta_n }$ are the coefficients.
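
With several predictors, the equation can be evaluated for every observation at once using a matrix product. Here is a minimal sketch; the coefficient values and the small input matrix are hypothetical and serve only to show the mechanics.

```python
import numpy as np

# Hypothetical coefficients for two predictors, not fitted to any data
beta_0 = 0.5                   # intercept
betas = np.array([2.0, -1.0])  # one coefficient per predictor

# Each row is one observation: [x_1, x_2]
X = np.array([[1, 1],
              [2, 1],
              [3, 2]])

# y = beta_0 + beta_1 * x_1 + beta_2 * x_2, evaluated for every row at once
y_hat = beta_0 + X @ betas
print(y_hat)
```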


In [21]:
# sample dataset

hours_studied         = np.array([1,   2,   3,   4,   5   ])
assignments_completed = np.array([1,   1,   2,   2,   3   ])
scores                = np.array([1.5, 3.2, 4.8, 8.5, 11.5])

In [22]:
# input and output
X = np.column_stack([hours_studied, assignments_completed])
Y = scores

In [26]:
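weights = np.ones(X.shape[1])  # one weight per feature, initialized to 1.0
bias = 0.0  # intercept (defined here, but the training loop below never uses or updates it)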

learning_rate = 0.01
iterations = 1000

In [29]:
c = len(Y)  # number of samples
for epoch in range(iterations): # epochs
  for i in range(c):
    # Prediction uses the weights only; no intercept term is added
    predicted = np.dot(X[i], weights)
    error = predicted - Y[i]
    weights -= learning_rate * error * X[i]
  print("iteration =", epoch, " weights = ", weights)

iteration = 0  weights =  [ 2.75627415 -1.05738389]
iteration = 1  weights =  [ 2.75648528 -1.0577451 ]
iteration = 2  weights =  [ 2.75669577 -1.05810522]
iteration = 3  weights =  [ 2.75690564 -1.05846427]
iteration = 4  weights =  [ 2.75711488 -1.05882225]
iteration = 5  weights =  [ 2.75732349 -1.05917916]
iteration = 6  weights =  [ 2.75753149 -1.059535  ]
iteration = 7  weights =  [ 2.75773885 -1.05988977]
iteration = 8  weights =  [ 2.7579456  -1.06024348]
iteration = 9  weights =  [ 2.75815173 -1.06059614]
iteration = 10  weights =  [ 2.75835725 -1.06094774]
iteration = 11  weights =  [ 2.75856215 -1.0612983 ]
iteration = 12  weights =  [ 2.75876644 -1.0616478 ]
iteration = 13  weights =  [ 2.75897011 -1.06199626]
iteration = 14  weights =  [ 2.75917318 -1.06234367]
iteration = 15  weights =  [ 2.75937564 -1.06269005]
iteration = 16  weights =  [ 2.7595775  -1.06303539]
iteration = 17  weights =  [ 2.75977875 -1.0633797 ]
iteration = 18  weights =  [ 2.7599794  -1.06372298]
ite
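
The training loop above updates only `weights`, so the fitted model has no intercept even though `bias` is initialized. A minimal sketch of the same per-sample update with the bias included might look like this. The zero initialization and the `half_mse` helper are choices made for this example, not taken from the notebook.

```python
import numpy as np

hours_studied = np.array([1, 2, 3, 4, 5])
assignments_completed = np.array([1, 1, 2, 2, 3])
scores = np.array([1.5, 3.2, 4.8, 8.5, 11.5])

X = np.column_stack([hours_studied, assignments_completed]).astype(float)
Y = scores

def half_mse(X, Y, weights, bias):
    """Sum of squared errors divided by 2 * number of samples."""
    return np.sum((X @ weights + bias - Y) ** 2) / (2 * len(Y))

weights = np.zeros(X.shape[1])  # assumption: start from zero
bias = 0.0
learning_rate = 0.01
iterations = 1000

cost_before = half_mse(X, Y, weights, bias)
for _ in range(iterations):
    for i in range(len(Y)):
        error = X[i] @ weights + bias - Y[i]
        weights -= learning_rate * error * X[i]
        bias -= learning_rate * error  # the intercept gets its own update
cost_after = half_mse(X, Y, weights, bias)

print("cost before =", cost_before, "cost after =", cost_after)
print("weights =", weights, "bias =", bias)
```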

In [30]:
# test
predicted = np.dot(weights, X[1])
error = predicted - Y[1]
print(f"input={X[1]},\npredicted={predicted}\nactual = {Y[1]}\nerror = {error}\nerror %age = {round(abs(error / Y[1]) * 100 ,2)}% ")

input=[2 1],
predicted=4.474558150142222
actual = 3.2
error = 1.274558150142222
error %age = 39.83% 
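
As a cross-check on the gradient-descent result, the least-squares coefficients can also be computed directly. This sketch uses `np.linalg.lstsq` with an added column of ones for the intercept; neither is part of the original notebook.

```python
import numpy as np

hours_studied = np.array([1, 2, 3, 4, 5])
assignments_completed = np.array([1, 1, 2, 2, 3])
scores = np.array([1.5, 3.2, 4.8, 8.5, 11.5])

# Design matrix with a leading column of ones so the first coefficient is the intercept
A = np.column_stack([np.ones(len(scores)), hours_studied, assignments_completed])

coeffs, residuals, rank, singular_values = np.linalg.lstsq(A, scores, rcond=None)
intercept, w_hours, w_assignments = coeffs

predictions = A @ coeffs
print("intercept =", intercept, "weights =", (w_hours, w_assignments))
print("predictions =", predictions)
```

Comparing `predictions` with `scores` row by row gives a baseline for judging how far the trained weights are from the best linear fit.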
