# Deriving the Least Squares Solution for Linear Regression

Simple linear regression fits a line `y = a * x + b` to data by minimizing the sum of squared differences between predicted and actual values. The first cell sets up a small dataset and the sums used throughout the derivation.

In [9]:
# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
N = len(x)

# Sums used in the closed-form solution
S_x = sum(x)
S_y = sum(y)
S_xx = sum(xi**2 for xi in x)
S_xy = sum(xi * yi for xi, yi in zip(x, y))

# Least-squares slope (a) and intercept (b)
a = (N * S_xy - S_x * S_y) / (N * S_xx - S_x**2)
b = (S_y - a * S_x) / N


## Cost Function

We define the cost function \( E \) as:

    E = Σ (y_i - (a * x_i + b))²
    for i = 1 to N

This is the sum of squared errors between the predicted values `a * x_i + b` and the actual values `y_i`.

---

## Step 1: Compute the Partial Derivatives

To minimize the error, we compute the partial derivatives of \( E \) with respect to the parameters \( a \) and \( b \), applying the chain rule to each squared term.

**Partial derivative with respect to a:**

    ∂E/∂a = -2 * Σ [x_i * (y_i - (a * x_i + b))]

**Partial derivative with respect to b:**

    ∂E/∂b = -2 * Σ [y_i - (a * x_i + b)]
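
One way to gain confidence in these two formulas is to compare them with a numerical approximation. The sketch below is not part of the original notebook: the helper names `cost`, `grad`, and `numeric_grad`, the step size `h`, and the test point `(a0, b0)` are all assumptions chosen for illustration.

```python
# Data from the example in this notebook
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def cost(a, b):
    """Sum of squared errors E for the line y = a * x + b."""
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))

def grad(a, b):
    """Analytic partial derivatives (dE/da, dE/db) from the formulas above."""
    dE_da = -2 * sum(xi * (yi - (a * xi + b)) for xi, yi in zip(x, y))
    dE_db = -2 * sum(yi - (a * xi + b) for xi, yi in zip(x, y))
    return dE_da, dE_db

def numeric_grad(a, b, h=1e-6):
    """Central-difference approximation of the same two derivatives."""
    dE_da = (cost(a + h, b) - cost(a - h, b)) / (2 * h)
    dE_db = (cost(a, b + h) - cost(a, b - h)) / (2 * h)
    return dE_da, dE_db

# Arbitrary test point (hypothetical values, chosen only for illustration)
a0, b0 = 0.3, 1.5
analytic = grad(a0, b0)
numeric = numeric_grad(a0, b0)
print(analytic)
print(numeric)
```

Because \( E \) is quadratic in `a` and `b`, the central difference should agree with the analytic derivative up to floating-point rounding, so the two printed pairs should match to several decimal places.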

---

## Setting Derivatives to Zero

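Because \( E \) is a sum of squares of expressions that are linear in \( a \) and \( b \), it is a convex quadratic, so its minimum lies where both partial derivatives are zero: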

    ∂E/∂a = 0
    ∂E/∂b = 0


In [4]:
# Cost Function
E = sum((y[i] - (a * x[i] + b))**2 for i in range(N))


In [5]:
E

13.75

In [6]:
dE_da = -2 * sum(x[i] * (y[i] - (a * x[i] + b)) for i in range(N))
dE_db = -2 * sum((y[i] - (a * x[i] + b)) for i in range(N))


In [7]:
dE_da
dE_db


-15.0
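
The outputs 13.75 and -15.0 shown in these cells are what the expressions give at the trial values `a = 0.5`, `b = 1.0`. Those trial values are an assumption, since the cell that set them does not appear here. A minimal sketch comparing that point with the least-squares coefficients `(0.6, 2.2)` computed in this notebook, using a hypothetical helper `cost_and_gradient`:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
N = len(x)

def cost_and_gradient(a, b):
    """Return (E, dE/da, dE/db) for the line y = a * x + b."""
    residuals = [y[i] - (a * x[i] + b) for i in range(N)]
    E = sum(r**2 for r in residuals)
    dE_da = -2 * sum(x[i] * residuals[i] for i in range(N))
    dE_db = -2 * sum(residuals)
    return E, dE_da, dE_db

# Assumed trial values (not given in the original cells)
E_trial, da_trial, db_trial = cost_and_gradient(0.5, 1.0)

# Least-squares coefficients computed in this notebook
E_opt, da_opt, db_opt = cost_and_gradient(0.6, 2.2)

print(E_trial, da_trial, db_trial)
print(E_opt, da_opt, db_opt)
```

At the trial point this gives `E = 13.75`, `dE_da = -47.0`, and `dE_db = -15.0`. At the least-squares coefficients the cost drops to about 2.4 and both derivatives vanish up to floating-point rounding.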

## Step 2: Set the Derivatives to Zero

To minimize the cost function, we set both partial derivatives to zero:

    ∂E/∂a = -2 * Σ [x_i * (y_i - a * x_i - b)] = 0
    ∂E/∂b = -2 * Σ [y_i - a * x_i - b] = 0

Dividing both equations by -2 leaves the two conditions that `a` and `b` must satisfy:

    Σ [x_i * (y_i - a * x_i - b)] = 0
    Σ [y_i - a * x_i - b] = 0

In terms of the Python lists `x` and `y`, the same conditions read:

    sum(x[i] * (y[i] - a * x[i] - b) for i in range(N)) = 0
    sum(y[i] - a * x[i] - b for i in range(N)) = 0


## Step 3: Rewriting the Equations Using Summation Shorthand

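Let:
- S_x  = Σ x_i (sum of all x values)
- S_y  = Σ y_i (sum of all y values)
- S_xx = Σ x_i² (sum of the squared x values)
- S_xy = Σ x_i * y_i (sum of the products x * y)
- N    = number of data points

Distributing each sum over its terms, and using Σ b = N * b, gives the following system of equations: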

    Equation 1: S_xy - a * S_xx - b * S_x = 0  
    Equation 2: S_y - a * S_x - b * N = 0
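
The two equations form a 2x2 linear system in `a` and `b`, so they can also be solved directly. The sketch below uses Cramer's rule, which is a different route from the substitution used in this notebook and is shown only as a cross-check. The names `det`, `eq1`, and `eq2` are assumptions introduced for illustration.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
N = len(x)
S_x = sum(x)
S_y = sum(y)
S_xx = sum(xi**2 for xi in x)
S_xy = sum(xi * yi for xi, yi in zip(x, y))

# Rearranged system:
#   S_xx * a + S_x * b = S_xy   (Equation 1)
#   S_x  * a + N   * b = S_y    (Equation 2)
det = S_xx * N - S_x * S_x            # determinant of the coefficient matrix
a = (S_xy * N - S_x * S_y) / det      # Cramer's rule: replace the first column
b = (S_xx * S_y - S_x * S_xy) / det   # Cramer's rule: replace the second column

# Left-hand sides of Equation 1 and Equation 2; both should be close to zero
eq1 = S_xy - a * S_xx - b * S_x
eq2 = S_y - a * S_x - b * N
print(a, b, eq1, eq2)
```

The expression for `a` obtained this way has the same numerator and denominator as the closed-form slope, which is a useful consistency check on the algebra.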


## Step 4: Solve the System of Equations

We now solve the two equations for `a` and `b`: first solve Equation (2) for `b`, then substitute the result into Equation (1).

From Equation (2):

    b = (S_y - a * S_x) / N

Substitute this expression for `b` into Equation (1):

    S_xy - a * S_xx - [(S_y - a * S_x) / N] * S_x = 0

Multiply the entire equation by `N` to eliminate the denominator:

    N * S_xy - a * N * S_xx - S_x * (S_y - a * S_x) = 0

Distribute the terms:

    N * S_xy - a * N * S_xx - S_x * S_y + a * S_x^2 = 0

Group the terms containing `a`:

    a * (N * S_xx - S_x^2) = N * S_xy - S_x * S_y

Solve for `a`:

    a = (N * S_xy - S_x * S_y) / (N * S_xx - S_x^2)
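
To check the algebra without floating-point rounding, the same formulas can be evaluated with exact rational arithmetic. This is a minimal sketch using Python's standard `fractions` module on the notebook's data. The variable `check` is an assumption introduced here to hold the left-hand side of the substituted equation.

```python
from fractions import Fraction

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
N = len(x)
S_x, S_y = sum(x), sum(y)
S_xx = sum(xi**2 for xi in x)
S_xy = sum(xi * yi for xi, yi in zip(x, y))

# Exact rational arithmetic avoids floating-point rounding
a = Fraction(N * S_xy - S_x * S_y, N * S_xx - S_x**2)
b = (S_y - a * S_x) / N

# Plug the result back into the substituted form of Equation (1)
check = S_xy - a * S_xx - (S_y - a * S_x) / N * S_x
print(a, b, check)  # prints: 3/5 11/5 0
```

Here the numerator is 5 * 66 - 15 * 20 = 30 and the denominator is 5 * 55 - 15² = 50, so `a = 3/5` and `b = 11/5`, matching the floating-point values 0.6 and 2.2.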


In [10]:
b = (S_y - a * S_x) / N

In [11]:
b

2.2

In [12]:
a = (N * S_xy - S_x * S_y) / (N * S_xx - S_x**2)

In [13]:
a

0.6

## Final Result (Closed-form Solution)

The closed-form solution for the linear regression coefficients `a` and `b` is as follows.

The slope `a` is:

    a = [N * Σ(x_i * y_i) - Σx_i * Σy_i] / [N * Σ(x_i²) - (Σx_i)²]

The intercept `b` is:

    b = [Σy_i - a * Σx_i] / N
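
The two expressions can be wrapped in a small reusable function. The sketch below is one possible packaging, not the notebook's own code: the name `fit_line` and its error handling are assumptions. The zero-denominator check reflects the fact that `N * Σ(x_i²) - (Σx_i)²` is zero when all x values are equal, in which case the slope is undefined.

```python
def fit_line(x, y):
    """Return (a, b) minimizing the sum of (y_i - (a * x_i + b))**2."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    N = len(x)
    S_x = sum(x)
    S_y = sum(y)
    S_xx = sum(xi**2 for xi in x)
    S_xy = sum(xi * yi for xi, yi in zip(x, y))

    denom = N * S_xx - S_x**2
    if denom == 0:
        # Exact test; for float inputs a tolerance may be more appropriate
        raise ValueError("slope is undefined: need at least two distinct x values")

    a = (N * S_xy - S_x * S_y) / denom
    b = (S_y - a * S_x) / N
    return a, b

a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b)
```

Called on the notebook's data, this should reproduce the slope and intercept computed in the cells that follow.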


In [14]:
# Compute coefficients
a = (N * S_xy - S_x * S_y) / (N * S_xx - S_x**2)
b = (S_y - a * S_x) / N

In [15]:
a,b

(0.6, 2.2)