# Linear Regression

## Definition

Linear regression is a statistical method used to model the relationship between a dependent (target) variable and one or more independent variables by fitting a linear equation to observed data.

## When to Use

Linear regression is a simple model that's suitable when:

- The relationship between the variables is linear.
- You want to predict a continuous outcome.
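- The data meets the remaining assumptions of linear regression: independent errors with roughly constant variance and approximately normally distributed residuals.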
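A quick first check of the linearity condition is the Pearson correlation coefficient. The sketch below uses made-up sample data (the arrays `x` and `y` are hypothetical and not taken from this notebook) and only illustrates the idea. A value close to 1 or -1 suggests a linear relationship, although a scatter plot of the data is usually a more reliable guide.

```python
import numpy as np

# Hypothetical sample data, used only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.corrcoef returns a 2x2 correlation matrix;
# the off-diagonal entry is the correlation between x and y
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r: {r:.3f}")
```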

## Building a Model

Here we build linear regression models from scratch with NumPy so that the underlying math is easy to follow.

### Simple Linear Regression

For a simple linear regression model (one independent variable), the equation of the line is:

$$ y = mx+b $$

Where:

- y = dependent variable (target)
- x = independent variable (feature)
- m = slope (coefficient)
- b = y-intercept

The slope (m) and y-intercept (b) can be calculated using the formulas:

$$ m = \frac{N\sum(xy) - \sum(x)\sum(y)}{N\sum(x^2) - (\sum(x))^2} $$

$$ b = \frac{\sum(y) - m\sum(x)}{N} $$

Where:

- $N$ = number of data points
- $\sum(xy)$ = sum of the product of x and y
- $\sum(x)$ = sum of x
- $\sum(y)$ = sum of y
- $\sum(x^2)$ = sum of squared x
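For context, these are the standard least-squares formulas. This section does not name the fitting criterion, so treat the following as a brief sketch under that assumption. Writing the data points as $(x_i, y_i)$, the line is chosen to minimize the sum of squared residuals:

$$ S(m, b) = \sum_{i=1}^{N} (y_i - (mx_i + b))^2 $$

Setting both partial derivatives to zero gives two linear equations in m and b:

$$ \frac{\partial S}{\partial b} = 0 \;\Rightarrow\; \sum(y) = m\sum(x) + Nb $$

$$ \frac{\partial S}{\partial m} = 0 \;\Rightarrow\; \sum(xy) = m\sum(x^2) + b\sum(x) $$

Solving the first equation for b and substituting it into the second yields the expressions for m and b given above.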

In [50]:
import numpy as np

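def simple_linear_regression(x, y):
    # accept plain lists as well as arrays, and work in floating point
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    N = len(x)
    sum_x = np.sum(x)
    sum_y = np.sum(y)
    sum_xy = np.sum(x * y)
    sum_x2 = np.sum(x ** 2)

    # closed-form estimates of the slope and intercept
    m = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / N

    return m, b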

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

m, b = simple_linear_regression(x, y)
print(f"Slope (m): {m}")
print(f"Intercept (b): {b}")

def predict_simple_linear_regression(x_new, m, b):
    # evaluate the fitted line y = m * x + b at x_new
    return m * x_new + b

x_new = 34
y_pred = predict_simple_linear_regression(x_new, m, b)
print(f"Predicted value for x = {x_new}: {y_pred}")

Slope (m): 2.0
Intercept (b): 0.0
Predicted value for x = 34: 68.0
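As a sanity check, the slope and intercept computed by hand can be compared against NumPy's built-in `np.polyfit`, which fits a degree-1 polynomial by least squares. This is a cross-check sketch rather than part of the walkthrough itself; the names `slope` and `intercept` are chosen here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Degree-1 polynomial fit; coefficients come back highest power first,
# so the slope precedes the intercept
slope, intercept = np.polyfit(x, y, deg=1)
print(f"Slope: {slope:.4f}")
print(f"Intercept: {intercept:.4f}")
```

Up to floating-point rounding, both values should agree with the results of `simple_linear_regression`.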


### Multiple Linear Regression

For multiple linear regression (more than one independent variable), the coefficients are found by solving a system of linear equations, known as the normal equations, in matrix form. The closed-form solution is:

$$ \beta = (X^TX)^{-1}X^Ty $$

Where:

- $\beta$ = vector of coefficients
- $X$ = matrix of independent variables (with a column of 1s for the intercept)
- $y$ = vector of dependent variable values
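A brief sketch of where this formula comes from, assuming the usual least-squares criterion (which this section does not state explicitly): the coefficients minimize the squared norm of the residual vector,

$$ S(\beta) = \lVert y - X\beta \rVert^2 $$

Setting the gradient with respect to $\beta$ to zero gives the normal equations:

$$ -2X^T(y - X\beta) = 0 \;\Rightarrow\; X^TX\beta = X^Ty $$

Multiplying both sides by $(X^TX)^{-1}$ yields the closed form above, which requires $X^TX$ to be invertible.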

In [66]:
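def multiple_linear_regression(X, y):
    # prepend a column of 1s so the first coefficient is the intercept
    ones = np.ones(X.shape[0])
    X = np.column_stack([ones, X])

    # normal equation: beta = (X^T X)^{-1} X^T y
    XTX = X.T.dot(X)
    XTX_inv = np.linalg.inv(XTX)  # use np.linalg.pinv if XTX is singular
    beta = XTX_inv.dot(X.T).dot(y)

    # split into the intercept and the per-feature coefficients
    return beta[0], beta[1:]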

def predict_multiple_linear_regression(x_new, intercept, coefficients):
    # ensure x_new is a numpy array
    x_new = np.array(x_new)
    
    y_pred = intercept + np.dot(coefficients, x_new)
    return y_pred

X = np.array([[1, 5], [10, 4], [2, 9], [8, 7], [3, 6]])
y = np.array([1, 2, 3, 4, 5])

intercept, coefficients = multiple_linear_regression(X, y)
print(f"Intercept: {intercept}")
print(f"Coefficients: {coefficients}")

x_new = [3, 4]
y_pred = predict_multiple_linear_regression(x_new, intercept, coefficients)
print(f"Predicted value for x = {x_new}: {y_pred}")

Intercept: -0.1870412553783969
Coefficients: [0.11212351 0.42723361]
Predicted value for x = [3, 4]: 1.8582637307010805
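Explicitly inverting $X^TX$ can be numerically fragile when features are strongly correlated. One possible alternative, sketched here as an option rather than as the method this notebook uses, is `np.linalg.lstsq`, which solves the same least-squares problem without forming the inverse. The name `X_design` is introduced here for illustration.

```python
import numpy as np

X = np.array([[1, 5], [10, 4], [2, 9], [8, 7], [3, 6]])
y = np.array([1, 2, 3, 4, 5])

# Prepend a column of ones so the first coefficient is the intercept
X_design = np.column_stack([np.ones(X.shape[0]), X])

# lstsq minimizes ||X_design @ beta - y||^2 without an explicit matrix inverse
beta, residuals, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)

intercept, coefficients = beta[0], beta[1:]
print(f"Intercept: {intercept}")
print(f"Coefficients: {coefficients}")
```

On this small, well-conditioned data set the result should match the normal-equation implementation above to within floating-point rounding.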


## Example Using Scikit-Learn

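Scikit-Learn's `LinearRegression` class fits the same model in a few lines. Using the same data as in the multiple linear regression example, it produces the same prediction.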

In [77]:
from sklearn.linear_model import LinearRegression

X = np.array([[1, 5], [10, 4], [2, 9], [8, 7], [3, 6]])
y = np.array([1, 2, 3, 4, 5])

linear_regression = LinearRegression()
linear_regression.fit(X, y)

x_new = np.array([3, 4]).reshape(1, -1)  # use reshape(1, -1) to ensure x_new is a 2D array with 1 sample and 2 features

y_pred = linear_regression.predict(x_new)
print(f"Predicted value for x = {x_new}: {y_pred}")

Predicted value for x = [[3 4]]: [1.85826373]
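To compare the fitted parameters themselves, and not only the prediction, the model exposes them as the `intercept_` and `coef_` attributes after `fit` has been called. A minimal sketch on the same data, with the variable name `model` chosen here for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 5], [10, 4], [2, 9], [8, 7], [3, 6]])
y = np.array([1, 2, 3, 4, 5])

model = LinearRegression().fit(X, y)

# Fitted parameters are available as attributes once fit() has run
print(f"Intercept: {model.intercept_}")
print(f"Coefficients: {model.coef_}")
```

These values should agree with the intercept and coefficients returned by `multiple_linear_regression` up to floating-point rounding.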
