# Linear Regression

## Regression
Regression fits a line or curve to the data in order to describe the relationship between the variables. The best-fit line or curve is the one that minimizes the distance between the predicted values and the actual values (in ordinary least squares, the sum of the squared differences).
There are independent variables ("the inputs") and a dependent variable ("the output").

## Linear regression
The relationship between the variables is assumed to be linear, so it can be represented by a straight line.

It can be simple or multiple:

### Simple Linear Regression
Only one independent variable is used:
y = mx + q, where m is the slope and q is the intercept.

[YouTube video](https://www.youtube.com/watch?v=zPG4NjIkCjc)
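
As a minimal sketch of what "best fit" means for y = mx + q, the slope and intercept can be computed directly from the standard least-squares formulas. The data is the same sample used in the cells further down. The names `x_mean` and `y_mean` are illustrative only, and this is not necessarily how `LinearRegression` computes the fit internally.

```python
import numpy as np

# Same sample data as in the simple linear regression cells
x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

# Least-squares estimates for y = m*x + q
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
q = y_mean - m * x_mean

print(f"slope m = {m:.4f}, intercept q = {q:.4f}")
```

The slope comes out at 0.54 and the intercept at about 5.63, consistent with the predictions printed by the fitted model.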

### Multiple Linear Regression
Two or more independent variables are used:
y = a + b1 * x1 + b2 * x2 + b3 * x3 + ...

[YouTube video](https://www.youtube.com/watch?v=29rjWClT_3U)

### Polynomial Regression
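The equation is a polynomial in the independent variable. It is still linear in the coefficients, so the same linear model can fit it:
y = b0 + b1 * x + b2 * x^2 + b3 * x^3 + ...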

[YouTube video](https://www.youtube.com/watch?v=QptI-vDle8Y)

In [2]:
# Library import

import numpy as np
from sklearn.linear_model import LinearRegression

In [3]:
# Create data

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1)) # 2D array: one column, one row per sample
y = np.array([5, 20, 14, 32, 22, 38])

In [4]:
# Create the model
model = LinearRegression()

# Calculate the optimal values of the slope m and the intercept q
model.fit(x, y)

In [7]:
# Coefficient of determination or R^2
r_sq = model.score(x, y)
print(r_sq)
print("intercept:", model.intercept_)
print("slope:", model.coef_)


0.7158756137479542
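
For a regressor such as `LinearRegression`, `score` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below recomputes it by hand as a sanity check. It uses `np.polyfit` as a stand-in for the fitted model so that it runs on its own; the names `ss_res`, `ss_tot`, and `r_sq_manual` are illustrative.

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line
m, q = np.polyfit(x, y, 1)
y_hat = m * x + q

ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
r_sq_manual = 1 - ss_res / ss_tot
print(r_sq_manual)
```

The result should agree with the value printed by `model.score(x, y)` up to floating-point rounding.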

In [13]:
# Predict response
y_pred = model.predict(x)
print(y_pred)

x_new = np.arange(5).reshape((-1, 1))
y_new = model.predict(x_new)
print(y_new)

[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]
[5.63333333 6.17333333 6.71333333 7.25333333 7.79333333]
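
For a simple linear model, `predict` amounts to evaluating intercept + slope * x. As a rough illustration, the intercept and slope can be read back from the second printed array, since it holds the predictions for x = 0, 1, 2, 3, 4. The array `y_new_printed` is copied from that output, and the variable names are illustrative.

```python
import numpy as np

# Second array printed above: predictions for x = 0, 1, 2, 3, 4
y_new_printed = np.array([5.63333333, 6.17333333, 6.71333333, 7.25333333, 7.79333333])

intercept = y_new_printed[0]            # prediction at x = 0
slope = np.diff(y_new_printed).mean()   # change in prediction per unit of x

x_new = np.arange(5)
y_manual = intercept + slope * x_new    # same formula predict applies
print(intercept, slope)
print(np.allclose(y_manual, y_new_printed))
```

The same formula evaluated at x = 5 gives roughly 8.33, matching the first entry of `y_pred`.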


# Multiple Linear Regression

In [17]:
# Now x is a two-dimensional array: one row per sample, one column per independent variable

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]])
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])

model = LinearRegression().fit(x, y)

r_sq = model.score(x, y)
print(f"{r_sq}")

y_pred = model.predict(x)
print(f"{y_pred=}")

0.8615939258756776
y_pred=array([ 5.77760476,  8.012953  , 12.73867497, 17.9744479 , 23.97529728,
       29.4660957 , 38.78227633, 41.27265006])
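
The multiple regression fit can be cross-checked without scikit-learn by solving the least-squares problem directly. This is a sketch under the assumption that a column of ones is prepended to `x` so that the first coefficient plays the role of the intercept `a`. The names `X`, `beta`, and `y_pred_manual` are illustrative.

```python
import numpy as np

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]], dtype=float)
y = np.array([4, 5, 20, 14, 32, 22, 38, 43], dtype=float)

# Prepend a column of ones so the first coefficient acts as the intercept a
X = np.column_stack([np.ones(len(x)), x])

# Solve min ||X @ beta - y||^2 for beta = [a, b1, b2]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = beta

y_pred_manual = X @ beta
print(f"a={a:.4f}, b1={b1:.4f}, b2={b2:.4f}")
print(y_pred_manual)
```

The predictions should match the `y_pred` array printed by the model up to rounding.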


# Polynomial Regression

In [19]:
from sklearn.preprocessing import PolynomialFeatures

In [20]:
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([15, 11, 2, 8, 25, 32])

In [23]:
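# degree=2 adds x^2 as a second feature; include_bias=False omits the constant
# column, since LinearRegression fits the intercept itself
transformer = PolynomialFeatures(degree=2, include_bias=False)
transformer.fit(x)
x_ = transformer.transform(x)
x_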

array([[   5.,   25.],
       [  15.,  225.],
       [  25.,  625.],
       [  35., 1225.],
       [  45., 2025.],
       [  55., 3025.]])
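
For a single input column, the transformation above is equivalent to stacking x and x^2 side by side. The sketch below builds the same matrix with plain NumPy; `x_manual` is an illustrative name.

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55], dtype=float).reshape((-1, 1))

# Columns: x and x^2, mirroring degree=2 with include_bias=False
x_manual = np.hstack([x, x ** 2])
print(x_manual)
```

With more than one input column, `PolynomialFeatures` also generates cross terms such as x1 * x2, which this shortcut does not cover.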

In [24]:
model = LinearRegression().fit(x_, y)
r2 = model.score(x_, y)
print(f"{r2=}")
intercept = model.intercept_
print(f"{intercept=}")
coeff = model.coef_
print(f"{coeff}")

r2=0.8908516262498564
intercept=21.37232142857144
[-1.32357143  0.02839286]


In [26]:
y_pred = model.predict(x_)
y_pred

array([15.46428571,  7.90714286,  6.02857143,  9.82857143, 19.30714286,
       34.46428571])
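
As a cross-check, the same degree-2 fit can be reproduced with `np.polyfit`, which should give the same least-squares coefficients as the `PolynomialFeatures` + `LinearRegression` combination. This sketch is for comparison only; note that `np.polyfit` returns coefficients from the highest power down.

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([15, 11, 2, 8, 25, 32], dtype=float)

# Coefficients are ordered [b2, b1, b0] for y = b0 + b1*x + b2*x^2
b2, b1, b0 = np.polyfit(x, y, 2)

y_pred_manual = b0 + b1 * x + b2 * x ** 2
print(f"b0={b0:.5f}, b1={b1:.5f}, b2={b2:.5f}")
print(y_pred_manual)
```

Here `b0` corresponds to `model.intercept_`, and `b1`, `b2` to the two entries of `model.coef_`.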