### What is regression?
Regression searches for relationships among variables.<br>
>The dependent features are called the dependent variables, outputs, or responses.<br>
>The independent features are called the independent variables, inputs, regressors, or predictors.

In regression problems, we look for relationships between the dependent and independent variables and then use them to predict an output from given inputs (the independent variables).

Let us have a set of independent variables 𝐱 = (𝑥₁, …, 𝑥ᵣ)<br>
  where 𝑟 is the number of predictors<br>
  
We find a linear relationship between 𝑦 and 𝐱 using the regression equation: 𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ᵣ𝑥ᵣ + 𝜀.<br>
  where 𝛽₀, 𝛽₁, …, 𝛽ᵣ are the regression coefficients, and 𝜀 is the random error.

Linear regression calculates the estimators of the regression coefficients, denoted 𝑏₀, 𝑏₁, …, 𝑏ᵣ. These are also called the estimated weights.
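
As a minimal sketch of what "calculating the estimators" can look like, the snippet below solves a least-squares problem with NumPy. The data is hypothetical and noise-free (generated from 𝑦 = 1 + 2𝑥₁ + 3𝑥₂), and the names `X_design` and `b` are assumptions made for illustration, so the recovered coefficients should come out close to 1, 2, and 3.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y = np.array([1, 3, 4, 6, 8], dtype=float)

# Prepend a column of ones so the first coefficient plays the role of b0
X_design = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem: minimize ||X_design @ b - y||^2
b, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(b)  # estimated b0, b1, b2
```

With real, noisy data the estimates would only approximate the underlying coefficients.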

### What are residuals?
The differences 𝑦ᵢ - 𝑓(𝐱ᵢ) for all observations 𝑖 = 1, …, 𝑛 are called the residuals.
 >where 𝑓(𝐱) is the estimated function, i.e. 𝑓(𝐱) = 𝑏₀ + 𝑏₁𝑥₁ + ⋯ + 𝑏ᵣ𝑥ᵣ
 
The goal of linear regression is to find the best-fit line, that is, the weights of the estimated function that minimize the sum of squared residuals.
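
To make the idea of "least residual" concrete, here is a small sketch that compares the sum of squared residuals of a least-squares line with that of an arbitrary alternative line. The data values are sample values for illustration, the helper `sum_squared_residuals` is a hypothetical name, and `np.polyfit` is used here only as one convenient way to obtain the least-squares line.

```python
import numpy as np

# Sample data for illustration
x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

def sum_squared_residuals(b0, b1):
    """Return the sum of squared residuals for the line f(x) = b0 + b1 * x."""
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))

# np.polyfit returns coefficients from the highest degree down: slope, then intercept
b1_fit, b0_fit = np.polyfit(x, y, deg=1)

ssr_fit = sum_squared_residuals(b0_fit, b1_fit)
ssr_other = sum_squared_residuals(0.0, 0.7)  # an arbitrary alternative line

print(ssr_fit)    # sum of squared residuals of the least-squares line
print(ssr_other)  # larger, because this line is not the best fit
```

Any other choice of intercept and slope gives a sum of squared residuals at least as large as that of the least-squares line.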

### Coefficient of determination (R<sup>2</sup>)
The coefficient of determination, denoted as 𝑅², tells you what proportion of the variation in 𝑦 can be explained by the dependence on 𝐱, using the particular regression model. A larger 𝑅² indicates a better fit and means that the model can better explain the variation of the output with different inputs.

The value 𝑅² = 1 corresponds to SSR = 0, where SSR is the sum of squared residuals. That’s the perfect fit: every predicted value equals the observed one.
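
One common way to compute 𝑅² by hand is 𝑅² = 1 - SSR / SST, where SST (the total sum of squares, a name not used elsewhere in this notebook) measures the variation of 𝑦 around its mean. The sketch below assumes sample data and a least-squares line obtained with `np.polyfit`.

```python
import numpy as np

# Sample data for illustration
x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

# Least-squares line (slope first, then intercept)
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

ssr = np.sum((y - y_hat) ** 2)     # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean of y

r_squared = 1 - ssr / sst
print(r_squared)
```

When SSR is zero the ratio vanishes and 𝑅² equals 1, which is the perfect-fit case described above.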

### Assumptions in regression
>There should be a linear and additive relationship between the dependent (response) variable and the independent (predictor) variable(s). A linear relationship suggests that the change in the response Y due to a one-unit change in X₁ is constant, regardless of the value of X₁. An additive relationship suggests that the effect of X₁ on Y is independent of the other variables.

>There should be no correlation between the residual (error) terms. The presence of such correlation is known as autocorrelation.

>The independent variables should not be correlated with one another. The presence of such correlation is known as multicollinearity.

>The error terms must have constant variance. This property is known as homoskedasticity. The presence of non-constant variance is referred to as heteroskedasticity.

>The error terms must be normally distributed.
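
Some of these assumptions can be probed numerically. The following is a rough sketch, not a formal test procedure: it uses synthetic data with a fixed seed, checks the correlation between two predictors as a hint of multicollinearity, and computes the Durbin-Watson statistic (a standard statistic not mentioned above, introduced here as an assumption) as a hint of autocorrelation in the residuals. All variable names are illustrative.

```python
import numpy as np

# Synthetic data with a fixed seed; the true model is y = 1 + 2*x1 - x2 + noise
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Fit by least squares and compute the residuals
X_design = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X_design, y, rcond=None)
residuals = y - X_design @ b

# Multicollinearity hint: correlation between the two predictors
# (values near +1 or -1 would be a warning sign)
corr_x1_x2 = np.corrcoef(x1, x2)[0, 1]

# Autocorrelation hint: Durbin-Watson statistic, which lies between 0 and 4;
# values near 2 suggest little first-order autocorrelation
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

print(corr_x1_x2)
print(dw)
```

Constant variance and normality of the errors are usually inspected with residual plots, which this sketch does not cover.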

### Simple Linear Regression
Simple or single-variate linear regression is the simplest case of linear regression, as it has a single independent variable, 𝐱 = 𝑥.

In [1]:
import numpy as np
from sklearn.linear_model import LinearRegression

In [14]:
# Creating sample data
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1)) #one column and as many rows as necessary (2D array -independent variable)
y = np.array([5, 20, 14, 32, 22, 38])                  # 1D array (dependent variable)        

In [17]:
# Creating model and fitting
model = LinearRegression().fit(x, y)

Parameters that can be set on the model:
 1. <b>fit_intercept</b> is a Boolean that determines whether to calculate the intercept 𝑏₀ (True) or to treat it as zero (False). It defaults to True.
 2. <b>normalize</b> is a Boolean that, if True, normalizes the input variables. It defaults to False, in which case it doesn’t normalize the input variables. Note that this parameter was deprecated in scikit-learn 1.0 and removed in 1.2; in newer versions, scale the inputs yourself before fitting, for example with StandardScaler.
 3. <b>copy_X</b> is a Boolean that decides whether to copy (True) or overwrite the input variables (False). It’s True by default.
 4. <b>n_jobs</b> is either an integer or None. It represents the number of jobs used in parallel computation. It defaults to None, which usually means one job. -1 means to use all available processors.
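
As a small illustration of how one of these parameters changes the result, the sketch below fits the same sample data with and without an intercept. The variable names `with_intercept` and `through_origin` are chosen here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data for illustration
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Default behaviour: the intercept b0 is estimated from the data
with_intercept = LinearRegression(fit_intercept=True).fit(x, y)

# fit_intercept=False forces the line through the origin (b0 = 0)
through_origin = LinearRegression(fit_intercept=False).fit(x, y)

print(with_intercept.intercept_, with_intercept.coef_)
print(through_origin.intercept_, through_origin.coef_)
```

Forcing the line through the origin changes the slope as well, since the model can no longer shift the line up or down to fit the data.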

In [21]:
# Show R² (coefficient of determination)
r_sq = model.score(x, y)
print(r_sq)

# Show intercept of the estimated line (b0)
print(model.intercept_)

# Show slope of the estimated line (b1)
print(model.coef_)

0.715875613747954
5.633333333333333
[0.54]


The value of 𝑏₀ is approximately 5.63. This illustrates that your model predicts the response 5.63 when 𝑥 is zero. The value 𝑏₁ = 0.54 means that the predicted response rises by 0.54 when 𝑥 is increased by one.
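
For a single predictor, these two values can also be reproduced with the textbook closed-form expressions for the least-squares slope and intercept. The sketch below is a cross-check under that assumption, using the same sample values and plain NumPy rather than the fitted model.

```python
import numpy as np

# Same sample values as above, with x kept as a 1D array here
x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

x_mean, y_mean = x.mean(), y.mean()

# Slope: covariation of x and y divided by the variation of x
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: the fitted line passes through the point (x_mean, y_mean)
b0 = y_mean - b1 * x_mean

print(b0, b1)
```

The printed values should agree with `model.intercept_` and `model.coef_` up to floating-point rounding.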

In [30]:
# Predicting results (using train data as test data)
y_pred = model.predict(x)
print(y_pred)

[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]


model.predict simply evaluates the estimated line at the given inputs, which can also be calculated manually:

In [31]:
# Multiply the column vector x by the slope, then add the intercept
y_pred = model.intercept_ + model.coef_ * x
print(y_pred.ravel())  # flatten the (6, 1) result to 1D for printing

[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]


In [33]:
# Predicting results (using new input data)
x_new = np.arange(4).reshape((-1, 1))  # inputs 0, 1, 2, 3 as a single column
y_pred_new = model.predict(x_new)
print(y_pred_new)

[5.63333333 6.17333333 6.71333333 7.25333333]
