# Real Python
## Linear Regression in Python

***We’re living in an era of abundant data, powerful computers, and artificial intelligence, and this is just the beginning. Data science and machine learning are driving image recognition, autonomous vehicle development, decisions in the financial and energy sectors, advances in medicine, the rise of social networks, and more. Linear regression is an important part of this.
Linear regression is one of the fundamental statistical and machine learning techniques. Whether you want to do statistics, machine learning, or scientific computing, there’s a good chance that you’ll need it. It’s advisable to learn it first and then proceed to more complex methods.***

### When Do You Need Regression?

**Typically, you need regression to answer whether and how one phenomenon influences another, or how several variables are related. For example, you can use it to determine whether and to what extent experience or gender affects salaries.**

## Polynomial Regression

## Underfitting and Overfitting

**Underfitting:<br>**
***occurs when a model can’t accurately capture the dependencies among data, usually as a consequence of its own simplicity. It often yields a low 𝑅² with known data and generalizes poorly when applied to new data.***

**Overfitting:<br>**
***happens when a model learns both dependencies among data and random fluctuations. In other words, a model learns the existing data too well. Complex models, which have many features or terms, are often prone to overfitting. When applied to known data, such models usually yield high 𝑅². However, they often don’t generalize well and have significantly lower 𝑅² when used with new data.***
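
A small experiment makes the contrast concrete. The sketch below reuses the six observations from the scikit-learn section of these notes, holds out the last one as stand-in "new data" (an assumption made purely for illustration), and compares a straight line with a degree-4 polynomial built with `PolynomialFeatures`. The helper name `fit_and_check` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Hold out the last observation to play the role of "new data".
# (Illustrative assumption; real projects use a proper train/test split.)
x_train, y_train = x[:-1], y[:-1]
x_new, y_new = x[-1:], y[-1:]


def fit_and_check(degree):
    """Fit a polynomial of the given degree; return train R² and held-out error."""
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    model = LinearRegression().fit(poly.fit_transform(x_train), y_train)
    r_sq_train = model.score(poly.transform(x_train), y_train)
    error_new = abs(model.predict(poly.transform(x_new))[0] - y_new[0])
    return r_sq_train, error_new


for degree in (1, 4):
    r_sq_train, error_new = fit_and_check(degree)
    print(f'degree {degree}: train R² = {r_sq_train:.3f}, '
          f'abs. error on held-out point = {error_new:.1f}')
```

The degree-4 polynomial passes through all five training points, so its training 𝑅² is essentially 1, yet its error on the held-out point is far larger than the straight line's. With so few points this is only a toy illustration, not a substitute for proper validation.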

## Simple Linear Regression With scikit-learn

In [5]:
import numpy as np
from sklearn.linear_model import LinearRegression

In [7]:
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
x

array([[ 5],
       [15],
       [25],
       [35],
       [45],
       [55]])

In [9]:
y = np.array([5, 20, 14, 32, 22, 38])
y

array([ 5, 20, 14, 32, 22, 38])

In [10]:
model = LinearRegression()

In [17]:
model = LinearRegression().fit(x, y)

In [18]:
# With .fit(), you calculate the optimal values of the
# weights 𝑏₀ and 𝑏₁, using the existing input and output
# (x and y) as the arguments.
# In other words, .fit() fits the model.
# It returns self, which is the model object itself.
# That’s why the two separate statements (model = LinearRegression()
# followed by model.fit(x, y)) can be combined into the single
# chained statement used above: model = LinearRegression().fit(x, y)
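
The claim that `.fit()` returns the model itself can be checked directly. A minimal sketch, reusing the same `x` and `y`; the names `model_a`, `model_b`, and `returned` are introduced here only for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Two statements: create the estimator, then fit it in place.
model_a = LinearRegression()
returned = model_a.fit(x, y)

# .fit() hands back the very same object it was called on.
print(returned is model_a)

# One chained statement: create and fit in a single expression.
model_b = LinearRegression().fit(x, y)

# Both approaches produce the same weights.
print(np.allclose(model_a.coef_, model_b.coef_))
print(np.isclose(model_a.intercept_, model_b.intercept_))
```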

In [21]:
r_sq = model.score(x, y)
print('coefficient of determination:', r_sq)

coefficient of determination: 0.715875613747954


In [22]:
# When you’re applying .score(),
# the arguments are also the predictor x and the response y,
# and the return value is 𝑅².
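
To see what `.score()` reports, 𝑅² can also be computed by hand from its usual definition, 1 − SS_res / SS_tot. A minimal sketch; the variable names `ss_res`, `ss_tot`, and `r_sq_manual` are chosen here for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

# Residual sum of squares: variation the model leaves unexplained.
ss_res = np.sum((y - y_pred) ** 2)
# Total sum of squares: variation of y around its mean.
ss_tot = np.sum((y - y.mean()) ** 2)

r_sq_manual = 1 - ss_res / ss_tot
print('manual R²:', r_sq_manual)
print('model.score():', model.score(x, y))
```

Both values should agree up to floating-point rounding.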

In [24]:
print('intercept:', model.intercept_)
print('slope:', model.coef_)

intercept: 5.633333333333329
slope: [0.54]
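
For a single predictor, the same intercept and slope follow from the textbook closed-form least-squares formulas. The sketch below is a cross-check under that assumption, not a description of how scikit-learn computes them internally; `b0` and `b1` mirror the 𝑏₀ and 𝑏₁ notation used in these notes.

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])

# Slope: covariance of x and y divided by the variance of x.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: the fitted line passes through the point of means.
b0 = y.mean() - b1 * x.mean()

print('intercept:', b0)
print('slope:', b1)
```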


In [27]:
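# Passing y as a two-dimensional array makes intercept_ a
# one-dimensional array and coef_ a two-dimensional array.
new_model = LinearRegression().fit(x, y.reshape((-1, 1)))
print('intercept:', new_model.intercept_)
print('slope:', new_model.coef_)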

intercept: [5.63333333]
slope: [[0.54]]
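
The difference between the two fits is only in the shapes of the fitted attributes, which can be inspected directly. A small sketch; `model_1d` and `model_2d` are names introduced here for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model_1d = LinearRegression().fit(x, y)                    # y has shape (6,)
model_2d = LinearRegression().fit(x, y.reshape((-1, 1)))   # y has shape (6, 1)

# One-dimensional y: scalar intercept, coef_ with one entry per feature.
print(np.shape(model_1d.intercept_), model_1d.coef_.shape)
# Two-dimensional y: one intercept per output column, coef_ as a matrix.
print(np.shape(model_2d.intercept_), model_2d.coef_.shape)
```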


In [30]:
# Predict Response:

y_pred = model.predict(x)
print('predicted response:', y_pred, sep='\n')

predicted response:
[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]


In [32]:
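# Equivalent manual computation. Because x is two-dimensional,
# the result is a column vector rather than a flat array.
y_pred = model.intercept_ + model.coef_ * x
print('predicted response:', y_pred, sep='\n')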

predicted response:
[[ 8.33333333]
 [13.73333333]
 [19.13333333]
 [24.53333333]
 [29.93333333]
 [35.33333333]]
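
The two approaches give the same numbers in different shapes, and `.predict()` works equally well on inputs the model has not seen. A minimal sketch; `x_new` is a hypothetical set of new inputs chosen for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

# Manual formula: broadcasting over the (6, 1) input keeps two dimensions.
manual = model.intercept_ + model.coef_ * x
print(manual.shape)

# Flattened, it matches the one-dimensional output of .predict().
print(np.allclose(model.predict(x), manual.ravel()))

# Predicting for hypothetical new inputs; they must also be two-dimensional.
x_new = np.arange(5).reshape((-1, 1))
y_new = model.predict(x_new)
print(y_new)
```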
