# 📈 **Linear Regression**

**Scikit-learn is widely used in machine learning, and this notebook relies on it heavily.**

<a href="https://uupload.ir/" target="_blank"><img src="https://s4.uupload.ir/files/download_(1)_slz6.png" border="0" alt="Uploaded image" /></a>

### **What is scikit-learn used for?**

Scikit-learn (sklearn) is one of the most widely used and robust libraries for machine learning in Python. It provides efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, through a consistent Python interface.

#### **What is linear regression used for?**

Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.

# **Making Predictions with Linear Regression**

Because the model is represented as a linear equation, making predictions is as simple as solving the equation for a specific set of inputs.

Let’s make this concrete with an example. Imagine we are predicting weight (y) from height (x). Our linear regression model representation for this problem would be:

**y = B0 + B1 * x**

or

**weight = B0 + B1 * height**

Where B0 is the bias coefficient and B1 is the coefficient for the height column. We use a learning technique to find a good set of coefficient values. Once found, we can plug in different height values to predict the weight.

For example, let’s use B0 = 0.1 and B1 = 0.5. Let’s plug them in and calculate the weight (in kilograms) for a person with a height of 182 centimeters.

weight = 0.1 + 0.5 * 182

weight = 91.1

You can see that the above equation could be plotted as a line in two dimensions. B0 is our starting point (the intercept), regardless of the height. We can run through a range of heights from 100 to 250 centimeters, plug them into the equation to get weight values, and so create our line.
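
The calculation above can be reproduced in a few lines of NumPy. This is only a minimal sketch that uses the illustrative coefficients B0 = 0.1 and B1 = 0.5 from the text; the helper name `predict_weight` is hypothetical and not part of any library.

```python
import numpy as np

B0 = 0.1  # bias (intercept), illustrative value from the text
B1 = 0.5  # coefficient for height, illustrative value from the text

def predict_weight(height_cm):
    """Hypothetical helper: apply weight = B0 + B1 * height."""
    return B0 + B1 * np.asarray(height_cm, dtype=float)

# Single prediction for a height of 182 cm
single = predict_weight(182)
print(round(float(single), 1))

# Heights from 100 to 250 cm (in steps of 10) trace out the regression line
heights = np.arange(100, 251, 10)
weights = predict_weight(heights)
print(weights[:3])
```

Plotting `heights` against `weights` with `plt.plot` would give a straight line like the one described above.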

<a href="https://uupload.ir/" target="_blank"><img src="https://s4.uupload.ir/files/sample-height-vs-weight-linear-regression_10h7.png" border="0" alt="Sample height vs. weight linear regression plot" /></a>

Now that we know how to make predictions given a learned linear regression model, let’s train one on a real dataset with scikit-learn.

# 📤 Import & Install Libraries

In [None]:
!pip install hvplot

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import hvplot.pandas

from sklearn.model_selection import train_test_split

from sklearn import metrics

from sklearn.linear_model import LinearRegression

%matplotlib inline

## 💾 Check out the Data

In [None]:
df = pd.read_csv('../input/real-estate-price-prediction/Real estate.csv')

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.corr()

In [None]:
sns.heatmap(df.corr(), annot=True, cmap='Reds')

# 📊 Exploratory Data Analysis (EDA)

In [None]:
sns.pairplot(df)

# 📈 Training a Linear Regression Model

## X and y arrays

In [None]:
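# Features: every column except the target
X = df.drop('Y house price of unit area', axis=1)

# Target: the column dropped from X above, so the target does not leak into the features
y = df['Y house price of unit area']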

In [None]:
print("X =", X.shape, "\ny =", y.shape)

## 🧱 Train Test Split

Now let's split the data into a training set and a testing set. We will train our model on the training set and then use the test set to evaluate the model.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

In [None]:
X_train.shape

In [None]:
X_test.shape
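
To see what `test_size=0.3` does to the row counts, here is a small sketch on synthetic stand-in data. The arrays `X_demo` and `y_demo` are assumptions made up for illustration; they are not the real-estate dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the real-estate dataset)
X_demo = np.arange(40).reshape(20, 2)  # 20 rows, 2 features
y_demo = np.arange(20)                 # 20 target values

# Same settings as in the notebook: 30% of the rows go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)

print(X_tr.shape, X_te.shape)  # training rows vs. test rows
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in the same split on every run.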

# ✔️ Linear Regression

In [None]:
model = LinearRegression()

In [None]:
model.fit(X_train, y_train)

## ✔️ Model Evaluation

In [None]:
model.coef_

In [None]:
pd.DataFrame(model.coef_, X.columns, columns=['Coefficients'])
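
One way to build intuition for `coef_` is to fit a model on data whose true coefficients are known. The sketch below uses synthetic, noise-free data; the column names `feature_a` and `feature_b` and the values 3.0, 2.0, and -1.5 are assumptions chosen purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data with known coefficients (assumed for illustration only)
X_demo = pd.DataFrame({
    "feature_a": rng.uniform(0, 10, size=200),
    "feature_b": rng.uniform(0, 5, size=200),
})
y_demo = 3.0 + 2.0 * X_demo["feature_a"] - 1.5 * X_demo["feature_b"]

demo_model = LinearRegression().fit(X_demo, y_demo)

# One coefficient per feature column, in the same order as the columns
coef_table = pd.DataFrame(
    demo_model.coef_, index=X_demo.columns, columns=["Coefficient"]
)
print(coef_table)
print("Intercept:", demo_model.intercept_)
```

Each coefficient is the change in the prediction for a one-unit increase in that feature while the other features are held fixed; the bias term B0 is stored separately in `intercept_`.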

## ✔️ Predictions from our Model

In [None]:
y_pred = model.predict(X_test)

## ✔️ Regression Evaluation Metrics


Here are three common evaluation metrics for regression problems:

> - **Mean Absolute Error** (MAE) is the mean of the absolute value of the errors:
$$\frac 1n\sum_{i=1}^n|y_i-\hat{y}_i|$$

> - **Mean Squared Error** (MSE) is the mean of the squared errors:
$$\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2$$

> - **Root Mean Squared Error** (RMSE) is the square root of the mean of the squared errors:
$$\sqrt{\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2}$$

> 📌 Comparing these metrics:
- **MAE** is the easiest to understand, because it's the average error.
- **MSE** is more popular than MAE, because MSE "punishes" larger errors, which tends to be useful in the real world.
- **RMSE** is even more popular than MSE, because RMSE is interpretable in the "y" units.

> All of these are **loss functions**, because we want to minimize them.
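
The three formulas above can be checked by hand against scikit-learn. This is a minimal sketch on four made-up values (not taken from the dataset); the names `y_true` and `y_hat` are placeholders.

```python
import numpy as np
from sklearn import metrics

# Small made-up example values (not from the dataset)
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_hat            # individual errors: 0.5, 0.0, -2.0, -1.0

mae = np.mean(np.abs(errors))      # mean of the absolute errors
mse = np.mean(errors ** 2)         # mean of the squared errors
rmse = np.sqrt(mse)                # square root of the MSE, in the units of y

print(mae, mse, rmse)

# The same quantities computed by scikit-learn
print(metrics.mean_absolute_error(y_true, y_hat))
print(metrics.mean_squared_error(y_true, y_hat))
```

The single error of 2.0 contributes 4.0 to the squared sum, which is why the MSE is pulled up more strongly by large errors than the MAE.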

In [None]:
MAE = metrics.mean_absolute_error(y_test, y_pred)
MSE = metrics.mean_squared_error(y_test, y_pred)
RMSE = np.sqrt(MSE)

In [None]:
MAE

In [None]:
MSE

In [None]:
RMSE

In [None]:
# Mean of the target, as a reference for judging the size of the errors above
y.mean()

## **Residual Histogram**

* **For Linear Regression it is often a good idea to evaluate the residuals $(y-\hat{y})$ separately, and not just calculate performance metrics (e.g. RMSE).**

* **Let's explore why this is important...**

* **The residual errors should be random and close to a normal distribution.**


<a href="https://uupload.ir/" target="_blank"><img src="https://s4.uupload.ir/files/download_ycg.png" border="0" alt="Uploaded image" /></a>

<a href="https://uupload.ir/" target="_blank"><img src="https://s4.uupload.ir/files/2_pe68.png" border="0" alt="Uploaded image" /></a>

In [None]:
test_residual= y_test - y_pred

In [None]:
pd.DataFrame({'Error Values': (test_residual)}).hvplot.kde()

In [None]:
sns.displot(test_residual, bins=25, kde=True)

* **The residual plot shows the residual error vs. the true y value.**

In [None]:
sns.scatterplot(x=y_test, y=test_residual)

plt.axhline(y=0, color='r', ls='--')

* **A residual plot that shows a clear pattern indicates that Linear Regression is not valid for the data!**
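
To illustrate what a "clear pattern" can look like numerically, the sketch below fits a straight line to two synthetic datasets, one truly linear and one quadratic. The data and the helper `residual_curvature` are assumptions for illustration only; the helper measures the correlation between the residuals and x², which is a different diagnostic from the residual-vs-y plot above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 200).reshape(-1, 1)

# Case 1: truly linear relationship plus noise (synthetic)
y_linear = 2.0 * x.ravel() + rng.normal(0, 0.5, size=200)
# Case 2: quadratic relationship plus noise (synthetic)
y_curved = x.ravel() ** 2 + rng.normal(0, 0.5, size=200)

def residual_curvature(x, y):
    """Hypothetical diagnostic: correlation between residuals and x**2."""
    fitted = LinearRegression().fit(x, y)
    residuals = y - fitted.predict(x)
    return np.corrcoef(residuals, x.ravel() ** 2)[0, 1]

r_linear = residual_curvature(x, y_linear)
r_curved = residual_curvature(x, y_curved)
print(r_linear)  # expected to be close to 0: residuals look random
print(r_curved)  # expected to be close to 1: residuals follow a curve
```

When the residuals still carry structure like this, a straight line is missing part of the relationship, and a different model or transformed features may be worth trying.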

# Finished, but you can copy this notebook and start practicing.