# Simple Linear Regression 

### In this example, we model sales as a function of the "TV" marketing budget.

### Import the libraries

In [None]:
import pandas as pd

### Load the file

In [None]:
advertising = pd.read_csv("tvmarketing.csv")

### Let's look at the structure of our dataset

In [None]:
# Display the first 5 rows
advertising.head()

In [None]:
# Display the last 5 rows
advertising.tail()

In [None]:
# Let's check the columns: names, dtypes, and non-null counts
advertising.info()

In [None]:
# Check the shape of the DataFrame (rows, columns)
advertising.shape

In [None]:
# Let's look at some statistical information about the dataframe.
advertising.describe()

# Data Visualization

In [None]:
# Visualise the relationship between the feature (TV) and the response (Sales) using a scatter plot
advertising.plot(x='TV',y='Sales',kind='scatter')

# Performing Simple Linear Regression

Equation of linear regression<br>
$y = c + m_1x_1 + m_2x_2 + ... + m_nx_n$

-  $y$ is the response
-  $c$ is the intercept
-  $m_1$ is the coefficient for the first feature
-  $m_n$ is the coefficient for the nth feature<br>

In our case:

$y = c + m_1 \times TV$

The $m$ values are called the model **coefficients** or **model parameters**.
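
For a single feature, the least-squares values of $c$ and $m_1$ have a closed form: the slope is the covariance of the feature and the response divided by the variance of the feature, and the intercept makes the line pass through the point of means. The sketch below illustrates this on made-up numbers; `x_demo` and `y_demo` are hypothetical and do not come from `tvmarketing.csv`.

```python
import numpy as np

# Hypothetical toy data (not from tvmarketing.csv), built so that y = 1 + 2x exactly.
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Slope: sum of products of deviations divided by sum of squared deviations of x.
x_dev = x_demo - x_demo.mean()
y_dev = y_demo - y_demo.mean()
m_demo = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)

# Intercept: the fitted line passes through (mean of x, mean of y).
c_demo = y_demo.mean() - m_demo * x_demo.mean()

print(f"intercept c = {c_demo:.3f}, slope m = {m_demo:.3f}")
```

On this toy data the recovered intercept and slope match the line used to build it, which is a quick sanity check on the formulas.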

### Generic Steps in Model Building using ```sklearn```

## Preparing X and y

In [None]:
# Putting feature variable to X
X = advertising['TV']

# Print the first 5 rows
X.head()

In [None]:
# Putting response variable to y
y = advertising['Sales']

# Print the first 5 rows
y.head()

## Splitting Data into Training and Testing Sets

In [None]:
# random_state seeds the random number generator so the split is reproducible; it can be any integer.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7 , random_state=0)
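
To see what `train_size` and `random_state` do, here is a minimal sketch on a hypothetical 10-sample array (the `*_demo` names are illustrative, not part of the notebook's data). With `train_size=0.7`, 7 samples go to training and 3 to testing, and repeating the call with the same seed should give the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, so a 70/30 split gives 7 and 3.
X_demo = np.arange(10)
y_demo = X_demo * 2

# First split with a fixed seed.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, train_size=0.7, random_state=0
)
# Second split with the same seed, to check reproducibility.
Xd_train_again, Xd_test_again, _, _ = train_test_split(
    X_demo, y_demo, train_size=0.7, random_state=0
)

print(len(Xd_train), len(Xd_test))
print(np.array_equal(Xd_train, Xd_train_again))
```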

In [None]:
print(type(X_train))
print(type(X_test))
print(type(y_train))
print(type(y_test))

In [None]:
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

In [None]:
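import numpy as np
# scikit-learn expects a 2-D feature array of shape (n_samples, n_features).
# Convert each Series to a NumPy array first (indexing a Series with [:, np.newaxis]
# raises an error in recent pandas versions), then use np.newaxis to add the column axis.
X_train = X_train.to_numpy()[:, np.newaxis]
X_test = X_test.to_numpy()[:, np.newaxis]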
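
The effect of adding an axis is easiest to see on a small example. The sketch below uses a hypothetical three-value Series named `tv_demo` as a stand-in for the TV column and shows the shape going from `(3,)` to `(3, 1)`; `reshape(-1, 1)` is an equivalent way to get the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the TV column (values are illustrative only).
tv_demo = pd.Series([230.1, 44.5, 17.2], name="TV")

as_1d = tv_demo.to_numpy()           # 1-D array, shape (3,)
as_2d = as_1d[:, np.newaxis]         # 2-D array, shape (3, 1): one row per sample, one column
via_reshape = as_1d.reshape(-1, 1)   # same result using reshape

print(as_1d.shape, as_2d.shape, via_reshape.shape)
```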

In [None]:
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

## Performing Linear Regression

In [None]:
# import LinearRegression from sklearn
from sklearn.linear_model import LinearRegression

# Create a LinearRegression object named lr
lr = LinearRegression()

# Fit the model using lr.fit()
lr.fit(X_train, y_train)

## Coefficients Calculation

In [None]:
# Print the intercept and coefficients
print(lr.intercept_)
print(lr.coef_)

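Substituting the fitted intercept and coefficient (rounded to three decimals) into the equation gives:<br>
$y = 7.310 + 0.045 \times TV$<br>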
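
The fitted line can be read as a prediction rule. The sketch below plugs a TV budget into it, using the rounded intercept and slope from the equation above; `predict_sales` is a hypothetical helper, not part of `sklearn`, and a budget of 100 is an arbitrary illustrative value.

```python
def predict_sales(tv_budget, intercept=7.310, slope=0.045):
    """Evaluate y = intercept + slope * TV for a given TV budget."""
    return intercept + slope * tv_budget

# Predicted sales for an illustrative TV budget of 100.
print(round(predict_sales(100), 3))
```

In the notebook itself, `lr.predict` does the same calculation with the unrounded `lr.intercept_` and `lr.coef_`.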

## Predictions

In [None]:
# Making predictions on the testing set
y_pred = lr.predict(X_test)

In [None]:
type(y_pred)

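#### Computing RMSE and R^2 Values
RMSE (Root Mean Squared Error) is the square root of the MSE (Mean Squared Error). It summarizes the typical size of the errors made when predicting on a dataset, in the same units as the response, and it equals the standard deviation of the errors when their mean is zero. R^2 is the proportion of the variance in the response that the model explains. The cells below compute the MSE and R^2; taking the square root of the MSE gives the RMSE.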


In [None]:
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y_test, y_pred)

In [None]:
r_squared = r2_score(y_test, y_pred)

In [None]:
print('Mean_Squared_Error :' ,mse)
print('r_square_value :',r_squared)
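
As a rough illustration of what `r2_score` reports, R^2 can be written as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below checks this on hypothetical values; `y_true_demo` and `y_hat_demo` are made up and unrelated to the advertising data.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values (illustrative only).
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_hat_demo = np.array([2.5, 5.5, 7.0, 9.5])

# Residual sum of squares: squared errors of the model's predictions.
ss_res = np.sum((y_true_demo - y_hat_demo) ** 2)
# Total sum of squares: squared deviations of the actual values from their mean.
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
print(r2_manual, r2_score(y_true_demo, y_hat_demo))
```

An R^2 close to 1 means the predictions account for most of the variation in the actual values.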

**Mean Squared Error (MSE)** measures how well a predictive model fits the data. It is the average of the squared errors between the values predicted by the model and the actual values. More precisely, for a series of predictions $y_1, y_2, \dots, y_n$ and the corresponding actual values $y'_1, y'_2, \dots, y'_n$, the MSE is defined as:

$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - y'_i)^2$

where $n$ is the number of predictions.

**Root Mean Squared Error (RMSE)** also measures how well a predictive model fits the data. It is similar to the Mean Squared Error (MSE), but it takes the square root of the mean of the squared errors, so the result is expressed in the same units as the response variable. This makes it easier to interpret than the MSE.

More precisely, for a series of predictions $y_1, y_2, \dots, y_n$ and the corresponding actual values $y'_1, y'_2, \dots, y'_n$, the RMSE is defined as:

$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - y'_i)^2}$

where $n$ is the number of predictions.

**The smaller the RMSE, the better the model performs.**
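
The two definitions can be checked by hand on a small example. The sketch below computes the MSE and RMSE directly with NumPy and compares the MSE against `mean_squared_error`; the arrays `y_actual_demo` and `y_predicted_demo` are hypothetical values chosen for easy arithmetic.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values (illustrative only).
y_actual_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_predicted_demo = np.array([2.0, 5.0, 8.0, 11.0])

# MSE: mean of the squared differences between predictions and actual values.
mse_manual = np.mean((y_predicted_demo - y_actual_demo) ** 2)
# RMSE: square root of the MSE, in the same units as the response.
rmse_manual = np.sqrt(mse_manual)

# The same MSE computed by scikit-learn, for comparison.
mse_sklearn = mean_squared_error(y_actual_demo, y_predicted_demo)

print(mse_manual, mse_sklearn, rmse_manual)
```

Applied to the notebook's own results, the RMSE would be `np.sqrt(mse)` using the `mse` value computed earlier.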