# Car Price Prediction Through a Web Application
Starting from a dataset of cars, let's build a Machine Learning model that predicts car prices, then deploy that model.

## 1. Reading the Data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import warnings
warnings.filterwarnings("ignore")

In [None]:
df = pd.read_csv("car_price_prediction.csv")
df.head()

In [None]:
df.dtypes

In [None]:
df.describe(include='all')

## 2. Exploratory Analysis and Data Visualization

In [None]:
df['Price'].unique()

In [None]:
plt.hist(df["Price"])
# Display the plot
plt.show()

In [None]:
plt.figure(figsize = (20, 15))
sns.countplot(y = df['Manufacturer'])
plt.title("Car companies with their cars", fontsize = 20)
plt.show()

In [None]:
# Convert the feature names into a list of columns
columns = df.columns.tolist()
columns

In [None]:
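# Density and box plots are only defined for numeric columns
numeric_cols = df.select_dtypes(include="number").columns.tolist()

fig, axes = plt.subplots(len(numeric_cols), 2, figsize=(16, 4 * len(numeric_cols)), squeeze=False)
for row, col in enumerate(numeric_cols):
    # Left: density estimate; right: box plot of the same column
    sns.kdeplot(df[col], color='#d1aa00', fill=True, warn_singular=False, ax=axes[row, 0])
    df[col].plot.box(ax=axes[row, 1])
plt.tight_layout()
plt.show()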


In [None]:
df = pd.get_dummies(df)
df.head()

Let's drop the ID column.

In [None]:
df.drop('ID', inplace=True, axis=1)

In [None]:
df.head()

In [None]:
df.describe(include = 'all')

We can verify this in the list of columns below.

In [None]:
columns = df.columns.tolist()
columns

The output below suggests that there are outliers. Let's take a closer look using the skewness and kurtosis of each column.

In [None]:
pd.DataFrame(data=[df[columns].skew(),df[columns].kurtosis()],index=['skewness','kurtosis'])
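
As a rough illustration of why skewness and kurtosis hint at outliers, the sketch below compares two toy series that differ by a single extreme value. The data is hypothetical and not taken from the car dataset. pandas reports excess kurtosis, so a normal distribution would score close to 0.

```python
import pandas as pd

# Hypothetical toy data: the same values with and without one extreme point
regular = pd.Series([10, 11, 12, 13, 14, 15, 16])
with_outlier = pd.Series([10, 11, 12, 13, 14, 15, 160])

summary = pd.DataFrame(
    {
        "regular": [regular.skew(), regular.kurtosis()],
        "with_outlier": [with_outlier.skew(), with_outlier.kurtosis()],
    },
    index=["skewness", "kurtosis"],
)
print(summary)
```

The symmetric series has a skewness of about zero, while the single extreme value pushes both statistics well above it.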

In [None]:
from scipy.stats import zscore
for i in columns:
    # Rows whose value lies 3 or more standard deviations from the column mean
    y_outliers = df[abs(zscore(df[i])) >= 3]
    print('Number of outliers in', i, ':', len(y_outliers))
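
To make the 3-standard-deviation rule concrete, here is a minimal sketch on synthetic data. The sample, its size and the value 400 are assumptions chosen for illustration only.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical sample: 29 ordinary values plus one extreme value
rng = np.random.default_rng(0)
values = np.append(rng.normal(loc=100, scale=5, size=29), 400.0)

# zscore computes (value - mean) / standard deviation (population form, ddof=0)
z = zscore(values)
outliers = values[np.abs(z) >= 3]
print(outliers)
```

Note that the extreme value also inflates the mean and standard deviation used to compute the scores, so in very small samples a single outlier may never reach the threshold of 3.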
    


Now let's assign each price to a class (__Moins Cher, Moyen, Cher and Tres Cher__, that is: cheapest, medium, expensive and very expensive).

In [None]:
df['Price'].describe()

In [None]:
print((1.855593e+04)+9278.965)

In [None]:
def price_class(Price):
    if (Price >=1 and Price<=9278.965):
        return "Moins Cher"
    elif (Price > 9278.965 and Price<=1.855593e+04):
        return "Moyen"
    elif (Price >1.855593e+04  and Price<=27834.895):
        return "Cher"
    else:
        return "Tres Cher"

df['Classe'] = df['Price'].apply(price_class)
df.sample(frac=1).head(10)
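
The same binning could also be expressed with `pd.cut`. The sketch below is an alternative under stated assumptions, not the notebook's method: the sample prices are hypothetical, and the bin edges reuse the thresholds from `price_class`. One difference to keep in mind is that `price_class` sends prices below 1 to "Tres Cher", whereas this version places them in "Moins Cher".

```python
import numpy as np
import pandas as pd

# Hypothetical prices, including values that sit exactly on a threshold
prices = pd.Series([500, 9278.965, 12000, 18555.93, 20000, 27834.895, 50000])

edges = [-np.inf, 9278.965, 1.855593e+04, 27834.895, np.inf]
labels = ["Moins Cher", "Moyen", "Cher", "Tres Cher"]

# right=True (the default) makes each bin include its upper edge, like the <= tests
classes = pd.cut(prices, bins=edges, labels=labels)
print(classes.tolist())
```

Keeping the edges in one list makes the thresholds easier to adjust than a chain of `if`/`elif` tests.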

In [None]:
df['Classe'].value_counts()

In [None]:
plt.figure(figsize=(14, 12))
sns.set()
# numeric_only=True skips text columns such as 'Classe', which cannot be correlated
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.show()

## 3. Handling Outliers

Let's check whether our dataset contains any null values.


In [None]:
df.isnull().sum()

In [None]:
df.isna().sum()

In [None]:
df.duplicated()

In [None]:
df.dropna(inplace=True)

In [None]:
df.drop_duplicates(inplace=True)

In [None]:
df.head()

Let's drop some columns.

In [None]:
df.drop(columns=['Doors', 'Wheel', 'Leather interior', 'Levy'], inplace=True)

## 5. Training the Model

In [None]:
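# Positional selection: features are columns 1 to 11, the target is column 12.
# This depends on the column order at this point, so check df.columns first.
X = df.iloc[:, 1:12].values
Y = df.iloc[:, 12].values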

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)

In [None]:
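from sklearn.preprocessing import LabelEncoder
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
# The split was made before encoding, so encode the subsets used by the models too
Y_train = labelencoder_Y.transform(Y_train)
Y_test = labelencoder_Y.transform(Y_test)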

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
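
The cell above fits the scaler on the training set and only applies it to the test set. A minimal sketch of what that means, using hypothetical one-column arrays:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature values
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0], [10.0]])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learns mean and std from the training data
X_test_scaled = sc.transform(X_test)        # reuses those training statistics

print(sc.mean_, X_test_scaled.ravel())
```

Because the test values are scaled with the training mean and standard deviation, no information from the test set reaches the fitted scaler.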

#### The confusion matrix
The confusion matrix is a tool for measuring the performance of classification models with 2 or more classes.

![image.png](attachment:image.png)
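
As an illustration, the sketch below builds a confusion matrix for the four price classes with `sklearn.metrics.confusion_matrix`. The true and predicted labels are hypothetical and do not come from the models trained in this notebook.

```python
from sklearn.metrics import confusion_matrix

labels = ["Moins Cher", "Moyen", "Cher", "Tres Cher"]

# Hypothetical true and predicted classes for six cars
y_true = ["Moins Cher", "Moins Cher", "Moyen", "Cher", "Tres Cher", "Moyen"]
y_pred = ["Moins Cher", "Moyen", "Moyen", "Cher", "Cher", "Moyen"]

# Rows are the true classes, columns the predicted classes, in the order of `labels`
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
```

The diagonal counts the correct predictions (4 out of 6 here); every off-diagonal cell shows which class was mistaken for which.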

Let's import the necessary libraries.

In [None]:
import sklearn.linear_model as sk
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

## Linear Regression

In [None]:
classifier_LR = LinearRegression()
classifier_LR.fit(X_train, Y_train)

In [None]:
# For LinearRegression, score() returns the coefficient of determination (R²)
print("The score is:")
classifier_LR.score(X_test, Y_test)

The score is very low, so let's compute the error.

In [None]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, Y_train)

In [None]:
from sklearn.metrics import mean_squared_error
predictions = model.predict(X_test)
print("The error is:")
# scikit-learn expects the true values first, then the predictions
mean_squared_error(Y_test, predictions)
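
To relate the score and the error, here is a small worked example on hypothetical values. R² is computed as 1 - SS_res / SS_tot, and the mean squared error is the average of the squared residuals.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical targets and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

mse = mean_squared_error(y_true, y_pred)  # (1 + 0 + 1 + 0) / 4 = 0.5
rmse = np.sqrt(mse)                       # same unit as the target
r2 = r2_score(y_true, y_pred)             # 1 - 2 / 20 = 0.9

print(mse, rmse, r2)
```

The root of the mean squared error is often easier to read than the raw value because it is expressed in the unit of the target.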

### 5.1 Logistic Regression

In [None]:

classifier_LR = LogisticRegression(random_state = 0)
classifier_LR.fit(X_train, Y_train)


In [None]:
# For LogisticRegression, score() returns the mean accuracy
print("The score is:")
classifier_LR.score(X_test, Y_test)

In [None]:
Y_pred = classifier_LR.predict(X_test)

Let's compute the error.

In [None]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(random_state=0)
model.fit(X_train, Y_train)

In [None]:
from sklearn.metrics import mean_squared_error
predictions = model.predict(X_test)
print("The error is:")
# scikit-learn expects the true values first, then the predictions
mean_squared_error(Y_test, predictions)
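
For a classifier, the mean squared error is computed on the integer codes of the classes, so its value depends on how the labels were encoded. The sketch below, on hypothetical encoded labels, compares it with accuracy, which only counts exact matches.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Hypothetical encoded classes (0 to 3) for five cars
y_true = np.array([0, 1, 2, 3, 3])
y_pred = np.array([0, 1, 2, 0, 3])

accuracy = accuracy_score(y_true, y_pred)  # 4 correct out of 5
mse = mean_squared_error(y_true, y_pred)   # (3 - 0)^2 / 5, driven by the codes

print(accuracy, mse)
```

A single mistake between codes 3 and 0 weighs nine times more in the squared error than a mistake between neighbouring codes, although both are just one misclassified car.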