Projet 6: Pr√©diction du Prix de l'Immobilier
Contexte
La pr√©diction des prix de l'immobilier est essentielle pour les agents immobiliers et les acheteurs potentiels afin de prendre des d√©cisions inform√©es.

Probl√®me
Vous devez d√©velopper un mod√®le pour pr√©dire les prix de l'immobilier et d√©ployer une application qui permet aux utilisateurs de soumettre des caract√©ristiques de propri√©t√©s pour obtenir une estimation du prix.

Dataset
Lien: house prices data

Instructions
Importer et explorer le dataset.
Pr√©traiter les donn√©es (gestion des valeurs manquantes, encodage des variables cat√©gorielles).
Entra√Æner un mod√®le de pr√©diction des prix (par exemple, une r√©gression lin√©aire, XGBoost).
√âvaluer la performance du mod√®le.
Cr√©er une application web pour permettre la pr√©diction des prix de l'immobilier.
D√©ployer l'application

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os

Nous allons charger le jeu de donn√©es (train), dans un dataframe que nous nommerons ici `data`.
Le jeu de donn√©es (test) dans un dataframe 'datatest'.
Le jeu de donn√©es (sample_submission) dans un dataframe 'datasample'
Nous afficherons ensuite les 5 premi√®res valeurs de data.

In [2]:
data = pd.read_csv('train.csv')
datatest = pd.read_csv('test.csv')
datasample = pd.read_csv('sample_submission.csv')
data.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000


Nous allons √† pr√©sent fussionner la deuxi√®me colonne de 'sample_submission.csv' qui comporte les prix de vente avec le fichier 'test.csv'
L'objectif est de rassembler toutes les donn√©es des fichiers en un seul jeu de donn√©es pour le pr√©traitement.

In [4]:
col_B = datasample.iloc[:, 1]                      # Extraction de la deuxi√®me colonne de datasample
datatest["SalePrice"] = col_B.values               # Cr√©ation dans datatest de la colonne des prix 
datatest = datatest.iloc[1:].reset_index(drop=True) # Supression de la ligne des noms de colonnes dans datatest
print(data.columns.equals(datatest.columns))        # Verifier si datatest et data ont la m√™me taille

True


In [5]:
# Fusion des deux jeux de donn√©s et cr√©ation d'un fichier csv bilan

datafus = pd.concat([data, datatest], axis=0, ignore_index=True)
for col in datafus.columns:
    datafus[col] = pd.to_numeric(datafus[col], errors="ignore")
datafus.dtypes
datafus.to_csv("fichierfus.csv", index=False)

# <a name="C2">A- Nettoyage du jeu de donn√©es</a>

Plusieurs **erreurs** se sont gliss√©es dans ce jeu de donn√©es. Nous allons donc passer au nettoyage

In [6]:
# Types de donn√©es
datafus.dtypes

Id                 int64
MSSubClass         int64
MSZoning          object
LotFrontage      float64
LotArea            int64
                  ...   
MoSold             int64
YrSold             int64
SaleType          object
SaleCondition     object
SalePrice          int64
Length: 81, dtype: object

In [7]:
datafus.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500.0
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500.0
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500.0
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000.0
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000.0


In [8]:
datafus.shape

(2918, 81)

In [9]:
datafus.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2918 entries, 0 to 2917
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             2918 non-null   int64  
 1   MSSubClass     2918 non-null   int64  
 2   MSZoning       2914 non-null   object 
 3   LotFrontage    2432 non-null   float64
 4   LotArea        2918 non-null   int64  
 5   Street         2918 non-null   object 
 6   Alley          198 non-null    object 
 7   LotShape       2918 non-null   object 
 8   LandContour    2918 non-null   object 
 9   Utilities      2916 non-null   object 
 10  LotConfig      2918 non-null   object 
 11  LandSlope      2918 non-null   object 
 12  Neighborhood   2918 non-null   object 
 13  Condition1     2918 non-null   object 
 14  Condition2     2918 non-null   object 
 15  BldgType       2918 non-null   object 
 16  HouseStyle     2918 non-null   object 
 17  OverallQual    2918 non-null   int64  
 18  OverallC


### Traitement des valeurs manquantes

In [10]:
# Supression des lignes doubles
data = data.drop_duplicates()
datatest = datatest.drop_duplicates()
# Point des cellules avec valeurs non existantes
missing = datafus.isnull().sum().sort_values(ascending=False)
missing[missing > 0]

PoolQC          2908
MiscFeature     2813
Alley           2720
Fence           2348
MasVnrType      1765
FireplaceQu     1419
LotFrontage      486
GarageFinish     159
GarageQual       159
GarageCond       159
GarageYrBlt      159
GarageType       157
BsmtExposure      82
BsmtCond          82
BsmtQual          81
BsmtFinType2      80
BsmtFinType1      79
MasVnrArea        23
MSZoning           4
Functional         2
BsmtFullBath       2
BsmtHalfBath       2
Utilities          2
BsmtFinSF1         1
GarageCars         1
GarageArea         1
Exterior2nd        1
BsmtUnfSF          1
Exterior1st        1
SaleType           1
TotalBsmtSF        1
BsmtFinSF2         1
KitchenQual        1
Electrical         1
dtype: int64

In [None]:
Toutes les valeurs nulles ne signifient pas forcement Valeurs Manquantes.
CAS 1 : Absence normal des √©quipements.
Les valeurs manquantes correspondent √†  l‚Äôabsence de l‚Äô√©quipement concern√© 
(garage, piscine, sous-sol, etc.). Elles ont donc √©t√© remplac√©es pour conserver
la structure par la modalit√© None.

In [11]:
none_cols = [
    "PoolQC","MiscFeature","Alley","Fence","FireplaceQu","MasVnrType",
    "GarageType","GarageFinish","GarageQual","GarageCond",
    "BsmtQual","BsmtCond","BsmtExposure","BsmtFinType1","BsmtFinType2"
]

datafus[none_cols] = datafus[none_cols].fillna("None")


In [None]:
Cas 2 : Variables num√©riques li√©es √† un √©quipement
Dans ce cas, si la valeur est absente , cela signifie que la valeur est nulle
Par exemple, une surface inexistante donnera 0

In [12]:
numtest_median_cols = [
    "LotFrontage", "MasVnrArea", "GarageYrBlt",
    "BsmtFinSF1", "BsmtFinSF2", "BsmtUnfSF",
    "TotalBsmtSF", "GarageArea", "GarageCars",
    "BsmtFullBath", "BsmtHalfBath"
]

for col in numtest_median_cols:
    datafus[col] = datafus[col].fillna(datatest[col].median())

On observe que notre jeu de donn√©es contient 3 valeurs manquantes. Regardons les plus en d√©tails :

In [None]:
Cas 3 : Tr√®s peu de valeurs manquantes
On remplace ces valeurs par le mode de la s√©rie

In [13]:
# 2. Colonnes cat√©gorielles ‚Üí valeur la plus fr√©quente (mode)
cattest_mode_cols = [
    "MSZoning", "Utilities", "Exterior1st",
    "Exterior2nd", "KitchenQual",
    "Functional", "SaleType"
]

for col in cattest_mode_cols:
   datafus[col] = datafus[col].fillna(datafus[col].mode()[0])

In [19]:
# Cas particulier : MasVnrType
# NA signifie "pas de parement"
datafus["MasVnrType"] = datafus["MasVnrType"].fillna("None")

datafus["Electrical"] =datafus["Electrical"].fillna(
    datafus["Electrical"].mode()[0]
)

# V√©rification finale
missing = datafus.isnull().sum().sort_values(ascending=False)
missing[missing > 0]

Series([], dtype: int64)

In [None]:
# <a name="C2">ENcodage et Pr√©-traitement</a>

Encodage des variables cat√©gorielles One-Hot Encoding

In [21]:
X_train = datafus.drop("SalePrice", axis=1)
y_train = datafus["SalePrice"]
X_encoded = pd.get_dummies(X_train, drop_first=True)

In [None]:
On standardise les donn√©es d√©j√† encod√©es (one-hot) pour que le mod√®le travaille sur des variables √† la m√™me √©chelle.

In [22]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_encoded)

In [None]:
# <a name="C2">Entra√Ænement des mod√®les de pr√©diction du prix</a>
Construire et entra√Æner des mod√®les capables de pr√©dire le prix de vente (SalePrice) √† partir des caract√©ristiques des logements.
S√©paration jeu d‚Äôentra√Ænement / test

In [23]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_train, test_size=0.2, random_state=42
)

In [None]:
MOD√àLE 1 ‚Äì R√©gression lin√©aire
On va tester si la distribution des prix suit une tendance lin√©aire
MOD√àLE 2 ‚Äì XGBoost Regressor
Il Capture les relations non lin√©aires, Tr√®s performant sur les donn√©es tabulaires
et G√®re bien les interactions complexes

In [26]:
from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)


In [27]:
y_pred_lr = lr_model.predict(X_test)

In [28]:
from xgboost import XGBRegressor

xgb_model = XGBRegressor(
    n_estimators=1000,
    learning_rate=0.05,
    max_depth=5,
    random_state=42
)

xgb_model.fit(X_train, y_train)

In [None]:
# <a name="C2">Evaluation des performances</a>

Le prix √©tant une variable continue les erreurs absolues sont plus parlantes.
On determinera aussi R¬≤

In [31]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

y_pred_xgb = xgb_model.predict(X_test)

def evaluate_model(y_true, y_pred):
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    
    print(f"MAE  : {mae}")
    print(f"RMSE : {rmse}")
    print(f"R¬≤   : {r2}")
    return mae, rmse, r2
print(f"Error calclulation linear method : ")
mae_lr, rmse_lr, r2_lr = evaluate_model(y_test, y_pred_lr)
print(f"Error calclulation xgb method : ")
mae_xgb, rmse_xgb, r2_xgb = evaluate_model(y_test, y_pred_xgb)


Error calclulation linear method : 
MAE  : 134153739302126.9
RMSE : 1938831792334511.5
R¬≤   : -1.030717499095306e+21
Error calclulation xgb method : 
MAE  : 9869.826525043165
RMSE : 18826.057241081326
R¬≤   : 0.9028197226942521


In [None]:
Pour la m√©thode lin√©aire, les valeurs sont trop grandes. R¬≤ n√©gatif signifie que le mod√®le n'est pas du tout performant.

XGB
MAE (Mean Absolute Error) ‚âà 9‚ÄØ870 $
En moyenne, le mod√®le se trompe d‚Äôenviron 9‚ÄØ870 $ sur le prix d‚Äôune maison.
RMSE (Root Mean Squared Error) ‚âà 18‚ÄØ826 $
Cette m√©trique p√©nalise plus les grosses erreurs. Les √©carts importants entre
la pr√©diction et la r√©alit√© peuvent aller jusqu‚Äô√† ~19‚ÄØ000 $.
R¬≤ ‚âà 0,903
Le mod√®le explique environ 90 % de la variance des prix. C‚Äôest un tr√®s bon score.

In [None]:
# <a name="C2">Conclusion</a>

Deux mod√®les ont √©t√© entra√Æn√©s et compar√©s : une r√©gression lin√©aire servant de mod√®le de r√©f√©rence,
et un mod√®le XGBoost permettant de mieux capturer les relations complexes. 
L‚Äô√©valuation des performances montre que XGBoost offre une meilleure pr√©cision de pr√©diction.

In [None]:
# <a name="C2">Verification</a>
A pr√©sent, les utilisateurs auront √† rentrer des param√®tres de la maison recherch√©e.
Nous allons cr√©er ici une fonction qui se chargera de formater les donn√©es rentr√©es pour utiliser notre mod√®le.

In [32]:
valeurmaison = {
    "Id": 2, "MSSubClass": 20, "MSZoning": "RL", "LotFrontage": 80, "LotArea": 9600,
    "Street": "Pave", "Alley": "NA", "LotShape": "Reg", "LandContour": "Lvl",
    "Utilities": "AllPub", "LotConfig": "FR2", "LandSlope": "Gtl", "Neighborhood": "Veenker",
    "Condition1": "Feedr", "Condition2": "Norm", "BldgType": "1Fam", "HouseStyle": "1Story",
    "OverallQual": 6, "OverallCond": 8, "YearBuilt": 1976, "YearRemodAdd": 1976,
    "RoofStyle": "Gable", "RoofMatl": "CompShg", "Exterior1st": "MetalSd", "Exterior2nd": "MetalSd",
    "MasVnrType": "None", "MasVnrArea": 0, "ExterQual": "TA", "ExterCond": "TA",
    "Foundation": "CBlock", "BsmtQual": "Gd", "BsmtCond": "TA", "BsmtExposure": "Gd",
    "BsmtFinType1": "ALQ", "BsmtFinSF1": 978, "BsmtFinType2": "Unf", "BsmtFinSF2": 0,
    "BsmtUnfSF": 284, "TotalBsmtSF": 1262, "Heating": "GasA", "HeatingQC": "Ex",
    "CentralAir": "Y", "Electrical": "SBrkr", "1stFlrSF": 1262, "2ndFlrSF": 0,
    "LowQualFinSF": 0, "GrLivArea": 1262, "BsmtFullBath": 0, "BsmtHalfBath": 1,
    "FullBath": 2, "HalfBath": 0, "BedroomAbvGr": 3, "KitchenAbvGr": 1, "KitchenQual": "TA",
    "TotRmsAbvGrd": 6, "Functional": "Typ", "Fireplaces": 1, "FireplaceQu": "TA",
    "GarageType": "Attchd", "GarageYrBlt": 1976, "GarageFinish": "RFn", "GarageCars": 2,
    "GarageArea": 460, "GarageQual": "TA", "GarageCond": "TA", "PavedDrive": "Y",
    "WoodDeckSF": 298, "OpenPorchSF": 0, "EnclosedPorch": 0, "3SsnPorch": 0,
    "ScreenPorch": 0, "PoolArea": 0, "PoolQC": "NA", "Fence": "NA", "MiscFeature": "NA",
    "MiscVal": 0, "MoSold": 5, "YrSold": 2007, "SaleType": "WD", "SaleCondition": "Normal"
}


In [33]:
# transformer en DataFrame
nouvelle_maison_df = pd.DataFrame([valeurmaison])

# encodage identique √† X_train
nouvelle_maison_encoded = pd.get_dummies(nouvelle_maison_df)
nouvelle_maison_encoded = nouvelle_maison_encoded.reindex(columns=X_encoded.columns, fill_value=0)

# standardisation
nouvelle_maison_scaled = scaler.transform(nouvelle_maison_encoded)

# pr√©diction
prix_pred = xgb_model.predict(nouvelle_maison_scaled)
print(f"Prix estim√© : {prix_pred[0]:,.2f} $")


Prix estim√© : 181,351.47 $


In [None]:
# <a name="C2">CREATION APPLICATION</a>
Cr√©er une application web pour permettre la pr√©diction des prix de l'immobilier.

On va cr√©er une application web interactive pour la pr√©diction immobili√®re. 
On va le faire avec Streamlit, qui est simple pour d√©ployer un formulaire et afficher la pr√©diction.

prediction_immobilier/
‚îÇ
‚îú‚îÄ app.py
‚îú‚îÄ xgb_model.pkl
‚îú‚îÄ scaler.pkl
‚îú‚îÄ X_encoded_columns.pkl

In [None]:
Ce code sauvegarde le mod√®le XGBoost entra√Æn√©, le StandardScaler utilis√© pour normaliser les donn√©es apr√®s le One-Hot Encoding, ainsi que la liste exacte des colonnes g√©n√©r√©es par cet encodage. 
Ces trois √©l√©ments sont indispensables pour garantir que les nouvelles donn√©es saisies dans l‚Äôapplication web subissent exactement les m√™mes transformations que les donn√©es d‚Äôentra√Ænement,
permettant ainsi d‚Äôobtenir des pr√©dictions coh√©rentes et fiables sans r√©entra√Æner le mod√®le.

In [34]:
import pandas as pd
import joblib
from sklearn.preprocessing import StandardScaler
import xgboost as xgb
joblib.dump(xgb_model, "xgb_model.pkl")
joblib.dump(scaler, "scaler.pkl")
joblib.dump(X_encoded.columns, "X_encoded_columns.pkl")


['X_encoded_columns.pkl']

In [None]:
Creation de l'application principale

In [None]:
import streamlit as st
import pandas as pd
import joblib

# -------------------
# Charger le mod√®le et le scaler
# -------------------
xgb_model = joblib.load("xgb_model.pkl")
scaler = joblib.load("scaler.pkl")
X_encoded = joblib.load("X_encoded_columns.pkl")  # colonnes apr√®s one-hot encoding

# -------------------
# Labels fran√ßais + descriptions
# -------------------
labels_fr = {
    "MSSubClass": ("Type construction", "Classe de construction de la maison"),
    "MSZoning": ("Zone du terrain", "Type de zonage : r√©sidentiel, commercial, etc."),
    "LotFrontage": ("Fa√ßade sur rue", "Longueur du terrain le long de la rue en pieds"),
    "LotArea": ("Superficie du terrain", "Surface totale du terrain en pieds¬≤"),
    "Street": ("Type de rue", "Pav√© ou non"),
    "Alley": ("All√©e", "Type d'acc√®s secondaire ou NA"),
    "LotShape": ("Forme du terrain", "R√©gulier ou irr√©gulier"),
    "LandContour": ("Contour du terrain", "Plat ou pente"),
    "Utilities": ("Services publics", "AllPub=Tout disponible, NoSewr=Non"),
    "LotConfig": ("Configuration du lot", "FR2, Inside, Corner, CulDSac"),
    "LandSlope": ("Pente du terrain", "Gtl=Faible, Mod=Moyenne, Sev=Forte"),
    "Neighborhood": ("Quartier", "Nom du quartier"),
    "Condition1": ("Proximit√© route 1", "Route principale proche de la maison"),
    "Condition2": ("Proximit√© route 2", "Deuxi√®me route proche de la maison"),
    "BldgType": ("Type de b√¢timent", "1Fam=Maison individuelle, 2FmCon=Duplex..."),
    "HouseStyle": ("Style de maison", "1Story, 2Story, etc."),
    "OverallQual": ("Qualit√© g√©n√©rale", "1=Mauvais, 10=Excellent"),
    "OverallCond": ("√âtat g√©n√©ral", "1 √† 10"),
    "YearBuilt": ("Ann√©e construction", "Ann√©e de construction"),
    "YearRemodAdd": ("Ann√©e r√©novation", "Ann√©e de remodelage"),
    "RoofStyle": ("Style toit", "Gable, Hip, Flat..."),
    "RoofMatl": ("Mat√©riau toit", "CompShg, Metal, etc."),
    "Exterior1st": ("Rev√™tement ext√©rieur 1", "VinylSd, MetalSd, etc."),
    "Exterior2nd": ("Rev√™tement ext√©rieur 2", "VinylSd, MetalSd, etc."),
    "MasVnrType": ("Type ma√ßonnerie", "None, BrkFace, Stone, etc."),
    "MasVnrArea": ("Surface ma√ßonnerie", "En pieds¬≤"),
    "ExterQual": ("Qualit√© ext√©rieur", "Ex=Excellent, Gd=Bon, TA=Correct, Fa=M√©diocre, Po=Mauvais"),
    "ExterCond": ("√âtat ext√©rieur", "Ex, Gd, TA, Fa, Po"),
    "Foundation": ("Fondation", "PConc, CBlock, BrkTil, Slab, etc."),
    "BsmtQual": ("Qualit√© sous-sol", "Ex, Gd, TA, Fa, Po, NA"),
    "BsmtCond": ("√âtat sous-sol", "Ex, Gd, TA, Fa, Po, NA"),
    "BsmtExposure": ("Exposition sous-sol", "Gd=Bonne, Av=Moyenne, Mn=Faible, No=Aucune, NA"),
    "BsmtFinType1": ("Type finition 1", "GLQ, ALQ, BLQ, Rec, LwQ, Unf, NA"),
    "BsmtFinSF1": ("Surface finie 1", "En pieds¬≤"),
    "BsmtFinType2": ("Type finition 2", "GLQ, ALQ, BLQ, Rec, LwQ, Unf, NA"),
    "BsmtFinSF2": ("Surface finie 2", "En pieds¬≤"),
    "BsmtUnfSF": ("Sous-sol non fini", "En pieds¬≤"),
    "TotalBsmtSF": ("Surface totale sous-sol", "En pieds¬≤"),
    "1stFlrSF": ("Surface 1er √©tage", "En pieds¬≤"),
    "2ndFlrSF": ("Surface 2√®me √©tage", "En pieds¬≤"),
    "GrLivArea": ("Surface habitable", "En pieds¬≤"),
    "GarageCars": ("Capacit√© garage", "Nombre de voitures"),
    "GarageArea": ("Surface garage", "En pieds¬≤"),
    "WoodDeckSF": ("Terrasse bois", "Surface en pieds¬≤"),
    "OpenPorchSF": ("Porche ouvert", "Surface en pieds¬≤"),
    "EnclosedPorch": ("Porche ferm√©", "Surface en pieds¬≤"),
    "ScreenPorch": ("Porche grillag√©", "Surface en pieds¬≤"),
    "PoolArea": ("Piscine", "Surface en pieds¬≤"),
    "MiscVal": ("Valeur divers", "Valeur des commodit√©s diverses"),
    "MoSold": ("Mois de vente", "1=Janvier, 12=D√©cembre"),
    "YrSold": ("Ann√©e de vente", "Ex: 2010, 2015, etc."),
    "Heating": ("Type chauffage", "GasA, GasW, Floor, etc."),
    "HeatingQC": ("Qualit√© chauffage", "Ex, Gd, TA, Fa, Po"),
    "CentralAir": ("Climatisation centrale", "Y=Oui, N=Non"),
    "Electrical": ("√âlectricit√©", "SBrkr, FuseF, FuseA, Mix"),
    "KitchenQual": ("Qualit√© cuisine", "Ex, Gd, TA, Fa, Po"),
    "Functional": ("Fonctionnalit√© maison", "Typ=Normal, Min1=Minimale, etc."),
    "FireplaceQu": ("Qualit√© chemin√©e", "Ex, Gd, TA, Fa, Po, NA"),
    "GarageType": ("Type garage", "Attchd, Detchd, BuiltIn, CarPort, NA"),
    "GarageFinish": ("Finition garage", "Fin, RFn, Unf, NA"),
    "GarageQual": ("Qualit√© garage", "Ex, Gd, TA, Fa, Po, NA"),
    "GarageCond": ("√âtat garage", "Ex, Gd, TA, Fa, Po, NA"),
    "PavedDrive": ("All√©e pav√©e", "Y=Oui, P=Partiel, N=Non"),
    "PoolQC": ("Qualit√© piscine", "Ex, Gd, TA, Fa, Po, NA"),
    "Fence": ("Cl√¥ture", "GdPrv, MnPrv, GdWo, MnWw, NA"),
    "MiscFeature": ("Caract√©ristiques diverses", "Elev, Gar2, Shed, TenC, NA"),
    "SaleType": ("Type de vente", "WD, CWD, VWD, ConLD, ConLI, ConLw, Oth"),
    "SaleCondition": ("Condition vente", "Normal, Abnorml, AdjLand, Alloca, Family, Partial")
}

# -------------------
# Options pour les selectbox
# -------------------
options_dict = {
    "LandContour": {"Lvl":"Plat","Bnk":"Pente","HLS":"Haut-Bas","Low":"Bas"},
    "LotShape": {"Reg":"R√©gulier","IR1":"Irr√©gulier 1","IR2":"Irr√©gulier 2","IR3":"Irr√©gulier 3"},
    "ExterQual": {"Ex":"Excellent","Gd":"Bon","TA":"Correct","Fa":"M√©diocre","Po":"Mauvais"},
    "ExterCond": {"Ex":"Excellent","Gd":"Bon","TA":"Correct","Fa":"M√©diocre","Po":"Mauvais"},
    "BsmtQual": {"Ex":"Excellent","Gd":"Bon","TA":"Correct","Fa":"M√©diocre","Po":"Mauvais","NA":"Aucun"},
    "BsmtCond": {"Ex":"Excellent","Gd":"Bon","TA":"Correct","Fa":"M√©diocre","Po":"Mauvais","NA":"Aucun"},
    "BsmtExposure": {"Gd":"Bonne","Av":"Moyenne","Mn":"Faible","No":"Aucune","NA":"Aucune"},
    "BsmtFinType1": {"GLQ":"Good Living","ALQ":"Average Living","BLQ":"Basement Living","Rec":"Recr√©ation","LwQ":"Low Quality","Unf":"Non fini","NA":"Aucun"},
    "BsmtFinType2": {"GLQ":"Good Living","ALQ":"Average Living","BLQ":"Basement Living","Rec":"Recr√©ation","LwQ":"Low Quality","Unf":"Non fini","NA":"Aucun"},
    "GarageType": {"Attchd":"Attach√©","Detchd":"D√©tach√©","BuiltIn":"Int√©gr√©","CarPort":"Abri","NA":"Aucun"},
    "GarageFinish": {"Fin":"Fini","RFn":"Semi-fini","Unf":"Non fini","NA":"Aucun"},
    "GarageQual": {"Ex":"Excellent","Gd":"Bon","TA":"Correct","Fa":"M√©diocre","Po":"Mauvais","NA":"Aucun"},
    "GarageCond": {"Ex":"Excellent","Gd":"Bon","TA":"Correct","Fa":"M√©diocre","Po":"Mauvais","NA":"Aucun"},
    "PavedDrive": {"Y":"Oui","P":"Partiel","N":"Non"},
    "CentralAir": {"Y":"Oui","N":"Non"},
    "FireplaceQu": {"Ex":"Excellent","Gd":"Bon","TA":"Correct","Fa":"M√©diocre","Po":"Mauvais","NA":"Aucun"}
}

# -------------------
# Valeurs par d√©faut
# -------------------
default_values = {field: 0 for field in labels_fr.keys()}
default_values.update({
    "MSSubClass":20, "LotFrontage":80, "LotArea":9600, "OverallQual":7, "OverallCond":5,
    "YearBuilt":2000, "YearRemodAdd":2005, "MasVnrArea":0, "BsmtFinSF1":0, "BsmtFinSF2":0,
    "BsmtUnfSF":0, "TotalBsmtSF":0, "1stFlrSF":900, "2ndFlrSF":500, "GrLivArea":1400,
    "GarageCars":2, "GarageArea":400, "WoodDeckSF":0, "OpenPorchSF":0, "EnclosedPorch":0,
    "ScreenPorch":0, "PoolArea":0, "MiscVal":0, "MoSold":6, "YrSold":2020,
    "MSZoning":"RL", "Street":"Pave", "Alley":"NA", "LotShape":"Reg", "LandContour":"Lvl",
    "Utilities":"AllPub", "LotConfig":"FR2", "LandSlope":"Gtl", "Neighborhood":"CollgCr",
    "Condition1":"Norm", "Condition2":"Norm", "BldgType":"1Fam", "HouseStyle":"2Story",
    "RoofStyle":"Gable", "RoofMatl":"CompShg", "Exterior1st":"VinylSd", "Exterior2nd":"VinylSd",
    "MasVnrType":"None", "ExterQual":"Gd", "ExterCond":"TA", "Foundation":"PConc",
    "BsmtQual":"Gd", "BsmtCond":"TA", "BsmtExposure":"No", "BsmtFinType1":"GLQ", "BsmtFinType2":"Unf",
    "Heating":"GasA", "HeatingQC":"Ex", "CentralAir":"Y", "Electrical":"SBrkr", "KitchenQual":"Gd",
    "Functional":"Typ", "FireplaceQu":"NA", "GarageType":"Attchd", "GarageFinish":"Unf",
    "GarageQual":"TA", "GarageCond":"TA", "PavedDrive":"Y", "PoolQC":"NA", "Fence":"NA",
    "MiscFeature":"NA", "SaleType":"WD", "SaleCondition":"Normal"
})

# -------------------
# Formulaire Streamlit final
# -------------------
st.set_page_config(page_title="Pr√©diction Prix Immobilier", layout="wide")
st.title("üè† Pr√©diction du Prix de l'Immobilier")
st.markdown("Remplissez les informations sur la maison. Les valeurs par d√©faut sont pr√©-remplies.")

with st.form(key='maison_form'):
    valeurs = {}
    cols = st.columns(3)
    for i, field in enumerate(labels_fr.keys()):
        col = cols[i % 3]
        label, desc = labels_fr[field]
        default = default_values[field]
        st.markdown(f"**{label}**")  # Label en haut
        col.caption(desc)            # Description en bas

        # Champ cat√©goriel ou num√©rique
        if field in options_dict or isinstance(default, str):
            valeurs[field] = col.selectbox(
                label="",
                options=list(options_dict.get(field, {default: default}).keys()),
                format_func=lambda x, f=field: options_dict.get(f, {default: default})[x] if f in options_dict else x,
                index=list(options_dict.get(field, {default: default}).keys()).index(default),
                key=f"{field}_select"  # cl√© unique
            )
        else:
            valeurs[field] = col.number_input(
                label="",
                value=float(default),
                min_value=0.0,
                key=f"{field}_num"  # cl√© unique
            )

    submit_button = st.form_submit_button(label="üí∞ Pr√©dire le prix")

# -------------------
# Pr√©diction
# -------------------
if submit_button:
    nouvelle_maison_df = pd.DataFrame([valeurs])
    nouvelle_maison_encoded = pd.get_dummies(nouvelle_maison_df)
    nouvelle_maison_encoded = nouvelle_maison_encoded.reindex(columns=X_encoded, fill_value=0)
    nouvelle_maison_scaled = scaler.transform(nouvelle_maison_encoded)
    prix_pred = xgb_model.predict(nouvelle_maison_scaled)
    st.markdown("---")
    st.subheader("üí° R√©sultat")
    st.success(f"Le prix estim√© de cette maison est : **{prix_pred[0]:,.2f} $**")
    st.balloons()


In [None]:
# <a name="C2">TESTER L"APPLICATION</a>

In [None]:
pip install streamlit pandas scikit-learn xgboost
 # Dans le terminal,se placer dans le dossier prediction_immobilier et lancer
streamlit run app.py

In [None]:
# <a name="C2">DEPLOIEMENT L"APPLICATION</a>

In [None]:
REALISE AVEC Steamlit