# Workshop 2: Data Preprocessing and Representation

## IFT870/BIN710 Winter 2024

### Source (Kaggle): https://www.kaggle.com/c/house-prices-advanced-regression-techniques

## Statistical and Visual Insights into Housing Data

Descriptive statistics encompass metrics such as mean, median, mode, variance, and standard deviation, providing a rapid summary of the dataset's characteristics.
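
Each of these metrics is a single method call in pandas. A minimal sketch on a small, made-up series of prices (illustrative values, not rows from `housing.csv`); `describe()`, used later in this notebook, reports most of them at once.

```python
import pandas as pd

# Illustrative prices only, not rows from housing.csv
prices = pd.Series([129500, 160000, 160000, 213500, 180000], name="SalePrice")

summary = {
    "mean": prices.mean(),
    "median": prices.median(),
    "mode": prices.mode()[0],   # most frequent value (the smallest one if there are ties)
    "variance": prices.var(),   # sample variance (ddof=1, the pandas default)
    "std": prices.std(),        # sample standard deviation (ddof=1)
}
print(summary)
```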

Histograms, scatter plots, box plots, and heat maps are essential tools for visualizing the data. For example, a scatter plot of a variable such as the number of bedrooms or the living area against the sale price shows how that variable relates to house prices.

As the analysis progresses, we will look for patterns such as seasonality in housing prices or the influence of external factors on price fluctuations.
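
The dataset records the month (`Mo Sold`) and year (`Yr Sold`) of each sale, so one way to look for seasonality is to group sales by month. A minimal sketch, assuming those column names; the small frame below is invented for illustration and `median_price_by_month` is a hypothetical helper.

```python
import pandas as pd

def median_price_by_month(df: pd.DataFrame) -> pd.DataFrame:
    """Median sale price and number of sales for each month of sale (Mo Sold)."""
    return df.groupby("Mo Sold")["SalePrice"].agg(["median", "count"])

# Illustrative rows only
toy = pd.DataFrame({
    "Mo Sold": [5, 6, 6, 4, 3, 6],
    "SalePrice": [215000, 105000, 172000, 244000, 189900, 131000],
})
by_month = median_price_by_month(toy)
print(by_month)
```

The `count` column matters on the full data: months with few sales give unstable medians.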

In [1]:
# Import libraries
import sklearn
import numpy as np
import scipy as sp
import pandas as pd
%matplotlib notebook
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Path of the CSV data file
data_file = "housing.csv"

In [3]:
# Load the data into a DataFrame (the first CSV column is used as the index)
data = pd.read_csv(data_file,index_col=0)
data.shape

(2930, 82)

In [4]:
# Descriptive statistics of the numeric attributes
data.describe()

| | Order | PID | MS SubClass | Lot Frontage | Lot Area | Overall Qual | Overall Cond | Year Built | Year Remod/Add | Mas Vnr Area | ... | Wood Deck SF | Open Porch SF | Enclosed Porch | 3Ssn Porch | Screen Porch | Pool Area | Misc Val | Mo Sold | Yr Sold | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2930.0 | 2930.0 | 2930.0 | 2440.0 | 2930.0 | 2930.0 | 2930.0 | 2930.0 | 2930.0 | 2907.0 | ... | 2930.0 | 2930.0 | 2930.0 | 2930.0 | 2930.0 | 2930.0 | 2930.0 | 2930.0 | 2930.0 | 2930.0 |
| mean | 1465.5 | 714464500.0 | 57.387372 | 69.22459 | 10147.921843 | 6.094881 | 5.56314 | 1971.356314 | 1984.266553 | 101.896801 | ... | 93.751877 | 47.533447 | 23.011604 | 2.592491 | 16.002048 | 2.243345 | 50.635154 | 6.216041 | 2007.790444 | 180796.060068 |
| std | 845.96247 | 188730800.0 | 42.638025 | 23.365335 | 7880.017759 | 1.411026 | 1.111537 | 30.245361 | 20.860286 | 179.112611 | ... | 126.361562 | 67.4834 | 64.139059 | 25.141331 | 56.08737 | 35.597181 | 566.344288 | 2.714492 | 1.316613 | 79886.692357 |
| min | 1.0 | 526301100.0 | 20.0 | 21.0 | 1300.0 | 1.0 | 1.0 | 1872.0 | 1950.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2006.0 | 12789.0 |
| 25% | 733.25 | 528477000.0 | 20.0 | 58.0 | 7440.25 | 5.0 | 5.0 | 1954.0 | 1965.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 2007.0 | 129500.0 |
| 50% | 1465.5 | 535453600.0 | 50.0 | 68.0 | 9436.5 | 6.0 | 5.0 | 1973.0 | 1993.0 | 0.0 | ... | 0.0 | 27.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 2008.0 | 160000.0 |
| 75% | 2197.75 | 907181100.0 | 70.0 | 80.0 | 11555.25 | 7.0 | 6.0 | 2001.0 | 2004.0 | 164.0 | ... | 168.0 | 70.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 2009.0 | 213500.0 |
| max | 2930.0 | 1007100000.0 | 190.0 | 313.0 | 215245.0 | 10.0 | 9.0 | 2010.0 | 2010.0 | 1600.0 | ... | 1424.0 | 742.0 | 1012.0 | 508.0 | 576.0 | 800.0 | 17000.0 | 12.0 | 2010.0 | 755000.0 |


In [5]:
# Show the first 5 rows
data.head()

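| | Order | PID | MS SubClass | MS Zoning | Lot Frontage | Lot Area | Street | Alley | Lot Shape | Land Contour | ... | Pool Area | Pool QC | Fence | Misc Feature | Misc Val | Mo Sold | Yr Sold | Sale Type | Sale Condition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 526301100 | 20 | RL | 141.0 | 31770 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2010 | WD | Normal | 215000 |
| 1 | 2 | 526350040 | 20 | RH | 80.0 | 11622 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | MnPrv | NaN | 0 | 6 | 2010 | WD | Normal | 105000 |
| 2 | 3 | 526351010 | 20 | RL | 81.0 | 14267 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | Gar2 | 12500 | 6 | 2010 | WD | Normal | 172000 |
| 3 | 4 | 526353030 | 20 | RL | 93.0 | 11160 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 4 | 2010 | WD | Normal | 244000 |
| 4 | 5 | 527105010 | 60 | RL | 74.0 | 13830 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | MnPrv | NaN | 0 | 3 | 2010 | WD | Normal | 189900 |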


In [6]:
# Display the data
data

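| | Order | PID | MS SubClass | MS Zoning | Lot Frontage | Lot Area | Street | Alley | Lot Shape | Land Contour | ... | Pool Area | Pool QC | Fence | Misc Feature | Misc Val | Mo Sold | Yr Sold | Sale Type | Sale Condition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 526301100 | 20 | RL | 141.0 | 31770 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2010 | WD | Normal | 215000 |
| 1 | 2 | 526350040 | 20 | RH | 80.0 | 11622 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | MnPrv | NaN | 0 | 6 | 2010 | WD | Normal | 105000 |
| 2 | 3 | 526351010 | 20 | RL | 81.0 | 14267 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | Gar2 | 12500 | 6 | 2010 | WD | Normal | 172000 |
| 3 | 4 | 526353030 | 20 | RL | 93.0 | 11160 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 4 | 2010 | WD | Normal | 244000 |
| 4 | 5 | 527105010 | 60 | RL | 74.0 | 13830 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | MnPrv | NaN | 0 | 3 | 2010 | WD | Normal | 189900 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2925 | 2926 | 923275080 | 80 | RL | 37.0 | 7937 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | GdPrv | NaN | 0 | 3 | 2006 | WD | Normal | 142500 |
| 2926 | 2927 | 923276100 | 20 | RL | NaN | 8885 | Pave | NaN | IR1 | Low | ... | 0 | NaN | MnPrv | NaN | 0 | 6 | 2006 | WD | Normal | 131000 |
| 2927 | 2928 | 923400125 | 85 | RL | 62.0 | 10441 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | MnPrv | Shed | 700 | 7 | 2006 | WD | Normal | 132000 |
| 2928 | 2929 | 924100070 | 20 | RL | 77.0 | 10010 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 4 | 2006 | WD | Normal | 170000 |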


## Missing-value analysis

In [7]:
# Number and ratio of missing values per attribute (ascending order); the first 50 attributes, which have no missing value, are skipped
nb_m = data.isnull().sum().sort_values()[50:]
ratio_m = (data.isnull().sum()/data.shape[0]).sort_values()[50:]
manquant = pd.concat([nb_m, ratio_m], axis=1, sort=False)

In [8]:
# List the attributes with their type and number of missing values
pd.DataFrame({'Types': data[list(manquant.index.values)].dtypes,
              'Missing count': nb_m,
              'Missing ratio': ratio_m})  # fraction of rows, not a percentage

| | Types | Missing count | Missing ratio |
|---|---|---|---|
| Year Remod/Add | int64 | 0 | 0.0 |
| Roof Style | object | 0 | 0.0 |
| Roof Matl | object | 0 | 0.0 |
| Exterior 1st | object | 0 | 0.0 |
| Exterior 2nd | object | 0 | 0.0 |
| Electrical | object | 1 | 0.000341 |
| BsmtFin SF 1 | float64 | 1 | 0.000341 |
| BsmtFin SF 2 | float64 | 1 | 0.000341 |
| Bsmt Unf SF | float64 | 1 | 0.000341 |
| Total Bsmt SF | float64 | 1 | 0.000341 |


Missing-value counts range from 1 to 2917 across 27 variables: 11 numeric and 16 categorical.
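
The cells above select attributes by position (`[50:]` after sorting), which only works because at least 50 attributes have no missing value. A sketch that instead keeps exactly the columns with at least one missing value; `missing_summary` is a hypothetical helper and the toy frame is invented.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Type, count and ratio of missing values for every column that has at least one."""
    counts = df.isnull().sum()
    counts = counts[counts > 0].sort_values(ascending=False)
    return pd.DataFrame({
        "Types": df[counts.index].dtypes,
        "Missing count": counts,
        "Missing ratio": counts / len(df),  # fraction of rows, between 0 and 1
    })

toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", "y", None, "z"],
    "c": [1, 2, 3, 4],
})
print(missing_summary(toy))
```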

Many of these variables are categorical: do their missing values carry a meaning (such as the absence of the feature), or are they truly missing?
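
When a missing value means "feature absent", it can be turned into an explicit category instead of being imputed. A minimal sketch, assuming (as the notes below state for the pool, the fence and the fireplace) that NaN means absence in those columns; the column list and `fill_absent_categories` are illustrative choices, not a fixed rule.

```python
import numpy as np
import pandas as pd

# Assumption: in these columns a missing value means the feature is absent
ABSENT_MEANS_NONE = ["Pool QC", "Fence", "Fireplace Qu"]

def fill_absent_categories(df: pd.DataFrame, columns=ABSENT_MEANS_NONE) -> pd.DataFrame:
    """Return a copy where NaN becomes an explicit 'None' category in the listed columns."""
    out = df.copy()
    present = [c for c in columns if c in out.columns]
    out[present] = out[present].fillna("None")
    return out

toy = pd.DataFrame({
    "Pool QC": [np.nan, "Ex"],
    "Fence": ["MnPrv", np.nan],
    "Lot Frontage": [np.nan, 80.0],  # numeric: a genuinely missing measurement, left untouched
})
filled = fill_absent_categories(toy)
print(filled)
```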

In [9]:
# Drop the missing data. Caution: dropna() removes every row with at least one
# missing attribute, and some attributes are missing in almost every row
# data = pd.read_csv(data_file,index_col=0)
# data.dropna(inplace=True)
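
Since the most incomplete attribute is missing in 2917 of the 2930 rows, `dropna()` would keep at most 13 rows. A common alternative, sketched here with a hypothetical 50% threshold and helper name, is to drop the sparse columns first and only then the remaining incomplete rows.

```python
import numpy as np
import pandas as pd

def drop_sparse_then_rows(df: pd.DataFrame, max_missing_ratio: float = 0.5) -> pd.DataFrame:
    """Drop columns whose missing ratio exceeds max_missing_ratio, then rows that still have a NaN."""
    keep = df.columns[(df.isnull().mean() <= max_missing_ratio).to_numpy()]
    return df[keep].dropna()

toy = pd.DataFrame({
    "sparse": [np.nan, np.nan, np.nan, 1.0],  # 75% missing
    "x": [1.0, 2.0, np.nan, 4.0],             # 25% missing
    "y": [1, 2, 3, 4],
})
print(len(toy.dropna()), len(drop_sparse_then_rows(toy)))
```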

In [10]:
# Replace missing values with a dummy value (say 0)
# fillna returns a new DataFrame: without inplace=True, `data` itself keeps its NaN values
data = pd.read_csv(data_file,index_col=0)
data.fillna(0)

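| | Order | PID | MS SubClass | MS Zoning | Lot Frontage | Lot Area | Street | Alley | Lot Shape | Land Contour | ... | Pool Area | Pool QC | Fence | Misc Feature | Misc Val | Mo Sold | Yr Sold | Sale Type | Sale Condition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 526301100 | 20 | RL | 141.0 | 31770 | Pave | 0 | IR1 | Lvl | ... | 0 | 0 | 0 | 0 | 0 | 5 | 2010 | WD | Normal | 215000 |
| 1 | 2 | 526350040 | 20 | RH | 80.0 | 11622 | Pave | 0 | Reg | Lvl | ... | 0 | 0 | MnPrv | 0 | 0 | 6 | 2010 | WD | Normal | 105000 |
| 2 | 3 | 526351010 | 20 | RL | 81.0 | 14267 | Pave | 0 | IR1 | Lvl | ... | 0 | 0 | 0 | Gar2 | 12500 | 6 | 2010 | WD | Normal | 172000 |
| 3 | 4 | 526353030 | 20 | RL | 93.0 | 11160 | Pave | 0 | Reg | Lvl | ... | 0 | 0 | 0 | 0 | 0 | 4 | 2010 | WD | Normal | 244000 |
| 4 | 5 | 527105010 | 60 | RL | 74.0 | 13830 | Pave | 0 | IR1 | Lvl | ... | 0 | 0 | MnPrv | 0 | 0 | 3 | 2010 | WD | Normal | 189900 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2925 | 2926 | 923275080 | 80 | RL | 37.0 | 7937 | Pave | 0 | IR1 | Lvl | ... | 0 | 0 | GdPrv | 0 | 0 | 3 | 2006 | WD | Normal | 142500 |
| 2926 | 2927 | 923276100 | 20 | RL | 0.0 | 8885 | Pave | 0 | IR1 | Low | ... | 0 | 0 | MnPrv | 0 | 0 | 6 | 2006 | WD | Normal | 131000 |
| 2927 | 2928 | 923400125 | 85 | RL | 62.0 | 10441 | Pave | 0 | Reg | Lvl | ... | 0 | 0 | MnPrv | Shed | 700 | 7 | 2006 | WD | Normal | 132000 |
| 2928 | 2929 | 924100070 | 20 | RL | 77.0 | 10010 | Pave | 0 | Reg | Lvl | ... | 0 | 0 | 0 | 0 | 0 | 4 | 2006 | WD | Normal | 170000 |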


In [10]:
# Plot the distribution of the sale price (SalePrice) as a violin plot
import plotly.express as px

fig = px.violin(data, y="SalePrice")
fig.show()

In [12]:
# Replace missing values with a value computed from the rest of the column (say, the mean)
# Beware of string columns: the mean only exists for numeric columns
# data = pd.read_csv(data_file,index_col=0)
# data.fillna(data.mean(numeric_only=True), inplace=True)
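
Since pandas 2.0, `DataFrame.mean()` raises an error on string columns unless `numeric_only=True` is passed. A sketch that fills numeric columns with their mean and the other columns with their mode (the strategy applied to Electrical further down); `impute_mean_and_mode` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def impute_mean_and_mode(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric NaN with the column mean and other NaN with the column's most frequent value."""
    out = df.copy()
    numeric = out.select_dtypes(include="number").columns
    out[numeric] = out[numeric].fillna(out[numeric].mean())
    for col in out.columns.difference(numeric):
        # Skip columns that are entirely empty: they have no mode
        if out[col].isnull().any() and out[col].notna().any():
            out[col] = out[col].fillna(out[col].mode()[0])
    return out

toy = pd.DataFrame({
    "Lot Frontage": [60.0, np.nan, 80.0],
    "Electrical": ["SBrkr", None, "SBrkr"],
})
print(impute_mean_and_mode(toy))
```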

In [11]:
# Compare several violins, one per zoning class (MS Zoning)
import plotly.express as px

fig = px.violin(data, y="SalePrice", x="MS Zoning", color="MS Zoning", box=True, points="all",
          hover_data=data.columns)
fig.show()

### Pool QC: pool quality (missing = no pool)

In [13]:
pd.Categorical(data['Pool QC'].values)

[NaN, NaN, NaN, NaN, NaN, ..., NaN, NaN, NaN, NaN, NaN]
Length: 2930
Categories (4, object): ['Ex', 'Fa', 'Gd', 'TA']

In [14]:
# Replace 'Pool QC' with a binary variable 'Pool' (1 = has a pool)
data["Pool"] = np.where(data["Pool QC"].isnull(),0,1)
data.drop(["Pool QC"], axis = 1, inplace = True)

In [15]:
# Check the result
pd.Categorical(data['Pool'].values)

[0, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0]
Length: 2930
Categories (2, int32): [0, 1]

### Misc Feature: other information

In [17]:
pd.Categorical(data['Misc Feature'].values)

[NaN, NaN, 'Gar2', NaN, NaN, ..., NaN, NaN, 'Shed', NaN, NaN]
Length: 2930
Categories (5, object): ['Elev', 'Gar2', 'Othr', 'Shed', 'TenC']

These values could be redistributed into other attributes.
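
One way to spread these values over several attributes is one-hot encoding: one 0/1 column per category (Elev, Gar2, Othr, Shed, TenC). A minimal sketch using `pd.get_dummies`; the `Misc_` prefix and the helper name are hypothetical choices.

```python
import numpy as np
import pandas as pd

def misc_feature_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Replace 'Misc Feature' by one 0/1 column per category; a missing value gives all zeros."""
    dummies = pd.get_dummies(df["Misc Feature"], prefix="Misc", dtype=int)
    return pd.concat([df.drop(columns="Misc Feature"), dummies], axis=1)

toy = pd.DataFrame({
    "Misc Feature": [np.nan, "Gar2", "Shed", np.nan],
    "Misc Val": [0, 12500, 700, 0],
})
print(misc_feature_indicators(toy))
```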

### Alley: type of alley access to the property (redundant with Street)

In [16]:
pd.Categorical(data['Alley'].values)

[NaN, NaN, NaN, NaN, NaN, ..., NaN, NaN, NaN, NaN, NaN]
Length: 2930
Categories (2, object): ['Grvl', 'Pave']

In [17]:
pd.Categorical(data['Street'].values)

['Pave', 'Pave', 'Pave', 'Pave', 'Pave', ..., 'Pave', 'Pave', 'Pave', 'Pave', 'Pave']
Length: 2930
Categories (2, object): ['Grvl', 'Pave']

In [18]:
# Drop 'Alley'
data.drop(["Alley"], axis = 1, inplace = True)

### Fence: fence quality (missing = no fence)

In [19]:
pd.Categorical(data['Fence'].values)

[NaN, 'MnPrv', NaN, NaN, 'MnPrv', ..., 'GdPrv', 'MnPrv', 'MnPrv', NaN, NaN]
Length: 2930
Categories (4, object): ['GdPrv', 'GdWo', 'MnPrv', 'MnWw']

In [20]:
# Replace 'Fence' with a binary variable 'HasFence' (1 = has a fence)
data["HasFence"] = np.where(data["Fence"].isnull(),0,1)
data.drop(["Fence"], axis = 1, inplace = True)

### Fireplace Qu: ordinal (missing = no fireplace)

This attribute matters (aesthetics and economic value).
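
Because the attribute is ordinal, it can be encoded as ordered integers rather than one-hot columns. A minimal sketch, assuming Fireplace Qu uses the same quality codes (Ex, Gd, TA, Fa, Po) as the garage columns shown below; the numeric scale and `encode_quality` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Assumed ordering of the quality codes, from poor to excellent (0 is kept for "no fireplace")
QUALITY_SCALE = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

def encode_quality(series: pd.Series) -> pd.Series:
    """Map quality codes to ordered integers; missing values (and unknown codes) become 0."""
    return series.map(QUALITY_SCALE).fillna(0).astype(int)

fireplace_qu = pd.Series([np.nan, "TA", "Gd", "Ex", "Po"])
print(encode_quality(fireplace_qu).tolist())
```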

### 'Garage Finish', 'Garage Cond', 'Garage Qual', 'Garage Type'

Several attributes describe the garage.

In [21]:
pd.Categorical(data['Garage Finish'].values)

['Fin', 'Unf', 'Unf', 'Fin', 'Fin', ..., 'Unf', 'Unf', NaN, 'RFn', 'Fin']
Length: 2930
Categories (3, object): ['Fin', 'RFn', 'Unf']

In [22]:
pd.Categorical(data['Garage Cond'].values)

['TA', 'TA', 'TA', 'TA', 'TA', ..., 'TA', 'TA', NaN, 'TA', 'TA']
Length: 2930
Categories (5, object): ['Ex', 'Fa', 'Gd', 'Po', 'TA']

In [23]:
pd.Categorical(data['Garage Qual'].values)

['TA', 'TA', 'TA', 'TA', 'TA', ..., 'TA', 'TA', NaN, 'TA', 'TA']
Length: 2930
Categories (5, object): ['Ex', 'Fa', 'Gd', 'Po', 'TA']

In [24]:
pd.Categorical(data['Garage Type'].values)

['Attchd', 'Attchd', 'Attchd', 'Attchd', 'Attchd', ..., 'Detchd', 'Attchd', NaN, 'Attchd', 'Attchd']
Length: 2930
Categories (6, object): ['2Types', 'Attchd', 'Basment', 'BuiltIn', 'CarPort', 'Detchd']

'Garage Cond' and 'Garage Qual': synonymous and considered unnecessary.
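
Before dropping two attributes described as synonyms, it is worth measuring how often they actually agree. A sketch on invented values; `agreement_rate` is a hypothetical helper, and `pd.crosstab` shows where the two columns differ.

```python
import numpy as np
import pandas as pd

def agreement_rate(a: pd.Series, b: pd.Series) -> float:
    """Share of rows with equal values, among rows where both values are present."""
    both = a.notna() & b.notna()
    return float((a[both] == b[both]).mean())

cond = pd.Series(["TA", "TA", "Fa", np.nan, "Gd"])
qual = pd.Series(["TA", "TA", "TA", np.nan, "Gd"])
print(agreement_rate(cond, qual))
print(pd.crosstab(cond, qual))
```

On the full data, the same call would be `agreement_rate(data['Garage Cond'], data['Garage Qual'])`, run before the two columns are dropped.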

'Garage Finish' and 'Garage Type': useful, but their information can be reduced to binary values.


In [25]:
data.drop(["Garage Cond"], axis = 1, inplace = True)
data.drop(["Garage Qual"], axis = 1, inplace = True)

In [26]:
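# 1 = finished ('Fin') or rough-finished ('RFn') garage; 0 = unfinished or no garage
data['Garage Finish'] = np.where(data['Garage Finish'].isin(['Fin', 'RFn']), 1, 0)
# 1 = garage not detached (attached, built-in, ...); 0 = detached ('Detchd') or no garage
data['Garage Type'] = np.where(data['Garage Type'].notna() & (data['Garage Type'] != 'Detchd'), 1, 0)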


The basement (Bsmt) attributes can be consolidated in the same way.
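
Mirroring the Pool and HasFence flags, a single basement indicator can summarize whether a basement exists. A minimal sketch, assuming that a missing or zero `Total Bsmt SF` means no basement; the `HasBsmt` name is hypothetical.

```python
import numpy as np
import pandas as pd

def add_has_basement(df: pd.DataFrame) -> pd.DataFrame:
    """Add a 0/1 'HasBsmt' column: 1 when the total basement area is positive."""
    out = df.copy()
    # Assumption: a missing or zero 'Total Bsmt SF' means there is no basement
    out["HasBsmt"] = (out["Total Bsmt SF"].fillna(0) > 0).astype(int)
    return out

toy = pd.DataFrame({"Total Bsmt SF": [1080.0, 0.0, np.nan, 882.0]})
print(add_has_basement(toy))
```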

### Imputation example (Electrical: 1 missing value)

In [27]:
# Fill the single missing value with the most frequent category (mode)
data['Electrical'] = data['Electrical'].fillna(data['Electrical'].mode()[0])

## Handling noisy values

Look for duplicated records and aberrant numeric values (outliers).


In [28]:
# Any duplicated rows?
data.duplicated().sum()

0
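
`Order` numbers the rows from 1 to 2930 and `PID` identifies the parcel, so `data.duplicated()` only finds rows that are identical including these identifiers. A sketch that ignores identifier columns when looking for duplicates; the helper name and toy rows are invented.

```python
import pandas as pd

def count_content_duplicates(df: pd.DataFrame, id_columns=("Order", "PID")) -> int:
    """Count duplicated rows, ignoring identifier columns that differ on every row."""
    cols = [c for c in df.columns if c not in id_columns]
    return int(df.duplicated(subset=cols).sum())

toy = pd.DataFrame({
    "Order": [1, 2, 3],
    "PID": [10, 11, 12],
    "Lot Area": [8450, 8450, 9600],
    "SalePrice": [208500, 208500, 181500],
})
print(toy.duplicated().sum(), count_content_duplicates(toy))
```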

In [29]:

fig,axes=plt.subplots(1,3,figsize=(12,4))
sns.scatterplot(x="Lot Area", y="SalePrice",data = data, ax = axes[0])
sns.boxplot(x="Lot Area", data = data, ax = axes[1])
sns.boxplot(x="SalePrice", data = data, ax = axes[2])



### Lot Area

In [30]:
fig = px.scatter(data, x="Lot Area", y="SalePrice")
fig.show()

In [31]:
fig = px.box(data, y="Lot Area")
fig.show()

In [32]:
fig = px.box(data, x="Lot Area")
fig.show()

In [33]:
fig = px.box(data, x="SalePrice")
fig.show()

In [34]:
fig = px.box(data, x="SalePrice", y="MS Zoning")
fig.show()
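
Box plots draw their whiskers with the 1.5 × IQR rule (seaborn's default `whis=1.5`), and points beyond them are candidate outliers. The same rule can select those rows directly; `iqr_outliers` is a hypothetical helper and the values below are illustrative.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

lot_area = pd.Series([7440, 9436, 11555, 8000, 10000, 215245])
print(lot_area[iqr_outliers(lot_area)])
```

On the full data, `data.loc[iqr_outliers(data["Lot Area"]), ["Lot Area", "SalePrice"]]` lists the flagged lots. They are candidates to inspect, not automatically errors: the `describe()` table shows a maximum Lot Area of 215245 against a 75th percentile of 11555.25.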

### Lot Frontage

In [35]:
# Lot Frontage
import warnings
warnings.filterwarnings("ignore")

fig,axes=plt.subplots(1,3,figsize=(12,4))
sns.scatterplot(x="Lot Frontage", y="SalePrice",data = data, ax = axes[0])
sns.boxplot(x="Lot Frontage", data = data, ax = axes[1])
sns.boxplot(x="SalePrice", data = data, ax = axes[2])


### Gr Liv Area

In [36]:
fig,axes=plt.subplots(1,3,figsize=(12,4))
sns.scatterplot(x="Gr Liv Area", y="SalePrice",data = data, ax = axes[0])
sns.boxplot(x="Gr Liv Area", data = data, ax = axes[1])
sns.boxplot(x="SalePrice", data = data, ax = axes[2])

## Handling inconsistent values

Check that each attribute's type matches its values.

Check that correlated attributes have consistent values.
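
A few cross-attribute rules can be checked mechanically. A sketch with illustrative rules: a remodeling should not predate construction, a sale before construction is suspect (but possible for pre-sold homes), and the basement areas are assumed to add up to `Total Bsmt SF`. The rule names and helper are hypothetical.

```python
import pandas as pd

def consistency_checks(df: pd.DataFrame) -> pd.DataFrame:
    """One boolean column per rule; True flags a row that violates the rule (NaN never flags)."""
    bsmt_parts = df["BsmtFin SF 1"] + df["BsmtFin SF 2"] + df["Bsmt Unf SF"]
    return pd.DataFrame({
        # A remodeling cannot predate construction
        "remod_before_built": df["Year Remod/Add"] < df["Year Built"],
        # Suspect rather than impossible: a house can be sold before it is finished
        "sold_before_built": df["Yr Sold"] < df["Year Built"],
        # Assumed rule: finished + unfinished basement areas add up to the total area
        "bsmt_sum_mismatch": (bsmt_parts - df["Total Bsmt SF"]).abs() > 1e-6,
    })

toy = pd.DataFrame({
    "Year Built": [1960, 2005],
    "Year Remod/Add": [1960, 2004],
    "Yr Sold": [2010, 2006],
    "BsmtFin SF 1": [639.0, 0.0],
    "BsmtFin SF 2": [0.0, 0.0],
    "Bsmt Unf SF": [441.0, 500.0],
    "Total Bsmt SF": [1080.0, 600.0],
})
print(consistency_checks(toy).sum())
```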

In [38]:
valeurs = [data[att].values[:5] for att in data.columns]
pd.DataFrame({'Types': data.dtypes, 'First 5 values': valeurs}).transpose()

| | Order | PID | MS SubClass | MS Zoning | Lot Frontage | Lot Area | Street | Lot Shape | Land Contour | Utilities | ... | Pool Area | Misc Feature | Misc Val | Mo Sold | Yr Sold | Sale Type | Sale Condition | SalePrice | Pool | HasFence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Types | int64 | int64 | int64 | object | float64 | int64 | object | object | object | object | ... | int64 | object | int64 | int64 | int64 | object | object | int64 | int32 | int32 |
| First 5 values | [1, 2, 3, 4, 5] | [526301100, 526350040, 526351010, 526353030, 5... | [20, 20, 20, 20, 60] | [RL, RH, RL, RL, RL] | [141.0, 80.0, 81.0, 93.0, 74.0] | [31770, 11622, 14267, 11160, 13830] | [Pave, Pave, Pave, Pave, Pave] | [IR1, Reg, IR1, Reg, IR1] | [Lvl, Lvl, Lvl, Lvl, Lvl] | [AllPub, AllPub, AllPub, AllPub, AllPub] | ... | [0, 0, 0, 0, 0] | [nan, nan, Gar2, nan, nan] | [0, 0, 12500, 0, 0] | [5, 6, 6, 4, 3] | [2010, 2010, 2010, 2010, 2010] | [WD , WD , WD , WD , WD ] | [Normal, Normal, Normal, Normal, Normal] | [215000, 105000, 172000, 244000, 189900] | [0, 0, 0, 0, 0] | [0, 1, 0, 0, 1] |
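
This table reveals two type issues. MS SubClass is stored as int64 although its values (20, 60, ...) are dwelling-type codes, and Sale Type values carry a trailing space ('WD '). A minimal sketch of both fixes; `fix_types` is a hypothetical helper applied to invented rows.

```python
import pandas as pd

def fix_types(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with MS SubClass as a category and surrounding spaces stripped from text."""
    out = df.copy()
    # MS SubClass holds dwelling-type codes: labels, not quantities
    out["MS SubClass"] = out["MS SubClass"].astype("category")
    for col in out.select_dtypes(include=["object", "string"]).columns:
        out[col] = out[col].str.strip()  # e.g. 'WD ' -> 'WD'; NaN stays NaN
    return out

toy = pd.DataFrame({"MS SubClass": [20, 60, 20], "Sale Type": ["WD ", "New", "WD "]})
fixed = fix_types(toy)
print(fixed.dtypes)
```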


## Other plots

In [39]:
data1 = data.select_dtypes(['int64','float64'])
data1

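| | Order | PID | MS SubClass | Lot Frontage | Lot Area | Overall Qual | Overall Cond | Year Built | Year Remod/Add | Mas Vnr Area | ... | Wood Deck SF | Open Porch SF | Enclosed Porch | 3Ssn Porch | Screen Porch | Pool Area | Misc Val | Mo Sold | Yr Sold | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 526301100 | 20 | 141.0 | 31770 | 6 | 5 | 1960 | 1960 | 112.0 | ... | 210 | 62 | 0 | 0 | 0 | 0 | 0 | 5 | 2010 | 215000 |
| 1 | 2 | 526350040 | 20 | 80.0 | 11622 | 5 | 6 | 1961 | 1961 | 0.0 | ... | 140 | 0 | 0 | 0 | 120 | 0 | 0 | 6 | 2010 | 105000 |
| 2 | 3 | 526351010 | 20 | 81.0 | 14267 | 6 | 6 | 1958 | 1958 | 108.0 | ... | 393 | 36 | 0 | 0 | 0 | 0 | 12500 | 6 | 2010 | 172000 |
| 3 | 4 | 526353030 | 20 | 93.0 | 11160 | 7 | 5 | 1968 | 1968 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 2010 | 244000 |
| 4 | 5 | 527105010 | 60 | 74.0 | 13830 | 5 | 5 | 1997 | 1998 | 0.0 | ... | 212 | 34 | 0 | 0 | 0 | 0 | 0 | 3 | 2010 | 189900 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2925 | 2926 | 923275080 | 80 | 37.0 | 7937 | 6 | 6 | 1984 | 1984 | 0.0 | ... | 120 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 2006 | 142500 |
| 2926 | 2927 | 923276100 | 20 | NaN | 8885 | 5 | 5 | 1983 | 1983 | 0.0 | ... | 164 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 2006 | 131000 |
| 2927 | 2928 | 923400125 | 85 | 62.0 | 10441 | 5 | 5 | 1992 | 1992 | 0.0 | ... | 80 | 32 | 0 | 0 | 0 | 0 | 700 | 7 | 2006 | 132000 |
| 2928 | 2929 | 924100070 | 20 | 77.0 | 10010 | 5 | 5 | 1974 | 1975 | 0.0 | ... | 240 | 38 | 0 | 0 | 0 | 0 | 0 | 4 | 2006 | 170000 |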


In [40]:
import plotly.figure_factory as ff

corrs = data1.corr()

figure = ff.create_annotated_heatmap(
    z=corrs.values,
    x=list(corrs.columns),
    y=list(corrs.index),
    colorscale='Earth',
    annotation_text=corrs.round(2).values,
    showscale=True, reversescale=True)

figure.show()

In [41]:
corrs

| | Order | PID | MS SubClass | Lot Frontage | Lot Area | Overall Qual | Overall Cond | Year Built | Year Remod/Add | Mas Vnr Area | ... | Wood Deck SF | Open Porch SF | Enclosed Porch | 3Ssn Porch | Screen Porch | Pool Area | Misc Val | Mo Sold | Yr Sold | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Order | 1.0 | 0.173593 | 0.011797 | -0.007034 | 0.031354 | -0.0485 | -0.011054 | -0.052319 | -0.075566 | -0.030907 | ... | -0.011292 | 0.016355 | 0.027908 | -0.024975 | 0.004307 | 0.052518 | -0.006083 | 0.133365 | -0.975993 | -0.031408 |
| PID | 0.173593 | 1.0 | -0.001281 | -0.096918 | 0.034868 | -0.263147 | 0.104451 | -0.343388 | -0.157111 | -0.229283 | ... | -0.051135 | -0.071311 | 0.162519 | -0.024894 | -0.025735 | -0.002845 | -0.00826 | -0.050455 | 0.009579 | -0.246521 |
| MS SubClass | 0.011797 | -0.001281 | 1.0 | -0.420135 | -0.204613 | 0.039419 | -0.067349 | 0.036579 | 0.043397 | 0.00273 | ... | -0.01731 | -0.014823 | -0.022866 | -0.037956 | -0.050614 | -0.003434 | -0.029254 | 0.00035 | -0.017905 | -0.085092 |
| Lot Frontage | -0.007034 | -0.096918 | -0.420135 | 1.0 | 0.491313 | 0.212042 | -0.074448 | 0.121562 | 0.091712 | 0.222407 | ... | 0.120084 | 0.16304 | 0.012758 | 0.028564 | 0.076666 | 0.173947 | 0.044476 | 0.011085 | -0.007547 | 0.357318 |
| Lot Area | 0.031354 | 0.034868 | -0.204613 | 0.491313 | 1.0 | 0.097188 | -0.034759 | 0.023258 | 0.021682 | 0.12683 | ... | 0.157212 | 0.10376 | 0.021868 | 0.016243 | 0.055044 | 0.093775 | 0.069188 | 0.003859 | -0.023085 | 0.266549 |
| Overall Qual | -0.0485 | -0.263147 | 0.039419 | 0.212042 | 0.097188 | 1.0 | -0.094812 | 0.597027 | 0.569609 | 0.429418 | ... | 0.255663 | 0.298412 | -0.140332 | 0.01824 | 0.041615 | 0.030399 | 0.005179 | 0.031103 | -0.020719 | 0.799262 |
| Overall Cond | -0.011054 | 0.104451 | -0.067349 | -0.074448 | -0.034759 | -0.094812 | 1.0 | -0.368773 | 0.04768 | -0.13534 | ... | 0.020344 | -0.068934 | 0.071459 | 0.043852 | 0.044055 | -0.016787 | 0.034056 | -0.007295 | 0.031207 | -0.101697 |
| Year Built | -0.052319 | -0.343388 | 0.036579 | 0.121562 | 0.023258 | 0.597027 | -0.368773 | 1.0 | 0.612095 | 0.313292 | ... | 0.228964 | 0.198365 | -0.374364 | 0.015803 | -0.041436 | 0.002213 | -0.011011 | 0.014577 | -0.013197 | 0.558426 |
| Year Remod/Add | -0.075566 | -0.157111 | 0.043397 | 0.091712 | 0.021682 | 0.569609 | 0.04768 | 0.612095 | 1.0 | 0.196928 | ... | 0.217857 | 0.241748 | -0.220383 | 0.037412 | -0.046888 | -0.01141 | -0.003132 | 0.018048 | 0.032652 | 0.532974 |
| Mas Vnr Area | -0.030907 | -0.229283 | 0.00273 | 0.222407 | 0.12683 | 0.429418 | -0.13534 | 0.313292 | 0.196928 | 1.0 | ... | 0.165467 | 0.143748 | -0.110787 | 0.013778 | 0.065643 | 0.004617 | 0.044934 | -0.000276 | -0.017715 | 0.508285 |
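
Order and PID are identifiers, so their correlations are not meaningful: the -0.976 between Order and Yr Sold most likely reflects how the rows are ordered in the file. Among the rows shown, Overall Qual has the strongest correlation with SalePrice (0.799). A sketch that ranks numeric attributes by absolute correlation with the target while excluding identifiers; `top_correlations` is a hypothetical helper and the toy frame is invented.

```python
import pandas as pd

def top_correlations(df: pd.DataFrame, target: str = "SalePrice",
                     exclude=("Order", "PID"), n: int = 10) -> pd.Series:
    """Numeric attributes ranked by the absolute value of their Pearson correlation with target."""
    numeric = df.select_dtypes(include="number").drop(columns=list(exclude), errors="ignore")
    corr = numeric.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index).head(n)

toy = pd.DataFrame({
    "Order": [1, 2, 3, 4, 5],
    "Overall Qual": [5, 6, 7, 8, 9],
    "Mo Sold": [6, 1, 6, 1, 6],
    "SalePrice": [100, 120, 140, 160, 180],
})
print(top_correlations(toy))
```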


In [42]:
corrs1 = corrs.iloc[:5, : 5]

In [43]:
figure = ff.create_annotated_heatmap(
    z=corrs1.values,
    x=list(corrs1.columns),
    y=list(corrs1.index),
    colorscale='solar',
    annotation_text=corrs1.round(2).values,
    showscale=True, reversescale=True)

figure.show() 