# **Correcting floor areas broken down by use**

The floor areas contain many inconsistencies, notably around Parking areas, whose definition is unclear. About 60 % of the samples are inconsistent because their control sums do not match.

The purpose of this section is to:
1. identify the individuals that are problematic (12 classes of problems),
2. better understand the nature of the errors,
3. correct as many of them as possible,
4. identify the aberrant cases so they can be removed from the modelling dataset.

# Remaining work

Strategy for the areas:
* start again from the table that lays out the sub-areas in 69 + 1 columns
* for each property that has parking (cf. num_bivar_analysis):
    * Parking not in detailed uses s > 0, s_2 = 0 (146) => add it there
        * Beware of twisted cases such as 655: tot. 117950, ext. 11210, int. 106740, use 1 multifamily: 117950
        * In that case, 11210 must also be subtracted from multifamily
        * In other words, this analysis-and-repair must be coupled with the one on total area vs. sum of details
    * Parking is under accounted s > s_2 > 0 (144): here either s or s_2 must be corrected: same as s vs sum(s-details)
    * Parking is over accounted s_2 > s > 0 (72): same
    * => all of this would be simpler in GSheet, but flattening to 69 + 1 cols is tricky in this env.
    * => On the other hand, what I can definitely do is export it to the GSheet
    * => NB > this check is not done in num_bivar but ??? => add a cross-reference in num_bivar
    * BUT start by finishing my outliers tab in GSheet
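
The "twisted" case 655 can be worked through by hand. The sketch below is only an illustration of the repair described in the list: the helper name `add_missing_parking` and the label `'Multifamily Housing'` are assumptions, and the rule (take the parking area off the largest use) only makes sense when the declared uses already sum to the total area.

```python
def add_missing_parking(uses, a_o):
    """Add a 'Parking' entry of area a_o and take the same area off the largest use.

    Hypothetical helper: `uses` maps a use label to its declared area.
    """
    largest = max(uses, key=uses.get)
    fixed = dict(uses)
    fixed[largest] -= a_o
    fixed['Parking'] = a_o
    return fixed

# Property 655: total 117950, parking 11210, building 106740,
# and a single declared use covering the whole total.
fixed = add_missing_parking({'Multifamily Housing': 117950}, 11210)
print(fixed)
```

After the repair, the non-parking use equals the building area (106740) and the declared uses still sum to the total.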


**NB** > zoom in on each case before steering the treatments too far.

**Refine**: class memberships using statistical criteria: when a case deviates from class 1, be able to quantify the deviation. A minor deviation does not have the same impact as a major one.

**Markdown table** Produce it with a (dead simple) function
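
A minimal sketch of such a function, assuming the classes are numbered 0 to 12 by the sign of ($\Delta a$, $\Delta a_i$, $\Delta a_o$) and that the per-class counts are already known. The names `CLASS_SIGNS` and `class_table_markdown` are hypothetical; the counts are those reported in this notebook.

```python
SIGN_TEXT = {0: '= 0', 1: '> 0', -1: '< 0'}

# Sign of (Δa, Δa_i, Δa_o) for classes 0 to 12.
CLASS_SIGNS = [
    (0, 0, 0), (0, 1, -1), (0, -1, 1),
    (1, 0, 1), (1, 1, 0), (1, 1, 1), (1, 1, -1), (1, -1, 1),
    (-1, 0, -1), (-1, 1, -1), (-1, -1, 0), (-1, -1, 1), (-1, -1, -1),
]

def class_table_markdown(counts):
    """Render one markdown row per class, with its count and rounded percentage."""
    total = sum(counts)
    lines = [
        '|Class|$\\Delta a$|$\\Delta a_i$|$\\Delta a_o$|#|%|',
        '|-|-|-|-|-|-|',
    ]
    for k, (signs, n) in enumerate(zip(CLASS_SIGNS, counts)):
        cells = [str(k)] + [SIGN_TEXT[s] for s in signs]
        cells += [str(n), str(round(100 * n / total))]
        lines.append('|' + '|'.join(cells) + '|')
    return '\n'.join(lines)

counts = [1278, 182, 43, 23, 621, 96, 179, 63, 117, 107, 355, 67, 80]
table_md = class_table_markdown(counts)
print(table_md)
```

Generating the rows from the list index keeps the class numbers consecutive, which avoids numbering slips when the table is maintained by hand.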

# Recap of the problem

Area / its breakdown by use: see `num_bivar.analysis.ipynb`

We validated that the relation `'PropertyGFATotal' = 'PropertyGFAParking' + 'PropertyGFABuilding(s)'` ($a = a_i + a_o$) is satisfied exactly by all individuals.

**Q** - *Is the total area always the sum of the areas by use?*

**NB** - Parking is a special case: we must check that the area given in `'PropertyGFAParking'` is consistent with the area given in whichever of the 3 broken-down uses, if any, carries the label `'Parking'`.

Summary:
* No exclusion
* 72 % of individuals satisfy the relation (tolerated error: 1 per 1000).
* 28 % do not ... <mark>Analysis to be continued by reusing the slopes histogram</mark>
* For the 593 properties with parking, inconsistencies outnumber consistencies... something is wrong.
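
The tolerance check can be sketched as follows. This is an assumption about how "1 per 1000" is applied (a relative tolerance on $a$), not the code used in `num_bivar.analysis.ipynb`; the helper name `within_tolerance` is hypothetical. The three rows are copied from the tables in this notebook (ids 1, 692 and 10).

```python
import pandas as pd

toy = pd.DataFrame(
    {'a': [88434, 275982, 83008], 'a_u': [88434, 275892, 81352]},
    index=pd.Index([1, 692, 10], name='id'),
)

def within_tolerance(a, a_u, rel_tol=1e-3):
    """True where |a - a_u| is at most rel_tol times a."""
    return (a - a_u).abs() <= rel_tol * a

ok = within_tolerance(toy['a'], toy['a_u'])
# Signed relative deviation, useful to grade minor vs. major deviations.
rel_dev = (toy['a'] - toy['a_u']) / toy['a']
print(pd.DataFrame({'ok': ok, 'rel_dev': rel_dev}))
```

Property 692 differs by only 90 units out of 275982, so it passes a relative tolerance even though it fails a strict equality test. This is one reason the share of consistent individuals depends on whether the comparison is exact or tolerant.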

Formalisation and method:

See `multi_index_demo.ipynb` for the definition of all the formal variables.

Here we use this subset:
* $U=\{u_k, k=0, ...\}$ ⚪ `'u_0', ...` ⚪ Set of use labels, ordered by decreasing total area.
* $A_U = \{a_{u_k}|u_k \in U\}$ ⚪ `'a_u_0', ...` ⚪ Share of the area dedicated to each use.
* $a_u$ ⚪ `'a_u'` ⚪ $a_u = \displaystyle\sum_{u_k \in U}{a_{u_k}}$
* $a_o$ ⚪ `'a_o'` ⚪ `'PropertyGFAParking'`
* $a_i$ ⚪ `'a_i'` ⚪ `'PropertyGFABuilding(s)'`
* $a$ ⚪ `'a'` ⚪ `'PropertyGFATotal'` ⚪ $a = a_i + a_o$
* $u_{\text{1st}}, u_{\text{2nd}}, u_{\text{3rd}}$ ⚪ `'u_1st'`, `'u_2nd'`, `'u_3rd'` ⚪ `'LargestPropertyUseType'`, etc.
* $a_{\text{1st}}, a_{\text{2nd}}, a_{\text{3rd}}$ ⚪ `'a_1st'`, `'a_2nd'`, `'a_3rd'` ⚪ `'LargestPropertyUseTypeGFA'`, etc.

Key utilities:
* `use_types_analysis.use_table`:
    - it derives the table $A_U$ from $a_{\text{1st}}, a_{\text{2nd}}, a_{\text{3rd}}$
    - in addition, it produces these analytical vectors:
        - $a_u = a_{\text{1st}} + a_{\text{2nd}} + a_{\text{3rd}}$ ⚪ `'a_u'`
        - $\Delta a = a - a_u$ ⚪ `'a - a_u'`
        - $a_{u_o}$ ⚪ `'a_u_o'` ⚪ The Parking area as recorded in one of $a_{\text{1st}}, a_{\text{2nd}}, a_{\text{3rd}}$
        - $\Delta a_o = a_o - a_{u_o}$ ⚪ `'a_o - a_u_o'`
        - $a_{u_i} = a_u - a_{u_o}$ ⚪ `'a_u_i'` ⚪ The sum of the broken-down areas other than the Parking area, if any
        - $\Delta a_i = a_i - a_{u_i}$ ⚪ `'a_i - a_u_i'`
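
To make these definitions concrete, here is a minimal sketch that recomputes the six derived columns on two rows copied from this notebook (ids 423 and 15). It is a hypothetical stand-in: `derive_area_checks` is not the actual `use_table` implementation, and the short column names `u_1st`, `a_1st`, etc. are assumed to follow the notation above.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        'a': [794592, 153163],
        'a_o': [0, 19279],
        'a_i': [794592, 133884],
        'u_1st': ['Office', 'Hotel'],
        'a_1st': [748011.0, 133884.0],
        'u_2nd': ['Parking', None],
        'a_2nd': [36606.0, np.nan],
        'u_3rd': ['Other', None],
        'a_3rd': [9975.0, np.nan],
    },
    index=pd.Index([423, 15], name='id'),
)

def derive_area_checks(df):
    """Return a_u, a_u_o, a_u_i and the three differences for each property."""
    is_parking = df[['u_1st', 'u_2nd', 'u_3rd']].eq('Parking').to_numpy()
    areas = df[['a_1st', 'a_2nd', 'a_3rd']].fillna(0).to_numpy()
    out = pd.DataFrame(index=df.index)
    out['a_u'] = areas.sum(axis=1)
    # Parking area as declared in the breakdown (0 when no use is 'Parking').
    out['a_u_o'] = np.where(is_parking, areas, 0.0).sum(axis=1)
    out['a_u_i'] = out['a_u'] - out['a_u_o']
    out['a - a_u'] = df['a'] - out['a_u']
    out['a_o - a_u_o'] = df['a_o'] - out['a_u_o']
    out['a_i - a_u_i'] = df['a_i'] - out['a_u_i']
    return out

checks = derive_area_checks(toy)
print(checks)
```

Missing second and third uses are treated as zero areas, which is an assumption made for this sketch.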

The anomaly detection method and the measures taken to handle the anomalies:

# Loading the data

In [1]:
import numpy as np

def join_use_allocation(subset, data):
    """Join the declared use types and their areas onto `subset`, aligned on the index."""
    # Copy the slice so the dtype conversion below does not write into `data`.
    main_uses = data.loc[
        subset.index, 'ListOfAllPropertyUseTypes':'ThirdLargestPropertyUseTypeGFA'
    ].copy()
    gfa_cols = ['LargestPropertyUseTypeGFA', 'SecondLargestPropertyUseTypeGFA',
                'ThirdLargestPropertyUseTypeGFA']
    main_uses[gfa_cols] = main_uses[gfa_cols].astype(np.float64)
    return subset.join(main_uses, how='left') # .fillna('_')

In [2]:
from pepper_commons import get_data
from seattle_commons import clean_dataset
import pandas as pd

data = get_data()
_data, not_compliant, outliers = clean_dataset(data)

# base data
d = _data[['BuildingType', 'PrimaryPropertyType', 'PropertyGFATotal',
    'PropertyGFAParking', 'PropertyGFABuilding(s)']].copy()
d.columns = ['btype', 'ptype', 'a', 'a_o', 'a_i']

# data derived from the breakdown of the areas
from use_types_analysis import use_table_2
use_table, a_u, a_diff, a_u_o, a_o_diff, a_u_i, a_i_diff = use_table_2(_data)

table = pd.concat([d, a_u, a_u_o, a_u_i, a_diff, a_o_diff, a_i_diff, use_table], axis=1)
_table = table[table.columns[:11]]
display(_table)

# Sdiff (Unknown verif calc) = S - SUM(Su)
# Spdiffx: Sext - Sup
# Better design: use_table without Unknown => we do it here
# rename the columns to u1, u2, etc.? to be decided..

# algo (once the table is assembled)
# 1. set aside everything that is OK, i.e. a == a_u (1503 cases)
# 2. split what is KO into classes by nature of the problem (982 diff > 0, 726 diff < 0)
# 3. graphical representation of the problem
# 4. fix the problem and show graphically that it is fixed

[1m[32m✔ _data[0m[0m loaded
[1m[32m✔ struct[0m[0m loaded


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i
1,NonResidential,Hotel,88434,0,88434,88434,0,88434,0,0,0
2,NonResidential,Hotel,103566,15064,88502,103566,15064,88502,0,0,0
3,NonResidential,Hotel,956110,196718,759392,756493,0,756493,199617,196718,2899
5,NonResidential,Hotel,61320,0,61320,61320,0,61320,0,0,0
8,NonResidential,Hotel,175580,62000,113580,191454,68009,123445,-15874,-6009,-9865
...,...,...,...,...,...,...,...,...,...,...,...
50221,Nonresidential COS,Other,18261,0,18261,18261,0,18261,0,0,0
50223,Nonresidential COS,Other,16000,0,16000,16000,0,16000,0,0,0
50224,Nonresidential COS,Other,13157,0,13157,13157,0,13157,0,0,0
50225,Nonresidential COS,Mixed Use Property,14101,0,14101,13586,0,13586,515,0,515


# The classes of cases

The special cases are numerous, which complicates this operation.

Let us start by isolating the classes of cases using formal criteria.

We start from the property that follows from the definitions and from the verified identity $a = a_i + a_o$: $\Delta a = \Delta a_i + \Delta a_o$ (A)
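
For reference, (A) follows in one line from the definitions of the three differences, the verified identity $a = a_i + a_o$ and the definition $a_{u_i} = a_u - a_{u_o}$:

$$\Delta a = a - a_u = (a_i + a_o) - (a_{u_i} + a_{u_o}) = (a_i - a_{u_i}) + (a_o - a_{u_o}) = \Delta a_i + \Delta a_o$$

For example, property 8 has $\Delta a_i = -9865$ and $\Delta a_o = -6009$, whose sum is $\Delta a = -15874$, as displayed in the table.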

Depending on whether $\Delta x = 0, \gt 0, \lt 0$, with $x=a, a_i, a_o$, one can in principle isolate $3^3 = 27$ cases.

But relation (A) implies that only 13 cases are possible, since the other 14 do not satisfy it.
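
The count of 13 can be checked by brute force. The sketch below enumerates small integer values of $\Delta a_i$ and $\Delta a_o$, computes $\Delta a$ from relation (A), and collects the sign patterns that occur; the helper `sign` is defined here for the illustration.

```python
from itertools import product

def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

# Values in -2..2 are enough to realise every achievable sign pattern:
# with opposite signs, the sum can be negative, zero or positive.
values = range(-2, 3)
feasible = {
    (sign(d_i + d_o), sign(d_i), sign(d_o))
    for d_i, d_o in product(values, repeat=2)
}
all_patterns = set(product((-1, 0, 1), repeat=3))
impossible = all_patterns - feasible
print(len(feasible), len(impossible))  # 13 14
```

The only pairs that yield more than one pattern are those where $\Delta a_i$ and $\Delta a_o$ have opposite signs (3 patterns each), which gives $7 + 2 \times 3 = 13$.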

The following table defines the classes corresponding to these cases:

|Class|$\Delta a$|$\Delta a_i$|$\Delta a_o$|#|%|Comment/correction|
|-|-|-|-|-|-|-|
|0|= 0|= 0|= 0|1278|**40**|✔ Everything is OK|
|1|= 0|> 0|< 0|182|6|Transfer $a_{u_i} \rightarrow a_{u_o}$|
|2|= 0|< 0|> 0|43|1|Transfer $a_{u_o} \rightarrow a_{u_i}$|
|3|> 0|= 0|> 0|23|1|Record $a_{u_o}$|
|4|> 0|> 0|= 0|621|**19**|Record $a_{u_i}$|
|5|> 0|> 0|> 0|96|3|Record $a_{u_i}$ and $a_{u_o}$|
|6|> 0|> 0|< 0|179|6||
|7|> 0|< 0|> 0|63|2||
|8|< 0|= 0|< 0|117|4||
|9|< 0|> 0|< 0|107|3||
|10|< 0|< 0|= 0|355|**11**||
|11|< 0|< 0|> 0|67|2||
|12|< 0|< 0|< 0|80|2||

In [21]:
a_eq_a_u = _table['a - a_u'] == 0
a_lt_a_u = _table['a - a_u'] < 0
a_gt_a_u = _table['a - a_u'] > 0
a_neq_a_u = ~a_eq_a_u

a_i_eq_a_u_i = _table['a_i - a_u_i'] == 0
a_i_lt_a_u_i = _table['a_i - a_u_i'] < 0
a_i_gt_a_u_i = _table['a_i - a_u_i'] > 0
a_i_neq_a_u_i = ~a_i_eq_a_u_i

a_o_eq_a_u_o = _table['a_o - a_u_o'] == 0
a_o_lt_a_u_o = _table['a_o - a_u_o'] < 0
a_o_gt_a_u_o = _table['a_o - a_u_o'] > 0
a_o_neq_a_u_o = ~a_o_eq_a_u_o

a_class = [
    a_eq_a_u & a_i_eq_a_u_i,
    a_eq_a_u & a_i_gt_a_u_i,
    a_eq_a_u & a_i_lt_a_u_i,

    a_gt_a_u & a_i_eq_a_u_i & a_o_gt_a_u_o,
    a_gt_a_u & a_i_gt_a_u_i & a_o_eq_a_u_o,
    a_gt_a_u & a_i_gt_a_u_i & a_o_gt_a_u_o,
    a_gt_a_u & a_i_gt_a_u_i & a_o_lt_a_u_o,
    a_gt_a_u & a_i_lt_a_u_i & a_o_gt_a_u_o,

    a_lt_a_u & a_i_eq_a_u_i & a_o_lt_a_u_o,
    a_lt_a_u & a_i_gt_a_u_i & a_o_lt_a_u_o,
    a_lt_a_u & a_i_lt_a_u_i & a_o_eq_a_u_o,
    a_lt_a_u & a_i_lt_a_u_i & a_o_gt_a_u_o,
    a_lt_a_u & a_i_lt_a_u_i & a_o_lt_a_u_o
]
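
As a complement to the list of boolean masks, a single class label per property can be derived from the signs of the three differences. This is a sketch under the assumption that classes are numbered 0 to 12 in the order of the `a_class` list; `SIGNS_TO_CLASS` and `classify` are hypothetical names, and the toy rows are copied from the tables in this notebook.

```python
import numpy as np
import pandas as pd

# (sign Δa, sign Δa_i, sign Δa_o) -> class number, in the order of a_class.
SIGNS_TO_CLASS = {
    (0, 0, 0): 0, (0, 1, -1): 1, (0, -1, 1): 2,
    (1, 0, 1): 3, (1, 1, 0): 4, (1, 1, 1): 5, (1, 1, -1): 6, (1, -1, 1): 7,
    (-1, 0, -1): 8, (-1, 1, -1): 9, (-1, -1, 0): 10, (-1, -1, 1): 11,
    (-1, -1, -1): 12,
}

def classify(deltas):
    """Map each row of differences to its class number."""
    signs = np.sign(deltas[['a - a_u', 'a_i - a_u_i', 'a_o - a_u_o']])
    keys = map(tuple, signs.to_numpy().tolist())
    return pd.Series([SIGNS_TO_CLASS[k] for k in keys],
                     index=deltas.index, name='class')

toy = pd.DataFrame(
    {
        'a - a_u': [0, 0, 19279, 6476, -3464, -15874],
        'a_i - a_u_i': [0, 36606, 0, 20206, -64625, -9865],
        'a_o - a_u_o': [0, -36606, 19279, -13730, 61161, -6009],
    },
    index=pd.Index([1, 423, 15, 65, 16, 8], name='id'),
)
classes = classify(toy)
print(classes)
```

On the full table, `classify(_table).value_counts().sort_index()` would give the per-class counts in one call. A sign triple that violates $\Delta a = \Delta a_i + \Delta a_o$ raises a `KeyError`, which doubles as a sanity check.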

# $\Delta a = 0 \Leftrightarrow a = a_u$<br/>Classes 0 to 2<br/>1503 cases, 47 % of cases

In [27]:
nil_a_diff = _table[a_eq_a_u]
display(nil_a_diff)

id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i
1,NonResidential,Hotel,88434,0,88434,88434,0,88434,0,0,0
2,NonResidential,Hotel,103566,15064,88502,103566,15064,88502,0,0,0
5,NonResidential,Hotel,61320,0,61320,61320,0,61320,0,0,0
11,NonResidential,Other,102761,0,102761,102761,0,102761,0,0,0
12,NonResidential,Hotel,163984,0,163984,163984,0,163984,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...
50208,Nonresidential COS,Other,12769,0,12769,12769,0,12769,0,0,0
50212,Nonresidential COS,Other,23445,0,23445,23445,0,23445,0,0,0
50221,Nonresidential COS,Other,18261,0,18261,18261,0,18261,0,0,0
50223,Nonresidential COS,Other,16000,0,16000,16000,0,16000,0,0,0


## Valid cases (class 0)<br/>1278 cases, 40 % of cases

These are the cases where:
* $\Delta a = 0$
* $\Delta a_i = 0$
* $\Delta a_o = 0$

In [23]:
class_0_members = _table[a_class[0]]
display(class_0_members)

id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i
1,NonResidential,Hotel,88434,0,88434,88434,0,88434,0,0,0
2,NonResidential,Hotel,103566,15064,88502,103566,15064,88502,0,0,0
5,NonResidential,Hotel,61320,0,61320,61320,0,61320,0,0,0
11,NonResidential,Other,102761,0,102761,102761,0,102761,0,0,0
12,NonResidential,Hotel,163984,0,163984,163984,0,163984,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...
50208,Nonresidential COS,Other,12769,0,12769,12769,0,12769,0,0,0
50212,Nonresidential COS,Other,23445,0,23445,23445,0,23445,0,0,0
50221,Nonresidential COS,Other,18261,0,18261,18261,0,18261,0,0,0
50223,Nonresidential COS,Other,16000,0,16000,16000,0,16000,0,0,0


## Class 1 inconsistencies<br/>182 cases, 6 % of cases

These are the cases where:
* $\Delta a = 0$
* $\Delta a_i \gt 0$
* $\Delta a_o \lt 0$

This means that although the total area and the sum of its areas by use coincide, the amount of Parking area recorded in `'PropertyGFAParking'` differs from the one that appears in one of the 3 `'LargestPropertyUseTypeGFA'`, etc.

The treatment consists in rectifying the breakdown and possibly `'ListOfAllPropertyUseTypes'`.

In all cases, we take $a, a_i, a_o$ as the reference and correct the $a_{u_k}$ to make them consistent.

Several cases must be distinguished depending on whether:
* the Parking area is part of the breakdown or not ($a_{u_o} = 0$).
* the 3 breakdown slots are already taken or not ($n_u = 3$).
* there is no area by use other than Parking ($a_{u_i} = 0$).

The treatments:
* if the Parking area is part of the breakdown: add the quantity $\Delta a_o$ to this area, and:
    - if $n_u > 1$ (equivalent to $a_{u_i} \ne 0$), remove the same quantity in equal fractions from the other uses (1 or 2),
    - if $n_u = 1$ (equivalent to $a_{u_i} = 0$), add the use ? `'Others'` and remove: problem, this leads to a negative area.
* if the Parking area is not part of the breakdown ($a_{u_o} = 0$):
    - .. complicated .. to be revisited in a future sprint, not in this reactivation one
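
The first treatment (Parking present in the breakdown, $n_u > 1$) can be sketched on a single property. This is only an illustration of the rule stated above: `rebalance_parking` is a hypothetical helper, it does not guard against a use area becoming negative, and it has not been applied to the dataset. The example is property 423 from the class 1 table.

```python
def rebalance_parking(uses, a_o):
    """Align the declared Parking area with a_o and spread the difference
    in equal shares over the other declared uses.

    `uses` maps a use label to its declared area and must contain 'Parking'.
    """
    delta_a_o = a_o - uses['Parking']
    others = [label for label in uses if label != 'Parking']
    if not others:
        raise ValueError('no other use to rebalance against (n_u = 1)')
    share = delta_a_o / len(others)
    # Remove the same quantity from the other uses, in equal fractions.
    fixed = {label: uses[label] - share for label in others}
    # Add delta_a_o to the Parking area, which brings it to a_o.
    fixed['Parking'] = uses['Parking'] + delta_a_o
    return fixed

# Property 423: a_o = 0 while the breakdown declares 36606 of Parking.
uses_423 = {'Office': 748011.0, 'Parking': 36606.0, 'Other': 9975.0}
fixed = rebalance_parking(uses_423, a_o=0)
print(fixed)
```

The sum of the declared uses is unchanged (794592), so $\Delta a$ stays at 0 while $\Delta a_o$ and $\Delta a_i$ both become 0. Splitting in equal shares rather than proportionally to each use's area is the choice stated in the list; a proportional split would be an alternative to evaluate.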

..

In [24]:
class_1_members = _table[a_class[1]]
display(join_use_allocation(class_1_members, data))

id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
423,NonResidential,Large Office,794592,0,794592,794592,36606,757986,0,-36606,36606,"Office, Other, Parking",Office,748011.0,Parking,36606.0,Other,9975.0
691,Multifamily HR (10+),High-Rise Multifamily,99470,0,99470,99470,11000,88470,0,-11000,11000,"Multifamily Housing, Parking, Retail Store",Multifamily Housing,79470.0,Parking,11000.0,Retail Store,9000.0
738,NonResidential,Other,143439,0,143439,143439,23869,119570,0,-23869,23869,"Other, Parking",Other,119570.0,Parking,23869.0,,
19453,Multifamily LR (1-4),Low-Rise Multifamily,76285,0,76285,76285,15798,60487,0,-15798,15798,"Multifamily Housing, Parking",Multifamily Housing,60487.0,Parking,15798.0,,
19454,Multifamily LR (1-4),Low-Rise Multifamily,46873,0,46873,46873,16959,29914,0,-16959,16959,"Multifamily Housing, Parking",Multifamily Housing,29914.0,Parking,16959.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49870,Multifamily MR (5-9),Mid-Rise Multifamily,55113,0,55113,55113,13049,42064,0,-13049,13049,"Multifamily Housing, Parking",Multifamily Housing,42064.0,Parking,13049.0,,
49874,Multifamily LR (1-4),Low-Rise Multifamily,93179,0,93179,93179,29516,63663,0,-29516,29516,"Multifamily Housing, Parking",Multifamily Housing,63663.0,Parking,29516.0,,
49994,Multifamily HR (10+),High-Rise Multifamily,269065,0,269065,269065,75138,193927,0,-75138,75138,"Multifamily Housing, Parking",Multifamily Housing,193927.0,Parking,75138.0,,
50011,Multifamily MR (5-9),Mid-Rise Multifamily,85647,0,85647,85647,13119,72528,0,-13119,13119,"Multifamily Housing, Parking",Multifamily Housing,72528.0,Parking,13119.0,,


## Class 2 inconsistencies<br/>43 cases, 1 % of cases

These are the cases where:
* $\Delta a = 0$
* $\Delta a_i \lt 0$
* $\Delta a_o \gt 0$

In [26]:
class_2_members = _table[a_class[2]]
class_2_members_details = join_use_allocation(class_2_members, data)
print(class_2_members_details.shape[0])
display(class_2_members_details)

43


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
70,NonResidential,Hotel,155602,36744,118858,155602,0,155602,0,36744,-36744,Hotel,Hotel,155602.0,,,,
114,NonResidential,Large Office,920598,303707,616891,920598,185014,735584,0,118693,-118693,"Office, Other, Parking",Office,729584.0,Parking,185014.0,Other,6000.0
245,NonResidential,Other,1585960,327680,1258280,1585960,0,1585960,0,327680,-327680,"Other - Entertainment/Public Assembly, Parking",Other - Entertainment/Public Assembly,1585960.0,Parking,0.0,,
322,NonResidential,Large Office,100734,26731,74003,100734,0,100734,0,26731,-26731,Office,Office,100734.0,,,,
375,NonResidential,Large Office,396626,124216,272410,396626,122624,274002,0,1592,-1592,"Office, Parking",Office,274002.0,Parking,122624.0,,
410,NonResidential,Hotel,332067,59280,272787,332067,0,332067,0,59280,-59280,Hotel,Hotel,332067.0,,,,
434,NonResidential,Other,206950,97400,109550,206950,0,206950,0,97400,-97400,Other,Other,206950.0,,,,
441,NonResidential,Hotel,109017,29409,79608,109017,0,109017,0,29409,-29409,Hotel,Hotel,109017.0,,,,
521,NonResidential,Small- and Mid-Sized Office,90000,36528,53472,90000,0,90000,0,36528,-36528,Office,Office,90000.0,,,,
549,NonResidential,Small- and Mid-Sized Office,78672,28427,50245,78672,0,78672,0,28427,-28427,Office,Office,78672.0,,,,


# $\Delta a \gt 0 \Leftrightarrow a \gt a_u$<br/>Classes 3 to 7<br/>982 cases, 31 % of cases

In [28]:
pos_a_diff = _table[a_gt_a_u]
display(pos_a_diff)

id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i
3,NonResidential,Hotel,956110,196718,759392,756493,0,756493,199617,196718,2899
9,Nonresidential COS,Other,97288,37198,60090,88830,0,88830,8458,37198,-28740
10,NonResidential,Hotel,83008,0,83008,81352,0,81352,1656,0,1656
13,Multifamily MR (5-9),Mid-Rise Multifamily,63712,1496,62216,56132,0,56132,7580,1496,6084
15,NonResidential,Hotel,153163,19279,133884,133884,0,133884,19279,19279,0
...,...,...,...,...,...,...,...,...,...,...,...
50196,Nonresidential COS,Mixed Use Property,20616,0,20616,19841,0,19841,775,0,775
50207,Nonresidential COS,Other,16795,0,16795,16229,0,16229,566,0,566
50219,Nonresidential COS,Mixed Use Property,20050,0,20050,19613,0,19613,437,0,437
50225,Nonresidential COS,Mixed Use Property,14101,0,14101,13586,0,13586,515,0,515


## Class 3 inconsistencies<br/>23 cases, 1 % of cases

These are the cases where:
* $\Delta a \gt 0$
* $\Delta a_i = 0$
* $\Delta a_o \gt 0$

In [32]:
class_3_members = _table[a_class[3]]
class_3_members_details = join_use_allocation(class_3_members, data)
print(class_3_members_details.shape[0])
display(class_3_members_details)

23


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
15,NonResidential,Hotel,153163,19279,133884,133884,0,133884,19279,19279,0,Hotel,Hotel,133884.0,,,,
476,NonResidential,Large Office,226592,95000,131592,214770,83178,131592,11822,11822,0,"Office, Parking",Office,131592.0,Parking,83178.0,,
495,NonResidential,Medical Office,191276,95281,95995,95995,0,95995,95281,95281,0,Medical Office,Medical Office,95995.0,,,,
500,NonResidential,Supermarket / Grocery Store,308965,187878,121087,121087,0,121087,187878,187878,0,"Parking, Supermarket/Grocery Store",Supermarket/Grocery Store,121087.0,Parking,0.0,Parking,0.0
692,NonResidential,Other,275982,197130,78852,275892,197040,78852,90,90,0,"Office, Parking",Parking,197040.0,Office,78852.0,,
1281,NonResidential,Large Office,113944,48510,65434,65434,0,65434,48510,48510,0,Office,Office,65434.0,,,,
19498,Multifamily LR (1-4),Mixed Use Property,37849,14997,22852,28352,5500,22852,9497,9497,0,"Multifamily Housing, Office, Parking",Multifamily Housing,11980.0,Office,10872.0,Parking,5500.0
19679,Multifamily HR (10+),High-Rise Multifamily,91600,12365,79235,79235,0,79235,12365,12365,0,"Multifamily Housing, Office",Multifamily Housing,65706.0,Office,13529.0,,
19925,Multifamily HR (10+),High-Rise Multifamily,393841,67222,326619,326619,0,326619,67222,67222,0,"Multifamily Housing, Parking, Retail Store",Multifamily Housing,302922.0,Retail Store,23697.0,Parking,0.0
20730,Multifamily MR (5-9),Mid-Rise Multifamily,91703,7565,84138,88638,4500,84138,3065,3065,0,"Multifamily Housing, Parking",Multifamily Housing,84138.0,Parking,4500.0,,


## Class 4 inconsistencies<br/>621 cases, 19 % of cases

These are the cases where:
* $\Delta a \gt 0$
* $\Delta a_i \gt 0$
* $\Delta a_o = 0$

In [33]:
class_4_members = _table[a_class[4]]
class_4_members_details = join_use_allocation(class_4_members, data)
print(class_4_members_details.shape[0])
display(class_4_members_details)

621


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
10,NonResidential,Hotel,83008,0,83008,81352,0,81352,1656,0,1656,Hotel,Hotel,81352.0,,,,
24,NonResidential,Mixed Use Property,57452,0,57452,41688,0,41688,15764,0,15764,"Office, Other, Other - Lodging/Residential, Re...",Social/Meeting Hall,16442.0,Restaurant,15505.0,Office,9741.0
26,NonResidential,Other,540360,0,540360,537150,0,537150,3210,0,3210,Courthouse,Courthouse,537150.0,,,,
30,NonResidential,University,126593,0,126593,125000,0,125000,1593,0,1593,College/University,College/University,125000.0,,,,
38,NonResidential,Small- and Mid-Sized Office,87262,0,87262,63403,0,63403,23859,0,23859,"K-12 School, Multifamily Housing, Office, Othe...",Office,40943.0,K-12 School,18153.0,Other - Entertainment/Public Assembly,4307.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50196,Nonresidential COS,Mixed Use Property,20616,0,20616,19841,0,19841,775,0,775,"Fitness Center/Health Club/Gym, Office, Other ...",Other - Recreation,9900.0,Fitness Center/Health Club/Gym,8577.0,Pre-school/Daycare,1364.0
50207,Nonresidential COS,Other,16795,0,16795,16229,0,16229,566,0,566,"Fitness Center/Health Club/Gym, Food Service, ...",Other - Recreation,8680.0,Fitness Center/Health Club/Gym,7014.0,Pre-school/Daycare,535.0
50219,Nonresidential COS,Mixed Use Property,20050,0,20050,19613,0,19613,437,0,437,"Fitness Center/Health Club/Gym, Office, Other ...",Other - Recreation,8108.0,Fitness Center/Health Club/Gym,7726.0,Office,3779.0
50225,Nonresidential COS,Mixed Use Property,14101,0,14101,13586,0,13586,515,0,515,"Fitness Center/Health Club/Gym, Food Service, ...",Other - Recreation,6601.0,Fitness Center/Health Club/Gym,6501.0,Pre-school/Daycare,484.0


## Class 5 inconsistencies<br/>96 cases, 3 % of cases

These are the cases where:
* $\Delta a \gt 0$
* $\Delta a_i \gt 0$
* $\Delta a_o \gt 0$

In [34]:
class_5_members = _table[a_class[5]]
class_5_members_details = join_use_allocation(class_5_members, data)
print(class_5_members_details.shape[0])
display(class_5_members_details)

96


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
3,NonResidential,Hotel,956110,196718,759392,756493,0,756493,199617,196718,2899,Hotel,Hotel,756493.0,,,,
13,Multifamily MR (5-9),Mid-Rise Multifamily,63712,1496,62216,56132,0,56132,7580,1496,6084,Multifamily Housing,Multifamily Housing,56132.0,,,,
33,NonResidential,Hotel,171866,38281,133585,128909,0,128909,42957,38281,4676,Hotel,Hotel,128909.0,,,,
35,NonResidential,Hotel,68410,16200,52210,47994,0,47994,20416,16200,4216,Hotel,Hotel,47994.0,,,,
68,NonResidential,Hotel,150453,34735,115718,110547,0,110547,39906,34735,5171,Hotel,Hotel,107547.0,Restaurant,3000.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50083,Multifamily MR (5-9),Mid-Rise Multifamily,175844,52045,123799,123554,24000,99554,52290,28045,24245,"Multifamily Housing, Parking",Multifamily Housing,99554.0,Parking,24000.0,,
50075,Multifamily MR (5-9),Mid-Rise Multifamily,260051,90503,169548,210125,64341,145784,49926,26162,23764,"Multifamily Housing, Parking",Multifamily Housing,145784.0,Parking,64341.0,,
50078,Multifamily HR (10+),High-Rise Multifamily,149976,12223,137753,138004,5986,132018,11972,6237,5735,"College/University, Multifamily Housing, Offic...",Multifamily Housing,115768.0,Office,12503.0,College/University,9733.0
50082,Multifamily LR (1-4),Low-Rise Multifamily,36685,8254,28431,0,0,0,36685,8254,28431,,,,,,,


## Class 6 inconsistencies<br/>179 cases, 6 % of cases

These are the cases where:
* $\Delta a \gt 0$
* $\Delta a_i \gt 0$
* $\Delta a_o \lt 0$

In [35]:
class_6_members = _table[a_class[6]]
class_6_members_details = join_use_allocation(class_6_members, data)
print(class_6_members_details.shape[0])
display(class_6_members_details)

179


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
65,NonResidential,Hotel,71150,0,71150,64674,13730,50944,6476,-13730,20206,"Hotel, Parking, Restaurant, Retail Store",Hotel,41340.0,Parking,13730.0,Retail Store,9604.0
96,NonResidential,Mixed Use Property,99780,9341,90439,78062,20868,57194,21718,-11527,33245,"Data Center, Medical Office, Non-Refrigerated ...",Medical Office,40174.0,Parking,20868.0,Office,17020.0
113,NonResidential,Small- and Mid-Sized Office,66240,2352,63888,55632,5304,50328,10608,-2952,13560,"Data Center, Distribution Center, Office, Park...",Office,45900.0,Data Center,5181.0,Distribution Center,4551.0
168,NonResidential,Hotel,162222,14200,148022,146394,32692,113702,15828,-18492,34320,"Hotel, Parking",Hotel,113702.0,Parking,32692.0,,
200,NonResidential,Other,55442,0,55442,52665,15385,37280,2777,-15385,18162,"Financial Office, Medical Office, Parking",Financial Office,28636.0,Parking,15385.0,Medical Office,8644.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50027,Multifamily MR (5-9),Mid-Rise Multifamily,62049,0,62049,47290,16810,30480,14759,-16810,31569,"Multifamily Housing, Parking",Multifamily Housing,30480.0,Parking,16810.0,,
50030,Multifamily MR (5-9),Mid-Rise Multifamily,283479,0,283479,252168,86340,165828,31311,-86340,117651,"Multifamily Housing, Parking",Multifamily Housing,165828.0,Parking,86340.0,,
50031,NonResidential,Large Office,513816,0,513816,511785,151658,360127,2031,-151658,153689,"Office, Parking, Restaurant",Office,359040.0,Parking,151658.0,Restaurant,1087.0
50058,Multifamily LR (1-4),Low-Rise Multifamily,48230,0,48230,42600,16425,26175,5630,-16425,22055,"Multifamily Housing, Parking",Multifamily Housing,26175.0,Parking,16425.0,,


## Class 7 inconsistencies<br/>63 cases, 2 % of cases

These are the cases where:
* $\Delta a \gt 0$
* $\Delta a_i \lt 0$
* $\Delta a_o \gt 0$

In [36]:
class_7_members = _table[a_class[7]]
class_7_members_details = join_use_allocation(class_7_members, data)
print(class_7_members_details.shape[0])
display(class_7_members_details)

63


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
9,Nonresidential COS,Other,97288,37198,60090,88830,0,88830,8458,37198,-28740,Police Station,Police Station,88830.0,,,,
115,NonResidential,Small- and Mid-Sized Office,76213,25930,50283,72406,21454,50952,3807,4476,-669,"Office, Other, Parking",Office,48546.0,Parking,21454.0,Other,2406.0
183,NonResidential,Residence Hall,139600,37500,102100,135520,0,135520,4080,37500,-33420,Residence Hall/Dormitory,Residence Hall/Dormitory,135520.0,,,,
338,NonResidential,Other,299070,68432,230638,250000,0,250000,49070,68432,-19362,Other,Other,250000.0,,,,
211,Campus,University,694072,111625,582447,667335,0,667335,26737,111625,-84888,College/University,College/University,667335.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27252,Multifamily MR (5-9),Mid-Rise Multifamily,121773,25568,96205,121000,0,121000,773,25568,-24795,Multifamily Housing,Multifamily Housing,121000.0,,,,
27378,NonResidential,Supermarket / Grocery Store,76585,38585,38000,41447,0,41447,35138,38585,-3447,"Parking, Supermarket/Grocery Store",Supermarket/Grocery Store,41447.0,Parking,0.0,,
49867,NonResidential,Large Office,333714,146687,187027,324765,133432,191333,8949,13255,-4306,"Office, Parking",Office,191333.0,Parking,133432.0,,
49945,NonResidential,Senior Care Community,197395,156000,41395,43036,0,43036,154359,156000,-1641,"Office, Other - Public Services, Personal Serv...",Senior Care Community,38800.0,Other - Public Services,3650.0,Office,586.0


# $\Delta a \lt 0 \Leftrightarrow a \lt a_u$<br/>Classes 8 to 12<br/>726 cases, 23 % of cases

In [37]:
neg_a_diff = _table[a_lt_a_u]
display(neg_a_diff)

id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i
8,NonResidential,Hotel,175580,62000,113580,191454,68009,123445,-15874,-6009,-9865
16,NonResidential,Hotel,333176,61161,272015,336640,0,336640,-3464,61161,-64625
18,NonResidential,Hotel,315952,57600,258352,353111,57600,295511,-37159,0,-37159
19,NonResidential,Hotel,92190,25200,66990,92590,25200,67390,-400,0,-400
21,Nonresidential COS,Other,412000,57000,355000,414987,49000,365987,-2987,8000,-10987
...,...,...,...,...,...,...,...,...,...,...,...
50089,Multifamily LR (1-4),Low-Rise Multifamily,25442,604,24838,34164,5582,28582,-8722,-4978,-3744
50090,Multifamily MR (5-9),Mid-Rise Multifamily,63825,4850,58975,71241,0,71241,-7416,4850,-12266
50093,Multifamily MR (5-9),Mid-Rise Multifamily,86045,8908,77137,87854,8126,79728,-1809,782,-2591
50095,Multifamily MR (5-9),Mid-Rise Multifamily,208136,58818,149318,212938,44717,168221,-4802,14101,-18903
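
On every row shown in this section, $\Delta a = \Delta a_i + \Delta a_o$. Assuming this identity holds for the whole table, only five sign combinations of $(\Delta a_i, \Delta a_o)$ are compatible with $\Delta a \lt 0$, and they are exactly classes 8 to 12. The sketch below makes that enumeration explicit; `sign`, `NEGATIVE_GAP_CLASSES` and `classify_negative_gap` are hypothetical names, not functions of the notebook.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

# (sign of Δa_i, sign of Δa_o) -> class number, as defined in the class sections
NEGATIVE_GAP_CLASSES = {
    (0, -1): 8,
    (1, -1): 9,
    (-1, 0): 10,
    (-1, 1): 11,
    (-1, -1): 12,
}

def classify_negative_gap(d_a, d_a_i, d_a_o):
    """Hypothetical helper: class (8 to 12) of a row with Δa < 0, else None."""
    if d_a >= 0:
        return None
    return NEGATIVE_GAP_CLASSES.get((sign(d_a_i), sign(d_a_o)))

# Rows copied from the tables of this section: id -> (Δa, Δa_o, Δa_i)
rows = {
    144: (-19051, -19051, 0),
    32: (-500, -11699, 11199),
    18: (-37159, 0, -37159),
    16: (-3464, 61161, -64625),
    8: (-15874, -6009, -9865),
}
classes = {
    i: classify_negative_gap(d_a, d_a_i, d_a_o)
    for i, (d_a, d_a_o, d_a_i) in rows.items()
}
print(classes)
```

The remaining combinations (both differences null, or none of them negative) cannot sum to a negative $\Delta a$, which is why the helper returns `None` for them.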


## Class 8 inconsistencies<br/>117 cases, 4 % of cases

These are the cases where:
* $\Delta a \lt 0$
* $\Delta a_i = 0$
* $\Delta a_o \lt 0$

In [38]:
class_8_members = _table[a_class[8]]
class_8_members_details = join_use_allocation(class_8_members, data)
print(class_8_members_details.shape[0])
display(class_8_members_details)

117


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
144,NonResidential,Hotel,190980,0,190980,210031,19051,190980,-19051,-19051,0,"Hotel, Parking",Hotel,190980.0,Parking,19051.0,,
387,NonResidential,Large Office,298426,0,298426,496176,197750,298426,-197750,-197750,0,"Office, Other - Education, Parking",Office,286538.0,Parking,197750.0,Other - Education,11888.0
477,Campus,Other,535947,0,535947,639930,103983,535947,-103983,-103983,0,"Other, Parking",Other,535947.0,Parking,103983.0,,
19511,Multifamily LR (1-4),Low-Rise Multifamily,21888,0,21888,27388,5500,21888,-5500,-5500,0,"Multifamily Housing, Parking",Multifamily Housing,21888.0,Parking,5500.0,,
19792,Multifamily MR (5-9),Mid-Rise Multifamily,21715,0,21715,26005,4290,21715,-4290,-4290,0,"Multifamily Housing, Parking",Multifamily Housing,21715.0,Parking,4290.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50017,Multifamily MR (5-9),Mid-Rise Multifamily,70191,0,70191,96478,26287,70191,-26287,-26287,0,"Multifamily Housing, Parking",Multifamily Housing,70191.0,Parking,26287.0,,
50018,Multifamily MR (5-9),Mid-Rise Multifamily,82087,0,82087,107281,25194,82087,-25194,-25194,0,"Multifamily Housing, Parking",Multifamily Housing,82087.0,Parking,25194.0,,
50023,Multifamily LR (1-4),Low-Rise Multifamily,182378,0,182378,227178,44800,182378,-44800,-44800,0,"Multifamily Housing, Parking",Multifamily Housing,182378.0,Parking,44800.0,,
50028,Multifamily LR (1-4),Low-Rise Multifamily,43566,0,43566,52168,8602,43566,-8602,-8602,0,"Multifamily Housing, Parking",Multifamily Housing,43566.0,Parking,8602.0,,
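
On the class 8 rows displayed above, $a_o = 0$ and the whole gap $a_u - a$ equals the Parking area listed among the uses. The sketch below checks this on three of those rows. It is only a reading aid: `gap_is_undeclared_parking` is a hypothetical name, and whether the pattern holds for all 117 cases remains to be verified on the full table.

```python
# Rows copied from the class 8 table: id -> (a, a_o, a_u, Parking GFA listed in the uses)
class_8_rows = {
    144: (190980, 0, 210031, 19051),
    387: (298426, 0, 496176, 197750),
    19511: (21888, 0, 27388, 5500),
}

def gap_is_undeclared_parking(a, a_o, a_u, parking_gfa):
    """Hypothetical check: a_o is 0 and the gap a_u - a equals the Parking use area."""
    return a_o == 0 and a_u - a == parking_gfa

checks = {i: gap_is_undeclared_parking(*row) for i, row in class_8_rows.items()}
print(checks)
```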


## Class 9 inconsistencies<br/>107 cases, 3 % of cases

These are the cases where:
* $\Delta a \lt 0$
* $\Delta a_i \gt 0$
* $\Delta a_o \lt 0$

In [39]:
class_9_members = _table[a_class[9]]
class_9_members_details = join_use_allocation(class_9_members, data)
print(class_9_members_details.shape[0])
display(class_9_members_details)

107


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
32,NonResidential,Hotel,158676,30301,128375,159176,42000,117176,-500,-11699,11199,"Hotel, Parking, Retail Store, Swimming Pool",Hotel,112676.0,Parking,42000.0,Retail Store,4500.0
100,NonResidential,Large Office,316306,0,316306,427691,150726,276965,-111385,-150726,39341,"Office, Other, Parking, Retail Store",Office,261826.0,Parking,150726.0,Other,15139.0
102,NonResidential,Hotel,282863,44766,238097,287325,51537,235788,-4462,-6771,2309,"Hotel, Parking",Hotel,235788.0,Parking,51537.0,,
238,Nonresidential COS,Small- and Mid-Sized Office,91130,0,91130,102699,11850,90849,-11569,-11850,281,"Data Center, Distribution Center, Office, Othe...",Office,57968.0,Distribution Center,32881.0,Parking,11850.0
264,NonResidential,Mixed Use Property,110785,0,110785,136922,29839,107083,-26137,-29839,3702,"Financial Office, Multifamily Housing, Parking...",Multifamily Housing,58563.0,Supermarket/Grocery Store,48520.0,Parking,29839.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49986,Multifamily MR (5-9),Residence Hall,69451,0,69451,69912,23084,46828,-461,-23084,22623,"Parking, Residence Hall/Dormitory",Residence Hall/Dormitory,46828.0,Parking,23084.0,,
50003,Multifamily LR (1-4),Low-Rise Multifamily,21267,0,21267,23398,3765,19633,-2131,-3765,1634,"Multifamily Housing, Parking",Multifamily Housing,19633.0,Parking,3765.0,,
50016,Multifamily MR (5-9),Mid-Rise Multifamily,418285,0,418285,545910,127633,418277,-127625,-127633,8,"Multifamily Housing, Parking",Multifamily Housing,418277.0,Parking,127633.0,,
50046,Multifamily MR (5-9),Mid-Rise Multifamily,110964,0,110964,131645,27261,104384,-20681,-27261,6580,"Multifamily Housing, Parking",Multifamily Housing,104384.0,Parking,27261.0,,


## Class 10 inconsistencies<br/>355 cases, 11 % of cases

These are the cases where:
* $\Delta a \lt 0$
* $\Delta a_i \lt 0$
* $\Delta a_o = 0$

In [40]:
class_10_members = _table[a_class[10]]
class_10_members_details = join_use_allocation(class_10_members, data)
print(class_10_members_details.shape[0])
display(class_10_members_details)

355


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
18,NonResidential,Hotel,315952,57600,258352,353111,57600,295511,-37159,0,-37159,"Hotel, Parking",Hotel,295511.0,Parking,57600.0,,
19,NonResidential,Hotel,92190,25200,66990,92590,25200,67390,-400,0,-400,"Hotel, Parking",Hotel,67390.0,Parking,25200.0,,
22,NonResidential,Other,103911,0,103911,130000,0,130000,-26089,0,-26089,"Fitness Center/Health Club/Gym, Office, Swimmi...",Fitness Center/Health Club/Gym,90000.0,Office,40000.0,Swimming Pool,0.0
23,NonResidential,Hotel,416281,85000,331281,433329,85000,348329,-17048,0,-17048,"Hotel, Parking",Hotel,348329.0,Parking,85000.0,,
43,Campus,Mixed Use Property,494835,0,494835,1856706,0,1856706,-1361871,0,-1361871,"Energy/Power Station, Laboratory, Manufacturin...",Office,757027.0,Laboratory,639931.0,Non-Refrigerated Warehouse,459748.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50035,NonResidential,Hotel,144614,0,144614,159500,0,159500,-14886,0,-14886,Hotel,Hotel,159500.0,,,,
50061,Multifamily MR (5-9),Mid-Rise Multifamily,91128,0,91128,97468,0,97468,-6340,0,-6340,Multifamily Housing,Multifamily Housing,91128.0,Retail Store,6340.0,,
50062,NonResidential,Hotel,126823,41539,85284,129696,41539,88157,-2873,0,-2873,"Hotel, Parking, Swimming Pool",Hotel,88157.0,Parking,41539.0,Swimming Pool,0.0
50081,NonResidential,K-12 School,45000,0,45000,45728,0,45728,-728,0,-728,"K-12 School, Parking",K-12 School,45728.0,Parking,0.0,,


## Class 11 inconsistencies<br/>67 cases, 2 % of cases

These are the cases where:
* $\Delta a \lt 0$
* $\Delta a_i \lt 0$
* $\Delta a_o \gt 0$

In [41]:
class_11_members = _table[a_class[11]]
class_11_members_details = join_use_allocation(class_11_members, data)
print(class_11_members_details.shape[0])
display(class_11_members_details)

67


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
16,NonResidential,Hotel,333176,61161,272015,336640,0,336640,-3464,61161,-64625,Hotel,Hotel,336640.0,,,,
21,Nonresidential COS,Other,412000,57000,355000,414987,49000,365987,-2987,8000,-10987,"Data Center, Library, Parking",Library,364913.0,Parking,49000.0,Data Center,1074.0
56,NonResidential,Hotel,332210,205970,126240,348630,0,348630,-16420,205970,-222390,Hotel,Hotel,348630.0,,,,
63,NonResidential,Hotel,994212,146694,847518,1111880,117668,994212,-117668,29026,-146694,"Hotel, Parking, Swimming Pool",Hotel,994212.0,Parking,117668.0,Swimming Pool,0.0
69,NonResidential,Hotel,116300,28200,88100,116790,0,116790,-490,28200,-28690,Hotel,Hotel,88490.0,Parking,28300.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50057,Multifamily HR (10+),High-Rise Multifamily,319481,41915,277566,396398,35180,361218,-76917,6735,-83652,"Multifamily Housing, Office, Other - Lodging/R...",Multifamily Housing,308680.0,Other - Lodging/Residential,52538.0,Parking,35180.0
50074,Multifamily MR (5-9),Mid-Rise Multifamily,59653,14383,45270,60023,10664,49359,-370,3719,-4089,"Multifamily Housing, Parking",Multifamily Housing,49359.0,Parking,10664.0,,
50090,Multifamily MR (5-9),Mid-Rise Multifamily,63825,4850,58975,71241,0,71241,-7416,4850,-12266,Multifamily Housing,Multifamily Housing,71241.0,,,,
50093,Multifamily MR (5-9),Mid-Rise Multifamily,86045,8908,77137,87854,8126,79728,-1809,782,-2591,"Multifamily Housing, Other - Public Services, ...",Multifamily Housing,78359.0,Parking,8126.0,Other - Public Services,1369.0


## Class 12 inconsistencies<br/>80 cases, 2 % of cases

These are the cases where:
* $\Delta a \lt 0$
* $\Delta a_i \lt 0$
* $\Delta a_o \lt 0$


In [42]:
class_12_members = _table[a_class[12]]
class_12_members_details = join_use_allocation(class_12_members, data)
print(class_12_members_details.shape[0])
display(class_12_members_details)

80


id,btype,ptype,a,a_o,a_i,a_u,a_u_o,a_u_i,a - a_u,a_o - a_u_o,a_i - a_u_i,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
8,NonResidential,Hotel,175580,62000,113580,191454,68009,123445,-15874,-6009,-9865,"Hotel, Parking, Swimming Pool",Hotel,123445.0,Parking,68009.0,Swimming Pool,0.000000
107,NonResidential,Large Office,571329,0,571329,935703,310699,625004,-364374,-310699,-53675,"Office, Other, Parking",Office,598801.0,Parking,310699.0,Other,26203.000000
147,NonResidential,Hospital,285333,0,285333,451526,148865,302661,-166193,-148865,-17328,"Hospital (General Medical & Surgical), Parking",Hospital (General Medical & Surgical),302661.0,Parking,148865.0,,
268,NonResidential,Hospital,597519,0,597519,650222,48607,601615,-52703,-48607,-4096,"Hospital (General Medical & Surgical), Parking",Hospital (General Medical & Surgical),601615.0,Parking,48607.0,,
276,NonResidential,Hospital,1158691,0,1158691,1737833,387651,1350182,-579142,-387651,-191491,"Hospital (General Medical & Surgical), Parking",Hospital (General Medical & Surgical),1350182.0,Parking,387651.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50002,NonResidential,Other,33648,0,33648,122600,83600,39000,-88952,-83600,-5352,"Automobile Dealership, Parking",Parking,83600.0,Automobile Dealership,39000.0,,
50054,Nonresidential COS,Large Office,536697,197659,339038,551329,202178,349151,-14632,-4519,-10113,"Office, Parking, Retail Store",Office,342838.0,Parking,202178.0,Retail Store,6313.200195
50086,Multifamily LR (1-4),Low-Rise Multifamily,51095,2784,48311,52115,2791,49324,-1020,-7,-1013,"Multifamily Housing, Parking",Multifamily Housing,49324.0,Parking,2791.0,,
50088,Multifamily LR (1-4),Low-Rise Multifamily,41403,0,41403,46955,4941,42014,-5552,-4941,-611,"Multifamily Housing, Parking",Multifamily Housing,42014.0,Parking,4941.0,,


# Study of notable cases

## Class labelling

In [50]:
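import pandas as pd

def get_class_labels(table):
    """Return a Series mapping each sample id to its class number.

    Relies on the global `a_class`, whose element i selects the members of class i.
    """
    # Explicit 0 initial value: avoids relying on how pandas fills
    # an integer Series created without data
    s = pd.Series(0, index=table.index, name='class', dtype=int)
    for i, bi in enumerate(a_class):
        s[bi] = i
    return s.astype(int)

class_labels = get_class_labels(table)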
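
Since the labels are assigned selector by selector, a sample matched by two selectors silently keeps the last class, and a sample matched by none keeps its initial value. A minimal sketch of a sanity check, assuming the selectors are boolean masks sharing the table index; `check_partition` and the toy masks are hypothetical, not part of the notebook.

```python
import pandas as pd

def check_partition(masks):
    """Return (ids covered by no mask, ids covered by several masks)."""
    # One column per class; the row sum counts the classes claiming each id
    hits = pd.concat(masks, axis=1).sum(axis=1)
    return hits[hits == 0].index.tolist(), hits[hits > 1].index.tolist()

# Toy masks over five ids taken from the tables above (one id per class, 8 to 12)
index = pd.Index([8, 16, 18, 32, 144], name='id')
toy_masks = {
    8: pd.Series([False, False, False, False, True], index=index),
    9: pd.Series([False, False, False, True, False], index=index),
    10: pd.Series([False, False, True, False, False], index=index),
    11: pd.Series([False, True, False, False, False], index=index),
    12: pd.Series([True, False, False, False, False], index=index),
}
uncovered, overlapping = check_partition(toy_masks)
print(uncovered, overlapping)
```

Both lists are empty when the classes partition the samples, which is the situation in which the label assignment is unambiguous.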

## Case n° 32

In [52]:
print('Class of case n° 32 :', class_labels.loc[32])

Class of case n° 32 : 9


Extracting the information:

In [92]:
from IPython.display import display, Math  # Math is required below
from pepper_commons import bold, print_title, print_subtitle

def display_sample_info(id, data, table, class_labels):
    """Display the main areas, their diffs and the areas by use of one sample."""
    print_title(f"Sample n° {id}")
    sample = data.loc[id]
    sample_analysis = table.loc[id]
    sample_class = class_labels.loc[id]

    print_subtitle("Main areas (GFA)")
    display(Math("a = " + str(sample_analysis['a'])))
    display(Math("a_i = " + str(sample_analysis['a_i']) + " = p_i + \\overline{p_i}"))
    display(Math("a_o = " + str(sample_analysis['a_o']) + " = p_o + \\overline{p_o}"))
    display(Math("a_u = " + str(sample_analysis['a_u'])))
    display(Math("a_{u_i} = " + str(sample_analysis['a_u_i'])))
    display(Math("a_{u_o} = " + str(sample_analysis['a_u_o'])))

    # Backslashes are doubled: "\D" is an invalid escape sequence in a plain string
    print_subtitle("Areas diffs")
    display(Math("\\Delta a = a - a_u = " + str(sample_analysis['a - a_u'])))
    display(Math("\\Delta a_i = a_i - a_{u_i} = " + str(sample_analysis['a_i - a_u_i'])))
    display(Math("\\Delta a_o = a_o - a_{u_o} = " + str(sample_analysis['a_o - a_u_o'])))

    print_subtitle("Details : areas by use")
    display(Math("a_\\text{1st} = a_\\text{"
        + sample.LargestPropertyUseType + "} = "
        + str(int(sample.LargestPropertyUseTypeGFA))))
    display(Math("a_\\text{2nd} = a_\\text{"
        + sample.SecondLargestPropertyUseType + "} = "
        + str(int(sample.SecondLargestPropertyUseTypeGFA))))
    display(Math("a_\\text{3rd} = a_\\text{"
        + sample.ThirdLargestPropertyUseType + "} = "
        + str(int(sample.ThirdLargestPropertyUseTypeGFA))))


display_sample_info(32, _data, _table, class_labels)

SAMPLE N° 32

Main areas (GFA)

$a = 158676$

$a_i = 128375 = p_i + \overline{p_i}$

$a_o = 30301 = p_o + \overline{p_o}$

$a_u = 159176$

$a_{u_i} = 117176$

$a_{u_o} = 42000$

Areas diffs

$\Delta a = a - a_u = -500$

$\Delta a_i = a_i - a_{u_i} = 11199$

$\Delta a_o = a_o - a_{u_o} = -11699$

Details : areas by use

$a_\text{1st} = a_\text{Hotel} = 112676$

$a_\text{2nd} = a_\text{Parking} = 42000$

$a_\text{3rd} = a_\text{Retail Store} = 4500$
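
Sample 32 lists three uses, but many rows in the tables above leave the second or third use empty. Assuming those empty cells are stored as NaN, concatenating them to a string raises a `TypeError`, and so does `int()` on a NaN area. A minimal sketch of a guard that could wrap each "areas by use" line; `use_area_latex` is a hypothetical helper, not part of `pepper_commons`.

```python
import math

def use_area_latex(rank, use_type, gfa):
    """Hypothetical helper: LaTeX source for one use, or None when the use is missing."""
    # A missing use type is expected to be NaN (a float), not a string
    if not isinstance(use_type, str):
        return None
    if gfa is None or math.isnan(gfa):
        return None
    return "a_\\text{" + rank + "} = a_\\text{" + use_type + "} = " + str(int(gfa))

# Sample 32, second use (values from the class 9 table), then a missing third use
print(use_area_latex("2nd", "Parking", 42000.0))
print(use_area_latex("3rd", float("nan"), float("nan")))
```

The caller would then display the line only when the helper returns a string.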

In [81]:
display(_data.loc[32])

BuildingType                                                    NonResidential
PrimaryPropertyType                                                      Hotel
PropertyName                                                   Homewood Suites
Address                                                           1011 Pike ST
ZipCode                                                                98101.0
TaxParcelIdentificationNumber                                       0660001832
CouncilDistrictCode                                                          7
Neighborhood                                                              EAST
Latitude                                                              47.61301
Longitude                                                           -122.32929
YearBuilt                                                                 1991
NumberofBuildings                                                          1.0
NumberofFloors                                      

# GSheet export

Why export, given that the same operation is easy to carry out directly in GSheet?

Three sheets: raw version, corrected version, and diff, which makes the corrective modifications visible.

1/ Start gently: produce and export the table with the details, together with a class column.
2/ Visualize the scatter plots for each class (to be inserted above), in raw and in corrected versions.
3/ ...
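
The diff sheet could be derived from the raw and corrected sheets rather than maintained by hand. A minimal sketch using pandas `DataFrame.compare`, which keeps only the cells that differ. The two rows come from the tables above; the correction applied to sample 144 is purely illustrative (an assumption, not the rule retained in this notebook).

```python
import pandas as pd

# Raw values of samples 144 (class 8) and 1 (consistent), copied from the tables above
raw = pd.DataFrame(
    {'a': [190980, 88434], 'a_o': [0, 0], 'a_i': [190980, 88434]},
    index=pd.Index([144, 1], name='id'))

# Illustrative correction only: add the 19051 of Parking listed in the uses
# of sample 144 to its outdoor and total areas
corrected = raw.copy()
corrected.loc[144, ['a', 'a_o']] = [210031, 19051]

# 'self' holds the raw value, 'other' the corrected one; unchanged rows and columns are dropped
diff = raw.compare(corrected)
print(diff)
```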


### Table with details and classes

Reminder:
|Class|$\Delta s$|$\Delta s_i$|$\Delta s_e$|#cases|Comment/correction|

<mark>... have it produced by a function</mark>
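
A minimal sketch of such a function, in plain Python. The rows are limited to classes 8 to 12, whose signs and counts are stated in the sections above, with the comment column left empty. It assumes that $\Delta s$, $\Delta s_i$, $\Delta s_e$ denote the same quantities as $\Delta a$, $\Delta a_i$, $\Delta a_o$; `markdown_table` is a hypothetical name.

```python
def markdown_table(headers, rows):
    """Render headers and rows as a pipe-delimited markdown table."""
    lines = [
        '|' + '|'.join(headers) + '|',
        '|' + '|'.join('---' for _ in headers) + '|',
    ]
    lines += ['|' + '|'.join(str(cell) for cell in row) + '|' for row in rows]
    return '\n'.join(lines)

headers = ['Class', r'$\Delta s$', r'$\Delta s_i$', r'$\Delta s_e$',
           '#cases', 'Comment/correction']
# (class, sign of Δs, sign of Δs_i, sign of Δs_e, number of cases, comment)
rows = [
    (8, '$< 0$', '$= 0$', '$< 0$', 117, ''),
    (9, '$< 0$', '$> 0$', '$< 0$', 107, ''),
    (10, '$< 0$', '$< 0$', '$= 0$', 355, ''),
    (11, '$< 0$', '$< 0$', '$> 0$', 67, ''),
    (12, '$< 0$', '$< 0$', '$< 0$', 80, ''),
]
table_md = markdown_table(headers, rows)
print(table_md)
```

In the notebook, the resulting string could be rendered with `IPython.display.Markdown`.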

In [None]:
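def add_class_id(table):
    """Add, in place, a `class_id` column (0 when no class applies).

    class_1 ... class_13 are selectors defined elsewhere in the notebook.
    NB: this numbering differs from the 12 classes used above,
    e.g. sample 8 is labelled 13 here and belongs to class 12 above.
    """
    classes_list = [
        class_1, class_2, class_3, class_4, class_5, class_6, class_7,
        class_8, class_9, class_10, class_11, class_12, class_13]
    table['class_id'] = 0
    for i, c in enumerate(classes_list):
        table.loc[c, 'class_id'] = i + 1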

In [None]:
exported = _table.copy()
add_class_id(exported)
exported = join_use_allocation(exported, data)
display(exported)

id,btype,ptype,s,se,si,s_u,se_u,si_u,s - s_u,se - se_u,si - si_u,class_id,ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA
1,NonResidential,Hotel,88434,0,88434,88434,0,88434,0,0,0,1,Hotel,Hotel,88434.0,,,,
2,NonResidential,Hotel,103566,15064,88502,103566,15064,88502,0,0,0,1,"Hotel, Parking, Restaurant",Hotel,83880.0,Parking,15064.0,Restaurant,4622.0
3,NonResidential,Hotel,956110,196718,759392,756493,0,756493,199617,196718,2899,6,Hotel,Hotel,756493.0,,,,
5,NonResidential,Hotel,61320,0,61320,61320,0,61320,0,0,0,1,Hotel,Hotel,61320.0,,,,
8,NonResidential,Hotel,175580,62000,113580,191454,68009,123445,-15874,-6009,-9865,13,"Hotel, Parking, Swimming Pool",Hotel,123445.0,Parking,68009.0,Swimming Pool,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50221,Nonresidential COS,Other,18261,0,18261,18261,0,18261,0,0,0,1,Other - Recreation,Other - Recreation,18261.0,,,,
50223,Nonresidential COS,Other,16000,0,16000,16000,0,16000,0,0,0,1,Other - Recreation,Other - Recreation,16000.0,,,,
50224,Nonresidential COS,Other,13157,0,13157,13157,0,13157,0,0,0,1,"Fitness Center/Health Club/Gym, Other - Recrea...",Other - Recreation,7583.0,Fitness Center/Health Club/Gym,5574.0,Swimming Pool,0.0
50225,Nonresidential COS,Mixed Use Property,14101,0,14101,13586,0,13586,515,0,515,5,"Fitness Center/Health Club/Gym, Food Service, ...",Other - Recreation,6601.0,Fitness Center/Health Club/Gym,6501.0,Pre-school/Daycare,484.0


In [None]:
exported.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3211 entries, 1 to 50226
Data columns (total 19 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   btype                            3211 non-null   object 
 1   ptype                            3211 non-null   object 
 2   s                                3211 non-null   int64  
 3   se                               3211 non-null   int64  
 4   si                               3211 non-null   int64  
 5   s_u                              3211 non-null   int32  
 6   se_u                             3211 non-null   int32  
 7   si_u                             3211 non-null   int32  
 8   s - s_u                          3211 non-null   int32  
 9   se - se_u                        3211 non-null   int32  
 10  si - si_u                        3211 non-null   int32  
 11  class_id                         3211 non-null   int64  
 12  ListOfAllPropertyUs

In [None]:
from gspread_pandas import Spread
# Target GSheet. TODO: externalize this id, it is user-specific
spread = Spread('1gtTOd-taN9aY8sg4PGY456E2AlsMxi2W_-7kZaCSYlA')
# Float columns, passed as `as_fr_FR` to the data_to_gsheet helper (defined outside this section)
as_fr_FR = [c for c in exported.columns if exported[c].dtype == 'float64']
data_to_gsheet(exported, spread, 'gross_areas', as_code=None, as_fr_FR=as_fr_FR)
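
The `data_to_gsheet` helper is not shown in this section. Judging by its name only, `as_fr_FR` presumably lists the columns to write with French number formatting, so that a sheet in the fr_FR locale reads them as numbers. A minimal sketch of what such a conversion could look like, under that assumption; `to_fr_decimal` is a hypothetical function, not the helper's actual implementation.

```python
def to_fr_decimal(value, ndigits=1):
    """Format a float with a decimal comma; empty string for a missing value."""
    if value != value:  # NaN is the only value that differs from itself
        return ''
    return f"{value:.{ndigits}f}".replace('.', ',')

# Third use area of sample 50054 (class 12 table), then a missing area
print(to_fr_decimal(6313.200195))
print(repr(to_fr_decimal(float('nan'))))
```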