# [A first tutorial](https://towardsdatascience.com/working-with-multi-index-pandas-dataframes-f64d2e2c3e02)

In [2]:
import pandas as pd
scores = {
    'Zone': ['North','South','South', 'East','East','West','West','West','West'], 
    'School': ['Rushmore','Bayside','Rydell', 'Shermer','Shermer','Ridgemont', 'Hogwarts','Hogwarts','North Shore'],             
    'Name': ['Jonny','Joe','Jakob', 'Jimmy','Erik','Lam','Yip','Chen','Jim'], 
    'Math': [78,76,56,67,89,100,55,76,79],
    'Science': [70,68,90,45,66,89,32,98,70]}
df = pd.DataFrame(scores, columns=['Zone', 'School', 'Name', 'Science', 'Math'])
df

,Zone,School,Name,Science,Math
0,North,Rushmore,Jonny,70,78
1,South,Bayside,Joe,68,76
2,South,Rydell,Jakob,90,56
3,East,Shermer,Jimmy,45,67
4,East,Shermer,Erik,66,89
5,West,Ridgemont,Lam,89,100
6,West,Hogwarts,Yip,32,55
7,West,Hogwarts,Chen,98,76
8,West,North Shore,Jim,70,79


Aggregation produces MultiIndexes, here on both the rows and the columns:

In [3]:
df_result_zone_school = df.groupby(['Zone','School']).agg({
    'Science':['mean','min','max'],
    'Math':['mean','min','max']
})
df_result_zone_school

,,Science,Science,Science,Math,Math,Math
,,mean,min,max,mean,min,max
Zone,School,,,,,,
East,Shermer,55.5,45,66,78.0,67,89
North,Rushmore,70.0,70,70,78.0,78,78
South,Bayside,68.0,68,68,76.0,76,76
South,Rydell,90.0,90,90,56.0,56,56
West,Hogwarts,65.0,32,98,65.5,55,76
West,North Shore,70.0,70,70,79.0,79,79
West,Ridgemont,89.0,89,89,100.0,100,100
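
To check that claim, here is a minimal sketch that rebuilds the same aggregation (without the `Name` column, which is not needed here) and compares it with a single-function aggregation. The variable names `multi` and `flat` are mine, not the tutorial's.

```python
import pandas as pd

df = pd.DataFrame({
    "Zone": ["North", "South", "South", "East", "East", "West", "West", "West", "West"],
    "School": ["Rushmore", "Bayside", "Rydell", "Shermer", "Shermer",
               "Ridgemont", "Hogwarts", "Hogwarts", "North Shore"],
    "Math": [78, 76, 56, 67, 89, 100, 55, 76, 79],
    "Science": [70, 68, 90, 45, 66, 89, 32, 98, 70],
})

# Two groupby keys -> two-level rows; several functions per column -> two-level columns.
stats = ["mean", "min", "max"]
multi = df.groupby(["Zone", "School"]).agg({"Science": stats, "Math": stats})
print(isinstance(multi.index, pd.MultiIndex), multi.index.nlevels)
print(isinstance(multi.columns, pd.MultiIndex), multi.columns.nlevels)

# A single function per column keeps the columns flat (the rows stay multi-indexed).
flat = df.groupby(["Zone", "School"]).agg({"Science": "mean", "Math": "mean"})
print(isinstance(flat.columns, pd.MultiIndex))
```

So the column MultiIndex comes from asking for several functions per column, while the row MultiIndex comes from grouping on several keys.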


Introspection

In [6]:
# note: what gets displayed looks nothing like the tutorial's version, but it is indeed a MultiIndex
display(df_result_zone_school.columns)
print(df_result_zone_school.columns)

# access the index values level by level
print(df_result_zone_school.columns.get_level_values(0))
print(df_result_zone_school.columns.get_level_values(1))


MultiIndex([('Science', 'mean'),
            ('Science',  'min'),
            ('Science',  'max'),
            (   'Math', 'mean'),
            (   'Math',  'min'),
            (   'Math',  'max')],
           )

MultiIndex([('Science', 'mean'),
            ('Science',  'min'),
            ('Science',  'max'),
            (   'Math', 'mean'),
            (   'Math',  'min'),
            (   'Math',  'max')],
           )
Index(['Science', 'Science', 'Science', 'Math', 'Math', 'Math'], dtype='object')
Index(['mean', 'min', 'max', 'mean', 'min', 'max'], dtype='object')
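
A few more introspection attributes can help alongside `get_level_values`. This is a small sketch on the same aggregation, rebuilt here so that the block runs on its own; `result` is just a shorter name for `df_result_zone_school`.

```python
import pandas as pd

df = pd.DataFrame({
    "Zone": ["North", "South", "South", "East", "East", "West", "West", "West", "West"],
    "School": ["Rushmore", "Bayside", "Rydell", "Shermer", "Shermer",
               "Ridgemont", "Hogwarts", "Hogwarts", "North Shore"],
    "Math": [78, 76, 56, 67, 89, 100, 55, 76, 79],
    "Science": [70, 68, 90, 45, 66, 89, 32, 98, 70],
})
stats = ["mean", "min", "max"]
result = df.groupby(["Zone", "School"]).agg({"Science": stats, "Math": stats})

print(result.columns.nlevels)          # number of levels in the column index
print(result.index.names)              # row levels are named after the groupby keys
print(result.columns.names)            # column levels have no names here
print(result.columns.to_flat_index())  # the same columns as a flat Index of tuples
```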


Column extraction (very convenient):

In [9]:
display(df_result_zone_school['Science'])       # header 'Science' not included
display(df_result_zone_school[['Science']])     # header 'Science' included

,,mean,min,max
Zone,School,,,
East,Shermer,55.5,45,66
North,Rushmore,70.0,70,70
South,Bayside,68.0,68,68
South,Rydell,90.0,90,90
West,Hogwarts,65.0,32,98
West,North Shore,70.0,70,70
West,Ridgemont,89.0,89,89


,,Science,Science,Science
,,mean,min,max
Zone,School,,,
East,Shermer,55.5,45,66
North,Rushmore,70.0,70,70
South,Bayside,68.0,68,68
South,Rydell,90.0,90,90
West,Hogwarts,65.0,32,98
West,North Shore,70.0,70,70
West,Ridgemont,89.0,89,89
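
The difference between the two forms can also be checked on the column index itself. A minimal sketch, with the aggregation rebuilt so that the block is self-contained (`result`, `dropped` and `kept` are names chosen for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "Zone": ["North", "South", "South", "East", "East", "West", "West", "West", "West"],
    "School": ["Rushmore", "Bayside", "Rydell", "Shermer", "Shermer",
               "Ridgemont", "Hogwarts", "Hogwarts", "North Shore"],
    "Math": [78, 76, 56, 67, 89, 100, 55, 76, 79],
    "Science": [70, 68, 90, 45, 66, 89, 32, 98, 70],
})
stats = ["mean", "min", "max"]
result = df.groupby(["Zone", "School"]).agg({"Science": stats, "Math": stats})

dropped = result["Science"]    # scalar key: level 0 is dropped
kept = result[["Science"]]     # list key: level 0 is kept

print(dropped.columns.nlevels, list(dropped.columns))
print(kept.columns.nlevels, list(kept.columns))
```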


Extraction along a path level 0 -> level 1 -> ...: use tuples

In [10]:
df_result_zone_school[('Science', 'mean')]       # returns a multi-indexed Series
df_result_zone_school[[('Science', 'mean')]]     # same selection, but returns a DataFrame

Zone   School     
East   Shermer        55.5
North  Rushmore       70.0
South  Bayside        68.0
       Rydell         90.0
West   Hogwarts       65.0
       North Shore    70.0
       Ridgemont      89.0
Name: (Science, mean), dtype: float64
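
A tuple always starts at level 0. To go the other way and pick one level-1 label under every level-0 label, `DataFrame.xs` is one option. This is a complementary sketch and is not taken from the tutorial; `result` and `means` are names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Zone": ["North", "South", "South", "East", "East", "West", "West", "West", "West"],
    "School": ["Rushmore", "Bayside", "Rydell", "Shermer", "Shermer",
               "Ridgemont", "Hogwarts", "Hogwarts", "North Shore"],
    "Math": [78, 76, 56, 67, 89, 100, 55, 76, 79],
    "Science": [70, 68, 90, 45, 66, 89, 32, 98, 70],
})
stats = ["mean", "min", "max"]
result = df.groupby(["Zone", "School"]).agg({"Science": stats, "Math": stats})

# Tuple key: a single column, returned as a Series.
science_mean = result[("Science", "mean")]
print(type(science_mean).__name__)

# Cross-section on the second column level: every 'mean' column,
# with that level dropped from the result.
means = result.xs("mean", axis=1, level=1)
print(means)
```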

Extracting two columns under the same level-0 label

In [16]:
# df_result_zone_school[('Science', ['mean', 'min'])]    # this one raises an error (and the message says why)
df_result_zone_school.loc[:, ('Science', ['mean', 'min'])]
# equivalent to df_result_zone_school.loc[:, [('Science','mean'), ('Science','min')]]


,,Science,Science
,,mean,min
Zone,School,,
East,Shermer,55.5,45
North,Rushmore,70.0,70
South,Bayside,68.0,68
South,Rydell,90.0,90
West,Hogwarts,65.0,32
West,North Shore,70.0,70
West,Ridgemont,89.0,89
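
The same selection can be written with `pd.IndexSlice`, which some find easier to read than nested tuples. A minimal sketch comparing the three spellings (the aggregation is rebuilt here; `a`, `b` and `c` are names chosen for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "Zone": ["North", "South", "South", "East", "East", "West", "West", "West", "West"],
    "School": ["Rushmore", "Bayside", "Rydell", "Shermer", "Shermer",
               "Ridgemont", "Hogwarts", "Hogwarts", "North Shore"],
    "Math": [78, 76, 56, 67, 89, 100, 55, 76, 79],
    "Science": [70, 68, 90, 45, 66, 89, 32, 98, 70],
})
stats = ["mean", "min", "max"]
result = df.groupby(["Zone", "School"]).agg({"Science": stats, "Math": stats})

idx = pd.IndexSlice
a = result.loc[:, ("Science", ["mean", "min"])]                 # tuple with a list inside
b = result.loc[:, idx["Science", ["mean", "min"]]]              # same key built with IndexSlice
c = result.loc[:, [("Science", "mean"), ("Science", "min")]]    # explicit list of tuples

print(list(a.columns))
print(a.equals(b))
```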


Slicing:

In [17]:
df_result_zone_school.loc[:,'Science':'Math']

,,Science,Science,Science,Math,Math,Math
,,mean,min,max,mean,min,max
Zone,School,,,,,,
East,Shermer,55.5,45,66,78.0,67,89
North,Rushmore,70.0,70,70,78.0,78,78
South,Bayside,68.0,68,68,76.0,76,76
South,Rydell,90.0,90,90,56.0,56,56
West,Hogwarts,65.0,32,98,65.5,55,76
West,North Shore,70.0,70,70,79.0,79,79
West,Ridgemont,89.0,89,89,100.0,100,100


In [18]:
# tempting, but a SyntaxError: df_result_zone_school.loc[:,('Science','mean':'max')]; slice between two full tuples instead:
df_result_zone_school.loc[:,('Science','mean'):('Science','max')]

,,Science,Science,Science
,,mean,min,max
Zone,School,,,
East,Shermer,55.5,45,66
North,Rushmore,70.0,70,70
South,Bayside,68.0,68,68
South,Rydell,90.0,90,90
West,Hogwarts,65.0,32,98
West,North Shore,70.0,70,70
West,Ridgemont,89.0,89,89
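
Note that label slices include both endpoints, unlike positional slices. A small sketch comparing a tuple-to-tuple slice with the matching `iloc` slice, using the same aggregation as above (rebuilt here; `by_label` and `by_position` are names chosen for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "Zone": ["North", "South", "South", "East", "East", "West", "West", "West", "West"],
    "School": ["Rushmore", "Bayside", "Rydell", "Shermer", "Shermer",
               "Ridgemont", "Hogwarts", "Hogwarts", "North Shore"],
    "Math": [78, 76, 56, 67, 89, 100, 55, 76, 79],
    "Science": [70, 68, 90, 45, 66, 89, 32, 98, 70],
})
stats = ["mean", "min", "max"]
result = df.groupby(["Zone", "School"]).agg({"Science": stats, "Math": stats})

# Label slice: ('Science', 'max') is included in the result.
by_label = result.loc[:, ("Science", "min"):("Science", "max")]

# Positional slice: the stop position 3 is excluded, so this covers columns 1 and 2.
by_position = result.iloc[:, 1:3]

print(list(by_label.columns))
print(list(by_position.columns))
```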


And now the rows:

In [21]:
# equivalent: a level can be addressed by position or by name
display(df_result_zone_school.index.get_level_values(0))
display(df_result_zone_school.index.get_level_values('Zone'))
# same for the second level
display(df_result_zone_school.index.get_level_values(1))
display(df_result_zone_school.index.get_level_values('School'))

Index(['East', 'North', 'South', 'South', 'West', 'West', 'West'], dtype='object', name='Zone')

Index(['East', 'North', 'South', 'South', 'West', 'West', 'West'], dtype='object', name='Zone')

Index(['Shermer', 'Rushmore', 'Bayside', 'Rydell', 'Hogwarts', 'North Shore',
       'Ridgemont'],
      dtype='object', name='School')

Index(['Shermer', 'Rushmore', 'Bayside', 'Rydell', 'Hogwarts', 'North Shore',
       'Ridgemont'],
      dtype='object', name='School')
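
Two small follow-ups on the row index, as a sketch: the distinct values of one level, and the whole index turned back into an ordinary table of keys. The aggregation is rebuilt here, and `zones` and `keys` are names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Zone": ["North", "South", "South", "East", "East", "West", "West", "West", "West"],
    "School": ["Rushmore", "Bayside", "Rydell", "Shermer", "Shermer",
               "Ridgemont", "Hogwarts", "Hogwarts", "North Shore"],
    "Math": [78, 76, 56, 67, 89, 100, 55, 76, 79],
    "Science": [70, 68, 90, 45, 66, 89, 32, 98, 70],
})
stats = ["mean", "min", "max"]
result = df.groupby(["Zone", "School"]).agg({"Science": stats, "Math": stats})

# get_level_values repeats a label once per row; unique() removes the repeats.
zones = result.index.get_level_values("Zone").unique().tolist()
print(zones)

# The row MultiIndex as a plain DataFrame with one column per level.
keys = result.index.to_frame(index=False)
print(keys)
```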

Extraction:

In [30]:
display(df_result_zone_school.loc['South'])
display(df_result_zone_school.loc[['South']])
display(df_result_zone_school.loc[['South','West']])
display(df_result_zone_school.loc[('South','Bayside')])
display(df_result_zone_school.loc[[('South','Bayside')]])
display(df_result_zone_school.loc[
    [('South','Bayside'),
    ('West','Ridgemont')]
])
df_result_zone_school.loc[('West',['Hogwarts','Ridgemont']), :]  # several schools within one zone
df_result_zone_school.loc['North':'West']                         # slicing
df_result_zone_school.loc[
    ('South','Bayside'):('West','Hogwarts')
]
df_result_zone_school.iloc[2:5, 3:]                               # iloc: purely positional, and powerful

,Science,Science,Science,Math,Math,Math
,mean,min,max,mean,min,max
School,,,,,,
Bayside,68.0,68,68,76.0,76,76
Rydell,90.0,90,90,56.0,56,56


,,Science,Science,Science,Math,Math,Math
,,mean,min,max,mean,min,max
Zone,School,,,,,,
South,Bayside,68.0,68,68,76.0,76,76
South,Rydell,90.0,90,90,56.0,56,56


,,Science,Science,Science,Math,Math,Math
,,mean,min,max,mean,min,max
Zone,School,,,,,,
South,Bayside,68.0,68,68,76.0,76,76
South,Rydell,90.0,90,90,56.0,56,56
West,Hogwarts,65.0,32,98,65.5,55,76
West,North Shore,70.0,70,70,79.0,79,79
West,Ridgemont,89.0,89,89,100.0,100,100


Science  mean    68.0
         min     68.0
         max     68.0
Math     mean    76.0
         min     76.0
         max     76.0
Name: (South, Bayside), dtype: float64

,,Science,Science,Science,Math,Math,Math
,,mean,min,max,mean,min,max
Zone,School,,,,,,
South,Bayside,68.0,68,68,76.0,76,76


,,Science,Science,Science,Math,Math,Math
,,mean,min,max,mean,min,max
Zone,School,,,,,,
South,Bayside,68.0,68,68,76.0,76,76
West,Ridgemont,89.0,89,89,100.0,100,100


,,Math,Math,Math
,,mean,min,max
Zone,School,,,
South,Bayside,76.0,76,76
South,Rydell,56.0,56,56
West,Hogwarts,65.5,55,76
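
All the `.loc` keys above start from the first level (`Zone`). To select on the second level alone, `xs` with `level=` or `pd.IndexSlice` are two options. This is a complementary sketch and is not taken from the tutorial; the aggregation is rebuilt, and `hogwarts`, `hogwarts_full` and `picked` are names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Zone": ["North", "South", "South", "East", "East", "West", "West", "West", "West"],
    "School": ["Rushmore", "Bayside", "Rydell", "Shermer", "Shermer",
               "Ridgemont", "Hogwarts", "Hogwarts", "North Shore"],
    "Math": [78, 76, 56, 67, 89, 100, 55, 76, 79],
    "Science": [70, 68, 90, 45, 66, 89, 32, 98, 70],
})
stats = ["mean", "min", "max"]
result = df.groupby(["Zone", "School"]).agg({"Science": stats, "Math": stats})

# Cross-section on the 'School' level; that level is dropped by default.
hogwarts = result.xs("Hogwarts", level="School")
# Same rows, keeping both index levels.
hogwarts_full = result.xs("Hogwarts", level="School", drop_level=False)

# Any zone, but only these schools.
idx = pd.IndexSlice
picked = result.loc[idx[:, ["Hogwarts", "Rydell"]], :]

print(hogwarts.index.tolist())
print(hogwarts_full.index.tolist())
print(picked.index.tolist())
```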


To be completed.
Other tutorials that look interesting and complementary:
1. https://sparkbyexamples.com/pandas/pandas-multiindex-dataframe-examples/
2. MultiIndex & plotting: https://stackoverflow.com/questions/25386870/pandas-plotting-with-multi-index

# [The official tutorial](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html)

It looks pretty hardcore.

My challenge will be to do something I have not seen anywhere yet, namely more than 2 levels. I have tons of ideas for applications.
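
As a starting point, here is a minimal sketch of a three-level row index, assuming the same scores data with `Name` added as a third groupby key. The names `deep` and `hogwarts` are mine, and the two statistics are an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Zone": ["North", "South", "South", "East", "East", "West", "West", "West", "West"],
    "School": ["Rushmore", "Bayside", "Rydell", "Shermer", "Shermer",
               "Ridgemont", "Hogwarts", "Hogwarts", "North Shore"],
    "Name": ["Jonny", "Joe", "Jakob", "Jimmy", "Erik", "Lam", "Yip", "Chen", "Jim"],
    "Math": [78, 76, 56, 67, 89, 100, 55, 76, 79],
    "Science": [70, 68, 90, 45, 66, 89, 32, 98, 70],
})

# Three groupby keys -> three row levels; the columns keep two levels.
stats = ["mean", "max"]
deep = df.groupby(["Zone", "School", "Name"]).agg({"Science": stats, "Math": stats})
print(deep.index.nlevels, deep.columns.nlevels)

# Partial key (first two levels): the remaining level 'Name' indexes the result.
hogwarts = deep.loc[("West", "Hogwarts"), :]
print(hogwarts)

# Full row key plus full column key: a single value.
print(deep.loc[("West", "Hogwarts", "Yip"), ("Math", "max")])
```

The tuple-based selections shown earlier carry over unchanged: a shorter tuple selects a whole branch, a full tuple selects a single row.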