Multi-indexing provides several levels of aggregation, equivalent to what groupby() produces.

Indexing, like pivot tables, also makes it possible to group data.
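A minimal sketch of that link, using a small hypothetical sales table (the `region`, `product` and `amount` columns are made up for illustration). Grouping on two keys, or building a pivot table with two row keys, both return a result indexed by a two-level MultiIndex:

```python
import pandas as pd

# Hypothetical sales data, used only for illustration
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "product": ["a", "b", "a", "a", "b"],
    "amount": [10, 20, 30, 40, 50],
})

# groupby() on two keys: the result is indexed by a (region, product) MultiIndex
by_group = sales.groupby(["region", "product"])["amount"].sum()
print(by_group)

# pivot_table() with the same two row keys builds the same kind of index
pivot = sales.pivot_table(index=["region", "product"], values="amount", aggfunc="sum")
print(pivot)
```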

https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html

In [1]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt 

Changed in version 0.24.0: MultiIndex.labels has been renamed to MultiIndex.codes and MultiIndex.set_labels to MultiIndex.set_codes
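As a sketch of what the renamed attributes hold, on a small illustrative index: `levels` stores the unique labels of each level, and `codes` stores, for every entry, the integer position of its label within those levels. `set_codes()` returns a new index rather than modifying the original.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one")], names=["first", "second"]
)

# Unique labels per level: ['bar', 'baz'] and ['one', 'two']
print(index.levels)
# Integer positions into those levels: [0, 0, 1] and [0, 1, 0]
print(index.codes)

# Replace the codes of the "second" level; index itself is left unchanged
swapped = index.set_codes([1, 1, 0], level="second")
print(swapped)
```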

## Creating a MultiIndex (hierarchical index) object

In [2]:
arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
         ["one", "two", "one", "two", "one", "two", "one", "two"]]
print('type : ',type(arrays))

type :  <class 'list'>


In [3]:
tuples = list(zip(arrays))
tuples

[(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],),
 (['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],)]

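Without the `*` operator, zip() receives the whole list as a single argument and yields 1-tuples. Unpacking passes each inner list as a separate argument, so zip() pairs the labels position by position: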

In [4]:
tuples = list(zip(*arrays))
tuples

[('bar', 'one'),
 ('bar', 'two'),
 ('baz', 'one'),
 ('baz', 'two'),
 ('foo', 'one'),
 ('foo', 'two'),
 ('qux', 'one'),
 ('qux', 'two')]


In [5]:
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
index

MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])
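Zipping is only needed because from_tuples() expects one tuple per row. As a sketch, the same index can be built directly from the per-level lists with MultiIndex.from_arrays() (a shorter, illustrative version of the arrays above):

```python
import pandas as pd

arrays = [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]

# from_arrays takes one list per level, so no zip(*arrays) is needed
from_arrays = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])

# from_tuples takes one tuple per row, hence the transposition with zip(*arrays)
from_tuples = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])

print(from_arrays.equals(from_tuples))  # True
```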

In [6]:
s = pd.Series(np.random.randn(8), index=index)
s

first  second
bar    one      -0.930402
       two      -0.167721
baz    one      -0.847127
       two       1.014201
foo    one      -0.314732
       two      -0.380000
qux    one      -0.436832
       two       0.904889
dtype: float64

When you want every pairing of the elements in two iterables, it can be easier to use the MultiIndex.from_product() method:

In [7]:
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]

pd.MultiIndex.from_product(iterables, names=["first", "second"])

MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])
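from_product() computes the Cartesian product of the iterables. A minimal sketch showing that it matches spelling the product out with the standard library's itertools.product():

```python
import itertools

import pandas as pd

iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]

via_product = pd.MultiIndex.from_product(iterables, names=["first", "second"])

# The same Cartesian product, built explicitly as a list of tuples
via_tuples = pd.MultiIndex.from_tuples(
    list(itertools.product(*iterables)), names=["first", "second"]
)

print(via_product.equals(via_tuples))  # True
print(len(via_product))  # 8, i.e. 4 * 2 combinations
```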

You can also construct a MultiIndex from a DataFrame directly, using the method MultiIndex.from_frame(). This is a complementary method to MultiIndex.to_frame().

In [8]:
df = pd.DataFrame(
        [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
        columns=["first", "second"])

pd.MultiIndex.from_frame(df)

MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('foo', 'one'),
            ('foo', 'two')],
           names=['first', 'second'])
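A sketch of the round trip: to_frame() turns the index back into one column per level, named after the levels. Passing `index=False` gives the result a default integer index instead of reusing the MultiIndex.

```python
import pandas as pd

df = pd.DataFrame(
    [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
    columns=["first", "second"],
)
mi = pd.MultiIndex.from_frame(df)

# One column per level, with a plain 0..n-1 row index
back = mi.to_frame(index=False)
print(back)

# Converting the frame again yields the same MultiIndex
print(pd.MultiIndex.from_frame(back).equals(mi))  # True
```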

As a convenience, you can pass a list of arrays directly into Series or DataFrame to construct a MultiIndex automatically:

In [9]:
arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"])
         ]

s = pd.Series(np.random.randn(8), index=arrays)
s

bar  one   -1.009042
     two   -0.523819
baz  one   -0.284046
     two    0.985943
foo  one    0.891054
     two    0.201923
qux  one    0.380043
     two    0.361795
dtype: float64

In [10]:
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df

                0         1         2         3
bar one -0.321617  1.068917 -1.007877 -1.413843
    two  1.506650  0.131779  0.314327 -0.150214
baz one  0.428627 -0.193888  1.364639 -1.167930
    two -1.022001 -0.819106 -0.970742  0.874624
foo one  0.304114  0.366180  0.331843  0.349328
    two  2.700864  1.065227 -0.033305  0.408338
qux one  0.695940 -0.501838 -0.572056 -0.683186
    two  0.515908 -3.051859 -1.079251 -0.335159


All of the MultiIndex constructors accept a names argument which stores string names for the levels themselves. If no names are provided, None will be assigned:

In [11]:
df.index.names

FrozenList([None, None])
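Missing names can be attached after the fact. A minimal sketch on a small illustrative frame: assigning to `index.names` changes the object in place, while rename_axis() returns a new object with the new names.

```python
import numpy as np
import pandas as pd

arrays = [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]
df = pd.DataFrame(np.zeros((4, 2)), index=arrays)
print(df.index.names)  # FrozenList([None, None])

# In place: assign one name per level
df.index.names = ["first", "second"]

# As a new object: df keeps "first" and "second"
renamed = df.rename_axis(index=["outer", "inner"])
print(renamed.index.names)  # FrozenList(['outer', 'inner'])
```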

This index can back any axis of a pandas object, and the number of levels of the index is up to you:

In [12]:
df = pd.DataFrame(np.random.randn(3, 8), index=["A", "B", "C"], columns=index)
df

first        bar                 baz                 foo                 qux
second       one       two       one       two       one       two       one       two
A       1.061518 -0.069417 -1.168373  1.687725  0.401029  1.180221  0.650803  0.657541
B      -0.674467  1.820494 -0.028030 -2.102226  1.032391  0.553010 -0.195269 -0.147570
C      -1.371395 -2.587804  1.388101  1.428080 -0.763253 -0.543728 -0.399004  0.818151


In [13]:
pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])

first              bar                 baz                 foo
second             one       two       one       two       one       two
first second
bar   one     0.006832  0.752198  0.618209 -2.231662 -0.587282 -0.683926
      two    -0.298644  0.284214  0.407541  1.711859  0.871927 -0.217435
baz   one    -0.786849 -0.916032  0.305697 -0.427903  0.054376  0.006147
      two     1.316732 -0.194734  0.791956  0.793195 -2.265461 -0.461440
foo   one    -0.156249 -0.246187  2.399704 -0.022824 -0.471494 -1.048341
      two    -2.312578 -0.395369 -0.797812 -0.541763  2.014673 -0.425662
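To illustrate that the depth is not limited to two levels, here is a sketch with a three-level column index. The level names `l0`, `l1`, `l2` and the labels are hypothetical. Selecting a label on the outermost level drops that level from the result.

```python
import numpy as np
import pandas as pd

# Three-level column index: 2 * 2 * 2 = 8 columns
columns = pd.MultiIndex.from_product(
    [["x", "y"], ["one", "two"], ["a", "b"]], names=["l0", "l1", "l2"]
)
df3 = pd.DataFrame(np.arange(16).reshape(2, 8), index=["A", "B"], columns=columns)

print(df3.columns.nlevels)  # 3
# Selecting "x" on the outer level leaves a two-level column index
print(df3["x"].columns.nlevels)  # 2
# A full tuple addresses a single column
print(df3.loc["A", ("y", "two", "b")])  # 7
```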


We’ve “sparsified” the higher levels of the indexes to make the console output a bit easier on the eyes. How the index is displayed can be controlled with the display.multi_sparse option, set through pandas.set_option() or temporarily through pandas.option_context():

In [16]:
with pd.option_context("display.multi_sparse", False):
    # print() is needed: a bare expression inside a with block is not displayed
    print(df)
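A sketch of both ways to change the option: option_context() restores the previous value when the `with` block exits, while set_option() changes it for the rest of the session until reset_option() is called. The check assumes the option starts at its default value, True.

```python
import pandas as pd

# Temporary change: the previous value comes back after the with block
with pd.option_context("display.multi_sparse", False):
    inside = pd.get_option("display.multi_sparse")
after = pd.get_option("display.multi_sparse")
print(inside, after)  # False True

# Session-wide change, then back to the default
pd.set_option("display.multi_sparse", False)
pd.reset_option("display.multi_sparse")
```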

It’s worth keeping in mind that there’s nothing preventing you from using tuples as atomic labels on an axis:

In [17]:
pd.Series(np.random.randn(8), index=tuples)

(bar, one)   -0.690481
(bar, two)    1.741310
(baz, one)    0.498624
(baz, two)    0.514853
(foo, one)    0.170603
(foo, two)   -0.333985
(qux, one)   -1.976987
(qux, two)   -0.157452
dtype: float64
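The difference lies in how the labels are stored. As a sketch: the Index constructor turns a list of tuples into a MultiIndex by default, whereas `tupleize_cols=False` keeps each tuple as a single, atomic label.

```python
import pandas as pd

tuples = [("bar", "one"), ("bar", "two"), ("baz", "one")]

# Default: the tuples are split into two levels
print(type(pd.Index(tuples)).__name__)  # MultiIndex

# tupleize_cols=False: one level whose labels are the tuples themselves
flat = pd.Index(tuples, tupleize_cols=False)
print(type(flat).__name__, flat.nlevels)  # Index 1
print(flat[0])  # ('bar', 'one')
```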

The reason that the MultiIndex matters is that it can allow you to do grouping, selection, and reshaping operations as we will describe below and in subsequent areas of the documentation. As you will see in later sections, you can find yourself working with hierarchically-indexed data without creating a MultiIndex explicitly yourself. However, when loading data from a file, you may wish to generate your own MultiIndex when preparing the data set.
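A minimal sketch of the three kinds of operation mentioned above (selection, grouping and reshaping) on a small illustrative Series:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Selection: a label on the outer level returns the matching sub-series
print(s.loc["bar"])

# Grouping: aggregate over one level (bar -> 3.0, baz -> 7.0)
print(s.groupby(level="first").sum())

# Reshaping: move the inner level into the columns
print(s.unstack(level="second"))
```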

## Reconstructing the level labels

The method get_level_values() will return a vector of the labels for each location at a particular level:

In [18]:
index.get_level_values(0)

Index(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], dtype='object', name='first')

In [19]:
index.get_level_values('second')

Index(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'], dtype='object', name='second')
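A short sketch contrasting get_level_values() with the `levels` attribute, on illustrative data: because get_level_values() returns one label per row, it can be compared to a value to build a boolean mask, whereas `levels` lists each distinct label only once.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo"], ["one", "two"]], names=["first", "second"]
)
s = pd.Series(np.arange(6), index=index)

# One label per row, so the comparison yields a mask of length 6
mask = s.index.get_level_values("second") == "two"
print(s[mask])

# Only the distinct labels of the second level: 'one' and 'two'
print(s.index.levels[1])
```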