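Older versions of pandas offered Panel and Panel4D objects for three- and four-dimensional data (both have since been removed from the library), but hierarchical indexing lets you represent such data in one-dimensional Series and two-dimensional DataFrame objects.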

In [1]:
import pandas as pd
import numpy as np

In [8]:
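# Populations keyed by (state, year) tuples.
# Note: ('NewYork', 2000) appears twice, so NewYork has no 2010 entry.
index = [('California', 2000), ('California', 2010),
         ('NewYork', 2000), ('NewYork', 2000),
         ('Texas', 2000), ('Texas', 2010)]
populations = [110, 120,
               130, 140,
               150, 160]
pop = pd.Series(populations, index=index)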
pop

(California, 2000)    110
(California, 2010)    120
(NewYork, 2000)       130
(NewYork, 2000)       140
(Texas, 2000)         150
(Texas, 2010)         160
dtype: int64

An inefficient way to index: filtering the tuple keys with a Python-level loop

In [9]:
pop[[i for i in pop.index if i[1]==2010]]

(California, 2010)    120
(Texas, 2010)         160
dtype: int64

An efficient way to index

<b>Pandas MultiIndex</b>
- Note that each tuple is split into separate index levels.

In [12]:
index = pd.MultiIndex.from_tuples(index)
index

MultiIndex([('California', 2000),
            ('California', 2010),
            (   'NewYork', 2000),
            (   'NewYork', 2000),
            (     'Texas', 2000),
            (     'Texas', 2010)],
           )

In [13]:
pop = pop.reindex(index)
pop

California  2000    110
            2010    120
NewYork     2000    130
            2000    140
Texas       2000    150
            2010    160
dtype: int64

In [15]:
# Access all data whose second index level is 2010
pop[:, 2010]

California    120
Texas         160
dtype: int64
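
The same kind of selection can also be written with `.loc` or `Series.xs`. The following is a minimal sketch on a small made-up Series (`s` and its values are hypothetical, not the notebook's `pop`), assuming an index with no duplicate entries:

```python
import pandas as pd

# Hypothetical data, for illustration only (not the notebook's pop Series)
idx = pd.MultiIndex.from_tuples([('A', 2000), ('A', 2010),
                                 ('B', 2000), ('B', 2010)])
s = pd.Series([1, 2, 3, 4], index=idx)

# Two explicit ways to select every row whose second level equals 2010
by_loc = s.loc[:, 2010]        # label-based selection on both levels
by_xs = s.xs(2010, level=1)    # cross-section on level 1

print(by_loc)
print(by_xs)
```

Both calls drop the selected level, so the result is indexed by the first level only.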

In [17]:
# Add a column: combine the Series with a second column in a DataFrame
pop_df = pd.DataFrame({'total': pop,
                       'under18': [1, 2,
                                   3, 4,
                                   5, 6]})
pop_df

                 total  under18
California 2000    110        1
           2010    120        2
NewYork    2000    130        3
           2000    140        4
Texas      2000    150        5
           2010    160        6
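
The opening note says that hierarchical indexing lets lower-dimensional objects hold higher-dimensional data. One way to see this is `unstack()`, which pivots the innermost index level of a Series into columns. The sketch below uses made-up data (`s` is hypothetical); `unstack()` requires unique index entries, so the keys here are all distinct:

```python
import pandas as pd

# Hypothetical data with unique (state, year) keys
idx = pd.MultiIndex.from_tuples([('A', 2000), ('A', 2010),
                                 ('B', 2000), ('B', 2010)],
                                names=['state', 'year'])
s = pd.Series([1, 2, 3, 4], index=idx)

# Pivot the innermost level (year) into columns:
# rows are states, columns are years
wide = s.unstack()
print(wide)
```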


Methods for creating a MultiIndex
- The simplest way is to pass a list of two or more index arrays to the constructor.

In [19]:
df = pd.DataFrame(np.random.rand(4,2),
                  index = [['a','a','b','b'], [1,2,1,2]],
                  columns=['data1','data2'])
df


        data1     data2
a 1  0.893739  0.438747
  2  0.905438  0.924871
b 1  0.939831  0.092994
  2  0.328136  0.854503


In [23]:
print(pd.MultiIndex.from_arrays([['a','a','b','b'],[1,2,1,2]]))
print(pd.MultiIndex.from_tuples([('a',1),('a',2),('b',1),('b',2)]))

MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )
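
When the index is the full Cartesian product of the level values, as it is here, `pd.MultiIndex.from_product` is a third option. A short sketch, assuming the same `'a'`/`'b'` and `1`/`2` labels as above:

```python
import pandas as pd

# Build the index from the Cartesian product of the two label lists
mi = pd.MultiIndex.from_product([['a', 'b'], [1, 2]])
print(mi)

# For comparison: the same index built from parallel arrays
mi_arrays = pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]])
print(mi.equals(mi_arrays))
```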


Naming the MultiIndex levels

In [25]:
pop.index.names = ['state','year']
pop

state       year
California  2000    110
            2010    120
NewYork     2000    130
            2000    140
Texas       2000    150
            2010    160
dtype: int64
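
Level names can also be supplied when the index is built, through the `names` argument, and then used in place of positional level numbers. A minimal sketch with made-up data (`s` is hypothetical, not the notebook's `pop`):

```python
import pandas as pd

# Hypothetical data; names are set at construction time
idx = pd.MultiIndex.from_product([['A', 'B'], [2000, 2010]],
                                 names=['state', 'year'])
s = pd.Series([1, 2, 3, 4], index=idx)

# Refer to a level by name instead of by position
year_2010 = s.xs(2010, level='year')
states = s.index.get_level_values('state')

print(year_2010)
print(states)
```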