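# MultiIndex Indexing and Slicing

#### [ Accessing data in a MultiIndex Series ]
#### [ Accessing data in a MultiIndex DataFrame ]
#### [ Sorted and unsorted indices ]
- **Slicing a MultiIndex requires the index to be sorted**

#### [ Setting and resetting the index ]
#### [ Aggregating data on a MultiIndex ]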

In [1]:
import pandas as pd
import numpy as np

print("pandas ver : ",pd.__version__)
print("numpy ver : ",np.__version__)

pandas ver :  0.24.2
numpy ver :  1.16.4


In [2]:
index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]

populations = [33871648, 37253956,
               18976457, 19378102,
               20851820, 25145561]

pop = pd.Series(populations, index=index)
print(pop)
print(type(pop))

(California, 2000)    33871648
(California, 2010)    37253956
(New York, 2000)      18976457
(New York, 2010)      19378102
(Texas, 2000)         20851820
(Texas, 2010)         25145561
dtype: int64
<class 'pandas.core.series.Series'>


In [3]:
mIndex = pd.MultiIndex.from_tuples(index)
print(mIndex)

MultiIndex(levels=[['California', 'New York', 'Texas'], [2000, 2010]],
           codes=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])


In [5]:
pop = pop.reindex(mIndex)
print(pop)
print(type(pop))

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64
<class 'pandas.core.series.Series'>
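As a possible alternative to building the Series with tuple keys and then calling `reindex`, the `MultiIndex` can be passed straight to the `Series` constructor. The sketch below assumes the same population data; `m_index` and `pop_direct` are names introduced here only for illustration.

```python
import pandas as pd

index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956,
               18976457, 19378102,
               20851820, 25145561]

# names= labels the two levels at construction time
m_index = pd.MultiIndex.from_tuples(index, names=['states', 'year'])

# Passing the MultiIndex directly avoids the separate reindex step
pop_direct = pd.Series(populations, index=m_index)
print(pop_direct)
```

Naming the levels up front means later calls can refer to `'states'` or `'year'` instead of level positions.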


In [6]:
pop_df = pop.unstack()
pop_df

                2000      2010
California  33871648  37253956
New York    18976457  19378102
Texas       20851820  25145561


In [8]:
pop_df = pd.DataFrame({'total': pop,
                       'under18': [9267089, 9284094,
                                   4687374, 4318033,
                                   5906301, 6879014]})
pop_df

                    total  under18
California 2000  33871648  9267089
           2010  37253956  9284094
New York   2000  18976457  4687374
           2010  19378102  4318033
Texas      2000  20851820  5906301
           2010  25145561  6879014


In [10]:
pop.index.names = ['states','year']
print(pop)
print(type(pop))

states      year
California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64
<class 'pandas.core.series.Series'>


### [ Accessing data in a MultiIndex Series ]

- Access a single value by giving the keys in index-level order

In [14]:
pop['California', 2010]

37253956

- Partial indexing: you can index on just one of the index levels; the result is indexed by the remaining level

In [16]:
ser = pop['Texas']
print(ser)
print(type(ser))

year
2000    20851820
2010    25145561
dtype: int64
<class 'pandas.core.series.Series'>


- If the MultiIndex is sorted, partial slicing is also possible

In [21]:
ser3 = pop['California':'New York']
print(ser3)
print(type(ser3))

states      year
California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
dtype: int64
<class 'pandas.core.series.Series'>


In [20]:
ser2 = pop[:,2000]
print(ser2)
print(type(ser2))

states
California    33871648
New York      18976457
Texas         20851820
dtype: int64
<class 'pandas.core.series.Series'>


- Indexing with a Boolean mask

In [22]:
pop[pop>22000000]

states      year
California  2000    33871648
            2010    37253956
Texas       2010    25145561
dtype: int64

- Fancy indexing

In [25]:
pop[['California', 'Texas']]

states      year
California  2000    33871648
            2010    37253956
Texas       2000    20851820
            2010    25145561
dtype: int64

In [26]:
# Get the 2010 population of California and Texas
pop[['California', 'Texas']][:,2010]

states
California    37253956
Texas         25145561
dtype: int64
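The chained selection above can also be written with `xs()`, which takes a cross-section on a named level, followed by `.loc`. This is a minimal sketch that assumes the `pop` Series with levels named `states` and `year`; `pop_2010` and `selected` are illustrative names.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('California', 2000), ('California', 2010),
     ('New York', 2000), ('New York', 2010),
     ('Texas', 2000), ('Texas', 2010)],
    names=['states', 'year'])
pop = pd.Series([33871648, 37253956,
                 18976457, 19378102,
                 20851820, 25145561], index=index)

# Cross-section on the inner level: keep only year == 2010
pop_2010 = pop.xs(2010, level='year')

# The result is indexed by state, so a plain label list works
selected = pop_2010.loc[['California', 'Texas']]
print(selected)
```

Selecting the level by name keeps the intent readable even when the level order changes.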

### [ Accessing data in a MultiIndex DataFrame ]

In [28]:
# hierarchical indices and columns
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])

# mock some data
data = np.round(np.random.randn(4, 6), 1)
data[:, ::2] *= 10
data += 37

# create the DataFrame
health_data = pd.DataFrame(data, index=index, columns=columns)
print(health_data)

subject      Bob       Guido         Sue      
type          HR  Temp    HR  Temp    HR  Temp
year visit                                    
2013 1      37.0  37.6  30.0  37.5  37.0  35.1
     2      46.0  37.3  36.0  35.9  51.0  36.1
2014 1      33.0  37.8  32.0  37.1  34.0  37.5
     2      25.0  37.4  20.0  36.9  31.0  35.6


- By default, indexing with a key applies to the columns; with MultiIndex columns, give the keys in level order

In [29]:
df = health_data['Guido', 'HR']
print(df)
print(type(df))

year  visit
2013  1        30.0
      2        36.0
2014  1        32.0
      2        20.0
Name: (Guido, HR), dtype: float64
<class 'pandas.core.series.Series'>


- The indexers (`loc`, `iloc`) can also be used

In [42]:
health_data

subject      Bob       Guido         Sue
type          HR  Temp    HR  Temp    HR  Temp
year visit
2013 1      37.0  37.6  30.0  37.5  37.0  35.1
     2      46.0  37.3  36.0  35.9  51.0  36.1
2014 1      33.0  37.8  32.0  37.1  34.0  37.5
     2      25.0  37.4  20.0  36.9  31.0  35.6


- `.iloc[row, column]`: access by integer position (implicit index)

In [49]:
# .iloc[row, column]: access by integer position (implicit index)
df2 =health_data.iloc[2:4, 4:]
print(df2)
print(type(df2))

subject      Sue      
type          HR  Temp
year visit            
2014 1      34.0  37.5
     2      31.0  35.6
<class 'pandas.core.frame.DataFrame'>


- `.loc[row, column]`: access by index key; keys can be given explicitly as tuples (writing a slice such as `:` inside a tuple is not allowed and raises a syntax error)

In [58]:
# .loc[row, column]: access by index key; keys can be given as tuples (no slices inside a tuple)
ser4 = health_data.loc[ : ,('Bob','HR') ]
print(ser4)
print(type(ser4))

year  visit
2013  1        37.0
      2        46.0
2014  1        33.0
      2        25.0
Name: (Bob, HR), dtype: float64
<class 'pandas.core.series.Series'>


In [59]:
# Access a single value
val = health_data.loc[(2013,2) , ('Bob','HR') ]
print(val)
print(type(val))

46.0
<class 'numpy.float64'>


In [60]:
ser5 = health_data.loc[(2013)  ,('Bob','HR') ]
print(ser5)
print(type(ser5))

visit
1    37.0
2    46.0
Name: (Bob, HR), dtype: float64
<class 'pandas.core.series.Series'>


In [61]:
ser5.iloc[0]

37.0

- Python's built-in `slice()` lets you **write slices explicitly** (for example, `slice(None)` stands for `:`)
- pandas also provides the `IndexSlice` object
  - `pandas.IndexSlice = <pandas.core.indexing._IndexSlice object>` : Create an object to more easily perform multi-index slicing

In [65]:
idx = pd.IndexSlice
health_data.loc[idx[ : ,1], idx[:,'HR']]

subject      Bob Guido   Sue
type          HR    HR    HR
year visit
2013 1      37.0  30.0  37.0
2014 1      33.0  32.0  34.0
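The same selection can be written with the built-in `slice()` instead of `IndexSlice`. The sketch below rebuilds `health_data` from the values printed earlier (an assumption made so the example is reproducible, since the original data are random) and compares both spellings; `by_slice` and `by_idx` are illustrative names.

```python
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
values = [[37.0, 37.6, 30.0, 37.5, 37.0, 35.1],
          [46.0, 37.3, 36.0, 35.9, 51.0, 36.1],
          [33.0, 37.8, 32.0, 37.1, 34.0, 37.5],
          [25.0, 37.4, 20.0, 36.9, 31.0, 35.6]]
health_data = pd.DataFrame(values, index=index, columns=columns)

# slice(None) plays the role of ':' inside a tuple key
by_slice = health_data.loc[(slice(None), 1), (slice(None), 'HR')]

# Equivalent selection with IndexSlice
idx = pd.IndexSlice
by_idx = health_data.loc[idx[:, 1], idx[:, 'HR']]

print(by_slice.equals(by_idx))
```

`IndexSlice` is mostly a readability aid: it lets the usual `:` syntax appear where a tuple would otherwise need `slice(None)`.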


### [ Sorted (lexicographic order) and unsorted indices ]

- Declare data with an unsorted index

In [68]:
index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])
data = pd.Series(np.random.rand(6), index=index)
data.index.names = ['char', 'int']
print(data)
print(type(data))

char  int
a     1      0.250880
      2      0.317349
c     1      0.762916
      2      0.232398
b     1      0.073649
      2      0.407146
dtype: float64
<class 'pandas.core.series.Series'>


- Slicing a Series with an unsorted index -> **raises an error**

In [71]:
try:
    data['a':'b']
except KeyError as e:
    # UnsortedIndexError is a subclass of KeyError, so it is caught here
    print(type(e))
    print(e)

<class 'pandas.errors.UnsortedIndexError'>
'Key length (1) was greater than MultiIndex lexsort depth (0)'


- pandas provides methods for sorting the index
  - `sort_index()`, `sortlevel()`, and so on
- `Series.sort_index(self, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True)`
- `DataFrame.sort_index(self, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, by=None)`
- `MultiIndex.sortlevel(self, level=0, ascending=True, sort_remaining=True)`

In [86]:
# Sort the index
print("Before sorting:\n", data)
print(type(data))
sorted_data = data.sort_index()
print("\n")
print("After sorting:\n", sorted_data)
print(type(sorted_data))

Before sorting:
 char  int
a     1      0.250880
      2      0.317349
c     1      0.762916
      2      0.232398
b     1      0.073649
      2      0.407146
dtype: float64
<class 'pandas.core.series.Series'>


After sorting:
 char  int
a     1      0.250880
      2      0.317349
b     1      0.073649
      2      0.407146
c     1      0.762916
      2      0.232398
dtype: float64
<class 'pandas.core.series.Series'>


In [87]:
# Slicing on the sorted data
try:
    print(sorted_data['a':'b'])
except KeyError as e:
    print(type(e))
    print(e)

char  int
a     1      0.250880
      2      0.317349
b     1      0.073649
      2      0.407146
dtype: float64
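One way to avoid the error is to check whether the index is sorted before slicing. The helper below is a hypothetical sketch (`slice_first_level` is not a pandas function); it relies on `MultiIndex.is_monotonic_increasing` and sorts only when needed. The values are copied from the output above.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]],
                                   names=['char', 'int'])
data = pd.Series([0.250880, 0.317349, 0.762916,
                  0.232398, 0.073649, 0.407146], index=index)


def slice_first_level(ser, start, stop):
    """Label-slice the outer level, sorting the index first if it is unsorted."""
    if not ser.index.is_monotonic_increasing:
        # sort_index() returns a new Series; the caller's data is untouched
        ser = ser.sort_index()
    return ser.loc[start:stop]


print(data.index.is_monotonic_increasing)
result = slice_first_level(data, 'a', 'b')
print(result)
```

Sorting once and keeping the sorted object is usually preferable to sorting inside every lookup; the helper only illustrates the check.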


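- `stack()`, `unstack()`: `unstack()` moves an index level into the columns, and `stack()` moves a column level back into the index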

In [90]:
sorted_data

char  int
a     1      0.250880
      2      0.317349
b     1      0.073649
      2      0.407146
c     1      0.762916
      2      0.232398
dtype: float64

In [93]:
df5 = sorted_data.unstack(level=0)
print(df5)
print(type(df5))

char         a         b         c
int                               
1     0.250880  0.073649  0.762916
2     0.317349  0.407146  0.232398
<class 'pandas.core.frame.DataFrame'>


In [94]:
df6 = sorted_data.unstack(level=1)
print(df6)
print(type(df6))

int          1         2
char                    
a     0.250880  0.317349
b     0.073649  0.407146
c     0.762916  0.232398
<class 'pandas.core.frame.DataFrame'>


In [96]:
df7 = sorted_data.unstack()
print(df7)
print(type(df7))

int          1         2
char                    
a     0.250880  0.317349
b     0.073649  0.407146
c     0.762916  0.232398
<class 'pandas.core.frame.DataFrame'>


In [101]:
df7.stack(level=0)

char  int
a     1      0.250880
      2      0.317349
b     1      0.073649
      2      0.407146
c     1      0.762916
      2      0.232398
dtype: float64
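The round trip between the long and wide layouts can be sketched as follows, using the sorted values printed above. `wide` and `long_again` are illustrative names, and the level is addressed by its name `'int'` rather than by position.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['a', 'b', 'c'], [1, 2]],
                                   names=['char', 'int'])
sorted_data = pd.Series([0.250880, 0.317349, 0.073649,
                         0.407146, 0.762916, 0.232398], index=index)

# The 'int' level becomes the columns: a 3 x 2 DataFrame
wide = sorted_data.unstack(level='int')

# The columns go back to the innermost index level: a Series again
long_again = wide.stack()

print(wide)
print(long_again)
```

Because the data has no missing combinations, stacking the wide table recovers the original values in the original order.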

### [ Setting and resetting the index ]

In [104]:
pop

states      year
California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

- One way to rearrange hierarchical data is to turn the index labels into ordinary columns
- Use `reset_index()` for this
  - `DataFrame.reset_index(self, level=None, drop=False, inplace=False, col_level=0, col_fill='')`
  - `Series.reset_index(self, level=None, drop=False, name=None, inplace=False)`

In [107]:
reset_index_pop = pop.reset_index()
print(reset_index_pop)
print(type(reset_index_pop))

       states  year         0
0  California  2000  33871648
1  California  2010  37253956
2    New York  2000  18976457
3    New York  2010  19378102
4       Texas  2000  20851820
5       Texas  2010  25145561
<class 'pandas.core.frame.DataFrame'>


- The `name` parameter sets the column name for the Series' data values

In [110]:
reset_index_pop2 = pop.reset_index(name='population')
print(reset_index_pop2)
print(type(reset_index_pop2))

       states  year  population
0  California  2000    33871648
1  California  2010    37253956
2    New York  2000    18976457
3    New York  2010    19378102
4       Texas  2000    20851820
5       Texas  2010    25145561
<class 'pandas.core.frame.DataFrame'>


- Resetting the index of a Series turns it into a DataFrame
- Resetting the index of a DataFrame returns a DataFrame
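The reverse operation is `set_index()`, which builds a MultiIndex from existing columns. A minimal sketch, assuming the `pop` Series with levels named `states` and `year`; `flat` and `restored` are names introduced for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('California', 2000), ('California', 2010),
     ('New York', 2000), ('New York', 2010),
     ('Texas', 2000), ('Texas', 2010)],
    names=['states', 'year'])
pop = pd.Series([33871648, 37253956,
                 18976457, 19378102,
                 20851820, 25145561], index=index)

# Index levels become ordinary columns
flat = pop.reset_index(name='population')

# A list of column names rebuilds the two-level index
restored = flat.set_index(['states', 'year'])['population']
print(restored)
```

Selecting the `population` column at the end turns the one-column DataFrame back into a Series.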

In [117]:
reset_index_pop['states']

0    California
1    California
2      New York
3      New York
4         Texas
5         Texas
Name: states, dtype: object

In [121]:
reset_index_pop2.keys()

Index(['states', 'year', 'population'], dtype='object')

In [127]:
health_data

subject      Bob       Guido         Sue
type          HR  Temp    HR  Temp    HR  Temp
year visit
2013 1      37.0  37.6  30.0  37.5  37.0  35.1
     2      46.0  37.3  36.0  35.9  51.0  36.1
2014 1      33.0  37.8  32.0  37.1  34.0  37.5
     2      25.0  37.4  20.0  36.9  31.0  35.6


In [129]:
reset_index_health_data = health_data.reset_index()
print(reset_index_health_data)
print(type(reset_index_health_data))

subject  year visit   Bob       Guido         Sue      
type                   HR  Temp    HR  Temp    HR  Temp
0        2013     1  37.0  37.6  30.0  37.5  37.0  35.1
1        2013     2  46.0  37.3  36.0  35.9  51.0  36.1
2        2014     1  33.0  37.8  32.0  37.1  34.0  37.5
3        2014     2  25.0  37.4  20.0  36.9  31.0  35.6
<class 'pandas.core.frame.DataFrame'>


In [130]:
reset_index_health_data

subject  year visit   Bob       Guido         Sue
type                   HR  Temp    HR  Temp    HR  Temp
0        2013     1  37.0  37.6  30.0  37.5  37.0  35.1
1        2013     2  46.0  37.3  36.0  35.9  51.0  36.1
2        2014     1  33.0  37.8  32.0  37.1  34.0  37.5
3        2014     2  25.0  37.4  20.0  36.9  31.0  35.6


In [131]:
reset_index_health_data.columns

MultiIndex(levels=[['Bob', 'Guido', 'Sue', 'visit', 'year'], ['HR', 'Temp', '']],
           codes=[[4, 3, 0, 0, 1, 1, 2, 2], [2, 2, 0, 1, 0, 1, 0, 1]],
           names=['subject', 'type'])

In [136]:
print(reset_index_health_data['Bob'])
print(type(reset_index_health_data['Bob']))

type    HR  Temp
0     37.0  37.6
1     46.0  37.3
2     33.0  37.8
3     25.0  37.4
<class 'pandas.core.frame.DataFrame'>


In [137]:
print(reset_index_health_data['Bob']['Temp'])
print(type(reset_index_health_data['Bob']['Temp']))

0    37.6
1    37.3
2    37.8
3    37.4
Name: Temp, dtype: float64
<class 'pandas.core.series.Series'>


In [134]:
print(reset_index_health_data['year'])
print(type(reset_index_health_data['year']))

0    2013
1    2013
2    2014
3    2014
Name: year, dtype: int64
<class 'pandas.core.series.Series'>


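### [ Aggregating data on a MultiIndex ]

- Use aggregation methods such as mean(), sum(), max(), and min()
- The **level** parameter selects which index level the data is aggregated over
- To refer to a level by name, the index and columns must have names set (a level can also be given by integer position)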

In [140]:
health_data.keys()

MultiIndex(levels=[['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
           codes=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['subject', 'type'])

In [141]:
health_data.values

array([[37. , 37.6, 30. , 37.5, 37. , 35.1],
       [46. , 37.3, 36. , 35.9, 51. , 36.1],
       [33. , 37.8, 32. , 37.1, 34. , 37.5],
       [25. , 37.4, 20. , 36.9, 31. , 35.6]])

In [142]:
health_data

subject      Bob       Guido         Sue
type          HR  Temp    HR  Temp    HR  Temp
year visit
2013 1      37.0  37.6  30.0  37.5  37.0  35.1
     2      46.0  37.3  36.0  35.9  51.0  36.1
2014 1      33.0  37.8  32.0  37.1  34.0  37.5
     2      25.0  37.4  20.0  36.9  31.0  35.6


- `level`: takes a level of the row index (when `axis=0` or when `axis` is omitted)

- `sum()`

In [143]:
year_sum = health_data.sum(level='year')
year_sum

subject   Bob       Guido         Sue
type       HR  Temp    HR  Temp    HR  Temp
year
2013     83.0  74.9  66.0  73.4  88.0  71.2
2014     58.0  75.2  52.0  74.0  65.0  73.1


In [144]:
visit_sum = health_data.sum(level='visit')
visit_sum

subject   Bob       Guido         Sue
type       HR  Temp    HR  Temp    HR  Temp
visit
1        70.0  75.4  62.0  74.6  71.0  72.6
2        71.0  74.7  56.0  72.8  82.0  71.7


- `max()`

In [149]:
year_max = health_data.max(level='year')
year_max

subject   Bob       Guido         Sue
type       HR  Temp    HR  Temp    HR  Temp
year
2013     46.0  37.6  36.0  37.5  51.0  36.1
2014     33.0  37.8  32.0  37.1  34.0  37.5


In [154]:
visit_max = health_data.max(level='visit')
visit_max

subject   Bob       Guido         Sue
type       HR  Temp    HR  Temp    HR  Temp
visit
1        37.0  37.8  32.0  37.5  37.0  37.5
2        46.0  37.4  36.0  36.9  51.0  36.1


- `mean()`

In [155]:
year_mean = health_data.mean(level='year')
year_mean

subject   Bob        Guido         Sue
type       HR   Temp    HR  Temp    HR   Temp
year
2013     41.5  37.45  33.0  36.7  44.0  35.60
2014     29.0  37.60  26.0  37.0  32.5  36.55


In [156]:
visit_mean = health_data.mean(level='visit')
visit_mean

subject   Bob        Guido         Sue
type       HR   Temp    HR  Temp    HR   Temp
visit
1        35.0  37.70  31.0  37.3  35.5  36.30
2        35.5  37.35  28.0  36.4  41.0  35.85
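The `level` argument of `sum()`, `mean()`, `max()`, and similar reductions works in the pandas 0.24 used here, but it was deprecated in pandas 1.3 and removed in 2.0. On newer versions the same row-level aggregation can be written with `groupby(level=...)`, which also works on older releases. The sketch below rebuilds `health_data` from the values printed above so the numbers can be compared.

```python
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
values = [[37.0, 37.6, 30.0, 37.5, 37.0, 35.1],
          [46.0, 37.3, 36.0, 35.9, 51.0, 36.1],
          [33.0, 37.8, 32.0, 37.1, 34.0, 37.5],
          [25.0, 37.4, 20.0, 36.9, 31.0, 35.6]]
health_data = pd.DataFrame(values, index=index, columns=columns)

# Same intent as health_data.sum(level='year')
year_sum = health_data.groupby(level='year').sum()

# Same intent as health_data.mean(level='visit')
visit_mean = health_data.groupby(level='visit').mean()

print(year_sum)
print(visit_mean)
```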


- The `axis` parameter also allows aggregating over column levels
  - axis = 0 : index
  - axis = 1 : columns

In [159]:
health_data

subject      Bob       Guido         Sue
type          HR  Temp    HR  Temp    HR  Temp
year visit
2013 1      37.0  37.6  30.0  37.5  37.0  35.1
     2      46.0  37.3  36.0  35.9  51.0  36.1
2014 1      33.0  37.8  32.0  37.1  34.0  37.5
     2      25.0  37.4  20.0  36.9  31.0  35.6


- `sum(axis= 1, ..)`

In [166]:
subject_sum = health_data.sum(axis=1, level='subject')
subject_sum

subject      Bob  Guido   Sue
year visit
2013 1      74.6   67.5  72.1
     2      83.3   71.9  87.1
2014 1      70.8   69.1  71.5
     2      62.4   56.9  66.6


In [169]:
type_sum = health_data.sum(axis=1, level='type')
type_sum

type           HR   Temp
year visit
2013 1      104.0  110.2
     2      133.0  109.3
2014 1       99.0  112.4
     2       76.0  109.9


- `max(axis=1, ... )`

In [188]:
subject_max = health_data.max(axis=1, level='subject')
subject_max

subject      Bob  Guido   Sue
year visit
2013 1      37.6   37.5  37.0
     2      46.0   36.0  51.0
2014 1      37.8   37.1  37.5
     2      37.4   36.9  35.6


In [189]:
type_max = health_data.max(axis=1, level='type')
type_max

type          HR  Temp
year visit
2013 1      37.0  37.6
     2      51.0  37.3
2014 1      34.0  37.8
     2      31.0  37.4


- `mean(axis=1, ...)`

In [190]:
subject_mean = health_data.mean(axis=1, level='subject')
subject_mean

subject       Bob  Guido    Sue
year visit
2013 1      37.30  33.75  36.05
     2      41.65  35.95  43.55
2014 1      35.40  34.55  35.75
     2      31.20  28.45  33.30


In [191]:
type_mean = health_data.mean(axis=1, level='type')
type_mean

type               HR       Temp
year visit
2013 1      34.666667  36.733333
     2      44.333333  36.433333
2014 1      33.000000  37.466667
     2      25.333333  36.633333
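For pandas versions where reductions no longer accept `level`, one possible way to aggregate over a column level is to transpose, group by that level, and transpose back. This is a sketch of that workaround rather than the only option; it rebuilds `health_data` from the printed values.

```python
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
values = [[37.0, 37.6, 30.0, 37.5, 37.0, 35.1],
          [46.0, 37.3, 36.0, 35.9, 51.0, 36.1],
          [33.0, 37.8, 32.0, 37.1, 34.0, 37.5],
          [25.0, 37.4, 20.0, 36.9, 31.0, 35.6]]
health_data = pd.DataFrame(values, index=index, columns=columns)

# After .T the column levels are row levels, so groupby(level=...) applies
type_mean = health_data.T.groupby(level='type').mean().T
subject_sum = health_data.T.groupby(level='subject').sum().T

print(type_mean)
print(subject_sum)
```

The transposes are cheap for small frames like this one; for large frames it may be worth checking the cost.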


###### Example) mean HR per year in health_data

In [194]:
result_year_mean = health_data.mean(level='year')
print(result_year_mean)
print(type(result_year_mean))

subject   Bob        Guido         Sue       
type       HR   Temp    HR  Temp    HR   Temp
year                                         
2013     41.5  37.45  33.0  36.7  44.0  35.60
2014     29.0  37.60  26.0  37.0  32.5  36.55
<class 'pandas.core.frame.DataFrame'>


In [198]:
result_year_type_mean= result_year_mean.mean(axis=1, level='type')
print(result_year_type_mean)
print(type(result_year_type_mean))

type         HR       Temp
year                      
2013  39.500000  36.583333
2014  29.166667  37.050000
<class 'pandas.core.frame.DataFrame'>


In [200]:
result = result_year_type_mean['HR']
print(result)
print(type(result))

year
2013    39.500000
2014    29.166667
Name: HR, dtype: float64
<class 'pandas.core.series.Series'>
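A shorter route to the same numbers is to select the `HR` columns first with `xs()` and then average. The sketch below assumes the `health_data` values printed earlier; `hr` and `hr_by_year` are illustrative names. Averaging the per-subject means equals the overall mean here only because every year and subject has the same number of observations.

```python
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
values = [[37.0, 37.6, 30.0, 37.5, 37.0, 35.1],
          [46.0, 37.3, 36.0, 35.9, 51.0, 36.1],
          [33.0, 37.8, 32.0, 37.1, 34.0, 37.5],
          [25.0, 37.4, 20.0, 36.9, 31.0, 35.6]]
health_data = pd.DataFrame(values, index=index, columns=columns)

# Keep only the HR columns; the 'type' level is dropped from the columns
hr = health_data.xs('HR', axis=1, level='type')

# Mean per year for each subject, then mean across subjects
hr_by_year = hr.groupby(level='year').mean().mean(axis=1)
print(hr_by_year)
```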


___

- Declare a DataFrame with no names set

In [178]:
index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956,
               18976457, 19378102,
               20851820, 25145561]
pop2 = pd.Series(populations, index=index)
idx = pd.MultiIndex.from_tuples(index)
pop2 = pop2.reindex(idx)

s_pop = pd.DataFrame({'total': pop2})
s_pop

                    total
California 2000  33871648
           2010  37253956
New York   2000  18976457
           2010  19378102
Texas      2000  20851820
           2010  25145561


In [187]:
# Check the index and columns names
print(s_pop.index.names)
print(s_pop.columns.names)
print(health_data.index.names)
print(health_data.columns.names)

[None, None]
[None]
['year', 'visit']
['subject', 'type']
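When the levels have no names, as in `s_pop`, a level can still be addressed by integer position, or the names can be assigned afterwards. The following sketch shows both with `groupby`; `by_position` and `by_name` are illustrative names.

```python
import pandas as pd

index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956,
               18976457, 19378102,
               20851820, 25145561]
s_pop = pd.DataFrame({'total': populations},
                     index=pd.MultiIndex.from_tuples(index))

# No names yet: refer to the outer level by position
by_position = s_pop.groupby(level=0).sum()

# After naming the levels, the name can be used instead
s_pop.index.names = ['states', 'year']
by_name = s_pop.groupby(level='states').sum()

print(by_name)
```

Both calls group on the same outer level, so the totals agree; only the way the level is referenced differs.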


___