In [2]:
import pandas as pd

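# Restructuring Data into a Tidy Form
This chapter covers how to reshape a dataset into **tidy data**, a form that is easy to analyze. So what is tidy data? The data scientist Hadley Wickham proposed three principles for deciding whether a dataset is tidy.
* Each variable forms a column.
* Each observation forms a row.
* Each type of observational unit forms a table.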

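First, we need to know what variables, observations, and observational units are.
* Variable
 * e.g. gender, race
* Variable values
 * e.g. male/female, Black/White; an observation is the full set of variable values recorded for one observational unit
* Observational unit
 * In a retail store dataset, employee information and customer information can each be treated as an observational unit
 * Merging them into the same table violates the tidy data principles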
 
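Hadley lists the five most common forms of messy data.
* Column names are values, not variable names
* Multiple variables are stored in column names
* Variables are stored in both rows and columns
* Multiple types of observational unit are stored in the same table
* A single observational unit is stored in multiple tables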

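Tidying data does not mean changing the values of a dataset, filling in missing values, or performing analysis. **Tidying data means changing the shape or structure of the data so that it conforms to the tidy principles.**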

In pandas, the DataFrame methods `stack`, `melt`, `unstack`, and `pivot` are the main tools for tidying. More complex tidying also requires pulling text apart, which is done with the `str` accessor. Helper methods such as `rename`, `rename_axis`, `reset_index`, and `set_index` help apply the finishing touches to tidy data.
> Getter methods are also commonly called accessors.
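
As a rough illustration of the `str` accessor mentioned above, the following sketch splits text labels into separate pieces. The labels and the column names `role` and `number` are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical labels that pack two pieces of information into one string.
labels = pd.Series(['actor_1', 'actor_2', 'director_1'])

# str.split with expand=True returns a DataFrame with one column per piece.
parts = labels.str.split('_', expand=True)
parts.columns = ['role', 'number']

# The split pieces are strings, so convert the numeric part explicitly.
parts['number'] = parts['number'].astype(int)
print(parts)
```

The accessor applies the string operation to every element at once, which is why it is useful when a column or label mixes several variables.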

## Tidying variable values stored as column names with stack

In [3]:
state_fruit = pd.read_csv('data/state_fruit.csv', index_col=0)
state_fruit

Unnamed: 0,Apple,Orange,Banana
Texas,12,10,40
Arizona,9,7,12
Florida,0,14,190


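The table above looks simple, and its information seems easy to use. According to the tidy principles, however, it is not tidy. The first step in turning messy data into tidy data is to identify all the variables. This dataset has the variables state and fruit. There is also numeric data scattered throughout with no context to explain it. That variable can be labeled with a meaningful name.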

### Getting ready
This dataset uses variable values as column names. This recipe uses the `stack` method to restructure the DataFrame into a tidy form.

### How to do it

In [4]:
#1
# The stack method takes all the column names and reshapes them vertically into a single index level.
# Note that the returned Series holds the same number of values as the original DataFrame.
state_fruit.stack()

Texas    Apple      12
         Orange     10
         Banana     40
Arizona  Apple       9
         Orange      7
         Banana     12
Florida  Apple       0
         Orange     14
         Banana    190
dtype: int64

In [5]:
#2
# Use reset_index to convert the Series with a MultiIndex into a DataFrame with a single-level index.
state_fruit_tidy = state_fruit.stack().reset_index()
state_fruit_tidy

Unnamed: 0,level_0,level_1,0
0,Texas,Apple,12
1,Texas,Orange,10
2,Texas,Banana,40
3,Arizona,Apple,9
4,Arizona,Orange,7
5,Arizona,Banana,12
6,Florida,Apple,0
7,Florida,Orange,14
8,Florida,Banana,190


> * [df.reset_index()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reset_index.html)
> * [s.reset_index()](http://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.Series.reset_index.html)

In [6]:
#3
# By default, reset_index names the new columns after the index levels. Assign meaningful column names with the following code.
state_fruit_tidy.columns = ['state', 'fruit', 'weight']
state_fruit_tidy

Unnamed: 0,state,fruit,weight
0,Texas,Apple,12
1,Texas,Orange,10
2,Texas,Banana,40
3,Arizona,Apple,9
4,Arizona,Orange,7
5,Arizona,Banana,12
6,Florida,Apple,0
7,Florida,Orange,14
8,Florida,Banana,190


In [8]:
#4
# Alternatively, before calling reset_index, use rename_axis to name each index level of the Series.
state_fruit.stack().rename_axis(['state', 'fruit'])

state    fruit 
Texas    Apple      12
         Orange     10
         Banana     40
Arizona  Apple       9
         Orange      7
         Banana     12
Florida  Apple       0
         Orange     14
         Banana    190
dtype: int64

In [9]:
#5
# When the index levels are named, as in the previous result, reset_index uses those names as the column names.
# The name parameter of reset_index sets the column name for the values of the original Series.
state_fruit.stack().rename_axis(['state', 'fruit']).reset_index(name='weight')

Unnamed: 0,state,fruit,weight
0,Texas,Apple,12
1,Texas,Orange,10
2,Texas,Banana,40
3,Arizona,Apple,9
4,Arizona,Orange,7
5,Arizona,Banana,12
6,Florida,Apple,0
7,Florida,Orange,14
8,Florida,Banana,190


> Every Series has a name attribute that can be set directly or through the rename method. This name becomes the column name when reset_index() is called.
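
The note above can be sketched as follows. The small fruit table is rebuilt inline from the values shown in this recipe so that the example does not depend on the CSV file; the variable names `by_rename` and `by_attribute` are only illustrative.

```python
import pandas as pd

# Rebuild the small fruit table inline (values copied from the recipe's output).
state_fruit = pd.DataFrame(
    {'Apple': [12, 9, 0], 'Orange': [10, 7, 14], 'Banana': [40, 12, 190]},
    index=['Texas', 'Arizona', 'Florida'],
)

stacked = state_fruit.stack().rename_axis(['state', 'fruit'])

# Option 1: rename with a scalar returns a new Series carrying that name.
by_rename = stacked.rename('weight').reset_index()

# Option 2: set the name attribute directly on a copy.
named = stacked.copy()
named.name = 'weight'
by_attribute = named.reset_index()

print(by_rename.columns.tolist())
print(by_rename.equals(by_attribute))
```

Either route gives the values column its name before `reset_index` runs, so the `name` argument is not needed in that call.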

### There's more

In [14]:
state_fruit2 = pd.read_csv('data/state_fruit2.csv')
state_fruit2

Unnamed: 0,State,Apple,Orange,Banana
0,Texas,12,10,40
1,Arizona,9,7,12
2,Florida,0,14,190


In [15]:
state_fruit2.stack()

0  State       Texas
   Apple          12
   Orange         10
   Banana         40
1  State     Arizona
   Apple           9
   Orange          7
   Banana         12
2  State     Florida
   Apple           0
   Orange         14
   Banana        190
dtype: object

When the state names are not in the index, as in the table above, `stack` returns a Series in an unwanted shape. To fix this, set the state names as the index before calling `stack`.

In [17]:
state_fruit2.set_index('State').stack()

State          
Texas    Apple      12
         Orange     10
         Banana     40
Arizona  Apple       9
         Orange      7
         Banana     12
Florida  Apple       0
         Orange     14
         Banana    190
dtype: int64

## Tidying variable values stored as column names with melt
The DataFrame `melt` method behaves much like the `stack` method described in the previous recipe, but it is somewhat more flexible.

### Getting ready
This recipe uses the `melt` method to tidy a simple DataFrame whose column names are variable values (Apple, Orange, Banana).

### How to do it

In [19]:
#1
# Read in the dataset and identify the columns that need to be transformed.
state_fruit2 = pd.read_csv('data/state_fruit2.csv')
state_fruit2

Unnamed: 0,State,Apple,Orange,Banana
0,Texas,12,10,40
1,Arizona,9,7,12
2,Florida,0,14,190


In [20]:
#2
# id_vars: list of column names to keep as columns without reshaping
# value_vars: list of column names to reshape into a single column
state_fruit2.melt(id_vars=['State'], value_vars=['Apple', 'Orange', 'Banana'])

Unnamed: 0,State,variable,value
0,Texas,Apple,12
1,Arizona,Apple,9
2,Florida,Apple,0
3,Texas,Orange,10
4,Arizona,Orange,7
5,Florida,Orange,14
6,Texas,Banana,40
7,Arizona,Banana,12
8,Florida,Banana,190


In [21]:
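#3
# melt refers to the former column names as the variable and to their values as the value.
# var_name: name of the column holding the former column names
# value_name: name of the column holding the values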
state_fruit2.melt(id_vars=['State'], value_vars=['Apple', 'Orange', 'Banana'], var_name='Fruit', value_name='Weight')

Unnamed: 0,State,Fruit,Weight
0,Texas,Apple,12
1,Arizona,Apple,9
2,Florida,Apple,0
3,Texas,Orange,10
4,Arizona,Orange,7
5,Florida,Orange,14
6,Texas,Banana,40
7,Arizona,Banana,12
8,Florida,Banana,190


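### How it works
The `melt` method ignores the index values. If the index holds values you want to keep, reset the index before calling `melt`.
> Melting, stacking, and unpivoting are the common terms for converting horizontal column names into vertical column values.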
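
A minimal sketch of this behavior, using the fruit table rebuilt inline with the state names in the index. The names `lost` and `kept` are only illustrative.

```python
import pandas as pd

# Fruit table with the state names stored in the index.
state_fruit = pd.DataFrame(
    {'Apple': [12, 9, 0], 'Orange': [10, 7, 14], 'Banana': [40, 12, 190]},
    index=['Texas', 'Arizona', 'Florida'],
)

# Melting directly discards the state labels held in the index.
lost = state_fruit.melt(var_name='Fruit', value_name='Weight')

# Naming the index and moving it into a column first preserves them.
kept = (state_fruit
        .rename_axis('State')
        .reset_index()
        .melt(id_vars='State', var_name='Fruit', value_name='Weight'))

print(lost.columns.tolist())
print(kept.columns.tolist())
```

In `lost`, nothing records which state each weight belongs to, while `kept` carries the state as an ordinary identifier column.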

### There's more
When many columns need to be melted, you may want to specify only the identification variables. In that case, calling `melt` as follows gives the same result as step 2.

In [23]:
state_fruit2.melt(id_vars=['State'])

Unnamed: 0,State,variable,value
0,Texas,Apple,12
1,Arizona,Apple,9
2,Florida,Apple,0
3,Texas,Orange,10
4,Arizona,Orange,7
5,Florida,Orange,14
6,Texas,Banana,40
7,Arizona,Banana,12
8,Florida,Banana,190


In [24]:
state_fruit2.melt(id_vars='State')

Unnamed: 0,State,variable,value
0,Texas,Apple,12
1,Arizona,Apple,9
2,Florida,Apple,0
3,Texas,Orange,10
4,Arizona,Orange,7
5,Florida,Orange,14
6,Texas,Banana,40
7,Arizona,Banana,12
8,Florida,Banana,190


## Stacking multiple groups of variables simultaneously

In [26]:
movie = pd.read_csv('data/movie.csv')
actor = movie[['movie_title', 'actor_1_name', 'actor_2_name', 'actor_3_name',
               'actor_1_facebook_likes', 'actor_2_facebook_likes', 'actor_3_facebook_likes']]
actor.head()

Unnamed: 0,movie_title,actor_1_name,actor_2_name,actor_3_name,actor_1_facebook_likes,actor_2_facebook_likes,actor_3_facebook_likes
0,Avatar,CCH Pounder,Joel David Moore,Wes Studi,1000.0,936.0,855.0
1,Pirates of the Caribbean: At World's End,Johnny Depp,Orlando Bloom,Jack Davenport,40000.0,5000.0,1000.0
2,Spectre,Christoph Waltz,Rory Kinnear,Stephanie Sigman,11000.0,393.0,161.0
3,The Dark Knight Rises,Tom Hardy,Christian Bale,Joseph Gordon-Levitt,27000.0,23000.0,23000.0
4,Star Wars: Episode VII - The Force Awakens,Doug Walker,Rob Walker,,131.0,12.0,


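If we define the variables here as movie title, actor name, and number of Facebook likes, then two sets of columns must be stacked separately, which is not possible with a single call to `stack` or `melt`.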

### Getting ready
This recipe uses the `wide_to_long` function to stack the actor names and their corresponding Facebook likes in the actor DataFrame simultaneously.

### How to do it

In [37]:
#1
# To use wide_to_long, the columns to be stacked must be renamed so that they end in a digit. Define a custom function that performs this renaming.
def change_col_name(col_name):
    col_name = col_name.replace('_name', '')
    if 'facebook' in col_name:
        fb_idx = col_name.find('facebook')
        col_name = col_name[:5] + col_name[fb_idx - 1:] + col_name[5:fb_idx-1]
    return col_name

In [38]:
#2
# Pass the change_col_name function to the rename method to transform all the column names.
actor2 = actor.rename(columns=change_col_name)
actor2.head()

Unnamed: 0,movie_title,actor_1,actor_2,actor_3,actor_facebook_likes_1,actor_facebook_likes_2,actor_facebook_likes_3
0,Avatar,CCH Pounder,Joel David Moore,Wes Studi,1000.0,936.0,855.0
1,Pirates of the Caribbean: At World's End,Johnny Depp,Orlando Bloom,Jack Davenport,40000.0,5000.0,1000.0
2,Spectre,Christoph Waltz,Rory Kinnear,Stephanie Sigman,11000.0,393.0,161.0
3,The Dark Knight Rises,Tom Hardy,Christian Bale,Joseph Gordon-Levitt,27000.0,23000.0,23000.0
4,Star Wars: Episode VII - The Force Awakens,Doug Walker,Rob Walker,,131.0,12.0,


In [42]:
#3
# Use the wide_to_long function to stack the actor and Facebook column sets simultaneously.
actor2_tidy = pd.wide_to_long(actor2, 
                              stubnames=['actor', 'actor_facebook_likes'], 
                              i=['movie_title'], 
                              j='actor_num', 
                              sep='_').reset_index()
actor2_tidy.head()

Unnamed: 0,movie_title,actor_num,actor,actor_facebook_likes
0,Avatar,1,CCH Pounder,1000.0
1,Pirates of the Caribbean: At World's End,1,Johnny Depp,40000.0
2,Spectre,1,Christoph Waltz,11000.0
3,The Dark Knight Rises,1,Tom Hardy,27000.0
4,Star Wars: Episode VII - The Force Awakens,1,Doug Walker,131.0


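### How it works
The parameters of the `wide_to_long` function play the following roles.
* stubnames: each string represents one column group; all columns whose names begin with that string are stacked into a single column
* i: the identifier variable(s) that are not stacked
* j: the name of the column that will hold the identifying number from the end of the column names
* sep: the names of the columns to be stacked must end in a digit; sep is the character that separates the stubname from that digit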
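
The roles of these parameters can be seen on a much smaller table. The data below is hypothetical (two made-up movies with two actors each), and the stub name `likes` is a shortened stand-in for the longer column names used in the recipe.

```python
import pandas as pd

# Hypothetical data: two stub groups ('actor' and 'likes'), each ending in _<number>.
wide = pd.DataFrame({
    'title': ['Movie A', 'Movie B'],
    'actor_1': ['Ann', 'Cal'],
    'actor_2': ['Bob', 'Dee'],
    'likes_1': [100, 300],
    'likes_2': [200, 400],
})

tidy = pd.wide_to_long(
    wide,
    stubnames=['actor', 'likes'],  # column-name prefixes, one per group
    i='title',                     # identifier column left unstacked
    j='actor_num',                 # new column holding the numeric suffix
    sep='_',                       # separator between stub and suffix
).reset_index()

print(tidy.sort_values(['title', 'actor_num']))
```

Each movie contributes one row per numeric suffix, so the actor name and its likes stay paired on the same row.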

## Inverting stacked data
DataFrames have two similar methods, `stack` and `melt`, for converting horizontal column names into vertical column values. The `unstack` and `pivot` methods reverse these operations. stack/unstack are the simpler pair and work only on the row and column indexes, while melt/pivot offer more flexibility because you can choose which columns to reshape.

### Getting ready
This recipe stacks/melts a dataset and then returns it to its original form with the unstack/pivot methods.

### How to do it

In [45]:
#1
# Read the college dataset with the institution name as the index and only the race columns.
usecol_func = lambda x: 'UGDS_' in x or x == 'INSTNM'
college = pd.read_csv('data/college.csv',
                     index_col='INSTNM',
                     usecols=usecol_func)
college.head()

Unnamed: 0_level_0,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
INSTNM,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Alabama A & M University,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138
University of Alabama at Birmingham,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01
Amridge University,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715
University of Alabama in Huntsville,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035
Alabama State University,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137


In [46]:
#2
# Use the stack method to convert each horizontal column name into a vertical index level.
college_stacked = college.stack()
college_stacked.head(18)

INSTNM                                         
Alabama A & M University             UGDS_WHITE    0.0333
                                     UGDS_BLACK    0.9353
                                     UGDS_HISP     0.0055
                                     UGDS_ASIAN    0.0019
                                     UGDS_AIAN     0.0024
                                     UGDS_NHPI     0.0019
                                     UGDS_2MOR     0.0000
                                     UGDS_NRA      0.0059
                                     UGDS_UNKN     0.0138
University of Alabama at Birmingham  UGDS_WHITE    0.5922
                                     UGDS_BLACK    0.2600
                                     UGDS_HISP     0.0283
                                     UGDS_ASIAN    0.0518
                                     UGDS_AIAN     0.0022
                                     UGDS_NHPI     0.0007
                                     UGDS_2MOR     0.0368
                        

In [47]:
#3
# Use the Series unstack method to return the stacked data to its original form.
college_stacked.unstack()

Unnamed: 0_level_0,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
INSTNM,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Alabama A & M University,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0000,0.0059,0.0138
University of Alabama at Birmingham,0.5922,0.2600,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.0100
Amridge University,0.2990,0.4192,0.0069,0.0034,0.0000,0.0000,0.0000,0.0000,0.2715
University of Alabama in Huntsville,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.0350
Alabama State University,0.0158,0.9208,0.0121,0.0019,0.0010,0.0006,0.0098,0.0243,0.0137
The University of Alabama,0.7825,0.1119,0.0348,0.0106,0.0038,0.0009,0.0261,0.0268,0.0026
Central Alabama Community College,0.7255,0.2613,0.0044,0.0025,0.0044,0.0000,0.0000,0.0000,0.0019
Athens State University,0.7823,0.1200,0.0191,0.0053,0.0157,0.0010,0.0174,0.0057,0.0334
Auburn University at Montgomery,0.5328,0.3376,0.0074,0.0221,0.0044,0.0016,0.0297,0.0397,0.0246
Auburn University,0.8507,0.0704,0.0248,0.0227,0.0074,0.0000,0.0000,0.0100,0.0140


In [48]:
#4
# To see how melt and pivot behave, read the college dataset again without using the institution name as the index.
college2 = pd.read_csv('data/college.csv', usecols=usecol_func)
college2.head()

Unnamed: 0,INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
0,Alabama A & M University,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138
1,University of Alabama at Birmingham,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01
2,Amridge University,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715
3,University of Alabama in Huntsville,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035
4,Alabama State University,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137


In [52]:
#5
# Use the melt method to move all the race columns into a single column.
college_melted = college2.melt(id_vars='INSTNM', var_name='Race', value_name='Percentage')
college_melted.head()

Unnamed: 0,INSTNM,Race,Percentage
0,Alabama A & M University,UGDS_WHITE,0.0333
1,University of Alabama at Birmingham,UGDS_WHITE,0.5922
2,Amridge University,UGDS_WHITE,0.299
3,University of Alabama in Huntsville,UGDS_WHITE,0.6988
4,Alabama State University,UGDS_WHITE,0.0158


In [53]:
#6
# Use the pivot method to invert the previous result.
melted_inv = college_melted.pivot(index='INSTNM', columns='Race', values='Percentage')
melted_inv.head()

Race,UGDS_2MOR,UGDS_AIAN,UGDS_ASIAN,UGDS_BLACK,UGDS_HISP,UGDS_NHPI,UGDS_NRA,UGDS_UNKN,UGDS_WHITE
INSTNM,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
A & W Healthcare Educators,0.0,0.0,0.0,0.975,0.025,0.0,0.0,0.0,0.0
A T Still University of Health Sciences,,,,,,,,,
ABC Beauty Academy,0.0,0.0,0.9333,0.0333,0.0333,0.0,0.0,0.0,0.0
ABC Beauty College Inc,0.0,0.0,0.0,0.6579,0.0526,0.0,0.0,0.0,0.2895
AI Miami International University of Art and Design,0.0018,0.0,0.0018,0.0198,0.4773,0.0,0.0025,0.4644,0.0324


In [54]:
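#7
# The result of step 6 shows that the order of the index and of the column names has changed.
# To match the DataFrame from step 4 exactly, including order, select rows and columns together with the loc indexer and then reset the index.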
college2_replication = melted_inv.loc[college2['INSTNM'], college2.columns[1:]].reset_index()
college2.equals(college2_replication)

True
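
The same melt, pivot, and reorder sequence can be tried on the small fruit table without the college CSV. This is only a sketch: the extra `rename_axis(columns=None)` call is an addition that is not part of the recipe, used here to clear the leftover `Fruit` label on the columns.

```python
import pandas as pd

original = pd.DataFrame({
    'State': ['Texas', 'Arizona', 'Florida'],
    'Apple': [12, 9, 0],
    'Orange': [10, 7, 14],
    'Banana': [40, 12, 190],
})

melted = original.melt(id_vars='State', var_name='Fruit', value_name='Weight')

# pivot reverses the melt, but the row and column order may differ from the original.
pivoted = melted.pivot(index='State', columns='Fruit', values='Weight')

# Select rows and columns in their original order, then restore the default index.
restored = (pivoted
            .loc[original['State'], original.columns[1:]]
            .rename_axis(columns=None)   # drop the leftover 'Fruit' label on the columns
            .reset_index())

print(restored.equals(original))
```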

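### How it works
In step 1, the usecols parameter accepts either the column names to load or a function that decides them dynamically. The function is passed each column name as a string and must return a boolean. Loading only the needed columns this way can save a large amount of memory.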
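
A minimal sketch of a callable `usecols`, reading from an in-memory string instead of `college.csv`. The CSV text, the `CITY` column, and the function name `keep_column` are hypothetical stand-ins.

```python
import io
import pandas as pd

# Hypothetical stand-in for college.csv with one extra column we do not want.
csv_text = (
    "INSTNM,CITY,UGDS_WHITE,UGDS_BLACK\n"
    "School A,Normal,0.0333,0.9353\n"
    "School B,Birmingham,0.5922,0.2600\n"
)

def keep_column(name):
    # Called once per column name; return True to load that column.
    return 'UGDS_' in name or name == 'INSTNM'

college_small = pd.read_csv(io.StringIO(csv_text),
                            index_col='INSTNM',
                            usecols=keep_column)
print(college_small.columns.tolist())
```

A named function behaves the same as the lambda in step 1 and can be easier to read when the selection rule grows.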

The result of step 3 actually differs from the result of step 1. The `stack` method drops missing values by default, so for an exact replica you must set the dropna parameter of `stack` to False.
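
As an alternative sketch that does not rely on the `dropna` parameter (how `stack` treats missing values has varied between pandas releases), the round trip can be reindexed against the original labels. This assumes the index labels are unique; the frame below is hypothetical and not taken from the college data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one cell missing, and one row missing entirely.
frame = pd.DataFrame(
    {'UGDS_WHITE': [0.03, np.nan, 0.30], 'UGDS_BLACK': [0.94, np.nan, np.nan]},
    index=pd.Index(['School A', 'School B', 'School C'], name='INSTNM'),
)

round_trip = frame.stack().unstack()

# Reindexing against the original labels restores any row or column that was
# dropped along the way and puts the labels back in their original order.
restored = round_trip.reindex(index=frame.index, columns=frame.columns)

print(restored.equals(frame))
```

Rows that disappeared because all of their values were missing come back filled with NaN, which matches the original frame.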

### There's more
To understand `stack` and `unstack` more deeply, let's use them to transpose the college DataFrame. By default, `unstack` uses the innermost index level as the new column values. Passing 0 as the argument instead unstacks the outermost index level.

In [58]:
college.stack().unstack(0)

INSTNM,Alabama A & M University,University of Alabama at Birmingham,Amridge University,University of Alabama in Huntsville,Alabama State University,The University of Alabama,Central Alabama Community College,Athens State University,Auburn University at Montgomery,Auburn University,...,MCI Institute of Technology-Boca Raton,West Coast University-Miami,National American University-Houston,Aparicio-Levy Technical College,Fred D. Learey Technical College,Hollywood Institute of Beauty Careers-West Palm Beach,Hollywood Institute of Beauty Careers-Casselberry,Coachella Valley Beauty College-Beaumont,Dewey University-Mayaguez,Coastal Pines Technical College
UGDS_WHITE,0.0333,0.5922,0.299,0.6988,0.0158,0.7825,0.7255,0.7823,0.5328,0.8507,...,0.0199,0.1522,0.1858,0.2431,0.3731,0.2182,0.12,0.3284,0.0,0.6762
UGDS_BLACK,0.9353,0.26,0.4192,0.1255,0.9208,0.1119,0.2613,0.12,0.3376,0.0704,...,0.2815,0.1739,0.6443,0.1215,0.1388,0.4182,0.3333,0.1045,0.0,0.2508
UGDS_HISP,0.0055,0.0283,0.0069,0.0382,0.0121,0.0348,0.0044,0.0191,0.0074,0.0248,...,0.6854,0.6087,0.0672,0.6243,0.308,0.2364,0.44,0.4925,1.0,0.0359
UGDS_ASIAN,0.0019,0.0518,0.0034,0.0376,0.0019,0.0106,0.0025,0.0053,0.0221,0.0227,...,0.0132,0.0217,0.0079,0.0055,0.0,0.0182,0.0,0.0149,0.0,0.0045
UGDS_AIAN,0.0024,0.0022,0.0,0.0143,0.001,0.0038,0.0044,0.0157,0.0044,0.0074,...,0.0,0.0,0.0079,0.0055,0.0,0.0,0.0,0.0299,0.0,0.0034
UGDS_NHPI,0.0019,0.0007,0.0,0.0002,0.0006,0.0009,0.0,0.001,0.0016,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0149,0.0,0.0017
UGDS_2MOR,0.0,0.0368,0.0,0.0172,0.0098,0.0261,0.0,0.0174,0.0297,0.0,...,0.0,0.0435,0.0751,0.0,0.0022,0.0,0.04,0.0149,0.0,0.0191
UGDS_NRA,0.0059,0.0179,0.0,0.0332,0.0243,0.0268,0.0,0.0057,0.0397,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0182,0.0,0.0,0.0,0.0028
UGDS_UNKN,0.0138,0.01,0.2715,0.035,0.0137,0.0026,0.0019,0.0334,0.0246,0.014,...,0.0,0.0,0.0119,0.0,0.1779,0.0909,0.0667,0.0,0.0,0.0056


> A DataFrame can be transposed more easily with the `transpose` method or the `T` attribute.
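
The equivalence can be checked on a small frame with no missing values. In this sketch the fruit table's labels are listed alphabetically, an assumption made so that label ordering cannot affect the comparison.

```python
import pandas as pd

# Small frame with no missing values; labels are listed alphabetically.
fruit = pd.DataFrame(
    {'Apple': [9, 0, 12], 'Banana': [12, 190, 40], 'Orange': [7, 14, 10]},
    index=['Arizona', 'Florida', 'Texas'],
)

# Stack the columns into the index, then unstack the outermost (state) level.
via_unstack = fruit.stack().unstack(0)

# The direct route.
via_transpose = fruit.T

print(via_unstack.equals(via_transpose))
```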

## Unstacking after a groupby aggregation
