# **Python Data Structures for Data Analysis: [Pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)**

[![Open in Colab](http://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cheonbi/ProgrammingPython/blob/main/Practice1-10_Basic_InputDataFrame_KK.ipynb)

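- **Purpose:** the most widely used package for working with tabular data, aimed at data scientists ("Excel for Python")
> - Developed by Wes McKinney in early 2008 while he was working at AQR, an investment management firm
> - Built on top of NumPy
> - The de facto standard, and an indispensable tool, for data analysis and data management in Python
> - Essential for Python data professionals, Kaggle competitors, and anyone who needs to automate data processing
> - A key library in making data analysis accessible to non-specialists
> - Ships with enough high-level functions that pandas alone can cover much of a data analysis workflow
>> - A Series is a pandas data structure for storing one-dimensional, array-like data
>> - A DataFrame is a pandas data structure for storing two-dimensional, tabular data
>> - A DataFrame is a collection of Series that share the same index (one Series per column); each column has a single dtype, while different columns may have different dtypes
>> - The data can have many rows and columns: each row is a data sample, and each column is a variable describing the samples (rows)
>> - Similar to an Excel sheet, but a DataFrame is always a rectangular grid with no gaps between rows or columns; missing entries are stored explicitly as missing values (`NaN`)
>> - Accepts data in many formats and provides a wide range of statistics and visualization functions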

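To make the Series/DataFrame relationship above concrete, here is a minimal sketch with made-up values (the column names `item` and `amount` are hypothetical): each column of a DataFrame is a Series with its own dtype, and a missing entry shows up as `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: two columns with different dtypes
df_demo = pd.DataFrame({
    "item": ["Wheat", "Rice", "Barley"],   # strings
    "amount": [4895.0, 422.0, np.nan],     # floats, one missing value
})

col = df_demo["amount"]        # selecting one column gives a Series
print(type(col).__name__)      # Series
print(df_demo.dtypes)          # each column has its own dtype
print(col.isna().sum())        # 1 missing value, stored as NaN
```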

## Installation and Import

In [None]:
# Install pandas from a Jupyter Notebook
# Not needed with Anaconda, which ships with pandas preinstalled
!pip install pandas


In [None]:
# Import the pandas package
# import <package> as <alias>
# By convention, pandas is imported under the alias pd
import pandas as pd

## Series & DataFrame

> - Table-shaped data structures designed with Excel users in mind
> - A spreadsheet-like table of data is called a **DataFrame**, and a single column of a DataFrame is called a **Series**

In [None]:
# Create a Series
import pandas as pd
ds = pd.Series([1,2,3,4,5])
ds

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [None]:
# Get the values of the Series as a NumPy array
ds.values

array([1, 2, 3, 4, 5])

In [None]:
# Get the index (row labels) of the Series
ds.index

RangeIndex(start=0, stop=5, step=1)

In [None]:
# Create a Series with custom index labels
ds = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
ds

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [None]:
# Get the value at index label 'c'
ds['c']

3

In [None]:
# Element-wise comparison also works on a Series: is each value greater than 3?
ds>3

a    False
b    False
c    False
d     True
e     True
dtype: bool

In [None]:
# Boolean indexing: keep only the values where the condition (> 3) is True
ds[ds>3]

d    4
e    5
dtype: int64

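Boolean indexing also accepts several conditions at once. A minimal sketch that rebuilds the same `ds` as above: combine conditions with `&` (and) or `|` (or), and wrap each condition in parentheses, because these operators bind more tightly than `>` and `<`.

```python
import pandas as pd

ds = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Parentheses are required: & and | bind more tightly than > and <
between = ds[(ds > 1) & (ds < 5)]   # values strictly between 1 and 5
outside = ds[(ds < 2) | (ds > 4)]   # values below 2 or above 4
print(between)
print(outside)
```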
In [None]:
# Multiply every element by 2
ds*2

a     2
b     4
c     6
d     8
e    10
dtype: int64

In [None]:
# Check whether each value is missing (null)
ds.isnull()

a    False
b    False
c    False
d    False
e    False
dtype: bool

In [None]:
# Check whether each value is present (not null)
ds.notnull()

a    True
b    True
c    True
d    True
e    True
dtype: bool

In [None]:
# Count how many values are missing (null)
ds.isnull().sum()

0

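Because `ds` contains no missing values, the count above is 0. A minimal sketch with a hypothetical Series that does contain missing entries (`None` and `np.nan` are both treated as missing in a float Series):

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing entries
s = pd.Series([1.0, None, 3.0, np.nan, 5.0])

print(s.isnull().sum())     # number of missing values
print(s.notnull().sum())    # number of non-missing values
print(s.dropna().tolist())  # the values left after dropping the missing ones
```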
In [None]:
# Create a dictionary of column data
df = {'column1':[1,2,3,4,5],
      'another_column':['this', 'column', 'has', 'strings', 'inside!'],
      'float_column':[0.1, 0.5, 33, 48, 42.5555],
      'binary_column':[True, False, True, True, False]}
df

{'column1': [1, 2, 3, 4, 5],
 'another_column': ['this', 'column', 'has', 'strings', 'inside!'],
 'float_column': [0.1, 0.5, 33, 48, 42.5555],
 'binary_column': [True, False, True, True, False]}

In [None]:
# Convert the dictionary into a DataFrame
# By convention, df (short for DataFrame) is used as the variable name
df = pd.DataFrame(df)
df

|  | column1 | another_column | float_column | binary_column |
|---|---|---|---|---|
| 0 | 1 | this | 0.1 | True |
| 1 | 2 | column | 0.5 | False |
| 2 | 3 | has | 33.0 | True |
| 3 | 4 | strings | 48.0 | True |
| 4 | 5 | inside! | 42.5555 | False |


In [None]:
# Add a new column and fill every row with 100
df['column_test'] = 100
df

|  | column1 | another_column | float_column | binary_column | column_test |
|---|---|---|---|---|---|
| 0 | 1 | this | 0.1 | True | 100 |
| 1 | 2 | column | 0.5 | False | 100 |
| 2 | 3 | has | 33.0 | True | 100 |
| 3 | 4 | strings | 48.0 | True | 100 |
| 4 | 5 | inside! | 42.5555 | False | 100 |


In [None]:
# Add a new column filled with 0 to 4 using numpy.arange
import numpy as np
df['seq_test'] = np.arange(5)
df

|  | column1 | another_column | float_column | binary_column | column_test | seq_test |
|---|---|---|---|---|---|---|
| 0 | 1 | this | 0.1 | True | 100 | 0 |
| 1 | 2 | column | 0.5 | False | 100 | 1 |
| 2 | 3 | has | 33.0 | True | 100 | 2 |
| 3 | 4 | strings | 48.0 | True | 100 | 3 |
| 4 | 5 | inside! | 42.5555 | False | 100 | 4 |


In [None]:
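# When the number of rows is not known in advance, use df.shape[0] (or len(df))
# df.shape returns (rows, columns), so df.shape[0] is the number of rows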
df['seq_test2'] = np.arange(df.shape[0])
df

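|  | column1 | another_column | float_column | binary_column | column_test | seq_test | seq_test2 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | this | 0.1 | True | 100 | 0 | 0 |
| 1 | 2 | column | 0.5 | False | 100 | 1 | 1 |
| 2 | 3 | has | 33.0 | True | 100 | 2 | 2 |
| 3 | 4 | strings | 48.0 | True | 100 | 3 | 3 |
| 4 | 5 | inside! | 42.5555 | False | 100 | 4 | 4 |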


In [None]:
# Delete a specific column from the DataFrame
del df['column_test']
# To delete several columns at once, use drop: https://pythonexamples.org/pandas-dataframe-delete-column/
df = df.drop(columns=['seq_test2'])
df

|  | column1 | another_column | float_column | binary_column | seq_test |
|---|---|---|---|---|---|
| 0 | 1 | this | 0.1 | True | 0 |
| 1 | 2 | column | 0.5 | False | 1 |
| 2 | 3 | has | 33.0 | True | 2 |
| 3 | 4 | strings | 48.0 | True | 3 |
| 4 | 5 | inside! | 42.5555 | False | 4 |

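`drop` can remove several columns in one call. A minimal sketch on a small hypothetical DataFrame (`df_demo`, with made-up column names): `drop` returns a new DataFrame and leaves the original unchanged unless you reassign the result.

```python
import pandas as pd

# Hypothetical DataFrame with two throwaway columns
df_demo = pd.DataFrame({"keep": [1, 2], "tmp1": [0, 0], "tmp2": [9, 9]})

trimmed = df_demo.drop(columns=["tmp1", "tmp2"])  # drop two columns at once
print(trimmed.columns.tolist())   # ['keep']
print(df_demo.columns.tolist())   # ['keep', 'tmp1', 'tmp2'] (original untouched)
```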

In [None]:
# Transpose: swap rows and columns (reflect across the main diagonal)
df.T

|  | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| column1 | 1 | 2 | 3 | 4 | 5 |
| another_column | this | column | has | strings | inside! |
| float_column | 0.1 | 0.5 | 33.0 | 48.0 | 42.5555 |
| binary_column | True | False | True | True | False |
| seq_test | 0 | 1 | 2 | 3 | 4 |


In [None]:
# Column names of the DataFrame
df.columns

Index(['column1', 'another_column', 'float_column', 'binary_column',
       'seq_test'],
      dtype='object')

In [None]:
# Row labels (index) of the DataFrame
df.index

RangeIndex(start=0, stop=5, step=1)

In [None]:
# Only the values (without row and column labels) as a NumPy array
df.values

array([[1, 'this', 0.1, True, 0],
       [2, 'column', 0.5, False, 1],
       [3, 'has', 33.0, True, 2],
       [4, 'strings', 48.0, True, 3],
       [5, 'inside!', 42.5555, False, 4]], dtype=object)

## Data Loading

> **"pandas can load many data file formats; the most common are files with the csv, xlsx, and sql extensions"**
> ```python
> read_csv()
> read_excel()
> read_sql()
> ```

<center><img src='Image/io_readwrite.svg' width='700'></center>

**1) What is a CSV file?**
> - A CSV (comma-separated values) file is a common file format for storing numeric and text data
> - Loading, handling, and writing CSV files in Python is a core skill for any data scientist or business analyst

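**2) Checking the data location**
```python
# Print the current working folder of the Jupyter Notebook
# Shell command (Windows: dir, macOS/Linux: ls)
!dir
# Python standard library (os module)
import os
os.getcwd()
```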

In [None]:
# Shell command: `dir` is a Windows command and is not available in the macOS shell, so it fails here (use `!ls` on macOS/Linux)
!dir
# Reference: https://quick-adviser.com/why-is-jupyter-command-not-found/

/bin/bash: dir: command not found


In [None]:
# Python standard library (os module)
import os
minydir = os.getcwd()
minydir

'/Users/suuup/Desktop/newminy/FC2022code/FC2204soil'

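**3) Loading the data**

> - **Absolute path:** the location of the data folder relative to the **computer** (the filesystem root), so it stays the same wherever the analysis is run
>> - The data folder and the working folder can be kept separate
>> - If the data is moved, the code must be changed
>> - Works regardless of where the working folder is
> - **Relative path:** the location of the data folder relative to the **working file** (the current working directory), so it changes with the location of the analysis file
>> - The data folder is kept together with the working folder
>> - If the data folder moves together with the working folder, the code does not need to change
>> - Fails if the data is not shipped alongside the working folder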

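The difference between the two kinds of path can be sketched without the FAO file. This is a minimal, self-contained example under stated assumptions: a tiny hypothetical table is written to `Data/sample.csv` inside a temporary folder that stands in for the notebook folder, then read back once by absolute path and once by relative path (which only resolves while that folder is the working directory). The variable names `sample_abs` and `sample_rel` are made up so they do not overwrite `df_abs` and `df_rel`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the FAO file: a tiny two-row table
sample = pd.DataFrame({"Area": ["Afghanistan", "Zimbabwe"], "Y2013": [4895, 451]})

with tempfile.TemporaryDirectory() as workdir:
    # Mimic the notebook layout: <workdir>/Data/sample.csv
    data_dir = os.path.join(workdir, "Data")
    os.makedirs(data_dir)
    abs_path = os.path.join(data_dir, "sample.csv")  # absolute path
    sample.to_csv(abs_path, index=False)

    # A relative path is resolved against the current working directory,
    # so it only finds the file while the working directory is workdir
    rel_path = os.path.join("Data", "sample.csv")
    old_cwd = os.getcwd()
    os.chdir(workdir)
    try:
        sample_rel = pd.read_csv(rel_path)
    finally:
        os.chdir(old_cwd)  # restore the original working directory

    sample_abs = pd.read_csv(abs_path)  # works from any working directory

print(os.path.isabs(abs_path), os.path.isabs(rel_path))  # True False
print(sample_abs.equals(sample_rel))                     # True
```

`os.path.join` is used throughout so that the correct folder separator is chosen for the operating system.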
In [None]:
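# Enter the file path manually as an absolute path
# An absolute path is the file's actual location on your PC
# Use the location of the file on your own PC, not the path shown in the lecture
# (here it is built from minydir, the working directory printed above)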
import pandas as pd

df_abs = pd.read_csv(r'{}/Data/FoodAgricultureOrganization/Food_Agriculture_Organization_UN_Full.csv'.format(minydir))
df_abs

|  | Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AF | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
| 1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
| 2 | AF | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
| 3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
| 4 | AF | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 21472 | ZW | 181 | Zimbabwe | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 373.0 | 357.0 | 359.0 | 356.0 | 341.0 | 385.0 | 418.0 | 457.0 | 426 | 451 |
| 21473 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 5.0 | 4.0 | 9.0 | 6.0 | 9.0 | 5.0 | 15.0 | 15.0 | 15 | 15 |
| 21474 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 18.0 | 14.0 | 17.0 | 14.0 | 15.0 | 18.0 | 29.0 | 40.0 | 40 | 40 |
| 21475 | ZW | 181 | Zimbabwe | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |


In [None]:
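# Enter the file path manually as a relative path
# A relative path is resolved from the current working directory (usually the notebook's own folder)
# Works unchanged as long as your folder layout matches the lecture's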
import pandas as pd

df_rel = pd.read_csv(r'./Data/FoodAgricultureOrganization/Food_Agriculture_Organization_UN_Full.csv')
df_rel

|  | Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AF | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
| 1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
| 2 | AF | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
| 3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
| 4 | AF | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 21472 | ZW | 181 | Zimbabwe | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 373.0 | 357.0 | 359.0 | 356.0 | 341.0 | 385.0 | 418.0 | 457.0 | 426 | 451 |
| 21473 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 5.0 | 4.0 | 9.0 | 6.0 | 9.0 | 5.0 | 15.0 | 15.0 | 15 | 15 |
| 21474 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 18.0 | 14.0 | 17.0 | 14.0 | 15.0 | 18.0 | 29.0 | 40.0 | 40 | 40 |
| 21475 | ZW | 181 | Zimbabwe | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |


In [None]:
# Keep the absolute folder path in its own variable
# location_abs = r'D:\DataScience\Lecture\[DataScience]\Data\FoodAgricultureOrganization'
# location_abs

In [None]:
# Keep the relative folder path in its own variable
# location_rel = r'.\Data\FoodAgricultureOrganization'
# location_rel

In [None]:
# Build the absolute folder path from the current directory and the relative path
# os.getcwd() + location_rel[1:]

In [None]:
# Comparing the two shows they are identical
# os.getcwd() + location_rel[1:] == location_abs

In [None]:
# Join the absolute folder path and the file name
# using the folder separator (here the Windows backslash)
# file_name = 'Food_Agriculture_Organization_UN_Full.csv'
# location_abs + '\\' + file_name

In [None]:
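# The separator depends on the operating system (Windows uses \, macOS/Linux use /), so it can differ between users
# os.path.join inserts the correct separator automatically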
# os.path.join(location_abs, file_name)

In [None]:
# Comparing the two shows they are identical (on Windows)
# location_abs + '\\' + file_name == os.path.join(location_abs, file_name)

In [None]:
# Load the data using the absolute path
# Use pandas' read_csv() function to read the file into a DataFrame
# and store it in the variable df_abs
# import pandas as pd

# location_abs = r'D:\DataScience\Lecture\[DataScience]\Data\FoodAgricultureOrganization'
# file_name = 'Food_Agriculture_Organization_UN_Full.csv'
# df_abs = pd.read_csv(os.path.join(location_abs, file_name))
# df_abs

In [None]:
# Load the data using the relative path
import os
import pandas as pd

location_rel = r'./Data/FoodAgricultureOrganization'
file_name = 'Food_Agriculture_Organization_UN_Full.csv'
df_rel = pd.read_csv(os.path.join(location_rel, file_name))
df_rel

|  | Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AF | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
| 1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
| 2 | AF | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
| 3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
| 4 | AF | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 21472 | ZW | 181 | Zimbabwe | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 373.0 | 357.0 | 359.0 | 356.0 | 341.0 | 385.0 | 418.0 | 457.0 | 426 | 451 |
| 21473 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 5.0 | 4.0 | 9.0 | 6.0 | 9.0 | 5.0 | 15.0 | 15.0 | 15 | 15 |
| 21474 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 18.0 | 14.0 | 17.0 | 14.0 | 15.0 | 18.0 | 29.0 | 40.0 | 40 | 40 |
| 21475 | ZW | 181 | Zimbabwe | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |


In [None]:
# Check whether the two DataFrames are identical
df_abs.equals(df_rel)

True

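- By default, pandas displays at most 20 columns and 60 rows of a DataFrame; when a DataFrame exceeds the row limit, only the first and last few rows are shown (10 in total, set by `display.min_rows`) and the middle is replaced by `...`
- Options exist to change these display limits
(https://pandas.pydata.org/pandas-docs/stable/options.html)
> - **pd.options.display.width:** width of the display in characters; wider DataFrames are wrapped across multiple lines
> - **pd.options.display.max_rows:** maximum number of rows displayed
> - **pd.options.display.max_columns:** maximum number of columns displayed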

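Assigning to `pd.options` changes a setting for the rest of the session. For a temporary change, `pd.option_context` restores the previous values when the `with` block ends; a minimal sketch:

```python
import pandas as pd

before = pd.get_option("display.max_rows")

# Inside the block the limits are changed; afterwards they are restored
with pd.option_context("display.max_rows", 10, "display.max_columns", 20):
    inside = (pd.get_option("display.max_rows"), pd.get_option("display.max_columns"))

after = pd.get_option("display.max_rows")
print(inside)           # (10, 20)
print(before == after)  # True
```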
In [None]:
# Set how many rows and columns are shown in the output
pd.options.display.max_rows = 10
pd.options.display.max_columns = 20

In [None]:
df_rel

|  | Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AF | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
| 1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
| 2 | AF | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
| 3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
| 4 | AF | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 21472 | ZW | 181 | Zimbabwe | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 373.0 | 357.0 | 359.0 | 356.0 | 341.0 | 385.0 | 418.0 | 457.0 | 426 | 451 |
| 21473 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 5.0 | 4.0 | 9.0 | 6.0 | 9.0 | 5.0 | 15.0 | 15.0 | 15 | 15 |
| 21474 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 18.0 | 14.0 | 17.0 | 14.0 | 15.0 | 18.0 | 29.0 | 40.0 | 40 | 40 |
| 21475 | ZW | 181 | Zimbabwe | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |


## Descriptive Statistics

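> - Usually the first step after loading data: quickly check the structure, shape, and characteristics of the data
> - **DataFrame.describe()** is a handy summary tool that quickly shows statistics for every applicable column or group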

| Function | Description                         |
|----------|-------------------------------------|
| count    | Number of non-null observations     |
| sum      | Sum of values                       |
| mean     | Mean of values                      |
| mad      | Mean absolute deviation (deprecated in pandas 1.5, removed in 2.0) |
| median   | Arithmetic median of values         |
| min      | Minimum                             |
| max      | Maximum                             |
| mode     | Mode                                |
| abs      | Absolute Value                      |
| prod     | Product of values                   |
| std      | Unbiased standard deviation         |
| var      | Unbiased variance                   |
| sem      | Unbiased standard error of the mean |
| skew     | Unbiased skewness (3rd moment)      |
| kurt     | Unbiased kurtosis (4th moment)      |
| quantile | Sample quantile (value at %)        |
| cumsum   | Cumulative sum                      |
| cumprod  | Cumulative product                  |
| cummax   | Cumulative maximum                  |
| cummin   | Cumulative minimum                  |

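Most of these functions can be called directly on a Series or a DataFrame. A minimal sketch on a small hypothetical Series; because `mad` is no longer available in pandas 2.x, the mean absolute deviation is computed by hand here:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 10])  # hypothetical values

print(s.sum(), s.mean(), s.median())  # 20 4.0 3.0
print(s.var(), s.std())               # sample variance and std (ddof=1)
print(s.cumsum().tolist())            # [1, 3, 6, 10, 20]

# Mean absolute deviation: a replacement for the removed Series.mad()
mad = (s - s.mean()).abs().mean()
print(mad)                            # 2.4
```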
In [None]:
# Number of rows and columns
df_rel.shape

(21477, 63)

In [None]:
# NumPy's np.shape returns the same result
np.shape(df_rel)

(21477, 63)

In [None]:
# Number of dimensions
df_rel.ndim

2

In [None]:
# First 5 rows
df_rel.head(5)

|  | Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AF | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
| 1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
| 2 | AF | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
| 3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
| 4 | AF | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |


In [None]:
# First 10 rows
df_rel.head(10)

|  | Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AF | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
| 1 | AF | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
| 2 | AF | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
| 3 | AF | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
| 4 | AF | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |
| 5 | AF | 2 | Afghanistan | 2514 | Maize and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 231.0 | 67.0 | 82.0 | 67.0 | 69.0 | 71.0 | 82.0 | 73.0 | 77 | 76 |
| 6 | AF | 2 | Afghanistan | 2517 | Millet and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 15.0 | 21.0 | 11.0 | 19.0 | 21.0 | 18.0 | 14.0 | 14.0 | 14 | 12 |
| 7 | AF | 2 | Afghanistan | 2520 | Cereals, Other | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
| 8 | AF | 2 | Afghanistan | 2531 | Potatoes and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 276.0 | 294.0 | 294.0 | 260.0 | 242.0 | 250.0 | 192.0 | 169.0 | 196 | 230 |
| 9 | AF | 2 | Afghanistan | 2536 | Sugar cane | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 50.0 | 29.0 | 61.0 | 65.0 | 54.0 | 114.0 | 83.0 | 83.0 | 69 | 81 |


In [None]:
# Last 5 rows
# Useful for checking that the whole file was loaded correctly
df_rel.tail(5)

|  | Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21472 | ZW | 181 | Zimbabwe | 2948 | Milk - Excluding Butter | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 373.0 | 357.0 | 359.0 | 356.0 | 341.0 | 385.0 | 418.0 | 457.0 | 426 | 451 |
| 21473 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5521 | Feed | 1000 tonnes | -19.02 | 29.15 | ... | 5.0 | 4.0 | 9.0 | 6.0 | 9.0 | 5.0 | 15.0 | 15.0 | 15 | 15 |
| 21474 | ZW | 181 | Zimbabwe | 2960 | Fish, Seafood | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 18.0 | 14.0 | 17.0 | 14.0 | 15.0 | 18.0 | 29.0 | 40.0 | 40 | 40 |
| 21475 | ZW | 181 | Zimbabwe | 2961 | Aquatic Products, Other | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
| 21476 | ZW | 181 | Zimbabwe | 2928 | Miscellaneous | 5142 | Food | 1000 tonnes | -19.02 | 29.15 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |


In [None]:
# Select rows by position
# take() returns the rows at the given positions (several positions can be passed)
df_rel.take([10])

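|  | Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | AF | 2 | Afghanistan | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |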


In [None]:
# Select several rows by position
df_rel.take([10, 20, 25])

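|  | Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | AF | 2 | Afghanistan | 2537 | Sugar beet | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
| 20 | AF | 2 | Afghanistan | 2571 | Soyabean Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 6.0 | 35.0 | 18.0 | 21.0 | 11.0 | 6.0 | 15.0 | 16.0 | 16 | 16 |
| 25 | AF | 2 | Afghanistan | 2577 | Palm Oil | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 71.0 | 69.0 | 56.0 | 51.0 | 36.0 | 53.0 | 59.0 | 51.0 | 61 | 64 |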


In [None]:
# Index (row labels)
df_rel.index

RangeIndex(start=0, stop=21477, step=1)

In [None]:
# Index labels of the rows that satisfy a condition
# https://stackoverflow.com/questions/21800169/python-pandas-get-index-of-rows-which-column-matches-certain-value
# df_rel.index[df_rel['Area']=='Afghanistan'].tolist()
df_rel.index[df_rel['Area']=='Afghanistan']

Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
            34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
            51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
            68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82],
           dtype='int64')

In [None]:
# Column names
df_rel.columns

Index(['Area Abbreviation', 'Area Code', 'Area', 'Item Code', 'Item',
       'Element Code', 'Element', 'Unit', 'latitude', 'longitude', 'Y1961',
       'Y1962', 'Y1963', 'Y1964', 'Y1965', 'Y1966', 'Y1967', 'Y1968', 'Y1969',
       'Y1970', 'Y1971', 'Y1972', 'Y1973', 'Y1974', 'Y1975', 'Y1976', 'Y1977',
       'Y1978', 'Y1979', 'Y1980', 'Y1981', 'Y1982', 'Y1983', 'Y1984', 'Y1985',
       'Y1986', 'Y1987', 'Y1988', 'Y1989', 'Y1990', 'Y1991', 'Y1992', 'Y1993',
       'Y1994', 'Y1995', 'Y1996', 'Y1997', 'Y1998', 'Y1999', 'Y2000', 'Y2001',
       'Y2002', 'Y2003', 'Y2004', 'Y2005', 'Y2006', 'Y2007', 'Y2008', 'Y2009',
       'Y2010', 'Y2011', 'Y2012', 'Y2013'],
      dtype='object')

In [None]:
# Data type (dtype) of each column
# object usually means strings (or mixed types)
df_rel.dtypes

Area Abbreviation     object
Area Code              int64
Area                  object
Item Code              int64
Item                  object
                      ...   
Y2009                float64
Y2010                float64
Y2011                float64
Y2012                  int64
Y2013                  int64
Length: 63, dtype: object

In [None]:
# Overview: index range, columns, non-null counts, and dtypes
df_rel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21477 entries, 0 to 21476
Data columns (total 63 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Area Abbreviation  21356 non-null  object 
 1   Area Code          21477 non-null  int64  
 2   Area               21477 non-null  object 
 3   Item Code          21477 non-null  int64  
 4   Item               21477 non-null  object 
 5   Element Code       21477 non-null  int64  
 6   Element            21477 non-null  object 
 7   Unit               21477 non-null  object 
 8   latitude           21477 non-null  float64
 9   longitude          21477 non-null  float64
 10  Y1961              17938 non-null  float64
 11  Y1962              17938 non-null  float64
 12  Y1963              17938 non-null  float64
 13  Y1964              17938 non-null  float64
 14  Y1965              17938 non-null  float64
 15  Y1966              17938 non-null  float64
 16  Y1967              179

In [None]:
# Change the dtype of a specific column (Item Code is a label, so store it as a string)
df_rel['Item Code'] = df_rel['Item Code'].astype(str)
df_rel.dtypes

Area Abbreviation     object
Area Code              int64
Area                  object
Item Code             object
Item                  object
                      ...   
Y2009                float64
Y2010                float64
Y2011                float64
Y2012                  int64
Y2013                  int64
Length: 63, dtype: object

In [None]:
# Selecting a single column returns a Series; .dtypes gives its dtype ('O' means object)
df_rel['Item Code'].dtypes

dtype('O')

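Once a column such as Item Code holds strings, it is no longer numeric, which matters for the statistics below. A minimal sketch with hypothetical values (`df_demo` is a made-up miniature of the FAO table) showing how `select_dtypes` and `numeric_only=True` leave such a column out:

```python
import pandas as pd

# Hypothetical miniature version of the FAO table
df_demo = pd.DataFrame({"Item Code": [2511, 2805], "Y2013": [4895.0, 422.0]})
df_demo["Item Code"] = df_demo["Item Code"].astype(str)  # codes are labels, not quantities

numeric_cols = df_demo.select_dtypes(include="number").columns.tolist()
print(numeric_cols)                       # ['Y2013']
print(df_demo.mean(numeric_only=True))    # only Y2013 is averaged
```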
In [None]:
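# Sum across each row (axis=1)
# numeric_only=True restricts the sum to numeric columns; without it, recent pandas versions may raise a TypeError on the string columns
df_rel.sum(axis=1, numeric_only=True)
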
0        138171.65
1         20527.65
2         10814.65
3         13774.65
4         15075.65
           ...    
21472     23570.13
21473      6319.13
21474      6330.13
21475      5333.13
21476      5333.13
Length: 21477, dtype: float64

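The `axis` argument sets the direction of the reduction: `axis=0` collapses the rows and returns one value per column, while `axis=1` collapses the columns and returns one value per row. A minimal sketch on a hypothetical 2x3 table:

```python
import pandas as pd

df_demo = pd.DataFrame({"Y2011": [1, 2], "Y2012": [3, 4], "Y2013": [5, 6]})

col_sums = df_demo.sum(axis=0)  # one value per column
row_sums = df_demo.sum(axis=1)  # one value per row
print(col_sums.tolist())        # [3, 7, 11]
print(row_sums.tolist())        # [9, 12]
```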
In [None]:
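# Sum down each column (axis=0)
# String columns are concatenated rather than added; pass numeric_only=True to sum numeric columns only
df_rel.sum(axis=0)
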
Area Code                                                 2694277
Area            AfghanistanAfghanistanAfghanistanAfghanistanAf...
Item Code       2511280525132513251425142517252025312536253725...
Item            Wheat and productsRice (Milled Equivalent)Barl...
Element Code                                            111931405
                                      ...                        
Y2009                                                  11211891.0
Y2010                                                  11445072.0
Y2011                                                  11827802.0
Y2012                                                    12039345
Y2013                                                    12361248
Length: 62, dtype: object

In [None]:
# Minimum of each column
# For string columns, the alphabetically first value is returned
df_rel.min()

Area Code                         1
Area                    Afghanistan
Item Code                      2511
Item            Alcoholic Beverages
Element Code                   5142
                       ...         
Y2009                           0.0
Y2010                           0.0
Y2011                           0.0
Y2012                          -169
Y2013                          -246
Length: 62, dtype: object

In [None]:
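# Mean of each column (numeric columns only)
# Without numeric_only=True, older pandas versions return a meaningless value (inf) for the string column Item Code, and newer versions may raise a TypeError
df_rel.mean(numeric_only=True)

Area Code        125.449411
Element Code    5211.687154
latitude          20.450613
longitude         15.794445
Y1961            195.262069
                    ...     
Y2009            524.581996
Y2010            535.492069
Y2011            553.399242
Y2012            560.569214
Y2013            575.557480
Length: 57, dtype: float64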

In [None]:
# Median of each column (numeric columns only)
df_rel.median(numeric_only=True)

Area Code        120.00
Element Code    5142.00
latitude          20.59
longitude         19.15
Y1961              1.00
                 ...   
Y2009              7.00
Y2010              7.00
Y2011              8.00
Y2012              8.00
Y2013              8.00
Length: 57, dtype: float64

In [None]:
# Variance of each column (numeric columns only)
df_rel.var(numeric_only=True)

Area Code       5.309767e+03
Element Code    2.155614e+04
latitude        6.065550e+02
longitude       4.357598e+03
Y1961           3.474960e+06
                    ...     
Y2009           3.075744e+07
Y2010           3.273086e+07
Y2011           3.461053e+07
Y2012           3.657771e+07
Y2013           3.866824e+07
Length: 57, dtype: float64

In [None]:
# Standard deviation of each column (numeric columns only)
df_rel.std(numeric_only=True)

Area Code         72.868149
Element Code     146.820079
latitude          24.628336
longitude         66.012104
Y1961           1864.124336
                   ...     
Y2009           5545.939303
Y2010           5721.089425
Y2011           5883.071604
Y2012           6047.950804
Y2013           6218.379479
Length: 57, dtype: float64

In [None]:
# Select a specific column of the DataFrame, i.e. a Series
# DataFrame[column name]
df_rel['Area Code']

0          2
1          2
2          2
3          2
4          2
        ... 
21472    181
21473    181
21474    181
21475    181
21476    181
Name: Area Code, Length: 21477, dtype: int64

In [None]:
# Single brackets return a 1-D Series; double brackets return a 2-D DataFrame
df_rel[['Area Code']]

|  | Area Code |
|---|---|
| 0 | 2 |
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
| 4 | 2 |
| ... | ... |
| 21472 | 181 |
| 21473 | 181 |
| 21474 | 181 |
| 21475 | 181 |

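The difference between single and double brackets can be checked with `type()` and `.ndim`. A minimal sketch on a hypothetical two-column DataFrame; double brackets also select several columns at once, in the order you list them:

```python
import pandas as pd

df_demo = pd.DataFrame({"Area Code": [2, 181], "Area": ["Afghanistan", "Zimbabwe"]})

one = df_demo["Area Code"]              # Series (1-D)
one_df = df_demo[["Area Code"]]         # DataFrame (2-D) with a single column
two_df = df_demo[["Area", "Area Code"]] # several columns, in the requested order
print(type(one).__name__, one.ndim)         # Series 1
print(type(one_df).__name__, one_df.ndim)   # DataFrame 2
print(two_df.columns.tolist())              # ['Area', 'Area Code']
```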

In [None]:
# Distinct values of a column
df_rel['Area Code'].unique()  # unlike set(), unique() keeps the order of first appearance

array([  2,   3,   4,   7,   8,   9,   1,  10,  11,  52,  12,  16,  14,
        57, 255,  23,  53,  17,  19,  80,  20,  21,  26,  27, 233,  35,
       115,  32,  33,  37,  39,  40,  96, 128,  41, 214,  44,  46,  48,
       107,  98,  49,  50, 167, 116,  54,  72,  55,  56,  58,  59,  60,
        63, 238,  66,  67,  68,  70,  74,  75,  73,  79,  81,  84,  86,
        89,  90, 175,  91,  93,  95,  97,  99, 100, 101, 102, 103, 104,
       105, 106, 109, 110, 112, 108, 114,  83, 118, 113, 120, 119, 121,
       122, 123, 126, 256, 129, 130, 131, 132, 133, 134, 136, 137, 138,
       141, 273, 143, 144,  28, 147, 149, 150, 153, 156, 157, 158, 159,
       162, 221, 165, 166, 169, 170, 171, 173, 174, 117, 146, 183, 185,
       184, 188, 189, 191, 244, 193, 194, 195, 272, 197, 199, 198,  25,
       202, 203,  38, 276, 207, 209, 210, 211, 208, 216, 154, 176, 217,
       220, 222, 223, 213, 226, 230, 225, 229, 215, 231, 234, 235, 155,
       236, 237, 249, 251, 181])

In [None]:
# Whether each value of the column is in the given list
df_rel['Area Code'].isin([203])

0        False
1        False
2        False
3        False
4        False
         ...  
21472    False
21473    False
21474    False
21475    False
21476    False
Name: Area Code, Length: 21477, dtype: bool

In [None]:
# Index labels of the rows whose Area Code is 203
df_rel.index[df_rel['Area Code'].isin([203])]

Int64Index([17818, 17819, 17820, 17821, 17822, 17823, 17824, 17825, 17826,
            17827,
            ...
            17958, 17959, 17960, 17961, 17962, 17963, 17964, 17965, 17966,
            17967],
           dtype='int64', length=150)

In [None]:
# Number of values that are in the given list
df_rel['Area Code'].isin([203]).sum()

150

In [None]:
# Number of values that are in the given list
df_rel['Area Code'].isin([106]).sum()

148

In [None]:
# Every distinct value of the column and how often it occurs (frequency table)
df_rel['Area Code'].value_counts()

203    150
106    148
79     147
41     146
110    143
      ... 
193     91
213     90
176     86
2       83
122     75
Name: Area Code, Length: 174, dtype: int64

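`unique()`, `isin()`, and `value_counts()` can be compared side by side on a tiny hypothetical Series of area codes. `unique()` keeps the order of first appearance, whereas a Python `set` has no guaranteed order:

```python
import pandas as pd

codes = pd.Series([203, 2, 203, 106, 2, 203])  # hypothetical area codes

print(codes.unique().tolist())         # [203, 2, 106] (order of first appearance)
print(codes.isin([203]).sum())         # 3 rows with code 203
print(codes.isin([2, 106]).sum())      # 3 rows with code 2 or 106
print(codes.value_counts().to_dict())  # count per code, most frequent first
```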
In [None]:
# Descriptive statistics of the selected Series (numeric column)
df_rel['Y2007'].describe()

count     21373.000000
mean        508.482104
std        5298.939807
min           0.000000
25%           0.000000
50%           7.000000
75%          80.000000
max      402975.000000
Name: Y2007, dtype: float64

In [None]:
# Descriptive statistics of the selected Series (string column)
df_rel['Area'].describe()

count     21477
unique      174
top       Spain
freq        150
Name: Area, dtype: object

In [None]:
# Descriptive statistics of the whole DataFrame
# By default only numeric columns are included
df_rel.describe()

|  | Area Code | Element Code | latitude | longitude | Y1961 | Y1962 | Y1963 | Y1964 | Y1965 | Y1966 | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 21477.0 | 21477.0 | 21477.0 | 21477.0 | 17938.0 | 17938.0 | 17938.0 | 17938.0 | 17938.0 | 17938.0 | ... | 21128.0 | 21128.0 | 21373.0 | 21373.0 | 21373.0 | 21373.0 | 21373.0 | 21373.0 | 21477.0 | 21477.0 |
| mean | 125.449411 | 5211.687154 | 20.450613 | 15.794445 | 195.262069 | 200.78225 | 205.4646 | 209.925577 | 217.556751 | 225.988962 | ... | 486.690742 | 493.153256 | 496.319328 | 508.482104 | 522.844898 | 524.581996 | 535.492069 | 553.399242 | 560.569214 | 575.55748 |
| std | 72.868149 | 146.820079 | 24.628336 | 66.012104 | 1864.124336 | 1884.265591 | 1861.174739 | 1862.000116 | 2014.934333 | 2100.228354 | ... | 5001.782008 | 5100.057036 | 5134.819373 | 5298.939807 | 5496.697513 | 5545.939303 | 5721.089425 | 5883.071604 | 6047.950804 | 6218.379479 |
| min | 1.0 | 5142.0 | -40.9 | -172.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -169.0 | -246.0 |
| 25% | 63.0 | 5142.0 | 6.43 | -11.78 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 120.0 | 5142.0 | 20.59 | 19.15 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 6.0 | 6.0 | 7.0 | 7.0 | 7.0 | 7.0 | 7.0 | 8.0 | 8.0 | 8.0 |
| 75% | 188.0 | 5142.0 | 41.15 | 46.87 | 21.0 | 22.0 | 23.0 | 24.0 | 25.0 | 26.0 | ... | 75.0 | 77.0 | 78.0 | 80.0 | 82.0 | 83.0 | 83.0 | 86.0 | 88.0 | 90.0 |
| max | 276.0 | 5521.0 | 64.96 | 179.41 | 112227.0 | 109130.0 | 106356.0 | 104234.0 | 119378.0 | 118495.0 | ... | 360767.0 | 373694.0 | 388100.0 | 402975.0 | 425537.0 | 434724.0 | 451838.0 | 462696.0 | 479028.0 | 489299.0 |


In [None]:
# Descriptive statistics including the string columns
# String columns get different statistics (unique, top, freq)
df_rel.describe(include='all')

|  | Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 21356 | 21477.000000 | 21477 | 21477 | 21477 | 21477.000000 | 21477 | 21477 | 21477.000000 | 21477.000000 | ... | 21128.000000 | 21128.000000 | 21373.000000 | 21373.000000 | 21373.000000 | 21373.000000 | 21373.000000 | 21373.000000 | 21477.000000 | 21477.00000 |
| unique | 173 |  | 174 | 117 | 115 |  | 2 | 1 |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| top | ES |  | Spain | 2905 | Milk - Excluding Butter |  | Food | 1000 tonnes |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| freq | 150 |  | 150 | 347 | 558 |  | 17528 | 21477 |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| mean |  | 125.449411 |  |  |  | 5211.687154 |  |  | 20.450613 | 15.794445 | ... | 486.690742 | 493.153256 | 496.319328 | 508.482104 | 522.844898 | 524.581996 | 535.492069 | 553.399242 | 560.569214 | 575.55748 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| min |  | 1.000000 |  |  |  | 5142.000000 |  |  | -40.900000 | -172.100000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -169.000000 | -246.00000 |
| 25% |  | 63.000000 |  |  |  | 5142.000000 |  |  | 6.430000 | -11.780000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00000 |
| 50% |  | 120.000000 |  |  |  | 5142.000000 |  |  | 20.590000 | 19.150000 | ... | 6.000000 | 6.000000 | 7.000000 | 7.000000 | 7.000000 | 7.000000 | 7.000000 | 8.000000 | 8.000000 | 8.00000 |
| 75% |  | 188.000000 |  |  |  | 5142.000000 |  |  | 41.150000 | 46.870000 | ... | 75.000000 | 77.000000 | 78.000000 | 80.000000 | 82.000000 | 83.000000 | 83.000000 | 86.000000 | 88.000000 | 90.00000 |


In [None]:
# Transpose (.T): swap rows and columns across the main diagonal
df_rel.describe(include='all').T

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Area Abbreviation,21356,173,ES,150,,,,,,,
Area Code,21477.0,,,,125.449411,72.868149,1.0,63.0,120.0,188.0,276.0
Area,21477,174,Spain,150,,,,,,,
Item Code,21477,117,2905,347,,,,,,,
Item,21477,115,Milk - Excluding Butter,558,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
Y2009,21373.0,,,,524.581996,5545.939303,0.0,0.0,7.0,83.0,434724.0
Y2010,21373.0,,,,535.492069,5721.089425,0.0,0.0,7.0,83.0,451838.0
Y2011,21373.0,,,,553.399242,5883.071604,0.0,0.0,8.0,86.0,462696.0
Y2012,21477.0,,,,560.569214,6047.950804,-169.0,0.0,8.0,88.0,479028.0
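
To see where the extra `unique`/`top`/`freq` rows come from, here is a minimal sketch on a small hypothetical DataFrame (the column names and values are invented for illustration, not taken from `df_rel`). It compares the default `describe()` with `describe(include='all')` and its transpose.

```python
import pandas as pd

# Hypothetical toy data: one string column and one numeric column
df = pd.DataFrame({
    "item": ["wheat", "rice", "wheat", "maize"],
    "qty": [10, 20, 30, 40],
})

# Default: only numeric columns (count, mean, std, min, quartiles, max)
print(df.describe())

# include='all' adds unique / top / freq rows for the string column;
# cells that do not apply to a column are shown as NaN
desc = df.describe(include="all")
print(desc)

# .T swaps rows and columns, so each original column becomes one row
print(desc.T)
```

The transposed form is often easier to read when the DataFrame has many columns, as in `df_rel`.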


## Selecting and Handling Values (Selection and Indexing)

**1) Series**
> - Works much like a NumPy array, but a Series can also be indexed by labels that are not integers
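
A minimal sketch of label-based indexing on a Series, assuming a small hypothetical Series with string labels (not part of the original data):

```python
import pandas as pd

# Hypothetical Series with string labels instead of 0..n-1
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

print(s["c"])          # single label -> scalar
print(s[["a", "d"]])   # list of labels -> Series
print(s[s > 15])       # boolean mask -> only values above 15
print(s.iloc[0])       # explicit position -> first value
print(s["b":"d"])      # label slice: both ends are included
```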

In [None]:
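# Copy a column into a separate Series
# Without copy(), edits to one object may also show up in the other
# (much like a pointer in C), depending on the pandas version
df_series = df_rel['Area Abbreviation'].copy()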
df_series

0        AF
1        AF
2        AF
3        AF
4        AF
         ..
21472    ZW
21473    ZW
21474    ZW
21475    ZW
21476    ZW
Name: Area Abbreviation, Length: 21477, dtype: object
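
A minimal sketch of why `.copy()` is used, on a hypothetical three-row DataFrame: editing the copy leaves the original column untouched. Without `.copy()` the outcome depends on the pandas version (newer versions with Copy-on-Write behave as if a copy was made), so calling `.copy()` makes the intent explicit.

```python
import pandas as pd

df = pd.DataFrame({"code": ["AF", "AF", "ZW"]})  # hypothetical toy data

# .copy() creates independent data: editing the copy leaves df untouched
col_copy = df["code"].copy()
col_copy.iloc[0] = "Test"

print(df["code"].tolist())   # original column is unchanged
print(col_copy.tolist())     # only the copy was modified
```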

In [None]:
# Print the samples with index labels 1 and 3
df_series[[1,3]]

1    AF
3    AF
Name: Area Abbreviation, dtype: object

In [None]:
# Comparison operators work on the index and return a boolean array
df_series.index < 10

array([ True,  True,  True, ..., False, False, False])

In [None]:
# Keep only the entries whose index is less than 10
df_series[df_series.index < 10]

0    AF
1    AF
2    AF
3    AF
4    AF
5    AF
6    AF
7    AF
8    AF
9    AF
Name: Area Abbreviation, dtype: object

In [None]:
# Extract positions 1 (inclusive) through 5 (exclusive)
df_series[1:5]

1    AF
2    AF
3    AF
4    AF
Name: Area Abbreviation, dtype: object

In [None]:
# Overwrite the values at positions 1 (inclusive) through 3 (exclusive)
df_series[1:3] = 'Test'
df_series

0          AF
1        Test
2        Test
3          AF
4          AF
         ... 
21472      ZW
21473      ZW
21474      ZW
21475      ZW
21476      ZW
Name: Area Abbreviation, Length: 21477, dtype: object

**2) DataFrame**
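- [There are two main options for selecting and indexing values; either can select a specific list of items or a single value](http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label)
> - **iloc:** selects by integer position, regardless of labels
>> - Returns a Series when a single row is selected and a DataFrame when several rows are selected
>> - When slicing rows or columns (e.g. `[1:5]`), the start is included and the end is excluded, so positions 1 to 4 are returned
> - **loc:** selects by the actual labels (index labels / column names), regardless of position
>> - Accepts labels, lists of labels, label slices, or boolean (logical) arrays and returns a Series or DataFrame
>> - Unlike iloc, a label slice such as `loc[1:10]` includes both endpoints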
<center><img src='Image/Basic_Pandas_Selection.PNG' width='600'></center>
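
The difference between iloc and loc only becomes visible when the index labels do not match the positions. A minimal sketch on a hypothetical DataFrame whose index labels are 10, 20, 30, 40:

```python
import pandas as pd

# Hypothetical DataFrame whose index labels differ from the positions
df = pd.DataFrame(
    {"item": ["wheat", "rice", "barley", "maize"], "qty": [10, 20, 30, 40]},
    index=[10, 20, 30, 40],
)

print(df.iloc[0])       # position 0 -> row labeled 10, returned as a Series
print(df.loc[20])       # label 20 -> the row at position 1
print(df.iloc[0:2])     # positions 0 and 1 (end excluded)
print(df.loc[10:30])    # labels 10, 20 and 30 (end included)
```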


In [None]:
# .iloc[position]: access data by integer position
df_rel.iloc[0] # returns a Series

Area Abbreviation                    AF
Area Code                             2
Area                        Afghanistan
Item Code                          2511
Item                 Wheat and products
                            ...        
Y2009                            4538.0
Y2010                            4605.0
Y2011                            4711.0
Y2012                              4810
Y2013                              4895
Name: 0, Length: 63, dtype: object

In [None]:
# Print several rows
df_rel.iloc[0:5] # returns a DataFrame

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,AF,2,Afghanistan,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,...,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0,4711.0,4810,4895
1,AF,2,Afghanistan,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
2,AF,2,Afghanistan,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
3,AF,2,Afghanistan,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
4,AF,2,Afghanistan,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200


In [None]:
# Print rows at a fixed step (every second row among the first 10)
df_rel.iloc[:10:2]

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,AF,2,Afghanistan,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,...,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0,4711.0,4810,4895
2,AF,2,Afghanistan,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
4,AF,2,Afghanistan,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200
6,AF,2,Afghanistan,2517,Millet and products,5142,Food,1000 tonnes,33.94,67.71,...,15.0,21.0,11.0,19.0,21.0,18.0,14.0,14.0,14,12
8,AF,2,Afghanistan,2531,Potatoes and products,5142,Food,1000 tonnes,33.94,67.71,...,276.0,294.0,294.0,260.0,242.0,250.0,192.0,169.0,196,230


In [None]:
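# Access the rows at positions 0, 10 and 20 with iloc
# To select several positions, pass them as a list (hence the double brackets)
# With single brackets, iloc[0, 10] would be read as (row 0, column 10)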
df_rel.iloc[[0, 10, 20]]

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,AF,2,Afghanistan,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,...,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0,4711.0,4810,4895
10,AF,2,Afghanistan,2537,Sugar beet,5521,Feed,1000 tonnes,33.94,67.71,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
20,AF,2,Afghanistan,2571,Soyabean Oil,5142,Food,1000 tonnes,33.94,67.71,...,6.0,35.0,18.0,21.0,11.0,6.0,15.0,16.0,16,16


In [None]:
# Print the last row (passed as a list, so a DataFrame is returned)
df_rel.iloc[[-1]]

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
21476,ZW,181,Zimbabwe,2928,Miscellaneous,5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# Select a column by its position
# ':' means from the start to the end (all rows)
# Selecting a single column returns a Series
df_rel.iloc[:,0]

0        AF
1        AF
2        AF
3        AF
4        AF
         ..
21472    ZW
21473    ZW
21474    ZW
21475    ZW
21476    ZW
Name: Area Abbreviation, Length: 21477, dtype: object

In [None]:
# With double brackets, even a single column is returned as a DataFrame
df_rel.iloc[:,[1]]

Unnamed: 0,Area Code
0,2
1,2
2,2
3,2
4,2
...,...
21472,181
21473,181
21474,181
21475,181
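
A minimal sketch of how the bracket style decides the return type, using a hypothetical two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"area": ["AF", "ZW"], "code": [2, 181]})  # hypothetical data

one_col_series = df.iloc[:, 1]    # scalar position -> Series
one_col_frame = df.iloc[:, [1]]   # list of positions -> DataFrame
last_row_frame = df.iloc[[-1]]    # list containing -1 -> last row as a DataFrame

print(type(one_col_series).__name__, one_col_series.shape)
print(type(one_col_frame).__name__, one_col_frame.shape)
print(last_row_frame)
```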


In [None]:
# Select only the last column by position
df_rel.iloc[:,[-1]]

Unnamed: 0,Y2013
0,4895
1,422
2,360
3,89
4,200
...,...
21472,451
21473,15
21474,40
21475,0


In [None]:
# Extract columns from position 0 (inclusive) to 2 (exclusive)
df_rel.iloc[:,0:2] # columns 0 and 1

Unnamed: 0,Area Abbreviation,Area Code
0,AF,2
1,AF,2
2,AF,2
3,AF,2
4,AF,2
...,...,...
21472,ZW,181
21473,ZW,181
21474,ZW,181
21475,ZW,181


In [None]:
df_rel.iloc[:,[0,2]]  # columns 0 and 2

Unnamed: 0,Area Abbreviation,Area
0,AF,Afghanistan
1,AF,Afghanistan
2,AF,Afghanistan
3,AF,Afghanistan
4,AF,Afghanistan
...,...,...
21472,ZW,Zimbabwe
21473,ZW,Zimbabwe
21474,ZW,Zimbabwe
21475,ZW,Zimbabwe


In [None]:
# Extract specific rows and columns by position
# rows at positions 0, 3, 6, 24
# columns at positions 0, 5, 6
df_rel.iloc[[0,3,6,24],[0,5,6]]

Unnamed: 0,Area Abbreviation,Element Code,Element
0,AF,5142,Food
3,AF,5142,Food
6,AF,5142,Food
24,AF,5142,Food


In [None]:
# rows from position 0 (inclusive) to 5 (exclusive)
# columns from position 5 (inclusive) to 8 (exclusive)
df_rel.iloc[0:5, 5:8]

Unnamed: 0,Element Code,Element,Unit
0,5142,Food,1000 tonnes
1,5142,Food,1000 tonnes
2,5521,Feed,1000 tonnes
3,5142,Food,1000 tonnes
4,5521,Feed,1000 tonnes


In [None]:
# Print the row whose index label is 0, not the row at position 0
df_rel.loc[0]

Area Abbreviation                    AF
Area Code                             2
Area                        Afghanistan
Item Code                          2511
Item                 Wheat and products
                            ...        
Y2009                            4538.0
Y2010                            4605.0
Y2011                            4711.0
Y2012                              4810
Y2013                              4895
Name: 0, Length: 63, dtype: object

In [None]:
# Print the row whose index label is 1, not the row at position 1
# Double brackets, so the result is a DataFrame
df_rel.loc[[1]]

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
1,AF,2,Afghanistan,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422


In [None]:
# Print the rows with index labels 1 and 3
df_rel.loc[[1,3]] # labels, not positions!

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
1,AF,2,Afghanistan,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
3,AF,2,Afghanistan,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89


In [None]:
### Selecting only a specific column?
df_rel.loc[[1,3],'Area'] # OK
# df_rel.loc[[1,3],2] # error: there is no column labeled 2
df_rel.iloc[[1,3],2] # OK
### df.iloc[] does not use row and column names; it indexes rows and columns by their integer positions.
### https://bearwoong.tistory.com/65

1    Afghanistan
3    Afghanistan
Name: Area, dtype: object

In [None]:
### Combining iloc and loc (only the last expression's result is displayed)
df_rel.iloc[[1,3]]
df_rel.iloc[[1,3]].loc[:,'Area'] # Series
df_rel.iloc[[1,3]].loc[:,['Area']] # DataFrame

Unnamed: 0,Area
1,Afghanistan
3,Afghanistan


In [None]:
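# Columns named 'Item' and 'Y2013'
# Passing only column names to loc raises an error:
# loc's first key is always the row label, so loc[['Item','Y2013']] looks for rows with those labels (KeyError)
# With loc, give row labels alone or row labels plus column names, e.g. loc[:, ['Item','Y2013']]
# df_rel.loc[['Item','Y2013']]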

In [None]:
df_rel[['Item','Y2013']]

Unnamed: 0,Item,Y2013
0,Wheat and products,4895
1,Rice (Milled Equivalent),422
2,Barley and products,360
3,Barley and products,89
4,Maize and products,200
...,...,...
21472,Milk - Excluding Butter,451
21473,"Fish, Seafood",15
21474,"Fish, Seafood",40
21475,"Aquatic Products, Other",0
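
A minimal sketch of the loc pitfall described above, on a hypothetical two-row DataFrame: passing only column names to `loc` raises a `KeyError`, while `loc[:, cols]` and plain `df[cols]` select the same columns.

```python
import pandas as pd

df = pd.DataFrame({"Item": ["wheat", "rice"], "Y2013": [4895, 422]})  # toy data

# loc's first key is always the row label, so this looks for rows
# named 'Item' and 'Y2013' and raises a KeyError
raised_key_error = False
try:
    df.loc[["Item", "Y2013"]]
except KeyError as err:
    raised_key_error = True
    print("KeyError:", err)

# Select all rows with ':' and pass the column labels second
cols_via_loc = df.loc[:, ["Item", "Y2013"]]
# Or use plain [] with a list of column names
cols_via_getitem = df[["Item", "Y2013"]]
print(cols_via_loc.equals(cols_via_getitem))  # True
```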


In [None]:
# Rows with index labels 1 and 3
# Columns named 'Item' and 'Y2013'
df_rel.loc[[1,3],['Item','Y2013']]

Unnamed: 0,Item,Y2013
1,Rice (Milled Equivalent),422
3,Barley and products,89


In [None]:
# Rows with index labels 1 and 3
# Columns from 'Item' through 'Y2013'
df_rel.loc[[1,3],'Item':'Y2013']

Unnamed: 0,Item,Element Code,Element,Unit,latitude,longitude,Y1961,Y1962,Y1963,Y1964,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
1,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,183.0,183.0,182.0,220.0,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
3,Barley and products,5142,Food,1000 tonnes,33.94,67.71,237.0,237.0,237.0,238.0,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89


In [None]:
# Rows with index labels 1 through 10 (loc slices include the end label)
# Columns from 'Item' through 'Y2013'
df_rel.loc[1:10,'Item':'Y2013']

Unnamed: 0,Item,Element Code,Element,Unit,latitude,longitude,Y1961,Y1962,Y1963,Y1964,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
1,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,183.0,183.0,182.0,220.0,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
2,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,76.0,76.0,76.0,76.0,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
3,Barley and products,5142,Food,1000 tonnes,33.94,67.71,237.0,237.0,237.0,238.0,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
4,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,210.0,210.0,214.0,216.0,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200
5,Maize and products,5142,Food,1000 tonnes,33.94,67.71,403.0,403.0,410.0,415.0,...,231.0,67.0,82.0,67.0,69.0,71.0,82.0,73.0,77,76
6,Millet and products,5142,Food,1000 tonnes,33.94,67.71,17.0,18.0,19.0,20.0,...,15.0,21.0,11.0,19.0,21.0,18.0,14.0,14.0,14,12
7,"Cereals, Other",5142,Food,1000 tonnes,33.94,67.71,0.0,0.0,0.0,0.0,...,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0,0
8,Potatoes and products,5142,Food,1000 tonnes,33.94,67.71,111.0,97.0,103.0,110.0,...,276.0,294.0,294.0,260.0,242.0,250.0,192.0,169.0,196,230
9,Sugar cane,5521,Feed,1000 tonnes,33.94,67.71,45.0,45.0,45.0,45.0,...,50.0,29.0,61.0,65.0,54.0,114.0,83.0,83.0,69,81
10,Sugar beet,5521,Feed,1000 tonnes,33.94,67.71,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# The columns attribute (an Index) supports indexing and slicing
# Only the last expression's result is displayed
df_rel.columns[:5]
df_rel.columns[2:5]

Index(['Area', 'Item Code', 'Item'], dtype='object')

In [None]:
# Easy column filtering: select the first five columns
df_rel[df_rel.columns[:5]]

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item
0,AF,2,Afghanistan,2511,Wheat and products
1,AF,2,Afghanistan,2805,Rice (Milled Equivalent)
2,AF,2,Afghanistan,2513,Barley and products
3,AF,2,Afghanistan,2513,Barley and products
4,AF,2,Afghanistan,2514,Maize and products
...,...,...,...,...,...
21472,ZW,181,Zimbabwe,2948,Milk - Excluding Butter
21473,ZW,181,Zimbabwe,2960,"Fish, Seafood"
21474,ZW,181,Zimbabwe,2960,"Fish, Seafood"
21475,ZW,181,Zimbabwe,2961,"Aquatic Products, Other"


In [None]:
# Loop over the column names, access each column as a Series,
# and print the number of unique values in each column
for col in df_rel.columns:
    print(col, df_rel[col].nunique())

Area Abbreviation 173
Area Code 174
Area 174
Item Code 117
Item 115
Element Code 2
Element 2
Unit 1
latitude 173
longitude 174
Y1961 1197
Y1962 1215
Y1963 1209
Y1964 1236
Y1965 1259
Y1966 1263
Y1967 1283
Y1968 1300
Y1969 1309
Y1970 1322
Y1971 1351
Y1972 1360
Y1973 1374
Y1974 1388
Y1975 1405
Y1976 1410
Y1977 1411
Y1978 1463
Y1979 1473
Y1980 1477
Y1981 1469
Y1982 1508
Y1983 1528
Y1984 1540
Y1985 1538
Y1986 1563
Y1987 1592
Y1988 1613
Y1989 1622
Y1990 1621
Y1991 1632
Y1992 1747
Y1993 1785
Y1994 1796
Y1995 1796
Y1996 1807
Y1997 1810
Y1998 1844
Y1999 1859
Y2000 1892
Y2001 1881
Y2002 1909
Y2003 1935
Y2004 1944
Y2005 1963
Y2006 1987
Y2007 1994
Y2008 2028
Y2009 2029
Y2010 2046
Y2011 2081
Y2012 2084
Y2013 2107
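
The same per-column counts can be obtained without a loop: `DataFrame.nunique()` returns a Series mapping each column name to its number of distinct values (NaN is excluded by default). A minimal sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    "area": ["AF", "AF", "ZW"],
    "unit": ["1000 tonnes"] * 3,
    "y2013": [4895, 422, 4895],
})  # hypothetical toy data

# One call instead of a for loop: column name -> number of distinct values
counts = df.nunique()
print(counts)
```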


In [None]:
df_rel['Item'] 

0              Wheat and products
1        Rice (Milled Equivalent)
2             Barley and products
3             Barley and products
4              Maize and products
                   ...           
21472     Milk - Excluding Butter
21473               Fish, Seafood
21474               Fish, Seafood
21475     Aquatic Products, Other
21476               Miscellaneous
Name: Item, Length: 21477, dtype: object

In [None]:
# Check whether each value of the 'Item' Series equals 'Sugar beet'
df_rel['Item'] == 'Sugar beet'

0        False
1        False
2        False
3        False
4        False
         ...  
21472    False
21473    False
21474    False
21475    False
21476    False
Name: Item, Length: 21477, dtype: bool

In [None]:
# Select rows with a condition
# Use whether 'Item' equals 'Sugar beet' as a mask and print the rows whose value is True
df_rel.loc[df_rel['Item'] == 'Sugar beet']

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
10,AF,2,Afghanistan,2537,Sugar beet,5521,Feed,1000 tonnes,33.94,67.71,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
103,AL,3,Albania,2537,Sugar beet,5521,Feed,1000 tonnes,41.15,20.17,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1,1
699,AM,1,Armenia,2537,Sugar beet,5521,Feed,1000 tonnes,40.07,45.04,...,1.0,2.0,1.0,2.0,1.0,3.0,1.0,1.0,0,1
832,AU,10,Australia,2537,Sugar beet,5521,Feed,1000 tonnes,-25.27,133.78,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
1099,AZ,52,Azerbaijan,2537,Sugar beet,5521,Feed,1000 tonnes,40.14,47.58,...,1.0,1.0,4.0,3.0,4.0,4.0,6.0,6.0,4,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20020,AE,225,United Arab Emirates,2537,Sugar beet,5521,Feed,1000 tonnes,23.42,53.85,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0,0
20676,UZ,235,Uzbekistan,2537,Sugar beet,5521,Feed,1000 tonnes,41.38,64.59,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
20900,VE,236,Venezuela (Bolivarian Republic of),2537,Sugar beet,5142,Food,1000 tonnes,6.42,-66.59,...,17.0,20.0,23.0,21.0,21.0,21.0,30.0,35.0,20,22
21135,YE,249,Yemen,2537,Sugar beet,5521,Feed,1000 tonnes,15.55,48.52,...,0.0,0.0,1.0,13.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# Works without loc as well: the boolean Series marks each index label True/False, so the rows can be selected directly
df_rel[df_rel['Item'] == 'Sugar beet']

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
10,AF,2,Afghanistan,2537,Sugar beet,5521,Feed,1000 tonnes,33.94,67.71,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
103,AL,3,Albania,2537,Sugar beet,5521,Feed,1000 tonnes,41.15,20.17,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1,1
699,AM,1,Armenia,2537,Sugar beet,5521,Feed,1000 tonnes,40.07,45.04,...,1.0,2.0,1.0,2.0,1.0,3.0,1.0,1.0,0,1
832,AU,10,Australia,2537,Sugar beet,5521,Feed,1000 tonnes,-25.27,133.78,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
1099,AZ,52,Azerbaijan,2537,Sugar beet,5521,Feed,1000 tonnes,40.14,47.58,...,1.0,1.0,4.0,3.0,4.0,4.0,6.0,6.0,4,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20020,AE,225,United Arab Emirates,2537,Sugar beet,5521,Feed,1000 tonnes,23.42,53.85,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0,0
20676,UZ,235,Uzbekistan,2537,Sugar beet,5521,Feed,1000 tonnes,41.38,64.59,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
20900,VE,236,Venezuela (Bolivarian Republic of),2537,Sugar beet,5142,Food,1000 tonnes,6.42,-66.59,...,17.0,20.0,23.0,21.0,21.0,21.0,30.0,35.0,20,22
21135,YE,249,Yemen,2537,Sugar beet,5521,Feed,1000 tonnes,15.55,48.52,...,0.0,0.0,1.0,13.0,0.0,0.0,0.0,0.0,0,0
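
A minimal sketch of boolean-mask selection on a hypothetical three-row DataFrame (values invented for illustration). It checks that `df[mask]` and `df.loc[mask]` agree, and shows how two conditions are combined with `&`; each condition needs its own parentheses because `&` binds more tightly than `==`.

```python
import pandas as pd

df = pd.DataFrame({
    "Area": ["Afghanistan", "Albania", "Venezuela"],
    "Item": ["Sugar beet", "Wheat", "Sugar beet"],
    "Element": ["Feed", "Food", "Food"],
})  # hypothetical toy data

mask = df["Item"] == "Sugar beet"      # boolean Series aligned to the index
print(df[mask].equals(df.loc[mask]))   # True: both keep rows where mask is True

# Combine conditions with & (and) or | (or); parenthesize each condition
both = df.loc[(df["Item"] == "Sugar beet") & (df["Element"] == "Food"), "Area"]
print(both.tolist())
```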


In [None]:
# Rows where 'Item' equals 'Sugar beet'
# Select only the 'Area' column, returned as a Series
df_rel.loc[df_rel['Item'] == 'Sugar beet', 'Area']

10                              Afghanistan
103                                 Albania
699                                 Armenia
832                               Australia
1099                             Azerbaijan
                        ...                
20020                  United Arab Emirates
20676                            Uzbekistan
20900    Venezuela (Bolivarian Republic of)
21135                                 Yemen
21374                              Zimbabwe
Name: Area, Length: 66, dtype: object

In [None]:
# Rows where 'Item' equals 'Sugar beet'
# Select only 'Area' with double brackets, returned as a DataFrame
df_rel.loc[df_rel['Item'] == 'Sugar beet', ['Area']]

Unnamed: 0,Area
10,Afghanistan
103,Albania
699,Armenia
832,Australia
1099,Azerbaijan
...,...
20020,United Arab Emirates
20676,Uzbekistan
20900,Venezuela (Bolivarian Republic of)
21135,Yemen


In [None]:
# Rows where 'Item' equals 'Sugar beet'
# Print the 'Area', 'Item' and 'latitude' columns
df_rel.loc[df_rel['Item'] == 'Sugar beet', ['Area', 'Item', 'latitude']]

Unnamed: 0,Area,Item,latitude
10,Afghanistan,Sugar beet,33.94
103,Albania,Sugar beet,41.15
699,Armenia,Sugar beet,40.07
832,Australia,Sugar beet,-25.27
1099,Azerbaijan,Sugar beet,40.14
...,...,...,...
20020,United Arab Emirates,Sugar beet,23.42
20676,Uzbekistan,Sugar beet,41.38
20900,Venezuela (Bolivarian Republic of),Sugar beet,6.42
21135,Yemen,Sugar beet,15.55


In [None]:
# Rows where 'Item' equals 'Sugar beet'
# Print the columns from 'Area' through 'latitude'
df_rel.loc[df_rel['Item'] == 'Sugar beet', 'Area':'latitude']

Unnamed: 0,Area,Item Code,Item,Element Code,Element,Unit,latitude
10,Afghanistan,2537,Sugar beet,5521,Feed,1000 tonnes,33.94
103,Albania,2537,Sugar beet,5521,Feed,1000 tonnes,41.15
699,Armenia,2537,Sugar beet,5521,Feed,1000 tonnes,40.07
832,Australia,2537,Sugar beet,5521,Feed,1000 tonnes,-25.27
1099,Azerbaijan,2537,Sugar beet,5521,Feed,1000 tonnes,40.14
...,...,...,...,...,...,...,...
20020,United Arab Emirates,2537,Sugar beet,5521,Feed,1000 tonnes,23.42
20676,Uzbekistan,2537,Sugar beet,5521,Feed,1000 tonnes,41.38
20900,Venezuela (Bolivarian Republic of),2537,Sugar beet,5142,Food,1000 tonnes,6.42
21135,Yemen,2537,Sugar beet,5521,Feed,1000 tonnes,15.55


In [None]:
# Rows where 'Item' equals 'Sugar beet'
# Select only 'Area' with double brackets, returned as a DataFrame
# Print each unique value and its count
df_rel.loc[df_rel['Item'] == 'Sugar beet', ['Area']].value_counts()

Area              
Germany               2
Afghanistan           1
Russian Federation    1
Mali                  1
Malta                 1
                     ..
Italy                 1
Jordan                1
Kazakhstan            1
Kenya                 1
Zimbabwe              1
Length: 65, dtype: int64

In [None]:
# Print each unique value and its count, sorted by frequency in ascending order
df_rel.loc[df_rel['Item'] == 'Sugar beet', ['Area']].value_counts(ascending=True)

Area       
Afghanistan    1
Lithuania      1
Mali           1
Malta          1
Mexico         1
              ..
Jordan         1
Kazakhstan     1
Kuwait         1
Zimbabwe       1
Germany        2
Length: 65, dtype: int64
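
A minimal sketch of the sort order of `value_counts`, on a hypothetical Series: by default the most frequent value comes first, and `ascending=True` reverses this so it comes last.

```python
import pandas as pd

s = pd.Series(["Germany", "Mali", "Germany", "Malta"])  # hypothetical data

desc_counts = s.value_counts()                # default: most frequent first
asc_counts = s.value_counts(ascending=True)   # least frequent first
print(desc_counts)
print(asc_counts)
```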

In [None]:
# Convert to a NumPy array
df_rel.loc[df_rel['Item'] == 'Sugar beet', ['Area']].values

array([['Afghanistan'],
       ['Albania'],
       ['Armenia'],
       ['Australia'],
       ['Azerbaijan'],
       ['Belarus'],
       ['Botswana'],
       ['Brunei Darussalam'],
       ['Bulgaria'],
       ['Canada'],
       ['China, Hong Kong SAR'],
       ['China, Macao SAR'],
       ['China, mainland'],
       ['Colombia'],
       ["C?e d'Ivoire"],
       ['Croatia'],
       ['Cyprus'],
       ['Czechia'],
       ['Ecuador'],
       ['Egypt'],
       ['El Salvador'],
       ['Estonia'],
       ['Germany'],
       ['Germany'],
       ['Greece'],
       ['Honduras'],
       ['Iceland'],
       ['Iraq'],
       ['Italy'],
       ['Jordan'],
       ['Kazakhstan'],
       ['Kenya'],
       ['Kuwait'],
       ['Latvia'],
       ['Lebanon'],
       ['Lithuania'],
       ['Mali'],
       ['Malta'],
       ['Mexico'],
       ['Montenegro'],
       ['Nepal'],
       ['New Zealand'],
       ['Niger'],
       ['Norway'],
       ['Oman'],
       ['Philippines'],
       ['Republic of Korea'],
 

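- **Deleting rows and columns:** use the `drop` function
> - To delete one or more columns, pass the column name(s) and set `axis=1`
> - `drop` returns a new DataFrame with the columns removed; to modify the original DataFrame, set the **`inplace`** parameter of `drop` to True
> - Calling `drop` with `axis=0` removes rows instead
> - `drop` removes rows by label, not by integer position; to delete rows by position, look up their labels with `df.index[...]` or keep the remaining rows with iloc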
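
The rules above can be sketched on a hypothetical DataFrame whose index labels (100, 200, 300) differ from the positions, which makes the label-versus-position distinction visible:

```python
import pandas as pd

df = pd.DataFrame(
    {"Area": ["AF", "AL", "AM"], "Y2012": [1, 2, 3], "Y2013": [4, 5, 6]},
    index=[100, 200, 300],
)  # hypothetical data with non-default index labels

no_cols = df.drop(["Y2012", "Y2013"], axis=1)   # same as columns=[...]
no_row = df.drop(200, axis=0)                   # drop by label, not position
# To drop by position, translate positions to labels via df.index
no_first_two = df.drop(df.index[[0, 1]])

print(no_cols, no_row, no_first_two, sep="\n\n")
```

None of these calls modify `df` itself, since `inplace` is left at its default.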

In [None]:
# Drop the 'Area' column
df_rel.drop("Area", axis=1)

Unnamed: 0,Area Abbreviation,Area Code,Item Code,Item,Element Code,Element,Unit,latitude,longitude,Y1961,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,AF,2,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,1928.0,...,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0,4711.0,4810,4895
1,AF,2,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,183.0,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
2,AF,2,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,76.0,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
3,AF,2,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,237.0,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
4,AF,2,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,210.0,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21472,ZW,181,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,230.0,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21473,ZW,181,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,27.0,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21474,ZW,181,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,6.0,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21475,ZW,181,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# Another way to drop the 'Area' column
df_rel.drop(columns='Area')

Unnamed: 0,Area Abbreviation,Area Code,Item Code,Item,Element Code,Element,Unit,latitude,longitude,Y1961,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,AF,2,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,1928.0,...,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0,4711.0,4810,4895
1,AF,2,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,183.0,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
2,AF,2,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,76.0,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
3,AF,2,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,237.0,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
4,AF,2,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,210.0,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21472,ZW,181,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,230.0,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21473,ZW,181,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,27.0,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21474,ZW,181,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,6.0,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21475,ZW,181,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# Check the column names
df_rel.columns

Index(['Area Abbreviation', 'Area Code', 'Area', 'Item Code', 'Item',
       'Element Code', 'Element', 'Unit', 'latitude', 'longitude', 'Y1961',
       'Y1962', 'Y1963', 'Y1964', 'Y1965', 'Y1966', 'Y1967', 'Y1968', 'Y1969',
       'Y1970', 'Y1971', 'Y1972', 'Y1973', 'Y1974', 'Y1975', 'Y1976', 'Y1977',
       'Y1978', 'Y1979', 'Y1980', 'Y1981', 'Y1982', 'Y1983', 'Y1984', 'Y1985',
       'Y1986', 'Y1987', 'Y1988', 'Y1989', 'Y1990', 'Y1991', 'Y1992', 'Y1993',
       'Y1994', 'Y1995', 'Y1996', 'Y1997', 'Y1998', 'Y1999', 'Y2000', 'Y2001',
       'Y2002', 'Y2003', 'Y2004', 'Y2005', 'Y2006', 'Y2007', 'Y2008', 'Y2009',
       'Y2010', 'Y2011', 'Y2012', 'Y2013'],
      dtype='object')

In [None]:
# 'Area' was not removed from df_rel because inplace=True was not passed
# pandas returns a new object by default, so deleting in place requires an explicit parameter
df_test = df_rel.copy()
df_test.drop("Area", axis=1, inplace=True)
# equivalent to
# df_test = df_test.drop('Area', axis=1)

In [None]:
df_test.columns

Index(['Area Abbreviation', 'Area Code', 'Item Code', 'Item', 'Element Code',
       'Element', 'Unit', 'latitude', 'longitude', 'Y1961', 'Y1962', 'Y1963',
       'Y1964', 'Y1965', 'Y1966', 'Y1967', 'Y1968', 'Y1969', 'Y1970', 'Y1971',
       'Y1972', 'Y1973', 'Y1974', 'Y1975', 'Y1976', 'Y1977', 'Y1978', 'Y1979',
       'Y1980', 'Y1981', 'Y1982', 'Y1983', 'Y1984', 'Y1985', 'Y1986', 'Y1987',
       'Y1988', 'Y1989', 'Y1990', 'Y1991', 'Y1992', 'Y1993', 'Y1994', 'Y1995',
       'Y1996', 'Y1997', 'Y1998', 'Y1999', 'Y2000', 'Y2001', 'Y2002', 'Y2003',
       'Y2004', 'Y2005', 'Y2006', 'Y2007', 'Y2008', 'Y2009', 'Y2010', 'Y2011',
       'Y2012', 'Y2013'],
      dtype='object')
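
A minimal sketch of the two styles on a hypothetical one-row DataFrame. Note that `drop(..., inplace=True)` returns `None`, so writing `df = df.drop(..., inplace=True)` would replace the DataFrame with `None`; use either reassignment or `inplace=True`, not both.

```python
import pandas as pd

df = pd.DataFrame({"Area": ["AF"], "Item": ["Wheat"]})  # hypothetical data

result = df.drop("Area", axis=1)   # returns a new DataFrame; df is unchanged
print(list(df.columns))            # still has both columns

ret = df.drop("Area", axis=1, inplace=True)  # modifies df itself
print(ret)                                   # None
print(list(df.columns))                      # 'Area' is gone
```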

In [None]:
# Drop several columns: "Y2011", "Y2012", "Y2013"
df_rel.drop(["Y2011", "Y2012", "Y2013"], axis=1)

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2001,Y2002,Y2003,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010
0,AF,2,Afghanistan,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,...,2668.0,2776.0,3095.0,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0
1,AF,2,Afghanistan,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,...,411.0,448.0,460.0,419.0,445.0,546.0,455.0,490.0,415.0,442.0
2,AF,2,Afghanistan,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,...,29.0,70.0,48.0,58.0,236.0,262.0,263.0,230.0,379.0,315.0
3,AF,2,Afghanistan,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,...,83.0,122.0,144.0,185.0,43.0,44.0,48.0,62.0,55.0,60.0
4,AF,2,Afghanistan,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,...,48.0,89.0,63.0,120.0,208.0,233.0,249.0,247.0,195.0,178.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21472,ZW,181,Zimbabwe,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,439.0,360.0,386.0,373.0,357.0,359.0,356.0,341.0,385.0,418.0
21473,ZW,181,Zimbabwe,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,1.0,0.0,5.0,4.0,9.0,6.0,9.0,5.0,15.0
21474,ZW,181,Zimbabwe,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,16.0,14.0,18.0,14.0,17.0,14.0,15.0,18.0,29.0
21475,ZW,181,Zimbabwe,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
# Drop the rows with index labels 0, 1 and 5
df_rel.drop([0,1,5], axis=0)

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
2,AF,2,Afghanistan,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
3,AF,2,Afghanistan,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
4,AF,2,Afghanistan,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200
6,AF,2,Afghanistan,2517,Millet and products,5142,Food,1000 tonnes,33.94,67.71,...,15.0,21.0,11.0,19.0,21.0,18.0,14.0,14.0,14,12
7,AF,2,Afghanistan,2520,"Cereals, Other",5142,Food,1000 tonnes,33.94,67.71,...,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21472,ZW,181,Zimbabwe,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21473,ZW,181,Zimbabwe,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21474,ZW,181,Zimbabwe,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21475,ZW,181,Zimbabwe,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
### Dropping rows by position
# https://sparkbyexamples.com/pandas/pandas-drop-rows-from-dataframe/#:~:text=In%20order%20to%20remove%20the,the%20last%20row%20use%20df.
df_rel.drop(df_rel.index[[0,1,5]]) # (1) use df.index to turn positions 0, 1, 5 into labels

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
2,AF,2,Afghanistan,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
3,AF,2,Afghanistan,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
4,AF,2,Afghanistan,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200
6,AF,2,Afghanistan,2517,Millet and products,5142,Food,1000 tonnes,33.94,67.71,...,15.0,21.0,11.0,19.0,21.0,18.0,14.0,14.0,14,12
7,AF,2,Afghanistan,2520,"Cereals, Other",5142,Food,1000 tonnes,33.94,67.71,...,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21472,ZW,181,Zimbabwe,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21473,ZW,181,Zimbabwe,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21474,ZW,181,Zimbabwe,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21475,ZW,181,Zimbabwe,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
### (2) Using iloc https://velog.io/@dlskawns/Dataframe-%EB%82%B4-Column-row-%EC%84%A0%ED%83%9D-%EC%A0%9C%EA%B1%B0-%EC%B6%94%EA%B0%80-%EB%B3%80%EA%B2%BD%ED%95%98%EA%B8%B0pandas
# Deleting the samples at positions 0-4 is the same as selecting only the samples at position 5 and beyond
df_rel.iloc[5:]

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
5,AF,2,Afghanistan,2514,Maize and products,5142,Food,1000 tonnes,33.94,67.71,...,231.0,67.0,82.0,67.0,69.0,71.0,82.0,73.0,77,76
6,AF,2,Afghanistan,2517,Millet and products,5142,Food,1000 tonnes,33.94,67.71,...,15.0,21.0,11.0,19.0,21.0,18.0,14.0,14.0,14,12
7,AF,2,Afghanistan,2520,"Cereals, Other",5142,Food,1000 tonnes,33.94,67.71,...,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0,0
8,AF,2,Afghanistan,2531,Potatoes and products,5142,Food,1000 tonnes,33.94,67.71,...,276.0,294.0,294.0,260.0,242.0,250.0,192.0,169.0,196,230
9,AF,2,Afghanistan,2536,Sugar cane,5521,Feed,1000 tonnes,33.94,67.71,...,50.0,29.0,61.0,65.0,54.0,114.0,83.0,83.0,69,81
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21472,ZW,181,Zimbabwe,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21473,ZW,181,Zimbabwe,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21474,ZW,181,Zimbabwe,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21475,ZW,181,Zimbabwe,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0

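The two deletion approaches above can be compared on a small toy DataFrame (the data and its index values are made up for illustration). `drop` removes rows by *label*, so `df.index[[0, 1, 5]]` first turns positions into labels; `iloc[5:]` keeps rows by *position*. A minimal sketch:

```python
import pandas as pd

# Hypothetical toy data with a non-default index, so labels and positions differ
df = pd.DataFrame({'value': range(8)}, index=[10, 11, 12, 13, 14, 15, 16, 17])

# (1) drop: look up the labels at positions 0, 1, 5, then drop those labels
dropped = df.drop(df.index[[0, 1, 5]])

# (2) iloc: dropping positions 0-4 is the same as keeping positions 5 onward
kept = df.iloc[5:]

# Neither call modifies df itself; both return new DataFrames
print(dropped.index.tolist())  # [12, 13, 14, 16, 17]
print(kept.index.tolist())     # [15, 16, 17]
print(len(df))                 # 8
```

With the default `RangeIndex` (as in `df_rel`), labels and positions coincide, which is why both approaches look alike there.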

## Renaming Rows and Columns

- **Renaming rows (index)**
> - Assign directly with **DataFrame.index = values** (the number of values must equal the length of the index)
> - **DataFrame.set_index(column_name)** turns a given column into the index
> - **DataFrame.reset_index()** restores the default integer index

- **Renaming columns**
> - Assign directly with **DataFrame.columns = values** (the number of values must equal the number of columns)
> - **DataFrame.rename(columns={'Old':'New'})** renames selected columns
> - Pass a dictionary of the form **{'old_column_name': 'new_column_name',…}** that maps each old name to its new name

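A minimal sketch of the renaming options listed above, on a hypothetical three-row DataFrame (names and values are invented, not taken from `df_rel`). It also shows that assigning a list of the wrong length raises a `ValueError`:

```python
import pandas as pd

# Hypothetical toy DataFrame
df = pd.DataFrame({'Area': ['A', 'B', 'C'], 'Y2013': [1, 2, 3]})

# Row labels: assign a sequence whose length equals len(df.index)
df.index = range(100, 100 + len(df))

# set_index moves a column into the index (the old index is discarded);
# reset_index moves it back and restores a default 0..n-1 RangeIndex
by_area = df.set_index('Area')
restored = by_area.reset_index()

# Column labels: map old -> new names with rename, or assign a full list
renamed = df.rename(columns={'Area': 'New_Area', 'Y2013': 'Year_2013'})
df2 = df.copy()
df2.columns = ['col0', 'col1']  # length must equal the number of columns

# A list of the wrong length is rejected and the columns stay unchanged
mismatch_raised = False
try:
    df2.columns = ['only_one']
except ValueError:
    mismatch_raised = True

print(df.index.tolist())          # [100, 101, 102]
print(by_area.index.tolist())     # ['A', 'B', 'C']
print(restored.columns.tolist())  # ['Area', 'Y2013']
print(renamed.columns.tolist())   # ['New_Area', 'Year_2013']
```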

In [None]:
df_rel.index

RangeIndex(start=0, stop=21477, step=1)

In [None]:
# Replace the index values with an arbitrary range of numbers
# The new range must have the same length as the existing index
df_rel.index = range(100,21577)
df_rel

Unnamed: 0,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
100,AF,2,Afghanistan,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,...,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0,4711.0,4810,4895
101,AF,2,Afghanistan,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
102,AF,2,Afghanistan,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
103,AF,2,Afghanistan,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
104,AF,2,Afghanistan,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21572,ZW,181,Zimbabwe,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21573,ZW,181,Zimbabwe,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21574,ZW,181,Zimbabwe,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21575,ZW,181,Zimbabwe,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# Use a specific column as the index
df_rel.set_index("Area")

Area,Area Abbreviation,Area Code,Item Code,Item,Element Code,Element,Unit,latitude,longitude,Y1961,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
Afghanistan,AF,2,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,1928.0,...,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0,4711.0,4810,4895
Afghanistan,AF,2,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,183.0,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
Afghanistan,AF,2,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,76.0,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
Afghanistan,AF,2,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,237.0,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
Afghanistan,AF,2,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,210.0,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zimbabwe,ZW,181,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,230.0,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
Zimbabwe,ZW,181,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,27.0,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
Zimbabwe,ZW,181,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,6.0,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
Zimbabwe,ZW,181,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# Use a specific column as the index
# Then drop the rows whose index label is 'Afghanistan'
df_rel.set_index("Area").drop("Afghanistan", axis=0)

Area,Area Abbreviation,Area Code,Item Code,Item,Element Code,Element,Unit,latitude,longitude,Y1961,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
Albania,AL,3,2511,Wheat and products,5521,Feed,1000 tonnes,41.15,20.17,10.0,...,28.0,28.0,30.0,28.0,28.0,30.0,26.0,25.0,20,18
Albania,AL,3,2511,Wheat and products,5142,Food,1000 tonnes,41.15,20.17,166.0,...,449.0,468.0,422.0,425.0,435.0,415.0,432.0,439.0,440,440
Albania,AL,3,2805,Rice (Milled Equivalent),5521,Feed,1000 tonnes,41.15,20.17,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
Albania,AL,3,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,41.15,20.17,2.0,...,23.0,24.0,30.0,27.0,20.0,23.0,24.0,21.0,22,25
Albania,AL,3,2513,Barley and products,5521,Feed,1000 tonnes,41.15,20.17,2.0,...,9.0,4.0,9.0,2.0,3.0,4.0,7.0,8.0,7,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zimbabwe,ZW,181,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,230.0,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
Zimbabwe,ZW,181,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,27.0,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
Zimbabwe,ZW,181,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,6.0,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
Zimbabwe,ZW,181,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
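# Because inplace=True was not used above, the index of df_rel itself did not change
# Like dropping data, changing the index returns a new DataFrame by default to prevent accidental changes
# Use inplace=True to set the Area column as the index of df_rel itself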
df_rel.set_index("Area", inplace=True)
df_rel

Area,Area Abbreviation,Area Code,Item Code,Item,Element Code,Element,Unit,latitude,longitude,Y1961,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
Afghanistan,AF,2,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,1928.0,...,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0,4711.0,4810,4895
Afghanistan,AF,2,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,183.0,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
Afghanistan,AF,2,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,76.0,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
Afghanistan,AF,2,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,237.0,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
Afghanistan,AF,2,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,210.0,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zimbabwe,ZW,181,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,230.0,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
Zimbabwe,ZW,181,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,27.0,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
Zimbabwe,ZW,181,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,6.0,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
Zimbabwe,ZW,181,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# Restore the default index
# As with set_index, inplace=True is needed to change df_rel itself
# The current index (Area) is moved back in as the first column
df_rel.reset_index(inplace=True)
df_rel

Unnamed: 0,Area,Area Abbreviation,Area Code,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,Afghanistan,AF,2,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,...,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0,4711.0,4810,4895
1,Afghanistan,AF,2,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
2,Afghanistan,AF,2,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
3,Afghanistan,AF,2,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
4,Afghanistan,AF,2,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21472,Zimbabwe,ZW,181,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21473,Zimbabwe,ZW,181,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21474,Zimbabwe,ZW,181,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21475,Zimbabwe,ZW,181,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
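# Several operations can be applied one after another
# In Python, methods can be chained with the . (dot) operator
# 1) Set the 'Area' column as the index
# 2) Drop the rows whose index label is 'Afghanistan'
# 3) Restore the default index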
df_rel = df_rel.set_index('Area').drop('Afghanistan', axis=0).reset_index()
df_rel

Unnamed: 0,Area,Area Abbreviation,Area Code,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,Albania,AL,3,2511,Wheat and products,5521,Feed,1000 tonnes,41.15,20.17,...,28.0,28.0,30.0,28.0,28.0,30.0,26.0,25.0,20,18
1,Albania,AL,3,2511,Wheat and products,5142,Food,1000 tonnes,41.15,20.17,...,449.0,468.0,422.0,425.0,435.0,415.0,432.0,439.0,440,440
2,Albania,AL,3,2805,Rice (Milled Equivalent),5521,Feed,1000 tonnes,41.15,20.17,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
3,Albania,AL,3,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,41.15,20.17,...,23.0,24.0,30.0,27.0,20.0,23.0,24.0,21.0,22,25
4,Albania,AL,3,2513,Barley and products,5521,Feed,1000 tonnes,41.15,20.17,...,9.0,4.0,9.0,2.0,3.0,4.0,7.0,8.0,7,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21389,Zimbabwe,ZW,181,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21390,Zimbabwe,ZW,181,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21391,Zimbabwe,ZW,181,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21392,Zimbabwe,ZW,181,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0

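The chained call drops *every* row labelled 'Afghanistan', because after `set_index('Area')` that label appears many times (note how the row count fell from 21,477 to 21,393). A small sketch on invented data, comparing it with a boolean filter on the column; the filter is shown as an alternative, not as what the cell above does:

```python
import pandas as pd

# Hypothetical toy data with a repeated Area value
df = pd.DataFrame({
    'Area': ['Afghanistan', 'Afghanistan', 'Albania', 'Zimbabwe'],
    'Item': ['Wheat', 'Rice', 'Wheat', 'Fish'],
})

# Chained version, as in the cell above: set_index -> drop label -> reset_index
chained = df.set_index('Area').drop('Afghanistan', axis=0).reset_index()

# Equivalent boolean mask: keep rows whose Area is not 'Afghanistan'
masked = df[df['Area'] != 'Afghanistan'].reset_index(drop=True)

print(chained)
print(chained.equals(masked))  # True
```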

In [None]:
df_test = df_rel.copy()
df_test.columns

Index(['Area', 'Area Abbreviation', 'Area Code', 'Item Code', 'Item',
       'Element Code', 'Element', 'Unit', 'latitude', 'longitude', 'Y1961',
       'Y1962', 'Y1963', 'Y1964', 'Y1965', 'Y1966', 'Y1967', 'Y1968', 'Y1969',
       'Y1970', 'Y1971', 'Y1972', 'Y1973', 'Y1974', 'Y1975', 'Y1976', 'Y1977',
       'Y1978', 'Y1979', 'Y1980', 'Y1981', 'Y1982', 'Y1983', 'Y1984', 'Y1985',
       'Y1986', 'Y1987', 'Y1988', 'Y1989', 'Y1990', 'Y1991', 'Y1992', 'Y1993',
       'Y1994', 'Y1995', 'Y1996', 'Y1997', 'Y1998', 'Y1999', 'Y2000', 'Y2001',
       'Y2002', 'Y2003', 'Y2004', 'Y2005', 'Y2006', 'Y2007', 'Y2008', 'Y2009',
       'Y2010', 'Y2011', 'Y2012', 'Y2013'],
      dtype='object')

In [None]:
# Rename the columns of df_test to Test0 ~ Test62 (one name per column)
df_test.columns = ['Test'+str(i) for i in range(len(df_test.columns))]
df_test

Unnamed: 0,Test0,Test1,Test2,Test3,Test4,Test5,Test6,Test7,Test8,Test9,...,Test53,Test54,Test55,Test56,Test57,Test58,Test59,Test60,Test61,Test62
0,Albania,AL,3,2511,Wheat and products,5521,Feed,1000 tonnes,41.15,20.17,...,28.0,28.0,30.0,28.0,28.0,30.0,26.0,25.0,20,18
1,Albania,AL,3,2511,Wheat and products,5142,Food,1000 tonnes,41.15,20.17,...,449.0,468.0,422.0,425.0,435.0,415.0,432.0,439.0,440,440
2,Albania,AL,3,2805,Rice (Milled Equivalent),5521,Feed,1000 tonnes,41.15,20.17,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
3,Albania,AL,3,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,41.15,20.17,...,23.0,24.0,30.0,27.0,20.0,23.0,24.0,21.0,22,25
4,Albania,AL,3,2513,Barley and products,5521,Feed,1000 tonnes,41.15,20.17,...,9.0,4.0,9.0,2.0,3.0,4.0,7.0,8.0,7,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21389,Zimbabwe,ZW,181,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21390,Zimbabwe,ZW,181,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21391,Zimbabwe,ZW,181,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21392,Zimbabwe,ZW,181,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# Map old column names to new ones with a dictionary
# Rename the Area column to New_Area (a new DataFrame is returned; df_rel is unchanged)
df_rel.rename(columns={'Area':'New_Area'})

Unnamed: 0,New_Area,Area Abbreviation,Area Code,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,Albania,AL,3,2511,Wheat and products,5521,Feed,1000 tonnes,41.15,20.17,...,28.0,28.0,30.0,28.0,28.0,30.0,26.0,25.0,20,18
1,Albania,AL,3,2511,Wheat and products,5142,Food,1000 tonnes,41.15,20.17,...,449.0,468.0,422.0,425.0,435.0,415.0,432.0,439.0,440,440
2,Albania,AL,3,2805,Rice (Milled Equivalent),5521,Feed,1000 tonnes,41.15,20.17,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
3,Albania,AL,3,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,41.15,20.17,...,23.0,24.0,30.0,27.0,20.0,23.0,24.0,21.0,22,25
4,Albania,AL,3,2513,Barley and products,5521,Feed,1000 tonnes,41.15,20.17,...,9.0,4.0,9.0,2.0,3.0,4.0,7.0,8.0,7,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21389,Zimbabwe,ZW,181,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21390,Zimbabwe,ZW,181,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21391,Zimbabwe,ZW,181,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21392,Zimbabwe,ZW,181,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# Map old column names to new ones with a dictionary
# Rename the Area column to New_Area
# Rename the Y2013 column to Year_2013
df_rel.rename(columns={'Area':'New_Area',
                       'Y2013':'Year_2013'})

Unnamed: 0,New_Area,Area Abbreviation,Area Code,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Year_2013
0,Albania,AL,3,2511,Wheat and products,5521,Feed,1000 tonnes,41.15,20.17,...,28.0,28.0,30.0,28.0,28.0,30.0,26.0,25.0,20,18
1,Albania,AL,3,2511,Wheat and products,5142,Food,1000 tonnes,41.15,20.17,...,449.0,468.0,422.0,425.0,435.0,415.0,432.0,439.0,440,440
2,Albania,AL,3,2805,Rice (Milled Equivalent),5521,Feed,1000 tonnes,41.15,20.17,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
3,Albania,AL,3,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,41.15,20.17,...,23.0,24.0,30.0,27.0,20.0,23.0,24.0,21.0,22,25
4,Albania,AL,3,2513,Barley and products,5521,Feed,1000 tonnes,41.15,20.17,...,9.0,4.0,9.0,2.0,3.0,4.0,7.0,8.0,7,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21389,Zimbabwe,ZW,181,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21390,Zimbabwe,ZW,181,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21391,Zimbabwe,ZW,181,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21392,Zimbabwe,ZW,181,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# Build the new column names with a lambda function
# Convert every column name to uppercase
# Replace spaces with underscores (_)
# inplace=True changes df_rel itself
df_rel.rename(columns=lambda x: x.upper().replace(' ', '_'), inplace=True)
df_rel

Unnamed: 0,AREA,AREA_ABBREVIATION,AREA_CODE,ITEM_CODE,ITEM,ELEMENT_CODE,ELEMENT,UNIT,LATITUDE,LONGITUDE,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,Albania,AL,3,2511,Wheat and products,5521,Feed,1000 tonnes,41.15,20.17,...,28.0,28.0,30.0,28.0,28.0,30.0,26.0,25.0,20,18
1,Albania,AL,3,2511,Wheat and products,5142,Food,1000 tonnes,41.15,20.17,...,449.0,468.0,422.0,425.0,435.0,415.0,432.0,439.0,440,440
2,Albania,AL,3,2805,Rice (Milled Equivalent),5521,Feed,1000 tonnes,41.15,20.17,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
3,Albania,AL,3,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,41.15,20.17,...,23.0,24.0,30.0,27.0,20.0,23.0,24.0,21.0,22,25
4,Albania,AL,3,2513,Barley and products,5521,Feed,1000 tonnes,41.15,20.17,...,9.0,4.0,9.0,2.0,3.0,4.0,7.0,8.0,7,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21389,Zimbabwe,ZW,181,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21390,Zimbabwe,ZW,181,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21391,Zimbabwe,ZW,181,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21392,Zimbabwe,ZW,181,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


In [None]:
# Check the result
df_rel.columns

Index(['AREA', 'AREA_ABBREVIATION', 'AREA_CODE', 'ITEM_CODE', 'ITEM',
       'ELEMENT_CODE', 'ELEMENT', 'UNIT', 'LATITUDE', 'LONGITUDE', 'Y1961',
       'Y1962', 'Y1963', 'Y1964', 'Y1965', 'Y1966', 'Y1967', 'Y1968', 'Y1969',
       'Y1970', 'Y1971', 'Y1972', 'Y1973', 'Y1974', 'Y1975', 'Y1976', 'Y1977',
       'Y1978', 'Y1979', 'Y1980', 'Y1981', 'Y1982', 'Y1983', 'Y1984', 'Y1985',
       'Y1986', 'Y1987', 'Y1988', 'Y1989', 'Y1990', 'Y1991', 'Y1992', 'Y1993',
       'Y1994', 'Y1995', 'Y1996', 'Y1997', 'Y1998', 'Y1999', 'Y2000', 'Y2001',
       'Y2002', 'Y2003', 'Y2004', 'Y2005', 'Y2006', 'Y2007', 'Y2008', 'Y2009',
       'Y2010', 'Y2011', 'Y2012', 'Y2013'],
      dtype='object')

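`rename(columns=...)` accepts any function that maps an old label to a new one. As a sketch on a hypothetical empty DataFrame, the same result can also be obtained with the vectorized `.str` methods of the column `Index`:

```python
import pandas as pd

# Hypothetical DataFrame with a few of the original column names
df = pd.DataFrame(columns=['Area Abbreviation', 'Item Code', 'latitude'])

# Function-based rename, as in the cell above
upper = df.rename(columns=lambda x: x.upper().replace(' ', '_'))

# Same result via the .str accessor on the column Index
df2 = df.copy()
df2.columns = df2.columns.str.upper().str.replace(' ', '_', regex=False)

print(upper.columns.tolist())  # ['AREA_ABBREVIATION', 'ITEM_CODE', 'LATITUDE']
print(df2.columns.tolist())    # same list
```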
## Exporting and Saving Results

- **Purpose:** Once preprocessing or cleaning is finished, the result should be saved to a file
> - to_csv writes a DataFrame to a CSV file
> - to_excel writes a DataFrame to an Excel file

<center><img src='Image/io_readwrite.svg' width='700'></center>

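The effect of `index=False` can be seen without creating files: called without a path, `to_csv` returns the CSV text as a string, which `io.StringIO` can feed back to `read_csv`. A sketch on made-up data; it shows that a written index comes back as an extra column named `Unnamed: 0`:

```python
import io
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({'Area': ['Albania', 'Zimbabwe'], 'Y2013': [18, 0]})

# Default: the index is written as an extra, unnamed first column
with_index = df.to_csv()
# index=False: only the data columns are written
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)

# Reading the first version back creates an extra 'Unnamed: 0' column
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())
print(pd.read_csv(io.StringIO(without_index)).columns.tolist())
```

`to_excel` takes the same `index` parameter, so the same reasoning applies to Excel output.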
In [None]:
# Save the DataFrame as a CSV file
# index=False means the index (row labels) is not written to the file
df_rel.to_csv("Tutorial_Pandas_Output.csv", index=False)

In [None]:
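# to_excel writes .xlsx files through the openpyxl engine, so install it first
!pip install openpyxl
from openpyxl.workbook import Workbook  # optional: only confirms that openpyxl can be imported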

Collecting openpyxl
  Downloading openpyxl-3.0.9-py2.py3-none-any.whl (242 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 242.2/242.2 KB 467.2 kB/s eta 0:00:00
Collecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9
You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.7/bin/python3 -m pip install --upgrade pip' command.

In [None]:
# Save the DataFrame as an Excel file
# Unlike CSV, Excel files have named sheets; set the sheet name with sheet_name
# index=False means the index (row labels) is not written to the file
df_rel.to_excel("Tutorial_Pandas_Output.xlsx", sheet_name="Sheet 1", index=False)

## Merging and Joining DataFrames

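> - In practice, data are often stored across several DataFrames (tables), split by each department's purpose or for security reasons
> - Splitting keeps each dataset smaller and easier to handle; other data are merged in only when needed
> - Merging and joining DataFrames is a core skill that every aspiring data analyst should master
>> - Merging two datasets brings them into one dataset based on a shared attribute/column (key) and aligns their rows accordingly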

### SQL Approach

- The pandasql package lets you process DataFrames in Python using SQL query logic

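pandasql runs its queries on SQLite. As a hedged sketch of the same idea using only pandas and the standard library, a DataFrame can be copied into an in-memory SQLite database with `DataFrame.to_sql` and queried with `pd.read_sql_query`, both of which accept a `sqlite3` connection. The table contents below are invented:

```python
import sqlite3
import pandas as pd

# Hypothetical toy table using column names from user_device
user_device = pd.DataFrame({
    'use_id': [1, 2, 3, 4],
    'user_id': [10, 10, 20, 30],
    'platform_version': [9.0, 10.0, 10.5, 11.0],
})

# Copy the DataFrame into an in-memory SQLite database
conn = sqlite3.connect(':memory:')
user_device.to_sql('user_device', conn, index=False)

# Run SQL and get the result back as a DataFrame
result = pd.read_sql_query(
    'select user_id, platform_version from user_device '
    'where platform_version >= 10 order by platform_version desc',
    conn,
)
conn.close()
print(result)
```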
In [None]:
# Install pandasql
!pip install pandasql

Collecting pandasql
  Downloading pandasql-0.7.3.tar.gz (26 kB)
  Preparing metadata (setup.py) ... done
Collecting sqlalchemy
  Downloading SQLAlchemy-1.4.36-cp37-cp37m-macosx_10_14_x86_64.whl (1.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 1.5 MB/s eta 0:00:00
Collecting greenlet!=0.4.17
  Downloading greenlet-1.1.2-cp37-cp37m-macosx_10_14_x86_64.whl (92 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92.6/92.6 KB 1.8 MB/s eta 0:00:00
Using legacy 'setup.py install' for pandasql, since package 'wheel' is not installed.
Installing collected packages: greenlet, sqlalchemy, pandasql
  Running setup.py install for pandasql ... done
Successfully installed greenlet-1.1.2 pandasql-0.7.3 sqlalchemy-1.4.36
You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.7/bin/python3 -m pip install --upgrade pip' command.

In [None]:
# Import the pandasql package under the alias psql
import pandasql as psql

#### Selecting and Handling Values

In [None]:
# Load the data
# Forward slashes work as path separators on Windows, macOS and Linux alike
import pandas as pd

user_usage = pd.read_csv('./Data/Merge/Data_Usage.csv')
user_device = pd.read_csv('./Data/Merge/Data_Device.csv')
display(user_usage.head(), user_device.head())

FileNotFoundError: [Errno 2] No such file or directory: '.\\Data\\Merge\\Data_Usage.csv'

```sql
select column_name
from table_name
where condition
group by column_name
having condition
order by column_name
```

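- **select:** Required; the columns to select (`*` means all columns)
- **from:** Required; the table (data) the columns are taken from
- **where:** Optional; keeps only the rows that satisfy a condition
- **group by:** Optional; groups rows by the listed columns, in order of priority
- **having:** Optional; keeps only the groups that satisfy a condition after group by
- **order by:** Optional; sorts the result in ascending (asc) or descending (desc) order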

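For comparison, the SQL clauses map roughly onto pandas operations: where to boolean indexing, group by plus aggregates to `groupby(...).agg(...)`, having to filtering the aggregated result, and order by to `sort_values`. A sketch on invented data that mirrors the shape of the queries below (the column names follow `user_device`, the values do not):

```python
import pandas as pd

# Hypothetical toy data with user_device-like columns
user_device = pd.DataFrame({
    'use_id': [1, 2, 3, 4, 5, 6, 7, 8],
    'user_id': [10, 10, 20, 20, 30, 40, 40, 40],
    'platform_version': [10.0, 10.5, 10.1, 10.2, 9.0, 10.3, 10.4, 10.5],
    'use_type_id': [3, 3, 3, 3, 3, 3, 3, 3],
})

# where platform_version >= 10 and use_type_id = 3
filtered = user_device[(user_device['platform_version'] >= 10)
                       & (user_device['use_type_id'] == 3)]

# group by user_id, computing avg(platform_version) and count(use_id)
grouped = (filtered.groupby('user_id')
           .agg(avg_version=('platform_version', 'mean'),
                n_uses=('use_id', 'count'))
           .reset_index())

# having avg(platform_version) >= 10.2, then order by count(use_id) desc
result = (grouped[grouped['avg_version'] >= 10.2]
          .sort_values('n_uses', ascending=False))
print(result)
```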
In [None]:
# Select and display all columns
psql.sqldf("select * from user_device")

In [None]:
# Select and display only specific columns
psql.sqldf("select user_id, use_id, platform, device, use_type_id from user_device")

In [None]:
# Select the rows that satisfy a condition
psql.sqldf("select * from user_device \
            where platform_version >= 10")

In [None]:
# Select the rows that satisfy both conditions
psql.sqldf("select * from user_device \
            where platform_version >= 10 and use_type_id = 3")

In [None]:
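# Aggregate (count) over the rows that satisfy the conditions
# Without group by, all matching rows collapse into one result row (SQLite takes user_id from an arbitrary matching row)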
psql.sqldf("select user_id, count(device) from user_device \
            where platform_version >= 10 and use_type_id = 3")

In [None]:
# Aggregate (average) over the rows that satisfy the conditions
psql.sqldf("select user_id, avg(platform_version) from user_device \
            where platform_version >= 10 and use_type_id = 3")

In [None]:
# Aggregate the filtered rows per user_id to see how the values are spread across users
psql.sqldf("select user_id, avg(platform_version) from user_device \
            where platform_version >= 10 and use_type_id = 3 \
            group by user_id")

In [None]:
# Same grouping per user_id, with an extra count(use_id) and the device column
psql.sqldf("select user_id, avg(platform_version), count(use_id), device from user_device \
            where platform_version >= 10 and use_type_id = 3 \
            group by user_id")

In [None]:
# Aggregate per user_id and keep only the groups that satisfy a condition (having)
psql.sqldf("select user_id, avg(platform_version), count(use_id), device from user_device \
            where platform_version >= 10 and use_type_id = 3 \
            group by user_id \
            having avg(platform_version) >= 10.2")

In [None]:
# Aggregate per user_id, keep the groups that satisfy the having condition, and sort them by count(use_id) in descending order
psql.sqldf("select user_id, avg(platform_version), count(use_id), device from user_device \
            where platform_version >= 10 and use_type_id = 3 \
            group by user_id \
            having avg(platform_version) >= 10.2 \
            order by count(use_id) desc")

In [None]:
# Aggregate per user_id, keep the groups that satisfy the having condition, and sort them by count(use_id) in ascending order
psql.sqldf("select user_id, avg(platform_version), count(use_id), device from user_device \
            where platform_version >= 10 and use_type_id = 3 \
            group by user_id \
            having avg(platform_version) >= 10.2 \
            order by count(use_id) asc")

In [None]:
# The query result can be stored in a variable and displayed
user_device_filter = psql.sqldf("select user_id, avg(platform_version), count(use_id), device from user_device \
                                where platform_version >= 10 and use_type_id = 3 \
                                group by user_id \
                                having avg(platform_version) >= 10.2 \
                                order by count(use_id) asc")
user_device_filter

#### Merging

![](https://i.stack.imgur.com/hMKKt.jpg)

```sql
select A.column_name, B.column_name
from base_table A
[something] join joined_table B
on A.key = B.key
```

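- **Join:** a way to retrieve data by connecting two or more tables or databases
> - **[inner] join:** intersection; outputs only the rows whose key appears in both table A and table B
> - **[left outer] join:** outputs all rows of table A, combined with the matching rows of table B
> - **[right outer] join:** outputs all rows of table B, combined with the matching rows of table A
> - **[full outer] join:** union; outputs all rows of both table A and table B
> - **[left outer] join (if null):** a left join from which the rows whose key also exists in table B are removed, leaving only the rows of A without a match
> - **[right outer] join (if null):** a right join from which the rows whose key also exists in table A are removed, leaving only the rows of B without a match
> - **Note:** values that are missing after the join are filled with NaN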

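The two "(if null)" variants are commonly called anti-joins. In SQL they are usually written as a left join followed by `where B.key is null`; in pandas, one way is `merge(..., indicator=True)`, which adds a `_merge` column, and then keeping the rows marked `left_only` (or `right_only`). A sketch on toy tables with a hypothetical `key` column:

```python
import pandas as pd

# Hypothetical toy tables sharing a 'key' column
A = pd.DataFrame({'key': [1, 2, 3, 4], 'a_val': ['w', 'x', 'y', 'z']})
B = pd.DataFrame({'key': [2, 4, 5], 'b_val': ['p', 'q', 'r']})

# Left join; indicator=True records where each row came from in '_merge'
left = pd.merge(A, B, on='key', how='left', indicator=True)

# Left anti-join: rows of A with no matching key in B (B's columns are NaN there)
left_only = left[left['_merge'] == 'left_only'].drop(columns='_merge')

# Right anti-join: the same idea with the roles of the tables swapped
right = pd.merge(A, B, on='key', how='right', indicator=True)
right_only = right[right['_merge'] == 'right_only'].drop(columns='_merge')

print(left_only)   # keys 1 and 3
print(right_only)  # key 5
```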

In [None]:
# Display the loaded data
display(user_device, user_device_filter, user_usage)

In [None]:
# Join user_device with user_device_filter (keeps only the devices of the filtered users)
psql.sqldf("select A.* from user_device A \
            inner join user_device_filter B \
            on A.user_id = B.user_id")

In [None]:
# Save the result under a new name
user_device_new = psql.sqldf("select A.* from user_device A \
            inner join user_device_filter B \
            on A.user_id = B.user_id")
user_device_new

In [None]:
# Inner join of user_device_new and user_usage
# No usage records exist for the customers on the higher platform versions, so the join returns no rows
psql.sqldf("select A.*, B.* from user_device_new A \
            inner join user_usage B \
            on A.use_id = B.use_id")

In [None]:
# Inner join of user_device and user_usage
psql.sqldf("select A.*, B.* from user_device A \
            inner join user_usage B \
            on A.use_id = B.use_id")

In [None]:
# Inner join of user_device and user_usage, combined with filtering (where)
psql.sqldf("select A.*, B.* from user_device A \
            inner join user_usage B \
            on A.use_id = B.use_id \
            where outgoing_mins_per_month >= 500")

In [None]:
# Inner join of user_device and user_usage, combined with where and group by
psql.sqldf("select A.*, B.* from user_device A \
            inner join user_usage B \
            on A.use_id = B.use_id \
            where outgoing_mins_per_month >= 500 \
            group by user_id")

In [None]:
# Inner join of user_device and user_usage, combined with where, group by, having and order by
psql.sqldf("select A.*, B.* from user_device A \
            inner join user_usage B \
            on A.use_id = B.use_id \
            where outgoing_mins_per_month >= 500 \
            group by user_id \
            having platform_version >= 6.0 \
            order by platform_version desc")

In [None]:
psql.sqldf("select A.*, B.* from user_device A \
            inner join user_usage B \
            on A.use_id = B.use_id")

In [None]:
# Check the left join result
psql.sqldf("select A.*, B.* from user_device A \
            left join user_usage B \
            on A.use_id = B.use_id")

In [None]:
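# Check the right join result
# SQLite, which pandasql runs on, did not support RIGHT JOIN before version 3.39.0, so this query is commented out
# and the next cell produces the same rows with a left join by swapping the tables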
# psql.sqldf("select A.*, B.* from user_device A \
#             right join user_usage B \
#             on A.use_id = B.use_id")

In [None]:
# Right join result, emulated with a left join that starts from user_usage (B)
psql.sqldf("select A.*, B.* from user_usage B \
            left join user_device A \
            on A.use_id = B.use_id")

### Pandas Approach

![](https://i.stack.imgur.com/hMKKt.jpg)

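```python
pd.merge(left=left_df,      # base DataFrame
         right=right_df,    # DataFrame to merge in
         how='inner',       # {'left', 'right', 'outer', 'inner', 'cross'}, default 'inner'
         on=None,           # key column name, when both DataFrames use the same key name
         left_on=None,      # key column in the base (left) DataFrame
         right_on=None)     # key column in the DataFrame being merged (right)
# Pass either on, or left_on together with right_on
```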

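When both DataFrames use the same key column name, `on=` is enough. Any other column name present in both gets suffixes (by default `_x` and `_y`, configurable with `suffixes=`). A toy sketch with invented column names:

```python
import pandas as pd

# Hypothetical toy DataFrames sharing the key 'id' and a 'score' column
left_df = pd.DataFrame({'id': [1, 2, 3], 'score': [90, 80, 70]})
right_df = pd.DataFrame({'id': [2, 3, 4], 'score': [60, 50, 40]})

# Same key name on both sides -> on='id'; the shared 'score' column is suffixed
merged = pd.merge(left_df, right_df, on='id', how='inner',
                  suffixes=('_left', '_right'))
print(merged)  # ids 2 and 3, with columns id, score_left, score_right
```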
In [None]:
# Create example data
# Column names: 이름 = name, 국어 = Korean, 영어 = English, 일어 = Japanese, 수학 = Math
df1 = pd.DataFrame({
    '이름': ['원영', '사쿠라', '유리', '예나', '유진', '나코', '은비', '혜원', '채원', '민주'],
    '국어': [100, 70, 70, 70, 60, 90, 90, 70, 70, 80],
    '영어': [100, 90, 80, 50, 70, 100, 70, 90, 100, 100]
    }, columns=['이름', '국어', '영어'])

df2 = pd.DataFrame({
    '일어': [80, 100, 100, 90, 70, 50, 100],
    '수학': [90, 70, 100, 80, 70, 80, 90],
    'name': ['원영', '사쿠라', '나코', '히토미', '예나', '은비', '째욘'],
    }, columns=['일어', '수학', 'name'])

display(df1, df2)

Unnamed: 0,이름,국어,영어
0,원영,100,100
1,사쿠라,70,90
2,유리,70,80
3,예나,70,50
4,유진,60,70
5,나코,90,100
6,은비,90,70
7,혜원,70,90
8,채원,70,100
9,민주,80,100


Unnamed: 0,일어,수학,name
0,80,90,원영
1,100,70,사쿠라
2,100,100,나코
3,90,80,히토미
4,70,70,예나
5,50,80,은비
6,100,90,째욘


In [None]:
# Inner merge when the key columns have different names
pd.merge(df1, df2, left_on='이름', right_on='name', how='inner')

Unnamed: 0,이름,국어,영어,일어,수학,name
0,원영,100,100,80,90,원영
1,사쿠라,70,90,100,70,사쿠라
2,예나,70,50,70,70,예나
3,나코,90,100,100,100,나코
4,은비,90,70,50,80,은비


In [None]:
# Outer merge when the key columns have different names
# Values missing on either side are filled with NaN
pd.merge(df1, df2, left_on='이름', right_on='name', how='outer')

Unnamed: 0,이름,국어,영어,일어,수학,name
0,원영,100.0,100.0,80.0,90.0,원영
1,사쿠라,70.0,90.0,100.0,70.0,사쿠라
2,유리,70.0,80.0,,,
3,예나,70.0,50.0,70.0,70.0,예나
4,유진,60.0,70.0,,,
...,...,...,...,...,...,...
7,혜원,70.0,90.0,,,
8,채원,70.0,100.0,,,
9,민주,80.0,100.0,,,
10,,,,90.0,80.0,히토미


In [None]:
# Left merge when the key columns have different names
# Values missing on either side are filled with NaN
pd.merge(df1, df2, left_on='이름', right_on='name', how='left')

Unnamed: 0,이름,국어,영어,일어,수학,name
0,원영,100,100,80.0,90.0,원영
1,사쿠라,70,90,100.0,70.0,사쿠라
2,유리,70,80,,,
3,예나,70,50,70.0,70.0,예나
4,유진,60,70,,,
5,나코,90,100,100.0,100.0,나코
6,은비,90,70,50.0,80.0,은비
7,혜원,70,90,,,
8,채원,70,100,,,
9,민주,80,100,,,


In [None]:
# Right merge when the key columns have different names
# Values missing on either side are filled with NaN
pd.merge(df1, df2, left_on='이름', right_on='name', how='right')

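| | 이름 | 국어 | 영어 | 일어 | 수학 | name |
|---|---|---|---|---|---|---|
| 0 | 원영 | 100.0 | 100.0 | 80 | 90 | 원영 |
| 1 | 사쿠라 | 70.0 | 90.0 | 100 | 70 | 사쿠라 |
| 2 | 나코 | 90.0 | 100.0 | 100 | 100 | 나코 |
| 3 | NaN | NaN | NaN | 90 | 80 | 히토미 |
| 4 | 예나 | 70.0 | 50.0 | 70 | 70 | 예나 |
| 5 | 은비 | 90.0 | 70.0 | 50 | 80 | 은비 |
| 6 | NaN | NaN | NaN | 100 | 90 | 째욘 |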
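
Notice that the score columns of the side with missing rows switch from integers to floats (`100` becomes `100.0`): NumPy's `int64` cannot hold NaN, so pandas upcasts to `float64`. A sketch, assuming you want to keep whole numbers, converting to pandas' nullable `Int64` dtype after the merge; `right_int` is an illustrative name.

```python
import pandas as pd

# Same sample data as above
df1 = pd.DataFrame({
    '이름': ['원영', '사쿠라', '유리', '예나', '유진', '나코', '은비', '혜원', '채원', '민주'],
    '국어': [100, 70, 70, 70, 60, 90, 90, 70, 70, 80],
    '영어': [100, 90, 80, 50, 70, 100, 70, 90, 100, 100],
})
df2 = pd.DataFrame({
    '일어': [80, 100, 100, 90, 70, 50, 100],
    '수학': [90, 70, 100, 80, 70, 80, 90],
    'name': ['원영', '사쿠라', '나코', '히토미', '예나', '은비', '째욘'],
})

right = pd.merge(df1, df2, left_on='이름', right_on='name', how='right')
print(right[['국어', '영어']].dtypes)  # float64: the NaN rows forced an upcast

# Nullable integer dtype: whole numbers stay integers, missing values become <NA>
right_int = right.astype({'국어': 'Int64', '영어': 'Int64'})
print(right_int[['국어', '영어']].dtypes)
```

`DataFrame.convert_dtypes()` is an alternative that picks nullable dtypes for every column at once.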


In [None]:
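# Load the data
import pandas as pd
import pandasql as psql  # used by the SQL comparison cells below

# Forward slashes work on Windows, macOS and Linux alike
user_usage = pd.read_csv('./Data/Merge/Data_Usage.csv')
user_device = pd.read_csv('./Data/Merge/Data_Device.csv')
display(user_usage.head(), user_device.head())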

In [None]:
# Inner join in SQL via pandasql (requires `import pandasql as psql`), for comparison
psql.sqldf("select A.*, B.* from user_device A \
            inner join user_usage B \
            on A.use_id = B.use_id")

In [None]:
# The same inner join with pandas
pd.merge(user_device, user_usage, how='inner', on='use_id')

In [None]:
# Left join in SQL via pandasql, for comparison
psql.sqldf("select A.*, B.* from user_device A \
            left join user_usage B \
            on A.use_id = B.use_id")

In [None]:
# The same left join with pandas
pd.merge(user_device, user_usage, how='left', on='use_id')

In [None]:
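# Right join in SQL: older SQLite versions (pandasql's default engine) have no RIGHT JOIN,
# so swap the table order and use a LEFT JOIN instead
psql.sqldf("select A.*, B.* from user_usage B \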
            left join user_device A \
            on A.use_id = B.use_id")
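
The SQL cell above emulates a right join by swapping the tables in a left join. The same identity holds in pandas: a right join of `(a, b)` has the same rows as a left join of `(b, a)`, only with a different column order. A sketch on tiny hypothetical frames (the `platform` and `monthly_mb` columns and all values are made up for illustration, not taken from the real CSV files).

```python
import pandas as pd

# Hypothetical miniature stand-ins for user_device / user_usage
user_device = pd.DataFrame({'use_id': [1, 2, 3], 'platform': ['android', 'ios', 'android']})
user_usage = pd.DataFrame({'use_id': [2, 3, 4], 'monthly_mb': [100.0, 250.0, 50.0]})

right = pd.merge(user_device, user_usage, how='right', on='use_id')
swapped = pd.merge(user_usage, user_device, how='left', on='use_id')

# Same rows in the same order; align the column order before comparing
swapped = swapped[right.columns]
print(right.equals(swapped))
```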

In [None]:
# The same right join with pandas
pd.merge(user_device, user_usage, how='right', on='use_id')

In [None]:
# Store the merged result in a variable to keep working with it
result = pd.merge(user_device, user_usage, how='inner', on='use_id')
result.shape
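
`result.shape` is only easy to interpret if you know how the keys relate: when a key is duplicated on one side, every duplicate produces its own output row. A sketch using the `validate` argument of `pd.merge` to make that assumption explicit; the toy frames and their values are hypothetical, and `validated_ok` is an illustrative flag.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical toy data: use_id 2 appears twice in user_usage on purpose
user_device = pd.DataFrame({'use_id': [1, 2, 3], 'platform': ['android', 'ios', 'android']})
user_usage = pd.DataFrame({'use_id': [2, 2, 3], 'monthly_mb': [100.0, 120.0, 250.0]})

# Without validation the duplicate key silently yields an extra row
result = pd.merge(user_device, user_usage, how='inner', on='use_id')
print(result.shape)  # (3, 3)

# validate='one_to_one' raises MergeError if either side has duplicate keys
try:
    pd.merge(user_device, user_usage, how='inner', on='use_id', validate='one_to_one')
    validated_ok = True
except MergeError as err:
    validated_ok = False
    print('Key check failed:', err)
```

Other accepted values include `'one_to_many'` and `'many_to_one'`, which check uniqueness on only one side.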

In [None]:
# Any pandas method works on the merged result
result.describe(include='all')

## Questions

Using the Amazon purchase-history data, write code for each task below and print the result. Each answer may be one line or several lines long.

- **Data information**

> - **File names:**
>> 1. ecommerce_userinfo.csv
>> 2. ecommerce_userpurchase.csv
> - **File locations:**
>> 1. './Data/Ecommerce/ecommerce_userinfo.csv'
>> 2. './Data/Ecommerce/ecommerce_userpurchase.csv'

In [None]:
# Load the data
# 1) Read the two CSV files from the file locations above
# 2) Name the DataFrames 1. user_info and 2. user_purchase


In [None]:
# Print the first 10 rows of user_info and user_purchase


In [None]:
# Print the number of rows and columns of user_info and user_purchase


In [None]:
# Print the info of user_info and user_purchase to check each column's data type (Dtype)


In [None]:
# Print the column names of user_info and user_purchase


In [None]:
# Drop the rows with index labels 1000, 3000, 5000 and 7000 from user_info and save the result


In [None]:
# Drop the Company and Browser Info columns from user_purchase and save the result


In [None]:
# Rename a column in user_purchase and save the result
# Change Language to uppercase LANGUAGE


In [None]:
# Print descriptive statistics for user_info and user_purchase
# Include statistics for string (text) columns as well


In [None]:
# Print the number of unique values (nunique) in each column of user_info and user_purchase
# Print the column name together with its count, e.g. IP Address 100


In [None]:
# In user_info and user_purchase,
# (using a loop and a conditional) print the names of columns whose number of unique values (nunique) equals the number of rows


In [None]:
# Of the columns found above, write the one to use as the merge key for each DataFrame in the comments below
# user_info : 
# user_purchase : 

In [None]:
# Merge user_info and user_purchase with an inner join
# 1) Use pandas for the merge
# 2) Store the merged data in a variable named ecommerce


In [None]:
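# Print descriptive statistics for the merged data
# Include statistics for string (text) columns as well
# Transpose the final output (swap rows and columns) before printing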


In [None]:
# In ecommerce, which email address was used in the most transactions?


In [None]:
# In ecommerce, what are the minimum, mean, median and maximum purchase prices?


In [None]:
# How many IP addresses made transactions in English (en)?


In [None]:
# How many IP addresses belong to transactions made by people whose job is "Lawyer"?


In [None]:
# How many transactions were made in the AM versus the PM?


In [None]:
# What are the top 5 jobs by number of transactions?


In [None]:
# The customer who paid with card number 4917375496558677 requests a cancellation. What is the customer's email?


In [None]:
# Print the transactions paid with American Express (CC Provider) for $99 or more
