
# **Manipulating Raw Data with pandas and Python to Build a Graph**


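### **1. What Is Data Visualization?**
* The process of representing data analysis results visually so that they are easy to understand and communicate
* In exploratory data analysis, data processing, and prediction alike, visualization is essential for making results easy to interpret
* Among the many visualization techniques, we will try a relatively new and engaging one: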
  * https://app.flourish.studio

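### **2. Understanding the Data Format Needed for Visualization**
* To visualize the data, the raw data must first be transformed
* We will transform it using the data processing techniques covered so far
* Required data
  * Country name, national flag, and number of confirmed cases per date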
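The table this notebook ends up building is a wide table: one row per country, one column per date, each cell holding that date's confirmed count. As a minimal sketch with made-up numbers (only the shape matters), a long table with one row per (country, date) pair can be reshaped into that layout with DataFrame.pivot:

```python
import pandas as pd

# Illustrative long-format data (values are made up): one row per (country, date).
long_df = pd.DataFrame({
    'Country_Region': ['China', 'China', 'Japan', 'Japan'],
    'Date': ['1/22/2020', '1/23/2020', '1/22/2020', '1/23/2020'],
    'Confirmed': [10, 20, 1, 2],
})

# Reshape: countries become the index, dates become the columns.
wide_df = long_df.pivot(index='Country_Region', columns='Date', values='Confirmed')
print(wide_df)
```

The daily report files are already one file per date, so the rest of the notebook builds this layout by adding one date column per file instead of pivoting.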

### **3. Loading the Raw Data**

In [1]:
import pandas as pd
PATH = "COVID-19-master/csse_covid_19_data/csse_covid_19_daily_reports/"
doc = pd.read_csv(PATH + "04-01-2020.csv", encoding="utf-8-sig")
doc.head()

,FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key
0,45001.0,Abbeville,South Carolina,US,2020-04-01 21:58:49,34.223334,-82.461707,4,0,0,0,"Abbeville, South Carolina, US"
1,22001.0,Acadia,Louisiana,US,2020-04-01 21:58:49,30.295065,-92.414197,47,1,0,0,"Acadia, Louisiana, US"
2,51001.0,Accomack,Virginia,US,2020-04-01 21:58:49,37.767072,-75.632346,7,0,0,0,"Accomack, Virginia, US"
3,16001.0,Ada,Idaho,US,2020-04-01 21:58:49,43.452658,-116.241552,195,3,0,0,"Ada, Idaho, US"
4,19001.0,Adair,Iowa,US,2020-04-01 21:58:49,41.330756,-94.471059,1,0,0,0,"Adair, Iowa, US"


In [2]:
doc = pd.read_csv(PATH + "03-01-2020.csv", encoding="utf-8-sig")
doc.head()

,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Latitude,Longitude
0,Hubei,Mainland China,2020-03-01T10:13:19,66907,2761,31536,30.9756,112.2707
1,,South Korea,2020-03-01T23:43:03,3736,17,30,36.0,128.0
2,,Italy,2020-03-01T23:23:02,1694,34,83,43.0,12.0
3,Guangdong,Mainland China,2020-03-01T14:13:18,1349,7,1016,23.3417,113.4244
4,Henan,Mainland China,2020-03-01T14:13:18,1272,22,1198,33.882,113.614


* Files up to mid-March use the column names Province/State and Country/Region, while later files use Province_State and Country_Region, so a try/except block is used to handle both formats
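An alternative to try/except, sketched here as an option rather than the approach this notebook uses, is to rename the old column names with DataFrame.rename. Since rename() ignores mapping keys that are not present by default, the same call works on both file formats. The helper name select_report_columns and the two tiny frames (with made-up values) are hypothetical stand-ins for old- and new-format daily reports:

```python
import pandas as pd

# Old column name -> new column name.
COLUMN_MAP = {'Province/State': 'Province_State', 'Country/Region': 'Country_Region'}
KEEP = ['Province_State', 'Country_Region', 'Confirmed']

def select_report_columns(report):
    # rename() skips keys that are not columns of `report` (errors='ignore' is
    # the default), so new-format frames pass through unchanged.
    return report.rename(columns=COLUMN_MAP)[KEEP]

old_format = pd.DataFrame({'Province/State': ['Hubei'], 'Country/Region': ['Mainland China'],
                           'Confirmed': [10], 'Deaths': [0]})
new_format = pd.DataFrame({'Province_State': ['Hubei'], 'Country_Region': ['China'],
                           'Confirmed': [20], 'Deaths': [0]})

print(select_report_columns(old_format))
print(select_report_columns(new_format))
```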

In [3]:
doc = pd.read_csv(PATH + "01-22-2020.csv", encoding="utf-8-sig")
try:
  doc = doc[['Province_State','Country_Region','Confirmed']]  # 1. Keep only the needed columns
except KeyError:  # older files use 'Province/State' and 'Country/Region'
  doc = doc[['Province/State','Country/Region','Confirmed']]  # 1. Keep only the needed columns
  doc.columns = ['Province_State','Country_Region','Confirmed']

doc.head()

,Province_State,Country_Region,Confirmed
0,Anhui,Mainland China,1.0
1,Beijing,Mainland China,14.0
2,Chongqing,Mainland China,6.0
3,Fujian,Mainland China,1.0
4,Gansu,Mainland China,


### **4. Transforming the DataFrame**
1. Build a DataFrame from selected columns only
2. Drop rows that have no value in a specific column
3. Change the data type of a specific column
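Why step 2 has to come before step 3 can be shown on a tiny made-up frame: a column containing NaN is stored as float64, and casting it to int64 raises an error because int64 has no way to represent a missing value. A minimal sketch:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'Country_Region': ['A', 'B', 'C'],
                       'Confirmed': [1.0, np.nan, 14.0]})  # NaN makes this column float64

# Casting while NaN is still present fails.
try:
    sample.astype({'Confirmed': 'int64'})
    cast_failed = False
except ValueError:
    cast_failed = True

# Dropping the rows with a missing value first lets the cast succeed.
clean = sample.dropna(subset=['Confirmed']).astype({'Confirmed': 'int64'})

print(cast_failed)  # True
print(clean)
```

If you would rather keep those rows, pandas' nullable integer dtype 'Int64' (capital I) can hold missing values, e.g. sample.astype({'Confirmed': 'Int64'}).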

In [4]:
doc = pd.read_csv(PATH + "01-22-2020.csv", encoding="utf-8-sig")
try:
  doc = doc[['Province_State','Country_Region','Confirmed']]    # 1. Keep only the needed columns
except KeyError:  # older files use 'Province/State' and 'Country/Region'
  doc = doc[['Province/State','Country/Region','Confirmed']]    # 1. Keep only the needed columns
  doc.columns = ['Province_State','Country_Region','Confirmed']
doc = doc.dropna(subset=['Confirmed'])    # 2. Drop rows with no 'Confirmed' value
doc = doc.astype({'Confirmed':'int64'})   # 3. Convert 'Confirmed' to int64
# Drop the missing values before changing the type; casting NaN to int64 raises an error.
doc.head()

,Province_State,Country_Region,Confirmed
0,Anhui,Mainland China,1
1,Beijing,Mainland China,14
2,Chongqing,Mainland China,6
3,Fujian,Mainland China,1
5,Guangdong,Mainland China,26


* Load the country information

In [5]:
country_info = pd.read_csv("COVID-19-master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv",encoding="utf-8-sig")
country_info.head()

,UID,iso2,iso3,code3,FIPS,Admin2,Province_State,Country_Region,Lat,Long_,Combined_Key,Population
0,4,AF,AFG,4.0,,,,Afghanistan,33.93911,67.709953,Afghanistan,38928341.0
1,8,AL,ALB,8.0,,,,Albania,41.1533,20.1683,Albania,2877800.0
2,12,DZ,DZA,12.0,,,,Algeria,28.0339,1.6596,Algeria,43851043.0
3,20,AD,AND,20.0,,,,Andorra,42.5063,1.5218,Andorra,77265.0
4,24,AO,AGO,24.0,,,,Angola,-11.2027,17.8739,Angola,32866268.0


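* Merge the two DataFrames with a left join on Country_Region. The lookup table has rows for sub-national regions (provinces, states, counties) as well as whole countries, so one country in doc can match many rows; this is why the merged result has far more rows (3481) than doc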

In [6]:
test_df = pd.merge(doc, country_info, how='left', on='Country_Region')
test_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3481 entries, 0 to 3480
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Province_State_x  3429 non-null   object 
 1   Country_Region    3481 non-null   object 
 2   Confirmed         3481 non-null   int64  
 3   UID               3455 non-null   float64
 4   iso2              3455 non-null   object 
 5   iso3              3455 non-null   object 
 6   code3             3455 non-null   float64
 7   FIPS              3382 non-null   float64
 8   Admin2            3341 non-null   object 
 9   Province_State_y  3452 non-null   object 
 10  Lat               3334 non-null   float64
 11  Long_             3334 non-null   float64
 12  Combined_Key      3455 non-null   object 
 13  Population        3334 non-null   float64
dtypes: float64(6), int64(1), object(7)
memory usage: 407.9+ KB


* Check for country information that failed to match
  * Find the countries in the confirmed-case data whose iso2 column was not matched

In [7]:
test_df.isnull().sum()

Province_State_x     52
Country_Region        0
Confirmed             0
UID                  26
iso2                 26
iso3                 26
code3                26
FIPS                 99
Admin2              140
Province_State_y     29
Lat                 147
Long_               147
Combined_Key         26
Population          147
dtype: int64

In [8]:
nan_rows = test_df[test_df['iso2'].isnull()]
nan_rows.head()

,Province_State_x,Country_Region,Confirmed,UID,iso2,iso3,code3,FIPS,Admin2,Province_State_y,Lat,Long_,Combined_Key,Population
0,Anhui,Mainland China,1,,,,,,,,,,,
1,Beijing,Mainland China,14,,,,,,,,,,,
2,Chongqing,Mainland China,6,,,,,,,,,,,
3,Fujian,Mainland China,1,,,,,,,,,,,
4,Guangdong,Mainland China,26,,,,,,,,,,,
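pandas' merge() also has an indicator=True option that labels every row with where its key was found, which makes unmatched names easy to list without relying on a particular column being empty. A minimal sketch; the two tiny frames are made-up stand-ins for doc and country_info:

```python
import pandas as pd

cases = pd.DataFrame({'Country_Region': ['Mainland China', 'Japan'], 'Confirmed': [10, 2]})
lookup = pd.DataFrame({'Country_Region': ['China', 'Japan'], 'iso2': ['CN', 'JP']})

# indicator=True adds a '_merge' column with 'both', 'left_only' or 'right_only'.
merged = pd.merge(cases, lookup, how='left', on='Country_Region', indicator=True)

# Country names in `cases` that found no partner in `lookup`.
unmatched = list(merged.loc[merged['_merge'] == 'left_only', 'Country_Region'].unique())
print(unmatched)  # ['Mainland China']
```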


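**Changing column values**
* The same country is often written under several different names in Country_Region
* No single key could normalize every case: the columns that might serve as a key varied, and which column could be used as a key changed from file to file, so the names could not be matched by key
* Instead, each case was checked by hand and a separate JSON file was written that maps every variant to one consistent country name
* Country names are then normalized based on this JSON file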

json.load() reads JSON data from a file and returns it as a Python dictionary, so it can be used like any other dict

In [9]:
import json

with open('COVID-19-master/csse_covid_19_data/country_convert.json','r',encoding='utf-8-sig') as json_file:
  json_data = json.load(json_file)
  print(json_data.keys())

dict_keys(['Mainland China', 'Macau', 'South Korea', 'Aruba', ' Azerbaijan', 'Bahamas, The', 'Cape Verde', 'Cayman Islands', 'Channel Islands', 'Curacao', 'Czech Republic', 'East Timor', 'Faroe Islands', 'French Guiana', 'Gambia, The', 'Gibraltar', 'Greenland', 'Guadeloupe', 'Guam', 'Guernsey', 'Hong Kong', 'Hong Kong SAR', 'Iran (Islamic Republic of)', 'Ivory Coast', 'Jersey', 'Macao SAR', 'Martinique', 'Mayotte', 'North Ireland', 'Palestine', 'Puerto Rico', 'Republic of Ireland', 'Republic of Korea', 'Republic of Moldova', 'Republic of the Congo', 'Reunion', 'Russian Federation', 'Saint Barthelemy', 'Saint Martin', 'St. Martin', 'Taipei and environs', 'The Bahamas', 'The Gambia', 'UK', 'Vatican City', 'Viet Nam', 'occupied Palestinian territory', 'Taiwan*', 'Malawi', 'South Sudan', 'Western Sahara', 'Namibia'])
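Because the loaded mapping is a plain dict, the "replace only if listed" rule can also be written with dict.get(key, default). A minimal sketch using a small hypothetical mapping written to a temporary file: only 'Mainland China' → 'China' is confirmed by this notebook's output, and the 'UK' value is an assumption. The variable names are chosen so as not to overwrite json_data:

```python
import json
import os
import tempfile

# Hypothetical excerpt of country_convert.json; the real file's values may differ.
mapping = {'Mainland China': 'China', 'UK': 'United Kingdom'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'country_convert.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(mapping, f)
    with open(path, 'r', encoding='utf-8-sig') as f:
        loaded_mapping = json.load(f)  # an ordinary Python dict

def normalize(name):
    # Return the mapped name if the key exists, otherwise keep the original name.
    return loaded_mapping.get(name, name)

print(normalize('Mainland China'))  # China
print(normalize('Japan'))           # Japan
```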


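**Using apply()**
* apply() calls a function on each column (or, with axis=1, each row) of a DataFrame and builds the result from the function's return values, so it can be used to change the values of specific columns. In the example below, '영어' means English and '수학' means math.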

In [10]:
df = pd.DataFrame({
    '영어':[60,70],
    '수학':[100,50]
}, index=['Dave','David'])
df

,영어,수학
Dave,60,100
David,70,50


In [11]:
def func(df_data):
  print(type(df_data))
  print(df_data.index)
  print(df_data.values)
  return df_data



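> Note: in pandas versions before 1.1, apply() called the function twice on the first row (or column) to check whether a faster code path could be used, so with two rows the function ran three times. Since pandas 1.1, each row or column is passed exactly once, which is why the outputs below show exactly two calls.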



In [12]:
df_func = df.apply(func, axis=0)    # axis=0 (default): each column is passed to func as a Series

<class 'pandas.core.series.Series'>
Index(['Dave', 'David'], dtype='object')
[60 70]
<class 'pandas.core.series.Series'>
Index(['Dave', 'David'], dtype='object')
[100  50]


In [13]:
df_func = df.apply(func, axis=1)    # axis=1: each row is passed to func as a Series

<class 'pandas.core.series.Series'>
Index(['영어', '수학'], dtype='object')
[ 60 100]
<class 'pandas.core.series.Series'>
Index(['영어', '수학'], dtype='object')
[70 50]
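When the function returns a single value instead of a Series, apply() collapses each row or column into one number. A minimal sketch with English column names (standing in for '영어' and '수학'), also showing that the per-row total can be computed without apply by adding whole columns, which is usually faster:

```python
import pandas as pd

scores = pd.DataFrame({'english': [60, 70], 'math': [100, 50]}, index=['Dave', 'David'])

# axis=1: each row arrives as a Series indexed by column name;
# returning a scalar gives a Series indexed by the row labels.
row_totals = scores.apply(lambda row: row['english'] + row['math'], axis=1)

# axis=0 (default): each column arrives as a Series indexed by row label.
column_max = scores.apply(lambda col: col.max())

# The same row totals, computed column-wise without apply.
vectorized_totals = scores['english'] + scores['math']

print(row_totals)
print(column_max)
```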


In [14]:
df = pd.DataFrame({
    '영어':[60,70],
    '수학':[100,50]
}, index = ['Dave','David'])
df

,영어,수학
Dave,60,100
David,70,50


In [15]:
def func(df_data):
  df_data['영어'] = 80   # overwrite the '영어' (English) value in each row
  return df_data

In [16]:
df_func = df.apply(func, axis=1)

In [17]:
df_func

,영어,수학
Dave,80,100
David,80,50


### **Changing Country Names with apply()**

* Preparation (load a daily report into a DataFrame stored in the variable doc)

In [18]:
import pandas as pd

doc = pd.read_csv(PATH + '01-22-2020.csv',encoding='utf-8-sig')
try:
  doc = doc[['Province_State','Country_Region','Confirmed']]    # 1. Keep only the needed columns
except KeyError:  # older files use 'Province/State' and 'Country/Region'
  doc = doc[['Province/State','Country/Region','Confirmed']]    # 1. Keep only the needed columns
  doc.columns = ['Province_State','Country_Region','Confirmed']
doc = doc.dropna(subset=['Confirmed'])  # 2. Drop rows with no 'Confirmed' value
doc = doc.astype({'Confirmed':'int64'}) # 3. Convert 'Confirmed' to int64
doc.head()

,Province_State,Country_Region,Confirmed
0,Anhui,Mainland China,1
1,Beijing,Mainland China,14
2,Chongqing,Mainland China,6
3,Fujian,Mainland China,1
5,Guangdong,Mainland China,26


* Read the JSON file that holds the country names to convert

In [19]:
import json

with open('COVID-19-master/csse_covid_19_data/country_convert.json','r',encoding='utf-8-sig') as json_file:
  json_data = json.load(json_file)
  print(json_data.keys())

dict_keys(['Mainland China', 'Macau', 'South Korea', 'Aruba', ' Azerbaijan', 'Bahamas, The', 'Cape Verde', 'Cayman Islands', 'Channel Islands', 'Curacao', 'Czech Republic', 'East Timor', 'Faroe Islands', 'French Guiana', 'Gambia, The', 'Gibraltar', 'Greenland', 'Guadeloupe', 'Guam', 'Guernsey', 'Hong Kong', 'Hong Kong SAR', 'Iran (Islamic Republic of)', 'Ivory Coast', 'Jersey', 'Macao SAR', 'Martinique', 'Mayotte', 'North Ireland', 'Palestine', 'Puerto Rico', 'Republic of Ireland', 'Republic of Korea', 'Republic of Moldova', 'Republic of the Congo', 'Reunion', 'Russian Federation', 'Saint Barthelemy', 'Saint Martin', 'St. Martin', 'Taipei and environs', 'The Bahamas', 'The Gambia', 'UK', 'Vatican City', 'Viet Nam', 'occupied Palestinian territory', 'Taiwan*', 'Malawi', 'South Sudan', 'Western Sahara', 'Namibia'])


* Check the Country_Region value and, only when the country is written under one of the variant names listed in the JSON file, replace it with the designated name

In [20]:
def func(row):
  if row['Country_Region'] in json_data:
    row['Country_Region'] = json_data[row['Country_Region']]
  return row

In [21]:
doc = doc.apply(func, axis=1)
doc.head()

,Province_State,Country_Region,Confirmed
0,Anhui,China,1
1,Beijing,China,14
2,Chongqing,China,6
3,Fujian,China,1
5,Guangdong,China,26
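The same normalization can be done on the whole column at once with Series.replace and a dict, which swaps values that exactly match a key and leaves everything else unchanged. A minimal sketch; the mapping is hypothetical (only 'Mainland China' → 'China' is confirmed by the output above) and the rows are made up:

```python
import pandas as pd

# Hypothetical mapping; the 'UK' value is an assumption.
name_map = {'Mainland China': 'China', 'UK': 'United Kingdom'}

reports = pd.DataFrame({'Country_Region': ['Mainland China', 'Japan', 'UK'],
                        'Confirmed': [10, 2, 1]})

# Whole values found among the keys are replaced; other values are kept as-is.
reports['Country_Region'] = reports['Country_Region'].replace(name_map)
print(reports)
```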


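**Note: Building a date string from the file name**

* lstrip(chars): removes any leading (left-side) characters that appear in chars; rstrip(chars): removes any trailing (right-side) characters that appear in chars
* replace(old, new): returns a copy of the string with every occurrence of old replaced by new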

In [24]:
data = '01-22-2020.csv'
date_column = data.split('.')[0].lstrip('0').replace('-','/')
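Note that lstrip('0') only strips the month's leading zero, so '04-01-2020.csv' becomes '4/01/2020' (the column name that appears later in this notebook). If fully unpadded labels were wanted, a sketch using datetime.strptime, an alternative and not what this notebook uses, could look like this (date_label is a hypothetical helper name):

```python
from datetime import datetime

def date_label(filename):
    # Parse 'MM-DD-YYYY.csv' and rebuild the label without zero padding.
    parsed = datetime.strptime(filename.split('.')[0], '%m-%d-%Y')
    return f'{parsed.month}/{parsed.day}/{parsed.year}'

print('04-01-2020.csv'.split('.')[0].lstrip('0').replace('-', '/'))  # 4/01/2020
print(date_label('04-01-2020.csv'))                                   # 4/1/2020
```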

In [25]:
doc.columns

Index(['Province_State', 'Country_Region', 'Confirmed'], dtype='object')

In [26]:
doc.columns = ['Province_State','Country_Region',date_column]
doc.columns

Index(['Province_State', 'Country_Region', '1/22/2020'], dtype='object')

In [27]:
doc.head()

,Province_State,Country_Region,1/22/2020
0,Anhui,China,1
1,Beijing,China,14
2,Chongqing,China,6
3,Fujian,China,1
5,Guangdong,China,26


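### **5. Merging Duplicate Data**
* groupby(): aggregates data by group
  * Rows with the same column value are grouped together so that statistics such as sums or means can be computed per group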

In [29]:
df = pd.DataFrame({
    '성별':['남','남','남'],
    '이름':['David','Dave','Dave'],
    '수학':[100,50,80],
    '국어':[80,70,50]
})
df

,성별,이름,수학,국어
0,남,David,100,80
1,남,Dave,50,70
2,남,Dave,80,50


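* In the outputs below, the non-numeric '성별' (gender) column is gone. Older pandas versions silently dropped non-numeric columns when aggregating; since pandas 2.0 they are no longer dropped automatically (mean() on a text column raises a TypeError), so numeric_only=True is passed to keep only the numeric '수학' (math) and '국어' (Korean) columns.
* The column used in groupby() ('이름', name) becomes the index of the resulting DataFrame.

In [30]:
df.groupby('이름').mean(numeric_only=True)

이름,수학,국어
Dave,65,60
David,100,80


In [31]:
df.groupby('이름').sum(numeric_only=True)

이름,수학,국어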
Dave,130,120
David,100,80


* Compute the total number of confirmed cases per country

In [32]:
import pandas as pd

doc = pd.read_csv(PATH + '01-22-2020.csv',encoding='utf-8-sig')
try:
  doc = doc[['Province_State','Country_Region','Confirmed']]    # 1. Keep only the needed columns
except KeyError:  # older files use 'Province/State' and 'Country/Region'
  doc = doc[['Province/State','Country/Region','Confirmed']]    # 1. Keep only the needed columns
  doc.columns = ['Province_State','Country_Region','Confirmed']
doc = doc.dropna(subset=['Confirmed'])  # 2. Drop rows with no 'Confirmed' value
doc = doc.astype({'Confirmed':'int64'}) # 3. Convert 'Confirmed' to int64
doc.head()

,Province_State,Country_Region,Confirmed
0,Anhui,Mainland China,1
1,Beijing,Mainland China,14
2,Chongqing,Mainland China,6
3,Fujian,Mainland China,1
5,Guangdong,Mainland China,26


In [33]:
doc.groupby('Country_Region').sum(numeric_only=True)   # 4. Group rows with the same Country_Region and sum each group (numeric_only skips the text column Province_State)

Country_Region,Confirmed
Japan,2
Macau,1
Mainland China,547
South Korea,1
Taiwan,1
Thailand,2
US,1
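Here 'Mainland China' and 'Macau' are still separate groups because names have not been normalized yet. This is why, in the next section, country names are normalized (step 4) before rows are grouped (step 6): otherwise different spellings of one country end up in separate groups. A minimal sketch on made-up rows:

```python
import pandas as pd

# Illustrative rows in which one country appears under two spellings.
rows = pd.DataFrame({'Country_Region': ['Mainland China', 'China', 'Japan'],
                     'Confirmed': [10, 5, 2]})

# Grouping before normalizing keeps the spellings apart...
before = rows.groupby('Country_Region')['Confirmed'].sum()

# ...while normalizing first merges them into one group.
rows['Country_Region'] = rows['Country_Region'].replace({'Mainland China': 'China'})
after = rows.groupby('Country_Region')['Confirmed'].sum()

print(before)
print(after)
```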


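### **6. Preprocessing the Data**
* Combine all the steps so far into one function
  1. Read the CSV file
  2. Keep only the two columns 'Country_Region' and 'Confirmed'
  3. Drop rows with no value in 'Confirmed'
  4. Normalize the country names in 'Country_Region' so they are consistent across files
  5. Convert 'Confirmed' to int64 (integer)
  6. Merge duplicate rows by 'Country_Region'
  7. Build a date string from the file name and use it to rename the 'Confirmed' column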

In [36]:
import json

with open('COVID-19-master/csse_covid_19_data/country_convert.json','r',encoding='utf-8-sig') as json_file:
  json_data = json.load(json_file)

def country_name_convert(row):
  if row['Country_Region'] in json_data:
    return json_data[row['Country_Region']]
  return row['Country_Region']

def create_dateframe(filename):

  doc = pd.read_csv(PATH + filename, encoding='utf-8-sig')  # 1. Read the CSV file
  try:
    doc = doc[['Country_Region','Confirmed']]   # 2. Keep only the needed columns
  except KeyError:                             # older files use 'Country/Region'
    doc = doc[['Country/Region','Confirmed']]   # 2. Keep only the needed columns
    doc.columns = ['Country_Region','Confirmed']
  doc = doc.dropna(subset=['Confirmed'])        # 3. Drop rows with no 'Confirmed' value
  doc['Country_Region'] = doc.apply(country_name_convert, axis=1)   # 4. Normalize country names across files
  doc = doc.astype({'Confirmed':'int64'})       # 5. Convert 'Confirmed' to int64
  doc = doc.groupby('Country_Region').sum()     # 6. Merge duplicate rows per country

  # 7. Build a date string from the file name and use it as the column name
  date_column = filename.split('.')[0].lstrip('0').replace('-','/')
  doc.columns = [date_column]
  return doc

**Testing**

In [37]:
doc1 = create_dateframe('01-22-2020.csv')
doc2 = create_dateframe('04-01-2020.csv')

In [38]:
doc2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 181 entries, Afghanistan to Zimbabwe
Data columns (total 1 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   4/01/2020  181 non-null    int64
dtypes: int64(1)
memory usage: 2.8+ KB


In [39]:
doc2.head()

Country_Region,4/01/2020
Afghanistan,197
Albania,259
Algeria,847
Andorra,390
Angola,8


**Merging the DataFrames (outer join on the country index)**

In [40]:
doc = pd.merge(doc1, doc2, how='outer',left_index=True, right_index=True)
doc.head()

Country_Region,1/22/2020,4/01/2020
Afghanistan,,197
Albania,,259
Algeria,,847
Andorra,,390
Angola,,8


**Replacing missing values with 0**

In [41]:
doc = doc.fillna(0)
doc

Country_Region,1/22/2020,4/01/2020
Afghanistan,0.0,197
Albania,0.0,259
Algeria,0.0,847
Andorra,0.0,390
Angola,0.0,8
...,...,...
Venezuela,0.0,143
Vietnam,0.0,218
West Bank and Gaza,0.0,134
Zambia,0.0,36


**Note: Listing the files in a folder**
* The split() method can be used to pick out only the files with a given extension
* split('.') on a file name returns a list such as ['file name', 'extension'], so split('.')[-1] selects the last item, i.e. the extension

In [42]:
import os

PATH = "COVID-19-master/csse_covid_19_data/csse_covid_19_daily_reports/"
file_list, csv_list = os.listdir(PATH), list()
# os.listdir(PATH): returns the names of the entries in the folder as a list

for file in file_list:
  if file.split('.')[-1] == 'csv':
    csv_list.append(file)

print(csv_list)

['12-28-2020.csv', '01-27-2020.csv', '04-26-2020.csv', '09-18-2020.csv', '08-18-2020.csv', '11-03-2020.csv', '03-10-2020.csv', '12-12-2020.csv', '11-25-2020.csv', '07-20-2020.csv', '10-21-2020.csv', '09-02-2020.csv', '09-20-2020.csv', '10-22-2020.csv', '02-08-2020.csv', '11-24-2020.csv', '07-05-2020.csv', '09-28-2020.csv', '09-09-2020.csv', '12-13-2020.csv', '03-22-2020.csv', '12-07-2020.csv', '11-18-2020.csv', '02-20-2020.csv', '11-06-2020.csv', '02-17-2020.csv', '06-06-2020.csv', '06-26-2020.csv', '03-20-2020.csv', '09-30-2020.csv', '04-23-2020.csv', '07-26-2020.csv', '10-13-2020.csv', '12-31-2020.csv', '02-28-2020.csv', '08-25-2020.csv', '05-24-2020.csv', '06-20-2020.csv', '12-27-2020.csv', '04-22-2020.csv', '09-14-2020.csv', '03-23-2020.csv', '08-09-2020.csv', '10-14-2020.csv', '11-23-2020.csv', '09-12-2020.csv', '12-10-2020.csv', '09-24-2020.csv', '07-31-2020.csv', '06-14-2020.csv', '04-12-2020.csv', '09-10-2020.csv', '07-28-2020.csv', '11-16-2020.csv', '10-09-2020.csv', '06-04-20

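**Note: Sorting a list**
* list_variable.sort(): sorts the list in place in ascending order (the default)
* list_variable.sort(reverse=True): sorts it in place in descending order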
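Sorting these names as plain strings yields chronological order only because the names are zero-padded MM-DD-YYYY and the files used here all date from 2020 (the final table runs from 1/22/2020 to 12/31/2020). If files from several years were mixed, a hypothetical '01-01-2021.csv' would sort before '01-22-2020.csv'. A sketch of sorting by the parsed date instead (report_date is a hypothetical helper name):

```python
from datetime import datetime

def report_date(filename):
    # 'MM-DD-YYYY.csv' -> datetime, used as the sort key.
    return datetime.strptime(filename.split('.')[0], '%m-%d-%Y')

names = ['01-01-2021.csv', '12-31-2020.csv', '01-22-2020.csv']

print(sorted(names))                   # plain string order
print(sorted(names, key=report_date))  # chronological order
```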

In [44]:
csv_list.sort()
csv_list

['01-22-2020.csv',
 '01-23-2020.csv',
 '01-24-2020.csv',
 '01-25-2020.csv',
 '01-26-2020.csv',
 '01-27-2020.csv',
 '01-28-2020.csv',
 '01-29-2020.csv',
 '01-30-2020.csv',
 '01-31-2020.csv',
 '02-01-2020.csv',
 '02-02-2020.csv',
 '02-03-2020.csv',
 '02-04-2020.csv',
 '02-05-2020.csv',
 '02-06-2020.csv',
 '02-07-2020.csv',
 '02-08-2020.csv',
 '02-09-2020.csv',
 '02-10-2020.csv',
 '02-11-2020.csv',
 '02-12-2020.csv',
 '02-13-2020.csv',
 '02-14-2020.csv',
 '02-15-2020.csv',
 '02-16-2020.csv',
 '02-17-2020.csv',
 '02-18-2020.csv',
 '02-19-2020.csv',
 '02-20-2020.csv',
 '02-21-2020.csv',
 '02-22-2020.csv',
 '02-23-2020.csv',
 '02-24-2020.csv',
 '02-25-2020.csv',
 '02-26-2020.csv',
 '02-27-2020.csv',
 '02-28-2020.csv',
 '02-29-2020.csv',
 '03-01-2020.csv',
 '03-02-2020.csv',
 '03-03-2020.csv',
 '03-04-2020.csv',
 '03-05-2020.csv',
 '03-06-2020.csv',
 '03-07-2020.csv',
 '03-08-2020.csv',
 '03-09-2020.csv',
 '03-10-2020.csv',
 '03-11-2020.csv',
 '03-12-2020.csv',
 '03-13-2020.csv',
 '03-14-2020

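### **7. Collecting and Preprocessing Multiple Files into One DataFrame**

* Combine all the steps so far into functions
  1. Extract only the needed files from the folder listing
  2. Sort the file list
  3. Preprocess each file into a DataFrame (the separate create_dateframe() function)
  4. Merge the DataFrames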

**Final code**

In [46]:
import json

with open('COVID-19-master/csse_covid_19_data/country_convert.json','r',encoding='utf-8-sig') as json_file:
  json_data = json.load(json_file)

def country_name_convert(row):
  if row['Country_Region'] in json_data:
    return json_data[row['Country_Region']]
  return row['Country_Region']

def create_dateframe(filename, path):
  # path: folder holding the daily report CSVs (passed in rather than read from the global PATH)
  doc = pd.read_csv(path + filename, encoding='utf-8-sig')  # 1. Read the CSV file
  try:
    doc = doc[['Country_Region','Confirmed']]   # 2. Keep only the needed columns
  except KeyError:                             # older files use 'Country/Region'
    doc = doc[['Country/Region','Confirmed']]   # 2. Keep only the needed columns
    doc.columns = ['Country_Region','Confirmed']
  doc = doc.dropna(subset=['Confirmed'])        # 3. Drop rows with no 'Confirmed' value
  doc['Country_Region'] = doc.apply(country_name_convert, axis=1)   # 4. Normalize country names across files
  doc = doc.astype({'Confirmed':'int64'})       # 5. Convert 'Confirmed' to int64
  doc = doc.groupby('Country_Region').sum()     # 6. Merge duplicate rows per country

  # 7. Build a date string from the file name and use it as the column name
  date_column = filename.split('.')[0].lstrip('0').replace('-','/')
  doc.columns = [date_column]
  return doc

In [47]:
import os

def generate_dateframe_by_path(PATH):

  file_list, csv_list = os.listdir(PATH), list()
  first_doc = True
  for file in file_list:
    if file.split('.')[-1] == 'csv':           # 1. Keep only .csv files
      csv_list.append(file)
  csv_list.sort()                               # 2. Sort the file names

  for file in csv_list:
    doc = create_dateframe(file, PATH)          # 3. Preprocess one file from the folder passed in
    if first_doc:
      final_doc, first_doc = doc, False
    else:                                       # 4. Outer-merge on the country index
      final_doc = pd.merge(final_doc, doc, how='outer', left_index=True, right_index=True)

  final_doc = final_doc.fillna(0)               # countries with no report on a date get 0
  return final_doc

In [49]:
PATH = 'COVID-19-master/csse_covid_19_data/csse_covid_19_daily_reports/'
doc = generate_dateframe_by_path(PATH)
doc

Country_Region,1/22/2020,1/23/2020,1/24/2020,1/25/2020,1/26/2020,1/27/2020,1/28/2020,1/29/2020,1/30/2020,1/31/2020,2/01/2020,2/02/2020,2/03/2020,2/04/2020,2/05/2020,2/06/2020,2/07/2020,2/08/2020,2/09/2020,2/10/2020,2/11/2020,2/12/2020,2/13/2020,2/14/2020,2/15/2020,2/16/2020,2/17/2020,2/18/2020,2/19/2020,2/20/2020,2/21/2020,2/22/2020,2/23/2020,2/24/2020,2/25/2020,2/26/2020,2/27/2020,2/28/2020,2/29/2020,3/01/2020,...,11/22/2020,11/23/2020,11/24/2020,11/25/2020,11/26/2020,11/27/2020,11/28/2020,11/29/2020,11/30/2020,12/01/2020,12/02/2020,12/03/2020,12/04/2020,12/05/2020,12/06/2020,12/07/2020,12/08/2020,12/09/2020,12/10/2020,12/11/2020,12/12/2020,12/13/2020,12/14/2020,12/15/2020,12/16/2020,12/17/2020,12/18/2020,12/19/2020,12/20/2020,12/21/2020,12/22/2020,12/23/2020,12/24/2020,12/25/2020,12/26/2020,12/27/2020,12/28/2020,12/29/2020,12/30/2020,12/31/2020
Afghanistan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,44706.0,44988.0,45174.0,45384.0,45600.0,45723.0,45844.0,46116.0,46274.0,46516.0,46718.0,46837.0,46837.0,47072.0,47306.0,47516.0,47716.0,47851.0,48053.0,48116.0,48229.0,48527.0,48718.0,48952.0,49161.0,49378.0,49621.0,49681.0,49817.0,50013.0,50190.0,50433.0,50655.0,50810.0,50886.0,51039.0,51280.0,51350.0,51405.0,51526.0
Albania,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,32761.0,33556.0,34300.0,34944.0,35600.0,36245.0,36790.0,37625.0,38182.0,39014.0,39719.0,40501.0,41302.0,42148.0,42988.0,43683.0,44436.0,45188.0,46061.0,46863.0,47742.0,48530.0,49191.0,50000.0,50637.0,51424.0,52004.0,52542.0,53003.0,53425.0,53814.0,54317.0,54827.0,55380.0,55755.0,56254.0,56572.0,57146.0,57727.0,58316.0
Algeria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,74862.0,75867.0,77000.0,78025.0,79110.0,80168.0,81212.0,82221.0,83199.0,84152.0,85084.0,85927.0,86730.0,87502.0,88252.0,88825.0,89416.0,90014.0,90579.0,91121.0,91638.0,92102.0,92597.0,93065.0,93507.0,93933.0,94371.0,94781.0,95203.0,95659.0,96069.0,96549.0,97007.0,97441.0,97857.0,98249.0,98631.0,98988.0,99311.0,99610.0
Andorra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,6256.0,6304.0,6351.0,6428.0,6534.0,6610.0,6610.0,6712.0,6745.0,6790.0,6842.0,6904.0,6955.0,7005.0,7050.0,7084.0,7127.0,7162.0,7190.0,7236.0,7288.0,7338.0,7382.0,7382.0,7446.0,7466.0,7519.0,7560.0,7577.0,7602.0,7633.0,7669.0,7699.0,7756.0,7806.0,7821.0,7875.0,7919.0,7983.0,8049.0
Angola,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,14493.0,14634.0,14742.0,14821.0,14920.0,15008.0,15087.0,15103.0,15139.0,15251.0,15319.0,15361.0,15493.0,15536.0,15591.0,15648.0,15729.0,15804.0,15925.0,16061.0,16161.0,16188.0,16277.0,16362.0,16407.0,16484.0,16562.0,16626.0,16644.0,16686.0,16802.0,16931.0,17029.0,17099.0,17149.0,17240.0,17296.0,17371.0,17433.0,17553.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Vietnam,0.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,6.0,6.0,8.0,8.0,8.0,10.0,10.0,13.0,13.0,14.0,15.0,15.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,...,1307.0,1312.0,1316.0,1321.0,1331.0,1339.0,1341.0,1343.0,1347.0,1351.0,1358.0,1361.0,1361.0,1365.0,1366.0,1367.0,1377.0,1381.0,1385.0,1391.0,1395.0,1397.0,1402.0,1405.0,1405.0,1407.0,1410.0,1411.0,1413.0,1414.0,1420.0,1421.0,1432.0,1439.0,1440.0,1441.0,1451.0,1454.0,1456.0,1465.0
West Bank and Gaza,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,71644.0,73196.0,75007.0,76727.0,78493.0,80429.0,81890.0,83585.0,85647.0,88004.0,90192.0,92708.0,94676.0,96098.0,98038.0,99758.0,101109.0,102992.0,104879.0,106622.0,108099.0,109738.0,111102.0,113409.0,115606.0,117755.0,119612.0,121216.0,122643.0,123945.0,125506.0,127376.0,129080.0,130598.0,131904.0,133093.0,134310.0,135459.0,136736.0,138004.0
Yemen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,2099.0,2107.0,2114.0,2124.0,2137.0,2148.0,2160.0,2177.0,2191.0,2197.0,2217.0,2239.0,2267.0,2304.0,2337.0,2383.0,2078.0,2079.0,2081.0,2082.0,2083.0,2083.0,2084.0,2085.0,2085.0,2087.0,2087.0,2087.0,2087.0,2087.0,2087.0,2087.0,2092.0,2092.0,2092.0,2094.0,2096.0,2096.0,2097.0,2099.0
Zambia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,17424.0,17454.0,17466.0,17535.0,17553.0,17569.0,17589.0,17608.0,17647.0,17665.0,17700.0,17730.0,17857.0,17898.0,17916.0,17931.0,17963.0,18062.0,18091.0,18161.0,18217.0,18274.0,18322.0,18428.0,18456.0,18504.0,18575.0,18620.0,18716.0,18768.0,18881.0,19122.0,19234.0,19571.0,19671.0,19834.0,19943.0,20177.0,20462.0,20725.0
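Merging pairwise in a loop re-merges the growing table on every iteration. A common alternative, sketched here rather than taken from this notebook, is to collect the per-day frames in a list and align them all at once with pd.concat(axis=1), which performs an outer join on the index by default. The two small frames are made-up stand-ins for create_dateframe() output:

```python
import pandas as pd

# Made-up per-day frames shaped like create_dateframe() output:
# countries as the index, one date column each.
day1 = pd.DataFrame({'1/22/2020': [10, 2]}, index=['China', 'Japan'])
day2 = pd.DataFrame({'1/23/2020': [20, 2, 1]}, index=['China', 'Japan', 'Vietnam'])
day1.index.name = day2.index.name = 'Country_Region'

# One outer-aligned concat instead of repeated pairwise merges;
# missing (country, date) cells become NaN, then 0.
combined = pd.concat([day1, day2], axis=1).fillna(0).astype('int64')
print(combined)
```

With many files, the list would be built inside the loop (e.g. frames.append(doc)) and concatenated once after it.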


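**Note: Changing the data type of every column that can be converted**
* The table above holds float64 values because the outer merge left NaN for countries with no report on a given date (NaN is a float value), and fillna(0) kept that type; since every value is now a whole number, the whole table can be converted to int64
* DataFrame.astype(dtype)
  * object: Python str or mixed types (strings)
  * int64: Python int (integer)
  * float64: Python float (floating point)
  * bool: Python bool (a Boolean, True or False)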

In [50]:
doc = doc.astype('int64')
doc

Country_Region,1/22/2020,1/23/2020,1/24/2020,1/25/2020,1/26/2020,1/27/2020,1/28/2020,1/29/2020,1/30/2020,1/31/2020,2/01/2020,2/02/2020,2/03/2020,2/04/2020,2/05/2020,2/06/2020,2/07/2020,2/08/2020,2/09/2020,2/10/2020,2/11/2020,2/12/2020,2/13/2020,2/14/2020,2/15/2020,2/16/2020,2/17/2020,2/18/2020,2/19/2020,2/20/2020,2/21/2020,2/22/2020,2/23/2020,2/24/2020,2/25/2020,2/26/2020,2/27/2020,2/28/2020,2/29/2020,3/01/2020,...,11/22/2020,11/23/2020,11/24/2020,11/25/2020,11/26/2020,11/27/2020,11/28/2020,11/29/2020,11/30/2020,12/01/2020,12/02/2020,12/03/2020,12/04/2020,12/05/2020,12/06/2020,12/07/2020,12/08/2020,12/09/2020,12/10/2020,12/11/2020,12/12/2020,12/13/2020,12/14/2020,12/15/2020,12/16/2020,12/17/2020,12/18/2020,12/19/2020,12/20/2020,12/21/2020,12/22/2020,12/23/2020,12/24/2020,12/25/2020,12/26/2020,12/27/2020,12/28/2020,12/29/2020,12/30/2020,12/31/2020
Afghanistan,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,...,44706,44988,45174,45384,45600,45723,45844,46116,46274,46516,46718,46837,46837,47072,47306,47516,47716,47851,48053,48116,48229,48527,48718,48952,49161,49378,49621,49681,49817,50013,50190,50433,50655,50810,50886,51039,51280,51350,51405,51526
Albania,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,32761,33556,34300,34944,35600,36245,36790,37625,38182,39014,39719,40501,41302,42148,42988,43683,44436,45188,46061,46863,47742,48530,49191,50000,50637,51424,52004,52542,53003,53425,53814,54317,54827,55380,55755,56254,56572,57146,57727,58316
Algeria,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,...,74862,75867,77000,78025,79110,80168,81212,82221,83199,84152,85084,85927,86730,87502,88252,88825,89416,90014,90579,91121,91638,92102,92597,93065,93507,93933,94371,94781,95203,95659,96069,96549,97007,97441,97857,98249,98631,98988,99311,99610
Andorra,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,6256,6304,6351,6428,6534,6610,6610,6712,6745,6790,6842,6904,6955,7005,7050,7084,7127,7162,7190,7236,7288,7338,7382,7382,7446,7466,7519,7560,7577,7602,7633,7669,7699,7756,7806,7821,7875,7919,7983,8049
Angola,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,14493,14634,14742,14821,14920,15008,15087,15103,15139,15251,15319,15361,15493,15536,15591,15648,15729,15804,15925,16061,16161,16188,16277,16362,16407,16484,16562,16626,16644,16686,16802,16931,17029,17099,17149,17240,17296,17371,17433,17553
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Vietnam,0,2,2,2,2,2,2,2,2,2,6,6,8,8,8,10,10,13,13,14,15,15,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,...,1307,1312,1316,1321,1331,1339,1341,1343,1347,1351,1358,1361,1361,1365,1366,1367,1377,1381,1385,1391,1395,1397,1402,1405,1405,1407,1410,1411,1413,1414,1420,1421,1432,1439,1440,1441,1451,1454,1456,1465
West Bank and Gaza,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,71644,73196,75007,76727,78493,80429,81890,83585,85647,88004,90192,92708,94676,96098,98038,99758,101109,102992,104879,106622,108099,109738,111102,113409,115606,117755,119612,121216,122643,123945,125506,127376,129080,130598,131904,133093,134310,135459,136736,138004
Yemen,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,2099,2107,2114,2124,2137,2148,2160,2177,2191,2197,2217,2239,2267,2304,2337,2383,2078,2079,2081,2082,2083,2083,2084,2085,2085,2087,2087,2087,2087,2087,2087,2087,2092,2092,2092,2094,2096,2096,2097,2099
Zambia,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,17424,17454,17466,17535,17553,17569,17589,17608,17647,17665,17700,17730,17857,17898,17916,17931,17963,18062,18091,18161,18217,18274,18322,18428,18456,18504,18575,18620,18716,18768,18881,19122,19234,19571,19671,19834,19943,20177,20462,20725


**Writing a CSV file with pandas**
* To save a pandas DataFrame as a CSV file, use the to_csv() method
  ```
  doc.to_csv("00_data/students_default.csv")
  ```
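* The encoding option can also be set; 'utf-8-sig' writes a byte-order mark, which helps programs such as Excel recognize UTF-8 text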
  ```
  doc.to_csv("00_data/students_default.csv", encoding='utf-8-sig')
  ```

In [51]:
doc.to_csv('COVID-19-master/final_df.csv')
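Because doc uses Country_Region as its index, to_csv() writes the country names as the first column. When reading the file back, passing index_col=0 turns that column into the index again; without it, the names would come back as an ordinary column with a new 0..n-1 index. A minimal round-trip sketch using a small made-up table and a temporary folder instead of the real path:

```python
import os
import tempfile

import pandas as pd

table = pd.DataFrame({'1/22/2020': [10, 2], '1/23/2020': [20, 2]},
                     index=pd.Index(['China', 'Japan'], name='Country_Region'))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'final_df.csv')
    table.to_csv(path)                        # the index is written as the first column
    restored = pd.read_csv(path, index_col=0)  # and read back as the index

print(restored)
```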