### Data Visualization (https://flourish.studio/examples/)
- Data visualization is the practice of representing analysis results visually so that they are easier to understand and communicate.
- It is essential in exploratory data analysis, data processing, and prediction, because it makes insights easier to interpret.
- This section explores recent, engaging visualization techniques.

In [1]:
# Import required libraries
import os  # file and directory operations
import pandas as pd  # data analysis

In [2]:
# Set the folder path where the CSV files are stored
filePath = "D:/myAnalyze/PANDASPLOTLY_FUNCODING_FULLDATA_20240601/00_Material(Uploaded)/COVID-19-master/csse_covid_19_data/csse_covid_19_daily_reports/"

# Read a specific CSV file into a DataFrame
doc_1 = pd.read_csv(filePath + "04-01-2020.csv", encoding="utf-8-sig")

# Display the first 5 rows of the DataFrame (for data inspection)
doc_1.head()

Unnamed: 0,FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key
0,45001.0,Abbeville,South Carolina,US,2020-04-01 21:58:49,34.223334,-82.461707,4,0,0,4,"Abbeville, South Carolina, US"
1,22001.0,Acadia,Louisiana,US,2020-04-01 21:58:49,30.295065,-92.414197,47,1,0,46,"Acadia, Louisiana, US"
2,51001.0,Accomack,Virginia,US,2020-04-01 21:58:49,37.767072,-75.632346,7,0,0,7,"Accomack, Virginia, US"
3,16001.0,Ada,Idaho,US,2020-04-01 21:58:49,43.452658,-116.241552,195,3,0,192,"Ada, Idaho, US"
4,19001.0,Adair,Iowa,US,2020-04-01 21:58:49,41.330756,-94.471059,1,0,0,1,"Adair, Iowa, US"


In [3]:
# Read another CSV file into a DataFrame
doc_2 = pd.read_csv(filePath + "03-01-2020.csv", encoding="utf-8-sig")

# Display the first 5 rows of the DataFrame (for data inspection)
doc_2.head()

Unnamed: 0,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Latitude,Longitude
0,Hubei,Mainland China,2020-03-01T10:13:19,66907,2761,31536,30.9756,112.2707
1,,South Korea,2020-03-01T23:43:03,3736,17,30,36.0,128.0
2,,Italy,2020-03-01T23:23:02,1694,34,83,43.0,12.0
3,Guangdong,Mainland China,2020-03-01T14:13:18,1349,7,1016,23.3417,113.4244
4,Henan,Mainland China,2020-03-01T14:13:18,1272,22,1198,33.882,113.614


#### (1) The field names differ between the two datasets (e.g., Country_Region vs. Country/Region)

#### (2) Some rows have missing values in the Confirmed field. These rows are removed with dropna(subset=["Confirmed"])

#### (3) The Confirmed field may be of type object or float, so it is converted to int64 for consistency

#### (4) Create a field for retrieving country flags by country code and fill it with URLs of a fixed form (https://flagpedia.net/data/flags/w580/countryCode.png)
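Observations (1) to (3) can be tried out on two tiny in-memory frames before touching the real files. This is only a sketch: `new_style`, `old_style`, and `normalize` are hypothetical names introduced for illustration, and the values are made up.

```python
import pandas as pd

# Hypothetical miniature versions of the two report layouts.
new_style = pd.DataFrame({"Country_Region": ["US", "US"], "Confirmed": [4.0, None]})
old_style = pd.DataFrame({"Country/Region": ["Italy"], "Confirmed": ["1694"]})

def normalize(df):
    # (1) unify the column name used for the country
    df = df.rename(columns={"Country/Region": "Country_Region"})
    # keep only the two fields of interest
    df = df[["Country_Region", "Confirmed"]]
    # (2) drop rows without a confirmed count
    df = df.dropna(subset=["Confirmed"])
    # (3) force a consistent integer dtype
    return df.astype({"Confirmed": "int64"})

clean_new = normalize(new_style)
clean_old = normalize(old_style)
print(clean_new.dtypes["Confirmed"], clean_old.dtypes["Confirmed"])
```

Using `rename` means the same function handles both layouts without a `try`/`except`, because renaming a column that is absent is silently skipped.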


#### TIP 1. Using the lambda function

#### `lambda` is a Python feature that creates anonymous (unnamed) functions.

#### A lambda function that converts a filename into a `datetime` object
- lambda x: datetime.strptime(x, '%m-%d-%Y.csv')

#### `lambda` is used to define short functions.

#### Unlike a regular function definition, the body of a `lambda` is limited to a single expression, so it is usually written on one line.

#### Example:
def add(a, b):
    return a + b

#### The function above can be expressed as a `lambda` like this:
add = lambda a, b: a + b
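A lambda is most useful when passed directly as an argument, for example as a sort key. The sketch below uses a made-up list of filenames (`names` is hypothetical) to show why a plain alphabetical sort is not enough for `MM-DD-YYYY.csv` names.

```python
# Hypothetical filenames in the MM-DD-YYYY.csv pattern.
names = ["03-01-2020.csv", "01-22-2020.csv", "12-31-2019.csv"]

# Alphabetical order compares the month first, so 2019 ends up last.
alphabetical = sorted(names)

# The lambda builds a (year, month, day) tuple from fixed character positions.
by_date = sorted(names, key=lambda n: (n[6:10], n[0:2], n[3:5]))

print(alphabetical)
print(by_date)
```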


#### TIP 2. Using the datetime.strptime function

#### `datetime.strptime` converts a string into a `datetime` object.

#### The first argument is the string to convert, and the second argument is the format string that describes its layout.

#### Key directives for the format string:
- `%m`: Month (01 to 12)
- `%d`: Day (01 to 31)
- `%Y`: Four-digit year (e.g., 2021)

#### Example:
from datetime import datetime
date_str = '01-01-2021.csv'
date_obj = datetime.strptime(date_str, '%m-%d-%Y.csv')
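A short, self-contained sketch of parsing and re-formatting follows. It also shows that a string that does not match the format raises `ValueError`, which matters when a folder contains files other than the daily reports. The `README.md` name is only an illustrative assumption.

```python
from datetime import datetime

# Parse a report filename into a datetime object.
date_obj = datetime.strptime("04-01-2020.csv", "%m-%d-%Y.csv")

# Format it back into the slash-separated label used for column names.
label = date_obj.strftime("%m/%d/%Y")

# A name that does not follow the pattern cannot be parsed.
try:
    datetime.strptime("README.md", "%m-%d-%Y.csv")
    parsed = True
except ValueError:
    parsed = False

print(date_obj, label, parsed)
```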

In [4]:
import os  # OS-related functions (listing files)
import pandas as pd  # data analysis
from datetime import datetime  # date handling

# Path where the CSV files are stored
filePath = "D:/myAnalyze/PANDASPLOTLY_FUNCODING_FULLDATA_20240601/00_Material(Uploaded)/COVID-19-master/csse_covid_19_data/csse_covid_19_daily_reports/"

# Retrieve the CSV file names in the folder (filtering by extension skips any non-CSV entries
# without relying on their position in the listing)
dataFolder = [f for f in os.listdir(filePath) if f.endswith(".csv")]

# Sort the filenames chronologically by parsing the date in each name (without ".csv")
dataFolder.sort(key=lambda x: datetime.strptime(x.replace(".csv", ""), "%m-%d-%Y"))

# Initialize the variable that will hold the final DataFrame
raw_data = None

# Process each file in the sorted list
for v in dataFolder:
    print(f"Processing file: {v}")  # Print the file currently being processed

    # Read the CSV file
    csv_file = pd.read_csv(filePath + v, encoding="utf-8-sig")

    # Select only the necessary columns (Country_Region, Confirmed)
    try:
        csv_file = csv_file[["Country_Region", "Confirmed"]]
    except KeyError:
        # Older files use "Country/Region"; select it and rename it to the common name
        csv_file = csv_file[["Country/Region", "Confirmed"]]
        csv_file.columns = ["Country_Region", "Confirmed"]

    # Remove missing values and convert the data type
    csv_file = csv_file.dropna(subset=["Confirmed"])  # Drop rows without a confirmed count
    csv_file["Confirmed"] = csv_file["Confirmed"].astype("int64")  # Convert to integer type (int64)

    # Extract the date from the filename to use as the column name
    date_column = v.replace(".csv", "").replace("-", "/")
    csv_file.columns = ["Country_Region", date_column]  # Rename the Confirmed column to the date

    # Group by country and sum the values
    csv_file = csv_file.groupby("Country_Region").sum()

    # If it's the first file, store it directly; otherwise, outer-merge it into the accumulated
    # data so that each date becomes one more column (files are processed in ascending date order)
    if raw_data is None:
        raw_data = csv_file  # Store the first file as it is
    else:
        raw_data = raw_data.merge(csv_file, on="Country_Region", how="outer")  # Keep countries that appear in either frame

Processing file: 01-22-2020.csv
Processing file: 01-23-2020.csv
Processing file: 01-24-2020.csv
Processing file: 01-25-2020.csv
Processing file: 01-26-2020.csv
Processing file: 01-27-2020.csv
Processing file: 01-28-2020.csv
Processing file: 01-29-2020.csv
Processing file: 01-30-2020.csv
Processing file: 01-31-2020.csv
Processing file: 02-01-2020.csv
Processing file: 02-02-2020.csv
Processing file: 02-03-2020.csv
Processing file: 02-04-2020.csv
Processing file: 02-05-2020.csv
Processing file: 02-06-2020.csv
Processing file: 02-07-2020.csv
Processing file: 02-08-2020.csv
Processing file: 02-09-2020.csv
Processing file: 02-10-2020.csv
Processing file: 02-11-2020.csv
Processing file: 02-12-2020.csv
Processing file: 02-13-2020.csv
Processing file: 02-14-2020.csv
Processing file: 02-15-2020.csv
Processing file: 02-16-2020.csv
Processing file: 02-17-2020.csv
Processing file: 02-18-2020.csv
Processing file: 02-19-2020.csv
Processing file: 02-20-2020.csv
Processing file: 02-21-2020.csv
Processi

In [5]:
raw_data

Unnamed: 0_level_0,01/22/2020,01/23/2020,01/24/2020,01/25/2020,01/26/2020,01/27/2020,01/28/2020,01/29/2020,01/30/2020,01/31/2020,...,02/28/2023,03/01/2023,03/02/2023,03/03/2023,03/04/2023,03/05/2023,03/06/2023,03/07/2023,03/08/2023,03/09/2023
Country_Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Azerbaijan,,,,,,,,,,,...,,,,,,,,,,
Afghanistan,,,,,,,,,,,...,209322.0,209340.0,209358.0,209362.0,209369.0,209390.0,209406.0,209436.0,209451.0,209451.0
Albania,,,,,,,,,,,...,334391.0,334408.0,334408.0,334427.0,334427.0,334427.0,334427.0,334427.0,334443.0,334457.0
Algeria,,,,,,,,,,,...,271441.0,271448.0,271463.0,271469.0,271469.0,271477.0,271477.0,271490.0,271494.0,271496.0
Andorra,,,,,,,,,,,...,47866.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47890.0,47890.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Winter Olympics 2022,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,535.0,535.0,535.0,535.0,535.0,535.0,535.0,535.0,535.0,535.0
Yemen,,,,,,,,,,,...,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0
Zambia,,,,,,,,,,,...,343012.0,343012.0,343079.0,343079.0,343079.0,343135.0,343135.0,343135.0,343135.0,343135.0
Zimbabwe,,,,,,,,,,,...,263921.0,264127.0,264127.0,264127.0,264127.0,264127.0,264127.0,264127.0,264276.0,264276.0


In [6]:
# Reset the index so that Country_Region becomes a regular column again
raw_data = raw_data.reset_index()

In [7]:
# Replace NaN (missing values) with 0
raw_data = raw_data.fillna(0)
raw_data

Unnamed: 0,Country_Region,01/22/2020,01/23/2020,01/24/2020,01/25/2020,01/26/2020,01/27/2020,01/28/2020,01/29/2020,01/30/2020,...,02/28/2023,03/01/2023,03/02/2023,03/03/2023,03/04/2023,03/05/2023,03/06/2023,03/07/2023,03/08/2023,03/09/2023
0,Azerbaijan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Afghanistan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,209322.0,209340.0,209358.0,209362.0,209369.0,209390.0,209406.0,209436.0,209451.0,209451.0
2,Albania,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,334391.0,334408.0,334408.0,334427.0,334427.0,334427.0,334427.0,334427.0,334443.0,334457.0
3,Algeria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,271441.0,271448.0,271463.0,271469.0,271469.0,271477.0,271477.0,271490.0,271494.0,271496.0
4,Andorra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,47866.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47890.0,47890.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
245,Winter Olympics 2022,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,535.0,535.0,535.0,535.0,535.0,535.0,535.0,535.0,535.0,535.0
246,Yemen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0
247,Zambia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,343012.0,343012.0,343079.0,343079.0,343079.0,343135.0,343135.0,343135.0,343135.0,343135.0
248,Zimbabwe,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,263921.0,264127.0,264127.0,264127.0,264127.0,264127.0,264127.0,264127.0,264276.0,264276.0
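The counts above are shown as floats (`0.0`) because an outer merge introduces NaN for countries missing on some dates, and NaN forces a float column. If integer counts are preferred, one option is to cast back after filling. The sketch below uses two made-up daily frames (`day1`, `day2` are hypothetical) that are indexed by `Country_Region`, as in the loop above.

```python
import pandas as pd

# Hypothetical per-day frames indexed by country, as produced by groupby().sum().
day1 = pd.DataFrame({"01/22/2020": [1]}, index=pd.Index(["A"], name="Country_Region"))
day2 = pd.DataFrame({"01/23/2020": [2, 5]}, index=pd.Index(["A", "B"], name="Country_Region"))

# The outer merge keeps country "B", which has no value on the first day (NaN).
wide = day1.merge(day2, on="Country_Region", how="outer")
had_nan = bool(wide.isna().any().any())

# Fill the gaps with 0 and restore an integer dtype for every date column.
wide = wide.fillna(0).astype("int64")
print(wide)
```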


#### Country code data: country code data is used to build the flag link for each country.

In [8]:
# Load country code data (including ISO2 codes)
country_data = pd.read_csv("D:/myAnalyze/PANDASPLOTLY_FUNCODING_FULLDATA_20240601/00_Material(Uploaded)/COVID-19-master/country_region_flag.csv")

# Select only the necessary columns (ISO2 code and country name)
country_data = country_data[["iso2", "Country_Region"]]

# Remove duplicate rows (identical iso2 and Country_Region pairs)
country_data = country_data.drop_duplicates()

In [9]:
country_data

Unnamed: 0,iso2,Country_Region
0,BW,Botswana
1,BI,Burundi
2,SL,Sierra Leone
3,AF,Afghanistan
4,AL,Albania
...,...,...
256,GU,US
257,MP,US
258,VI,US
259,PR,US


In [10]:
# Check whether the same Country_Region value appears more than once
country_data["Country_Region"].value_counts()

Country_Region
France                10
United Kingdom         9
US                     6
Netherlands            4
Denmark                3
                      ..
West Bank and Gaza     1
Zimbabwe               1
Australia              1
Canada                 1
China                  1
Name: count, Length: 180, dtype: int64

In [11]:
# Standardize the country code for "US"

# Set the 'iso2' value to "US" for rows where 'Country_Region' is "US"
country_data.loc[country_data["Country_Region"] == "US", "iso2"] = "US"

# Remove duplicate rows (identical Country_Region and iso2 combinations)
country_data = country_data.drop_duplicates()

# Check the standardized "US" country code
country_data.loc[country_data["Country_Region"] == "US", "iso2"]

255    US
Name: iso2, dtype: object

In [12]:
# Standardize the country code for "France"

# Retrieve the rows where 'Country_Region' is "France"
country_data.loc[country_data["Country_Region"] == "France"]

# Set the 'iso2' value to "FR" for rows where 'Country_Region' is "France"
country_data.loc[country_data["Country_Region"] == "France", "iso2"] = "FR"

# Remove duplicate rows (identical Country_Region and iso2 combinations)
country_data = country_data.drop_duplicates()

# Check the standardized "France" country code
country_data.loc[country_data["Country_Region"] == "France", "iso2"]

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  country_data.loc[country_data["Country_Region"] == "France", "iso2"] = "FR"


59    FR
Name: iso2, dtype: object
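The "value is trying to be set on a copy of a slice" message above is pandas' SettingWithCopyWarning. It is typically raised when a frame derived from another frame is modified in place. One common way to avoid it, shown here as a sketch on made-up data (`source` and `subset` are hypothetical names), is to take an explicit `.copy()` when selecting columns.

```python
import pandas as pd

# Hypothetical source table with an extra column that is not needed.
source = pd.DataFrame({
    "iso2": ["GF", "FR"],
    "Country_Region": ["France", "France"],
    "extra": [1, 2],
})

# An explicit copy makes the subset independent of the source frame.
subset = source[["iso2", "Country_Region"]].copy()

# The in-place assignment now targets the copy only.
subset.loc[subset["Country_Region"] == "France", "iso2"] = "FR"
subset = subset.drop_duplicates()
print(subset)
```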

In [13]:
country_data["Country_Region"].value_counts()

Country_Region
United Kingdom    9
Netherlands       4
Denmark           3
Botswana          1
Albania           1
                 ..
Zimbabwe          1
Australia         1
Canada            1
China             1
US                1
Name: count, Length: 180, dtype: int64

In [14]:
# Standardize the country code for "United Kingdom"

# Retrieve the 'iso2' values where 'Country_Region' is "United Kingdom"
country_data.loc[country_data["Country_Region"] == "United Kingdom", "iso2"]

# Set the 'iso2' value to "GB" for rows where 'Country_Region' is "United Kingdom"
country_data.loc[country_data["Country_Region"] == "United Kingdom", "iso2"] = "GB"

# Remove duplicate rows (identical Country_Region and iso2 combinations)
country_data = country_data.drop_duplicates()

# Check the standardized "United Kingdom" country code
country_data.loc[country_data["Country_Region"] == "United Kingdom", "iso2"]

168    GB
Name: iso2, dtype: object

In [15]:
country_data["Country_Region"].value_counts()

Country_Region
Netherlands     4
Denmark         3
Sierra Leone    1
Botswana        1
Albania         1
               ..
Zimbabwe        1
Australia       1
Canada          1
China           1
US              1
Name: count, Length: 180, dtype: int64

In [16]:
# Standardize the country code for "Netherlands"

# Retrieve the rows where 'Country_Region' is "Netherlands"
country_data.loc[country_data["Country_Region"] == "Netherlands"]

# Set the 'iso2' value to "NL" for rows where 'Country_Region' is "Netherlands"
country_data.loc[country_data["Country_Region"] == "Netherlands", "iso2"] = "NL"

# Remove duplicate rows (identical Country_Region and iso2 combinations)
country_data = country_data.drop_duplicates()

# Check the standardized "Netherlands" country code
country_data.loc[country_data["Country_Region"] == "Netherlands", "iso2"]

117    NL
Name: iso2, dtype: object

In [17]:
country_data["Country_Region"].value_counts()

Country_Region
Denmark         3
Botswana        1
Sierra Leone    1
Burundi         1
Albania         1
               ..
Zimbabwe        1
Australia       1
Canada          1
China           1
US              1
Name: count, Length: 180, dtype: int64

In [18]:
# Standardize the country code for "Denmark"

# Retrieve the rows where 'Country_Region' is "Denmark"
country_data[country_data["Country_Region"] == "Denmark"]

# Set the 'iso2' value to "DK" for rows where 'Country_Region' is "Denmark"
country_data.loc[country_data["Country_Region"] == "Denmark", "iso2"] = "DK"

# Remove duplicate rows (identical Country_Region and iso2 combinations)
country_data = country_data.drop_duplicates()

# Check the standardized "Denmark" country code
country_data.loc[country_data["Country_Region"] == "Denmark", "iso2"]

44    DK
Name: iso2, dtype: object

In [19]:
country_data["Country_Region"].value_counts()

Country_Region
Botswana        1
Burundi         1
Sierra Leone    1
Afghanistan     1
Albania         1
               ..
Zimbabwe        1
Australia       1
Canada          1
China           1
US              1
Name: count, Length: 180, dtype: int64

In [20]:
country_data

Unnamed: 0,iso2,Country_Region
0,BW,Botswana
1,BI,Burundi
2,SL,Sierra Leone
3,AF,Afghanistan
4,AL,Albania
...,...,...
175,ZW,Zimbabwe
199,AU,Australia
207,CA,Canada
222,CN,China


In [21]:
# Generate flag image links based on country codes and add them to the dataframe

# Create an empty list to store the flag image links
code_lower_link = list()

# Iterate through all rows of the dataframe
for v in range(country_data.shape[0]):
    # Convert the country code (iso2, first column) to a lowercase string
    lower_code = str(country_data.iloc[v, 0]).lower()

    # Generate a flag image link based on the lowercase country code
    lower_link = f"https://flagpedia.net/data/flags/w580/{lower_code}.png"

    # Append the generated link to the list
    code_lower_link.append(lower_link)

# Add the generated flag image links as a new column "Country_link"
country_data["Country_link"] = code_lower_link

# Select only the necessary columns from the dataframe (iso2, Country_Region, Country_link)
country_data = country_data[["iso2", "Country_Region", "Country_link"]]

In [22]:
# Select only the required columns from the dataframe
country_data = country_data[["Country_Region", "Country_link"]]
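The link column can also be built without an explicit loop, using pandas string methods. This is a sketch on a made-up two-row table (`codes` and `base` are hypothetical names). The URL pattern is the one used in the notebook.

```python
import pandas as pd

# Hypothetical excerpt of the country code table.
codes = pd.DataFrame({
    "iso2": ["BW", "AF"],
    "Country_Region": ["Botswana", "Afghanistan"],
})

base = "https://flagpedia.net/data/flags/w580/"

# Element-wise string concatenation: lowercase code between prefix and suffix.
codes["Country_link"] = base + codes["iso2"].str.lower() + ".png"
print(codes["Country_link"].tolist())
```

One thing worth checking, as an assumption about the input file rather than something the notebook shows: `pd.read_csv` treats the literal string `NA` as missing by default, so Namibia's ISO2 code may arrive as NaN. Passing `keep_default_na=False` to `read_csv` disables that behavior.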

#### Two datasets are now available
#### 1. Confirmed COVID-19 cases by date (raw_data)
#### 2. Flag links by country code (country_data)


In [23]:
print(raw_data.shape)
raw_data.head()

(250, 1144)


Unnamed: 0,Country_Region,01/22/2020,01/23/2020,01/24/2020,01/25/2020,01/26/2020,01/27/2020,01/28/2020,01/29/2020,01/30/2020,...,02/28/2023,03/01/2023,03/02/2023,03/03/2023,03/04/2023,03/05/2023,03/06/2023,03/07/2023,03/08/2023,03/09/2023
0,Azerbaijan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Afghanistan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,209322.0,209340.0,209358.0,209362.0,209369.0,209390.0,209406.0,209436.0,209451.0,209451.0
2,Albania,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,334391.0,334408.0,334408.0,334427.0,334427.0,334427.0,334427.0,334427.0,334443.0,334457.0
3,Algeria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,271441.0,271448.0,271463.0,271469.0,271469.0,271477.0,271477.0,271490.0,271494.0,271496.0
4,Andorra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,47866.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47890.0,47890.0


In [24]:
print(country_data.shape)
country_data.head()

(180, 2)


Unnamed: 0,Country_Region,Country_link
0,Botswana,https://flagpedia.net/data/flags/w580/bw.png
1,Burundi,https://flagpedia.net/data/flags/w580/bi.png
2,Sierra Leone,https://flagpedia.net/data/flags/w580/sl.png
3,Afghanistan,https://flagpedia.net/data/flags/w580/af.png
4,Albania,https://flagpedia.net/data/flags/w580/al.png


In [25]:
country_data["Country_Region"].value_counts()

Country_Region
Botswana        1
Burundi         1
Sierra Leone    1
Afghanistan     1
Albania         1
               ..
Zimbabwe        1
Australia       1
Canada          1
China           1
US              1
Name: count, Length: 180, dtype: int64

#### Standardizing country names in the Country_Region column of the daily COVID-19 case data (raw_data) using JSON data and the apply function

In [26]:
import json  # Library for handling JSON data

# Open and load the JSON file
with open("D:/myAnalyze/PANDASPLOTLY_FUNCODING_FULLDATA_20240601/00_Material(Uploaded)/COVID-19-master/csse_covid_19_data/country_convert.json") as json_file:
    # Convert the JSON data into a Python dictionary
    myJson = json.load(json_file)

    # Print the keys of the JSON data (names as they appear in the reports)
    print(myJson.keys())

    # Print the values of the JSON data (standardized names)
    print(myJson.values())

dict_keys(['Mainland China', 'Macau', 'South Korea', 'Aruba', ' Azerbaijan', 'Bahamas, The', 'Cape Verde', 'Cayman Islands', 'Channel Islands', 'Curacao', 'Czech Republic', 'East Timor', 'Faroe Islands', 'French Guiana', 'Gambia, The', 'Gibraltar', 'Greenland', 'Guadeloupe', 'Guam', 'Guernsey', 'Hong Kong', 'Hong Kong SAR', 'Iran (Islamic Republic of)', 'Ivory Coast', 'Jersey', 'Macao SAR', 'Martinique', 'Mayotte', 'North Ireland', 'Palestine', 'Puerto Rico', 'Republic of Ireland', 'Republic of Korea', 'Republic of Moldova', 'Republic of the Congo', 'Reunion', 'Russian Federation', 'Saint Barthelemy', 'Saint Martin', 'St. Martin', 'Taipei and environs', 'The Bahamas', 'The Gambia', 'UK', 'Vatican City', 'Viet Nam', 'occupied Palestinian territory', 'Taiwan*', 'Malawi', 'South Sudan', 'Western Sahara', 'Namibia'])
dict_values(['China', 'China', 'Korea, South', 'Netherlands', 'Azerbaijan', 'Bahamas', 'Cabo Verde', 'United Kingdom', 'United Kingdom', 'Netherlands', 'Czechia', 'Timor-Leste

#### Check the values in the 'Country_Region' column and replace a name only when it appears as a key in the JSON mapping, that is, when it is written differently from the standard name

In [27]:
# Define a function that standardizes the country name of one row
def notNaN(x):
    # If the 'Country_Region' value exists as a key in the myJson dictionary
    if x["Country_Region"] in myJson:
        # Change the country name to the standardized name defined in myJson
        x["Country_Region"] = myJson[x["Country_Region"]]
    # Return the (possibly modified) row
    return x

# Apply the notNaN function to every row using apply() with axis=1
raw_data = raw_data.apply(notNaN, axis=1)

# Display the modified data
raw_data.head()

Unnamed: 0,Country_Region,01/22/2020,01/23/2020,01/24/2020,01/25/2020,01/26/2020,01/27/2020,01/28/2020,01/29/2020,01/30/2020,...,02/28/2023,03/01/2023,03/02/2023,03/03/2023,03/04/2023,03/05/2023,03/06/2023,03/07/2023,03/08/2023,03/09/2023
0,Azerbaijan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Afghanistan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,209322.0,209340.0,209358.0,209362.0,209369.0,209390.0,209406.0,209436.0,209451.0,209451.0
2,Albania,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,334391.0,334408.0,334408.0,334427.0,334427.0,334427.0,334427.0,334427.0,334443.0,334457.0
3,Algeria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,271441.0,271448.0,271463.0,271469.0,271469.0,271477.0,271477.0,271490.0,271494.0,271496.0
4,Andorra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,47866.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47890.0,47890.0
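Row-wise `apply` works, but it calls a Python function once per row. As an alternative sketch, `Series.replace` accepts a dictionary and rewrites only the values that match a key. The `name_map` below contains three entries taken from the printed JSON keys and values. The frame and its counts are made up.

```python
import pandas as pd

# Subset of the mapping shown in the JSON output above.
name_map = {
    "Mainland China": "China",
    "South Korea": "Korea, South",
    " Azerbaijan": "Azerbaijan",
}

# Hypothetical frame with two non-standard names and one standard name.
frame = pd.DataFrame({
    "Country_Region": ["Mainland China", " Azerbaijan", "Italy"],
    "01/22/2020": [1, 2, 3],
})

# Values that are not keys of name_map are left untouched.
frame["Country_Region"] = frame["Country_Region"].replace(name_map)
print(frame["Country_Region"].tolist())
```

After renaming, the same country can appear on more than one row (for example both spellings of Azerbaijan), which is why a later group-and-sum step is needed.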


#### Add country flag links to raw_data

#### Merge country_data into raw_data on "Country_Region" with a left join, so that every row of raw_data is kept, then inspect the result

In [28]:
# Merge raw_data and country_data on "Country_Region" to create final_data
# (left join: rows of raw_data without a matching flag get NaN in Country_link)
final_data = pd.merge(raw_data, country_data, on="Country_Region", how="left")

# Display the merged data
final_data.head()

Unnamed: 0,Country_Region,01/22/2020,01/23/2020,01/24/2020,01/25/2020,01/26/2020,01/27/2020,01/28/2020,01/29/2020,01/30/2020,...,03/01/2023,03/02/2023,03/03/2023,03/04/2023,03/05/2023,03/06/2023,03/07/2023,03/08/2023,03/09/2023,Country_link
0,Azerbaijan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,https://flagpedia.net/data/flags/w580/az.png
1,Afghanistan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,209340.0,209358.0,209362.0,209369.0,209390.0,209406.0,209436.0,209451.0,209451.0,https://flagpedia.net/data/flags/w580/af.png
2,Albania,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,334408.0,334408.0,334427.0,334427.0,334427.0,334427.0,334427.0,334443.0,334457.0,https://flagpedia.net/data/flags/w580/al.png
3,Algeria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,271448.0,271463.0,271469.0,271469.0,271477.0,271477.0,271490.0,271494.0,271496.0,https://flagpedia.net/data/flags/w580/dz.png
4,Andorra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47890.0,47890.0,https://flagpedia.net/data/flags/w580/ad.png


In [29]:
final_data

Unnamed: 0,Country_Region,01/22/2020,01/23/2020,01/24/2020,01/25/2020,01/26/2020,01/27/2020,01/28/2020,01/29/2020,01/30/2020,...,03/01/2023,03/02/2023,03/03/2023,03/04/2023,03/05/2023,03/06/2023,03/07/2023,03/08/2023,03/09/2023,Country_link
0,Azerbaijan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,https://flagpedia.net/data/flags/w580/az.png
1,Afghanistan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,209340.0,209358.0,209362.0,209369.0,209390.0,209406.0,209436.0,209451.0,209451.0,https://flagpedia.net/data/flags/w580/af.png
2,Albania,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,334408.0,334408.0,334427.0,334427.0,334427.0,334427.0,334427.0,334443.0,334457.0,https://flagpedia.net/data/flags/w580/al.png
3,Algeria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,271448.0,271463.0,271469.0,271469.0,271477.0,271477.0,271490.0,271494.0,271496.0,https://flagpedia.net/data/flags/w580/dz.png
4,Andorra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47890.0,47890.0,https://flagpedia.net/data/flags/w580/ad.png
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
245,Winter Olympics 2022,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,535.0,535.0,535.0,535.0,535.0,535.0,535.0,535.0,535.0,
246,Yemen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,11945.0,
247,Zambia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,343012.0,343079.0,343079.0,343079.0,343135.0,343135.0,343135.0,343135.0,343135.0,https://flagpedia.net/data/flags/w580/zm.png
248,Zimbabwe,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,264127.0,264127.0,264127.0,264127.0,264127.0,264127.0,264127.0,264276.0,264276.0,https://flagpedia.net/data/flags/w580/zw.png
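Some rows above end with an empty `Country_link`, meaning the left join found no match for that name. To list such names before dropping them, `merge` offers an `indicator` option. The following is a sketch with made-up frames (`cases`, `flags` are hypothetical names).

```python
import pandas as pd

# Hypothetical case table: one real country and one non-country entry.
cases = pd.DataFrame({
    "Country_Region": ["Albania", "Winter Olympics 2022"],
    "03/09/2023": [334457, 535],
})
flags = pd.DataFrame({
    "Country_Region": ["Albania"],
    "Country_link": ["https://flagpedia.net/data/flags/w580/al.png"],
})

# indicator=True adds a "_merge" column telling where each row came from.
checked = cases.merge(flags, on="Country_Region", how="left", indicator=True)

# Rows marked "left_only" had no flag entry.
unmatched = checked.loc[checked["_merge"] == "left_only", "Country_Region"].tolist()
print(unmatched)
```

Reviewing `unmatched` helps distinguish entries that should be dropped from names that only need another entry in the mapping.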


In [30]:
# Convert the column names to a list for reordering
edit_column = list(final_data.columns)

# Insert the last column name ("Country_link") at the second position
edit_column.insert(1, edit_column[-1])

# Remove the original last entry, which is now a duplicate
del edit_column[-1]

# Reorder the DataFrame based on the modified column order
final_data = final_data[edit_column]

# Display the updated DataFrame
final_data.head()

Unnamed: 0,Country_Region,Country_link,01/22/2020,01/23/2020,01/24/2020,01/25/2020,01/26/2020,01/27/2020,01/28/2020,01/29/2020,...,02/28/2023,03/01/2023,03/02/2023,03/03/2023,03/04/2023,03/05/2023,03/06/2023,03/07/2023,03/08/2023,03/09/2023
0,Azerbaijan,https://flagpedia.net/data/flags/w580/az.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Afghanistan,https://flagpedia.net/data/flags/w580/af.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,209322.0,209340.0,209358.0,209362.0,209369.0,209390.0,209406.0,209436.0,209451.0,209451.0
2,Albania,https://flagpedia.net/data/flags/w580/al.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,334391.0,334408.0,334408.0,334427.0,334427.0,334427.0,334427.0,334427.0,334443.0,334457.0
3,Algeria,https://flagpedia.net/data/flags/w580/dz.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,271441.0,271448.0,271463.0,271469.0,271469.0,271477.0,271477.0,271490.0,271494.0,271496.0
4,Andorra,https://flagpedia.net/data/flags/w580/ad.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,47866.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47890.0,47890.0
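The same reordering can be written with `DataFrame.pop` and `DataFrame.insert`, which move a column in place. This is a small sketch on a made-up single-row frame, and the shortened link value is illustrative only.

```python
import pandas as pd

# Hypothetical frame with the link column at the end.
frame = pd.DataFrame({
    "Country_Region": ["Albania"],
    "01/22/2020": [0],
    "Country_link": ["al.png"],
})

# pop() removes the column and returns it as a Series.
link = frame.pop("Country_link")

# insert() places it at position 1, right after Country_Region.
frame.insert(1, "Country_link", link)
print(list(frame.columns))
```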


In [31]:
# Drop rows where the "Country_link" column is NaN (names without a matching flag)
final_data = final_data.dropna(subset=["Country_link"])

# Display the updated DataFrame
final_data.head()

Unnamed: 0,Country_Region,Country_link,01/22/2020,01/23/2020,01/24/2020,01/25/2020,01/26/2020,01/27/2020,01/28/2020,01/29/2020,...,02/28/2023,03/01/2023,03/02/2023,03/03/2023,03/04/2023,03/05/2023,03/06/2023,03/07/2023,03/08/2023,03/09/2023
0,Azerbaijan,https://flagpedia.net/data/flags/w580/az.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Afghanistan,https://flagpedia.net/data/flags/w580/af.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,209322.0,209340.0,209358.0,209362.0,209369.0,209390.0,209406.0,209436.0,209451.0,209451.0
2,Albania,https://flagpedia.net/data/flags/w580/al.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,334391.0,334408.0,334408.0,334427.0,334427.0,334427.0,334427.0,334427.0,334443.0,334457.0
3,Algeria,https://flagpedia.net/data/flags/w580/dz.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,271441.0,271448.0,271463.0,271469.0,271469.0,271477.0,271477.0,271490.0,271494.0,271496.0
4,Andorra,https://flagpedia.net/data/flags/w580/ad.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,47866.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47890.0,47890.0


In [32]:
# Group by "Country_Region" and "Country_link", summing only the numeric columns
# (rows that share a name after standardization are combined into one)
final_data = final_data.groupby(["Country_Region", "Country_link"]).sum(numeric_only=True)

# Display the updated DataFrame
final_data.head()

Country_Region,Country_link,01/22/2020,01/23/2020,01/24/2020,01/25/2020,01/26/2020,01/27/2020,01/28/2020,01/29/2020,01/30/2020,01/31/2020,...,02/28/2023,03/01/2023,03/02/2023,03/03/2023,03/04/2023,03/05/2023,03/06/2023,03/07/2023,03/08/2023,03/09/2023
Afghanistan,https://flagpedia.net/data/flags/w580/af.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,209322.0,209340.0,209358.0,209362.0,209369.0,209390.0,209406.0,209436.0,209451.0,209451.0
Albania,https://flagpedia.net/data/flags/w580/al.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,334391.0,334408.0,334408.0,334427.0,334427.0,334427.0,334427.0,334427.0,334443.0,334457.0
Algeria,https://flagpedia.net/data/flags/w580/dz.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,271441.0,271448.0,271463.0,271469.0,271469.0,271477.0,271477.0,271490.0,271494.0,271496.0
Andorra,https://flagpedia.net/data/flags/w580/ad.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,47866.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47890.0,47890.0
Angola,https://flagpedia.net/data/flags/w580/ao.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,105255.0,105277.0,105277.0,105277.0,105277.0,105277.0,105277.0,105277.0,105288.0,105288.0
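
Grouping by two keys moves both of them into the row index (a `MultiIndex`), which is why `Country_Region` and `Country_link` no longer appear as ordinary columns in the output above. The sketch below uses made-up values (the `a.png`/`b.png` links are placeholders) to show the summing of duplicate country rows and one way to flatten the index again with `reset_index()`. Whether flattening is wanted depends on the downstream tool.

```python
import pandas as pd

# Hypothetical toy frame: country "A" appears twice, as a country split
# across several rows would.
toy = pd.DataFrame({
    "Country_Region": ["A", "A", "B"],
    "Country_link": ["a.png", "a.png", "b.png"],
    "01/22/2020": [1.0, 2.0, 5.0],
    "01/23/2020": [3.0, 4.0, 6.0],
})

# Same call shape as in the notebook: both keys become index levels.
grouped = toy.groupby(["Country_Region", "Country_link"]).sum(numeric_only=True)
print(list(grouped.index.names))

# reset_index() turns the index levels back into regular columns.
flat = grouped.reset_index()
print(flat)
```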


In [33]:
# Display the full grouped DataFrame (pandas truncates the middle rows and columns)
final_data

Country_Region,Country_link,01/22/2020,01/23/2020,01/24/2020,01/25/2020,01/26/2020,01/27/2020,01/28/2020,01/29/2020,01/30/2020,01/31/2020,...,02/28/2023,03/01/2023,03/02/2023,03/03/2023,03/04/2023,03/05/2023,03/06/2023,03/07/2023,03/08/2023,03/09/2023
Afghanistan,https://flagpedia.net/data/flags/w580/af.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,209322.0,209340.0,209358.0,209362.0,209369.0,209390.0,209406.0,209436.0,209451.0,209451.0
Albania,https://flagpedia.net/data/flags/w580/al.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,334391.0,334408.0,334408.0,334427.0,334427.0,334427.0,334427.0,334427.0,334443.0,334457.0
Algeria,https://flagpedia.net/data/flags/w580/dz.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,271441.0,271448.0,271463.0,271469.0,271469.0,271477.0,271477.0,271490.0,271494.0,271496.0
Andorra,https://flagpedia.net/data/flags/w580/ad.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,47866.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47875.0,47890.0,47890.0
Angola,https://flagpedia.net/data/flags/w580/ao.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,105255.0,105277.0,105277.0,105277.0,105277.0,105277.0,105277.0,105277.0,105288.0,105288.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Venezuela,https://flagpedia.net/data/flags/w580/ve.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,551981.0,551986.0,551986.0,552014.0,552051.0,552051.0,552125.0,552157.0,552157.0,552162.0
Vietnam,https://flagpedia.net/data/flags/w580/vn.png,0.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,...,11526917.0,11526926.0,11526937.0,11526950.0,11526962.0,11526966.0,11526966.0,11526986.0,11526994.0,11526994.0
West Bank and Gaza,https://flagpedia.net/data/flags/w580/ps.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,703228.0,703228.0,703228.0,703228.0,703228.0,703228.0,703228.0,703228.0,703228.0,703228.0
Zambia,https://flagpedia.net/data/flags/w580/zm.png,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,343012.0,343012.0,343079.0,343079.0,343079.0,343135.0,343135.0,343135.0,343135.0,343135.0


In [34]:
# Save the processed data to a CSV file (the index levels are written as the first two columns)
final_data.to_csv("myPtcData_Covid.csv")

# Print a confirmation message after saving the file
print("CSV file 'myPtcData_Covid.csv' has been saved successfully.")

CSV file 'myPtcData_Covid.csv' has been saved successfully.
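
Because `to_csv` writes the two index levels as the first two columns, the file can be read back either as a flat table or with the `MultiIndex` restored. The following is a sketch under assumptions: it uses a tiny made-up frame and a temporary file named `toy_covid.csv` (both hypothetical) rather than the real `myPtcData_Covid.csv`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the grouped frame saved in the notebook.
grouped = pd.DataFrame(
    {"01/22/2020": [0.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [("A", "a.png"), ("B", "b.png")],
        names=["Country_Region", "Country_link"],
    ),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "toy_covid.csv")
    grouped.to_csv(csv_path)

    # Plain read: the former index levels come back as ordinary columns.
    flat = pd.read_csv(csv_path)

    # index_col=[0, 1] rebuilds the two-level index from the first two columns.
    restored = pd.read_csv(csv_path, index_col=[0, 1])

print(list(flat.columns))
print(list(restored.index.names))
```

The flat form is usually the more convenient one for uploading to a charting tool, since the country name and flag URL stay visible as regular columns.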
