# COVID-19

# Introduction

As the lead data scientist for the government of the Republic of Lithuania, I was given a personal mission by President Gitanas Nausėda to prepare the nation for a possible next wave of the coronavirus pandemic. South Korea has been one of the more successful nations in the fight against COVID-19, so I decided to analyze the available COVID-19 data on South Korea to learn from both its successful strategies and its pitfalls. In my analysis, I will determine which of these lessons can be applied here in Lithuania.

As of the date of this analysis, February 25, 2022, the COVID-19 numbers from South Korea are as follows:

- Confirmed Cases: 2,665,077
- Deaths: 7,783
- Recovered: 969,524

The first known case of COVID-19 in South Korea was confirmed on January 20, 2020, when a 35-year-old Chinese woman arrived at Incheon Airport from Wuhan and tested positive. By early March, approximately forty days after its first confirmed case, South Korea had become the second most infected country after China. South Korea undertook a massive public and private sector effort to fashion a national response to the pandemic.

The South Korean response is distinct in several respects: 
- Early: An early and almost immediate response after the first case on January 20.
- Speed: A premium on moving as quickly as possible in setting up a testing regime.
- Transparency: Real-time and frequent information dissemination to the public.
- Public-private sector: Enlisting companies with the needed resources in a joint public-private response.
- National organization: Organized as a national effort rather than at the city, provincial, or local levels.

I will utilize the following datasets which contain data from January to June 2020:
1. Age: Confirmed and deceased cases based on age groups
2. Gender: Confirmed and deceased cases based on gender, male or female 
3. Province: Confirmed and deceased cases based on province
4. Patient: Confirmed and deceased cases based on age, gender, method of infection, province and city
5. Case: Confirmed cases based on method of infection, province, city, and specific location
6. Population: Count and ratio of academic organizations, nursing homes and elderly population
7. Test: Number of tests and number of negative and positive results.

I will perform the following tasks on these datasets:
1. Load the data into Pandas
2. Provide basic information about the data
3. Clean and prepare the data for analysis
4. Provide basic descriptive statistics
5. Perform Exploratory Data Analysis (EDA)
6. Summarize and provide conclusions of my findings
7. Discuss possible further analysis

# Goals

The goal of this project is to present the report of my analysis of the South Korean datasets to President Gitanas Nausėda and provide the government of the Republic of Lithuania with recommendations on how to prepare for a possible next wave of the coronavirus pandemic. I will concentrate on the following topics:
 
**Total Numbers** 

**Effect of Population Density**

**Effect of Weather**

**Effect of Testing**

**Effect of Travel**

**Some Ideas for Analysis**
    
1. Deceased demographics
2. People bringing infection from abroad
3. Super Spreader demographics
4. Top 10 Infection Cases
5. Join Gender and Age Data
6. Organizations
7. Province Population Density and Numbers Infected Totals
8. Number Tested / Deceased
9. Effect of Weather
10. Different policies on number of cases or number of fatalities
11. Cases and deaths in different age groups
12. Male vs Female
13. Population density
14. Weather
15. Mortality Rate

# Technical Requirements

## General Objectives

- Practice identifying opportunities for data analysis.
- Practice performing EDA.
- Practice working with data from Kaggle.
- Practice visualizing data with Matplotlib & Seaborn.
- Practice reading data, performing queries, and filtering data using Pandas.

## The Project

The world is still struggling with one of the most rapidly spreading pandemics. There are a lot of people who say that data is the best weapon we can use in this "Corona Fight."
Imagine that you are one of the best data scientists in your country. The president of your country asked you to analyze the COVID-19 patient-level data of South Korea and prepare your homeland for the next wave of the pandemic. You, as the lead data scientist of your country, have to create and justify a plan for fighting the pandemic in your country by analyzing the provided data. You must get the most critical insights using learned data science techniques and present them to the leader of your country.


# Libraries

In [1]:
%matplotlib inline

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
from IPython.display import display

sns.set_palette("pastel")

# Loading Data

In [2]:
age = pd.read_csv(
    r"C:\py\Projects\TuringCollege\COVID19\DataSets\age.csv",  # raw string: backslashes are kept literally
    index_col=False,
    skipinitialspace=True,
)

In [3]:
gender = pd.read_csv(
    r"C:\py\Projects\TuringCollege\COVID19\DataSets\gender.csv",  # raw string: backslashes are kept literally
    index_col=False,
    skipinitialspace=True,
)

In [4]:
province = pd.read_csv(
    r"C:\py\Projects\TuringCollege\COVID19\DataSets\province.csv",  # raw string: backslashes are kept literally
    index_col=False,
    skipinitialspace=True,
)

In [5]:
patient = pd.read_csv(
    r"C:\py\Projects\TuringCollege\COVID19\DataSets\patient.csv",  # raw string: backslashes are kept literally
    index_col=False,
    skipinitialspace=True,
)

In [6]:
case = pd.read_csv(
    r"C:\py\Projects\TuringCollege\COVID19\DataSets\case.csv",  # raw string: backslashes are kept literally
    index_col=False,
    skipinitialspace=True,
)

In [7]:
population = pd.read_csv(
    r"C:\py\Projects\TuringCollege\COVID19\DataSets\population.csv",  # raw string: backslashes are kept literally
    index_col=False,
    skipinitialspace=True,
)

In [8]:
test = pd.read_csv(
    r"C:\py\Projects\TuringCollege\COVID19\DataSets\test.csv",  # raw string: backslashes are kept literally
    index_col=False,
    skipinitialspace=True,
)
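
The seven near-identical loading cells above could also be collapsed into one helper. The following is only a sketch: `load_datasets` and `DATASET_NAMES` are hypothetical names that do not appear in the notebook, and it assumes every file lives in a single folder and is named `<name>.csv`.

```python
from pathlib import Path

import pandas as pd

# Hypothetical list of dataset names; assumes one "<name>.csv" file per entry.
DATASET_NAMES = ["age", "gender", "province", "patient", "case", "population", "test"]


def load_datasets(data_dir, names=DATASET_NAMES):
    """Read <name>.csv from data_dir for each name and return a dict of DataFrames."""
    data_dir = Path(data_dir)
    return {
        # Same read_csv options as the individual cells above.
        name: pd.read_csv(data_dir / f"{name}.csv", index_col=False, skipinitialspace=True)
        for name in names
    }
```

Calling `frames = load_datasets(folder)` with the folder that holds the CSV files would then give `frames["age"]`, `frames["gender"]`, and so on. Using `pathlib.Path` also sidesteps backslash escaping in Windows paths.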

# Functions

In [9]:
def update_dictionary(age, gender, province, patient, case, population, test):
    """Return a dict mapping a printable label to each dataset's current DataFrame."""
    all_data = {
        "Age:": age,
        "Gender:": gender,
        "Province:": province,
        "Patient:": patient,
        "Case:": case,
        "Population:": population,
        "Test:": test,
    }
    return all_data

# Basic Information

In [10]:
all_data = update_dictionary(age, gender, province, patient, case, population, test)

## Number of Rows

In [11]:
for name, data in all_data.items():
    print(name, data.shape[0])

Age: 1089
Gender: 242
Province: 2771
Patient: 5165
Case: 174
Population: 243
Test: 163


## Number of Columns

In [12]:
for name, data in all_data.items():
    print(name, data.shape[1])

Age: 5
Gender: 5
Province: 6
Patient: 14
Case: 8
Population: 12
Test: 7
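
The row and column counts can also be gathered into a single table instead of two separate loops. This is a minimal sketch; `summarize_shapes` is a hypothetical helper, and it only assumes a dictionary of DataFrames shaped like `all_data`.

```python
import pandas as pd


def summarize_shapes(frames):
    """Return one row per dataset with its number of rows and columns."""
    return pd.DataFrame.from_dict(
        {name: frame.shape for name, frame in frames.items()},
        orient="index",
        columns=["rows", "columns"],
    )


# Tiny stand-in frames, used only to show the call; not the real datasets.
demo = {
    "A:": pd.DataFrame({"x": [1, 2, 3]}),
    "B:": pd.DataFrame({"x": [1], "y": [2]}),
}
shape_summary = summarize_shapes(demo)
```

With the real data, `summarize_shapes(all_data)` would hold the same numbers as the two printed lists above.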


## Information on each Dataset

In [13]:
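for name, data in all_data.items():
    print('\n\n{}'.format(name))
    print()
    print()
    # info() prints its summary itself and returns None,
    # so display() also emits the "None" seen after each summary.
    display(data.info())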



Age:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1089 entries, 0 to 1088
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   date       1089 non-null   object
 1   time       1089 non-null   int64 
 2   age        1089 non-null   object
 3   confirmed  1089 non-null   int64 
 4   deceased   1089 non-null   int64 
dtypes: int64(3), object(2)
memory usage: 42.7+ KB


None



Gender:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 242 entries, 0 to 241
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   date       242 non-null    object
 1   time       242 non-null    int64 
 2   sex        242 non-null    object
 3   confirmed  242 non-null    int64 
 4   deceased   242 non-null    int64 
dtypes: int64(3), object(2)
memory usage: 9.6+ KB


None



Province:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2771 entries, 0 to 2770
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   date       2771 non-null   object
 1   time       2771 non-null   int64 
 2   province   2771 non-null   object
 3   confirmed  2771 non-null   int64 
 4   released   2771 non-null   int64 
 5   deceased   2771 non-null   int64 
dtypes: int64(4), object(2)
memory usage: 130.0+ KB


None



Patient:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5165 entries, 0 to 5164
Data columns (total 14 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   ID                  5165 non-null   int64 
 1   sex                 4043 non-null   object
 2   age                 3785 non-null   object
 3   country             5165 non-null   object
 4   province            5165 non-null   object
 5   city                5071 non-null   object
 6   infection_case      4246 non-null   object
 7   infected_by         1346 non-null   object
 8   contact_number      791 non-null    object
 9   symptom_onset_date  689 non-null    object
 10  confirmed_date      5162 non-null   object
 11  released_date       1587 non-null   object
 12  deceased_date       66 non-null     object
 13  state               5165 non-null   object
dtypes: int64(1), object(13)
memory usage: 565.0+ KB


None



Case:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 174 entries, 0 to 173
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   case_id         174 non-null    int64 
 1   province        174 non-null    object
 2   city            174 non-null    object
 3   group           174 non-null    bool  
 4   infection_case  174 non-null    object
 5   confirmed       174 non-null    int64 
 6   latitude        174 non-null    object
 7   longitude       174 non-null    object
dtypes: bool(1), int64(2), object(5)
memory usage: 9.8+ KB


None



Population:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 243 entries, 0 to 242
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   code                      243 non-null    int64  
 1   province                  243 non-null    object 
 2   city                      243 non-null    object 
 3   latitude                  243 non-null    float64
 4   longitude                 243 non-null    float64
 5   elementary_school_count   243 non-null    int64  
 6   kindergarten_count        243 non-null    int64  
 7   university_count          243 non-null    int64  
 8   academy_ratio             243 non-null    float64
 9   elderly_population_ratio  243 non-null    float64
 10  elderly_alone_ratio       243 non-null    float64
 11  nursing_home_count        243 non-null    int64  
dtypes: float64(5), int64(5), object(2)
memory usage: 22.9+ KB


None



Test:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 163 entries, 0 to 162
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   date       163 non-null    object
 1   time       163 non-null    int64 
 2   test       163 non-null    int64 
 3   negative   163 non-null    int64 
 4   confirmed  163 non-null    int64 
 5   released   163 non-null    int64 
 6   deceased   163 non-null    int64 
dtypes: int64(6), object(1)
memory usage: 9.0+ KB


None

## First 5 Rows

In [15]:
for name, data in all_data.items():
    print('\n\n{}'.format(name))
    display(data.head())



Age:


Unnamed: 0,date,time,age,confirmed,deceased
0,3/2/2020,0,0s,32,0
1,3/2/2020,0,10s,169,0
2,3/2/2020,0,20s,1235,0
3,3/2/2020,0,30s,506,1
4,3/2/2020,0,40s,633,1




Gender:


Unnamed: 0,date,time,sex,confirmed,deceased
0,3/2/2020,0,male,1591,13
1,3/2/2020,0,female,2621,9
2,3/3/2020,0,male,1810,16
3,3/3/2020,0,female,3002,12
4,3/4/2020,0,male,1996,20




Province:


Unnamed: 0,date,time,province,confirmed,released,deceased
0,1/20/2020,16,Seoul,0,0,0
1,1/20/2020,16,Busan,0,0,0
2,1/20/2020,16,Daegu,0,0,0
3,1/20/2020,16,Incheon,1,0,0
4,1/20/2020,16,Gwangju,0,0,0




Patient:


Unnamed: 0,ID,sex,age,country,province,city,infection_case,infected_by,contact_number,symptom_onset_date,confirmed_date,released_date,deceased_date,state
0,1000000001,male,50s,Korea,Seoul,Gangseo-gu,overseas inflow,,75,1/22/2020,1/23/2020,2/5/2020,,released
1,1000000002,male,30s,Korea,Seoul,Jungnang-gu,overseas inflow,,31,,1/30/2020,3/2/2020,,released
2,1000000003,male,50s,Korea,Seoul,Jongno-gu,contact with patient,2002000001.0,17,,1/30/2020,2/19/2020,,released
3,1000000004,male,20s,Korea,Seoul,Mapo-gu,overseas inflow,,9,1/26/2020,1/30/2020,2/15/2020,,released
4,1000000005,female,20s,Korea,Seoul,Seongbuk-gu,contact with patient,1000000002.0,2,,1/31/2020,2/24/2020,,released




Case:


Unnamed: 0,case_id,province,city,group,infection_case,confirmed,latitude,longitude
0,1000001,Seoul,Yongsan-gu,True,Itaewon Clubs,139,37.538621,126.992652
1,1000002,Seoul,Gwanak-gu,True,Richway,119,37.48208,126.901384
2,1000003,Seoul,Guro-gu,True,Guro-gu Call Center,95,37.508163,126.884387
3,1000004,Seoul,Yangcheon-gu,True,Yangcheon Table Tennis Club,43,37.546061,126.874209
4,1000005,Seoul,Dobong-gu,True,Day Care Center,43,37.679422,127.044374




Population:


Unnamed: 0,code,province,city,latitude,longitude,elementary_school_count,kindergarten_count,university_count,academy_ratio,elderly_population_ratio,elderly_alone_ratio,nursing_home_count
0,10000,Seoul,Seoul,37.566953,126.977977,607,830,48,1.44,15.38,5.8,22739
1,10010,Seoul,Gangnam-gu,37.518421,127.047222,33,38,0,4.18,13.17,4.3,3088
2,10020,Seoul,Gangdong-gu,37.530492,127.123837,27,32,0,1.54,14.55,5.4,1023
3,10030,Seoul,Gangbuk-gu,37.639938,127.025508,14,21,0,0.67,19.49,8.5,628
4,10040,Seoul,Gangseo-gu,37.551166,126.849506,36,56,1,1.17,14.39,5.7,1080




Test:


Unnamed: 0,date,time,test,negative,confirmed,released,deceased
0,1/20/2020,16,1,0,1,0,0
1,1/21/2020,16,1,0,1,0,0
2,1/22/2020,16,4,3,1,0,0
3,1/23/2020,16,22,21,1,0,0
4,1/24/2020,16,27,25,2,0,0


# Data Cleaning

## Deleting Unneeded Columns

In [16]:
age = age.drop(["time"], axis=1)

In [17]:
gender = gender.drop(["time"], axis=1)

In [18]:
province = province.drop(["time", "released"], axis=1)

In [19]:
patient = patient.drop(
    [
        "country",
        "province",
        "city",
        "symptom_onset_date",
        "confirmed_date",
        "released_date",
        "deceased_date",
        "state",
        "infected_by",
        "contact_number",
        "ID"
    ],
    axis=1,
)

In [20]:
case = case.drop(["case_id", "latitude", "longitude"], axis=1)

In [21]:
population = population.drop(["code", "latitude", "longitude"], axis=1)

In [22]:
test = test.drop(["time"], axis=1)

## Renaming Columns

In [23]:
age.rename(
    columns={
        "date": "Date",
        "age": "Age",
        "confirmed": "Confirmed",
        "deceased": "Deceased",
    },
    inplace=True,
)

In [24]:
gender.rename(
    columns={
        "date": "Date",
        "sex": "Sex",
        "confirmed": "Confirmed",
        "deceased": "Deceased",
    },
    inplace=True,
)

In [25]:
province.rename(
    columns={
        "date": "Date",
        "province": "Province",
        "confirmed": "Confirmed",
        "deceased": "Deceased",
    },
    inplace=True,
)

In [26]:
patient.rename(
    columns={
        "sex": "Sex",
        "age": "Age",
        "infection_case": "Location",      
    },
    inplace=True,
)

In [27]:
case.rename(
    columns={
        "province": "Province",
        "city": "City",
        "group": "Group",
        "infection_case": "Location",
        "confirmed": "Confirmed",
        
    },
    inplace=True,
)

In [28]:
population.rename(
    columns={
        "province": "Province",
        "city": "City",
        "elementary_school_count": "Elementary",
        "kindergarten_count": "Kindergarten",
        "university_count": "University",
        "academy_ratio": "Academy",
        "elderly_population_ratio": "Elderly",
        "elderly_alone_ratio": "Elderly Alone",
        "nursing_home_count": "Nursing Home"    
    },
    inplace=True,
)

In [29]:
test.rename(
    columns={
        "date": "Date",
        "test": "Test",
        "negative": "Negative",
        "confirmed": "Confirmed",
        "released": "Released",
        "deceased": "Deceased"   
    },
    inplace=True,
)

In [30]:
all_data = update_dictionary(age, gender, province, patient, case, population, test)

In [31]:
for name, data in all_data.items():
    print('\n\n{}'.format(name))
    display(data.head())



Age:


Unnamed: 0,Date,Age,Confirmed,Deceased
0,3/2/2020,0s,32,0
1,3/2/2020,10s,169,0
2,3/2/2020,20s,1235,0
3,3/2/2020,30s,506,1
4,3/2/2020,40s,633,1




Gender:


Unnamed: 0,Date,Sex,Confirmed,Deceased
0,3/2/2020,male,1591,13
1,3/2/2020,female,2621,9
2,3/3/2020,male,1810,16
3,3/3/2020,female,3002,12
4,3/4/2020,male,1996,20




Province:


Unnamed: 0,Date,Province,Confirmed,Deceased
0,1/20/2020,Seoul,0,0
1,1/20/2020,Busan,0,0
2,1/20/2020,Daegu,0,0
3,1/20/2020,Incheon,1,0
4,1/20/2020,Gwangju,0,0




Patient:


Unnamed: 0,Sex,Age,Location
0,male,50s,overseas inflow
1,male,30s,overseas inflow
2,male,50s,contact with patient
3,male,20s,overseas inflow
4,female,20s,contact with patient




Case:


Unnamed: 0,Province,City,Group,Location,Confirmed
0,Seoul,Yongsan-gu,True,Itaewon Clubs,139
1,Seoul,Gwanak-gu,True,Richway,119
2,Seoul,Guro-gu,True,Guro-gu Call Center,95
3,Seoul,Yangcheon-gu,True,Yangcheon Table Tennis Club,43
4,Seoul,Dobong-gu,True,Day Care Center,43




Population:


Unnamed: 0,Province,City,Elementary,Kindergarten,University,Academy,Elderly,Elderly Alone,Nursing Home
0,Seoul,Seoul,607,830,48,1.44,15.38,5.8,22739
1,Seoul,Gangnam-gu,33,38,0,4.18,13.17,4.3,3088
2,Seoul,Gangdong-gu,27,32,0,1.54,14.55,5.4,1023
3,Seoul,Gangbuk-gu,14,21,0,0.67,19.49,8.5,628
4,Seoul,Gangseo-gu,36,56,1,1.17,14.39,5.7,1080




Test:


Unnamed: 0,Date,Test,Negative,Confirmed,Released,Deceased
0,1/20/2020,1,0,1,0,0
1,1/21/2020,1,0,1,0,0
2,1/22/2020,4,3,1,0,0
3,1/23/2020,22,21,1,0,0
4,1/24/2020,27,25,2,0,0
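
The `Date` columns are still stored as text (`object` dtype in the `info()` output), so sorting or filtering by date would compare strings. A possible extra cleaning step is sketched below. `parse_dates` is a hypothetical helper, and the `%m/%d/%Y` format is an assumption based on the rows shown above.

```python
import pandas as pd


def parse_dates(frame, column="Date"):
    """Return a copy of frame with the given column converted to datetime64."""
    out = frame.copy()
    # Assumed format: month/day/year, as in "1/20/2020".
    out[column] = pd.to_datetime(out[column], format="%m/%d/%Y")
    return out


# Two rows taken from the Test table above.
test_rows = pd.DataFrame({"Date": ["1/24/2020", "1/20/2020"], "Test": [27, 1]})
parsed = parse_dates(test_rows).sort_values("Date")
```

After parsing, `sort_values("Date")` orders the rows chronologically rather than alphabetically.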


## Missing Data

In [32]:
for name, data in all_data.items():
    print('\n\n{}'.format(name))
    display(data.isnull().sum())



Age:


Date         0
Age          0
Confirmed    0
Deceased     0
dtype: int64



Gender:


Date         0
Sex          0
Confirmed    0
Deceased     0
dtype: int64



Province:


Date         0
Province     0
Confirmed    0
Deceased     0
dtype: int64



Patient:


Sex         1122
Age         1380
Location     919
dtype: int64



Case:


Province     0
City         0
Group        0
Location     0
Confirmed    0
dtype: int64



Population:


Province         0
City             0
Elementary       0
Kindergarten     0
University       0
Academy          0
Elderly          0
Elderly Alone    0
Nursing Home     0
dtype: int64



Test:


Date         0
Test         0
Negative     0
Confirmed    0
Released     0
Deceased     0
dtype: int64
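
Only the Patient dataset has missing values. To judge how much of it is affected, the counts can be expressed as a share of its 5165 rows. The sketch below reuses the counts printed above; on the real frame, `patient.isnull().mean() * 100` should give the same kind of result directly.

```python
import pandas as pd

# Missing-value counts and row total copied from the outputs above.
patient_missing = pd.Series({"Sex": 1122, "Age": 1380, "Location": 919})
patient_rows = 5165

# Share of rows with a missing value, in percent.
missing_share = (patient_missing / patient_rows * 100).round(2)
```

This works out to roughly 21.72% for `Sex`, 26.72% for `Age`, and 17.79% for `Location`, which is large enough that any patient-level result should be read with that gap in mind.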

## Duplicate Data

In [33]:
for name, data in all_data.items():
    print('\n\n{}'.format(name))
    display(data[data.duplicated(keep=False)].any())



Age:


Date         False
Age          False
Confirmed    False
Deceased     False
dtype: bool



Gender:


Date         False
Sex          False
Confirmed    False
Deceased     False
dtype: bool



Province:


Date         False
Province     False
Confirmed    False
Deceased     False
dtype: bool



Patient:


Sex         True
Age         True
Location    True
dtype: bool



Case:


Province     False
City         False
Group        False
Location     False
Confirmed    False
dtype: bool



Population:


Province         False
City             False
Elementary       False
Kindergarten     False
University       False
Academy          False
Elderly          False
Elderly Alone    False
Nursing Home     False
dtype: bool



Test:


Date         False
Test         False
Negative     False
Confirmed    False
Released     False
Deceased     False
dtype: bool
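
One caveat about the check above: `.any()` tests whether the duplicated rows contain any truthy value in each column, so duplicated rows made up only of zeros or `False` would still show `False`. Counting duplicated rows directly avoids this. The sketch below uses a hypothetical helper, `count_duplicate_rows`, and a made-up toy frame purely to show the difference.

```python
import pandas as pd


def count_duplicate_rows(frames):
    """Return the number of rows in each frame that repeat an earlier row."""
    return {name: int(frame.duplicated().sum()) for name, frame in frames.items()}


# Toy frame (not from the datasets): two identical all-zero rows.
toy = pd.DataFrame({"Confirmed": [0, 0], "Deceased": [0, 0]})

duplicate_counts = count_duplicate_rows({"Toy:": toy})
any_based_check = bool(toy[toy.duplicated(keep=False)].any().any())
```

Here `duplicate_counts` reports one duplicate while the `.any()` based check stays `False`. For the Patient data, duplicates are expected after the identifying columns were dropped, because many patients share the same sex, age group, and infection source.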

# Exploratory Data Analysis (EDA)

## Age

In [34]:
groupby_age = age[['Age', 'Confirmed', 'Deceased']].groupby(['Age']).sum()

In [35]:
# Deaths as a percentage of confirmed cases, rounded to two decimals.
groupby_age["Percentage"] = (
    groupby_age["Deceased"] / groupby_age["Confirmed"] * 100
).round(2)

groupby_age.sort_values('Percentage', ascending = False)

Age,Confirmed,Deceased,Percentage
80s,54086,12136,22.44
70s,82107,7599,9.25
60s,158505,3743,2.36
50s,230030,1537,0.67
40s,168250,295,0.18
30s,137539,194,0.14
0s,16107,0,0.0
10s,68752,0,0.0
20s,345827,0,0.0


## Gender

In [36]:
groupby_gender = gender[['Sex', 'Confirmed', 'Deceased']].groupby(['Sex']).sum()

In [37]:
# Deaths as a percentage of confirmed cases, rounded to two decimals.
groupby_gender["Percentage"] = (
    groupby_gender["Deceased"] / groupby_gender["Confirmed"] * 100
).round(2)
groupby_gender.sort_values('Percentage', ascending = False)

Sex,Confirmed,Deceased,Percentage
male,513727,13484,2.62
female,747467,12019,1.61
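
The daily rows in the Age, Gender, and Province tables look like running totals (for example, male confirmed cases go from 1591 on 3/2/2020 to 1810 on 3/3/2020). If that reading is right, summing over all dates counts the same cases many times, and the last available date already holds the totals. The sketch below shows that alternative on the first four Gender rows displayed earlier. `latest_snapshot` is a hypothetical helper, and the cumulative interpretation is an assumption, not something the notebook confirms.

```python
import pandas as pd

# First four rows of the Gender table shown earlier.
gender_sample = pd.DataFrame(
    {
        "Date": ["3/2/2020", "3/2/2020", "3/3/2020", "3/3/2020"],
        "Sex": ["male", "female", "male", "female"],
        "Confirmed": [1591, 2621, 1810, 3002],
        "Deceased": [13, 9, 16, 12],
    }
)


def latest_snapshot(frame, group_col, date_col="Date"):
    """Keep only the most recent date and compute deaths per confirmed case (%)."""
    out = frame.copy()
    out[date_col] = pd.to_datetime(out[date_col], format="%m/%d/%Y")
    latest = out[out[date_col] == out[date_col].max()]
    result = latest.set_index(group_col)[["Confirmed", "Deceased"]].copy()
    result["Percentage"] = (result["Deceased"] / result["Confirmed"] * 100).round(2)
    return result


snapshot = latest_snapshot(gender_sample, "Sex")
```

On this sample the snapshot for 3/3/2020 gives 0.88% for males and 0.40% for females. The same helper could be applied to the Age and Province tables with `"Age"` or `"Province"` as `group_col`.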


## Province

In [38]:
groupby_province = province[['Province', 'Confirmed', 'Deceased']].groupby(['Province']).sum()

In [39]:
# Deaths as a percentage of confirmed cases, rounded to two decimals.
groupby_province["Percentage"] = (
    groupby_province["Deceased"] / groupby_province["Confirmed"] * 100
).round(2)
groupby_province.sort_values('Percentage', ascending = False)

Province,Confirmed,Deceased,Percentage
Gangwon-do,5908,225,3.81
Gyeongsangbuk-do,161079,5393,3.35
Daegu,807506,17624,2.18
Gyeonggi-do,81059,1600,1.97
Busan,16341,299,1.83
Ulsan,5269,91,1.73
Daejeon,5217,58,1.11
Seoul,81923,298,0.36
Incheon,16645,16,0.1
Gwangju,3359,0,0.0


## Patient

In [41]:
pd.reset_option('display.max_rows')
Total = patient['Location'].value_counts()
Total

contact with patient                             1610
overseas inflow                                   840
etc                                               703
Itaewon Clubs                                     162
Richway                                           128
Guro-gu Call Center                               112
Shincheonji Church                                107
Coupang Logistics Center                           80
Yangcheon Table Tennis Club                        44
Day Care Center                                    43
SMR Newly Planted Churches Group                   36
Onchun Church                                      33
Bonghwa Pureun Nursing Home                        31
gym facility in Cheonan                            30
Ministry of Oceans and Fisheries                   28
Wangsung Church                                    24
Cheongdo Daenam Hospital                           21
Dongan Church                                      17
Eunpyeong St. Mary's Hospita

## Case

In [43]:
case = case[
    ["Province", "City", "Location", "Confirmed", "Group"]
]
pd.set_option("display.max_rows", None)  # full option name; the short "max_rows" can be ambiguous
case.sort_values(['Confirmed', 'Location'], ascending=[False, False])


Unnamed: 0,Province,City,Location,Confirmed,Group
48,Daegu,Nam-gu,Shincheonji Church,4511,True
56,Daegu,-,contact with patient,917,False
57,Daegu,-,etc,747,False
145,Gyeongsangbuk-do,from other city,Shincheonji Church,566,True
109,Gyeonggi-do,-,overseas inflow,305,False
35,Seoul,-,overseas inflow,298,False
49,Daegu,Dalseong-gun,Second Mi-Ju Hospital,196,True
156,Gyeongsangbuk-do,-,contact with patient,190,False
36,Seoul,-,contact with patient,162,False
0,Seoul,Yongsan-gu,Itaewon Clubs,139,True
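
The `Group` flag makes it possible to compare cluster infections with individual ones. As a rough sketch, the ten rows shown above can be aggregated as follows. Only these ten rows are used, so the shares describe the largest cases and not the full Case table.

```python
import pandas as pd

# Confirmed counts and Group flags copied from the ten rows shown above.
top_cases = pd.DataFrame(
    {
        "Confirmed": [4511, 917, 747, 566, 305, 298, 196, 190, 162, 139],
        "Group": [True, False, False, True, False, False, True, False, False, True],
    }
)

# Total confirmed cases for group (True) and non-group (False) infections.
confirmed_by_group = top_cases.groupby("Group")["Confirmed"].sum()

# Share of each category among these ten rows, in percent.
group_share = (confirmed_by_group / confirmed_by_group.sum() * 100).round(1)
```

Among these ten rows, group infections account for 5412 confirmed cases against 2619 for non-group infections, about 67.4% versus 32.6%. On the full frame the same aggregation would be `case.groupby("Group")["Confirmed"].sum()`.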


## Population

In [44]:
population = population[
    ["Province", "City", "Kindergarten", "Elementary", "University", "Nursing Home", "Academy", "Elderly", "Elderly Alone"]
]
pd.set_option("display.max_rows", None)  # full option name; the short "max_rows" can be ambiguous
population


Unnamed: 0,Province,City,Kindergarten,Elementary,University,Nursing Home,Academy,Elderly,Elderly Alone
0,Seoul,Seoul,830,607,48,22739,1.44,15.38,5.8
1,Seoul,Gangnam-gu,38,33,0,3088,4.18,13.17,4.3
2,Seoul,Gangdong-gu,32,27,0,1023,1.54,14.55,5.4
3,Seoul,Gangbuk-gu,21,14,0,628,0.67,19.49,8.5
4,Seoul,Gangseo-gu,56,36,1,1080,1.17,14.39,5.7
5,Seoul,Gwanak-gu,33,22,1,909,0.89,15.12,4.9
6,Seoul,Gwangjin-gu,33,22,3,723,1.16,13.75,4.8
7,Seoul,Guro-gu,34,26,3,741,1.0,16.21,5.7
8,Seoul,Geumcheon-gu,19,18,0,475,0.96,16.15,6.7
9,Seoul,Nowon-gu,66,42,6,952,1.39,15.4,7.4


## Test
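
One possible starting point for this section is the share of tests that came back positive. The sketch below uses the first five rows of the Test table shown earlier and assumes that `Test` and `Confirmed` are both running totals. The `Positivity` column is a name introduced here for illustration.

```python
import pandas as pd

# First five rows of the Test table shown earlier.
test_sample = pd.DataFrame(
    {
        "Date": ["1/20/2020", "1/21/2020", "1/22/2020", "1/23/2020", "1/24/2020"],
        "Test": [1, 1, 4, 22, 27],
        "Confirmed": [1, 1, 1, 1, 2],
    }
)

# Confirmed cases as a percentage of tests performed so far.
test_sample["Positivity"] = (
    test_sample["Confirmed"] / test_sample["Test"] * 100
).round(2)
```

With so few tests in the first days the percentage is unstable (100%, 100%, 25%, 4.55%, 7.41%), so it is more informative once testing volume has grown.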

# Conclusions

# Recommendations

# Suggestions for Improvement