<a href="https://cognitiveclass.ai/">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0101EN-Coursera/v2/M1_R_Basics/images/IDSNlogo.png" width="200" align="center">
</a>


<h1>Analysis of Global COVID-19 Pandemic Data</h1>

Estimated time needed: **90** minutes


## Overview:

There are 10 tasks in this final project. All tasks will be graded by your peers who are also completing this assignment within the same session.

You need to submit a screenshot of the code and output for each task for review.

If you need to refresh your memory about specific coding details, you may refer to the previous hands-on labs for code examples.


In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

import warnings
import re

pd.set_option('display.max_columns',None)
pd.set_option('display.max_rows',None)
pd.set_option('display.width', 1000)
pd.set_option('display.float_format','{:.2f}'.format)

Note: if you cannot import the above libraries, install them first, for example with `pip install requests beautifulsoup4 pandas html5lib`.


## TASK 1: Get a `COVID-19 pandemic` Wiki page using HTTP request


First, let's write a function that uses an HTTP request to get a public COVID-19 Wiki page.

Before you write the function, you can open this public page in a web browser from the URL [https://en.wikipedia.org/w/index.php?title=Template:COVID-19_testing_by_country](https://en.wikipedia.org/w/index.php?title=Template:COVID-19_testing_by_country).

The goal of task 1 is to get the HTML page using an HTTP request (`requests` library)


In [2]:
  # Our target COVID-19 wiki page URL is: https://en.wikipedia.org/w/index.php?title=Template:COVID-19_testing_by_country
  # It has two parts:
    # 1) the base URL `https://en.wikipedia.org/w/index.php`
    # 2) the URL parameter `title=Template:COVID-19_testing_by_country`, separated from the base by a question mark ?

  # Wiki page base
  # wiki_base_url = "https://en.wikipedia.org/w/index.php"
  # You will need to create a dict which has a key called `title` to specify which page you want to get from Wiki;
  # in our case, it will be `Template:COVID-19_testing_by_country`

  # - Use the `get` function in the requests library with a `url` argument and a `params` argument to get an HTTP response

  # Use the `return` statement to return the response
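
The steps in the comments above can be sketched as a small Python function. This is a minimal sketch, not the course's reference solution: the constant names `WIKI_BASE_URL` and `PAGE_TITLE` and the `timeout` value are assumptions made for illustration. The last lines only build the request to preview its URL; nothing is sent until `get_wiki_covid19_page()` is called.

```python
import requests

# Hypothetical constant names; the values come from the comments above
WIKI_BASE_URL = "https://en.wikipedia.org/w/index.php"
PAGE_TITLE = "Template:COVID-19_testing_by_country"


def get_wiki_covid19_page(timeout=30):
    """Request the COVID-19 testing wiki page and return the HTTP response."""
    # requests appends the params dict to the base URL as a query string
    response = requests.get(WIKI_BASE_URL, params={"title": PAGE_TITLE}, timeout=timeout)
    return response


# Preview the URL that would be requested, without sending anything
prepared = requests.Request("GET", WIKI_BASE_URL, params={"title": PAGE_TITLE}).prepare()
print(prepared.url)
```

Calling `get_wiki_covid19_page()` and printing the result should show a response object like the one produced by `requests.get(url)` in this notebook.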



In [3]:
url = "https://en.wikipedia.org/w/index.php?title=Template:COVID-19_testing_by_country"

Call the `get_wiki_covid19_page` function to get an HTTP response with the target HTML page


In [4]:
# Call the get_wiki_covid19_page function and print the response


In [5]:
requests.get(url)

<Response [200]>

## TASK 2: Extract COVID-19 testing data table from the wiki HTML page


On the COVID-19 testing wiki page, you should see a data table (a `<table>` node) that contains COVID-19 testing data by country:

<a href="https://cognitiveclass.ai/">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0101EN-Coursera/v2/M5_Final/images/covid-19-by-country.png" width="400" align="center">
</a>

Note that the numbers you actually see on your page may be different from the above because the pandemic was still ongoing when this notebook was created.

The goal of task 2 is to extract the above data table and convert it into a data frame


Now get the HTML text from the response, which will be parsed into the root HTML node


In [6]:
data  = requests.get(url).text

Parse the HTML text with `BeautifulSoup` and find all `<table>` nodes using the `find_all` method


In [7]:
# Get the table node from the root html node
soup = BeautifulSoup(data,"html5lib")

In [8]:
tables = soup.find_all('table')

In [9]:
len(tables)

2

Read the first table as a data frame using the `pd.read_html` function


In [10]:
# Read the table node and convert it into a data frame, and print the data frame for review
df = pd.read_html(io=url)[0]
df

Unnamed: 0,Country or region,Date[a],Tested,Units[b],Confirmed(cases),"Confirmed /tested,%","Tested /population,%","Confirmed /population,%",Ref.
0,Afghanistan,17 Dec 2020,154767,samples,49621,32.1,0.40,0.13,[1]
1,Albania,18 Feb 2021,428654,samples,96838,22.6,15.0,3.4,[2]
2,Algeria,2 Nov 2020,230553,samples,58574,25.4,0.53,0.13,[3][4]
3,Andorra,12 Apr 2021,175789,samples,12581,7.2,227,16.2,[5]
4,Angola,12 Mar 2021,399228,samples,20981,5.3,1.3,0.067,[6]
5,Antigua and Barbuda,6 Mar 2021,15268,samples,832,5.4,15.9,0.86,[7]
6,Argentina,20 Apr 2021,10320622,samples,2743620,26.6,22.7,6.0,[8]
7,Armenia,19 Apr 2021,937603,samples,208818,22.3,31.8,7.1,[9]
8,Australia,20 Apr 2021,16413981,samples,29559,0.18,65.4,0.12,[10]
9,Austria,20 Apr 2021,29014550,samples,594057,2.0,326,6.7,[11]
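
`pd.read_html(io=url)` downloads the page a second time. As a hedged alternative, HTML that is already in memory can be wrapped in `StringIO` and parsed directly. The sketch below uses a tiny hand-written table (an assumption, not the real wiki markup) so that it runs without network access. `pd.read_html` needs an HTML parser such as `lxml`, or `bs4` with `html5lib`, to be installed.

```python
from io import StringIO

import pandas as pd

# Tiny stand-in for the downloaded page (hand-written HTML, not the real wiki markup)
sample_html = """
<table>
  <tr><th>Country or region</th><th>Tested</th><th>Confirmed(cases)</th></tr>
  <tr><td>Afghanistan</td><td>154767</td><td>49621</td></tr>
  <tr><td>Albania</td><td>428654</td><td>96838</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds
tables = pd.read_html(StringIO(sample_html))
sample_df = tables[0]
print(len(tables))
print(sample_df)
```

In this notebook the same call would be `pd.read_html(StringIO(data))[0]`, where `data` is the response text obtained earlier.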


## TASK 3: Pre-process and export the extracted data frame

The goal of task 3 is to pre-process the extracted data frame from the previous step, and export it as a csv file


Let's get a summary of the data frame


In [11]:
# Print the summary of the data frame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 173 entries, 0 to 172
Data columns (total 9 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   Country or region        173 non-null    object
 1   Date[a]                  173 non-null    object
 2   Tested                   173 non-null    object
 3   Units[b]                 156 non-null    object
 4   Confirmed(cases)         173 non-null    object
 5   Confirmed /tested,%      173 non-null    object
 6   Tested /population,%     173 non-null    object
 7   Confirmed /population,%  173 non-null    object
 8   Ref.                     173 non-null    object
dtypes: object(9)
memory usage: 12.3+ KB


As you can see from the summary, the column names are a little difficult to understand and some column data types are not correct. For example, the `Tested` column shows as `object`. 

As such, the data frame read from the HTML table will need some pre-processing, such as removing irrelevant columns, renaming columns, and converting columns into proper data types.


The course provides an R pre-processing function (shown commented out below for reference) to convert the data frame, but you can also try to write one yourself; the cells that follow carry out similar steps with pandas


In [12]:
# preprocess_covid_data_frame <- function(data_frame) {
    
#     shape <- dim(data_frame)

#     # Remove the World row
#     data_frame<-data_frame[!(data_frame$`Country or region`=="World"),]
#     # Remove the last row
#     data_frame <- data_frame[1:172, ]
    
#     # We dont need the Units and Ref columns, so can be removed
#     data_frame["Ref."] <- NULL
#     data_frame["Units[b]"] <- NULL
    
#     # Renaming the columns
#     names(data_frame) <- c("country", "date", "tested", "confirmed", "confirmed.tested.ratio", "tested.population.ratio", "confirmed.population.ratio")
    
#     # Convert column data types
#     data_frame$country <- as.factor(data_frame$country)
#     data_frame$date <- as.factor(data_frame$date)
#     data_frame$tested <- as.numeric(gsub(",","",data_frame$tested))
#     data_frame$confirmed <- as.numeric(gsub(",","",data_frame$confirmed))
#     data_frame$'confirmed.tested.ratio' <- as.numeric(gsub(",","",data_frame$`confirmed.tested.ratio`))
#     data_frame$'tested.population.ratio' <- as.numeric(gsub(",","",data_frame$`tested.population.ratio`))
#     data_frame$'confirmed.population.ratio' <- as.numeric(gsub(",","",data_frame$`confirmed.population.ratio`))
    
#     return(data_frame)
# }
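
The commented R function above can be sketched in pandas as follows. This is not a line-for-line port: it skips the hard-coded `1:172` row cut, converts `date` to a datetime (as the later cells in this notebook do) rather than a factor, and uses `errors="coerce"` so unparsable values become missing, which is an assumption about the desired behavior. The sample frame reuses two rows from the table above, with thousands separators added and a placeholder `World` row invented purely for illustration.

```python
import pandas as pd

NEW_NAMES = [
    "country", "date", "tested", "confirmed",
    "confirmed.tested.ratio", "tested.population.ratio",
    "confirmed.population.ratio",
]
NUMERIC_COLUMNS = NEW_NAMES[2:]


def preprocess_covid_data_frame(data_frame):
    """Pandas sketch of the commented R function (not a line-for-line port)."""
    # Remove the World row
    out = data_frame[data_frame["Country or region"] != "World"].copy()
    # We don't need the Units and Ref columns
    out = out.drop(columns=["Units[b]", "Ref."])
    # Rename the remaining seven columns
    out.columns = NEW_NAMES
    # Strip thousands separators, then convert to numbers
    for column in NUMERIC_COLUMNS:
        as_text = out[column].astype(str).str.replace(",", "", regex=False)
        out[column] = pd.to_numeric(as_text, errors="coerce")
    # Assumption: dates look like "17 Dec 2020"
    out["date"] = pd.to_datetime(out["date"], format="%d %b %Y", errors="coerce")
    return out.reset_index(drop=True)


# Illustrative input: the World row and the comma formatting are made up
raw = pd.DataFrame({
    "Country or region": ["Afghanistan", "Albania", "World"],
    "Date[a]": ["17 Dec 2020", "18 Feb 2021", "18 Feb 2021"],
    "Tested": ["154,767", "428,654", "583,421"],
    "Units[b]": ["samples", "samples", None],
    "Confirmed(cases)": ["49,621", "96,838", "146,459"],
    "Confirmed /tested,%": ["32.1", "22.6", "25.1"],
    "Tested /population,%": ["0.40", "15.0", "1.0"],
    "Confirmed /population,%": ["0.13", "3.4", "1.0"],
    "Ref.": ["[1]", "[2]", ""],
})

clean = preprocess_covid_data_frame(raw)
print(clean.dtypes)
```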



Call the `preprocess_covid_data_frame` function


In [13]:
df.columns

Index(['Country or region', 'Date[a]', 'Tested', 'Units[b]', 'Confirmed(cases)', 'Confirmed /tested,%', 'Tested /population,%', 'Confirmed /population,%', 'Ref.'], dtype='object')

In [14]:
df.columns = ['country', 'date', 'tested', 'units',
       'confirmed', 'confirmed.tested.ratio', 'tested.population.ratio',
       'confirmed.population.ratio', 'ref']

In [15]:
df

Unnamed: 0,country,date,tested,units,confirmed,confirmed.tested.ratio,tested.population.ratio,confirmed.population.ratio,ref
0,Afghanistan,17 Dec 2020,154767,samples,49621,32.1,0.40,0.13,[1]
1,Albania,18 Feb 2021,428654,samples,96838,22.6,15.0,3.4,[2]
2,Algeria,2 Nov 2020,230553,samples,58574,25.4,0.53,0.13,[3][4]
3,Andorra,12 Apr 2021,175789,samples,12581,7.2,227,16.2,[5]
4,Angola,12 Mar 2021,399228,samples,20981,5.3,1.3,0.067,[6]
5,Antigua and Barbuda,6 Mar 2021,15268,samples,832,5.4,15.9,0.86,[7]
6,Argentina,20 Apr 2021,10320622,samples,2743620,26.6,22.7,6.0,[8]
7,Armenia,19 Apr 2021,937603,samples,208818,22.3,31.8,7.1,[9]
8,Australia,20 Apr 2021,16413981,samples,29559,0.18,65.4,0.12,[10]
9,Austria,20 Apr 2021,29014550,samples,594057,2.0,326,6.7,[11]


In [16]:
# call `preprocess_covid_data_frame` function and assign it to a new data frame

In [17]:
df = df[0:172]

In [18]:
df

Unnamed: 0,country,date,tested,units,confirmed,confirmed.tested.ratio,tested.population.ratio,confirmed.population.ratio,ref
0,Afghanistan,17 Dec 2020,154767,samples,49621,32.1,0.4,0.13,[1]
1,Albania,18 Feb 2021,428654,samples,96838,22.6,15.0,3.4,[2]
2,Algeria,2 Nov 2020,230553,samples,58574,25.4,0.53,0.13,[3][4]
3,Andorra,12 Apr 2021,175789,samples,12581,7.2,227.0,16.2,[5]
4,Angola,12 Mar 2021,399228,samples,20981,5.3,1.3,0.067,[6]
5,Antigua and Barbuda,6 Mar 2021,15268,samples,832,5.4,15.9,0.86,[7]
6,Argentina,20 Apr 2021,10320622,samples,2743620,26.6,22.7,6.0,[8]
7,Armenia,19 Apr 2021,937603,samples,208818,22.3,31.8,7.1,[9]
8,Australia,20 Apr 2021,16413981,samples,29559,0.18,65.4,0.12,[10]
9,Austria,20 Apr 2021,29014550,samples,594057,2.0,326.0,6.7,[11]


In [19]:
df2 = df.copy()

In [20]:
df2.drop(["units","ref"],axis=1,inplace=True)

In [21]:
df2

Unnamed: 0,country,date,tested,confirmed,confirmed.tested.ratio,tested.population.ratio,confirmed.population.ratio
0,Afghanistan,17 Dec 2020,154767,49621,32.1,0.4,0.13
1,Albania,18 Feb 2021,428654,96838,22.6,15.0,3.4
2,Algeria,2 Nov 2020,230553,58574,25.4,0.53,0.13
3,Andorra,12 Apr 2021,175789,12581,7.2,227.0,16.2
4,Angola,12 Mar 2021,399228,20981,5.3,1.3,0.067
5,Antigua and Barbuda,6 Mar 2021,15268,832,5.4,15.9,0.86
6,Argentina,20 Apr 2021,10320622,2743620,26.6,22.7,6.0
7,Armenia,19 Apr 2021,937603,208818,22.3,31.8,7.1
8,Australia,20 Apr 2021,16413981,29559,0.18,65.4,0.12
9,Austria,20 Apr 2021,29014550,594057,2.0,326.0,6.7


In [22]:
df2["date"] = pd.to_datetime(df2["date"])

In [23]:
df2["tested"] = df2["tested"].astype('int')

In [24]:
df2["confirmed"] = df2["confirmed"].astype('int')

In [25]:
df2["confirmed.tested.ratio"] = df2["confirmed.tested.ratio"].astype('float')

In [26]:
df2["tested.population.ratio"] = df2["tested.population.ratio"].astype('float')

In [27]:
df2["confirmed.population.ratio"] = df2["confirmed.population.ratio"].astype('float')

In [28]:
df2.head()

Unnamed: 0,country,date,tested,confirmed,confirmed.tested.ratio,tested.population.ratio,confirmed.population.ratio
0,Afghanistan,2020-12-17,154767,49621,32.1,0.4,0.13
1,Albania,2021-02-18,428654,96838,22.6,15.0,3.4
2,Algeria,2020-11-02,230553,58574,25.4,0.53,0.13
3,Andorra,2021-04-12,175789,12581,7.2,227.0,16.2
4,Angola,2021-03-12,399228,20981,5.3,1.3,0.07


Get the summary of the processed data frame again


In [29]:
# Print the summary of the processed data frame again
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 172 entries, 0 to 171
Data columns (total 7 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   country                     172 non-null    object        
 1   date                        172 non-null    datetime64[ns]
 2   tested                      172 non-null    int32         
 3   confirmed                   172 non-null    int32         
 4   confirmed.tested.ratio      172 non-null    float64       
 5   tested.population.ratio     172 non-null    float64       
 6   confirmed.population.ratio  172 non-null    float64       
dtypes: datetime64[ns](1), float64(3), int32(2), object(1)
memory usage: 8.2+ KB


After pre-processing, you can see that the columns and column names are simplified, and the column types are converted into the correct types.


The data frame has the following columns:

-   **country** - The name of the country
-   **date** - Reported date
-   **tested** - Total tested cases by the reported date
-   **confirmed** - Total confirmed cases by the reported date
-   **confirmed.tested.ratio** - The ratio of confirmed cases to the tested cases
-   **tested.population.ratio** - The ratio of tested cases to the population of the country
-   **confirmed.population.ratio** - The ratio of confirmed cases to the population of the country


OK, we can call the `to_csv()` method to save the data frame to a csv file. 


In [30]:
# Export the data frame to a csv file
df2.to_csv("covid.csv",index=False)

Note that in IBM Watson Studio, there is no traditional "hard disk" associated with a workspace.

Even if you call the `to_csv()` method to save the data frame as a csv file, it won't be shown in the IBM Cloud Object Storage asset UI automatically.

However, you can still check whether `covid.csv` exists using the following code snippet:


In [31]:
# # Get working directory
# wd <- getwd()
# # Get exported 
# file_path <- paste(wd, sep="", "/covid.csv")
# # File path
# print(file_path)
# file.exists(file_path)
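
The commented R snippet above has a direct counterpart in Python's standard library. The helper name `csv_exists` is hypothetical, introduced here only for illustration; it returns the full path together with a flag telling whether the file is there.

```python
import os


def csv_exists(file_name="covid.csv", directory=None):
    """Return the full file path and whether a file exists at that path."""
    # Default to the current working directory, like getwd() in R
    directory = directory or os.getcwd()
    file_path = os.path.join(directory, file_name)
    return file_path, os.path.exists(file_path)


file_path, exists = csv_exists()
print(file_path)
print(exists)
```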

**Optional Step**: If you have difficulty finishing the above web scraping tasks, you can still continue with the next tasks by downloading a provided csv file from here:


In [32]:
## Download a sample csv file
# covid_csv_file <- download.file("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0101EN-Coursera/v2/dataset/covid.csv", destfile="covid.csv")
# covid_data_frame_csv <- read.csv("covid.csv", header=TRUE, sep=",")

## TASK 4: Get a subset of the extracted data frame

The goal of task 4 is to get the 5th to 10th rows from the data frame with only `country` and `confirmed` columns selected


In [33]:
df2

Unnamed: 0,country,date,tested,confirmed,confirmed.tested.ratio,tested.population.ratio,confirmed.population.ratio
0,Afghanistan,2020-12-17,154767,49621,32.1,0.4,0.13
1,Albania,2021-02-18,428654,96838,22.6,15.0,3.4
2,Algeria,2020-11-02,230553,58574,25.4,0.53,0.13
3,Andorra,2021-04-12,175789,12581,7.2,227.0,16.2
4,Angola,2021-03-12,399228,20981,5.3,1.3,0.07
5,Antigua and Barbuda,2021-03-06,15268,832,5.4,15.9,0.86
6,Argentina,2021-04-20,10320622,2743620,26.6,22.7,6.0
7,Armenia,2021-04-19,937603,208818,22.3,31.8,7.1
8,Australia,2021-04-20,16413981,29559,0.18,65.4,0.12
9,Austria,2021-04-20,29014550,594057,2.0,326.0,6.7


In [34]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 172 entries, 0 to 171
Data columns (total 7 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   country                     172 non-null    object        
 1   date                        172 non-null    datetime64[ns]
 2   tested                      172 non-null    int32         
 3   confirmed                   172 non-null    int32         
 4   confirmed.tested.ratio      172 non-null    float64       
 5   tested.population.ratio     172 non-null    float64       
 6   confirmed.population.ratio  172 non-null    float64       
dtypes: datetime64[ns](1), float64(3), int32(2), object(1)
memory usage: 8.2+ KB


In [35]:
# Read covid_data_frame_csv from the csv file

# Get the 5th to 10th rows, with two "country" "confirmed" columns
# (iloc is 0-based and excludes the stop index, so rows 5 to 10 are positions 4:10)
df2.iloc[4:10][['country','confirmed']]

Unnamed: 0,country,confirmed
4,Angola,20981
5,Antigua and Barbuda,832
6,Argentina,2743620
7,Armenia,208818
8,Australia,29559
9,Austria,594057


## TASK 5: Calculate worldwide COVID testing positive ratio

The goal of task 5 is to get the total confirmed and tested cases worldwide, and try to figure out the overall positive ratio using `confirmed cases / tested cases`


In [36]:
# Get the total confirmed cases worldwide
total_confirmed = df2["confirmed"].sum()
print(total_confirmed)
# Get the total tested cases worldwide
total_tested = df2["tested"].sum()
print(total_tested)
# Get the positive ratio (confirmed / tested)
positive_ratio = total_confirmed/total_tested
print(positive_ratio)

136574035
1991573377
0.06857594933596062


## TASK 6: Get a country list which reported their testing data

The goal of task 6 is to get a catalog or sorted list of countries that have reported their COVID-19 testing data


In [37]:
# Get the `country` column
df2["country"]
# Check its type (a pandas Series)

# Convert the country column to the string dtype so that you can easily sort it

# Sort the countries AtoZ

# Sort the countries ZtoA

# Print the sorted ZtoA list


0                 Afghanistan
1                     Albania
2                     Algeria
3                     Andorra
4                      Angola
5         Antigua and Barbuda
6                   Argentina
7                     Armenia
8                   Australia
9                     Austria
10                 Azerbaijan
11                    Bahamas
12                    Bahrain
13                 Bangladesh
14                   Barbados
15                    Belarus
16                    Belgium
17                     Belize
18                      Benin
19                     Bhutan
20                    Bolivia
21     Bosnia and Herzegovina
22                   Botswana
23                     Brazil
24                     Brunei
25                   Bulgaria
26               Burkina Faso
27                    Burundi
28                   Cambodia
29                   Cameroon
30                     Canada
31                       Chad
32                      Chile
33        

In [38]:
type(df2["country"])

pandas.core.series.Series

In [39]:
df2["country"] = df2["country"].astype('string')

In [40]:
df2["country"].sort_values()

0                 Afghanistan
1                     Albania
2                     Algeria
3                     Andorra
4                      Angola
5         Antigua and Barbuda
6                   Argentina
7                     Armenia
8                   Australia
9                     Austria
10                 Azerbaijan
11                    Bahamas
12                    Bahrain
13                 Bangladesh
14                   Barbados
15                    Belarus
16                    Belgium
17                     Belize
18                      Benin
19                     Bhutan
20                    Bolivia
21     Bosnia and Herzegovina
22                   Botswana
23                     Brazil
24                     Brunei
25                   Bulgaria
26               Burkina Faso
27                    Burundi
28                   Cambodia
29                   Cameroon
30                     Canada
31                       Chad
32                      Chile
33        

In [41]:
df2["country"].sort_values(ascending=False)

171                  Zimbabwe
170                    Zambia
169                   Vietnam
168                 Venezuela
167                Uzbekistan
166                   Uruguay
165             United States
164            United Kingdom
163      United Arab Emirates
162                   Ukraine
161                    Uganda
160                    Turkey
159                   Tunisia
158       Trinidad and Tobago
157                      Togo
156                  Thailand
155                  Tanzania
154                 Taiwan[m]
153            Switzerland[l]
152                    Sweden
151                     Sudan
150                 Sri Lanka
149                     Spain
148               South Sudan
147               South Korea
146              South Africa
145                  Slovenia
144                  Slovakia
143                 Singapore
142                    Serbia
141                   Senegal
140              Saudi Arabia
139                San Marino
138       

In [42]:
print(df2["country"].sort_values(ascending=False))

171                  Zimbabwe
170                    Zambia
169                   Vietnam
168                 Venezuela
167                Uzbekistan
166                   Uruguay
165             United States
164            United Kingdom
163      United Arab Emirates
162                   Ukraine
161                    Uganda
160                    Turkey
159                   Tunisia
158       Trinidad and Tobago
157                      Togo
156                  Thailand
155                  Tanzania
154                 Taiwan[m]
153            Switzerland[l]
152                    Sweden
151                     Sudan
150                 Sri Lanka
149                     Spain
148               South Sudan
147               South Korea
146              South Africa
145                  Slovenia
144                  Slovakia
143                 Singapore
142                    Serbia
141                   Senegal
140              Saudi Arabia
139                San Marino
138       
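
The sorted output above is still a pandas Series and keeps wiki footnote markers such as `[m]`. As a hedged sketch, the names can be cleaned and turned into plain Python lists. The regular expression assumes a footnote marker is a single lowercase letter in square brackets at the end of the name, and the short sample Series stands in for `df2["country"]`.

```python
import pandas as pd

# Small stand-in for df2["country"], using names that appear in the output above
countries = pd.Series(["Taiwan[m]", "Sweden", "Switzerland[l]", "Zambia", "Albania"])

# Strip a trailing footnote marker such as "[m]" (assumed format: one lowercase letter)
clean_names = countries.str.replace(r"\[[a-z]\]$", "", regex=True)

# Plain Python lists sorted A to Z and Z to A
a_to_z = sorted(clean_names)
z_to_a = sorted(clean_names, reverse=True)
print(a_to_z)
print(z_to_a)
```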

## TASK 7: Identify country names with a specific pattern

The goal of task 7 is to use a regular expression to find any countries whose names start with `United`


In [43]:
# Use a regular expression `United.+` to find matches

# Print the matched country names


In [44]:
df2["country"].str.findall('United.+')

0                          []
1                          []
2                          []
3                          []
4                          []
5                          []
6                          []
7                          []
8                          []
9                          []
10                         []
11                         []
12                         []
13                         []
14                         []
15                         []
16                         []
17                         []
18                         []
19                         []
20                         []
21                         []
22                         []
23                         []
24                         []
25                         []
26                         []
27                         []
28                         []
29                         []
30                         []
31                         []
32                         []
33        
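
`str.findall` returns one list per row, so most rows show an empty list. To print only the matched names, one option is a boolean mask built with `str.match`, which anchors the pattern at the start of each string. The sample Series below is a small stand-in for `df2["country"]`, using names from the sorted output earlier in this notebook.

```python
import pandas as pd

# Small stand-in for df2["country"]
countries = pd.Series([
    "Uganda", "Ukraine", "United Arab Emirates",
    "United Kingdom", "United States", "Uruguay",
])

# str.match tests the pattern from the beginning of each name
is_united = countries.str.match(r"United.+")
matched = countries[is_united]
print(matched.tolist())
# ['United Arab Emirates', 'United Kingdom', 'United States']
```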

## TASK 8: Pick two countries you are interested in, and then review their testing data

The goal of task 8 is to compare the COVID-19 test data between two countries. You will need to select two rows from the data frame, and select the `country`, `confirmed`, and `confirmed.population.ratio` columns


In [45]:
df2.loc[98]

country                                  Malaysia
date                          2021-04-20 00:00:00
tested                                    8718106
confirmed                                  379473
confirmed.tested.ratio                       4.40
tested.population.ratio                     26.60
confirmed.population.ratio                   1.20
Name: 98, dtype: object

In [46]:
df2.loc[143]

country                                 Singapore
date                          2021-04-19 00:00:00
tested                                    9278789
confirmed                                   60865
confirmed.tested.ratio                       0.66
tested.population.ratio                    163.00
confirmed.population.ratio                   1.10
Name: 143, dtype: object

In [47]:
# Select a subset (should be only one row) of data frame based on a selected country name and columns
malaysia = pd.DataFrame(df2.loc[98][['country','confirmed','confirmed.population.ratio']]).T
malaysia

Unnamed: 0,country,confirmed,confirmed.population.ratio
98,Malaysia,379473,1.2


In [48]:
# Select a subset (should be only one row) of data frame based on a selected country name and columns
singapore = pd.DataFrame(df2.loc[143][['country','confirmed','confirmed.population.ratio']]).T
singapore

Unnamed: 0,country,confirmed,confirmed.population.ratio
143,Singapore,60865,1.1
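
The cells above pick rows by their index labels (98 and 143), which can shift whenever the wiki table changes. A hedged alternative is to select by country name with a boolean mask, which also keeps the numeric dtypes instead of the `object` dtype produced by transposing a row. The helper name `select_country` is hypothetical, and the sample frame copies the two rows shown above.

```python
import pandas as pd

# Two rows copied from the notebook output above
sample = pd.DataFrame({
    "country": ["Malaysia", "Singapore"],
    "confirmed": [379473, 60865],
    "confirmed.population.ratio": [1.2, 1.1],
})

COLUMNS = ["country", "confirmed", "confirmed.population.ratio"]


def select_country(data_frame, name, columns=COLUMNS):
    """Return the row(s) for one country, restricted to the given columns."""
    return data_frame.loc[data_frame["country"] == name, columns]


malaysia_row = select_country(sample, "Malaysia")
singapore_row = select_country(sample, "Singapore")
print(malaysia_row)
print(singapore_row.dtypes)
```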


## TASK 9: Compare which one of the selected countries has a larger ratio of confirmed cases to population

The goal of task 9 is to find out which of the countries you selected before has the larger ratio of confirmed cases to population, which may indicate that the country has a higher COVID-19 infection risk


In [49]:
malaysia.iloc[:,2]

98   1.20
Name: confirmed.population.ratio, dtype: object

In [50]:
malaysia.reset_index(drop=True, inplace=True)

In [51]:
singapore.reset_index(drop=True, inplace=True)

In [52]:
malaysia.reset_index(drop=True) == singapore.reset_index(drop=True)

Unnamed: 0,country,confirmed,confirmed.population.ratio
0,False,False,False


In [53]:
malaysia.iloc[:,2]

0   1.20
Name: confirmed.population.ratio, dtype: object

In [54]:
singapore.iloc[:,2]

0   1.10
Name: confirmed.population.ratio, dtype: object

In [55]:
# Use an if-else statement to compare the two confirmed.population.ratio values
# (column position 2 of each one-row frame holds confirmed.population.ratio)
if malaysia.iloc[0, 2] > singapore.iloc[0, 2]:
    print("Malaysia")
else:
    print("Singapore")

Malaysia


## TASK 10: Find countries with confirmed to population ratio less than a threshold

The goal of task 10 is to find out which countries have a confirmed to population ratio of less than 1%, which may indicate that the risk in those countries is relatively low


In [58]:
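# Get a subset of any countries with `confirmed.population.ratio` less than the threshold
# Note: the ratio columns are expressed in percent, so 0.01 here means 0.01%,
# which is stricter than the 1% stated in the task; use `< 1` for a 1% threshold
df2[df2["confirmed.population.ratio"] < 0.01]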

Unnamed: 0,country,date,tested,confirmed,confirmed.tested.ratio,tested.population.ratio,confirmed.population.ratio
27,Burundi,2021-01-05,90019,884,0.98,0.76,0.01
33,China[c],2020-07-31,160000000,87655,0.06,11.1,0.01
53,Fiji,2021-04-17,42492,72,0.17,4.7,0.01
88,Laos,2021-03-01,114030,45,0.04,1.6,0.0
118,North Korea,2020-11-25,16914,0,0.0,0.07,0.0
154,Taiwan[m],2021-04-19,513203,1076,0.21,2.2,0.0
155,Tanzania,2020-11-18,3880,509,13.1,0.01,0.0
169,Vietnam,2021-04-11,2847776,2693,0.1,2.9,0.0
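
Because the ratio columns hold percentages, naming the threshold in percent helps avoid mixing up 1% and 0.01%. The sketch below assumes a hypothetical helper `below_threshold` and uses four rows copied from the table earlier in this notebook.

```python
import pandas as pd

# Four rows copied from the notebook output (ratios are in percent)
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Andorra", "Angola"],
    "confirmed.population.ratio": [0.13, 3.4, 16.2, 0.067],
})


def below_threshold(data_frame, threshold_pct):
    """Keep rows whose confirmed.population.ratio (in percent) is below threshold_pct."""
    return data_frame[data_frame["confirmed.population.ratio"] < threshold_pct]


low_ratio = below_threshold(sample, threshold_pct=1.0)
print(low_ratio["country"].tolist())
# ['Afghanistan', 'Angola']
```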
