I worked on this project for about three months as part of a Python elective course during my doctoral studies in economics at PIMES. The project had to include at least one section on web scraping and data visualization. Many parts of the project, particularly the libraries I used, were not covered in the course, and some course topics were not used in the project. I chose Covid-19 as the theme, and the data were scraped from the web on April 5th, 2023 (the data in covid19project_andreluizcoelho.ipynb, in the same repository, were scraped on July 30th, 2021). All of the data analyzed can be downloaded using the links provided throughout the project.

# Covid-19 

## 1. Web Scraping Covid-19 Total Cases and Deaths

In [1]:
#importing libraries

In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [3]:
#defining the url

In [4]:
url = 'https://www.worldometers.info/coronavirus/'

In [5]:
url

'https://www.worldometers.info/coronavirus/'

In [6]:
#get request to get the raw html content

In [7]:
html_content = requests.get(url).text
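The request above has no timeout and does not check the HTTP status. A minimal sketch of a more defensive version is shown below. The function name `fetch_html`, the 10-second timeout, and the User-Agent string are illustrative assumptions and are not part of the original notebook.

```python
import requests

def fetch_html(url, timeout=10):
    """Return the page's HTML as text, raising an error if the request fails."""
    # Assumption: a browser-like User-Agent may help if the site rejects the default one
    headers = {"User-Agent": "Mozilla/5.0 (course project scraper)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of silently returning an error page
    response.raise_for_status()
    return response.text
```

With this helper, `html_content = fetch_html(url)` could stand in for the cell above.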

In [8]:
html_content

'\n<!DOCTYPE html>\n<!--[if IE 8]> <html lang="en" class="ie8"> <![endif]-->\n<!--[if IE 9]> <html lang="en" class="ie9"> <![endif]-->\n<!--[if !IE]><!-->\n<html lang="en">\n<!--<![endif]-->\n\n\n\n<head>\n    <meta charset="utf-8">\n    <meta http-equiv="X-UA-Compatible" content="IE=edge">\n    <meta name="viewport" content="width=device-width, initial-scale=1">\n\n    <title>COVID - Coronavirus Statistics - Worldometer</title>\n    <meta name="description" content="Daily and weekly updated statistics tracking the number of COVID-19 cases, recovered, and deaths. Historical data with cumulative charts, graphs, and updates.">\n\n\n    \n\t<!-- Favicon -->\n\t<link rel="shortcut icon" href="/favicon/favicon.ico" type="image/x-icon">\n\t<link rel="apple-touch-icon" sizes="57x57" href="/favicon/apple-icon-57x57.png">\n\t<link rel="apple-touch-icon" sizes="60x60" href="/favicon/apple-icon-60x60.png">\n\t<link rel="apple-touch-icon" sizes="72x72" href="/favicon/apple-icon-72x72.png">\n\t<lin

In [9]:
#parsing the html code for the entire site

In [10]:
soup = BeautifulSoup(html_content, 'lxml')
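The `'lxml'` parser is a separate package, and BeautifulSoup raises `FeatureNotFound` when it is not installed. The sketch below falls back to Python's built-in `'html.parser'` in that case. The helper name `make_soup` is hypothetical, and the two parsers may build slightly different trees for malformed HTML.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(html):
    """Parse html with lxml when available, otherwise with the built-in parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python
        return BeautifulSoup(html, "html.parser")

demo_soup = make_soup("<html><head><title>Demo page</title></head></html>")
print(demo_soup.title.string)
```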

In [11]:
#print the parsed data of html

In [12]:
print(soup.prettify())

<!DOCTYPE html>
<!--[if IE 8]> <html lang="en" class="ie8"> <![endif]-->
<!--[if IE 9]> <html lang="en" class="ie9"> <![endif]-->
<!--[if !IE]><!-->
<html lang="en">
 <!--<![endif]-->
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <title>
   COVID - Coronavirus Statistics - Worldometer
  </title>
  <meta content="Daily and weekly updated statistics tracking the number of COVID-19 cases, recovered, and deaths. Historical data with cumulative charts, graphs, and updates." name="description"/>
  <!-- Favicon -->
  <link href="/favicon/favicon.ico" rel="shortcut icon" type="image/x-icon"/>
  <link href="/favicon/apple-icon-57x57.png" rel="apple-touch-icon" sizes="57x57"/>
  <link href="/favicon/apple-icon-60x60.png" rel="apple-touch-icon" sizes="60x60"/>
  <link href="/favicon/apple-icon-72x72.png" rel="apple-touch-icon" sizes="72x72"/>
  <link href="/favicon/apple-icon-76x

In [13]:
#picking the id of the table to scrape and extract html for only the specific table

In [14]:
covid_table = soup.find('table', attrs = {'id': 'main_table_countries_today'})
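`soup.find` returns `None` when no element matches, so a change in the page layout would only surface later as an `AttributeError`. The sketch below, run on a small hypothetical HTML snippet rather than the real page, shows one way to fail early with a clearer message. The helper name `find_table` is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page standing in for the real HTML
sample_html = """
<table id="main_table_countries_today"><thead><tr><th>Country</th></tr></thead></table>
<table id="other_table"><thead><tr><th>Ignored</th></tr></thead></table>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

def find_table(soup, table_id):
    """Return the table with the given id, or raise a clear error if it is absent."""
    table = soup.find("table", attrs={"id": table_id})
    if table is None:
        raise ValueError(f"no <table> with id={table_id!r} found; the page layout may have changed")
    return table

table = find_table(sample_soup, "main_table_countries_today")
print(table.th.text)
```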

In [15]:
#head will form the columns

In [16]:
head = covid_table.thead.find_all('tr') 
#the <table> tag defines an HTML table
#each table row is defined with a <tr> tag
#a <tr> element contains one or more <th> or <td> elements
#each table data cell is defined with a <td> tag
#the <th> tag defines a header cell in an HTML table

In [17]:
head #headers are inside this html code

[<tr>
 <th width="1%">#</th>
 <th width="100">Country,<br/>Other</th>
 <th width="20">Total<br/>Cases</th>
 <th width="30">New<br/>Cases</th>
 <th width="30">Total<br/>Deaths</th>
 <th width="30">New<br/>Deaths</th>
 <th width="30">Total<br/>Recovered</th>
 <th width="30">New<br/>Recovered</th>
 <th width="30">Active<br/>Cases</th>
 <th width="30">Serious,<br/>Critical</th>
 <th width="30">Tot Cases/<br/>1M pop</th>
 <th width="30">Deaths/<br/>1M pop</th>
 <th width="30">Total<br/>Tests</th>
 <th width="30">Tests/<br/>
 <nobr>1M pop</nobr>
 </th>
 <th width="30">Population</th>
 <th style="display:none" width="30">Continent</th>
 <th width="30">1 Case<br/>every X ppl</th><th width="30">1 Death<br/>every X ppl</th><th width="30">1 Test<br/>every X ppl</th>
 <th width="30">New Cases/1M pop</th>
 <th width="30">New Deaths/1M pop</th>
 <th width="30">Active Cases/1M pop</th>
 </tr>]

In [18]:
#extracting headers from html to a list

In [19]:
headings = []
for th in head[0].find_all('th'):
    print(th.text)
    #removing newlines and extra spaces from left and right
    headings.append(th.text.replace('\n','').strip())
print(headings)

#
Country,Other
TotalCases
NewCases
TotalDeaths
NewDeaths
TotalRecovered
NewRecovered
ActiveCases
Serious,Critical
Tot Cases/1M pop
Deaths/1M pop
TotalTests
Tests/
1M pop

Population
Continent
1 Caseevery X ppl
1 Deathevery X ppl
1 Testevery X ppl
New Cases/1M pop
New Deaths/1M pop
Active Cases/1M pop
['#', 'Country,Other', 'TotalCases', 'NewCases', 'TotalDeaths', 'NewDeaths', 'TotalRecovered', 'NewRecovered', 'ActiveCases', 'Serious,Critical', 'Tot\xa0Cases/1M pop', 'Deaths/1M pop', 'TotalTests', 'Tests/1M pop', 'Population', 'Continent', '1 Caseevery X ppl', '1 Deathevery X ppl', '1 Testevery X ppl', 'New Cases/1M pop', 'New Deaths/1M pop', 'Active Cases/1M pop']
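Because `th.text` simply concatenates the text around each `<br/>`, names such as `TotalCases` lose the space between words, and one heading keeps a non-breaking space (`\xa0`). As an alternative sketch (not what the rest of this notebook uses), `get_text(" ")` with whitespace normalization gives more readable names. The header row below is a hypothetical imitation of the structure shown above, and `clean_header` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical header row imitating the structure of the real table head
sample_head = BeautifulSoup(
    "<table><thead><tr>"
    "<th>Total<br/>Cases</th>"
    "<th>Tot&nbsp;Cases/<br/>1M pop</th>"
    "<th>Tests/<br/>\n<nobr>1M pop</nobr>\n</th>"
    "</tr></thead></table>",
    "html.parser",
)

def clean_header(th):
    # get_text(" ") inserts a space where tags such as <br/> separated the words;
    # split() + join collapses newlines, repeated spaces and non-breaking spaces
    return " ".join(th.get_text(" ").split())

clean_headings = [clean_header(th) for th in sample_head.find_all("th")]
print(clean_headings)
```

Adopting these names would mean updating any later cell that refers to columns by name, such as the list of columns to drop.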


In [20]:
#extracting the rest of rows with tbody element

In [21]:
body = covid_table.tbody.find_all('tr')

In [22]:
body

[<tr class="total_row_world row_continent" data-continent="North America" style="display: none">
 <td></td>
 <td style="text-align:left;">
 <nobr>North America</nobr>
 </td>
 <td>125,757,735</td>
 <td>+2,930</td>
 <td>1,623,020</td>
 <td>+13</td>
 <td>121,356,872</td>
 <td>+2,821</td>
 <td>2,777,843</td>
 <td>7,061</td>
 <td></td>
 <td></td>
 <td></td>
 <td></td>
 <td></td>
 <td data-continent="North America" style="display:none;">North America</td>
 <!-- 1 Case every X -->
 <td>
 </td>
 <!-- 1 Death every X -->
 <td></td>
 <!-- 1 test every X -->
 <td></td>
 <td></td>
 <td></td>
 <td></td>
 </tr>,
 <tr class="total_row_world row_continent" data-continent="Asia" style="display: none">
 <td></td>
 <td style="text-align:left;">
 <nobr>Asia</nobr>
 </td>
 <td>215,366,500</td>
 <td>+26,242</td>
 <td>1,540,992</td>
 <td>+45</td>
 <td>199,939,630</td>
 <td>+21,591</td>
 <td>13,885,878</td>
 <td>15,557</td>
 <td></td>
 <td></td>
 <td></td>
 <td></td>
 <td></td>
 <td data-continent="Asia" styl

In [23]:
body[0] #the first row for example

<tr class="total_row_world row_continent" data-continent="North America" style="display: none">
<td></td>
<td style="text-align:left;">
<nobr>North America</nobr>
</td>
<td>125,757,735</td>
<td>+2,930</td>
<td>1,623,020</td>
<td>+13</td>
<td>121,356,872</td>
<td>+2,821</td>
<td>2,777,843</td>
<td>7,061</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td data-continent="North America" style="display:none;">North America</td>
<!-- 1 Case every X -->
<td>
</td>
<!-- 1 Death every X -->
<td></td>
<!-- 1 test every X -->
<td></td>
<td></td>
<td></td>
<td></td>
</tr>

In [24]:
#collecting the cell values of each row into a list, which yields a list of lists
#declaring an empty list, data, that will hold the data for all rows

In [25]:
data = []
for r in range(1, len(body)): #starts at 1, so body[0] (the North America aggregate row) is skipped
    row = [] #empty list to hold one row's data
    for td in body[r].find_all('td'):
        #appending each cell's text to row after removing newlines and trimming surrounding spaces
        row.append(td.text.replace('\n','').strip())
    data.append(row)
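The same extraction can be written as a nested list comprehension. The sketch below runs on a two-row hypothetical `<tbody>` that reuses a few values from the output above, so it can be checked without fetching the page. Unlike the loop above, it does not skip the first row.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table body imitating the real structure
sample_body = BeautifulSoup(
    "<table><tbody>"
    "<tr><td></td><td>\n<nobr>Asia</nobr>\n</td><td>215,366,500</td></tr>"
    "<tr><td>231</td><td>China</td><td>503,302</td></tr>"
    "</tbody></table>",
    "html.parser",
)

# Outer loop: one list per <tr>; inner loop: one cleaned string per <td>
rows = [
    [td.text.replace("\n", "").strip() for td in tr.find_all("td")]
    for tr in sample_body.tbody.find_all("tr")
]
print(rows)
```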

In [26]:
data

[['',
  'Asia',
  '215,366,500',
  '+26,242',
  '1,540,992',
  '+45',
  '199,939,630',
  '+21,591',
  '13,885,878',
  '15,557',
  '',
  '',
  '',
  '',
  '',
  'Asia',
  '',
  '',
  '',
  '',
  '',
  ''],
 ['',
  'Europe',
  '247,975,158',
  '+23,327',
  '2,031,654',
  '+125',
  '243,656,835',
  '+56,925',
  '2,286,669',
  '6,421',
  '',
  '',
  '',
  '',
  '',
  'Europe',
  '',
  '',
  '',
  '',
  '',
  ''],
 ['',
  'South America',
  '68,397,845',
  '+1,535',
  '1,352,933',
  '+1',
  '66,414,624',
  '+3,113',
  '630,288',
  '10,183',
  '',
  '',
  '',
  '',
  '',
  'South America',
  '',
  '',
  '',
  '',
  '',
  ''],
 ['',
  'Oceania',
  '14,018,297',
  '',
  '26,715',
  '',
  '13,838,930',
  '',
  '152,652',
  '61',
  '',
  '',
  '',
  '',
  '',
  'Australia/Oceania',
  '',
  '',
  '',
  '',
  '',
  ''],
 ['',
  'Africa',
  '12,812,815',
  '+285',
  '258,665',
  '',
  '12,080,928',
  '+7',
  '473,222',
  '548',
  '',
  '',
  '',
  '',
  '',
  'Africa',
  '',
  '',
  '',
  '',
  '',

In [27]:
row

['231',
 'China',
 '503,302',
 '',
 '5,272',
 '',
 '379,053',
 '',
 '118,977',
 'N/A',
 '347',
 '4',
 '160,000,000',
 '110,461',
 '1,448,471,400',
 'Asia',
 '2,878',
 '274,748',
 '9',
 '',
 '',
 '82']

In [28]:
#data contains all the rows excluding header
#row contains data for one row

In [29]:
#passing the body rows as the data and the headings as the column names
#to a DataFrame

In [30]:
#with headings as the columns

In [31]:
df = pd.DataFrame(data, columns = headings)
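Every scraped value is still a string at this point, including numbers such as `'215,366,500'` and `'+26,242'`. A minimal sketch of converting such columns to numeric types follows. The small frame reuses a few values from the rows above, and the choice of columns in `numeric_cols` is only illustrative.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "Country,Other": ["Asia", "China"],
        "TotalCases": ["215,366,500", "503,302"],
        "NewCases": ["+26,242", ""],
        "Serious,Critical": ["15,557", "N/A"],
    }
)

numeric_cols = ["TotalCases", "NewCases", "Serious,Critical"]
cleaned = sample.copy()
for col in numeric_cols:
    # Strip '+' and ',' then convert; '' and 'N/A' cannot be parsed and become NaN
    stripped = cleaned[col].str.replace(r"[+,]", "", regex=True)
    cleaned[col] = pd.to_numeric(stripped, errors="coerce")

print(cleaned.dtypes)
```

Columns that contain missing values end up as floats, since NaN is a floating-point value.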

In [32]:
df

Unnamed: 0,#,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",...,TotalTests,Tests/1M pop,Population,Continent,1 Caseevery X ppl,1 Deathevery X ppl,1 Testevery X ppl,New Cases/1M pop,New Deaths/1M pop,Active Cases/1M pop
0,,Asia,215366500,+26242,1540992,+45,199939630,+21591,13885878,15557,...,,,,Asia,,,,,,
1,,Europe,247975158,+23327,2031654,+125,243656835,+56925,2286669,6421,...,,,,Europe,,,,,,
2,,South America,68397845,+1535,1352933,+1,66414624,+3113,630288,10183,...,,,,South America,,,,,,
3,,Oceania,14018297,,26715,,13838930,,152652,61,...,,,,Australia/Oceania,,,,,,
4,,Africa,12812815,+285,258665,,12080928,+7,473222,548,...,,,,Africa,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
233,227,Vatican City,29,,,,29,,0,,...,,,799,Europe,28,,,,,
234,228,Western Sahara,10,,1,,9,,0,,...,,,626161,Africa,62616,626161,,,,
235,229,MS Zaandam,9,,2,,7,,0,,...,,,,,,,,,,
236,230,Tokelau,5,,,,,,5,,...,,,1378,Australia/Oceania,276,,,,,3628


In [33]:
df.shape

(238, 22)

In [34]:
df.head()

Unnamed: 0,#,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",...,TotalTests,Tests/1M pop,Population,Continent,1 Caseevery X ppl,1 Deathevery X ppl,1 Testevery X ppl,New Cases/1M pop,New Deaths/1M pop,Active Cases/1M pop
0,,Asia,215366500,26242.0,1540992,45.0,199939630,21591.0,13885878,15557,...,,,,Asia,,,,,,
1,,Europe,247975158,23327.0,2031654,125.0,243656835,56925.0,2286669,6421,...,,,,Europe,,,,,,
2,,South America,68397845,1535.0,1352933,1.0,66414624,3113.0,630288,10183,...,,,,South America,,,,,,
3,,Oceania,14018297,,26715,,13838930,,152652,61,...,,,,Australia/Oceania,,,,,,
4,,Africa,12812815,285.0,258665,,12080928,7.0,473222,548,...,,,,Africa,,,,,,


In [35]:
df.head(10)

Unnamed: 0,#,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",...,TotalTests,Tests/1M pop,Population,Continent,1 Caseevery X ppl,1 Deathevery X ppl,1 Testevery X ppl,New Cases/1M pop,New Deaths/1M pop,Active Cases/1M pop
0,,Asia,215366500,26242.0,1540992,45.0,199939630,21591.0,13885878,15557.0,...,,,,Asia,,,,,,
1,,Europe,247975158,23327.0,2031654,125.0,243656835,56925.0,2286669,6421.0,...,,,,Europe,,,,,,
2,,South America,68397845,1535.0,1352933,1.0,66414624,3113.0,630288,10183.0,...,,,,South America,,,,,,
3,,Oceania,14018297,,26715,,13838930,,152652,61.0,...,,,,Australia/Oceania,,,,,,
4,,Africa,12812815,285.0,258665,,12080928,7.0,473222,548.0,...,,,,Africa,,,,,,
5,,,721,,15,,706,,0,0.0,...,,,,,,,,,,
6,,World,684329071,54319.0,6833994,184.0,657288525,84457.0,20206552,39831.0,...,,,,All,,,,,,
7,1.0,USA,106273691,,1155668,,104055782,,1062241,1895.0,...,1174878661.0,3509140.0,334805269.0,North America,3.0,290.0,0.0,,,3173.0
8,2.0,India,44733719,,530916,,44179712,,23091,,...,922033777.0,655491.0,1406631776.0,Asia,31.0,2649.0,2.0,,,16.0
9,3.0,France,39817657,9922.0,165794,58.0,39517646,16280.0,134217,869.0,...,271490188.0,4139547.0,65584518.0,Europe,2.0,396.0,0.0,151.0,0.9,2046.0


In [36]:
df.tail()

Unnamed: 0,#,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",...,TotalTests,Tests/1M pop,Population,Continent,1 Caseevery X ppl,1 Deathevery X ppl,1 Testevery X ppl,New Cases/1M pop,New Deaths/1M pop,Active Cases/1M pop
233,227,Vatican City,29,,,,29.0,,0,,...,,,799.0,Europe,28.0,,,,,
234,228,Western Sahara,10,,1.0,,9.0,,0,,...,,,626161.0,Africa,62616.0,626161.0,,,,
235,229,MS Zaandam,9,,2.0,,7.0,,0,,...,,,,,,,,,,
236,230,Tokelau,5,,,,,,5,,...,,,1378.0,Australia/Oceania,276.0,,,,,3628.0
237,231,China,503302,,5272.0,,379053.0,,118977,,...,160000000.0,110461.0,1448471400.0,Asia,2878.0,274748.0,9.0,,,82.0


In [37]:
#first, keeping only the rows whose '#' column is not empty

In [38]:
data=df[df['#']!=''].reset_index(drop=True)
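The filter above is a boolean mask: the comparison produces one True/False value per row, and indexing with it keeps only the True rows. A small sketch on a hypothetical four-row frame:

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "#": ["", "", "1", "2"],
        "Country,Other": ["Asia", "World", "USA", "India"],
    }
)

# True for rows that carry a rank in '#', i.e. the country rows
is_country = sample["#"] != ""
# reset_index(drop=True) renumbers the kept rows from 0 and discards the old index
countries = sample[is_country].reset_index(drop=True)
print(countries)
```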

In [39]:
#rows with a value in '#' are countries; rows with an empty '#' are aggregates such as continents and world totals

In [40]:
data

Unnamed: 0,#,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",...,TotalTests,Tests/1M pop,Population,Continent,1 Caseevery X ppl,1 Deathevery X ppl,1 Testevery X ppl,New Cases/1M pop,New Deaths/1M pop,Active Cases/1M pop
0,1,USA,106273691,,1155668,,104055782,,1062241,1895,...,1174878661,3509140,334805269,North America,3,290,0,,,3173
1,2,India,44733719,,530916,,44179712,,23091,,...,922033777,655491,1406631776,Asia,31,2649,2,,,16
2,3,France,39817657,+9922,165794,+58,39517646,+16280,134217,869,...,271490188,4139547,65584518,Europe,2,396,0,151,0.9,2046
3,4,Germany,38363343,,171169,,38104500,+6900,87674,,...,122332384,1458359,83883596,Europe,2,490,1,,,1045
4,5,Brazil,37319254,,700556,,36249161,,369537,,...,63776166,296146,215353593,South America,6,307,3,,,1716
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
226,227,Vatican City,29,,,,29,,0,,...,,,799,Europe,28,,,,,
227,228,Western Sahara,10,,1,,9,,0,,...,,,626161,Africa,62616,626161,,,,
228,229,MS Zaandam,9,,2,,7,,0,,...,,,,,,,,,,
229,230,Tokelau,5,,,,,,5,,...,,,1378,Australia/Oceania,276,,,,,3628


In [41]:
data = data.drop_duplicates(subset = ['Country,Other'])

In [42]:
#Worldometer reports data for up to three days (today and the two previous days), so duplicates need to be dropped
#drop_duplicates keeps the first occurrence of each country, so today's values are kept while the values for the previous two days are removed
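A short sketch of how `drop_duplicates` behaves with `subset`: by default (`keep="first"`) the first row for each country survives and later repeats are discarded. The repeated USA row below is hypothetical and only marks which row is removed.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "Country,Other": ["USA", "India", "USA"],
        "TotalCases": ["106,273,691", "44,733,719", "hypothetical older value"],
    }
)

# Rows are compared on 'Country,Other' only; the first match per country is kept
deduped = sample.drop_duplicates(subset=["Country,Other"], keep="first")
print(deduped)
```

The surviving rows keep their original index labels, so a `reset_index(drop=True)` may be useful afterwards.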

In [43]:
data.head()

Unnamed: 0,#,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",...,TotalTests,Tests/1M pop,Population,Continent,1 Caseevery X ppl,1 Deathevery X ppl,1 Testevery X ppl,New Cases/1M pop,New Deaths/1M pop,Active Cases/1M pop
0,1,USA,106273691,,1155668,,104055782,,1062241,1895.0,...,1174878661,3509140,334805269,North America,3,290,0,,,3173
1,2,India,44733719,,530916,,44179712,,23091,,...,922033777,655491,1406631776,Asia,31,2649,2,,,16
2,3,France,39817657,9922.0,165794,58.0,39517646,16280.0,134217,869.0,...,271490188,4139547,65584518,Europe,2,396,0,151.0,0.9,2046
3,4,Germany,38363343,,171169,,38104500,6900.0,87674,,...,122332384,1458359,83883596,Europe,2,490,1,,,1045
4,5,Brazil,37319254,,700556,,36249161,,369537,,...,63776166,296146,215353593,South America,6,307,3,,,1716


In [44]:
#some columns can be dropped if they are not needed, such as the ones below

In [45]:
cols = ['#', 
        'Tot\xa0Cases/1M pop',
        'Deaths/1M pop',
        'Tests/1M pop', 
        '1 Caseevery X ppl',
        '1 Deathevery X ppl',
        '1 Testevery X ppl']

In [46]:
cols

['#',
 'Tot\xa0Cases/1M pop',
 'Deaths/1M pop',
 'Tests/1M pop',
 '1 Caseevery X ppl',
 '1 Deathevery X ppl',
 '1 Testevery X ppl']

In [47]:
data_final = data.drop(cols, axis=1)
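`drop` raises a `KeyError` if any listed column is missing, which can happen if the site renames a heading. As a hedged sketch, `errors="ignore"` skips names that are not present. The one-row frame and the name `'Not A Column'` are hypothetical.

```python
import pandas as pd

sample = pd.DataFrame(
    {"#": ["231"], "Country,Other": ["China"], "Deaths/1M pop": ["4"]}
)

to_drop = ["#", "Deaths/1M pop", "Not A Column"]
# columns=... is equivalent to passing the list with axis=1;
# errors="ignore" skips labels that do not exist instead of raising KeyError
trimmed = sample.drop(columns=to_drop, errors="ignore")
print(list(trimmed.columns))
```

The trade-off is that a misspelled column name is silently ignored as well.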

In [48]:
data_final.head()

Unnamed: 0,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",TotalTests,Population,Continent,New Cases/1M pop,New Deaths/1M pop,Active Cases/1M pop
0,USA,106273691,,1155668,,104055782,,1062241,1895.0,1174878661,334805269,North America,,,3173
1,India,44733719,,530916,,44179712,,23091,,922033777,1406631776,Asia,,,16
2,France,39817657,9922.0,165794,58.0,39517646,16280.0,134217,869.0,271490188,65584518,Europe,151.0,0.9,2046
3,Germany,38363343,,171169,,38104500,6900.0,87674,,122332384,83883596,Europe,,,1045
4,Brazil,37319254,,700556,,36249161,,369537,,63776166,215353593,South America,,,1716


In [49]:
data_final.tail()

Unnamed: 0,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",TotalTests,Population,Continent,New Cases/1M pop,New Deaths/1M pop,Active Cases/1M pop
226,Vatican City,29,,,,29.0,,0,,,799.0,Europe,,,
227,Western Sahara,10,,1.0,,9.0,,0,,,626161.0,Africa,,,
228,MS Zaandam,9,,2.0,,7.0,,0,,,,,,,
229,Tokelau,5,,,,,,5,,,1378.0,Australia/Oceania,,,3628.0
230,China,503302,,5272.0,,379053.0,,118977,,160000000.0,1448471400.0,Asia,,,82.0


In [50]:
data_final

Unnamed: 0,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",TotalTests,Population,Continent,New Cases/1M pop,New Deaths/1M pop,Active Cases/1M pop
0,USA,106273691,,1155668,,104055782,,1062241,1895,1174878661,334805269,North America,,,3173
1,India,44733719,,530916,,44179712,,23091,,922033777,1406631776,Asia,,,16
2,France,39817657,+9922,165794,+58,39517646,+16280,134217,869,271490188,65584518,Europe,151,0.9,2046
3,Germany,38363343,,171169,,38104500,+6900,87674,,122332384,83883596,Europe,,,1045
4,Brazil,37319254,,700556,,36249161,,369537,,63776166,215353593,South America,,,1716
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
226,Vatican City,29,,,,29,,0,,,799,Europe,,,
227,Western Sahara,10,,1,,9,,0,,,626161,Africa,,,
228,MS Zaandam,9,,2,,7,,0,,,,,,,
229,Tokelau,5,,,,,,5,,,1378,Australia/Oceania,,,3628


In [51]:
pd.set_option('display.max_rows', None)

In [52]:
data_final

Unnamed: 0,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",TotalTests,Population,Continent,New Cases/1M pop,New Deaths/1M pop,Active Cases/1M pop
0,USA,106273691,,1155668.0,,104055782.0,,1062241.0,1895.0,1174878661.0,334805269.0,North America,,,3173.0
1,India,44733719,,530916.0,,44179712.0,,23091.0,,922033777.0,1406631776.0,Asia,,,16.0
2,France,39817657,9922.0,165794.0,58.0,39517646.0,16280.0,134217.0,869.0,271490188.0,65584518.0,Europe,151.0,0.9,2046.0
3,Germany,38363343,,171169.0,,38104500.0,6900.0,87674.0,,122332384.0,83883596.0,Europe,,,1045.0
4,Brazil,37319254,,700556.0,,36249161.0,,369537.0,,63776166.0,215353593.0,South America,,,1716.0
5,Japan,33491480,9500.0,74002.0,21.0,21722070.0,328.0,11695408.0,56.0,98452116.0,125584838.0,Asia,76.0,0.2,93128.0
6,S. Korea,30871740,14465.0,34296.0,7.0,30655934.0,8987.0,181510.0,126.0,15804065.0,51329899.0,Asia,282.0,0.1,3536.0
7,Italy,25695311,,189089.0,,25372554.0,,133668.0,84.0,270012027.0,60262770.0,Europe,,,2218.0
8,UK,24448729,,209396.0,,24217438.0,4283.0,21895.0,,522526476.0,68497907.0,Europe,,,320.0
9,Russia,22679739,8636.0,397420.0,36.0,22049429.0,11807.0,232890.0,,273400000.0,145805947.0,Europe,59.0,0.2,1597.0


In [53]:
pd.set_option('display.max_rows', 100)
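Setting `display.max_rows` and then setting it back can also be done with `pd.option_context`, which restores the previous value automatically when the block ends. A minimal sketch on a hypothetical 200-row frame:

```python
import pandas as pd

sample = pd.DataFrame({"value": range(200)})
default_max_rows = pd.get_option("display.max_rows")

# Inside the block every row is rendered; the old setting returns on exit
with pd.option_context("display.max_rows", None):
    inside = pd.get_option("display.max_rows")
    full_text = repr(sample)

after = pd.get_option("display.max_rows")
truncated_text = repr(sample)
print(len(full_text.splitlines()), len(truncated_text.splitlines()))
```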

In [54]:
data_final

Unnamed: 0,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",TotalTests,Population,Continent,New Cases/1M pop,New Deaths/1M pop,Active Cases/1M pop
0,USA,106273691,,1155668,,104055782,,1062241,1895,1174878661,334805269,North America,,,3173
1,India,44733719,,530916,,44179712,,23091,,922033777,1406631776,Asia,,,16
2,France,39817657,+9922,165794,+58,39517646,+16280,134217,869,271490188,65584518,Europe,151,0.9,2046
3,Germany,38363343,,171169,,38104500,+6900,87674,,122332384,83883596,Europe,,,1045
4,Brazil,37319254,,700556,,36249161,,369537,,63776166,215353593,South America,,,1716
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
226,Vatican City,29,,,,29,,0,,,799,Europe,,,
227,Western Sahara,10,,1,,9,,0,,,626161,Africa,,,
228,MS Zaandam,9,,2,,7,,0,,,,,,,
229,Tokelau,5,,,,,,5,,,1378,Australia/Oceania,,,3628


In [55]:
data_final.to_excel('covidcasesdeaths_April5th2023.xlsx', index=False) #index=False saves the file without the index, which would otherwise appear as an "Unnamed: 0" column and have to be dropped every time the file is opened
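The effect of `index=False` can be seen in a small round trip. The sketch below uses CSV written to an in-memory buffer instead of Excel, purely so that it runs without an Excel writer installed; the two-row frame is illustrative.

```python
import io
import pandas as pd

sample = pd.DataFrame(
    {"Country,Other": ["USA", "India"], "Continent": ["North America", "Asia"]}
)

# Default: the index is written as a first column with an empty header
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the real columns are written
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```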

### Continued in Part 2: Covid-19 Vaccines