## Author: Dere Abdulhameed Abiola
## Web Scraping of Nigeria COVID-19 cases

It is assumed that the reader already has the necessary libraries installed. If not, install them with
`pip install beautifulsoup4` and `pip install requests`.

If you are using Anaconda, both libraries are likely already installed.

### Objective: To scrape the COVID-19 case data from the NCDC microsite and convert it into a format that is understandable by the computer

In [1]:
# import the required libraries
from bs4 import BeautifulSoup
import requests
from csv import writer

In [2]:
# url of the ncdc microsite
url = 'https://covid19.ncdc.gov.ng/'

In [3]:
# send a GET request to the url and store the response
page = requests.get(url)
page

<Response [200]>

In [4]:
# Parse the HTML content of the page
covid = BeautifulSoup(page.content, 'html.parser')
covid

<!DOCTYPE html>

<html lang="en">
<meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<head>
<title>NCDC Coronavirus COVID-19 Microsite</title>
<!--[if lt IE 11]>
    	<script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
    	<script src="https://oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>
    	<![endif]-->
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1.0, user-scalable=0, minimal-ui" name="viewport"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible">
<meta content="" name="description">
<meta content="" name="keywords"/>
<meta content="Codedthemes" name="author">
<!-- Google Tag Manager -->
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
  new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
  j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
  'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
  })(w

In [5]:
# Since we are interested in scraping the table, we need to find the table element in the HTML
table = covid.find('table')
table

<table id="custom1">
<thead>
<tr>
<th>States Affected</th>
<th>No. of Cases (Lab Confirmed)</th>
<th>No. of Cases (on admission)</th>
<th>No. Discharged</th>
<th>No. of Deaths</th>
</tr>
</thead>
<tbody>
<tr>
<td>
                                                Lagos
                                            </td>
<td>98,293
                                            </td>
<td>17,458
                                            </td>
<td>80,066
                                            </td>
<td>769
                                            </td>
</tr>
<tr>
<td>
                                                FCT
                                            </td>
<td>28,150
                                            </td>
<td>1,154
                                            </td>
<td>26,749
                                            </td>
<td>247
                                            </td>
</tr>
<tr>
<td>
                                                Rivers
            

In [6]:
# collect every row of the table; the first row is the header
rows = table.find_all("tr")
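
The first entry in `rows` is the header row from `<thead>`, which is why the loops in this notebook start at `rows[1:]`. As a minimal sketch, the column names could be read from that row instead of being typed by hand. The names `sample_html`, `sample_table`, `sample_rows` and `first_state` are hypothetical, and the HTML snippet is a shortened stand-in for the live page so that the example runs offline.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the NCDC table (assumption: same structure as the live page)
sample_html = """
<table id="custom1">
<thead>
<tr>
<th>States Affected</th>
<th>No. of Cases (Lab Confirmed)</th>
<th>No. of Cases (on admission)</th>
<th>No. Discharged</th>
<th>No. of Deaths</th>
</tr>
</thead>
<tbody>
<tr>
<td>
    Lagos
</td>
<td>98,293
</td>
<td>17,458
</td>
<td>80,066
</td>
<td>769
</td>
</tr>
</tbody>
</table>
"""

sample_table = BeautifulSoup(sample_html, 'html.parser').find('table')
sample_rows = sample_table.find_all('tr')

# The first row holds the <th> cells, so it supplies the column names
header = [cell.get_text(strip=True) for cell in sample_rows[0].find_all('th')]
# Every remaining row holds one state's figures
first_state = [cell.get_text(strip=True) for cell in sample_rows[1].find_all('td')]

print(header)
print(first_state)
```

Reading the header from the page keeps the column names in step with the site if the table ever gains or renames a column.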

In [7]:
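# skip the header row and print the cleaned text of each remaining row
for row in rows[1:]:
    cells = row.find_all(['td', 'th'])

    # strip=True removes the whitespace and line breaks around each value
    cells_text = [cell.get_text(strip=True) for cell in cells]
    print(cells_text)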

['Lagos', '98,293', '17,458', '80,066', '769']
['FCT', '28,150', '1,154', '26,749', '247']
['Rivers', '16,389', '144', '16,091', '154']
['Kaduna', '11,175', '29', '11,058', '88']
['Plateau', '10,227', '3', '10,149', '75']
['Oyo', '10,185', '355', '9,629', '201']
['Edo', '7,658', '16', '7,322', '320']
['Ogun', '5,795', '21', '5,692', '82']
['Delta', '5,312', '31', '5,170', '111']
['Ondo', '5,124', '344', '4,673', '107']
['Kano', '4,915', '92', '4,696', '127']
['Akwa Ibom', '4,625', '87', '4,494', '44']
['Kwara', '4,541', '302', '4,175', '64']
['Gombe', '3,265', '115', '3,086', '64']
['Osun', '3,261', '67', '3,102', '92']
['Enugu', '2,952', '13', '2,910', '29']
['Anambra', '2,743', '46', '2,678', '19']
['Nasarawa', '2,697', '313', '2,345', '39']
['Imo', '2,426', '117', '2,252', '57']
['Katsina', '2,399', '23', '2,339', '37']
['Abia', '2,152', '6', '2,112', '34']
['Benue', '2,129', '340', '1,764', '25']
['Ebonyi', '2,064', '28', '2,004', '32']
['Ekiti', '1,978', '34', '1,916', '28']
['Bau
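
Every value in the rows above is still a string, and the counts carry thousands separators, so they cannot be summed or sorted numerically yet. The sketch below shows one possible way to turn them into integers; the helper name `to_int` is hypothetical and not part of the original notebook.

```python
def to_int(text):
    """Convert a count such as '98,293' to the integer 98293."""
    return int(text.replace(',', ''))

# One row copied from the printed output
row = ['Lagos', '98,293', '17,458', '80,066', '769']

# Keep the state name as text and convert the four counts to integers
state = row[0]
counts = [to_int(value) for value in row[1:]]
print(state, counts)

# With integers, arithmetic works: for this row the cases on admission,
# discharged and deaths add up to the lab-confirmed total
print(counts[0] == sum(counts[1:]))
```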

In [9]:
# export into a csv file
with open('covid_19.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
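    # column names, matching the table header on the page
    header = ['States Affected', 'No. of Cases (Lab Confirmed)', 'No. of Cases (on admission)', 'No. Discharged', 'No. of Deaths']
    thewriter.writerow(header)

    # skip the header row and write one line per state
    for row in rows[1:]:
        cells = row.find_all(['td', 'th'])
        cells_text = [cell.get_text(strip=True) for cell in cells]
        thewriter.writerow(cells_text)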
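
To confirm that the export worked, the file can be read back with the standard `csv` module. This is a minimal sketch under stated assumptions: `load_cases`, `demo_path` and the small demonstration file are hypothetical, and the demo uses the column names from the page's table header with two rows copied from the scraped output, so the example runs without the live site.

```python
import csv
import os
import tempfile

def load_cases(path):
    """Read a CSV laid out like covid_19.csv and return one dict per state."""
    with open(path, encoding='utf8', newline='') as f:
        reader = csv.reader(f)
        header = next(reader)  # the first line holds the column names
        records = []
        for row in reader:
            # the first column is the state name, the rest are counts
            record = {header[0]: row[0]}
            for name, value in zip(header[1:], row[1:]):
                record[name] = int(value.replace(',', ''))
            records.append(record)
    return records

# Small demonstration file (assumption: same layout as covid_19.csv)
demo_path = os.path.join(tempfile.mkdtemp(), 'covid_19_demo.csv')
with open(demo_path, 'w', encoding='utf8', newline='') as f:
    demo_writer = csv.writer(f)
    demo_writer.writerow(['States Affected', 'No. of Cases (Lab Confirmed)',
                          'No. of Cases (on admission)', 'No. Discharged', 'No. of Deaths'])
    demo_writer.writerow(['Lagos', '98,293', '17,458', '80,066', '769'])
    demo_writer.writerow(['FCT', '28,150', '1,154', '26,749', '247'])

records = load_cases(demo_path)
print(records[0])
```

Calling `load_cases('covid_19.csv')` on the exported file should work the same way, provided every count column contains only digits and commas.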