# Scrape Dashboard Data for COVID Cases 
## in Pinellas County Schools

In this notebook, we develop tools to scrape and analyze the data contained in the Pinellas County Schools COVID database. The tools include countywide totals, school-by-school analyses, and data visualization.

## Navigate to the url and click the submit button
This web page offers a search interface for COVID results from the 2021-2022 school year in Pinellas County Schools. We do not need the search functionality; we need the data in the underlying database. To get the whole database, one only needs to click the `Submit` button without any filters applied. The web page then dynamically displays a table.

The cell below loads packages, sets the correct url, and clicks the `Submit` button. After that, the notebook picks out the table and creates a dataframe. 



In [26]:
#Load packages
import requests
from bs4 import BeautifulSoup
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as WDW
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd
import lxml

#Set URL
URL = 'https://www.pcsb.org/covid19cases'

#Set up selenium web interaction
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
#options.add_argument('--headless')         #Operates webpage without viewing through Chrome

#Pass the options to the driver; otherwise they have no effect (Selenium 4 syntax)
driver = webdriver.Chrome(service=Service("C:/webdrivers/chromedriver.exe"), options=options)

#Open webpage with webdriver; un-comment the --headless argument above if you don't want to
#view the page.
driver.get(URL)

#Let element lookups retry for up to 5 seconds while the page finishes loading
driver.implicitly_wait(5)

#Now that the web page is open and operable, we need to click on the submit
#button. Clicking it with no filters applied allows us to get all of the data in a table.

submit_button = driver.find_element(By.CLASS_NAME, 'ui-btn-general-primary')
submit_button.click()


## Scraping the table into a df

Once the `Submit` button has been clicked with no filters, all the data are in the table. However, the table is split into different pages. 

Next, we define two functions. The first reads the table on the current page into a dataframe. The second clicks a paging button using Selenium. We then call these functions for each of the paging buttons in turn.

In [111]:
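#Define functions to get table and to go to the next page
from io import StringIO

def get_table(driver):
    #Parse the current page and take the first table on it
    soup = BeautifulSoup(driver.page_source, 'lxml')
    table = soup.find('table')

    #Read the table; read_html returns a list of dataframes, so take the first
    new_df = pd.read_html(StringIO(str(table)))

    return new_df[0]

def cycle_table_pages(driver, page_index=2):
    #Find the paging link in position page_index of the paging container
    page_link = driver.find_element(By.XPATH, f'//*[@id="ui-paging-container"]/ul/li[{page_index}]/a')
    #Click through JavaScript so that an overlapping element cannot intercept the click
    driver.execute_script('arguments[0].click();', page_link)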

covid_df = get_table(driver)

print(covid_df.head())

         Date              Locations affected  Number of positive employees  \
0  2021/08/12                 Nina Harris ESE                             0   
1  2021/08/12     Northwest Elementary School                             0   
2  2021/08/12         Oak Grove Middle School                             0   
3  2021/08/12      Oakhurst Elementary School                             0   
4  2021/08/12  Orange Grove Elementary School                             0   

   Number of positive students  
0                            1  
1                            2  
2                            3  
3                            1  
4                            5  
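
`get_table` returns only the page that is currently displayed, so the per-page dataframes still need to be combined. A minimal sketch of one way to do that with `pd.concat` follows. The two stand-in frames simply split the five rows printed above and do not reflect real page boundaries; the names `page_one`, `page_two`, and `combined` are illustrative.

```python
import pandas as pd

columns = ['Date', 'Locations affected',
           'Number of positive employees', 'Number of positive students']

# Stand-in "pages" built from the rows printed above (assumed split, for illustration only)
page_one = pd.DataFrame([
    ['2021/08/12', 'Nina Harris ESE', 0, 1],
    ['2021/08/12', 'Northwest Elementary School', 0, 2],
    ['2021/08/12', 'Oak Grove Middle School', 0, 3],
], columns=columns)
page_two = pd.DataFrame([
    ['2021/08/12', 'Oakhurst Elementary School', 0, 1],
    ['2021/08/12', 'Orange Grove Elementary School', 0, 5],
], columns=columns)

# Collect the pages in a list and concatenate once; ignore_index renumbers the rows from 0
frames = [page_one, page_two]
combined = pd.concat(frames, ignore_index=True)

print(combined.shape)
```

Collecting the frames in a list and concatenating once avoids relying on `DataFrame.append`, which returns a new dataframe rather than modifying the original in place.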


In [112]:
#Loop through the paging buttons dynamically, i.e. not knowing how
#many there are (it changes every day).
#Page 1 is already displayed after Submit, so scrape it before clicking anything
frames = [get_table(driver)]

paging_buttons = driver.find_element(By.ID, 'ui-paging-container').text
for number in paging_buttons.split('\n'):
    #Skip the '...' placeholder and page 1: clicking the current-page link raised
    #ElementClickInterceptedException because the previewTabIcon element would receive the click
    if number in ('...', '1'):
        continue
    new_xpath = '//*[@id="ui-paging-container"]/ul/li[' + number + ']/a'
    button_to_click = driver.find_element(By.XPATH, new_xpath)
    #Click through JavaScript so that an overlapping element cannot intercept the click
    driver.execute_script('arguments[0].click();', button_to_click)
    print(new_xpath)
    time.sleep(1)         #Give the table a moment to refresh
    frames.append(get_table(driver))

#DataFrame.append returns a new dataframe (and was removed in pandas 2.0),
#so collect the pages in a list and concatenate once
data = pd.concat(frames, ignore_index=True)
print(data.head())
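
The paging container's text mixes page numbers with a `...` placeholder, so it can help to separate the parsing of that text from the clicking. The sketch below is a minimal, browser-free illustration. The helper name `pages_to_visit` and the sample string are assumptions, and it presumes that the first page is already displayed once `Submit` has been clicked.

```python
def pages_to_visit(paging_text, current_page=1):
    """Return the numeric page labels in paging_text, excluding the current page."""
    pages = []
    for label in paging_text.split('\n'):
        label = label.strip()
        # Skip placeholders such as '...' and anything else that is not a number
        if not label.isdigit():
            continue
        page = int(label)
        if page != current_page:
            pages.append(page)
    return pages


# Hypothetical text of the paging container; the real labels change every day
sample_text = '1\n2\n3\n...\n12'
print(pages_to_visit(sample_text))  # [2, 3, 12]
```

Pages hidden behind the `...` placeholder would presumably only become reachable by re-reading the container after each click, which this sketch does not attempt.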


In [114]:
#Total the employee and student counts for each date

print(covid_df.groupby(['Date']).sum(numeric_only=True))

            Number of positive employees  Number of positive students
Date                                                                 
2021/08/12                             5                           34
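
The same `groupby` pattern can give school-by-school totals. The sketch below is a minimal illustration that rebuilds the five rows printed earlier as a stand-in for the full `covid_df`; the names `sample_df`, `count_columns`, and `by_school` are assumptions.

```python
import pandas as pd

# Stand-in for covid_df, rebuilt from the five rows printed earlier in the notebook
sample_df = pd.DataFrame({
    'Date': ['2021/08/12'] * 5,
    'Locations affected': [
        'Nina Harris ESE',
        'Northwest Elementary School',
        'Oak Grove Middle School',
        'Oakhurst Elementary School',
        'Orange Grove Elementary School',
    ],
    'Number of positive employees': [0, 0, 0, 0, 0],
    'Number of positive students': [1, 2, 3, 1, 5],
})

count_columns = ['Number of positive employees', 'Number of positive students']

# Sum the counts for each school across all dates, largest student count first
by_school = (
    sample_df.groupby('Locations affected')[count_columns]
    .sum()
    .sort_values('Number of positive students', ascending=False)
)

print(by_school.head())
```

Selecting the count columns before summing keeps the text columns out of the result regardless of the pandas version.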
