# Data Gathering  
&nbsp; The city hall of Ubatuba, on the northern coast of the state of São Paulo, publishes daily bulletins on the COVID-19 pandemic. These bulletins are PDF files that can be downloaded directly from the website:  
[https://www.ubatuba.sp.gov.br/covid-19/](https://www.ubatuba.sp.gov.br/covid-19/)  
  
&nbsp;Downloading a large number of files by hand takes a long time, so I wrote this script to do it automatically.

In [1]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time

&nbsp;Selenium automates web browsers: a script can open a page, locate its elements, and click them as a user would.
&nbsp;To make Chrome download each PDF when its link is clicked, instead of opening it in the built-in viewer, a dictionary of browser preferences is passed to `Options` (through `add_experimental_option("prefs", ...)`) when the web driver object is created.

In [2]:
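# Set up the web driver
import os

options = Options()
options.add_argument("--start-maximized")
# The browser window is visible by default, so the deprecated options.headless flag is not needed
download_dir = os.path.abspath("./Pdf_Files")  # Chrome expects an absolute path for the download folder
profile = {
    "download.default_directory": download_dir,  # Change default directory for downloads
    "download.prompt_for_download": False,  # Download files without asking where to save them
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True,  # Download PDFs instead of opening them in Chrome's viewer
    "profile.default_content_setting_values.automatic_downloads": 1,  # Allow multiple automatic downloads
}
options.add_experimental_option("prefs", profile)
# Selenium 4 passes the chromedriver path through a Service object instead of executable_path
service = Service(executable_path="./SeleniumWebDriver/chromedriver")
b = webdriver.Chrome(service=service, options=options)
wait = WebDriverWait(b, 10, poll_frequency=1)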

In [3]:
# Access the web page and wait until the table of files is visible

b.get('https://www.ubatuba.sp.gov.br/covid-19/')
wait.until(EC.visibility_of_element_located((By.TAG_NAME, 'table')))

In [4]:
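# Click every file link in the table, then click the "next" page button.
# On the last page the button no longer has exactly this class (DataTables-style tables
# typically add "disabled"), so the lookup raises NoSuchElementException and the loop ends.
from selenium.common.exceptions import NoSuchElementException

while True:
    try:
        # File links are inside "table > tbody > tr" elements.
        table = wait.until(EC.visibility_of_element_located((By.TAG_NAME, 'table')))
        tbody = table.find_element(By.TAG_NAME, 'tbody')
        for row in tbody.find_elements(By.TAG_NAME, 'tr'):
            bttn = row.find_element(By.TAG_NAME, 'a')
            b.execute_script("arguments[0].click()", bttn)  # Click via JavaScript, no scrolling into view needed
            time.sleep(2)  # Give each download time to start

        next_bttn = b.find_element(By.XPATH, '//*[@class="paginate_button next"]')
        b.execute_script("arguments[0].click()", next_bttn)
        time.sleep(1)  # Let the next page of the table render
    except NoSuchElementException as e:
        print(e)
        break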

Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@class="paginate_button next"]"}
  (Session info: chrome=96.0.4664.45)
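The loop does not check that the last downloads have finished before the browser is closed with `b.quit()`. A minimal sketch of such a check follows, assuming Chrome's usual behavior of writing in-progress downloads as files ending in `.crdownload`; the function name `wait_for_downloads` and its timeout values are my own choices, not part of Selenium.

```python
import os
import time


def wait_for_downloads(folder, timeout=120.0, poll=1.0):
    """Wait until no Chrome partial-download files remain in `folder`.

    Assumption: Chrome names in-progress downloads with a ".crdownload" suffix.
    Returns True if all downloads finished, False if `timeout` seconds passed first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Collect files that are still being written by the browser
        pending = [name for name in os.listdir(folder) if name.endswith(".crdownload")]
        if not pending:
            return True
        time.sleep(poll)
    return False
```

Calling `wait_for_downloads(os.path.abspath("./Pdf_Files"))` after the loop and only then `b.quit()` avoids closing the browser in the middle of a download. Polling the folder is simple and needs nothing beyond the standard library.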



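The loop stopped on the last page, when the "next" button could no longer be found (the message above). By then, the "Pdf_Files" folder contained 689 files.  
Besides the bulletins I need for the analysis, other kinds of documents, such as decrees, guidelines, and resolutions, were also downloaded.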
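One way to separate the bulletins from the other documents is by file name. The sketch below assumes, hypothetically, that bulletin file names contain the Portuguese word "boletim"; this naming pattern is an assumption and the `keyword` parameter should be checked against the actual files in `Pdf_Files`. The function name `split_bulletins` is also illustrative.

```python
import os


def split_bulletins(folder, keyword="boletim"):
    """Split the PDFs in `folder` into (bulletins, others).

    Hypothetical rule: a PDF counts as a bulletin if its name contains
    `keyword`, ignoring case. Non-PDF files (e.g. partial downloads) are skipped.
    """
    bulletins, others = [], []
    for name in sorted(os.listdir(folder)):
        lower = name.lower()
        if not lower.endswith(".pdf"):
            continue  # skip anything that is not a finished PDF
        if keyword.lower() in lower:
            bulletins.append(name)
        else:
            others.append(name)
    return bulletins, others
```

For example, `bulletins, others = split_bulletins("./Pdf_Files")` followed by `print(len(bulletins), len(others))` shows how many of the downloaded files match the assumed pattern, which is a quick way to check whether the keyword fits the real names.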