# Scraping the CDC Wonder Database

This notebook automates the CDC WONDER query form with Selenium, which drives a real Chrome browser from Python.

Selenium tutorials:

- http://www.eyalfrank.com/scraping-web-forms-with-selenium/
- https://realpython.com/modern-web-automation-with-python-and-selenium/
- https://medium.com/the-andela-way/introduction-to-web-scraping-using-selenium-7ec377a8cf72

Extracting option values from a drop-down menu (Stack Overflow):
https://stackoverflow.com/questions/48286382/extract-option-values-from-drop-down-menu-using-selenium-beautiful-soup-pytho

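NOTE: This method requires ChromeDriver:
https://sites.google.com/a/chromium.org/chromedriver/downloads

Download the ChromeDriver version that matches your installed Chrome, then put the `chromedriver` executable in a directory on your system `PATH` (the shell's search path for programs, not the Python path). You can list those directories by running `echo $PATH` (macOS/Linux) or `echo %PATH%` (Windows Command Prompt).

Selenium 4.6 and later also ship with Selenium Manager, which downloads a matching driver automatically if none is found on `PATH`.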
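As a quick check from Python, the standard library's `shutil.which` searches the same `PATH` directories the shell uses, so it reports whether (and where) `chromedriver` can be found. This is a small sketch; the helper name `find_chromedriver` is hypothetical.

```python
import shutil
from typing import Optional


def find_chromedriver(name: str = "chromedriver") -> Optional[str]:
    """Return the full path to the ChromeDriver executable, or None if it is not on PATH."""
    # shutil.which walks the directories listed in the PATH environment variable
    # (on Windows it also tries extensions such as .exe from PATHEXT)
    return shutil.which(name)


if __name__ == "__main__":
    location = find_chromedriver()
    if location is None:
        print("chromedriver was not found on PATH")
    else:
        print(f"chromedriver found at {location}")
```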

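```python
from selenium import webdriver
# By supplies locator strategies for find_element
# (the old find_element_by_* methods were removed in Selenium 4.3)
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
```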

## Getting past the agreement page

```python
from selenium.webdriver.common.by import By

# The CDC WONDER page we will direct our WebDriver to
url = 'https://wonder.cdc.gov/ucd-icd10.html'
# Create the WebDriver object; this launches Chrome via ChromeDriver
driver = webdriver.Chrome()
# Load the page, which first shows a data-use agreement
driver.get(url)
# Locate the "I Agree" button by its name attribute
agree_tab = driver.find_element(By.NAME, 'action-I Agree')
```

```python
# "Click" the "I Agree" button to accept the agreement and load the query form
agree_tab.click()
```
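Button presses on this form reload the page, and an element located before a reload can raise `StaleElementReferenceException` if it is used afterwards. Besides re-locating elements after each click, one option is to wrap a locate-and-act step in a small retry loop. The sketch below is a generic, hypothetical `retry` helper written in plain Python so it runs without a browser; with Selenium you would pass `selenium.common.exceptions.StaleElementReferenceException` as the exception type. For waiting on a specific condition, Selenium's own `WebDriverWait(driver, timeout).until(...)` is the built-in tool.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Call `action` until it succeeds, retrying only on the given exception types.

    Re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay)  # give the page time to finish reloading
    raise AssertionError("unreachable")


# Hypothetical Selenium usage (requires a live driver):
# from selenium.common.exceptions import StaleElementReferenceException
# retry(lambda: driver.find_element(By.NAME, 'finder-action-D76.V9-Open').click(),
#       exceptions=(StaleElementReferenceException,))
```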

## Defining the elements of the web form

### 1. Table Layout

```python
from selenium.webdriver.common.by import By

# Wrap each of the five "Group By" drop-down menus (named B_1 to B_5) in a Select
# object, which provides helpers for choosing options
group_by_1 = Select(driver.find_element(By.NAME, 'B_1'))
group_by_2 = Select(driver.find_element(By.NAME, 'B_2'))
group_by_3 = Select(driver.find_element(By.NAME, 'B_3'))
group_by_4 = Select(driver.find_element(By.NAME, 'B_4'))
group_by_5 = Select(driver.find_element(By.NAME, 'B_5'))
```

```python
# Example: choose a variable in the first and fifth "Group By" menus
group_by_1.select_by_visible_text('Gender')
group_by_5.select_by_visible_text('Place of Death')
```

```python
# Which variables do I want to group by?
```
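One way to record the answer in code is to keep the chosen grouping variables in an ordered list and map them onto the five "Group By" menus (`B_1` to `B_5`) in order. The helper below is a hypothetical sketch: it only checks that there are at most five choices and rejects duplicates (an assumption about what makes sense, not a documented CDC WONDER rule). The labels must match the menus' visible option text exactly, such as `'Gender'` above, and the helper does not know which combinations WONDER actually accepts.

```python
from typing import Dict, List

MENU_NAMES = [f"B_{i}" for i in range(1, 6)]  # names of the five "Group By" menus


def plan_group_by(variables: List[str]) -> Dict[str, str]:
    """Map an ordered list of grouping variables onto the B_1..B_5 menu names."""
    if len(variables) > len(MENU_NAMES):
        raise ValueError(f"at most {len(MENU_NAMES)} grouping variables are allowed")
    if len(set(variables)) != len(variables):
        raise ValueError("each grouping variable may be used only once")
    # zip stops at the shorter list, so unused menus are simply left out
    return dict(zip(MENU_NAMES, variables))


# With a live driver, the plan could be applied menu by menu:
# for menu_name, label in plan_group_by(['Gender', 'Place of Death']).items():
#     Select(driver.find_element(By.NAME, menu_name)).select_by_visible_text(label)
```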

### 2. Location

```python
from selenium.webdriver.common.by import By

# Gather information about each possible location in the menu.
# We want every county in the country.
# Collapse the location tree so that only the states are listed
driver.find_element(By.NAME, 'finder-action-D76.V9-Close All').click()
loc_select = Select(driver.find_element(By.NAME, 'F_D76.V9'))
loc_select.deselect_all()
# Get a list of all the states, skipping the first option
# (presumably the all-locations entry rather than a state)
states = [opt.text for opt in loc_select.options][1:]

# Now build the list of counties.
# Each button press reloads the page, so the location list is looked up again on
# every pass instead of reusing an element found before the reload.
# If you still get errors about stale references, try
#     driver.implicitly_wait(0.5)
# which makes element lookups keep retrying for up to 0.5 s before failing.
counties = []  # one sub-list of counties per state
for state in states:
    print(state)
    # Select the current state
    Select(driver.find_element(By.NAME, 'F_D76.V9')).select_by_visible_text(state)

    # Expand it to counties
    driver.find_element(By.NAME, 'finder-action-D76.V9-Open').click()

    # Grab the newly listed counties and add them to the list of counties.
    # Note: this keeps only labels containing 'County', so county-equivalents
    # such as Louisiana parishes or Alaska boroughs would likely be skipped.
    all_loc = [opt.text for opt in Select(driver.find_element(By.NAME, 'F_D76.V9')).options]
    cur_counties = [loc for loc in all_loc if 'County' in loc]
    counties.append(cur_counties)

    # Collapse and deselect the current state
    driver.find_element(By.NAME, 'finder-action-D76.V9-Close All').click()
    Select(driver.find_element(By.NAME, 'F_D76.V9')).deselect_all()

print(counties)

# TO DO: save the counties variable as a separate cache file so you don't have to keep scraping
# this during the design phase.
```
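The TO DO above could be handled with a small JSON cache: scrape once, write the result to disk, and load it on later runs. This is a sketch; the file name `counties.json` and the helper `load_or_scrape` are hypothetical, and `scrape` stands for any zero-argument function that returns the per-state county lists (for example, the county loop wrapped in a function).

```python
import json
from pathlib import Path
from typing import Callable, List

CACHE_PATH = Path("counties.json")  # hypothetical cache location


def load_or_scrape(
    scrape: Callable[[], List[List[str]]],
    cache_path: Path = CACHE_PATH,
    refresh: bool = False,
) -> List[List[str]]:
    """Return cached county lists if the cache exists; otherwise scrape and cache them."""
    if cache_path.exists() and not refresh:
        with cache_path.open(encoding="utf-8") as f:
            return json.load(f)
    data = scrape()
    with cache_path.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)  # lists of strings round-trip exactly through JSON
    return data
```

Pass `refresh=True` to force a fresh scrape, for example after the location menu changes.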

### 3. Demographics

### 4. Year and month

```python
# # Year and month "currently selected" field
# ym_field = driver.find_element(By.NAME, 'I_D76.V1')
# # List of years/months
# date_list = driver.find_element(By.NAME, 'F_D76.V1')

# # Open (expand) button
# date_open_button = driver.find_element(By.NAME, 'finder-action-D76.V1-Open')
# # Close (collapse) button
# date_close_button = driver.find_element(By.NAME, 'finder-action-D76.V1-Close')
# # Close (collapse) all button
# date_closeALL_button = driver.find_element(By.NAME, 'finder-action-D76.V1-Close All')
```

```python
from selenium.webdriver.common.by import By

# The available years range from 1999 to 2016
# Example: select the year 2009
date_select = Select(driver.find_element(By.NAME, 'F_D76.V1'))
date_select.deselect_all()
date_select.select_by_value('2009')
```
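To query several years instead of one, the same `Select` object can select more than one value, since this list is multi-select (Selenium's `deselect_all()` only works on multi-select lists). The sketch below builds the option values for a range of years, assuming, as the `select_by_value('2009')` call suggests, that each year's option value is just the four-digit year. The 1999 to 2016 bounds come from the comment above and may differ on the live site; the helper name `year_values` is hypothetical.

```python
from typing import List

FIRST_YEAR, LAST_YEAR = 1999, 2016  # range stated in this notebook; may change as data is added


def year_values(start: int, end: int) -> List[str]:
    """Return option values for every year from start to end, inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    if start < FIRST_YEAR or end > LAST_YEAR:
        raise ValueError(f"years must fall within {FIRST_YEAR}-{LAST_YEAR}")
    return [str(year) for year in range(start, end + 1)]


# Hypothetical usage with a live driver:
# date_select = Select(driver.find_element(By.NAME, 'F_D76.V1'))
# date_select.deselect_all()
# for value in year_values(2009, 2011):
#     date_select.select_by_value(value)
```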

### 5. Weekday, autopsy, place of death

### 6. Cause of death

### 7. Other options

```python
from selenium.webdriver.common.by import By

# Locate the 'Send' button; clicking it (send_button.click()) submits the query
send_button = driver.find_element(By.NAME, 'action-Send')
```

```python
# Close the browser and end the WebDriver session
driver.quit()
```