**Objective**: Scrape license data from the [Arkansas State Board of Nursing website](https://www.ark.org/arsbn/statuswatch/index.php/nurse/search) for all nurses whose names contain 'zach'. Scrape every license for each nurse, not only the first one listed under each name.

1. Automate the search for 'zach'.
2. Loop through the result pages.
    - Keep clicking 'Next >' until that link is no longer there (try/except)
3. Loop through the rows on each page.
    - Click the name, retrieve the element info, go back to the search results
4. Loop through the licenses on each nurse's page.
    - Instead of starting the XPath from the main content, start from the class associated with the license tables
5. Organize the data into a dataframe and export it as .csv.
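
Step 2 can be sketched without a browser. The snippet below is a minimal illustration, not the notebook's actual loop: `scrape_all_pages`, `FakePager`, and the `stop_on` parameter are hypothetical names introduced here, and `LookupError` stands in for whatever exception the real 'Next >' lookup raises (with Selenium that would typically be `NoSuchElementException`).

```python
def scrape_all_pages(read_rows, click_next, stop_on=Exception):
    """Collect rows from every results page.

    read_rows  -- callable returning the rows on the current page
    click_next -- callable that moves to the next page and raises
                  `stop_on` when there is no 'Next >' link left
    """
    results = []
    while True:
        results.extend(read_rows())
        try:
            click_next()
        except stop_on:
            # No 'Next >' link: this was the last page
            break
    return results


class FakePager:
    """Hypothetical stand-in for the browser, used only for this demo."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    def read_rows(self):
        return list(self.pages[self.index])

    def click_next(self):
        if self.index + 1 >= len(self.pages):
            raise LookupError("no 'Next >' link")
        self.index += 1


pager = FakePager([["A", "B"], ["C"], ["D", "E"]])
all_rows = scrape_all_pages(pager.read_rows, pager.click_next, stop_on=LookupError)
print(all_rows)
```

Passing a specific exception type as `stop_on` keeps unrelated errors (a timeout, a typo in a locator) from being mistaken for the end of the results.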

In [1]:
import requests # raw data
from selenium import webdriver # webpage automation
import pandas as pd # dataframe organization

In [2]:
# Open chrome and navigate to page
browser = webdriver.Chrome('C:/webdrivers/chromedriver')
url = 'https://www.ark.org/arsbn/statuswatch/index.php/nurse/search'
browser.get(url)

In [3]:
# Search for 'zach'
search = browser.find_element_by_xpath('//*[@id="nurse_search"]/table/tbody/tr[3]/td[2]/input')
search.send_keys('zach')
searchButton = browser.find_element_by_xpath('//*[@id="nurse_search"]/table/tbody/tr[5]/td[2]/input')
searchButton.click()
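
The `find_element_by_xpath` helpers used in this notebook were removed in Selenium 4.3; newer releases expect `find_element(by, value)` instead. The sketch below shows the same search step in that form. `run_search` and the two XPath constants are names introduced here for illustration, and the literal `"xpath"` is used because it is the string value of `By.XPATH`, which keeps the snippet free of imports.

```python
# XPaths copied from the search cell above
SEARCH_BOX_XPATH = '//*[@id="nurse_search"]/table/tbody/tr[3]/td[2]/input'
SEARCH_BUTTON_XPATH = '//*[@id="nurse_search"]/table/tbody/tr[5]/td[2]/input'


def run_search(browser, term):
    """Type `term` into the name box and submit the search form.

    `browser` is assumed to be a Selenium 4 WebDriver (or any object
    exposing a compatible `find_element(by, value)` method).
    """
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH
    browser.find_element("xpath", SEARCH_BOX_XPATH).send_keys(term)
    browser.find_element("xpath", SEARCH_BUTTON_XPATH).click()
```

With a live driver this would be called as `run_search(browser, 'zach')`.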

In [4]:
# Find how many pages of results there are to see how many times to click 'Next >'
click_next_count = len(browser.find_elements_by_xpath('//*[@id="main_content"]/div[2]/center/a'))

In [5]:
# Initialize dataframe
column_names = ['License Number', 'Name', 'License Status', 'License Type', 'Expiration Date']
data = pd.DataFrame(columns = column_names)

In [6]:
for page in range(0, click_next_count):
    
    for row in range(1, len(browser.find_elements_by_xpath('//*[@id="main_content"]/table/tbody/tr')) + 1):
        
        # Clicks on next name
        rowxpath = '//*[@id="main_content"]/table/tbody/tr[' + str(row) + ']/td[1]/a'
        browser.find_element_by_xpath(rowxpath).click() 
        
        # Finds number of licenses associated with this name
        lic_qty = len(browser.find_elements_by_xpath("//div[@class = 'license_table box form']"))
        
        # Loop through licenses per nurse page
        for x in range(0,lic_qty): 
            data = data.append(pd.DataFrame([
                browser.find_elements_by_xpath('//*[@class="license_table box form"]/h2')[x].text, # licnum
                browser.find_element_by_xpath('//*[@id="nurse_watch_view"]/div/table/tbody/tr/td[1]/div/strong').text, # name
                browser.find_elements_by_xpath('//*[@class="license_table box form"]/table/tbody/tr[1]/td[2]')[x].text, # status
                browser.find_elements_by_xpath('//*[@class="license_table box form"]/table/tbody/tr[2]/td[2]')[x].text, # type
                browser.find_elements_by_xpath('//*[@class="license_table box form"]/table/tbody/tr[6]/td[2]')[x].text # expdate
            ], index = data.columns).T)
        
        # Back to page with list of names
        browser.back() 
    
    # Move to the next page of results; on the last page there is no 'Next >' link
    try:
        browser.find_element_by_partial_link_text('Next >').click()
    except Exception:
        print('End of search')

End of search
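
`DataFrame.append`, used in the loop above, was deprecated in pandas 1.4 and removed in pandas 2.0. One possible alternative is to gather plain Python records during the scrape and build the frame once at the end. The sketch below shows only the record-gathering part; `license_records` and `COLUMN_NAMES` are hypothetical names, and the sample values are taken from the first nurse in this notebook's output.

```python
COLUMN_NAMES = ['License Number', 'Name', 'License Status',
                'License Type', 'Expiration Date']


def license_records(name, numbers, statuses, types, expirations):
    """Turn parallel per-license text lists into one dict per license.

    Each list is assumed to hold one entry per license table on the
    nurse's page, in page order, as the indexed lookups in the loop do.
    """
    return [
        dict(zip(COLUMN_NAMES, (number, name, status, lic_type, expires)))
        for number, status, lic_type, expires
        in zip(numbers, statuses, types, expirations)
    ]


records = []
records.extend(license_records(
    'ZACHARY T HUNTER BRANSCUM',
    ['License #: R088591', 'License #: RTP-009225'],
    ['Active', 'Null & Void'],
    ['Registered Nurse (RN)', 'Temporary Registered Nurse Permit'],
    ['01-31-2020', '06-27-2011'],
))
print(len(records))
```

Once the loop finishes, `pd.DataFrame(records, columns=COLUMN_NAMES)` would build the whole table in one step, which also avoids copying the frame on every license.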


In [7]:
data.index = range(1,len(data)+1)
data.to_csv('license_data_zach_v2.csv')
pd.read_csv('license_data_zach_v2.csv', index_col = 0) # Check

| | License Number | Name | License Status | License Type | Expiration Date |
|---|---|---|---|---|---|
| 1 | License #: R088591 | ZACHARY T HUNTER BRANSCUM | Active | Registered Nurse (RN) | 01-31-2020 |
| 2 | License #: RTP-009225 | ZACHARY T HUNTER BRANSCUM | Null & Void | Temporary Registered Nurse Permit | 06-27-2011 |
| 3 | License #: RTP-018807 | ZACHARY MATTHEW LEE BROWN | Null & Void | Temporary Registered Nurse Permit | 06-29-2016 |
| 4 | License #: R101512 | ZACHARY MATTHEW LEE BROWN | Active | Registered Nurse (RN) | 03-31-2020 |
| 5 | License #: C000800 | ZACHARY KIM BURNETT | Expired | Certified Registered Nurse Anesthetist (CRNA) | 10-01-1994 |
| 6 | License #: R039861 | ZACHARY KIM BURNETT | Inactive | Registered Nurse (RN) | 03-31-1996 |
| 7 | License #: Temporary Permit(Temporary Register... | ZACHARY WYATT CAGLE | Null & Void | Temporary Registered Nurse Permit | 06-23-2006 |
| 8 | License #: R076951 | ZACHARY WYATT CAGLE | Probation | Registered Nurse (RN) | 06-30-2019 |
| 9 | License #: RTP-008989 | ZACHARY TODD CALVERT | Null & Void | Temporary Registered Nurse Permit | 04-14-2011 |
| 10 | License #: R088351 | ZACHARY TODD CALVERT | Active | Registered Nurse (RN) | 06-30-2020 |
