**Objective**: Scrape license data from the [Arkansas State Board of Nursing website](https://www.ark.org/arsbn/statuswatch/index.php/nurse/search) for all nurses whose names contain 'zach'.
1. Automate search for 'zach'.
2. Loop through pages
    - Keep hitting 'Next >' until that button is no longer there (try/except)
3. Loop through rows
    - Click name, retrieve element info, go back to search results
4. Organize data into dataframe and export as .csv
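
The page loop in steps 2 and 3 can be sketched without a browser. This is a minimal illustration, not the notebook's actual code: `scrape_all_pages`, `fake_scrape`, and `fake_next` are hypothetical names, and `LookupError` stands in for the `NoSuchElementException` that Selenium raises when the 'Next >' link cannot be found.

```python
def scrape_all_pages(scrape_current_page, go_to_next_page, max_pages=1000):
    """Collect rows from every results page until there is no next page.

    Both callables are hypothetical stand-ins for the Selenium steps:
    scrape_current_page() returns the rows of the page currently shown,
    go_to_next_page() raises LookupError when 'Next >' is missing.
    """
    rows = []
    for _ in range(max_pages):  # hard cap guards against an endless loop
        rows.extend(scrape_current_page())
        try:
            go_to_next_page()
        except LookupError:
            break  # no 'Next >' button, so this was the last page
    return rows


# Stand-in for the website: three pages of placeholder results.
pages = [["row 1", "row 2"], ["row 3"], ["row 4", "row 5"]]
position = {"page": 0}


def fake_scrape():
    return pages[position["page"]]


def fake_next():
    if position["page"] + 1 >= len(pages):
        raise LookupError("no 'Next >' link")
    position["page"] += 1


all_rows = scrape_all_pages(fake_scrape, fake_next)
print(all_rows)
```

Catching one specific exception, rather than everything, keeps unrelated failures (a timeout, a changed page layout) from being mistaken for the end of the results.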

In [1]:
import requests # raw data
from selenium import webdriver # webpage automation
import pandas as pd # dataframe organization

In [2]:
# Open chrome and navigate to page
browser = webdriver.Chrome('C:/webdrivers/chromedriver')
url = 'https://www.ark.org/arsbn/statuswatch/index.php/nurse/search'
browser.get(url)

In [3]:
# Search for 'zach'
search = browser.find_element_by_xpath('//*[@id="nurse_search"]/table/tbody/tr[3]/td[2]/input')
search.send_keys('zach')
searchButton = browser.find_element_by_xpath('//*[@id="nurse_search"]/table/tbody/tr[5]/td[2]/input')
searchButton.click()

In [4]:
# Find how many pages of results there are to see how many times to click 'Next >'
click_next_count = len(browser.find_elements_by_xpath('//*[@id="main_content"]/div[2]/center/a'))

In [5]:
# Initialize dataframe
column_names = ['License Number', 'Name', 'License Status', 'License Type', 'Expiration Date']
data=pd.DataFrame(columns = column_names)

In [6]:
# Scrape every row on the current page, then advance to the next page.
for page in range(0, click_next_count):
    for row in range(1, len(browser.find_elements_by_xpath('//*[@id="main_content"]/table/tbody/tr')) + 1):
        rowxpath = '//*[@id="main_content"]/table/tbody/tr[' + str(row) + ']/td[1]/a'
        browser.find_element_by_xpath(rowxpath).click() # Clicks on next name
        # Read the detail-page fields in the same order as the dataframe columns
        record = [browser.find_element_by_xpath('//*[@id="main_content"]/div[4]/h2').text[11:], # skip the first 11 characters of the heading
                  browser.find_element_by_xpath('//*[@id="nurse_watch_view"]/div/table/tbody/tr/td[1]/div/strong').text,
                  browser.find_element_by_xpath('//*[@id="main_content"]/div[4]/table/tbody/tr[1]/td[2]').text,
                  browser.find_element_by_xpath('//*[@id="main_content"]/div[4]/table/tbody/tr[2]/td[2]').text,
                  browser.find_element_by_xpath('//*[@id="main_content"]/div[4]/table/tbody/tr[6]/td[2]').text]
        # DataFrame.append was removed in pandas 2.0; pd.concat adds the row in old and new versions
        data = pd.concat([data, pd.DataFrame([record], columns=data.columns)], ignore_index=True)
        browser.back() # Back to page with list of names
    try:
        browser.find_element_by_partial_link_text('Next >').click()
    except Exception: # 'Next >' is missing on the last page; a bare except would also swallow Ctrl+C
        print('End of search')

End of search
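
Growing a dataframe one row at a time copies it on every iteration. A common alternative, sketched here under assumptions, is to collect each record in a plain list and build the dataframe once after the loop. The `scraped` values below are placeholders standing in for the text read from each detail page, not real results.

```python
import pandas as pd

column_names = ['License Number', 'Name', 'License Status', 'License Type', 'Expiration Date']

# Placeholder values (assumption): in the notebook these come from the browser.
scraped = [
    ['X000001', 'EXAMPLE NURSE ONE', 'Active', 'Registered Nurse (RN)', '01-31-2020'],
    ['X000002', 'EXAMPLE NURSE TWO', 'Expired', 'Licensed Practical Nurse (LPN)', '12-31-2019'],
]

records = []
for fields in scraped:
    # One dict per nurse, keyed by column name
    records.append(dict(zip(column_names, fields)))

# Build the dataframe once, after all rows are collected
data = pd.DataFrame(records, columns=column_names)
data.index = range(1, len(data) + 1)  # 1-based index, as in the export step
print(data.shape)
```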


In [7]:
data.index = range(1,len(data)+1)
data.to_csv('license_data_zach.csv')
pd.read_csv('license_data_zach.csv', index_col = 0) # Check

|  | License Number | Name | License Status | License Type | Expiration Date |
|---|---|---|---|---|---|
| 1 | R088591 | ZACHARY T HUNTER BRANSCUM | Active | Registered Nurse (RN) | 01-31-2020 |
| 2 | RTP-018807 | ZACHARY MATTHEW LEE BROWN | Null & Void | Temporary Registered Nurse Permit | 06-29-2016 |
| 3 | C000800 | ZACHARY KIM BURNETT | Expired | Certified Registered Nurse Anesthetist (CRNA) | 10-01-1994 |
| 4 | Temporary Permit(Temporary Registered Nurse Pe... | ZACHARY WYATT CAGLE | Null & Void | Temporary Registered Nurse Permit | 06-23-2006 |
| 5 | RTP-008989 | ZACHARY TODD CALVERT | Null & Void | Temporary Registered Nurse Permit | 04-14-2011 |
| 6 | RTP-017195 | ZACHARY DILLARD COX | Null & Void | Temporary Registered Nurse Permit | 09-02-2015 |
| 7 | R093702 | ZACHARY ALAN DAVIS | Active | Registered Nurse (RN) | 08-31-2020 |
| 8 | L049670 | ZACHARY AARON ESTES | Active | Licensed Practical Nurse (LPN) | 12-31-2019 |
| 9 | R105466 | ZACHARY JAMES GEILING | Active | Registered Nurse (RN) | 12-31-2019 |
| 10 | R086376 | ZACHARY WILLIAM GODWIN | Active | Registered Nurse (RN) | 07-31-2019 |
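
The `index_col = 0` argument in the check matters because `to_csv` writes the index as an unnamed first column. The sketch below shows the difference on a small placeholder dataframe (the rows are made up for illustration) and uses an in-memory string instead of a file on disk.

```python
import io

import pandas as pd

# Placeholder rows (assumption), with a 1-based index like the notebook's
data = pd.DataFrame(
    {'License Number': ['X000001', 'X000002'],
     'Name': ['EXAMPLE NURSE ONE', 'EXAMPLE NURSE TWO']},
    index=range(1, 3),
)

csv_text = data.to_csv()  # with no path, to_csv returns the CSV as a string

# Treat the first CSV column as the index again
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without index_col, the saved index comes back as a data column
without_index = pd.read_csv(io.StringIO(csv_text))

print(list(with_index.columns))
print(list(without_index.columns))
```

Without `index_col`, pandas labels the saved index `Unnamed: 0` and adds a fresh 0-based index on top of it.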
