# Texas Cosmetologist Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for cosmetologists!

## Setup: Import what you'll need to scrape the page

We'll be using Selenium for this, *not* BeautifulSoup and requests.

In [184]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait

In [185]:
driver = webdriver.Chrome()

In [186]:
driver.get('https://www.tdlr.texas.gov/cimsfo/fosearch.asp')

## Starting your search

Starting from [here](https://www.tdlr.texas.gov/cimsfo/fosearch.asp), search for cosmetologist violations for people with the last name **Nguyen**.

In [187]:
dropdown = Select(driver.find_element_by_name("pht_status"))
dropdown.select_by_visible_text('Cosmetologists')

In [188]:
last_name = driver.find_element_by_name('pht_lnm')

In [189]:
last_name.send_keys('Nguyen')

In [190]:
button = driver.find_element_by_name('B1')
driver.execute_script("arguments[0].scrollIntoView(true)", button)
button.click()

## Scraping

Once you are on the results page, do this.

### Loop through each result and print the entire row

Okay wait, that's a heck of a lot. Use `[:10]` to only do the first ten (`listname[:10]` gives you the first ten).

In [191]:

rows = driver.find_elements_by_tag_name('tr')
for row in rows[:10]:
    print(row.text)


Name and Location Order Basis for Order
NGUYEN, TOAN HUU
City: SAN ANTONIO
County: BEXAR
Zip Code: 78217


License #(s): 780948, 1706491, 1699123

Complaint # COS20180004289 Date: 5/30/2018

Respondent is assessed an administrative penalty in the amount of $500. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.
NGUYEN, HANH CONG
City: EL PASO
County: EL PASO
Zip Code: 79934


License #: 737708

Complaint # COS20180006594 Date: 5/30/2018

Respondent is assessed an administrative penalty in the amount of $1,000. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.
NGUYEN, KHIEM VAN
City: LONGVIEW
County: GREGG
Zip Code: 75604


License #: 731665

Complaint # COS20180000257 Date: 5/17/2018

Respondent is assessed an administrative penalty in the amount of $1,250. Responde

### Loop through each result and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen?! If you want to ignore an error, you use code like this:

```python
try:
   try to do something
except:
   print("It didn't work')
```

It should help you out. If you don't want to print anything, you can type `pass` instead of the `print` statement.

**Why doesn't the first one have a name?**

In [192]:
# for row in rows[1:]:
#     print(row.text)

In [193]:
# for row in rows[1:]:
#     cell = row.find_element_by_tag_name('td')
#     name = cell.find_element_by_tag_name('span')
#     print(name.text)  

## Loop through each result, printing each violation description ("Basis for order")

> - *Tip: You'll get an error even if you're ALMOST right - which row is causing the problem?*
> - *Tip: You can get the HTML of something by doing `.get_attribute('innerHTML')` - it might help you diagnose your issue.*
> - *Tip: Or I guess you could just skip the one with the problem...

In [194]:
# for row in rows[1:]:
#     cells = row.find_elements_by_tag_name('td')   
#     info = cells[-1]
#     print(info.text)

## Loop through each result, printing the complaint number

- TIP: Think about the order of the elements

In [195]:
# for row in rows[1:]:
#     cell = row.find_element_by_tag_name('td')
#     numbers = cell.find_elements_by_tag_name('span')
#     number = numbers[-1]
#     print(number.text)
    

## Saving the results

### Loop through each result to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number
- License Numbers
- Zip Code
- County
- City

Create a new dictionary for each result (except the header).

> *Tip: If you want to ask for the "next sibling," you can't use `find_next_sibling` in Selenium, you need to use `element.find_element_by_xpath("following-sibling::div")` to find the next div, or `element.find_element_by_xpath("following-sibling::*")` to find the next anything.

In [196]:
rows = driver.find_elements_by_tag_name('tr')
records = []

for row in rows[1:30]:
    record_dict = {}
    cells = row.find_elements_by_tag_name('span')
    lines = row.find_elements_by_tag_name('td')
    
    record_dict['name'] = cells[0].text
    record_dict['violation_num'] = cells[-1].text
    record_dict['license'] = cells[-3].text
    record_dict['code'] = cells[6].text
    record_dict['county'] = cells[4].text
    record_dict['city'] = cells[2].text
    record_dict['description'] = lines[-1].text
    
#     record_dict['description'] = cells[-1].text  
       
#     print(record_dict)
    records.append(record_dict)
print(records)

[{'name': 'NGUYEN, TOAN HUU', 'violation_num': '5/30/2018', 'license': 'COS20180004289', 'code': '78217', 'county': 'BEXAR', 'city': 'SAN ANTONIO', 'description': 'Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.'}, {'name': 'NGUYEN, HANH CONG', 'violation_num': '5/30/2018', 'license': 'COS20180006594', 'code': '79934', 'county': 'EL PASO', 'city': 'EL PASO', 'description': 'Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.'}, {'name': 'NGUYEN, KHIEM VAN', 'violation_num': '5/17/2018', 'license': 'COS20180000257', 'code': '75604', 'county': 'GREGG', 'city': 'LONGVIEW', 'description': 'Respondent failed to follow whirlpool foot spas cleaning and sanitization procedures as required; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements








### Save that to a CSV

- Tip: You'll want to use pandas here

In [197]:
import pandas as pd

In [198]:
df = pd.DataFrame(records)
df.head()

Unnamed: 0,city,code,county,description,license,name,violation_num
0,SAN ANTONIO,78217,BEXAR,Respondent failed to clean and sanitize whirlp...,COS20180004289,"NGUYEN, TOAN HUU",5/30/2018
1,EL PASO,79934,EL PASO,Respondent failed to clean and sanitize whirlp...,COS20180006594,"NGUYEN, HANH CONG",5/30/2018
2,LONGVIEW,75604,GREGG,Respondent failed to follow whirlpool foot spa...,COS20180000257,"NGUYEN, KHIEM VAN",5/17/2018
3,HOUSTON,77014,HARRIS,"Respondent failed to disinfect tools, implemen...",COS20180004915,"NGUYEN, DIEP THI NGOC",5/17/2018
4,SAN ANTONIO,78255,BEXAR,"Respondent failed to clean, disinfect, and ste...",COS20180009255,"NGUYEN, LAN T-THUY",5/17/2018


In [199]:
df = df[['name', 'description', 'violation_num', 'code', 'county', 'city']]

In [200]:
df.shape

(29, 6)

In [201]:
df.to_csv("scraping_texas_results_30.csv", index=False)

### Open the CSV file and examine the first few. Make sure you didn't save an extra weird unnamed column.

In [202]:
pd.read_csv("scraping_texas_results_30.csv")

Unnamed: 0,name,description,violation_num,code,county,city
0,"NGUYEN, TOAN HUU",Respondent failed to clean and sanitize whirlp...,5/30/2018,78217,BEXAR,SAN ANTONIO
1,"NGUYEN, HANH CONG",Respondent failed to clean and sanitize whirlp...,5/30/2018,79934,EL PASO,EL PASO
2,"NGUYEN, KHIEM VAN",Respondent failed to follow whirlpool foot spa...,5/17/2018,75604,GREGG,LONGVIEW
3,"NGUYEN, DIEP THI NGOC","Respondent failed to disinfect tools, implemen...",5/17/2018,77014,HARRIS,HOUSTON
4,"NGUYEN, LAN T-THUY","Respondent failed to clean, disinfect, and ste...",5/17/2018,78255,BEXAR,SAN ANTONIO
5,"NGUYEN, TUAN A",Respondent failed to clean and disinfect all w...,5/9/2018,78723,TRAVIS,AUSTIN
6,"NGUYEN, THAO B",Respondent failed to clean and sanitize whirlp...,5/9/2018,76039,TARRANT,EULESS
7,"NGUYEN, BETH MARIA",The Respondent's license was revoked upon Resp...,4/30/2018,77083,HARRIS,HOUSTON
8,"NGUYEN, TRUNG N","Respondent failed to clean, disinfect, and ste...",4/25/2018,79106,POTTER,AMARILLO
9,"NGUYEN, NGAT THI",Respondent failed to follow whirlpool foot spa...,4/25/2018,75686,CAMP,PITTSBURG
