# Texas Cosmetologist Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for cosmetologists!

## Setup: Import what you'll need to scrape the page

We'll be using Selenium for this, *not* BeautifulSoup and requests.

In [1]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd

## Starting your search

Starting from [here](https://www.tdlr.texas.gov/cimsfo/fosearch.asp), search for cosmetologist violations for people with the last name **Nguyen**.

In [2]:
driver = webdriver.Chrome()

In [3]:
driver.get('https://www.tdlr.texas.gov/cimsfo/fosearch.asp')

In [4]:
last_name = driver.find_element_by_name('pht_lnm')
driver.execute_script("arguments[0].scrollIntoView(true)", last_name) # to scroll down to the search

In [5]:
last_name.send_keys('Nguyen')

In [7]:
search_button = driver.find_element_by_name('B1')
search_button.click()

## Scraping

Once you are on the results page, do this.

### Loop through each result and print the entire row

Okay wait, that's a heck of a lot. Use `[:10]` to only do the first ten (`listname[:10]` gives you the first ten).
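
To see what that slice does without touching the browser, here is a minimal sketch. The `rows` list is a made-up stand-in for the real list of `tr` elements, so the names and contents are assumptions.

```python
# Hypothetical stand-in for the list of <tr> elements returned by Selenium.
rows = [f"row {i}" for i in range(543)]

first_ten = rows[:10]  # same as rows[0:10]: indexes 0 through 9

print(len(first_ten))
print(first_ten[0], "...", first_ten[-1])

# Slicing does not raise an error when the list is shorter than the slice.
short = ["only", "three", "items"]
print(short[:10])
```

The slice makes a new, shorter list, so looping over `rows[:10]` leaves the full list untouched for later.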

In [10]:
nguyens = driver.find_elements_by_tag_name('tr')
len(nguyens)  # every Nguyen result plus the header row

543

In [14]:
for nguyen in nguyens[:10]:
    print(nguyen.text)

Name and Location Order Basis for Order
NGUYEN, TOAN HUU
City: SAN ANTONIO
County: BEXAR
Zip Code: 78217


License #(s): 780948, 1706491, 1699123

Complaint # COS20180004289 Date: 5/30/2018

Respondent is assessed an administrative penalty in the amount of $500. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.
NGUYEN, HANH CONG
City: EL PASO
County: EL PASO
Zip Code: 79934


License #: 737708

Complaint # COS20180006594 Date: 5/30/2018

Respondent is assessed an administrative penalty in the amount of $1,000. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.
NGUYEN, KHIEM VAN
City: LONGVIEW
County: GREGG
Zip Code: 75604


License #: 731665

Complaint # COS20180000257 Date: 5/17/2018

Respondent is assessed an administrative penalty in the amount of $1,250. Responde

### Loop through each result and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen?! If you want to ignore an error, you use code like this:

```python
try:
    # try to do something
    ...
except:
    print("It didn't work")
```

It should help you out. If you don't want to print anything, you can type `pass` instead of the `print` statement.
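
Here is a small sketch of that pattern on plain Python lists instead of Selenium elements. The `rows` data is invented for illustration. It also names the specific error (`IndexError`) rather than using a bare `except:`, which is one reasonable choice because other, unexpected errors will still show up.

```python
# Hypothetical rows: each inner list stands in for the cells found in one <tr>.
rows = [[], ["NGUYEN, TOAN HUU"], ["NGUYEN, HANH CONG"]]

names = []
for cells in rows:
    try:
        names.append(cells[0])  # raises IndexError when the row has no cells
    except IndexError:
        pass  # skip this row silently, but only for this one kind of error

print(names)
```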

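**Why doesn't the first one have a name?**

The first `tr` is the table's header row (it prints as `Name and Location Order Basis for Order`), so it has no `results_text` elements inside it and `rows[0]` raises an `IndexError`.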

In [21]:
nguyens = driver.find_elements_by_tag_name('tr')

for nguyen in nguyens[0:10]:
    try:
        rows = nguyen.find_elements_by_class_name('results_text')
        print(rows[0].text)
    except IndexError:
        pass  # the header row has no results_text elements

NGUYEN, TOAN HUU
NGUYEN, HANH CONG
NGUYEN, KHIEM VAN
NGUYEN, DIEP THI NGOC
NGUYEN, LAN T-THUY
NGUYEN, TUAN A
NGUYEN, THAO B
NGUYEN, BETH MARIA
NGUYEN, KENNEY TUAN


### Loop through each result, printing each violation description ("Basis for order")

> - *Tip: You'll get an error even if you're ALMOST right - which row is causing the problem?*
> - *Tip: You can get the HTML of something by doing `.get_attribute('innerHTML')` - it might help you diagnose your issue.*
> - *Tip: Or I guess you could just skip the one with the problem...*
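
One way to find out which row is causing the problem is to record the position of every row that fails. The sketch below uses plain lists as stand-ins for the `td` cells of each row, so the data is hypothetical.

```python
# Hypothetical data: each inner list stands in for the <td> cells of one row.
rows = [
    [],                                          # a row with no <td> cells
    ["NGUYEN, A", "order text", "basis text"],
    ["NGUYEN, B", "order text", "basis text"],
]

problem_rows = []
for index, cells in enumerate(rows):
    try:
        cells[2]
    except IndexError:
        # Record where it failed and how many cells the row really had.
        problem_rows.append((index, len(cells)))

print(problem_rows)
```

With real Selenium elements, the same loop could print the row's `innerHTML` inside the `except` block to show what the odd row contains.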

In [30]:
nguyens = driver.find_elements_by_tag_name('tr')  # just to remind myself that I'm still working with the Nguyens
# Note to self: look at the 'td' tags, not the 'font' tags

for nguyen in nguyens[0:10]:
    try:
        print("---------------")
        rows = nguyen.find_elements_by_tag_name('td')
        print(rows[2].text)
    except IndexError:
        print('it did not work')


---------------
it did not work
---------------
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.
---------------
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.
---------------
Respondent failed to follow whirlpool foot spas cleaning and sanitization procedures as required; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect all wax pots.
---------------
Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant solution; Respondent failed to disinfect multi-use equipment, implements, and tools prior to use on each client.
---------------
Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each 

### Loop through each result, printing the complaint number

- TIP: Think about the order of the elements
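
Thinking about order also means you can count from the end of a list with negative indexes. The sketch below uses a plain list of strings. The order of the items, and the date being the last one, are assumptions made for illustration.

```python
# Hypothetical cell texts for one result, in an assumed page order.
cells = [
    "NGUYEN, TOAN HUU",
    "SAN ANTONIO",
    "BEXAR",
    "78217",
    "780948, 1706491, 1699123",
    "COS20180004289",
    "5/30/2018",
]

print(cells[0])   # first item
print(cells[-1])  # last item
print(cells[-2])  # second from the end
```

Counting from the end is handy when the items you want sit at a fixed distance from the last element.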

In [33]:
nguyens = driver.find_elements_by_tag_name('tr')

for nguyen in nguyens[0:10]:
    try:
        print("---------------")
        rows = nguyen.find_elements_by_class_name('results_text')
        print(rows[-2].text)
           
    except IndexError:
        print('it did not work')
    

---------------
it did not work
---------------
COS20180004289
---------------
COS20180006594
---------------
COS20180000257
---------------
COS20180004915
---------------
COS20180009255
---------------
COS20140018343
---------------
COS20180008846
---------------
COS20180000897
---------------
BAR20180001231


## Saving the results

### Loop through each result to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number
- License Numbers
- Zip Code
- County
- City

Create a new dictionary for each result (except the header).
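
The idea can be sketched without a browser by treating each result as a list of cell texts. Everything here is illustrative: `FIELD_POSITIONS`, `cells_to_record`, and the sample data are hypothetical names and values, and the violation description is left out because it comes from a different element on the page.

```python
# Assumed positions of each field within one result's list of cell texts.
FIELD_POSITIONS = {
    "Name": 0,
    "City": -6,
    "County": -5,
    "Zip Code": -4,
    "License Numbers": -3,
    "Violation number": -2,
}


def cells_to_record(cells):
    """Build one dictionary from a list of cell texts."""
    return {field: cells[pos] for field, pos in FIELD_POSITIONS.items()}


# Hypothetical input: an empty header row followed by one result.
all_cells = [
    [],
    ["NGUYEN, TOAN HUU", "SAN ANTONIO", "BEXAR", "78217",
     "780948, 1706491, 1699123", "COS20180004289", "5/30/2018"],
]

records = [cells_to_record(cells) for cells in all_cells[1:]]  # [1:] skips the header
print(records)
```

Skipping the header with `[1:]` is an alternative to catching the `IndexError`. It only works if the header is always the first row.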

> *Tip: If you want to ask for the "next sibling," you can't use `find_next_sibling` in Selenium. Use `element.find_element_by_xpath("following-sibling::div")` to find the next div, or `element.find_element_by_xpath("following-sibling::*")` to find the next element of any kind.*

In [53]:
all_nguyens = []
nguyens = driver.find_elements_by_tag_name('tr')

for nguyen in nguyens[:10]:
    try:
        print('---------')
        # Look up both sets of elements once; the header row has neither
        fields = nguyen.find_elements_by_class_name('results_text')
        cells = nguyen.find_elements_by_tag_name('td')

        n_dic = {
            'Name': fields[0].text,
            'Violation description': cells[2].text,
            'Violation number': fields[-2].text,
            'License Numbers': fields[-3].text,
            'Zip Code': fields[-4].text,
            'County': fields[-5].text,
            'City': fields[-6].text,
        }

        # Print each value in the order it was stored
        for value in n_dic.values():
            print(value)

        all_nguyens.append(n_dic)
    except IndexError:
        print('it did not work')


---------
it did not work
---------
NGUYEN, TOAN HUU
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.
COS20180004289
780948, 1706491, 1699123
78217
BEXAR
SAN ANTONIO
---------
NGUYEN, HANH CONG
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.
COS20180006594
737708
79934
EL PASO
EL PASO
---------
NGUYEN, KHIEM VAN
Respondent failed to follow whirlpool foot spas cleaning and sanitization procedures as required; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect all wax pots.
COS20180000257
731665
75604
GREGG
LONGVIEW
---------
NGUYEN, DIEP THI NGOC
Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant solution; Respondent failed to

In [62]:
all_nguyens #to doublecheck

[{'Name': 'NGUYEN, TOAN HUU',
  'Violation description': 'Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.',
  'Violation number': 'COS20180004289',
  'License Numbers': '780948, 1706491, 1699123',
  'Zip Code': '78217',
  'County': 'BEXAR',
  'City': 'SAN ANTONIO'},
 {'Name': 'NGUYEN, HANH CONG',
  'Violation description': 'Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.',
  'Violation number': 'COS20180006594',
  'License Numbers': '737708',
  'Zip Code': '79934',
  'County': 'EL PASO',
  'City': 'EL PASO'},
 {'Name': 'NGUYEN, KHIEM VAN',
  'Violation description': 'Respondent failed to follow whirlpool foot spas cleaning and sanitization procedures as required; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use

### Save that to a CSV

- Tip: You'll want to use pandas here

In [56]:
df = pd.DataFrame(all_nguyens)
df.head(10)

| | City | County | License Numbers | Name | Violation description | Violation number | Zip Code |
|---|---|---|---|---|---|---|---|
| 0 | SAN ANTONIO | BEXAR | 780948, 1706491, 1699123 | NGUYEN, TOAN HUU | Respondent failed to clean and sanitize whirlp... | COS20180004289 | 78217 |
| 1 | EL PASO | EL PASO | 737708 | NGUYEN, HANH CONG | Respondent failed to clean and sanitize whirlp... | COS20180006594 | 79934 |
| 2 | LONGVIEW | GREGG | 731665 | NGUYEN, KHIEM VAN | Respondent failed to follow whirlpool foot spa... | COS20180000257 | 75604 |
| 3 | HOUSTON | HARRIS | 1347649, 760528 | NGUYEN, DIEP THI NGOC | Respondent failed to disinfect tools, implemen... | COS20180004915 | 77014 |
| 4 | SAN ANTONIO | BEXAR | 767339 | NGUYEN, LAN T-THUY | Respondent failed to clean, disinfect, and ste... | COS20180009255 | 78255 |
| 5 | ARLINGTON | TARRANT | 681274 | NGUYEN, TUAN A | Respondent failed to clean and disinfect all w... | COS20140018343 | 76011 |
| 6 | EULESS | TARRANT | 721373, 1142884 | NGUYEN, THAO B | Respondent failed to clean and sanitize whirlp... | COS20180008846 | 76039 |
| 7 | HOUSTON | HARRIS | 1470271 | NGUYEN, BETH MARIA | The Respondent's license was revoked upon Resp... | COS20180000897 | 77083 |
| 8 | CEDAR PARK | WILLIAMSON | 692892 | NGUYEN, KENNEY TUAN | Respondent leased space in a barber shop to an... | BAR20180001231 | 78613 |


In [57]:
df.to_csv("cosmetologists_nguyen.csv", index=False)

### Open the CSV file and examine the first few. Make sure you didn't save an extra weird unnamed column.
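
The "extra weird unnamed column" appears when the DataFrame's index is written to the file and then read back as data. Here is a minimal sketch of that effect, using an in-memory buffer instead of a real file. The two-row sample and the helper name `round_trip_columns` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample with the same kind of fields as the scrape.
df = pd.DataFrame([
    {"Name": "NGUYEN, TOAN HUU", "Zip Code": "78217"},
    {"Name": "NGUYEN, HANH CONG", "Zip Code": "79934"},
])


def round_trip_columns(frame, **to_csv_kwargs):
    """Write the frame to an in-memory CSV, read it back, return the column names."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return list(pd.read_csv(buffer).columns)


print(round_trip_columns(df))               # index saved as an unnamed first column
print(round_trip_columns(df, index=False))  # only the real columns
```

Checking `list(df_test.columns)` on the file you read back is a quick way to confirm that no `Unnamed: 0` column was saved.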

In [59]:
df_test = pd.read_csv('cosmetologists_nguyen.csv')
df_test.head(10)

| | City | County | License Numbers | Name | Violation description | Violation number | Zip Code |
|---|---|---|---|---|---|---|---|
| 0 | SAN ANTONIO | BEXAR | 780948, 1706491, 1699123 | NGUYEN, TOAN HUU | Respondent failed to clean and sanitize whirlp... | COS20180004289 | 78217 |
| 1 | EL PASO | EL PASO | 737708 | NGUYEN, HANH CONG | Respondent failed to clean and sanitize whirlp... | COS20180006594 | 79934 |
| 2 | LONGVIEW | GREGG | 731665 | NGUYEN, KHIEM VAN | Respondent failed to follow whirlpool foot spa... | COS20180000257 | 75604 |
| 3 | HOUSTON | HARRIS | 1347649, 760528 | NGUYEN, DIEP THI NGOC | Respondent failed to disinfect tools, implemen... | COS20180004915 | 77014 |
| 4 | SAN ANTONIO | BEXAR | 767339 | NGUYEN, LAN T-THUY | Respondent failed to clean, disinfect, and ste... | COS20180009255 | 78255 |
| 5 | ARLINGTON | TARRANT | 681274 | NGUYEN, TUAN A | Respondent failed to clean and disinfect all w... | COS20140018343 | 76011 |
| 6 | EULESS | TARRANT | 721373, 1142884 | NGUYEN, THAO B | Respondent failed to clean and sanitize whirlp... | COS20180008846 | 76039 |
| 7 | HOUSTON | HARRIS | 1470271 | NGUYEN, BETH MARIA | The Respondent's license was revoked upon Resp... | COS20180000897 | 77083 |
| 8 | CEDAR PARK | WILLIAMSON | 692892 | NGUYEN, KENNEY TUAN | Respondent leased space in a barber shop to an... | BAR20180001231 | 78613 |
