# Texas Cosmetologist Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for cosmetologists!

## Setup: Import what you'll need to scrape the page

We'll be using Selenium for this, *not* BeautifulSoup and requests.

In [32]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait

## Starting your search

Starting from [here](https://www.tdlr.texas.gov/cimsfo/fosearch.asp), search for cosmetologist violations for people with the last name **Nguyen**.

In [33]:
driver = webdriver.Chrome()

In [34]:
driver.get('https://www.tdlr.texas.gov/cimsfo/fosearch.asp')

In [35]:
last_name = driver.find_element_by_name('pht_lnm')

In [36]:
last_name.send_keys('Nguyen')

In [37]:
dropdown = Select(driver.find_element_by_name("pht_status"))

In [38]:
dropdown = dropdown.select_by_visible_text('Cosmetologists')

In [39]:
button = driver.find_element_by_name('B1')
driver.execute_script("arguments[0].scrollIntoView(true)", button)
button.click()

## Scraping

Once you are on the results page, do this.

### Loop through each result and print the entire row

Okay wait, that's a heck of a lot. Use `[:10]` to only do the first ten (`listname[:10]` gives you the first ten).

In [55]:
rows = driver.find_elements_by_tag_name('tr')

books = []

for row in rows[:10]:
  cells = row.find_elements_by_tag_name('td')

for cell in cells[:10]:

  book = {}
    
  book['span.results_text'] = cells[0].text

  books.append(book)

print(book)



{'span.results_text': 'NGUYEN, TRUNG N\nCity: AMARILLO\nCounty: POTTER\nZip Code: 79106\n\n\nLicense #(s): 1196244, 767015, 767014\n\nComplaint # COS20170023893'}


### Loop through each result and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen?! If you want to ignore an error, you use code like this:

```python
try:
   try to do something
except:
   print("It didn't work')
```

It should help you out. If you don't want to print anything, you can type `pass` instead of the `print` statement.

**Why doesn't the first one have a name?**

## Loop through each result, printing each violation description ("Basis for order")

> - *Tip: You'll get an error even if you're ALMOST right - which row is causing the problem?*
> - *Tip: You can get the HTML of something by doing `.get_attribute('innerHTML')` - it might help you diagnose your issue.*
> - *Tip: Or I guess you could just skip the one with the problem...

## Loop through each result, printing the complaint number

- TIP: Think about the order of the elements

## Saving the results

### Loop through each result to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number
- License Numbers
- Zip Code
- County
- City

Create a new dictionary for each result (except the header).

> *Tip: If you want to ask for the "next sibling," you can't use `find_next_sibling` in Selenium, you need to use `element.find_element_by_xpath("following-sibling::div")` to find the next div, or `element.find_element_by_xpath("following-sibling::*")` to find the next anything.

### Save that to a CSV

- Tip: You'll want to use pandas here

In [59]:
import pandas as pd

In [60]:
df= pd.DataFrame(books)

In [61]:
df.to_csv("output.csv", index=False)

### Open the CSV file and examine the first few. Make sure you didn't save an extra weird unnamed column.

In [64]:
df.head()

Unnamed: 0,span.results_text
0,"NGUYEN, TRUNG N\nCity: AMARILLO\nCounty: POTTE..."
1,"NGUYEN, TRUNG N\nCity: AMARILLO\nCounty: POTTE..."
2,"NGUYEN, TRUNG N\nCity: AMARILLO\nCounty: POTTE..."
