# Texas Barber Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for barbers in Houston!

## Preparation: Knowing your tags

These questions are the same for every data set, so they might not fit yours exactly.

In [46]:
from bs4 import BeautifulSoup
import requests

In [61]:
data = {
    'pht_status':'BAR',
    'pht_lic':'',
    'pht_lnm':'',
    'pht_fnm':'',
    'pht_oth_name':'',
    'phy_city':'HOUSTON',
    'phy_cnty':'-1',
    'phy_zip':'',
    'B1':'Search'
}
headers = {
    'Referer':'https://www.tdlr.texas.gov/cimsfo/fosearch.asp',
    'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}


In [62]:
response = requests.post("https://www.tdlr.texas.gov/cimsfo/fosearch_results.asp", data=data, headers=headers)
doc = BeautifulSoup(response.text, 'html.parser')

To install `curl` with Homebrew, type `brew install curl` in your terminal.

If you need to have this software first in your `PATH`, paste this into your terminal:

`echo 'export PATH="/usr/local/opt/curl/bin:$PATH"' >> ~/.bash_profile`

Then check that it works by typing `curl`.

### What is the tag and class name for every row of data?

The tag is 'tr' and there is no class. 

### What is the tag and class name for every person's name?

The tag is 'span' and the class is 'results_text'. 

### What is the tag and class name for the violation number?

The tag is 'span' and the class is 'results_text'. 

### What is the tag and class name for the description of their violation?

The tag is 'td' and there is no class.
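
Here is a minimal sketch of how those tag and class answers turn into BeautifulSoup lookups. The HTML is a made-up, simplified stand-in for one results row (the real page has more fields), and the column headings, name, complaint number, and date in it are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real results table
sample_html = """
<table>
  <tr><th>Name and Location</th><th>Order</th><th>Basis for Order</th></tr>
  <tr>
    <td>
      <span class="results_text">DOE, JANE</span>
      Complaint # <span class="results_text">BAR00000000001</span>
      Date: <span class="results_text">1/1/2017</span>
    </td>
    <td>Respondent is assessed an administrative penalty.</td>
    <td>Respondent performed barbering without the required license.</td>
  </tr>
</table>
"""

sample = BeautifulSoup(sample_html, 'html.parser')
row = sample.find_all('tr')[1]                        # each row of data is a tr with no class
spans = row.find_all('span', class_='results_text')  # names and numbers share this class
name = spans[0].text.strip()                          # the name is the first span in the row
number = spans[-2].text.strip()                       # the number is the second-to-last span
description = row.find_all('td')[2].text.strip()      # the description is the third td

print(name, number, description, sep=' | ')
```

Because the name and the complaint number share the same tag and class, position within the row is what tells them apart.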

## Setup: Import what you'll need to scrape the page

Use `requests`, not `urllib`.

## Try to scrape the page

To test if you requested the page correctly, save the BeautifulSoup document as `doc` and run the code `doc.find_all('tr')[1].text` to get the text of the second `<tr>` element, which is the first row of data.

- If the result starts with **\nPlease enter at least one (1) parameter** you were NOT successful.
- If the result starts with **MONTES DE OCA, REINIER**, you were successful.
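
The check above can be sketched as a small helper. The function name `request_succeeded` is hypothetical, and it only looks for the error message quoted in the list, so treat it as a rough test rather than proof that the page is right.

```python
def request_succeeded(first_row_text):
    """Return True unless the row text is the 'no parameters' error message."""
    # strip() drops any leading newline or spaces before comparing
    error_start = 'Please enter at least one (1) parameter'
    return not first_row_text.strip().startswith(error_start)

print(request_succeeded('\nPlease enter at least one (1) parameter'))
print(request_succeeded(' MONTES DE OCA, REINIER  Company: LA BENDICION'))
```

With the real page you would call it as `request_succeeded(doc.find_all('tr')[1].text)`.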

### Try to request the page however you think you should.

"Try" to do it, because it *will not work.* Once you've learned that it won't work, you should **ask how to do it on the board**.

In [73]:
doc.find_all('tr')[1].text

' MONTES DE OCA, REINIER  Company: LA BENDICION City: HOUSTON County: HARRIS Zip Code: 77072  License: Not LicensedComplaint # BAR20170009735       Date: 5/24/2017Respondent is assessed an administrative penalty in the amount of $1,125. Respondent performed barbering without the required license.'

### Try to request the page with the correct data parameters

Secret tip: It still won't work. **Ask why not on the board.**

### What is the smallest `curl` command that still gives you a result?

curl 'https://www.tdlr.texas.gov/cimsfo/fosearch_results.asp' -H 'Referer: https://www.tdlr.texas.gov/cimsfo/fosearch.asp' --data 'pht_status=BAR&pht_lic=&pht_lnm=&pht_fnm=&pht_oth_name=&phy_city=HOUSTON+++++++++++++&phy_cnty=-1&phy_zip=&B1=Search' --compressed
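
To see how the `--data` string maps onto a Python dictionary, here is a small sketch that decodes it with the standard library. It only parses the string and does not contact the server.

```python
from urllib.parse import parse_qsl

# The --data string from the curl command, split across two lines for readability
curl_data = ('pht_status=BAR&pht_lic=&pht_lnm=&pht_fnm=&pht_oth_name='
             '&phy_city=HOUSTON+++++++++++++&phy_cnty=-1&phy_zip=&B1=Search')

# keep_blank_values=True keeps the empty fields such as pht_lic
form = dict(parse_qsl(curl_data, keep_blank_values=True))

# A '+' in form-encoded data decodes to a space, so the city arrives space-padded
print(repr(form['phy_city']))
print(form['phy_city'].strip())
print(sorted(form))
```

The browser pads the city with spaces, while the `data` dictionary in this notebook uses plain `'HOUSTON'`, which returned results in the output shown here.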

## Request the page with the correct data parameters AND the correct MINIMUM headers

This time it should work.
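
A sketch of what such a request looks like before it is sent. It assumes that `Referer` is the only header needed, based on the `curl` command above; the cell at the top of this notebook also sends a `User-Agent`. `prepare()` builds the request without sending it, so you can inspect the headers and body offline.

```python
import requests

url = 'https://www.tdlr.texas.gov/cimsfo/fosearch_results.asp'
data = {
    'pht_status': 'BAR',
    'pht_lic': '',
    'pht_lnm': '',
    'pht_fnm': '',
    'pht_oth_name': '',
    'phy_city': 'HOUSTON',
    'phy_cnty': '-1',
    'phy_zip': '',
    'B1': 'Search',
}
# Assumption: Referer alone is enough, as in the curl command
headers = {'Referer': 'https://www.tdlr.texas.gov/cimsfo/fosearch.asp'}

# Build the POST request without sending it
prepared = requests.Request('POST', url, data=data, headers=headers).prepare()

print(prepared.headers['Referer'])
print(prepared.headers['Content-Type'])
print(prepared.body)
```

To actually send it, pass the same arguments to `requests.post(url, data=data, headers=headers)`.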

## Scraping

### Loop through each `tr` and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen? I'm happy to help if you ask on the board.

In [12]:
for row in doc.find_all('tr'):
    # The header row has no results_text span, so find() returns None there
    name = row.find('span', attrs={"class":'results_text'})
    if name:
        print(name.text)

MONTES DE OCA, REINIER 
ALFORD, RAYMOND 
CHAPMAN, JESSICA 
SALAZAR-ALVAREZ, SAMUEL 
GONZALES, DAVID 
FLORES, CHRISTOPHER 
ARMSTEAD, CEDRIC J
MORAH, PATRICK 
TREJO, BLADIMAR A
DAVIS, RICHARD D
HOPKINS, JOSHUA 
NINO, ROBERT 
HEATH, LOLETHA N
SALAZAR-ALVAREZ, SAMUEL 
MONTES DE OCA, REINIER 
MARLEN'S BEAUTY SALON LIC 747062
TOP STYLES BARBER SHOP
SUTTON, EMANUEL B
SHEPHARD, JAMES C
HERNANDEZ, MARIA DIOCELINA 
WILLIAMS, DONTUEL 
JOHNSON, JEFFERY J
PERFECTION BARBER & HAIR STUDIO
HUERTA, FRANCISCO 
TIPTON, SELINA I
ARREOLA, ERIC D
HARRISON, OTTO M
RIVERA TORRES, ANGEL D
PECK, MARVIN 
MOTA SOTO, CRISTIAN D
WADDLE, EDDIE D
SON, YOUNG J
HILL, BRIAN 
BROWN, DELRICK JAREL 
FRANKLIN, KELVIN 
LEDET, LEON 
WILLIAMS, DONTUEL 
LACY, JUSTIN J
MAKE THE CUT
ARELLANO, GREGORY F
MACEDO, ANTONIO 
MILLER, SHAWN ERIC 
HAYWARD, ABBIE DEAN 
BROWN, CHARLES EARL 
MCQUEEN, IDA M
MCQUEEN, IDA M
CAESAR, RON 
MORRIS, VICTOR B
NOLAN, CHRIS B
BICKHAM, DONNELL 
LOUIS, DIONNE N
HARRELL, KENTON D
SUBRAHMANIAN, CHITRA N
FR

## Loop through each `tr`, printing each violation description

- TIP: What is the container tag name for it?
- TIP: You'll get an error even if you're ALMOST right - which row is causing the problem?

In [32]:
# Skip the header row, which is the one that causes the error
for row in doc.find_all('tr')[1:]:
    desc = row.find_all('td')[2]
    print(desc.text)
    print('---')

Respondent performed barbering without the required license.
---
Respondent performed barbering without the required license.
---
Respondent failed to electronically submit to the Department at least one time per month student's accrued hours.
---
Respondent performed barbering without the required license.
---
Respondent leased space in a barber shop to an individual who engaged in the practice of barbering but had not obtained a barber license.
---
Respondent leased space in a barber shop to an individual who engaged in the practice of barbering but had not obtained a barber license.
---
The Respondent's license was revoked upon Respondent's imprisonment in a penitentiary.
---
Respondent leased space in a barber shop to an individual who engaged in the practice of barbering but had not obtained a barber license; Respondent failed to prepare fresh disinfectant solution daily or more often as needed, for immersion of implements.
---
Respondent performed barbering without the required l

## Loop through each `tr`, printing the complaint number

- TIP: It should be the last piece of the first `td`

In [42]:
for row in doc.find_all('tr')[1:]:
    # The complaint number is the second-to-last results_text span in the row
    num = row.find_all('span', attrs={"class":'results_text'})[-2]
    print(num.text)
    print('---')

BAR20170009735      
---
BAR20170013061      
---
BAR20160014463      
---
BAR20170009706      
---
BAR20160024898      
---
BAR20170003858      
---
BAR20170017750      
---
BAR20170001067      
---
BAR20170015712      
---
BAR20160026976      
---
BAR20170004945      
---
BAR20170005752      
---
BAR20170008862      
---
BAR20170009706      
---
BAR20170009735      
---
BAR20170010211      
---
BAR20170015711      
---
BAR20170005607      
---
BAR20170012408      
---
BAR20160015455      
---
BAR20170004000      
---
BAR20170004622      
---
BAR20170009953      
---
BAR20160019178      
---
BAR20170003998      
---
BAR20170005585      
---
BAR20170004247      
---
BAR20170004644      
---
BAR20170001084      
---
BAR20170003233      
---
BAR20170007267      
---
BAR20170004607      
---
BAR20170004726      
---
BAR20170000258      
---
BAR20170000872      
---
BAR20170000888      
---
BAR20170004000      
---
BAR20170001296      
---
BAR20170001765      
---
BAR20160000930      
---
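
As a cross-check that does not depend on span positions, the complaint number can also be pulled out of the row text with a regular expression. This is a different technique from the one used in the cell above, and it assumes every complaint number is `BAR` followed by digits, as in the output shown.

```python
import re

# Start of the first data row's text, copied from the notebook output
row_text = (" MONTES DE OCA, REINIER  Company: LA BENDICION City: HOUSTON "
            "County: HARRIS Zip Code: 77072  License: Not LicensedComplaint # "
            "BAR20170009735       Date: 5/24/2017")

# Capture the BAR-prefixed number that follows the 'Complaint #' label
match = re.search(r'Complaint #\s*(BAR\d+)', row_text)
complaint_number = match.group(1) if match else None
print(complaint_number)
```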


## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number

Create a new dictionary for each `tr` (except the header).

In [58]:
barbers_list = []
# Skip the header row, which has no name, description, or complaint number
for line in doc.find_all('tr')[1:]:
    barbers_dict = {}
    # The first results_text span in the row holds the name
    name = line.find('span', attrs={"class":'results_text'})
    if name:
        barbers_dict['name'] = name.text.strip()
    # The third td holds the violation description
    desc = line.find_all('td')[2]
    if desc:
        barbers_dict['desc'] = desc.text.strip()
    # The second-to-last results_text span holds the complaint number
    num = line.find_all('span', attrs={"class":'results_text'})[-2]
    if num:
        barbers_dict['num'] = num.text.strip()
    barbers_list.append(barbers_dict)
    print(barbers_dict)
    print("--------")

{'name': 'MONTES DE OCA, REINIER', 'desc': 'Respondent performed barbering without the required license.', 'num': 'BAR20170009735'}
--------
{'name': 'ALFORD, RAYMOND', 'desc': 'Respondent performed barbering without the required license.', 'num': 'BAR20170013061'}
--------
{'name': 'CHAPMAN, JESSICA', 'desc': "Respondent failed to electronically submit to the Department at least one time per month student's accrued hours.", 'num': 'BAR20160014463'}
--------
{'name': 'SALAZAR-ALVAREZ, SAMUEL', 'desc': 'Respondent performed barbering without the required license.', 'num': 'BAR20170009706'}
--------
{'name': 'GONZALES, DAVID', 'desc': 'Respondent leased space in a barber shop to an individual who engaged in the practice of barbering but had not obtained a barber license.', 'num': 'BAR20160024898'}
--------
{'name': 'FLORES, CHRISTOPHER', 'desc': 'Respondent leased space in a barber shop to an individual who engaged in the practice of barbering but had not obtained a barber license.', 'nu

--------
{'name': 'BOYKIN, BRANDON W', 'desc': 'Respondent performed barbering without the required license.', 'num': 'BAR20140006139'}
--------
{'name': 'MEDINA, JENIFER', 'desc': 'Respondent leased space in a barber shop to an individual who engaged in the practice of barbering but had not obtained a barber license.', 'num': 'BAR20140016318'}
--------
{'name': 'METOYER, SHAWN M', 'desc': "The Respondent's license was revoked upon Respondent's imprisonment in a penitentiary.", 'num': 'BAR20140017873'}
--------
{'name': 'PADRON, ESMERALDA', 'desc': 'Respondent leased space in a barber shop to an individual who engaged in the practice of barbering but had not obtained a barber license.', 'num': 'BAR20140017648'}
--------
{'name': 'WILLIAMS, DERRICK D', 'desc': "The Respondent's license was revoked upon Respondent's imprisonment in a penitentiary.", 'num': 'BAR20140019563'}
--------
{'name': 'RANDALL, ROGER', 'desc': 'Respondent leased space in a barber shop to an individual who engaged 
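
One possible way to package the same logic is a function that returns `None` for rows it cannot parse, so the header row does not need special slicing. The function name `parse_row` and the sample HTML are hypothetical; the markup is a simplified stand-in for the real page.

```python
from bs4 import BeautifulSoup

def parse_row(row):
    """Return a dict for one tr, or None when the row lacks the expected cells."""
    cells = row.find_all('td')
    spans = row.find_all('span', class_='results_text')
    if len(cells) < 3 or len(spans) < 2:
        return None  # header row or an unexpected layout
    return {
        'name': spans[0].text.strip(),   # first span in the row
        'desc': cells[2].text.strip(),   # third td
        'num': spans[-2].text.strip(),   # second-to-last span
    }

# Hypothetical, simplified markup: one header row and one data row
sample_html = """
<table>
  <tr><th>Name</th><th>Order</th><th>Basis</th></tr>
  <tr>
    <td><span class="results_text">DOE, JANE</span>
        <span class="results_text">BAR00000000001</span>
        <span class="results_text">1/1/2017</span></td>
    <td>Penalty text</td>
    <td>Respondent performed barbering without the required license.</td>
  </tr>
</table>
"""

rows = BeautifulSoup(sample_html, 'html.parser').find_all('tr')
barbers = [parsed for parsed in map(parse_row, rows) if parsed is not None]
print(barbers)
```

Returning `None` instead of raising an error keeps one odd row from stopping the whole loop, at the cost of silently skipping it.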

### Save that to a CSV

In [52]:
import pandas as pd

In [60]:
df = pd.DataFrame(barbers_list)
df.head()

| | desc | name | num |
|---|---|---|---|
| 0 | Respondent performed barbering without the req... | MONTES DE OCA, REINIER | BAR20170009735 |
| 1 | Respondent performed barbering without the req... | ALFORD, RAYMOND | BAR20170013061 |
| 2 | Respondent failed to electronically submit to ... | CHAPMAN, JESSICA | BAR20160014463 |
| 3 | Respondent performed barbering without the req... | SALAZAR-ALVAREZ, SAMUEL | BAR20170009706 |
| 4 | Respondent leased space in a barber shop to an... | GONZALES, DAVID | BAR20160024898 |


In [61]:
df.to_csv("barbers_list.csv", index=False)

### Open the CSV file and examine the first few. Make sure you didn't save an extra weird unnamed column.

OK! done
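
A small sketch of where the extra unnamed column comes from. It writes a one-row DataFrame to an in-memory buffer instead of a file, and the record in it is hypothetical. The helper name `columns_after_round_trip` is also made up for this example.

```python
import io
import pandas as pd

# Hypothetical record with the same keys as barbers_list
rows = [{'name': 'DOE, JANE',
         'desc': 'Respondent performed barbering without the required license.',
         'num': 'BAR00000000001'}]
df = pd.DataFrame(rows)

def columns_after_round_trip(frame, **to_csv_kwargs):
    """Write the frame as CSV to memory, read it back, and return the column names."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return list(pd.read_csv(buffer).columns)

with_index = columns_after_round_trip(df)                  # the index is saved as a column
without_index = columns_after_round_trip(df, index=False)  # only the data columns are saved
print(with_index)
print(without_index)
```

For the real file, `pd.read_csv("barbers_list.csv").head()` shows whether an `Unnamed: 0` column slipped in.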