# Texas Barber Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for barbers in Houston!

## Preparation: Knowing your tags

These questions are the same for every data set, though they might not fit yours exactly.

### What is the tag and class name for every row of data?

In [None]:
# Tag is tr; the loops below select rows with doc.find_all('tr'), so no class is needed

### What is the tag and class name for every person's name?

In [None]:
# Tag is span and class is results_text 

### What is the tag and class name for the violation number?

In [None]:
# Tag is span and class is results_text

### What is the tag and class name for the description of their violation?

In [None]:
# Tag is td

## Setup: Import what you'll need to scrape the page

Use `requests`, not `urllib`.

In [1]:
from bs4 import BeautifulSoup
import requests

In [38]:
# Form fields submitted by the search page (barbers in Houston)
data = {
    'pht_status': 'BAR',
    'pht_lic': '',
    'pht_lnm': '',
    'pht_fnm': '',
    'pht_oth_name': '',
    'phy_city': 'HOUSTON',
    'phy_cnty': '-1',
    'phy_zip': '',
    'B1': 'Search'
}

url = "https://www.tdlr.texas.gov/cimsfo/fosearch_results.asp"

# Headers copied from the browser's curl command
headers = {
    'Referer': 'https://www.tdlr.texas.gov/cimsfo/fosearch.asp',
    'Connection': 'keep-alive'
}
response = requests.post(url, data=data, headers=headers)
response.text

'\r\n<!DOCTYPE>\r\n<html lang = "en">\r\n<head>\r\n\t\t<!-- Meta Tags -->\r\n\t\t<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\r\n\t\t<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1" />\r\n\t\t<!-- Favicon -->\r\n\t\t<link rel="shortcut icon" href="/images/favicon.png" type="image/x-icon" />\r\n\t\t<!-- Stylesheets -->\r\n\t\t<link type="text/css" rel="stylesheet" href="/css/reset.css" />\r\n\t\t<link type="text/css" rel="stylesheet" href="/css/font-awesome.min.css" />\r\n\t\t<link type="text/css" rel="stylesheet" href="/css/animate.css" />\r\n\t\t<link type="text/css" rel="stylesheet" href="/css/main-stylesheet.css" />\r\n\t\t<link type="text/css" rel="stylesheet" href="/css/lightbox.css" />\r\n\t\t<link type="text/css" rel="stylesheet" href="/css/shortcodes.css" />\r\n\t\t<link type="text/css" rel="stylesheet" href="/css/custom-fonts.css" />\r\n\t\t<link type="text/css" rel="stylesheet" href="/css/custom-colors.css" />\r\n\t

## Try to scrape the page

To test if you requested the page correctly, save the BeautifulSoup document as `doc` and run the code `doc.find_all('tr')[1].text` to get the text of the second `<tr>` element, which is the first row of results (the `<tr>` at index 0 is the header).

- If the result starts with **Please enter at least one (1) parameter**, you were NOT successful.
- If the result starts with **MONTES DE OCA, REINIER**, you were successful.

### Try to request the page however you think you should.

"Try" to do it, because it *will not work.* Once you've learned that it won't work, you should **ask how to do it on the board**.

In [39]:
doc = BeautifulSoup(response.text, 'html.parser')
doc.find_all('tr')[1].text

' MONTES DE OCA, REINIER  Company: LA BENDICION City: HOUSTON County: HARRIS Zip Code: 77072  License: Not LicensedComplaint # BAR20170009735       Date: 5/24/2017Respondent is assessed an administrative penalty in the amount of $1,125. Respondent performed barbering without the required license.'

### Try to request the page with the correct data parameters

Secret tip: It still won't work. **Ask why not on the board.**

### What is the smallest `curl` command that still gives you a result?

In [None]:
# curl 'https://www.tdlr.texas.gov/cimsfo/fosearch_results.asp' -H 'Referer: https://www.tdlr.texas.gov/cimsfo/fosearch.asp' --data 'pht_status=BAR&phy_city=HOUSTON&phy_cnty=-1&B1=Search'

## Request the page with the correct data parameters AND the correct MINIMUM headers

This time it should work.
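
As a rough sketch of what "minimum headers" could look like, the snippet below builds the POST request without sending it, so the headers and form body can be inspected first. It assumes that `Referer` is the one header the server insists on, which the notebook does not confirm; the form fields are the ones used in the search cell earlier.

```python
import requests

URL = "https://www.tdlr.texas.gov/cimsfo/fosearch_results.asp"

# Form fields taken from the search cell earlier in this notebook
form_data = {
    "pht_status": "BAR",
    "pht_lic": "",
    "pht_lnm": "",
    "pht_fnm": "",
    "pht_oth_name": "",
    "phy_city": "HOUSTON",
    "phy_cnty": "-1",
    "phy_zip": "",
    "B1": "Search",
}

# Assumption: Referer alone is enough; add headers back if the request fails
minimal_headers = {"Referer": "https://www.tdlr.texas.gov/cimsfo/fosearch.asp"}

# Build the request without sending it, so it can be inspected offline
prepared = requests.Request(
    "POST", URL, data=form_data, headers=minimal_headers
).prepare()

print(prepared.method, prepared.url)
print(prepared.headers["Referer"])
print(prepared.body)  # the form-encoded fields

# To actually send it:
# response = requests.Session().send(prepared)
```

If the response still shows the "Please enter at least one (1) parameter" message, put the removed headers back one at a time until it works.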

## Scraping

### Loop through each `tr` and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen? I'm happy to help if you ask on the board.

In [46]:
names = doc.find_all('tr')
for name in names:
    actual_name = name.find('span', attrs={'class': 'results_text'})
    if actual_name is not None:
        print(actual_name.string)

MONTES DE OCA, REINIER 
ALFORD, RAYMOND 
CHAPMAN, JESSICA 
SALAZAR-ALVAREZ, SAMUEL 
GONZALES, DAVID 
FLORES, CHRISTOPHER 
ARMSTEAD, CEDRIC J
MORAH, PATRICK 
TREJO, BLADIMAR A
DAVIS, RICHARD D
HOPKINS, JOSHUA 
NINO, ROBERT 
HEATH, LOLETHA N
SALAZAR-ALVAREZ, SAMUEL 
MONTES DE OCA, REINIER 
MARLEN'S BEAUTY SALON LIC 747062
TOP STYLES BARBER SHOP
SUTTON, EMANUEL B
SHEPHARD, JAMES C
HERNANDEZ, MARIA DIOCELINA 
WILLIAMS, DONTUEL 
JOHNSON, JEFFERY J
PERFECTION BARBER & HAIR STUDIO
HUERTA, FRANCISCO 
TIPTON, SELINA I
ARREOLA, ERIC D
HARRISON, OTTO M
RIVERA TORRES, ANGEL D
PECK, MARVIN 
MOTA SOTO, CRISTIAN D
WADDLE, EDDIE D
SON, YOUNG J
HILL, BRIAN 
BROWN, DELRICK JAREL 
FRANKLIN, KELVIN 
LEDET, LEON 
WILLIAMS, DONTUEL 
LACY, JUSTIN J
MAKE THE CUT
ARELLANO, GREGORY F
MACEDO, ANTONIO 
MILLER, SHAWN ERIC 
HAYWARD, ABBIE DEAN 
BROWN, CHARLES EARL 
MCQUEEN, IDA M
MCQUEEN, IDA M
CAESAR, RON 
MORRIS, VICTOR B
NOLAN, CHRIS B
BICKHAM, DONNELL 
LOUIS, DIONNE N
HARRELL, KENTON D
SUBRAHMANIAN, CHITRA N
FR

### Loop through each `tr`, printing each violation description

- TIP: What is the container tag name for it?
- TIP: You'll get an error even if you're ALMOST right - which row is causing the problem?

In [141]:
violations = doc.find_all('tr')
for violation in violations:
    violation_data = violation.find_all('td') 
    for paragraph in violation_data:
        print(paragraph)
    print("----")

----
<td style="padding:4px; text-align:left; font-size:11px; font:Arial, Helvetica, sans-serif; width:22%;"><span class="results_text">MONTES DE OCA, REINIER </span><br/><br/> <span class="default_text">Company:</span> <span class="results_text">LA BENDICION</span><br/> <span class="default_text">City:</span> <span class="results_text">HOUSTON</span><br/> <span class="default_text">County:</span> <span class="results_text">HARRIS</span><br/> <span class="default_text">Zip Code:</span> <span class="results_text">77072</span><br/> <br/><br/><span class="default_text"> License:</span> <span class="results_text">Not Licensed</span><br/><br/><span class="default_text">Complaint #</span> <span class="results_text">BAR20170009735      </span></td>
<td style="padding:4px; text-align:left; font-size:11px; font:Arial, Helvetica, sans-serif; width:39%;"><span class="default_text">Date:</span> <span class="results_text">5/24/2017</span><br/><br/>Respondent is assessed an administrative penalty in

<td style="padding:4px; text-align:left; font-size:11px; font:Arial, Helvetica, sans-serif; width:39%;">Respondent failed to comply with an order previously issued by the Executive Director.</td>
----
<td style="padding:4px; text-align:left; font-size:11px; font:Arial, Helvetica, sans-serif; width:22%;"><span class="results_text">BATES, JEMOND D</span><br/> <span class="default_text">City:</span> <span class="results_text">HOUSTON</span><br/> <span class="default_text">County:</span> <span class="results_text">HARRIS</span><br/> <span class="default_text">Zip Code:</span> <span class="results_text">77014</span><br/><br/><br/><span class="default_text"> License #:</span> <span class="results_text">632822</span><br/><br/><span class="default_text">Complaint #</span> <span class="results_text">BAR20150009491      </span></td>
<td style="padding:4px; text-align:left; font-size:11px; font:Arial, Helvetica, sans-serif; width:39%;"><span class="default_text">Date:</span> <span class="results_

In [147]:
violations = doc.find_all('tr')
for violation in violations:
    cells = violation.find_all('td')
    # The header row has no <td> cells, so skip it
    if cells:
        # Assumption: the description is the last <td> in each row
        print(cells[-1].text.strip())

### Loop through each `tr`, printing the complaint number

- TIP: It should be the last piece of the first `td`

In [158]:
new_list = doc.find_all('tr')
for number in new_list:
    vio_number = number.find('td')
    print(vio_number)
    print("-----")

None
-----
<td style="padding:4px; text-align:left; font-size:11px; font:Arial, Helvetica, sans-serif; width:22%;"><span class="results_text">MONTES DE OCA, REINIER </span><br/><br/> <span class="default_text">Company:</span> <span class="results_text">LA BENDICION</span><br/> <span class="default_text">City:</span> <span class="results_text">HOUSTON</span><br/> <span class="default_text">County:</span> <span class="results_text">HARRIS</span><br/> <span class="default_text">Zip Code:</span> <span class="results_text">77072</span><br/> <br/><br/><span class="default_text"> License:</span> <span class="results_text">Not Licensed</span><br/><br/><span class="default_text">Complaint #</span> <span class="results_text">BAR20170009735      </span></td>
-----
<td style="padding:4px; text-align:left; font-size:11px; font:Arial, Helvetica, sans-serif; width:22%;"><span class="results_text">ALFORD, RAYMOND </span><br/> <span class="default_text">City:</span> <span class="results_text">HOUSTON</
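
The tip for this step says the complaint number is the last piece of the first `td`. A minimal sketch of that idea, run against a trimmed copy of one row from the output above: take the first `td`, collect its `results_text` spans, and keep the last one. The header row and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed sample row; the header row here is a hypothetical stand-in
sample_html = """
<table>
  <tr><th>hypothetical header row</th></tr>
  <tr>
    <td><span class="results_text">MONTES DE OCA, REINIER </span><br/>
        <span class="default_text">City:</span> <span class="results_text">HOUSTON</span><br/>
        <span class="default_text">Complaint #</span> <span class="results_text">BAR20170009735      </span></td>
    <td><span class="default_text">Date:</span> <span class="results_text">5/24/2017</span></td>
  </tr>
</table>
"""

sample_doc = BeautifulSoup(sample_html, "html.parser")

complaint_numbers = []
for row in sample_doc.find_all("tr"):
    first_cell = row.find("td")
    if first_cell is None:
        continue  # the header row has no <td>, so skip it
    spans = first_cell.find_all("span", class_="results_text")
    # The complaint number is the last results_text span in the first cell
    complaint_numbers.append(spans[-1].text.strip())

print(complaint_numbers)
```

Using `spans[-1]` rather than a fixed index matters because some rows have a "Company" line and others do not, so the number of spans varies from row to row.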

## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number

Create a new dictionary for each `tr` (except the header).
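
One possible shape for this step, sketched on a small sample row rather than the live page. It assumes the violation description is the text of the last `td` in each row and that the name and complaint number are the first and last `results_text` spans of the first `td`. The function name `parse_rows`, the dictionary keys, and the split of the text between the second and third cells are all hypothetical.

```python
from bs4 import BeautifulSoup

# Sample row; the header row and the exact cell split are assumptions
sample_html = """
<table>
  <tr><th>hypothetical header row</th></tr>
  <tr>
    <td><span class="results_text">MONTES DE OCA, REINIER </span><br/>
        <span class="default_text">Complaint #</span> <span class="results_text">BAR20170009735      </span></td>
    <td><span class="default_text">Date:</span> <span class="results_text">5/24/2017</span></td>
    <td>Respondent performed barbering without the required license.</td>
  </tr>
</table>
"""


def parse_rows(soup):
    """Return one dictionary per data row, skipping rows without <td> cells."""
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if not cells:
            continue  # header row
        spans = cells[0].find_all("span", class_="results_text")
        rows.append({
            "name": spans[0].text.strip(),
            "violation_description": cells[-1].text.strip(),
            "violation_number": spans[-1].text.strip(),
        })
    return rows


violations = parse_rows(BeautifulSoup(sample_html, "html.parser"))
print(violations)
```

On the real page you would pass in the `doc` built from the response instead of the sample.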

### Save that to a CSV
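
A minimal sketch using the standard library's `csv.DictWriter`, assuming a list of dictionaries shaped like the one described above. The file name `violations.csv`, the keys, and the sample record are placeholders.

```python
import csv

# Placeholder record; in the notebook this list comes from the scraping loop
violations = [
    {
        "name": "MONTES DE OCA, REINIER",
        "violation_description": "Respondent performed barbering without the required license.",
        "violation_number": "BAR20170009735",
    },
]

fieldnames = ["name", "violation_description", "violation_number"]

# newline="" stops the csv module from writing blank lines on Windows
with open("violations.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(violations)
```

If you build a pandas DataFrame instead, passing `index=False` to `DataFrame.to_csv` keeps the index from being written as an extra unnamed column.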

### Open the CSV file and examine the first few rows. Make sure you didn't save an extra weird unnamed column.
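
One way to run this check, sketched with `csv.DictReader` on an in-memory stand-in for the saved file (swap the `io.StringIO` object for `open(...)` on your real file). The column names and sample line are placeholders. An extra unnamed column would show up as an empty header name.

```python
import csv
import io
from itertools import islice

# Stand-in for the contents of the saved CSV file
csv_text = (
    'name,violation_description,violation_number\n'
    '"MONTES DE OCA, REINIER",Respondent performed barbering '
    'without the required license.,BAR20170009735\n'
)

reader = csv.DictReader(io.StringIO(csv_text))

# Look at the header and the first few rows only
print(reader.fieldnames)
first_rows = list(islice(reader, 5))
for row in first_rows:
    print(row)

# A stray index column would appear as a blank header name
blank_columns = [name for name in reader.fieldnames if not name.strip()]
print("blank column names:", blank_columns)
```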