# Fixing up a house in Louisiana

I think Louisiana has a lot fewer laws than NYC, but fixing up a house there is (probably) still tough.

**Topics:**

* PDF to CSV
* Secret APIs
* Scraping or regex
* Data cleaning

## Arborists `4 points`

If you need someone to cut down a tree, you might need an arborist!

* Fortunately, Louisiana has [a list of them](https://www.ldaf.state.la.us/ldaf-programs/horticulture-programs/louisiana-horticulture-commission/).
* Unfortunately [it's a PDF](http://www.ldaf.state.la.us/wp-content/uploads/2021/08/ARL-LIST-080421.pdf).

### Convert the PDF to a CSV file or dataframe

In [23]:
import camelot
import pandas as pd
import re

In [2]:
tables = camelot.read_pdf('http://www.ldaf.state.la.us/wp-content/uploads/2021/08/ARL-LIST-080421.pdf', flavor='stream', pages='1-end')

In [3]:
pages = [table.df for table in tables]
df = pd.concat(pages,ignore_index=True)

In [4]:
df.head(10)

Unnamed: 0,0,1,2,3,4,5
0,,,ARBORIST,,,
1,,The Louisiana Horticulture Law states that no ...,,,,
2,,"appropriate license or permit, or is employed ...",,,,
3,required to place at least one of their licens...,,,,,
4,Arborist License,,,,,
5,,Authorizes the holder to make recommendations ...,,,,
6,,specifying work to be done and sum to be paid....,,,,
7,,,This list will be updated periodically. If you...,,,
8,Parish,Name,Bus Phone\nPlace of Business,City,St,Zip
9,Acadia,"BROUSSARD, LISA C",(337) 783-9390 GERALD'S LANDSCAPE & LAWN SERVI...,MORSE,LA,70559


In [5]:
cleandf = df[8:]
cleandf.reset_index()

Unnamed: 0,index,0,1,2,3,4,5
0,8,Parish,Name,Bus Phone\nPlace of Business,City,St,Zip
1,9,Acadia,"BROUSSARD, LISA C",(337) 783-9390 GERALD'S LANDSCAPE & LAWN SERVI...,MORSE,LA,70559
2,10,Acadia,"DUPRE, ADAM ANTHONY",(337) 580-1021 ADTEK ENTERPRISES LLC,IOTA,LA,70543
3,11,Acadia,"HARGRAVE, DOUGLAS LEON",(337) 523-6123 DOUGLAS HARGRAVE,IOTA,LA,70543
4,12,Acadia,"LELEUX, CHARLOTTE R",(337) 784-0035\nLELEUX'S CUT & HAUL LLC,CROWLEY,LA,70526
...,...,...,...,...,...,...,...
635,643,West Baton Rouge,"PREJEAN, JEREMY PAUL",(225) 445-5568 MARMAK ENTERPRISES LLC,PORT ALLEN,LA,70767
636,644,West Carroll,"HARRISON, JACOB GLENN",(318) 884-6011 HARRISON TREE SERVICE,OAK GROVE,LA,71263
637,645,West Feliciana,"SPINKS, JAMES T","(225) 635-3840 SPINKS CONSTRUCTION, INC.",ST. FRANCISVILLE,LA,70775
638,646,Winn,"ALLEN, DON",(318) 268-1565 DON ALLEN LOGGING,BRADLEY,AR,71826


In [6]:
cleandf.columns = cleandf.iloc[0]
cleandf = cleandf[1:]
cleandf.reset_index()

8,index,Parish,Name,Bus Phone\nPlace of Business,City,St,Zip
0,9,Acadia,"BROUSSARD, LISA C",(337) 783-9390 GERALD'S LANDSCAPE & LAWN SERVI...,MORSE,LA,70559
1,10,Acadia,"DUPRE, ADAM ANTHONY",(337) 580-1021 ADTEK ENTERPRISES LLC,IOTA,LA,70543
2,11,Acadia,"HARGRAVE, DOUGLAS LEON",(337) 523-6123 DOUGLAS HARGRAVE,IOTA,LA,70543
3,12,Acadia,"LELEUX, CHARLOTTE R",(337) 784-0035\nLELEUX'S CUT & HAUL LLC,CROWLEY,LA,70526
4,13,Acadia,"MOUTON, MARY LOUISE",(337) 250-0226\n DARBY'S TREE SERVICE,CHURCHPOINT,LA,70525
...,...,...,...,...,...,...,...
634,643,West Baton Rouge,"PREJEAN, JEREMY PAUL",(225) 445-5568 MARMAK ENTERPRISES LLC,PORT ALLEN,LA,70767
635,644,West Carroll,"HARRISON, JACOB GLENN",(318) 884-6011 HARRISON TREE SERVICE,OAK GROVE,LA,71263
636,645,West Feliciana,"SPINKS, JAMES T","(225) 635-3840 SPINKS CONSTRUCTION, INC.",ST. FRANCISVILLE,LA,70775
637,646,Winn,"ALLEN, DON",(318) 268-1565 DON ALLEN LOGGING,BRADLEY,AR,71826


In [7]:
finaldf = cleandf[cleandf['Parish'] != '08/04/2021']

In [8]:
finaldf.to_csv('arborists.csv')
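
The `Bus Phone\nPlace of Business` column still packs the phone number and the business name into one cell, sometimes separated by a space and sometimes by a newline. Here is a minimal sketch of pulling them apart with a regular expression, assuming every phone number follows the `(337) 783-9390` shape seen in the output above. `split_phone_business` is a hypothetical helper, not part of the original notebook:

```python
import re

# "(ddd) ddd-dddd" at the start of the cell, then whatever follows it
PHONE_RE = re.compile(r'^\s*(\(\d{3}\)\s*\d{3}-\d{4})\s*(.*)$', re.DOTALL)

def split_phone_business(cell):
    """Return (phone, business); phone is None if no number is found."""
    match = PHONE_RE.match(cell)
    if match is None:
        return None, cell.strip()
    phone, business = match.groups()
    return phone, business.strip()

# Cell values copied from the table output above
print(split_phone_business("(337) 580-1021 ADTEK ENTERPRISES LLC"))
print(split_phone_business("(337) 784-0035\nLELEUX'S CUT & HAUL LLC"))
print(split_phone_business("(337) 250-0226\n DARBY'S TREE SERVICE"))
```

On the real dataframe, the same function could be applied to that column with `Series.apply`.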

### Get me a list of every arborist in Shreveport

In [9]:
finaldf['Name'].sort_values().unique()

array(['ABBOTT, DAVID EUGENE', 'ADAMS, JOHN C', 'ADAMS, NATHAN B',
       'ALEGRIA JR, ANDRES ABELINO', 'ALELLO, MATTHEW PAUL',
       'ALLEN JR, JACK E', 'ALLEN, DERRICK S', 'ALLEN, DON',
       'ALUGAS, EURICKA LOWE', 'ALVAREZ, ANDER J',
       'ANDERSON, ANGELA DENISE', 'ANTHONY JR, TED WAYNE',
       'ATKINS III, REGINALD B', 'ATKINS JR, REGINALD B',
       'ATKINS, RANDALL KENNON', 'ATKINS, SANDRA B',
       'AUCOIN, BRIAN JASON', 'AUDIRSCH, JOSEPH MICHEAL',
       'AUTHEMENT, KATIE LYNNE', 'AYRES, LOYE STILLMON',
       'BABIN, CARL ANTHONY', 'BADON, MARK JASON', 'BAKER, DONALD J',
       'BALLARD JR, JIMMIE LANE', 'BALLARD, ERIC D', 'BANNON, DAVID C',
       'BARBER, KELLEY R', 'BARRETT JR, ANTHONY T', 'BARRON, JACK E',
       'BARZE SR, SYMENTRIC L ANTHONY', 'BATES, HOUSTON JAKE',
       'BATSON, MICHELE', 'BATTY JR, JAMES N', 'BEENE, GREGORY WADE',
       'BELLANGER, RAY JOSEPH', 'BELLINGTON, JEFFERY JOHN',
       'BENNETT, BRETT R', 'BENNETT, DAARON DAURAY',
       'BENOIT, D
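
The cell above returns every name in the file. To narrow it to one city, a sketch like the following might work, assuming the `City` column is spelled consistently apart from case and stray spaces. The sample rows are copied from the table output earlier, and `names_in_city` is a hypothetical helper:

```python
import pandas as pd

# A few rows copied from the arborist table shown earlier
sample = pd.DataFrame({
    'Name': ['BROUSSARD, LISA C', 'DUPRE, ADAM ANTHONY',
             'HARGRAVE, DOUGLAS LEON', 'LELEUX, CHARLOTTE R'],
    'City': ['MORSE', 'IOTA', 'IOTA', 'CROWLEY'],
})

def names_in_city(df, city):
    """Return the sorted, de-duplicated names whose City matches `city`."""
    # Normalize both sides so 'Shreveport' and ' SHREVEPORT ' compare equal
    mask = df['City'].str.strip().str.upper() == city.strip().upper()
    return sorted(df.loc[mask, 'Name'].unique().tolist())

print(names_in_city(sample, 'iota'))
```

With the full table, `names_in_city(finaldf, 'Shreveport')` would give the list the heading asks for.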

## Contractors `2 points`

We skipped the arborists and tore down the tree. Unfortunately, it landed on our house! So now we need a contractor to fix it up. Luckily there's an [easy contractor search page](https://lslbc.louisiana.gov/contractor-search/search-type-contractor/) on the Louisiana government website.

### Get me a list of all residential roofing contractors

Convert it to a dataframe, save it as a CSV. Note that you **won't be scraping** for this one. You might *start* it as scraping, but you'll find an API.

In [10]:
import requests

cookies = {
    '__wpdm_client': 'f3fe9a403ef29b3eab65e52b6a8523dc',
    '_ga': 'GA1.2.642528063.1630955302',
    '_gid': 'GA1.2.177540430.1630955302',
    'PHPSESSID': 'ed890e49ef89d4cc2ecf3ef7867e0df7',
}

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'pt-BR,pt;q=0.8,en-US;q=0.5,en;q=0.3',
    'X-Requested-With': 'XMLHttpRequest',
    'Alt-Used': 'lslbc.louisiana.gov',
    'Connection': 'keep-alive',
    'Referer': 'https://lslbc.louisiana.gov/contractor-search/search-type-contractor/',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
    'TE': 'trailers',
}

params = (
    ('api_action', 'advanced'),
    ('contractor_type', 'Residential License'),
    ('classification', '3966,3971'),
    ('action', 'api_actions'),
)

response = requests.get('https://lslbc.louisiana.gov/wp-admin/admin-ajax.php', headers=headers, params=params, cookies=cookies)
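
The cookies and most of the headers appear to have been copied straight from the browser session; the query parameters are what describe the search itself. As a rough sketch of what the request boils down to, the URL can be rebuilt with the standard library. Whether the endpoint still answers without the copied cookies and headers is an assumption that would need checking, and `build_search_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

API_URL = 'https://lslbc.louisiana.gov/wp-admin/admin-ajax.php'

def build_search_url(contractor_type, classification):
    """Build the admin-ajax search URL from the parameters used above."""
    params = {
        'api_action': 'advanced',
        'contractor_type': contractor_type,
        'classification': classification,  # comma-separated classification ids
        'action': 'api_actions',
    }
    # urlencode escapes spaces and commas so the query string is valid
    return f'{API_URL}?{urlencode(params)}'

url = build_search_url('Residential License', '3966,3971')
print(url)
```

Opening the printed URL in a browser may be a quick way to look at the raw JSON before loading it into pandas.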

In [11]:
df = pd.DataFrame.from_dict(response.json()['results'])

df

Unnamed: 0,city,company_name,id,qualifying_party,state
0,Baton Rouge,"A. S. Gomez, Inc.",259104,,LA
1,Houston,"Acadian Roofing, LLC",298888,,TX
2,BATON ROUGE,AGR Construction LLC,283441,,LA
3,Covington,"ALL AROUND ROOFING, LLC",293664,,LA
4,Baton Rouge,Alvarez Roofing L.L.C.,242739,,LA
...,...,...,...,...,...
81,Baton Rouge,"V & V Roofing & Sheet Metal, L.L.C.",219384,,LA
82,Shreveport,Vigil Construction LLC,305587,,LA
83,Baton Rouge,"VR&C, LLC",303865,,LA
84,Denham Springs,"Welner's Construction, LLC",297583,,LA


In [12]:
df.to_csv('contractors.csv')

## Chiropractors

We tried to do the work ourselves and broke all of our bones. Guess we need a chiropractor! Thank goodness there's [a list of them on the Louisiana chiropractor board site](http://www.lachiropracticboard.com/lic-drs.htm).

### Create a dataframe of all of the chiropractors' names `5 points`

I'll give you this for free: the HTML on that page is awful, so make sure you're using `html5lib` as your parser. Otherwise it's going to be a lot more difficult!

I'll also give you the option of doing this by **pulling the name lists from the page** and parsing them with regular expressions. It's up to you! It's a serious pain either way.

> Also: they broke the page when they last updated it, so you might want to scrape [the archive.org version](https://web.archive.org/web/20210112131408/http://www.lachiropracticboard.com/lic-drs.htm) instead.

In [203]:
from bs4 import BeautifulSoup
import requests

result = requests.get('https://web.archive.org/web/20210112131408/http://www.lachiropracticboard.com/lic-drs.htm')
soup = BeautifulSoup(result.content, 'html5lib')
tables = soup.find_all('table')

In [204]:
# Get the names
rows = soup.find_all("td", {"width" : "80%"})[1].find_all("td", {"width" : "64%"})

# Collect the text of every <p> in the name cells
# (named raw_names so it doesn't shadow the built-in `list`)
raw_names = []

for item in rows:
    for name in item.find_all('p'):
        raw_names.append(name.text)

In [205]:
# Remove items matching some terms
trash = ['\xa0','Return To Search List','Doctors of Chiropractic']
names = [ name for name in raw_names[3:-1] if name not in trash ]

In [206]:
# Remove items containing some terms
names = [word for word in names if not any(bad in word for bad in trash)]

In [207]:
# Clean the names (dots in "D.C." are escaped so they only match a literal period)
names = [re.sub(r'\n|D\.\s?C\.\s?|\s\s| \. | III,', '', i) for i in names]
names = [name.strip(' ') for name in names]
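
In a regular expression an unescaped `.` matches any character, so a pattern meant to strip a literal `D.C.` suffix can also eat letters out of a surname. A small sketch of the difference, using a made-up name (`DUCOTE, Amy D.C.` is hypothetical, not taken from the board's list):

```python
import re

raw = 'DUCOTE, Amy D.C.'  # hypothetical example string

# Unescaped dots: "D", any character, "C", any character
loose = re.sub(r'D.\s?C.\s?', '', raw)

# Escaped dots: only a literal "D.C." (or "D. C.") is removed
strict = re.sub(r'D\.\s?C\.\s?', '', raw)

print(repr(loose))   # 'TE, Amy '
print(repr(strict))  # 'DUCOTE, Amy '
```

The loose pattern matches `DUCO` at the start of the surname as well as the real suffix, which is why the first result loses half the last name.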

In [208]:
names = [name.replace(" , ", "") for name in names]

In [209]:
# Remove empty items
names = [name for name in names if name]

In [210]:
df_chiro = pd.DataFrame(names,columns=['FullName'])

df_chiro

Unnamed: 0,FullName
0,"ABSHIRE,Jean-Paul"
1,"ABSHIRE,Rowdy C."
2,"ACCARDO,Casey Patrick"
3,"ALLEN,Peggy Alayna"
4,"ALLIGOOD, Jr.,Manfred Duval"
...,...
759,"ZAHN,Robert L."
760,"ZAKREWSKI,Edward W."
761,"ZEAGLER,Jon Eric"
762,"ZHANG,Yuwei"


### Add columns for their names: first, last, middle `1 point`

In [227]:
# Strip "Jr." suffixes and stray punctuation so every name reads "LAST,First Middle"
cleaned = df_chiro['FullName']
for old, new in [("Jr. ,", ""), (" Jr.,", ""), (" . ", ""), (", ", ","), (",,", ",")]:
    cleaned = cleaned.str.replace(old, new, regex=False)
df_chiro['FullNameClean'] = cleaned

df_chiro.head(30)

Unnamed: 0,FullName,FullNameClean,LastName,FirstName,MiddleName
0,"ABSHIRE,Jean-Paul","ABSHIRE,Jean-Paul",ABSHIRE,Jean-Paul,
1,"ABSHIRE,Rowdy C.","ABSHIRE,Rowdy C.",ABSHIRE,Rowdy,C.
2,"ACCARDO,Casey Patrick","ACCARDO,Casey Patrick",ACCARDO,Casey,Patrick
3,"ALLEN,Peggy Alayna","ALLEN,Peggy Alayna",ALLEN,Peggy,Alayna
4,"ALLIGOOD, Jr.,Manfred Duval","ALLIGOOD,Manfred Duval",ALLIGOOD,Manfred,Duval
5,"ANCAR,Kristin Patrice","ANCAR,Kristin Patrice",ANCAR,Kristin,Patrice
6,"ANDERSON, . Amy Allison","ANDERSON,Amy Allison",ANDERSON,Amy,Allison
7,"ANDRY,Derek J.","ANDRY,Derek J.",ANDRY,Derek,J.
8,"ANGELY,Timothy Wright","ANGELY,Timothy Wright",ANGELY,Timothy,Wright
9,"ANTHON, Jr.,George C.","ANTHON,George C.",ANTHON,George,C.


In [228]:
fname = df_chiro['FullNameClean'].str.split(',')

df_chiro['LastName'] = fname.str[0]
df_chiro['FirstName'] = fname.str[1].str.split(' ').str[0]
df_chiro['MiddleName'] = fname.str[1].str.split(' ').str[1]
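
Splitting on spaces this way keeps only the second word as the middle name and depends on the earlier suffix cleanup having caught every case. A plain-Python sketch of a slightly more tolerant splitter follows, assuming names follow the `LAST, suffix,First Middle` pattern seen in the output above. `split_name` and the `SUFFIXES` set are hypothetical, not part of the original notebook:

```python
SUFFIXES = {'JR', 'SR', 'II', 'III', 'IV'}  # assumed set of suffixes

def split_name(full_name):
    """Return (last, first, middle, suffix) from 'LAST, Jr.,First Middle'."""
    # Split on commas, then trim whitespace and stray leading dots
    parts = [p.strip().lstrip('. ') for p in full_name.split(',')]
    parts = [p for p in parts if p]
    if not parts:
        return '', '', '', ''
    last, suffix, given = parts[0], '', []
    for part in parts[1:]:
        if part.rstrip('.').upper() in SUFFIXES:
            suffix = part            # e.g. "Jr."
        else:
            given.extend(part.split())
    first = given[0] if given else ''
    middle = ' '.join(given[1:])     # keeps multi-word middle names together
    return last, first, middle, suffix

# Names copied from the dataframe output above
for name in ['ABSHIRE,Jean-Paul', 'ALLIGOOD, Jr.,Manfred Duval',
             'ANDERSON, . Amy Allison']:
    print(split_name(name))
```

Applied to the dataframe, the tuples could be expanded into columns with something like `df_chiro['FullName'].apply(split_name)`, which also removes the need for the separate `FullNameClean` step.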

In [229]:
df_chiro.sample(30)

Unnamed: 0,FullName,FullNameClean,LastName,FirstName,MiddleName
17,"AUCOIN,Mark Stephen","AUCOIN,Mark Stephen",AUCOIN,Mark,Stephen
492,"MITCHELL,Ryan Joshua","MITCHELL,Ryan Joshua",MITCHELL,Ryan,Joshua
667,"STEFFINS,Daniel F.","STEFFINS,Daniel F.",STEFFINS,Daniel,F.
121,"CHAUVIN,Edward R.","CHAUVIN,Edward R.",CHAUVIN,Edward,R.
671,"STRATTON,Angelle","STRATTON,Angelle",STRATTON,Angelle,
668,"STEINERT,Kent","STEINERT,Kent",STEINERT,Kent,
82,"BROWER,Stephen C.","BROWER,Stephen C.",BROWER,Stephen,C.
81,"BROUSSARD,Andre A.","BROUSSARD,Andre A.",BROUSSARD,Andre,A.
637,"SHOEMAKER,James L.","SHOEMAKER,James L.",SHOEMAKER,James,L.
5,"ANCAR,Kristin Patrice","ANCAR,Kristin Patrice",ANCAR,Kristin,Patrice
