# Texas Tow Trucks

We're going to scrape some [tow trucks in Texas](https://www.tdlr.texas.gov/tools_search/).

Try searching for the TDLR Number `006179570C`.

## Preparation: Knowing your tags

These questions are the same for every data set, so they might not fit this one exactly.

### What is the tag and class name for the business name?

In [1]:
# <tr> 
    # <td align="left" width="50%"><strong>Name:&nbsp;&nbsp;&nbsp;</strong>B.D. SMITH TOWING</td>
    # <td align="left" nowrap="nowrap" width="50%"><strong>DBA:&nbsp;&nbsp;&nbsp;<font color="red">NO DATA</font></strong></td>		
#</tr>

### What is the tag and class name for their phone number?

In [2]:
# <td><strong>Phone:</strong>&nbsp;&nbsp;&nbsp;8173330706<br><br></td>

### What is the tag and class name for their license status? (Expired, Active, Suspended)

In [3]:
# <td bgcolor="#FFF8DC"><strong>Status:</strong>&nbsp;&nbsp;<font color="green"><font color="green">Active</font></font></td>

### What is the tag and class name for the physical address?

In [4]:
# <td nowrap="nowrap"><strong>Carrier Type:</strong>&nbsp;&nbsp;Tow Truck Company
   # <br>
	   # <b>Number of Active Tow Trucks:</b> &nbsp; 0
           # <br><br>
           # <strong>Address Information</strong><br>
           # <strong>Mailing:</strong><br>13619 BRETT JACKSON RD <br>
			   # FORT WORTH,&nbsp;TX.&nbsp;76179
           # <br><br>
           # <strong>Physical:</strong><br>13619 BRETT JACKSON RD.<br>
			   # FORT WORTH,&nbsp;TX.&nbsp;76179
       # </td>

## Setup: Import what you'll need to scrape the page

Use `requests`, not `urllib`.

In [5]:
import requests

In [6]:
from bs4 import BeautifulSoup

## Scrape this page

Scrape this page and display:

- Business name
- Phone number
- License status
- Physical address

**You should know how to do `.post` requests by now.**

- *TIP: For physical address, **ask me on the board** and I'll give you a secret trick about situations like this.*

In [7]:
# If this cell raises errors or returns the wrong page, the cookie in the headers might have expired.

url_tow = "https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=006179570C"
data = {
    'namedata':'',
    'name_carrier_type':'COMPANY',
    'searchtype':'mcr',
    'mcrdata':'006179570C',
    'citydata':'',
    'city_status':'A',
    'city_carrier_type':'tow',
    'zipcodedata':'',
    'zip_status':'ALL',
    'zip_carrier_type':'all',
    'proc':'',
}
headers = {
    'Referer': 'https://www.tdlr.texas.gov/tools_search/',
    'Cookie': 'ASPSESSIONIDSGHSRQDT=EIFGAGBADBMHBNPKHMCEOBHG'
}

response_tow = requests.post(url_tow, data=data, headers=headers)
doc_tow = BeautifulSoup(response_tow.text, "html.parser")
doc_tow


<link href="style.css" rel="stylesheet" type="text/css"/>
<html>
<head><title>Towing Company Information</title></head>
<body onload="scrollTo(4000,4000)" topmargin="0">
<style media="screen" type="text/css">
	body {
		background-color: #CCCCCC;
		border-color: #000000;
		color: #2e2d2c;
		font-family: Arial, Helvetica, sans-serif;
		font-size: .9em;
		line-height: normal;
		margin: 0 0 0 0;
		padding: 0 0 0 0;
		text-align: center;
	}

	#outerWrapper {
		position: relative;
		background-color: #ffc;
		margin: 0 auto 0 auto; 
		width: 960px;
	}
	#outerWrapper #header {
		background-color: #dbc9a0;
		text-align: left; 
	}
	#outerWrapper #header h1 {
		background-color: #fff6d9;
		text-align: left; 
		color: #000000;
	}
	#outerWrapper #topNavigation {
		clear: left;
		margin: 0 0 0 0;
		line-height: 2em;
		border-bottom: 1px solid #ccc;
		background-color: #FFC;
		border-bottom: 1px solid #ccc;
		border-top: 1px solid #ccc;
	}
	#outerWrapper #main-nav {
		background-color: #FFC;
		font-
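
The hard-coded `Cookie` header above stops working once the session expires. One possible workaround is to let a `requests.Session` manage cookies instead of pasting one in. This is a minimal sketch: it assumes that loading the search page is enough to receive a fresh session cookie, which has not been verified against the site. The helper names `build_request` and `fetch_company_page` are made up for this example.

```python
import requests

SEARCH_URL = "https://www.tdlr.texas.gov/tools_search/"
DISPLAY_URL = "https://www.tdlr.texas.gov/tools_search/mccs_display.asp"


def build_request(tdlr_number):
    """Build (but do not send) the POST request for one TDLR number."""
    # Same form fields as the cell above, with the number filled in
    form = {
        "namedata": "",
        "name_carrier_type": "COMPANY",
        "searchtype": "mcr",
        "mcrdata": tdlr_number,
        "citydata": "",
        "city_status": "A",
        "city_carrier_type": "tow",
        "zipcodedata": "",
        "zip_status": "ALL",
        "zip_carrier_type": "all",
        "proc": "",
    }
    return requests.Request(
        "POST",
        DISPLAY_URL,
        params={"mcrnumber": tdlr_number},
        data=form,
        headers={"Referer": SEARCH_URL},
    )


def fetch_company_page(session, tdlr_number):
    """Send the request through a session so cookies are handled for us."""
    # Assumption: visiting the search page first sets the session cookie
    session.get(SEARCH_URL, timeout=30)
    prepared = session.prepare_request(build_request(tdlr_number))
    response = session.send(prepared, timeout=30)
    response.raise_for_status()
    return response.text
```

To use it, open a session with `requests.Session()` and pass it to `fetch_company_page` along with a TDLR number; the returned HTML can go straight into `BeautifulSoup`.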

In [8]:
doc_tow.find_all('td', attrs = {'width':'50%'})

[<td align="left" width="50%"><strong>Name:   </strong>B.D. SMITH TOWING</td>,
 <td align="left" nowrap="nowrap" width="50%"><strong>DBA:   <font color="red">NO DATA</font></strong></td>,
 <td align="left" width="50%"><a href="mccs_search.asp"><strong>New Search</strong></a></td>,
 <td align="right" width="50%"><a href="#top"><strong>Top of Page</strong></a>
 </td>]

In [9]:
business_name = doc_tow.find_all('td', attrs = {'width':'50%'})[0].strong.next.next.title()
business_name

'B.D. Smith Towing'

In [10]:
for td in doc_tow.find_all('td', attrs = {'width':'50%'}):
    if td.strong.next.strip() == 'Name:':
        business_name = td.strong.next.next
        break
print(business_name.title())

B.D. Smith Towing


In [11]:
phone_no = doc_tow.find('table', attrs = {'id':'t1'}).find_all('table')[2].find_all('td')[-2].strong.next.next.strip()
phone_no

'8173330706'

In [12]:
for tg in doc_tow.find('table', attrs = {'id':'t1'}).find_all('table')[2].find_all('td'):
    if tg.strong:
        if tg.strong.next.string == 'Phone:':
            phone_no = tg.strong.next.next.strip()
            break
print(phone_no)

8173330706


In [13]:
license_status = doc_tow.find_all('td', attrs = {'bgcolor':'#FFF8DC'})[-2].font.font.string
license_status

'Active'

In [14]:
for ti in doc_tow.find_all('td', attrs = {'bgcolor':'#FFF8DC'}):
    if ti.strong:
        if ti.strong.next == 'Status:':
            license_status = ti.font.font.string
            break
print(license_status)

Active


In [15]:
address = doc_tow.find_all('table')[-3].find_all('strong')[-1].next.next.next.strip()
# Swap non-breaking spaces for regular spaces so the city, state and ZIP stay separated
state = doc_tow.find_all('table')[-3].find_all('strong')[-1].next.next.next.next.next.replace("\xa0", " ").strip()
physical_address = address + ', '+ state
physical_address

'13619 BRETT JACKSON RD., FORT WORTH, TX. 76179'

In [16]:
for to in doc_tow.find_all('table')[-3].find_all('strong'):
    if to.next == 'Physical:':
        physical_addresses = to.next.next.next.strip() + ', ' + to.next.next.next.next.next.replace("\xa0", " ").strip()
        break
print(physical_addresses)

13619 BRETT JACKSON RD., FORT WORTH, TX. 76179
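
The chained `.next` calls work, but they depend on the exact number of nodes between the label and the address. A possibly sturdier variation is to find the `<strong>` label and walk its siblings until the next label. The sketch below runs on a trimmed copy of the HTML from the preparation section rather than the live page, and `text_after_label` is a made-up helper name.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the address cell shown in the preparation section
html = """<td nowrap="nowrap"><strong>Carrier Type:</strong>&nbsp;&nbsp;Tow Truck Company
<br><br>
<strong>Mailing:</strong><br>13619 BRETT JACKSON RD <br>
FORT WORTH,&nbsp;TX.&nbsp;76179
<br><br>
<strong>Physical:</strong><br>13619 BRETT JACKSON RD.<br>
FORT WORTH,&nbsp;TX.&nbsp;76179
</td>"""


def text_after_label(doc, label):
    """Join the loose text that follows a <strong> label, up to the next label."""
    strong = doc.find("strong", string=label)
    if strong is None:
        return None
    parts = []
    for sibling in strong.next_siblings:
        # Stop when the next <strong> label starts
        if getattr(sibling, "name", None) == "strong":
            break
        if isinstance(sibling, NavigableString):
            # Replace non-breaking spaces and collapse runs of whitespace
            text = " ".join(sibling.replace("\xa0", " ").split())
            if text:
                parts.append(text)
    return ", ".join(parts)


doc = BeautifulSoup(html, "html.parser")
physical = text_after_label(doc, "Physical:")
mailing = text_after_label(doc, "Mailing:")
print(physical)
```

The same helper returns `None` when a label is missing, which is easier to handle than an exception halfway through a chain of `.next` calls.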


## Getting information on many tow truck companies

### Reading in our source

Using pandas, read in `trucks-subset.csv`.

In [17]:
import pandas as pd

In [18]:
trucks_subset = pd.read_csv('trucks-subset.csv', converters = {'TDLR Number':str})
trucks_subset

| | TDLR Number |
|---|---|
| 0 | 006507931C |
| 1 | 006179570C |
| 2 | 006502097C |


In [19]:
trucks_subset.dtypes

TDLR Number    object
dtype: object

## Scrape every single row, storing the information we scraped before into your dataframe.

You probably want to open up the Jupyter Notebook that's about `.apply`.

In [20]:
def trucks_rows(row):
    url_tracks = "https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=" + row['TDLR Number']
    data_tracks = {
        'namedata':'',
        'name_carrier_type':'COMPANY',
        'searchtype':'mcr',
        'mcrdata': row['TDLR Number'],
        'citydata':'',
        'city_status':'A',
        'city_carrier_type':'tow',
        'zipcodedata':'',
        'zip_status':'ALL',
        'zip_carrier_type':'all',
        'proc':'',
    }
    headers_tracks = {
        'Referer': 'https://www.tdlr.texas.gov/tools_search/',
        'Cookie': 'ASPSESSIONIDSGHSRQDT=EIFGAGBADBMHBNPKHMCEOBHG'
        
    }
    
    response_tracks = requests.post(url_tracks, data=data_tracks, headers=headers_tracks)
    doc_tracks = BeautifulSoup(response_tracks.text, "html.parser")
    
    for td in doc_tracks.find_all('td', attrs = {'width':'50%'}):
        if td.strong.next.strip() == 'Name:':
            name_biz = td.strong.next.next
            break
    for tg in doc_tracks.find('table', attrs = {'id':'t1'}).find_all('table')[2].find_all('td'):
        if tg.strong:
            if tg.strong.next.string == 'Phone:':
                number_phone = tg.strong.next.next.strip()
                break
    for ti in doc_tracks.find_all('td', attrs = {'bgcolor':'#FFF8DC'}):
        if ti.strong:
            if ti.strong.next == 'Status:':
                status_lic = ti.font.font.string
                break
    for to in doc_tracks.find_all('table')[-3].find_all('strong'):
        if to.next == 'Physical:':
            address_phy = to.next.next.next.strip() + ', ' + to.next.next.next.next.next.replace("\xa0", " ").strip()
            break
    
    return pd.Series({
        'business_name' : name_biz,
        'phone_number' : number_phone,
         'license_status' : status_lic,
         'physical_address' : address_phy
    })


new_trucks_subset = trucks_subset.apply(trucks_rows, axis=1).join(trucks_subset)
new_trucks_subset

| | business_name | license_status | phone_number | physical_address | TDLR Number |
|---|---|---|---|---|---|
| 0 | AUGUSTUS E SMITH | Active | 9032276464 | 103 N MAIN ST, BONHAM, TX. 75418 | 006507931C |
| 1 | B.D. SMITH TOWING | Active | 8173330706 | 13619 BRETT JACKSON RD., FORT WORTH, TX. 76179 | 006179570C |
| 2 | BARRY MICHAEL SMITH | Active | 8066544404 | 4501 W CEMETERY RD, CANYON, TX. 79015 | 006502097C |
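
Inside `trucks_rows`, a variable such as `name_biz` is only assigned when its label is found, so a page with a missing field would raise an error and stop the whole `.apply`. One way to keep going is to wrap the row function so a failed row comes back as empty values. This is a sketch: `safe_row` is a made-up helper, and `fake_scrape` with its placeholder values stands in for the real scraper so the example runs without hitting the site.

```python
import pandas as pd

FIELDS = ["business_name", "phone_number", "license_status", "physical_address"]


def safe_row(scrape_one):
    """Wrap a row-scraping function so one bad page doesn't abort the apply."""
    def wrapper(row):
        try:
            return scrape_one(row)
        except Exception as error:
            # Report the problem row and return empty values for it
            print(f"Skipping {row['TDLR Number']}: {error!r}")
            return pd.Series({field: None for field in FIELDS})
    return wrapper


def fake_scrape(row):
    """Stand-in for trucks_rows; the values here are placeholders, not real data."""
    if row["TDLR Number"] == "BAD":
        raise AttributeError("label not found")
    return pd.Series({
        "business_name": "EXAMPLE TOWING",
        "phone_number": "5550000000",
        "license_status": "Active",
        "physical_address": "1 EXAMPLE ST, EXAMPLE CITY, TX. 00000",
    })


numbers = pd.DataFrame({"TDLR Number": ["006179570C", "BAD"]})
result = numbers.apply(safe_row(fake_scrape), axis=1).join(numbers)
print(result)
```

With the real scraper, the call would look like `trucks_subset.apply(safe_row(trucks_rows), axis=1)`, and the skipped rows can be retried afterwards.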


### Save your dataframe

In [21]:
new_trucks_subset.dtypes

business_name       object
license_status      object
phone_number        object
physical_address    object
TDLR Number         object
dtype: object

In [22]:
new_trucks_subset.to_csv('trucks-subset_expanded.csv', index=False)

### Re-open your dataframe to confirm you didn't save any extra weird columns

In [23]:
pd.read_csv('trucks-subset_expanded.csv')

| | business_name | license_status | phone_number | physical_address | TDLR Number |
|---|---|---|---|---|---|
| 0 | AUGUSTUS E SMITH | Active | 9032276464 | 103 N MAIN ST, BONHAM, TX. 75418 | 006507931C |
| 1 | B.D. SMITH TOWING | Active | 8173330706 | 13619 BRETT JACKSON RD., FORT WORTH, TX. 76179 | 006179570C |
| 2 | BARRY MICHAEL SMITH | Active | 8066544404 | 4501 W CEMETERY RD, CANYON, TX. 79015 | 006502097C |
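
One thing to watch when re-opening the file: every phone number in this subset is all digits, so a plain `pd.read_csv` infers an integer column even though it was saved as text. Passing `dtype=str` keeps the values as written. The sketch below uses an in-memory CSV with two columns from the output above instead of the saved file.

```python
import io

import pandas as pd

# Two rows shaped like the saved file (values taken from the output above)
csv_text = (
    "phone_number,TDLR Number\n"
    "9032276464,006507931C\n"
    "8173330706,006179570C\n"
)

inferred = pd.read_csv(io.StringIO(csv_text))
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred["phone_number"].dtype)  # an integer dtype
print(as_text.loc[0, "phone_number"])  # the original digits, kept as text
```

The same idea applies to the real file: `pd.read_csv('trucks-subset_expanded.csv', dtype=str)`.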


## Repeat this process for the entire `tow-trucks.csv` file

In [24]:
tow_trucks = pd.read_csv('tow-trucks.csv')
tow_trucks.head()

| | TDLR Number |
|---|---|
| 0 | 006507931C |
| 1 | 006179570C |
| 2 | 006502097C |
| 3 | 006494912C |
| 4 | 0649468VSF |


In [25]:
tow_trucks.dtypes

TDLR Number    object
dtype: object

In [26]:
def tow_trucks_rows(row):
    url_towtracks = "https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=" + row['TDLR Number']
    headers_towtracks = {
        'Referer': 'https://www.tdlr.texas.gov/tools_search/',
        'Cookie': 'ASPSESSIONIDSGHSRQDT=EIFGAGBADBMHBNPKHMCEOBHG'
        
    }
    
    response_towtracks = requests.post(url_towtracks, headers=headers_towtracks)
    doc_towtracks = BeautifulSoup(response_towtracks.text, "html.parser")
    
    for td in doc_towtracks.find_all('td', attrs = {'width':'50%'}):
        if td.strong.next.strip() == 'Name:':
            name_business = td.strong.next.next
            break
    for tg in doc_towtracks.find('table', attrs = {'id':'t1'}).find_all('table')[2].find_all('td'):
        if tg.strong:
            if tg.strong.next.string == 'Phone:':
                phone_num = tg.strong.next.next.strip()
                break
    for ti in doc_towtracks.find_all('td', attrs = {'bgcolor':'#FFF8DC'}):
        if ti.strong:
            if ti.strong.next == 'Status:':
                lic_status = ti.font.string
                break
    for to in doc_towtracks.find_all('table')[-3].find_all('strong'):
        if to.next == 'Physical:':
            phy_address = to.next.next.next.strip() + ', ' + to.next.next.next.next.next.replace("\xa0", " ").strip()
            break
    
    return pd.Series({
        'business_name' : name_business,
        'phone_number' : phone_num,
          'license_status' : lic_status,
        'physical_address' : phy_address
    })


tow_trucks_extended = tow_trucks.apply(tow_trucks_rows, axis=1).join(tow_trucks)
tow_trucks_extended

| | business_name | license_status | phone_number | physical_address | TDLR Number |
|---|---|---|---|---|---|
| 0 | AUGUSTUS E SMITH | Active | 9032276464 | 103 N MAIN ST, BONHAM, TX. 75418 | 006507931C |
| 1 | B.D. SMITH TOWING | Active | 8173330706 | 13619 BRETT JACKSON RD., FORT WORTH, TX. 76179 | 006179570C |
| 2 | BARRY MICHAEL SMITH | Active | 8066544404 | 4501 W CEMETERY RD, CANYON, TX. 79015 | 006502097C |
| 3 | HEATH SMITH | Expired | 940-552-0687 | 1529 WILBARGER ST, VERNON, TX. 76384 | 006494912C |
| 4 | HEATH SMITH | Expired | 9405520687 | 1529 WILBARGER ST, VERNON, TX. 76384 | 0649468VSF |
| 5 | HYSMITH AUTOMOTIVE | Active | 940-521-0294 | 1210 US 380 BYPASS, GRAHAM, TX. 76450 | 006448786C |
| 6 | HYSMITH AUTOMOTIVE & TRUCK REPAIR INC | Expired | 9405210294 | 927 LOVING HWY, GRAHAM, TX. 76450 | 0648444VSF |
| 7 | HYSMITH AUTOMOTIVE & TRUCK REPAIR INC | Active | 9405210294 | 1210 380 BYPASS, GRAHAM, TX. 76450 | 0651667VSF |
| 8 | JEFF & WENDY SMITH | Expired | 903 586 6234 | 10842 FM 2138 N, JACKSONVILLE, TX. 75766 | 006017767C |
| 9 | JEFF SMITH | Active | 8324354670 | 4338 HARVEY RD, CROSBY, TX. 77532 | 006495492C |
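
The scraped phone numbers arrive in several formats (`940-552-0687`, `9405520687`, `903 586 6234`), which makes them hard to compare or deduplicate. A small clean-up step could strip everything but the digits. This is a sketch, and `normalize_phone` is a made-up helper name.

```python
import re


def normalize_phone(raw):
    """Keep only the digits of a phone number string."""
    return re.sub(r"\D", "", raw)


# Formats taken from the scraped output above
samples = ["940-552-0687", "9405520687", "903 586 6234"]
cleaned = [normalize_phone(sample) for sample in samples]
print(cleaned)
```

On the dataframe this could be applied with `tow_trucks_extended['phone_number'].apply(normalize_phone)` before saving.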


In [27]:
tow_trucks_extended.dtypes

business_name       object
license_status      object
phone_number        object
physical_address    object
TDLR Number         object
dtype: object

In [28]:
tow_trucks_extended.to_csv('tow-trucks_extended.csv', index=False)

In [29]:
pd.read_csv('tow-trucks_extended.csv').head()

| | business_name | license_status | phone_number | physical_address | TDLR Number |
|---|---|---|---|---|---|
| 0 | AUGUSTUS E SMITH | Active | 9032276464 | 103 N MAIN ST, BONHAM, TX. 75418 | 006507931C |
| 1 | B.D. SMITH TOWING | Active | 8173330706 | 13619 BRETT JACKSON RD., FORT WORTH, TX. 76179 | 006179570C |
| 2 | BARRY MICHAEL SMITH | Active | 8066544404 | 4501 W CEMETERY RD, CANYON, TX. 79015 | 006502097C |
| 3 | HEATH SMITH | Expired | 940-552-0687 | 1529 WILBARGER ST, VERNON, TX. 76384 | 006494912C |
| 4 | HEATH SMITH | Expired | 9405520687 | 1529 WILBARGER ST, VERNON, TX. 76384 | 0649468VSF |


In [30]:
pd.read_csv('tow-trucks_extended.csv').shape

(20, 5)

In [31]:
pd.read_csv('tow-trucks_extended.csv').dtypes

business_name       object
license_status      object
phone_number        object
physical_address    object
TDLR Number         object
dtype: object