# Texas Tow Trucks (`.apply` and `requests`)

We're going to scrape some [tow trucks in Texas](https://www.tdlr.texas.gov/tools_search/).

Try searching for the TDLR Number `006179570C`.

## Preparation

> You don't need to actually find this using BeautifulSoup. This is more about being able to say "it's a td, it isn't special, but it looks like the third td in a tr with a class" or something like that.
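If you do want to check a description like that, a CSS selector can express it almost word for word. Here's a minimal sketch, assuming a made-up table where the row has `class="info"`. The real TDLR markup may look different.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only, not the real TDLR page
html = """
<table>
  <tr><td>header 1</td><td>header 2</td><td>header 3</td></tr>
  <tr class="info"><td>A</td><td>B</td><td>C</td></tr>
</table>
"""

doc = BeautifulSoup(html, 'html.parser')

# "The third td in a tr with a class" written as a CSS selector
third_td = doc.select_one('tr.info td:nth-of-type(3)')
print(third_td.get_text())
```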

### What is the URL you will be scraping?

In [1]:
# https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=006179570C

### When you search for information on a specific tow truck company, do you need form data? If so, what is your form data going to be?

In [2]:
# No form data: just the TDLR number in the URL,
# plus Referer and Cookie headers
# Referer: https://www.tdlr.texas.gov/tools_search/
# Cookie: ASPSESSIONIDSCHQSRDT=IHPBKFEDPMFEAFGGINJJOLHJ
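Since the TDLR number is the only thing that changes from page to page, the URL can be built from a fixed base plus a query string. Here's a minimal sketch using the standard library's `urllib.parse.urlencode`. `build_truck_url` is a hypothetical helper name, not something from the site.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.tdlr.texas.gov/tools_search/mccs_display.asp'

def build_truck_url(tdlr_number):
    # Hypothetical helper: the TDLR number goes in the mcrnumber query parameter
    return BASE_URL + '?' + urlencode({'mcrnumber': tdlr_number})

print(build_truck_url('006179570C'))
```

With `requests`, passing `params={'mcrnumber': ...}` does the same query-string encoding for you.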

## Scrape this page

Scrape this page, displaying:

- The business name
- Phone number
- License status
- Physical address

- *TIP: This one isn't very fun, but I have some secret tricks. **Ask me on the board**.*

In [3]:
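# The session cookie below expires: if the page comes back without
# the company details, copy a fresh Cookie value from your browser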
import requests
from bs4 import BeautifulSoup

url = 'https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=006179570C'
headers = {
    'Referer' : 'https://www.tdlr.texas.gov/tools_search/',
    'Cookie' : 'ASPSESSIONIDSCHQSRDT=MADCKFEDLJJKHEIBKMBGICEO'
}

response = requests.post(url, headers = headers)
doc = BeautifulSoup(response.text, 'html.parser')
doc



In [4]:
# Defaults, so the prints below don't raise NameError if a label is missing
business_name = phone_number = license_status = street = state = ''

tds = doc.find_all('td')
for td in tds:
    try:
        # The business name: the text right after <strong>Name:</strong>
        if td.strong.string.strip() == 'Name:':
            business_name = td.strong.next.next.string.strip().title()
        # Phone number
        if td.strong.string.strip() == 'Phone:':
            phone_number = td.strong.next.next.string.strip()
        # License status
        if td.strong.string.strip() == 'Status:':
            license_status = td.font.next.next.string.strip()
        # Physical address: a street line, then a city/state/ZIP line
        strongs = td.find_all('strong')
        for strong in strongs:
            if strong.string.strip() == 'Physical:':
                street = strong.next.next.next
                state = street.next.next
    except (AttributeError, TypeError):
        # Skip cells that don't have a <strong> label followed by text
        pass
print(business_name)
print(phone_number)
print(license_status)
print(street.strip().title(), state.strip().title())

B.D. Smith Towing
8173330706
Active
13619 Brett Jackson Rd. Fort Worth, Tx. 76179
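Chaining `.next.next` works, but it can break quietly if the page adds or removes a tag. One alternative is to find each `<strong>` label by its exact text and read the text node right after it. Here's a minimal sketch, assuming markup shaped like `<td><strong>Name:</strong> ...</td>`, which is a guess at the structure rather than a copy of the TDLR page. `get_field` is a hypothetical helper.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup, loosely modeled on label/value table cells
html = """
<table><tr>
  <td><strong>Name:</strong> B.D. SMITH TOWING</td>
  <td><strong>Phone:</strong> 8173330706</td>
</tr></table>
"""

doc = BeautifulSoup(html, 'html.parser')

def get_field(doc, label):
    # Find the <strong> tag whose text is exactly the label
    strong = doc.find('strong', string=label)
    if strong is None:
        return None
    # The value is the text node immediately after </strong>
    value = strong.next_sibling
    if isinstance(value, NavigableString):
        return value.strip()
    return None

print(get_field(doc, 'Name:').title())
print(get_field(doc, 'Phone:'))
print(get_field(doc, 'Fax:'))
```

A missing label returns `None` instead of leaving a variable undefined.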


# Using .apply to find data about SEVERAL tow truck companies

The file `trucks-subset.csv` has information about the trucks; we'll use it to find the pages to scrape.

### Open up `trucks-subset.csv` and save it into a dataframe

In [5]:
import pandas as pd
trucks_subs = pd.read_csv('trucks-subset.csv')
trucks_subs

|   | TDLR Number |
|---|---|
| 0 | 006507931C |
| 1 | 006179570C |
| 2 | 006502097C |


### Open up `trucks-subset.csv` in a text editor, then look at your dataframe. Is something different about them? If so, make them match.

- *TIP: I can help with this.*

In [6]:
trucks_subs = pd.read_csv('trucks-subset.csv', converters = {'TDLR Number':str})
trucks_subs

|   | TDLR Number |
|---|---|
| 0 | 006507931C |
| 1 | 006179570C |
| 2 | 006502097C |
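A common difference to look for is leading zeros. When a column looks numeric, pandas reads it as a number and drops them. The TDLR numbers here end in letters, so they stay strings either way, but an all-digit ID would lose its zeros. Here's a minimal sketch with made-up IDs and an in-memory CSV. `dtype=str` is an alternative to the `converters` approach above.

```python
import io
import pandas as pd

# Made-up all-digit IDs, for illustration only
csv_text = "TDLR Number\n006179570\n006507931\n"

as_numbers = pd.read_csv(io.StringIO(csv_text))
as_strings = pd.read_csv(io.StringIO(csv_text), dtype={'TDLR Number': str})

print(as_numbers['TDLR Number'].tolist())  # [6179570, 6507931]: zeros are gone
print(as_strings['TDLR Number'].tolist())  # ['006179570', '006507931']: zeros kept
```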


## Go through each row of the dataset, printing out the URL you will need to scrape for the information on that row

For example, `https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=006507931C`.

- *TIP: use .apply and a function*
- *TIP: If you need help with .apply, look at the "Using apply in pandas" notebook*

In [7]:
def urls_to_scrape(row):
    truck_url = 'https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber='+row['TDLR Number']
    print(truck_url)

trucks_subs.apply(urls_to_scrape, axis=1)

https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=006507931C
https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=006179570C
https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=006502097C


0    None
1    None
2    None
dtype: object
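Those `None` values are what `.apply` collected. `urls_to_scrape` only prints, and a Python function without a `return` returns `None`. Here's a minimal sketch on a two-row stand-in dataframe. `print_url` and `make_url` are hypothetical names.

```python
import pandas as pd

BASE_URL = 'https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber='
df = pd.DataFrame({'TDLR Number': ['006507931C', '006179570C']})

def print_url(row):
    print(BASE_URL + row['TDLR Number'])  # no return, so each result is None

def make_url(row):
    return BASE_URL + row['TDLR Number']  # the returned string is collected

printed = df.apply(print_url, axis=1)
returned = df.apply(make_url, axis=1)
print(printed.tolist())
print(returned.tolist())
```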

### Save this URL into a new column of your dataframe

- *TIP: Use a function and `.apply`*
- *TIP: Be sure to use `return`*

In [8]:
def urls_to_scrape(row):
    truck_url = 'https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=' + row['TDLR Number']
    return pd.Series({
        'truck_url' : truck_url
    })

df_trucks_subs = trucks_subs.apply(urls_to_scrape, axis=1).join(trucks_subs)
df_trucks_subs

|   | truck_url | TDLR Number |
|---|---|---|
| 0 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006507931C |
| 1 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006179570C |
| 2 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006502097C |
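Because the URL is just a fixed prefix plus the TDLR number, pandas can also build the whole column at once without `.apply`: adding a string to a column of strings concatenates row by row. Here's a minimal sketch on a stand-in dataframe.

```python
import pandas as pd

BASE_URL = 'https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber='
trucks = pd.DataFrame({'TDLR Number': ['006507931C', '006179570C', '006502097C']})

# Vectorized: the prefix is added to every value in the column
trucks['truck_url'] = BASE_URL + trucks['TDLR Number']

print(trucks['truck_url'].iloc[0])
```

The `.apply` version is still worth practicing, since the scraping step needs a real function.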


## Go through each row of the dataset, printing out information about each tow truck company.

Now you'll be **scraping** inside of your function.

- The business name
- Phone number
- License status
- Physical address

Just print it out for now.

- *TIP: use .apply and a function*
- *TIP: If you need help with .apply, look at the "Using apply in pandas" notebook*

In [9]:
# The session cookie below expires: if the fields print blank,
# copy a fresh Cookie value from your browser
import requests
from bs4 import BeautifulSoup

def scrape_trucks(row):
    url_trucks = row['truck_url']
    headers_trucks = {
        'Referer': 'https://www.tdlr.texas.gov/tools_search/',
        'Cookie': 'ASPSESSIONIDSCHQSRDT=MADCKFEDLJJKHEIBKMBGICEO'
    }
    response_trucks = requests.post(url_trucks, headers=headers_trucks)
    doc_trucks = BeautifulSoup(response_trucks.text, 'html.parser')

    # Defaults, so a missing field prints blank instead of raising NameError
    business_name = phone_number = license_status = street = state = ''

    tds = doc_trucks.find_all('td')
    for td in tds:
        try:
            # The business name
            if td.strong.string.strip() == 'Name:':
                business_name = td.strong.next.next.string.strip().title()
            # Phone number
            if td.strong.string.strip() == 'Phone:':
                phone_number = td.strong.next.next.string.strip()
            # License status
            if td.strong.string.strip() == 'Status:':
                license_status = td.font.next.next.string.strip()
            # Physical address: a street line, then a city/state/ZIP line
            strongs = td.find_all('strong')
            for strong in strongs:
                if strong.string.strip() == 'Physical:':
                    street = strong.next.next.next
                    state = street.next.next
                    street = street.strip().title()
                    state = state.strip().title()
        except (AttributeError, TypeError):
            # Skip cells that don't have a <strong> label followed by text
            pass
    print(business_name)
    print(phone_number)
    print(license_status)
    print(street, state)
    print('------')

df_trucks_subs.apply(scrape_trucks, axis=1)

Augustus E Smith
9032276464
Active
103 N Main St Bonham, Tx. 75418
------
B.D. Smith Towing
8173330706
Active
13619 Brett Jackson Rd. Fort Worth, Tx. 76179
------
Barry Michael Smith
8066544404
Active
4501 W Cemetery Rd Canyon, Tx. 79015
------


0    None
1    None
2    None
dtype: object
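One way to avoid pasting a fresh cookie by hand might be a `requests.Session`, which stores any cookies the server sets and sends them back on later requests. This sketch assumes, without having verified it, that visiting the search page hands out the `ASPSESSIONID...` cookie. `make_session` is a hypothetical helper.

```python
import requests

SEARCH_PAGE = 'https://www.tdlr.texas.gov/tools_search/'

def make_session(warm_up=True):
    session = requests.Session()
    # Headers set here are sent with every request made through the session
    session.headers.update({'Referer': SEARCH_PAGE})
    if warm_up:
        # Assumption: this first visit makes the server set the session cookie
        session.get(SEARCH_PAGE, timeout=30)
    return session

# Usage (makes network requests):
# session = make_session()
# response = session.post(row['truck_url'], timeout=30)
```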

## Scrape the following information for each row of the dataset, and save it into new columns in your dataframe.

- The business name
- Phone number
- License status
- Physical address

It's basically what we did before, but this time the function returns the values instead of printing them.

- *TIP: Use .apply and a function*
- *TIP: Remember to use `return`*
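Returning a `pd.Series` is what turns one function call into several columns. Each key becomes a column name, and `.join` lines the new columns up with the original rows by index. Here's a minimal offline sketch, with a hypothetical `fake_pages` dict standing in for the scraped pages.

```python
import pandas as pd

# Hypothetical stand-in for scraped results, keyed by TDLR number
fake_pages = {
    '006179570C': {'business_name': 'B.D. Smith Towing', 'phone_number': '8173330706'},
    '006502097C': {'business_name': 'Barry Michael Smith', 'phone_number': '8066544404'},
}

trucks = pd.DataFrame({'TDLR Number': ['006179570C', '006502097C']})

def fake_scrape(row):
    page = fake_pages[row['TDLR Number']]
    # Each key in this Series becomes a column in the result
    return pd.Series({
        'business_name': page['business_name'],
        'phone_number': page['phone_number'],
    })

result = trucks.apply(fake_scrape, axis=1).join(trucks)
print(result)
```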

In [10]:
# The session cookie below expires: if the fields come back blank,
# copy a fresh Cookie value from your browser
import requests
from bs4 import BeautifulSoup

def scrape_trucks(row):
    url_trucks = row['truck_url']
    headers_trucks = {
        'Referer': 'https://www.tdlr.texas.gov/tools_search/',
        'Cookie': 'ASPSESSIONIDSCHQSRDT=MADCKFEDLJJKHEIBKMBGICEO'
    }
    response_trucks = requests.post(url_trucks, headers=headers_trucks)
    doc_trucks = BeautifulSoup(response_trucks.text, 'html.parser')

    # Defaults, so a missing field becomes blank instead of raising NameError
    business_name = phone_number = license_status = street = state = ''

    tds = doc_trucks.find_all('td')
    for td in tds:
        try:
            # The business name
            if td.strong.string.strip() == 'Name:':
                business_name = td.strong.next.next.string.strip().title()
            # Phone number
            if td.strong.string.strip() == 'Phone:':
                phone_number = td.strong.next.next.string.strip()
            # License status
            if td.strong.string.strip() == 'Status:':
                license_status = td.font.next.next.string.strip()
            # Physical address: a street line, then a city/state/ZIP line
            strongs = td.find_all('strong')
            for strong in strongs:
                if strong.string.strip() == 'Physical:':
                    street = strong.next.next.next
                    state = street.next.next
                    street = street.strip().title()
                    state = state.strip().title()
        except (AttributeError, TypeError):
            # Skip cells that don't have a <strong> label followed by text
            pass

    # Each key becomes a new column when .apply collects the Series
    return pd.Series({
        'business_name': business_name,
        'phone_number': phone_number,
        'license_status': license_status,
        'physical_address': street + ', ' + state
    })

df_trucks_subs = df_trucks_subs.apply(scrape_trucks, axis=1).join(df_trucks_subs)

In [11]:
df_trucks_subs

|   | business_name | license_status | phone_number | physical_address | truck_url | TDLR Number |
|---|---|---|---|---|---|---|
| 0 | Augustus E Smith | Active | 9032276464 | 103 N Main St, Bonham, Tx. 75418 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006507931C |
| 1 | B.D. Smith Towing | Active | 8173330706 | 13619 Brett Jackson Rd., Fort Worth, Tx. 76179 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006179570C |
| 2 | Barry Michael Smith | Active | 8066544404 | 4501 W Cemetery Rd, Canyon, Tx. 79015 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006502097C |


### Save your dataframe as a CSV

In [12]:
df_trucks_subs.to_csv('trucks-subset-requests.csv', index = False)

### Re-open your dataframe to confirm you didn't save any extra weird columns

In [13]:
pd.read_csv('trucks-subset-requests.csv')

|   | business_name | license_status | phone_number | physical_address | truck_url | TDLR Number |
|---|---|---|---|---|---|---|
| 0 | Augustus E Smith | Active | 9032276464 | 103 N Main St, Bonham, Tx. 75418 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006507931C |
| 1 | B.D. Smith Towing | Active | 8173330706 | 13619 Brett Jackson Rd., Fort Worth, Tx. 76179 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006179570C |
| 2 | Barry Michael Smith | Active | 8066544404 | 4501 W Cemetery Rd, Canyon, Tx. 79015 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006502097C |
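The "extra weird column" to watch for is usually the index. Without `index=False`, `to_csv` writes the index as an unlabeled first column, and `read_csv` brings it back as `Unnamed: 0`. Here's a minimal sketch using in-memory buffers instead of files.

```python
import io
import pandas as pd

df = pd.DataFrame({'business_name': ['Augustus E Smith', 'Heath Smith']})

with_index = io.StringIO()
df.to_csv(with_index)  # the index is written as an unnamed first column
with_index.seek(0)

without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)

print(pd.read_csv(with_index).columns.tolist())     # ['Unnamed: 0', 'business_name']
print(pd.read_csv(without_index).columns.tolist())  # ['business_name']
```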


## Repeat this process for the entire `tow-trucks.csv` file

In [14]:
tow_trucks = pd.read_csv('tow-trucks.csv')
tow_trucks.head()

|   | TDLR Number |
|---|---|
| 0 | 006507931C |
| 1 | 006179570C |
| 2 | 006502097C |
| 3 | 006494912C |
| 4 | 0649468VSF |


In [15]:
tow_trucks.shape

(20, 1)
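With 20 rows, `.apply` sends 20 requests back to back. Pausing between requests is a common courtesy to the server, though the original notebook doesn't do it. Here's a minimal sketch. `politely` is a hypothetical wrapper, and the one-second delay is an arbitrary choice.

```python
import time
import pandas as pd

def politely(func, delay=1.0):
    # Wrap a row function so that each call waits `delay` seconds first
    def wrapper(row):
        time.sleep(delay)
        return func(row)
    return wrapper

# Usage with the notebook's function (makes network requests):
# df_tow_trucks.apply(politely(scrape_trucks, delay=1.0), axis=1)

demo = pd.DataFrame({'TDLR Number': ['006507931C', '006179570C']})
lengths = demo.apply(politely(lambda row: len(row['TDLR Number']), delay=0), axis=1)
print(lengths.tolist())
```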

In [16]:
# The session cookie below expires: if the fields come back blank,
# copy a fresh Cookie value from your browser
import requests
from bs4 import BeautifulSoup

def urls_to_scrape(row):
    truck_url = 'https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=' + row['TDLR Number']
    return pd.Series({
        'truck_url': truck_url
    })

def scrape_trucks(row):
    url_trucks = row['truck_url']
    headers_trucks = {
        'Referer': 'https://www.tdlr.texas.gov/tools_search/',
        'Cookie': 'ASPSESSIONIDSCHQSRDT=MADCKFEDLJJKHEIBKMBGICEO'
    }
    response_trucks = requests.post(url_trucks, headers=headers_trucks)
    doc_trucks = BeautifulSoup(response_trucks.text, 'html.parser')

    # Defaults, so a missing field becomes blank instead of raising NameError
    business_name = phone_number = license_status = street = state = ''

    tds = doc_trucks.find_all('td')
    for td in tds:
        try:
            # The business name
            if td.strong.string.strip() == 'Name:':
                business_name = td.strong.next.next.string.strip().title()
            # Phone number
            if td.strong.string.strip() == 'Phone:':
                phone_number = td.strong.next.next.string.strip()
            # License status: any non-empty status text is treated as
            # Active, and an empty one as Expired
            if td.strong.string.strip() == 'Status:':
                if td.font.next.next.string.strip():
                    license_status = 'Active'
                else:
                    license_status = 'Expired'
            # Physical address: a street line, then a city/state/ZIP line
            strongs = td.find_all('strong')
            for strong in strongs:
                if strong.string.strip() == 'Physical:':
                    street = strong.next.next.next
                    state = street.next.next
                    street = street.strip().title()
                    state = state.strip().title()
        except (AttributeError, TypeError):
            # Skip cells that don't have a <strong> label followed by text
            pass

    return pd.Series({
        'business_name': business_name,
        'phone_number': phone_number,
        'license_status': license_status,
        'physical_address': street + ', ' + state
    })

df_tow_trucks = tow_trucks.apply(urls_to_scrape, axis=1).join(tow_trucks)
df_tow_trucks = df_tow_trucks.apply(scrape_trucks, axis=1).join(df_tow_trucks)
df_tow_trucks.head()

|   | business_name | license_status | phone_number | physical_address | truck_url | TDLR Number |
|---|---|---|---|---|---|---|
| 0 | Augustus E Smith | Active | 9032276464 | 103 N Main St, Bonham, Tx. 75418 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006507931C |
| 1 | B.D. Smith Towing | Active | 8173330706 | 13619 Brett Jackson Rd., Fort Worth, Tx. 76179 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006179570C |
| 2 | Barry Michael Smith | Active | 8066544404 | 4501 W Cemetery Rd, Canyon, Tx. 79015 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006502097C |
| 3 | Heath Smith | Expired | 940-552-0687 | 1529 Wilbarger St, Vernon, Tx. 76384 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006494912C |
| 4 | Heath Smith | Expired | 9405520687 | 1529 Wilbarger St, Vernon, Tx. 76384 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 0649468VSF |


In [17]:
df_tow_trucks.shape

(20, 6)

In [18]:
df_tow_trucks.to_csv('tow-trucks-requests.csv', index = False)

In [19]:
pd.read_csv('tow-trucks-requests.csv').head()

|   | business_name | license_status | phone_number | physical_address | truck_url | TDLR Number |
|---|---|---|---|---|---|---|
| 0 | Augustus E Smith | Active | 9032276464 | 103 N Main St, Bonham, Tx. 75418 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006507931C |
| 1 | B.D. Smith Towing | Active | 8173330706 | 13619 Brett Jackson Rd., Fort Worth, Tx. 76179 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006179570C |
| 2 | Barry Michael Smith | Active | 8066544404 | 4501 W Cemetery Rd, Canyon, Tx. 79015 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006502097C |
| 3 | Heath Smith | Expired | 940-552-0687 | 1529 Wilbarger St, Vernon, Tx. 76384 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 006494912C |
| 4 | Heath Smith | Expired | 9405520687 | 1529 Wilbarger St, Vernon, Tx. 76384 | https://www.tdlr.texas.gov/tools_search/mccs_d... | 0649468VSF |


In [20]:
pd.read_csv('tow-trucks-requests.csv').shape

(20, 6)