# Texas Tow Trucks (`.apply` and Selenium)

We're going to scrape some [tow trucks in Texas](https://www.tdlr.texas.gov/tools_search/).

Try searching for the TDLR Number `006179570C`.

## Preparation

### What URL will Selenium be starting on?

- Tip: The answer is *not* `https://www.tdlr.texas.gov/tools_search/`

In [4]:
#https://www.tdlr.texas.gov/tools_search/mccs_search.asp

### Why are you using Selenium for this?

In [5]:
# Because the site doesn't return the data to a plain scripted request: the search form has to be filled in and submitted in a real browser

## Scrape this page

Scrape this page, displaying:

- The business name
- Phone number
- License status
- Physical address

- *TIP: For physical address, **ask me on the board** and I'll give you a secret trick about situations like this.*

In [39]:
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--incognito")

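# Selenium 3 syntax: newer Selenium releases take options=chrome_options instead,
# and replace find_element_by_xpath(...) with find_element(By.XPATH, ...)
driver = webdriver.Chrome(chrome_options=chrome_options)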
driver.get('https://www.tdlr.texas.gov/tools_search/mccs_search.asp')

In [31]:
check_box = driver.find_element_by_xpath('//*[@id="mcrbutton"]')
check_box.click()

search_in = driver.find_element_by_xpath('//*[@id="mcrdata"]')
search_in.click()
search_in.send_keys("006179570C")


In [32]:
# .click() returns None, so there is nothing useful to assign here
driver.find_element_by_xpath('//*[@id="submit3"]').click()

In [33]:
name = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]').text
name

'Name:   B.D. SMITH TOWING'

In [10]:
phone = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]').text
phone

'Phone:   8173330706'

In [11]:
status = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[1]/td[2]/font/font').text
status

'Active'

In [12]:
physical = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]')
print(physical.text.split(":")[-1].strip())

13619 BRETT JACKSON RD.
FORT WORTH, TX. 76179
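
The `split(":")[-1]` trick keeps whatever follows the last colon. The same idea would also remove the `Name:` and `Phone:` labels seen in the earlier cells. Below is a minimal sketch of a reusable version, assuming each cell's text has the form `Label: value`. `strip_label` is a hypothetical helper (not part of Selenium), and the address string is an assumed example of what the cell's text looks like.

```python
def strip_label(cell_text):
    """Return the text after the first colon, with surrounding whitespace removed.

    Splitting only on the first colon keeps values that contain their own colons.
    """
    label, sep, value = cell_text.partition(":")
    # If there is no colon at all, treat the whole text as the value
    return value.strip() if sep else cell_text.strip()


print(strip_label("Name:   B.D. SMITH TOWING"))
print(strip_label("Phone:   8173330706"))
# Assumed shape of the address cell: a label followed by a two-line address
print(strip_label("Physical:\n13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179"))
```

Using `partition` rather than `split(":")[-1]` is a design choice: it only matters if a value ever contains a colon of its own.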


In [34]:
from bs4 import BeautifulSoup
doc = BeautifulSoup(driver.page_source, 'html.parser')

# Using .apply to find data about SEVERAL tow truck companies

The file `trucks-subset.csv` has information about the trucks; we'll use it to find the pages to scrape.

### Open up `trucks-subset.csv` and save it into a dataframe

In [44]:
import pandas as pd

In [45]:
df = pd.read_csv("trucks-subset.csv")
df

|   | TDLR Number |
|---|---|
| 0 | 006507931C |
| 1 | 006179570C |
| 2 | 006502097C |


### Open up `trucks-subset.csv` in a text editor, then look at your dataframe. Is something different about them? If so, make them match.

- *TIP: I can help with this.*

In [16]:
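# No: every TDLR Number here contains a letter, so pandas reads the column as text and keeps the leading zeros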
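
The kind of mismatch this question is probing for usually shows up when an ID column contains only digits: pandas then parses it as numbers and drops any leading zeros. Below is a small sketch of that behaviour and one way around it. The `ID` column and its values are made up for illustration and are not taken from `trucks-subset.csv`.

```python
import io

import pandas as pd

# Made-up, digits-only IDs standing in for a CSV file on disk
csv_text = "ID\n0012345\n0067890\n"

# Default parsing infers integers, so the leading zeros disappear
as_numbers = pd.read_csv(io.StringIO(csv_text))

# dtype=str keeps every value exactly as written in the file
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(as_numbers["ID"].tolist())
print(as_text["ID"].tolist())
```

Passing `dtype=str` (or a per-column mapping such as `dtype={"ID": str}`) is a cheap safeguard whenever the values are identifiers rather than quantities.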

## Use `.apply` to go through each row of the dataset, printing out information about each tow truck company.

- The business name
- Phone number
- License status
- Physical address

Just print it out for now.

- *TIP: use .apply and a function*
- *TIP: If you need help with .apply, look at the "Using apply in pandas" notebook*

In [46]:


def info(row):
    driver = webdriver.Chrome(chrome_options=chrome_options)
    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_search.asp')
    check_box = driver.find_element_by_xpath('//*[@id="mcrbutton"]')
    check_box.click()
    search_in = driver.find_element_by_xpath('//*[@id="mcrdata"]')
    search_in.click()
    search_in.send_keys(row['TDLR Number'])
    driver.find_element_by_xpath('//*[@id="submit3"]').click()
    name = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]').text
    phone = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]').text
    status = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[1]/td[2]/font/font').text
    physical = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]').text
    # Close this row's browser window so Chrome instances don't pile up
    driver.quit()
    return name, phone, status, physical.split(":")[-1].strip()


df.apply(info, axis=1)



0    (Name:   AUGUSTUS E SMITH, Phone:   9032276464...
1    (Name:   B.D. SMITH TOWING, Phone:   817333070...
2    (Name:   BARRY MICHAEL SMITH, Phone:   8066544...
dtype: object
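
The result above is a single Series of tuples because the function returns a tuple. As a rough illustration of how the return type shapes what `.apply` hands back, here is a sketch that swaps the Selenium steps for a stand-in. `fake_lookup` and `fake_lookup_series` are hypothetical functions that invent placeholder values and do not touch the website.

```python
import pandas as pd

numbers = pd.DataFrame({"TDLR Number": ["006507931C", "006179570C"]})


def fake_lookup(row):
    # Stand-in for the Selenium steps: derive placeholder values from the row
    return "company " + row["TDLR Number"], "Active"


def fake_lookup_series(row):
    name, status = fake_lookup(row)
    # The Series index becomes the column names of the result
    return pd.Series({"name": name, "status": status})


# Tuple return: one Series whose values are tuples
as_tuples = numbers.apply(fake_lookup, axis=1)

# Series return: a DataFrame with one column per Series entry
as_columns = numbers.apply(fake_lookup_series, axis=1)

print(type(as_tuples).__name__)
print(list(as_columns.columns))
```

Returning a `pd.Series` is what lets the result be joined back onto the original dataframe as new columns.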

## Scrape the following information for each row of the dataset, and save it into new columns in your dataframe.

- The business name
- Phone number
- License status
- Physical address

It's basically what we did before, but using the function a little differently.

- *TIP: Use .apply and a function*
- *TIP: Remember to use `return`*

In [50]:
def infophone(row):
    driver = webdriver.Chrome(chrome_options=chrome_options)
    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_search.asp')
    check_box = driver.find_element_by_xpath('//*[@id="mcrbutton"]')
    check_box.click()
    search_in = driver.find_element_by_xpath('//*[@id="mcrdata"]')
    search_in.click()
    search_in.send_keys(row['TDLR Number'])
    driver.find_element_by_xpath('//*[@id="submit3"]').click()
    name = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]').text
    phone = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]').text
    status = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[1]/td[2]/font/font').text
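    physical = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]').text
    # Close this row's browser window so Chrome instances don't pile up
    driver.quit()
    # Each key becomes a column when .apply assembles the result
    return pd.Series({
        'name': name,
        'phone': phone,
        'status': status,
        'physical': physical.split(":")[-1].strip()
    })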

#df.apply(infophone, axis=1)
df = df.apply(infophone, axis=1).join(df)
#df.head()


In [53]:
df

|   | name | phone | physical | status | TDLR Number |
|---|---|---|---|---|---|
| 0 | Name: AUGUSTUS E SMITH | Phone: 9032276464 | 103 N MAIN ST\nBONHAM, TX. 75418 | Active | 006507931C |
| 1 | Name: B.D. SMITH TOWING | Phone: 8173330706 | 13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179 | Active | 006179570C |
| 2 | Name: BARRY MICHAEL SMITH | Phone: 8066544404 | 4501 W CEMETERY RD\nCANYON, TX. 79015 | Active | 006502097C |


In [54]:
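# The raw text has several spaces after the label, so remove the label plus any whitespace that follows
df['name'] = df['name'].str.replace(r"^Name:\s*", "", regex=True)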
df

|   | name | phone | physical | status | TDLR Number |
|---|---|---|---|---|---|
| 0 | AUGUSTUS E SMITH | Phone: 9032276464 | 103 N MAIN ST\nBONHAM, TX. 75418 | Active | 006507931C |
| 1 | B.D. SMITH TOWING | Phone: 8173330706 | 13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179 | Active | 006179570C |
| 2 | BARRY MICHAEL SMITH | Phone: 8066544404 | 4501 W CEMETERY RD\nCANYON, TX. 79015 | Active | 006502097C |


In [55]:
# Same approach for the phone label: drop "Phone:" and the whitespace after it
df['phone'] = df['phone'].str.replace(r"^Phone:\s*", "", regex=True)
df

|   | name | phone | physical | status | TDLR Number |
|---|---|---|---|---|---|
| 0 | AUGUSTUS E SMITH | 9032276464 | 103 N MAIN ST\nBONHAM, TX. 75418 | Active | 006507931C |
| 1 | B.D. SMITH TOWING | 8173330706 | 13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179 | Active | 006179570C |
| 2 | BARRY MICHAEL SMITH | 8066544404 | 4501 W CEMETERY RD\nCANYON, TX. 79015 | Active | 006502097C |


In [22]:
#done above!

### Save your dataframe as a CSV

In [56]:
df.to_csv("tow_subset_new.csv", index=False)

### Re-open your dataframe to confirm you didn't save any extra weird columns

In [57]:
df = pd.read_csv("tow_subset_new.csv")
df.head()

|   | name | phone | physical | status | TDLR Number |
|---|---|---|---|---|---|
| 0 | AUGUSTUS E SMITH | 9032276464 | 103 N MAIN ST\nBONHAM, TX. 75418 | Active | 006507931C |
| 1 | B.D. SMITH TOWING | 8173330706 | 13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179 | Active | 006179570C |
| 2 | BARRY MICHAEL SMITH | 8066544404 | 4501 W CEMETERY RD\nCANYON, TX. 79015 | Active | 006502097C |
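
The "extra weird column" this check guards against is the dataframe index, which `to_csv` writes by default. Here is a small sketch of the difference, using in-memory buffers in place of files. The one-row dataframe `small` is a hypothetical stand-in for the scraped data.

```python
import io

import pandas as pd

small = pd.DataFrame({"name": ["B.D. SMITH TOWING"], "status": ["Active"]})

# Default behaviour: the index is written as an extra, unnamed first column
with_index = io.StringIO()
small.to_csv(with_index)
with_index.seek(0)

# index=False leaves the index out of the file
without_index = io.StringIO()
small.to_csv(without_index, index=False)
without_index.seek(0)

columns_with = list(pd.read_csv(with_index).columns)
columns_without = list(pd.read_csv(without_index).columns)
print(columns_with)
print(columns_without)
```

When the index is saved, `read_csv` labels that headerless column `Unnamed: 0`, which is the name to look for when re-opening a file.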


## Repeat this process for the entire `tow-trucks.csv` file

In [60]:
df1 = pd.read_csv("tow-trucks.csv")
df1

|   | TDLR Number |
|---|---|
| 0 | 006507931C |
| 1 | 006179570C |
| 2 | 006502097C |
| 3 | 006494912C |
| 4 | 0649468VSF |
| 5 | 006448786C |
| 6 | 0648444VSF |
| 7 | 0651667VSF |
| 8 | 006017767C |
| 9 | 006495492C |


In [75]:
from selenium.common.exceptions import NoSuchElementException
def infophone(row):
    driver = webdriver.Chrome(chrome_options=chrome_options)
    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_search.asp')
    try:
        check_box = driver.find_element_by_xpath('//*[@id="mcrbutton"]')
        check_box.click()
        search_in = driver.find_element_by_xpath('//*[@id="mcrdata"]')
        search_in.click()
        search_in.send_keys(row['TDLR Number'])
        driver.find_element_by_xpath('//*[@id="submit3"]').click()
        name = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]').text
        phone = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]').text
        status = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[1]/td[2]/font').text
        physical = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]')
        physical_txt = physical.text.split(":")[-1].strip()
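    except NoSuchElementException:
        # The page layout didn't match the XPaths, so record placeholders instead of crashing
        name = "NOT FOUND"
        phone = "NOT FOUND"
        status = "NOT FOUND"
        physical_txt = "NOT FOUND"
    finally:
        # Always close this row's browser window, whether or not the lookup succeeded
        driver.quit()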
    return pd.Series({
        'name': name,
        'phone': phone,
        'status': status,
        'physical': physical_txt
        })


df1.apply(infophone, axis=1)
#df1 = df1.apply(infophone, axis=1).join(df1)
#df.head()


|   | name | phone | physical | status |
|---|---|---|---|---|
| 0 | Name: AUGUSTUS E SMITH | Phone: 9032276464 | 103 N MAIN ST\nBONHAM, TX. 75418 | Active |
| 1 | Name: B.D. SMITH TOWING | Phone: 8173330706 | 13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179 | Active |
| 2 | Name: BARRY MICHAEL SMITH | Phone: 8066544404 | 4501 W CEMETERY RD\nCANYON, TX. 79015 | Active |
| 3 | Name: HEATH SMITH | Phone: 940-552-0687 | 1529 WILBARGER ST\nVERNON, TX. 76384 | Expired |
| 4 | Name: HEATH SMITH | Phone: 9405520687 | 1529 WILBARGER ST\nVERNON, TX. 76384 | Expired |
| 5 | Name: HYSMITH AUTOMOTIVE | Owner/Officer: ASHLEY ERIN HYSMITH / TREASURER | 1210 US 380 BYPASS\nGRAHAM, TX. 76450 | Active |
| 6 | Name: HYSMITH AUTOMOTIVE & TRUCK REPAIR INC | Owner/Officer: WILLIAM THOMAS HYSMITH / PRES... | 927 LOVING HWY\nGRAHAM, TX. 76450 | Expired |
| 7 | Name: HYSMITH AUTOMOTIVE & TRUCK REPAIR INC | Owner/Officer: ASHLEY ERIN HYSMITH / TREASURER | 1210 380 BYPASS\nGRAHAM, TX. 76450 | Active |
| 8 | Name: JEFF & WENDY SMITH | Owner/Officer: WENDY SMITH / PARTNER | 10842 FM 2138 N\nJACKSONVILLE, TX. 75766 | Expired |
| 9 | Name: JEFF SMITH | Phone: 8324354670 | 4338 HARVEY RD\nCROSBY, TX. 77532 | Active |
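
In rows 5 through 8 the `phone` column holds an `Owner/Officer:` line, which suggests that the fourth table row is not always the phone number. One possible fix, sketched here under the assumption that every cell's text starts with its label, is to pick the cell by label instead of by position. `find_by_label` is a hypothetical helper, and the made-up list of strings stands in for the `.text` of the cells Selenium would return.

```python
def find_by_label(cell_texts, label, default="NOT FOUND"):
    """Return the value of the first cell whose text starts with `label`."""
    for text in cell_texts:
        if text.strip().startswith(label):
            # Drop the label, then any padding around the value
            return text.strip()[len(label):].strip()
    return default


# Made-up texts modelled on the output above; the real page may lay its cells out differently
cells = [
    "Name:   EXAMPLE TOWING",
    "Owner/Officer:   JANE DOE / OWNER",
    "Phone:   5550100",
]

print(find_by_label(cells, "Phone:"))
print(find_by_label(cells, "Name:"))
print(find_by_label(cells, "Fax:"))
```

With Selenium, the list could be built from the `.text` of every cell in the relevant table, so an extra owner row no longer shifts the phone number out of place.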
