# Texas Tow Trucks (`.apply` and Selenium)

We're going to scrape some [tow trucks in Texas](https://www.tdlr.texas.gov/tools_search/).

Try searching for the TDLR Number `006179570C`.

## Preparation

### What URL will Selenium be starting on?

- Tip: The answer is *not* `https://www.tdlr.texas.gov/tools_search/`

In [1]:
### https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=


### Why are you using Selenium for this?

In [2]:
###

## Scrape this page

Scrape this page, displaying:

- The business name
- Phone number
- License status
- Physical address

**You should know how to do `.post` requests by now.**

- *TIP: For physical address, **ask me on the board** and I'll give you a secret trick about situations like this.*

In [3]:
###


# Using .apply to find data about SEVERAL tow truck companies

The file `trucks-subset.csv` has information about the trucks; we'll use it to find the pages to scrape.

### Open up `trucks-subset.csv` and save it into a dataframe

In [4]:
import pandas as pd
df = pd.read_csv("trucks-subset.csv")
df


Unnamed: 0,TDLR Number
0,006507931C
1,006179570C
2,006502097C


### Open up `trucks-subset.csv` in a text editor, then look at your dataframe. Is something different about them? If so, make them match.

- *TIP: I can help with this.*

In [5]:
# Nothing is different
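
One common way a dataframe can differ from the raw file is that leading zeros disappear when an ID column is read as numbers. The TDLR numbers here end in letters, so pandas already keeps them as text, but the sketch below shows what to watch for. It uses a hypothetical all-digit `ID` column and inline text as a stand-in for a real file.

```python
import io
import pandas as pd

# Hypothetical file contents: all-digit IDs with leading zeros
csv_text = "ID\n006507931\n006179570\n"

# Default parsing turns the column into integers and drops the zeros
as_numbers = pd.read_csv(io.StringIO(csv_text))

# dtype=str keeps every value exactly as it appears in the file
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(as_numbers["ID"].tolist())
print(as_text["ID"].tolist())
```

Passing `dtype=str` is harmless when the column is already text, so it is a cheap safeguard for ID-like columns.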


## Use `.apply` to go through each row of the dataset, printing out information about each tow truck company.

- The business name
- Phone number
- License status
- Physical address

Just print it out for now.

- *TIP: use .apply and a function*
- *TIP: If you need help with .apply, look at the "Using apply in pandas" notebook*

In [6]:
# def display_info(value):
#     url = "https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber="+value
# #     print(url)
#     return url
# # df["TDLR Number"].apply(display_info)
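
As a minimal sketch of the "`.apply` and a function" tip, the example below calls a function once per TDLR number and prints something for each row. The scraping step is left out so it runs without a browser; `print_info` and `df_demo` are illustrative names, not part of the assignment.

```python
import pandas as pd

BASE_URL = "https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber="

def print_info(tdlr_number):
    """Print the detail-page URL for one TDLR number and return it."""
    url = BASE_URL + tdlr_number
    print(url)
    return url

# Small stand-in for the dataframe loaded from trucks-subset.csv
df_demo = pd.DataFrame({"TDLR Number": ["006507931C", "006179570C", "006502097C"]})

# .apply calls print_info once per value and collects the return values
urls = df_demo["TDLR Number"].apply(print_info)
```

In the real notebook, the body of the function would load the page and print the four fields instead of the URL.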

## Scrape the following information for each row of the dataset, and save it into new columns in your dataframe.

- The business name
- Phone number
- License status
- Physical address

It's basically what we did before, but using the function a little differently.

- *TIP: Use .apply and a function*
- *TIP: Remember to use `return`*
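
One way to turn returned values into new columns is to have the function return a `pd.Series`. The sketch below assumes a hypothetical `scrape_company` that returns placeholder text instead of visiting the site, so only the `.apply` mechanics are shown.

```python
import pandas as pd

def scrape_company(tdlr_number):
    """Hypothetical stand-in for the real scraper: returns placeholder values."""
    return pd.Series({
        "The business name": "name for " + tdlr_number,
        "Phone number": "phone for " + tdlr_number,
        "License status": "status for " + tdlr_number,
        "Physical address": "address for " + tdlr_number,
    })

df_demo = pd.DataFrame({"TDLR Number": ["006507931C", "006179570C"]})

# Because the function returns a Series, .apply gives back a DataFrame
# with one column per key; join attaches those columns to the original rows
scraped = df_demo["TDLR Number"].apply(scrape_company)
df_demo = df_demo.join(scraped)
print(df_demo.columns.tolist())
```

Without the `return`, every new column would come back empty, which is what the second tip is warning about.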

In [7]:
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--incognito")

driver = webdriver.Chrome(chrome_options = chrome_options)
driver.get("https://www.tdlr.texas.gov/tools_search/")



In [8]:
check_box = driver.find_element_by_xpath('//*[@id="mcrbutton"]')
check_box.click()


In [9]:
search_input_dd = driver.find_element_by_xpath('//*[@id="mcrdata"]')
search_input_dd.send_keys("006179570C")


In [10]:
search_button = driver.find_element_by_xpath('//*[@id="submit3"]')
search_button.click()


In [11]:
from bs4 import BeautifulSoup
doc = BeautifulSoup(driver.page_source, 'html.parser')
mine_info = doc.find_all('tr')
print(len(mine_info))
print(type(mine_info))


12
<class 'bs4.element.ResultSet'>


In [12]:
name_string = doc.find_all('tr')[4].find_all('td')[0].text
" ".join(name_string.split()[1:])

'B.D. SMITH TOWING'

In [13]:
phone_number = doc.find_all('tr')[6].find_all('td')[0].text
" ".join(phone_number.split()[1:])


'8173330706'

In [14]:
license_status = doc.find_all('tr')[7].find_all('td')[1].text
" ".join(license_status.split()[1:])

'Active'

In [15]:
physical_address = doc.find_all('tr')[8].find_all('td')[1].text
# The offset 23 was found by trial on this one record; other records may need a different offset
" ".join(physical_address.split()[23:])


'13619 BRETT JACKSON RD. FORT WORTH, TX. 76179'
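
The four cells above repeat the same split-and-join pattern, so it could be wrapped in a small helper. This is a sketch only: `clean_cell` is a made-up name, and the sample text is hypothetical rather than copied from the real page.

```python
def clean_cell(text, skip=1):
    """Collapse whitespace in a table cell and drop its first `skip` tokens."""
    return " ".join(text.split()[skip:])

# Hypothetical cell text; the real page's labels and spacing may differ
sample = "Name:\n    B.D. SMITH   TOWING  "
print(clean_cell(sample))
```

With the helper in place, the address cell would be `clean_cell(physical_address, skip=23)` and the others could use the default.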

### Save your dataframe as a CSV

### Re-open your dataframe to confirm you didn't save any extra weird columns
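
The "extra weird column" usually comes from the index being written to the file. A minimal sketch of the round trip, using an in-memory buffer and a one-row stand-in dataframe instead of a real file:

```python
import io
import pandas as pd

df_demo = pd.DataFrame({
    "The business name": ["B.D. SMITH TOWING"],
    "Phone number": ["8173330706"],
})

# By default to_csv writes the index as an unnamed first column
with_index = io.StringIO(df_demo.to_csv())
# index=False leaves the index out of the file
without_index = io.StringIO(df_demo.to_csv(index=False))

cols_with = pd.read_csv(with_index, dtype=str).columns.tolist()
cols_without = pd.read_csv(without_index, dtype=str).columns.tolist()
print(cols_with)
print(cols_without)
```

The first list starts with `Unnamed: 0`, while the second contains only the two real columns.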

## Repeat this process for the entire `tow-trucks.csv` file

In [17]:
import pandas as pd
df = pd.read_csv("tow-trucks.csv")
df

Unnamed: 0,TDLR Number
0,006507931C
1,006179570C
2,006502097C
3,006494912C
4,0649468VSF
5,006448786C
6,0648444VSF
7,0651667VSF
8,006017767C
9,006495492C


In [18]:
def display_info(value):
    # Build the detail-page URL for one TDLR number
    url = "https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=" + value
    return url


In [19]:
# Build one detail-page URL per TDLR number
TTT_url = df["TDLR Number"].apply(display_info)

TTT_dictionary_list = []

for TDLR_url in TTT_url:
    TTT_dictionary = {}
    driver.get(TDLR_url)
    doc = BeautifulSoup(driver.page_source, 'html.parser')
    rows = doc.find_all('tr')  # find the table rows once per page

    # split()[1:] drops the first whitespace-separated token of each cell.
    # A key is only added when the cell has text, so some dictionaries lack keys.
    name_string = rows[4].find_all('td')[0].text
    if name_string:
        TTT_dictionary['The business name'] = " ".join(name_string.split()[1:])

    phone_number = rows[6].find_all('td')[0].text
    if phone_number:
        TTT_dictionary['Phone number'] = " ".join(phone_number.split()[1:])

    license_status = rows[7].find_all('td')[1].text
    if license_status:
        TTT_dictionary['License status'] = " ".join(license_status.split()[1:])

    # The fixed row positions and the 23-token offset were tuned on one record,
    # so other pages may come out misaligned or truncated
    physical_address = rows[8].find_all('td')[1].text
    if physical_address:
        TTT_dictionary['Physical address'] = " ".join(physical_address.split()[23:])

    TTT_dictionary_list.append(TTT_dictionary)
TTT_dictionary_list


[{'License status': 'Active',
  'Phone number': '9032276464',
  'Physical address': 'N MAIN ST BONHAM, TX. 75418',
  'The business name': 'AUGUSTUS E SMITH'},
 {'License status': 'Active',
  'Phone number': '8173330706',
  'Physical address': '13619 BRETT JACKSON RD. FORT WORTH, TX. 76179',
  'The business name': 'B.D. SMITH TOWING'},
 {'License status': 'Active',
  'Phone number': '8066544404',
  'Physical address': 'W CEMETERY RD CANYON, TX. 79015',
  'The business name': 'BARRY MICHAEL SMITH'},
 {'License status': 'Expired',
  'Phone number': '940-552-0687',
  'Physical address': 'ST VERNON, TX. 76384',
  'The business name': 'HEATH SMITH'},
 {'License status': 'Expired',
  'Phone number': '9405520687',
  'Physical address': '',
  'The business name': 'HEATH SMITH'},
 {'Phone number': 'ASHLEY ERIN HYSMITH / TREASURER',
  'Physical address': '',
  'The business name': 'HYSMITH AUTOMOTIVE'},
 {'Phone number': 'WILLIAM THOMAS HYSMITH / PRESIDENT',
  'Physical address': '',
  'The busin
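
Several records above have a contact name where the phone number should be, which suggests the rows are not in the same position on every page. One possible alternative is to pick cells by their label instead of by position. The sketch below is pure Python with hypothetical cell texts; `find_by_label` is a made-up helper, and apart from `Physical:` (which shows up in one scraped address) the labels are assumptions about the page.

```python
def find_by_label(cell_texts, label):
    """Return the cleaned text after `label` in the first cell containing it, else None."""
    for text in cell_texts:
        cleaned = " ".join(text.split())
        if label in cleaned:
            return cleaned.split(label, 1)[1].strip()
    return None

# Hypothetical cell texts; in the notebook they could come from
# [td.text for td in doc.find_all('td')]
cells = [
    "Name:  SMITH TOWING & RECOVERY, LLC",
    "Phone:  9362693915",
    "Mailing: PO BOX 1  Physical:  12741 HWY 84E JOAQUIN, TX. 75954",
]
print(find_by_label(cells, "Physical:"))
print(find_by_label(cells, "Fax:"))
```

A missing label returns `None` instead of silently grabbing the wrong row, which makes bad records easier to spot in the final dataframe.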

In [20]:
import pandas as pd
df_TTT = pd.DataFrame(TTT_dictionary_list)
df_TTT.to_csv("Texas_Tow_Trucks.csv", index=False)

In [21]:
df_TTT_1 = pd.read_csv("Texas_Tow_Trucks.csv")
df_TTT_1.head()

Unnamed: 0,License status,Phone number,Physical address,The business name
0,Active,9032276464,"N MAIN ST BONHAM, TX. 75418",AUGUSTUS E SMITH
1,Active,8173330706,"13619 BRETT JACKSON RD. FORT WORTH, TX. 76179",B.D. SMITH TOWING
2,Active,8066544404,"W CEMETERY RD CANYON, TX. 79015",BARRY MICHAEL SMITH
3,Expired,940-552-0687,"ST VERNON, TX. 76384",HEATH SMITH
4,Expired,9405520687,,HEATH SMITH


In [22]:
df_TTT_1.tail()

Unnamed: 0,License status,Phone number,Physical address,The business name
15,Expired,3252159496,"N 16TH ST JUNCTION, TX. 76849",SAMMY L SMITH
16,,GLEN SMITH / PARTNER,,SMITH BRO. WRECKER SERVICE
17,,GLEN SMITH / PARTNER,,SMITH BROS. WRECKER SERVICE
18,Active,9362693915,"Physical: 12741 HWY 84E JOAQUIN, TX. 75954","SMITH TOWING & RECOVERY, LLC"
19,Active,9362693915,TX. 75954,"SMITH TOWING & RECOVERY,LLC"
