# Scrape Texas death row inmates, part 2

The table we scraped in the last notebook _probably_ could have been imported directly into Excel without too much trouble. But what if you also wanted to append a few columns of information from each inmate's detail page?

In this section, we're going to supplement the scraper we just wrote with a _function_ that extracts data from inmates' detail pages. We're also going to use the `time.sleep` function from Python's standard library to pause for a few seconds between rows to give the government's servers a break.
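
As a rough sketch of the pausing idea: the loop below waits between calls but not before the first one. The names `visit_politely` and `fake_fetch` are made up for illustration, and the "download" is a stand-in so the example runs without touching the network.

```python
import time

def visit_politely(urls, fetch, delay=2.0):
    """Call fetch(url) for each URL, pausing `delay` seconds between calls."""
    results = []
    for i, url in enumerate(urls):
        # no need to wait before the very first request
        if i > 0:
            time.sleep(delay)
        results.append(fetch(url))
    return results

# hypothetical stand-in for a real download
def fake_fetch(url):
    return 'fetched ' + url

# a short delay keeps the demo quick; a real scraper would wait a few seconds
pages = visit_politely(['page1.html', 'page2.html'], fake_fetch, delay=0.1)
print(pages)
```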

First, let's import the libraries we'll need.

In [22]:
import requests
from bs4 import BeautifulSoup
import csv
import time
from collections import defaultdict

## Let's write a function

We need a function that will take the URL of a detail page and do these things:

- Open the detail page URL using `requests`
- Parse the contents using `BeautifulSoup`
- Isolate the bits of information we're interested in: height, weight, eye color, hair color, native county, native state, link to mugshot
- Return those bits of information to the script that called the function. Let's use a special kind of dictionary called a `defaultdict`, which can hand back a default value when a key is missing
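
To see what a `defaultdict` buys us, here is a small sketch comparing it with a plain dictionary. The value stored under `'Height'` is a placeholder, not real data. Note that the default only kicks in when you pass a factory such as `str`; a bare `defaultdict()` behaves like an ordinary dictionary and raises `KeyError` for missing keys.

```python
from collections import defaultdict

# with str as the factory, a missing key comes back as an empty string
details = defaultdict(str)
details['Height'] = 'placeholder height'

found = details['Height']
missing = details['Weight']   # never set, so we get '' instead of an error

# without a factory, a missing key still raises KeyError
no_factory = defaultdict()
try:
    no_factory['Weight']
    raised = False
except KeyError:
    raised = True

print(repr(found), repr(missing), raised)
```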

We shall call our function `inmateDetails()`.

In [24]:
def inmateDetails(url):
    
    # get the page
    r = requests.get(url)
    
    # soup the HTML
    soup = BeautifulSoup(r.text, 'html.parser')

    # find the table of info
    table = soup.find('table', {'class': 'tabledata_deathrow_table'})

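    # get the mugshot URL, if the page has one
    # (find() returns None when there's no matching img tag)
    img = table.find('img', {'class': 'photo_border_black_right'})
    mug = img['src'] if img else None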

    # get a list of the "label" cells
    label_cells = table.find_all('td', {'class': 'tabledata_bold_align_right_deathrow'})

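    # create a special type of dictionary
    # passing str as the default means a missing key returns an empty string
    # instead of raising a KeyError, which will save us some keystrokes later on
    # https://docs.python.org/3/library/collections.html#collections.defaultdict
    out_dict = defaultdict(str)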

    # if there's a mug, add it to the dictionary
    if mug:
        out_dict['mug'] = 'http://www.tdcj.state.tx.us/death_row/dr_info/' + mug

    # a list of the things we're interested in -- should match exactly the text of the cells
    attr_list = ['Height', 'Weight', 'Eye Color', 'Hair Color', 'Native County', 'Native State']

    # loop over the list of label cells
    for cell in label_cells:
        
        # check to see if the cell text is in our list of attributes
        # (strip any stray whitespace around the label first)
        label = cell.text.strip()
        if label in attr_list:
            
            # if so, find the value -- go up to the tr and search for the other td --
            # and add that attribute to our dictionary
            value_cell = cell.parent.find('td', {'class': 'tabledata_align_left_deathrow'})
            out_dict[label] = value_cell.text.strip()

    # return the dictionary to the script
    return out_dict
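
Once the function returns its dictionary, the extra fields can be folded into a row before it is written out. The sketch below shows one way to do that with `csv.DictWriter`. The row, the column names and the detail values are all made up for illustration, and the output goes to an in-memory buffer rather than a real file.

```python
import csv
import io

# made-up row standing in for one line of the table scraped in part 1
row = {'Last Name': 'Doe', 'First Name': 'Jane'}

# made-up stand-in for the dictionary returned by inmateDetails(url)
details = {'Height': 'example height', 'Native State': 'example state'}

# combine the two dictionaries into one record
row.update(details)

# hypothetical column order for the output file
headers = ['Last Name', 'First Name', 'Height', 'Weight', 'Native State']

# restval fills in any column the record doesn't have (here, Weight)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=headers, restval='')
writer.writeheader()
writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)
```

Using `restval=''` means a detail page that lacks one of the attributes still produces a complete row, with a blank in the missing column.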

http://www.tdcj.state.tx.us/death_row/dr_info/wellsamos2.jpg
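
The function builds that address by gluing a base URL onto the image's `src` value. A slightly more forgiving alternative, sketched here rather than taken from the original scraper, is `urljoin` from the standard library, which also copes with a `src` that is already a full URL. This assumes the `src` on the page is a relative filename like `wellsamos2.jpg`.

```python
from urllib.parse import urljoin

base = 'http://www.tdcj.state.tx.us/death_row/dr_info/'

# a relative filename gets resolved against the base
full = urljoin(base, 'wellsamos2.jpg')

# an address that is already absolute passes through unchanged
already_full = urljoin(base, full)

print(full)
print(already_full)
```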
