# Parsing HTML with BeautifulSoup

Once we've fetched the HTML using `requests`, the next step is to parse the HTML into a data structure that Python can work with. To do this, we'll use a Python library called [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/), which you import in Python as `bs4` (the package itself is usually installed under the name `beautifulsoup4`).

Our goal is to scrape each row of the [table of inmates on death row in Texas](https://www.tdcj.state.tx.us/death_row/dr_offenders_on_dr.html) into a Python data structure called a _list_.

`'https://www.tdcj.state.tx.us/death_row/dr_offenders_on_dr.html'`

Along the way, we'll also encounter a Python data structure called a dictionary, and a Python statement called a `for loop`. How much fun are _we_ having!

👉 For more details on lists, dictionaries and for loops, [see this notebook](../_Python%20syntax%20cheat%20sheet.ipynb).

So to start off, we need to import our dependencies -- `requests` to fetch the HTML and `bs4` to parse it:

```python
import requests
import bs4
```

Next, let's fetch the page, save it to a variable and make sure we've got the HTML:

```python
dr_page = requests.get('https://www.tdcj.state.tx.us/death_row/dr_offenders_on_dr.html')
```

```python
dr_page.text
```

'<!doctype html>\r\n<html lang="en-US"><!-- InstanceBegin template="/Templates/generic_inside.dwt" codeOutsideHTMLIsLocked="false" -->\r\n<head>\r\n<meta charset="utf-8">\r\n<meta name="viewport" content="width=device-width, initial-scale=1">\r\n<!-- stylesheet: global -->\r\n<link rel="stylesheet" href="/stylesheets/global.css">\r\n<!-- stylesheet: page-specific -->\r\n<link rel="stylesheet" href="/stylesheets/content.css">\r\n<link rel="stylesheet" href="/stylesheets/menu_style.css">\r\n<!-- InstanceBeginEditable name="stylesheets" -->\r\n\r\n<!-- InstanceEndEditable -->\r\n<!-- jQuery library (if CDN fails, use local copy) -->\r\n<script type="text/javascript" src="//ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>\r\n<script type="text/javascript"> window.jQuery || document.write(\'<script src="/javascripts/jquery.min.js"><\\/script>\') </script>\r\n<!-- javascripts -->\r\n<script type="text/javascript" src="/javascripts/google_analytics.js"></script>\r\n<script 

Now we can hand off that HTML, which lives in `dr_page.text`, to a BeautifulSoup object, which will parse the HTML into something we can more easily navigate. We'll save the result as a new variable, `soup`.

Here's how to do that:

```python
soup = bs4.BeautifulSoup(dr_page.text, 'html.parser')
```

The `'html.parser'` bit specifies _how_ we want to parse the text ([more details here, if you're interested](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser)).

If you run a cell to return `soup`, here's what you get:

```python
soup
```

<!DOCTYPE doctype html>

<html lang="en-US"><!-- InstanceBegin template="/Templates/generic_inside.dwt" codeOutsideHTMLIsLocked="false" -->
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<!-- stylesheet: global -->
<link href="/stylesheets/global.css" rel="stylesheet"/>
<!-- stylesheet: page-specific -->
<link href="/stylesheets/content.css" rel="stylesheet"/>
<link href="/stylesheets/menu_style.css" rel="stylesheet"/>
<!-- InstanceBeginEditable name="stylesheets" -->
<!-- InstanceEndEditable -->
<!-- jQuery library (if CDN fails, use local copy) -->
<script src="//ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js" type="text/javascript"></script>
<script type="text/javascript"> window.jQuery || document.write('<script src="/javascripts/jquery.min.js"><\/script>') </script>
<!-- javascripts -->
<script src="/javascripts/google_analytics.js" type="text/javascript"></script>
<script src="/javascripts/google_analytics_all.js" ty

Doesn't look too different from `dr_page.text`. But what's happened under the hood is that `bs4` has parsed that raw chunk of HTML into a navigable tree that we can search through to target specific elements. Checking the type of each variable shows the difference:

```python
type(dr_page.text)
```

str

```python
type(soup)
```

bs4.BeautifulSoup

Now we can target the elements we're interested in scraping. [Open the page in a new tab](https://www.tdcj.texas.gov/death_row/dr_offenders_on_dr.html) if you don't have it open already, then right-click on the table of data we want to scrape and select "Inspect" (assuming you're in Chrome -- Firefox has similar developer tools).

If you clicked on one of the header cells, you'll see that the text is inside a `th` tag, which stands for "table header."

`<th scope="col" abbr="tdcj number">TDCJ<br>Number</th>`

HTML elements are nested -- if you work your way up a bit, you'll see that this `th` is a child element of a `tr` tag, which stands for "table row." That parent `tr` tag contains all of the headers in the table.

Moving up one more level, we come to what we're looking for: the `table` element. (For more details on HTML tables, [check out this explainer](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/table).)

`<table class="tdcj_table indent" style="width:98%">`

(Another way to check this out would be to press Ctrl+U to view the page source, then press Ctrl+F and search for "TDCJ Number," which is the first header cell in the table.)
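To see that nesting from the Python side, here's a minimal sketch that parses a trimmed-down, made-up copy of the table (one header cell and one data row, not the live page) and walks up from the `th` using BeautifulSoup's `.parent` attribute. With `'html.parser'` the markup is kept as written, so there's no `tbody` between the `tr` and the `table` here; some other parsers (html5lib, for instance) insert one.

```python
import bs4

# A trimmed-down, hypothetical copy of the table's markup (not the live page)
html = '''
<table class="tdcj_table indent" style="width:98%">
<tr><th scope="col" abbr="tdcj number">TDCJ<br>Number</th></tr>
<tr><td>999614</td></tr>
</table>
'''

soup = bs4.BeautifulSoup(html, 'html.parser')

# Start at the first header cell...
th = soup.find('th')

# ...and walk up the tree one level at a time
row = th.parent        # the <tr> that holds the headers
table = row.parent     # the <table> that holds the rows

print(th.name, row.name, table.name)  # th tr table
print(table['class'])                 # ['tdcj_table', 'indent']
```

Note that `class` comes back as a list, because an element can have several classes at once.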

BeautifulSoup offers several ways to target elements on a page -- for this one, we could say:
- Find the first table on the page (or whatever number it is -- you'd want to Ctrl+F and search for `<table` to see how many show up), or
- Find the table with the CSS class "tdcj_table", or
- Find the table with the style "width:98%", or
- Some combination of those
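Each option in that list can be written as a BeautifulSoup query. Here's a sketch against a small made-up page with two tables (the first is a hypothetical layout table, not necessarily what the real page contains) showing that the approaches all land on the same element. The `class_` keyword (with a trailing underscore, because `class` is a reserved word in Python) and `select_one()` with a CSS selector are alternatives to the dictionary syntax used in this notebook.

```python
import bs4

# Hypothetical page: a layout table first, then the data table we want
html = '''
<table id="nav"><tr><td>Menu</td></tr></table>
<table class="tdcj_table indent" style="width:98%">
<tr><th>TDCJ Number</th></tr>
</table>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')

# By position: find_all() returns every table, so pick one by index
by_position = soup.find_all('table')[1]

# By CSS class, using a dictionary of attributes
# (matches because "tdcj_table" is one of the element's classes)
by_class_dict = soup.find('table', {'class': 'tdcj_table'})

# By CSS class, using the class_ keyword argument instead
by_class_kwarg = soup.find('table', class_='tdcj_table')

# By the style attribute (the whole attribute value must match)
by_style = soup.find('table', style='width:98%')

# A combination, written as a CSS selector: both classes must be present
by_selector = soup.select_one('table.tdcj_table.indent')

# All five point at the very same element in the tree
results = [by_position, by_class_dict, by_class_kwarg, by_style, by_selector]
print(all(r is by_class_dict for r in results))  # True
```

Matching on position is the most fragile of these: if the site adds a table above the one you want, the index silently points at the wrong table.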

For this exercise, let's use the `class` attribute to target this table. We'll use the BeautifulSoup `find()` method to look for a table and pass it a _dictionary_ with the class information. While we're at it, we'll save the results to a new variable called `table`.

(Dictionaries are Python data structures that match keys to values -- you can read more about them [here](../_Python%20syntax%20cheat%20sheet.ipynb#Dictionaries).)
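The `{'class': 'tdcj_table'}` argument is an ordinary Python dictionary. Here's a quick, stdlib-only sketch of how dictionaries map keys to values; the `inmate` record and its key names are made up for illustration, with values typed in by hand from the first data row of the table.

```python
# The same dictionary we pass to find(): one key ('class') mapped to one value
attrs = {'class': 'tdcj_table'}
print(attrs['class'])  # tdcj_table

# Dictionaries can hold many key/value pairs -- e.g. part of one inmate's row,
# typed in by hand for illustration (the key names are hypothetical)
inmate = {
    'tdcj_number': '999614',
    'last_name': 'Love',
    'first_name': 'Kristopher',
    'county': 'Dallas',
}

# Look values up by key, not by position
print(inmate['last_name'])  # Love

# .get() returns a fallback instead of raising KeyError for a missing key
print(inmate.get('race', 'unknown'))  # unknown

# Add a new key/value pair
inmate['race'] = 'Black'
print(len(inmate))  # 5
```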

```python
table = soup.find('table', {'class': 'tdcj_table'})
```

```python
table
```

<table class="tdcj_table indent" style="width:98%">
<caption>Offenders on Death Row</caption>
<tr>
<th abbr="tdcj number" scope="col">TDCJ<br/>Number</th>
<th abbr="link" scope="col" style="width: 16%">Link</th>
<th abbr="last name" scope="col">Last Name</th>
<th abbr="first name" scope="col">First Name</th>
<th abbr="date/birth" scope="col">Date of<br/>Birth</th>
<th abbr="gender" scope="col">Gender</th>
<th abbr="race" scope="col">Race</th>
<th abbr="date received" scope="col">Date<br/>Received</th>
<th abbr="county" scope="col">County</th>
<th abbr="date/offense" scope="col">Date of<br/>Offense</th>
</tr>
<tr>
<td>999614</td>
<td style="text-align: center"><a href="dr_info/lovekristopher.html" title="Offender Information for Kristopher Love">Offender Information</a></td>
<td>Love</td>
<td>Kristopher</td>
<td>03/23/1984</td>
<td style="text-align: center">M</td>
<td>Black</td>
<td>11/15/2018</td>
<td>Dallas</td>
<td>09/02/2015</td>
</tr>
<tr>
<td>999613</td>
<td style="text-align: ce

Perfect! Next, we want to get a _list_ of every row in the table. (The tag for a table row, again, is `tr`.) To do this, we'll use a different BeautifulSoup method called `find_all()`, which returns a Python list of elements that match the criteria. In human words, we're saying: Go to the table we just targeted and find all of the `tr` tags within.

Save the results as a new variable, `rows`.

```python
rows = table.find_all('tr')
```

```python
rows
```

[<tr>
 <th abbr="tdcj number" scope="col">TDCJ<br/>Number</th>
 <th abbr="link" scope="col" style="width: 16%">Link</th>
 <th abbr="last name" scope="col">Last Name</th>
 <th abbr="first name" scope="col">First Name</th>
 <th abbr="date/birth" scope="col">Date of<br/>Birth</th>
 <th abbr="gender" scope="col">Gender</th>
 <th abbr="race" scope="col">Race</th>
 <th abbr="date received" scope="col">Date<br/>Received</th>
 <th abbr="county" scope="col">County</th>
 <th abbr="date/offense" scope="col">Date of<br/>Offense</th>
 </tr>, <tr>
 <td>999614</td>
 <td style="text-align: center"><a href="dr_info/lovekristopher.html" title="Offender Information for Kristopher Love">Offender Information</a></td>
 <td>Love</td>
 <td>Kristopher</td>
 <td>03/23/1984</td>
 <td style="text-align: center">M</td>
 <td>Black</td>
 <td>11/15/2018</td>
 <td>Dallas</td>
 <td>09/02/2015</td>
 </tr>, <tr>
 <td>999613</td>
 <td style="text-align: center"><a href="dr_info/lewishoward.html" title="Offender Informatio

It's a little hard to see, but we've got ourselves a list (lists are enclosed in square brackets `[]`) of rows. To see how many items are in the list, we can use the `len()` function:

```python
len(rows)
```

223
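Strictly speaking, `find_all()` returns a BeautifulSoup `ResultSet`, which is a subclass of Python's `list`, so everything you can do with a list (`len()`, indexing, slicing, looping) works on it. Here's a small offline sketch using a made-up table with one header row and two data rows. Because the header row is a `tr` too, it counts toward `len()`, which means the 223 above includes the header row.

```python
import bs4

# Hypothetical mini-table: one header row plus two data rows
html = '''
<table class="tdcj_table">
<tr><th>TDCJ Number</th><th>Last Name</th></tr>
<tr><td>999614</td><td>Love</td></tr>
<tr><td>999613</td><td>Lewis</td></tr>
</table>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'class': 'tdcj_table'})

# find_all() hands back a ResultSet, which behaves like (and is) a list
rows = table.find_all('tr')
print(type(rows).__name__)     # ResultSet
print(isinstance(rows, list))  # True

# The header row is a <tr> too, so it's included in the count
print(len(rows))               # 3
print(len(rows) - 1)           # 2 data rows
```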

We could also use a `for loop` to check out each item a little more closely.

A `for loop` starts with the word `for` (lowercase), then a placeholder value that will stand in for each item in the list as we loop over it, then the word `in` (lowercase), then the name of the list we're looping over (`rows`, in this case), then a colon.

The lines underneath the `for` line need to be indented by the same number of spaces (or you could use tabs, if you're a monster). Jupyter defaults to 4 spaces. Everything in that indented code block will run once for each item in the list as we loop over it.
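Here's the same `for` loop pattern on a plain Python list, with no scraping involved. The county names are just sample strings, and `county` is the placeholder that takes on each item in turn.

```python
counties = ['Dallas', 'Walker', 'Jones']

# `county` stands in for each item in `counties`, one at a time
shouted = []
for county in counties:
    # everything indented under the `for` line runs once per item
    print(county)
    shouted.append(county.upper())

# this line is not indented, so it runs once, after the loop finishes
print(shouted)  # ['DALLAS', 'WALKER', 'JONES']
```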

Let's just `print()` each item in the list, then a couple of blank lines to make it easier to see. (More on printing [here](../_Python%20syntax%20cheat%20sheet.ipynb#The-print()-function).)

```python
for item in rows:
    print(item)
    print('')
    print('')
```

<tr>
<th abbr="tdcj number" scope="col">TDCJ<br/>Number</th>
<th abbr="link" scope="col" style="width: 16%">Link</th>
<th abbr="last name" scope="col">Last Name</th>
<th abbr="first name" scope="col">First Name</th>
<th abbr="date/birth" scope="col">Date of<br/>Birth</th>
<th abbr="gender" scope="col">Gender</th>
<th abbr="race" scope="col">Race</th>
<th abbr="date received" scope="col">Date<br/>Received</th>
<th abbr="county" scope="col">County</th>
<th abbr="date/offense" scope="col">Date of<br/>Offense</th>
</tr>


<tr>
<td>999614</td>
<td style="text-align: center"><a href="dr_info/lovekristopher.html" title="Offender Information for Kristopher Love">Offender Information</a></td>
<td>Love</td>
<td>Kristopher</td>
<td>03/23/1984</td>
<td style="text-align: center">M</td>
<td>Black</td>
<td>11/15/2018</td>
<td>Dallas</td>
<td>09/02/2015</td>
</tr>


<tr>
<td>999613</td>
<td style="text-align: center"><a href="dr_info/lewishoward.html" title="Offender Information for Howard Lewis">Off

OK, now we're getting a clearer picture of the data we're targeting. In each row are several `td` tags, which stand for "table data," and they provide various bits of data in this order:
0. Offender number
1. Link to the inmate's detail page
2. Last name
3. First name
4. DOB
5. Gender
6. Race
7. Date inmate went to death row
8. County
9. Date of offense

Why did we start counting at 0? Because that's how Python (and many other programming languages) number the items in a list -- that will be important here in a second.
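A quick stdlib sketch of zero-based indexing, using the first data row's cell values typed in by hand, in the same column order as the list above:

```python
# One row's cell values, copied by hand from the table, in column order
cells = ['999614', 'Offender Information', 'Love', 'Kristopher', '03/23/1984',
         'M', 'Black', '11/15/2018', 'Dallas', '09/02/2015']

print(cells[0])   # 999614 -- the first item is at index 0
print(cells[2])   # Love -- the third item (last name) is at index 2
print(cells[9])   # 09/02/2015 -- ten items, so the last index is 9
print(cells[-1])  # 09/02/2015 -- negative indexes count back from the end

# Asking for an index past the end raises an IndexError
try:
    cells[10]
except IndexError as err:
    print('IndexError:', err)  # IndexError: list index out of range
```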

So first, a quick recap: Up to this point we have:
- Fetched a web page
- Parsed it into a BeautifulSoup object
- Found the table in the soup
- Found the rows in the table

Our next step is to grab the pieces of data within each row. So as we're looping over the rows, we'll use `find_all()` again to grab the `td` cells and set them equal to variables we can make sense of.

One thing, though -- we want to skip the header row and start with the actual data. In other words, we want to loop over the `rows` list minus the first item in that list, which we can do using a technique called "list slicing." Counting in Python starts with zero, so we want item 1 in that list all the way to the end (skipping item 0, which is the header row). Here's how that works:

```python
rows[1:]
```

[<tr>
 <td>999614</td>
 <td style="text-align: center"><a href="dr_info/lovekristopher.html" title="Offender Information for Kristopher Love">Offender Information</a></td>
 <td>Love</td>
 <td>Kristopher</td>
 <td>03/23/1984</td>
 <td style="text-align: center">M</td>
 <td>Black</td>
 <td>11/15/2018</td>
 <td>Dallas</td>
 <td>09/02/2015</td>
 </tr>, <tr>
 <td>999613</td>
 <td style="text-align: center"><a href="dr_info/lewishoward.html" title="Offender Information for Howard Lewis">Offender Information</a></td>
 <td>Lewis</td>
 <td>Howard</td>
 <td>09/20/1967</td>
 <td style="text-align: center">M</td>
 <td>Black</td>
 <td>11/09/2018</td>
 <td>Walker</td>
 <td>07/24/2013</td>
 </tr>, <tr>
 <td>999612</td>
 <td style="text-align: center"><a href="dr_info/comptondillion.html" title="Offender Information for Dillion Compton">Offender Information</a></td>
 <td>Compton</td>
 <td>Dillion</td>
 <td>07/27/1994</td>
 <td style="text-align: center">M</td>
 <td>Black</td>
 <td>11/06/2018</td>
 <td

So instead of this:

```python
for item in rows:
    ...
```

We'll do this:

```python
for item in rows[1:]:
    ...
```
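Slicing works on any list, not just BeautifulSoup results. Here's a sketch with a stand-in list of labels (not real rows) showing what `[1:]` keeps and what it drops:

```python
# Stand-in for `rows`: a header followed by three data rows
rows = ['header', 'row A', 'row B', 'row C']

# [start:stop] -- start is included, stop is excluded;
# leaving stop blank means "go all the way to the end"
data_rows = rows[1:]
print(data_rows)  # ['row A', 'row B', 'row C']

# [:1] is the complement: everything before index 1, i.e. just the header
print(rows[:1])   # ['header']

# Slicing returns a new list and leaves the original untouched
print(len(rows), len(data_rows))  # 4 3
```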

```python
for item in rows[1:]:
    print(item)
```

<tr>
<td>999614</td>
<td style="text-align: center"><a href="dr_info/lovekristopher.html" title="Offender Information for Kristopher Love">Offender Information</a></td>
<td>Love</td>
<td>Kristopher</td>
<td>03/23/1984</td>
<td style="text-align: center">M</td>
<td>Black</td>
<td>11/15/2018</td>
<td>Dallas</td>
<td>09/02/2015</td>
</tr>
<tr>
<td>999613</td>
<td style="text-align: center"><a href="dr_info/lewishoward.html" title="Offender Information for Howard Lewis">Offender Information</a></td>
<td>Lewis</td>
<td>Howard</td>
<td>09/20/1967</td>
<td style="text-align: center">M</td>
<td>Black</td>
<td>11/09/2018</td>
<td>Walker</td>
<td>07/24/2013</td>
</tr>
<tr>
<td>999612</td>
<td style="text-align: center"><a href="dr_info/comptondillion.html" title="Offender Information for Dillion Compton">Offender Information</a></td>
<td>Compton</td>
<td>Dillion</td>
<td>07/27/1994</td>
<td style="text-align: center">M</td>
<td>Black</td>
<td>11/06/2018</td>
<td>Jones</td>
<td>07/16/2016</td>
</

Within each iteration, then, we're going to pull out a list of table data cells and start working with them. Let's start with the first one:

```python
for item in rows[1:]:
    # find all of the `td` tags inside this row
    cells = item.find_all('td')

    # the inmate ID is in the first [0] cell
    inmate_id = cells[0]

    # print it
    print(inmate_id)
```

<td>999614</td>
<td>999613</td>
<td>999612</td>
<td>999611</td>
<td>999610</td>
<td>999609</td>
<td>999608</td>
<td>999607</td>
<td>999606</td>
<td>999605</td>
<td>999604</td>
<td>999603</td>
<td>999602</td>
<td>999601</td>
<td>999600</td>
<td>999599</td>
<td>999598</td>
<td>999597</td>
<td>999596</td>
<td>999595</td>
<td>999594</td>
<td>999593</td>
<td>999592</td>
<td>999591</td>
<td>999590</td>
<td>999589</td>
<td>999588</td>
<td>999587</td>
<td>999586</td>
<td>999585</td>
<td>999584</td>
<td>999582</td>
<td>999581</td>
<td>999580</td>
<td>999579</td>
<td>999578</td>
<td>999577</td>
<td>999576</td>
<td>999575</td>
<td>999573</td>
<td>999572</td>
<td>999571</td>
<td>999570</td>
<td>999569</td>
<td>999568</td>
<td>999567</td>
<td>999566</td>
<td>999565</td>
<td>999564</td>
<td>999563</td>
<td>999562</td>
<td>999561</td>
<td>999560</td>
<td>999559</td>
<td>999558</td>
<td>999557</td>
<td>999556</td>
<td>999554</td>
<td>999553</td>
<td>999551</td>
<td>999550</td>
<td>999549</td>
<td>9995

You'll notice that we're getting the entire tag, but we just want the contents. You can access the `text` attribute of that tag to get just the contents:

```python
for item in rows[1:]:
    # find all of the `td` tags inside this row
    cells = item.find_all('td')

    # the inmate ID is in the first [0] cell
    # just want the text tho
    inmate_id = cells[0].text

    # print it all out
    print(inmate_id)
```

999614
999613
999612
999611
999610
999609
999608
999607
999606
999605
999604
999603
999602
999601
999600
999599
999598
999597
999596
999595
999594
999593
999592
999591
999590
999589
999588
999587
999586
999585
999584
999582
999581
999580
999579
999578
999577
999576
999575
999573
999572
999571
999570
999569
999568
999567
999566
999565
999564
999563
999562
999561
999560
999559
999558
999557
999556
999554
999553
999551
999550
999549
999548
999547
999544
999543
999542
999541
999538
999537
999536
999535
999531
999529
999527
999526
999524
999523
999520
999519
999517
999516
999515
999514
999513
999512
999509
999507
999506
999505
999501
999498
999497
999495
999493
999492
999490
999489
999484
999482
999480
999477
999476
999473
999472
999469
999465
999464
999462
999461
999460
999459
999458
999453
999450
999447
999446
999443
999442
999436
999433
999423
999420
999416
999410
999406
999402
999399
999398
999396
999392
999391
999390
999388
999386
999383
999379
999376
999373
999369
999366
999361
999354
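`.text` returns a tag's text exactly as it appears in the HTML, whitespace included. The cells on this page look clean, but if you run into padded cells, `get_text(strip=True)` trims that whitespace. Here's a sketch with a made-up padded cell, plus a header cell like the ones above that contains a `<br>`:

```python
import bs4

# A hypothetical cell with stray whitespace around the value
cell = bs4.BeautifulSoup('<td>\n  999614  </td>', 'html.parser').td

print(repr(cell.text))                  # '\n  999614  '
print(repr(cell.get_text(strip=True)))  # '999614'

# .text on a tag with nested tags joins all of the text inside it,
# so the two halves of a <br>-split header run together
header = bs4.BeautifulSoup('<th>TDCJ<br>Number</th>', 'html.parser').th
print(header.text)                       # TDCJNumber

# get_text() can put a separator between the pieces instead
print(header.get_text(' ', strip=True))  # TDCJ Number
```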

Now we can follow that pattern to get the rest of the data bits (we'll deal with the link in a second):

```python
for item in rows[1:]:
    # find all of the `td` tags inside this row
    cells = item.find_all('td')

    # the inmate ID is in the first [0] cell
    # just want the text tho
    inmate_id = cells[0].text

    # link [1]
    link = cells[1].text

    # last name [2]
    last = cells[2].text

    # first name [3]
    first = cells[3].text

    # dob [4]
    dob = cells[4].text

    # gender [5]
    gender = cells[5].text

    # race [6]
    race = cells[6].text

    # intake date [7]
    intake_date = cells[7].text

    # county [8]
    county = cells[8].text

    # offense date [9]
    offense_date = cells[9].text

    # drop it into a list
    inmate_data = [inmate_id, link, last, first, dob, gender, race, intake_date, county, offense_date]

    # and print that list
    print(inmate_data)
```

['999614', 'Offender Information', 'Love', 'Kristopher', '03/23/1984', 'M', 'Black', '11/15/2018', 'Dallas', '09/02/2015']
['999613', 'Offender Information', 'Lewis', 'Howard', '09/20/1967', 'M', 'Black', '11/09/2018', 'Walker', '07/24/2013']
['999612', 'Offender Information', 'Compton', 'Dillion', '07/27/1994', 'M', 'Black', '11/06/2018', 'Jones', '07/16/2016']
['999611', 'Offender Information', 'Irsan', 'Ali', '12/27/1957', 'M', 'Other', '08/20/2018', 'Harris', '01/15/2018']
['999610', 'Offender Information', 'Delacruz', 'Isidro', '10/07/1990', 'M', 'Hispanic', '04/26/2018', 'Tom Green', '09/02/2014']
['999609', 'Offender Information', 'Delacerda', 'Jason', '07/26/1977', 'M', 'Hispanic', '03/08/2018', 'Hardin', '08/17/2011']
['999608', 'Offender Information', 'Hudson', 'William', '07/03/1982', 'M', 'White', '11/16/2017', 'Anderson', '11/14/2015']
['999607', 'Offender Information', 'Tracy', 'Billy', '11/30/1977', 'M', 'White', '11/15/2017', 'Bowie', '07/15/2015']
['999606', 'Offender 
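Writing out one line per cell is clear but repetitive. As an alternative (a stylistic choice, not something this walkthrough requires), a list comprehension can pull the text out of every cell at once, and tuple unpacking can assign all the names in one step. Here's a sketch with a made-up one-row table copied from the first data row:

```python
import bs4

# One hypothetical row, copied from the first data row of the table
html = ('<table><tr><td>999614</td><td><a href="dr_info/lovekristopher.html">'
        'Offender Information</a></td><td>Love</td><td>Kristopher</td>'
        '<td>03/23/1984</td><td>M</td><td>Black</td><td>11/15/2018</td>'
        '<td>Dallas</td><td>09/02/2015</td></tr></table>')
row = bs4.BeautifulSoup(html, 'html.parser').find('tr')

cells = row.find_all('td')

# A list comprehension: the text of every cell, in column order
values = [cell.text for cell in cells]

# Tuple unpacking: one name per value, left to right
# (raises ValueError if the number of names and values don't match)
(inmate_id, link, last, first, dob,
 gender, race, intake_date, county, offense_date) = values

print(values[:3])    # ['999614', 'Offender Information', 'Love']
print(last, county)  # Love Dallas
```

The trade-off: the one-line-per-cell version makes it easy to treat a single column differently (as we're about to do with the link), while the comprehension treats every cell the same way.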

The text "Offender Information" isn't super useful -- it would be better to grab the actual link. So rather than accessing the `.text` attribute of the `td` tag, we're going to grab the `'href'` value of the `a` tag inside it (`a` tags are hyperlinks). N.B.: To get the `href` attribute, we'll use bracket notation `['href']`, not dot notation `.href`.

```python
for item in rows[1:]:
    # find all of the `td` tags inside this row
    cells = item.find_all('td')

    # the inmate ID is in the first [0] cell
    # just want the text tho
    inmate_id = cells[0].text

    # link [1]
    link = cells[1].a['href']

    # last name [2]
    last = cells[2].text

    # first name [3]
    first = cells[3].text

    # dob [4]
    dob = cells[4].text

    # gender [5]
    gender = cells[5].text

    # race [6]
    race = cells[6].text

    # intake date [7]
    intake_date = cells[7].text

    # county [8]
    county = cells[8].text

    # offense date [9]
    offense_date = cells[9].text

    # drop it into a list
    inmate_data = [inmate_id, link, last, first, dob, gender, race, intake_date, county, offense_date]

    # and print that list
    print(inmate_data)
```

['999614', 'dr_info/lovekristopher.html', 'Love', 'Kristopher', '03/23/1984', 'M', 'Black', '11/15/2018', 'Dallas', '09/02/2015']
['999613', 'dr_info/lewishoward.html', 'Lewis', 'Howard', '09/20/1967', 'M', 'Black', '11/09/2018', 'Walker', '07/24/2013']
['999612', 'dr_info/comptondillion.html', 'Compton', 'Dillion', '07/27/1994', 'M', 'Black', '11/06/2018', 'Jones', '07/16/2016']
['999611', 'dr_info/irsanali.html', 'Irsan', 'Ali', '12/27/1957', 'M', 'Other', '08/20/2018', 'Harris', '01/15/2018']
['999610', 'dr_info/delacruzisidro.html', 'Delacruz', 'Isidro', '10/07/1990', 'M', 'Hispanic', '04/26/2018', 'Tom Green', '09/02/2014']
['999609', 'dr_info/delacerdajason.html', 'Delacerda', 'Jason', '07/26/1977', 'M', 'Hispanic', '03/08/2018', 'Hardin', '08/17/2011']
['999608', 'dr_info/hudsonwilliam.html', 'Hudson', 'William', '07/03/1982', 'M', 'White', '11/16/2017', 'Anderson', '11/14/2015']
['999607', 'dr_info/tracybillyjoel.html', 'Tracy', 'Billy', '11/30/1977', 'M', 'White', '11/15/2017'
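Here's a sketch contrasting the ways to read an attribute, using a made-up copy of one link cell. Bracket notation raises a `KeyError` if the attribute is missing, while `.get()` returns `None` (or a default you supply). Dot notation doesn't read attributes at all: `tag.href` is BeautifulSoup shorthand for `tag.find('href')`, which looks for a child *tag* named `href` and finds nothing.

```python
import bs4

html = ('<td style="text-align: center"><a href="dr_info/lovekristopher.html" '
        'title="Offender Information for Kristopher Love">Offender Information</a></td>')
cell = bs4.BeautifulSoup(html, 'html.parser').td

# cell.a is shorthand for cell.find('a'): the first <a> tag inside the cell
link_tag = cell.a

print(link_tag['href'])             # dr_info/lovekristopher.html
print(link_tag.get('href'))         # dr_info/lovekristopher.html
print(link_tag.get('rel', 'none'))  # none -- attribute missing, default used

# Dot notation searches for a child tag named "href" -- there isn't one
print(link_tag.href)                # None

# Bracket notation fails loudly on a missing attribute
try:
    link_tag['rel']
except KeyError:
    print('no rel attribute')
```

Failing loudly is often what you want in a scraper: if a row unexpectedly has no link, a `KeyError` tells you right away instead of quietly storing `None`.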

Almost there! The link is relative, and we want a fully qualified URL for our data set. So let's prepend the rest of the URL to each link: `https://www.tdcj.texas.gov/death_row/`. In Python, if you "add" two strings together with a plus sign, it concatenates them:

```python
for item in rows[1:]:
    # find all of the `td` tags inside this row
    cells = item.find_all('td')

    # the inmate ID is in the first [0] cell
    # just want the text tho
    inmate_id = cells[0].text

    # link [1]
    link = 'https://www.tdcj.texas.gov/death_row/' + cells[1].a['href']

    # last name [2]
    last = cells[2].text

    # first name [3]
    first = cells[3].text

    # dob [4]
    dob = cells[4].text

    # gender [5]
    gender = cells[5].text

    # race [6]
    race = cells[6].text

    # intake date [7]
    intake_date = cells[7].text

    # county [8]
    county = cells[8].text

    # offense date [9]
    offense_date = cells[9].text

    # drop it into a list
    inmate_data = [inmate_id, link, last, first, dob, gender, race, intake_date, county, offense_date]

    # and print that list
    print(inmate_data)
```

['999614', 'https://www.tdcj.texas.gov/death_row/dr_info/lovekristopher.html', 'Love', 'Kristopher', '03/23/1984', 'M', 'Black', '11/15/2018', 'Dallas', '09/02/2015']
['999613', 'https://www.tdcj.texas.gov/death_row/dr_info/lewishoward.html', 'Lewis', 'Howard', '09/20/1967', 'M', 'Black', '11/09/2018', 'Walker', '07/24/2013']
['999612', 'https://www.tdcj.texas.gov/death_row/dr_info/comptondillion.html', 'Compton', 'Dillion', '07/27/1994', 'M', 'Black', '11/06/2018', 'Jones', '07/16/2016']
['999611', 'https://www.tdcj.texas.gov/death_row/dr_info/irsanali.html', 'Irsan', 'Ali', '12/27/1957', 'M', 'Other', '08/20/2018', 'Harris', '01/15/2018']
['999610', 'https://www.tdcj.texas.gov/death_row/dr_info/delacruzisidro.html', 'Delacruz', 'Isidro', '10/07/1990', 'M', 'Hispanic', '04/26/2018', 'Tom Green', '09/02/2014']
['999609', 'https://www.tdcj.texas.gov/death_row/dr_info/delacerdajason.html', 'Delacerda', 'Jason', '07/26/1977', 'M', 'Hispanic', '03/08/2018', 'Hardin', '08/17/2011']
['999608
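Concatenation works here because every `href` in this table is a relative path under `death_row/`. Python's standard library also offers `urllib.parse.urljoin()`, which resolves a link against the URL of the page it came from, the same way a browser does, so it also copes with links that start with `/` or `../`. In this sketch, the `/images/photo.jpg` and `../index.html` hrefs are hypothetical, included only for comparison:

```python
from urllib.parse import urljoin

# The page the links were scraped from
page_url = 'https://www.tdcj.texas.gov/death_row/dr_offenders_on_dr.html'
href = 'dr_info/lovekristopher.html'

# String concatenation, as in the loop above
by_concat = 'https://www.tdcj.texas.gov/death_row/' + href

# urljoin drops the file name from page_url and resolves href against the rest
by_join = urljoin(page_url, href)

print(by_concat == by_join)  # True
print(by_join)  # https://www.tdcj.texas.gov/death_row/dr_info/lovekristopher.html

# Hypothetical hrefs that simple concatenation would get wrong
print(urljoin(page_url, '/images/photo.jpg'))  # https://www.tdcj.texas.gov/images/photo.jpg
print(urljoin(page_url, '../index.html'))      # https://www.tdcj.texas.gov/index.html
```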

## Your turn

In groups, fetch [the page of Senate press accreditations](https://www.dailypress.senate.gov/?page_id=67) and parse each row in the table of journalists into a list.