# Scraping data from more than one page

Problem: We scraped our table, but there are a few key pieces of information
sitting on each reactor's detail page that we want to include in our
analysis.

How we're going to deal with it:
- Do everything we did before: fetch a page, navigate to the main table and output those details to a CSV
- Refine our script so that it dips into the detail page for each reactor
- Target the location of these new data with BeautifulSoup
- Write a function that will extract them for us.

Let's add Python's `time` library, which we can use to pause between requests so we don't hit this government website too quickly.

In [None]:
import requests
from bs4 import BeautifulSoup
import csv
import time


Let's take a look at one of these [reactor pages](http://www.nrc.gov/info-finder/reactors/bru1.html). All the data we need sits inside a two-column, one-row table near the top, so we can target that information the same way we did with the main table.

There's one issue to solve, though. How do we pull out the individual pieces of information in the absence of HTML tags to latch onto?

We can attack it with string functions: split the cell's text into a list on line breaks, then search each item for certain keywords.
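Here is a minimal sketch of the splitting step, assuming the cell's text has already been pulled out as a single string. The helper name `text_to_lines` and the sample text are made up for illustration; they are not taken from the NRC page.

```python
def text_to_lines(text):
    """Split a block of text on line breaks, dropping blank lines."""
    lines = []
    for line in text.splitlines():
        line = line.strip()  # trim stray whitespace around each line
        if line:             # skip lines that are empty after trimming
            lines.append(line)
    return lines


# Made-up sample standing in for the text of a detail-page table cell
sample = "Location: Anytown, ST\n\n  Operator: Example Power Co.\n"
print(text_to_lines(sample))
```

Dropping the blank lines up front keeps the later keyword search from having to deal with empty items.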

We'll write our function here. It will take two arguments: a list and a value to find. When it finds a match, it will return the item it matched on, trimmed to just the part after the colon (:).
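One way such a function might look is sketched below. The name `find_value`, the case-sensitive match, and the choice to return an empty string when nothing matches are all assumptions made for this example, and the sample list is invented.

```python
def find_value(lines, keyword):
    """Return the text after the colon in the first item containing keyword."""
    for line in lines:
        if keyword in line:  # case-sensitive substring match
            # Split on the first colon only, so any later colons stay in the value
            return line.split(':', 1)[-1].strip()
    return ''  # assumption: an empty string signals "no match"


# Made-up sample list, shaped like a table cell split on line breaks
sample_lines = ['Location: Anytown, ST', 'Operator: Example Power Co.']
print(find_value(sample_lines, 'Operator'))
```

Returning an empty string rather than `None` means the result can go straight into a CSV row without extra checks.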

In [None]:
url = 'http://www.nrc.gov/reactors/operating/list-power-reactor-units.html'

web_page = requests.get(url)
soup = BeautifulSoup(web_page.content, 'html.parser')

reactor_table = soup.find('table')

First change to our existing code: We'll send our results to a different file.

In [None]:
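# Python 3: open in text mode; newline='' lets the csv module manage line endings
csv_file = open('reactors.csv', 'w', newline='', encoding='utf-8')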
output = csv.writer(csv_file)

We'll also add the two new fields we'll be grabbing from the detail page to the header row.

In [None]:
output.writerow(['NAME', 'LINK', 'DOCKET', 'LICENSE_NUM', 'TYPE', 'LOCATION', 'OWNER', 'REGION'])

Now we need to revise our loop to include new steps:
- Retrieve the individual reactor page, going through the same process as the main table
- Isolate the table cell we want to collect
- Boil it down to just the text and then turn it into a list based on line breaks
- Use a function to search through it and return some new values for a CSV
- **PAUSE BETWEEN PAGES**
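The pause in the last step can be sketched on its own, separate from the scraping logic. In this example `fetch_politely` is a hypothetical helper and `fake_fetch` stands in for a real request so the code runs without touching the network; the two-second default delay is an arbitrary choice, not a figure from the NRC.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=2.0):
    """Call fetch(url) for each URL, sleeping between consecutive requests."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # pause before every request after the first
        pages.append(fetch(url))
    return pages


def fake_fetch(url):
    """Stand-in for a real download; returns a placeholder string."""
    return 'contents of ' + url


print(fetch_politely(['page-one', 'page-two'], fake_fetch, delay_seconds=0.1))
```

Inside the notebook's loop the same effect comes from a single `time.sleep()` call at the end of each pass.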

In [None]:
for row in reactor_table.find_all('tr')[1:]:
    cell = row.find_all('td')
    name = cell[0].contents[0].text
    link = cell[0].contents[0].get('href')
    docket = cell[0].contents[2].strip()
    lic_num = cell[1].text
    reactype = cell[2].text
    # Python 3 strings are already Unicode, so no .encode() call is needed
    location = cell[3].text
    owner = cell[4].text.strip()
    region = cell[5].text

    # Add the new steps for this loop below
    
csv_file.close()
