minor doc tweaks on the web scraping tutorial
Serdar Tumgoren committed Jan 23, 2012
1 parent a925f97 commit 56c5bf1
Showing 1 changed file with 10 additions and 7 deletions.
17 changes: 10 additions & 7 deletions tutorials/webscraping101/failed_banks_scrape.py
@@ -1,28 +1,28 @@
#!/usr/bin/env python
"""
This is the first example scrape in our series.
In this scrape, we'll demonstrate some Python basics
using the FDIC's Failed Banks List.
USAGE:
You can run this scrape by going to the command line, navigating to the
directory containing this script, and typing the command below:
python failed_banks_scrape.py
NOTE:
The original FDIC data is located at the URL below:
http://www.fdic.gov/bank/individual/failed/banklist.html
In order to be considerate to the FDIC's servers, we're scraping
-a copy of the page stored on one of Amazon S3.
+a copy of the page stored on Amazon S3.
"""

# Import a built-in library for working with data on the Web
# DOCS: http://docs.python.org/library/urllib.html
import urllib
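The code that downloads the page and builds the `headers` and `rows` variables used below is collapsed in this diff. As a rough sketch only (placeholder URL and comments, not the commit's actual code), the usual urllib + BeautifulSoup pattern behind those variables looks something like this:

# Sketch, not part of the commit: fetch the HTML and parse out the table.
# The URL is a placeholder; the tutorial actually scrapes an S3 copy of the page.
from BeautifulSoup import BeautifulSoup
html = urllib.urlopen('http://example.com/banklist.html').read()  # download the raw HTML
soup = BeautifulSoup(html)               # parse the HTML into a navigable tree
table = soup.find('table')               # grab the first table on the page
rows = table.findAll('tr')               # every row in that table
headers = rows[0].findAll('th')          # header cells from the first row
columns = []                             # list that will collect the column names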
@@ -61,21 +61,24 @@
for header in headers:
    columns.append(header.text)

-# Use the tab character's 'join' method to concatenate
+# Use the tab character's "join" method to concatenate
# the column names into a single, tab-separated string.
# Then print out the header column.
print '\t'.join(columns)
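As a quick illustration of what join is doing here (an aside, not part of the script): '\t'.join(['Bank Name', 'City', 'ST']) evaluates to the single string 'Bank Name\tCity\tST', which prints as one tab-separated header line.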

# 4) Process the data, skipping the initial header row
for row in rows[1:]:

-    # Extract the data points from the table row and print them
+    # Extract data points from the table row
    data = row.findAll('td')

    # Pluck out the text of each field and store in a separate variable
    bank_name = data[0].text
    city = data[1].text
    state = data[2].text
    cert_num = data[3].text
    ai = data[4].text
    closed_on = data[5].text
    updated = data[6].text

    print "\t".join([bank_name, city, state, cert_num, ai, closed_on, updated])
