Initial Creation
coderpunk committed Feb 7, 2010
0 parents commit 8bd8fa7
Showing 2 changed files with 55 additions and 0 deletions.
6 changes: 6 additions & 0 deletions README
@@ -0,0 +1,6 @@
This is a simple Python application to pull the newspaper JPG images from newseum.org

Edit the script to set the IDs of the papers to pull.

Images are stored in the current directory.
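
As a rough illustration of what that edit amounts to, each ID in the script's CITIES list is substituted into the page URL template. The sketch below is written for Python 3 rather than the Python 2 the script targets; the helper name `page_urls` and the ID "XX_YY" are hypothetical placeholders, not part of the original script or a real paper ID.

```python
# Template copied from newseum-pages.py; %s is replaced by the paper ID.
NEWSEUM_URL = "http://www.newseum.org/todaysfrontpages/hr.asp?fpVname=%s"

# The two IDs from the script, plus a placeholder showing where to add one.
CITIES = ["AL_AS", "AL_MA", "XX_YY"]  # "XX_YY" is hypothetical, not a real ID


def page_urls(cities):
    """Return the front-page URL for each paper ID (hypothetical helper)."""
    return [NEWSEUM_URL % city for city in cities]


for url in page_urls(CITIES):
    print(url)
```

Adding a paper is then a one-line change to CITIES; the rest of the script loops over whatever the list contains.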

49 changes: 49 additions & 0 deletions newseum-pages.py
@@ -0,0 +1,49 @@
#!/usr/bin/env python
"""
Quick Newseum Frontpage Grabber script
Copyright 2009 by Brian C. Lane
Imp Software
All Rights Reserved
Modify CITIES list below to add the city designators (as seen in the
URLS at http://www.newseum.org/todaysfrontpages/default.asp)
"""
import urllib2
import re
import os
import urlparse

# Add more cities here
CITIES = [ "AL_AS", "AL_MA", ]

NEWSEUM_URL="http://www.newseum.org/todaysfrontpages/hr.asp?fpVname=%s"
NEWSEUM_IMG="http://www.newseum.org"

def fetchNewseumImage(city):
    """
    Fetch the image for a city
    """
    print "Parsing the page for %s" % (city)
    page = urllib2.urlopen(NEWSEUM_URL % city).read()

    # Quick and dirty grep for the image name
    match = re.search('<img class="tfp_lrg_img" src="(.*)" alt=', page)
    if match:
        img_url = NEWSEUM_IMG + os.path.abspath(match.group(1))
        print "Saving the image for %s" % (city)
        image = urllib2.urlopen(img_url).read()
        open(os.path.basename(match.group(1)), "wb").write(image)


def main():
    """
    Main code goes here
    """
    for city in CITIES:
        fetchNewseumImage(city)


if __name__ == '__main__':
    main()
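
The image lookup in fetchNewseumImage can be sketched without network access as two small functions. This is a Python 3 reworking under stated assumptions, not the committed code: it swaps the greedy `(.*)` for a non-greedy `(.*?)` and resolves the `src` value with `urllib.parse.urljoin` instead of `os.path.abspath`. The names `find_image_url` and `local_filename` and the sample markup are hypothetical.

```python
import posixpath
import re
from urllib.parse import urljoin, urlsplit

# Template copied from newseum-pages.py.
NEWSEUM_URL = "http://www.newseum.org/todaysfrontpages/hr.asp?fpVname=%s"

# Same pattern as the script, but non-greedy so it stops at the first '" alt='.
IMG_PATTERN = re.compile(r'<img class="tfp_lrg_img" src="(.*?)" alt=')


def find_image_url(page_html, page_url):
    """Return the absolute image URL found in page_html, or None."""
    match = IMG_PATTERN.search(page_html)
    if match is None:
        return None
    # urljoin resolves absolute and relative src values against the page URL.
    return urljoin(page_url, match.group(1))


def local_filename(image_url):
    """Return the last path component of the URL, used as the file name."""
    return posixpath.basename(urlsplit(image_url).path)


# Made-up markup for illustration; the real page layout may differ.
sample_html = '<img class="tfp_lrg_img" src="/media/tfp/AL_AS.jpg" alt="front page">'
image_url = find_image_url(sample_html, NEWSEUM_URL % "AL_AS")
print(image_url)
print(local_filename(image_url))
```

Keeping the parsing separate from the download makes the regex easy to check against saved HTML before pointing the script at the live site.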

