Fork of code from ScraperWiki at https://classic.scraperwiki.com/scra…
clabrow committed Aug 8, 2017
0 parents commit 3525e8f
Showing 3 changed files with 36 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
# Ignore output of scraper
data.sqlite
1 change: 1 addition & 0 deletions README.textile
@@ -0,0 +1 @@
Start here: check the ScraperWiki interface is working, then learn how to download a web page.
33 changes: 33 additions & 0 deletions scraper.py
@@ -0,0 +1,33 @@
###############################################################################
# START HERE: Tutorial 1: Getting used to the ScraperWiki editing interface.
# Follow the actions listed with -- BLOCK CAPITALS below.
###############################################################################

# -----------------------------------------------------------------------------
# 1. Start by running a really simple Python script, just to make sure that
# everything is working OK.
# -- CLICK THE 'RUN' BUTTON BELOW
# You should see some numbers print in the 'Console' tab below. If it doesn't work,
# try reopening this page in a different browser - Chrome or the latest Firefox.
# -----------------------------------------------------------------------------

for i in range(10):
    print "Hello", i

# -----------------------------------------------------------------------------
# 2. Next, try scraping an actual web page and getting some raw HTML.
# -- UNCOMMENT THE THREE LINES BELOW (i.e. delete the # at the start of the lines)
# -- CLICK THE 'RUN' BUTTON AGAIN
# You should see the raw HTML at the bottom of the 'Console' tab.
# Click on the 'more' link to see it all, and the 'Sources' tab to see our URL -
# you can click on the URL to see the original page.
# -----------------------------------------------------------------------------

#import scraperwiki
#html = scraperwiki.scrape('https://scraperwiki.com/hello_world.html')
#print html

# -----------------------------------------------------------------------------
# In the next tutorial, you'll learn how to extract the useful parts
# from the raw HTML page.
# -----------------------------------------------------------------------------
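
The tutorial above targets the Python 2 ScraperWiki environment and fetches the page with scraperwiki.scrape. For readers without that environment, the same download step can be sketched in Python 3 with only the standard library. This is a rough equivalent under stated assumptions, not the ScraperWiki implementation: fetch_html is a hypothetical helper name, and the demo uses a data: URL so that it runs without network access.

```python
# Python 3 sketch using only the standard library (an assumption; the
# tutorial itself uses scraperwiki.scrape under Python 2).
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download url and return the response body decoded to text.

    fetch_html is a hypothetical helper, not part of the scraperwiki library.
    """
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset declared in the response headers, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# A data: URL keeps the demo offline; replace it with a real page URL to scrape.
html = fetch_html("data:text/html,Hello%20world")
print(html)
```

Passing the tutorial's own URL to fetch_html instead of the data: URL would reproduce step 2, provided that page is still reachable.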
