Skip to content
master
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

NOTE: This project is no longer maintained! more info

Scrapemark

Scrapemark is a super-convenient way to scrape webpages in Python.

It utilizes an HTML-like markup language to extract the data you need. You get your results as plain old Python lists and dictionaries. Scrapemark internally utilizes regular expressions and is super-fast.

As an example, here is a way you could scrape all the links on the Digg homepage in one fell swoop:

import scrapemark

print scrapemark.scrape("""
  {*
    <div class='news-summary'>
      <h3><a href='{{ [links].url }}'>{{ [links].title }}</a></h3>
      <p>{{ [links].description }}</p>
      <li class='digg-count'>
        <strong>{{ [links].diggs|int }}</strong>
      </li>
    </div>
  *}
  """,
  url='http://digg.com/')

About

Super-convenient web scraping in Python

Resources

Packages

No packages published
You can’t perform that action at this time.