NOTE: This project is no longer maintained!


Scrapemark is a super-convenient way to scrape webpages in Python.

It uses an HTML-like markup language to extract the data you need, and you get your results as plain old Python lists and dictionaries. Internally, Scrapemark uses regular expressions and is super-fast.
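
To make the "regular expressions in, lists and dictionaries out" idea concrete, here is a minimal sketch that uses only the standard `re` module. It is not Scrapemark's implementation and does not call Scrapemark. The sample markup, the `LINK_RE` pattern, and the `extract_links` helper are all hypothetical names invented for illustration, and the result shape (a `links` list of dictionaries) is an assumption about what a pattern with `[links]` captures would return.

```python
import re

# Hypothetical sample markup, invented for this sketch.
SAMPLE_HTML = """
<div class='news-summary'>
  <h3><a href='http://example.com/a'>First story</a></h3>
  <strong>12</strong>
</div>
<div class='news-summary'>
  <h3><a href='http://example.com/b'>Second story</a></h3>
  <strong>7</strong>
</div>
"""

# Hand-written regular expression with one named group per captured field.
# A template-driven scraper would generate something like this for you.
LINK_RE = re.compile(
    r"<a href='(?P<url>[^']*)'>(?P<title>[^<]*)</a>"
    r".*?<strong>(?P<diggs>\d+)</strong>",
    re.DOTALL,
)


def extract_links(html):
    """Return {'links': [...]} with one dictionary per matched story."""
    links = []
    for match in LINK_RE.finditer(html):
        row = match.groupdict()
        # Convert the count to an integer, similar in spirit to an int filter.
        row["diggs"] = int(row["diggs"])
        links.append(row)
    return {"links": links}


result = extract_links(SAMPLE_HTML)
print(result)
```

The point of the sketch is the shape of the result: ordinary dictionaries and lists that can be indexed directly, for example `result["links"][0]["title"]`.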

As an example, here is a way you could scrape all the links on the Digg homepage in one fell swoop:

import scrapemark

print scrapemark.scrape("""
    <div class='news-summary'>
      <h3><a href='{{ [links].url }}'>{{ [links].title }}</a></h3>
      <p>{{ [links].description }}</p>
      <li class='digg-count'>
        <strong>{{ [links].diggs|int }}</strong>