Jabba-Webkit

Jabba's headless webkit browser for scraping AJAX-powered webpages.

Usage:

jabba_webkit.py <url> [<time>]

url: the page whose source you want to get.

time: optional; the application quits after this many seconds.

If the webpage is AJAX-powered and keeps updating itself after loading, pass a time so that the browser waits that many seconds before it fetches the generated HTML source.
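The command line above can also be driven from another Python program. The following is a minimal sketch, not part of the project: `build_command` and `fetch_source` are hypothetical helper names, it assumes Python 3.7+ and that `jabba_webkit.py` sits in the current directory, and it assumes the script writes the HTML source to standard output.

```python
import subprocess
import sys


def build_command(url, wait_seconds=None, script="jabba_webkit.py"):
    """Build the argument list for: jabba_webkit.py <url> [<time>]."""
    cmd = [sys.executable, script, url]
    # The time argument is optional, so only append it when it is given.
    if wait_seconds is not None:
        cmd.append(str(wait_seconds))
    return cmd


def fetch_source(url, wait_seconds=None):
    """Run the script and return whatever it printed.

    Assumption: the script prints the generated HTML to standard output.
    """
    result = subprocess.run(
        build_command(url, wait_seconds),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Calling `fetch_source("http://example.com", 5)` would then run the script with a five-second wait, provided the assumptions above hold.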

You can also use it as a library:

>>> import jabba_webkit as jw
>>> html1 = jw.get_page(url1, time1)
>>> html2 = jw.get_page(url2)    # yes, you can call it several times
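
Since `get_page` can be called several times, fetching a batch of pages is a simple loop. The sketch below is only an illustration: `fetch_pages` is a hypothetical helper, and it takes the fetch function as a parameter so it can be tried without the module installed.

```python
def fetch_pages(get_page, targets):
    """Fetch every (url, wait) pair and return a dict mapping url to HTML.

    A wait of None calls get_page(url) without the time argument.
    """
    pages = {}
    for url, wait in targets:
        if wait is None:
            pages[url] = get_page(url)
        else:
            pages[url] = get_page(url, wait)
    return pages


# Intended use with the library (the URLs are placeholders):
#   import jabba_webkit as jw
#   pages = fetch_pages(jw.get_page, [("http://example.com/a", 5),
#                                     ("http://example.com/b", None)])
```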