Our Collections of Scrapers!!!!


Scraper Collections


Use the pdftotext tool from the xpdf package to convert the PDF files to text.

pdftotext -layout pdf_file.pdf
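The same conversion can be driven from Python. This is only a minimal sketch, assuming pdftotext is installed and on the PATH; the helper names build_pdftotext_command and pdf_to_text are hypothetical and are not part of the scripts in this repository.

```python
import subprocess


def build_pdftotext_command(pdf_path, txt_path=None):
    """Build the argument list for `pdftotext -layout`.

    Hypothetical helper, not taken from the repository's scripts.
    When txt_path is omitted, pdftotext writes next to the PDF,
    replacing the .pdf extension with .txt.
    """
    cmd = ["pdftotext", "-layout", pdf_path]
    if txt_path is not None:
        cmd.append(txt_path)
    return cmd


def pdf_to_text(pdf_path, txt_path=None):
    """Run pdftotext; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
```

Calling pdf_to_text("pdf_file.pdf") should be equivalent to the command line above.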


  • hansard_parser.py

    This splits the questions and answers into separate files. It is a work in progress: it only handles the question-and-answer blocks. The Hansard contains more than that, but the question-and-answer blocks are the sanest part to parse.

  • order_paper_parser.py

    This parses the parliament's order paper, though I don't think it will have many users.
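To illustrate the kind of splitting that hansard_parser.py is described as doing, here is a rough sketch. It does not reproduce the actual script: the QUESTION_START pattern (a line starting with a number and a full stop), the function names and the output file names are all assumptions, and the real Hansard layout may need a different marker.

```python
import re
from pathlib import Path

# Assumed marker: a line that starts with a question number such as "12."
# Adjust this pattern to whatever the converted Hansard text really uses.
QUESTION_START = re.compile(r"^[ \t]*(\d+)\.\s", re.MULTILINE)


def split_questions(text):
    """Return (number, block) pairs, one per numbered question block."""
    matches = list(QUESTION_START.finditer(text))
    blocks = []
    for i, match in enumerate(matches):
        # A block runs until the next marker, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        blocks.append((int(match.group(1)), text[match.start():end].strip()))
    return blocks


def write_blocks(blocks, out_dir):
    """Write each block to its own file (hypothetical naming scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, block in blocks:
        path = out / f"question_{number}.txt"
        path.write_text(block + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Text before the first marker, such as page headers, is simply dropped, which matches the idea of handling only the question-and-answer part.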