Our collection of scrapers

Scraper Collections

Preprocessing

Use the pdftotext tool from the xpdf package to convert each PDF to text. The -layout option keeps the original physical layout of the page in the text output.

pdftotext -layout pdf_file.pdf
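
If you want to run this step from Python rather than by hand, a minimal sketch could look like the following. The helper names (build_pdftotext_command, expected_text_path, convert_pdf) are illustrative and are not part of this repository; the sketch assumes pdftotext is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path):
    """Return the argument list for `pdftotext -layout <pdf>`."""
    return ["pdftotext", "-layout", str(pdf_path)]


def expected_text_path(pdf_path):
    """Return where the text output is expected to land.

    When no output file is given, pdftotext writes next to the input,
    replacing the .pdf extension with .txt.
    """
    return Path(pdf_path).with_suffix(".txt")


def convert_pdf(pdf_path):
    """Run pdftotext on one PDF and return the path of the text file.

    Hypothetical helper: raises CalledProcessError if pdftotext fails.
    """
    subprocess.run(build_pdftotext_command(pdf_path), check=True)
    return expected_text_path(pdf_path)
```

Calling convert_pdf("pdf_file.pdf") should leave pdf_file.txt beside the input, which can then be passed to one of the parsers below.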

Scraper

  • hansard_parser.py

    Splits questions and answers into separate files. This is a work in progress: it handles only the question-and-answer blocks. The Hansard contains more than that, but the question-and-answer blocks are the easiest part to parse reliably.

  • order_paper_parser.py

    Parses the parliament's order paper. I don't expect this one to have many users.
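
To illustrate the kind of question-and-answer split that hansard_parser.py aims for, here is a rough sketch. It is not the code in this repository. It assumes, purely for illustration, that each block starts with a line beginning with QUESTION or ANSWER; the real Hansard layout is not described here and the actual markers are likely to differ. The function names are also hypothetical.

```python
import re
from pathlib import Path

# Hypothetical markers, chosen only for this sketch.
QUESTION_RE = re.compile(r"^\s*QUESTION\b", re.IGNORECASE)
ANSWER_RE = re.compile(r"^\s*ANSWER\b", re.IGNORECASE)


def split_blocks(text):
    """Return a list of (kind, block_text), where kind is 'question' or 'answer'.

    Lines before the first marker are ignored.
    """
    blocks = []
    kind, lines = None, []
    for line in text.splitlines():
        if QUESTION_RE.match(line):
            new_kind = "question"
        elif ANSWER_RE.match(line):
            new_kind = "answer"
        else:
            new_kind = None

        if new_kind is not None:
            # A marker line closes the current block and opens a new one.
            if kind is not None:
                blocks.append((kind, "\n".join(lines)))
            kind, lines = new_kind, [line]
        elif kind is not None:
            lines.append(line)

    if kind is not None:
        blocks.append((kind, "\n".join(lines)))
    return blocks


def write_blocks(blocks, out_dir):
    """Write each block to its own numbered file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (kind, body) in enumerate(blocks, start=1):
        path = out_dir / f"{index:03d}_{kind}.txt"
        path.write_text(body + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Numbering the output files keeps each question next to its answer when the directory is listed in order.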