Web app that can perform OCR on correspondence and sort PQs

Correspondence OCR and PQ Classification web app

A Flask web app that can extract names, addresses, etc. from correspondence and sort parliamentary questions (PQs) into responding units.

Download and Installation

The repo contains almost everything you need:


The 'model' directory contains all the required trained models, plus a number of other required data structures that we pickled.


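The stanford-ner directory contains the Stanford Named Entity Recognizer, which in my implementation runs as a Java server. From the stanford-ner directory, run:

"java -Djava.ext.dirs=./lib -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -port 9199 -loadClassifier english.all.3class.distsim.crf.ser.gz"

This starts an instance of the named entity tagger listening on port 9199, which the application sends requests to. You need Java installed. On Java 9 or later the -Djava.ext.dirs option is no longer supported, so put the jars on the classpath instead (e.g. -cp "stanford-ner.jar:lib/*" on Linux or macOS).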

Alternatively, you can load the tagger directly in your code, but it's slower!
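The application talks to this server over a plain TCP socket. The following is a minimal Python sketch of such a client, not the app's actual code. It assumes the server started above is listening on localhost:9199, handles one line of text per connection, and replies in Stanford's slash-tag format (word/TAG, with O for non-entities). The function names tag_text and group_entities are hypothetical.

```python
import socket
from itertools import groupby


def tag_text(text, host="localhost", port=9199, timeout=10.0):
    """Send one line of text to a running NER server and return its raw reply."""
    # The server reads a single line, so flatten newlines and terminate with one.
    payload = " ".join(text.split()) + "\n"
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("utf-8"))
        while True:
            data = sock.recv(4096)
            if not data:  # assumed: the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").strip()


def group_entities(tagged):
    """Turn 'word/TAG' tokens into (entity, tag) pairs, merging runs of the same tag."""
    # rpartition splits on the last '/', so a '/' inside the word survives, e.g. "24/7/O".
    pairs = [token.rpartition("/")[::2] for token in tagged.split()]
    entities = []
    for tag, run in groupby(pairs, key=lambda pair: pair[1]):
        if tag != "O":  # "O" marks tokens outside any entity
            entities.append((" ".join(word for word, _ in run), tag))
    return entities
```

Typical use is group_entities(tag_text(some_letter_text)). One caveat is that consecutive tokens with the same tag are merged, so two different people named back to back would come out as a single PERSON entity.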


All the JavaScript and CSS for the front end (required by Flask)


The HTML for the front end

Loads our models and does all the route handling

Sorry, stupid name. Contains our NeuralNet class, which wraps the neural net and provides prediction methods.

Another dumb name. Contains the SGDModel class, which wraps the SGD classifier and provides prediction methods.
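Both classes follow the same pattern: load a trained model and expose prediction methods. The sketch below illustrates that pattern with a hypothetical PickledClassifier. It is not a copy of SGDModel or NeuralNet, and it assumes the pickled object follows scikit-learn's classifier interface (a predict_proba method and a classes_ attribute).

```python
import pickle
from pathlib import Path


class PickledClassifier:
    """Hypothetical wrapper in the spirit of SGDModel / NeuralNet.

    Assumes the wrapped model exposes scikit-learn's predict_proba(X) and classes_.
    """

    def __init__(self, model):
        self.model = model

    @classmethod
    def from_pickle(cls, path):
        # Only unpickle files you trust: unpickling can execute arbitrary code.
        with Path(path).open("rb") as f:
            return cls(pickle.load(f))

    def top_units(self, features, k=3):
        """Return the k most probable responding units for one sample, best first.

        `features` is whatever the model expects for a single sample, e.g. a raw
        string if the pickle is a Pipeline that starts with a text vectorizer.
        """
        probs = self.model.predict_proba([features])[0]
        ranked = sorted(zip(self.model.classes_, probs), key=lambda p: p[1], reverse=True)
        return ranked[:k]
```

Note that scikit-learn's SGDClassifier only provides predict_proba when trained with a probabilistic loss such as log loss or modified Huber; with the default hinge loss you would rank by decision_function instead.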

Contains a get_pqs function, which gets the most recent PQs from Parliament's written questions API.
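The following is a hedged sketch of what a get_pqs-style helper might look like, using only the standard library. The endpoint, the take query parameter and the field names (uin, questionText, answeringBodyName) are assumptions based on the current UK Parliament Written Questions API. The app may have used a different endpoint, so check the API's own documentation for sorting and date-filter parameters before relying on it. fetch_pqs and parse_pqs are hypothetical names; parsing is kept separate so it can be tested without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the current UK Parliament Written Questions API.
API_URL = "https://writtenquestions-api.parliament.uk/api/writtenquestions/questions"


def fetch_pqs(take=20):
    """Fetch one page of written questions and return them in a simplified form."""
    url = f"{API_URL}?{urlencode({'take': take})}"
    with urlopen(url, timeout=30) as resp:
        return parse_pqs(json.load(resp))


def parse_pqs(payload):
    """Pull the question reference, text and answering body out of a response dict."""
    pqs = []
    # Assumed response shape: {"results": [{"value": {...question fields...}}, ...]}
    for item in payload.get("results", []):
        value = item.get("value", {})
        pqs.append({
            "uin": value.get("uin"),
            "text": value.get("questionText", ""),
            "answering_body": value.get("answeringBodyName"),
        })
    return pqs
```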

Bad name. This contains our Chapter2_Case class, which holds all the data about the scanned correspondence image and the methods to extract that data. Think of it as a case file: you pass it data, and it decides what to extract to build up the case.
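The case-file idea can be sketched as an object that is fed blocks of text and decides for itself which fields each block fills. CaseFile and its fields are hypothetical, and the regular expressions are deliberately simplified (the postcode pattern is not a full UK postcode validator). The real Chapter2_Case also extracts things like names, and its internals may differ.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified UK postcode pattern: good enough for a demo, not a full validator.
POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")
# Loose email pattern: local part, "@", then a dotted domain.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class CaseFile:
    """Hypothetical stand-in for Chapter2_Case: fills fields from blocks of OCR text."""
    postcode: Optional[str] = None
    email: Optional[str] = None
    unmatched: List[str] = field(default_factory=list)

    def add_block(self, text: str) -> None:
        """Let the case decide what, if anything, to take from one block of text."""
        used = False
        # Only the first postcode and first email found are kept.
        if self.postcode is None:
            match = POSTCODE_RE.search(text.upper())
            if match:
                self.postcode = match.group(0)
                used = True
        if self.email is None:
            match = EMAIL_RE.search(text)
            if match:
                self.email = match.group(0)
                used = True
        # Blocks that filled nothing are kept for later inspection.
        if not used:
            self.unmatched.append(text)
```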

Checks the extracted names against Parliament's Members API.
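The following is a minimal sketch of the matching step, assuming you have already downloaded a list of member names from the Members API. It uses fuzzy matching from Python's difflib so that OCR errors such as transposed letters can still find a match. The app itself may match differently; match_member and the 0.8 cutoff are hypothetical choices.

```python
import difflib


def match_member(name, members, cutoff=0.8):
    """Return the closest known member name, or None if nothing is close enough."""
    # Compare case-insensitively, but return the canonical spelling from `members`.
    lookup = {m.lower(): m for m in members}
    hits = difflib.get_close_matches(name.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None
```

Raising the cutoff reduces false matches at the cost of missing badly garbled names. Honorifics such as "Rt Hon" or "MP" are best stripped before matching.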

Conducts the optical character recognition on the images of correspondence, splits the result into chunks of text, sifts them by position on the page, and passes them to a Chapter2_Case object to identify the relevant data. It's a bit messy; there are more details on how to use it below.
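Sorting by position matters because OCR output is not guaranteed to come back in reading order. The sketch below is a hypothetical illustration of one way to do it: group blocks whose top edges fall within a tolerance of each other into the same row, then read each row left to right. TextBlock is a simplified stand-in. With the Vision API you would take x and y from a block's bounding_box.vertices[0]. The 10-pixel tolerance is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Simplified stand-in for an OCR block: its text plus top-left corner."""
    text: str
    x: int
    y: int


def reading_order(blocks, row_tolerance=10):
    """Order blocks top-to-bottom, then left-to-right within roughly the same row."""
    rows = []
    for block in sorted(blocks, key=lambda b: b.y):
        # Join the current row if this block's top is close to the row's first block.
        if rows and block.y - rows[-1][0].y <= row_tolerance:
            rows[-1].append(block)
        else:
            rows.append([block])
    return [b for row in rows for b in sorted(row, key=lambda b: b.x)]
```

Comparing against the first block of each row keeps a slowly drifting baseline from chaining an entire skewed page into one row.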

I think this prepares input strings for prediction by the SGD classifier, but it's not very explicit. Maybe ask Will B. All I know is that it needs to be in the directory.

Optical Character Recognition

We use Google Cloud Storage to host our correspondence images and the Google Cloud Vision API to read them, so you'll need to set up your own Google Cloud project to use these services.

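The easiest way to access those services is to save the service account credentials that Google provides as a .json file in the working directory and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at it (for example, export GOOGLE_APPLICATION_CREDENTIALS=<path to the .json file>).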
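Google's client libraries read the credentials file from the GOOGLE_APPLICATION_CREDENTIALS environment variable. The sketch below sets that variable from Python and runs document text detection on an image already uploaded to Cloud Storage. It assumes google-cloud-vision 2.x or later is installed. The helper names configure_credentials and ocr_gcs_image are hypothetical, and the library is imported inside the function so the rest of the module loads without it.

```python
import os
from pathlib import Path


def configure_credentials(json_path):
    """Point Google's client libraries at a service account key file."""
    path = Path(json_path).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"credentials file not found: {path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)


def ocr_gcs_image(gcs_uri):
    """Run Vision API document text detection on an image stored in Cloud Storage."""
    from google.cloud import vision  # imported lazily; needs google-cloud-vision

    client = vision.ImageAnnotatorClient()  # picks up the credentials set above
    image = vision.Image()
    image.source.image_uri = gcs_uri  # e.g. "gs://<bucket>/<object>"
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text
```

document_text_detection is tuned for dense text such as letters. Its full_text_annotation also carries the page, block and bounding-box structure needed for sorting text by position.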


To run the app:


Also set: export FLASK_DEBUG=1

so the browser doesn't cache the site.

Then start the app by running:

flask run
