Minimal example of historical data parsing using layoutparser
- Set up the Google Cloud Vision API
  - Create a Google Cloud service account by following this guide
  - Download a key (its filename should end in `.json`) to a safe location
- Or set up Tesseract
  - Follow the instructions listed here
- Create a clean conda/venv environment
- Install layoutparser using the instructions on its GitHub page; its website seems out of date
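Before running the notebook, it can help to confirm that at least one OCR backend is usable. The sketch below is a convenience check under stated assumptions and is not part of this repo. It checks three things:

- a `.json` service-account key, either passed in or read from `GOOGLE_APPLICATION_CREDENTIALS` (the environment variable Google's client libraries consult for credentials);
- a `tesseract` binary on `PATH`;
- an importable `layoutparser` package.

The function name `check_setup` is hypothetical.

```python
import importlib.util
import os
import shutil
from pathlib import Path


def check_setup(key_path=None):
    """Report which pieces of the setup look usable (hypothetical helper).

    key_path: path to the downloaded Google service-account key; if omitted,
    the GOOGLE_APPLICATION_CREDENTIALS environment variable is used instead.
    """
    key = key_path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    gcv_key = (
        key is not None
        and Path(key).suffix == ".json"
        and Path(key).is_file()
    )
    return {
        # Google Cloud Vision: a .json key file exists on disk.
        "gcv_key": gcv_key,
        # Tesseract: the `tesseract` executable can be found on PATH.
        "tesseract": shutil.which("tesseract") is not None,
        # layoutparser: the package is importable in the active environment.
        "layoutparser": importlib.util.find_spec("layoutparser") is not None,
    }


if __name__ == "__main__":
    status = check_setup()
    for name, ok in status.items():
        print(f"{name:12s} {'ok' if ok else 'missing'}")
    if not (status["gcv_key"] or status["tesseract"]):
        print("Set up at least one OCR backend: Google Cloud Vision or Tesseract.")
```

Running it inside the fresh conda/venv environment also confirms that layoutparser was installed into that environment rather than into a global one.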
Random historical table picked from 1951 UP census
`DocParse.ipynb` runs layoutparser and extracts lists to produce a table. Here, we're interested in the second numerical column, which is population.
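The notebook's exact logic isn't reproduced here. The following is a sketch of one common way to turn OCR output into a table, which may differ from what `DocParse.ipynb` actually does. The steps are:

1. Treat each OCR'd word as a bounding box.
2. Cluster words into rows by their vertical centers.
3. Assign each word to a hand-picked x-interval column by its horizontal center.
4. Read the second column whose cells are mostly numbers.

`Word`, `words_to_rows`, `nth_numeric_column`, the column intervals, and the `row_tol` pixel tolerance are all hypothetical. Boxes from whichever OCR backend you use would need converting into `Word` objects first.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd token and its bounding box in pixels (hypothetical type)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def cx(self) -> float:
        return (self.x0 + self.x1) / 2

    @property
    def cy(self) -> float:
        return (self.y0 + self.y1) / 2


def words_to_rows(words, columns, row_tol=10.0):
    """Arrange words into a table of strings.

    columns: hand-picked (x_start, x_end) pixel intervals, one per column.
    row_tol: max vertical distance between word centers on the same row.
    """
    # 1. Cluster words into lines by their vertical centers.
    lines = []
    for w in sorted(words, key=lambda w: w.cy):
        if lines and abs(w.cy - lines[-1][0].cy) <= row_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    # 2. Within each line, read left to right and put each word into the
    #    column whose interval contains its horizontal center.
    table = []
    for line in lines:
        cells = [""] * len(columns)
        for w in sorted(line, key=lambda w: w.cx):
            for i, (start, end) in enumerate(columns):
                if start <= w.cx < end:
                    cells[i] = (cells[i] + " " + w.text).strip()
                    break
        table.append(cells)
    return table


def is_number(text):
    """True for digit strings such as '1,000' (commas as thousands separators)."""
    return text.replace(",", "").isdecimal()


def nth_numeric_column(table, n=2):
    """Values from the n-th (1-based) column whose filled cells are mostly numeric.

    Cells that don't parse (e.g. OCR misreads) come back as None.
    """
    numeric_cols = []
    for i in range(len(table[0]) if table else 0):
        filled = [row[i] for row in table if row[i]]
        if filled and sum(map(is_number, filled)) > len(filled) / 2:
            numeric_cols.append(i)
    if len(numeric_cols) < n:
        return []
    col = numeric_cols[n - 1]
    return [int(row[col].replace(",", "")) if is_number(row[col]) else None
            for row in table]


if __name__ == "__main__":
    # Made-up boxes for illustration only; not data from the census scan.
    words = [
        Word("District", 10, 100, 70, 120), Word("A", 75, 100, 85, 120),
        Word("100", 210, 100, 260, 120), Word("1,000", 310, 102, 370, 121),
        Word("District", 10, 130, 70, 150), Word("B", 75, 131, 85, 150),
        Word("200", 210, 130, 260, 150), Word("2,000", 310, 130, 370, 150),
    ]
    table = words_to_rows(words, columns=[(0, 200), (200, 300), (300, 400)])
    print(table)                      # [['District A', '100', '1,000'], ['District B', '200', '2,000']]
    print(nth_numeric_column(table))  # [1000, 2000]
```

Picking the numeric column by a majority vote, rather than requiring every cell to parse, keeps one misread cell from disqualifying the whole population column.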
The output contains some errors, but it is still much preferable to manual entry.
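Because some OCR errors remain, a cheap post-processing pass can repair the common ones and send the rest to a human. This sketch assumes the usual letter/digit confusions (O→0, l/I→1, S→5, B→8). The mapping, `clean_number`, and `review_column` are hypothetical and should be tuned against the misreads actually seen in this table.

```python
# Hypothetical mapping of letters that OCR often confuses with digits.
# Tune it to the misreads actually seen in the scanned table.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "|": "1",
                             "S": "5", "B": "8"})


def clean_number(cell):
    """Parse a numeric cell, repairing common OCR letter/digit swaps.

    Returns (value, repaired): value is None if the cell still isn't a number,
    and repaired is True if any character had to be substituted.
    """
    raw = cell.replace(",", "").replace(" ", "")
    fixed = raw.translate(DIGIT_FIXES)
    if fixed.isdecimal():
        return int(fixed), fixed != raw
    return None, False


def review_column(cells):
    """Clean a column of cells and list the row indices a human should check."""
    values, to_check = [], []
    for i, cell in enumerate(cells):
        value, repaired = clean_number(cell)
        values.append(value)
        # Repaired cells are guesses and unparseable ones are gaps: flag both.
        if value is None or repaired:
            to_check.append(i)
    return values, to_check


if __name__ == "__main__":
    values, to_check = review_column(["1,000", "2,0O0", "n/a"])
    print(values)    # [1000, 2000, None]
    print(to_check)  # [1, 2]
```

Only the flagged rows then need checking against the scan, which keeps manual work limited to the cells the OCR got wrong.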