
apoorvalal/historical_data_extraction

Minimal example of historical data parsing using layoutparser

Setup

  • Set up the Google Cloud Vision API

    • Create a Google Cloud service account by following this guide
    • Download a key for it (the file should end in .json) and store it in a safe location
  • Alternatively, set up Tesseract

    • Follow the installation instructions listed here
  • Create a clean conda or venv environment

  • Install layoutparser using the instructions in its GitHub repository; the project website appears to be out of date
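Since either OCR backend will work, a notebook can check which one is actually available before it runs. The sketch below assumes the service-account key path is exposed through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which Google's client libraries read by default. The function name `choose_ocr_backend` is hypothetical and is not part of this repository or of layoutparser.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def choose_ocr_backend(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.isfile,
) -> str:
    """Return "gcv" if a Google service-account key is configured,
    "tesseract" if the tesseract binary is on PATH, else raise."""
    # Assumption: the downloaded .json key is referenced by this variable.
    key = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if key.endswith(".json") and exists(key):
        return "gcv"
    # Fall back to a local Tesseract install if its binary can be found.
    if which("tesseract") is not None:
        return "tesseract"
    raise RuntimeError("No OCR backend found: set up Google Cloud Vision or Tesseract.")
```

In layoutparser's OCR documentation, the two results correspond roughly to `lp.GCVAgent.with_credential(key_path, languages=["en"])` and `lp.TesseractAgent(languages="eng")`. The `which` and `exists` parameters are injectable only so that the check can be exercised without touching the real filesystem.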

Input

A randomly chosen historical table from the 1951 UP census.

Processing

DocParse.ipynb runs layoutparser on the scanned page, extracts lists of recognized values, and assembles them into a table. Here, we are interested in the second numerical column, which holds population.
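To show the general idea of turning OCR output into table rows, here is a minimal sketch. It is not the notebook's actual code. It assumes that each recognized word arrives as a hypothetical `(text, x_left, y_top)` tuple in pixels. Words whose top edges lie within a tolerance are grouped into one row, and the second numeric token in each row is kept. The names `group_into_rows` and `nth_numeric_column` are illustrative only.

```python
import re
from typing import List, Tuple

# Hypothetical token format: (text, x_left, y_top) in pixels.
Token = Tuple[str, float, float]

# Digits, optionally with thousands separators such as "1,234".
NUMBER = re.compile(r"^\d[\d,]*$")


def group_into_rows(tokens: List[Token], y_tol: float = 10.0) -> List[List[Token]]:
    """Cluster tokens into rows by vertical position, then sort each row left to right."""
    rows: List[List[Token]] = []
    for tok in sorted(tokens, key=lambda t: t[2]):
        # Join the current row if this token's top edge is close to the row's
        # first (highest) token; otherwise start a new row.
        if rows and abs(tok[2] - rows[-1][0][2]) <= y_tol:
            rows[-1].append(tok)
        else:
            rows.append([tok])
    return [sorted(row, key=lambda t: t[1]) for row in rows]


def nth_numeric_column(tokens: List[Token], n: int = 2, y_tol: float = 10.0) -> List[int]:
    """Return the n-th numeric value (1-based) from every row that has at least n."""
    values: List[int] = []
    for row in group_into_rows(tokens, y_tol):
        nums = [t[0] for t in row if NUMBER.match(t[0])]
        if len(nums) >= n:
            values.append(int(nums[n - 1].replace(",", "")))
    return values
```

With toy tokens such as `[("A", 10, 100), ("1,234", 200, 102), ("56,789", 300, 99)]`, the function `nth_numeric_column(tokens)` returns `[56789]`. A fixed pixel tolerance is the simplest choice, but it can mis-group rows on skewed scans, so `y_tol` may need tuning for each page.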

Output

The output contains some errors, but it is still much preferable to manual data entry.
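One way to reduce the remaining manual correction is to normalize common letter/digit confusions in numeric cells and flag any cell that still fails to parse. This is a minimal sketch. The substitution table is an assumption, because the confusions that actually occur depend on the scan and the OCR engine. The helper `clean_number` is hypothetical and is not part of this repository.

```python
import re
from typing import Optional

# Assumed look-alike substitutions; inspect real OCR output before relying on these.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})


def clean_number(raw: str) -> Optional[int]:
    """Map look-alike letters to digits, drop separators, and parse.

    Returns None when the cell is still not a plain integer, so that it can be
    flagged for manual review instead of being entered silently.
    """
    text = raw.strip().translate(LOOKALIKES)
    # Assumption: counts are integers, so commas, periods and spaces are separators.
    text = re.sub(r"[,.\s]", "", text)
    return int(text) if re.fullmatch(r"[0-9]+", text) else None
```

For example, `clean_number("1,2O4")` gives `1204`, while `clean_number("12a4")` gives `None`. Cells that return `None` are the ones left for manual checking.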
