[ocr] Research ways of cutting printed statements into smaller subsections #15
Since OCR gives better results on printed statements, we want to cut the statements into smaller pieces of text that we can feed to the Google OCR API.
We first want to extract the tables. Here we need to find out how to connect a table that starts on one page and finishes on another. Then we take each table and cut out its cells, keeping a reference to the column each cell belongs to.
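As a starting point for discussion, here is a minimal sketch of the two steps described above: cutting a table grid into cells that remember their column, and joining tables that continue across pages. It assumes the table grid (column and row edges, in pixels) has already been detected by some other step. All names here (`Cell`, `Table`, `cut_cells`, `same_layout`, `merge_across_pages`) are hypothetical, and treating "same column layout on consecutive pages" as a continuation is only one possible heuristic, not something we have validated.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (left, top, right, bottom) in pixels; this convention is an assumption.
Box = Tuple[int, int, int, int]


@dataclass
class Cell:
    column: int  # index of the column this cell belongs to
    page: int    # page the cell was cut from
    box: Box     # region to crop and send to OCR


@dataclass
class Table:
    column_edges: List[int]  # x-coordinates of column boundaries, left to right
    cells: List[Cell] = field(default_factory=list)


def cut_cells(page: int, column_edges: List[int], row_edges: List[int]) -> List[Cell]:
    """Cut one page's table grid into cells, row by row, tagging each with its column."""
    cells = []
    for top, bottom in zip(row_edges, row_edges[1:]):
        for col, (left, right) in enumerate(zip(column_edges, column_edges[1:])):
            cells.append(Cell(column=col, page=page, box=(left, top, right, bottom)))
    return cells


def same_layout(a: List[int], b: List[int], tolerance: int = 5) -> bool:
    """Hypothetical heuristic: two tables match if their column edges line up."""
    return len(a) == len(b) and all(abs(x - y) <= tolerance for x, y in zip(a, b))


def merge_across_pages(page_tables: List[Table]) -> List[Table]:
    """Join each table to the previous one when the column layout matches.

    Expects tables in reading order (page by page, top to bottom).
    """
    merged: List[Table] = []
    for table in page_tables:
        if merged and same_layout(merged[-1].column_edges, table.column_edges):
            merged[-1].cells.extend(table.cells)
        else:
            merged.append(Table(list(table.column_edges), list(table.cells)))
    return merged
```

Each `Cell.box` could then be cropped from the page image and sent to OCR on its own, with `Cell.column` telling us where the recognized text belongs.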
The final version should look like a tree:
Relevant links for our issue: