Table descriptions above tables #3

Open
Evildoor opened this issue May 16, 2017 · 2 comments

Evildoor commented May 16, 2017

PDF Analyzer's table processing algorithm detects the table description and separates table lines from all other lines. Both procedures assume that the description is positioned below the table:
[image: proper_table]
However, some documents place descriptions above tables, or even mix both kinds of positioning. PDF Analyzer either fails to extract such tables or extracts them incorrectly.

Document example: CDS_CERN-ATL-COM-PHYS-2016-135, page 13.
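
To illustrate the idea, here is a minimal sketch of deciding, per description, whether its table lies above or below it. This is not PDF Analyzer's actual code: the `Line` class, the `tabular` flag, the `caption_sides` function and the `Table N` pattern are all assumptions made for the example. The heuristic simply picks the side with the nearest table-like line, which may well be too naive for real documents.

```python
import re
from dataclasses import dataclass

# Assumed description pattern; real documents may use other wording.
CAPTION_RE = re.compile(r"^\s*Table\s+\d+")


@dataclass
class Line:
    top: float     # distance from the top of the page
    text: str
    tabular: bool  # hypothetical flag: the line looks like a table row


def caption_sides(lines):
    """Pair each description with the side on which the nearest table row lies."""
    row_tops = [l.top for l in lines if l.tabular]
    result = []
    for line in sorted(lines, key=lambda l: l.top):
        if line.tabular or not CAPTION_RE.match(line.text):
            continue
        # Distance to the closest table row above and below, if any.
        gap_up = min((line.top - t for t in row_tops if t < line.top), default=None)
        gap_down = min((t - line.top for t in row_tops if t > line.top), default=None)
        if gap_up is None and gap_down is None:
            side = "unknown"
        elif gap_down is None or (gap_up is not None and gap_up <= gap_down):
            side = "caption_below_table"
        else:
            side = "caption_above_table"
        result.append((line.text, side))
    return result


page = [
    Line(100, "1.0  2.0  3.0", True),
    Line(115, "4.0  5.0  6.0", True),
    Line(135, "Table 1: first", False),
    Line(200, "Some body text", False),
    Line(300, "Table 2: second", False),
    Line(320, "7.0  8.0  9.0", True),
]
print(caption_sides(page))
```

On this made-up page the first description is reported as sitting below its table and the second as sitting above, which is the mixed case described above.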

Evildoor commented Jun 1, 2018

Note: it seems that the term "caption" is used more often than "description" or "header".

Evildoor commented Jun 5, 2018

Some work has been done on this (see 5e149e7). As usual, there is much to improve. In particular, measuring the position of main text strings may cause problems with rotated pages; this should be looked into.
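
One possible direction for the rotated-pages problem, sketched under assumptions: if line coordinates come in the unrotated page space and the page rotation is known (PDF pages store it as a clockwise multiple of 90 degrees), positions could be mapped to the displayed orientation before any above/below comparison. The function name `to_upright` and its parameters are hypothetical, and whether PDF Analyzer's input coordinates are in unrotated space has not been checked.

```python
def to_upright(x, y, width, height, rotation):
    """Map a point from unrotated page space to the page as displayed.

    The origin is the bottom-left corner; width and height describe the
    unrotated page; rotation is clockwise, in degrees.
    """
    rotation %= 360
    if rotation == 0:
        return (x, y)
    if rotation == 90:
        return (y, width - x)
    if rotation == 180:
        return (width - x, height - y)
    if rotation == 270:
        return (height - y, x)
    raise ValueError("rotation must be a multiple of 90")
```

After such a mapping, "above" and "below" would again mean what the reader sees, so position-based checks would not need a separate branch per rotation.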
