
Ability to recognize headings #247

Closed
davidkong0987 opened this issue Jan 8, 2019 · 3 comments

Comments

@davidkong0987
Contributor

I was wondering if there is a way for Camelot to recognize headings. One way would be to detect when the font size (or even font style!) differs and tag that text, which the software already seems able to do, at least for super/subscripts. Another would be to insert empty rows in the table output so that the spacing of the original document is preserved.
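To make the font-size idea concrete, here is a rough sketch of what such tagging could look like. It is not Camelot's API: the `tag_headings` function, the `(text, font_size)` input shape, and the `ratio` threshold are all hypothetical, and how the font sizes are read from the PDF is left out.

```python
from statistics import median


def tag_headings(lines, ratio=1.2):
    """Mark a line as a heading when its font size is clearly larger than the body size.

    `lines` is a list of (text, font_size) tuples. This input shape is an
    assumption for illustration, not something Camelot returns.
    """
    # Treat the median font size on the page as the body text size.
    body_size = median(size for _, size in lines)
    return [(text, size > body_size * ratio) for text, size in lines]


# Made-up sample data: one larger line followed by ordinary table text.
lines = [
    ("Quarterly results", 14.0),
    ("Region", 9.0),
    ("North", 9.0),
    ("South", 9.0),
]
tagged = tag_headings(lines)
for text, is_heading in tagged:
    print(("[H] " if is_heading else "    ") + text)
```

A fixed ratio like this is exactly the kind of heuristic that may not carry over between documents, so the threshold would probably need tuning per PDF.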

@vinayak-mehta
Contributor

vinayak-mehta commented Jan 9, 2019

Since table headers mostly stay the same across tables of the same type in a PDF, you can add a post-extraction step that removes them by explicitly specifying them in your code.
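A minimal sketch of such a post-extraction step, assuming the extracted table has already been turned into a list of rows (with Camelot, something like `tables[0].df.values.tolist()`). The `drop_header_rows` helper and the sample data are hypothetical, not part of Camelot.

```python
def drop_header_rows(rows, known_headers):
    """Return `rows` without any row that matches one of `known_headers`.

    Matching ignores case and surrounding whitespace, since extracted
    text often carries stray spaces.
    """
    normalized = {
        tuple(str(cell).strip().lower() for cell in header)
        for header in known_headers
    }
    return [
        row
        for row in rows
        if tuple(str(cell).strip().lower() for cell in row) not in normalized
    ]


# Made-up rows standing in for an extracted table whose header repeats.
rows = [
    ["Name", "Qty"],
    ["Widget", "3"],
    ["name ", "qty"],
    ["Gadget", "5"],
]
cleaned = drop_header_rows(rows, known_headers=[["Name", "Qty"]])
print(cleaned)
```

This only works when the headers are known in advance, which is the trade-off of specifying them explicitly rather than detecting them.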

This would be a nice feature to have, but I'm afraid that implementations based on font size/style or on adding spaces would be highly heuristic and wouldn't generalize over a large set of different table types. Maybe there are better ways to do this. I remember reading about this in a paper; I'll try to find it. If you come across any literature or other ideas, do comment!

@davidkong0987
Contributor Author

davidkong0987 commented Jan 9, 2019 via email

@vinayak-mehta
Contributor

Closed in favor of camelot-dev/camelot#7.
