Ability to recognize headings #247
Comments
Since table headers mostly stay the same across tables of the same type in a PDF, you can add a post-extraction step that removes them by explicitly specifying them in your code. This would be a nice feature to have, but I'm afraid that implementations based on font size/style or on adding spaces would be highly heuristic and wouldn't generalize over a large set of different table types. Maybe there are better ways to do this; I remember reading about it in a paper, and I'll try to find it. If you come across any literature or other ideas, do comment!
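For illustration, here is a minimal sketch of such a post-extraction step. It assumes the extracted table is already available as a plain list of rows (for a Camelot table this could probably be obtained from its pandas DataFrame, e.g. `table.df.values.tolist()`); the function name `drop_known_headers` and the sample data are hypothetical, not part of Camelot.

```python
def drop_known_headers(rows, known_headers):
    """Return rows with every row that matches a known header removed."""

    def normalise(row):
        # Collapse runs of whitespace so "Name " and "Name" compare equal.
        return tuple(" ".join(str(cell).split()) for cell in row)

    headers = {normalise(header) for header in known_headers}
    return [row for row in rows if normalise(row) not in headers]


# Hypothetical extraction result where the header repeats mid-table.
rows = [
    ["Name", "Qty"],
    ["Bolt", "10"],
    ["Name ", "Qty"],
    ["Nut", "25"],
]
cleaned = drop_known_headers(rows, known_headers=[["Name", "Qty"]])
print(cleaned)
```

This only works when the header text is known in advance, which is exactly the limitation described above.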
Got it
I understand that there would need to be a post-extraction step, but perhaps it could be aided by something that shows when the font size and/or line spacing differs from, say, the mode, similar to what is done with subscripts/superscripts.
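A rough sketch of what that could look like, assuming each extracted text line comes with a font size (how those sizes are obtained from the PDF is left out here). The function name `flag_unusual_font_sizes`, the `tolerance` parameter, and the sample data are all made up for illustration.

```python
from collections import Counter


def flag_unusual_font_sizes(lines, tolerance=0.1):
    """Mark each (text, size) pair whose size differs from the most common size."""
    # Round so that tiny floating-point differences do not split the mode.
    sizes = [round(size, 1) for _, size in lines]
    mode_size = Counter(sizes).most_common(1)[0][0]
    return [
        (text, size, abs(size - mode_size) > tolerance)
        for text, size in lines
    ]


# Hypothetical lines: one larger heading followed by ordinary table rows.
lines = [
    ("Quarterly results", 14.0),
    ("Region Sales", 9.0),
    ("North 120", 9.0),
    ("South 95", 9.0),
]
flagged = flag_unusual_font_sizes(lines)
for text, size, is_unusual in flagged:
    print(text, size, "<- possible heading" if is_unusual else "")
```

The flag only says "this differs from the mode"; deciding whether that really is a heading would still be up to the post-extraction step.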
(In reply to Vinayak Mehta's comment of Wed, Jan 09, 2019, 6:13 AM.)
Closed in favor of camelot-dev/camelot#7.
I was wondering if there is a way for Camelot to recognize when there is a heading. A couple of ways this could be done are recognizing when a font size (or even font style!) is different and tagging it (which the software seems to be able to do, at least for super/subscripts), or putting empty rows in the table output so that the spacing of the original document is preserved.
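To make the empty-rows idea concrete, here is a small sketch. It assumes each row is given with the y coordinate of its top edge in PDF coordinates (y decreases going down the page) and that rows are already sorted top to bottom. `pad_rows_by_gap` and `gap_factor` are hypothetical names, not anything Camelot provides.

```python
from statistics import median


def pad_rows_by_gap(rows, gap_factor=1.5):
    """Insert a blank row wherever the vertical gap is well above the typical gap.

    rows: list of (y_top, cells), sorted top to bottom.
    """
    if len(rows) < 2:
        return [cells for _, cells in rows]

    # Vertical distance between each row and the one below it.
    gaps = [rows[i][0] - rows[i + 1][0] for i in range(len(rows) - 1)]
    typical = median(gaps)
    width = len(rows[0][1])

    padded = [rows[0][1]]
    for gap, (_, cells) in zip(gaps, rows[1:]):
        if gap > gap_factor * typical:
            padded.append([""] * width)  # blank row preserves the visual break
        padded.append(cells)
    return padded


# Hypothetical rows: a heading set apart from three evenly spaced data rows.
rows = [
    (700.0, ["Heading", ""]),
    (670.0, ["a", "1"]),
    (658.0, ["b", "2"]),
    (646.0, ["c", "3"]),
]
padded = pad_rows_by_gap(rows)
print(padded)
```

The median is used as the "typical" spacing so that a single large gap does not skew the threshold.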