Automatic high performance Header/Footer detection #10

Closed
royjohal opened this issue Aug 19, 2019 · 3 comments
Labels
feature New enhancement or request processing

Comments

@royjohal
Contributor

The current header/footer detection module, HeaderFooterDetectionModule, requires an estimate, given as a percentage, of the maximum distance from the page edge within which headers and footers lie.
It would be great to have this module detect headers and footers automatically (using techniques such as NLP or computer vision) without needing such a parameter.
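
To make the request concrete, one parameter-light heuristic is to treat text that recurs on most pages as header/footer content, regardless of where it sits. The sketch below is only an illustration of that idea, not how HeaderFooterDetectionModule works: the `TextLine` shape, the function names, and the `minPageRatio` threshold are all assumptions made up for this example.

```typescript
// Hypothetical minimal shape for a line of text; not the project's real types.
interface TextLine {
  page: number; // 1-based page number
  text: string;
}

// Normalize text so that lines differing only in digits or spacing
// (e.g. "Page 3" and "Page 4") compare equal.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\d+/g, "#").replace(/\s+/g, " ");
}

// Return the normalized strings that appear on at least `minPageRatio`
// of the pages. The 0.6 default is an arbitrary assumption.
function findRepeatedLines(
  lines: TextLine[],
  pageCount: number,
  minPageRatio: number = 0.6
): Set<string> {
  const pagesByText = new Map<string, Set<number>>();
  for (const line of lines) {
    const key = normalize(line.text);
    if (key.length === 0) {
      continue; // skip empty lines
    }
    let pages = pagesByText.get(key);
    if (pages === undefined) {
      pages = new Set<number>();
      pagesByText.set(key, pages);
    }
    pages.add(line.page);
  }

  const repeated = new Set<string>();
  pagesByText.forEach((pages, key) => {
    if (pages.size / pageCount >= minPageRatio) {
      repeated.add(key);
    }
  });
  return repeated;
}
```

This swaps the geometric margin for a repetition threshold, so it still has one tunable value, but that value does not depend on how tall the header is. A real implementation would probably combine repetition with position.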

@royjohal royjohal added the feature New enhancement or request label Aug 19, 2019
@jcsrb

jcsrb commented Mar 4, 2020

here is a public sample file with a big recurring header, it might be useful

@royjohal
Contributor Author

royjohal commented Mar 4, 2020

> here is a public sample file with a big recurring header, it might be useful

Definitely a useful test document! Thanks!

@royjohal
Contributor Author

royjohal commented Jun 18, 2020

> here is a public sample file with a big recurring header, it might be useful

I tried a higher value for maxMarginPercentage (25) and was able to eliminate the numeric header from the Markdown output. Thanks again for the interesting document, @jcsrb!
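
For anyone wondering why a larger value helps with a tall header, here is a rough sketch of how a margin-percentage check could behave. It is an assumption about the general technique, not the module's actual code: the `Box` shape, `classifyByMargin`, and the page dimensions are invented for illustration, and only the name maxMarginPercentage comes from this thread.

```typescript
// Hypothetical bounding box; the project's real element classes may differ.
interface Box {
  top: number;    // distance from the top edge of the page
  height: number; // box height, in the same units as the page height
}

type Region = "header" | "footer" | "body";

// Classify a box as header or footer if it lies entirely inside the top or
// bottom margin band, whose size is a percentage of the page height.
function classifyByMargin(
  box: Box,
  pageHeight: number,
  maxMarginPercentage: number
): Region {
  const margin = (pageHeight * maxMarginPercentage) / 100;
  const bottom = box.top + box.height;
  if (bottom <= margin) {
    return "header";
  }
  if (box.top >= pageHeight - margin) {
    return "footer";
  }
  return "body";
}

// A tall header block that ends 200 units down a 1000-unit page.
const bigHeader: Box = { top: 40, height: 160 };
const withSmallMargin = classifyByMargin(bigHeader, 1000, 8);  // "body"
const withLargeMargin = classifyByMargin(bigHeader, 1000, 25); // "header"
```

Under these assumptions, a header that extends past the margin band is treated as body text, so widening the band to 25 brings it inside. The cost is that genuine body text near the page edges is more likely to be dropped as well.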
