[Feature Request] - Ability to find and remove blank pages #134

Closed
nodecentral opened this issue May 6, 2023 · 7 comments
Labels
Done for next release (Items that are completed and will be included in the next release), enhancement (New feature or request)

Comments

@nodecentral

Hi

As I scan in a lot of documents, there are often numerous blank pages. It would be great to have a facility that can scan a PDF, find the blank pages and then remove them.

@Frooodle added the enhancement (New feature or request) and Prioritised enhancement (High-priority enhancements) labels May 6, 2023
@Frooodle
Member

Frooodle commented May 7, 2023

@nodecentral Can you define a blank page?
If it's a page without text or images, that's easy to detect and remove. If it's a scanned page, as you mentioned, the PDF page isn't blank but contains an image of pure white.
I can use computer vision to detect this, but there are also the use cases of blank pages that are different colours, or that have lines but no text, etc. It would also need an adjustable threshold for specks of dust and the like.
Can you try to expand the definition slightly to help clarify things?
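To make the two definitions above concrete, here is a minimal Python sketch of how a page might be sorted before any image analysis. It is only an illustration: `PageSummary`, `PageKind` and `classify_page` are hypothetical names, and the text and image count are assumed to come from whatever PDF library is in use.

```python
from dataclasses import dataclass
from enum import Enum


class PageKind(Enum):
    EMPTY = "empty"          # no text and no images at all
    SCAN_CANDIDATE = "scan"  # no text, but at least one image to inspect
    HAS_CONTENT = "content"  # extractable text is present


@dataclass
class PageSummary:
    """Hypothetical per-page summary; a real PDF library would supply these values."""
    text: str
    image_count: int


def classify_page(page: PageSummary) -> PageKind:
    # Any non-whitespace text means the page is not blank under this definition.
    if page.text.strip():
        return PageKind.HAS_CONTENT
    # No text and no images: the easy case, blank "in software".
    if page.image_count == 0:
        return PageKind.EMPTY
    # No text but an image: possibly a scanned blank page, needs pixel checks.
    return PageKind.SCAN_CANDIDATE
```

Only pages classified as `SCAN_CANDIDATE` would need the more expensive image inspection.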

@Frooodle
Member

Frooodle commented May 7, 2023

Like would a blank page still be a blank page if it has a footer but nothing else?

@Frooodle removed the Prioritised enhancement (High-priority enhancements) label May 7, 2023
@nodecentral
Author

nodecentral commented May 7, 2023

> Like would a blank page still be a blank page if it has a footer but nothing else?

Is the footer you are referring to something visible on the page, or something hidden in the PDF itself?

In my situation, the blank pages occur when scanning documents in: the scanner interprets the back of a page (an empty page) as a page to keep. A page could also have just a picture on it, so there would be no text either; therefore it would need to do a bit of detective work.

@Frooodle
Member

Frooodle commented May 7, 2023

Cool, I'll develop the app to handle blank pages added in software (no text or images).

And blank pages caused by scanning: since such a page is one giant image, I will try to detect that the image contains no information by checking each pixel.
I can't be sure it will always work, due to see-through paper, dust, etc., but I'll see what I can do.
Also, for ease, I'll support white paper only.
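As a rough illustration of the per-pixel idea for white paper, the following Python sketch checks a grayscale image held as nested lists of values from 0 to 255. This is not the code that shipped; `is_all_white` and the `white_level` cut-off of 250 are assumptions made for the example.

```python
from typing import Sequence


def is_all_white(pixels: Sequence[Sequence[int]], white_level: int = 250) -> bool:
    """Return True when every grayscale pixel (0-255) is at least white_level."""
    return all(value >= white_level for row in pixels for value in row)


# An 8x8 pure white "scan", and a copy with a single dark speck of dust.
blank_scan = [[255] * 8 for _ in range(8)]
dusty_scan = [row[:] for row in blank_scan]
dusty_scan[3][4] = 40

print(is_all_white(blank_scan))  # True
print(is_all_white(dusty_scan))  # False: one speck is enough to fail the strict check
```

The second result shows the weakness mentioned above: a strict check treats a single speck of dust as content.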

@nodecentral
Author

Many thanks @Frooodle

> , since it's giant image I will try detect the image contains no information by checking each pixel
> Can't be sure it will always work due to see-through paper or dust etc but I'll see what I can do
> Also for ease I'll do white only paper

I seem to recall seeing this done elsewhere. I think they used a percentage or tolerance level on what’s found, or maybe it was a colour range. Sorry, I can’t be sure; it may have been a combination of both. :-)
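The percentage and colour-range idea could be combined roughly as in the sketch below. It is purely illustrative: the function names, the `white_level` colour cut-off and the 1% default tolerance are assumptions, not values taken from any existing tool.

```python
from typing import Sequence


def non_white_percent(pixels: Sequence[Sequence[int]], white_level: int = 250) -> float:
    """Percentage of grayscale pixels (0-255) darker than white_level."""
    total = sum(len(row) for row in pixels)
    if total == 0:
        return 0.0
    dark = sum(1 for row in pixels for value in row if value < white_level)
    return 100.0 * dark / total


def is_blank_with_tolerance(
    pixels: Sequence[Sequence[int]],
    white_level: int = 250,              # colour range: what still counts as "white"
    max_non_white_percent: float = 1.0,  # tolerance: how much non-white is allowed
) -> bool:
    return non_white_percent(pixels, white_level) <= max_non_white_percent


# A 10x10 page with one speck (1% non-white) and one with five specks (5%).
one_speck = [[255] * 10 for _ in range(10)]
one_speck[0][0] = 30
five_specks = [[255] * 10 for _ in range(10)]
for i in range(5):
    five_specks[i][i] = 30

print(is_blank_with_tolerance(one_speck))    # True
print(is_blank_with_tolerance(five_specks))  # False
```

Lowering `white_level` would make the check more forgiving of off-white or see-through paper, while raising `max_non_white_percent` would forgive more dust.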

@Frooodle Frooodle added the Done for next release Items that are completed and will be included in the next release label May 10, 2023
@Frooodle
Member

Done as part of the v0.8.0 release, which is now live.

@Frooodle
Member

This is the first iteration, so I imagine it will have some issues. I added some logging as well.
Please let me know how it goes; it might need improvements down the line.
