ENH: _writer: Implement flattening #3312
base: main
This is a vibe-coded attempt at implementing flattening of PDF forms. It can currently flatten text fields and button fields. It is not yet integrated with the rest of the pypdf code, and it introduces a dependency on pdfminer.six to calculate font widths. This should also be possible with pypdf's own code, but I don't think pypdf has correct font metrics: I tried STANDARD_WIDTHS from pypdf/_text_extraction/_layout_mode/_font.py, but text started overflowing certain fields. So this is basically a proof of concept.
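For readers unfamiliar with what flattening involves: a common approach is to paint each widget's normal appearance stream (/AP /N) into the page content as a form XObject and then remove the widget annotation. The sketch below shows only the painting step, under simplifying assumptions: the appearance stream is already registered in the page's /Resources /XObject dictionary under a name such as Fm0 (hypothetical), and its /BBox starts at the origin, matches the size of the widget's /Rect, and has an identity /Matrix, so translating to the /Rect corner is enough. It is not how this PR or pypdf implements it; the full rule in the spec also maps the transformed /BBox onto /Rect, which can involve scaling.

```python
from typing import Tuple


def flatten_widget_ops(xobject_name: str, rect: Tuple[float, float, float, float]) -> str:
    """Return PDF operators that paint a form XObject at a widget's /Rect.

    Assumes (hypothetically) that the appearance stream's /BBox starts at (0, 0)
    with an identity /Matrix, so only a translation is needed.
    """
    llx, lly, urx, ury = (float(v) for v in rect)
    # /Rect corners are not guaranteed to be ordered; take the lower-left one.
    x, y = min(llx, urx), min(lly, ury)
    # q/Q save and restore the graphics state, cm translates the origin to
    # the widget's corner, and Do paints the named form XObject.
    return f"q 1 0 0 1 {x:g} {y:g} cm /{xobject_name} Do Q\n"


print(flatten_widget_ops("Fm0", (100, 700, 300, 720)), end="")
# q 1 0 0 1 100 700 cm /Fm0 Do Q
```

The remaining steps, appending this fragment to the page's content stream and dropping the widget from /Annots and the field from /AcroForm, are left out of the sketch.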
Oh, and I didn't run the test suite, sorry about that! I see that:
This is stuff that I can work on while waiting for feedback. I'll try to remember to run the local tests before the next push to this PR.
Thanks for the PR. Some notes:
@stefan6419846, thanks for your comments! I think what I need right now is a little bit more guidance. You wrote:
I only have Python 3.9, and I don't know whether ruff runs on it. I can also fix some of this manually.
If you see any functionality that is duplicated, please tell me! From what I can tell:
If you can, please indicate what you consider generic functionality and, more importantly, where it ought to go. Would my font_name_map function qualify?
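To give a concrete idea of what a generic, writer-independent font-name helper might look like, here is a sketch. It is hypothetical and is not the font_name_map from this PR, whose code isn't quoted in this thread. It reads the font resource name and size from a field's default appearance (/DA) string, e.g. `/Helv 12 Tf 0 g`, and resolves that name to a base font. The authoritative source is the /BaseFont entry of that font in the AcroForm's /DR resources; the FALLBACK_BASE_FONTS table of Acrobat-style abbreviations is an assumption for when that lookup is unavailable, and dr_fonts is simplified to a plain dict mapping resource name to /BaseFont string.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical fallback table of resource names often seen in AcroForm /DR
# dictionaries. Only used when the real /BaseFont lookup is unavailable.
FALLBACK_BASE_FONTS: Dict[str, str] = {
    "Helv": "Helvetica",
    "HeBo": "Helvetica-Bold",
    "TiRo": "Times-Roman",
    "Cour": "Courier",
    "ZaDb": "ZapfDingbats",
}

# Matches "/<name> <size> Tf" anywhere in a DA string.
_TF_RE = re.compile(r"/([^\s/\[\]()<>{}%]+)\s+(-?\d*\.?\d+)\s+Tf")


def parse_da(da: str) -> Tuple[str, float]:
    """Return (font resource name, font size) from a DA string.

    A size of 0 means the viewer should auto-size the text.
    """
    match = _TF_RE.search(da)
    if match is None:
        raise ValueError(f"no Tf operator in DA string: {da!r}")
    return match.group(1), float(match.group(2))


def base_font_for(resource_name: str, dr_fonts: Optional[Dict[str, str]] = None) -> str:
    """Prefer the /BaseFont from the (simplified) /DR font dict, else the fallback table."""
    if dr_fonts and resource_name in dr_fonts:
        return dr_fonts[resource_name].lstrip("/")
    return FALLBACK_BASE_FONTS.get(resource_name, "Helvetica")


name, size = parse_da("/Helv 0 Tf 0 g")
print(name, size, base_font_for(name))  # Helv 0.0 Helvetica
```

Defaulting unknown names to Helvetica is a design choice of the sketch, not something the spec mandates.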
I could try to correct STANDARD_WIDTHS from pypdf/_text_extraction/_layout_mode/_font.py. I don't know how to deal with encoding (yet). Thanks for your comments!
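The overflow problem comes down to a simple calculation. For simple fonts, glyph widths are given in thousandths of a text-space unit, so a string's width in points is the sum of its glyph widths times font_size / 1000, and the largest size that fits a field follows by solving that for font_size. A minimal sketch, assuming ASCII text keyed directly by character (so it sidesteps the encoding question), ignoring kerning and character spacing, and using a hypothetical 2-point padding. HELVETICA_WIDTHS is a small subset believed to match Helvetica's AFM widths and should be checked against the core-14 AFM files before relying on it.

```python
from typing import Dict

# Glyph widths in 1/1000 text-space units. Small illustrative subset believed to
# match Helvetica's AFM metrics; a real table would cover the whole encoding.
HELVETICA_WIDTHS: Dict[str, int] = {
    " ": 278, ",": 278, ".": 278, "-": 333,
    **{digit: 556 for digit in "0123456789"},
    "A": 667, "H": 722,
    "a": 556, "d": 556, "e": 556, "l": 222, "o": 556, "r": 333, "w": 722,
}
DEFAULT_WIDTH = 556  # hypothetical fallback for characters missing from the table


def text_width(text: str, widths: Dict[str, int], font_size: float,
               default: int = DEFAULT_WIDTH) -> float:
    """Width of `text` in points: summed glyph widths scaled by font_size / 1000."""
    return sum(widths.get(ch, default) for ch in text) * font_size / 1000.0


def fit_font_size(text: str, widths: Dict[str, int], box_width: float,
                  max_size: float = 12.0, min_size: float = 4.0,
                  padding: float = 2.0, default: int = DEFAULT_WIDTH) -> float:
    """Largest font size in [min_size, max_size] at which text fits box_width minus padding."""
    available = box_width - 2 * padding
    units = sum(widths.get(ch, default) for ch in text)
    if units == 0:
        return max_size  # empty string: nothing can overflow
    # Solve units * size / 1000 == available for size, then clamp to the range.
    return max(min_size, min(max_size, available * 1000.0 / units))


size = fit_font_size("Hello", HELVETICA_WIDTHS, box_width=24)
print(round(size, 2), round(text_width("Hello", HELVETICA_WIDTHS, size), 2))  # 8.78 20.0
```

When the result is clamped to min_size, the text can still overflow; a real implementation would then need to wrap or truncate.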
Please note that while I try to help you get these changes integrated, I do not know every aspect of the spec or implementation. My main goal is to ensure that any code fulfills our requirements, especially regarding maintenance. Python 3.9 should not be an issue, as the latest published release (and the version we use) is still compatible with it. Regarding re-use: You mentioned in your initial comment that integrating this into
Some candidates for moving them out of the writer module:
What do you mean by "correcting them"? Ideally, we would not use the text extraction code's functionality here, but a dedicated implementation that the text extraction code could use later on. This might be based on the implementation from pdfminer.six, but we would have to analyze which parts of it we would actually need to port.
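One possible shape for such a dedicated implementation, sketched as a hypothetical class that is not existing pypdf API: read widths straight from a simple font's dictionary. Per the PDF spec, /Widths lists the widths of codes /FirstChar through /LastChar, and codes outside that range fall back to the font descriptor's /MissingWidth (default 0). The font is modelled here as a plain dict; with real pypdf objects, indirect references would need resolving first. The lookup is by character code, so text must already be encoded with the font's /Encoding. Type 3 fonts, whose widths are scaled by /FontMatrix rather than 1/1000, are not covered, and standard-14 fonts without /Widths would still need built-in AFM tables.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SimpleFontMetrics:
    """Hypothetical writer-independent width lookup for a simple font.

    Widths are in 1/1000 text-space units and are indexed by character code
    (not Unicode), so text must already be encoded with the font's /Encoding.
    """

    first_char: int
    widths: List[float]
    missing_width: float = 0.0

    @classmethod
    def from_font_dict(cls, font: dict) -> "SimpleFontMetrics":
        # Plain-dict stand-in for a PDF font dictionary (assumption).
        descriptor = font.get("/FontDescriptor", {})
        return cls(
            first_char=int(font.get("/FirstChar", 0)),
            widths=[float(w) for w in font.get("/Widths", [])],
            # /MissingWidth defaults to 0 when absent.
            missing_width=float(descriptor.get("/MissingWidth", 0)),
        )

    def code_width(self, code: int) -> float:
        """Width of one character code; out-of-range codes use /MissingWidth."""
        index = code - self.first_char
        if 0 <= index < len(self.widths):
            return self.widths[index]
        return self.missing_width

    def string_width(self, codes: bytes, font_size: float) -> float:
        """Width in points of already-encoded bytes at the given font size."""
        return sum(self.code_width(c) for c in codes) * font_size / 1000.0


font = {"/FirstChar": 65, "/Widths": [667, 667, 722],
        "/FontDescriptor": {"/MissingWidth": 250}}
metrics = SimpleFontMetrics.from_font_dict(font)
print(metrics.string_width(b"ABCZ", 10))  # 23.06
```

Keeping the class free of writer imports is what would let both the flattening code and the text extraction code share it.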
Why this PR?
First and foremost, to get some comments:
I don't have a lot of Python coding experience. For reference, I tested this code in a different project, dungeonsheets. The relevant branch is here: https://github.com/PJBrs/dungeon-sheets/tree/flatten_vibe, and the blank forms are here: https://github.com/canismarko/dungeon-sheets/tree/master/dungeonsheets/forms.
This PR is in response to #232.