Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-breaking space support in multi_cell() and write_html() #834

Closed
rysson opened this issue Jul 3, 2023 · 9 comments · Fixed by #851
Closed

Non-breaking space support in multi_cell() and write_html() #834

rysson opened this issue Jul 3, 2023 · 9 comments · Fixed by #851

Comments

@rysson
Copy link

rysson commented Jul 3, 2023

Error details

Incorrect support for non-breaking space (NBSP, hard-space, 0x00a0).

Non-breaking space prevents line breaking and it works fine.

Non-breaking space is not fixed-width at all. I know some old systems still use fixed-width NBSP but all modern browsers not.

NBSP (as Unicode character or   entity) should be variable width space (SpaceHint?) in the FPDF.write_html().

The same in the pdf.multi_cell(). It could be a compatibility problem if users use it as fixed-width space. Maybe extra option (fixed_width_nbsp=True)?

20230703-152836-821x60

Minimal code

from fpdf import FPDF

pdf = FPDF(orientation='P', format='A4')
pdf.add_page()
pdf.set_font('helvetica', size=10)
pdf.multi_cell(0, 5, '1\u00a02 aaaaaaaaaaaaaaaaaaaaaqweqwedqweqwe aaaaaaaaaaaaaaaaaaaaaqweqwedqweqweqweqweqweqweqweqwqweqw.</p>')
from fpdf import FPDF

pdf = FPDF(orientation='P', format='A4')
pdf.add_page()
pdf.set_font('helvetica', size=10)
pdf.write_html('<p align="justify">1&nbsp;2 aaaaaaaaaaaaaaaaaaaaaqweqwedqweqwe aaaaaaaaaaaaaaaaaaaaaqweqwedqweqweqweqweqweqweqweqwqweqw.</p>')

Environment
Please provide the following information:

  • Operating System: Linux Debian Bookworm
  • Python version: 3.9.17, 3.11.4
  • fpdf2 version used: 2.7.4 (pip)

Also check on GitHub master branch.

@rysson rysson added the bug label Jul 3, 2023
@rysson
Copy link
Author

rysson commented Jul 3, 2023

Some hot-fix for both methods.
Without any extra arguments (to keep backward compatybility), I've found your awesome library today, I'm not able to make it in the correct way.

File line_break.py

HARD_SPACE = "\u00a0"

class Fragment:
    ...
    def add_character(...):
        ...
        if character == SPACE:
            ...
        elif character == HARD_SPACE:
            character = SPACE
            self.fragments.append(Fragment("", graphics_state, k, url))
            # changing active fragment is not necessary I guess,
            # space can be at end of the last fragment or first in the new fragment as well
            # active_fragment = self.fragments[-1]   # TODO: delete me
            self.number_of_spaces += 1
        elif character == SOFT_HYPHEN and not self.print_sh:
            ...                

@Lucas-C
Copy link
Member

Lucas-C commented Jul 3, 2023

@gmischler : as you have previously worked on this subject, could you have a look this please? 😊

rysson added a commit to rysson/fpdf2 that referenced this issue Jul 3, 2023
@rysson
Copy link
Author

rysson commented Jul 3, 2023

Thanks.

I have some commit to test, see rysson@db0625a

I added nbsp argument (default False) to FPDF class.

For string like 1␣2 aaaaa␣3 x␣aaaaa...

with nbsp=False (default):

20230703-165100-812x49

and with nbsp=True:

20230703-165122-814x41

I wanted to create PR, but git hooks failed (see #835).

@gmischler
Copy link
Collaborator

@rysson, it took me quite a while to figure out that you're actually talking about whitespace getting stretched when justifying text. I can conclude that from a single line of code, but explicitly explaining that at the start might have been helpful...

This appears to be a "feature" of how PDF viewers are applying the ## Tw declaration in the file only to normal space characters, and not other whitespace with builtin fonts. I haven't checked what the specification says about it, but it may well instruct them to do so.

If we didn't rely on Tw for builtin fonts, we could simply include the NBSP in TextLine.number_of_spaces and in the split pattern for word separation. But as it is, replacing it with a normal space looks like the simplest way to go.
However, note that the relevant method is CurrentLine.add_character(), and not in Fragment(). Also, you must not touch self.fragments at that point. You're only substituting a character, not changing anything about the graphics context.

NBSP = "\u00a0"

class CurrentLine:
    ...
    def add_character(...):
        ...
        if character == SPACE:
            ...
        elif character == NBSP:
            # PDF viewers ignore NBSP for word spacing with "Tw".
            character = SPACE
            self.number_of_spaces += 1
        elif character == SOFT_HYPHEN and not self.print_sh:
            ...        

@Lucas-C
Copy link
Member

Lucas-C commented Jul 4, 2023

Thank you @gmischler for investigating this

@rysson
Copy link
Author

rysson commented Jul 4, 2023

@rysson, it took me quite a while to figure out that you're actually talking about whitespace getting stretched when justifying text [...]

It's why I told about fixed-width spaces. Sure, I should write it's important in justify, sorry.

This appears to be a "feature" of how PDF [...]

I'm totally newbie in PDF, I'm sorry.

If we didn't rely on Tw for builtin fonts, we could simply include the NBSP in TextLine.number_of_spaces and in the split pattern for word separation. But as it is, replacing it with a normal space looks like the simplest way to go. However, note that the relevant method is CurrentLine.add_character(), and not in Fragment(). Also, you must not touch self.fragments at that point. You're only substituting a character, not changing anything about the graphics context.

Great! Simpler and working :-)
I found the awesome project yesterday. I was just guessing.

So, what next?

@gmischler
Copy link
Collaborator

So, what next?

just update your PR accordingly.

With regards to that, I recommend to stick with standard terminology (eg. NON_BREAK_SPACE or NBSP instead of HARD_SPACE).
I also think we don't need an option to activate/deactivate this. Other than not triggering line breaks, a NBSP should indeed always behave exactly as a normal space. I can't think of a valid use case for disabling that. There are seperate fixed-width space-type characters available for when a different behaviour is desired. Without the option handling your PR will become a lot simpler, without cluttering the API.
You should just add some test cases to verify that it really does what it is supposed to do and won't break with any other future modifications.

@Lucas-C
Copy link
Member

Lucas-C commented Jul 11, 2023

I wanted to create PR, but git hooks failed (see #835).

FYI @rysson, this should now be fixed 😊
And thank you for this PR!

@rysson
Copy link
Author

rysson commented Jul 11, 2023

Thanks, than I create new PR based on @gmischler notes.

rysson added a commit to rysson/fpdf2 that referenced this issue Jul 11, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Jul 23, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Sep 2, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Sep 2, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Sep 2, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Sep 2, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Oct 10, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Oct 10, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Oct 10, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Oct 10, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Oct 10, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Oct 10, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Oct 10, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Oct 10, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Oct 10, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Oct 10, 2023
rysson added a commit to rysson/fpdf2 that referenced this issue Oct 10, 2023
gmischler pushed a commit that referenced this issue Oct 11, 2023
* Fix NBSP support. Closes: #834.

* Update docs about NBSP (#834)

* Update unittest for #834 (test_charmap)

* Update unittest for #834 (test_html_whitespace_handling)

* Update unittest for #834 (test_html)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants