CAIRO_STATUS_TAG_ERROR: b'invalid tag name, attributes, or nesting' #742

Closed
Niecke opened this Issue Nov 20, 2018 · 44 comments

Niecke commented Nov 20, 2018

Hello,

I encountered a strange error. When I try to render a pdf in my flask app with the following code:

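from flask_weasyprint import HTML, render_pdf

@some_blueprint.route("/print/<string:file_hash>")
def print_pdf(file_hash):
    # file_hash is unused here; the HTML is hard-coded to keep the report minimal
    html = HTML(string="<html><head><title>Test</title></head><body><h1>test</h1></body></html>")
    return render_pdf(html)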

Then I get:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2309, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2295, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1741, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/azk/azk/controllers/hiwi.py", line 184, in print_pdf
    return render_pdf(html)
  File "/usr/local/lib/python3.6/dist-packages/flask_weasyprint/__init__.py", line 209, in render_pdf
    pdf = html.write_pdf(stylesheets=stylesheets)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/__init__.py", line 199, in write_pdf
    target, zoom, attachments)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/document.py", line 613, in write_pdf
    levels.pop(), title, link_attribs, 0)
  File "/usr/local/lib/python3.6/dist-packages/cairocffi/surfaces.py", line 903, in add_outline
    self._check_status()
  File "/usr/local/lib/python3.6/dist-packages/cairocffi/surfaces.py", line 160, in _check_status
    _check_status(cairo.cairo_surface_status(self._pointer))
  File "/usr/local/lib/python3.6/dist-packages/cairocffi/__init__.py", line 79, in _check_status
    raise exception(message, status)
cairocffi.CairoError: cairo returned CAIRO_STATUS_TAG_ERROR: b'invalid tag name, attributes, or nesting'

Actually I am using flask_weasyprint, but the error is the same when using weasyprint directly. So far I have not been able to reproduce this with a simple Flask application, but after removing the <h1> and </h1> tags everything is fine.

My python env consists of:

alembic==1.0.2
asn1crypto==0.24.0
bcrypt==3.1.4
blinker==1.4
bokeh==1.0.1
cairocffi==0.9.0
CairoSVG==2.2.1
certifi==2018.10.15
cffi==1.11.5
chardet==3.0.4
Click==7.0
coverage==4.5.1
cryptography==2.1.4
cssselect2==0.2.1
decorator==4.3.0
defusedxml==0.5.0
Flask==1.0.2
Flask-Bcrypt==0.7.1
Flask-DebugToolbar==0.10.1
Flask-Login==0.4.1
Flask-Migrate==2.3.0
flask-mongoengine==0.9.5
Flask-Principal==0.4.0
Flask-SQLAlchemy==2.3.2
Flask-WeasyPrint==0.5
Flask-WTF==0.14.2
html5lib==1.0.1
idna==2.6
infinity==1.4
intervals==0.8.1
itsdangerous==1.1.0
Jinja2==2.10
keyring==10.6.0
keyrings.alt==3.0
ldap==1.0.2
ldap3==2.5.1
Mako==1.0.7
MarkupSafe==1.1.0
mongoengine==0.16.0
numpy==1.15.4
packaging==18.0
pandas==0.23.4
Pillow==5.3.0
ply==3.11
pyasn1==0.4.4
pyasn1-modules==0.2.2
pycparser==2.18
pycrypto==2.6.1
pygobject==3.26.1
pymongo==3.7.2
PyMySQL==0.9.2
pyparsing==2.3.0
Pyphen==0.9.5
python-dateutil==2.7.5
python-editor==1.0.3
python-ldap==3.1.0
pytz==2018.7
pyxdg==0.25
PyYAML==3.13
regex==2018.11.7
requests==2.20.0
SecretStorage==2.3.1
six==1.11.0
SQLAlchemy==1.2.12
SQLAlchemy-Utils==0.33.6
tinycss2==0.6.1
tornado==5.1.1
urllib3==1.24.1
validators==0.12.2
WeasyPrint==43
webencodings==0.5.1
Werkzeug==0.14.1
WTForms==2.2.1
WTForms-Alchemy==0.16.7
WTForms-Components==0.10.3

and the whole app is running in a docker container which is based on ubuntu 18.04.
Is there anything I am doing wrong?

Contributor

Tontyna commented Nov 20, 2018

This error has been discussed at length in PR #665 -- it seemed to be a special case on Win 8.1 with a German locale. You're obviously not on Windows, but probably in Germany ... so it might nevertheless be a (THE?) locale problem.

I dunno the commands to check or set the language on your OS, but maybe defining the environment variable LC_ALL before you run weasyprint works for you, too. E.g.:

SET LC_ALL=en-us

Member

liZe commented Nov 22, 2018

I've (finally!) got this error on my Linux system today for the first time. I have to check what's going on…

Niecke commented Nov 22, 2018

Tontyna was right, since I needed German date format for the application I set the container to de_DE.UTF-8. Sorry for ignoring the other issue I thought it was only windows related and did not realize that it could be the locale.

Using export en_US.UTF-8 before running the flask application solved the problem. Now I need to find another way to get the right date format, but I think this will be a minor issue.
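
One way to get a German-style date without touching the process locale is to spell the format out explicitly. This is a minimal sketch, assuming purely numeric dates such as 22.11.2018 are enough; the helper name `german_date` is made up for illustration and is not part of Flask or WeasyPrint.

```python
import datetime


def german_date(day):
    """Format a date as DD.MM.YYYY without relying on the process locale."""
    # %d, %m and %Y are purely numeric, so the result does not depend on
    # LC_TIME or LC_ALL.
    return day.strftime('%d.%m.%Y')


print(german_date(datetime.date(2018, 11, 22)))  # 22.11.2018
```

Localized month or weekday names would still need a locale or a translation table, so this only covers the numeric case.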

Member

liZe commented Nov 22, 2018

Tontyna was right

He (almost) always is!

it might be nevertheless a (THE?) locale problem.

As said in #665, the problem is probably caused by the parse_float function in Cairo. But I have to understand why it sometimes crashes and sometimes doesn't. Now that I can reproduce the problem on my computer, it is much easier. My computer has always had a French locale (using commas as decimal separator), I'd like to understand why the problem only appears today.

Member

liZe commented Nov 22, 2018

OK, the problem appears for me when I use locale.setlocale(locale.LC_ALL, 'fr_FR') in my code. As simple as that.
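
What that call changes can be checked from Python. The sketch below is only an illustration under assumptions: it switches LC_NUMERIC rather than LC_ALL, the helper name `decimal_point_for` is hypothetical, and the locale name `fr_FR.UTF-8` may not be installed on a given system.

```python
import locale


def decimal_point_for(locale_name):
    """Return the decimal separator the C library reports under locale_name.

    Returns None when the locale is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query only, no change
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None
    try:
        return locale.localeconv()['decimal_point']
    finally:
        # Always restore the numeric locale that was active before.
        locale.setlocale(locale.LC_NUMERIC, previous)


print(decimal_point_for('C'))            # .
print(decimal_point_for('fr_FR.UTF-8'))  # a comma where installed, else None
```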

Unfortunately, it doesn't explain the difference between Windows versions.

Member

liZe commented Nov 22, 2018

Well… We can consider it as a bug in WeasyPrint, as Cairo's documentation uses locale-dependent sprintf function to generate tag strings. That's ugly IMO, but following the same behavior in WeasyPrint should be pretty easy (and wouldn't require long talks with Cairo devs).

Contributor

Tontyna commented Nov 22, 2018

But we can't solve it in a satisfying way within WeasyPrint. At least I didn't find one. I tried that. No way to detect whether Cairo expects a dot or a comma or whatever. Think of thousands separators...

Of course, we could try ... except the calls to add_hyperlinks() and add_outline(), but in the except part we still don't know for sure what number format Cairo expects. So blindly exchanging dot with comma and another try...except and then a warning "sorry, unable to satisfy Cairo"?

Maybe it's not a bug in Cairo (HA! because their documentation tells us about this bad decision! Does it?) -- but as @liZe said in #665:

Using this [locale-dependent] function to parse a locale-unrelated string looks like a bad idea. That's probably where the "real" bug is.

And we shouldn't forget that maybe it's Pango that forces Cairo to want a comma.
As I pointed out it's turned on by pango_context_set_font_map.

BTW: I'm glad that @liZe can reproduce the feature, because my Win8 system was upgraded to Win10 recently.

Ah, and your statement

He (almost) always is!

makes me happy, but it's wrong because "He" is a "She" 😁

Member

liZe commented Nov 22, 2018

should be pretty easy

Oh my god.

'{:n}'.format is exactly what we need. But. It's broken with locales using non-ascii thousands separators.
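
For context, a small sketch of how the `n` presentation type differs from `f`. It pins LC_NUMERIC to `C` so the result is predictable; under another numeric locale `'{:n}'` would pick up that locale's decimal point and grouping, while `'{:f}'` is never localized.

```python
import locale

# Pin the numeric locale so the output below is predictable.
locale.setlocale(locale.LC_NUMERIC, 'C')

value = 1234.5
localized = '{:n}'.format(value)  # like 'g', but follows LC_NUMERIC separators
fixed = '{:f}'.format(value)      # always uses a dot, never localized

print(localized)  # 1234.5
print(fixed)      # 1234.500000
```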

When fixing bugs in WeasyPrint caused by poor choices from Cairo devs is impossible because of Python bugs, it really, really is annoying.

Let's find a workaround for the workaround.

liZe added a commit that referenced this issue Nov 22, 2018

Localize floats used by Cairo tags
According to the documentation (and to the code), Cairo seems to rely on
locale-dependent strings for tag attributes. It's for sure a bad idea, but at
least try to follow this rule and see if it fixes our problems.

Related to #742.

Member

liZe commented Nov 22, 2018

makes me happy, but it's wrong because "He" is a "She"

And one more time, I'm wrong, she's right. QED 😉.

(For real: I'm really sorry about my bad assumption…)

And we shouldn't forget that maybe it's Pango that forces Cairo to want a comma.
As I pointed out it's turned on by pango_context_set_font_map.

Oh, I didn't remember that detail.

What I did in c34b128 is at least to rely on current locale when generating tag attributes for Cairo. It shouldn't break anything for configurations that used to work (at least tests pass), and it may fix the problem with other users (at least it does for me).

Of course, the real fix is probably to use locale-independent strings for tag attributes in Cairo. I'll open an issue on Cairo's bug tracker.

@Niecke Is it possible for you to test the current master branch?

Member

liZe commented Nov 22, 2018

I'll open an issue on Cairo's bug tracker.

https://gitlab.freedesktop.org/cairo/cairo/issues/347

Niecke commented Nov 22, 2018

@liZe I will try to test it tomorrow.

Thanks already for all your support.

Contributor

Tontyna commented Nov 22, 2018

Maybe c34b128 solves the bug on Linux, but now my German Windows 7 that was fine before, crashes. Probably @JohannesMunk's Win 10 systems will crash, too.

Edit: Only one of my German Win7 systems (a virtual box within my former Win8) crashes with the patch, another non-virtual Win7 still doesn't crash.
Strange.
The only explanation would be that on the virtual one the locale returned by Python's locale.setlocale(locale.LC_ALL) isn't C anymore, or Cairo/Pango succeeded in retrieving the system locale since its hosting computer switched to Win10 (damned, no access to that computer until Monday!).
The non-virtual one still returns C and of course, with and without patch the decimal separator is a dot.

Especially remarkable: My Windows 8 computer, the one that was upgraded to Windows 10, crashes with and without the patch. Great! Can investigate this funny feature furthermore 😬

Would be interested in how @JohannesMunk's Windows 8 behaves with the patch -- I guess it still crashes.

The only thing I know for sure is that environment variables that I change or set within Python scripts never reach the DLLs, they inherit the initial environment of the initial Python process.

@liZe -- your locale.setlocale(locale.LC_ALL, 'fr_FR') wasn't the Python function, not called within a Python script but somewhere outside, in the environment?

Contributor

Tontyna commented Nov 22, 2018

@liZe -- I don't get it. How is locale.format('%f', number) supposed to create a number with locale-aware decimal separator unless some Python code sets Python's locale? When WeasyPrint is run from commandline, there is nobody to call locale.setlocale and Python's documentation states clearly that on startup the locale is C, only LC_CTYPE is adapted.
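
The point about the startup locale can be checked with a short sketch. It assumes a fresh interpreter in which nothing has called `locale.setlocale` yet, and it uses `locale.format_string`, since `locale.format` was removed in Python 3.12; `format_coordinate` is a made-up helper name.

```python
import locale


def format_coordinate(number):
    """Format a float with the decimal separator of the active LC_NUMERIC."""
    return locale.format_string('%f', number)


# Calling setlocale() with only a category queries the current setting.
startup_locale = locale.setlocale(locale.LC_NUMERIC)

print(startup_locale)          # expected to be C in a fresh interpreter
print(format_coordinate(1.5))  # 1.500000 as long as LC_NUMERIC is C
```

Only after some code calls `locale.setlocale(locale.LC_ALL, ...)` or `locale.setlocale(locale.LC_NUMERIC, ...)` with a comma-based locale would `format_coordinate` start returning commas.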

Contributor

Tontyna commented Nov 22, 2018

Contemplating further I see 3 ways to fix the bug:

  1. Force everybody to use the same locale -- SET LC_ALL or equivalent in the environment to catch the libraries (and hope they all obey) and locale.setlocale and locale.format in the Python department.
  2. Try to find out which locale (number format) Cairo/Pango decides to use and apply that to the CAIRO_TAGS
  3. Persuade the Cairo developers to use dots.

Nr. 1 is ugly, since you never know who else tries to force his own language; not sure whether nr. 2 will work (similar reasons and maybe incompatibilities in locale aliases); nr. 3 is the flawless favourite, though the only thing we can do to make it come true is praying.

Niecke commented Nov 23, 2018

I now tested ddf10d2. Using locale.setlocale(locale.LC_ALL, "de_DE.utf8") within the application works fine within a Docker Linux Container on a Linux Server and in a Linux VM that was hosted on Windows 10.

Using export LC_ALL=de_DE.UTF-8 also works.

Member

liZe commented Nov 23, 2018

Edit: Only one of my German Win7 systems (a virtual box within my former Win8) crashes with the patch, another non-virtual Win7 still doesn't crash.
Strange.

@liZe -- your locale.setlocale(locale.LC_ALL, 'fr_FR') wasn't the Python function, not called within a Python script but somewhere outside, in the environment?

It was the Python function. Here's a summary on my computer:

  • from command line, system locale set to fr_FR: v43 and master work
  • from command line, LC_ALL set to fr_FR from environment variable: v43 and master work
  • as a library, in a script beginning with locale.setlocale(locale.LC_ALL, 'fr_FR'): v43 crashes and master works

The only thing I know for sure is that environment variables that I change or set within Python scripts never reach the DLLs, they inherit the initial environment of the initial Python process.

Oh. On Linux, the only thing I know for sure is that environment variables I set on my system outside the Python scripts never reach shared libraries. Of course, that's not how @Niecke's system seems to work.

We're doomed 🤖.

I don't get it. How is locale.format('%f', number) supposed to create a number with locale-aware decimal separator unless some Python code sets Python's locale?

It was Python code, so it should make sense now.

When WeasyPrint is run from commandline, there is nobody to call locale.setlocale and Python's documentation states clearly that on startup the locale is C, only LC_CTYPE is adapted.

That's what I thought too, but after reading this chapter on Python 3.7 changelog I'm not sure anymore ([…] implied by the use of the default C or POSIX locale on non-Windows platforms.)

Nr. 1 is ugly, since you never know who else tries to force his own language

I agree.

not sure whether nr. 2 will work (similar reasons and maybe incompatibilities in locale aliases)

Not sure at all 😄.

nr. 3 is the flawless favourite, though the only thing we can do to make it come true is praying

It's the real fix, but then we have to find a workaround for Cairo versions between 1.15.x and 1.16.0 and apply this workaround only for buggy versions as Cairo's fix is not backward compatible 😱.

I now tested ddf10d2. Using locale.setlocale(locale.LC_ALL, "de_DE.utf8") within the application works fine within a Docker Linux Container on a Linux Server and in a Linux VM that was hosted on Windows 10.

Using export LC_ALL=de_DE.UTF-8 also works.

@Niecke Does it work without locale.setlocale or export LC_ALL?

Contributor

Tontyna commented Nov 23, 2018

We're doomed

Yes we are.
No we aren't. We could reintroduce pdfrw 😏

Anyway. Any OS, any locale set or not set. Unless we know for sure that there is a Cairo that uses dots, we should try to provide the expected number format.

On Linux it looks like locale.setlocale does the trick (setting it to C at the start of Document.write_pdf and resetting it at the end should work).
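
The set-and-reset idea could be sketched as a context manager. This is an assumption-laden illustration, not WeasyPrint code: `numeric_locale` is a hypothetical name, and it only switches LC_NUMERIC instead of LC_ALL.

```python
import contextlib
import locale


@contextlib.contextmanager
def numeric_locale(name='C'):
    """Temporarily switch LC_NUMERIC, restoring the previous value on exit."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # query only
    locale.setlocale(locale.LC_NUMERIC, name)
    try:
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)


with numeric_locale('C'):
    # A call such as document.write_pdf(...) would go here.
    inside = locale.localeconv()['decimal_point']

print(inside)  # .
```

Note that `setlocale` affects the whole process and is not thread-safe, so in a multithreaded web application this could change number formatting in other threads while the block runs.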

On Windows it looks like instead we have to ask Pango for its PangoLanguage* pango_language_get_default (void) and pray that an alias is returned that Python knows -- Pango's language is de-de on my (crashing) Win8 computer, on my non-crashing Win7 computer it's a bug and en-us, while on both systems, I get German_Germany.1252 when I ask Python for the current user locale when started from a command prompt.

And it looks like Cairo switches its language within the first call to pango_context_set_font_map().

Couldn't yet find a function to tell Pango that, for heaven's sake & while WeasyPrint is rendering, it should use C. (Of course, there is the brute-force LC_ALL environment variable).
Pango documentation states that indeed, there isn't such a Pango function but setlocale on Unix systems.

Yes, at least on Windows, we're quite doomed.

Member

liZe commented Dec 27, 2018

OK, it's time to take a decision. This bug broke some of my production web services, I'm really angry.

😠

I've spent a loooooot of time trying to find a solution, but there's no real solution. Here's what I think now:

  1. c34b128 is a real fix, as Cairo needs localized strings. At least we get localized strings now. As far as I can understand this problem, it should fix this issue on platforms where Pango doesn't play with locales, ie. Linux and macOS.
  2. c34b128 breaks Windows sometimes. That's bad, and that's for sure caused by pango_context_set_font_map. As this function is "only for internal use by Pango backends", I suppose that we can find a way to avoid this function call. @Tontyna Does the current version work when commenting out the pango_context_set_font_map call?
  3. There's no way we can get a satisfying answer from Cairo. The proposed patch tries to allow dots, but obviously won't be backported to previous versions. And I don't want to spend time digging and talking about thousands separators. I'll ask Cairo devs to forget this issue, maybe improve the documentation, but that's all.

Time to get version 44 working everywhere. And if we can't with Cairo, then we'll edit the generated PDF files with our own PDF post processor, as we used to do.

liZe added this to the 44 milestone Dec 27, 2018

liZe added a commit that referenced this issue Dec 27, 2018

Member

liZe commented Dec 27, 2018

I've created an avoid_set_font_map branch, I'll try to test it on Windows (I can steal a VM for the next two days).

Contributor

Tontyna commented Dec 27, 2018

Does the current version work when commenting out the pango_context_set_font_map call?

When commented out GTK stays with the dot. As long as WeasyPrint's _locale_float() produces a dot the numbers (and hyperlinks and bookmarks) are fine. (Assuming that no other thread sharing the DLLs with WeasyPrint induced GTK to activate another locale.)

But pango_context_set_font_map call is prerequisite for the @font-face feature.

@liZe -- at the moment I only have access to my (German) Win7 system where GTK erroneously assumes a locale of en-us and as long as Python's initial C-locale isn't altered everybody works happily with the dot. So I hope your VM is a Windows where GTK gets the French comma.

Contributor

Tontyna commented Dec 27, 2018

Had a look at Adrian's patch and ... hm ... better forget it asap.

I've created an avoid_set_font_map branch

waiting in suspense whether pangoft2.pango_fc_font_map_set_config() will rescue the decimal separator ... I expect it won't.

edit the generated PDF files with our own PDF post processor, as we used to do.

seems the way to go.

liZe added a commit that referenced this issue Dec 27, 2018

Member

liZe commented Dec 27, 2018

So, here's what I can understand from my test on a French Windows 10.

Version 43 always works. With or without setting the French locale (using a comma as decimal separator) in the terminal using set, with or without setting the locale in Python, it just works.

Actually, Cairo always needs dots as decimal separator.

Current master and avoid_set_font_map branches don't work when I set the locale in Python, as it makes _local_float return commas that are not accepted by Cairo.

Calling or not calling pango_context_set_font_map or pango_cairo_font_map_set_default doesn't change anything for me. Cairo only works with dots.


@Tontyna The only thing that is clear to me now from this result and from your comments is that using dots on Windows seems to work more often, and that setting the locale with Python on Linux and macOS changes the decimal separator used in the Python script (of course) but also in Cairo.

(And your tutorial for Windows is awesome 👏.)

I've updated avoid_set_font_map. It should now:

  • work on platforms where version 43 was working,
  • work on POSIX platforms when locale is set in Python scripts, and
  • call pango_cairo_font_map_set_default instead of pango_context_set_font_map, trying to fix Windows versions where calling pango_context_set_font_map changes the locale used by Cairo.

waiting in suspense whether pangoft2.pango_fc_font_map_set_config() will rescue the decimal separator ... I expect it won't.

As pango_cairo_font_map_set_default only changes the default map for the current thread, I really hope that it doesn't work so that we don't introduce other crazy bugs in multithreaded applications.

I'll try tomorrow on my Windows VM.

edit the generated PDF files with our own PDF post processor, as we used to do.

seems the way to go.

Is this the first step of our own PDF generator replacing Cairo?

Contributor

Tontyna commented Dec 28, 2018

Actually, Cairo always needs dots as decimal separator.

Oh, oh, oh, I fear your VM is one of the IMO buggy systems where GTK is unable to detect the actual locale, even ignores LC_ALL and instead decides that en-us should be applied -- what do you receive from pango_language_get_default()?

My impression is that on newer (better, non-buggy?) Windows systems GTK gets it right (yes, gets the floating point format the user is used to use, HA!) and we are doomed or forced to set LC_ALL and hope all threads and all subprocesses agree with that setting.

Whether the dot-affine Windows is more frequent than the proper-locale one is a question we cannot decide yet. Not enough data:

  • Me: knowing 1 dot-affine and 2 proper-locale systems, 1 VM switched from dot-affine to proper-locale after upgrading its host from Win8 to Win10.
  • @liZe: 1 dot-affine
  • @JohannesMunk: at least one of each kind
  • @Niecke: 1 proper-locale

Will have access to my proper-locale computers on January 14 again. Until then I don't believe that your patch solves the problem.

Is this the first step of our own PDF generator replacing Cairo?

Oh yes! And no Pango, no GTK, no DLL hell, no C! Great!

Contributor

Tontyna commented Dec 28, 2018

Tonight's memory flash: The dot-affine Windows is a decade-old well-known bug. I didn't recognize it at once because it struck me when programming in Delphi.

@liZe: to turn your VM into a proper-locale one, raising CAIRO_STATUS_TAG_ERROR, you change the current Date/Time settings to something different, save the changes, then go back to Date/Time settings, and change it back to French again. (I guess and hope this works for the number format, too)

See https://stackoverflow.com/questions/4204786/strange-regional-character-datetime-issue-with-delphi7

Member

liZe commented Dec 28, 2018

  • 1 VM switched from dot-affine to proper-locale after upgrading its host from Win8 to Win10

Seriously…

what do you receive from pango_language_get_default()?

And it's… 💥 fr-fr 💥.

So, even when LC_ALL is set to fr-fr, even if Python's locale is set to fr-fr, even if Pango's default language is fr-fr, Cairo wants dots on my computer.

Unsurprisingly, avoid_set_font_map now works with cbaa7fa.

Oh yes! And no Pango, no GTK, no DLL hell, no C! Great!

It sounds crazy, but, who knows…

Contributor

Tontyna commented Dec 28, 2018

🎉 👍 💥

Apart from that: I still don't believe it.
Since your VM never and under no circumstances, not even when calling pango_context_set_font_map, neither wanted a comma nor failed with CAIRO_STATUS_TAG_ERROR it doesn't need a workaround to avoid commas, quite the contrary.

At maximum I'll believe that avoiding pango_context_set_font_map makes GTK stay with the C locale. I'll check that as soon as I gain access to my proper-locale systems.

Member

liZe commented Dec 28, 2018

nor failed with CAIRO_STATUS_TAG_ERROR

It failed with CAIRO_STATUS_TAG_ERROR when I gave commas (before cbaa7fa and when locale was set in Python).

it doesn't need a workaround to avoid commas, quite the contrary.

Sure.

At maximum I'll believe that avoiding pango_context_set_font_map makes GTK stay with the C locale.

As version 43 works perfectly too, I believe that pango_context_set_font_map doesn't change the locale on my computer.

I'm trying to find why…

Contributor

Tontyna commented Dec 28, 2018

As I said: Your VM is one of the dot-affine buggy Windows systems.

Unless you edit your Date/Time/Number settings as described above you can turn upside down or whatever, without impact on the dot.

Don't know how to (temporarily) activate the proper-locale state in GTK or via Python; in Delphi we called the Windows API function SetThreadLocale(LOCALE_USER_DEFAULT) to correct the mistake during the lifetime of our applications.

(And no, I can't adjust the settings on my buggy dot-affine dev machine, it's the last within our office where I can check whether our workaround really works.)

Contributor

Tontyna commented Dec 28, 2018

...and I hope editing the Date/Time/Number settings indeed terminates the dot-affinity. I never practiced that solution myself.

Member

liZe commented Dec 28, 2018

...and I hope editing the Date/Time/Number settings indeed terminates the dot-affinity.

Of course, it doesn't.

I swear I've tried to change everything in the date/time/number settings, language settings, country settings… Numbers and dates change everywhere in other applications, but my Cairo version just wants dots.

Contributor

Tontyna commented Dec 28, 2018

OMG and pango_language_get_default returns fr-fr, that's fantastic! I became a programmer because I liked the machine doing what I told her ... oh golden olden days ...

Contributor

Tontyna commented Dec 28, 2018

HA! What's your Cairo version? 😏
Inadvertently applied Adrian's patch, huh?

Contributor

Tontyna commented Dec 28, 2018

Standard question to our customers in suchlike situations: Did you restart your computer after changing the settings? If no, then please do so.

JohannesMunk commented Dec 28, 2018

Hey you 2! I know that's probably not your style, but why don't you grab a sledgehammer and get rid of this problem?

Do the conversion once with everything as is, which works OK on most systems. But if the function call fails, handle that by trying again, first with all commas replaced by dots and then the other way around.

?

Contributor

Tontyna commented Dec 28, 2018

Oh yeah! SLEDGEHAMMER 💣 💥 🔥 ⚡️ 🏴 (damnit, all those emojis are ugly sweetish)

Oh no, me being a nonviolent being 🐑

@JohannesMunk: what about thousands separators and other potential atrociousnesses when it comes to number formatting?

The only universal sledgehammer so far is: use a dot, possibly in combination with SET LC_ALL=C

JohannesMunk commented Dec 28, 2018

do you have the number strings one by one? if so thousand separators should easily be fixable=removable by a basic regular expression. and you don't need to put them back in..
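
A rough sketch of that normalization, under the assumption that the separators used to produce the string are known in advance. It uses plain `str.replace` rather than a regular expression, and `to_dot_decimal` is a hypothetical helper, not something from WeasyPrint or Cairo.

```python
def to_dot_decimal(text, decimal_point=',', thousands_sep='.'):
    """Rewrite one localized number string to use a dot and no grouping."""
    # Drop the grouping characters first so they cannot be confused with
    # the decimal separator.
    if thousands_sep:
        text = text.replace(thousands_sep, '')
    if decimal_point != '.':
        text = text.replace(decimal_point, '.')
    return text


print(to_dot_decimal('1.234,50'))                # 1234.50
print(to_dot_decimal('12,5', thousands_sep=''))  # 12.5
```

This only helps when the string is still in our hands; it does nothing about which format Cairo expects on a given system.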

Member

liZe commented Dec 28, 2018

Inadvertently applied Adrian's patch, huh?

Did you imagine me patching and compiling Cairo on Windows? 😉

GTK+ for Windows has been updated on December 19th, maybe there's something new in this release…

Member

liZe commented Dec 28, 2018

do you have the number strings one by one? if so thousand separators should easily be fixable=removable by a basic regular expression. and you don't need to put them back in..

Our problem is that the error occurs when we render the document, not when we generate the strings. When we're able to catch the exception, it's already too late to change the string.

Did you restart your computer after changing the settings? If no, then please do so.

I did between yesterday and today. Unfortunately, I'm now on holidays and can't access the Windows VM anymore.

Contributor

Tontyna commented Dec 28, 2018

Our real problem is that we neither can detect (and adapt to) the required number format nor can we force Cairo to switch to a format of our choice.

We could somehow

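import sys

# Pseudocode: probe once which decimal separator Cairo accepts.
if sys.platform.startswith('win'):
    try:
        # create a dummy document with C-numbers (dots); not implemented here
        float_format = 'dots'
    except Exception:
        # Cairo rejected the dots, so assume it wants commas
        float_format = 'comma'
else:
    float_format = 'locale'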

but that's ugly. No. Not satisfying at all.

Contributor

Tontyna commented Dec 29, 2018

Have been reading various C sources, contemplated DLLs and their environments, did some experiments on my Win7 machine, formerly known as dot-affine, with strange results.

Depending on the versions of the 30+ GTK libraries and the set LC_ALL, Cairo (resp. sscanf) is happy with the dot or fails with the famous CAIRO_STATUS_TAG_ERROR unless I feed it with a comma.

For example SET LC_ALL=de_de insists on commas while SET LC_ALL=de-de wants a dot. In both cases pango.pango_language_get_default() tells me that the language is de-de.
Similar effect with LC_ALL=it_doesnt_matter (comma) and LC_ALL=it-doesnt-matter (dot), while both, LC_ALL=i_dont_care and LC_ALL=i-dont-care want the dot.
Interestingly nobody wanted a thousands separator, only FontConfig sometimes warned me about ignoring invalid locale.

My conclusion is that the locale(-string) which the GTK-libraries share (?) either never reaches the Microsoft C libraries (the code that implements sscanf) or reaches it crumbled with unpredictable effects on the number format. And the culprit seems to be ??? dunno. Only thing I know for sure: Cairo's use of sscanf is a dumb typo.

Do-it-ourselves or use the sledgehammer seems to be the choice.

Member

liZe commented Dec 29, 2018

Do-it-ourselves or use the sledgehammer seems to be the choice.

Or…

As Adrian says (and I think he's right), %f doesn't insert thousands separators. As the only problem we have is the decimal separators (right?), we can just remove the decimal part. The greatest error we can have is 0.5pt (0.18mm), maybe that's an error we can live with.
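
The arithmetic behind that bound, as a hedged sketch. This is not the code from the actual fix: `integer_rect` is a made-up helper, rounding (rather than truncating) is an assumption consistent with the 0.5pt figure, and the bracketed output format is only an assumption about how a rectangle attribute might be written.

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4


def integer_rect(x, y, width, height):
    """Format a rectangle with whole numbers only, so that no decimal
    separator (dot or comma) ever appears in the string."""
    return '[{} {} {} {}]'.format(*(round(v) for v in (x, y, width, height)))


# Rounding to the nearest integer moves a coordinate by at most half a point.
max_error_mm = 0.5 * MM_PER_INCH / POINTS_PER_INCH

print(integer_rect(10.4, 20.6, 100.49, 12.51))  # [10 21 100 13]
print(round(max_error_mm, 2))                   # 0.18
```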

Contributor

Tontyna commented Dec 29, 2018

just remove the decimal part

Wow, yes! Of course! Simple, clean, brilliant!!

Nobody will ever notice hyperlinks shifted by 0.18mm .

liZe closed this in cb87108 Dec 29, 2018

Member

liZe commented Dec 29, 2018

Here's our nonviolent solution 🐑.

Contributor

Tontyna commented Dec 29, 2018

Works like a charm.

Though (at high zoom levels) the shift is perceptible in the PDF viewer, without knowing about the trick I wouldn't pay attention.
