CAIRO_STATUS_TAG_ERROR: b'invalid tag name, attributes, or nesting' #742
Comments
This error has been discussed at length in PR #665 -- it seemed to be a special case on Win 8.1 with a German locale. You're obviously not on Windows, but probably in Germany ... so it might nevertheless be a (THE?) locale problem. I dunno the commands to check or set the language on your OS, but maybe defining the environment variable
|
I've (finally!) got this error on my Linux system today for the first time. I have to check what's going on… |
Tontyna was right: since I needed the German date format for the application, I set the container to de_DE.UTF-8. Sorry for ignoring the other issue; I thought it was only Windows-related and did not realize that it could be the locale. Using |
He (almost) always is!
As said in #665, the problem is probably caused by the |
OK, the problem appears for me when I use Unfortunately, it doesn't explain the difference between Windows versions. |
Well… We can consider it as a bug in WeasyPrint, as Cairo's documentation uses locale-dependent |
But we can't solve it in a satisfying way within WeasyPrint. At least I didn't find one. I tried that. No way to detect whether Cairo expects a dot or a comma or whatever. Think of thousands separators... Of course, we could Maybe it's not a bug in Cairo (HA! because their documentation tells us about this bad decision! Does it?) -- but as @liZe said in #665:
And we shouldn't forget that maybe it's Pango that forces Cairo to want a comma. BTW: I'm glad that @liZe can reproduce the feature, because my Win8 system was upgraded to Win10 recently. Ah, and your statement
makes me happy, but it's wrong because "He" is a "She" 😁 |
Oh my god.
When fixing bugs in WeasyPrint caused by poor choices from Cairo devs is impossible because of Python bugs, it really, really is annoying. Let's find a workaround for the workaround. |
According to the documentation (and to the code), Cairo seems to rely on locale-dependent strings for tag attributes. It's for sure a bad idea, but at least try to follow this rule and see if it fixes our problems. Related to #742.
And one more time, I'm wrong, she's right. QED 😉. (For real: I'm really sorry about my bad assumption…)
Oh, I didn't remember that detail. What I did in c34b128 is at least to rely on the current locale when generating tag attributes for Cairo. It shouldn't break anything for configurations that used to work (at least tests pass), and it may fix the problem for other users (at least it does for me). Of course, the real fix is probably to use locale-independent strings for tag attributes in Cairo. I'll open an issue on Cairo's bug tracker. @Niecke Is it possible for you to test the current master branch? |
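To make "rely on the current locale" concrete, here is a minimal sketch of the difference between locale-aware and locale-independent float formatting in Python. It is not the code from c34b128; `format_coordinate` and `decimal_point` are hypothetical helper names, and `de_DE.UTF-8` is only an example locale that may not be installed.

```python
import locale


def format_coordinate(value, use_locale):
    """Format a float with the active LC_NUMERIC rules, or with a plain dot."""
    if use_locale:
        # locale.format_string honours the decimal point of the active locale.
        return locale.format_string('%f', value)
    # Plain %-formatting in Python always uses a dot, whatever the locale is.
    return '%f' % value


def decimal_point():
    """Return the decimal separator of the locale that is currently active."""
    return locale.localeconv()['decimal_point']


if __name__ == '__main__':
    # 'de_DE.UTF-8' is an assumption: it only works where that locale exists.
    for name in ('C', 'de_DE.UTF-8'):
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
        except locale.Error:
            print(name, 'is not available on this system')
            continue
        print(name, repr(decimal_point()),
              format_coordinate(12.5, True), format_coordinate(12.5, False))
    # Go back to the default numeric locale of a fresh Python process.
    locale.setlocale(locale.LC_NUMERIC, 'C')
```

Where a comma locale is available, the locale-aware variant should switch to a comma while the plain variant keeps the dot, which is exactly the mismatch this thread is about.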
|
@liZe I will try to test it tomorrow. Thanks already for all your support. |
Maybe c34b128 solves the bug on Linux, but now my German Windows 7, which was fine before, crashes. Probably @JohannesMunk's Win 10 systems will crash, too. Edit: Only one of my German Win7 systems (a virtual box within my former Win8) crashes with the patch, another non-virtual Win7 still doesn't crash. Especially remarkable: My Windows 8 computer, the one that was upgraded to Windows 10, crashes with and without the patch. Great! Can investigate this funny feature further 😬 Would be interested in how @JohannesMunk's Windows 8 behaves with the patch -- I guess it still crashes. The only thing I know for sure is that environment variables that I change or set within Python scripts never reach the DLLs; they inherit the initial environment of the initial Python process. @liZe -- your |
@liZe -- I don't get it. How is |
Contemplating further I see 3 ways to fix the bug:
Nr. 1 is ugly, since you never know who else tries to force their own language; not sure whether nr. 2 will work (similar reasons and maybe incompatibilities in locale aliases); nr. 3 is the flawless favourite, though the only thing we can do to make it come true is praying. |
I now tested ddf10d2. Using Using |
…
It was the Python function. Here's a summary on my computer:
Oh. On Linux, the only thing I know for sure is that environment variables I set on my system outside the Python scripts never reach shared libraries. Of course, that's not how @Niecke's system seems to work. We're doomed 🤖.
It was Python code, so it should make sense now.
That's what I thought too, but after reading this chapter on Python 3.7 changelog I'm not sure anymore ([…] implied by the use of the default C or POSIX locale on non-Windows platforms.)
I agree.
Not sure at all 😄.
It's the real fix, but then we have to find a workaround for Cairo versions between 1.15.x and 1.16.0 and apply this workaround only for buggy versions as Cairo's fix is not backward compatible 😱.
@Niecke Does it work without |
Yes we are. Anyway. Any OS, any locale set or not set. Unless we know for sure that there is a Cairo that uses dots, we should try to provide the expected number format. On Linux it looks like On Windows it looks like instead we have to ask Pango for its PangoLanguage* pango_language_get_default (void) and pray that an alias is returned that Python knows -- Pango's language is de-de on my (crashing) Win8 computer, on my non-crashing Win7 computer it's a bug and en-us, while on both systems I get German_Germany.1252 when I ask Python for the current user locale when started from a command prompt. And it looks like Cairo switches its language within the first call to Couldn't yet find a function to tell Pango that, for heaven's sake & while WeasyPrint is rendering, it should use C. (Of course, there is the brute-force LC_ALL environment variable). Yes, at least on Windows, we're quite doomed. |
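For the brute-force LC_ALL route, the variable has to be in place before the process that loads the libraries starts, since (as noted earlier in the thread) changes made inside a running script may not reach already-initialised DLLs. A minimal sketch, assuming it is acceptable to do the work in a child process; `run_with_locale` is a hypothetical helper, and whether GTK on Windows honours LC_ALL is not settled in this thread.

```python
import os
import subprocess
import sys


def run_with_locale(code, lc_all='C'):
    """Run a Python snippet in a child process that starts with LC_ALL set."""
    # Copy the parent environment and override LC_ALL for the child only.
    env = dict(os.environ, LC_ALL=lc_all)
    result = subprocess.run(
        [sys.executable, '-c', code],
        env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == '__main__':
    # The child sees LC_ALL from its very first instruction.
    print(run_with_locale('import os; print(os.environ["LC_ALL"])'))
```

The same idea works from a shell or a Dockerfile by exporting the variable before Python is launched.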
OK, it's time to take a decision. This bug broke some of my production web services, I'm really angry. 😠 I've spent a loooooot of time trying to find a solution, but there's no real solution. Here's what I think now:
Time to get version 44 working everywhere. And if we can't with Cairo, then we'll edit the generated PDF files with our own PDF post processor, as we used to do. |
I've created an |
When commented out GTK stays with the dot. As long as WeasyPrint's But @liZe -- at the moment I only have access to my (German) Win7 system where GTK erroneously assumes a locale of en-us, and as long as Python's initial C-locale isn't altered everybody works happily with the dot. So I hope your VM is a Windows where GTK gets the French comma. |
Had a look at Adrian's patch and ... hm ... better forget it asap.
waiting in suspense whether
seems the way to go. |
So, here's what I can understand from my test on a French Windows 10. Version 43 always works. With or without setting the French locale (using a comma as decimal separator) in the terminal using Actually, Cairo always needs dots as decimal separator. Current Calling or not calling @Tontyna The only thing that is clear to me now from this result and from your comments is that using dots on Windows seems to work more often, and that setting the locale with Python on Linux and macOS changes the decimal separator used in the Python script (of course) but also in Cairo. (And your tutorial for Windows is awesome 👏.) I've updated
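The observation that setting the locale from Python also changes what a C library sees can be checked without Cairo. A minimal sketch, assuming Linux or macOS where the C library can be loaded through `ctypes`; `c_atof` is a hypothetical helper, and `de_DE.UTF-8` is an example locale that may not be installed. It says nothing about how the GTK DLLs behave on Windows.

```python
import ctypes
import ctypes.util
import locale


def c_atof(text):
    """Parse bytes with the C library's atof, which follows LC_NUMERIC."""
    # On POSIX, CDLL(None) also works if find_library returns None.
    libc = ctypes.CDLL(ctypes.util.find_library('c'))
    libc.atof.restype = ctypes.c_double
    libc.atof.argtypes = [ctypes.c_char_p]
    return libc.atof(text)


if __name__ == '__main__':
    locale.setlocale(locale.LC_NUMERIC, 'C')
    # With the C locale, parsing stops at the comma.
    print('C      ', c_atof(b'1.5'), c_atof(b'1,5'))
    try:
        locale.setlocale(locale.LC_NUMERIC, 'de_DE.UTF-8')  # example locale
        # With a comma locale, the C library now stops at the dot instead.
        print('de_DE  ', c_atof(b'1.5'), c_atof(b'1,5'))
    except locale.Error:
        print('de_DE.UTF-8 is not available on this system')
    finally:
        locale.setlocale(locale.LC_NUMERIC, 'C')
```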
As I'll try tomorrow on my Windows VM.
Is this the first step of our own PDF generator replacing Cairo? |
Oh, oh, oh, I fear your VM is one of the IMO buggy systems where GTK is unable to detect the actual locale, even ignores My impression is that on newer (better, non-buggy?) Windows systems GTK gets it right (yes, gets the floating point format the user is used to use, HA!) and we are doomed or forced to set Whether the dot-affine Windows is more frequent than the proper-locale one is a question we cannot decide yet. Not enough data:
Will have access to my proper-locale computers on January 14 again. Until then I don't believe that your patch solves the problem.
Oh yes! And no Pango, no GTK, no DLL hell, no C! Great! |
Tonight's memory flash: The dot-affine Windows is a decade-old well-known bug. I didn't recognize it at once because it struck me when programming in Delphi. @liZe: to turn your VM into a proper-locale one, raising See https://stackoverflow.com/questions/4204786/strange-regional-character-datetime-issue-with-delphi7 |
Seriously…
And it's… 💥 So, even when Unsurprisingly,
It sounds crazy, but, who knows… |
🎉 👍 💥 Apart from that: I still don't believe it. At maximum I'll believe that avoiding |
It failed with
Sure.
As version 43 works perfectly too, I believe that I'm trying to find why… |
As I said: Your VM is one of the dot-affine buggy Windows systems. Unless you edit your Date/Time/Number settings as described above you can turn upside down or whatever, without impact on the dot. Don't know how to (temporarily) activate the proper-locale state in GTK or via Python; in Delphi we called the Windows API function (And no, I can't adjust the settings on my buggy dot-affine dev machine, it's the last within our office where I can check whether our workaround really works.) |
...and I hope editing the Date/Time/Number settings indeed terminates the dot-affinity. I never practiced that solution myself. |
Of course, it doesn't. I swear I've tried to change everything in the date/time/number settings, language settings, country settings… Numbers and dates change everywhere in other applications, but my Cairo version just wants dots. |
OMG and |
HA! What's your Cairo version? 😏 |
Standard question to our customers in suchlike situations: Did you restart your computer after changing the settings? If no, then please do so. |
Hey you 2! I know that's probably not your style, but why don't you grab a sledgehammer and get rid of this problem? Do the conversion once with everything as is, which works OK on most systems. But if the function call fails, handle that by trying again, first with all commas replaced by dots and then the other way around? |
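The sledgehammer idea could look roughly like this. It is only a sketch under a strong assumption: that the failing call can simply be repeated with a different string, which may not hold if the error only surfaces late in rendering. `render_with_fallback` and `fake_render_dots_only` are hypothetical names, and `ValueError` stands in for whatever the real rendering error is.

```python
def render_with_fallback(render, attribute):
    """Call render(attribute); on failure retry with the separator swapped."""
    candidates = [
        attribute,                    # as generated
        attribute.replace(',', '.'),  # force dots
        attribute.replace('.', ','),  # force commas
    ]
    last_error = None
    # dict.fromkeys drops duplicate candidates while keeping their order.
    for candidate in dict.fromkeys(candidates):
        try:
            return render(candidate)
        except ValueError as error:  # stand-in for the real rendering error
            last_error = error
    raise last_error


def fake_render_dots_only(attribute):
    """Stand-in renderer that rejects commas, to exercise the fallback."""
    if ',' in attribute:
        raise ValueError('invalid tag name, attributes, or nesting')
    return attribute


if __name__ == '__main__':
    print(render_with_fallback(fake_render_dots_only, '[1,5 2,5 30 40]'))
```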
Oh yeah! SLEDGEHAMMER 💣 💥 🔥 ⚡ 🏴 (damnit, all those emojis are ugly sweetish) Oh no, me being a nonviolent being 🐑 @JohannesMunk: what about thousands separators and other potential atrociousnesses when it comes to number formatting? The only universal sledgehammer so far is: use a dot, possibly in combination with |
Do you have the number strings one by one? If so, thousands separators should be easy to fix (= remove) with a basic regular expression. And you don't need to put them back in. |
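A minimal sketch of that regular expression, assuming each number arrives as its own string and the decimal separator is already known (which is the hard part in this thread). `to_c_number` is a hypothetical helper and only handles plain decimal numbers, not exponents.

```python
import re


def to_c_number(text, decimal_sep):
    """Turn one locale-formatted number into dot-decimal without grouping."""
    # Keep digits, signs and the known decimal separator; drop everything
    # else (thousands separators, spaces, non-breaking spaces).
    keep = r'[^0-9+\-' + re.escape(decimal_sep) + r']'
    cleaned = re.sub(keep, '', text)
    return cleaned.replace(decimal_sep, '.')


if __name__ == '__main__':
    print(to_c_number('1.234,56', ','))  # German-style grouping
    print(to_c_number('1,234.56', '.'))  # English-style grouping
```

Without knowing the separator, a string like `1.234` stays ambiguous, so this does not remove the need to detect the format.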
Did you imagine me patching and compiling Cairo on Windows? 😉 GTK+ for Windows has been updated on December 19th, maybe there's something new in this release… |
Our problem is that the error occurs when we render the document, not when we generate the strings. When we're able to catch the exception, it's already too late to change the string.
I did between yesterday and today. Unfortunately, I'm now on holidays and can't access the Windows VM anymore. |
Our real problem is that we can neither detect (and adapt to) the required number format nor force Cairo to switch to a format of our choice. We could somehow

```python
import sys

if sys.platform.startswith('win'):
    try:
        # create a dummy document with C-numbers (dot as decimal separator);
        # if rendering it raises, assume Cairo wants commas instead
        float_format = 'dots'
    except Exception:
        float_format = 'comma'
else:
    float_format = 'locale'
```

but that's ugly. No. Not satisfying at all. |
Have been reading various C sources, contemplated DLLs and their environments, did some experiments on my Win7 machine, formerly known as dot-affine, with strange results. Depending on the versions of the 30+ GTK libraries and the set For example My conclusion is that the locale(-string) which the GTK-libraries share (?) either never reaches the Microsoft C libraries (the code that implements Do-it-ourselves or use the sledgehammer seems to be the choice. |
Or… As Adrian says (and I think he's right), |
Wow, yes! Of course! Simple, clean, brilliant!! Nobody will ever notice hyperlinks shifted by 0.18mm . |
Here's our nonviolent solution 🐑. |
Works like a charm. Though (at high zoom levels) the shift is perceptible in the PDF viewer, without knowing about the trick I wouldn't pay attention. |
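The comment describing the trick is cut off in this copy of the thread, so the following is only one reading of the 0.18 mm figure: it assumes the coordinates are rounded to whole PDF points, so that no decimal separator is needed at all. `integer_rectangle` and `max_rounding_shift_mm` are hypothetical names, and the bracketed string is just an illustrative layout.

```python
MM_PER_POINT = 25.4 / 72  # 1 PDF point is 1/72 inch


def max_rounding_shift_mm():
    """Largest shift caused by rounding one coordinate to a whole point."""
    return 0.5 * MM_PER_POINT


def integer_rectangle(x, y, width, height):
    """Format a rectangle with whole numbers only (assumed workaround)."""
    values = (int(round(v)) for v in (x, y, width, height))
    return '[{} {} {} {}]'.format(*values)


if __name__ == '__main__':
    print(integer_rectangle(10.4, 20.6, 100.2, 30.7))
    print(round(max_rounding_shift_mm(), 2), 'mm at most per coordinate')
```

Half a point is about 0.176 mm, which matches the 0.18 mm mentioned above and explains why the shift only shows at high zoom levels.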
Hello,
I encountered a strange error. When I try to render a pdf in my flask app with the following code:
Then I get:
Actually I am using flask_weasyprint, but the error is the same when using weasyprint directly. So far I have not been able to reproduce this with a simple Flask application, but after removing the <h1> and </h1> tags everything's fine.
My Python env consists of:
and the whole app is running in a Docker container which is based on Ubuntu 18.04.
Is there anything I am doing wrong?