UTF-8 compatible export for BeerXML #624
Comments
Thanks for the detailed description. This is indeed not the behaviour we would want. I think there are a couple of improvements we could make in the code. Firstly, we're probably not doing the right thing when we write strings out to the XML file, even in the newest version of the code. It should be straightforward to get Qt to transcode our strings to ISO-8859-1. Secondly, BeerXML is a bit odd in that it requires us to use ISO-8859-1 (see "XML Header" section of http://www.beerxml.com/beerxml.htm). This does support some accents etc. (including most of the ones you'd use in French) but it's not great for all languages. We could at least consider allowing export in UTF-8, i.e. give you the option to create something that is valid XML, and would probably be read OK by programs wanting to parse BeerXML, even though it's not strictly in compliance with BeerXML 1.0. I'll try to have a look at this in the next few weeks. (Bit distracted by working on things that will enable us to support BeerJSON, but hope to reach a natural break-point on that soon.)
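To illustrate the two export options discussed above (transcoding to ISO-8859-1, or optionally writing UTF-8), here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. It is not Brewtarget's code, which is C++/Qt; the helper names `export_recipe_name` and `read_recipe_name` are hypothetical, and the document is reduced to a single `RECIPES/RECIPE/NAME` element for brevity.

```python
import xml.etree.ElementTree as ET


def export_recipe_name(name: str, encoding: str) -> bytes:
    """Serialize a tiny BeerXML-like document in the requested encoding.

    Hypothetical helper for illustration only.
    """
    recipes = ET.Element("RECIPES")
    recipe = ET.SubElement(recipes, "RECIPE")
    ET.SubElement(recipe, "NAME").text = name
    # xml_declaration=True (Python 3.8+) writes the <?xml ... encoding=...?>
    # header, and the payload bytes are produced in that same encoding,
    # so the header and the content cannot disagree.
    return ET.tostring(recipes, encoding=encoding, xml_declaration=True)


def read_recipe_name(data: bytes) -> str:
    """Parse the bytes back, honouring the encoding declared in the header."""
    return ET.fromstring(data).findtext("RECIPE/NAME")


latin1_doc = export_recipe_name("Empâtage", "ISO-8859-1")
utf8_doc = export_recipe_name("Empâtage", "UTF-8")

# Both documents decode to the same text, but the bytes on disk differ.
print(read_recipe_name(latin1_doc), len(latin1_doc))
print(read_recipe_name(utf8_doc), len(utf8_doc))
```

With the ISO-8859-1 output, ElementTree writes any character that the encoding cannot represent (for example "œ") as a numeric character reference, so the file stays valid XML and still round-trips through a conforming parser.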
Thanks for your quick reply! It does indeed state:
But I'm wondering how to interpret those statements. Maybe this is not the place to discuss it (in which case, I apologize for the noise 😄), but could one interpret the statement as "we need the XML header line as per the XML standard", which may look like this: It seems that the Wikipedia page for BeerXML supports this hypothesis. Regarding BeerJSON, that's really nice to read! I was wondering about this myself a few hours ago; it could be a nice feature to have for sharing recipes.
As you'll have seen in the comments in BeerXml.cpp, there are several respects in which the BeerXML 1.0 standard is flawed, including that it is not actually valid XML. Since the standard is no longer actively maintained, we can't get definitive answers on how to interpret the bits that seem badly worded or plain wrong. The sample files I found all had the In the long run, I think a lot of folks want BeerJSON to supplant BeerXML. (AIUI, the data model of BeerJSON is based on a draft BeerXML 2.0 standard that never got off the ground.) One good thing with BeerJSON is that it is defined by its published schema, so it's less ambiguous than BeerXML and much easier to validate.
You just had to summon me. I want beerXML to die. In flames. I will laugh and dance with joy around its funeral pyre. The I have no real guidance on this. It seems a reasonable change, but I simply do not understand the effects enough to say more.
Thanks @mikfire! @bebenlebricolo, I have had a quick look at the code and have some good news for you. I see now that I actually already coded a fix for the I think @mikfire is soon going to do the merge for #617, so once that happens, grab the latest Brewtarget code and give it a try. (If you're really impatient, you could install https://github.com/Brewken/brewken, which already has the fix merged - see line 736 of https://github.com/Brewken/brewken/blob/develop/src/xml/BeerXml.cpp. Brewken will ultimately be "Brewtarget plus features for microbreweries" but at the moment it's where I've been working on Database/BeerJSON stuff before backporting to Brewtarget.)
Neat! This might work; I'll test when I have time. Thanks @matty0ung and @mikfire for your replies. I'll test the new versions when available, or even test the feature on your branch directly; I suppose Brewtarget is quite straightforward to build.
No problem. 👍 BTW, not that it matters, but, since you mention it, ISO-8859-1 does not use 2 character ligatures. It's a single-byte character set -- see https://en.wikipedia.org/wiki/ISO/IEC_8859-1 (or https://fr.wikipedia.org/wiki/ISO/CEI_8859-1 if you prefer it in French!) It's all hangovers from the days of pain before Unicode (and its encodings such as UTF-8) unified the world. There is a whole world of detail and pain that you could get into in investigating these things, but it is largely of historical interest. E.g. modern standards such as JSON are pretty much all UTF-8 so we (probably) won't have to think about character encodings when we do BeerJSON.
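The single-byte point can be checked directly. The following is a small Python sketch (the helper names `encoded_sizes` and `fits_latin1` are made up for this illustration): every character that ISO-8859-1 can represent takes exactly one byte, accented French letters included, whereas UTF-8 uses one byte for ASCII and two or more for everything else.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (bytes in ISO-8859-1, bytes in UTF-8) for the given text."""
    return len(text.encode("iso-8859-1")), len(text.encode("utf-8"))


def fits_latin1(text: str) -> bool:
    """True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# 14 characters, two of them accented: one byte each in ISO-8859-1,
# while UTF-8 spends two bytes on each of "à" and "é".
print(encoded_sizes("à l'ébullition"))

# Accented letters such as "â" fit; "œ" and non-Latin scripts do not.
print(fits_latin1("Empâtage"), fits_latin1("cœur"), fits_latin1("日本"))
```

This is also why ISO-8859-1 covers most, but not all, of French: the common accented vowels are there, while "œ" is not.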
Hi (again) @matty0ung, thanks for the insights. I should have had a look at the specification before guessing wildly 😅! This suggests that a BeerJSON implementation will fill all those encoding gaps (and all the pain will hopefully disappear, at least the part caused by language and character-set incompatibilities). I'm closing this issue as the original subject was addressed by the #617 PR.
This might already be covered by another issue, but I discovered that exporting in BeerXML format breaks some character sets.
For instance, I'm using the French alphabet with accents and special characters; after writing a beer recipe and exporting it to BeerXML, I found that in the XML itself I had issues with the ISO-8859-1 encoding, as it encodes only the first 256 characters available in UTF-8 (so no accents). Another difference is that UTF-8 is encoded on two octets whereas ISO-8859-1 only uses one.
Misencoding for western languages (?)
The XML header indicates:
and further down the file I have (opening the file in UTF-8 mode):
"Empâtage" is the French translation of "Mashing", but it seems the accents are not preserved.
Another example:
Expected: "Mandarina Bavaria à l'ébullition pendant 10.000 min"
Reopening the file using the ISO-8859-1 encoding gives the following results:
This encoding splits each special character into 2 bytes and tries to decode it with the ISO-8859-1 character set, so it does not account for the extra characters.
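The symptom described here, where each accented character shows up as two characters, is what one gets when UTF-8 bytes are decoded as ISO-8859-1. Whether that is exactly what happened in the exported file is an assumption; the Python sketch below only reproduces the general effect, using hypothetical helper names.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo the misreading: recover the original bytes, decode as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


garbled = misread_utf8_as_latin1("Empâtage")
# "â" is the two bytes C3 A2 in UTF-8; read as ISO-8859-1 they become
# the two separate characters "Ã" and "¢".
print(garbled)          # EmpÃ¢tage
print(repair(garbled))  # Empâtage
```

Since ISO-8859-1 assigns a character to every byte value, this kind of misreading never raises an error; it silently produces garbled text, which is why it tends to go unnoticed until someone opens the file.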
Local configuration
Brewtarget version: 2.3.1 (Arch Linux x86_64, installed from the AUR via yay)
I've updated Brewtarget to the latest version available in the Arch AUR (2.3.1-3) and the issue still appears to be there.