New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Failing to decompress after strFromU8 #16
Comments
Yes, this behavior is known. The issue is that If you'd like to read a bit on what UTF-8 is in binary and how it works under the hood, this RFC may be of interest. Effectively, we need two separate encoders because UTF-8 binary has to be treated differently: individual characters can range from 1 to 4 bytes. |
Thanks for the clarifications - I arrived at pretty much the exact same conclusion here (not all uint8 arrays constitute valid utf-8 sequences). My usecase was trying to send this data via xhr - encoding the data via latin1 ended up wasting bytes so where I ended up was along the lines of: // data is the result of e.g. gzipSync
const req = new XMLHttpRequest()
req.open(method, url, true)
req.send(new Blob([data.buffer], { type: 'text/plain' })) This seemed to work well on both the client and server side. Hopefully this helps the next fool trying something similar :) |
Yep, that's why I mentioned that binary strings are inefficient in the docs. Perhaps I could make that more clear. Thanks for the question! By the way, have you considered using the asynchronous API? Since you're using XHR asynchronously anyway, it shouldn't be too hard to implement and would avoid freezing the browser. Of course, that doesn't apply if what you're uploading is only a few kB, but I'm recommending it just in case. |
Yeah - the first time through the readme/docs the purpose of latin1 encoding was unclear so it might be worth giving it another pass. That said - great job with the docs + benchmarks, everything was well laid out and made sense. I was implementing on top of an existing legacy codebase so using the async api sadly wasn't that simple - but hoping to do so as I have cycles to refactor/simplify. |
Thanks for the feedback, I'll try to work on the docs a bit more for that. If you have a specific format in mind (e.g. an interactive website or something) let me know, I'm quite terrible at documentation so any suggestions are appreciated. I hope fflate works well in your codebase! |
Is there any way around this without a 200% increase in compressed size? In my testing, the compressed strings seem the same so far in terms of characters, so I'm a bit confused. I think it should probably be documented in the README that this is an issue, even if it's not specific to only fflate. Thanks for the great work @101arrowz! |
I'm not entirely sure what you're asking, could you post some sample code? |
Sorry for the confusion and thanks for the speedy reply @101arrowz! In the README it says:
But in my testing, using And I was thinking the text in this comment might be better served outside of the code sample, as it seems pretty crucial and it's easy to overlook in the codeblock until you're already a ways into trying to implement Sorry if any of this doesn't make sense, it's been a long day. Thank you again for your great work! |
Binary strings are really a hack to represent byte arrays with strings. The code points that map to the characters in a binary string correspond to numbers from 0-255 representing the value of the given byte. In other words, binary strings can represent arbitrary binary data. In contrast, a normal UTF-8 string is just that - a string. It can't represent arbitrary data, only valid Unicode characters. So basically, if you store/transfer the string you generate with any sort of protocol that encodes strings with Unicode (HTTP, Local Storage, cookies, etc.), a binary string will take more bytes than the actual data it encodes. A binary string is really useful for compatibility with old JS where |
Thanks for the library! Been testing it out and ran into an interesting error:
The output for version 0.4.1 in node.js is:
The same seems to occur for other compression functions as well - tested gzipSync, compressSync. Could there be a problem with
strFromU8
?The text was updated successfully, but these errors were encountered: