Fix encodebase64 and decodebase64 filters #7683
The documentation for encodebase64 says that the input is treated as binary data, but in fact the input is being treated as text data, with an extra UTF-8 encoding step being performed first. Likewise, the decodebase64 documentation says that it outputs binary data, but in fact it will do a UTF-8 decoding step before producing output, which will in fact garble binary data. This commit changes the behavior of encodebase64 and decodebase64 to match what the documentation says they do. It also adds an optional `text` suffix to both filters to keep the current behavior. Finally, an optional `urlsafe` suffix is added to both filters to allow them to use the "URL-safe" variant of base64 (using `-` instead of `+` and `_` instead of `/`).
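The difference between the two behaviours described above can be illustrated with a small sketch. It assumes Node.js (where `Buffer` is a global), and the four function names are hypothetical, chosen for illustration only; they are not TiddlyWiki's actual implementation.

```javascript
// Sketch only: these helper names are hypothetical, not TiddlyWiki's API.
// Assumes Node.js, where Buffer is available as a global.

// "Text" mode: the string is first encoded to UTF-8 bytes, then to base64.
function encodeTextBase64(str) {
  return Buffer.from(str, "utf8").toString("base64");
}

// "Binary" mode: each character code (0-255) is taken as one raw byte.
function encodeBinaryBase64(binstr) {
  return Buffer.from(binstr, "latin1").toString("base64");
}

// Text-mode decoding runs a UTF-8 decode over the decoded bytes ...
function decodeTextBase64(b64) {
  return Buffer.from(b64, "base64").toString("utf8");
}

// ... whereas binary-mode decoding returns one character per byte, unchanged.
function decodeBinaryBase64(b64) {
  return Buffer.from(b64, "base64").toString("latin1");
}

const input = "\u00e9"; // a single character with code 0xE9
console.log(encodeTextBase64(input));    // "w6k=" (two UTF-8 bytes: C3 A9)
console.log(encodeBinaryBase64(input));  // "6Q==" (one raw byte: E9)
console.log(decodeBinaryBase64("6Q==")); // the original character, intact
console.log(decodeTextBase64("6Q=="));   // U+FFFD: the lone byte E9 is not valid UTF-8
```

The last line shows the garbling: a byte sequence that is not valid UTF-8 comes back as the replacement character when a UTF-8 decoding step is applied to it.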
2c45696 to f338eae
Turns out a little more than this is going to be needed.
f338eae to dadf6a4
Hi @rmunn, I feel that we should preserve the existing behaviour to minimize the impact on end users and update the documentation accordingly.
Except that the existing behavior (expecting text input) is rarely useful AFAICT. Most uses of |
@rmunn we have a strict backwards compatibility policy to not break the behaviour of wikitext code that might exist in the wild. Is this change sufficiently important to break backward compatibility @Jermolene?
Does the backwards compatibility mandate extend to bugs? Because when the documentation says the wiki does one thing, but the actual behavior is something else, I think that's a bug and the behavior should be changed to match what the documentation says.
P.S. Here's what I wrote in the related bug report at #7626 (comment):
Thanks @rmunn @saqimtiaz on balance I would prefer to maintain backwards compatibility here, and update the docs. Generally, I don't think the quality of our docs is high enough for us to regard the docs as the single source of truth; for us, it is still the code.
Okay. When I get back around to this PR (I'll probably finish the if/then/else and list join PRs first), I'll switch the default to text mode and add a "binary" suffix for turning off the UTF-8 encoding step. That will preserve the existing behavior by default. Then I'll update the documentation accordingly.
Have to use String.replace with a global regex instead
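As an illustration of the technique this commit message names, here is a minimal sketch of converting between the standard and URL-safe base64 alphabets with `String.replace` and global regexes. The helper names `toUrlSafe` and `fromUrlSafe` are hypothetical and are not taken from the PR's code; the thread does not say which approach the regex replaced.

```javascript
// Sketch only: helper names are hypothetical, not the PR's actual code.

// Standard base64 -> URL-safe variant: "+" becomes "-", "/" becomes "_".
function toUrlSafe(b64) {
  return b64.replace(/\+/g, "-").replace(/\//g, "_");
}

// URL-safe variant -> standard base64: "-" becomes "+", "_" becomes "/".
function fromUrlSafe(b64) {
  return b64.replace(/-/g, "+").replace(/_/g, "/");
}

// With a plain string pattern, String.replace swaps only the FIRST match,
// which is why the regex needs the "g" (global) flag.
console.log("a+b+c".replace("+", "-"));   // "a-b+c"
console.log("a+b+c".replace(/\+/g, "-")); // "a-b-c"

console.log(toUrlSafe("+/8="));   // "-_8="
console.log(fromUrlSafe("-_8=")); // "+/8="
```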
@Jermolene - Got back around to fixing the encodebase64 and decodebase64 filters. Please let me know what you think. I struggled a little with writing the documentation (should I assume people know what UTF-8 is?), so if you have suggestions to improve it they would be welcome. But the code works, and I've included unit tests to ensure it continues to work. Once this is merged, a couple places in the code that use |
Thank you @rmunn, looks good. The docs definitely don't need to explain base64 and utf8 but it might be useful to link to good, durable sources for further information.
Excellent, I think it's probably reasonable to include those changes in this PR?
Source for UTF-8: https://developer.mozilla.org/en-US/docs/Glossary/UTF-8
Since window.btoa() is not available under Node.js, we'll replace all uses of it with the $tw.utils.base64encode() function that now works correctly for binary data.
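One way to avoid depending on `window.btoa()` is to detect the environment at call time. The sketch below is an assumption about how such a helper could look; `base64EncodeBinary` is a hypothetical name, and this is not the actual body of `$tw.utils.base64encode()`.

```javascript
// Sketch only: base64EncodeBinary is a hypothetical helper, not TiddlyWiki's code.
// Input is a "binary string": every character code must be in the range 0-255.
function base64EncodeBinary(binstr) {
  if (typeof Buffer !== "undefined") {
    // Node.js path: "latin1" maps each character code to exactly one byte.
    return Buffer.from(binstr, "latin1").toString("base64");
  }
  // Browser path: btoa() also expects one byte per character.
  return btoa(binstr);
}

console.log(base64EncodeBinary("\u00ff\u0000")); // "/wA=" (bytes FF 00)
```

Both branches treat the input as raw bytes, so neither adds a UTF-8 encoding step.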
Done. The one in the GitHub saver now encodes in text mode (I double checked and that's the correct way to encode an |
Done. There was already a link to the Base64 definition in MDN's glossary so I just added a link to the UTF-8 definition.
Hi @rmunn I just came across this article which would have helped me in the first place: https://web.dev/articles/base64-encoding Do we have tests to check for the lone surrogate pair problem that they mention?
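The lone surrogate problem can be reproduced in a few lines. This is a sketch of the kind of check a test might make, not a test from TiddlyWiki's suite; `survivesUtf8RoundTrip` is a hypothetical name, and the sketch assumes the standard `TextEncoder` and `TextDecoder` globals (available in browsers and in current Node.js).

```javascript
// Sketch only: survivesUtf8RoundTrip is a hypothetical helper for illustration.
// Returns true if a string is unchanged after encoding to UTF-8 and back.
function survivesUtf8RoundTrip(str) {
  const bytes = new TextEncoder().encode(str);
  // ignoreBOM: true keeps a leading U+FEFF instead of silently stripping it.
  const decoded = new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
  return decoded === str;
}

const fullPair = "\uD83D\uDE00"; // a complete surrogate pair (one emoji)
const lonePart = "\uD83D";       // only the first half of that pair

console.log(survivesUtf8RoundTrip(fullPair)); // true
console.log(survivesUtf8RoundTrip(lonePart)); // false

// TextEncoder replaces the lone surrogate with U+FFFD (bytes EF BF BD),
// so the original code unit cannot be recovered after encoding.
console.log(Array.from(new TextEncoder().encode(lonePart))); // [ 239, 191, 189 ]
```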
Hi @rmunn I am seeing the following errors when running the test suite in the browser which may be related to this PR. Visit https://tiddlywiki.com/prerelease/test.html and then scroll down to see the errors:
@Jermolene - #7814 will fix the mistake I made. (I didn't realize that |
Thank you, much appreciated.
No problem.
Fixes #7626.
The documentation for the `encodebase64` filter operator says that the input is treated as binary data, but in fact the input is being treated as text data, with an extra UTF-8 encoding step being performed first.

Likewise, the `decodebase64` documentation says that it outputs binary data, but in fact it will do a UTF-8 decoding step before producing output, which will in fact garble binary data.

This PR changes the behavior of `encodebase64` and `decodebase64` to match what the documentation says they do. It also adds an optional `text` suffix to both filters to keep the current behavior.

Finally, an optional `urlsafe` suffix is added to both filters to allow them to use the "URL-safe" variant of base64 (using `-` instead of `+` and `_` instead of `/`).