Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix encodebase64 and decodebase64 filters #7683

Merged
merged 7 commits into from Oct 18, 2023

Conversation

rmunn
Copy link
Contributor

@rmunn rmunn commented Aug 18, 2023

Fixes #7626.

The documentation for the encodebase64 filter operator says that the input is treated as binary data, but in fact the input is being treated as text data, with an extra UTF-8 encoding step being performed first.

Likewise, the decodebase64 documentation says that it outputs binary data, but in fact it will do a UTF-8 decoding step before producing output, which will in fact garble binary data.

This PR changes the behavior of encodebase64 and decodebase64 to match what the documentation says they do. It also adds an optional text suffix to both filters to keep the current behavior.

Finally, an optional urlsafe suffix is added to both filters to allow them to use the "URL-safe" variant of base64 (using - instead of + and _ instead of /).

The documentation for encodebase64 says that the input is treated as
binary data, but in fact the input is being treated as text data, with
an extra UTF-8 encoding step being performed first.

Likewise, the decodebase64 documentation says that it outputs binary
data, but in fact it will do a UTF-8 decoding step before producing
output, which will in fact garble binary data.

This commit changes the behavior of encodebase64 and decodebase64 to
match what the documentation says they do. It also adds an optional
`text` suffix to both filters to keep the current behavior.

Finally, an optional `urlsafe` suffix is added to both filters to allow
them to use the "URL-safe" variant of base64 (using `-` instead of `+`
and `_` instead of `/`).
@vercel
Copy link

vercel bot commented Aug 18, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
tiddlywiki5 ✅ Ready (Inspect) Visit Preview Oct 18, 2023 3:01pm

Turns out a little more than this is going to be needed.
@saqimtiaz
Copy link
Contributor

This PR changes the behavior of encodebase64 and decodebase64 to match what the documentation says they do. It also adds an optional text suffix to both filters to keep the current behavior.

Hi @rmunn, I feel that we should preserve the existing behaviour to minimize the impact on end users and update the documentation accordingly.

@rmunn
Copy link
Contributor Author

rmunn commented Aug 18, 2023

This PR changes the behavior of encodebase64 and decodebase64 to match what the documentation says they do. It also adds an optional text suffix to both filters to keep the current behavior.

Hi @rmunn, I feel that we should preserve the existing behaviour to minimize the impact on end users and update the documentation accordingly.

Except that the existing behavior (expecting text input) is rarely useful AFAICT. Most uses of encodebase64, I suspect, will be taking binary data. So I thought binary data should be the default behavior.

@saqimtiaz
Copy link
Contributor

@rmunn we have a strict backwards compatibility policy to not break the behaviour of wikitext code that might exist in the wild. Is this change sufficiently important to break backward compatibility @Jermolene ?

@rmunn
Copy link
Contributor Author

rmunn commented Aug 18, 2023

Does the backwards compatibility mandate extend to bugs? Because when the documentation says the wiki does one thing, but the actual behavior is something else, I think that's a bug and the behavior should be changed to match what the documentation says.

@rmunn
Copy link
Contributor Author

rmunn commented Aug 18, 2023

P.S. Here's what I wrote in the related bug report at #7626 (comment):

Since the encodebase64 filter was just added in release 5.2.6 in March 2023, five months ago, I doubt that many people have started using it yet, especially because it's buggy at the moment. So I don't think there are a lot of wikis out there relying on the current behavior, and I think it's safe to change the behavior.

@Jermolene
Copy link
Owner

Thanks @rmunn @saqimtiaz on balance I would prefer to maintain backwards compatibility here, and update the docs.

Generally, I don't think the quality of our docs is high enough for us to regard the docs as the single source of truth; for us, it is still the code.

@rmunn
Copy link
Contributor Author

rmunn commented Aug 30, 2023

Okay. When I get back around to this PR (I'll probably finish the if/then/else and list join PRs first), I'll switch the default to text mode and add a "binary" suffix for turning off the UTF-8 encoding step. That will preserve the existing behavior by default. Then I'll update the documentation accordingly.

Have to use String.replace with a global regex instead
@rmunn
Copy link
Contributor Author

rmunn commented Oct 18, 2023

@Jermolene - Got back around to fixing the encodebase64 and decodebase64 filters. Please let me know what you think. I struggled a little with writing the documentation (should I assume people know what UTF-8 is?), so if you have suggestions to improve it they would be welcome. But the code works, and I've included unit tests to ensure it continues to work.

Once this is merged, a couple places in the code that use window.btoa (the GitHub saver and the tm-http-request handler code) could be replaced with $tw.utils.btoa or $tw.utils.base64Encode(str,true) now, so that they will work in Node.js environments where window.btoa is not available.

@Jermolene
Copy link
Owner

@Jermolene - Got back around to fixing the encodebase64 and decodebase64 filters. Please let me know what you think. I struggled a little with writing the documentation (should I assume people know what UTF-8 is?), so if you have suggestions to improve it they would be welcome. But the code works, and I've included unit tests to ensure it continues to work.

Thank you @rmunn, looks good. The docs definitely don't need to explain base64 and utf8 but it might be useful to link to good, durable sources for further information.

Once this is merged, a couple places in the code that use window.btoa (the GitHub saver and the tm-http-request handler code) could be replaced with $tw.utils.btoa or $tw.utils.base64Encode(str,true) now, so that they will work in Node.js environments where window.btoa is not available.

Excellent, I think it's probably reasonable include those changes in this PR?

@pmario
Copy link
Contributor

pmario commented Oct 18, 2023

Since window.btoa() is not available under Node.js, we'll replace all
uses of it with the $tw.utils.base64encode() function that now works
correctly for binary data.
@rmunn
Copy link
Contributor Author

rmunn commented Oct 18, 2023

Excellent, I think it's probably reasonable include those changes in this PR?

Done. The one in the GitHub saver now encodes in text mode (I double checked and that's the correct way to encode an Authorization: Basic header; the previous version would have failed if someone's password had contained an emoji like 📚), and the one in the http.js file now encodes in binary mode.

@rmunn
Copy link
Contributor Author

rmunn commented Oct 18, 2023

Thank you @rmunn, looks good. The docs definitely don't need to explain base64 and utf8 but it might be useful to link to good, durable sources for further information.

Done. There was already a link to the Base64 definition in MDN's glossary so I just added a link to the UTF-8 definition.

@Jermolene
Copy link
Owner

Great thank you @rmunn @pmario

@Jermolene Jermolene merged commit 326ae61 into Jermolene:master Oct 18, 2023
4 checks passed
@Jermolene
Copy link
Owner

Hi @rmunn I just came across this article which would have helped me in the first place:

https://web.dev/articles/base64-encoding

Do we have tests to check for the lone surrogate pair problem that they mention?

@Jermolene
Copy link
Owner

Hi @rmunn I am seeing the following errors when running the test suite in the browser which may be related to this PR.

Visit https://tiddlywiki.com/prerelease/test.html and then scroll down to see the errors:

[Utility tests](https://tiddlywiki.com/prerelease/test.html?spec=Utility%20tests) > [should handle base64 encoding binary data in URL-safe variant](https://tiddlywiki.com/prerelease/test.html?spec=Utility%20tests%20should%20handle%20base64%20encoding%20binary%20data%20in%20URL-safe%20variant)
TypeError: Can only call Window.btoa on instances of Window (line 851)
    at <Jasmine>
[Utility tests](https://tiddlywiki.com/prerelease/test.html?spec=Utility%20tests) > [should handle base64 encoding binary data](https://tiddlywiki.com/prerelease/test.html?spec=Utility%20tests%20should%20handle%20base64%20encoding%20binary%20data)
TypeError: Can only call Window.btoa on instances of Window (line 851)
    at <Jasmine>

@rmunn
Copy link
Contributor Author

rmunn commented Oct 26, 2023

@Jermolene - #7814 will fix the mistake I made. (I didn't realize that window.btoa needed the Window instance, and therefore exports.btoa = window.btoa doesn't work because this is null.) As for setting up Playwright to run the tests in a real browser during CI, I'm willing to try to work on that but I can't prioritize it very highly just now.

@Jermolene
Copy link
Owner

@Jermolene - #7814 will fix the mistake I made. (I didn't realize that window.btoa needed the Window instance, and therefore exports.btoa = window.btoa doesn't work because this is null.)

Thank you, much appreciated.

As for setting up Playwright to run the tests in a real browser during CI, I'm willing to try to work on that but I can't prioritize it very highly just now.

No problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected and undocumented behaviour for $tw.utils.base64Encode()
4 participants