Fix encodebase64 and decodebase64 filters #7683

rmunn · 2023-08-18T07:27:27Z

The documentation for the encodebase64 filter operator says that the input is treated as binary data, but in fact the input is being treated as text data, with an extra UTF-8 encoding step being performed first.

Likewise, the decodebase64 documentation says that it outputs binary data, but in fact it will do a UTF-8 decoding step before producing output, which will in fact garble binary data.

This PR changes the behavior of encodebase64 and decodebase64 to match what the documentation says they do. It also adds an optional text suffix to both filters to keep the current behavior.

Finally, an optional urlsafe suffix is added to both filters to allow them to use the "URL-safe" variant of base64 (using - instead of + and _ instead of /).

The documentation for encodebase64 says that the input is treated as binary data, but in fact the input is being treated as text data, with an extra UTF-8 encoding step being performed first. Likewise, the decodebase64 documentation says that it outputs binary data, but in fact it will do a UTF-8 decoding step before producing output, which will in fact garble binary data. This commit changes the behavior of encodebase64 and decodebase64 to match what the documentation says they do. It also adds an optional `text` suffix to both filters to keep the current behavior. Finally, an optional `urlsafe` suffix is added to both filters to allow them to use the "URL-safe" variant of base64 (using `-` instead of `+` and `_` instead of `/`).

vercel · 2023-08-18T07:27:30Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated (UTC)
tiddlywiki5	✅ Ready (Inspect)	Visit Preview	Oct 18, 2023 3:01pm

Turns out a little more than this is going to be needed.

saqimtiaz · 2023-08-18T08:02:01Z

This PR changes the behavior of encodebase64 and decodebase64 to match what the documentation says they do. It also adds an optional text suffix to both filters to keep the current behavior.

Hi @rmunn, I feel that we should preserve the existing behaviour to minimize the impact on end users and update the documentation accordingly.

rmunn · 2023-08-18T08:55:38Z

This PR changes the behavior of encodebase64 and decodebase64 to match what the documentation says they do. It also adds an optional text suffix to both filters to keep the current behavior.

Hi @rmunn, I feel that we should preserve the existing behaviour to minimize the impact on end users and update the documentation accordingly.

Except that the existing behavior (expecting text input) is rarely useful AFAICT. Most uses of encodebase64, I suspect, will be taking binary data. So I thought binary data should be the default behavior.

saqimtiaz · 2023-08-18T08:59:56Z

@rmunn we have a strict backwards compatibility policy to not break the behaviour of wikitext code that might exist in the wild. Is this change sufficiently important to break backward compatibility @Jermolene ?

rmunn · 2023-08-18T13:13:32Z

Does the backwards compatibility mandate extend to bugs? Because when the documentation says the wiki does one thing, but the actual behavior is something else, I think that's a bug and the behavior should be changed to match what the documentation says.

rmunn · 2023-08-18T13:14:14Z

P.S. Here's what I wrote in the related bug report at #7626 (comment):

Since the encodebase64 filter was just added in release 5.2.6 in March 2023, five months ago, I doubt that many people have started using it yet, especially because it's buggy at the moment. So I don't think there are a lot of wikis out there relying on the current behavior, and I think it's safe to change the behavior.

Jermolene · 2023-08-30T11:28:09Z

Thanks @rmunn @saqimtiaz on balance I would prefer to maintain backwards compatibility here, and update the docs.

Generally, I don't think the quality of our docs is high enough for us to regard the docs as the single source of truth; for us, it is still the code.

rmunn · 2023-08-30T14:36:34Z

Okay. When I get back around to this PR (I'll probably finish the if/then/else and list join PRs first), I'll switch the default to text mode and add a "binary" suffix for turning off the UTF-8 encoding step. That will preserve the existing behavior by default. Then I'll update the documentation accordingly.

Have to use String.replace with a global regex instead

rmunn · 2023-10-18T05:47:38Z

@Jermolene - Got back around to fixing the encodebase64 and decodebase64 filters. Please let me know what you think. I struggled a little with writing the documentation (should I assume people know what UTF-8 is?), so if you have suggestions to improve it they would be welcome. But the code works, and I've included unit tests to ensure it continues to work.

Once this is merged, a couple places in the code that use window.btoa (the GitHub saver and the tm-http-request handler code) could be replaced with $tw.utils.btoa or $tw.utils.base64Encode(str,true) now, so that they will work in Node.js environments where window.btoa is not available.

Jermolene · 2023-10-18T08:31:27Z

@Jermolene - Got back around to fixing the encodebase64 and decodebase64 filters. Please let me know what you think. I struggled a little with writing the documentation (should I assume people know what UTF-8 is?), so if you have suggestions to improve it they would be welcome. But the code works, and I've included unit tests to ensure it continues to work.

Thank you @rmunn, looks good. The docs definitely don't need to explain base64 and utf8 but it might be useful to link to good, durable sources for further information.

Once this is merged, a couple places in the code that use window.btoa (the GitHub saver and the tm-http-request handler code) could be replaced with $tw.utils.btoa or $tw.utils.base64Encode(str,true) now, so that they will work in Node.js environments where window.btoa is not available.

Excellent, I think it's probably reasonable include those changes in this PR?

pmario · 2023-10-18T11:10:56Z

source for utf-8: https://developer.mozilla.org/en-US/docs/Glossary/UTF-8
source for base64: https://developer.mozilla.org/en-US/docs/Glossary/Base64

Since window.btoa() is not available under Node.js, we'll replace all uses of it with the $tw.utils.base64encode() function that now works correctly for binary data.

rmunn · 2023-10-18T14:48:11Z

Excellent, I think it's probably reasonable include those changes in this PR?

Done. The one in the GitHub saver now encodes in text mode (I double checked and that's the correct way to encode an Authorization: Basic header; the previous version would have failed if someone's password had contained an emoji like 📚), and the one in the http.js file now encodes in binary mode.

rmunn · 2023-10-18T14:58:57Z

Thank you @rmunn, looks good. The docs definitely don't need to explain base64 and utf8 but it might be useful to link to good, durable sources for further information.

Done. There was already a link to the Base64 definition in MDN's glossary so I just added a link to the UTF-8 definition.

Jermolene · 2023-10-18T15:08:50Z

Great thank you @rmunn @pmario

Jermolene · 2023-10-20T12:35:20Z

Hi @rmunn I just came across this article which would have helped me in the first place:

https://web.dev/articles/base64-encoding

Do we have tests to check for the lone surrogate pair problem that they mention?

Jermolene · 2023-10-25T08:24:20Z

Hi @rmunn I am seeing the following errors when running the test suite in the browser which may be related to this PR.

Visit https://tiddlywiki.com/prerelease/test.html and then scroll down to see the errors:

[Utility tests](https://tiddlywiki.com/prerelease/test.html?spec=Utility%20tests) > [should handle base64 encoding binary data in URL-safe variant](https://tiddlywiki.com/prerelease/test.html?spec=Utility%20tests%20should%20handle%20base64%20encoding%20binary%20data%20in%20URL-safe%20variant)
TypeError: Can only call Window.btoa on instances of Window (line 851)
    at <Jasmine>
[Utility tests](https://tiddlywiki.com/prerelease/test.html?spec=Utility%20tests) > [should handle base64 encoding binary data](https://tiddlywiki.com/prerelease/test.html?spec=Utility%20tests%20should%20handle%20base64%20encoding%20binary%20data)
TypeError: Can only call Window.btoa on instances of Window (line 851)
    at <Jasmine>

rmunn · 2023-10-26T04:10:02Z

@Jermolene - #7814 will fix the mistake I made. (I didn't realize that window.btoa needed the Window instance, and therefore exports.btoa = window.btoa doesn't work because this is null.) As for setting up Playwright to run the tests in a real browser during CI, I'm willing to try to work on that but I can't prioritize it very highly just now.

Jermolene · 2023-10-26T08:06:58Z

@Jermolene - #7814 will fix the mistake I made. (I didn't realize that window.btoa needed the Window instance, and therefore exports.btoa = window.btoa doesn't work because this is null.)

Thank you, much appreciated.

As for setting up Playwright to run the tests in a real browser during CI, I'm willing to try to work on that but I can't prioritize it very highly just now.

No problem.

vercel bot had a problem deploying to Preview August 18, 2023 07:27 Failure

rmunn force-pushed the fix-base64-binary branch from 2c45696 to f338eae Compare August 18, 2023 07:58

Try to fix failing test

dadf6a4

Turns out a little more than this is going to be needed.

rmunn force-pushed the fix-base64-binary branch from f338eae to dadf6a4 Compare August 18, 2023 07:59

vercel bot had a problem deploying to Preview August 18, 2023 07:59 Failure

rmunn mentioned this pull request Sep 13, 2023

@rmunn's open pull requests #7740

Open

rmunn added 2 commits October 18, 2023 12:15

Fix binary base64 encoding, including unit tests

4e8142c

Update base64 filter documentation

ffbe54e

vercel bot deployed to Preview October 18, 2023 05:39 View deployment

Can't use replaceAll, too new

e125e75

Have to use String.replace with a global regex instead

vercel bot deployed to Preview October 18, 2023 05:44 View deployment

Replace uses of window.btoa() in rest of code

8deccd6

Since window.btoa() is not available under Node.js, we'll replace all uses of it with the $tw.utils.base64encode() function that now works correctly for binary data.

vercel bot deployed to Preview October 18, 2023 14:49 View deployment

Add link to UTF-8 glossary definition at MDN

0a7e624

vercel bot deployed to Preview October 18, 2023 15:01 View deployment

Jermolene merged commit 326ae61 into Jermolene:master Oct 18, 2023
4 checks passed

Jermolene mentioned this pull request Oct 25, 2023

Browser-based tests should be run under continuous integration #7812

Closed

rmunn mentioned this pull request Oct 26, 2023

Fix window.btoa call in browser #7814

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix encodebase64 and decodebase64 filters #7683

Fix encodebase64 and decodebase64 filters #7683

rmunn commented Aug 18, 2023

vercel bot commented Aug 18, 2023 •

edited

saqimtiaz commented Aug 18, 2023

rmunn commented Aug 18, 2023

saqimtiaz commented Aug 18, 2023

rmunn commented Aug 18, 2023

rmunn commented Aug 18, 2023

Jermolene commented Aug 30, 2023

rmunn commented Aug 30, 2023

rmunn commented Oct 18, 2023

Jermolene commented Oct 18, 2023

pmario commented Oct 18, 2023

rmunn commented Oct 18, 2023

rmunn commented Oct 18, 2023

Jermolene commented Oct 18, 2023

Jermolene commented Oct 20, 2023

Jermolene commented Oct 25, 2023

rmunn commented Oct 26, 2023

Jermolene commented Oct 26, 2023

Fix encodebase64 and decodebase64 filters #7683

Fix encodebase64 and decodebase64 filters #7683

Conversation

rmunn commented Aug 18, 2023

vercel bot commented Aug 18, 2023 • edited

saqimtiaz commented Aug 18, 2023

rmunn commented Aug 18, 2023

saqimtiaz commented Aug 18, 2023

rmunn commented Aug 18, 2023

rmunn commented Aug 18, 2023

Jermolene commented Aug 30, 2023

rmunn commented Aug 30, 2023

rmunn commented Oct 18, 2023

Jermolene commented Oct 18, 2023

pmario commented Oct 18, 2023

rmunn commented Oct 18, 2023

rmunn commented Oct 18, 2023

Jermolene commented Oct 18, 2023

Jermolene commented Oct 20, 2023

Jermolene commented Oct 25, 2023

rmunn commented Oct 26, 2023

Jermolene commented Oct 26, 2023

vercel bot commented Aug 18, 2023 •

edited