feat: Support base64-encoded S3 metadata #302
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #302   +/-   ##
=======================================
  Coverage   87.03%   87.03%
=======================================
  Files          50       50
  Lines        3724     3749   +25
  Branches      613      615    +2
=======================================
+ Hits         3241     3263   +22
- Misses        474      477    +3
  Partials        9        9
Here's a patch for decoding arbitrary UTF-8:

diff --git a/src/files/repo.ts b/src/files/repo.ts
index 90002db3..2a526335 100644
--- a/src/files/repo.ts
+++ b/src/files/repo.ts
@@ -50,7 +50,7 @@ export function parseRmetLine(line: string): Rmet | null {
let versionStr = match!.groups!.version_str as string
// Base64 encoded version strings are prefixed with '!'
if (versionStr.startsWith('!')) {
- versionStr = atob(versionStr.slice(1))
+ versionStr = b64toUtf8(versionStr.slice(1))
}
const versionMatch = versionStr.match(versionRegex)
return {
@@ -61,6 +61,12 @@ export function parseRmetLine(line: string): Rmet | null {
}
}
+function b64toUtf8(str: string): string {
+ const decoded = atob(str)
+ const bytes = Uint8Array.from({ length: decoded.length }, (_, i) => decoded.charCodeAt(i))
+ return textDecoder.decode(bytes)
+}
+
/**
* Read remote metadata entries for a given annex key
*

Not sure if it's necessary.
Here is one example from OpenNeuro: ds004194/20f/d42/MD5E-s12638776--cd08699afe31c89ad03356e45170de36.txt.log.rmet
The Unicode-decoded string should be:
Thanks. The UTF-8 decoder was necessary. I've also verified that S3 object keys are always UTF-8 encoded (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html), so I don't think we need to worry about alternative encodings showing up on, e.g., Windows systems with UTF-16 paths.
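For anyone reading along, here is a minimal sketch of why plain atob is not enough. atob returns a "binary string" with one character per decoded byte, so any multi-byte UTF-8 sequence comes out as mojibake unless the bytes are passed through a TextDecoder. The module-level textDecoder and the sample value are assumptions for illustration; the sample is not taken from OpenNeuro data.

```typescript
// Assumed module-level decoder; the patch references one with this name.
const textDecoder = new TextDecoder("utf-8")

function b64toUtf8(str: string): string {
  // atob gives one character (code unit 0-255) per decoded byte
  const binary = atob(str)
  // Recover the raw bytes, then interpret them as UTF-8
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0))
  return textDecoder.decode(bytes)
}

// "é" is the UTF-8 byte pair 0xC3 0xA9, which base64-encodes to "w6k="
const sample = "w6k="
const viaAtob = atob(sample) // "Ã©": two characters, one per byte
const viaHelper = b64toUtf8(sample) // "é": one character
console.log(JSON.stringify(viaAtob), JSON.stringify(viaHelper))
```

For pure ASCII input both paths return the same string, which is presumably why the atob-only version looked correct at first.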
It can happen that the S3 metadata gets base64-encoded. We need to handle this.
@nellh You mentioned that this is to handle unicode filenames... Do you have any b64 data from OpenNeuro we can verify this for? Python handles unicode more straightforwardly than JS.
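Until real data is available, a round-trip check along these lines could serve as a stand-in test. This is only a sketch: decodeVersionField and encodeVersionField are hypothetical helper names, the '!' prefix convention is the one described in the diff comment, and the encoder exists only to produce test input (it is not claimed to match how the metadata is actually written).

```typescript
// Hypothetical helper: decode a field that may carry the '!' base64 marker.
function decodeVersionField(field: string): string {
  if (!field.startsWith("!")) {
    return field // not base64-encoded, pass through unchanged
  }
  const binary = atob(field.slice(1))
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0))
  return new TextDecoder("utf-8").decode(bytes)
}

// Hypothetical test-only encoder: UTF-8 bytes -> base64 with a '!' prefix.
function encodeVersionField(value: string): string {
  const bytes = new TextEncoder().encode(value)
  let binary = ""
  bytes.forEach((b) => {
    binary += String.fromCharCode(b)
  })
  return "!" + btoa(binary)
}

// Made-up filename with non-ASCII characters, not real OpenNeuro data
const original = "sub-01/donn\u00E9es_\u65E5\u672C.txt"
const encoded = encodeVersionField(original)
console.log(decodeVersionField(encoded) === original)
```

A check like this only shows the decoder is self-consistent; a real .rmet line is still needed to confirm the encoding used upstream.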