Conversation

@effigies
Contributor

effigies commented Dec 3, 2025

The S3 version metadata is sometimes base64-encoded (marked by a leading '!'), and we need to decode it in that case.

@nellh You mentioned that this is to handle Unicode filenames. Do you have any base64 data from OpenNeuro that we can verify this against? Python handles Unicode more straightforwardly than JS.

@codecov

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 79.31034% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.03%. Comparing base (25c5cf1) to head (884c0d1).
⚠️ Report is 13 commits behind head on main.

Files with missing lines    Patch %    Lines
src/files/repo.ts           79.31%     6 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #302   +/-   ##
=======================================
  Coverage   87.03%   87.03%           
=======================================
  Files          50       50           
  Lines        3724     3749   +25     
  Branches      613      615    +2     
=======================================
+ Hits         3241     3263   +22     
- Misses        474      477    +3     
  Partials        9        9           

@effigies
Contributor Author

effigies commented Dec 3, 2025

Here's a patch for decoding arbitrary UTF-8:

diff --git a/src/files/repo.ts b/src/files/repo.ts
index 90002db3..2a526335 100644
--- a/src/files/repo.ts
+++ b/src/files/repo.ts
@@ -50,7 +50,7 @@ export function parseRmetLine(line: string): Rmet | null {
   let versionStr = match!.groups!.version_str as string
   // Base64 encoded version strings are prefixed with '!'
   if (versionStr.startsWith('!')) {
-    versionStr = atob(versionStr.slice(1))
+    versionStr = b64toUtf8(versionStr.slice(1))
   }
   const versionMatch = versionStr.match(versionRegex)
   return {
@@ -61,6 +61,12 @@ export function parseRmetLine(line: string): Rmet | null {
   }
 }
 
+function b64toUtf8(str: string): string {
+  const decoded = atob(str)
+  const bytes = Uint8Array.from({ length: decoded.length }, (_, i) => decoded.charCodeAt(i))
+  return textDecoder.decode(bytes)
+}
+
 /**
  * Read remote metadata entries for a given annex key
  *

Not sure if it's necessary.
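
For context on why a plain atob call may fall short: atob returns one character per decoded byte, so a multi-byte UTF-8 sequence comes back as several Latin-1 characters rather than one. Below is a minimal TypeScript sketch of the difference, assuming a runtime with global atob and TextDecoder (browsers, Deno, recent Node). decodeBase64Utf8 is a hypothetical name; it mirrors the patch's b64toUtf8 but constructs its own TextDecoder instead of relying on the module-level textDecoder.

```typescript
// "4oCZ" is the base64 encoding of the three bytes E2 80 99, which are
// the UTF-8 encoding of U+2019 (right single quotation mark).
const quoteB64 = '4oCZ'

// atob() yields a "binary string": one UTF-16 code unit per decoded byte.
const rawBinary = atob(quoteB64)

// Hypothetical helper for this sketch: reinterpret each code unit as a
// byte, then decode the byte sequence as UTF-8.
function decodeBase64Utf8(b64: string): string {
  const bin = atob(b64)
  const bytes = Uint8Array.from(bin, (ch) => ch.charCodeAt(0))
  return new TextDecoder('utf-8').decode(bytes)
}

const decodedText = decodeBase64Utf8(quoteB64)

console.log(rawBinary.length) // 3 (one code unit per byte)
console.log(decodedText.length) // 1
console.log(decodedText === '\u2019') // true
```

For pure ASCII payloads the two approaches give the same result, which is why the difference only shows up with non-ASCII filenames.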

@nellh
Member

nellh commented Dec 4, 2025

Here is one example from OpenNeuro:

ds004194/20f/d42/MD5E-s12638776--cd08699afe31c89ad03356e45170de36.txt.log.rmet

1714730995.499196097s 51d08fb4-c58a-4fd7-a171-e5ff8226ca2f:V +!aG1NM2VqYWRxdFgxR18uc1VRNmJzd0FMLnhUcFJFeG8jZHMwMDQxOTQvZGVyaXZhdGl2ZXMvQnJhbmRzZXRhbDIwMjRUZW1wb3JhbEFkYXB0YXRpb25FQ29HL2RhdGFfc3ViamVjdHMvc3ViLXAxMS9lcG9jaHNfYi9lcG9jaHNfYl9jaGFubmVsMzItQW1iZXLigJlzIE1hY0Jvb2sgUHJvLnR4dA==

The correctly decoded (UTF-8) string should be:

hmM3ejadqtX1G_.sUQ6bswAL.xTpRExo#ds004194/derivatives/Brandsetal2024TemporalAdaptationECoG/data_subjects/sub-p11/epochs_b/epochs_b_channel32-Amber’s MacBook Pro.txt
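
As a rough check of this example, the sketch below decodes the logged value and splits it at the first '#'. The field splitting, the handling of the leading '+', and the reading of the two halves as a version ID and an object key are assumptions made for illustration; parseRmetLine in src/files/repo.ts uses its own regular expressions, which are not reproduced here. decodeBase64Utf8 and decodeRmetValue are hypothetical names.

```typescript
// Log line copied from the OpenNeuro example above.
const rmetLine =
  '1714730995.499196097s 51d08fb4-c58a-4fd7-a171-e5ff8226ca2f:V +!aG1NM2VqYWRxdFgxR18uc1VRNmJzd0FMLnhUcFJFeG8jZHMwMDQxOTQvZGVyaXZhdGl2ZXMvQnJhbmRzZXRhbDIwMjRUZW1wb3JhbEFkYXB0YXRpb25FQ29HL2RhdGFfc3ViamVjdHMvc3ViLXAxMS9lcG9jaHNfYi9lcG9jaHNfYl9jaGFubmVsMzItQW1iZXLigJlzIE1hY0Jvb2sgUHJvLnR4dA=='

// Hypothetical helper: base64 -> bytes -> UTF-8 text.
function decodeBase64Utf8(b64: string): string {
  const bin = atob(b64)
  const bytes = Uint8Array.from(bin, (ch) => ch.charCodeAt(0))
  return new TextDecoder('utf-8').decode(bytes)
}

// Hypothetical parser for this sketch only. Assumes three space-separated
// fields (timestamp, "<uuid>:V", "+<value>") and that a '!' after the '+'
// marks a base64-encoded value, as in the example line.
function decodeRmetValue(line: string): string {
  const value = line.trim().split(' ').slice(2).join(' ')
  const body = value.startsWith('+') ? value.slice(1) : value
  return body.startsWith('!') ? decodeBase64Utf8(body.slice(1)) : body
}

const decodedValue = decodeRmetValue(rmetLine)

// Assumption: everything before the first '#' is the version ID and
// everything after it is the object key.
const hashIndex = decodedValue.indexOf('#')
const versionId = decodedValue.slice(0, hashIndex)
const objectKey = decodedValue.slice(hashIndex + 1)

console.log(versionId)
console.log(objectKey)
```

With a plain atob in place of decodeBase64Utf8, the apostrophe in the filename would come back as three separate characters, so the object key would no longer match the expected string.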

@effigies
Contributor Author

effigies commented Dec 4, 2025

Thanks. The UTF-8 decoder was necessary. I've also verified from the S3 documentation (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html) that object key names are always UTF-8 encoded, so I don't think we need to worry about alternative encodings showing up on, e.g., Windows systems with UTF-16 paths.
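
To illustrate what a non-UTF-8 payload would look like if one ever did appear, here is a small sketch, not part of this PR, comparing the default TextDecoder with one constructed using the standard fatal option. The byte values are a made-up example (the UTF-16LE encoding of "é"), not data from OpenNeuro.

```typescript
// Made-up input for illustration: "é" encoded as UTF-16LE (bytes E9 00).
// E9 starts a multi-byte UTF-8 sequence, but 00 is not a valid
// continuation byte, so this is not well-formed UTF-8.
const utf16leBytes = new Uint8Array([0xe9, 0x00])

// Default behavior: malformed input is replaced with U+FFFD.
const lenientResult = new TextDecoder('utf-8').decode(utf16leBytes)

// With { fatal: true }, malformed input raises a TypeError instead.
let strictRejected = false
try {
  new TextDecoder('utf-8', { fatal: true }).decode(utf16leBytes)
} catch (err) {
  strictRejected = err instanceof TypeError
}

console.log(lenientResult.includes('\uFFFD')) // true
console.log(strictRejected) // true
```

Since S3 keys are UTF-8, the lenient default should never hit this path in practice; the fatal option would only matter if silent replacement characters were considered worse than a thrown error.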

@effigies effigies requested a review from nellh December 5, 2025 15:11
@effigies
Contributor Author

effigies commented Dec 5, 2025

@nellh I think we need this and #297 for OpenNeuro. If you have a chance to review today, I can finish this off and cut a release.

@effigies effigies merged commit d4d7a87 into bids-standard:main Dec 5, 2025
28 of 29 checks passed
@effigies effigies deleted the b64 branch December 5, 2025 17:17
2 participants