WSL: Subprocess cmd.exe with /U to output UTF-16LE#6717

Merged
holmanb merged 4 commits into canonical:main from cnihelton:fix-nonascii-userprofile
Mar 16, 2026

@cnihelton cnihelton commented Feb 5, 2026

Fixes: #6716

Including a small peasant fix in a comment about the WSL2 /init being proprietary (no longer the case since WSL2 was open sourced last year).

This PR also uses the echo. syntax instead of echo to guard against the very unlikely corner case of the environment variable being set to whitespace only:

C:\Users\João Martín😁>set unknown=  # There is a space here

C:\Users\João Martín😁>echo %unknown%
ECHO is ON

C:\Users\João Martín😁>echo.%unknown%


Proposed Commit Message

fix(WSL): Always subprocess cmd.exe in UTF-16 mode  # no more than 72 characters

As we manipulate paths acquired by subprocessing cmd.exe inside WSL,
by using it in UTF-16 mode we ensure a predictable output when the strings
are not ASCII-compatible, such as reading the user profile when it contains special characters.

Fixes GH-6716

Additional Context

Test Steps

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

Peasant comment fix: /init is now open source (as part of WSL2).

Fixes: canonical#6716
That function can now throw UnicodeDecodeError, which inherits from
ValueError, so we should catch ValueError as before.
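
The exception relationship this commit relies on can be checked directly in Python. The following is a minimal sketch, not cloud-init's actual code: `decode_cmd_output` is a hypothetical helper that decodes bytes as UTF-16LE and treats any `ValueError` (which covers `UnicodeDecodeError`) as "no usable output".

```python
from typing import Optional


def decode_cmd_output(raw: bytes) -> Optional[str]:
    """Decode cmd.exe /U output; hypothetical helper, not cloud-init's API."""
    try:
        # Strict decoding raises UnicodeDecodeError on truncated or
        # otherwise malformed UTF-16LE input.
        return raw.decode("utf-16-le").rstrip("\r\n")
    except ValueError:
        # UnicodeDecodeError is a subclass of ValueError, so this
        # handler catches it without naming it explicitly.
        return None


# A well-formed payload: a path followed by CRLF, all in UTF-16LE.
good = "C:\\Users\\Jo\u00e3o".encode("utf-16-le") + b"\r\x00\n\x00"
print(decode_cmd_output(good))

# An odd byte count is truncated UTF-16LE and cannot be decoded.
print(decode_cmd_output(b"C\x00:"))
```

Catching the broader `ValueError` keeps any existing handler working while also covering the new decode failure.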
@holmanb holmanb self-assigned this Feb 5, 2026
Member

holmanb commented Feb 5, 2026

Thanks for this contribution @CarlosNihelton! A couple of requests:

  1. Could you please add some test coverage for ds-identify? (tests/unittests/test_ds_identify.py)
  2. Could you please run cloud-init collect-logs on a system that booted with these changes and attach the tarball?

Also, for my own understanding, I would like to know what environment variables are set by the calling processes for both ds-identify and cloud-init's Python code. Could I ask you to instrument each of these and share the results? In ds-identify something like debug 1 $(env) would work. In the Python code logging the content of os.environ would suffice.
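
For the Python side, the instrumentation requested here could look roughly like the sketch below. It is only an illustration: the logger name and both function names are made up for this example and are not part of cloud-init.

```python
import logging
import os
from typing import List, Mapping, Optional

LOG = logging.getLogger("env-debug")  # hypothetical logger name


def format_environment(
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return sorted KEY=value lines for the given (or current) environment."""
    env = os.environ if environ is None else environ
    return [f"{key}={value}" for key, value in sorted(env.items())]


def log_environment(environ: Optional[Mapping[str, str]] = None) -> None:
    """Emit one debug record per environment variable."""
    for line in format_environment(environ):
        LOG.debug("env: %s", line)


# Example with an explicit mapping; call log_environment() with no
# argument to dump the real process environment instead.
log_environment({"LANG": "C.UTF-8", "HOME": "/root"})
```

Sorting the keys makes it easier to diff the output against what `env` prints inside ds-identify.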

Author

cnihelton commented Feb 5, 2026

Hi @holmanb!

Here are the logs from a system whose username contains non-ASCII characters, with both the datasource and ds-identify patched: cloud-init.tar.gz

Regarding coverage for ds-identify, I need to ask you a deeper question. AFAICT, to make this changeset testable I'd need to break this assignment into two lines: _RET=$(/init "$exepath" /u /c "$@" 2>/dev/null | iconv -f UTF-16LE -t UTF-8), otherwise they are replaced by mocks and there is nothing to cover (as is currently the case). But POSIX shells cannot store NUL bytes in variables, and UTF-16 output has plenty of them.

j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ var=$(cmd.exe /U /C echo.%USERPROFILE%)
-bash: warning: command substitution: ignored null byte in input
j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ profile=$( echo "$_RET" | iconv -f UTF-16LE -t UTF-8)
j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ echo $profile
㩃啜敳獲䩜慍瑲滭😁਍
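
The garbled result above is consistent with the shell dropping the NUL bytes before iconv sees the data. The Python sketch below approximates that: it encodes the profile path as UTF-16LE, removes every NUL byte, and decodes the remainder as UTF-16LE again. It is a simulation of the effect under that assumption, not a trace of what bash does internally.

```python
# The profile path as cmd.exe would echo it, including the trailing CRLF.
profile = "C:\\Users\\Jo\u00e3o Mart\u00edn\U0001f601\r\n"

# What cmd.exe /U writes: two bytes per UTF-16 code unit, little-endian.
raw = profile.encode("utf-16-le")

# Simulate the shell discarding every NUL byte during command substitution.
without_nuls = raw.replace(b"\x00", b"")

# Decoding what is left as UTF-16LE pairs up unrelated bytes, so two
# neighbouring Latin characters collapse into a single CJK-range code point.
garbled = without_nuls.decode("utf-16-le")

print(len(raw), len(without_nuls))
print(garbled[:5])
```

The first five characters of `garbled` are 㩃啜敳獲䩜, the same ones that open the output above. The emoji survives only because its four UTF-16 bytes contain no NUL and happen to stay aligned on an even offset.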

So, I need something like base64 in the middle of this process, so that the UTF-16 bytes become something the shell can store in a variable and can later be recovered by piping into iconv.

j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ _RET=$(cmd.exe /U /C echo.%USERPROFILE% | base64)
j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ echo $_RET
QwA6AFwAVQBzAGUAcgBzAFwASgBvAOMAbwAgAE0AYQByAHQA7QBuAD3YAd4NAAoA
j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ profile=$(echo $_RET | base64 -d | iconv -f UTF-16LE -t UTF-8)
j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ echo $profile
C:\Users\João Martín😁
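
The round trip shown in this session can be cross-checked outside the shell. Here is a small Python sketch that decodes the exact base64 text printed above; `decode_profile` is a name chosen for this illustration only.

```python
import base64

# The base64 text exactly as printed by `echo $_RET` in the session above.
ENCODED = "QwA6AFwAVQBzAGUAcgBzAFwASgBvAOMAbwAgAE0AYQByAHQA7QBuAD3YAd4NAAoA"


def decode_profile(b64_text: str) -> str:
    """Mirror `base64 -d | iconv -f UTF-16LE -t UTF-8`, then drop the CRLF."""
    raw = base64.b64decode(b64_text)   # back to the original UTF-16LE bytes
    text = raw.decode("utf-16-le")     # UTF-16LE bytes to a Python str
    return text.rstrip("\r\n")         # cmd.exe terminates the line with CRLF


print(decode_profile(ENCODED))
```

Because base64 output is plain ASCII, it contains no NUL bytes and passes through a shell variable unchanged.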

With the base64 approach I'd remove the pipe to iconv from the WSL_run_cmd function and put it at the call site. Then we can further test the function WSL_profile_dir() by mocking WSL_run_cmd to output UTF-16 encoded data.
But that comes at the cost of adding two calls to base64.

diff --git a/tools/ds-identify b/tools/ds-identify
index c2a6d69ea..e4b8507ea 100755
--- a/tools/ds-identify
+++ b/tools/ds-identify
@@ -1710,7 +1710,7 @@ WSL_run_cmd() {
     shift
     # Using the '/u' flag to enforce Unicode (UTF-16 LE), thus we need to decode it afterwards.
     # It's more reliable than the default ANSI Code Pages for anything above the ASCII range.
-    _RET=$(/init "$exepath" /u /c "$@" 2>/dev/null | iconv -f UTF-16LE -t UTF-8)
+    _RET=$(/init "$exepath" /u /c "$@" 2>/dev/null | base64)
 }
 
 WSL_profile_dir() {
@@ -1725,6 +1725,7 @@ WSL_profile_dir() {
             # to output the Windows user profile directory path, which is
             # held by the environment variable %USERPROFILE%.
             WSL_run_cmd "$cmdexe" "echo.%USERPROFILE%"
-             profiledir="${_RET%%[[:cntrl:]]}"
+            profiledir=$(echo $_RET | base64 -d | iconv -f UTF-16LE -t UTF-8)
+             profiledir="${profiledir%%[[:cntrl:]]}"
             if [ -n "$profiledir" ]; then
                 # wslpath is a program supplied by WSL itself that translates Windows and Linux paths,

WDYT?
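
If the mocking route were taken, the test would need fixture data in the shape the patched WSL_run_cmd returns, that is, base64 text wrapping UTF-16LE bytes. The sketch below is one way to generate such a fixture in Python; `fake_run_cmd_output` is a hypothetical helper and not part of the existing ds-identify test suite.

```python
import base64


def fake_run_cmd_output(text: str) -> str:
    """Build what a mocked WSL_run_cmd would leave in _RET (hypothetical)."""
    # cmd.exe ends the echoed line with CRLF; /U makes the output UTF-16LE.
    raw = (text + "\r\n").encode("utf-16-le")
    # base64 turns the NUL-laden bytes into shell-safe ASCII text.
    return base64.b64encode(raw).decode("ascii")


fixture = fake_run_cmd_output("C:\\Users\\Jo\u00e3o Mart\u00edn\U0001f601")
print(fixture)
```

For this path the generated text matches the base64 string from the terminal session earlier in this comment, so the same value could serve as a mock return value.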

Member

@holmanb holmanb left a comment

Thanks for proposing some options @CarlosNihelton. Regarding unit testing, I'd say let's leave it as-is for now. Cloud-init's unit test code for ds-identify doesn't support the kind of testing I was hoping for, and you've raised some good points about the complexity. I'd rather not introduce even more complexity to the code for the sake of this test.

Comment thread tools/ds-identify Outdated
Comment thread tools/ds-identify
@cnihelton cnihelton requested a review from holmanb February 23, 2026 12:08
@github-actions

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging blackboxsw, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag blackboxsw to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Mar 10, 2026
@holmanb holmanb removed the stale-pr Pull request is stale; will be auto-closed soon label Mar 10, 2026
Member

@holmanb holmanb left a comment

Thanks @CarlosNihelton!

@holmanb holmanb merged commit 5c685d6 into canonical:main Mar 16, 2026
22 checks passed
blackboxsw pushed a commit to blackboxsw/cloud-init that referenced this pull request Apr 13, 2026
As we manipulate paths acquired by subprocessing cmd.exe inside WSL,
by using it in UTF-16 mode we ensure a predictable output when the strings
are not ASCII-compatible, such as reading the user profile when it contains special characters.

Fixes canonicalGH-6716

Development

Successfully merging this pull request may close these issues.

WSL: datasource fails to find user-data if Windows username contains non-ASCII chars
