Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow users that have certain special characters in their username to build successfully when using exec #6223

Merged
merged 22 commits into from
Apr 8, 2021

Conversation

benvillalobos
Copy link
Member

@benvillalobos benvillalobos commented Mar 4, 2021

Fixes https://dev.azure.com/devdiv/DevDiv/_workitems/edit/1264667

Context

Users with profile names that contain parens in them will never be able to succesfully build a program that calls Exec with any command. This is because exec does the following:

  1. Generate a cmd file in the users temp directory
  2. Calls cmd /Q /D /C <path-to-temp-file>

The problem with this is that running cmd /C "some command" does not work if the command has any parens in it. It needs to be escaped like so: ^( and ^).

Changes Made

  • When user sets EscapeSpecialCharacters in the exec task (boolean parameter), we escape characters that need to be escaped when calling cmd /c. We preserve the original functionality of always removing spaces and escaping '^'
  • Added under changewave 16.10
  • Added documentation for it.

Testing

Added tests under exec_tests.cs
Tested this on my personal laptop with a ben(dev) account. Here's the output of those builds:

Test Project

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
    <RootNamespace>_1264667_special_char_in_username</RootNamespace>
  </PropertyGroup>
  
  <Target Name="PreBuild" BeforeTargets="PreBuildEvent">
    <Exec Command="ECHO Hello" />
  </Target>

</Project>

Before

c:\src\repros\1264667-special-char>"C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\Bin\MSBuild.exe"
Microsoft (R) Build Engine version 16.9.0+57a23d249 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

Build started 3/3/2021 5:27:56 PM.
Project "c:\src\repros\1264667-special-char\1264667-special-char.csproj" on node 1 (default targets).
PreBuild:
  ECHO Hello
  'C:\Users\ben' is not recognized as an internal or external command,
  operable program or batch file.
c:\src\repros\1264667-special-char\1264667-special-char.csproj(10,5): error MSB3073: The command "ECHO Hello" exited wi
th code 1.
Done Building Project "c:\src\repros\1264667-special-char\1264667-special-char.csproj" (default targets) -- FAILED.


Build FAILED.

"c:\src\repros\1264667-special-char\1264667-special-char.csproj" (default target) (1) ->
(PreBuild target) ->
  c:\src\repros\1264667-special-char\1264667-special-char.csproj(10,5): error MSB3073: The command "ECHO Hello" exited
with code 1.

    0 Warning(s)
    1 Error(s)

Time Elapsed 00:00:01.13

c:\src\repros\1264667-special-char>

After

c:\src\repros\1264667-special-char>"C:\src\git\msbuild\artifacts\bin\bootstrap\net472\MSBuild\Current\Bin\MSBuild.exe"
Microsoft (R) Build Engine version 16.10.0-dev-21153-01+9084023d9 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

Build started 3/3/2021 5:30:00 PM.
Project "c:\src\repros\1264667-special-char\1264667-special-char.csproj" on node 1 (default targets).
PreBuild:
  ECHO Hello
  Hello
GenerateTargetFrameworkMonikerAttribute:
Skipping target "GenerateTargetFrameworkMonikerAttribute" because all output files are up-to-date with respect to the i
nput files.
CoreGenerateAssemblyInfo:
Skipping target "CoreGenerateAssemblyInfo" because all output files are up-to-date with respect to the input files.
CoreCompile:
Skipping target "CoreCompile" because all output files are up-to-date with respect to the input files.
_CreateAppHost:
Skipping target "_CreateAppHost" because all output files are up-to-date with respect to the input files.
_CopyOutOfDateSourceItemsToOutputDirectory:
Skipping target "_CopyOutOfDateSourceItemsToOutputDirectory" because all output files are up-to-date with respect to th
e input files.
GenerateBuildDependencyFile:
Skipping target "GenerateBuildDependencyFile" because all output files are up-to-date with respect to the input files.
GenerateBuildRuntimeConfigurationFiles:
Skipping target "GenerateBuildRuntimeConfigurationFiles" because all output files are up-to-date with respect to the in
put files.
CopyFilesToOutputDirectory:
  1264667-special-char -> c:\src\repros\1264667-special-char\bin\Debug\net5.0\1264667-special-char.dll
Done Building Project "c:\src\repros\1264667-special-char\1264667-special-char.csproj" (default targets).


Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:01.45

c:\src\repros\1264667-special-char>

Notes

Considering working more on exec, that task (and tooltask for that matter) could use a lot of sprucing up.

@benvillalobos benvillalobos changed the title Initial fix. Escape parentheses that exist in the path of the generated cmd file Allow users that have parentheses in their username to build successfully when using exec Mar 4, 2021

// cmd needs parens to be escaped when executing files with the /C flag.
// consider the case where the user has a parenthesis in ther username (which is uncommon, but valid)
if ((batchFileForCommandLine.Contains("(") && !batchFileForCommandLine.Contains("^(")) || (batchFileForCommandLine.Contains(")") && !batchFileForCommandLine.Contains("^)")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't you just escaping all instances of (/) without preceding ^? Seems like you'll miss some this way in, for instance, ben(dev^(atMS^))

{
string temporaryDirectory = Path.Combine(Path.GetTempPath(), "Temporary" + Guid.NewGuid().ToString("N"));
string temporaryDirectory = Path.Combine(Path.GetTempPath(), "Temporary" + Guid.NewGuid().ToString("N"), subfolder ?? string.Empty);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're here, this was always bugging me. Should we have a root folder "MSBuild" inside TEMP, and stick all our stuff inside that? Otherwise you can't attribute to which process a given file belongs to. Just being good citizens of TEMP. Maybe replace Temporary with MSBuild (and have a subfolder).

See #6219

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might that trip some customers into a path-too-long situation?

Assert.StartsWith(newTempPath, tempPath);

// Get a count of how many temp files there are right now.
string[] tempFiles = Directory.GetFiles(tempPath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is flaky, if the number of temp files in the directory changes during the run we'll fail and it'll be hard to investigate.

In general, I think we should have a single folder under TEMP, "MSBuild", and have everything under that folder. Maybe tests should use "MSBuildTest" to keep it separate. We should never enumerate the root of TEMP as it might be huge and some files might be locked, files are created and deleted and renamed there all the time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is why you're doing this in the first place. We care about Exec succeeding and producing the correct output, but I wouldn't bother with verifying we don't have garbage lying around—that has nothing to do with your change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was leftover logic I forgot to get rid of.

@@ -22,7 +22,7 @@ internal static partial class FileUtilities
/// <param name="subfolder"></param>
internal static string GetTemporaryDirectory(bool createDirectory = true, string subfolder = null)
{
string temporaryDirectory = Path.Combine(Path.GetTempPath(), "Temporary" + Guid.NewGuid().ToString("N"), subfolder ?? string.Empty);
string temporaryDirectory = Path.Combine(Path.GetTempPath(), "MSBuild" + Guid.NewGuid().ToString("N"), subfolder ?? string.Empty);
Copy link
Member

@KirillOsenkov KirillOsenkov Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do "MSBuild", Guid... so that MSBuild is its own directory, and Guids are subdirectories within it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also modify any similar methods (if they exist)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently changing the name to MSBuild here works, but there are places where this method could be called instead of Path.GetTempDirectory (even within FileUtilities). Unfortunately this means changing a bunch of tests and I'd rather not piggy-back that onto this PR.

I say we take this change and I can make a new PR that refactors TempFileUtilities.cs.

If not, I can revert the change to using the MSBuild folder and we can merge this PR as is.

Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized what you were asking for wasn't many MSBuild<guid> folders, but temp/MSBuild/<all the guids folders>. Given that this commit didn't actually help much, I'm going to revert it and actually fix it in a different PR that's based off of this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds great

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracking this here: #6239

// Now run the Exec task on a simple command.
Exec exec = PrepareExec("echo Hello World!");
exec.Execute().ShouldBeTrue();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no newline

Comment on lines 624 to 626
if((fileName[i] == '(' || fileName[i] == ')' || fileName[i] == '&') && fileName[i-1] != '^')
{
fileName.Insert(i++, '^');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not append each character individually so you don't have to insert into the middle?

Also, why not start at 0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also cache fileName[i] in a local char variable to save on access
add spaces around [i - 1] or format this area using Format Selection or Format Document

Copy link
Member Author

@benvillalobos benvillalobos Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not starting at 0 here because we're on windows and batchFileForComandLine points to a .cmd file in the temp folder. It really should start at 3 to skip over c:/.

src/Tasks/Exec.cs Outdated Show resolved Hide resolved
Co-authored-by: Forgind <Forgind@users.noreply.github.com>
@benvillalobos
Copy link
Member Author

benvillalobos commented Mar 10, 2021

Note to self: There are other areas to change the folder to MSBuild Won't do in this PR.

@KirillOsenkov
Copy link
Member

yes let’s do one thing at a time, fix TEMP separately

@benvillalobos benvillalobos added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Mar 12, 2021
@@ -48,6 +48,27 @@ private ExecWrapper PrepareExecWrapper(string command)
return exec;
}

[Fact]
[Trait("Category", "mono-osx-failing")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this failing on other OSes? Does it need fixing there too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but since we don't currently have a customer asking for it, BenVillalobos wanted to hold off on fixing it until we have feedback that it's needed.

Copy link
Member

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you 100% sure that this change can't break anyone? Put it behind a changewave just in case?

{
char c = batchFileForCommandLine[i];

if ((c == '(' || c == ')' || c == '&') && (i == 0 || batchFileForCommandLine[i - 1] != '^'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ just before the character does not mean that the character is escaped. The escaped character may be ^ itself.

Example: ^^( the open paren won't be escaped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. @benvillalobos, can you add something to verify that the previous character was not ^? And I think putting it behind a change wave is reasonable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out there's some behavior I didn't expect with cmd /C. Check this out:

Notice that having a carat escapes an open and closing parens?

cmd /C "c:\hello^()\"
'c:\hello()\' is not recognized...

Not two open parens though.

cmd /C "c:\hello^(()\"
'c:\hello(' is not recognized...

What about some closed parens?

C:\Users\bevillal>cmd /C "c:\hello^)))))\"
'c:\hello)))))\' is not recognized <--- What??

What if we put stuff inside the parens?

cmd /C "c:\hello^(ab^c)\"
'c:\hello(abc)\' is not recognized

Okay so we keep parsing even after seeing a '^' between escaped parens. But not if those parens are escaped.

C:\Users\bevillal>cmd /C "c:\hello(ab^c)\"
'c:\hello' is not recognized

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plot thickens!

Some more test cases I played with during standup.

@marcpopMSFT suggested testing with Program Files(x86), which was a great idea. I created c:/temp/test folder(x86) and... cmd /C "C:\temp\test folder(x86)\a.txt" works.

Here are some other variations:

No space in folder: cmd /C "C:\temp\testfolder(x86)\a.txt": 'C:\temp\testfolder' is not recognized
Space and x86: cmd /C "C:\temp\test folder(x86)\a.txt": Works
Space and no x86: cmd /C "C:\temp\test folder(123)\a.txt": Works

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like everything is escaped if there's a space—we're detecting that separately? If so, and you can find where that's happening, that might be a much easier fix: detect characters other than spaces in addition to spaces.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like everything is escaped if there's a space

If only 😕
cmd /C "C:\temp - Copy\testfolder(x86)\a^.txt" fails
cmd /C "C:\temp - Copy\testfolder(x86)\a(.txt" succeeds
cmd /C "C:\temp - Copy\testfolder(x86)\a(()()$.txt" succeeds.
It definitely breaks on unescaped & and ^

Other fun fact:
Copy pasting a file called a(()()$.txt resulted in a(()$ - Copy.txt being generated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes from PR review, consider the following:

  • Consider just executing the file.
  • Have an optional metadata that either:
    • Is some string containing for specific characters to escape
    • Is some boolean flag that will escape all special characters except the escape character itself (or some variation of that to give a better user experience)
  • Is there some existing library to PInvoke into?
  • Consider what the /S flag does

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of /s is apparently to handle commands that have quotes within them. Something like cmd /s /c ""command1.exe" > "some/path with/spaces"" (notice the quotes around the entire expression). It looks like if we passed /s and surrounded the entire command in an extra set of quotes, it would properly execute the batch file if it had parens, but not improperly escaped paths. So that's a no-go.

I'll implement this by allowing a custom metadata string for "characters to escape". This is the most general solution that allows customers to fix issues with characters in their paths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've implemented the custom metadata escaping, but accounting for items that are "already escaped" is something I'm willing to make an issue for and tackle that when the demand arises. There are too many niche scenarios to handle (see above) for a scenario that is much less likely to occur for customers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still in favor of making Exec just execute the command instead of writing and reading from a batch file, but at this point, it's probably better to wait for rainersigwald to tell us why that's a no-go. Will review this version instead.

@Forgind Forgind removed the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Mar 15, 2021
Base automatically changed from master to main March 15, 2021 20:09
Copy link
Member

@Forgind Forgind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

{
char c = batchFileForCommandLine[i];

if (ShouldEscapeCharacter(c) && (i == 0 || batchFileForCommandLine[i - 1] != '^'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said you wanted to put off fixing edge cases like not escaping a character preceded by an escaped ^, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. at the moment we've only gotten a request for the parentheses case (user's username was something(dev) which seems reasonable). So this fix will account for most other special characters, and I'll make an issue tracking that we don't account for '^''s that are already escaped and the other special cases that arise with it. If it gains traction we can worry about it then. For now this is a "better than it was before" PR.

private bool ShouldEscapeCharacter(char c)
{
// Escape '&' to preserve previous functionality
return c == '&' || (_charactersToEscape != null && _charactersToEscape.Contains(c));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about performance here...this function should be fast, but the batch file could be huge, in which case this would be called an immense number of times. It would be nice to factor out things like the null check to make this particular function call even faster.

If you want it even faster, you could make a bit array representing the characters to escape (i.e., bit x is 1 iff (int)c is in _charactersToEscape) and check that instead of this HashSet. On the other hand, I wouldn't be surprised if there's an optimization to HashSet to do just that, so it would be worth checking whether it makes a difference first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. In theory we could swap the implementation from: "Supply a set of the characters you want escaped" to "Escape all special characters (but if some are already escaped, or you fall into a specific edge case, you're out of luck)".

If we have an array of the characters we know need to be escaped and the user sets "TheMagicFlag", we could iterate through the array and check if ((c | specialChars[i]) == c).

Thoughts? /cc: @ladipro

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some rough testing with stopwatches comparing hashset.contains with a for loop through an array of special characters that checks if ((c | specialChars[i])==c). The loop/array beats out the hashset.contains by several orders of magnitude (in terms of ticks).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I am on a weekend realizing "wait, why not just do a simple comparison?" I was incepted by the bit comment above 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, (c | specialChars[i] == c) would actually be wrong if, for example, specialChars[i] were 0, it would always return true.

@benvillalobos benvillalobos changed the title Allow users that have parentheses in their username to build successfully when using exec Allow users that have certain special characters in their username to build successfully when using exec Apr 5, 2021
@benvillalobos benvillalobos added this to In Progress in 16.10 via automation Apr 5, 2021
@benvillalobos
Copy link
Member Author

benvillalobos commented Apr 5, 2021

Looking for one more signoff (of latest changes). /cc: @ladipro @KirillOsenkov @cdmihai

Previous implementation:

  1. Users set a CharactersToEscape to something like "()&", those characters are escaped.
  2. Implementation is the characters get placed into a hashset that we check contains on

Current implementation:

  1. User sets a EscapeSpecialCharacters boolean on the exec task that escapes specific characters that need to be escaped. The characters that need to be escaped were tested individually and taken from help cmd.
  2. Implementation is an array of characters which is much faster than the hashset implementation.

Nathan asked a perfectly valid question, whether or not we always want to escape these characters. Which...we totally should but this code has been this way for 5+ years. A case can be made for putting this under a changewave and always escaping these characters.

@benvillalobos
Copy link
Member Author

benvillalobos commented Apr 5, 2021

After chatting with marc and nathan a bit. I'm putting this under a changewave rather than an opt-in because there is no case where users having special characters in their username that will result in a successful run of exec. It's under a changewave because we are "changing" how these temp files have been generated for 5+ years.

Default case will be: Escape all special characters (listed under help cmd) that need escaping, except ^, the escape character.

Copy link
Member

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a couple of comments, looks good!

Comment on lines +651 to +659
for (int i = 0; i < _charactersToEscape.Length; i++)
{
if (c == _charactersToEscape[i])
{
return true;
}
}

return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Wondering how IndexOf would compare to the manual loop in terms of perf for such a small array.

Suggested change
for (int i = 0; i < _charactersToEscape.Length; i++)
{
if (c == _charactersToEscape[i])
{
return true;
}
}
return false;
return Array.IndexOf(_charactersToEscape, c) != -1;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some rough testing here. On small cases (~200 iterations) the array performed marginally faster (about ~500 ticks). Array.IndexOf performs better under more iterations. But this was only when run ~10k+ times, which paths aren't likely to get to.

The more realistic case seems to be 150~250 characters, depending on temp and how long someone's username is. Will keep it as a for loop through the array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the "average" case looks something like this C:\\Users\\username\\AppData\\Local\\Temp\\sh4yq5zn.hpc\\Temporary4476f4a8231944f0959781eebe164cdb\\hello()w]o(rld)\\tmpb547f707b5e34e4e9f9b5d49ac496be4.exec.cmd

@@ -54,6 +54,9 @@ public Exec()
private Encoding _standardOutputEncoding;
private string _command;

// '^' before _any_ character escapes that character, don't escape it.
private char[] _charactersToEscape = { '(', ')', '=', ';', '!', ',', '&', ' '};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private static readonly. No need to create an array per Exec instance as it is immutable.


if (ShouldEscapeCharacter(c) && (i == 0 || batchFileForCommandLine[i - 1] != '^'))
{
fileName.Append('^');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If you wanted to squeeze a bit more perf out of this, you could acquire the StringBuilder first time you hit this line. Then in the common case of no escaping, there would be no StringBuilder and you wouldn't allocate a new string (allocation is happening inside StringBuilderCache.GetStringAndRelease).

@Forgind Forgind added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Apr 7, 2021
@Forgind Forgind merged commit bab9b71 into dotnet:main Apr 8, 2021
16.10 automation moved this from In Progress to Done Apr 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changewave16.10 merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now.
Projects
16.10
Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants