InvalidDataException in Process.GetProcesses on Linux #12755

Closed
AndreyAkinshin opened this Issue Oct 18, 2016 · 17 comments

Comments

Projects
None yet
3 participants
@AndreyAkinshin
Member

AndreyAkinshin commented Oct 18, 2016

Consider the following program:

public static void Main(string[] args)
{
    try
    {
        Process.GetProcesses();
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
    }
}

Sometimes on Linux (Ubuntu 16.04), I have the following exception:

$ dotnet run
ystem.IO.InvalidDataException: Found invalid data while decoding.
   at System.IO.StringParser.ParseNextChar()
   at Interop.procfs.TryParseStatFile(String statFilePath, ParsedStat& result, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.CreateProcessInfo(ParsedStat procFsStat, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.CreateProcessInfo(Int32 pid, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.GetProcessInfos(String machineName)
   at System.Diagnostics.Process.GetProcesses(String machineName)
   at System.Diagnostics.Process.GetProcesses()
   at DotNetCoreConsoleApplication.Program.Main(String[] args) in /home/akinshin/RiderProjects/DotNetCoreConsoleApplication/DotNetCoreConsoleApplication/Program.cs:line 12

Repro: the bug can be easily reproduced, if you run Rider before this Program.

The origin of the troubles: I'm working on .NET Core tests support in Rider on Linux. When I try to execute

 /usr/bin/dotnet test --port 36513 --parentProcessId 3624 --no-build --framework net451

I have the following exception:

dotnet-test Error: 0 : System.IO.InvalidDataException: Found invalid data while decoding.
   at System.IO.StringParser.ParseNextChar()
   at Interop.procfs.TryParseStatFile(String statFilePath, ParsedStat& result, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.CreateProcessInfo(ParsedStat procFsStat, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.CreateProcessInfo(Int32 pid, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.GetProcessInfos(String machineName)
   at System.Diagnostics.Process.GetProcesses(String machineName)
   at System.Diagnostics.Process.GetProcesses()
   at Microsoft.DotNet.Tools.Test.TestCommand.RegisterForParentProcessExit(Int32 id)
   at Microsoft.DotNet.Tools.Test.TestCommand.DoRun(String[] args)

Thus, tests could not be started because RegisterForParentProcessExit uses System.Diagnostics.Process.GetProcesses().

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 18, 2016

Member

@AndreyAkinshin, thanks for the report.

When you say Repro: the bug can be easily reproduced, if you run Rider before this Program., I assume you mean running dotnet test while Rider is running (rather than it reproing if you run Rider, close Rider, and then run dotnet test)? If so, can you run Rider, get its process ID, and share the output of cat /proc/12345/stat, where 12345 is whatever Rider's pid was?

Member

stephentoub commented Oct 18, 2016

@AndreyAkinshin, thanks for the report.

When you say Repro: the bug can be easily reproduced, if you run Rider before this Program., I assume you mean running dotnet test while Rider is running (rather than it reproing if you run Rider, close Rider, and then run dotnet test)? If so, can you run Rider, get its process ID, and share the output of cat /proc/12345/stat, where 12345 is whatever Rider's pid was?

@AndreyAkinshin

This comment has been minimized.

Show comment
Hide comment
@AndreyAkinshin

AndreyAkinshin Oct 18, 2016

Member

assume you mean running dotnet test while Rider is running

Yep, sorry for the confusing wording.

If so, can you run Rider, get its process ID, and share the output of cat /proc/12345/stat, where 12345 is whatever Rider's pid was?

Ok, will do it in a few hours.

Member

AndreyAkinshin commented Oct 18, 2016

assume you mean running dotnet test while Rider is running

Yep, sorry for the confusing wording.

If so, can you run Rider, get its process ID, and share the output of cat /proc/12345/stat, where 12345 is whatever Rider's pid was?

Ok, will do it in a few hours.

@karelz karelz added the X-Plat label Oct 18, 2016

@karelz

This comment has been minimized.

Show comment
Hide comment
Member

karelz commented Oct 18, 2016

cc: @Priya91

@AndreyAkinshin

This comment has been minimized.

Show comment
Hide comment
@AndreyAkinshin

AndreyAkinshin Oct 19, 2016

Member
$ /usr/bin/dotnet test --port 34594 --parentProcessId 22708 --no-build --framework net451
dotnet-test Error: 0 : System.IO.InvalidDataException: Found invalid data while decoding.
   at System.IO.StringParser.ParseNextChar()
   at Interop.procfs.TryParseStatFile(String statFilePath, ParsedStat& result, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.CreateProcessInfo(ParsedStat procFsStat, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.CreateProcessInfo(Int32 pid, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.GetProcessInfos(String machineName)
   at System.Diagnostics.Process.GetProcesses(String machineName)
   at System.Diagnostics.Process.GetProcesses()
   at Microsoft.DotNet.Tools.Test.TestCommand.RegisterForParentProcessExit(Int32 id)
   at Microsoft.DotNet.Tools.Test.TestCommand.DoRun(String[] args)
$ cat /proc/22708/stat
22708 (Main) S 22671 22613 1811 34822 22613 1077936128 299586 56849 0 0 4631 2247 212 202 20 0 46 0 6052771 3812831232 216927 18446744073709551615 4194304 7811692 140732389738864 140732389730608 140218164806560 0 4 4096 545326316 0 0 0 17 3 0 0 0 0 0 9908848 9950248 10424320 140732389744066 140732389744248 140732389744248 140732389748649 0
Member

AndreyAkinshin commented Oct 19, 2016

$ /usr/bin/dotnet test --port 34594 --parentProcessId 22708 --no-build --framework net451
dotnet-test Error: 0 : System.IO.InvalidDataException: Found invalid data while decoding.
   at System.IO.StringParser.ParseNextChar()
   at Interop.procfs.TryParseStatFile(String statFilePath, ParsedStat& result, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.CreateProcessInfo(ParsedStat procFsStat, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.CreateProcessInfo(Int32 pid, ReusableTextReader reusableReader)
   at System.Diagnostics.ProcessManager.GetProcessInfos(String machineName)
   at System.Diagnostics.Process.GetProcesses(String machineName)
   at System.Diagnostics.Process.GetProcesses()
   at Microsoft.DotNet.Tools.Test.TestCommand.RegisterForParentProcessExit(Int32 id)
   at Microsoft.DotNet.Tools.Test.TestCommand.DoRun(String[] args)
$ cat /proc/22708/stat
22708 (Main) S 22671 22613 1811 34822 22613 1077936128 299586 56849 0 0 4631 2247 212 202 20 0 46 0 6052771 3812831232 216927 18446744073709551615 4194304 7811692 140732389738864 140732389730608 140218164806560 0 4 4096 545326316 0 0 0 17 3 0 0 0 0 0 9908848 9950248 10424320 140732389744066 140732389744248 140732389744248 140732389748649 0
@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 19, 2016

Member

Thanks, @AndreyAkinshin. That stat file looks fine and parses ok. Does Rider launch any other processes?

Member

stephentoub commented Oct 19, 2016

Thanks, @AndreyAkinshin. That stat file looks fine and parses ok. Does Rider launch any other processes?

@AndreyAkinshin

This comment has been minimized.

Show comment
Hide comment
@AndreyAkinshin

AndreyAkinshin Oct 19, 2016

Member

Yes, Rider consists of several processes. E.g., we also have a java process:

$ ps -aux | grep java
akinshin 22671  4.6  7.4 5673592 628904 pts/6  Sl+  02:42  13:52 ./../jre/jre/bin/java -agentlib:yjpagent-linux64=disablealloc,delay=10000,sessionname=Rider10 -Xbootclasspath/a:./../lib/boot.jar -classpath ./../lib/bootstrap.jar:./../lib/extensions.jar:./../lib/util.jar:./../lib/jdom.jar:./../lib/log4j.jar:./../lib/trove4j.jar:./../lib/jna.jar -Xms128m -Xmx750m -XX:ReservedCodeCacheSize=240m -XX:+UseConcMarkSweepGC -XX:SoftRefLRUPolicyMSPerMB=50 -ea -Dsun.io.useCanonCaches=false -Djava.net.preferIPv4Stack=true -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -XX:MaxJavaStackTraceDepth=-1 -Dawt.useSystemAAFontSettings=lcd -Djb.vmOptionsFile=./rider64.vmoptions -XX:ErrorFile=/home/akinshin/java_error_in_RS_%p.log -XX:HeapDumpPath=/home/akinshin/java_error_in_RS.hprof -Djb.restart.code=88 -Didea.paths.selector=Rider10 -Didea.platform.prefix=Rider -Didea.no.jre.check=true com.intellij.idea.Main
$ cat /proc/22671/stat
22671 (java) S 22613 22613 1811 34822 22613 1077936128 152695 5119 0 0 62954 20508 0 0 20 0 46 0 6052367 5809758208 157226 18446744073709551615 4194304 4196492 140722457690368 140722457672944 140421426043341 0 0 4096 16796879 0 0 0 17 1 0 0 0 0 0 6293648 6294284 26882048 140722457694087 140722457694940 140722457694940 140722457698274 0
Member

AndreyAkinshin commented Oct 19, 2016

Yes, Rider consists of several processes. E.g., we also have a java process:

$ ps -aux | grep java
akinshin 22671  4.6  7.4 5673592 628904 pts/6  Sl+  02:42  13:52 ./../jre/jre/bin/java -agentlib:yjpagent-linux64=disablealloc,delay=10000,sessionname=Rider10 -Xbootclasspath/a:./../lib/boot.jar -classpath ./../lib/bootstrap.jar:./../lib/extensions.jar:./../lib/util.jar:./../lib/jdom.jar:./../lib/log4j.jar:./../lib/trove4j.jar:./../lib/jna.jar -Xms128m -Xmx750m -XX:ReservedCodeCacheSize=240m -XX:+UseConcMarkSweepGC -XX:SoftRefLRUPolicyMSPerMB=50 -ea -Dsun.io.useCanonCaches=false -Djava.net.preferIPv4Stack=true -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -XX:MaxJavaStackTraceDepth=-1 -Dawt.useSystemAAFontSettings=lcd -Djb.vmOptionsFile=./rider64.vmoptions -XX:ErrorFile=/home/akinshin/java_error_in_RS_%p.log -XX:HeapDumpPath=/home/akinshin/java_error_in_RS.hprof -Djb.restart.code=88 -Didea.paths.selector=Rider10 -Didea.platform.prefix=Rider -Didea.no.jre.check=true com.intellij.idea.Main
$ cat /proc/22671/stat
22671 (java) S 22613 22613 1811 34822 22613 1077936128 152695 5119 0 0 62954 20508 0 0 20 0 46 0 6052367 5809758208 157226 18446744073709551615 4194304 4196492 140722457690368 140722457672944 140421426043341 0 0 4096 16796879 0 0 0 17 1 0 0 0 0 0 6293648 6294284 26882048 140722457694087 140722457694940 140722457694940 140722457698274 0
@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 19, 2016

Member

Hmm, that one parses ok as well.

Based on the stack, there's some file /proc/N/stat (for some number N) that has a format we're not expecting, and in particular something isn't parsing correctly related to the third field (the letter 'S' in both of your examples), e.g. the input is ending after the name, or we're somehow parsing the name incorrectly, or something like that. I've not seen this happen before, so there's something about Rider (assuming everything works fine when you haven't run it) and the processes it's creating that's causing the parser to fail.

I assume that while Rider is running, every time you run dotnet test it fails, not just the first time?

Tall order, I know, but would it be possible for you to send me all of the /proc/N/stat files while in the problematic state?

Member

stephentoub commented Oct 19, 2016

Hmm, that one parses ok as well.

Based on the stack, there's some file /proc/N/stat (for some number N) that has a format we're not expecting, and in particular something isn't parsing correctly related to the third field (the letter 'S' in both of your examples), e.g. the input is ending after the name, or we're somehow parsing the name incorrectly, or something like that. I've not seen this happen before, so there's something about Rider (assuming everything works fine when you haven't run it) and the processes it's creating that's causing the parser to fail.

I assume that while Rider is running, every time you run dotnet test it fails, not just the first time?

Tall order, I know, but would it be possible for you to send me all of the /proc/N/stat files while in the problematic state?

@AndreyAkinshin

This comment has been minimized.

Show comment
Hide comment
@AndreyAkinshin

AndreyAkinshin Oct 19, 2016

Member

I assume that while Rider is running, every time you run dotnet test it fails, not just the first time?

Yes

Tall order, I know, but would it be possible for you to send me all of the /proc/N/stat files while in the problematic state?

No problem. Which command I should run?

Member

AndreyAkinshin commented Oct 19, 2016

I assume that while Rider is running, every time you run dotnet test it fails, not just the first time?

Yes

Tall order, I know, but would it be possible for you to send me all of the /proc/N/stat files while in the problematic state?

No problem. Which command I should run?

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 19, 2016

Member

No problem. Which command I should run?

Could you try this?

zip statfiles.zip `sudo find /proc -wholename "/proc/[0-9]*/stat" -maxdepth 2`
Member

stephentoub commented Oct 19, 2016

No problem. Which command I should run?

Could you try this?

zip statfiles.zip `sudo find /proc -wholename "/proc/[0-9]*/stat" -maxdepth 2`
@AndreyAkinshin

This comment has been minimized.

Show comment
Hide comment
@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 19, 2016

Member

Thanks, @AndreyAkinshin. I don't know what's going on, but all of these files parse fine; I don't get the seen exception on any of them.

If I were to give you a commit for an update to System.Diagnostics.Process.dll (or even a build of the dll if that would help), would you be able to try it? I can add some instrumentation to help us understand where things are going wrong.

Member

stephentoub commented Oct 19, 2016

Thanks, @AndreyAkinshin. I don't know what's going on, but all of these files parse fine; I don't get the seen exception on any of them.

If I were to give you a commit for an update to System.Diagnostics.Process.dll (or even a build of the dll if that would help), would you be able to try it? I can add some instrumentation to help us understand where things are going wrong.

@AndreyAkinshin

This comment has been minimized.

Show comment
Hide comment
@AndreyAkinshin

AndreyAkinshin Oct 19, 2016

Member

Ok, no problem.
By the way, you could download Rider and try it on you own Linux.

Member

AndreyAkinshin commented Oct 19, 2016

Ok, no problem.
By the way, you could download Rider and try it on you own Linux.

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 19, 2016

Member

By the way, you could download Rider and try it on you own Linux.

Oh, from the page you linked to, it sounded like builds weren't available yet.

Member

stephentoub commented Oct 19, 2016

By the way, you could download Rider and try it on you own Linux.

Oh, from the page you linked to, it sounded like builds weren't available yet.

@AndreyAkinshin

This comment has been minimized.

Show comment
Hide comment
@AndreyAkinshin

AndreyAkinshin Oct 19, 2016

Member

You could fill the "Subscribe to receive early builds" form (only Name and e-mail are required) and you will get the direct link for downloading. Or I could send the link for you personally (e.g., via e-mail).

Member

AndreyAkinshin commented Oct 19, 2016

You could fill the "Subscribe to receive early builds" form (only Name and e-mail are required) and you will get the direct link for downloading. Or I could send the link for you personally (e.g., via e-mail).

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 19, 2016

Member

Thanks, @AndreyAkinshin. I installed Rider and found the problem. The JetBrains.ReSharper.Host.exe process has a thread in it with a name that includes spaces: "JetPool (S) Reg". When we're parsing the processes' task list looking for its threads, we parse the stat file for each task, and in doing so we misinterpret the space in the name as a space separator for the other items in the line. The fix in System.Diagnostics.Process is likely to track the parens and ensure we treat the whole parenthesized unit as the name. In the meantime, as a workaround, if you have control over the thread/task's name (something somewhere is probably calling a function like pthread_setname_np, or using prctl with PR_SET_NAME), you could try using a name that doesn't include spaces.

Member

stephentoub commented Oct 19, 2016

Thanks, @AndreyAkinshin. I installed Rider and found the problem. The JetBrains.ReSharper.Host.exe process has a thread in it with a name that includes spaces: "JetPool (S) Reg". When we're parsing the processes' task list looking for its threads, we parse the stat file for each task, and in doing so we misinterpret the space in the name as a space separator for the other items in the line. The fix in System.Diagnostics.Process is likely to track the parens and ensure we treat the whole parenthesized unit as the name. In the meantime, as a workaround, if you have control over the thread/task's name (something somewhere is probably calling a function like pthread_setname_np, or using prctl with PR_SET_NAME), you could try using a name that doesn't include spaces.

@AndreyAkinshin

This comment has been minimized.

Show comment
Hide comment
@AndreyAkinshin

AndreyAkinshin Oct 19, 2016

Member

Ok, thanks. Could I expect a bugfix in the next version of coreclr?

Member

AndreyAkinshin commented Oct 19, 2016

Ok, thanks. Could I expect a bugfix in the next version of coreclr?

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Oct 19, 2016

Member

Could I expect a bugfix in the next version of coreclr?

I'm hoping the fix (#12791) will be in the 1.1 release.

In the meantime, I've also opened dotnet/cli#4452 to look at adding a workaround to dotnet test.

Thanks for your help in tracking this down. Apparently spaces in process/thread names on Unix are relatively rare 😄

Member

stephentoub commented Oct 19, 2016

Could I expect a bugfix in the next version of coreclr?

I'm hoping the fix (#12791) will be in the 1.1 release.

In the meantime, I've also opened dotnet/cli#4452 to look at adding a workaround to dotnet test.

Thanks for your help in tracking this down. Apparently spaces in process/thread names on Unix are relatively rare 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment