Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process.Unix: handle ETXTBSY error when executing file that was just written by .NET. #59079

Closed
wants to merge 6 commits into from

Conversation

tmds
Copy link
Member

@tmds tmds commented Sep 14, 2021

ETXTBSY means we're trying to execute a file that is open for write.
This may be a file that was just written and closed, but is still referenced by another process that has forked but not yet exec-ed.

When this error occurs, we wait for all processes to finish exec-ing, which causes the file to be closed (CLOEXEC). Then we start the process again.

Fixes #58964.

cc @agocke @jkotas @adamsitnik

…written by .NET.

ETXTBSY means we're trying to execute a file that is open for write.
This may be a file that was just written and closed, but is still referenced by another process
that has forked but not yet exec-ed.

When this error occurs, we wait for all processes to finish exec-ing, which causes the file
to be closed (CLOEXEC). Then we start the process again.
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Sep 14, 2021
@ghost
Copy link

ghost commented Sep 14, 2021

Tagging subscribers to this area: @dotnet/area-system-diagnostics-process
See info in area-owners.md if you want to be subscribed.

Issue Details

ETXTBSY means we're trying to execute a file that is open for write.
This may be a file that was just written and closed, but is still referenced by another process that has forked but not yet exec-ed.

When this error occurs, we wait for all processes to finish exec-ing, which causes the file to be closed (CLOEXEC). Then we start the process again.

Fixes #58964.

cc @agocke @jkotas @adamsitnik

Author: tmds
Assignees: -
Labels:

area-System.Diagnostics.Process

Milestone: -

@tmds
Copy link
Member Author

tmds commented Sep 15, 2021

This helps but it is not a 100% solution.

We wait for all processes that have forked to exec-d.

We consider it exec-ed when we've observed the close-on-exec of a pipe end that was handed to the child:

// Also close the write end of the exec waiting pipe, and wait for the pipe to be closed
// by trying to read from it (the read will wake up when the pipe is closed and broken).
// Ignore any errors... this is a best-effort attempt.
CloseIfOpen(waitForChildToExecPipe[WRITE_END_OF_PIPE]);
if (waitForChildToExecPipe[READ_END_OF_PIPE] != -1)

So we know for sure the pipe end was closed, but this happens as part of a loop in the kernel and the handle we care about may come later in this loop...

If in addition to the changes in this PR we do something that causes the handle to be closed between fork and exec it will work. There have been some requests for control of handle inheritance when starting processes (like #13943).
And Linux and other OSes have support for efficiently closing a range of file descriptors, e.g. close_range.

@agocke @jkotas @adamsitnik should we adopt the changes to Process.Unix.cs and consider it an improvement, though not a complete fix?

@stephentoub
Copy link
Member

This helps but it is not a 100% solution

Isn't this also only helping in the case where the process using the file was created by this parent process? It's using a lock specific to the process to achieve the waiting.

Based on my current understanding, I don't think we should take this. It's introducing non-trivial additional complexity and synchronization and the fundamental issue still exists.

@tmds
Copy link
Member Author

tmds commented Sep 15, 2021

Isn't this also only helping in the case where the process using the file was created by this parent process? It's using a lock specific to the process to achieve the waiting.

Yes, this is for the specific use-case when a .NET application writes an executable file, and then executes it using Process.Start.

A race occurs when another Process.Start forks while the executable is being written, and it has not yet exec-ed.
Because this second process holds the executable file handle open until it execs (CLOEXEC), the first process fails to start with ETXTBSY.

By taking the writer lock we wait until all processes have exec-ed and started to apply CLOEXEC (waitForChildToExecPipe[READ_END_OF_PIPE] was closed).

@stephentoub
Copy link
Member

What happens if the file is written but the app neglects to close it... Process.Start will now hang rather than error?

@tmds
Copy link
Member Author

tmds commented Sep 15, 2021

What happens if the file is written but the app neglects to close it... Process.Start will now hang rather than error?

We handle ETXTBSY once, and second time we throw for it.

@stephentoub
Copy link
Member

Will the test you added succeed 100% of the time?

@tmds
Copy link
Member Author

tmds commented Sep 15, 2021

Will the test you added succeed 100% of the time?

No, I have to remove it.
When making the PR I thought it would pass, because that happened on my machine.
When it failed in CI, I had to figure out why.

I'm ok if we won't merge because it is a niche use-case and not a 100% fix.

I can add a test that ensures Process.Start doesn't hang for a file that remains open.

@tmds
Copy link
Member Author

tmds commented Sep 17, 2021

This is as good as I can make it. It's an improvement, but not a 100% solution.

@stephentoub feel free to merge or close the PR.

@danmoseley
Copy link
Member

@stephentoub does this address your reservations above?

@jkotas
Copy link
Member

jkotas commented Oct 18, 2021

My vote is for closing this PR. I do not think it is worth it given that it is not 100% fix. @tmds Thank you for giving it a shot!

@stephentoub
Copy link
Member

does this address your reservations above?

I very much appreciate the effort and the attempt, but my fundamental concern remains: it adds non-trivial complexity (introducing a retry loop, changing the synchronization scheme to employ an upgradeable read lock and taking the write lock in the middle to coordinate, etc.) and doesn't fully address the issue.

@agocke
Copy link
Member

agocke commented Oct 19, 2021

I ended up adding a retry in the test loop, and that seems to have solved the problem there. Until Linux provides a more robust solution, I think users can address this on an individual basis

@dotnet dotnet locked as resolved and limited conversation to collaborators Nov 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Diagnostics.Process community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Modifying a file and then executing it can fail on Linux
5 participants