Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potentially dropped Client in configure #40

Closed
wants to merge 18 commits into from
Closed

Fix potentially dropped Client in configure #40

wants to merge 18 commits into from

Conversation

NobodyXu
Copy link
Contributor

@NobodyXu NobodyXu commented Jul 1, 2022

Fixed #25

Signed-off-by: Jiahao XU Jiahao_XU@outlook.com

@alexcrichton
Copy link
Member

On second read, I don't know whether this is safe. If this attempts to destroy the Arc<T> closure in the pre_exec closure I don't think that this is safe.

@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 2, 2022

On second read, I don't know whether this is safe. If this attempts to destroy the Arc<T> closure in the pre_exec closure I don't think that this is safe.

Oops, I didn't think of that.

I would probably revert it and fix that with lifetime subtype/variance.

@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 2, 2022

On second read, I don't know whether this is safe. If this attempts to destroy the Arc<T> closure in the pre_exec closure I don't think that this is safe.

I've fixed that by using lifetime subtyping to ensure that &self always outlives &Command.

@alexcrichton
Copy link
Member

I think a better solution might be to use some sort of leaking method in the forked process. I'm wary to change the public API here and I'm not actually sure that this fixes the original issue (while it probably fixes the one specific example I'm not sure that this fixes all usages).

@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 5, 2022

I think a better solution might be to use some sort of leaking method in the forked process. I'm wary to change the public API here and I'm not actually sure that this fixes the original issue (while it probably fixes the one specific example I'm not sure that this fixes all usages).

That sounds like a good idea.

I will do it tomorrow.

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 6, 2022

@alexcrichton I've fixed this by leaking the Arc.

@alexcrichton
Copy link
Member

Reading over this, can you confirm the behavior one way or another? This now actually seems like it'll legitimately leak the memory in the host process.

@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 6, 2022

@alexcrichton According to CommandExt::pre_exec's documentation:

This closure will be run in the context of the child process after a fork. This primarily means that any modifications made to memory on behalf of this closure will not be visible to the parent process.

So this should be fine.

@alexcrichton
Copy link
Member

I'm more worried about the Command in the original parent process where the Arc is leaked I believe.

@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 7, 2022

I'm more worried about the Command in the original parent process where the Arc is leaked I believe.

Thanks for pointing out!

Turns out that I missed that one :D

Since `ManuallyDrop<Arc<Self>>` would also leak the `Arc` in parent,
switch to using `Arc::increment_strong_count` instead.

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 7, 2022

@alexcrichton This should now be fixed as I use Arc::increment_strong_count in the children to make sure that the Arc is leaked, but not in the parent.

@alexcrichton
Copy link
Member

Looks good to me, can you expand in the comment why the arc is leaked though?

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 7, 2022

Looks good to me, can you expand in the comment why the arc is leaked though?

Arc::increment_strong_count increments the strong count of the Arc by one, thus it is leaked unless we pair it with Arc::decrement_strong_count.

@alexcrichton
Copy link
Member

Sorry yes I understand that this is being leaked, what I would like the code to document is why it's being leaked because in the abstract that's a pretty weird thing to do intentionally.

@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 7, 2022

what I would like the code to document is why it's being leaked because in the abstract that's a pretty weird thing to do intentionally.

Hmmm, Arc::increment_strong_count(p.0) is equivalent to the following code:

let arc = mem::ManuallyDrop::new(unsafe { Arc::from_raw(p.0) });
let _arc_clone: mem::ManuallyDrop<_> = arc.clone();

@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 7, 2022

Sorry yes I understand that this is being leaked, what I would like the code to document is why it's being leaked because in the abstract that's a pretty weird thing to do intentionally.

Basically, Arc uses reference counting to keep track of the references and it is dropped when the reference counting reaches 0.

Using Arc::increment_strong_count increment the reference counting by 1 without creating an Arc, thus prevents the reference counting from ever reaching 0 and being dropped.

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 7, 2022

@alexcrichton I've updated the comment in unix::Client::configure.

@alexcrichton
Copy link
Member

Sorry I think more things need to be added here:

  • There should be a test from the original issue to assert that it works.
  • There should ideally be a test that nothing is leaked in the host process. Currently I think a leak happens.
  • It is not necessary to document why incrementing the strong count leaks the Arc. What I think is necessary is to document why anything is being leaked in the first case.
  • Again though I think it would be good to confirm that the child process actually runs the destructor of the Arc. If it doesn't do that then there's no need to deal with leaking anyway.

@NobodyXu
Copy link
Contributor Author

@alexcrichton Got it, I would check the 4th point tomorrow by looking into the std library.

@NobodyXu
Copy link
Contributor Author

@alexcrichton I've checked the source code of Command::do_exec, it does not destructs the pre_exec function we pass to them.

However, the use of pre_exec itself prevents the use of posix_spawn, which can use vfork to avoid duplicating the virtual memory mapping, so I think we should avoid the use of pre_exec by creating the pipe without O_CLOEXEC flag.

@NobodyXu NobodyXu closed this Jul 17, 2022
@NobodyXu NobodyXu deleted the fix/configure branch July 17, 2022 08:36
@NobodyXu NobodyXu restored the fix/configure branch July 17, 2022 11:38
@NobodyXu NobodyXu reopened this Jul 17, 2022
Since `Command` keeps pre exec function alive, we just need to pass in
an `Arc`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
@NobodyXu
Copy link
Contributor Author

@alexcrichton The test client-dropped-before-command failed on windows, however I have no idea how to fix it.
Can we merge this PR and then fix this in another PR?

@alexcrichton
Copy link
Member

No I would prefer to not land a broken test that doesn't succeed on all platforms.

@NobodyXu
Copy link
Contributor Author

No I would prefer to not land a broken test that doesn't succeed on all platforms.

Fair enough.

After looking at the mod windows, I think the only way to fix it is to rework the API.
I would redesign the API then.

@NobodyXu

This comment was marked as outdated.

@NobodyXu
Copy link
Contributor Author

Ignore the previous comment, it was actually wrong.

I mistook wasm.rs for windows.rs...

That takes any type that implements `Command` and a function `f` that
spawns the process.

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
By changing `{windows, wasm, unix}::Client::string_arg` to return
`std::borrow::Cow<'_, str>`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
@NobodyXu
Copy link
Contributor Author

@alexcrichton I have fixed the CI failure, can you review this again please?

The latest commits fixed this by changing configure to configure_and_run, taking Command by value and requires a function f that performs the spawning.

configure_and_run requires user to perform spawning in f to ensure that Client must be alive at when spawning and once the function exits, it will clear any environment variable it sets.

This also avoids the use of std::os::unix::process::CommandExt to register pre_exec, meaning that on unix, Command::spawn can use posix_spawn to efficiently spawn a new process using vfork.

I also added support for tokio::process::Command behind a feature flag tokio so now people using tokio can also use this crate trivally without manually converting between std::process::Command and tokio::process::Command.

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
@alexcrichton
Copy link
Member

Sorry this has grown to the point that I don't really know what's going on internally any more. If it's this difficult to fix the original issue then it may not be worth fixing or otherwise could try to just result in a better error or something like that.

@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 25, 2022

Sorry this has grown to the point that I don't really know what's going on internally any more. If it's this difficult to fix the original issue then it may not be worth fixing or otherwise could try to just result in a better error or something like that.

The original issue is caused by the API itself.

The original API says that we can configure a Command so that it can be spawned at anytime and inherit the jobserver.
This is where everything goes wrong, Client can be dropped before Command is dropped.

Trying to clone and store Client in CommandExt::pre_exec only works on unix and it has a performance cost (it cannot use posix_spawn) and does not work on windows since it does not have pre_exec handle.

Thus I decided to redesign the API to better illustrate that relationship and avoid the performance cost.

In the new API, you have to spawn inside Client::configure_and_spawn and on exit, any environment set by it will also be cleared.

This works for std::process::Command and tokio::process::Command since spawning is synchronous for both of them.

@alexcrichton
Copy link
Member

Yes I understand it's the API itself that's causing the issue. This is so significantly different from before that I don't want to pull it into this crate. If you'd like though you could maintain a separate crate which depends on this one.

@NobodyXu
Copy link
Contributor Author

Yes I understand it's the API itself that's causing the issue. This is so significantly different from before that I don't want to pull it into this crate. If you'd like though you could maintain a separate crate which depends on this one.

Understandable, I might just fork this crate and create jobserver2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Client::configure is unsafe
2 participants