Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address Dangerous $using: in Foreach-Object -Parallel #10499

Closed
pearsonsjp opened this issue Sep 8, 2019 · 15 comments
Closed

Address Dangerous $using: in Foreach-Object -Parallel #10499

pearsonsjp opened this issue Sep 8, 2019 · 15 comments
Labels
Committee-Reviewed PS-Committee has reviewed this and made a decision Issue-Enhancement the issue is more of a feature request than a bug WG-Cmdlets-Core cmdlets in the Microsoft.PowerShell.Core module

Comments

@pearsonsjp
Copy link

According to the documentation, Foreach -Parallel passes variables by reference with $using: while the rest of PowerShell passes variables by value.
This is a dangerous feature, having the exact same syntax for such different mechanics can cause unintended effects, and could even slip by experienced developers.

If these values are to pass by reference, then I suggest the syntax be changed to $ref:
That will force people to do it with intention, and know exactly what they're looking at at all times. This would be similar to how it is done in C#.

@kilasuit
Copy link
Collaborator

kilasuit commented Sep 8, 2019

This is a feature that was brought in the early days of PowerShell to be able to pass to Jobs and remote commands as listed and is documented in https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_remote_variables?view=powershell-6#using-local-variables and is required for the Foreach -Parallel to function at all.

I'm not sure I can agree with you that it's either dangerous or would slip past experienced developers plus PowerShell isn't C# so those coming from C# may find they get tripped up by this but those that are experienced PowerShell Developers will know this and have used it excessively in the past

@pearsonsjp pearsonsjp changed the title Address Dangerous $using: in Foreach -Parallel Address Dangerous $using: in Foreach-Object -Parallel Sep 8, 2019
@pearsonsjp
Copy link
Author

I was referring to Foreach-Object (Title updated) which nobody has used excessively in the past, because it is a new feature. It does not behave the same as previous iterations of $using because in the past it has always passed by value not by reference
This is a fundamental change in how it works.

@BrucePay
Copy link
Collaborator

BrucePay commented Sep 9, 2019

@pearsonsjp

while the rest of PowerShell passes variables by value.

This is not actually correct. In fact, PowerShell always passes by reference both with reference types and boxed value types.

With regards to using $using: to pass references - yes you can get into trouble but the performance
cost of the alternative is high. In PowerShell WorkFlow, which supported foreach -parallel, we marshalled objects across runspaces. It was abysmally slow. To get any kind of acceptable performance, marshalling had to be turned off.

@iSazonov
Copy link
Collaborator

/cc @PaulHigin for information.

@iSazonov iSazonov added WG-Cmdlets-Core cmdlets in the Microsoft.PowerShell.Core module Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a labels Sep 10, 2019
@PaulHigin
Copy link
Contributor

PaulHigin commented Sep 10, 2019

Just to be clear, it is the ForEach-Object -Parallel parameter set that is new and the behavior of the existing ForEach-Object parameter sets has not changed. This new parameter set implements parallelism and really cannot be expected to work the same as the existing serial processing.

Much of these concerns have already been debated in the RFC for this feature (PowerShell/PowerShell-RFC#194).

It was felt that adding this new -Parallel parameter set was the best way to provide parallelism at the cmdlet level. It can be dangerous if used incorrectly. But then there are a lot of ways to hurt yourself with PowerShell. Rather than diluting the feature I feel it is better make users aware so that they can use it safely and effectively.

Official documentation is coming, but in the mean time I have written a blog to describe and clarify this new feature:
https://devblogs.microsoft.com/powershell/powershell-foreach-object-parallel-feature/

PowerShell
PowerShell ForEach-Object Parallel Feature PowerShell 7.0 Preview 3 is now available with a new ForEach-Object Parallel Experimental feature. This feature is a great new tool for parallelizing work, but like any tool, it has its uses and drawbacks. This article describes this new feature,

@pearsonsjp
Copy link
Author

This blog is actually where I got my information:

" But there is a big difference when using the $using: keyword in ForEach-Object -Parallel. And that is for remoting, the variable being passed is a copy sent over the remoting connection. But with ForEach-Object -Parallel, the actual object reference is being passed from one script to another, violating normal isolation restrictions. So it is possible to have a non thread-safe variable used in two scripts running on different threads, which can lead to unpredictable behavior." <=- That is passing by value, not reference.

This is why I made the suggestion to use $ref: in this feature rather than $using.

I agree that there are a lot of ways to hurt yourself with PowerShell and that education is good; this just seemed like asking for disaster. But it sounds like this has already been discussed at length.

@vexx32
Copy link
Collaborator

vexx32 commented Sep 10, 2019

@PaulHigin I don't know if I can clarify this better than @pearsonsjp already has, but it seems like you might be misunderstanding the issue?

In Invoke-Command, the $using: variables are values copied to the remote machine. No data is exchanged between individual target machines.

In the new ForEach-Object -Parallel, $using: is (as you point out in the article) not thread-safe, as the actual reference to the data can and is shared between threads.

I think what @pearsonsjp is getting at is that existing expectations of how $using: behaves in Invoke-Command will almost certainly carry over to ForEach-Object -Parallel, creating confusion. Their proposed solution is to distinguish the difference in syntax.

I agree that if the behaviour remains as it currently stands, we should probably attempt to distance the similarity from Invoke-Command so as to conform to existing expectations of how $using: variables behave. I think the proposed $ref:variable syntax are a reasonable compromise, as it both communicates "this is kinda-similar to $using:variable" but also "this is fundamentally different from how $using:variable operates" clearly and succinctly.

@Jaykul is often one to say we should attempt to engineer the language towards creating a "pit of success" (rather than "pitfalls" to be anxiously skirted around). I think this is one of those cases.

@pearsonsjp
Copy link
Author

Yes! Very much this.
Thank you vexx for putting my thoughts into words so eloquently.

@PaulHigin
Copy link
Contributor

Oh ok, I get it now. Yes I agree that having a $ref keyword instead of $using is a very reasonable request, and would make the issues of passing by reference more apparent.

@iSazonov iSazonov added Issue-Enhancement the issue is more of a feature request than a bug and removed Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a labels Sep 11, 2019
@SteveL-MSFT SteveL-MSFT added the Review - Committee The PR/Issue needs a review from the PowerShell Committee label Oct 21, 2019
@SteveL-MSFT
Copy link
Member

@PowerShell/powershell-committee discussed this.

PSWorkFlow had already gone through this discussion where the semantics were different depending on whether it was in-proc or out-of-proc but the decision was to use $using: everywhere for consistency and not having to learn specific syntax based on situation even if the semantics were different. Introducing $ref: would be unique here making it more difficult to discover.

$using: means use this variable from caller space into new execution context. Expectation is that documentation for specific cmdlets supporting $using: clarify the semantics. So the PS-Committee decision is to stay with $using:.

@SteveL-MSFT SteveL-MSFT added Committee-Reviewed PS-Committee has reviewed this and made a decision and removed Review - Committee The PR/Issue needs a review from the PowerShell Committee labels Oct 23, 2019
@vexx32
Copy link
Collaborator

vexx32 commented Oct 23, 2019

Hmm. I can't say I think the comparison to workflow is particularly relevant. We're talking very distinct cmdlets and a specialised parameter set here (ForEach-Object doesn't have a $using: in its normal parameter sets, for example).

Personally I think it valuable to make a clear distinction in syntax, if only to be very clear about the potential pitfalls. If the PS team prefer to handle that burden in documentation, I suppose that's a viable option, though I feel it likely to increase misunderstandings.

If we're worried about discoverability... Well, given its constrained usage, would't it be fairly easy to just document it in the cmdlets' help documentation, and then perhaps an about topic that references $using for comparison's sake?

That said... Yeah, I guess in some ways it's "easier" to use the same naming system. I don't think it's a great way to go, though, since the features do behave significantly differently.

Just my 2 cents! 💖

@SteveL-MSFT
Copy link
Member

The PSWorkFlow mention is that this whole discussion has already happened before within the team when it was developing PSWorkFlow. Although PowerShellCore/7 doesn't include PSWorkFlow, some existing PSWorkFlow users may recall $using: there and had to keep the semantics in mind depending on how they used PSWorkFlow. So that was a precedent already set. I think they key thing is that $using: was never meant to mean a copy, it was literally just "using this variable from caller context into a new context". Introducing a new prefix adds more confusion (in our opinion) and makes it less discoverable as $using: is a known concept.

@vexx32
Copy link
Collaborator

vexx32 commented Oct 24, 2019

Aye. And while I understand and accept that this is probably better for folks who already have used $using: variables extensively, I can't help but think it's yet another thing where users have to memorise that now we have two very clearly distinct types of $using:, rather than being able to tell the two kinds at a glance while skimming through code. 🙂

I guess I feel like this is a good opportunity to make a clean break from the old behaviour to clarify the purpose of the variable usage, and would help new users / those not as familiar with $using: from the PSWorkflow days to more easily know the difference.

With $using: sort of pulling double duty as it were, it's much easier to draw the wrong conclusion from reading code. Much as I love documentation, I am painfully aware that there will always be a significant portion of the userbase that doesn't read the docs extensively. I'm constantly pointing people in the community to bits of the documentation who've been around for several years, and never saw or bothered to read more than one or two about_* help topics, if that. For those folks, the fact that $using: can behave quite unexpectedly in Start-ThreadJob or ForEach-Object -Parallel compared to, say Invoke-Command, which a great many users are familiar with.

The familiarity of $using: could easily be a double-edged sword for us here, I think. But I've said my piece; perhaps I'm worrying over nothing. 😁

@pearsonsjp
Copy link
Author

pearsonsjp commented Oct 24, 2019 via email

@iSazonov
Copy link
Collaborator

The conclusion should be documented if not done yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Committee-Reviewed PS-Committee has reviewed this and made a decision Issue-Enhancement the issue is more of a feature request than a bug WG-Cmdlets-Core cmdlets in the Microsoft.PowerShell.Core module
Projects
None yet
Development

No branches or pull requests

7 participants