Add RFC for foreach-parallel feature #174

Closed · wants to merge 1 commit
---
RFC: RFCnnnn
Author: Paul Higinbotham
Status: Draft
SupercededBy: N/A
Version: 1.0
Area: Engine
Comments Due: July 1, 2019
Plan to implement: Yes
---

# Implement PowerShell language foreach -parallel

Windows PowerShell currently supports the `foreach` language keyword with the `-parallel` switch, but only for workflow scripts.

```powershell
workflow wf1 {
    $list = 1..5
    foreach -parallel -throttlelimit 5 ($item in $list) {
        Start-Sleep -Seconds 1
        Write-Output "Output $item"
    }
}
```

This will run the script block with each value in the `$list` array, in parallel using workflow jobs.
However, workflow is not supported in PowerShell Core 6, partly because it is a Windows-only solution, but also because it is cumbersome to use.
In addition, the workflow implementation is very heavyweight, consuming significant system resources.

This is a proposal to re-implement `foreach -parallel` in PowerShell Core, using PowerShell's support for concurrency via Runspaces.
It is similar to the [ThreadJob module](https://www.powershellgallery.com/packages/ThreadJob/1.1.2) except that it becomes part of the PowerShell language via `foreach -parallel`.
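The ThreadJob comparison can be made concrete. The following is a hedged sketch, not part of the RFC: it shows roughly equivalent fan-out using the existing `Start-ThreadJob` cmdlet from the ThreadJob module mentioned above.

```powershell
# Fan out one thread job per item, throttled to 5 concurrent threads,
# then wait for all of them and collect the output.
$jobs = foreach ($item in 1..5) {
    Start-ThreadJob -ThrottleLimit 5 -ArgumentList $item -ScriptBlock {
        param($i)
        Start-Sleep -Seconds 1
        "Output $i"
    }
}
$jobs | Wait-Job | Receive-Job
```

The proposal would fold this pattern into the language itself, removing the need to install and load the module.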

**@swngdnz** (May 14, 2019):

Regardless of how implemented (as a cmdlet or a foreach extension), I seriously hope you will make this compatible with psexec. When I filed an issue against threadjob noting that it broke psexec, psexec was blamed. I doubt highly that sysinternals/Mark Russinovich would agree with that.

## Motivation

As a PowerShell User,
**@rjmholt** (Member, May 14, 2019):

This might be worth turning into a quote rather than a pre block. Suggested change:

> As a PowerShell User,
> I can do simple fan-out concurrency from within the language,
> without having to obtain and load a separate module or deal with PowerShell jobs.

**@Jaykul** (Contributor, May 14, 2019):

Code-fenced gherkin 😉

I can do simple fan-out concurrency from within the language, without having to obtain and load a separate module or deal with PowerShell jobs.

## Specification

The PowerShell `foreach -parallel` language keyword will be re-implemented to invoke script blocks in parallel, similar to how it works for workflow functions, except that script blocks will be invoked on threads within the same process rather than in workflow jobs running in separate processes.
The default behavior is to fan-out script block execution to multiple threads, and then wait for all threads to finish.
However, a `-asjob` switch will also be supported that returns a PowerShell job object for asynchronous use.
If the number of foreach iterations exceeds the throttle limit value, then only the throttle-limit number of threads are created at a time, and the rest are queued until a running thread becomes available.
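A minimal sketch of the kind of throttled fan-out described above, using the existing runspace-pool API (an illustration of the mechanism only; the RFC does not prescribe this implementation):

```powershell
# A runspace pool caps how many script blocks run concurrently;
# excess work queues inside the pool until a runspace frees up.
$pool = [runspacefactory]::CreateRunspacePool(1, 5)   # throttle limit of 5
$pool.Open()

$work = foreach ($item in 1..20) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript({ param($i) Start-Sleep -Seconds 1; "Output $i" }).AddArgument($item)
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

foreach ($w in $work) {
    $w.Shell.EndInvoke($w.Handle)   # blocks until that iteration completes
    $w.Shell.Dispose()
}
$pool.Dispose()
```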
**@Jaykul** (Contributor, May 14, 2019):

Is there an implicit throttle limit, or does this only happen explicitly?

**@PaulHigin** (Author, May 14, 2019):

I was thinking of having a default throttle limit to prevent inadvertently overwhelming the system. But it is difficult to know what value to use (10?). I am open to suggestions.

**@fMichaleczek** (May 14, 2019):

The default throttle limit should be based on the number of logical processors:

```powershell
# Windows
Get-CimInstance -Class Win32_Processor | Select-Object NumberOfLogicalProcessors
```

```sh
# Linux
grep -c ^processor /proc/cpuinfo
# macOS
sysctl -n hw.logicalcpu
```
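A cross-platform alternative (an addition not in the original thread, assuming PowerShell Core's underlying .NET runtime): `[Environment]::ProcessorCount` returns the logical processor count on all three platforms and could seed the default.

```powershell
# One call instead of three per-OS commands
$defaultThrottleLimit = [Environment]::ProcessorCount
```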

### Supported foreach parameters

- `-parallel`
- `-throttlelimit`
**@rjmholt** (Member, May 14, 2019):

I know this is implicit, but there's a conceptual "parameter set" around `-Parallel` whereby the other parameters only make sense with it. Maybe that's worth specifying in this RFC.

**@PaulHigin** (Author, May 14, 2019):

Yes, `-timeout` and `-asjob` are mutually exclusive. Checks will need to be implemented in the parser.

- `-timeout`
- `-asjob`

### P0 Features

- `foreach -parallel` fans out script block execution to threads, binding a single foreach iteration value to each invocation

- `-throttlelimit` parameter value specifies the maximum number of threads that can run at one time

- `-timeout` parameter value specifies a maximum time to wait for all iterations to complete, after which 'stop' will be called on all running script blocks to terminate execution

- `-asjob` switch causes foreach to return a PowerShell job object that is used to asynchronously monitor execution

- When a job object is returned, it will be compatible with all relevant job cmdlets
**@rjmholt** (Member, May 14, 2019):

This line is worth clarifying, I think; does this mean a `PSRemotingJob` is guaranteed to be returned, or some level of API compatibility, or only that cmdlets will work but no guarantees are made for other APIs? If the latter, I think it's worth explicitly calling out that "this isn't going to be like normal jobs and you aren't guaranteed all the APIs that you get with `Start-Job`".

**@Jaykul** (Contributor, May 14, 2019):

Jobs are a known API extension point. I think this is clear enough.

**@PaulHigin** (Author, May 14, 2019):

I can list the expected job cmdlets this job should be compatible with. Probably:
`Debug-Job`, `Get-Job`, `Receive-Job`, `Remove-Job`, `Stop-Job`, `Wait-Job`

**@rjmholt** (Member, May 14, 2019):

It's more that I want to clamp down on the language; my litigious side reads "it will be compatible with all relevant job cmdlets" and squints. But I am happy if others are.


- All script blocks running in parallel will run isolated from each other.
**@rjmholt** (Member, May 14, 2019):

I think it's worth going into detail about what kind of isolation this will provide:

- Shared memory/.NET API isolation?
- Standard runspace isolation (where modules and variables are not shared)

It might also be useful to mention what is shared, such as whether certain kinds of variables will be, or whether this cannot guarantee .NET thread safety.

**@PaulHigin** (Author, May 14, 2019):

I'll clarify that it is standard PowerShell runspace isolation. The concept of PowerShell runspaces is outside the scope of this RFC.

Only foreach iteration objects will be passed to the parallel script block.

### Data stream handling

`foreach -parallel` will use normal PowerShell pipes to return various data streams.
Data will be returned in the order received, except when the `-asjob` switch is used, in which case a single job object is returned.
The returned job object will contain an array of child jobs that represent each iteration of the foreach.
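Inspecting those child jobs might look like the following sketch. It uses the proposed (not yet implemented) syntax, and `Get-Logs` is a placeholder command, as in the RFC's own examples.

```powershell
$job = foreach -parallel -asjob ($computer in $computerNames)
{
    Get-Logs -ComputerName $computer
}
$job.ChildJobs | Format-Table Id, Name, State   # one child job per iteration
$logs = $job | Wait-Job | Receive-Job           # results from all iterations
```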
**@johlju** (May 14, 2019):

> The returned job object will contain an array of child jobs that represent each iteration of the foreach.

So when used together with `-throttlelimit`, this will return an array of all child jobs; if the number of child jobs exceeds the throttle limit value, will those have status 'Waiting'? In relation to the 'Specification' section too.

**@PaulHigin** (Author, May 14, 2019):

Yes, I anticipate creating a single child job for each loop iteration when using the `-asjob` switch. The state of the job will be `NotStarted` when queued due to the throttle limit. This could become a resource problem if a foreach loop is created with 10K(!) iterations. The implementation should use lazy initialization, and dispose of resources at the earliest opportunity, to minimize resource usage.


### Examples

**@rjmholt** (Member, May 14, 2019):

If we have a way to partition work relatively easily, I think that might be a worthwhile example.

```powershell
$computerNames = 'computer1','computer2','computer3','computer4','computer5'
$logs = foreach -parallel -throttle 10 -timeout 300 ($computer in $computerNames)
{
    Get-Logs -ComputerName $computer
}
```

```powershell
$computerNames = 'computer1','computer2','computer3','computer4','computer5'
$job = foreach -parallel -asjob ($computer in $computerNames)
{
    Get-Logs -ComputerName $computer
}
$logs = $job | Wait-Job | Receive-Job
```

```powershell
$params = @(
    @{ argTitle = "Title1"; argValue = 102 }
)
foreach -parallel ($param in $params)
{
    c:\scripts\ToRun.ps1 @param
}
```

## Alternate Proposals and Considerations

One alternative is to create a `ForEach-Parallel` cmdlet instead of re-implementing the `foreach -parallel` keyword.
This would work well but would not be as useful as making it part of the PowerShell language.

**@rjmholt** (Member, May 14, 2019):

I think this is maybe worth discussing more, just because the drawbacks aren't as apparent to me. What's the key limitation here?

The drawbacks I see with `foreach () -Parallel` are:

- New syntax means scripts are syntactically backwards incompatible -- they cannot even be successfully parsed by an older PowerShell version. Compare with:

  ```powershell
  $foreachParams = @{}
  if ($PSVersionTable.PSVersion.Major -ge 7) { $foreachParams += @{ Parallel = $true } }
  $workItems | ForEach-Object @foreachParams { Invoke-Processing $_ }
  ```

- Assigning from a foreach loop seems like a relatively unintuitive construction and a bit syntactically off. I know it's already functionality we support, and I think it makes sense in the language, but minting it as the primary syntax for parallelism seems to run a bit against the natural style of PowerShell to me.

**@KevinMarquette** (May 14, 2019):

I don't believe that ForEach-Object was in scope of this RFC. If we get this implemented, I can see that becoming part of the conversation.

I also don't believe that we would be able to splat to the foreach operator.

**@vexx32** (Contributor, May 14, 2019):

Maybe not, but I agree with @rjmholt here; it makes more syntactic sense to put this into the cmdlet instead. `ForEach-Object -Parallel -AsJob {}` makes a LOT more sense than `foreach -Parallel -AsJob ($a in $b) {}` visually, and with only a handful of exceptions the majority of language keywords don't have parameters like that.

Additionally, having this available for pipeline cmdlets I think would be significantly more valuable than just a foreach loop, which can't be used in a pipeline context.

**@fMichaleczek** (May 14, 2019):

I agree with @vexx32, and I think it's bad practice to modify `foreach` in this way.
I'm against a "foreach vs. ForEach-Object vs. magic foreach vs. ForEach-Parallel" situation.
I vote for `ForEach-Object -Parallel -AsJob {}`.

**@PaulHigin** (Author, May 14, 2019):

I am perfectly fine implementing this as a cmdlet rather than a foreach language keyword extension. In fact it is much easier to implement as a cmdlet. If the community prefers a cmdlet (as it seems from these comments) then I am happy to update this RFC accordingly. But I'll let the PowerShell committee weigh in as well.

**@johlju** (May 14, 2019):

How would `$objects | ForEach-Object -Parallel -AsJob {} | Select-Object ...` work? If this were added to a cmdlet, then I would expect this to work. I vote for this being added to the `foreach` statement. Then if someone needs a cmdlet, they could build a cmdlet from that?

**@fMichaleczek** (May 14, 2019):

@johlju I don't see an issue with your command.
I don't find a 'keyword' category in `Get-Help`, though. How do we handle that? Discoverability is the most important thing for end users.

**@SteveL-MSFT** (Member, May 14, 2019):

@johlju in your example, since `-AsJob` is specified, you would get back PSJob objects in the pipeline.

Perhaps what would help is for @PaulHigin to add some more script examples of how each case would work, whether it's an extension of the `foreach` statement or the `ForEach-Object` cmdlet.

**@vexx32** (Contributor, May 14, 2019):

If we opt for a cmdlet, I would expect `$objects | ForEach-Object -Parallel -AsJob {}` to emit the job objects, because that's what you've asked for. The pattern here would be relatively straightforward, I'd think:

```powershell
$objects
| ForEach-Object -Parallel -AsJob {
    # do stuff
}
| Receive-Job -Wait
| Select-Object ...
```

This is a fairly easy pattern, and if you prefer to collect the objects and do some other processing while they operate, you have the option of doing so.

If we omitted `-AsJob`, I would expect the cmdlet to write each returned item to the output stream as that task / runspace completes its job, and you would end up with a bit of a disordered pipeline, but nonetheless one that could in theory operate much, much quicker. 😄

**@fMichaleczek** (May 14, 2019):

@vexx32 your pattern makes me happy 😍

**@rjmholt** (Member, May 14, 2019):

> you would end up with a bit of a disordered pipeline

I think in the context of parallel processing, getting things out of order is a standard expectation (perhaps not in PowerShell, but there hasn't been much parallel processing in PowerShell historically).

It also opens up the possibility of a new switch like `-InOrder` to do the extra blocking to re-sort the output (as opposed to emitting objects as they become available, which would block, but less so).

**@HumanEquivalentUnit** (May 14, 2019):

Having a parameter on a keyword seems weird, but it gives a simple progression:

- `ForEach-Object` # pipeline but slow
- `.ForEach()` # faster, still supports pipeline
- `foreach ($x in $collection)` # even faster, but no pipeline
- `foreach -parallel ($x in $collection)` # even faster, no pipeline, and more resource use

If this RFC is implemented as a cmdlet, we then have a more confusing non-progression:

- `ForEach-Object` # pipeline but slow
- `.ForEach()` # faster, still supports pipeline, feels like it could have a Parallel parameter, but doesn't
- `foreach ($x in $collection)` # even faster, but not parallel
- `ForEach-Object -parallel` # the slow one, but a different kind of faster... maybe?

And there is no way to combine the performance increase of `foreach ()` and `-parallel` in one construct, even though both are often wanted just for their performance.

Beware of `-AsJob`, which would make the pipeline version behave totally differently in a pipeline.

And `foreach -parallel ($x in $collection)` still exists in years of documentation for workflow, which is outdated for PS7; if PS7 has a "parallel foreach" but it's not the one everyone has seen before, that's probably going to trip people up too. Even more so since `foreach` is an alias of `ForEach-Object`.

For these reasons, I lean towards `foreach -parallel ($x in $collection)`, despite it being an exceptional keyword with a parameter and looking a bit weird.
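The relative speeds in this progression are easy to check on any existing PowerShell version; a small sketch (absolute numbers will vary by machine and workload, which is rjmholt's point below about performance not being part of the contract):

```powershell
$data = 1..100000

Measure-Command { $data | ForEach-Object { $null = $_ * 2 } }   # pipeline cmdlet
Measure-Command { $data.ForEach({ $null = $_ * 2 }) }           # .ForEach() method
Measure-Command { foreach ($x in $data) { $null = $x * 2 } }    # foreach statement
```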

**@vexx32** (Contributor, May 14, 2019):

I see those points... but all that tells me is that the best solution is to implement both. 😂

**@EliteLoser** (May 16, 2019):

What am I missing? How is the implementation `ForEach-Object -Parallel` (`-AsJob`/`-InOrder`) any less backwards-compatibility-breaking than `foreach -parallel ()`? Genuine question… By the way, I would seriously like both options. Apart from the splatting you can do for `ForEach-Object` with the PS version check for 7 and above in this thread, can't that also simply be implemented in `foreach -parallel` as well while you're changing things?

I do not find a backwards-incompatible change of this nature unreasonable. `#requires -version 7`…?

How would the differentiation be between enumerating the collection at once, as `foreach`, the keyword, does, while `ForEach-Object` takes items one at a time? When you are using parallelism, "one at a time" is sort of a moot point, but will it not play a role in the execution of the jobs, as far as the collection enumeration is concerned? Or will the implementation of `ForEach-Object -Parallel` be designed to… is the term here… "block" the incoming pipeline?

These feel like stupid questions, but I'm so dumb I have an excuse.

**@vexx32** (Contributor, May 16, 2019):

Indeed, using either of them would break in older versions of PS.

Personally, I'd also like to have both available, but I think that adding a ton of configurability to the `foreach -Parallel ($a in $b) {}` form is a bad move. Multiple switches off of a language keyword is very, very messy, and I don't think that form should realistically support `-AsJob` either.

In my mind, it would be most effective to have the simple `foreach -Parallel` available as a keyword with some sensible defaults and basically no additional configurability. Dealing with job objects, throttling, etc., ought to be a concern only for the `ForEach-Object -Parallel` cmdlet form.

As for splatting, etc., not really, no; splatting to a language keyword isn't supported as far as I've ever seen, so you'd need to write two copies of the loop. And in PS 6.0-6.2 the `foreach -parallel` would be a parse error, if I recall correctly, so you can't even have it anywhere in the script that PS is parsing if you want it to work in all PS versions; whereas with the splatting available for cmdlets, you can apply it or not depending on PSVersion if you need to.

**@rjmholt** (Member, May 16, 2019):

> What am I missing? How is the implementation ForEach-Object -Parallel (-AsJob/-InOrder) any less backwards-compatible-breaking than foreach -parallel ()?

A syntactic change will make any whole script containing it backwards incompatible (the parser will reject it before executing any of it), vs. a command parameter, which will make only the single command execution backwards incompatible. You can script around the latter much more easily than the former.

Take the following example:

Script A:

```powershell
$x = 1
foreach -parallel ($y in 1..100) { $y }
Write-Host "`$x is $x"
```

Script B:

```powershell
$x = 1
1..100 | ForEach-Object -Parallel { $_ }
Write-Host "`$x is $x"
```

When you try executing both of these in Windows PowerShell 5.1 (or PowerShell 6.2) you get the following result:

Script A:

```
At /Users/rjmholt/Documents/Dev/Microsoft/vscode-PowerShell/ex.ps1:2 char:1
+ foreach -parallel ($y in 1..100) { $y }
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The '-parallel' parameter can be used only within a workflow.
    + CategoryInfo          : ParserError: (:) [], ParseException
    + FullyQualifiedErrorId : ParallelNotSupported
```

Script B:

```
ForEach-Object : Parameter set cannot be resolved using the specified named parameters. One or more parameters issued cannot be used together or an insufficient number of parameters were provided.
At /Users/rjmholt/Documents/Dev/Microsoft/vscode-PowerShell/ex.ps1:2 char:10
+ 1..100 | ForEach-Object -Parallel { $_ }
+          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : MetadataError: (:) [ForEach-Object], ParameterBindingException
    + FullyQualifiedErrorId : AmbiguousParameterSet,Microsoft.PowerShell.Commands.ForEachObjectCommand

$x is 1
```

Notice that the final statement in Script B still executes.

Much more importantly, Script B can be changed to make it work with lower PowerShell versions:

Script B (modified):

```powershell
$x = 1
if ($PSVersionTable.PSVersion.Major -ge 7)
{
    1..100 | ForEach-Object -Parallel { $_ }
}
else
{
    1..100 | ForEach-Object { $_ }
}
Write-Host "`$x is $x"
```

Executing it in PowerShell 6.2 and Windows PowerShell 5.1 now works (and in this case does exactly what it would in PS 7, just slower).

There's no way to change Script A that will both keep the `foreach () -parallel` and make it work in PowerShell 6.2 or 5.1. You would need another script to conditionally call the one containing the new syntax.

There are also API compatibility concerns with things like `AstVisitor2`, but in this case, because workflow supported `foreach () -parallel {}`, I think we don't actually have that problem.

**@rjmholt** (Member, May 16, 2019):

On the question of a performance hierarchy, I don't think it's a good idea to depend on some constructs being faster than others. Performance is a cross-cutting concern or "aspect" and is very implementation-dependent; there's no API for performance, and the contract is generally "it should be as fast as we can make it while still being able to maintain and extend it".

In fact, the -Parallel flag doesn't fit into a nice hierarchy at all with other looping constructs. You will probably find that for many workloads, it's the slowest because of thread dispatch overhead; using -Parallel you'll find that you get the best performance boost when the actual loop mechanics are negligible compared to what's happening inside the loop (like heavy computation or web requests), so having a slightly heavier-weight loop logic won't really matter.

**@EliteLoser** (May 17, 2019):

(Hope this comes in the right thread; I'm confused.) As someone once explained to me, there are three types of errors in PowerShell: script-terminating errors (`foreach -parallel`), statement-terminating errors (`ForEach-Object -Parallel`), and… #¤%& I forgot, but the least intrusive one… (sorry, I should get the fact in here, maybe an edit).

I do absolutely hate the idea of having to add that conditional about major version -ge 7 every time in the future. I read the phrase foreach-parallel and thought that a brand new cmdlet (originally I said "keyword", Jesus F* Christ) might be just as well, or rather better?? You could still do the -ge 7 check and code in ForEach-Parallel instead of ForEach-Object -parallel. The upside there is that no new scripts would break old scripts, ever, in this scenario, and you'd have full support if working on above version 7. The requirement would be that scripts are compatible with version 7 or above (Python 2/3 eventually have to make the change, and they are - 9 years later). I mentioned #requires -version 7. Would that be considered unreasonable as a requirement for using foreach-parallel, as a new cmdlet name? I think I like that more.

What am I missing this time?

**@rjmholt** (Member, May 17, 2019):

I think @mklement0 does the best job of covering PowerShell error handling in MicrosoftDocs/PowerShell-Docs#1583. Unfortunately it's not terribly consistent, but that's just the way it is in old versions and we'd need RFCs to change it all since it could break existing scripts in new PowerShell versions.

BUT, none of those errors are parse errors, which is where the script doesn't even make enough sense to execute in the first place. Imagine this:

```powershell
$x = Get-Thing 'Hello' $something
if ($x -banana 7) { Write-Output 'Hello' }
```

When you read through that script, things make sense until you hit -banana. It's not like that's a command that could plausibly do something if the right module is loaded, it's a whole new operator. In PowerShell you can't define new operators; there's a short, well-known list of them. So this script makes no sense and no amount of module loading or variable definition will make it make sense. You could almost say it's not PowerShell.

So unlike all the runtime errors, which is what the 2 or 3 error concept refers to, the error that occurs trying to execute this script is shown before any of the script can ever run. You can't surround it with a try/catch or use a trap statement. The whole script (including try/catch etc) is gobbledigook according to the parser and it just spits it out. It's not PowerShell. You may as well have tried to run a CSV or a Python file. A parse error is more grave even than a script-terminating error — it's like a script-never-starting error.

Keywords in PowerShell are just like operators in that you can't define new ones. However something like a new keyword at the start of a statement like foreach-parallel ($x in $range) { ... } would be interpreted by old PowerShell versions as a command. But having something that's a keyword in one version of PowerShell vs a command in another is cumbersome at best I think; tools like PSReadLine, the PowerShell extension in VSCode and PSScriptAnalyzer would have to go to a lot of trouble to manage a strange thing like that. Also, new keywords cause breaking changes to all the PowerShell tooling that exists because they add new things to the interfaces (ICustomAstVisitor{2} and AstVisitor{2}).

A new cmdlet fits the existing grammar (so PowerShell 7 remains the same language as PowerShell 6 and PowerShell 5.1) and therefore causes no problems with all the tools that already exist (and generally work pretty hard to already work with PowerShell 3, 4, 5.1 6.1 and 6.2).

#requires has the same problem. The moment you put #requires -Version 7.0 at the top of your script, you'll never be able to run that script in lower PowerShell versions. But that makes life hard. And there's no good reason to force that on you; you don't want to be required to run PowerShell 7 if your script will work fine in lower PowerShell versions except for one small detail. When you use #requires -Version 7, it's not saying "PowerShell 7 supports this" it's saying "PowerShell 6 and below don't support this". But the ideal solution is one where PowerShell 6 and 5.1 will work too, but PowerShell 7 gives you added benefits without forcing you to choose.

**@EliteLoser** (May 17, 2019):

See my updated comment. I said keyword, but meant `ForEach-Parallel` as a new cmdlet (please don't tell me adding new COMMANDLETS is troublesome!), leaving `ForEach-Object` as-is. Avoids all pitfalls, as I see it. Even better, no? Why mangle and mess with `ForEach-Object` and introduce a need to check for PowerShell versions (makes me shudder and scream!)?

And I know the meaning of #requires -version X being a minimum version requirement...

I felt like I was being schooled in a useful way, but based on a completely misunderstood base, so it sort of ticked me off. And I'm in an insomnia period for months now, so I want to curse A LOT now. Takes TREMENDOUS effort to restrain it.

**@vexx32** (Contributor, May 17, 2019):

Either way, you'll still need to check the PowerShell version if you want a script using this to operate at all on older versions. It doesn't matter if it's a new cmdlet or a new parameter for an existing cmdlet.

If you don't care about it being usable on old versions, you can let it fall apart and break all the same.

In my mind, I'd rather roll this into the existing cmdlet because:

  1. Users will likely expect similar behaviours of the new cmdlet anyway, so
  2. It's less duplicate code, and
  3. It's more discoverable than a whole new command

**@KevinMarquette** (Jun 11, 2019):

I feel like I am missing something here. We actually want a parse error, correct? So that if someone tries to run this on down-level PowerShell, the whole script doesn't run.

It sounds like we are trying to make this workable in the older versions, and I'm not sure why.

This is a nice new feature of the new version of PowerShell. I am perfectly fine, if I use 7.0-specific features, with it not working in lower versions. Just like when they added DSC or Classes. New going forward.

**@vexx32** (Contributor, Jun 11, 2019):

I'm not against it as a language keyword in and of itself, but the complexity proposed does not befit a language construct. Too many switches and options.

If we keep a language construct for this, I would be very very cautious about adding more than the basic -parallel switch parameter to the keyword.

**@jhoneill** (Jun 11, 2019):

@vexx32 - there is the problem.
You can't just add `-parallel`, because the same script block won't work as a thread job and in a `foreach` statement (variables need to switch to `$using:varname`), and you need to say how many threads, so that's a second switch. And then you also lose some of the advantages of parallelism.

```powershell
$x | foreach {something slow with $_ } | another slow command
```

can run the steps of the pipeline as each item comes in, but if it is a statement

```powershell
$z = foreach ($y in $x) -parallel {something slow with $using:y}
$z | another slow command
```

this can't start on the second command until everything is in `$z`.

**@vexx32** (Contributor, Jun 11, 2019):

To my understanding, the `foreach -parallel` keyword is only intended to run in parallel for the duration of the loop. It's not supposed to keep running while the rest of the script executes; it simply runs each iteration of the loop in parallel, and waits until all the iterations complete before continuing the script.

That said, you are correct that special variable handling would need to be applied. I do not know whether `$using:` would be necessary; as this is at the language level, it's possible that that particular necessity could be skipped here and handled by the engine.

**@jhoneill** (Jun 11, 2019):

@vexx32 Keyword would HAVE to work that way.

But I wrote something which does this job called "start-parallel" and it's on the PSGallery
In the examples I have this for hunting machines found in ad
PS C:\>Get-ADComputer -LDAPFilter "(operatingsystem=Windows XP*)" | select -expand name | Start-parallel -Command "Test-Connection"
By not collecting all the data first we can be testing the connection to the first machine before get-adComputer has fetched the last one so we need fewer threads and/or finish sooner.

I have another example:

```powershell
function quickping {param ($last=10) Test-Connection "192.168.0.$last" -Count 1 }
1..254 | Start-Parallel -Scriptblock ${Function:\quickping} | ft Address, ResponseTime
```

As it happens, all my active IP addresses are the low numbers, so it takes 20 seconds to run, but the answers come out and go through the next command in the first 2 seconds. If I had to do

```powershell
$x = foreach ($ip in (1..254)) -parallel {Test-Connection "192.168.0.$ip" -Count 1 }
$x | ft Address, ResponseTime
```

I wouldn't see any output for 20 seconds.

So a parallel foreach cmdlet can go in the middle of a pipeline, start a thread as input comes in from command "A", and send data out to command "Z" as each thread completes.
So suppose A produces 100 items at 1-second intervals, Z takes 1 second per item, the parallel step takes 2 seconds for most tasks but some take 10, and it's 0.1 seconds to start a thread:
you get your first output after 4.1 seconds, and the last one as early as after 104.1.
If you have to save the results of A, and then of the parallel step, it's ~120 seconds before you get any output, and the final output is at second ~220.

vexx32 Jun 11, 2019

I'm not arguing against a cmdlet version at all. I'm saying that we should probably implement both, as each has different use cases and intrinsic implementation differences.

Both can be useful imo.

jhoneill Jun 11, 2019

Put like that :-) ... there is a case for both, though I think the case is stronger for the cmdlet.
When you said:

> To my understanding, the foreach -parallel keyword is only intended to run in parallel for the duration of the loop. It's not supposed to keep running while the rest of the script executes, it simply runs each iteration of the loop in parallel, and waits until all the iterations complete before continuing the script.

I was saying, "Yes, and that's why the keyword is less good."
If your script goes:
Get items; do something to each item in its own thread; format output.
Then the keyword approach can't start any threads until it has all the items, and won't output anything until all threads have completed. But if the stages are a pipeline, the threads will be started as the items are fetched, and the output can happen as the threads end. That overlapping of commands in a pipeline makes a big perf difference if the commands on either side of the parallelized one are slow.
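That shape can be sketched in both forms. This is illustrative only: Get-Items, Process-Item, and Format-Output are hypothetical stand-ins for the three stages, the keyword syntax is the one proposed in this RFC, and the cmdlet line uses Start-Parallel from the Gallery as shown above.

```powershell
# Keyword form (proposed syntax, sketch only): all items are collected
# first, and nothing is emitted until every thread has completed.
$items   = Get-Items
$results = foreach ($i in $items) -parallel { Process-Item $using:i }
$results | Format-Output

# Cmdlet form: threads start as items stream in from Get-Items, and
# output flows on to Format-Output as each thread finishes.
Get-Items | Start-Parallel -Command "Process-Item" | Format-Output
```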

rjmholt Jun 11, 2019

I feel like I am missing something here.

> We actually want a parse error, correct, so that if someone tries to run this on down-level PowerShell the whole script doesn't run.

Some scripts and modules need to run on versions from PS 7 down to PS 3 (or even PS 2 in the case of Pester), and not just on Windows.

An example is the PowerShellEditorServices (backend to the PowerShell extension for VSCode) startup script; it must run on everything from PS 3 on Windows Server 2012 (in some cases 2008) to PS 7 on Ubuntu 18.04. For example, that script used to call chmod on *nix and Set-Acl on Windows (we have since taken it out). Imagine if you couldn't wrap that in an if because it was a parse error.

Keywords that don't exist in any of those versions can't be used anywhere in that script. We'd have to write a whole new script (slowing down startup, increasing the download size, duplicating the code). Whereas a command parameter can be added to a splat conditionally. PS 7 users would get the parallel speedup, but it still works in PS 3.

Another example is the Install-VSCode.ps1 script. It wants to be fast, so in Windows PowerShell it uses Start-BitsTransfer since that's available. If that resulted in a parse error, you wouldn't be able to do that. We'd have to either publish two scripts, or settle for Invoke-WebRequest (which was sped up considerably in PS 6 btw :)).

You can already prevent down-level running at parse time with `#requires -Version 7.0`. But as someone who maintains several complicated scripts that must work all the way down-level, I'd like the ability to leverage PowerShell's dynamism to get the best everywhere.
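The conditional-splat point can be sketched like this. It assumes a cmdlet-style -Parallel parameter (with a -ThrottleLimit) on ForEach-Object, as discussed in this thread, so the -Parallel branch is hypothetical:

```powershell
# Sketch: one script serving two engine versions. A down-level engine
# parses this fine, because the parameter name only appears as data
# inside a hashtable, not as syntax.
$splat = @{ Process = { Test-Connection "192.168.0.$_" -Count 1 } }
if ($PSVersionTable.PSVersion.Major -ge 7) {
    # Hypothetical: assumes ForEach-Object gains -Parallel/-ThrottleLimit.
    $splat = @{
        Parallel      = { Test-Connection "192.168.0.$_" -Count 1 }
        ThrottleLimit = 5
    }
}
1..10 | ForEach-Object @splat
```

A keyword has no equivalent escape hatch: its syntax must parse on every engine that loads the script.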

swngdnz Jun 12, 2019

That startup script is one fugly piece of scripting. :-)

But I think you miss @jhoneill's point: a down-level script engine (e.g., PS v3) should generate an error if not gated by a PSVersion check. "foreach -parallel" does not pass execution-time checking on PS v3, even though a proper AST is generated up to the "-parallel" parameter.

But if the code is protected by a PSVersion check, all is OK. I wouldn't think that should change - and I don't think that @jhoneill is suggesting otherwise.

KevinMarquette Jun 12, 2019

I was missing that PS v3 was handling this differently. When I checked it on PS v5.1, I got this message when calling foreach -parallel:

```
the '-parallel' parameter can be used only within a workflow.
```

And that is exactly what I expected to happen on 5.1: because it was a parse exception, it never even tried to execute.

I expect that I would need to use it in a workflow in a pre-PS v7 script. I am perfectly OK with it as a PS v7 feature that is not compatible with anything lower. If you are targeting a lower version, then you need to not use the newer features.

With that said, I'm OK with script authors abusing the syntax to write multi-versioned scripts, but I don't think that should dictate the primary design. If foreach -parallel should be a thing, the fact that you can't use a PS v7 feature in a PS v3 script should not prevent us from implementing it in the ideal way (if we ever decide what that is).

Yes, I know some people need to write scripts that target PS v3, and it would be really nice to have a script that just ran faster on PS 7.0 and still worked on PS v3, but it is also perfectly OK for it to be a parse exception in PS v3. We already have things in PS v7 that are a parse exception in PS v5.1.

jhoneill Jun 12, 2019

> I know some people need to write scripts that target PS v3 and it would be really nice to have a script that just ran faster on PS 7.0 and still worked on PS v3, but it is also perfectly OK for it to be a parse exception in PS v3. We already have things in PS v7 that are a parse exception in PS v5.1

Up to a point ... Having dealt with clients where it is just too painful to get servers updated from PS4 to PS5, and found my scripts used a couple of bits of 5-specific syntax, I'm probably keener than average that things should work on old versions.

This errors without running anything on 5.1:

```powershell
$lastByte = 1..10
if ($PSVersionTable.PSVersion.Major -lt 7) {
    foreach ($b in $lastByte) {Test-Connection "192.168.0.$b" -Count 1 }
}
else {
    foreach ($b in $lastByte) -parallel {Test-Connection "192.168.0.$b" -Count 1 }
}
```

This runs (note the pipeline form binds the current item to $_, not $b):

```powershell
$lastByte = 1..10
if ($PSVersionTable.PSVersion.Major -lt 7) {
    $lastByte | foreach -Process {Test-Connection "192.168.0.$_" -Count 1 }
}
else {
    $lastByte | foreach -parallel {Test-Connection "192.168.0.$_" -Count 1 }
}
```

Now, if everything else were equal (and it's not: the cmdlet can go in a pipeline), I think most people would say the implementation which supports one script for two versions is preferable; it's not mandatory.

But here's why breaking can be good. Imagine a college creating a ton of new users at the start of a year:

```powershell
$newUsers = Import-Csv new.csv | Add-CustomUser
$newUsers | Export-Csv created.csv
$newUsers | ForEach-Object {Add-CustomHomeDir $_}
```

So we do this and it all looks good, but someone says "it makes the new csv real quick, but creating the home directories feels like a month", so they add -parallel. Then someone runs the script on another box and the users get created, the file is exported, and BANG: an error, with no home directories set up. We can't run the script again because the users exist, so we have to clean up and it's all horrible.
Would someone who did the quick conversion think to put #requires at the top of the script in case someone runs it on an old version? A complete fail would save them from themselves; but I would prefer that command line as

```powershell
Import-Csv new.csv | Add-CustomUser -OutVariable newUsers | ForEach-Object {Add-CustomHomeDir $_}
```

(-OutVariable takes the variable name without the $). Something which didn't let me create home directories until I'd created the last user would be bad.

But if re-implementing the foreach keyword becomes problematic, it would be a good fallback solution.