Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Cleanup {} named function block #207

Merged
merged 18 commits into from May 19, 2020

Conversation

vexx32
Copy link
Contributor

@vexx32 vexx32 commented Jun 29, 2019

For a working implementation of this RFC, please see PowerShell/PowerShell#9900.

@markekraus
Copy link

Would this also be implemented on the Cmdlet/PSCmdlet API? Or only on functions?

I would also like to see ForEach-Object updated with a -Dispose parameter as a future iteration.

@vexx32
Copy link
Contributor Author

vexx32 commented Jun 30, 2019

Would this also be implemented on the Cmdlet/PSCmdlet API? Or only on functions?

This is already possible with Cmdlet / PSCmdlet -- you just need to implement IDisposable and it will function the same already (although currently commands aren't disposed until the pipeline is completely done with and the pipeline itself is being disposed; this would be changed in this RFC to dispose after EndProcessing() is called).

I would also like to see ForEach-Object updated with a -Dispose parameter as a future iteration.

Ah yes, forgot to mention that; that's also done in the linked draft PR. I'll add it to this RFC. 😄

@vexx32
Copy link
Contributor Author

vexx32 commented Aug 19, 2019

@SteveL-MSFT any chance I could get you guys to look this one over? 💖

1-Draft/RFCNNNN-Dispose-Script-Block.md Outdated Show resolved Hide resolved
1-Draft/RFCNNNN-Dispose-Script-Block.md Outdated Show resolved Hide resolved
@vors
Copy link
Contributor

vors commented Aug 20, 2019

@vexx32 the IDisposable trick to get the semantic in C# cmdlet depends on the internal implementation details of the cmdlet and pipeline lifecycle, so that (potentially) could break or change at some moment.

@vors
Copy link
Contributor

vors commented Aug 20, 2019

First, I want to compliment you on a well written RFC. It was a pleasure to read!

I think that the use case is narrow-enough that it's acceptable to have a more verbose syntax for achieving the goal. Namely, this could be achieved with

function Get-Foo {
    param(
        [string]$myparam
    )

   function Get-FooInternal {
        process { # "$myparam" could be used here without additional marshaling }
   }

  start { 
     try {
       Get-FooInternal
     } finally {
       # this is an equivalent of dispose block
     }
  }
}

@vexx32
Copy link
Contributor Author

vexx32 commented Aug 21, 2019

@vors could you expand on what you mean with your IDisposable comment?

Yes, it can be achieved with try/finally, but (crucially!) not across named blocks. You can't initialize something in begin{} in order to take and act upon pipeline input with this object in process{}, and dispose of it in end{} safely.

Try/finally cannot be used outside of named blocks, so this construct is necessary for any command that needs to work with external resources (database connections, for example) as well as utilize PowerShell's pipeline framework. The capability is simply not there at the moment, and this fills that gap.

@SeeminglyScience
Copy link

@vors

@vexx32 the IDisposable trick to get the semantic in C# cmdlet depends on the internal implementation details of the cmdlet and pipeline lifecycle, so that (potentially) could break or change at some moment.

Nah, that's documented behavior. See Implement the IDisposable Interface (AC04).

Added to the Alternate Proposals section below.
Clarify wording around invoking dispose{} and mention IDisposable with PS Classes
@vors
Copy link
Contributor

vors commented Aug 22, 2019

@SeeminglyScience oh nice!

@vexx32
Copy link
Contributor Author

vexx32 commented Aug 30, 2019

@SteveL-MSFT do you think we have sufficient comments for a committee review, or would you like me to give it another Twitter shout? 🙂

@vexx32
Copy link
Contributor Author

vexx32 commented Sep 17, 2019

@SteveL-MSFT @joeyaiello friendly ping for review!

I'd like to get this one in the works and hopefully get the code into ps7 if possible! It's already written, bar some minor rebases needed.

@vexx32
Copy link
Contributor Author

vexx32 commented Sep 25, 2019

@SteveL-MSFT @joeyaiello friendly ping for review when you guys are able, please & thank you! 😊 💖

@joeyaiello
Copy link
Contributor

@PowerShell/powershell-committee reviewed this today with quorum via @daxian-dbw, @JamesWTruher, @SteveL-MSFT, and @joeyaiello

Overall, we really like the RFC, especially how it reflects the organic consensus that was reached in #6673. We have a few questions / open concerns, though:

First and foremost, we're worried about the risk of taking significant engine changes in an area where we don't feel very confident about our test coverage, specifically where we're not currently calling Dispose in the pipeline (quoting @JamesWTruher). Those changes would be difficult to mask behind an experimental feature and, as such, we don't feel confident taking these changes in

The magnitude of those changes are exacerbated by the fact that myself and @daxian-dbw agree that we should simultaneously update C# cmdletst to call their IDisposable : Dispose functions in the same cases that a PS function's dispose block should be called.

We want to apologize for dragging our feet on such a high-interest RFC for so long, as it would've been really awesome to have in 7.0, but we feel it's best to wait until after GA before we merge it.

In the meantime, we can also discuss some smaller design questions / concerns.

  • We should get consensus with others that changing the cmdlet behavior to also call Dispose in the above cases is a desirable change.
  • @daxian-dbw notes that Ctrl+C is tricky, as it's impossible to know how long a dispose block may need to execute. We should discuss whether it's desirable to have a timeout in dispose all up, whether we should display warning output to the user, and/or whether we should suggest an additional override via Ctrl+Break (which is not a well known or discoverable interrupt--for the record, I suggested this and think it's probably not a good idea, just enumerating).
  • We're making a note that creating a dispose keyword breaks any functions--including internal ones--that are called dispose. We should do a brief corpus check, but otherwise we're not too concerned about this break.
  • We're generally think that dispose should not be able to return anything to the output stream, as well as the fact that having non-output streams work is generally a good thing. The only open point of debate is whether the error stream should still return, as it doesn't make sense to some that something getting executed after a terminating error (i.e. dispose) might throw again.

Does anyone have opinions on the above?

@vexx32
Copy link
Contributor Author

vexx32 commented Sep 30, 2019

Sad to hear it won't make it into 7.0, as I agree -- it would have been a pretty awesome feature to get in for a major version bump, but I'm glad you guys like it nonetheless! 😊

(Let me know if you guys do change your mind and want to shoot for 7.0; I have the code and pretty thorough tests written to the best of my ability already. Unless we need to make changes, but I can make that happen.)

FWIW:

  • Dispose() is already called on cmdlets in the pipeline in similar cases, though currently this occurs only after every EndProcessing() has been completed or after StopProcessing() is triggered. In other words, in cases of error this occurs at the same place as before, and in cases of normally disposing a pipeline, we've just moved the disposal up slightly to be after each EndProcessing() rather than waiting for the whole pipeline to start being deconstructed.
  • Agreed that Dispose{} should not be able to return data in principle. In practice I'm not sure how to do this, but considering that at the time of writing, [void] PowerShell class methods do have this disconnected duality (writing output disabled, writing non-terminating errors enabled) as recently surfaced in an issue on the PS repo I think I may be able to replicate this for dispose{} by copying how that pipeline context is put together to make this happen.

Not sure on the Ctrl+C/Ctrl+Break points. Timeout makes sense, but also is the kind of implementation detail that can potentially cause problems. That said, dispose with a timeout is still markedly better than no dispose at all, and if it's documented would probably be OK.

@kilasuit
Copy link

kilasuit commented Sep 30, 2019

We want to apologize for dragging our feet on such a high-interest RFC for so long, as it would've been really awesome to have in 7.0, but we feel it's best to wait until after GA before we merge it.

In the meantime, we can also discuss some smaller design questions / concerns.

Considering you've been vocal that you aren't planning a v7 release until dotnet 3.1 comes out as that's an LTS version then perhaps this could get this RFC & associated PR mentioned in via an experimental feature branch in the main PSRepo (ie not just an experimental feature) until you are happy that all tests are added, as from what I read this sounds like it would need to be a bit longer lived than most PR's are and perhaps needs a handful of PR's instead of just the one. This often makes sense for longer lived (read harder to build) features than a single PR would allow flexibility for, and whilst I know this isn't typically how development of PowerShell is done, it may make sense in this case, especially if it would help in accelerating this making it into the GA of v7.0

The magnitude of those changes are exacerbated by the fact that myself and @daxian-dbw agree that we should simultaneously update C# cmdletst to call their IDisposable : Dispose functions in the same cases that a PS function's dispose block should be called.

The above suggestion would in my mind also help with what you've called out here @joeyaiello

@joeyaiello
Copy link
Contributor

@kilasuit I'd be totally with you if it weren't for the fact that we don't think this is going to be feasible to hide behind an experimental feature.

@SteveL-MSFT
Copy link
Member

@vexx32 we are actively discussing this today

Include considerations & alternate proposals for committee and individual feedback.
@vexx32
Copy link
Contributor Author

vexx32 commented Apr 6, 2020

@SteveL-MSFT friendly ping for follow up. 🙂

To do so, such commands need also implement `IDisposable` and the `Dispose()` method.
This RFC would simply change when the dispose method is called.
Currently, the method is called during pipeline disposal.
This RFC would ensure it is called immediately after `EndProcessing()` completes during normal pipeline operation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Strictly speaking, it doesn't need to be changed.

However, I think it makes more sense to call it after EndProcessing() when possible, simply because that means that we:

  1. have a more well-defined point where we can expect Dispose() to be called,
  2. free up resources earlier, and
  3. spread out the overall impact of calling Dispose() so that some of it can happen during pipeline execution / towards the end of a pipeline, rather than all waiting till the very end of a pipeline and having to wait for every command in the pipeline to be disposed, one at a time.

On paper I don't think the differences will be massively different in the majority of cases. Selfishly, this makes it slightly easier to implement in CommandProcessorBase, if I recall that section of code correctly. 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking further at the code behaviour... I don't think this is strictly necessary. Although some work is currently done before disposing commands, it doesn't look like there's a significant wait between calling the endprocessing/DoComplete methods and the DisposeCommands(), so it's likely okay to leave it more or less where it is.

Copy link
Contributor Author

@vexx32 vexx32 May 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having been tinkering a bit more with this, it really comes down to a fairly small-ish difference... in most cases.

  • If we have dispose execute after all EndProcessing()/end{} steps are completed, that is a bit more consistent with how other steps work (all cmdlets execute begin before moving to process for each, etc.)
  • If we have dispose execute after each EndProcessing()/end{} step is completed, that lets us free up resources a bit quicker, and makes some types of pipelines easier to work with.
    • For example, Get-Content $file | Some-Function | Set-Content $file would be possible if Get-Content opens its file read handle in EndProcessing() and then disposes of it in Dispose(); Set-Content would freely be able to open its own file handle by the time its EndProcessing() step comes around. Currently, this requires parentheses to be used to separate the pipelines: (Get-Content $file) | Some-Function | Set-Content $file.

That's just an example, but I'm sure there are other potential applications that are a bit harder to reason about. I tend to think it's a better idea to dispose a bit sooner if we can. This would be a fairly small change to cut out, though, so whatever the committee decides, I'll be able to live with. 😁

- The slightly different parameter name is necessary as C# does not permit a property name to collide with a method name, and `IDisposable` mandates a method named `Dispose()`.
- To improve consistency, the parameter can be explicitly aliased to `-Dispose`.
- The parameter _must_ be assigned explicitly by name, and never be assumed based on number of provided scriptblock parameters as the command currently does with `begin`/`process`/`end`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the .foreach{} magic method also supports begin/process/end. Are you planning to extend it as well?

Copy link
Contributor Author

@vexx32 vexx32 May 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. If I don't, dispose {} is basically a dead block there, I don't see a good way to prevent folks using it since it's literally just accepting a scriptblock.

In truth, wasn't really aware they supported named blocks till the call today. Open to suggestions on this one, really could go either way. Kind of like ForEach-Object there. I've actually moved this section to Considerations as well. 🙂

- This includes new token types and appropriate entries in the `Tokenizer`'s keyword tables.
- Additional members and constructors for `ScriptBlockAst` to enable it to be recognised by the parser.
- Additional methods in `DlrScriptCommandProcessor` and `CompiledScriptBlock` to enable the block to be executed at the correct time.
- Both types implement or inherit from a type that implements `IDisposable`, and as such the existing `Dispose()` methods can generally be utilised to run this block.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the cmdlet actually going to be IDisposible? If so, what happens if one of these things is disposed on the finalizer thread. It will very likely error out either because there's no PowerShell engine available or the engine is in use. Maybe post it back to the event list?

Copy link
Contributor Author

@vexx32 vexx32 May 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think script cmdlets already are IDisposable, so I'm just adding some code into that disposal. Script commands (non-CmdletBinding functions) are processed by DlrScriptCommandProcessor which inherits CommandProcessorBase (which is IDisposable).

Can you elaborate a bit on that last point? Not quite sure what you mean or how to do that, but definitely agree we should be very careful if it ever gets to that segment of code on the finalizer thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After digging through the code somewhat for some of the InvokeWithPipe overloads, I think I have a surface-level grasp of what you mean. I'll add this into the RFC, I think I can implement this.

 - Reworded a few things, moved a couple bits around to make it more sensible.
 - Moved the `ForEach-Object` bits to Considerations as it may not be desirable/necessary to include
 in the implementation.
 - Added finalizer thread behaviour to address exceptional circumstances.
@vexx32
Copy link
Contributor Author

vexx32 commented May 7, 2020

@SteveL-MSFT I've gone through and made the changes we talked about, I think... that should cover just about everything. Added a few sections and moved the stuff about ForEach-Object to Considerations since we weren't sure it was really necessary to add this functionality into ForEach-Object as well.

My memory's honestly failing me here, I don't quite recall if everyone was on board with dispose as the final name for the keyword/feature (that's the one thing I forgot to write down 😄). As I said during the call, I'm not especially worried with the name overall, but felt dispose was more appropriate since the intent is to mimic / leverage the IDisposable functionality from .NET. I don't mind doing a bit more of a rewrite if y'all preferred / decided upon cleanup in the end instead. 🙂

FYI @bpayette as well, made some edits that are related to your comments. 🙂

Copy link
Member

@SteveL-MSFT SteveL-MSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PowerShell/powershell-committee reviewed this asking for some changes:

  • Committee still agrees to use the name Cleanup instead of Dispose as Dispose() has specific meaning for C# developers that doesn't map exactly to this scriptblock
  • Currently this RFC calls out ForEach-Object (and related .ForEach/.Where) supporting Cleanup{}, but expectation is that any scriptblock execution needs to be consistent and respect the Cleanup{} block. Whether a -Cleanup parameter is needed for ForEach-Object can be added later based on user feedback

There were some questions/concerns about the implementation details, but that should be handled in the code PR and once resolved reflected back into this RFC.

@essentialexch
Copy link

Was this Cleanup{} name-change made during a public call that I missed? Was it recorded?

I won't argue, but given that recently ternary and null-operators have been accepted that have "similar but not identical" meanings and syntax for C# programmers, that reasoning seems specious.

@SteveL-MSFT
Copy link
Member

@essentialexch no, it wasn't during a public call nor was it recorded. To be clear, it's wasn't an "obviously this vs that" type of conclusion. We debated Dispose vs Finally vs Cleanup. I agree we have other examples of similar but not identical, but in this case, we felt that Cleanup would be easier to understand for scripters and also not set expectations with developers.

@vexx32
Copy link
Contributor Author

vexx32 commented May 13, 2020

@SteveL-MSFT I'll go through and change the naming here then, no worries. If it works as it needs to work and the name is reasonably descriptive I'm all for it. 🙂

I'll incorporate the rest into the RFC; to summarise quickly, though, I entirely agree -- it should work everywhere that named blocks are already supported. This includes:

  • Scripts
  • Functions
  • Script cmdlets (aka [CmdletBinding()] functions)
  • ForEach/Where magic methods
  • Random scriptblocks that get invoked wherever, provided what's executing them supports named blocks. (Scriptblocks passed to .NET methods to be converted into Func or Action parameters notably don't support named blocks, for example.)

@essentialexch if my memory is reliable over this timespan, the name cleanup came up here and there over the course of the original issue in the main PS repo and probably at least a couple times in the discussion here if we look to dig through it all again. Personally I prefer dispose but ultimately as long as it works and the name makes some sense I'll be happy. 😁

@vexx32 vexx32 changed the title Add dispose{} named function block Add Cleanup {} named function block May 13, 2020
@essentialexch
Copy link

:-) As I wrote, I won't argue. It just seemed like a surprising unanticipated change here, at least to me. :-)

I just want the functionality. It's the first "must have" functionality driving PS v7+ functionality to me. It provides finally for long-running robust scripts that don't/won't require regular restarts.

- Name changed to `cleanup`
- Clarification on more confusing parts.

#### Error States

Terminating errors can be thrown from a `cleanup {}` block, which will skip the invocation of the rest of that `cleanup {}` block, without preventing other commands' `cleanup {}` blocks from being executed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question related to this: what happens with the following statements in a cleanup block:

  • throw "bad"
  • $PSCmdlet.ThrowTerminatingError(...)
  • $PSCmdlet.WriteError(...)

Which automatic variables will be available, and will any of them have fields that don't make sense or otherwise don't work in the cleanup block?

Copy link
Contributor Author

@vexx32 vexx32 May 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-terminating errors work normally. Terminating errors work OK, but only if the command hasn't already thrown an error out.

In terms of the current implementation that I have in the code PR itself, I've noticed that there's an issue with reporting terminating errors in the Cleanup{} block due to the design of the pipeline processor if (and only if) the command has already thrown a terminating error. Basically, the error is recorded in $error and then the pipeline processor suppresses it, because the command is already stopping due to an earlier error. Same should happen with errors thrown from Dispose() in compiled cmdlets, I think, even in the original code there.

I'd rather avoid refactoring that out of the pipeline processor itself, it seems like that would be super messy. If there's a way I can catch that coming out of the cleanup {} block (or indeed the Dispose() block from compiled commands) and make it a non-terminating error at that point (so the cleanup{} is still cancelled by a terminating error, but the pipeline reframes the error as non-terminating if it's already received a terminating error from that command, and the command has already finished what it can do anyway).

Something like that, anyway. Hopefully that makes sense. 😁

All the usual automatic variables should be available, I think. I'll verify $PSCmdlet is... the only ones I know are not are $_/$PSItem and $input as I explicitly avoid passing those into the execution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that all makes sense. At the very least the pipeline processor is so central that any issues that come about when making the necessary modifications should present themselves pretty quickly.

With the variables I'm particularly thinking of things like $PSCmdlet and $ExecutionContext that have deep properties hooking back into the engine -- I expect that at least some of them will behave quite differently in a cleanup block and we should do our best to establish how, or if they will have parts that shouldn't be relied upon (or should behave differently -- e.g. throw because it's not a valid time to use them).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order for the cleanup block to run at all, we essentially have to tell the pipeline stopper to temporarily pretend all's fine and dandy until we're done with the cleanup block.

As a result, all those variables seem to behave themselves just fine, at least at the moment. But yeah, depending on decisions made during code review, that might change a bit, we'll see how it goes, definitely something to keep a sharp eye on. 🙂

To do so, such commands need also implement `IDisposable` and the `Dispose()` method.
This RFC would simply change when the dispose method is called.
Currently, the method is called during pipeline disposal, after **all** commands have completed their `EndProcessing()` steps.
As part of the changes proposed in this RFC, we would ensure each command's `cleanup {}` or `Dispose()` routines are called immediately after _each_ cmdlet's `EndProcessing()` execution completes during normal pipeline operation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm nervous about changing the ordering here with regard to when Dispose is called. What behavioral change do you think will happen because of this change, and if none, is there a way that you can validate that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be a lot of visible changes here, but there would be some. Essentially the change here would simple reorder the EndProcessing/Dispose steps in a given pipeline from EndA -> EndB -> EndC -> DisposeA -> DisposeB -> DisposeC to instead being EndA -> DisposeA -> EndB -> DisposeB -> EndC -> DisposeC

To an individual command, there should not be a significant difference. It's still getting disposed after it completes EndProcessing, just with a bit less delay than before.

For a practical example, let's take the example of a cmdlet at the start of a pipeline that takes a lock on a file to read data in during BeginProcessing() and doesn't release it until Dispose(). Suppose also that it writes all of its output in EndProcessing(). Then you want to pipe that to another command that wants to also take a write lock on that same file, but doesn't need it until EndProcessing().

Contrived example? Maybe!

Currently, this circumstance will fail completely, because the first command's lock is still alive by the time the second command wants to take a lock. If we move disposal/cleanup to just after each command's EndProcessing() instead, then everything just works.

Hugely useful scenario? Maybe, maybe not. It's kind of an edge case where it will be super useful, but at least on paper I think it makes more sense to dispose resources and do cleanup when the command has done its job, rather than when the pipeline as a whole is completed.

If we suppose that all the commands in a given pipeline have some cleanup{} or Dispose() processing going on, doing it after each command will "feel" better overall to users, I think; it'll spread out any impact of running cleanup/disposal across the run of the pipeline, rather than effectively queueing it up until the end of the pipe.


Anyway, that's my reasoning.

As for do I think there'll be a behavioural change... no, in most cases there won't be. Can I validate that.... maybe? Not sure. I can't really think of a way apart from simply demonstrating that commands in general still work just fine, even when they have implement IDisposable or a cleanup{} block.

I'd venture to guess that if it were a problem, we'd see some odd behaviour fairly quickly from cmdlets that utilise the IDisposable pattern currently? Outside of those, no commands presently written could realistically be affected, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(sorry for the long text, didn't realize I'd gone on so long 😬 )

@joeyaiello
Copy link
Contributor

@JamesWTruher's latest comment aside (we think it's probably an implementation detail to be handled in PR), we're good to accept this RFC. Cleaning it up for merge now!

@joeyaiello joeyaiello merged commit cb4e441 into PowerShell:master May 19, 2020
@vexx32 vexx32 deleted the DisposeBlockRFC branch May 19, 2020 19:46
@Jaykul
Copy link
Contributor

Jaykul commented Jun 25, 2020

We debated Dispose vs Finally vs Cleanup. I agree we have other examples of similar but not identical, but in this case, we felt that Cleanup would be easier to understand for scripters and also not set expectations with developers.

I'm not looking forward to having this conversation about a thousand times:

MrThomasRayner: what are the big differences between end and cleanup?

Me: Cleanup is like Dispose -- it runs even if things are cancelled

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet