Transactional Installation - Improve concurrent operations (pending) #943

Closed
ferventcoder opened this Issue Sep 1, 2016 · 9 comments

ferventcoder commented Sep 1, 2016

Related to #198 and #822.

From @masaeedu - discussion starting at #822 (comment)

From the command line, is it possible to start multiple chocolatey install scripts without interfering with each other? The current default of nuking the in-progress install seems unreasonable. Either:

  • the new invocation of choco install should proceed without trying to delete the in-progress package. If either fails due to MSI reservation conflicts so be it
  • or the new invocation should wait until the pending installation is complete before doing anything, and should notify the user it is waiting
  • or an error should be produced informing the user a pending install is in progress and no action will be taken
  • or the user should be prompted to make an explicit decision to abort and remove the pending install

"Pending" installs and their removal aren't as clean as they may seem at first glance, since choco has no idea what artifacts the package has already inserted into the system, and unceremoniously removing the package does not even give the user the chance to run the package's uninstall script to try to clean up. In most cases it just ends up irrevocably borking the package on that system.

@ferventcoder ferventcoder added this to the 0.10.x milestone Sep 1, 2016

@ferventcoder ferventcoder modified the milestones: 0.10.1, 0.10.x Sep 16, 2016

@ferventcoder ferventcoder added 3 - Done and removed 0 - Backlog labels Sep 16, 2016

@ferventcoder ferventcoder self-assigned this Sep 16, 2016

ferventcoder commented Sep 16, 2016

When running two choco.exe processes at once, it would be best to hold a lock on the pending file for the currently running process and detect if the lock exists for another process. Then skip the removal of a package that has a lock on the file.

This is done for 0.10.1.

ferventcoder added a commit that referenced this issue Sep 17, 2016

(GH-943) IFileSystem - Open File Exclusively
Add method for opening a file exclusively.

ferventcoder added a commit that referenced this issue Sep 17, 2016

(GH-943) Lock Pending File Until Operation Completes
Open and hold the pending file exclusively open until install is
finished. Then remove the file lock. This allows for better concurrent
operations when running multiple choco processes at the same time
(which isn't necessarily recommended).

ferventcoder added a commit that referenced this issue Sep 17, 2016

(GH-943) Skip Locked Pending Files
When removing packages in a pending state, attempt to open the pending
file first. If it fails, log a message about skipping and move on to
the next pending file.
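
A minimal sketch of the idea in the three commits above: hold the pending file locked while an install runs, and skip any pending file that another process still has locked. This is not Chocolatey's code (the thread does not show it). It swaps in POSIX advisory locks via `fcntl.flock` as a stand-in for an exclusive file open on Windows, so it only runs on POSIX systems, and the function names are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def hold_pending_lock(pending_path):
    """Hold an exclusive lock on the pending file while an install runs."""
    fd = os.open(pending_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking: raises BlockingIOError if another holder exists.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        # Closing the descriptor drops the lock whether the install
        # succeeded or failed, so later cleanup is not blocked.
        os.close(fd)


def is_locked_by_another(pending_path):
    """Return True if some other open handle holds the lock."""
    fd = os.open(pending_path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


def removable_pending(pending_paths):
    """Return the pending files that are safe to clean up."""
    removable = []
    for path in pending_paths:
        if is_locked_by_another(path):
            print(f"Skipping {path}: locked by a running install")
            continue
        removable.append(path)
    return removable
```

The lock is released in a `finally` block so that a failed install does not leave the pending file permanently locked, which would otherwise defeat cleanup of the failed package.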

ferventcoder added a commit that referenced this issue Sep 17, 2016

Merge branch 'stable'
* stable:
  (GH-934) Support Paths > 260 Characters
  (GH-943) Skip Locked Pending Files
  (GH-943) Lock Pending File Until Operation Completes
  (GH-943) IFileSystem - Open File Exclusively
  (GH-839) More switch names for dependency apply

DarwinJS commented Sep 18, 2016

I would like to suggest that this:

"the new invocation of choco install should proceed without trying to delete the in-progress package. If either fails due to MSI reservation conflicts so be it"

Needs better handling than "so be it" - you can ask any admin that has to troubleshoot 3% failures across 10,000 nodes.

Apologies for taking a hard line on "so be it" below - no ill intent toward the poster of that sentiment - but what follows comes from hard experience as the automation engineer who has to account for failure percentages on large-scale automated software deployments, using deployment technologies that take a "so be it" approach - including a ton of extra work diagnosing a bunch of the failures to learn "why".

I would say that within chocolatey if the MSI "InProgress" flag is set, Chocolatey should have a retry cycle and then eventually fail (all with lots of logging).

This would make chocolatey more tolerant of both: 1) itself installing a package in another instance of choco, 2) something else currently installing software using MSI - such as a concurrent automated software distribution that does not use chocolatey, Windows Update, or the end user manually running an install.

Keep in mind that when pushing one chocolatey install job to thousands of machines using automated software distribution (especially desktops), the odds of conflicting with another MSI install across all those install instances are much higher. Many manual resolutions can be avoided by simply standing down and polling again.

In addition, proper logging and a dedicated exit code (rather than "so be it") would help quickly diagnose that chocolatey could not get MSI services for the package. It would be great to have a dedicated exit code for this condition, as systems like SCCM can pick up that exit code and give statistics reports that reveal a meaningful reason why certain machines failed. SCCM and many other software distribution systems can then be told to re-target these failures (automatically and/or rescheduled by a human).

If chocolatey does not already do the above natively (something tells me it might), I could file a separate issue since I believe this support would make Chocolatey more enterprise ready regardless of its ability to support running more than one chocolatey install at a time.

Also if the dedicated exit code is not part of the current support, I could file an issue for that.

ferventcoder commented Sep 18, 2016

I think the ticket you are looking for is #484.

Chocolatey does provide package exit codes back up the chain so they can be reported appropriately - that is #512. So you can get that information about why an MSI failed now in current versions of choco.

Some enhancements we are planning will provide better information around failures, along with things like detecting and waiting. I can't say they will all land in FOSS, as some of these improvements seem like an organization's use case and not an individual's use case.

@ferventcoder ferventcoder changed the title from Improve concurrent operations (pending improvements) to Transactional Installation - Improve concurrent operations (pending) Sep 18, 2016

masaeedu commented Sep 18, 2016

"Apologies for taking a hard line on "so be it" below - no ill intent toward the poster of that sentiment - but what follows comes from hard experience as the automation engineer who has to account for failure percentages on large-scale automated software deployments, using deployment technologies that take a "so be it" approach - including a ton of extra work diagnosing a bunch of the failures to learn "why".

I would say that within chocolatey if the MSI "InProgress" flag is set, Chocolatey should have a retry cycle and then eventually fail (all with lots of logging)."

This is not how most installers work in practice. The retry and cleanup logic required for each installer varies on a case-by-case basis, and simply retrying an installer over and over again is not likely to result in success. If you anticipate flaky installs and require logging and a cleanup-retry cycle, this should be part of your chocolatey install scripts and the orchestration that kicks off the chocolatey install. You already have the ability to inspect the installer exit code and the process output.

ferventcoder commented Sep 18, 2016

I'm going to preemptively de-escalate this -

It's known that @DarwinJS has years of experience with Windows installers, typically MSI (this can be seen by some quick research into his GitHub and website). Typically Darwin is speaking from the point of view of Windows Installer technology (MSI), which does have built-in facilities to let you know other installs are occurring. What Darwin is asking for is for Chocolatey to see those particular exit codes and retry.

@masaeedu In the right situations, almost all software has flaky installs. Unfortunately, Asad, I don't know your background, but I definitely agree that once you step outside of MSI, it varies wildly. It sometimes varies within MSI too, but for the most part the error codes and checking are pretty consistent.

So what I am saying is that no one is technically wrong here in what they wrote, understanding different contexts and perspectives.

masaeedu commented Sep 19, 2016

@ferventcoder I certainly don't have any experience with the internals of MSI installers, and defer to his experience in this matter. Nevertheless, MSI installers are not the only installers that rely on checking Windows in-progress install conflicts. Since neither MSI installers nor others that are capable of detecting install conflicts (from my use case, InstallShield) are completely idempotent, the fact remains that chocolatey should not be repeatedly attempting the install in the background.

I don't have a background as such with any of this, but my use case is deploying and provisioning virtual machines for automated integration testing. Since our environment setup involves network activity and we are heavily loading the hypervisor, invariably some installations fail due to timeouts or newly introduced bugs.

When this happens, we don't want or need chocolatey to retry the install behind the scenes. We just need the exit code (which chocolatey already provides), and the orchestration scripts that are invoking chocolatey can figure out whether to dump and rebuild the machine, revert to snapshot, retry after a delay, clean up and reattempt with older bits, or just report failure to our CI. Many of these scenarios would become more complicated if chocolatey started doing stuff in the background that was not explicitly requested. There are scenarios where MSI and non-MSI installers alike can crash without releasing the installer reservation key, and a reattempt cycle here would not be helpful.

For these reasons I stand by my original comment, in that chocolatey should not be in the business of trying to schedule or abort installs in the background. If an install fails due to a conflict, so be it. Whatever agent is invoking chocolatey (Puppet, DSC, Ansible, human being sitting at a terminal, whatever) will be responsible for deciding when and whether to retry, or to change the installation process to avoid timing conflicts. At the very least the proposed automatic retry functionality should be hidden behind a new flag.

As an aside, I don't think there is any need to "de-escalate" anything here; I think it is fine for different people to have different opinions on what features they want.

ferventcoder commented Sep 19, 2016

Fair statements. I think what we all understand is that different folks have different perspectives and sometimes hope for knobs they can turn to make software work better for them. This is one of those instances where Chocolatey would allow more knobs for other folks to change the default behavior. That default behavior is possibly the status quo currently and it may continue to be the default behavior.

ferventcoder added a commit that referenced this issue Sep 19, 2016

(GH-943) Remove Transaction Lock Even on Failure
Whether or not the package is successful, remove the lock on the
pending file. Otherwise the failed install cleanup will not work
properly.

ferventcoder added a commit that referenced this issue Sep 19, 2016

Merge branch 'stable'
* stable:
  (version) 0.10.1
  (GH-943) Remove Transaction Lock Even on Failure
  (doc) update CHANGELOG/nuspec
  (doc) add CHANGELOG title/summary
  (doc) update licensed changelog
  (GH-458) Warn To Verbose Log For Now
  (doc) add licensed changelog
  (maint) formatting
  (doc) Note Runtime Options For Checksums In Error

DarwinJS commented Sep 19, 2016

My experience is with large scale automated software distribution - it happens MSI was/is very mature in this area due to:

  • being over-engineered (built in the heyday of Windows Waterfall dev methodology)
  • getting a ton of large scale testing which fleshes out at-scale use cases.

However, I definitely feel that the software distribution is the same story over and over again - so why not steal from man-decades of sunk cost into engineering the problems out of at-scale automated software distribution - it's free for the taking - no need to take 50 iterations to work around the same problems again.

Solid return codes would work fine - I can do a retry loop easy enough.

For use cases which aren't just immutable infrastructure - like long term instances, real servers and especially desktops - any resilience around crusty end-nodes and not assuming that a given framework is the only software distribution maintaining the node is helpful for adoption.
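
A caller-side retry loop of the kind mentioned above might look like the sketch below. It assumes the invoked command exits with the Windows Installer code 1618 (ERROR_INSTALL_ALREADY_RUNNING) when another install holds the installer; whether a given choco invocation surfaces that code depends on the package and configuration. The function name, the defaults, and the `examplepkg` package name are hypothetical.

```python
import subprocess
import time

# Windows Installer: "Another installation is already in progress."
ERROR_INSTALL_ALREADY_RUNNING = 1618


def install_with_retry(command, attempts=5, delay_seconds=60.0,
                       run=subprocess.call, sleep=time.sleep, log=print):
    """Run an install command, retrying only while the installer is busy.

    Any other exit code (success or a real failure) is returned at once,
    so the caller still decides what to do with genuine errors.
    """
    code = None
    for attempt in range(1, attempts + 1):
        code = run(command)
        if code != ERROR_INSTALL_ALREADY_RUNNING:
            return code
        log(f"attempt {attempt}/{attempts}: installer busy (exit {code})")
        if attempt < attempts:
            sleep(delay_seconds)
    # Still busy after the last attempt: report the busy code upward.
    return code


if __name__ == "__main__":
    # Hypothetical package name; requires choco on PATH.
    raise SystemExit(install_with_retry(["choco", "install", "examplepkg", "-y"]))
```

Keeping the loop in the orchestration layer rather than inside the package manager means the retry policy stays explicit and opt-in, which is compatible with both positions in this thread.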

DarwinJS commented Sep 19, 2016

I am pretty sure Rob knows this, but you could optimize Chocolatey a little around underlying concurrent MSIs, because you can find the "InProgress" flag for MSI in the registry. If you know the package you are about to install is an MSI, you could just send the "busy" return code right away rather than execute MSI and wait like 5+ minutes for it to give you the "Another install is in progress" return code.

Also you could take a hard lesson from MSI's simplistic "InProgress" flag. I.e., it would be nice to know the Package Name and the date the flag was thrown, because if the flag is the same in 3-5 days, I bet the client is in an unhealthy state - and even us pure DevOps guys like it when technology can self-report its health status :)
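
To illustrate the suggestion of a richer in-progress flag, here is a small sketch of a marker that records the package name and the time it was set, plus a staleness check. The JSON layout, the function names, and the three-day threshold (the low end of the "3-5 days" mentioned above) are all assumptions made for illustration, not anything Chocolatey or MSI defines.

```python
import json
import time

# Assumption: treat a flag older than three days as a sign of trouble.
STALE_AFTER_SECONDS = 3 * 24 * 60 * 60


def write_marker(path, package_name, now=None):
    """Record which package set the in-progress flag and when."""
    marker = {
        "package": package_name,
        "started_at": time.time() if now is None else now,
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(marker, handle)
    return marker


def read_marker(path):
    """Load a previously written marker."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def is_stale(marker, now=None, stale_after=STALE_AFTER_SECONDS):
    """True when the flag has been set for longer than the allowed window."""
    current = time.time() if now is None else now
    return current - marker["started_at"] > stale_after
```

A monitoring job could read the marker and report the package name alongside the stale status, which gives the self-reported health signal described above.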
