Auto machine removal timing does not take into account if Octopus Server is offline #3924

Closed
alexrolleyoctopus opened this issue Nov 16, 2017 · 7 comments
Labels: kind/bug (This issue represents a verified problem we are committed to solving), tag/breaking-change (The resolution of this issue introduced a deliberately breaking change)

@alexrolleyoctopus

Issue

  • Auto machine removal is configured for a set of machines.
  • A network issue drops the connections to all Tentacles, so they all report as unavailable to Octopus.
  • Octopus Server is shut down to prevent auto machine removal.
  • The outage lasts longer than the removal window.
  • The network connection is restored and Octopus Server is restarted.
  • The machines are removed automatically before the startup health check can run and report them as available.

Proposed resolution
Before triggering machine removal, the automated machine removal process should check that the last health check ran after the latest server restart.
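
A minimal sketch of the check proposed above, in Python for illustration only. The `Machine` fields, `should_remove`, and its parameters are hypothetical names chosen for this example; they are not Octopus Server's actual data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Machine:
    # Hypothetical record of what the server knows about one machine.
    name: str
    unavailable_since: Optional[datetime]  # None means last seen healthy
    last_health_check: Optional[datetime]  # None means never checked


def should_remove(machine: Machine, server_started_at: datetime,
                  removal_window: timedelta, now: datetime) -> bool:
    """Allow removal only when the unavailable status was confirmed
    by a health check that ran after the latest server restart."""
    if machine.unavailable_since is None:
        return False  # machine is healthy, nothing to remove
    if (machine.last_health_check is None
            or machine.last_health_check < server_started_at):
        return False  # status is stale: it predates the restart
    return now - machine.unavailable_since >= removal_window
```

With this guard, a machine whose last recorded health check predates the restart is left alone until the startup health check has had a chance to update its status.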

Reference: https://secure.helpscout.net/conversation/467497373/20922?folderId=1465198

@alexrolleyoctopus alexrolleyoctopus added area/execution kind/bug labels Nov 16, 2017
@MichaelJCompton

I wonder what the right interpretation is here. Octopus isn't guaranteeing that the machine has been down for the whole removal window, just that it saw the machine go down and hasn't seen it alive since. Or is it promising something else?

For example, suppose the health check window is 1 hr and the removal window is 2 hrs. If a machine goes down just before a health check but comes back straight after, and then has a network blip 1 hr later during the next health check, we'd remove it even though it's been up for all of the two hours except a couple of blips.

Also, what about this: a machine goes down, a health check runs, the machine comes back, and then long deployments start that max out the tasks on the server (so health checks aren't running because they're waiting for the deployments to finish). Auto removal isn't held up by deployments, so it will delete the machine, again even though it's been up almost the whole time.

These, and the case in the issue above, are edge cases, but what promise is Octopus trying to make here?
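
The first example above (1 hr health check window, 2 hr removal window) can be modelled with a short Python simulation. This is only a sketch under a stated assumption: that removal is decided from the time a machine was first recorded as unavailable with no successful health check since. The function names and the two-minute blips are made up for illustration.

```python
from datetime import datetime, timedelta

HEALTH_CHECK_INTERVAL = timedelta(hours=1)  # from the example above
REMOVAL_WINDOW = timedelta(hours=2)         # from the example above


def is_up(t, outages):
    """True if the machine is actually reachable at time t."""
    return not any(begin <= t < end for begin, end in outages)


def simulate(start, outages, checks):
    """Replay periodic health checks against the real outages.

    Returns (unavailable_since, actual_downtime)."""
    unavailable_since = None
    for i in range(checks):
        t = start + i * HEALTH_CHECK_INTERVAL
        if is_up(t, outages):
            unavailable_since = None   # seen alive, reset the clock
        elif unavailable_since is None:
            unavailable_since = t      # first failed check
    downtime = sum((end - begin for begin, end in outages), timedelta())
    return unavailable_since, downtime


t0 = datetime(2017, 11, 16, 9, 0)
blip = timedelta(minutes=1)
# Two short blips, each straddling a health check (assumed durations).
outages = [(t0 - blip, t0 + blip),
           (t0 + HEALTH_CHECK_INTERVAL - blip, t0 + HEALTH_CHECK_INTERVAL + blip)]

unavailable_since, downtime = simulate(t0, outages, checks=2)
# Removal evaluated at the two-hour mark, before a third health check runs.
removal_due = (t0 + REMOVAL_WINDOW) - unavailable_since >= REMOVAL_WINDOW
```

Under these assumptions the machine is really down for four minutes in total, yet both health checks fail, so it looks unavailable for the full removal window and becomes eligible for removal.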

@octoreleasebot octoreleasebot added this to the 4.1.5 milestone Dec 19, 2017
@octoreleasebot

octoreleasebot commented Dec 19, 2017

Release Note: Auto machine removal now happens as part of health checks. Minor breaking change: the API endpoint for machine removal logs has been removed, and machine removal logs are no longer stored on the Octopus server.

@MichaelJCompton MichaelJCompton removed this from the 4.1.5 milestone Dec 19, 2017
@MichaelJCompton

Pulled from 4.1.5 because it requires a change in the API. Planned for 4.2.0.

@MichaelJCompton MichaelJCompton added this to the 2018.1.0 milestone Jan 17, 2018
@JesseNaranjo

@MichaelJCompton does this mean that there will now be no log entry anywhere for machines being removed?

@MichaelJCompton

Hi @JesseNaranjo, the test for machine removal now happens as part of health checks, and the health check logs record both that test and any decision to remove a machine. If a machine is removed, a machine removal event is also added to the audit events.

@lock lock bot locked as resolved and limited conversation to collaborators Nov 23, 2018
@zentron zentron added the tag/breaking-change label Jan 5, 2022