Better support removing defunct k8s pods. #3276

Closed
madorb opened this issue Apr 3, 2024 · 5 comments

Comments

@madorb

madorb commented Apr 3, 2024

In general, it feels like SBA doesn't play nicely with Kubernetes, where instance lifecycles are often intentionally short-lived. Would it be feasible to add some sort of configuration that causes instances to be deregistered automatically after some amount of time?

I tried to resolve this myself but was unable to. I asked the question below on Stack Overflow and Gitter but didn't receive any response. If the feature request isn't feasible, are there suggestions on what I'm doing wrong in my attempt below?

============================================================================
We use Kubernetes, and despite setting spring.boot.admin.client.auto-deregistration=true, instances often fail to deregister before their pod is killed.

This leaves us with many phantom instances on SBA that don't seem to ever be automatically cleaned up.

I've attempted to solve this by writing a scheduled job that deregisters offline apps, but it doesn't appear to actually deregister them: they still show up as offline in the UI.

```kotlin
@Component
class InstanceReaper(val instanceRegistry: InstanceRegistry) {
    private val log = KotlinLogging.logger {}

    @Scheduled(fixedDelay = 5, timeUnit = TimeUnit.MINUTES)
    fun clearDeadInstances() {
        // Any instance reporting down for more than 60 seconds we can remove.
        // It should simply re-register if it comes back online
        val cutoffInstant = Instant.now().minusSeconds(60)

        log.debug { "Launching reaper to kill offline instances older than $cutoffInstant" }

        var total = 0
        var removed = 0

        this.instanceRegistry.instances.subscribe {
            total++
            val statusTimestamp = it.statusTimestamp

            // Only remove offline instances, as applications can be online but be DOWN due to failing health indicators.
            if (it.isRegistered && it.statusInfo.isOffline && statusTimestamp.isBefore(cutoffInstant)) {
                log.info {
                    "De-registering ${it.registration.name} instance ${it.id} having management url " +
                        "${it.registration.managementUrl} for reporting as offline for more than 60 seconds." +
                        " Last status Timestamp: $statusTimestamp"
                }
                removed++
                it.deregister()
            }
        }

        log.info { "Reaper execution removed $removed instances out of an initial total of $total" }
    }
}
```

I'm sure I'm just misunderstanding something, but instances that are no longer online continue to show in the UI, and the SBA application log prints the same message for the application every 5 minutes:

De-registering foo-bar-app instance c3df5fcf6dbb having management url http://10.x.x.x:8081/actuator for reporting as offline for more than 60 seconds. Last status Timestamp: 2024-02-07T16:15:19.896560955Z
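One plausible explanation for the same log line repeating every 5 minutes, offered as an assumption to verify against your SBA version: the server's `Instance` appears to be an immutable, event-sourced entity, so `it.deregister()` returns a *new* `Instance` and leaves the stored one untouched unless the result is saved back. If so, the call to use on the server side would be `InstanceRegistry.deregister(InstanceId)`, which (again assumed from the SBA sources) returns a `Mono<InstanceId>` and does nothing until subscribed. The self-contained model below does not use SBA at all; `ModelInstance`, `ModelRegistry`, `reapDiscardingResult`, and `reapPersisting` are hypothetical stand-ins that reproduce the pitfall and the fix.

```kotlin
import java.time.Instant

// Hypothetical stand-in for an immutable, event-sourced instance.
// NOT Spring Boot Admin's Instance class; it only mimics the behavior assumed above.
data class ModelInstance(
    val id: String,
    val registered: Boolean,
    val offline: Boolean,
    val statusTimestamp: Instant,
) {
    // Returns a NEW object with the change applied; `this` is left unchanged.
    fun deregister(): ModelInstance = copy(registered = false)
}

// Hypothetical in-memory registry standing in for the server-side repository.
class ModelRegistry {
    private val store = mutableMapOf<String, ModelInstance>()

    fun save(instance: ModelInstance) {
        store[instance.id] = instance
    }

    fun find(id: String): ModelInstance? = store[id]

    fun instances(): List<ModelInstance> = store.values.toList()

    // Deregistration only takes effect because the updated copy is written back.
    fun deregister(id: String) {
        val current = store[id] ?: return
        store[id] = current.deregister()
    }
}

// Same predicate as the reaper above: registered, offline, and stale.
fun isReapable(instance: ModelInstance, cutoff: Instant): Boolean =
    instance.registered && instance.offline && instance.statusTimestamp.isBefore(cutoff)

// Mirrors the original attempt: the new object returned by deregister() is discarded.
fun reapDiscardingResult(registry: ModelRegistry, cutoff: Instant): Int {
    val reapable = registry.instances().filter { isReapable(it, cutoff) }
    reapable.forEach { it.deregister() } // result ignored, so the registry is unchanged
    return reapable.size
}

// Fixed variant: goes through the registry so the new state is persisted.
fun reapPersisting(registry: ModelRegistry, cutoff: Instant): Int {
    val reapable = registry.instances().filter { isReapable(it, cutoff) }
    reapable.forEach { registry.deregister(it.id) }
    return reapable.size
}
```

In the model, `reapDiscardingResult` reports the same instance on every run, just like the log above, while `reapPersisting` removes it once. Against the real server, the persisting variant would roughly correspond to `instanceRegistry.instances.filter { ... }.flatMap { instanceRegistry.deregister(it.id) }.count()` followed by a `subscribe` that logs the count; keeping everything in one reactive chain also avoids logging the totals before the stream has completed.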

@hzpz
Collaborator

hzpz commented Apr 5, 2024

Which discovery mechanism are you using?

We have a discovery playground with examples for a lot of different discovery mechanisms. If you are using the discovery-server/client combination, you might also be interested in #2872.

@erikpetzold
Member

To me this sounds like self-registration with the Spring Boot Admin Client. As Timo already pointed out, there are discovery solutions optimized for Kubernetes, and it would be better to use one of these.

@madorb
Author

madorb commented Apr 5, 2024

@erikpetzold that's correct. We're presently using the client because we're in the process of migrating from EC2 to K8s and have apps in various states of migration, so we haven't yet fully embraced Kubernetes discovery. While that definitely looks like the better long-term option, surely there must be some programmatic way to deregister instances in the meantime?

@erikpetzold
Member

Hi @madorb

First, you could use discovery within the k8s cluster and self-registration on the EC2 instances. There is no problem mixing both approaches.

As you said, the client is already deregistering. This happens when the Spring application context is shut down. Maybe the pods are killed too quickly to shut down gracefully. Have a look at the docs or articles like this one: https://www.thoughtworks.com/insights/blog/cloud/shutdown-services-kubernetes
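One way to reason about "killed too quickly" is to check whether the client's deregistration can finish inside the pod's termination window. The sketch below is a simplified, hypothetical model (`PodTermination`, `slackBeforeKill`, and `deregistrationCompletes` are illustrative names, not part of SBA or Kubernetes). It relies on documented Kubernetes behavior: the preStop hook runs first, SIGTERM follows once the hook completes, the hook's time counts against `terminationGracePeriodSeconds` (default 30), and SIGKILL arrives when that period expires. The time from SIGTERM until the deregistration request returns is an assumed input you would have to measure.

```kotlin
// Hypothetical model of one pod's termination timeline; all values are in seconds.
// Kubernetes runs the preStop hook (if any) first, then sends SIGTERM. Time spent in the
// hook counts against terminationGracePeriodSeconds, and SIGKILL follows when it expires.
data class PodTermination(
    val gracePeriodSeconds: Long = 30, // Kubernetes default for terminationGracePeriodSeconds
    val preStopSeconds: Long = 0, // duration of an optional preStop hook
    val sigtermToDeregisteredSeconds: Long, // assumed: SIGTERM until the deregistration call returns
)

// Seconds left before SIGKILL once deregistration has finished; negative means it was cut off.
fun slackBeforeKill(p: PodTermination): Long =
    p.gracePeriodSeconds - p.preStopSeconds - p.sigtermToDeregisteredSeconds

// True if the deregistration request finishes before the container is SIGKILLed.
fun deregistrationCompletes(p: PodTermination): Boolean = slackBeforeKill(p) >= 0
```

For example, with the default 30-second grace period and a 28-second preStop sleep, a deregistration that needs 5 seconds after SIGTERM is cut off. The model also assumes the JVM actually receives SIGTERM; if it never does, no shutdown logic runs and only the grace-period timeout applies.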

Besides that, your InstanceReaper should work at first glance. Which version of SBA Server are you using?

@andreasfritz
Contributor

If you have any questions or comments, please open the ticket again or create a new one.
