Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Instance Health Check interval to be configured #3578

Closed
7 tasks done
hardillb opened this issue Mar 14, 2024 · 7 comments · Fixed by #3716
Closed
7 tasks done

Allow Instance Health Check interval to be configured #3578

hardillb opened this issue Mar 14, 2024 · 7 comments · Fixed by #3716
Assignees
Labels
customer request requested by customer feature-request New feature or request that needs to be turned into Epic/Story details size:M - 3 Sizing estimation point
Milestone

Comments

@hardillb
Copy link
Contributor

hardillb commented Mar 14, 2024

Description

As a FlowFuse admin
I would like to be able to set the timeout on the healthcheck
So that I can run large blocking workloads without nr-launcher killing the process.

Which customers would this be available to

Everyone - CE/Starter/Team/Enterprise

Have you provided an initial effort estimate for this issue?

I have provided an initial effort estimate

Requested by: https://app-eu1.hubspot.com/contacts/26586079/record/0-3/10928370628/

@hardillb hardillb added feature-request New feature or request that needs to be turned into Epic/Story details needs-triage Needs looking at to decide what to do size:M - 3 Sizing estimation point customer request requested by customer labels Mar 14, 2024
@MarianRaphael MarianRaphael added this to the 2.3 milestone Mar 15, 2024
@Steve-Mcl Steve-Mcl self-assigned this Apr 2, 2024
@Steve-Mcl
Copy link
Contributor

Questions

  1. As an MVP, would it be suitable/desirable to support this via ENV Vars initially?
  2. There are multiple constants that define the hang detection. Would we expose all or just HEALTH_POLL_INTERVAL?
    • HEALTH_POLL_INTERVAL: Interval between status polls of Node-RED used to detect a hung runtime (default 7.5sec)
    • HEALTH_POLL_MAX_STARTUP_ERROR_COUNT: The number of consecutive timeouts during startup phase before considering a NR hang (default 10)
    • HEALTH_POLL_MAX_ERROR_COUNT: The number of consecutive timeouts before considering a NR hang (default 3)
    • HEALTH_POLL_TIMEOUT: Timeout to apply to health polling requests (defaults to 500 less than HEALTH_POLL_INTERVAL)

@knolleary
Copy link
Member

@Steve-Mcl I think env var approach would be a good quick iteration in terms of implementation. A second iteration that adds options in the settings for this could then just cause those env vars to be set under the covers.

If we don't do the UI piece, it will be critical to have clear, discoverable documentation about the setting. I don't think adding the UI piece is a huge overhead here, so I'd suggest see if you can timebox that piece to a couple hours at most once the base env var parts are done.

In terms of what settings, HEALTH_POLL_INTERVAL is the main one we care about here and I think is sufficient.

@Steve-Mcl
Copy link
Contributor

In the design phase, I tried the env var approach and discovered the UX is sub optimal in that the container needs to be suspended (destroyed) in order for it to pick up any change of value. This is actually not necessary since we pass settings to Launcher.start (and that is where the health check timer is started). Therefore, for improved UX and simplified implementation I am going back on the idea of using ENV VARs and actually passing the timer value along with other settings like userDir and port in the drivers async settings call.

@Steve-Mcl Steve-Mcl removed the needs-triage Needs looking at to decide what to do label Apr 15, 2024
@knolleary
Copy link
Member

@Steve-Mcl I've taken a second look at this and I don't see any need to update all the drivers. We can centralise it in the /settings api endpoint in the forge app - https://github.com/FlowFuse/flowfuse/blob/main/forge/routes/api/project.js#L798 - which takes the settings from the driver and adds in a bunch of standard settings (which this would be one).

The driver settings are intended to be the driver-specific things that need setting differently depending on the needs of the driver.

@Steve-Mcl
Copy link
Contributor

@Steve-Mcl I've taken a second look at this and I don't see any need to update all the drivers. We can centralise it in the /settings api endpoint in the forge app - https://github.com/FlowFuse/flowfuse/blob/main/forge/routes/api/project.js#L798 - which takes the settings from the driver and adds in a bunch of standard settings (which this would be one).

Hmmm, good point. i think since this work started out as a Env Vars I got trapped in that route.

Will pull everything tomorrow and revisit.

@joepavitt
Copy link
Contributor

@Steve-Mcl status report please? Is #3715 the only outstanding item here?

@Steve-Mcl
Copy link
Contributor

@Steve-Mcl status report please? Is #3715 the only outstanding item here?

#3715 (PR #3716) is the only item to be merged. There was some re-work as requested and carried out on the drivers.

Ready for final review/merge at any time.

@knolleary knolleary linked a pull request Jun 6, 2024 that will close this issue
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
customer request requested by customer feature-request New feature or request that needs to be turned into Epic/Story details size:M - 3 Sizing estimation point
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants