Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EE feattures need to be enabled before the telemetery code fires at startup #1681

Closed
hardillb opened this issue Feb 8, 2023 · 3 comments · Fixed by #1758
Closed

EE feattures need to be enabled before the telemetery code fires at startup #1681

hardillb opened this issue Feb 8, 2023 · 3 comments · Fixed by #1758
Assignees
Milestone

Comments

@hardillb
Copy link
Contributor

hardillb commented Feb 8, 2023

Current Behavior

Telemetry metrics try to read EE features before the db models have been loaded.

$ kubectl logs -f flowforge-56cb6477df-bfxsn
[2023-02-08T13:41:48.109Z] INFO: FlowForge v1.3.1-96008d6-20230207.0
[2023-02-08T13:41:48.110Z] INFO: FlowForge running with NodeJS v16.19.0
[2023-02-08T13:41:48.110Z] INFO: FlowForge Data Directory: /usr/src/forge
[2023-02-08T13:41:48.110Z] INFO: Config File: /usr/src/forge/etc/flowforge.yml
[2023-02-08T13:41:48.320Z] INFO: Database driver: postgres
[2023-02-08T13:41:50.916Z] INFO: License verified:
[2023-02-08T13:41:50.916Z] INFO:   ****************************
[2023-02-08T13:41:50.916Z] INFO:   * Development-mode License *
[2023-02-08T13:41:50.916Z] INFO:   ****************************
[2023-02-08T13:41:50.916Z] INFO:  Org:     FlowForge Inc Staging
[2023-02-08T13:41:50.916Z] INFO:  Valid From : 2022-09-22T00:00:00.000Z
[2023-02-08T13:41:50.916Z] INFO:  Expires    : 2023-09-22T23:59:59.000Z
[2023-02-08T13:41:50.925Z] INFO: Usage:
[2023-02-08T13:41:50.926Z] INFO:  Users    : 18/150
[2023-02-08T13:41:50.926Z] INFO:  Teams    : 24/50
[2023-02-08T13:41:50.926Z] INFO:  Projects : 27/50
[2023-02-08T13:41:50.926Z] INFO:  Devices  : 31/50
/usr/src/forge/app/node_modules/@flowforge/flowforge/forge/monitor/metrics/004-platform.js:4
        sharedLibraryEntries = await app.db.models.StorageSharedLibrary.count()
                                                                        ^

TypeError: Cannot read properties of undefined (reading 'count')
    at module.exports (/usr/src/forge/app/node_modules/@flowforge/flowforge/forge/monitor/metrics/004-platform.js:4:73)
    at gather (/usr/src/forge/app/node_modules/@flowforge/flowforge/forge/monitor/index.js:47:26)
    at async Timeout.ping [as _onTimeout] (/usr/src/forge/app/node_modules/@flowforge/flowforge/forge/monitor/index.js:80:29)

Expected Behavior

All EE features to be initialised before the telemetry code runs.

Steps To Reproduce

Timing window that may be dependent on the how long the container driver takes to initialise

Environment

  • FlowForge version: 1.3.x
  • Node.js version: 16
  • npm version: 8
  • Platform/OS: K8s/Linux
  • Browser: NA
@hardillb hardillb added the needs-triage Needs looking at to decide what to do label Feb 8, 2023
@knolleary
Copy link
Member

When starting up, we currently initialise monitor, containers, ee in that order. The monitor component does its first ping 10 seconds after startup, assuming everything is ready. We have seen the containers component take > 10 secs to initialise on staging, so the ping happens before ee is ready.

Two things to look at:

  1. why is k8s container driver taking so long - should it be doing more work asynchronously
  2. moving ee component to register earlier. Need to be careful around assumptions on what is/isn't ready at that point in time.

@knolleary knolleary removed the needs-triage Needs looking at to decide what to do label Feb 22, 2023
@knolleary knolleary added this to the 1.5 milestone Feb 22, 2023
@Steve-Mcl
Copy link
Contributor

Steve-Mcl commented Feb 22, 2023

I can confirm seeing this in my dev works - though not related to k8s - but because of machine slowdown & debugging etc.

2. moving ee component to register earlier. Need to be careful around assumptions on what is/isn't ready at that point in time

I propose the solution to the immediate timing issue might be to move housekeeping registration to end of ALL registrations (i.e. just before await server.ready())

My debugging of this leads me to think the "k8s driver taking too long" may just be a catalyst that highlights this issue (much like my slow dev environment where the ee componentes had not yet been registered but the housekeeping had been registered and the "startup" housekeeping task occured before the ee registration)

@knolleary
Copy link
Member

I think we do want to do a general review of the order components are loaded in... but as an initial change, moving the monitor component to the end would solve this issue.

As it stands, the monitor component doesn't use the housekeeper task management as it was implemented before we introduced the housekeeper. As a separate task, updating the monitor to use housekeeper for its task management would make sense. But that isn't a must-have right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants