Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix device running or reporting old snapshot id/name (backport #2575) #2580

Merged
merged 5 commits into from
Aug 17, 2023

Conversation

github-actions[bot]
Copy link

Backport of #2575


Description

This PR addresses several issues found while debugging the reported problem.

NOTE: None of the work here involved messing with the device files. The state of the device and its files were all a result of actions taken in the forge application. I feel this is important to point out because as it stands, it is possible to get the device to crash and into odd states.

The issues encountered...

problem 1 - device running snapshot of deleted instance

Device active/target snapshot get populated in checkin code
device reports it is on old snapshot & that matches
checkin code did not check that the activeSnapshots project matched the device current project

problem 2 - snapshot table is not cleaned - holds snapshots from deleted instances

When device checks in, its reported snapshot is found and the device objects active/target snapshot are populated meaning the device passes its checkin & does not instruct an update.

NOTE: No work was done to delete the old snapshots, only to mitigate the problem. A separate issue will be raised detailing thoughts and care points around this.

problem 3

Even if we force a DeviceUpdate, it just passes the exact same settings due to the snapshot table having old entries

problem 4

Once we do get the device to clean up, switching between dev/autonomous mode caused a crash due to missing files

[AGENT] 09/08/2023 09:47:27 [info] Cleaning instance directory
[AGENT] 09/08/2023 09:47:50 [info] Enabling developer mode
[AGENT] 09/08/2023 09:47:51 [info] Enabling remote editor access
[AGENT] 09/08/2023 09:47:51 [info] No running Node-RED instance, not starting editor
[AGENT] 09/08/2023 09:47:56 [info] Enabling remote editor access
[AGENT] 09/08/2023 09:47:56 [info] No running Node-RED instance, not starting editor
[Error: ENOENT: no such file or directory, open 'C:\\opt\\flowforge-device\\project\\flows.json'] {
 errno: -4058,
 code: 'ENOENT',
 syscall: 'open',
 path: 'C:\\\\opt\\\\flowforge-device\\\\project\\\\flows.json'
}

problem 5

when the agent determines a new snapshot is available, we dont instruct it to grab new settings.
Since some env vars are computed by the platform (e.g. FF_SNAPSHOT_ID/NAME) we must also instruct
the agent to refresh its settings.

The work in this PR

1: ensure device is not using orphaned snapshot

When checking the status of the device, grab the full row from ProjectSnapshots and verify the associated project is both present and matches the device. If not, the snapshot is considered to be an orphan or mismatched and a device update is requested.

2: ensure platform doesnt send ophaned snapshot > dev

When sending an update command to device, check the targetSnapshot has a project AND it matches the projectId that owns the device. If this fails, set the snapshot id in the payload update to null. This will signal the device to clear its current snapshot.

This can occur when an instance (with a member device and target snapshot) is deleted. The ProjectId field is cleared but the snapshot remains in the database.

TODO Unit Tests

Adds tests to test/unit/forge/db/controllers/Device_spec.js

  • Device controller
  • sendDeviceUpdateCommand
  • sends update command to device
  • does not send orphaned snapshot in update command to device

Manual Tests

  1. delete instance - expect device to clear its config
  • On next poll, agent reports correct change 👍
[AGENT] 09/08/2023 15:17:39 [info] Stopping Node-RED
[AGENT] 09/08/2023 15:17:39 [info] Stopped Node-RED
[AGENT] 09/08/2023 15:17:39 [info] Cleaning instance directory
  • Device status on forge app shows Application and Instace Unassigned 👍
  • Last Known Status shows stopped 👍
  1. delete active snapshot - expect device to clear its config
  • On next poll, agent reports correct change 👍
[AGENT] 09/08/2023 15:13:52 [info] Stopping Node-RED
[AGENT] 09/08/2023 15:13:52 [info] Stopped Node-RED
[AGENT] 09/08/2023 15:13:52 [info] Cleaning instance directory

Related Issue(s)

FlowFuse/device-agent#132

Checklist

  • I have read the contribution guidelines
  • Suitable unit/system level tests have been added and they pass
  • Documentation has been updated
  • Upgrade instructions
  • Configuration details
  • Concepts
  • Changes flowforge.yml?
  • Issue/PR raised on flowforge/helm to update ConfigMap Template
  • Issue/PR raised on flowforge/CloudProject to update values for Staging/Production

Labels

  • Backport needed? -> add the backport label
  • Includes a DB migration? -> add the area:migration label

@knolleary knolleary merged commit d6ca76d into maintenance Aug 17, 2023
@knolleary knolleary deleted the backport-2575 branch August 17, 2023 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants