Fix device running or reporting old snapshot id/name (backport #2575) #2580
Backport of #2575
Description
This PR addresses several issues found while debugging the reported problem.
NOTE: None of the work here involved manually editing the device files. The state of the device and its files was entirely the result of actions taken in the forge application. I feel this is important to point out because, as it stands, it is possible to crash the device or put it into odd states using only the forge application.
The issues encountered:
problem 1 - device running a snapshot of a deleted instance
The device's active/target snapshots get populated in the check-in code
The device reports it is on the old snapshot, and that snapshot matches
The check-in code did not check that the active snapshot's project matched the device's current project
problem 2 - snapshot table is not cleaned up; it still holds snapshots from deleted instances
When the device checks in, its reported snapshot is found and the device object's active/target snapshots are populated, meaning the device passes its check-in and the platform does not instruct an update.
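To make problems 1 and 2 concrete, the following is a simplified, hypothetical model of the check-in behavior described above, not the forge code. The in-memory `snapshots` map stands in for the ProjectSnapshots table, and `checkinPreFix` plus the object fields are illustrative assumptions. The lookup only asks whether the reported snapshot exists, so a snapshot left behind by a deleted instance still passes.

```javascript
// Hypothetical in-memory stand-in for the ProjectSnapshots table. The row for
// 'snap-old' models a snapshot left behind by a deleted instance: the row still
// exists but its ProjectId has been cleared.
const snapshots = new Map([
  ['snap-old', { id: 'snap-old', name: 'v1', ProjectId: null }],
  ['snap-new', { id: 'snap-new', name: 'v2', ProjectId: 'proj-b' }]
])

// Simplified model of the pre-fix check-in: it only asks whether the reported
// snapshot exists, never which project owns it.
function checkinPreFix (device, reportedSnapshotId) {
  const snapshot = snapshots.get(reportedSnapshotId)
  if (!snapshot) {
    return { updateRequired: true }
  }
  // Reported snapshot found: populate active/target and treat the device as current
  device.activeSnapshot = snapshot
  device.targetSnapshot = snapshot
  return { updateRequired: false }
}

// A device that now belongs to proj-b still reports the orphaned snapshot
const device = { id: 'dev-1', ProjectId: 'proj-b' }
const result = checkinPreFix(device, 'snap-old')
console.log(result.updateRequired) // false: the stale snapshot passes check-in
```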
NOTE: No work was done to delete the old snapshots, only to mitigate the problem. A separate issue will be raised detailing thoughts and points of care around this.
problem 3
Even if we force a DeviceUpdate, it just sends the exact same settings because the snapshot table still has the old entries.
problem 4
Once we did get the device to clean up, switching between dev and autonomous mode caused a crash due to missing files.
problem 5
When the agent determines a new snapshot is available, we don't instruct it to fetch new settings. Since some env vars are computed by the platform (e.g. FF_SNAPSHOT_ID/NAME), we must also instruct the agent to refresh its settings.
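One way to picture problem 5: the platform-computed env vars are a function of the target snapshot, so a snapshot change implies a settings change. This is a hypothetical sketch; `platformEnv`, `checkinResponse`, `updateSnapshot`/`updateSettings` and the `settingsOutOfDate` flag are assumptions, not the forge or device-agent API. Only the variable names FF_SNAPSHOT_ID and FF_SNAPSHOT_NAME come from the text above.

```javascript
// Hypothetical: env vars the platform derives from a device's target snapshot.
// Only the variable names come from the PR text; the shape is illustrative.
function platformEnv (targetSnapshot) {
  return {
    FF_SNAPSHOT_ID: targetSnapshot ? targetSnapshot.id : '',
    FF_SNAPSHOT_NAME: targetSnapshot ? targetSnapshot.name : ''
  }
}

// Hypothetical check-in decision. `reported` is what the agent says it is running;
// `settingsOutOfDate` stands in for whatever other settings comparison exists.
function checkinResponse (reported, targetSnapshot, settingsOutOfDate) {
  const targetId = targetSnapshot ? targetSnapshot.id : null
  const snapshotChanged = reported.snapshotId !== targetId
  return {
    updateSnapshot: snapshotChanged,
    // FF_SNAPSHOT_ID/NAME are computed from the snapshot, so a snapshot change
    // means the agent's settings are stale too
    updateSettings: settingsOutOfDate || snapshotChanged
  }
}

const target = { id: 'snap-2', name: 'release-2' }
const response = checkinResponse({ snapshotId: 'snap-1' }, target, false)
console.log(response) // { updateSnapshot: true, updateSettings: true }
console.log(platformEnv(target).FF_SNAPSHOT_ID) // snap-2
```

The design choice illustrated is to derive the settings refresh from the snapshot change itself rather than relying on a separate settings comparison to notice it.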
The work in this PR
1: ensure device is not using an orphaned snapshot
When checking the status of the device, grab the full row from ProjectSnapshots and verify that the associated project is both present and matches the device's project. If not, the snapshot is considered orphaned or mismatched and a device update is requested.
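The status check above can be sketched as follows. This is a simplified, hypothetical illustration: the row shapes, `snapshotBelongsToDevice`, `checkDeviceStatus` and the `findSnapshotRow` callback (standing in for the database lookup of the full ProjectSnapshots row) are assumptions, not the code in this PR.

```javascript
// Hypothetical row shapes: a ProjectSnapshots row is { id, name, ProjectId }
// and a device is { id, ProjectId }.

// True only if the snapshot row exists, still has an owning project, and that
// project is the one the device currently belongs to.
function snapshotBelongsToDevice (snapshotRow, device) {
  if (!snapshotRow) return false // snapshot not found
  if (!snapshotRow.ProjectId) return false // orphaned: owning instance was deleted
  return snapshotRow.ProjectId === device.ProjectId // false if mismatched
}

// Simplified status check: request a device update when the reported snapshot
// is missing, orphaned, or owned by another project.
function checkDeviceStatus (device, reportedSnapshotId, findSnapshotRow) {
  const row = findSnapshotRow(reportedSnapshotId)
  return { updateRequired: !snapshotBelongsToDevice(row, device) }
}

const rows = new Map([
  ['snap-orphan', { id: 'snap-orphan', name: 'v1', ProjectId: null }],
  ['snap-ok', { id: 'snap-ok', name: 'v2', ProjectId: 'proj-a' }]
])
const findRow = (id) => rows.get(id)

console.log(checkDeviceStatus({ id: 'd1', ProjectId: 'proj-a' }, 'snap-ok', findRow)) // { updateRequired: false }
console.log(checkDeviceStatus({ id: 'd2', ProjectId: null }, 'snap-orphan', findRow)) // { updateRequired: true }
```

Checking presence before equality matters in this sketch: if both a device and an orphaned snapshot have a null ProjectId, a plain equality check would treat them as a match.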
2: ensure platform doesn't send an orphaned snapshot to the device
When sending an update command to the device, check that the targetSnapshot has a project AND that it matches the ID of the project that owns the device. If this fails, set the snapshot id in the update payload to null. This signals the device to clear its current snapshot.
This can occur when an instance (with a member device and target snapshot) is deleted. The snapshot's ProjectId field is cleared, but the snapshot itself remains in the database.
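The payload rule above can be sketched like this. It is a hypothetical illustration only: `buildUpdatePayload` and the `project`/`snapshot` payload fields are assumed names, not the actual forge update command.

```javascript
// Hypothetical: build the update payload sent to a device. The field names
// (project, snapshot) are illustrative, not the actual forge payload.
function buildUpdatePayload (device, targetSnapshot) {
  // Only pass the snapshot on if it still has a project AND that project owns
  // the device; otherwise it is orphaned or mismatched.
  const valid = Boolean(
    targetSnapshot &&
    targetSnapshot.ProjectId &&
    targetSnapshot.ProjectId === device.ProjectId
  )
  return {
    project: device.ProjectId || null,
    // null tells the device to clear its current snapshot
    snapshot: valid ? targetSnapshot.id : null
  }
}

// Example: a snapshot whose ProjectId was cleared when its instance was deleted
const orphaned = { id: 'snap-1', name: 'v1', ProjectId: null }
console.log(buildUpdatePayload({ id: 'dev-1', ProjectId: null }, orphaned)) // { project: null, snapshot: null }
```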
TODO
Unit Tests
Adds tests to test/unit/forge/db/controllers/Device_spec.js
Manual Tests
stopped
👍
Related Issue(s)
FlowFuse/device-agent#132
Checklist
flowforge.yml?
flowforge/helm to update ConfigMap Template
flowforge/CloudProject to update values for Staging/Production
Labels
backport label
area:migration label