Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring dataman to remove locking mechanisms and to use uORB #21594

Merged
merged 29 commits into from
Jul 24, 2023

Conversation

Igor-Misic
Copy link
Member

@Igor-Misic Igor-Misic commented May 15, 2023

Changes brought by this PR:

  • Refactored the initialization process of the Dataman file.
  • Removed the thread-locking mechanism from the Dataman thread.
  • Added two new classes, DatamanClient and DatmanCache, which facilitate communication with Dataman using uORB messages.
  • Removed the locking mechanism for Dataman itself

Refactored Initialization of the Dataman File
This fix addresses the issue outlined in the following link: #21278.
Now, if the file doesn't exist or contains an incorrect compact key, it will be cleared, and default values will be set.

DatamanClient
Introduces a new method of communicating with the Dataman thread. Previously, direct access to the Dataman module was used, where requests were inserted into the queue with a semaphore. This approach has now been replaced with the uORB messaging system.

DatamanClient introduces both synchronous and asynchronous modes of communication with Dataman.

DatamanCache
This class serves as a cache storage for a defined amount of Dataman items. Currently, the cache does not employ any advanced caching algorithms.

Integration within mission
The mission now incorporates a simple caching mechanism that attempts to cache the next N items whenever the "current item" is modified.
In order to eliminate the need for a locking mechanism within the mission, logic has been implemented to check if the set of mission items has been updated in the Dataman. The write operation to DM_KEY_MISSION_STATE has been removed from the mavlink thread, and instead, the existing uORB (mission.msg) message is utilized.

Integration geofence and safe points
As the information on new geofence and safe points is updated at the end, it is safe to utilize it without the need for a lock. Integration has been implemented to store all geofence and safe point items within the cache and update them whenever changes occur.

Overview of the new architecture
image

Stack is increased as it was needed
image

@Igor-Misic Igor-Misic force-pushed the pr-dataman_refactoring branch 3 times, most recently from c326949 to 4c0713b Compare May 16, 2023 14:50
@dagar dagar self-requested a review May 17, 2023 01:07
@Igor-Misic Igor-Misic force-pushed the pr-dataman_refactoring branch 3 times, most recently from f8b538d to 56d0d30 Compare May 17, 2023 14:14
@Igor-Misic Igor-Misic force-pushed the pr-dataman_refactoring branch 6 times, most recently from e0dd2cc to 7768e5d Compare May 22, 2023 09:37
@Igor-Misic Igor-Misic marked this pull request as ready for review May 22, 2023 11:34
@Igor-Misic Igor-Misic requested a review from bkueng May 22, 2023 11:34
@julianoes
Copy link
Contributor

Nice, this looks amazing @Igor-Misic!

@bkueng bkueng force-pushed the pr-dataman_refactoring branch 2 times, most recently from f2705c3 to 2f16266 Compare June 16, 2023 08:27
bkueng
bkueng previously approved these changes Jun 16, 2023
Copy link
Member

@bkueng bkueng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @Igor-Misic! I went through it, tested quite a bit, and fixed some things.

RAM usage increases a bit (depending on geofence/safe point usage ~2KB).

Navigator does not block anymore during a mission execution (unless DO_JUMP is used). This is in sih with geofence and safe points:

PR:
navigator: 2337 events, 121802us elapsed, 52.12us avg, min 21us max 803us 650.441us rms
main:
navigator: 2539 events, 919572us elapsed, 362.18us avg, min 19us max 26094us 3136.414us rms

@@ -449,7 +449,7 @@ pipeline {
sh './Tools/HIL/run_nsh_cmd.py --device `find /dev/serial -name *usb-*` --cmd "uorb_tests latency_test" || true'

// test dataman
sh './Tools/HIL/run_nsh_cmd.py --device `find /dev/serial -name *usb-*` --cmd "tests dataman"'
sh './Tools/HIL/run_nsh_cmd.py --device `find /dev/serial -name *usb-*` --cmd "tests dataman" || true'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Igor explained it in the commit: 50f23de.

I made it a bit more explicit in b0c918f

@dagar
Copy link
Member

dagar commented Jun 16, 2023

@bkueng
Copy link
Member

bkueng commented Jun 19, 2023

There's a failure on the test rack that looks real. http://px4-jenkins.dagar.ca:8080/blue/organizations/jenkins/PX4-Autopilot/detail/pr-dataman_refactoring/21/pipeline/125

I didn't reproduce locally, but I think I fixed it in the last commit.

@tstastny
Copy link
Contributor

looks like only remaining build failure is 0.01% flash on diatone mamba

@tstastny
Copy link
Contributor

@dagar - what do you think? gtg?

@@ -527,6 +554,14 @@ Mission::update_mission()
const mission_s old_mission = _mission;

if (_mission_sub.copy(&_mission)) {

bool success = _dataman_client.writeSync(DM_KEY_MISSION_STATE, 0, reinterpret_cast<uint8_t *>(&_mission),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?
DM_KEY_MISSION_STATE and ORB_ID(mission) are the same thing, mavlink_mission would have published mission and written DM_KEY_MISSION_STATE back to back. I think it's time to kill DM_KEY_MISSION_STATE and remove some of the confusion.

@dagar
Copy link
Member

dagar commented Jun 21, 2023

  • DM_KEY_MISSION_STATE is obsolete and can be purged, we don't need to save the progress to the SD card
  • publishing ORB_ID(mission) is used to signal navigation mission to update and rerun mission feasibility, but now it's also updating with any geofence or safe point changes
  • DatamanClient is using the old uORB API where a subscription is a file descriptor, we need to check that the extra FD usage doesn't trigger the warning (or change to uORB::SubscriptionBlocking, etc)

Igor-Misic and others added 21 commits July 24, 2023 10:29
A version update is needed since the dataman is showing errors if data doesn't exist or if it is wrongly stored. This will force default data to be initialized.
tests dataman will trigger errors and run_nsh_cmd.py
 is not written to detect only when a test fails. This is properly tested with run_tests.py
…anges

This avoids the need to regularly access dataman for checking.
Avoids unnecessary dataman accesses
…dataman'

As this test is expected to contain errors.
This can happen e.g. when the item is cleared.
Instead of also when geofence/safe points updated.
This prevents reporting multiple times.
@bkueng
Copy link
Member

bkueng commented Jul 24, 2023

@dagar I'm getting this in as a few things are blocking on this. I'll do a follow-up, if there's anything else needed.

@bkueng bkueng merged commit 8e8b35d into main Jul 24, 2023
87 checks passed
@bkueng bkueng deleted the pr-dataman_refactoring branch July 24, 2023 11:10
@DronecodeBot
Copy link

This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:

https://discuss.px4.io/t/how-to-start-developing/34131/4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants