Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ONVIF Discovery Handler Optimizations #351

Merged
merged 5 commits into from Jul 22, 2021

Conversation

kate-goldenring
Copy link
Contributor

What this PR does / why we need it:
This PR optimizes our ONVIF discovery handler by:

  1. Applying scope filtering based on discovery response removing the need for an additional GetScopes call to the cameras during apply_filters
  2. Testing the responsiveness of cameras during discovery via a call to the publicly accessible GetSystemDateAndTime. This prevents further filtering calls from being made to unresponsive cameras and ensures that camera is responsive.
  3. After initial discovery, maintaining a list of the previously visible cameras and only apply filtering on new cameras that are seen. Previously, every 10 seconds, we were making calls to get the cameras' ip and mac addresses which not only is computationally expensive but also could overload the cameras.

Via some local testing, it appears that the cpu requests and limits can be reduced as a result of the optimizations. Previously discovery was getting stuck on one thread if a connection could not be made to a camera.

During this PR, also learned that:

  • IPv6 cameras with local-link addresses throw an "Invalid argument" error due to an interface not being specified. A later PR should address that. For now, this PR fixed the previous behavior of panicking upon this error: closes ONVIF Camera discovery issue #249.
  • Authenticated cameras do not support GetNetworkInterfaces calls, so they are filtered out due to no response. A later PR should add support for obtaining the mac address from authenticated cameras, possibly via arp.

Special notes for your reviewer:

If applicable:

  • this PR contains unit tests
  • added code adheres to standard Rust formatting (cargo fmt)
  • code builds properly (cargo build)
  • code is free of common mistakes (cargo clippy)
  • all Akri tests succeed (cargo test)
  • version has been updated appropriately (./version.sh)

@kate-goldenring
Copy link
Contributor Author

@bfjelds I made some changes to make the device calls more concurrent, further improving performance. A follow up to this PR should address the verbose logging. trace and debug logging from the ONVIF discover handler dependencies are appearing in the Discovery Handler pod logs.

@@ -415,11 +415,11 @@ onvif:
memoryRequest: 11Mi
# cpuRequest defines the minimum amount of CPU that must be available to this Pod
# for it to be scheduled by the Kubernetes Scheduler
cpuRequest: 300m
cpuRequest: 10m
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

if !cameras.contains(camera) {
trace!("discover - discovered:{:?}", &latest_cameras);
// Remove cameras that have gone offline
previous_cameras.iter().for_each(|c| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this logic need some retry timeout before removing the cameras or is that handled elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry timeout is effectively 10 seconds since that is the discovery interval. If they were to come back online on the next discovery check they will once again be reported to the agent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if on second X camera A is present, on second X+9 it disappears, on X+10 another scan does not find it, so it removes it from Akri instance, on X+11 camera comes back (so camera had intermittent connectivity for 2 seconds), we still end up evicting pods that were using the camera or is my understanding wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your scenario does/would not occur. The discovery handler logic is different from the agent logic. The Discovery Handler reports to the Agent whenever there are changes in device visibility, checking for changes every 10 seconds. The Agent then decides what to do with that information. Currently, for shared devices, it gives a 5 minute timeout before deleting instances, triggering the controller to bring down pods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this logic need some retry timeout before removing the cameras or is that handled elsewhere?

To summarize: the removing of instances and brokers is handled elsewhere, in the Agent and Controller respectively after a 5 minute period of the Discovery Handler not reporting the device.

@@ -295,18 +258,16 @@ mod tests {
#[tokio::test]
async fn test_apply_filters_no_filters() {
let mock_uri = "device_uri";
let mock_ip = "mock.ip";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, why is one with . and the other with :?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we were aiming at mimicking somewhat the format of an ip address vs a mac address, with . and : separators, respectively.

@kate-goldenring kate-goldenring merged commit 45345d2 into project-akri:main Jul 22, 2021
@kate-goldenring kate-goldenring deleted the onvif-cleanup branch July 22, 2021 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ONVIF Camera discovery issue
3 participants