ONVIF Discovery Handler Optimizations #351
Conversation
@bfjelds I made some changes to make the device calls more concurrent, further improving performance. A follow-up to this PR should address the verbose logging: trace and debug logging from the ONVIF Discovery Handler's dependencies is appearing in the Discovery Handler pod logs.
```diff
@@ -415,11 +415,11 @@ onvif:
   memoryRequest: 11Mi
   # cpuRequest defines the minimum amount of CPU that must be available to this Pod
   # for it to be scheduled by the Kubernetes Scheduler
-  cpuRequest: 300m
+  cpuRequest: 10m
```
Nice!
```rust
if !cameras.contains(camera) {
trace!("discover - discovered:{:?}", &latest_cameras);
// Remove cameras that have gone offline
previous_cameras.iter().for_each(|c| {
```
Does this logic need some retry timeout before removing the cameras or is that handled elsewhere?
The retry timeout is effectively 10 seconds, since that is the discovery interval. If a camera comes back online by the next discovery check, it will once again be reported to the Agent.
So if at second X camera A is present, at second X+9 it disappears, at X+10 another scan does not find it and removes it from the Akri Instance, and at X+11 the camera comes back (i.e. the camera had intermittent connectivity for about 2 seconds), do we still end up evicting pods that were using the camera, or is my understanding wrong?
Your scenario would not occur. The Discovery Handler logic is different from the Agent logic. The Discovery Handler reports to the Agent whenever there are changes in device visibility, checking for changes every 10 seconds. The Agent then decides what to do with that information. Currently, for shared devices, it allows a 5 minute timeout before deleting Instances, which triggers the Controller to bring down pods.
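The Agent-side grace period described above can be sketched as follows. This is a hypothetical, simplified illustration (the type and function names are invented for this sketch, not Akri's actual code): a shared device's Instance is only deleted once it has gone unreported for the full 5 minute window, so a 2 second blip never evicts pods.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hypothetical sketch of the Agent's grace period for shared devices:
// an Instance is deleted only after the Discovery Handler has not
// reported the device for 5 minutes.
const GRACE_PERIOD: Duration = Duration::from_secs(300);

struct InstanceState {
    // How long the Discovery Handler has gone without reporting this device.
    unreported_for: Duration,
}

// Returns the ids of Instances whose devices have been unreported past the
// grace period; only these would trigger the Controller to bring down pods.
fn instances_to_delete(states: &HashMap<String, InstanceState>) -> Vec<String> {
    let mut doomed: Vec<String> = states
        .iter()
        .filter(|(_, s)| s.unreported_for >= GRACE_PERIOD)
        .map(|(id, _)| id.clone())
        .collect();
    doomed.sort();
    doomed
}

fn main() {
    let mut states = HashMap::new();
    // Camera that blipped offline for 2 seconds: well inside the grace period.
    states.insert(
        "camera-a".to_string(),
        InstanceState { unreported_for: Duration::from_secs(2) },
    );
    // Camera gone for 6 minutes: past the grace period, so it is deleted.
    states.insert(
        "camera-b".to_string(),
        InstanceState { unreported_for: Duration::from_secs(360) },
    );
    println!("{:?}", instances_to_delete(&states)); // prints ["camera-b"]
}
```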
> Does this logic need some retry timeout before removing the cameras or is that handled elsewhere?
To summarize: the removal of Instances and brokers is handled elsewhere, by the Agent and Controller respectively, after a 5 minute period of the Discovery Handler not reporting the device.
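The Discovery Handler's report-on-change behavior can be sketched as a set diff between scans. A toy, synchronous version under assumed names (the real handler is async and streams results to the Agent):

```rust
use std::collections::HashSet;

// Toy sketch of the per-interval check: the handler only reports when the
// set of visible cameras changed since the previous 10 second scan.
fn visibility_changed(previous: &HashSet<String>, latest: &HashSet<String>) -> bool {
    previous != latest
}

// Cameras present in the previous scan but missing from the latest one;
// these are reported as gone, and the Agent applies its own grace period.
fn offline_cameras(previous: &HashSet<String>, latest: &HashSet<String>) -> Vec<String> {
    let mut gone: Vec<String> = previous.difference(latest).cloned().collect();
    gone.sort();
    gone
}

fn main() {
    let previous: HashSet<String> =
        ["camera-a", "camera-b"].iter().map(|s| s.to_string()).collect();
    let latest: HashSet<String> =
        ["camera-b", "camera-c"].iter().map(|s| s.to_string()).collect();
    assert!(visibility_changed(&previous, &latest));
    println!("{:?}", offline_cameras(&previous, &latest)); // prints ["camera-a"]
}
```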
```diff
@@ -295,18 +258,16 @@ mod tests {
     #[tokio::test]
     async fn test_apply_filters_no_filters() {
         let mock_uri = "device_uri";
         let mock_ip = "mock.ip";
```
Curious, why is one with . and the other with :?
I think we were aiming at mimicking the format of an IP address vs. a MAC address, with `.` and `:` separators, respectively.
What this PR does / why we need it:
This PR optimizes our ONVIF discovery handler by:
- making the `GetScopes` call to the cameras during `apply_filters` concurrent
- first checking each camera's responsiveness with `GetSystemDateAndTime`. This prevents further filtering calls from being made to unresponsive cameras and ensures that the camera is responsive.

Via some local testing, it appears that the CPU requests and limits can be reduced as a result of the optimizations. Previously, discovery was getting stuck on one thread if a connection could not be made to a camera.
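The concurrency optimization can be sketched as probing every camera at once with a per-call timeout, instead of serially (where one dead camera stalls the scan). This is a hypothetical, thread-based illustration with invented names; the real handler makes async ONVIF calls, and a plain closure stands in here for the `GetSystemDateAndTime` probe:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical sketch: probe all cameras concurrently and keep only those
// that answer within `timeout`. `probe` stands in for an ONVIF call such as
// GetSystemDateAndTime; unresponsive cameras are silently dropped.
fn responsive_cameras<F>(uris: Vec<String>, probe: F, timeout: Duration) -> Vec<String>
where
    F: Fn(&str) -> bool + Send + Clone + 'static,
{
    // Spawn one probe thread per camera; each reports back over a channel.
    let receivers: Vec<mpsc::Receiver<String>> = uris
        .into_iter()
        .map(|uri| {
            let probe = probe.clone();
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                if probe(uri.as_str()) {
                    let _ = tx.send(uri);
                }
            });
            rx
        })
        .collect();
    // Wait up to `timeout` for each probe result. Because the probes run in
    // parallel, one unreachable camera no longer blocks the others.
    let mut alive: Vec<String> = receivers
        .into_iter()
        .filter_map(|rx| rx.recv_timeout(timeout).ok())
        .collect();
    alive.sort();
    alive
}

fn main() {
    let alive = responsive_cameras(
        vec!["camera-a".to_string(), "camera-b".to_string()],
        |uri| uri != "camera-b", // pretend camera-b is offline
        Duration::from_millis(100),
    );
    println!("{:?}", alive); // prints ["camera-a"]
}
```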
During this PR, also learned that:
- An `"Invalid argument"` error can occur due to an interface not being specified. A later PR should address that. For now, this PR fixed the previous behavior of panicking upon this error: closes ONVIF Camera discovery issue #249.
- Authenticated cameras do not respond to `GetNetworkInterfaces` calls, so they are filtered out due to no response. A later PR should add support for obtaining the MAC address from authenticated cameras, possibly via ARP.

Special notes for your reviewer:
If applicable:
- formatted code (`cargo fmt`)
- built (`cargo build`)
- linted (`cargo clippy`)
- tested (`cargo test`)
- updated the version (`./version.sh`)