-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Connectivity / Nested Connectivity: Add tolerances for failure cases #5512
Connectivity / Nested Connectivity: Add tolerances for failure cases #5512
Conversation
This reverts commit 5c0e07a.
This reverts commit a9f2be9.
) Migrate the single node connectivity test pipeline from self-hosted devops agents to 1ES-hosted agents. Also, update the edgelet Cargo.lock to reference an aziot-identity-service commit with a PACKAGE_VERSION in packages.yaml, that matches the version of aziot-identity-service that the edgelet package depends on. This fixes an error in the connectivity tests, where aziot-edge fails to install.
This reverts commit e5feb46.
…the logic is simple
d2b8666
to
079382f
Compare
"topic": "initiate/#" | ||
} | ||
] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this file are due to a missing cherry pick. I wanted to test the behavior for this module as it is enabled in 1.2. I will revert before merge, just using it for testing.
e2e_deployment_files/nestededge_bottomLayerBaseDeployment_connectivity_mqtt.template.json
Outdated
Show resolved
Hide resolved
new DateTime(2020, 1, 1, 9, 10, 24, 15) | ||
}, | ||
10, 7, 0, 0, 0, 0, 0, 0, 0, true | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added logic to the connectivity report to catch this case.
@@ -90,7 +108,7 @@ class DirectMethodConnectivityReportDataWithSenderAndReceiverSource | |||
// NetworkOnFailure test | |||
Enumerable.Range(1, 7).Select(v => (ulong)v), | |||
new[] { 1UL, 2UL, 3UL, 5UL, 6UL, 7UL }, | |||
new List<HttpStatusCode> { HttpStatusCode.InternalServerError, HttpStatusCode.OK, HttpStatusCode.OK, HttpStatusCode.NotFound, HttpStatusCode.OK, HttpStatusCode.OK, HttpStatusCode.OK }, | |||
new List<HttpStatusCode> { HttpStatusCode.OK, HttpStatusCode.OK, HttpStatusCode.InternalServerError, HttpStatusCode.NotFound, HttpStatusCode.InternalServerError, HttpStatusCode.InternalServerError, HttpStatusCode.InternalServerError }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep the test functioning as expected, ignoring any tolerances. Therefore I had to exceed the tolerance (by adding more internal server errors) so that it would surpass the new threshold.
I didn't want to add tests for the tolerances as it gets tricky, we are on a time crunch, they are subject to change, and it is simple percentage logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a todo comment to the test case & create a detailed work item to follow on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am saying we don't need to test these cases as it is unecessary.
@@ -185,7 +203,7 @@ class DirectMethodConnectivityReportDataWithSenderAndReceiverSource | |||
// Non-Offline test | |||
Enumerable.Range(1, 7).Select(v => (ulong)v), | |||
Enumerable.Range(1, 7).Select(v => (ulong)v), | |||
new List<HttpStatusCode> { HttpStatusCode.OK, HttpStatusCode.OK, HttpStatusCode.NotFound, HttpStatusCode.OK, HttpStatusCode.OK, HttpStatusCode.OK, HttpStatusCode.OK }, | |||
new List<HttpStatusCode> { HttpStatusCode.NotFound, HttpStatusCode.NotFound, HttpStatusCode.NotFound, HttpStatusCode.NotFound, HttpStatusCode.OK, HttpStatusCode.OK, HttpStatusCode.OK }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep the test functioning as expected, ignoring any tolerances. Therefore I had to exceed the tolerance (by adding more errors) so that it would surpass the new threshold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run the pipeline one more time to make sure nothing breaks. The code change looks good!
…y tests (Azure#5456)"" This reverts commit a720faf.
This reverts commit dad4a82.
This reverts commit 900d0c3.
@@ -726,28 +726,32 @@ | |||
"TwinTestPropertyType": "Desired", | |||
"ExpectedSource": "twinTester1.desiredUpdated", | |||
"ActualSource": "twinTester2.desiredReceived", | |||
"TestDescription": "twin | desired property | amqp" | |||
"TestDescription": "twin | desired property | amqp", | |||
"Topology": "Nested" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Topology": "Nested"
I like this.
// This tolerance is needed because sometimes we see large numbers of NetworkOnFailures. | ||
// Also, sometimes we observe 1 NetworkOffFailure and a lot of mismatched results. The | ||
// mismatched results are likely a test logic issue that needs further investigation. | ||
return totalSuccessful > 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh man. This is a really relaxed passing criteria. Now not sure how I feel about this.
Per
// This tolerance is needed because sometimes we see large numbers of NetworkOnFailures.
// Also, sometimes we observe 1 NetworkOffFailure and a lot of mismatched results. The
// mismatched results are likely a test logic issue that needs further investigation.
Is there a way we can objectively measure this and use the measurement as a passing criteria? i.e. More than 60% in passing... we can call it pass.
@damonbarry Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to box in the tolerances as much as possible, but in one case for nested I saw a very severe failure like this. If the tests are going to be green all the time for known issues we need the tolerance.
We can prioritize investigating this since it is a more severe issue.
@@ -108,7 +115,20 @@ public bool IsPassedHelper() | |||
}, | |||
() => | |||
{ | |||
return this.TotalExpectCount == this.TotalMatchCount; | |||
// Product issue for C2D messages connected to edgehub over mqtt. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the / this.TotalExpectCount
in this subroutine are prone to DIVIDED BY 0
exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We check that expect count is > 0 here:
https://github.com/Azure/iotedge/pull/5512/files#diff-64b0d03f03915eab5550575602fed2f602c3ad2f36feeb43d4b7929b8bb6027fR111
@@ -60,6 +61,7 @@ public void TestConstructorSuccess() | |||
|
|||
var reportGenerator = new CountingReportGenerator( | |||
TestDescription, | |||
TestMode.Connectivity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider doing TestMode.E2E
instead of TestMode.Connectivity
.
namespace TestResultCoordinator
{
enum TestMode
{
Connectivity,
E2E = Connectivity,
LongHaul
}
}
PS. please check if that's legal in Dotnet3.1 syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds a little odd that Connectivity
mode is used in E2E
test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a unit test for the TRC, not a part of the E2E test suite. Unless I am misunderstanding what you thought. The paths look similar.
This PR is a in the same spirit as #5512, however not nearly the same number of changes is needed. These things have been done for 1.1: - Direct method report accounts for duplicate direct methods received. This really only affected nested connectivity with the broker, but the logic has been made and reviewed so I thought it was good to add it to 1.1 anyway in case we see some weird regression in the future. - Add tolerance for NetworkOnFailure for connectivity tests. We will fail the tests if we pass less than 90% of direct methods. - Add tolerance for missing messages. I saw one instance where we were missing a two messages. Does not reproduce often.
This PR serves to turn Connectivity and Nested Connectivity to green. It introduces the following tolerances:
There are also the following changes:
Before merging I will confirm that both connectivity and nested connectivity pipelines are passing.