Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove redundant tests and wait for device to be disconnected #5244

Merged
merged 9 commits into from
Jul 15, 2021

Conversation

ancaantochi
Copy link
Contributor

@ancaantochi ancaantochi commented Jul 13, 2021

Removed a tests which tested the same actions as another test that was left.
Test was flaky because sometimes the c2d message was sent in the same time the device was disconnected and sometimes edgeHub received it and immediately the device proxy was not available. Fixed to check the device in IotHub that the device was disconnected before sending C2D message.

Copy link
Contributor

@vadim-kovalyov vadim-kovalyov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general rule, we should try to avoid long timeouts in tests, and ideally avoid having dependency on IotHub. Should we instead promote this to E2E test?

Also, after looking at this test, I'm trying to evaluate it's usefulness. So, we create device, connect, then disconnect, then try to send offline message and verify we received it.... It seems to me that we are testing Iot Hub, and not Iot Edge... So, do we really need this test?

await Task.Delay(TimeSpan.FromSeconds(30));

// Wait for the connection to be closed on the Edge side by checking device connection state
Task timerTask = Task.Delay(TimeSpan.FromMinutes(4));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 minutes? I thought we are trying to speed our tests up, not slow them down :) On a serious note though, why do we need such long timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought 60s would help, but it was still failing so just trying to see if 4min would help and it was still failing. I found the issue and I think with the current change is stable, still running a few more times. It needs to wait for connect before it disconnects. We are trying to fix it, not speed it :) Anyway I changed it and it waits up to 60s to connects and then disconnect and as soon this happens it can proceed, so it most cases it should be faster than 60s.

while (device.ConnectionState != DeviceConnectionState.Disconnected)
{
device = await rm.GetDeviceAsync(deviceName);
await Task.Delay(TimeSpan.FromSeconds(5));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need it here? Is there a way to do it without delay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a delay so we don't make the calls to IotHub continuously. I picked 5s, it could be changed to lower value, maybe 3s?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long term planning question: Could these tests accomplish their goals if we had a mock iothub and we didn't have to worry about network calls? We just make some situation or set of events that should occur in the mock iothub?

If so, then I think having some delays here in these tests are fine. If not, then I like the idea of trying to have as small a delay as possible. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to ask IotHub on their SLA for operations. 5sec is too much IMO. If I could, I'd probably do it with interval not more that 1s, and total timeout 10s max.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can mock the sdk, so it won't connect to IotHub, but probably this issue wouldn't be caught. We should discuss in the tests improvements plan what should be mocked.

@ancaantochi
Copy link
Contributor Author

As a general rule, we should try to avoid long timeouts in tests, and ideally avoid having dependency on IotHub. Should we instead promote this to E2E test?
Also, after looking at this test, I'm trying to evaluate it's usefulness. So, we create device, connect, then disconnect, then try to send offline message and verify we received it.... It seems to me that we are testing Iot Hub, and not Iot Edge... So, do we really need this test?

It is E2E test, just that it's not running in containers. I think it's useful because I found a bug for the root cause issue, it is a corner case if device connects and quickly disconnects before edgeHub has time to connect to cloud, then cloud connection is not closed, c2d messages are sent and edgehub can't send it to device because it was disconnected. I think long haul run into it before, with CI it's easy to debug from VS and actually find the root cause.

}
catch (Exception)
{
if (i == 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would you mind adding a small comment on this part for if (i == 4) ?

Copy link
Contributor

@vadim-kovalyov vadim-kovalyov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to block this, but I do recommend to still revisit the timeout numbers after we fix the root cause for the issue.

while (device.ConnectionState != DeviceConnectionState.Disconnected)
{
device = await rm.GetDeviceAsync(deviceName);
await Task.Delay(TimeSpan.FromSeconds(5));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to ask IotHub on their SLA for operations. 5sec is too much IMO. If I could, I'd probably do it with interval not more that 1s, and total timeout 10s max.

@ancaantochi ancaantochi merged commit 221048a into Azure:master Jul 15, 2021
@ancaantochi ancaantochi deleted the ancan/fix-c2d-ci-tests branch July 15, 2021 20:19
ancaantochi added a commit to ancaantochi/iotedge that referenced this pull request Aug 3, 2021
…5244)

* Remove redundant tests and wait for device to be connected/disconnected
ancaantochi added a commit to ancaantochi/iotedge that referenced this pull request Aug 3, 2021
…5244)

* Remove redundant tests and wait for device to be connected/disconnected
kodiakhq bot pushed a commit that referenced this pull request Aug 5, 2021
…#5313)

* Remove redundant tests and wait for device to be connected/disconnected
kodiakhq bot pushed a commit that referenced this pull request Aug 20, 2021
…#5314)

* Remove redundant tests and wait for device to be connected/disconnected
damonbarry pushed a commit to damonbarry/iotedge that referenced this pull request Apr 15, 2022
…5244)

* Remove redundant tests and wait for device to be connected/disconnected
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants