Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Report] ProvisioningDeviceClient.RegisterAsync: CancellationToken not effective when offline #2036

Closed
its-phil opened this issue Jun 17, 2021 · 5 comments
Assignees
Labels
bug Something isn't working. fix-checked-in Fix checked into main or preview, but not yet released. IoTSDK Tracks all IoT SDK issues across the board

Comments

@its-phil
Copy link

Context

  • OS, version, SKU and CPU architecture used: Ubuntu 18.04 x86
  • Application's .NET Target Framework : net5.0
  • SDK version used: Microsoft.Azure.Devices.Provisioning.Transport.Amqp 1.14.1

Description of the issue

Calling ProvisioningDeviceClient.RegisterAsync with a cancellation token ignores the token timeout if the device is offline (tested on an Azure VM where all outbound connections are blocked). The ProvisioningDeviceClient instance is created with a ProvisioningTransportHandlerAmqp instance.

I would expect RegisterAsync to cancel if I set the cancellation token source timeout to e.g. 15 seconds. Instead, the timeout is always 1 minute.

This seems related to this issue and this issue but the workaround mentioned here doesn't seem work with the ProvisioningDeviceClient.

Is there another way to work around this issue? I'd like the RegisterAsync call to fail faster when the device is offline.

Code sample exhibiting the issue

using var transport = new ProvisioningTransportHandlerAmqp(TransportFallbackType.TcpOnly);
var provisioningDeviceClient = ProvisioningDeviceClient.Create(dpsUrl, scopeId, securityProvider, transport);

const int timeoutMs = 15000;
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(timeoutMs));
var sw = new Stopwatch();
try
{
    sw.Start();
    var result = await provisioningDeviceClient.RegisterAsync(cts.Token).ConfigureAwait(false);
    Console.WriteLine(result.Status);
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    Console.WriteLine(ex.InnerException?.Message);
}
finally
{
    sw.Stop();
    Console.WriteLine($"Registration finished or cancelled after {sw.ElapsedMilliseconds} ms.");
    // Here I'd expect ~15 seconds. Instead I always get ~60 seconds.
}
@its-phil its-phil added the bug Something isn't working. label Jun 17, 2021
@github-actions github-actions bot added the IoTSDK Tracks all IoT SDK issues across the board label Jun 17, 2021
@timtay-microsoft
Copy link
Member

As mentioned in #1921 the AMQP library that both the IoT Hub device client and the DPS device client use does not support cancellation tokens, so we have some hardcoded timeout values for scenarios like these. While we are pushing for the AMQP library to add support for cancellation tokens, it may be a while before that is finished.

That said, the real issue here seems to be that the registerAsync call doesn't immediately fail when a network disconnection is detected. That seems like an issue we could investigate now.

@timtay-microsoft
Copy link
Member

And we can also look into making these hardcoded timeouts configurable as well

@its-phil
Copy link
Author

Thanks! I'd appreciate any solution that allows for customizing the timeouts. What I'd like to achieve is to connect an IoT device to an Azure Edge gateway, but not after trying to connect to the DPS (to update the assigned IoT Hub) and IoT Hub directly (to update desired properties). Should that fail (e.g. because the device and Edge are offline), the device can fall back to a cached IoT Hub and Azure Edge gateway address. That's why - in my case - it doesn't make sense for the device to wait for one minute.

As a matter of fact, it seems that there is the same issue when connecting directly to the IoT Hub (also via AMQP). So, in effect, I have a device patiently waiting for 2 minutes while it could already be connected and running if my cancellation timeouts worked.

For the sake of completeness, here is my code sample of the direct IoT Hub connection: specifying the timeout as TimeSpan doesn't seem to do the trick either in

var authenticationMethod = new DeviceAuthenticationWithRegistrySymmetricKey(deviceId, primary);
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(timeoutMs));
dc = DeviceClient.Create(iotHubAddress, authenticationMethod, TransportType.Amqp_Tcp_Only);
await dc.OpenAsync(cts.Token);

(using Microsoft.Azure.Devices.Client 1.37.2)

timtay-microsoft added a commit that referenced this issue Jun 18, 2021
…ort for timespan timeouts to provisioning device client

As discussed in #2036, AMQP provisioning device clients have no way to configure the timeout given to the AMQP library for operations like opening links. This adds overloads to the existing provisioning device client's registerAsync methods that allow for users to configure this timespan timeout.
timtay-microsoft added a commit that referenced this issue Jun 21, 2021
…ort for timespan timeouts to provisioning device client (#2041)

As discussed in #2036, AMQP provisioning device clients have no way to configure the timeout given to the AMQP library for operations like opening links. This adds overloads to the existing provisioning device client's registerAsync methods that allow for users to configure this timespan timeout.
@timtay-microsoft timtay-microsoft added the fix-checked-in Fix checked into main or preview, but not yet released. label Jun 21, 2021
@timtay-microsoft timtay-microsoft self-assigned this Jun 21, 2021
@timtay-microsoft
Copy link
Member

#2041 will bring the ability to configure the timespan that we pass into the AMQP library, so I'm tagging this as "fix checked in"

For the cancellation token half of this issue, #1921 will be where we track that support.

@timtay-microsoft
Copy link
Member

timtay-microsoft commented Aug 11, 2021

Since we have released timespan support for this API, and since we will be tracking better cancellation token support in #1921, I'm going to close this issue

https://github.com/Azure/azure-iot-sdk-csharp/releases/tag/2021-8-11

abhipsaMisra added a commit that referenced this issue Oct 8, 2021
* feat(shared): Add common resources for convention-based operations

* feat(iot-device): Add support for convention-based telemetry operation

* feat(iot-device): Add support for convention-based command operations

* * feat(iot-device): Add support for convention-based properties operations

* feat(iot-device): Add support for convention-based properties operations

Co-authored-by: James Davis ⛺️🏔 <jamdavi@microsoft.com>

* feat(e2e-tests): Add telemetry E2E tests

* feat(e2e-tests): Add command E2E tests

* fix(iot-device): Updating client property collection to handle no convention

* feat(samples): Add thermostat and temperature controller sample

* fix(doc, samples): Update API design doc and move SystemTextJson helper to samples

* fix(iot-device): Separate out root-level and component-level property addition operations

* feat(tests): Add unit tests for ClientPropertyCollection

feat(tests): Add unit tests for ClientPropertyCollection

Co-authored-by: Abhipsa Misra <abhipsa.misra@microsoft.com>

* feat(e2e-tests): Add properties E2E tests

Co-authored-by: Abhipsa Misra <abhipsa.misra@microsoft.com>

* feat(e2e-tests): Add fault injection tests for properties operations (#2001)

* fix(iot-device, shared, samples): Rename StatusCodes to CommonClientResponseCodes and add a comment to highlight ClientOptions behavior

* fix(iot-device): Fix enumerator implementation to return key-value pairs

* fix(iot-device): Make ClientOptions.PayloadConvention readonly

* fix(shared): Fix from merge conflict

* fix(samples): Update the pnp samples readme (#2025)

* Rename E2ETests.Configuration to TestConfiguration to avoid SDK type name conflict

* Revert "Rename E2ETests.Configuration to TestConfiguration to avoid SDK type name conflict"

This reverts commit 75034e5.

* Rename E2ETests.Configuration to TestConfiguration to avoid SDK type name conflict

* feat(service): add support for DeviceScope to import/export job

* fix(iot-device): Update IoT Hub CONNACK timeout to be 60 seconds

* feat(provisioning-device, prov-amqp, prov-mqtt, prov-https): Add support for timespan timeouts to provisioning device client (#2041)

As discussed in #2036, AMQP provisioning device clients have no way to configure the timeout given to the AMQP library for operations like opening links. This adds overloads to the existing provisioning device client's registerAsync methods that allow for users to configure this timespan timeout.

* fix(iot-device): Update doc comments for MqttTransportSettings

* Add documentation to repo about which platforms this SDK supports (#2046)

fixes #2032

* readme: Update readme to have LTS patch reference

* doc(readme): Updated readme with details on support for device streaming feature. (#2054)

* fix(iot-device): Update the amqp layer to inspect the inner exception for an amqp error code (#2053)

* refactor(samples): Move preview samples to samples repository (#2057)

* fix(e2e-tests): Fix file renaming Configutaion to TestConfiguration

* fix(iot-device): Update connection string validation param

* Update team codeowners

* fix(e2e-tests): Update resource generation script to generate resources
for samples

* initial changeS (#2060)

Co-authored-by: Sindhu Nagesh <vinagesh@microsoft.com>

* Adding readme to explain authentication, retry and timeouts in DeviceClient (#2096)

* refactor(iot-device): Rename property add method

* Last of error codes documentation (#2110) (#2074)

refactor(iot-service): Add API example for DeviceAlreadyExists error (#2101)

fix(iot-service): Hide unreferenced error code (#2094)

a few misc errors (#2092)

* a few misc errors

* fixup

Document ArgumentNull (#2091)

Document InvalidOperation and ArgumentInvalid (#2089)

Updating exception details for IotHubThrottledException (#2088)

refactor(iot-service): Add comments for service returned error codes (#2083)

Fix build error (#2087)

Updating exception details for IotHubSuspendedException (#2086)

Updating exception details for IotHubCommunicationException (#2085)

Fix InvalidProtocolVersion documentation. (#2084)

More codes doc'd and code cleanup (#2082)

* More codes doc'd, code clean up

* Update license

* Random is static

* License header

* More codes doc'd, code clean up

More updates to exceptions thrown (#2081)

More error code doc comments

Remove ref to ExponentialBackoff

[Error Codes] Update ThrottleBacklogLimitExceeded error code (#2078)

[Error Codes] Update PreconditionFailed error description (#2077)

* [Error Codes] Update PreconditionFailed error description

* Remove space

[Error Codes] Updated MessageTooLarge (#2073)

* [Error Codes] Updated MessageTooLarge

* Removed en-us where seen

* Remove en-us

* REmove en-us

[Error Codes] Updating UnauthorizedException (#2072)

* [Error Codes] Updating UnauthorizedException

* Remove en-us

* Remove en-us

* Update UnauthorizedException.cs

* Update UnauthorizedException.cs

[Error Codes] Updated ServerErrorException (#2071)

* [Error Codes] Updated ServerErrorException

* Update ServerErrorException.cs

* Update ServerErrorException.cs

* Update ServerErrorException.cs

* Update ServerErrorException.cs

* Update ServerErrorException.cs

[Error Codes] Updated QuotaExceededException (#2070)

* Updated QuotaExceededException classes

* Update QuotaExceededException.cs

* Update QuotaExceededException.cs

Updated ServerBusyException (#2069)

[Error Codes] Update PartitionNotFound error code (#2075)

* [Error Codes] Update PartitionNotFound error code

* remove double lines

* remove double lines

Obsolete error codes that are not throw by the service (#2079)

Fix deprecation messages.

Notes for and deprecation of BlobContainerValidationError, BulkRegistryOperationFailure, and JobQuotaExceeded

Document errors and remove unreferenced file

* Rename readme per style and update reference (#2113)

* * fix(iot-device): TryGetValue methods should check for the component identifier when applicable and should not throw any exceptions (#2111)

* refactor(iot-device): Separate client reported properties into a separate accessor

* fix(iot-device): Add checks to verify that client property update response has version and requestId (#2115)

* refactor(iot-service) Make all clients mockable (#2117)

* refactor(job-client): Make job client easy to mock (#1875)

* refactor(service-client): Make service client easy to mock (#1878)

* refactor(registry-manager): Make registry manager mockable (#1881)

* refactor(digital-twin-client): Make the DigitalTwin client mockable. (#1892)

* refactor(iot-service): Move direct method request file out of JobClient folder into root

* refactor(prov-amqp, prov-mqtt, prov-http): Fix timeout tests failing occasionally (#2121)

* fix(tools): Fix readme for ConsoleEventListener usage (#2123)

* fix(tools): Fix readme for ConsoleEventListener usage

* doc(all): Add documentation for the upcoming service TLS changes for user awareness (#2112)

* fix(doc): Fix typo is readme

* Remove barustum from CODEOWNERS (#2126)

* refactor(ado): Update area paths for analyzers pipeline (#2136)

* refactor(tools-logs): Update log capture script to be configurable and use powershell (#2141)

* refactor(iot-device, prov-mqtt): Target new DotNetty version when not net451 (#2129)

* refactor(doc): Add additional comments to log capture readme

* Bump versions for 2021-08-12 release (#2143)

* fix(readme): Update the LTS support dates readme (#2147)

* fix(e2e): get values from create call to avoid timing issues

* refactor(iot-device): Update the internal method names as per functionality

* refactor(e2e): Remove unnecessary value in keyvault from setup file (#2153)

* fix(e2e): Fix e2e setup script to choose the correct dps endpoint based on region of the resource (#2134)

* fix(readme): Update readme to specify preview strategy

* Removing vinagesh from codebase owners list

* feat(iot-device): Add convenience method for acknowledging writable property update requests

* fix(iot-device): Fix params passed into ObjectDisposedException

* fix(githib): Update github issues template

* Add support for .NET 5 (#2169)

* feat(all): Add support for .NET 5.0

* refactor(all): Update APIs to follow .NET 5.0 patterns

* refactor(e2e-tests): Update E2E tests to follow .NET 5.0 patterns

* refactor(all): Simplify disposal of IDisposable X509Certificate

* fix(vsts): Update the location for log analytics workspace resource

* refactor(e2e-tests): Dispose IDisposable types

* fix(e2e-tests): Dispose IDisposable types created - hmac

* fix(iot-device): Update PayloadCollection to accept dictionary of values (#2171)

* Wrong reference to Device Streaming Sample (#2173)

* Fix link to Device Streaming Sample

* Update iothub/service/samples/readme.md

Co-authored-by: David R. Williamson <drwill@microsoft.com>

* fix(dps-service): Remove client side validation of x509 CA references format (#2172)

* Add Andy to the repos (#2179)

* Default Branch update: Update master branch to main. (#2183)

* Update master branch to main.

* Update horton-e2e.yaml

* fix(iot-serv): Fix bug where device scope and parent scopes set to device weren't used in bulk add operations (#2189)

#2184

* fix(iot-svc): Cleanup and deprecation warning of code in CryptoKeyGenerator (#2187)

* fix(iot-svc): Add support for GeneratePassword to other targets

* Don't add net472 support, but keep code cleanup

* Add deprecated attribute

* Fix test references by making a local copy for tests

* Add XML docs for exceptions that can be thrown by ModuleClient.SendEventAsync and DeviceClient.SendEventAsync (#2178)

* Add XML docs for exceptions that can be thrown by ModuleClient.SendEventAsync and DeviceClient.SendEventAsync

* Add more XML docs for exceptions for ModuleClient.SendEventAsync and DeviceClient.SendEventAsync

* Update iothub/device/src/DeviceClient.cs

Co-authored-by: David R. Williamson <drwill@microsoft.com>

* Add using for exceptions

Co-authored-by: David R. Williamson <drwill@microsoft.com>

* refactor(iot-device): Merge flows for twin and client property operations (#2180)

* fix(e2e-tests): Update E2E tests to initialize and dispose resources correctly

* fix

Co-authored-by: James Davis ⛺️🏔 <jamdavi@microsoft.com>
Co-authored-by: jamdavi <73593426+jamdavi@users.noreply.github.com>
Co-authored-by: David R. Williamson <drwill@microsoft.com>
Co-authored-by: timtay-microsoft <timtay@microsoft.com>
Co-authored-by: Sindhu Nagesh <vinagesh@microsoft.com>
Co-authored-by: bikamani <41314966+bikamani@users.noreply.github.com>
Co-authored-by: Bommas <5359504+scb01@users.noreply.github.com>
Co-authored-by: Basel Rustum <barustum@microsoft.com>
Co-authored-by: Roman Marusyk <Marusyk@users.noreply.github.com>
Co-authored-by: Azad Abbasi <azabbasi@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working. fix-checked-in Fix checked into main or preview, but not yet released. IoTSDK Tracks all IoT SDK issues across the board
Projects
None yet
Development

No branches or pull requests

2 participants