Messages are not received after edgeHub restart - AMQP protocol #571

Closed
Hammatt opened this issue Jul 30, 2018 · 14 comments
Labels
area-edge Issues related to IoT Edge. bug Something isn't working.

Comments

@Hammatt

Hammatt commented Jul 30, 2018

  • OS, version, SKU and CPU architecture used: Ubuntu 18.04 x64
  • Application's .NET Target Framework : netcoreapp2.1
  • Device: Embedded PC
  • SDK version used: 1.17.1

Description of the issue:

Edge modules do not reconnect to the Hub properly after the hub restarts some time during operation. They seem to be able to

Code sample exhibiting the issue:

Any set of two modules that communicate with each other via AMQP (I have not tested other protocols). Here's a sample one I often use:

namespace MessageSender
{
    using System;
    using System.IO;
    using System.Runtime.InteropServices;
    using System.Runtime.Loader;
    using System.Security.Cryptography.X509Certificates;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Azure.Devices.Client;
    using Microsoft.Azure.Devices.Client.Transport.Mqtt;

    class Program
    {
        static int counter;

        static ModuleClient ioTHubModuleClient;

        static void Main(string[] args)
        {
            Init().Wait();

            Task.Run(async () =>
                    {
                        var rnd = new Random();
                        var myMessageSize = 262143-160;
                        while (true)
                        {
                            Console.WriteLine("Generating message...");
                            await Task.Delay(2000);
                            // Generate a new large message
                            var myMessageBytes = new byte[myMessageSize];
                            // myMessageSize -= 100;
                            rnd.NextBytes(myMessageBytes);
                            var myMessage = new Message(myMessageBytes);

                            await ioTHubModuleClient.SendEventAsync("input1", myMessage);
                            Console.WriteLine("Message Sent.");
                        }
                    });

            // Wait until the app unloads or is cancelled
            var cts = new CancellationTokenSource();
            AssemblyLoadContext.Default.Unloading += (ctx) => cts.Cancel();
            Console.CancelKeyPress += (sender, cpe) => cts.Cancel();
            WhenCancelled(cts.Token).Wait();
        }

        /// <summary>
        /// Handles cleanup operations when app is cancelled or unloads
        /// </summary>
        public static Task WhenCancelled(CancellationToken cancellationToken)
        {
            var tcs = new TaskCompletionSource<bool>();
            cancellationToken.Register(s => ((TaskCompletionSource<bool>)s).SetResult(true), tcs);
            return tcs.Task;
        }

        /// <summary>
        /// Initializes the ModuleClient over AMQP and registers the callbacks
        /// for incoming messages and desired property updates
        /// </summary>
        static async Task Init()
        {
            var amqpSetting = new AmqpTransportSettings(TransportType.Amqp_Tcp_Only);
            ITransportSettings[] settings = { amqpSetting };

            // Open a connection to the Edge runtime
            ioTHubModuleClient = await ModuleClient.CreateFromEnvironmentAsync(settings);
            await ioTHubModuleClient.OpenAsync();
            Console.WriteLine("IoT Hub module client initialized.");

            // Register callback to be called when a message is received by the module
            await ioTHubModuleClient.SetInputMessageHandlerAsync("input1", PipeMessage, ioTHubModuleClient);

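            // Log desired property updates; the callback has nothing to await,
            // so it returns a completed task instead of being marked async
            await ioTHubModuleClient.SetDesiredPropertyUpdateCallbackAsync(
                (inProperties, inContext) =>
                {
                    Console.WriteLine("Desired Property Update");
                    return Task.CompletedTask;
                },
                ioTHubModuleClient).ConfigureAwait(false);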
        }

        /// <summary>
        /// This method is called whenever the module is sent a message from the EdgeHub.
        /// It logs the running message count and the message size.
        /// It does not forward the message.
        /// </summary>
        static async Task<MessageResponse> PipeMessage(Message message, object userContext)
        {
            Console.WriteLine("Message received.");
            int counterValue = Interlocked.Increment(ref counter);

            var moduleClient = userContext as ModuleClient;
            if (moduleClient == null)
            {
                throw new InvalidOperationException("UserContext doesn't contain " + "expected values");
            }

            Console.WriteLine($"Received message: {counterValue}, size: {message.BodyStream.Length}");

            byte[] messageBytes = message.GetBytes();

            Console.WriteLine($"Read in {messageBytes.Length} bytes");

            return MessageResponse.Completed;
        }
    }
}

Deploy two instances of this module and route them to each other.

Console log of the issue:

Run docker restart edgeHub to force the edgeHub to restart.

See Azure/iotedge#65 for more info.
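
Since the symptom is that messages silently stop arriving after the restart, a small watchdog can make the stall visible in the module logs. The following is only a minimal sketch: ReceiveWatchdog is a hypothetical helper, not part of the Azure IoT SDK, and the idea of calling MarkReceived from PipeMessage is an assumption about how it would be wired into the sample above.

```csharp
using System;

// Hypothetical helper (not part of the Azure IoT SDK): tracks when the
// last message arrived so a stalled receive path can be detected.
public sealed class ReceiveWatchdog
{
    private readonly TimeSpan _maxSilence;
    private readonly object _gate = new object();
    private DateTime _lastReceivedUtc;

    public ReceiveWatchdog(TimeSpan maxSilence, DateTime startUtc)
    {
        _maxSilence = maxSilence;
        _lastReceivedUtc = startUtc;
    }

    // Call from the input message handler each time a message arrives.
    public void MarkReceived(DateTime nowUtc)
    {
        lock (_gate)
        {
            _lastReceivedUtc = nowUtc;
        }
    }

    // True when no message has arrived for longer than the allowed silence.
    public bool IsStalled(DateTime nowUtc)
    {
        lock (_gate)
        {
            return nowUtc - _lastReceivedUtc > _maxSilence;
        }
    }
}
```

In the sample, PipeMessage could call MarkReceived(DateTime.UtcNow), and a periodic check of IsStalled(DateTime.UtcNow) could log a warning once the sender's 2-second cadence has clearly been missed after the edgeHub restart.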

@CIPop changed the title from "Messages are not received after edgeHub restart." to "Messages are not received after edgeHub restart - AMQP protocol" on Aug 1, 2018
@CIPop
Member

CIPop commented Aug 1, 2018

Thanks for reporting, @Hammatt. This seems related to #558, but for AMQP.

@CIPop added the bug and area-edge labels on Aug 1, 2018
@CIPop
Member

CIPop commented Aug 3, 2018

Also related to (or a duplicate of) #239.

@CIPop
Member

CIPop commented Aug 28, 2018

@Hammatt, @abhipsaMisra and I are still investigating our connection recovery tests and adding more logs.
@varunpuranik has potentially identified the issue manifesting in the Edge scenario and he is working on a fix.

@Hammatt
Author

Hammatt commented Aug 28, 2018

Thanks for the update! I can confirm that we have still been seeing this issue occasionally.

@varunpuranik mentioned this issue on Sep 5, 2018
@CIPop
Member

CIPop commented Sep 6, 2018

@Hammatt We believe we've identified the cause of this issue: @varunpuranik's PR #611 was tested against both MQTT and AMQP. Once the changes are in, we'll prepare a new release.

@WilliamBerryiii
Member

@CIPop - any updates on this one?

@CIPop
Member

CIPop commented Oct 15, 2018

@WilliamBerryiii This is a lot more complicated than first expected.

  1. @varunpuranik's PR is breaking the method E2E fault injection tests (i.e. disconnect recovery) and possibly the twin tests as well. Since we've found this fundamental issue with our SDK, we're blocking all PRs until we get all our E2E recovery tests enabled (I inherited them disabled...)
  2. I've added some more logging and was able to at least get our E2E message reconnect tests working, and they are quite stable. I wasn't able to enable the command or twin update tests: individual tests seem to pass, but they fail when all tests run at the same time. That is indicative of another issue, potentially with objects kept around after a disconnect/timeout, which could lead to two or more clients trying to connect with the same identity.
  3. I've also found that our E2E fault injection tests for AMQP are not actually dropping the TCP connection (wrong implementation on the service side), so even with these tests enabled, we're not testing your scenario. I'm looking at other ways to simulate a real TCP connection drop, and we'll use Edge as a repro (the problem being that this scenario doesn't reproduce consistently, according to @varunpuranik).
  4. The AMQP library fixed the following issue, which could be related: "Bugfix: Message disposition hangs infinitely when the operation times out" (azure-amqp#123)
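
For item 3, one possible way to simulate an abrupt TCP drop (as opposed to a graceful close) is to close a socket with linger enabled and a zero timeout, which resets the connection. This is only a sketch of that general technique on a loopback connection, not the approach used in the SDK's E2E tests; the TcpDropSimulator class and its method names are made up for illustration.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical illustration: forces an abortive close on a loopback connection.
public static class TcpDropSimulator
{
    // Closes the socket with linger enabled and a zero timeout, so the OS
    // discards unsent data and resets the connection instead of performing
    // the normal graceful shutdown.
    public static void AbortConnection(Socket socket)
    {
        socket.LingerState = new LingerOption(true, 0);
        socket.Close();
    }

    // Loopback demonstration: returns true when the client observes the
    // drop as a connection reset rather than a clean end-of-stream.
    public static bool ClientSeesAbruptDrop()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (var client = new TcpClient())
            {
                client.Connect(IPAddress.Loopback, port);
                client.ReceiveTimeout = 5000;

                // Accept the server side of the connection, then abort it.
                Socket serverSide = listener.AcceptSocket();
                AbortConnection(serverSide);

                try
                {
                    // A graceful close would make Receive return 0 here.
                    client.Client.Receive(new byte[1]);
                    return false;
                }
                catch (SocketException ex)
                {
                    return ex.SocketErrorCode == SocketError.ConnectionReset;
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

The same AbortConnection call could sit inside a small forwarding proxy placed between a module and edgeHub, so the drop can be triggered on demand instead of waiting for an inconsistent repro.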

@WilliamBerryiii
Member

I'd also add that we should probably find a way to allow community members to run the E2E tests before PRs ... even via a lightweight stubbing mechanism, assuming that the CI/CD on a PR triggers a full E2E run on an MSFT-owned IoT Hub. (Just thinking about the ease of accepting non-MSFT contributions.)

@CIPop
Member

CIPop commented Oct 17, 2018

@WilliamBerryiii Good point! I've been meaning to document this for a while: please open a Support Request that indicates your test subscription and asks to enable Fault Injection.

@sebader
Member

sebader commented Dec 10, 2018

Would switching the modules to MQTT be a workaround, or would that not solve the problem, or even introduce other issues?

@varunpuranik
Contributor

I validated that this issue is fixed in the latest release of the SDK, v1.19.0: https://www.nuget.org/packages/Microsoft.Azure.Devices.Client/1.19.0

@Hammatt - Can you please validate and, if it works, close the issue?
Thanks.

@Hammatt
Author

Hammatt commented Jan 8, 2019

Thanks, I will validate this over the next few days and get back to you as soon as I can confirm.

@Hammatt
Author

Hammatt commented Jan 13, 2019

Sorry for the delay. I've finally had time to go through a bunch of testing and I haven't been able to reproduce the problem. Thanks!

@Hammatt closed this as completed on Jan 13, 2019

8 participants