Failed to continue dialog. A dialog with id xxxx could not be found. #6716

Open
RuanCSoftsure opened this issue Dec 14, 2023 · 4 comments
Open
Assignees
Labels
Bot Services Required for internal Azure reporting. Do not delete. Do not change color. bug Indicates an unexpected problem or an unintended behavior. customer-replied-to Indicates that the team has replied to the issue reported by the customer. Do not delete. customer-reported Issue is created by anyone that is not a collaborator in the repository.

Comments

@RuanCSoftsure

RuanCSoftsure commented Dec 14, 2023

Github issues should be used for bugs and feature requests. Use Stack Overflow for general "how-to" questions.

Version

4.21.1

Describe the bug

Whenever a client is on a dialog and they respond to a message on the bot, it will intermittently fail with the "Failed to continue dialog....A Dialog with id xxxx could not be found" exception.

EXCEPTION MESSAGE:
Failed to continue dialog. A dialog with id xxxx could not be found.

INNER EXCEPTION MESSAGE:
no inner exception message provided

SOURCE:
Microsoft.Bot.Builder.Dialogs

To Reproduce

I have really struggled to reproduce this. And when I do manage to reproduce it, I still don't really know why it is failing. This only started happening when we upgraded from 4.12 to 4.21.

Expected behavior

The client should just be able to continue with the dialog they are on.

Screenshots

Additional context

Happens specifically on this line of code.

// Run the Dialog with the new message Activity.
await _dialog.RunAsync(turnContext, _conversationState.CreateProperty<DialogState>("DialogState"), cancellationToken);

I don't know whether I should be changing our existing logic, or the way child dialogs are added to the parent dialog. Out of more than 100 conversations a day, this happens to roughly 3 to 5 of them.
We store the conversation state and user state in Redis. I looked at the conversations that failed with this error, and the dialog that is supposedly missing is in the list. So I am completely stumped as to where else to look and why this happens intermittently.

I want to mention that I use dependency injection and register each dialog as a transient service.
I then only add the dialog to the stack once I get to the waterfall step where I need it. I don't know whether this could be why I am getting this failure now. Before upgrading from 4.12 to 4.21 this was not a problem.

@RuanCSoftsure RuanCSoftsure added bug Indicates an unexpected problem or an unintended behavior. needs-triage The issue has just been created and it has not been reviewed by the team. labels Dec 14, 2023
@RuanCSoftsure
Author

Anyone that could help with this one? Seems like it is getting worse. :(

@dmvtech
Collaborator

dmvtech commented Jan 23, 2024

  • Does this happen with a specific bot Channel?
  • What state storage are you using?

I want to mention that I use dependency injection and register each dialog as a transient service.
I then only add the dialog to the stack once I get to the waterfall step where I need it. I don't know whether this could be why I am getting this failure now.

Can you share how you do this? That's not a standard approach. I'm not sure what side effects it may have.

@dmvtech dmvtech self-assigned this Jan 23, 2024
@dmvtech dmvtech added customer-reported Issue is created by anyone that is not a collaborator in the repository. Bot Services Required for internal Azure reporting. Do not delete. Do not change color. customer-replied-to Indicates that the team has replied to the issue reported by the customer. Do not delete. and removed needs-triage The issue has just been created and it has not been reviewed by the team. labels Jan 23, 2024
@RuanCSoftsure
Author

Thanks for responding. To answer your questions...

- Does this happen with a specific bot Channel?
Direct Line channel
- What state storage are you using?
Storing it in the database, using Redis specifically with a time to live of 24 hours

Can you share how you do this? That's not a standard approach. I'm not sure what side effects it may have.

Yes, here are some code snippets. Please see below.

In this example I have ManagePolicyDialogBase as a dialog where the client can manage certain policy information.

Startup.cs in the bot has the following:

services.AddTransient<ManagePolicyDialogBase>();

This dialog is a child dialog of my WelcomeDialogBase.

WelcomeDialog is started in the StartWelcomeDialogAsync step of my GreetingsDialog:

private async Task<DialogTurnResult> StartWelcomeDialogAsync(WaterfallStepContext stepContext, CancellationToken cancellationToken)
{
    // Look the dialog up by its type name, which is assumed to match its dialog id.
    var dialogName = _welcomeDialog.GetType().Name;

    // Register the child dialog lazily, only if it is not already in the set.
    if (FindDialog(dialogName) == null) AddDialog(_welcomeDialog);

    return await stepContext.BeginDialogAsync(dialogName, null, cancellationToken);
}

My WelcomeDialogBase constructor looks like this.
Please note that I am only showing snippets of it. Our bot is quite complex, so I don't want to overwhelm you with unnecessary code.
You will see managePolicyDialog being injected in the constructor below.

        public WelcomeDialogBase(string dialogId, IBotImplementations botImplementations, IDialogImplementations dialogImplementations,
            ICheckPublicHoliday checkPublicHoliday, ICheckOfficeHours checkOfficeHours,
            Func<BotClient, ClaimsIntentDialogBase> claimIntentDialogResolver,
            Func<BotClient, ProductInfoIntentDialogBase> productInfoIntentDialogResolver,
            PetQuoteDialogBase petQuoteDialog,
            ManagePolicyDialogBase managePolicyDialog) : base(dialogId, botImplementations)
        {

            // .....

            _managePolicyDialog = managePolicyDialog;

            // .....

            AddDialog(new WaterfallDialog(nameof(WaterfallDialog), new WaterfallStep[] {
                StartDialogAsync,
                PromptOptionsAsync,
                OnOptionSelectedAsync,
                StartClaimsDialogAsync,
                ReturnFromClaimsDialogAsync,
                StartManagePolicyDialogAsync,
                ReturnFromManagePolicyAsync,
                StartQuoteDialogAsync,
                ReturnFromQuoteDialogAsync,
                StartProductInfoDialogAsync,
                ReturnFromProductInfoDialogAsync,
                PromptSpeakToHumanOptionsAsync,
                OnSpeakToHumanOptionSelectedAsync,
                StartCallBackDialogAsync,
                ReturnFromCallBackDialogAsync,
                StartAgentTransferDialogAsync,
                ReturnFromAgentTransferDialogAsync,
                EndWelcomeDialogAsync
            }));
            InitialDialogId = nameof(WaterfallDialog);
        }

The dialog is started in the StartManagePolicyDialogAsync step.

        protected async Task<DialogTurnResult> StartManagePolicyDialogAsync(WaterfallStepContext stepContext, CancellationToken cancellationToken)
        {
            var intent = MenuIntents.ManagePolicy.ToString();

            await _topIntent.LogAsync(stepContext.Context.Activity, intent, intent, _botClientId);

            var dialogName = _managePolicyDialog.GetType().Name;

            if (FindDialog(dialogName) == null) AddDialog(_managePolicyDialog);

            // Navigate to the Manage Policy dialog.
            return await stepContext.BeginDialogAsync(dialogName, null, cancellationToken);
        }

As I mentioned, this does not happen all the time; it is intermittent. Not a single one of these exceptions came through yesterday. It is a very strange problem. Maybe I just need to adjust some logic after upgrading to the latest framework, but I don't know which logic. Before version 4.21.1 I never saw this exception.

Thanks for your willingness to try and help. I will share as much code as I can to get to the bottom of this.

@RuanCSoftsure
Author

RuanCSoftsure commented Feb 6, 2024

I don't know why I didn't check the pod in AKS earlier; that is where the bot is hosted. This error started happening when I upgraded the Bot Framework from 4.12 to 4.21.1, but at first only intermittently. Now that our bot is getting far more traffic via our own custom WhatsApp channel, the error is happening more often, and I have noticed that the pod restarts a few times a day. Each restart restarts the whole bot, which is why the error is occurring more regularly.

The pod is exiting with code 139 on Kubernetes (128 + 11, meaning the container was terminated by SIGSEGV).

And I believe this is the reason....

Incompatibilities
This is by far the most common reason for SIGSEGV errors, and luckily one that is very easy to fix. After updating a library, if you forget to change the version number of that library, then your system may attempt to load the older binary library. If this older binary then tries to access memory addresses assigned to the newer library, then an incompatibility error exists across your binaries and libraries. This is a very common mistake. It can be fixed by simply updating the version number whenever you update a library and its binaries.

Source: https://techreport.com/blog/exit-code-139-kubernetes/
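
One plausible way the pod restarts described above could surface as this exception, sketched as a toy model. This is not confirmed anywhere in this thread, and the code does not use the Bot Framework: `ToyDialogSet` and `RestartDemo` are hypothetical stand-ins that only model "a dialog id persisted in external storage" versus "dialog instances registered in process memory". The assumption is that a child dialog added lazily inside a waterfall step is missing from the in-memory set after a restart, while the persisted stack still points at its id.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in, NOT a Bot Framework type: an in-memory registry of dialogs by id.
public sealed class ToyDialogSet
{
    private readonly Dictionary<string, Func<string>> _dialogs =
        new Dictionary<string, Func<string>>();

    public void Add(string id, Func<string> onContinue) => _dialogs[id] = onContinue;

    public bool Contains(string id) => _dialogs.ContainsKey(id);

    // The id comes from persisted state; the instance must exist in memory.
    public string Continue(string activeDialogId)
    {
        if (!_dialogs.TryGetValue(activeDialogId, out var onContinue))
        {
            throw new InvalidOperationException(
                $"Failed to continue dialog. A dialog with id {activeDialogId} could not be found.");
        }
        return onContinue();
    }
}

// Hypothetical helper that replays "turn 1, restart, turn 2".
public static class RestartDemo
{
    private const string ChildId = "ManagePolicyDialogBase";

    // Lazy registration: the set starts empty; the child is added in a step.
    public static ToyDialogSet BuildLazy() => new ToyDialogSet();

    // Eager registration: the child is added every time the set is built.
    public static ToyDialogSet BuildEager()
    {
        var set = new ToyDialogSet();
        set.Add(ChildId, () => "continued");
        return set;
    }

    public static string ContinueAfterRestart(Func<ToyDialogSet> build)
    {
        // Stands in for the dialog stack kept in external storage (e.g. Redis).
        var persistedStack = new Stack<string>();

        // Turn 1: the step adds the child if it is missing, then begins it.
        var beforeRestart = build();
        if (!beforeRestart.Contains(ChildId))
        {
            beforeRestart.Add(ChildId, () => "continued");
        }
        persistedStack.Push(ChildId);

        // Restart: a fresh set is built, so in-memory additions are gone.
        var afterRestart = build();
        try
        {
            // Turn 2: continue from the persisted id.
            return afterRestart.Continue(persistedStack.Peek());
        }
        catch (InvalidOperationException ex)
        {
            return ex.Message;
        }
    }
}
```

In this model, `RestartDemo.ContinueAfterRestart(RestartDemo.BuildEager)` continues normally, while `RestartDemo.ContinueAfterRestart(RestartDemo.BuildLazy)` returns the "could not be found" message. If the model applies to the real bot, the equivalent change would be to call `AddDialog` for each child dialog in the parent dialog's constructor rather than inside a waterfall step.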
