-
Notifications
You must be signed in to change notification settings - Fork 2.4k
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[.Net SDK, state service] Doesn't save state #4477
Comments
Please share your code, this will be next to impossible to diagnose without your code. |
@JasonSowers added as collaborator to the copy of repository Also today we have tried to downgrade BF to 3.12.4(stable version we used before 3.13.1) right now 30+ full conversations(25-40 interactions) , no bug |
You're not going mad SenseVision, I just had a panic attack for most of the afternoon today because I thought I had completely messed up my work (and mediocre understanding of how BotFramework worked). Exactly the same experience after moving from 3.14.0 to either 3.14.1 or 3.15 (not enough testing between upgrades to determine which) - random mutations in conversation position, sometimes even going back to points in 'completed' (Context.Done) dialogs. I only noticed properly after upgrading again to 3.15. I'm using Azure Table Storage for state, and an Azure app service for the bot. Also 3.2.5 for Microsoft.Bot.Builder.Azure. Dialogs for conversations. (No LUIS, no FormFlow, or anything fancy. Bog-standard MessagesController.Post -> scope.Resolve, Conversation.SendAsync.... -> RootDialog -> context.Call(other dialogs) ) After downgrading Bot.Builder and Bot.Builder.History from 3.15 to 3.13.1 (and BotConnector to 3.14.07, although I can't say right now which is making the difference) everything is OK again. WindowsAzure.Storage is at 9.1.1. To Jason and the team - apologies, as this kind of random comment that is missing a workable repro is about as welcome as a kick in the teeth :) If I can find a consistent repro, I will certainly post it, but as SenseVision notes, it is frustratingly random. |
I got your invite, I will start looking into this. We recently released a new SDK version 3.15 can you please confirm if you are experiencing the same issue in 3.15. @oflahero I couldn't gather if 3.15 was working or not for you from your post. |
Small update: Bot.Connector is not the issue. It's a problem in Bot.Builder (and/or .History) between 3.13.1 (OK) and 3.15 (not OK). I can't test if it's Bot.Builder.History in isolation as upgrading that nupkg requires Bot.Builder. I will try 3.14.1.1 at some point tomorrow and try to narrow it down. |
I experienced the same issues with 3.14.1.1. So the problem appears to have been introduced somewhere when moving between 3.13.1 to 3.14.1.1 in Microsoft.Bot.Builder and Microsoft.Bot.Builder.History (unless I was doing something stupid in my code that relied on something that was fixed after 3.13.1!) I don't have any further granularity info. |
@oflahero Can you please share your project as well? I'm trying to make a minimum reproducible example and want to see if there are any similarities between your project and SenseVision's So I have started with the most simple implementation of saving state data and I am adding pieces in from SenseVision's project to try to figure out the root cause, but I haven't been able to reproduce it yet. I will be spending the day looking into this |
@oflahero @SenseVision Can either of you confirm this happens in the emulator as well? I have been trying all day to reproduce this with no luck. |
Hey there. I spent a bit of time on this today too - I've settled on 3.14.0 for Bot.Builder, Bot.Builder.History and Bot.Connector as the most stable configuration. Anything higher (>=3.14.1) and the issue is back. I tried to reproduce in the emulator with no luck. It's only happening for me in the channel, which in my case is Facebook Messenger. My project's commercial, and currently pretty large, so no sharing I'm afraid. I understand the frustration of spending time trying to reproduce an issue like this. I will hopefully try to do a pared-down minimal project that exhibits the problem, but I wouldn't be surprised if it doesn't. The problem project has so many interconnected moving parts - targets full 4.7.1, also NetStandard by some NuGet packages. I have enough issues with assembly bindings as it is. I can understand putting this on hold til more people claim to have the issue also. |
Let me try facebook, no luck with repro in the emulator, webchat, (or facebook now) 👎 I would really prefer to get to the bottom of this rather than put it on hold. I would guess it is affecting more customers than just you 2. So any more help you can give would be greatly appreciated |
Crazy busy today but I'll try to do a repro later if I can. |
@JasonSowers sorry , i haven't answered , was super busy . Check 3.15 -> Can either of you confirm this happens in the emulator as well? -> I've rechecked references history , looks like @oflahero is right 3.13.1 is OK |
Hi guys, Our guess was that something in the way the bot state is saved has changed between those versions (we are using TableStorage with a custom key for the bot state), but we couldn't find anything suspicious in the Bot.Builder commits... So we just reverted back to 3.13.1 and everything works again. |
Welcome to the club! I'm about halfway through setting up a minimal project - haven't forgotten about you Jason. No guarantees it'll exhibit the symptoms, mind. |
No worries guys, been busy over here as well. I hope to make some more time to try to get to the bottom of this next week. All of my bots are running 3.15 (because I test the SDK before we release it) and I have not noticed any issues with state like you are seeing, so I'm hoping one of you can provide a clue here especially because I haven't been able to reproduce this. |
I have a repo at https://github.com/oflahero/315Issue where it looks like I can reproduce the issue. It's a simple bot that echoes what you type with the length of the input string. If you enter 'find', it also launches a dialog simulating searching for a phone or email contact by name substring. There are only three contacts hardcoded. Once it either returns a match or tells you there are no matches, you're out of the dialog again. Using branch 315 (Bot Builder & Connector at 3.15.0), and the Facebook channel, you can eventually (after repeatedly hitting 'find' and responding appropriately) observe it jump out of sync. It may take a good ten minutes of this. Note: there's some code in the repo that isn't used - I was just trying to grab stuff from the problem project in order to reproduce the issue. I have a publicly available FB messenger page hosting the 3.15.0 version, if anyone wants to try it without installing themselves. Just send me your FB profile identifier so I can add you as a test user to the app on FB Developer, and I'll let you know where to look. Branch 314 is the stable (3.14.0) version. The 'UnrelatedPostIssue' branch is another problem entirely - it's a one-character diff commit, but it causes a 'BadRequest' exception in the bot on PostAsync. It's completely floored me. |
Hi Guys |
I went through the Bot.Builder commits between 3.13.1 and 3.14.1.1 The only change related to our issue seems to be CachingBotDataStore changes in commit I couldn't spot anything suspicious with the changes, but @oflahero could you register your TableBotDataStore directly as IBotDataStore and remove the CachingBotDataStore registration in your example and see if that helps? |
As I understand it though @thschuetze , you need a CachingBotDataStore sitting on top of your keyed registration for it to work though? At least until we get BotBuilder v4... (see issue #4376) Maybe there's a simpler way. I'll be the first to admit that the tight integration of Autofac into BotBuilder is a pet hate of mine (mainly because it confuses the hell out of me, and I loathe DI on principle :) ) |
Awesome work guys, thanks for helping us work this out. I will be spending time on this today, but mainly tomorrow since its late in the day now trying to address the issue. As a note when I spent nearly a whole day trying to reproduce I tried both with and without @seca23 this issue does not seem related can you please open a new issue with details about what you are experiencing. |
@oflahero you are right, I wasn't aware of that issue. Btw. I got your example code and deployed it, but so far haven't been able to reproduce the issue with Slack (web), Teams (web) or Skype (iOS app). I don't have a facebook bot available that I could connect. |
Good news I was able to reproduce in the emulator, the bad news is I need to catch it on a turn it fails on this is going to be a tough one to find 👎 |
@seca23 I actually think your issue is related after looking into this. Are you using multiple dialogs/prompts are you able to share what your dialogs look like if so |
@JasonSowers from the MessageController I call a LuisDialog. When an intent is triggered it calls a new dialog [that inherits from IDialog]. This dialog contains a few prompts and then starts a form flow. Now, at first it seemed like the problem existed at the completion of the form flow. However, later on, I have noticed that the problem also occurred in the middle of the form flow, and in other prompt dialogs. This happens intermittently. If I run through a form flow quickly, it seems to be fine again. However, especially when there is a delay in user response the problem seems to occur. Our bot endpoint is local and we store the state in local SQL table storage. |
@JasonSowers great stuff! @thschuetze It can be very frustrating trying to reproduce it. I gave up today after five solid minutes of banging the keyboard. I did on two occasions find that the problem appeared when I had left the conversation in the middle of a dialog for a significant time. How long that has to be I can't say, but my Azure website had definitely spun down by then and took an extra ten secs or so to get going again (don't let anyone fool you that Azure's 'always on' setting really does what it says on the tin) |
It seems prompts (which are dialogs themselves) seem to be the culprit here. Is anyone noticing this behavior outside of on a step when a prompt is involved? |
That could fit my situation; I'm using a PromptDIalog.Text. My example's not big enough to rule out non-prompt interactions though, might have to add a few more dummy steps in the process. |
Hi guys, Package Version Info: Issue: |
Wonder how you can recommend that if CachingBotDataStore is the only thing that makes the state being saved (because of its FlushAsync implementation that loads state instances and calls SaveAsync on them). Even InMemoryDataStore doesn't work without it in tests. |
I have the same issue with |
@JasonSowers I'd be glad to provide you an example, but the code I am working on is non-shareable. I'll try to reproduce it in a small sample project as soon as I'll have some time. |
@peterbozso Thank you very much. I am working with Jason on this and a reproduction would go a long way to being able to get closer to the heart of this issue. |
If it could help to someone. Yesterday I changed the scope of |
@mitkodi Version: v3.15.2.2 Here is how my registration looks:
|
@seca23 I don't wrap the
|
@mitkodi, I see what you mean. Just to confirm, I am using an on-premise SqlBotDataStore. Thanks, I will surely try it out and let you know if it resolves my problem. |
Not sure if this will help, but we are seeing the same issue while running automated tests. The same conversation will persist the I am running v3.15.2 for Builder & Builder.History. We are using the
Sample, incorrect private conversation data saved after processing a message during the test:
Sample, correct private conversation data saved after processing the same message during a different execution of the same test:
|
Looking a bit further, it also seems that the |
I've been playing with a dev deployment for a couple of days using 3.15.2.2, using SqlBotDataStore (Instance PerLifetimeScope()) with no surrounding CachingBotDataStore. It appears to be stable so far, no occurrences of the issue. |
Update: yesterday I observed behaviour matching this issue again, still using SqlBotDataStore, InstancePerLifetimeScope() on its own (no CachingBotDataStore.) Obviously I can't be 100% sure that the cause is the same, but the symptom is identical. My RootDialog put a new child dialog on the stack, but then on replying to the first message from the child dialog, the bot acted as if the user was still in the MessageReceivedAsync loop of the RootDialog. That is to say, the new bot state ('now in child dialog') was not saved. I didn't see any SQL Server dependency failures for this operation - had the internal 'bot state save' processing failed, I assume some kind of exception or trace would have been emitted? If it helps, I've attached the ApplicationInsights timeline from the bot request. Times are GMT (UTC+1). There appear to be more calls to login.botframework.com (and one to login.microsoftonline.com) than I'd expected. And it was after a long period of inactivity for the bot up to that point (it had last received an /api/messages request the day before.) In the meantime, back I go to 3.14.0 just in case. |
Thank you, everyone, for your continued patience with this issue. I finally have some good news. We think we have identified the issue and will be testing a private release internally very soon. We hope to have this out in the next release. I will continue to keep you updated as I have information. |
Nice one! Any hints while we're waiting Jason? Something to do with authentication timeout affecting the state save? |
@oflahero @SenseVision @thschuetze @seca23 @VanjuliK @glasody @jamesmundy @tempel3 @peterbozso @delight-by @mitkodi @egracer We believe this fix works, but there have already been instances of my testing being successful and others still reporting failures. If anyone would like to test this before we release it to nuget you can find the packages here for 3.15.3 here. Please let us know how this works out for you. I will be actively monitoring this thread today. |
Thanks guys. If I'm reading it right, this fix will have no effect on a setup using SqlBotDataStore with no CachingBotDataStore? |
@oflahero It shouldn't |
@JasonSowers for me it works till now. I wrapped again |
@JasonSowers, I have also upgraded to 3.15.3 and everything seems to work fine. |
@oflahero you should no longer have the CachingBotDataStore disabled. In fact, we advise against it at this point. |
Thanks Jason, I saw the wind was blowing that way alright and added it back in only a few hours ago. Looking good so far. |
@JasonSowers I just checked with the project I mentioned above and 3.15.3 seems to fix the issue for me completely. |
If anyone is still experiencing issues related to this, please open a new issue. Thanks again to everyone for your help and patience. |
Bot Info
*Microsoft.Bot.Builder.Azure : 3.2.5
*WindowsAzure.Storage : 8.7 didn't upgrade to 9+ , because of Cannot upgrade WindowsAzure.Storage passed 8.7 BotBuilder-Azure#58
Issue Description
It started about 2 weeks ago .
Bot doesn't save it's current state \ deserialize the previous state instead of current (the last one where he stopped ).
Occurs rarely.(It could be user get this bug , it could be not , user could get this bug several times per conversation)
Occurs in random moment of the workflow (doesn't depend on which dialog is \ context.Wait(Smth) ).
No Exceptions at all ! Neither in workflow nor http requests to Azure Tables.
Bot process looks like :
Bot state is 0
Bot moves to the state 1( send message 1)
Bot waiting for an answer( user message 1)
//
User gets bot message 1
User respond (user message 1)
//
Bot accept answer (user message 1) , Bot state = 1( expected = 1)
Bot moves to the state 2( send message 2)
Bot waiting for an answer( user message 2)
//
User gets bot message 2
User respond (user message 2)
//
Bot accept answer (user message 2) , Bot state = 1 ( expected = 2)
Actual Results
Not really good example (because use nested dialog), but same was with dialogues , wich doesn't know anything about each other.
Later , didn't changed anything in code \ server config any elsewhere
The text was updated successfully, but these errors were encountered: