[Direct Line] Bot state calls fail under load (5 or more concurrent users) #3922

Closed
tristanvandam opened this Issue Dec 14, 2017 · 4 comments

tristanvandam commented Dec 14, 2017

[Direct Line] Bot state calls fail under load (5 or more concurrent users)

Bot state calls fail when our bot is under a load of 5 or more concurrent users.

Bot Info

Dev environment

  • App ID: b41c930f-bb27-44cc-aa48-4938d024b293
  • SDK Platform: .NET
  • SDK Version: 3.9.0
  • Active Channels: Direct Line
  • Deployment Environment: Azure App Service

QA Environment

  • App ID: f20b587f-3aeb-4b55-bf81-657057cb7aa9
  • SDK Platform: .NET
  • SDK Version: 3.9.0
  • Active Channels: Direct Line
  • Deployment Environment: Azure App Service

Prod Environment

  • App ID: 5af95ce2-5086-43c4-b9db-869161ba51b4
  • SDK Platform: .NET
  • SDK Version: 3.9.0
  • Active Channels: Direct Line
  • Deployment Environment: Azure App Service

Issue Description

We are now deploying our bot to production. When the bot is under load (multiple simultaneous interactions or a load test) in any of our environments, we start encountering Direct Line errors and the bot falls over.

This happens as soon as we have 5 or more concurrent users, which is well below what is expected and required. The failure at 5 concurrent users occurs regardless of the size of the App Service plan: we have tried scaling up and the result is the same.

The exceptions thrown are HTTP 502 (most common), 503, and 500 errors.
errorexample

Failure Examples

The calls that seem to be failing are the Direct Line bot state calls. Here are some example calls that failed:
https://state.botframework.com/v3/botstate/directline/conversations/CUgSMYbMPqJFakLhjfFFVe
https://state.botframework.com/v3/botstate/directline/conversations/HGUEVFqYOM0JkVuxd4cPyM
https://state.botframework.com/v3/botstate/directline/conversations/4ifUN0DCpUc8u71Ml826Bz

A screenshot of the bot's live metrics can be seen below:
livemetrics

Reproduction Steps

  1. Interact with the bot with 5 or more concurrent users

Expected Behavior

We have run the tests on S1, S2, and S3 App Service plans, all with the same result: with 5 or more users, the bot falls over.
We expect the bot to handle more than 5 concurrent users on an S1 App Service plan, and we expect the number of concurrent users at which it fails to increase when we scale up the plan.

Actual Results

The Test

A load test was run on our bot.
The load policy was to ramp up from 1 user interacting with the bot, adding 2 users every 3.0 minutes, to a maximum of 50 users.
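
To make the load policy concrete, here is a minimal Python sketch of the user ramp. It assumes the load tool adds users in discrete steps, with the first batch of 2 added at the 3.0-minute mark; the actual tool may stagger users differently. The function and parameter names are hypothetical and chosen only for illustration.

```python
import math


def concurrent_users(elapsed_minutes, start=1, step=2, interval=3.0, cap=50):
    """Users active after `elapsed_minutes`, assuming a step-wise ramp.

    Assumption: `step` users are added at the end of every `interval`
    minutes, starting from `start` users, never exceeding `cap`.
    """
    completed_intervals = math.floor(elapsed_minutes / interval)
    return min(cap, start + step * completed_intervals)


def minutes_until(target_users, start=1, step=2, interval=3.0, cap=50):
    """Minutes until at least `target_users` are active, or None if the
    target is above the cap and is therefore never reached."""
    if target_users > cap:
        return None
    if target_users <= start:
        return 0.0
    steps_needed = math.ceil((target_users - start) / step)
    return steps_needed * interval


if __name__ == "__main__":
    for target in (5, 15, 50):
        print(target, "users reached after", minutes_until(target), "minutes")
```

Under these assumptions, the ramp reaches 5 concurrent users 6 minutes into the test and 15 concurrent users at 21 minutes; the full 50 users would only be reached after 75 minutes.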

Results

The bot starts falling over at 5 concurrent users and is completely unusable at 15 users.

Statistic Summary

| Statistic | S1 App Service Plan | S2 App Service Plan |
| --- | --- | --- |
| Test Duration (HH:mm:ss) | 00:25:40 | 00:19:23 |
| Termination Reason | Manual action | Manual action |
| Average requests/s | 3.8 | 3.4 |
| Average Response Time | 0.873s | 0.816s |
| Error rate | 2.8 | 2.8 |
| Total users launched | 175 | 124 |
| Total iterations completed | 166 | 118 |

Load Graphs

loadgraph

Example error:

exampleerror

A similar issue was raised previously of slow responses from Direct Line and can be seen here: #3787

EricDahlvang Dec 14, 2017

Collaborator

Have you implemented a custom state client? The default is intended only for prototyping. https://blog.botframework.com/2017/07/18/saving-state-azure-extensions/

Also, please do not load test the direct line channel. If you wish to load test the bot, call it directly. https://blog.botframework.com/2017/06/19/load-testing-a-bot/

ederbond Dec 14, 2017

Hey @EricDahlvang
I'm facing a similar issue.
With 4 Direct Line clients calling my bot simultaneously, it starts to return "Sorry, my bot code is having an issue.", as I've reported on this issue.

Note: I'm using the InMemoryDataStore, as you can see below:

image

I've already done what was described in the blog post
Optimizing latency with the Bot Framework,
as you can see in my global.asax in the image below:
image

I don't know what else I can do to make my bot scale to many more simultaneous users. Scaling to multiple users is a key requirement for the bot to be acceptable, and I'm running out of time as my production deadline is getting close.

Could you please help me investigate it? To me it seems to be a problem with BotConnector, but I'm not sure. If needed, I could share my code privately with you and your team, @EricDahlvang.

ederbond Dec 27, 2017

After enabling Elmah on my bot, I found out that there was an issue with the way I was registering the Microsoft.Bot.Builder.Dialogs.Internals.InMemoryDataStore, as can be seen in these threads:
#3973
#3906

EricDahlvang Jan 3, 2018

Collaborator

@tristanvandam

Have you had the opportunity to implement a custom state client, and retry your tests? The default state service is intended only for prototyping.

@EricDahlvang EricDahlvang self-assigned this Jan 3, 2018
