[BUG] Cosmos library takes 100% CPU on Windows #6112

Closed
dvanackere-lpg opened this issue Oct 31, 2019 · 8 comments
Labels
  • Client: This issue points to a problem in the data-plane of the library.
  • Cosmos
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • Investigate: The issue needs investigation to get to the root cause.

Comments

@dvanackere-lpg

Describe the bug
Due to a bug in Netty (netty/netty#9710, already fixed, but no released version of Netty includes the fix yet), RntbdRequestTimer spins in a busy loop (calling Sleep(0) over and over) when running on a Windows machine.

To Reproduce
Run a Spring application that uses the Cosmos library on Windows.

Code Snippet
In RntbdRequestTimer:
// FIVE_MILLISECONDS is a 5 ms tick duration expressed in nanoseconds
this.timer = new HashedWheelTimer(FIVE_MILLISECONDS, TimeUnit.NANOSECONDS);
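The snippet above gives the wheel timer a 5 ms tick. The sketch below models why that tick spins on Windows. It reproduces the sleep-time arithmetic as we read it in the worker loop of Netty 4.1.42.Final's HashedWheelTimer: round the remaining time up to whole milliseconds, then, on Windows only, round down to a multiple of 10 ms. `WheelSleepModel` is a hypothetical name, and the 1 ms clamp in `windowsSleepAfterFix` is our assumption about the shape of the upstream fix, so check netty/netty#9710 before relying on it.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model (not Netty code) of the sleep-time calculation in
 * HashedWheelTimer's worker loop, as we understand it for 4.1.42.Final.
 * All methods assume remainingNanos > 0; Netty leaves the loop otherwise.
 */
final class WheelSleepModel {

    /** Milliseconds the worker asks to sleep, rounded up from the remaining nanoseconds. */
    static long sleepMillis(long remainingNanos) {
        return (remainingNanos + 999_999L) / 1_000_000L;
    }

    /** Pre-fix Windows behaviour: round down to a multiple of 10 ms, which can yield 0. */
    static long windowsSleepBeforeFix(long remainingNanos) {
        long ms = sleepMillis(remainingNanos);
        return ms / 10 * 10;
    }

    /** Assumed shape of the upstream fix: never ask Thread.sleep for 0 ms. */
    static long windowsSleepAfterFix(long remainingNanos) {
        long ms = windowsSleepBeforeFix(remainingNanos);
        return ms == 0 ? 1 : ms;
    }

    public static void main(String[] args) {
        for (long tickMs : new long[] {5, 10, 100}) {
            // At the start of a tick, roughly one full tick remains until the deadline.
            long remaining = TimeUnit.MILLISECONDS.toNanos(tickMs);
            System.out.printf("tick=%3d ms  before fix: Thread.sleep(%d)  after fix: Thread.sleep(%d)%n",
                    tickMs, windowsSleepBeforeFix(remaining), windowsSleepAfterFix(remaining));
        }
    }
}
```

Under this model, a 5 ms tick always rounds the sleep down to 0, so the worker keeps calling Thread.sleep(0) until the deadline passes, which matches the busy loop described above. With a 10 ms tick only the stretch with less than 10 ms remaining rounds to 0, and with a 100 ms tick most of each tick is spent actually sleeping. The model does not by itself predict exact CPU percentages.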

Expected behavior
This could be solved by:

  • allowing this Azure Function to run on Linux (I don't know why that isn't possible; even .NET Azure Functions are allowed on Linux).
  • setting the timer tick to 10 milliseconds

Setup (please complete the following information):

  • OS: Windows 10
  • IDE: IntelliJ
  • Version of the Library used: tested with 3.2.0 and 3.4.0

Additional context
I use this library through spring-data-cosmosdb, hosted in an Azure Function.

@loarabia added the Client, Cosmos, and customer-reported labels Nov 2, 2019
@triage-new-issues bot removed the triage label Nov 2, 2019
@kushagraThapar added this to To do in Cosmos DB SDK team via automation Nov 4, 2019
@kushagraThapar
Member

@David-Noble-at-work Can you please look into this?

@dvanackere-lpg
Author

I've recompiled the project to try different values for the timer:
After setting the timer to 10 milliseconds, CPU usage dropped to 40% (in Azure).
After setting the timer to 100 milliseconds, CPU usage dropped to 3% (in Azure), which is normal.
I don't know whether the problem is simply that HashedWheelTimer performs poorly (on Windows, at least).
I don't know what side effects this timer setting has: my application seems to work normally, but is it safe to keep this setting?
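The measurements above suggest trading timer resolution for CPU on Windows. The following is a minimal sketch that assumes you control where the timer is built, for example in a locally recompiled SDK as described above. `TickSelector` and `chooseTick` are hypothetical helpers, not part of the Cosmos SDK or Netty: they apply a minimum tick only when running on Windows. The 100 ms floor in `main` comes from the measurement above, not from any SDK default. Because a wheel timer only checks for expired timeouts once per tick, a coarser tick also means request timeouts can fire up to roughly one tick later than requested.

```java
import java.time.Duration;
import java.util.Locale;

/**
 * Hypothetical helper (not part of the Cosmos SDK or Netty): chooses the
 * wheel-timer tick, enforcing a minimum tick only on Windows.
 */
final class TickSelector {

    static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /**
     * Returns windowsFloor when running on Windows and the requested tick is
     * shorter than that floor; otherwise returns the requested tick unchanged.
     */
    static Duration chooseTick(Duration requested, Duration windowsFloor, boolean onWindows) {
        if (requested.isNegative() || requested.isZero()) {
            throw new IllegalArgumentException("tick must be positive: " + requested);
        }
        if (onWindows && requested.compareTo(windowsFloor) < 0) {
            return windowsFloor;
        }
        return requested;
    }

    public static void main(String[] args) {
        Duration requested = Duration.ofMillis(5);   // the tick used in RntbdRequestTimer
        Duration floor = Duration.ofMillis(100);     // from the measurement above
        Duration tick = chooseTick(requested, floor, isWindows());
        // Netty's HashedWheelTimer(long, TimeUnit) constructor accepts the tick in nanoseconds.
        System.out.println("tick = " + tick.toMillis() + " ms (" + tick.toNanos() + " ns)");
    }
}
```

Passing `onWindows` as a parameter keeps `chooseTick` deterministic and easy to test. `isWindows()` is only consulted at the call site.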

@kushagraThapar
Member

Hey, we are testing this particular timer change on our end and are still waiting for results from a Windows machine.

We have tested this on Mac and Linux, and it works fine for us, so you should be safe to keep this setting.

@kushagraThapar added the Investigate label Dec 3, 2019
@kushagraThapar modified the milestones: Backlog, Sprint 163 Dec 4, 2019
@David-Noble-at-work moved this from To do to In progress in Cosmos DB SDK team Dec 9, 2019
@David-Noble-at-work moved this from In progress to To do in Cosmos DB SDK team Dec 10, 2019
@kushagraThapar
Member

@David-Noble-at-work, any updates on this?

@kushagraThapar
Member

@david-lpg I had a chat with David, and we couldn't see any CPU difference, at least on Linux; we are not sure about Windows. So you are good to go with a 100-millisecond timer.

In addition, we have exposed these options for our Rntbd (TCP transport client) in the v3.6.0 release.
So you don't need to build the SDK locally or change the source code; you can use these options to pass these values to the SDK.

Here is the changelog, which describes these options and how to provide them:
https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/cosmos/changelog/README.md#360

Here is more information on how to use it: https://github.com/David-Noble-at-work/azure-cosmos-examples#using-system-properties-to-modify-default-direct-tcp-options

@kushagraThapar
Member

@david-lpg please close this if this fixes your issue.

@dvanackere-lpg
Author

Sorry for the delay; I had to work on another project lately. Today I tried using the config file to modify the requestTimerResolution parameter. This seems to work, even though I don't really know what side effects this parameter could have on my application. I'll use it for now, but the real fix would be to upgrade to Netty 4.1.44.Final (the SDK currently uses 4.1.42.Final). Thanks for your help.

Cosmos DB SDK team automation moved this from To do to Done Feb 17, 2020
@kushagraThapar
Member

Thanks @david-lpg. We will update the Netty version in the next release.

@github-actions bot locked and limited conversation to collaborators Apr 12, 2023