
Unbounded SQLite instances/connections contributing to OOM failures #22650

Closed
sharwell opened this Issue Oct 10, 2017 · 3 comments

sharwell (Member) commented Oct 10, 2017

Version Used: 15.3

Currently we fail to bound the number of instances of the following types that are created at runtime:

  • SQLitePCL.sqlite3
  • Microsoft.CodeAnalysis.SQLite.Interop.SqlConnection

Associated with these types is a pair of allocations in the native heap. One is 64,000 bytes, and the other is 425,600 bytes. Ordinarily, this would not be a problem. However, it appears that it is possible for the number of connections to grow over time, resulting in overwhelming memory pressure stemming from the (mis-)use of SQLite. The following image shows one such case:

[image]

After fixing this for 15.5, we should port the fix to 15.4 servicing.
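A minimal sketch of the kind of bound under discussion, using Python's built-in sqlite3 as a stand-in for SQLitePCL (the pool class, cap value, and API are illustrative assumptions, not Roslyn's actual fix):

```python
import sqlite3
import threading
from contextlib import contextmanager

class BoundedConnectionPool:
    """Caps the number of live SQLite connections. Callers block until a
    connection is free instead of allocating an unbounded number of them."""

    def __init__(self, path, max_connections=16):
        self._path = path
        self._lock = threading.Lock()
        self._idle = []  # connections ready for reuse
        self._slots = threading.Semaphore(max_connections)

    @contextmanager
    def connection(self):
        self._slots.acquire()  # blocks once the cap is reached
        with self._lock:
            conn = self._idle.pop() if self._idle else sqlite3.connect(
                self._path, check_same_thread=False)
        try:
            yield conn
        finally:
            with self._lock:
                self._idle.append(conn)  # return to the pool for reuse
            self._slots.release()

pool = BoundedConnectionPool(":memory:", max_connections=4)
with pool.connection() as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT, v TEXT)")
```

With a cap like this, the two native allocations described above are paid at most `max_connections` times, regardless of how many concurrent callers arrive.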

CyrusNajmabadi (Contributor) commented Oct 10, 2017
Interesting. This could definitely happen, specifically in the case of a huge number of concurrent calls into the persistence service.

The worst case would be:

Task-1 makes a persistence call, the disk is loaded, and the actual disk write blocks the thread. The thread pool fills up and keeps spawning threads to run more tasks, which all end up blocked the same way. Eventually the disk is released and these all complete, but the pool of connections still holds onto all of them.

A reasonable cap on the pool size (maybe 16 or so) would avoid this, but it would not address the thread-pool starvation and thread spawning that can occur while the disk is loaded.

A different sort of fix would be to have a dedicated thread for writing to the DB. Requests to write from other threads would be queued to this single thread, allowing the writes to be 'async' from the perspective of the calling thread. A slow disk would then only impact that single writer.

We would also only need a single connection at that point.
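The dedicated-writer idea above can be sketched as follows (a Python illustration with a hypothetical `SingleWriter` class, not Roslyn's implementation; callers receive a future so the write is asynchronous from their side, and only one connection ever exists):

```python
import queue
import sqlite3
import threading
from concurrent.futures import Future

class SingleWriter:
    """Funnels all writes through one thread and one connection.
    Callers get a Future instead of blocking on the disk themselves."""

    def __init__(self, path):
        self._queue = queue.Queue()
        self._thread = threading.Thread(
            target=self._run, args=(path,), daemon=True)
        self._thread.start()

    def submit(self, sql, params=()):
        fut = Future()
        self._queue.put((sql, params, fut))  # caller never touches the disk
        return fut

    def _run(self, path):
        conn = sqlite3.connect(path)  # the only connection that writes
        while True:
            sql, params, fut = self._queue.get()
            try:
                conn.execute(sql, params)
                conn.commit()
                fut.set_result(None)
            except Exception as exc:
                fut.set_exception(exc)  # surface DB errors to the caller

writer = SingleWriter(":memory:")
writer.submit("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)").result()
writer.submit("INSERT INTO kv VALUES (?, ?)", ("a", "1")).result()
```

A slow disk then only delays the worker thread; submitting threads return immediately, so the thread pool no longer fills with blocked writers.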


heejaechang (Contributor) commented Oct 11, 2017
I think I saw in the SQLite docs that there is a mode where SQLite serializes all operations to the DB, so that there is basically no concurrency. Can we use that mode if we are going to implement our own queue?
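The mode in question is SQLite's "serialized" threading mode, in which SQLite's own mutexes serialize every call so a single connection can be shared across threads. A small Python sketch of checking for it and sharing one connection (the lock fallback is an illustrative precaution for builds that are not serialized):

```python
import sqlite3
import threading

# sqlite3.threadsafety reports the threading mode SQLite was built with:
# 3 means "serialized" -- SQLite's internal mutexes serialize every call,
# so one connection may be shared across threads without an external queue.
serialized = sqlite3.threadsafety == 3

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE kv (n INTEGER)")
lock = threading.Lock()  # fallback when SQLite is not serialized

def write(n):
    if serialized:
        conn.execute("INSERT INTO kv VALUES (?)", (n,))  # SQLite serializes
    else:
        with lock:  # we serialize ourselves
            conn.execute("INSERT INTO kv VALUES (?)", (n,))

threads = [threading.Thread(target=write, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

count = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
```

Serialized mode makes concurrent access safe, but it does not bound how many connections get created, so it complements rather than replaces a queue or pool cap.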


heejaechang (Contributor) commented Oct 11, 2017
We also changed the connection to be a shared connection in 15.5, which shares the cache and other data structures rather than creating new ones for each connection. I wonder whether this still happens in 15.5.
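For reference, SQLite's shared-cache mode is what lets multiple connections share one page cache and schema instead of each allocating its own. A small sketch using Python's sqlite3 and the `cache=shared` URI parameter (an illustration of the SQLite feature, not the 15.5 change itself):

```python
import sqlite3

# Two connections to the same in-memory database, linked through SQLite's
# shared-cache mode ("cache=shared" in the URI): they share one page cache
# and schema rather than each allocating their own.
uri = "file:demo?mode=memory&cache=shared"
a = sqlite3.connect(uri, uri=True)
b = sqlite3.connect(uri, uri=True)

a.execute("CREATE TABLE kv (k TEXT, v TEXT)")
a.execute("INSERT INTO kv VALUES ('x', '1')")
a.commit()

# b sees a's data because both connections use the same underlying cache.
value = b.execute("SELECT v FROM kv WHERE k = 'x'").fetchone()[0]
```

Sharing the cache reduces the per-connection native allocations, though each extra connection object still carries some cost, so the cap discussed above remains relevant.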


sharwell added a commit to sharwell/roslyn that referenced this issue Oct 11, 2017

@sharwell sharwell self-assigned this Oct 11, 2017

sharwell added a commit to sharwell/roslyn that referenced this issue Oct 12, 2017

sharwell added a commit to sharwell/roslyn that referenced this issue Oct 12, 2017

@sharwell sharwell modified the milestones: 15.5, 15.4 Oct 14, 2017

@sharwell sharwell closed this Oct 14, 2017
