Increase max active resources supported by server #2189

Merged
r4victor merged 8 commits into master from issue_2138_server_load on Jan 16, 2025

@r4victor r4victor commented Jan 15, 2025

Closes #2183

This PR improves server performance and documents the new server limits:

  • Adds batch processing to background tasks so the server can support more active resources without increased processing latency.
  • Fixes some long write transactions. A few remain, marked with comments, since they occur rarely and are not critical.
  • Fixes a major locking bug in _wait_to_lock_many().

The new limit is 150 active runs/jobs/instances. At that load, the server processes resources with under 2 minutes of latency, which is tolerable. Increasing the processing rate further leads to cloud rate limit errors (I hit them on AWS).
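
A minimal sketch of the batch-processing idea from the first bullet, assuming an in-memory queue of pending resource IDs. The names `process_batch`, `process_one`, and `pending` are hypothetical and are not taken from this PR; the server's actual background tasks are not shown here.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Deque, List


async def process_batch(
    pending: Deque[str],
    process_one: Callable[[str], Awaitable[None]],
    batch_size: int,
) -> List[str]:
    """Take up to batch_size items in one tick and process them concurrently."""
    batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
    await asyncio.gather(*(process_one(item) for item in batch))
    return batch


async def demo() -> List[List[str]]:
    # Hypothetical workload: five pending jobs, processed two per tick.
    pending: Deque[str] = deque(f"job-{i}" for i in range(5))

    async def process_one(job_id: str) -> None:
        await asyncio.sleep(0)  # stand-in for real I/O (DB or cloud API calls)

    ticks = []
    while pending:
        ticks.append(await process_batch(pending, process_one, batch_size=2))
    return ticks


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Each tick handles at most `batch_size` items, so in a design like this the batch size also caps how fast cloud APIs are called, which matters given the rate limit errors mentioned above.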

@r4victor r4victor requested review from jvstme and un-def January 15, 2025 11:16
Use AsyncAdaptedQueuePool for sqlite

Due to aiosqlite's default, NullPool was used for sqlite.
This is a suboptimal setting when spawning many sessions, as we do.
(See bluesky/tiled#663)
Also increase busy_timeout to 30s.
This gets rid of "database is locked" errors when stopping many runs (e.g. 20 at a time).
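
The effect of busy_timeout can be sketched with the standard-library `sqlite3` module. Using it instead of aiosqlite and SQLAlchemy is an assumption made to keep the example self-contained; the pool change itself is not shown, and the helper names and the `runs` table are hypothetical. With a timeout of 0, a second writer fails immediately while another connection holds the write lock; with a non-zero timeout it would wait up to that long before failing.

```python
from __future__ import annotations

import os
import sqlite3
import tempfile

BUSY_TIMEOUT_MS = 30_000  # 30s, the value mentioned in the commit message


def connect(path: str, busy_timeout_ms: int) -> sqlite3.Connection:
    # isolation_level=None means autocommit mode: BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(f"PRAGMA busy_timeout={busy_timeout_ms}")
    return conn


def try_write_lock(conn: sqlite3.Connection) -> bool:
    """Try to start a write transaction; False if the database is locked."""
    try:
        conn.execute("BEGIN IMMEDIATE")
    except sqlite3.OperationalError:
        return False
    return True


def demo() -> tuple[bool, bool, int]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = connect(path, BUSY_TIMEOUT_MS)
        impatient = connect(path, 0)  # gives up immediately when locked
        writer.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY)")

        writer.execute("BEGIN IMMEDIATE")  # writer holds the write lock
        locked_out = not try_write_lock(impatient)
        writer.execute("COMMIT")  # lock released

        acquired_after = try_write_lock(impatient)
        if acquired_after:
            impatient.execute("ROLLBACK")

        timeout = writer.execute("PRAGMA busy_timeout").fetchone()[0]
        writer.close()
        impatient.close()
        return locked_out, acquired_after, timeout
```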
@r4victor r4victor merged commit 5c6891a into master Jan 16, 2025
@r4victor r4victor deleted the issue_2138_server_load branch January 16, 2025 11:54
pranitnaik43 pushed a commit to bahaal-tech/dstack that referenced this pull request Feb 9, 2025
* Increase runner submitWaitDuration to 5m

* Fix potentially long write DB transactions

* Set PRAGMA synchronous=NORMAL

* Fix _wait_to_lock_many()

* Implement batch background processing

* Add Server limits to Server deployment

* Fix reuse instances lock

* Use AsyncAdaptedQueuePool for sqlite

Due to aiosqlite's default, NullPool was used for sqlite.
This is a suboptimal setting when spawning many sessions, as we do.
(See bluesky/tiled#663)
Also increase busy_timeout to 30s.
This gets rid of "database is locked" errors when stopping many runs (e.g. 20 at a time).
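
The commit list above mentions a fix to _wait_to_lock_many(), but the page does not show what the bug was or how the function is implemented. As a rough illustration only, here is one way to lock several keys in an all-or-nothing fashion with asyncio. The function names, the shared `locked` set, and the polling delay are all assumptions, not the PR's code.

```python
import asyncio
from typing import Hashable, Iterable, List, Set


async def wait_to_lock_many(
    guard: asyncio.Lock,
    locked: Set[Hashable],
    keys: Iterable[Hashable],
    delay: float = 0.01,
) -> None:
    """Wait until none of the keys are locked, then lock all of them at once."""
    wanted = set(keys)
    while True:
        async with guard:
            # All-or-nothing: never hold only part of the requested keys.
            if locked.isdisjoint(wanted):
                locked.update(wanted)
                return
        await asyncio.sleep(delay)


def unlock_many(locked: Set[Hashable], keys: Iterable[Hashable]) -> None:
    # No await here, so this cannot interleave with other asyncio tasks.
    locked.difference_update(keys)


async def demo() -> List[str]:
    guard = asyncio.Lock()
    locked: Set[Hashable] = set()
    events: List[str] = []

    async def worker(name: str, keys: Set[int]) -> None:
        await wait_to_lock_many(guard, locked, keys)
        events.append(f"{name} locked")
        await asyncio.sleep(0.05)  # stand-in for work done under the locks
        unlock_many(locked, keys)
        events.append(f"{name} unlocked")

    # The key sets overlap on 2, so the workers must not run concurrently.
    await asyncio.gather(worker("a", {1, 2}), worker("b", {2, 3}))
    return events
```

Taking every key under a single guard avoids the deadlock that can occur when two tasks each hold part of an overlapping set and wait for the rest.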
pranitnaik43 pushed a commit to bahaal-tech/dstack that referenced this pull request Mar 4, 2025
pranitnaik43 pushed a commit to bahaal-tech/dstack that referenced this pull request Mar 5, 2025
Development

Successfully merging this pull request may close these issues.

Test and document max number of active runs and instances supported by the server
