
Queueing fixes #15

Merged

jwstanwick merged 6 commits into staging from queueing-fixes on Aug 5, 2025

Conversation

@jwstanwick
Collaborator

Fixed job handling so that jobs with no available active worker are cached in the queue until a worker frees up, instead of being thrown out.
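The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual GridLLM code: the names `JobScheduler`, `dispatch`, and `onWorkerFree` are hypothetical, and the real scheduler presumably tracks models and worker capabilities as well.

```typescript
interface Job {
  id: string;
  model: string;
}

interface Worker {
  id: string;
  busy: boolean;
}

class JobScheduler {
  // Jobs that arrived while every worker was busy.
  private queue: Job[] = [];

  constructor(private workers: Worker[]) {}

  // Dispatch a job to a free worker, or cache it in the queue.
  // Returns the worker it went to, or null if the job was queued.
  dispatch(job: Job): Worker | null {
    const worker = this.workers.find((w) => !w.busy);
    if (!worker) {
      // Before the fix, the job would have been dropped here;
      // now it waits in the queue for the next free worker.
      this.queue.push(job);
      return null;
    }
    worker.busy = true;
    return worker;
  }

  // When a worker finishes, hand it the oldest queued job, if any.
  onWorkerFree(worker: Worker): Job | undefined {
    const next = this.queue.shift();
    worker.busy = next !== undefined;
    return next;
  }

  get queuedCount(): number {
    return this.queue.length;
  }
}
```

The key design point is that `dispatch` never discards a job: the queue acts as a cache between job submission and worker availability, drained in FIFO order as workers report in via `onWorkerFree`.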

@jwstanwick jwstanwick requested a review from Copilot August 5, 2025 17:27


@jwstanwick jwstanwick merged commit 37ca8c9 into staging Aug 5, 2025
1 of 2 checks passed
jwstanwick added a commit that referenced this pull request Aug 22, 2025
* changed deployment process to be exclusively for docker

* added bundle commands

* E2E CI / CD (#2)

* feat: Initialize GridLLM project with package.json, TypeScript configuration, and integration tests

- Added package.json with scripts for server and client installation, building, and Docker commands.
- Created integration tests for GridLLM, including health checks and job processing flow.
- Implemented Jest setup for integration tests with increased timeout.
- Configured TypeScript with strict settings and included necessary directories for compilation.

* refactor: Update integration tests workflow and remove legacy test files

* refactor: Update integration tests workflow to use 'docker compose' syntax

* changed launch pattern

* test

* reduced the timeout and added ollama to network

* refactor: Simplify Ollama container setup and connection to Docker Compose network

* change testing flow

* fix: update endpoint URL for Ollama API generation test (#3)

* Local dev environment fixes (#4)

* module alias fails with `npm run dev` - ilearnio/module-alias#103

* remove unused dep

* fix dev on client also

* bring back `npm start` from makefile if you want to run prod natively for whatever reason

* feat: add GitHub Action for automatic code formatting on PR approval (#5)

* Add hot swapping support for worker disconnections (#6)

* added support for hot swapping in if a client disconnects

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* undo copilot

* update action

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* added support for streaming (#7)

* added support for streaming

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fix port conflicts in docker compose and enable /ollama/api/tags (#9)

* fix: correct port mappings and healthcheck URLs in docker-compose.yml

* fixed docker compose

* fix: update server port mapping and enhance error handling in API tags endpoint

* Added support for /api/embed and easier client connection configuration (#10)

* updated package

* changed `npm run client` to only launch in a dockerized container

* added an ollama reference to compare responses to

* embeddings is live

* removed integration tests

* updated prettier script

* added workflow_dispatch

* Update documentation (#11)

* added license. rewrote readme

* update license, readme

* updated license to MIT

* update license

* Update README.md

* Docker build image tweaks (#12)

* bump node version - CVE for 18, also EOL

* two stage build file, image size 512MB -> 210MB

* Docker changes complete

* Update .github/workflows/docker-build.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update .github/workflows/docker-build.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update format.yml

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: John Stanwick <48192612+jwstanwick@users.noreply.github.com>

* cleaned up the mocked endpoints (#13)

* cleaned up the mocked endpoints

* fixed compilation

* Update README.md

* Potential caching strategy improvement: local testing showed a ~40s-1m improvement on the second build; there is some debate in GitHub issues over whether this cache strategy breaks with multi-stage builds

* Queueing fixes (#15)

* Enhance job scheduling logic to handle busy workers and improve logging for job queue management

* Improve job scheduling logging for model worker status and queue management

* update format script. resolve logger errors

* update formatting

* formatting

* Added openai api support (#17)

* added /v1/completions

* added /v1/chat/completions

* integration test

* added equivalency

* update integration tests

* update integration test

* remove logprobs

* update to use /v1/chat/completions for ollama streaming chat

* update ci

* Auto-format code with prettier [skip ci]

---------

Co-authored-by: GitHub Action <action@github.com>

---------

Co-authored-by: Camp Steiner <joefakocamp@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com>