
feat: add BaseInfer ABC, input size limits, and payload size middleware#10

Merged
alez007 merged 2 commits into main from feat/input-size-limits
Apr 9, 2026
Conversation

@alez007
Owner

@alez007 alez007 commented Apr 7, 2026

Introduce BaseInfer abstract base class that all inference backends (vLLM, Transformers, Diffusers, Custom) now inherit from. This unifies the interface, eliminates duplicated "not supported" boilerplate, and adds per-model max_context_length detection in every loader.
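The ABC itself is not shown in this excerpt; a minimal sketch of the interface the description implies (method names `generate`, `embed`, and the context-length accessors are assumptions, not the actual diff):

```python
# Hypothetical sketch of a BaseInfer-style ABC; the real class in the PR
# may differ in method names and signatures.
from abc import ABC, abstractmethod
from typing import Optional


class BaseInfer(ABC):
    """Shared interface for all inference backends (vLLM, Transformers, ...)."""

    def __init__(self) -> None:
        self._max_context_length: Optional[int] = None

    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> str:
        """Every backend must implement text generation."""

    def embed(self, text: str):
        # One shared default replaces the per-backend "not supported" boilerplate.
        raise NotImplementedError(f"{type(self).__name__} does not support embeddings")

    def max_context_length(self) -> Optional[int]:
        """Per-model context limit detected by the loader, if known."""
        return self._max_context_length

    def _set_max_context_length(self, value: Optional[int]) -> None:
        self._max_context_length = value
```

A concrete backend then only overrides what it actually supports; unimplemented capabilities raise a uniform `NotImplementedError` instead of duplicated stubs.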

Add PayloadSizeLimitMiddleware to the gateway (YASHA_MAX_REQUEST_BODY_BYTES, default 50 MB) as a coarse safety net against oversized requests.
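The middleware body is not included in this excerpt; a minimal pure-ASGI sketch of the idea, rejecting requests whose declared `Content-Length` exceeds the limit (the class and environment-variable names come from the PR, the implementation below is an assumption):

```python
# Illustrative ASGI middleware; the PR's actual PayloadSizeLimitMiddleware
# may enforce the limit differently (e.g. while streaming the body).
import os

DEFAULT_MAX = int(os.environ.get("YASHA_MAX_REQUEST_BODY_BYTES", 50 * 1024 * 1024))


class PayloadSizeLimitMiddleware:
    """Coarse safety net: 413 for requests declaring an oversized body."""

    def __init__(self, app, max_bytes: int = DEFAULT_MAX):
        self.app = app
        self.max_bytes = max_bytes

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            headers = dict(scope.get("headers", []))
            length = headers.get(b"content-length")
            if length is not None and int(length) > self.max_bytes:
                await send({
                    "type": "http.response.start",
                    "status": 413,
                    "headers": [(b"content-type", b"text/plain")],
                })
                await send({"type": "http.response.body", "body": b"Payload Too Large"})
                return
        await self.app(scope, receive, send)
```

A `Content-Length` check alone is deliberately coarse: it does not stop chunked uploads that omit the header, which is consistent with the PR describing this as a safety net rather than a precise quota.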

Add max_context_length() to all plugin base classes so plugins can report their model's context limit.
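The plugin API is not part of this excerpt; one plausible shape for the hook (class names here are hypothetical):

```python
# Assumed plugin base-class shape; the PR's real plugin classes are not shown.
from typing import Optional


class BasePlugin:
    def max_context_length(self) -> Optional[int]:
        """Return the loaded model's context limit in tokens, or None if unknown."""
        return None


class ExampleLlamaPlugin(BasePlugin):
    def max_context_length(self) -> Optional[int]:
        return 8192  # hypothetical model limit for illustration
```

Returning `None` by default lets existing plugins keep working unchanged while new ones opt in to reporting a limit.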

Alex M added 2 commits April 7, 2026 15:57
Introduce BaseInfer abstract base class that all inference backends
(vLLM, Transformers, Diffusers, Custom) now inherit from. This unifies
the interface, eliminates duplicated "not supported" boilerplate, and
adds per-model max_context_length detection in every loader.

Add PayloadSizeLimitMiddleware to the gateway (YASHA_MAX_REQUEST_BODY_BYTES,
default 50 MB) as a coarse safety net against oversized requests.

Add max_context_length() to all plugin base classes so plugins can
report their model's context limit.
…ion and context length tracking

- Remove use_gpu (int/str) config and CUDA_VISIBLE_DEVICES pinning — Ray
  handles GPU scheduling via num_gpus fractions
- Add BaseInfer._get_memory_fraction() used by vllm, diffusers, transformers
- Add BaseInfer._set_max_context_length() for per-model context tracking
- Remove vllm speech serving (TTS requires loader=custom with plugin)
- Add CUDA_DEVICE_ORDER=PCI_BUS_ID to Dockerfile
- Add lint-fix Makefile target
- Update docs to reflect simplified GPU allocation
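`_get_memory_fraction()`'s body is not in the excerpt; one plausible sketch of the idea, clamping a Ray `num_gpus` fraction into a range usable as, e.g., vLLM's `gpu_memory_utilization` (the floor value and function shape are assumptions):

```python
# Illustrative only: derive a GPU memory fraction from the fractional
# num_gpus Ray scheduled this actor with. The PR's _get_memory_fraction
# may read the value from Ray's runtime context instead of a parameter.
def memory_fraction(num_gpus: float, floor: float = 0.05) -> float:
    """Clamp a Ray num_gpus fraction into [floor, 1.0]."""
    return max(floor, min(float(num_gpus), 1.0))
```

With Ray owning GPU scheduling (`num_gpus=0.5` and similar), each backend can size its memory use to its fractional allocation instead of pinning devices via `CUDA_VISIBLE_DEVICES`.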
@alez007 alez007 merged commit d89d166 into main Apr 9, 2026
2 checks passed