refactor(model-platform): replace provider dialect configuration by earayu · Pull Request #1697 · apecloud/ApeRAG

earayu · 2026-04-25T13:17:04Z

Summary

Replace the old provider/dialect/custom_llm_provider configuration model with ModelProvider, ModelAccount, Model, ModelUse, and ModelRunner abstractions.
Add /api/v3/model-* APIs, a shared model runtime resolver/runner layer, and update collection/retrieval/agent/chat/title paths to reference model_id.
Remove legacy provider v2 UI/service code and the root models/ generated catalog assets.

Test plan

uv run ruff check aperag tests/unit_test/test_model_platform_v3_contract.py tests/unit_test/llm/test_model_runtime_resolver.py
uv run python -m pytest tests/unit_test/test_model_platform_v3_contract.py tests/unit_test/llm/test_model_runtime_resolver.py tests/unit_test/test_openapi_spec.py tests/unit_test/chat/test_chat_completion_service.py tests/unit_test/service/test_collection_service_embedding_lock.py tests/unit_test/test_modularization_boundaries.py tests/unit_test/test_web_typed_api_contract.py -q
uv run python scripts/export_openapi.py
cd web && yarn api:v2:types

Notes

cd web && yarn tsc --noEmit is still blocked by existing admin/quota/i18n type issues unrelated to this model-platform path.

Made with Cursor

Introduce a first-class model account/model/use abstraction so users no longer configure LiteLLM dialects or custom provider routing names. Made-with: Cursor

The model_provider table introduced by b4f2d91c8e3a declared provider_type with a separate UniqueConstraint plus a non-unique index, but the ORM declares the column as unique=True, index=True (which SQLAlchemy renders as a single unique index). alembic check flagged the drift and broke lint-and-unit on PR #1697. This revision drops the redundant unique constraint and promotes the index to unique=True so the autogenerate diff is clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The model-platform refactor deleted /api/v2/providers/* and db.ops.query_provider_api_key without migrating the call sites that still rely on them. Three small follow-ups to make CI green: * Add AsyncLlmProviderRepositoryMixin.query_provider_api_key as a thin shim over ModelAccount so document_service.fetch_url_documents and web_access routes that look up the user's Jina key keep working. Falls back to public ACTIVE accounts when ``need_public`` is set, matching the old llm_provider.api_key semantics. * Rewrite tests/e2e_http/hurl/full/10_provider_llm.hurl to drive the new /api/v3/model-providers, /model-accounts, /models, /model-uses surface plus /api/v1/embeddings + /api/v1/rerank with model_id. Uses provider_type=dashscope for the alibabacloud account and provider_type=openai_compatible for the openrouter account, both of which are seeded by 7c4e9e1f8b21. * Update tests/unit_test/chat/test_chat_title_service.py to reflect the new chat-title flow: it no longer reaches into default_model_service; instead db_ops.query_model_uses returns the background-task ModelUse and the assertion now guards against model_invocation_service.chat being awoken on an empty history. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ice surface The temporary ``query_provider_api_key`` shim on the LLM provider repository is removed. Cross-domain callers that need a raw provider API key (web_access JINA reader, knowledge_base fetch-url) now go through ``ModelPlatformService.get_user_provider_api_key``, the canonical surface for non-model-platform domains. The repository keeps the SQL primitive as ``query_model_account_api_key`` — a properly-named consumer of the new ``ModelAccount`` row, no longer framed as a backward-compat shim. This closes the design completeness gap PM flagged for #1697 (msg =8ac3e7d9): the model-platform refactor must not leak a v2-shape api key lookup into other domains' DB-ops surface. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The model-system refactor cuts ``ModelSpec.{model, model_service_provider, custom_llm_provider}`` and only keeps ``model_id``. Pydantic silently drops extras, so any hurl file still sending the old triple-shape parses as ``model_id=None`` and the downstream code-path goes silently broken (collection vector index, bot completion). Following the template established by ``10_provider_llm.hurl``, each file that exercises a real provider path (11, 13, 17) now seeds its own ``ModelAccount`` + ``Model`` via the v3 routes and references the captured ``model_id`` in the ``embedding`` / ``completion`` blobs. ``19_retrieval_http.hurl`` is a deterministic 4xx-shape test that must not depend on any provider seed; the optional embedding / completion config is dropped entirely. Closes the second design completeness gap PM flagged for #1697 (msg=8ac3e7d9). Provider keys are still required to actually run these against a live stack — the shape itself is now correct. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… fix Three Weston blockers (msg=80e873c1) on the model-platform refactor: * Blocker A — ``/api/v1/embeddings`` and ``/api/v1/rerank`` are permanent OpenAI-compat allowlist routes. The PR's first cut required ``model_id`` and broke pre-#1697 callers (provider hurl + external clients) with 422. ``EmbeddingRequest`` / ``RerankRequest`` now accept either the new ``{model_id}`` shape *or* the legacy ``{model, model_service_provider, custom_llm_provider}`` triple. The triple is resolved server-side via the new ``ModelPlatformService.resolve_legacy_model_id`` (provider_type + provider_model_id → ``Model.id``). ``/api/v3/model-*`` is untouched and still ``model_id``-only. * Blocker B — alembic multi-head with #74. Both ``b4f2d91c8e3a`` and ``d8e2c4a17b91`` (which landed on main while #1697 was open) had ``down_revision=7c4e9e1f8b21``. Rebased onto current main and re-chained ``b4f2d91c8e3a → d8e2c4a17b91`` so ``alembic heads`` reports a single head (``84fac9e3d8c2``). * Blocker C — pre-#1697 collection / bot configs in the DB hold the legacy triple in JSON. Pydantic silently dropped extras after the schema cut, so existing rows would parse with ``model_id=None`` and the runtime resolver would 404. Two halves: - ``ModelSpec`` now stashes the legacy triple onto private ``legacy_*`` slots at parse time and exposes ``has_legacy_triple()``. Sync code paths (``base_embedding`` / ``base_completion``) and the async agent-runtime path (``_resolve_request``) call the new sync/async resolvers to fill ``model_id`` lazily before the runtime lookup runs. ``model_dump`` does not leak the legacy fields — only the canonical ``{model_id}`` shape goes back out the wire. - The ``b4f2d91c8e3a`` migration captures any user-supplied ``model_service_provider`` API keys + ``llm_provider_models`` rows and replays them as ``ModelAccount`` (``user_id="public"``) + ``Model`` rows in the new schema *before* dropping the legacy tables. Best-effort, not strict — rows that don't fit any new provider type are skipped (see migration docstring for the mapping). Idempotent w.r.t. repeated ``alembic upgrade`` (the legacy-table drops are now ``inspect``-guarded after Phase-7 teardown rebuilt the baseline migrations from scratch). Regression coverage in ``tests/unit_test/test_model_platform_v1_compat.py``: new + legacy parses for both ``EmbeddingRequest`` / ``RerankRequest`` and ``ModelSpec``; ``model_id`` precedence over the legacy triple; private legacy fields stay out of ``model_dump``. The existing v3 contract test (``test_model_platform_v3_contract.py``) still asserts the new ``/api/v3/...`` schemas never expose the legacy field names. Local gates: ``make db-check`` (single head, no autogen drift), ``make lint``, ``pytest tests/unit_test/`` (722 passed, 29 skipped). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ec shape Closes Blocker D (Option A-extended, PM lock-in msg=e551e144 + msg=06e8f718). PR #1697 collapses the legacy ``{model, model_service_provider, custom_llm_provider}`` triple and keeps only ``model_id`` in ``ModelSpec``. Pydantic silently drops extras, so any hurl / bash file still sending the old triple parses as ``model_id=None`` and the downstream code-path goes silently broken (collection vector index, bot completion, graph index). Following the template established in ``10_provider_llm.hurl`` (and the previous migration of 11/13/17 in ad69af9), each remaining file that exercises a real provider path now seeds its own ``ModelAccount`` + ``Model`` via the v3 routes and references the captured ``model_id`` in the ``embedding`` / ``completion`` blobs. Files migrated: * tests/e2e_http/hurl/full/12_bot.hurl * tests/e2e_http/hurl/full/14_graph_http.hurl * tests/e2e_http/hurl/full/15_agent_runtime_v3.hurl * tests/e2e_http/hurl/full/20_knowledge_graph_http.hurl * tests/e2e_http/scripts/run_chat_collection_flow.sh * tests/e2e_http/scripts/run_graph_index_flow.sh The bash scripts now require ``E2E_ALIBABACLOUD_API_KEY`` + ``E2E_OPENROUTER_API_KEY`` so they can seed the v3 model rows up front, mirroring the hurl variable convention. No semantic changes beyond the shape rewrite. Final grep across ``aperag/ tests/ web/src/ docs/zh-CN/`` confirms no live caller / hurl / bash / FE config payload still sends the legacy triple — the only remaining matches are: * the new Blocker A compat parser in ``aperag/domains/model_platform`` + ``aperag/schema`` (intended — it accepts both shapes), * litellm SDK ``custom_llm_provider`` keyword args inside the LLM invocation runners (a different namespace — that field is the Python kwarg name on ``litellm.completion``), * the ghost-guard / regression-guard tests in ``tests/unit_test/test_model_platform_v3_contract.py`` and ``tests/unit_test/tasks/test_collection_init_skip.py``, * the legacy migration files (which by definition had to ``CREATE TABLE`` the old schema before the refactor migration ``DROP``s it), * and the FE i18n bundles + audit-log docs (string labels, not JSON contract field names — out of scope). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

query_model_account_api_key documents fallback semantics as "fall back to public when the user has no personal account", but ORDER BY only sorted by gmt_updated DESC — a freshly-edited shared "public" row silently shadowed the caller's own credential when fallback_to_public=True. Both production callers (fetch_url_documents, _get_user_jina_api_key) hit this path, so it was a real correctness bug. Add an ownership-first ORDER BY (CASE WHEN user_id = $caller THEN 0 ELSE 1) before the timestamp ordering so a user-owned row always wins over public, regardless of update timestamps. Public is still considered when (and only when) the caller has no personal row. Regression tests in test_model_platform_v1_compat.py: - test_user_personal_key_wins_over_newer_public_key: seed user row (1h ago) + public row (now), expect "user_key"; verified red on the un-fixed query - test_public_key_returned_when_user_has_no_personal_account: sanity that the actual fallback path is unaffected Weston blocker msg=fcefbaf7. No schema change (db-check clean).

earayu and others added 7 commits April 25, 2026 22:15

refactor(model-platform): replace provider dialect configuration

ca59679

Introduce a first-class model account/model/use abstraction so users no longer configure LiteLLM dialects or custom provider routing names. Made-with: Cursor

earayu force-pushed the refactor/model-system branch from e119437 to 9d03ac0 Compare April 25, 2026 14:32

earayu merged commit 384dd92 into main Apr 25, 2026
9 checks passed

earayu deleted the refactor/model-system branch April 25, 2026 15:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(model-platform): replace provider dialect configuration#1697

refactor(model-platform): replace provider dialect configuration#1697
earayu merged 8 commits intomainfrom
refactor/model-system

earayu commented Apr 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

earayu commented Apr 25, 2026

Summary

Test plan

Notes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant