[codex] Add provider model request profiles#278
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (9)
📝 WalkthroughWalkthroughIntroduces a ChangesLLM Provider Model Abstraction and Refactoring
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR introduces a centralized provider/model “request profile” layer for OpenAI-compatible chat completions, then routes key extraction flows (job matching + resume/skills extraction) through that layer so strict providers (e.g., Bifrost) don’t reject unsupported parameters (notably temperature on GPT‑5-style models).
Changes:
- Added
ProviderModel+ compiled model request profiles to filter request kwargs based on model capabilities (e.g., omittemperatureforgpt-5-mini). - Updated job requirements extraction, candidate reranking, resume extraction, and skills extraction to build OpenAI request kwargs via
ProviderModel.chat_completion_kwargs(). - Added/updated unit tests to assert profile-driven kwargs behavior (model selection, response_format, and omitted temperature).
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_resume_extractor.py | Adjusts tests to set a non-GPT5 model where temperature must remain present. |
| tests/unit/test_llm.py | Adds unit coverage for profile-driven kwargs filtering and OpenRouter prefixing. |
| tests/unit/test_job_match.py | Asserts job-match calls use JSON response_format and omit temperature for GPT‑5 models. |
| packages/shared/src/five08/resume_skills_extractor.py | Routes skills extraction calls through ProviderModel-filtered kwargs. |
| packages/shared/src/five08/resume_extractor.py | Routes resume extraction + name-splitting calls through profile-aware kwargs. |
| packages/shared/src/five08/llm.py | Adds ProviderModel/ModelProfile helpers + profile loading/filtering logic. |
| packages/shared/src/five08/llm_model_profiles.json | Adds compiled request-option support matrix for common models. |
| packages/shared/src/five08/job_match.py | Routes extraction and reranking calls through ProviderModel-filtered kwargs. |
| apps/worker/src/five08/worker/crm/skills_extractor.py | Updates worker-side skills extraction to use ProviderModel for request kwargs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def chat_completion_kwargs( | ||
| self, | ||
| *, | ||
| messages: list[dict[str, str]], | ||
| temperature: float | None = None, | ||
| max_tokens: int | None = None, | ||
| response_format: Any | None = None, | ||
| reasoning_effort: str | None = None, | ||
| verbosity: str | None = None, | ||
| **extra: Any, | ||
| ) -> dict[str, Any]: | ||
| """Build strict-provider-safe kwargs for chat.completions calls.""" | ||
| kwargs: dict[str, Any] = { | ||
| "model": self.model, | ||
| "messages": messages, | ||
| } | ||
| if max_tokens is not None: | ||
| kwargs["max_tokens"] = max_tokens | ||
| if response_format is not None and self.supports("response_format"): | ||
| kwargs["response_format"] = response_format | ||
| if temperature is not None and self.supports("temperature"): | ||
| kwargs["temperature"] = temperature | ||
| if reasoning_effort is not None and self.supports("reasoning_effort"): | ||
| kwargs["reasoning_effort"] = reasoning_effort | ||
| if verbosity is not None and self.supports("verbosity"): | ||
| kwargs["verbosity"] = verbosity | ||
| kwargs.update(extra) | ||
| return kwargs |
| def get_model_profile(model: str) -> ModelProfile | None: | ||
| """Look up a model profile, handling provider-prefixed OpenAI names.""" | ||
| profiles = _load_model_profiles() | ||
| normalized = _profile_lookup_key(model) | ||
| payload = profiles.get(normalized) | ||
| if payload is None: | ||
| return _fallback_profile(model) | ||
| return _profile_from_payload(normalized, payload) |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/unit/test_job_match.py (1)
178-181: ⚡ Quick winCover the legacy-model branch too.
Line 178 and Line 405 only pin the default
gpt-5-minibehavior. Add one test that passesmodel="gpt-4.1-mini"and assertstemperature=0.1is still forwarded, so this layer also guards the model-dependent filtering path it now relies on.Also applies to: 405-408
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_job_match.py` around lines 178 - 181, The tests only assert behavior for the default model "gpt-5-mini" and miss the legacy-model branch; add an additional assertion case in tests referencing the mock call args (mock_client.chat.completions.create.call_args.kwargs / create_kwargs) that invokes the code with model="gpt-4.1-mini" and checks that "temperature" is present and equals 0.1 (i.e., assert create_kwargs["model"] == "gpt-4.1-mini" and assert create_kwargs["temperature"] == 0.1) so the model-dependent filtering path is covered; replicate the same additional assertion in the other test block that currently pins "gpt-5-mini" to cover both spots.tests/unit/test_resume_extractor.py (1)
953-953: ⚡ Quick winUse real profiled model names in these request-shaping tests.
fake-modelwill bypass the concrete model profiles this PR adds, so these assertions can keep passing even if the GPT-5 / gpt-4.1 request-shaping logic regresses. Prefer explicit profiled names per scenario so the test exercises the branch you actually ship.Also applies to: 1562-1562, 1641-1641, 1712-1712
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_resume_extractor.py` at line 953, Replace the placeholder "fake-model" with the actual profiled model name that matches the scenario being tested (e.g., a gpt-4.1 or gpt-5 profile string) so the request-shaping branches are exercised; specifically update extractor.model assignments in tests/unit/test_resume_extractor.py (the occurrence setting extractor.model = "fake-model" and the other occurrences referenced) to use the concrete profiled names for each scenario so the test hits the real profile-based logic rather than bypassing it.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/shared/src/five08/llm.py`:
- Around line 85-94: The messages parameter in chat_completion_kwargs is too
narrow (list[dict[str, str]]); change its annotation to allow non-string content
(e.g., list[dict[str, Any]] or list[dict[str, Union[str, list[Any]]]]) so
callers can pass structured content for vision models; update the signature of
chat_completion_kwargs and any internal handling that relies on messages being
str-only to accept and pass through arbitrary content types without casting.
In `@packages/shared/src/five08/resume_extractor.py`:
- Around line 2686-2706: The code calls self.client.chat.completions.create with
kwargs from self._provider_model().chat_completion_kwargs but does not request
JSON output, yet later unconditionally parses message.content as JSON; update
the chat completion call to include the response_format (e.g.,
response_format="json" or response_format=True) via
_provider_model().chat_completion_kwargs so the provider is asked to return
JSON, keeping the system/user messages (the name-splitting prompt) intact;
ensure the provider_model call that builds chat_completion_kwargs is passed this
response_format so message.content parsing in the name-splitting routine
succeeds without falling back to heuristics.
---
Nitpick comments:
In `@tests/unit/test_job_match.py`:
- Around line 178-181: The tests only assert behavior for the default model
"gpt-5-mini" and miss the legacy-model branch; add an additional assertion case
in tests referencing the mock call args
(mock_client.chat.completions.create.call_args.kwargs / create_kwargs) that
invokes the code with model="gpt-4.1-mini" and checks that "temperature" is
present and equals 0.1 (i.e., assert create_kwargs["model"] == "gpt-4.1-mini"
and assert create_kwargs["temperature"] == 0.1) so the model-dependent filtering
path is covered; replicate the same additional assertion in the other test block
that currently pins "gpt-5-mini" to cover both spots.
In `@tests/unit/test_resume_extractor.py`:
- Line 953: Replace the placeholder "fake-model" with the actual profiled model
name that matches the scenario being tested (e.g., a gpt-4.1 or gpt-5 profile
string) so the request-shaping branches are exercised; specifically update
extractor.model assignments in tests/unit/test_resume_extractor.py (the
occurrence setting extractor.model = "fake-model" and the other occurrences
referenced) to use the concrete profiled names for each scenario so the test
hits the real profile-based logic rather than bypassing it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9ae12f52-607a-4823-afd0-d844e2b156b2
📒 Files selected for processing (9)
apps/worker/src/five08/worker/crm/skills_extractor.pypackages/shared/src/five08/job_match.pypackages/shared/src/five08/llm.pypackages/shared/src/five08/llm_model_profiles.jsonpackages/shared/src/five08/resume_extractor.pypackages/shared/src/five08/resume_skills_extractor.pytests/unit/test_job_match.pytests/unit/test_llm.pytests/unit/test_resume_extractor.py
| @lru_cache(maxsize=1) | ||
| def _load_model_profiles() -> dict[str, Mapping[str, Any]]: | ||
| try: | ||
| raw = resources.files("five08").joinpath(_PROFILE_RESOURCE).read_text() | ||
| data = json.loads(raw) | ||
| except Exception as exc: # pragma: no cover - static package data should exist | ||
| logger.warning("Failed to load LLM model profiles: %s", exc) | ||
| return {} |
| def _profile_lookup_key(model: str) -> str: | ||
| value = (model or "").strip() | ||
| for prefix in _OPENAI_PROVIDER_PREFIXES: | ||
| if value.startswith(prefix): | ||
| return value.removeprefix(prefix) | ||
| return value |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0d7b492063
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| temperature=temperature, | ||
| max_tokens=max_tokens, | ||
| response_format=response_format, | ||
| reasoning_effort="minimal", |
There was a problem hiding this comment.
Use a gpt-5.1-supported reasoning effort
When the resume extractor is configured with gpt-5.1, the new profile in llm_model_profiles.json marks reasoning_effort as supported, so this hard-coded default is sent unchanged; however, the OpenAI Chat Completions docs list gpt-5.1 reasoning values as none, low, medium, and high rather than minimal (https://platform.openai.com/docs/api-reference/chat/create-completion). In that configuration, resume extraction requests will be rejected before falling back or parsing, so this should be model-specific or omitted unless the catalog supplies a valid value.
Useful? React with 👍 / 👎.
Summary
Why
Bifrost strictly rejects unsupported model parameters.
gpt-5-minidoes not accept non-defaulttemperature, but job posting extraction and reranking always senttemperature=0.1, causing automatic posting analysis to fail.Impact
GPT-5-style models now omit unsupported
temperaturewhile older chat models such asgpt-4.1-minikeep it. OpenRouter model prefixing is centralized in the same factory.Validation
./scripts/test.sh./scripts/lint.sh./scripts/mypy.shSummary by CodeRabbit
New Features
Refactor
Tests