[Infra] Check for overfit in the scenario prompt as well #218
Merged
JanKrivanek merged 3 commits into main on Mar 5, 2026
Pull request overview
Adds prompt-level overfitting detection to the skill validator, so scenario prompts that explicitly reference a skill (or instruct the use of one) are treated as overfitting signals, alongside the existing rubric/assertion checks.
Changes:
- Extend the overfitting schema/model to include `prompt_assessments` and incorporate them into score computation.
- Add deterministic prompt scanning (`DetectPromptOverfitting`) and merge it with LLM-provided prompt assessments.
- Update console reporting and dashboard data generation, and expand test coverage for prompt overfitting.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| eng/skill-validator/src/Services/OverfittingJudge.cs | Adds deterministic prompt checks, parses/merges prompt_assessments, and updates scoring + LLM prompts/schema. |
| eng/skill-validator/src/Models/Models.cs | Introduces PromptOverfitAssessment and adds PromptAssessments to OverfittingResult. |
| eng/skill-validator/src/Services/Reporter.cs | Prints prompt-level overfitting signals in console output for moderate/high results. |
| eng/dashboard/generate-benchmark-data.ps1 | Flags scenario overfitting when prompt assessments exist (supports new schema). |
| eng/skill-validator/tests/OverfittingJudgeTests.cs | Adds coverage for prompt detection, parsing, and score effects; updates existing constructions for new result shape. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
ViktorHofer approved these changes on Mar 5, 2026
Motivation
Revealing in the scenario prompt that a skill should be used (either a specific skill or skills in general) is also a sign of overfitting.
Previously, we were checking only the rubric graders.