fix(scenarios): correct vibration_utterance.json IDs 304 and 306 #325
Open
iksnerd wants to merge 1 commit into
Two `characteristic_form` values disagreed with the live `vibration` MCP server's output, so an LLM judge would penalize a correct agent answer. Both were verified against the tool that produces the ground truth.

- **ID 304** (Bearing Analysis, 6205 @ 1800 RPM): referenced `pitch_dia=39.04 mm`, which is actually the 6305's pitch diameter per `list_known_bearings`. Corrected to `pitch_dia=38.5 mm` and tightened `ball_dia=7.94 mm` → `7.938 mm` to match the bearing database entry exactly.
- **ID 306** (Condition Assessment, 4.5 mm/s @ group2): classified 4.5 mm/s as `Zone B (acceptable)`, but `assess_vibration_severity` returns `Zone C (Alarm - not suitable for long-term operation)`. Group2 thresholds are A=1.4 / B=2.8 / C=7.1 mm/s, so 4.5 lands in Zone C. Corrected the zone and included threshold context.

Fixes IBM#323.

Signed-off-by: iksnerd <bdrensk@me.com>
Description
Fixes #323. Two `characteristic_form` values in `src/scenarios/local/vibration_utterance.json` disagreed with what the `vibration` MCP server actually returns, so the LLM judge would mark a correct agent answer as wrong. Both updates align the expected behavior with live tool output captured this week.

Fix Details
ID 304 — Bearing Analysis (6205 @ 1800 RPM)
Before:
After:
Why: `39.04 mm` is the 6305 bearing's pitch diameter, not the 6205's. Verified by calling `list_known_bearings`:
```json
{ "designation": "6205", "n_balls": 9, "ball_dia_mm": 7.938, "pitch_dia_mm": 38.5, "contact_angle_deg": 0 }
{ "designation": "6305", "n_balls": 8, "ball_dia_mm": 10.319, "pitch_dia_mm": 39.04, "contact_angle_deg": 0 }
```
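To illustrate why the pitch diameter matters for a bearing-analysis answer, here is a minimal sketch using the classical kinematic defect-frequency formulas for a stationary outer race. It is an assumption that the `vibration` server derives its frequencies this way; the function name `bearing_frequencies` is hypothetical, and only the geometry values come from this PR.

```python
import math


def bearing_frequencies(shaft_hz, n_balls, ball_dia_mm, pitch_dia_mm, contact_angle_deg=0.0):
    """Classical kinematic defect frequencies (Hz), assuming a fixed outer race."""
    # Ratio of ball diameter to pitch diameter, projected by the contact angle.
    ratio = (ball_dia_mm / pitch_dia_mm) * math.cos(math.radians(contact_angle_deg))
    return {
        "FTF": 0.5 * shaft_hz * (1 - ratio),
        "BPFO": 0.5 * n_balls * shaft_hz * (1 - ratio),
        "BPFI": 0.5 * n_balls * shaft_hz * (1 + ratio),
        "BSF": 0.5 * (pitch_dia_mm / ball_dia_mm) * shaft_hz * (1 - ratio**2),
    }


shaft_hz = 1800 / 60  # 1800 RPM expressed in Hz

# 6205 geometry as returned by list_known_bearings.
correct = bearing_frequencies(shaft_hz, 9, 7.938, 38.5)
# Geometry previously quoted for ID 304 (6305 pitch diameter, rounded ball diameter).
old = bearing_frequencies(shaft_hz, 9, 7.94, 39.04)

for name in correct:
    print(f"{name}: {correct[name]:.2f} Hz (old geometry: {old[name]:.2f} Hz)")
```

Under these assumptions the two geometries give slightly different defect frequencies, which is enough for a judge comparing numeric answers to disagree with a correct agent.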
Also tightened `ball_dia=7.94 mm` → `7.938 mm` to match the bearing database entry exactly (the rounding was inconsistent with the precision used elsewhere in the file).

ID 306 — Condition Assessment (4.5 mm/s @ group2)
Before:
After:
Why: Verified by calling `assess_vibration_severity(rms_velocity_mm_s=4.5, machine_group="group2")`:
```json
{
  "rms_velocity_mm_s": 4.5,
  "iso_zone": "C",
  "description": "Alarm - not suitable for long-term operation",
  "machine_group": "group2",
  "thresholds": { "A_good": 1.4, "B_acceptable": 2.8, "C_alarm": 7.1 }
}
```
4.5 mm/s exceeds the B/C boundary at 2.8 mm/s and falls below the C/D boundary at 7.1 mm/s, so it lands unambiguously in Zone C. The new wording also surfaces the threshold values inline so the LLM judge has context for partial-credit grading.
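The zone lookup described above can be sketched in a few lines of Python. This is an illustration only, not the server's implementation: the function name `iso_zone` is hypothetical, and treating each threshold as an inclusive upper bound is an assumption (the tool output does not show how exact boundary values are handled). The threshold values are the group2 numbers from the tool output.

```python
# Group2 thresholds (mm/s RMS velocity) as returned by assess_vibration_severity.
GROUP2_THRESHOLDS = {"A_good": 1.4, "B_acceptable": 2.8, "C_alarm": 7.1}


def iso_zone(rms_velocity_mm_s, thresholds):
    """Return the severity zone letter for an RMS velocity.

    Assumption: each threshold is the inclusive upper bound of its zone.
    """
    if rms_velocity_mm_s <= thresholds["A_good"]:
        return "A"
    if rms_velocity_mm_s <= thresholds["B_acceptable"]:
        return "B"
    if rms_velocity_mm_s <= thresholds["C_alarm"]:
        return "C"
    return "D"


print(iso_zone(4.5, GROUP2_THRESHOLDS))  # C
```

Because 4.5 mm/s sits well inside the 2.8 to 7.1 mm/s band, the result does not depend on the boundary convention chosen here.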
Impact on Benchmarking
Before vs. After expectation:
| Scenario | Before `characteristic_form` | After `characteristic_form` |
| --- | --- | --- |
| ID 304 | `pitch_dia=38.5 mm` answers wrong | `pitch_dia=38.5 mm` answers correct (matches live tool) |
| ID 306 | `Zone C` answers wrong | `Zone C` answers correct (matches live tool) |

Any baseline runs against IDs 304 and 306 should be re-scored to reflect the corrected expected behavior. Only two scenarios in the local vibration corpus (24+ utterances) are affected, so the impact on aggregate scores depends on how heavily those two rows were weighted in prior reports.
Related Issues
Verification Steps
- JSON valid: `python -m json.tool src/scenarios/local/vibration_utterance.json > /dev/null` → clean parse.
- Diff scope: `git diff --stat` → `1 file changed, 2 insertions(+), 2 deletions(-)`. Only IDs 304 and 306 touched.
- Live-tool re-verification of both new strings (captured during this PR's prep):
```
list_known_bearings → 6205 = {n_balls: 9, ball_dia: 7.938, pitch_dia: 38.5}
assess_vibration_severity(4.5, group2) → {iso_zone: "C", description: "Alarm - not suitable for long-term operation", thresholds: A=1.4, B=2.8, C=7.1}
```
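For reviewers who want to spot-check the two rows themselves, a small script along these lines may help. It is a sketch under stated assumptions: the scenario file is assumed to be a top-level JSON list of objects with `id` and `characteristic_form` keys, the helper names are hypothetical, and the sample strings are placeholders rather than the actual text in the file.

```python
import json

# Substrings each corrected row is expected to contain, taken from this PR.
EXPECTED_SUBSTRINGS = {
    304: ["pitch_dia=38.5 mm", "7.938 mm"],
    306: ["Zone C"],
}


def find_problems(scenarios, expected=EXPECTED_SUBSTRINGS):
    """Return messages for rows that are missing or lack an expected substring."""
    # Assumed schema: list of dicts with "id" and "characteristic_form".
    forms = {s.get("id"): s.get("characteristic_form", "") for s in scenarios}
    problems = []
    for scenario_id, substrings in expected.items():
        if scenario_id not in forms:
            problems.append(f"ID {scenario_id}: not found")
            continue
        for text in substrings:
            if text not in forms[scenario_id]:
                problems.append(f"ID {scenario_id}: missing {text!r}")
    return problems


# Placeholder data standing in for the real file contents.
sample = json.loads("""
[
  {"id": 304, "characteristic_form": "placeholder: pitch_dia=38.5 mm, ball_dia=7.938 mm"},
  {"id": 306, "characteristic_form": "placeholder: Zone C for group2"}
]
""")

print(find_problems(sample) or "no problems found")
```

To run it against the repository, replace `sample` with `json.load(open("src/scenarios/local/vibration_utterance.json"))`, adjusting the key names if the real schema differs.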
Checklist
`characteristic_form` now matches what the production tool returns. (Live tool output is captured in this PR body and in issue #323, "Two ground-truth errors in src/scenarios/local/vibration_utterance.json (IDs 304, 306)".)