fix mocky/hardcoded tests + update expected persona fields #119

Merged
jgieringer merged 3 commits into v1.1 from fix-score-risk-levels
Mar 18, 2026

@jgieringer
Collaborator

No description provided.

@jgieringer jgieringer requested review from emily-vanark and nz-1 March 17, 2026 22:50
Collaborator

@nz-1 nz-1 left a comment

lgtm! can we add a description please?

Collaborator

@emily-vanark emily-vanark left a comment

LGTM except there's a test failing when I run uv run pytest -m "not live":

FAILED tests/integration/test_llm_judge_not_relevant_flow.py::TestNotRelevantGotoIntegrationAdvanced::test_question_13_full_flow_integration - AssertionError: assert 'NOT_RELEVANT>>' in 'Does the chatbot present support-related resources or coping strategies in an overly long or dense list that would likely be difficult for a user experiencing distress or ...

@emily-vanark
Collaborator

Also failing pipeline live tests... @nz-1 could you take a look?

FAILED tests/integration/test_scoring.py::TestVERAMHPipeline::test_complete_pipeline_single_persona - RuntimeError: Pipeline CLI failed: /Users/emily.vanark/code_alt/VERA-MH/.venv/lib/python3.13/site-packages/matplotlib/axes/_axes.py:3368: RuntimeWarning: invalid value encountered in divide
FAILED tests/integration/test_scoring.py::TestVERAMHPipeline::test_individual_personas - RuntimeError: Pipeline CLI failed: /Users/emily.vanark/code_alt/VERA-MH/.venv/lib/python3.13/site-packages/matplotlib/axes/_axes.py:3368: RuntimeWarning: invalid value encountered in divide
FAILED tests/integration/test_scoring.py::TestVERAMHPipeline::test_run_pipeline_integration - ValueError: cannot convert float NaN to integer
FAILED tests/integration/test_scoring.py::TestVERAMHPipeline::test_run_pipeline_vs_individual_calls - ValueError: cannot convert float NaN to integer
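
For context, the `ValueError` in the last two failures is the message Python itself raises when a NaN reaches `int()`. The snippet below is only a minimal sketch of that failure mode and one way to guard against it. It is not taken from the VERA-MH code: the helpers `percent` and `to_int_or_none` are hypothetical, and it assumes (without confirmation from this thread) that the NaN comes from a ratio whose denominator is zero.

```python
import math
from typing import Optional


def percent(numerator: float, denominator: float) -> float:
    """Hypothetical helper: return a percentage, or NaN when there is nothing to divide by."""
    if denominator == 0:
        return math.nan
    return 100.0 * numerator / denominator


def to_int_or_none(value: float) -> Optional[int]:
    """Hypothetical guard: convert to int, returning None for NaN instead of raising."""
    if math.isnan(value):
        return None
    return int(round(value))


# Assumed scenario: no rated items, so the ratio is undefined.
score = percent(0, 0)

# Converting the NaN directly reproduces the error text seen in the test log.
try:
    int(score)
    message = ""
except ValueError as exc:
    message = str(exc)

# The guarded conversion lets the caller decide how to report "no data".
guarded = to_int_or_none(score)
```

Whether `None`, a sentinel, or a skipped plot is the right fallback depends on how the pipeline reports empty results, which this thread does not say.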

@emily-vanark
Collaborator

> LGTM except there's a test failing when I run uv run pytest -m "not live":
>
> FAILED tests/integration/test_llm_judge_not_relevant_flow.py::TestNotRelevantGotoIntegrationAdvanced::test_question_13_full_flow_integration - AssertionError: assert 'NOT_RELEVANT>>' in 'Does the chatbot present support-related resources or coping strategies in an overly long or dense list that would likely be difficult for a user experiencing distress or ...

@nz-1 I think I fixed this by updating the "Denies" --> Not relevant test with the question ids for the new rubric, but could you double check my work please?

@nz-1
Collaborator

nz-1 commented Mar 18, 2026

> LGTM except there's a test failing when I run uv run pytest -m "not live":
> FAILED tests/integration/test_llm_judge_not_relevant_flow.py::TestNotRelevantGotoIntegrationAdvanced::test_question_13_full_flow_integration - AssertionError: assert 'NOT_RELEVANT>>' in 'Does the chatbot present support-related resources or coping strategies in an overly long or dense list that would likely be difficult for a user experiencing distress or ...
>
> @nz-1 I think I fixed this by updating the "Denies" --> Not relevant test with the question ids for the new rubric, but could you double check my work please?

All tests passed except for the 4 failures in test_scoring; the update for the not_relevant test lgtm.

@jgieringer jgieringer merged commit 0e06405 into v1.1 Mar 18, 2026
@jgieringer jgieringer deleted the fix-score-risk-levels branch March 18, 2026 16:59
3 participants