bump TA release to use auto-generated confidence levels #58282

davidsbailey · 2024-04-26T17:59:47Z

I used code-dot-org/aiproxy#66 to generate a new release. the only product-facing differences are the updates to the confidence.json and confidence-exact.json files. there are also changes to output and cached_responses which do not affect what we show to users.

diffs

to see what changed, I created "diffable" copies of the releases which vary only by confidence json files (viewable under experiments in s3), ignoring changes to output and cached_responses. I've shortened the output by removing cosmetic differences as well as lines which appeared in the diff but did not actually change:

(.venv) Dave-MBP:~/src/aiproxy/experiments (generate-confidence-json)$ diff -r 2024-03-25-confidence-exact-diffable 2024-04-26-confidence-autogen-diffable 
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L11/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L11/confidence.json
2c2
<   "Program Development - Program Sequence": "MEDIUM",
---
>   "Program Development - Program Sequence": "HIGH",
4c4
<   "Position - Elements and the Coordinate System": "MEDIUM"
---
>   "Position - Elements and the Coordinate System": "HIGH"
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L14/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L14/confidence.json
3,4c3,4
<   "Position and Movement": "MEDIUM",
---
>   "Position and Movement": "HIGH",
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L18/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L18/confidence.json
1,6c1,6
< {
<   "Position and Movement": "MEDIUM"
< }
---
> {
>   "Position and Movement": "HIGH"
> }
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L21/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L21/confidence.json
2,4c2,4
<   "Program Development - Program Sequence": "MEDIUM",
<   "Modularity - Multiple Sprites": "MEDIUM",
<   "Algorithms and Control - Player Control Conditionals": "MEDIUM",
---
>   "Program Development - Program Sequence": "HIGH",
>   "Modularity - Multiple Sprites": "HIGH",
>   "Algorithms and Control - Player Control Conditionals": "HIGH",
6c6
<   "Algorithms and Control - Interaction Conditionals": "MEDIUM",
---
>   "Algorithms and Control - Interaction Conditionals": "HIGH",
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L24/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L24/confidence.json
1,7c1,7
< {
<   "Algorithms and Control - Looping Conditionals": "MEDIUM",
< }
---
> {
>   "Algorithms and Control - Looping Conditionals": "HIGH",
> }
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L28/confidence-exact.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L28/confidence-exact.json
2,7d1
<   "Program Development - Program Sequence": {
<     "Extensive Evidence": "LOW",
<     "Convincing Evidence": "LOW",
<     "Limited Evidence": "LOW",
<     "No Evidence": "LOW"
<   },
12c6
<     "No Evidence": "LOW"
---
>     "No Evidence": "HIGH"

^^ Program Development - Program Sequence is not "ai enabled" for this lesson in levelbuilder, so it seems ok that it is being removed here.

diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L28/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L28/confidence.json
1,8c1,8
< {
<   "Algorithms and Control - Backgrounds and Variables": "MEDIUM",
<   "Position and Movement": "HIGH",
<   "Variables": "HIGH"
< }
---
> {
>   "Algorithms and Control - Backgrounds and Variables": "LOW",
>   "Position and Movement": "LOW",
>   "Variables": "LOW"
> }
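
As a cross-check on the hand-trimmed `diff -r` output above, the same comparison can be made on the parsed JSON, so that cosmetic differences (whitespace, key escaping) drop out on their own. This is a minimal sketch and not part of the release tooling. The helper names `load_levels`, `flatten` and `diff_releases` are hypothetical, and it assumes each release directory contains per-lesson subdirectories holding `confidence.json` and `confidence-exact.json`, as the paths in the diff suggest.

```python
import json
from pathlib import Path


def load_levels(release_dir):
    """Map 'lesson/file.json' -> parsed JSON for every confidence*.json under release_dir."""
    root = Path(release_dir)
    return {
        p.relative_to(root).as_posix(): json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(root.rglob("confidence*.json"))
    }


def flatten(obj, prefix=()):
    """Flatten nested dicts into {(key, subkey, ...): leaf_value}."""
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            out.update(flatten(value, prefix + (key,)))
        return out
    return {prefix: obj}


def diff_releases(old_dir, new_dir):
    """Return (file, key path, old value, new value) for every leaf that differs.

    A value of None means the key (or the whole file) is absent on that side.
    """
    old, new = load_levels(old_dir), load_levels(new_dir)
    changes = []
    for name in sorted(set(old) | set(new)):
        a = flatten(old.get(name, {}))
        b = flatten(new.get(name, {}))
        for path in sorted(set(a) | set(b)):
            if a.get(path) != b.get(path):
                changes.append((name, " / ".join(path), a.get(path), b.get(path)))
    return changes


if __name__ == "__main__":
    # Directory names taken from the diff command above.
    for change in diff_releases(
        "2024-03-25-confidence-exact-diffable",
        "2024-04-26-confidence-autogen-diffable",
    ):
        print(change)
```

Because the files are compared after parsing, a key written with curly quotes and the same key written with `\u201c` / `\u201d` escapes are treated as identical and do not show up as a change.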

there is one superficial part of the diffs above which I've removed in a few places, which seems worth highlighting:

diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L14/confidence-exact.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L14/confidence-exact.json
14c14
<   "Optional “Stretch” Feature - Variables": {
---
>   "Optional \u201cStretch\u201d Feature - Variables": {

I covered this in manual testing to make sure it did not introduce any problems.
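
For context on why this difference is cosmetic: in JSON, `\u201c` and `\u201d` are escape sequences for the same curly-quote characters, so a conforming parser yields identical keys from either spelling. The sketch below illustrates this with Python's standard `json` module. The `"No Evidence": "HIGH"` value is a placeholder rather than the real contents of the L14 file, and the idea that the escaped form comes from a serializer's ASCII-only default is an assumption about the generator, not something confirmed here.

```python
import json

# The same rubric key written two ways: raw curly quotes vs. \u escapes.
# The nested value is a placeholder, not the real file contents.
raw = '{"Optional “Stretch” Feature - Variables": {"No Evidence": "HIGH"}}'
escaped = '{"Optional \\u201cStretch\\u201d Feature - Variables": {"No Evidence": "HIGH"}}'

parsed_raw = json.loads(raw)
parsed_escaped = json.loads(escaped)

# Both spellings parse to the same key, so lookups by learning goal still match.
assert parsed_raw == parsed_escaped
assert "Optional “Stretch” Feature - Variables" in parsed_escaped

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# which is one way the \u form can appear in a generated file.
ascii_text = json.dumps(parsed_raw)
utf8_text = json.dumps(parsed_raw, ensure_ascii=False)
print(ascii_text)
print(utf8_text)
```

This only shows that the two files are equivalent once parsed. It does not replace the manual end-to-end check, since a consumer that compared raw file text instead of parsed keys would still see a difference.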

screenshots

these screenshots are to manually verify that two new scenarios are working end to end:

low ai confidence

IIRC this is the first time we've shown low ai confidence in the UI, so here it is working:

special characters

this shows that the conversion of curly quotes to \u syntax in Optional \u201cStretch\u201d Feature - Variables does not break the exact confidence functionality:

Testing story

EvaluateRubricJob.new.validate_ai_config is passing
manual verification in screenshots

Follow-up work

this PR just updates the current release to use auto-generated confidence levels using the existing GPT 4 model. switching to GPT 4 Turbo will be the next step.

bump TA release to use auto-generated confidence jsons

8047df2

davidsbailey mentioned this pull request Apr 26, 2024

add rubric tester option to generate confidence.json and confidence-exact.json code-dot-org/aiproxy#66

Merged

davidsbailey marked this pull request as ready for review April 26, 2024 19:52

davidsbailey requested a review from a team April 26, 2024 19:52

davidsbailey mentioned this pull request Apr 26, 2024

move TA onto GPT 4 Turbo #58290

Merged

davidsbailey merged commit a007bb4 into staging May 1, 2024
2 checks passed

davidsbailey deleted the aitt-confidence-autogen-release branch May 1, 2024 18:11
