bump TA release to use auto-generated confidence levels #58282

Merged
merged 1 commit into from
May 1, 2024

davidsbailey
Member

@davidsbailey davidsbailey commented Apr 26, 2024

I used code-dot-org/aiproxy#66 to generate a new release. The only product-facing differences are the updates to the confidence.json and confidence-exact.json files. There are also changes to output and cached_responses, which do not affect what we show to users.

diffs

To see what changed, I created "diffable" copies of the releases that differ only in their confidence JSON files (viewable under experiments in S3), ignoring changes to output and cached_responses. I've shortened the output by removing cosmetic differences as well as lines that appeared in the diff but did not actually change:

(.venv) Dave-MBP:~/src/aiproxy/experiments (generate-confidence-json)$ diff -r 2024-03-25-confidence-exact-diffable 2024-04-26-confidence-autogen-diffable 
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L11/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L11/confidence.json
2c2
<   "Program Development - Program Sequence": "MEDIUM",
---
>   "Program Development - Program Sequence": "HIGH",
4c4
<   "Position - Elements and the Coordinate System": "MEDIUM"
---
>   "Position - Elements and the Coordinate System": "HIGH"
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L14/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L14/confidence.json
3,4c3,4
<   "Position and Movement": "MEDIUM",
---
>   "Position and Movement": "HIGH",
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L18/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L18/confidence.json
1,6c1,6
< {
<   "Position and Movement": "MEDIUM"
< }
---
> {
>   "Position and Movement": "HIGH"
> }
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L21/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L21/confidence.json
2,4c2,4
<   "Program Development - Program Sequence": "MEDIUM",
<   "Modularity - Multiple Sprites": "MEDIUM",
<   "Algorithms and Control - Player Control Conditionals": "MEDIUM",
---
>   "Program Development - Program Sequence": "HIGH",
>   "Modularity - Multiple Sprites": "HIGH",
>   "Algorithms and Control - Player Control Conditionals": "HIGH",
6c6
<   "Algorithms and Control - Interaction Conditionals": "MEDIUM",
---
>   "Algorithms and Control - Interaction Conditionals": "HIGH",
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L24/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L24/confidence.json
1,7c1,7
< {
<   "Algorithms and Control - Looping Conditionals": "MEDIUM",
< }
---
> {
>   "Algorithms and Control - Looping Conditionals": "HIGH",
> }
diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L28/confidence-exact.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L28/confidence-exact.json
2,7d1
<   "Program Development - Program Sequence": {
<     "Extensive Evidence": "LOW",
<     "Convincing Evidence": "LOW",
<     "Limited Evidence": "LOW",
<     "No Evidence": "LOW"
<   },
12c6
<     "No Evidence": "LOW"
---
>     "No Evidence": "HIGH"

^^ Program Development - Program Sequence is not "ai enabled" for this lesson in levelbuilder, so it seems ok that it is being removed here.

diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L28/confidence.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L28/confidence.json
1,8c1,8
< {
<   "Algorithms and Control - Backgrounds and Variables": "MEDIUM",
<   "Position and Movement": "HIGH",
<   "Variables": "HIGH"
< }
---
> {
>   "Algorithms and Control - Backgrounds and Variables": "LOW",
>   "Position and Movement": "LOW",
>   "Variables": "LOW"
> }
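
For anyone who wants to reproduce the "diffable" copies, here is a minimal sketch of one way to build them. It is not necessarily how the copies above were made: the function name `make_diffable_copy` and the assumption that each lesson directory holds its confidence files at the top level are illustrative only.

```python
import shutil
from pathlib import Path

# The two files whose contents are product-facing (per the description above).
CONFIDENCE_FILES = ("confidence.json", "confidence-exact.json")


def make_diffable_copy(release_dir, diffable_dir):
    """Copy only the confidence JSON files, keeping the lesson subdirectories.

    Hypothetical helper: everything else in the release (output,
    cached_responses, ...) is left behind so `diff -r` stays readable.
    """
    release_dir, diffable_dir = Path(release_dir), Path(diffable_dir)
    copied = []
    for src in sorted(release_dir.rglob("*.json")):
        if src.name not in CONFIDENCE_FILES:
            continue  # skip output, cached responses, and any other JSON
        dest = diffable_dir / src.relative_to(release_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        copied.append(dest)
    return copied
```

Running this once per release and then `diff -r` on the two resulting directories should give output shaped like the diff above.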

I also removed one superficial difference that appeared in a few places in the diffs above, which seems worth highlighting:

diff -r 2024-03-25-confidence-exact-diffable/csd3-2023-L14/confidence-exact.json 2024-04-26-confidence-autogen-diffable/csd3-2023-L14/confidence-exact.json
14c14
<   "Optional “Stretch” Feature - Variables": {
---
>   "Optional \u201cStretch\u201d Feature - Variables": {

I covered this in manual testing to make sure it did not introduce any problems.
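
As a quick sanity check on why this difference is only cosmetic, the sketch below shows that both spellings parse to the same JSON key. Python's `json.dumps` escapes non-ASCII characters by default (`ensure_ascii=True`), which is one plausible cause of the change. That cause is an assumption, not something confirmed here, and the `PLACEHOLDER` value is made up for illustration.

```python
import json

KEY = "Optional “Stretch” Feature - Variables"
PLACEHOLDER = {"No Evidence": "LOW"}  # made-up value, not taken from the release

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped_text = json.dumps({KEY: PLACEHOLDER})
# With ensure_ascii=False the curly quotes are written literally.
literal_text = json.dumps({KEY: PLACEHOLDER}, ensure_ascii=False)

assert "\\u201cStretch\\u201d" in escaped_text
assert "“Stretch”" in literal_text

# Both texts decode to the same object, so key lookups behave identically.
assert json.loads(escaped_text) == json.loads(literal_text)
assert KEY in json.loads(escaped_text)
```

Any consumer that parses the file with a JSON library sees the same key either way; only code that compares the raw file text would notice the difference.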

screenshots

these screenshots are to manually verify that two new scenarios are working end to end:

low ai confidence

IIRC this is the first time we've shown low ai confidence in the UI, so here it is working:

Screenshot 2024-04-26 at 12 38 45 PM

special characters

this shows that the conversion of curly quotes to \u syntax in Optional \u201cStretch\u201d Feature - Variables does not break the exact confidence functionality:
Screenshot 2024-04-26 at 12 48 01 PM

Testing story

  • EvaluateRubricJob.new.validate_ai_config is passing
  • manual verification in the screenshots above

Follow-up work

this PR just updates the current release to use auto-generated confidence levels using the existing GPT 4 model. switching to GPT 4 Turbo will be the next step.

@davidsbailey davidsbailey marked this pull request as ready for review April 26, 2024 19:52
@davidsbailey davidsbailey requested a review from a team April 26, 2024 19:52
@davidsbailey davidsbailey merged commit a007bb4 into staging May 1, 2024
2 checks passed
@davidsbailey davidsbailey deleted the aitt-confidence-autogen-release branch May 1, 2024 18:11