bump TA release to use auto-generated confidence levels #58282
Merged
I used code-dot-org/aiproxy#66 to generate a new release. the only product-facing differences are the updates to the confidence.json and confidence-exact.json files. there are also changes to output and cached_responses, which do not affect what we show to users.
diffs
to see what changed, I created "diffable" copies of the releases which vary only by confidence json files (viewable under experiments in s3), ignoring changes to output and cached_responses. I've shortened the output by removing cosmetic differences as well as lines which appeared in the diff but did not actually change:
^^ Program Development - Program Sequence is not "ai enabled" for this lesson in levelbuilder, so it seems ok that it is being removed here.
there is one superficial part of the diffs above which I've removed in a few places, which seems worth highlighting:
I covered this in manual testing to make sure it did not introduce any problems.
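for reviewers who want to repeat the comparison described above, here is a minimal sketch of diffing two confidence files in Ruby. it assumes each file parses to a flat hash of learning goal to confidence level, which may not match the real layout; the helper name confidence_diff, the goal names, and the level values are all made up for illustration.

```ruby
require 'json'

# Compare two parsed confidence files.
# Assumption: each file is a flat { learning goal => confidence level } hash;
# the real confidence.json layout may differ.
def confidence_diff(old_conf, new_conf)
  {
    added: (new_conf.keys - old_conf.keys).sort,
    removed: (old_conf.keys - new_conf.keys).sort,
    changed: (old_conf.keys & new_conf.keys).select { |k| old_conf[k] != new_conf[k] }.sort
  }
end

# Made-up sample data, not taken from any real release.
old_conf = JSON.parse('{"Goal A": "MEDIUM", "Goal B": "HIGH", "Goal C": "LOW"}')
new_conf = JSON.parse('{"Goal A": "LOW", "Goal B": "HIGH"}')

diff = confidence_diff(old_conf, new_conf)
puts JSON.pretty_generate(diff)
```

comparing parsed hashes rather than raw text sidesteps cosmetic differences such as key order or whitespace, which is the same noise that was trimmed by hand from the diffs.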
screenshots
these screenshots are to manually verify that two new scenarios are working end to end:
low ai confidence
IIRC this is the first time we've shown low ai confidence in the UI, so here it is working:
special characters
this shows that the conversion of curly quotes to \u syntax in Optional \u201cStretch\u201d Feature - Variables does not break the exact confidence functionality:
Testing story
EvaluateRubricJob.new.validate_ai_config is passing
Follow-up work
this PR just updates the current release to use confidence levels auto-generated with the existing GPT-4 model. switching to GPT-4 Turbo will be the next step.
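as a footnote to the special characters check above: the \u escapes are only a different JSON spelling of the same characters, so a lookup by the original name should still match after parsing. the sketch below shows this with Ruby's standard json library. the single-entry hash and the "HIGH" value are hypothetical, and this is not the actual lookup code used for exact confidence.

```ruby
require 'json'

# Learning goal name containing curly quotes (U+201C and U+201D).
name = "Optional \u201cStretch\u201d Feature - Variables"

# ascii_only: true makes the generator write non-ASCII characters as \uXXXX escapes.
# The value "HIGH" is a placeholder, not a real confidence level from the release.
escaped = JSON.generate({ name => "HIGH" }, ascii_only: true)
puts escaped

# Parsing turns the escapes back into the original characters,
# so the key still matches the unescaped name.
parsed = JSON.parse(escaped)
puts parsed.key?(name)
```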