feat(LLMObs): update the prompt annotation #12551
Benchmarks
Benchmark execution time: 2025-05-06 14:49:14
Comparing candidate commit f6c2446 in PR branch
Found 14 performance improvements and 7 performance regressions! Performance is the same for 479 metrics, 8 unstable metrics.
scenario:iast_aspects-modulo_aspect_for_bytes_bytearray
scenario:iast_aspects-ospathbasename_aspect
scenario:iast_aspects-ospathdirname_aspect
scenario:iast_aspects-ospathjoin_aspect
scenario:iast_aspects-ospathsplit_aspect
scenario:iast_aspects-ospathsplitext_aspect
scenario:otelspan-start
scenario:packagesupdateimporteddependencies-import_many
scenario:packagesupdateimporteddependencies-import_many_cached
scenario:packagesupdateimporteddependencies-import_many_stdlib
scenario:packagesupdateimporteddependencies-import_many_stdlib_cached
scenario:packagesupdateimporteddependencies-import_many_unknown
scenario:packagesupdateimporteddependencies-import_many_unknown_cached
scenario:packagesupdateimporteddependencies-import_one
scenario:packagesupdateimporteddependencies-import_one_cache
scenario:packagesupdateimporteddependencies-import_one_stdlib
scenario:packagesupdateimporteddependencies-import_one_stdlib_cache
scenario:packagesupdateimporteddependencies-import_one_unknown
scenario:packagesupdateimporteddependencies-import_one_unknown_cache
scenario:samplingrules-average_match
scenario:samplingrules-low_match
Bootstrap import analysis
Comparison of import times between this PR and base.
Summary
The average import time from this PR is: 273 ± 5 ms.
The average import time from base is: 274 ± 4 ms.
The import time difference between this PR and base is: -0.9 ± 0.2 ms.
Import time breakdown
The following import paths have shrunk:
sabrenner
left a comment
did a quick first pass before our sync review tomorrow - i agree with evan that we might want to not have this on LLMObs.llm for now since we can't do it on @llm nicely as well. let's explore if that's possible within your timeframe, otherwise it can be done in a follow-up!
didn't review the validation functions too closely, we can go over them in the sync review tomorrow
releasenotes/notes/update-prompt-annotation-0fa90edf6829fe1d.yaml
This pull request has been automatically closed after a period of inactivity.
Performance SLOs
Comparing candidate yahya/update-prompt-annotation (d9d6fc9) with baseline main (fcacd83)

📈 Performance Regressions (1 suite)

📈 iastaspectsospath - 24/24
✅ ospathbasename_aspect
  Time: ✅ 5.318µs (SLO: <10.000µs 📉 -46.8%) vs baseline: 📈 +21.9%
  Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +5.0%
✅ ospathbasename_noaspect
  Time: ✅ 1.088µs (SLO: <10.000µs 📉 -89.1%) vs baseline: +0.7%
  Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +4.7%
✅ ospathjoin_aspect
  Time: ✅ 6.666µs (SLO: <10.000µs 📉 -33.3%) vs baseline: +10.0%
  Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +5.0%
✅ ospathjoin_noaspect
  Time: ✅ 2.296µs (SLO: <10.000µs 📉 -77.0%) vs baseline: +0.4%
  Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.9%
✅ ospathnormcase_aspect
  Time: ✅ 3.473µs (SLO: <10.000µs 📉 -65.3%) vs baseline: ~same
  Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.9%
✅ ospathnormcase_noaspect
  Time: ✅ 0.574µs (SLO: <10.000µs 📉 -94.3%) vs baseline: +0.9%
  Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +4.9%
✅ ospathsplit_aspect
  Time: ✅ 4.920µs (SLO: <10.000µs 📉 -50.8%) vs baseline: -0.2%
  Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +5.1%
✅ ospathsplit_noaspect
  Time: ✅ 1.612µs (SLO: <10.000µs 📉 -83.9%) vs baseline: +2.2%
  Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.9%
✅ ospathsplitdrive_aspect
  Time: ✅ 4.233µs (SLO: <10.000µs 📉 -57.7%) vs baseline: 📈 +13.4%
  Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.7%
✅ ospathsplitdrive_noaspect
  Time: ✅ 0.699µs (SLO: <10.000µs 📉 -93.0%) vs baseline: +0.4%
  Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.9%
✅ ospathsplitext_aspect
  Time: ✅ 4.636µs (SLO: <10.000µs 📉 -53.6%) vs baseline: +0.7%
  Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +5.0%
✅ ospathsplitext_noaspect
  Time: ✅ 1.381µs (SLO: <10.000µs 📉 -86.2%) vs baseline: -0.4%
  Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.9%

🟡 Near SLO Breach (2 suites)

🟡 djangosimple - 28/28
✅ appsec
  Time: ✅ 20.521ms (SLO: <22.300ms -8.0%) vs baseline: -0.2%
  Memory: ✅ 65.209MB (SLO: <67.000MB -2.7%) vs baseline: +4.8%
✅ exception-replay-enabled
  Time: ✅ 1.346ms (SLO: <1.450ms -7.1%) vs baseline: +0.5%
  Memory: ✅ 64.105MB (SLO: <67.000MB -4.3%) vs baseline: +4.8%
✅ iast
  Time: ✅ 20.495ms (SLO: <22.250ms -7.9%) vs baseline: -0.3%
  Memory: ✅ 65.196MB (SLO: <67.000MB -2.7%) vs baseline: +4.7%
✅ profiler
  Time: ✅ 15.249ms (SLO: <16.550ms -7.9%) vs baseline: -0.4%
  Memory: ✅ 53.302MB (SLO: <54.500MB -2.2%) vs baseline: +4.7%
✅ span-code-origin
  Time: ✅ 26.125ms (SLO: <28.200ms -7.4%) vs baseline: ~same
  Memory: ✅ 67.421MB (SLO: <69.500MB -3.0%) vs baseline: +4.9%
✅ tracer
  Time: ✅ 20.498ms (SLO: <21.750ms -5.8%) vs baseline: ~same
  Memory: ✅ 65.278MB (SLO: <67.000MB -2.6%) vs baseline: +4.9%
✅ tracer-and-profiler
  Time: ✅ 22.084ms (SLO: <23.500ms -6.0%) vs baseline: -0.7%
  Memory: ✅ 66.414MB (SLO: <67.500MB 🟡 -1.6%) vs baseline: +5.0%
✅ tracer-dont-create-db-spans
  Time: ✅ 19.415ms (SLO: <21.500ms -9.7%) vs baseline: +0.2%
  Memory: ✅ 65.243MB (SLO: <66.000MB 🟡 -1.1%) vs baseline: +4.7%
✅ tracer-minimal
  Time: ✅ 16.661ms (SLO: <17.500ms -4.8%) vs baseline: ~same
  Memory: ✅ 65.047MB (SLO: <66.000MB 🟡 -1.4%) vs baseline: +5.1%
✅ tracer-native
  Time: ✅ 20.511ms (SLO: <21.750ms -5.7%) vs baseline: -0.2%
  Memory: ✅ 71.054MB (SLO: <72.500MB 🟡 -2.0%) vs baseline: +4.8%
✅ tracer-no-caches
  Time: ✅ 18.422ms (SLO: <19.650ms -6.2%) vs baseline: -0.1%
  Memory: ✅ 65.248MB (SLO: <67.000MB -2.6%) vs baseline: +4.9%
✅ tracer-no-databases
  Time: ✅ 18.778ms (SLO: <20.100ms -6.6%) vs baseline: -0.3%
  Memory: ✅ 64.863MB (SLO: <67.000MB -3.2%) vs baseline: +4.8%
✅ tracer-no-middleware
  Time: ✅ 20.166ms (SLO: <21.500ms -6.2%) vs baseline: -0.3%
  Memory: ✅ 65.254MB (SLO: <67.000MB -2.6%) vs baseline: +4.9%
✅ tracer-no-templates
  Time: ✅ 20.360ms (SLO: <22.000ms -7.5%) vs baseline: ~same
  Memory: ✅ 65.220MB (SLO: <67.000MB -2.7%) vs baseline: +4.8%

🟡 otelspan - 22/22
✅ add-event
  Time: ✅ 45.274ms (SLO: <47.150ms -4.0%) vs baseline: +0.2%
  Memory: ✅ 45.063MB (SLO: <47.000MB -4.1%) vs baseline: +4.9%
✅ add-metrics
  Time: ✅ 321.492ms (SLO: <344.800ms -6.8%) vs baseline: ~same
  Memory: ✅ 553.509MB (SLO: <562.000MB 🟡 -1.5%) vs baseline: +5.1%
✅ add-tags
  Time: ✅ 292.543ms (SLO: <314.000ms -6.8%) vs baseline: ~same
  Memory: ✅ 554.131MB (SLO: <563.500MB 🟡 -1.7%) vs baseline: +4.8%
✅ get-context
  Time: ✅ 82.651ms (SLO: <92.350ms 📉 -10.5%) vs baseline: ~same
  Memory: ✅ 40.124MB (SLO: <46.500MB 📉 -13.7%) vs baseline: +4.6%
✅ is-recording
  Time: ✅ 42.726ms (SLO: <44.500ms -4.0%) vs baseline: -0.7%
  Memory: ✅ 44.399MB (SLO: <47.500MB -6.5%) vs baseline: +4.6%
✅ record-exception
  Time: ✅ 61.656ms (SLO: <67.650ms -8.9%) vs baseline: +0.3%
  Memory: ✅ 40.412MB (SLO: <47.000MB 📉 -14.0%) vs baseline: +4.7%
✅ set-status
  Time: ✅ 48.863ms (SLO: <50.400ms -3.1%) vs baseline: +0.6%
  Memory: ✅ 44.347MB (SLO: <47.000MB -5.6%) vs baseline: +4.6%
✅ start
  Time: ✅ 42.065ms (SLO: <43.450ms -3.2%) vs baseline: +0.3%
  Memory: ✅ 44.358MB (SLO: <47.000MB -5.6%) vs baseline: +4.7%
✅ start-finish
  Time: ✅ 84.306ms (SLO: <88.000ms -4.2%) vs baseline: +2.0%
  Memory: ✅ 34.544MB (SLO: <46.500MB 📉 -25.7%) vs baseline: +4.9%
✅ start-finish-telemetry
  Time: ✅ 84.203ms (SLO: <89.000ms -5.4%) vs baseline: ~same
  Memory: ✅ 34.544MB (SLO: <46.500MB 📉 -25.7%) vs baseline: +4.8%
✅ update-name
  Time: ✅ 44.116ms (SLO: <45.150ms -2.3%) vs baseline: ~same
  Memory: ✅ 44.654MB (SLO: <47.000MB -5.0%) vs baseline: +4.8%
09a5791 to 7f2c7ce
9099b25 to d08119a
…in (#14671)

## Description

https://python.langchain.com/api_reference/core/prompts.html#

Recently added support for auto-instrumentation of prompt templates in langchain, but the way it was done only worked when the templates were simple strings, due to the constraints of our prompt template type/back compatibility. following [this pr](#12551), those issues are dealt with. here, we fix that original half-effort and enable instrumentation that can properly support multi-message templates.

this is a shadow release, while the FE and event handling finishes, thus no changelog

known issue with the reflection prompt id scheme: in certain lcel situations, as well as certain batched situations, seems like the reflection isn't working. wondering if it has to do with some threading langchain may be doing? if so, one approach could be to wrap __init__, and grab what we can there. but the variable isn't assigned at that point, so we can probably only get module name (and the enclosing function name maybe?). a hybrid approach might be to take the current at-invocation time method when it works, and fall back on the module_name.function_name? open to ideas.

## Testing

<!-- Describe your testing strategy or note what tests are included -->

## Risks

<!-- Note any risks associated with this change, or "None" if no risks -->

## Additional Notes

<!-- Any other information that would be helpful for reviewers -->

---------

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
This PR updates the prompt typed dict, extending it with a couple of fields:
It also adds a strict validation mode. Strict validation adds the following checks:
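The field list and the individual checks are not reproduced here, so the following is only a minimal sketch of the general pattern described: a `TypedDict` extended with extra keys, plus a validator with an opt-in strict mode. The names `Prompt`, `ExtendedPrompt`, `validate_prompt`, the fields `tags` and `chat_template`, and the three strict checks are all assumptions chosen for illustration, not the fields or checks this PR actually adds.

```python
from typing import Any, Dict, List, TypedDict


class Prompt(TypedDict, total=False):
    """Hypothetical base shape; the real field names may differ."""
    id: str
    version: str
    template: str
    variables: Dict[str, str]


class ExtendedPrompt(Prompt, total=False):
    """Hypothetical extension with two illustrative extra fields."""
    tags: Dict[str, str]
    chat_template: List[Dict[str, str]]


# All keys declared on the extended typed dict, inherited ones included.
ALLOWED_KEYS = ExtendedPrompt.__required_keys__ | ExtendedPrompt.__optional_keys__


def validate_prompt(prompt: Any, strict: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the prompt passed."""
    if not isinstance(prompt, dict):
        return ["prompt must be a dict"]
    errors: List[str] = []
    # Basic type checks run in both modes.
    for key in ("id", "version", "template"):
        if key in prompt and not isinstance(prompt[key], str):
            errors.append(f"{key} must be a string")
    if strict:
        # Assumed strict-only checks, picked purely for illustration.
        unknown = sorted(str(k) for k in set(prompt) - ALLOWED_KEYS)
        if unknown:
            errors.append(f"unknown keys: {', '.join(unknown)}")
        if not prompt.get("id"):
            errors.append("id is required in strict mode")
        if ("template" in prompt) == ("chat_template" in prompt):
            errors.append("provide exactly one of template or chat_template")
    return errors
```

Returning a list of messages rather than raising keeps the non-strict path backward compatible in this sketch: existing callers see no new failures unless they opt in with `strict=True`.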