Different approach to determining final confidence level of prompt injection evaluation outcomes#6729
Merged
dorien-koelemeijer merged 1 commit intomainfrom Jan 29, 2026
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR adjusts how the prompt-injection scanner combines the tool-level and conversation-context classifier outputs to derive a final security confidence score and logging details. The goal appears to be a more nuanced confidence-combination heuristic and richer logging while keeping the external ScanResult interface stable.
Changes:
- Replace context-aware result selection (
select_result_with_context_awareness) with a new numeric combination heuristic incombine_confidences, using both tool and context confidences. - Update logging in
analyze_tool_call_with_contextto structured fields, including per-signal confidences, presence of ML and pattern matches, and the effective malicious decision. - Build the final
ScanResultfrom a syntheticDetailedScanResultthat uses the combined confidence along with the tool’s pattern matches and ML confidence.
c098b4e to
95465f8
Compare
michaelneale
added a commit
that referenced
this pull request
Jan 29, 2026
* main: (30 commits) Different approach to determining final confidence level of prompt injection evaluation outcomes (#6729) fix: read_resource_tool deadlock causing test_compaction to hang (#6737) Upgrade error handling (#6747) Fix/filter audience 6703 local (#6773) chore: re-sync package-lock.json (#6783) upgrade electron to 39.3.0 (#6779) allow skipping providers in test_providers.sh (#6778) fix: enable custom model entry for OpenRouter provider (#6761) Remove codex skills flag support (#6775) Improve mcp test (#6671) Feat/anthropic custom headers (#6774) Fix/GitHub copilot error handling 5845 (#6771) fix(ui): respect width parameter in MCP app size-changed notifications (#6376) fix: address compilation issue in main (#6776) Upgrade GitHub Actions for Node 24 compatibility (#6699) fix(google): preserve thought signatures in streaming responses (#6708) added reduce motion support for css animations and streaming text (#6551) fix: Re-enable subagents for Gemini models (#6513) fix(google): use parametersJsonSchema for full JSON Schema support (#6555) fix: respect GOOSE_CLI_MIN_PRIORITY for shell streaming output (#6558) ...
This was referenced Jan 29, 2026
zanesq
added a commit
that referenced
this pull request
Jan 29, 2026
* 'main' of github.com:block/goose: (62 commits) Swap canonical model from openrouter to models.dev (#6625) Hook thinking status (#6815) Fetch new skills hourly (#6814) copilot instructions: Update "No prerelease docs" instruction (#6795) refactor: centralize audience filtering before providers receive messages (#6728) update doc to remind contributors to activate hermit and document minimal npm and node version (#6727) nit: don't spit out compaction when in term mode as it fills up the screen (#6799) fix: correct tool support detection in Tetrate provider model fetching (#6808) Session manager fixes (#6809) fix(desktop): handle quoted paths with spaces in extension commands (#6430) fix: we can default gooseignore without writing it out (#6802) fix broken link (#6810) docs: add Beads MCP extension tutorial (#6792) feat(goose): add support for AWS_BEARER_TOKEN_BEDROCK environment variable (#6739) [docs] Add OSS Skills Marketplace (#6752) feat: make skills available in codemode (#6763) Fix: Recipe Extensions Not Loading in Desktop (#6777) Different approach to determining final confidence level of prompt injection evaluation outcomes (#6729) fix: read_resource_tool deadlock causing test_compaction to hang (#6737) Upgrade error handling (#6747) ...
zanesq
added a commit
that referenced
this pull request
Jan 29, 2026
…sion-session * 'main' of github.com:block/goose: (78 commits) copilot instructions: Update "No prerelease docs" instruction (#6795) refactor: centralize audience filtering before providers receive messages (#6728) update doc to remind contributors to activate hermit and document minimal npm and node version (#6727) nit: don't spit out compaction when in term mode as it fills up the screen (#6799) fix: correct tool support detection in Tetrate provider model fetching (#6808) Session manager fixes (#6809) fix(desktop): handle quoted paths with spaces in extension commands (#6430) fix: we can default gooseignore without writing it out (#6802) fix broken link (#6810) docs: add Beads MCP extension tutorial (#6792) feat(goose): add support for AWS_BEARER_TOKEN_BEDROCK environment variable (#6739) [docs] Add OSS Skills Marketplace (#6752) feat: make skills available in codemode (#6763) Fix: Recipe Extensions Not Loading in Desktop (#6777) Different approach to determining final confidence level of prompt injection evaluation outcomes (#6729) fix: read_resource_tool deadlock causing test_compaction to hang (#6737) Upgrade error handling (#6747) Fix/filter audience 6703 local (#6773) chore: re-sync package-lock.json (#6783) upgrade electron to 39.3.0 (#6779) ...
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Simplifies the logic for combining tool and context confidence scores; the previous approach was slightly too aggressive in tuning down false positives by zeroing out confidence in certain cases. This refactor uses a more balanced threshold-based approach using dampening/boosting rules to reduce false positives.
We'll need to do some user testing to determine whether this approach is solid or whether we need to tweak this slightly later on.
Type of Change
AI Assistance
Testing
Quick local testing, but I can only test so much that way - will need broader user testing to determine if this is actually an improvement or if things need to be tweaked further.