Feature Proposal: Real-Time Deterministic RAG Telemetry (Confidence & Risk Scoring) #95

Pandidharan22 · 2026-04-10T17:44:31Z

Pandidharan22
Apr 10, 2026

Hi Team,

I've been going through your project's goals, and the problem statement "Is the RAG system returning relevant context or noise?" caught my eye. To complement your post-hoc LLM evaluation engine, I'd like to propose adding Real-Time Deterministic RAG Telemetry.

While an LLM-as-a-judge is really good for deep semantic grading, its API cost and latency are high, and it only runs after generation. Capturing two deterministic metrics, Retrieval Confidence and Hallucination Risk, both calculated from the vector similarity scores and their spread, gives us a warning/alert at the moment of retrieval with effectively zero added latency. This will alert developers to noisy retrieval before it shows up as a generation failure.
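To make the idea concrete, here is a minimal sketch of how the two scores could be derived from the similarity scores of the retrieved chunks. The formulas, the function names, and the `spread_weight` parameter are assumptions for illustration only; the exact definitions are still open and this is not the final implementation.

```python
from statistics import mean, pstdev


def retrieval_confidence(similarities: list[float]) -> float:
    """Assumed definition: mean similarity of the retrieved chunks, clamped to [0, 1]."""
    if not similarities:
        return 0.0  # nothing retrieved, so no confidence
    return min(1.0, max(0.0, mean(similarities)))


def hallucination_risk(similarities: list[float], spread_weight: float = 0.5) -> float:
    """Assumed definition: blend of low confidence and high spread, clamped to [0, 1]."""
    if not similarities:
        return 1.0  # no context at all is treated as maximum risk
    confidence = retrieval_confidence(similarities)
    # For scores in [0, 1] the population std dev is at most 0.5,
    # so doubling it rescales the spread to roughly [0, 1].
    spread = min(1.0, pstdev(similarities) * 2)
    risk = (1 - spread_weight) * (1.0 - confidence) + spread_weight * spread
    return min(1.0, max(0.0, risk))
```

Both functions are pure arithmetic over scores the vector store already returns, so they need no extra API call. With this weighting, a tight cluster of high scores yields low risk, while a low or widely scattered set of scores pushes the risk up.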

My proposed implementation:
I want to make sure that this solution plugs into the existing ingestion flow without any disruption:
(I) Add retrieval_confidence and hallucination_risk fields to the "SpanIngest" schema.
(II) Store these metrics alongside the existing token/latency data.
(III) Finally, add a REST endpoint that summarizes these metrics, and render a "Risk vs Confidence" scatterplot in the Agent metrics dashboard.
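The three steps above could look roughly like the sketch below. Only the `SpanIngest` name and the two new field names come from the proposal; the other fields (`span_id`, `total_tokens`, `latency_ms`), the use of a plain dataclass, and the `summarize_rag_telemetry` helper are hypothetical stand-ins, since the real schema and storage layer are not shown here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class SpanIngest:
    # Hypothetical stand-ins for the existing token/latency fields.
    span_id: str
    total_tokens: int
    latency_ms: float
    # Proposed fields; optional so existing clients keep working unchanged.
    retrieval_confidence: Optional[float] = None
    hallucination_risk: Optional[float] = None


def summarize_rag_telemetry(spans: list[SpanIngest]) -> dict:
    """Build the payload a summary endpoint might return for the scatterplot."""
    # Skip spans ingested without the new metrics.
    scored = [
        s for s in spans
        if s.retrieval_confidence is not None and s.hallucination_risk is not None
    ]
    points = [
        {"span_id": s.span_id, "confidence": s.retrieval_confidence, "risk": s.hallucination_risk}
        for s in scored
    ]
    return {
        "count": len(points),
        "avg_confidence": mean([p["confidence"] for p in points]) if points else None,
        "avg_risk": mean([p["risk"] for p in points]) if points else None,
        "points": points,  # one (confidence, risk) pair per span for the dashboard
    }
```

Making the new fields optional is one way to keep the ingestion flow undisturbed: spans sent without them are stored as before and are simply left out of the summary.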

I have already mapped this out locally and would like to take ownership of building it end-to-end. I'd like to hear your thoughts on this architecture: do you have any feedback, alternative suggestions, or tweaks to the ingestion strategy before I start building?

ShaanNarendran · 2026-04-17T07:00:20Z

ShaanNarendran
Apr 17, 2026
Maintainer

Super sorry for taking so long to reply; it's been a busy week for us. If you're still interested, I think the idea sounds solid. You could open a PR for it and test it, and we will also see how the approach compares to what we have planned for the eval.


Apoorvgarg-creator · 2026-04-19T04:32:13Z

Apoorvgarg-creator
Apr 19, 2026
Maintainer

@Pandidharan22 I'd suggest opening a pull request with your idea. If there's a benchmark result you can add to the pull request to show how effective the approach is, that would be super helpful.

Thanks for the contribution in advance!

