feat(llma): add evaluation reports frontend#54366
feat(llma): add evaluation reports frontend#54366andrewm4894 wants to merge 1 commit intoandy/llma-eval-reports-3-delivery-workflowsfrom
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
Hey @andrewm4894! 👋\nThis pull request seems to contain no description. Please add useful context, rationale, and/or any other information that will help make sense of this change now and in the distant Mars-based future. |
|
Size Change: +8.34 kB (+0.01%) Total Size: 129 MB
ℹ️ View Unchanged
|
Prompt To Fix All With AIThis is a comment left during a code review.
Path: products/llm_analytics/frontend/evaluations/evaluationReportLogic.ts
Line: 105-108
Comment:
**Client-side filtering on paginated endpoint silently drops results**
`loadReports` fetches the full list without a server-side `evaluation` filter, then filters the first page in memory. Because the response is paginated (`response.results`), any reports that fall outside the first page are silently ignored — so an evaluation's schedule could disappear from the UI if it isn't on page 1.
Pass the evaluation ID as a query parameter instead:
```suggestion
const response = await api.get(
`api/environments/${values.currentTeamId}/llm_analytics/evaluation_reports/?evaluation=${props.evaluationId}`
)
return response.results || []
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: products/llm_analytics/frontend/evaluations/components/EvaluationReportConfig.tsx
Line: 99
Comment:
**Double non-null assertion can pass `undefined` to `SlackChannelPicker`**
`integrations!.find(...)!` — if the selected integration ID no longer exists in the list (e.g. it was disconnected), `find()` returns `undefined`, and the `!` doesn't prevent that from being forwarded to `SlackChannelPicker.integration`. The component receives an `undefined` where it expects an object, which will likely throw.
More safely, render nothing if the integration isn't found:
```ts
const selectedIntegration = integrations?.find((i) => i.id === slackIntegrationId)
// then: {selectedIntegration && <SlackChannelPicker ... integration={selectedIntegration} />}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: products/llm_analytics/frontend/evaluations/components/EvaluationReportViewer.tsx
Line: 214
Comment:
**Title rendered twice in non-compact mode**
In non-compact mode `content.title` is already shown in the `<h2>` at line 170 (alongside the status badge). This `<h3>` then renders the same title again below the divider, so the title appears twice. The compact block (lines 188–212) already handles the title for compact mode.
If the intent is to show a standalone title below the divider only in compact mode, the condition needs to be flipped:
```suggestion
{compact && content.title && <h3 className="font-semibold text-sm mb-2">{content.title}</h3>}
```
But since the compact block at line 190 already renders `content.title` as an `<h3>`, this line may not be needed at all.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: products/llm_analytics/frontend/evaluations/components/EvaluationReportConfig.tsx
Line: 188-194
Comment:
**`useState` in a component backed by a kea logic**
PostHog's coding conventions say to avoid React hooks when a kea logic file exists ("write all business logic there, avoid React hooks at all costs"). `ExistingReportConfig` uses seven `useState` calls for form fields that mirror the persisted `activeReport` shape — this state belongs in `evaluationReportLogic` (or a thin sibling logic) where it can be observed, tested, and reset predictably.
The `useEffect` that seeds these fields from `activeReport` is also a common kea antipattern; a `listeners` handler on `loadReportsSuccess` / `updateReportSuccess` could set kea reducers instead.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: products/llm_analytics/frontend/evaluations/components/EvaluationReportConfig.tsx
Line: 327-342
Comment:
**Targets-building logic duplicated (OnceAndOnlyOnce)**
The code that builds the `targets: EvaluationReportDeliveryTarget[]` array is repeated verbatim here in `handleSave` and again inside the IIFE `onClick` for the update path (lines ~349–362). Extract it to a helper to keep the two code paths in sync — the two implementations are already subtly different (one explicitly passes `trigger_threshold: null`, the other omits the key entirely).
```ts
const buildTargets = (): EvaluationReportDeliveryTarget[] => {
const targets: EvaluationReportDeliveryTarget[] = []
if (hasEmail) {
targets.push({ type: 'email', value: emailValue.trim() })
}
if (hasSlack) {
targets.push({ type: 'slack', integration_id: slackIntegrationId!, channel: slackChannelValue })
}
return targets
}
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "feat(llma): add evaluation reports front..." | Re-trigger Greptile |
| const response = await api.get( | ||
| `api/environments/${values.currentTeamId}/llm_analytics/evaluation_reports/` | ||
| ) | ||
| return (response.results || []).filter((r: EvaluationReport) => r.evaluation === props.evaluationId) |
There was a problem hiding this comment.
Client-side filtering on paginated endpoint silently drops results
loadReports fetches the full list without a server-side evaluation filter, then filters the first page in memory. Because the response is paginated (response.results), any reports that fall outside the first page are silently ignored — so an evaluation's schedule could disappear from the UI if it isn't on page 1.
Pass the evaluation ID as a query parameter instead:
| const response = await api.get( | |
| `api/environments/${values.currentTeamId}/llm_analytics/evaluation_reports/` | |
| ) | |
| return (response.results || []).filter((r: EvaluationReport) => r.evaluation === props.evaluationId) | |
| const response = await api.get( | |
| `api/environments/${values.currentTeamId}/llm_analytics/evaluation_reports/?evaluation=${props.evaluationId}` | |
| ) | |
| return response.results || [] |
Prompt To Fix With AI
This is a comment left during a code review.
Path: products/llm_analytics/frontend/evaluations/evaluationReportLogic.ts
Line: 105-108
Comment:
**Client-side filtering on paginated endpoint silently drops results**
`loadReports` fetches the full list without a server-side `evaluation` filter, then filters the first page in memory. Because the response is paginated (`response.results`), any reports that fall outside the first page are silently ignored — so an evaluation's schedule could disappear from the UI if it isn't on page 1.
Pass the evaluation ID as a query parameter instead:
```suggestion
const response = await api.get(
`api/environments/${values.currentTeamId}/llm_analytics/evaluation_reports/?evaluation=${props.evaluationId}`
)
return response.results || []
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Good catch — the client-side filter on a paginated endpoint is indeed wrong. I'll add server-side ?evaluation= filtering to the viewset's safely_get_queryset and pass it from the frontend. Fix incoming.
| <SlackChannelPicker | ||
| value={slackChannelValue} | ||
| onChange={(val) => onSlackChannelChange(val || '')} | ||
| integration={integrations!.find((i) => i.id === slackIntegrationId)!} |
There was a problem hiding this comment.
Double non-null assertion can pass
undefined to SlackChannelPicker
integrations!.find(...)! — if the selected integration ID no longer exists in the list (e.g. it was disconnected), find() returns undefined, and the ! doesn't prevent that from being forwarded to SlackChannelPicker.integration. The component receives an undefined where it expects an object, which will likely throw.
More safely, render nothing if the integration isn't found:
const selectedIntegration = integrations?.find((i) => i.id === slackIntegrationId)
// then: {selectedIntegration && <SlackChannelPicker ... integration={selectedIntegration} />}Prompt To Fix With AI
This is a comment left during a code review.
Path: products/llm_analytics/frontend/evaluations/components/EvaluationReportConfig.tsx
Line: 99
Comment:
**Double non-null assertion can pass `undefined` to `SlackChannelPicker`**
`integrations!.find(...)!` — if the selected integration ID no longer exists in the list (e.g. it was disconnected), `find()` returns `undefined`, and the `!` doesn't prevent that from being forwarded to `SlackChannelPicker.integration`. The component receives an `undefined` where it expects an object, which will likely throw.
More safely, render nothing if the integration isn't found:
```ts
const selectedIntegration = integrations?.find((i) => i.id === slackIntegrationId)
// then: {selectedIntegration && <SlackChannelPicker ... integration={selectedIntegration} />}
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Valid — the double ! is unsafe if the integration was disconnected. I'll guard with a conditional render. Fix incoming.
| </div> | ||
| )} | ||
|
|
||
| {!compact && content.title && <h3 className="font-semibold text-sm mb-2">{content.title}</h3>} |
There was a problem hiding this comment.
Title rendered twice in non-compact mode
In non-compact mode content.title is already shown in the <h2> at line 170 (alongside the status badge). This <h3> then renders the same title again below the divider, so the title appears twice. The compact block (lines 188–212) already handles the title for compact mode.
If the intent is to show a standalone title below the divider only in compact mode, the condition needs to be flipped:
| {!compact && content.title && <h3 className="font-semibold text-sm mb-2">{content.title}</h3>} | |
| {compact && content.title && <h3 className="font-semibold text-sm mb-2">{content.title}</h3>} |
But since the compact block at line 190 already renders content.title as an <h3>, this line may not be needed at all.
Prompt To Fix With AI
This is a comment left during a code review.
Path: products/llm_analytics/frontend/evaluations/components/EvaluationReportViewer.tsx
Line: 214
Comment:
**Title rendered twice in non-compact mode**
In non-compact mode `content.title` is already shown in the `<h2>` at line 170 (alongside the status badge). This `<h3>` then renders the same title again below the divider, so the title appears twice. The compact block (lines 188–212) already handles the title for compact mode.
If the intent is to show a standalone title below the divider only in compact mode, the condition needs to be flipped:
```suggestion
{compact && content.title && <h3 className="font-semibold text-sm mb-2">{content.title}</h3>}
```
But since the compact block at line 190 already renders `content.title` as an `<h3>`, this line may not be needed at all.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Good eye — this is a genuine duplicate. The <h2> at line 170 already renders the title in non-compact mode, so this <h3> at line 214 should be removed. Fix incoming.
| const [formEnabled, setFormEnabled] = useState(false) | ||
| const [frequency, setFrequency] = useState<EvaluationReportFrequency>('daily') | ||
| const [emailValue, setEmailValue] = useState('') | ||
| const [slackIntegrationId, setSlackIntegrationId] = useState<number | null>(null) | ||
| const [slackChannelValue, setSlackChannelValue] = useState('') | ||
| const [guidance, setGuidance] = useState('') | ||
| const [triggerThreshold, setTriggerThreshold] = useState(TRIGGER_THRESHOLD_DEFAULT) |
There was a problem hiding this comment.
useState in a component backed by a kea logic
PostHog's coding conventions say to avoid React hooks when a kea logic file exists ("write all business logic there, avoid React hooks at all costs"). ExistingReportConfig uses seven useState calls for form fields that mirror the persisted activeReport shape — this state belongs in evaluationReportLogic (or a thin sibling logic) where it can be observed, tested, and reset predictably.
The useEffect that seeds these fields from activeReport is also a common kea antipattern; a listeners handler on loadReportsSuccess / updateReportSuccess could set kea reducers instead.
Prompt To Fix With AI
This is a comment left during a code review.
Path: products/llm_analytics/frontend/evaluations/components/EvaluationReportConfig.tsx
Line: 188-194
Comment:
**`useState` in a component backed by a kea logic**
PostHog's coding conventions say to avoid React hooks when a kea logic file exists ("write all business logic there, avoid React hooks at all costs"). `ExistingReportConfig` uses seven `useState` calls for form fields that mirror the persisted `activeReport` shape — this state belongs in `evaluationReportLogic` (or a thin sibling logic) where it can be observed, tested, and reset predictably.
The `useEffect` that seeds these fields from `activeReport` is also a common kea antipattern; a `listeners` handler on `loadReportsSuccess` / `updateReportSuccess` could set kea reducers instead.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Agreed — this should use kea reducers, not useState. The ExistingReportConfig form fields are essentially an "edit draft" that mirrors the persisted activeReport shape, which fits naturally into kea. I'll move this into the logic in a follow-up within this PR stack (PR 5 will need to touch this component anyway for wiring). Tracking it.
| const frequencyDirty = frequency !== activeReport.frequency | ||
| const targetsDirty = | ||
| emailValue.trim() !== currentEmail || | ||
| slackIntegrationId !== currentSlackIntegrationId || | ||
| slackChannelValue !== currentSlackChannel | ||
| const guidanceDirty = guidance !== currentGuidance | ||
| const thresholdDirty = frequency === 'every_n' && triggerThreshold !== currentThreshold | ||
| const isDirty = frequencyDirty || targetsDirty || guidanceDirty || thresholdDirty | ||
| const hasAnyTarget = hasEmail || hasSlack | ||
| return ( | ||
| <div className="flex justify-end"> | ||
| <LemonButton | ||
| type="primary" | ||
| size="small" | ||
| loading={reportsLoading} | ||
| disabledReason={ |
There was a problem hiding this comment.
Targets-building logic duplicated (OnceAndOnlyOnce)
The code that builds the targets: EvaluationReportDeliveryTarget[] array is repeated verbatim here in handleSave and again inside the IIFE onClick for the update path (lines ~349–362). Extract it to a helper to keep the two code paths in sync — the two implementations are already subtly different (one explicitly passes trigger_threshold: null, the other omits the key entirely).
const buildTargets = (): EvaluationReportDeliveryTarget[] => {
const targets: EvaluationReportDeliveryTarget[] = []
if (hasEmail) {
targets.push({ type: 'email', value: emailValue.trim() })
}
if (hasSlack) {
targets.push({ type: 'slack', integration_id: slackIntegrationId!, channel: slackChannelValue })
}
return targets
}Prompt To Fix With AI
This is a comment left during a code review.
Path: products/llm_analytics/frontend/evaluations/components/EvaluationReportConfig.tsx
Line: 327-342
Comment:
**Targets-building logic duplicated (OnceAndOnlyOnce)**
The code that builds the `targets: EvaluationReportDeliveryTarget[]` array is repeated verbatim here in `handleSave` and again inside the IIFE `onClick` for the update path (lines ~349–362). Extract it to a helper to keep the two code paths in sync — the two implementations are already subtly different (one explicitly passes `trigger_threshold: null`, the other omits the key entirely).
```ts
const buildTargets = (): EvaluationReportDeliveryTarget[] => {
const targets: EvaluationReportDeliveryTarget[] = []
if (hasEmail) {
targets.push({ type: 'email', value: emailValue.trim() })
}
if (hasSlack) {
targets.push({ type: 'slack', integration_id: slackIntegrationId!, channel: slackChannelValue })
}
return targets
}
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Agreed — the two paths already diverge subtly on trigger_threshold: null vs omission. I'll extract a buildTargets() helper. Fix incoming.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2eb779d0a8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ONBOARDING_WIZARD_PROMINENCE: 'onboarding-wizard-prominence', // owner: #team-growth multivariate=control,wizard-hero,wizard-tab,wizard-only | ||
| ONBOARDING_WIZARD_INSTALLATION_IMPROVED_COPY: 'onboarding-wizard-installation-improved-copy', // owner: @fercgomes #team-growth multivariate=control,test | ||
| ONBOARDING_MOBILE_INSTALL_HELPER: 'onboarding-mobile-install-helper', // owner: @fercgomes #team-growth multivariate=control,test — target $device_type=Mobile at the flag level |
There was a problem hiding this comment.
Restore onboarding value-prop feature flag constant
Removing ONBOARDING_DATA_WAREHOUSE_VALUE_PROP from FEATURE_FLAGS breaks the onboarding experiment lookup used in OnboardingDataWarehouseSourcesStep (useFeatureFlag('ONBOARDING_DATA_WAREHOUSE_VALUE_PROP', ...)), so the variant branches can no longer activate and users are forced into the fallback flow. If type-checking is bypassed this becomes a silent runtime regression (FEATURE_FLAGS[flag] resolves to undefined), and if type-checking runs it blocks the build.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
This is a rebase artifact — the removal of ONBOARDING_DATA_WAREHOUSE_VALUE_PROP (and SURVEYS_TOOLBAR) is not intentional. These were dropped during a conflict resolution when stacking. I'll restore them. Fix incoming.
| // The backend auto-creates a default report config on eval creation. | ||
| // If the user configured delivery targets or custom settings, update | ||
| // the auto-created report after creation via the existing reports list. | ||
| if (targets.length > 0 || pendingConfig.reportPromptGuidance.trim()) { | ||
| actions.createReport({ |
There was a problem hiding this comment.
Create a report when saving pending schedule settings
This listener only calls createReport when delivery targets or guidance are present, based on the assumption that evaluation creation already provisions a default report; however, EvaluationViewSet.perform_create only saves/tracks the evaluation and does not create an EvaluationReport, so users who enable scheduled reports and set cadence/threshold without targets lose their settings and end up with no report configuration after creation.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
This is incorrect — the backend does auto-create a default EvaluationReport on evaluation creation. See EvaluationViewSet.perform_create in products/llm_analytics/backend/api/evaluations.py (lines 235-246), which calls EvaluationReport.objects.create(...). The comment in the listener at line 221 explains this: the pending config listener only fires createReport when the user has configured delivery targets or custom guidance beyond the defaults.
394a1ad to
b927375
Compare
2eb779d to
da19b28
Compare
b927375 to
6bc5ab1
Compare
da19b28 to
5859546
Compare
6bc5ab1 to
bc73add
Compare
409b90c to
d8a5ccf
Compare
d286f52 to
7b533aa
Compare
d8a5ccf to
cb89a1b
Compare
7b533aa to
a33ef4f
Compare
f3a26f3 to
d61fdeb
Compare
a33ef4f to
011b107
Compare
d61fdeb to
0d80907
Compare
011b107 to
05b9e4d
Compare
|
🎭 Playwright report · View test results →
These issues are not necessarily caused by your changes. |
0d80907 to
bced54f
Compare
Add React components for evaluation report configuration, history table, and report viewer. Includes kea logic for API interactions and state management. All UI is gated behind the LLM_ANALYTICS_EVALUATIONS_REPORTS feature flag.
bced54f to
e4dde8f
Compare
11b420d to
2803d96
Compare

Problem
Users need a UI to configure evaluation report settings (frequency, delivery targets, prompt guidance) and view generated reports.
Part 4 of 5 in a stacked PR series. Depends on #54365.
Stack:
Changes
EvaluationReportConfig— form for frequency, threshold, delivery targets, prompt guidanceEvaluationReportsTab— history table with status badges for report runsEvaluationReportViewer— rendered report content with sections and metricsevaluationReportLogic— kea logic for API interactions and state managementLLM_ANALYTICS_EVALUATIONS_REPORTSHow did you test this code?
Agent-authored code. Unit/integration tests are not included for frontend components in this PR. Manual testing was performed during development on the reference branch.
No
Docs update
skip-inkeep-docs
🤖 LLM context
Agent-assisted PR stack creation from a single reference branch.