
fix(occurrences): remove all n+1 queries from occurrence API #1274

Open
mihow wants to merge 10 commits into main from fix/occurrence-list-queries

Conversation


@mihow mihow commented Apr 29, 2026

Summary

Speeds up the occurrences list endpoint by ~9x and frees up gunicorn workers that were previously pinned for 30+ seconds per request, which under multi-user load was stalling every other API endpoint sharing the pool (captures, jobs, deployments — "every page feels frozen"). Fewer queries per request also drops Postgres load and lets the same hardware support more concurrent reviewers. Closes #1271 (Sentry AMI-PLATFORM-API-BMY, 34–40s cold responses).

Mechanically: introduces reusable Prefetch factories and wires them into OccurrenceViewSet list and detail paths. Strict prefetch contract — serializer raises RuntimeError if a caller skips the prefetch instead of silently re-introducing per-row queries.

Measurement

Measured with CaptureQueriesContext against an in-test project of 25 occurrences (each with one detection and one classification). The cachalot cache was cleared between runs.

| Page size | Before | After | Reduction |
| --- | --- | --- | --- |
| `limit=5` | 24 | 13 | 1.8x |
| `limit=25` | 64 | 7 | 9.1x |

The baseline grew roughly two queries per occurrence (24 queries at limit=5 vs. 64 at limit=25, a clear N+1). The post-fix count is flat regardless of page size.

Changes

  1. ami/main/models_future/occurrence.py (new module)
    • prefetch_detections_for_list() — detections + nested classifications, ordered for stable galleries
    • prefetch_detections_for_detail() — detections + nested classifications + source_image, ordered most-recent-first
    • best_prediction_from_prefetch() — mirrors Occurrence.predictions() per-algorithm max-score grouping in Python
    • best_identification_from_prefetch() — mirrors BEST_IDENTIFICATION_ORDER + withdrawn=False
    • detection_image_urls_from_prefetch(occurrence, limit=...) — bounded URL list
    • _require_prefetch() — strict gate; raises if relations missing from _prefetched_objects_cache
  2. OccurrenceQuerySet.with_list_prefetches() / with_detail_prefetches() — aggregator methods on the queryset.
  3. OccurrenceListSerializer
    • detection_images, determination_details, best_machine_prediction now route through the strict helpers — no fallback to model methods.
    • detection_images_limit: int | None = 1 — list cards render a single cover image.
    • OccurrenceSerializer (detail) overrides detection_images_limit = None (unbounded; TODO bound when occurrence tracking lands).
  4. OccurrenceViewSet.get_queryset calls with_list_prefetches() on list, with_detail_prefetches() on detail.
  5. JSONExporter.get_queryset (ami/exports/format_types.py) adds with_detail_prefetches(). The export serializer subclasses OccurrenceSerializer and inherits the strict helpers, so it needs the same prefetch contract.

Behavior changes (read carefully)

  • List response detection_images is now capped at 1 (was unbounded via the model method, but in practice fed from the same query and rarely larger). Detail response still returns all detection images. Response shape unchanged.
  • Strict prefetch contract: any new call site that uses OccurrenceListSerializer or OccurrenceSerializer must apply with_list_prefetches() or with_detail_prefetches() on the queryset. CI surfaced one missing site (JSONExporter) — fixed here. Audited all six occurrence-related serializers; only the three that hit the strict helpers (OccurrenceListSerializer, OccurrenceSerializer, OccurrenceExportSerializer) need prefetches, and all three are wired.

Caveats

  • _require_prefetch only checks top-level relations. A caller prefetching detections without nested classifications will silently re-introduce per-detection queries. Mitigated by always going through the list/detail factories. Tighter depth-enforcement deferred to django-zen-queries follow-up.

Deferred to follow-up

  • django-zen-queries QueriesDisabledViewMixin as a guardrail (cachalot interaction needs a quick spike — cache-key lookups may register as queries)
  • Audit SourceImageViewSet and TaxonViewSet list paths for the same pattern

Test plan

  • python manage.py test ami.main.tests.TestOccurrenceListQueryCount passes
  • python manage.py test ami.main.tests.TestOccurrenceDetailQueryCount passes
  • python manage.py test ami.main.tests.TestOccurrencePrefetchHelpersEdgeCases passes (zero-detection no 500, missing prefetch raises)
  • python manage.py test ami.exports.tests.DataExportTest.test_export_record_count ami.exports.tests.DataExportTest.test_json_export_record_count passes (regression caught the missing exporter prefetch)
  • Manual smoke: hit /api/v2/occurrences/?project_id=18&limit=25 against staging, confirm response shape unchanged and Sentry latency drops
  • Manual smoke: detail view /api/v2/occurrences/<id>/ still returns full nested detections/predictions

Summary by CodeRabbit

  • New Features

    • Detection images now included in occurrence list and detail responses—limited to 1 image per item in lists and all images in detail views.
  • Improvements

    • Enhanced database query efficiency for list and detail endpoints through optimized data prefetching.
  • Tests

    • Added regression tests validating query performance stability and response payload constraints.

… remove N+1

Issue: #1271

OccurrenceViewSet.list issued one query per occurrence to fetch
detection paths, the best machine prediction, and the best
identification, scaling roughly linearly with page size and producing
the 34-40s cold-cache responses tracked in Sentry as AMI-PLATFORM-API-BMY.

This change adds reusable Prefetch factories in
ami/main/models_future/occurrence.py and wires them into the list path:

- prefetch_detection_images() populates a `prefetched_detection_images`
  to_attr list of detections that have a path
- prefetch_classifications_for_best_prediction() populates
  detections__classifications with select_related taxon and algorithm
- best_prediction_from_prefetch / best_identification_from_prefetch
  pick the winners in Python without re-querying

OccurrenceQuerySet.with_list_prefetches() bundles these prefetches and
is called from OccurrenceViewSet.get_queryset only when action == 'list'.
OccurrenceListSerializer reads from the prefetch caches when present
and falls back to the existing model methods for non-list callers
(admin, exports, signals).

Local measurement on a project with 25 occurrences:

  page size | before | after | reduction
  limit=5   |   24   |  13   | 1.8x
  limit=25  |   64   |   7   | 9.1x

A new TestOccurrenceListQueryCount guards against regression by
asserting the query count does not grow with page size.

Follow-ups (separate PR):
- django-zen-queries QueriesDisabledViewMixin to fail loudly on lazy
  queries during list rendering
- audit other large-list serializers (SourceImage, Taxon) for the
  same pattern

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 29, 2026 14:50

netlify Bot commented Apr 29, 2026

Deploy Preview for antenna-preview canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 80f36cb |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-preview/deploys/69f3ef19a07bdf0008f41081 |


netlify Bot commented Apr 29, 2026

Deploy Preview for antenna-ssec canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 80f36cb |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-ssec/deploys/69f3ef195c23690008f3ec41 |


coderabbitai Bot commented Apr 29, 2026

Warning

Rate limit exceeded

@mihow has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 24 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e49b2ba5-1998-4f02-be9e-f74406469cea

📥 Commits

Reviewing files that changed from the base of the PR and between 1f5918a and 80f36cb.

📒 Files selected for processing (5)
  • ami/main/api/serializers.py
  • ami/main/models.py
  • ami/main/models_future/occurrence.py
  • ami/main/tests.py
  • scripts/benchmark_occurrences_list.sh
📝 Walkthrough

List and detail queryset prefetches were split and centralized into new helpers; serializers consume prefetched detections/classifications/identifications via helper functions for best-identification, best-prediction, and detection image URLs. Tests and exporter usage were updated to rely on these prefetch contracts.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Serializers: `ami/main/api/serializers.py` | Added detection_images SerializerMethodField and detection_images_limit (default 1 for list, None for detail). Refactored determination logic to use _best_identification / _best_prediction helpers that read from prefetched relations; updated get_best_machine_prediction to use the prefetched helper. |
| View & Queryset: `ami/main/api/views.py`, `ami/main/models.py` | OccurrenceViewSet.get_queryset() now calls with_list_prefetches() for list actions and with_detail_prefetches() for non-list actions. Added with_list_prefetches() / with_detail_prefetches() on OccurrenceQuerySet. Introduced BEST_IDENTIFICATION_ORDER constant and applied it where best-identification is selected. |
| Prefetch helpers (new module): `ami/main/models_future/occurrence.py` | New module centralizes Prefetch builders for list/detail serializers and provides helpers: best_prediction_from_prefetch, best_identification_from_prefetch, detection_image_urls_from_prefetch, plus strict prefetch-presence checks and stable ordering for detections/classifications. |
| Tests: `ami/main/tests.py` | Added regression tests asserting bounded SQL query counts for list/detail endpoints, tests for the many-detections detail case, and prefetch-helper edge-case tests (including RuntimeError when prefetch is missing). |
| Exporter: `ami/exports/format_types.py` | JSONExporter.get_queryset now applies .with_detail_prefetches() so export serializers relying on prefetched detection/classification data don't issue extra DB lookups. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Client as Client
  participant View as OccurrenceViewSet\n(get_queryset)
  participant QS as OccurrenceQuerySet\n(with_*_prefetches)
  participant DB as Database
  participant Serializer as OccurrenceList/Detail\nSerializer
  participant Helpers as models_future/occurrence.py
  Client->>View: GET /occurrences/ or /occurrences/{id}/
  View->>QS: build queryset -> with_list_prefetches() / with_detail_prefetches()
  QS->>DB: query occurrences + prefetch(detections, classifications, identifications...)
  DB-->>QS: rows + prefetched related objects
  View->>Serializer: pass queryset / instance
  Serializer->>Helpers: best_identification_from_prefetch / best_prediction_from_prefetch / detection_image_urls_from_prefetch
  Helpers-->>Serializer: chosen identification/prediction and image URLs
  Serializer-->>Client: JSON response
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • annavik


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 51.35%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: removing N+1 queries from the occurrence API endpoints. |
| Description check | ✅ Passed | The PR description comprehensively covers objectives, implementation details, measurements, behavior changes, and test coverage following the template structure. |
| Linked Issues check | ✅ Passed | The PR directly addresses all coding requirements from issue #1271: prefetch detections/classifications with custom querysets, implement strict helpers for best_prediction/identification, cap list detection_images, and achieve constant-time query performance. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the N+1 query elimination objective. The new module, queryset methods, serializer updates, view changes, and exporter fix are all directly necessary for the primary goal. |





Copilot AI left a comment


Pull request overview

This PR targets the /api/v2/occurrences/ list endpoint’s dominant N+1 sources by adding reusable Prefetch factories, wiring them into the list queryset, and teaching the list serializer to consume prefetched caches (with fallbacks), plus adding a regression query-count test.

Changes:

  • Added ami/main/models_future/occurrence.py with reusable Prefetch factories and Python helpers for selecting “best” records from prefetched data.
  • Added OccurrenceQuerySet.with_list_prefetches() and applied it only for OccurrenceViewSet list actions.
  • Updated OccurrenceListSerializer to read detection images / best prediction / best identification from prefetch caches when present, and added a query-count regression test.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `ami/main/tests.py` | Adds TestOccurrenceListQueryCount to guard against query-count scaling with page size. |
| `ami/main/models_future/occurrence.py` | Introduces prefetch factories and helper functions to pick the best prediction/identification and derive detection image URLs from caches. |
| `ami/main/models.py` | Adds OccurrenceQuerySet.with_list_prefetches() to apply the list serializer’s required prefetches. |
| `ami/main/api/views.py` | Applies list-only prefetching via with_list_prefetches(); keeps existing detail prefetch behavior. |
| `ami/main/api/serializers.py` | Routes list fields through prefetched data when available, with fallbacks to existing model methods. |


Comment thread ami/main/api/serializers.py Outdated
Comment thread ami/main/models_future/occurrence.py Outdated
Comment thread ami/main/models_future/occurrence.py
Comment thread ami/main/models_future/occurrence.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/main/models_future/occurrence.py`:
- Around line 69-91: best_prediction_from_prefetch currently aggregates all
det.classifications and ranks them, which diverges from
Occurrence.best_prediction because that method first applies per-algorithm
max-score filtering via Occurrence.predictions(); update
best_prediction_from_prefetch to call/use Occurrence.predictions() (or replicate
its per-algorithm max-score grouping) to produce the filtered set of
Classification objects, then apply the same sort key (terminal, -score, -pk) and
return the first element; ensure you reference and reuse
Occurrence.predictions() so the prefetch path preserves identical semantics to
best_prediction.

In `@ami/main/tests.py`:
- Around line 2956-2959: The try/except around caches["default"].clear() is
swallowing failures—remove the broad except or replace it with a specific catch
that fails the test; call caches["default"].clear() directly (so exceptions
propagate) or if you must catch, re-raise or call pytest.fail() with the caught
exception text so a cache-clear failure causes the test to fail and exposes the
error rather than hiding it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 829c8693-706f-40db-89be-84b6b55a965a

📥 Commits

Reviewing files that changed from the base of the PR and between a347d55 and 3e7d9dd.

📒 Files selected for processing (5)
  • ami/main/api/serializers.py
  • ami/main/api/views.py
  • ami/main/models.py
  • ami/main/models_future/occurrence.py
  • ami/main/tests.py

Comment thread ami/main/models_future/occurrence.py Outdated
Comment thread ami/main/tests.py Outdated
- Prevent detail-path N+1: `_best_prediction` now requires both
  `detections` AND nested `classifications` to be in the prefetch cache
  before calling `best_prediction_from_prefetch`. Detail viewset only
  prefetches `detections`, so the previous gate still walked
  `det.classifications.all()` lazily (one query per detection).

- Mirror `Occurrence.best_prediction` semantics in the prefetch helper:
  `Occurrence.predictions()` filters classifications to the per-algorithm
  max-score row before ordering by `-terminal, -score`. The helper now
  applies the same per-algorithm grouping in Python so list and detail
  return the same winner for occurrences whose top-scoring classification
  is non-terminal.

- Add deterministic tiebreaker to `best_identification_from_prefetch`:
  tuple-keyed comparison (`created_at_is_set, created_at, pk`) treats
  `None` timestamps as lowest and breaks ties by `pk`, matching
  `order_by('-created_at', '-pk')`.

- Collapse the duplicate `detections` prefetch: prior PR loaded
  detections twice (once via a `to_attr` filtered list, once via
  `detections__classifications`). Replace both with a single
  `prefetch_detections_for_list()` that nests classifications, and
  filter for non-null paths in Python. Saves one query per request.

- Don't swallow cache-clear failures in the regression guard. A silent
  cache-clear failure would let the warm cache hide query-count
  regressions.

Refs: #1271
PR: #1274

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/main/api/serializers.py`:
- Around line 1377-1379: The fast-path should only run for list responses, not
any object with a cached 'detections' relation. Change the condition in the
serializer method (the branch using detection_image_urls_from_prefetch and
obj.detection_images) to require an explicit list-only marker (e.g. check
getattr(obj, "_prefetched_for_list", False) and "detections" in getattr(obj,
"_prefetched_objects_cache", {})); then set that marker when you build the list
queryset that prefetches detections (for example, after the prefetch in the list
view or in the queryset iteration set obj._prefetched_for_list = True) so
OccurrenceSerializer and other detail callers still call detection_images() and
preserve ordering/filtering.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 274a54ab-5ddf-4af8-9c00-c63dab62b87b

📥 Commits

Reviewing files that changed from the base of the PR and between 3e7d9dd and b62fb39.

📒 Files selected for processing (3)
  • ami/main/api/serializers.py
  • ami/main/models_future/occurrence.py
  • ami/main/tests.py

Comment thread ami/main/api/serializers.py Outdated
Mirror BEST_MACHINE_PREDICTION_ORDER: define the identification ordering
once and use it everywhere a "best identification" is selected.

- New constant `BEST_IDENTIFICATION_ORDER = ("-created_at", "-pk")`
  alongside `BEST_MACHINE_PREDICTION_ORDER`.
- `Occurrence.best_identification` and `OccurrenceQuerySet.with_verification_info`
  now order by the constant. Adds a `-pk` deterministic tiebreaker; created_at
  is auto_now_add so equal timestamps are vanishingly rare in practice.
- `best_identification_from_prefetch` references the constant in its
  docstring (the helper's tuple key already encodes the same ordering).

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread ami/main/api/serializers.py Outdated
Comment thread ami/main/api/serializers.py Outdated
mihow and others added 2 commits April 29, 2026 15:24
…paths

Per PR review: the cache-key-gated fallbacks in `OccurrenceListSerializer`
exercised the same N+1 path the PR closed, and silently triggered on the
detail endpoint via shared serializer inheritance.

- Drop `_best_prediction`, `_best_identification`, `get_detection_images`
  fallbacks. Serializer trusts the prefetch contract.
- Add `prefetch_detections_for_detail()` factory + `with_detail_prefetches()`
  queryset method. Detail viewset now applies it instead of an inline Prefetch.
- Drop `has_prefetched_classifications()` helper (no longer needed).
- Add `TestOccurrenceDetailQueryCount` regression guard. Detail endpoint
  measures 12 queries; list (warm test client) measures 6.

Closes review threads on PR #1274 (mihow comments 3164296309/3164306519,
CodeRabbit comment 3163836361).

Co-Authored-By: Claude <noreply@anthropic.com>
…on_images

Per PR review (mihow): adopt the 'disable model methods' safety pattern as
strict prefetch contract. Helpers now raise RuntimeError when required
relations are missing from `_prefetched_objects_cache` instead of silently
slow-pathing through Django's lazy lookups.

- `_require_prefetch()` gate on best_prediction / best_identification /
  detection_image_urls helpers.
- Restore old behavior: list responses cap detection_images at 3
  (`OccurrenceListSerializer.detection_images_limit`). Detail subclass
  overrides to None (unbounded) — TODO: bound when occurrence tracking
  enables thousands of detections per occurrence.
- New tests:
  - TestOccurrenceResponseShape pins list + detail JSON keys + the list cap.
  - TestOccurrencePrefetchHelpersEdgeCases asserts safe defaults on empty
    prefetched relations, and RuntimeError when prefetch is missing.

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/main/models_future/occurrence.py`:
- Around line 83-85: The strict-path prefetch check currently only asserts
detections were prefetched but then iterates det.classifications.all(), which
can trigger per-detection queries; update the check to require the nested
classifications prefetch (use _require_prefetch(occurrence,
"detections__classifications") or otherwise validate that each Detection object
has a prefetched 'classifications' attribute) before iterating, so the code
fails fast when nested prefetching is missing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dff16c79-6ad9-4049-9270-883a29b5cf76

📥 Commits

Reviewing files that changed from the base of the PR and between b62fb39 and 454ec77.

📒 Files selected for processing (5)
  • ami/main/api/serializers.py
  • ami/main/api/views.py
  • ami/main/models.py
  • ami/main/models_future/occurrence.py
  • ami/main/tests.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • ami/main/api/views.py
  • ami/main/tests.py

Comment thread ami/main/models_future/occurrence.py
mihow and others added 2 commits April 29, 2026 16:22
Drop TestOccurrenceResponseShape — most assertions are key-presence checks
already enforced by serializer Meta. Fold the only valuable assertion (the
detection_images per-row cap on list responses) into _list_occurrences in
TestOccurrenceListQueryCount where the response data is already inspected.

Co-Authored-By: Claude <noreply@anthropic.com>
…cover image to 1

Exports invoke OccurrenceExportSerializer (subclasses OccurrenceSerializer),
whose inherited get_determination_details / _best_prediction call the strict
prefetch helpers in models_future/occurrence.py. JSONExporter.get_queryset()
did not apply the prefetch, so two CI tests in ami.exports.tests.DataExportTest
raised RuntimeError ("Occurrence X is missing prefetched relations
['detections']"). Add .with_detail_prefetches() to the exporter queryset.

Cap OccurrenceListSerializer.detection_images_limit from 3 to 1 — list cards
render a single cover image; payload stays tight without changing response
shape.

Document the nested-prefetch caveat in _require_prefetch: it only checks
top-level relations, so a caller prefetching 'detections' without nested
'classifications' silently re-introduces per-detection queries. Tighter
enforcement is deferred to the django-zen-queries follow-up (issue #1271),
which catches per-row queries at the view boundary regardless of depth.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow changed the title from "fix(occurrences): prefetch detection paths and classifications in list view" to "fix(occurrences): remove all n+1 queries from occurrence API" Apr 30, 2026
@mihow mihow requested a review from Copilot April 30, 2026 00:21

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.



Comment thread ami/main/models_future/occurrence.py Outdated
Comment thread ami/main/models_future/occurrence.py Outdated
Comment thread ami/main/models_future/occurrence.py Outdated
Comment thread ami/main/models_future/occurrence.py
Comment thread ami/main/tests.py
best_prediction_from_prefetch:
  - Skip classifications with score=None (mirrors SQL NULL semantics:
    Max(NULL) = NULL, score IN (NULL) never matches).
  - Removes the asymmetry between -inf in max calc and 0.0 in sort key.
  - Document Python-vs-SQL semantic difference around per-algorithm
    max grouping (Python is the stricter interpretation).

TestOccurrenceDetailQueryCount:
  - Add test_detail_query_count_does_not_scale_with_detections,
    inflating one occurrence to 9 detections x 4 classifications.
    The single-detection test alone could not catch a multi-detection
    N+1; this guards the actual property the test name asserts.

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
ami/main/api/serializers.py (1)

1384-1426: ⚡ Quick win

Memoize _best_prediction per occurrence to avoid duplicate scans.

_best_prediction(obj) is used in both get_determination_details (Line 1406) and get_best_machine_prediction (Line 1425). Caching it once on the object avoids repeated Python traversal of prefetched detections/classifications.

Suggested refactor
     def _best_prediction(self, obj: Occurrence):
         from ami.main.models_future.occurrence import best_prediction_from_prefetch
 
-        return best_prediction_from_prefetch(obj)
+        cache_attr = "_cached_best_prediction_from_prefetch"
+        if hasattr(obj, cache_attr):
+            return getattr(obj, cache_attr)
+        value = best_prediction_from_prefetch(obj)
+        setattr(obj, cache_attr, value)
+        return value
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/api/serializers.py` around lines 1384 - 1426, The _best_prediction
traversal is called twice; modify _best_prediction to memoize its result on the
Occurrence instance (e.g., check for and return obj._cached_best_prediction if
present, otherwise compute via best_prediction_from_prefetch(obj), assign it to
obj._cached_best_prediction and return it) so both get_determination_details and
get_best_machine_prediction reuse the cached value and avoid duplicate scans;
update only the _best_prediction function (reference symbols: _best_prediction,
best_prediction_from_prefetch, get_determination_details,
get_best_machine_prediction) and ensure you cache None results as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/main/tests.py`:
- Around line 3049-3054: The test creates a Detection with bbox set to
normalized floats but the codebase expects absolute pixel coordinates; update
the Detection.objects.create call to supply integer pixel bbox values instead of
normalized floats—for example compute with the fixture image dimensions (e.g.,
[int(source_image.width*0.1), int(source_image.height*0.1),
int(source_image.width*0.2), int(source_image.height*0.2)]) or use explicit
pixel values like [10, 10, 20, 20] for the bbox field so the Detection.bbox
matches production assumptions.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2a4cdc38-93a6-40e4-883d-075bc1815960

📥 Commits

Reviewing files that changed from the base of the PR and between 454ec77 and 1f5918a.

📒 Files selected for processing (4)
  • ami/exports/format_types.py
  • ami/main/api/serializers.py
  • ami/main/models_future/occurrence.py
  • ami/main/tests.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • ami/main/models_future/occurrence.py

Comment thread ami/main/tests.py
Comment on lines +3049 to +3054
det = Detection.objects.create(
source_image=source_image,
occurrence=occurrence,
timestamp=source_image.timestamp,
bbox=[0.1, 0.1, 0.2, 0.2],
path=f"detections/inflated_{i}.jpg",

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use pixel-space bbox values in this detection fixture.

At Line 3053, the bbox uses normalized-looking floats. This repo treats Detection.bbox as absolute pixel coordinates, so using integer pixel values here will keep fixtures consistent with production assumptions.

Suggested fix
         for i in range(8):
             det = Detection.objects.create(
                 source_image=source_image,
                 occurrence=occurrence,
                 timestamp=source_image.timestamp,
-                bbox=[0.1, 0.1, 0.2, 0.2],
+                bbox=[10, 10, 20, 20],
                 path=f"detections/inflated_{i}.jpg",
             )
Based on learnings: `Detection.bbox`/`BoundingBox` values in this repo are in absolute pixel coordinates, not normalized floats.
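The dimension-derived variant the comment describes can be sketched as a small helper. This is hypothetical, not a function in the repo, and the width/height values below are made up:

```python
def pixel_bbox(width, height, x1, y1, x2, y2):
    """Convert normalized corners to the absolute pixel bbox this repo
    expects for Detection.bbox (illustrative helper, not repo code)."""
    return [int(width * x1), int(height * y1), int(width * x2), int(height * y2)]

# e.g. a 640x480 fixture image with normalized corners (0.1, 0.1)-(0.2, 0.2)
assert pixel_bbox(640, 480, 0.1, 0.1, 0.2, 0.2) == [64, 48, 128, 96]
```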

@mihow
Collaborator Author

mihow commented Apr 30, 2026

Staging A/B benchmark

Side-by-side run against arctia.dev (this PR deployed) vs serbia.dev (baseline, similar infra), Vermont Atlas of Life project 5 (5,075 occurrences). Tooling: plain curl + xargs.

Single-request latency (5 runs each, varied offset to bypass cachalot)

| limit | arctia (PR) | serbia (baseline) | speedup |
| --- | --- | --- | --- |
| 5 | 0.63s avg | 1.30s avg | 2.06× |
| 25 | 1.63s avg | 3.91s avg | 2.39× |
| 100 | 6.06s avg | 12.17s avg | 2.01× |

The wall-clock speedup is smaller than the 9.1× query-count reduction at limit=25 — per-request fixed costs (TLS, auth, project filter, presigned URL generation) don't scale with query count, so they dominate the residual.
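Under that assumption, the two measured averages pin down the split. Treat the baseline average as fixed cost f plus query-bound cost q, and assume only q shrinks by the 9.1× query reduction (a back-of-envelope model, not a measurement):

```python
# Solve: f + q = 3.91 (baseline avg), f + q / 9.1 = 1.63 (PR avg)
before, after, reduction = 3.91, 1.63, 9.1
q = (before - after) / (1 - 1 / reduction)  # query-bound share of the baseline
f = before - q                              # fixed per-request share
print(f"query-bound ~{q:.2f}s, fixed ~{f:.2f}s")  # → query-bound ~2.56s, fixed ~1.35s
```

A ~1.35s floor of fixed cost caps the achievable wall-clock speedup at ~2.9× no matter how many queries are removed, which is consistent with the 2.0–2.4× observed.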

Detail endpoint (3 runs)

| arctia (PR) | serbia (baseline) | speedup |
| --- | --- | --- |
| 0.33s avg | 0.62s avg | 1.9× |

Concurrent load (10 in flight, 30 total, limit=25)

| metric | arctia (PR) | serbia (baseline) | speedup |
| --- | --- | --- | --- |
| wall | 17.2s | 47.8s | 2.78× |
| p50 | 5.92s | 14.80s | 2.50× |
| p95 | 6.51s | 21.29s | 3.27× |
| p99 | 6.54s | 21.68s | 3.31× |

Tail widens under load — the PR holds steady, the baseline blows out at p95/p99.

Worker-pool unblocking — /captures/?project_id=5&limit=10 while 8 /occurrences/?limit=25 are in flight

| env | captures baseline | captures under occurrence load |
| --- | --- | --- |
| arctia (PR) | 0.32–0.50s | 1.46–2.37s |
| serbia (baseline) | 0.91–1.66s | 3.50–5.72s |

Captures under occurrence load on the PR (~2s) is in the same range as captures with no occurrence load on the baseline (~1.2s). Workers free up ~3× faster, so the "every page feels frozen" symptom under multi-reviewer use should largely go away.

Other view A/B (PR doesn't change these — sanity check that nothing regressed)

| view | arctia (PR) | serbia (baseline) |
| --- | --- | --- |
| captures | 0.40s | 0.89s |
| taxa | 0.27s | 0.44s |
| events | 0.45s | 0.90s |
| deployments | 0.32s | 0.48s |
| jobs | 0.64s | 1.10s |
| sourceimages | 0.27s | 0.26s |

sourceimages is identical on both → confirms baseline path. Other deltas are infra/cache noise (sequential probes, no shared load); the PR doesn't touch these viewsets.

Production sizing reference (no benchmark run on prod, just for scale)

api.antenna.insectai.org Vermont Atlas of Life (project 18, 31,741 occurrences):

| limit | latency (no PR) |
| --- | --- |
| 5 | 1.25s avg |
| 25 | 2.24s avg |
| 100 | 6.07s avg |

Post-deploy: based on the staging multipliers, expect roughly halved latency at limit=25 (~1.0s) and ~2× at limit=5 (~0.6s). Larger pages benefit less in absolute terms because per-row serializer cost dominates above the N+1 ceiling.

Correctness

  • Top-level keys identical on PR vs baseline ✓
  • detection_images length = 1 in list response on both (this dataset has 1 detection per occurrence; cap not visible)
  • Detail content diff shows only env-specific URLs and a re-run classification (different id/score/created_at) — expected, since arctia ran ML again
  • Identification create flow exercised in UI on arctia (occurrences 50781, 50804 identified live, no 500) — confirms strict prefetch contract holds under POST
  • Occurrences-by-collection JSON export triggered for 7,090 records — validates JSONExporter.get_queryset() strict-prefetch wiring

Still pending

  • Soak window on AMI-PLATFORM-API-BMY (the original 34–40s cold-response Sentry issue) — needs ~30 min of real reviewer load on arctia
  • Production benchmark on api.antenna.insectai.org after merge

@mihow
Collaborator Author

mihow commented Apr 30, 2026

Production load test (api.antenna.insectai.org, project 18 = 31,741 occurrences)

Light concurrency, no concurrent users at probe time.

| metric | 4 in flight, 12 total, limit=25 |
| --- | --- |
| wall | 14.7s |
| avg | 4.33s |
| p50 | 4.60s |
| p95 | 6.54s |
| max | 6.60s |

Prod baseline holds up better than serbia.dev despite 6× more data — likely bigger gunicorn pool / beefier infra. But p95 still hits 6.5s under modest 4-concurrent load. Under real multi-reviewer concurrency, expect the same tail widening that we see on serbia → arctia (3.3× p95 gap), so post-deploy p95 should drop into the ~2s range.

Export verification status

  • occurrences_simple_csv export on arctia completed successfully (7,090 records, file generated). However, CSV export uses OccurrenceTabularSerializer, which doesn't go through the strict prefetch contract — this proves the CSV path wasn't broken, but does NOT validate the JSONExporter.get_queryset() fix in 8f7db77.
  • An occurrences_simple_json export still needs to be triggered on arctia to exercise OccurrenceExportSerializer → strict-prefetch helpers under real export workload.

Reusable curl + xargs runner that probes the project's occurrence count,
then issues N parallel requests at varied offsets to bypass cachalot.
Reports wall, avg, min, p50, p95, p99, max, and per-status error counts.

Used to A/B compare staging deployments (arctia vs serbia) and to
characterize prod baseline for PR #1274. Re-run after deploy to verify
the speedup against prod's pre-merge numbers in the PR thread.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow
Collaborator Author

mihow commented Apr 30, 2026

Production load test — scaled

Runs against https://api.antenna.insectai.org/api/v2/occurrences/?project_id=18 (Vermont Atlas of Life, 31,741 occurrences). All 200s except where noted. Test reproducer: scripts/benchmark_occurrences_list.sh (committed in cbae3fc).

| concurrency | total | limit | wall | avg | p50 | p95 | p99 | max | errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4 | 12 | 25 | 14.7s | 4.33s | 4.60s | 6.54s | | 6.60s | 0/12 |
| 10 | 30 | 25 | 13.9s | 3.79s | 3.93s | 5.28s | 5.32s | 5.32s | 0/30 |
| 20 | 60 | 25 | 27.1s | 7.70s | 8.11s | 13.50s | 13.98s | 14.00s | 4/60 (502 Bad Gateway) |
| 10 | 30 | 100 | 90.4s | 27.13s | 13.93s | 57.00s | 58.62s | 59.60s | 0/30 |

Findings:

  • At limit=25, prod stays usable up to ~10 concurrent (~5s p95). At 20 concurrent it saturates the gunicorn pool — 4 requests returned 502 after ~2.7s (ALB-level timeout / backpressure).
  • At limit=100, prod is already in trouble at 10 concurrent: p95 = 57s, max = 59.6s — i.e. hitting the gunicorn worker timeout. This is exactly the original Sentry symptom (AMI-PLATFORM-API-BMY, 34–40s cold responses).

Inferred post-merge expectation, based on the staging multipliers (~3.3× p95 reduction at limit=25 under concurrent load):

  • 10 concurrent / limit=25: prod p95 ~5.3s → ~1.6s
  • 20 concurrent / limit=25: 502s should disappear (workers free up faster, no saturation)
  • 10 concurrent / limit=100: prod p95 ~57s → ~17–25s (still slow, but below worker timeout — list with limit=100 is rare in the UI anyway)
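Those projections follow from applying the staging p95 multiplier to the measured prod p95 values. They are extrapolations, not measurements, and the limit=100 figure lands at the optimistic end of the stated 17–25s range:

```python
# Apply the ~3.3x staging p95 reduction to the measured prod p95 values.
multiplier = 3.3
prod_p95 = {"10x25": 5.28, "20x60": 13.50, "10x100": 57.00}  # seconds, from the table above
projected = {k: round(v / multiplier, 1) for k, v in prod_p95.items()}
print(projected)  # → {'10x25': 1.6, '20x60': 4.1, '10x100': 17.3}
```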

Reproducer

# Pre-merge prod baseline
./scripts/benchmark_occurrences_list.sh https://api.antenna.insectai.org 18 10 30 25
./scripts/benchmark_occurrences_list.sh https://api.antenna.insectai.org 18 20 60 25
./scripts/benchmark_occurrences_list.sh https://api.antenna.insectai.org 18 10 30 100

# Post-merge prod re-run (same exact commands)
# Compare against the table above.

The script:

  • Probes count for the project so the random offset stays in range
  • Uses RANDOM % MAX_OFFSET to bypass django-cachalot per request
  • Reports avg / min / p50 / p95 / p99 / max plus per-status error counts
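The nearest-rank percentile math behind those p50/p95/p99 numbers can be sketched in Python (the committed shell script is the artifact of record; this helper is only illustrative, and the sample latencies are made up):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are <= it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

latencies = [3.93, 3.79, 5.28, 4.10, 5.32, 3.50]  # fabricated example samples
assert percentile(latencies, 50) == 3.93
assert percentile(latencies, 95) == 5.32
```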

…rings

Simplify the prefetch contract surface introduced in this branch:

- Drop _best_prediction / _best_identification thin wrappers in
  OccurrenceListSerializer; inline best_*_from_prefetch calls.
- Drop prefetches_for_list_serializer / prefetches_for_detail_serializer
  1-line list wrappers; with_list_prefetches / with_detail_prefetches now
  call the Prefetch factories directly.
- Compress _require_prefetch and best_*_from_prefetch docstrings; keep the
  strict-prefetch behavior and the SQL/Python null-handling note as a
  single line each.
- OccurrenceSerializer.detection_images_limit = 100 (was None) — bound
  detail responses from the start; pagination is the long-term answer.

Net -47 lines, no behavior change beyond the detail detection_images cap.

Co-Authored-By: Claude <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

OccurrenceViewSet list: N+1 queries on detections, best_prediction, best_identification
