Skip to content

mapped-sources endpoint extremely slow on CIEL (~38s) #2476

@paynejd

Description

@paynejd

Problem

GET /orgs/CIEL/sources/CIEL/latest/mapped-sources/ takes ~38 seconds for a single page.

Example: https://api.openconceptlab.org/orgs/CIEL/sources/CIEL/latest/mapped-sources/?excludeSelf=false&brief=true&limit=25&offset=25&page=2

Root cause

In core/sources/models.py:

def get_mapped_sources(self, exclude_self=True):
    source_ids = self.__get_mapped_source_ids()
    ...
    return Source.objects.filter(id__in=source_ids)

def __get_mapped_source_ids(self):
    return self.mappings.values_list('to_source_id', flat=True)

self.mappings traverses a M2M (Mapping.sources = M2M('Source')). For CIEL latest, this yields one row per mapping in the version — hundreds of thousands of rows — with no DISTINCT. Postgres can't collapse the subquery early, so the id__in=(...) plan scans the full M2M join.

DRF pagination compounds this: the subquery is evaluated twice (once for COUNT(*), once for the page).

The actual result set is only a few dozen distinct sources.

Proposed fix

Rewrite get_mapped_sources as a direct join with DISTINCT, so Postgres plans it once and uses the mappings_mapping.to_source_id + M2M indexes:

def get_mapped_sources(self, exclude_self=True):
    qs = Source.objects.filter(mappings_to__sources=self).distinct()
    if exclude_self:
        qs = qs.exclude(id=self.id)
    return qs

Expected: 38s → sub-second.

Labels

type/bug

Metadata

Metadata

Assignees

No one assigned

    Labels

    signal/small-scopeLimited to a small part of the codebasestage/triagedAI triage complete — scored and classifiedtype/bugSomething is broken

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions