Bug: CitationRegistry global state causes cross-request citation contamination #394

@CrepuscularIRIS

Bug Description

CitationRegistry stores its state in a class-level dictionary (_instances) that is shared across all requests. init_citation_registry() calls CitationRegistry.reset(), which replaces that dictionary and so wipes the state of every concurrent session. When two requests run the init_citation_registry → assign_citation_ids_stateful pipeline at the same time, one request's reset destroys the other's in-progress state, producing corrupted or swapped citation IDs in responses.

Location

servers/custom/src/custom.py, ~line 405:

class CitationRegistry:
    _instances: Dict[int, Dict[str, Any]] = {}  # class-level, shared across all requests

    @classmethod
    def reset(cls):
        cls._instances = {}                      # wipes state for ALL concurrent sessions

~line 435:

@app.tool(output="q_ls->q_ls")
def init_citation_registry(q_ls: List[str]) -> Dict[str, Any]:
    CitationRegistry.reset()                     # global reset triggered per request
    return {"q_ls": q_ls}

Reproduction

  1. Send two concurrent requests that both invoke init_citation_registry followed by assign_citation_ids_stateful.
  2. Request A calls reset() while Request B is mid-way through assign_citation_ids_stateful.
  3. Request B's accumulated citations are wiped; it returns citation IDs starting from 1 for documents it had already assigned higher IDs.
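
The interleaving above can be forced deterministically with two threads. This is a minimal sketch, not the project's code: SharedRegistry, its assign method, and the registry key 0 are hypothetical stand-ins that only mirror the class-level _instances dict and reset() shown under Location.

```python
import threading
from typing import Any, Dict, List


class SharedRegistry:
    """Hypothetical stand-in for CitationRegistry: state lives on the class."""

    _instances: Dict[int, Dict[str, Any]] = {}

    @classmethod
    def reset(cls) -> None:
        cls._instances = {}  # same global wipe as in the report

    @classmethod
    def assign(cls, key: int, doc: str) -> int:
        # Assumed behavior: hand out the next sequential ID per document.
        state = cls._instances.setdefault(key, {})
        if doc not in state:
            state[doc] = len(state) + 1
        return state[doc]


def run_demo() -> List[int]:
    b_midway = threading.Event()      # set once Request B has assigned two IDs
    a_reset_done = threading.Event()  # set once Request A has called reset()
    ids_b: List[int] = []

    def request_b() -> None:
        SharedRegistry.reset()
        ids_b.append(SharedRegistry.assign(0, "doc-a"))
        ids_b.append(SharedRegistry.assign(0, "doc-b"))
        b_midway.set()
        a_reset_done.wait()
        # Request B's state is gone by now, so numbering restarts.
        ids_b.append(SharedRegistry.assign(0, "doc-c"))

    def request_a() -> None:
        b_midway.wait()
        SharedRegistry.reset()  # Request A's init wipes Request B's state
        a_reset_done.set()

    threads = [threading.Thread(target=request_b),
               threading.Thread(target=request_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids_b


print(run_demo())  # doc-c reuses ID 1 instead of getting 3
```

With isolated state, Request B would have assigned doc-c the ID 3; here it collides with doc-a's ID.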

Impact

Users receive incorrect citation numbers in answers, so documents are cited under the wrong IDs. In multi-tenant deployments this is also a cross-session isolation failure: one user's request can reset another user's citation state.

Suggested Fix

Scope registry state per request using a unique session/request ID rather than global class state:

def init_citation_registry(q_ls: List[str], request_id: str) -> Dict[str, Any]:
    # Reset only this request's entry; _instances would then be keyed by str, not int.
    CitationRegistry._instances[request_id] = {}
    return {"q_ls": q_ls, "request_id": request_id}

Or pass a fresh CitationRegistry instance through the pipeline context instead of using class-level storage.
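
The second option could look roughly like the sketch below. RequestCitationRegistry, its assign method, and handle_request are hypothetical names, and the sequential-ID behavior is an assumption; how the instance would actually be threaded through the pipeline context depends on the framework and is not shown.

```python
from typing import Dict, List


class RequestCitationRegistry:
    """Hypothetical per-request registry: all state lives on the instance."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def assign(self, doc: str) -> int:
        # Assumed behavior: reuse an existing ID, otherwise hand out the next one.
        if doc not in self._ids:
            self._ids[doc] = len(self._ids) + 1
        return self._ids[doc]


def handle_request(docs: List[str]) -> List[int]:
    # Each request builds its own registry, so no reset() is needed
    # and nothing is shared with other requests.
    registry = RequestCitationRegistry()
    return [registry.assign(doc) for doc in docs]


# Interleaved use of two registries: creating reg_a does not disturb reg_b.
reg_b = RequestCitationRegistry()
reg_b.assign("doc-a")
reg_b.assign("doc-b")
reg_a = RequestCitationRegistry()  # plays the role of another request's init
print(reg_b.assign("doc-c"))       # numbering continues for reg_b
```

Because there is no class-level state left to wipe, a concurrent request starting up cannot affect IDs already assigned elsewhere.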


Found via automated codebase analysis. Happy to submit a PR if this is confirmed.
