fix: dedupe /api/projects across providers #5

Merged
0bserver07 merged 3 commits into main from fix/project-list-dedup on Apr 25, 2026
Conversation

@0bserver07
Owner

Problem

A project used through both Claude and Codex appears twice in the projects list. Sorting by Est. Cost looks broken because the two duplicate rows carry different stats: one row has the Claude cost, the other has the Codex cost or $0.

Reproduced live: 166 total entries, with 7 `dir_name`s appearing more than once (including SutroYaro, chimera, and StackUnderflow).

Cause

The `projects` schema has `UNIQUE (provider, slug)`, so the same slug can legitimately exist once per provider. The Codex adapter (added in v0.3.1) registered the same projects under `provider='codex'`, producing a second row per slug. The `/api/projects` endpoint passes them through as-is.

Fix

Group `project_rows` by `slug` in `routes/projects.py` and merge:

  • `last_modified` = max
  • `first_seen` = min
  • `file_count` = sum
  • `total_cost` / tokens / commands = sum
  • `avg_tokens_per_command` / `avg_steps_per_command` = weighted by command count
  • `first_message_date` = min, `last_message_date` = max

Schema unchanged. Presentation-layer fix only.
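
The merge rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `routes/projects.py`: the row shape (plain dicts), the `total_commands` and `providers` keys, and the function name `merge_project_rows` are assumptions made for the example, and only a subset of the merged fields is shown.

```python
from collections import defaultdict


def merge_project_rows(project_rows):
    """Collapse per-provider rows that share a slug into one merged row.

    Hypothetical helper: field names are assumed, not taken from the repo.
    """
    groups = defaultdict(list)
    for row in project_rows:
        groups[row["slug"]].append(row)

    merged = []
    for slug, rows in groups.items():
        commands = sum(r["total_commands"] for r in rows)
        # Weight each provider's average by its command count so a provider
        # with few commands does not skew the merged average.
        avg_tokens = (
            sum(r["avg_tokens_per_command"] * r["total_commands"] for r in rows)
            / commands
            if commands
            else 0.0
        )
        merged.append(
            {
                "slug": slug,
                "providers": sorted(r["provider"] for r in rows),
                # Assumes timestamps are comparable (numbers or ISO strings).
                "last_modified": max(r["last_modified"] for r in rows),
                "first_seen": min(r["first_seen"] for r in rows),
                "file_count": sum(r["file_count"] for r in rows),
                "total_cost": sum(r["total_cost"] for r in rows),
                "total_commands": commands,
                "avg_tokens_per_command": avg_tokens,
            }
        )
    return merged
```

A plain mean of the two per-provider averages would overweight the provider with fewer commands, which is why the averages are weighted by command count rather than summed or averaged directly.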

Verification

  • 166 → 159 entries, 0 duplicate slugs
  • chimera total_cost: $7,379 (claude only) → $7,414 (claude + codex)
  • 420 backend tests still pass
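
A duplicate count like the one above can be checked with a few lines. The sketch below assumes the endpoint's response has already been loaded as a list of dicts with a `slug` key; `duplicate_slugs` is a hypothetical name, not a helper from the repository.

```python
from collections import Counter


def duplicate_slugs(project_rows):
    """Return the slugs that appear in more than one row, sorted."""
    counts = Counter(row["slug"] for row in project_rows)
    return sorted(slug for slug, n in counts.items() if n > 1)
```

An empty list after the fix corresponds to the "0 duplicate slugs" result.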

🤖 Generated with Claude Code

0bserver07 and others added 3 commits April 25, 2026 13:55
Schema has UNIQUE(provider, slug) so a project used through both Claude
and Codex got two rows. Frontend rendered them as separate projects with
the same dir_name, breaking sort and showing duplicates (e.g. SutroYaro
once with $4645, again with $0).

Group rows by slug in /api/projects, merge stats additively (sum tokens
/ commands / cost; min first_message_date; max last_message_date;
weighted-mean averages by command count). Schema unchanged — fix is
presentation-layer only.

Verified on this machine: 166 → 159 projects, 7 → 0 duplicate slugs.
chimera total_cost: $7,379 (claude only) → $7,414 (claude + codex).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0bserver07 0bserver07 merged commit 43fb7db into main Apr 25, 2026
8 checks passed
@0bserver07 0bserver07 deleted the fix/project-list-dedup branch April 25, 2026 18:03
0bserver07 added a commit that referenced this pull request May 20, 2026
* fix(api): dedupe /api/projects across providers (claude + codex)

Schema has UNIQUE(provider, slug) so a project used through both Claude
and Codex got two rows. Frontend rendered them as separate projects with
the same dir_name, breaking sort and showing duplicates (e.g. SutroYaro
once with $4645, again with $0).

Group rows by slug in /api/projects, merge stats additively (sum tokens
/ commands / cost; min first_message_date; max last_message_date;
weighted-mean averages by command count). Schema unchanged — fix is
presentation-layer only.

Verified on this machine: 166 → 159 projects, 7 → 0 duplicate slugs.
chimera total_cost: $7,379 (claude only) → $7,414 (claude + codex).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: 0.3.3 — dedupe projects across providers

* chore: bump version to 0.3.3

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0bserver07 added a commit that referenced this pull request May 20, 2026
Ships HANDOFF §"What's left" #5: a UNION-ALL VIEW over per-month
``messages_YYYYMM`` partition tables behind the existing ``messages``
name. Future-proofs the store at multi-year scale without touching
read code.
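
To illustrate the idea of a UNION-ALL view sitting behind the old table name, here is a minimal SQLite sketch. The column set (`id`, `timestamp`, `body`) and the function name `open_partitioned_store` are assumptions for the example; the real `messages` schema and migration are not shown in this commit message.

```python
import sqlite3


def open_partitioned_store():
    """Build an in-memory DB with two month partitions behind a `messages` view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical column set; the real table has more columns.
        CREATE TABLE messages_202604 (id INTEGER PRIMARY KEY, timestamp TEXT, body TEXT);
        CREATE TABLE messages_202605 (id INTEGER PRIMARY KEY, timestamp TEXT, body TEXT);

        -- Readers keep querying `messages`; the view stitches partitions together.
        CREATE VIEW messages AS
            SELECT id, timestamp, body FROM messages_202604
            UNION ALL
            SELECT id, timestamp, body FROM messages_202605;
        """
    )
    conn.execute("INSERT INTO messages_202604 VALUES (1, '2026-04-25T13:55:00', 'a')")
    conn.execute("INSERT INTO messages_202605 VALUES (2, '2026-05-20T09:00:00', 'b')")
    return conn
```

A query such as `SELECT id, body FROM messages ORDER BY id` then reads across both partitions without the caller knowing they exist, which is what lets read code stay untouched.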

What lands
----------
* ``store/migrations/v008_messages_partitioning.py`` — idempotent .py
  migration. Discovers existing months by ``substr(timestamp, 1, 7)``,
  splits rows into ``messages_YYYYMM`` (or ``messages_unknown`` for
  malformed timestamps), rebuilds ``usage_events`` to drop the FK on
  ``source_message_fk`` (FKs to a view aren't enforceable), drops the
  base ``messages`` table, recreates it as a UNION-ALL view, and
  installs ``_messages_id_seq`` + an INSTEAD OF trigger.
* ``ingest/writer.py`` — ``_partition_for(ts)`` routes inserts to the
  right partition; ``_ensure_partition`` lazily creates new month
  tables + rebuilds the view + trigger. The INSTEAD OF trigger is the
  slow path (raw ``INSERT INTO messages`` from tests / tooling);
  production writes bypass it.
* ``docs/specs/messages-partitioning.md`` — design choice (Option A
  view, not Option B ATTACH), rollback plan, ops rollout for the
  maintainer's 1.9 GB store.
* ``tests/stackunderflow/store/test_partitioning.py`` — 12 tests
  cover migration on fresh + seeded DBs, FK-drop verification, writer
  routing across months, future-month auto-creation, malformed-ts
  routing, normalize hook end-to-end, backfill end-to-end, the
  trigger's explicit-id path.
* Spot fixes to existing tests where ``cur.lastrowid`` after
  ``INSERT INTO messages`` is now meaningless (the trigger's nested
  insert id doesn't propagate); they read from ``_messages_id_seq``
  instead.
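
The routing rule described above (month taken from the first seven characters of the timestamp, malformed values sent to ``messages_unknown``) might look roughly like this. It is a sketch under assumptions: `partition_for` is a stand-in, not the project's ``_partition_for``, and what counts as "malformed" here (anything that does not start with `YYYY-MM` with a valid month) is a guess.

```python
import re

# Matches a leading "YYYY-MM" with a month in 01..12.
_MONTH_RE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])")


def partition_for(ts):
    """Map an ISO-like timestamp string to a per-month partition table name."""
    if not isinstance(ts, str):
        return "messages_unknown"
    match = _MONTH_RE.match(ts)
    if match is None:
        # Assumed fallback for timestamps that cannot be bucketed by month.
        return "messages_unknown"
    return f"messages_{match.group(1)}{match.group(2)}"
```

Deriving the table name from a validated pattern, rather than interpolating the raw timestamp, also keeps arbitrary input out of the generated SQL identifiers.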

Constraint compliance
---------------------
* Did NOT touch ``~/.stackunderflow/store.db``. Migration is reviewed
  + applied manually by the maintainer per the spec doc.
* Pre: 1598 passing, 2 skipped, 11 deselected.
* Post: 1610 passing (12 new partition tests), 2 skipped, 11
  deselected.
* ``ruff check`` clean on the new files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
0bserver07 added a commit that referenced this pull request May 20, 2026
HANDOFF #5 asked whether /api/cost-data's command_costs block could
migrate from the aggregator to command_mart, mirroring the Wave 5
tool_costs migration. Investigation confirms the shape mismatch the
HANDOFF flagged is structural, not stale:

- aggregator: list of per-Interaction rows (interaction_id, session_id,
  prompt_preview, timestamp, tools_used, steps, models_used, had_error,
  cost, tokens), top 50 desc by cost
- command_mart: (day, project_id, command_name) rollup with
  {event_count, cost_usd, tokens_in, tokens_out, session_count}

command_mart_for_project returns sums over command_name — the helper
is already wired and feeds reports/optimize.py + the CLI report
command. It is NOT a drop-in source for this route's response shape
because the mart's grain discards the per-Interaction fields the
frontend's CommandCostList (CommandCost[] in analytics.ts) reads.
Extending the helper cannot recover what the grain doesn't store.
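
The grain argument can be made concrete with a small sketch. It assumes each interaction dict carries `project_id` and `command_name` keys alongside the fields listed above, which this commit message does not confirm, and `rollup_by_command` is a hypothetical name rather than the project's mart builder.

```python
from collections import defaultdict


def rollup_by_command(interactions):
    """Aggregate per-interaction rows to a (day, project_id, command_name) grain."""
    buckets = defaultdict(
        lambda: {"event_count": 0, "cost_usd": 0.0, "sessions": set()}
    )
    for item in interactions:
        day = item["timestamp"][:10]  # assumes ISO timestamps, "YYYY-MM-DD..."
        bucket = buckets[(day, item["project_id"], item["command_name"])]
        bucket["event_count"] += 1
        bucket["cost_usd"] += item["cost"]
        bucket["sessions"].add(item["session_id"])

    # Only sums and counts survive; interaction_id, prompt_preview, and the
    # other per-interaction fields are gone once rows are folded together.
    return {
        key: {
            "event_count": b["event_count"],
            "cost_usd": b["cost_usd"],
            "session_count": len(b["sessions"]),
        }
        for key, b in buckets.items()
    }
```

Once two interactions land in the same bucket, there is no way to recover which one had the higher cost or what its prompt preview was, so a top-50-by-cost list cannot be rebuilt from the rollup.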

Changes:
- routes/cost.py: expand the _overlay_mart_rollups docstring to spell
  out the structural reason command_costs stays aggregator-driven
- tests/stackunderflow/routes/test_cost_command_mart_overlay.py: three
  new tests lock in the verified behaviour — populated command_mart
  does not swap out aggregator output, empty command_mart does the
  same, and the helper's rollup shape is asserted (no per-Interaction
  fields)

A future per-Interaction-grain mart (e.g. interaction_mart) could
power this overlay; the new tests will need updating then.

Tests: 2312 → 2315 passing (+3). ruff baseline preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>