Merged
30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,36 @@ All notable changes to Iris are documented here. The format is based on [Keep a

---

## v1.2.0 — Merge Strategy detection + mergeCommit ingestion (2026-06-03)

### Added

- **Merge Strategy metric** (#76). New engine module
`analysis/merge_strategy_detector.py` classifies each repository's
dominant merge strategy (`merge` / `squash` / `rebase` / `mixed` /
`unknown`) from its merged PRs, and emits `merge_strategy`,
`merge_strategy_dominant_share`, and a `commit_metrics_reliable` flag
(False for squash/mixed, where collapsing commits makes per-commit
metrics approximate). Classification combines merge-commit ground truth
(parent count), commit-ref presence in `main`, and the GitHub squash
`(#N)` subject stamp. Strictly per-repository — no author axis.
- Wired through the full chain: schema, aggregator, report writer (Merge
Strategy section), narrative finding, i18n (en + pt-br), TypeScript
types, platform UI (repo-detail reliability badge + compare-table
column), ingest route, migration `019`, and `docs/METRICS.md`.

### Changed

- **PR ingestion** (#75) now captures `merge_commit_sha` +
`merge_commit_parent_count` on `PullRequest` and `subject` on
`CommitRef`. `github_reader` adds `mergeCommit` to the gh field lists
and fetches `mergeCommit{oid parents{totalCount}}` plus per-commit
`messageHeadline` via the light GraphQL enrichment pass. The data is
the ground-truth enabler for Merge Strategy detection; backward
compatible (fields default to `None` / `""`).

---

## v1.1.0 — Human Review Coverage + sortable compare table (2026-05-29)

### Added
62 changes: 61 additions & 1 deletion docs/METRICS.md
@@ -836,7 +836,66 @@ Findings emitted by `narrative.py` (see `iris/i18n.py:finding_open_pr_aging_*`):

---

-## 28. Adoption timeline (post-report, not on `ReportMetrics`)
+## 28. Merge Strategy

Per-**repository** classification of how PRs land on the default branch,
plus a per-commit reliability flag. A repo's merge strategy decides how
much per-commit signal survives in history: **squash** collapses N commits
into 1 — discarding commit counts, temporal distribution, bursts, cascades,
and (depending on GitHub config) the `Co-Authored-By` trailers AI
attribution relies on. Comparing per-commit metrics across repos with
different strategies compares things that aren't comparable, and can
under-report AI adoption in squash repos.

| Field | Unit | Source | Nullable when |
|---|---|---|---|
| `merge_strategy` | `merge\|squash\|rebase\|mixed\|unknown` | `analysis/merge_strategy_detector.py` | no PR data (`prs` empty/None) |
| `merge_strategy_dominant_share` | float `0.0–1.0` | same | strategy is `unknown` |
| `commit_metrics_reliable` | bool | same | no PR data |

Per-PR classification (over **merged** PRs only), in confidence order:

1. **Ground truth** — `merge_commit_parent_count == 2` → `merge` (true
   merge commit). Parent count comes from PR ingestion
   (`merge_commit_sha` + parent count, issue #75); available on the
   GraphQL enrichment path, `None` on the gh one-shot path → falls through
   to the heuristic.
2. **Commit-ref presence in local `main` history** (the `commits` window):
   - all of a PR's `commit_refs` present → `merge` (commits landed verbatim).
   - none present → collapsed/rewritten. A `main` subject carrying
     GitHub's default squash stamp `(#<pr-number>)` → `squash`; else a
     single original commit → `squash`; else N commits, none preserved →
     `rebase`.
3. Ambiguous (partial presence, or no `commit_refs` and no `(#N)` stamp) →
   that PR is `unknown` and excluded from the dominant computation. A PR
   with no `commit_refs` whose number is stamped on a `main` subject still
   counts as `squash`.
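
The per-PR rules above can be sketched on toy data. This is a minimal illustration, not the module itself: `ToyPR`, `classify`, and the sample hashes and subjects are hypothetical stand-ins, and the real `PullRequest` model may carry its commit refs differently.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Trailing "(#N)" stamp from GitHub's default squash commit title.
SQUASH_STAMP = re.compile(r"\(#(\d+)\)\s*$")


@dataclass
class ToyPR:
    """Hypothetical stand-in for a merged PullRequest (illustration only)."""
    number: int
    commit_refs: list = field(default_factory=list)  # hashes of the PR's commits
    merge_commit_parent_count: Optional[int] = None


def classify(pr, main_hashes, main_subjects):
    """Return merge / squash / rebase / unknown for one merged PR."""
    if pr.merge_commit_parent_count == 2:  # step 1: true merge commit
        return "merge"
    # PR numbers stamped on main-branch subjects.
    stamped = set()
    for subject in main_subjects:
        match = SQUASH_STAMP.search(subject)
        if match:
            stamped.add(int(match.group(1)))
    refs = pr.commit_refs
    present = sum(1 for h in refs if h in main_hashes)
    if refs and present == len(refs):  # step 2: every commit landed verbatim
        return "merge"
    if refs and present == 0:  # step 2: no original commit survived
        if pr.number in stamped or len(refs) == 1:
            return "squash"
        return "rebase"
    if not refs and pr.number in stamped:  # the stamp alone still signals squash
        return "squash"
    return "unknown"  # step 3: partial presence or no signal


main_hashes = {"a1", "a2"}
main_subjects = ["Add cache layer (#12)", "Fix typo in README"]

print(classify(ToyPR(10, ["a1", "a2"]), main_hashes, main_subjects))        # merge
print(classify(ToyPR(12, ["c1", "c2", "c3"]), main_hashes, main_subjects))  # squash
print(classify(ToyPR(14, ["e1", "e2"]), main_hashes, main_subjects))        # rebase
print(classify(ToyPR(15, ["a1", "zz"]), main_hashes, main_subjects))        # unknown
```

PR 12 is classified as `squash` only because its number is stamped on a `main` subject; without the stamp, three unpreserved commits would read as `rebase`, as PR 14 does.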

Aggregation: the dominant strategy over **classified** merged PRs.
`dominant_share ≥ 0.8` → that strategy; below → `mixed`; fewer than
`MIN_CLASSIFIED_PRS` (5) classified → `unknown` (reason logged, never
invented). `commit_metrics_reliable` is `False` only for `squash`/`mixed`;
`merge`/`rebase`/`unknown` stay `True` (we never flag what we can't
determine).
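
A worked sketch of the aggregation rule, assuming the per-PR labels already exist. The function name `aggregate` and its tuple return shape are illustrative assumptions; the module itself returns a `MergeStrategyResult`.

```python
from collections import Counter

MIN_CLASSIFIED_PRS = 5
DOMINANT_SHARE_THRESHOLD = 0.8


def aggregate(labels):
    """Return (merge_strategy, dominant_share, commit_metrics_reliable)."""
    # "unknown" PRs are excluded from the dominant computation.
    counts = Counter(label for label in labels if label != "unknown")
    total = sum(counts.values())
    if total < MIN_CLASSIFIED_PRS:
        # Not enough signal: never flag what cannot be determined.
        return "unknown", None, True
    dominant, n = counts.most_common(1)[0]
    share = n / total
    strategy = dominant if share >= DOMINANT_SHARE_THRESHOLD else "mixed"
    reliable = strategy not in {"squash", "mixed"}
    return strategy, round(share, 3), reliable


print(aggregate(["squash"] * 9 + ["merge"]))       # ('squash', 0.9, False)
print(aggregate(["merge"] * 6 + ["squash"] * 4))   # ('mixed', 0.6, False)
print(aggregate(["rebase"] * 8 + ["merge"] * 2))   # ('rebase', 0.8, True)
print(aggregate(["merge"] * 3 + ["unknown"] * 10)) # ('unknown', None, True)
```

The last case has 13 merged PRs but only 3 classified ones, so it falls below `MIN_CLASSIFIED_PRS` and stays `unknown` with the reliability flag left `True`.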

Privacy / ranking risk (Principle #2): **none by construction.** Strictly
per-repository — a property of repo *configuration* (the merge button),
never of people. No author axis anywhere in the output. The report/UI
framing is "this config affects metric reliability", never "this team/dev
does X".

Findings emitted by `narrative.py` (`iris/i18n.py:finding_merge_strategy_*`):

- `finding_merge_strategy_unreliable` — when `commit_metrics_reliable is
False` (squash/mixed): per-commit metrics are approximate for this repo.
- `finding_merge_strategy_descriptive` — otherwise (merge/rebase): history
preserved. Silent when `merge_strategy == "unknown"`.
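
The selection between the two findings can be sketched as below. The helper name `pick_merge_strategy_finding` is hypothetical, and treating a missing strategy (`None`, no PR data) as silent is an assumption; only the two i18n keys come from the text above.

```python
from typing import Optional


def pick_merge_strategy_finding(
    merge_strategy: Optional[str],
    commit_metrics_reliable: Optional[bool],
) -> Optional[str]:
    """Return the i18n key of the finding to emit, or None to stay silent."""
    # Assumption: no PR data (None) is handled like "unknown", i.e. silent.
    if merge_strategy is None or merge_strategy == "unknown":
        return None
    if commit_metrics_reliable is False:  # squash / mixed
        return "finding_merge_strategy_unreliable"
    return "finding_merge_strategy_descriptive"  # merge / rebase


print(pick_merge_strategy_finding("squash", False))  # finding_merge_strategy_unreliable
print(pick_merge_strategy_finding("rebase", True))   # finding_merge_strategy_descriptive
print(pick_merge_strategy_finding("unknown", True))  # None
```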

Platform: indexed columns `merge_strategy` + `commit_metrics_reliable` on
`metrics` (migration `019`) feed the compare table; the full payload
carries `merge_strategy_dominant_share` for the repo-detail badge.

---

## 29. Adoption timeline (post-report, not on `ReportMetrics`)

When AI-assisted commits started appearing, and how the pre-adoption vs
post-adoption metrics compare.
@@ -936,6 +995,7 @@ By-origin attribution:
| `analysis/pr_lifecycle.py` | `pr_merged_count`, `pr_median_time_to_merge_hours`, `pr_mean_time_to_merge_hours`, `pr_p90_time_to_merge_hours`, `pr_pct_merged_within_24h`, `pr_cycle_time_buckets`, `pr_median_size_files`, `pr_median_size_lines`, `pr_review_rounds_median`, `pr_single_pass_rate` |
| `analysis/flow_load.py` | `flow_load` |
| `analysis/flow_efficiency.py` | `flow_efficiency_median`, `median_time_to_first_review_hours`, `time_in_phase_median_hours`, `flow_efficiency_by_intent`, `flow_efficiency_by_origin` |
| `analysis/merge_strategy_detector.py` | `merge_strategy`, `merge_strategy_dominant_share`, `commit_metrics_reliable` |
| `analysis/dora_real.py` | `dora_source`, `dora_deployments_total`, `dora_deployments_failed`, `dora_deployments_pending_evaluation`, `dora_incidents_total`, `dora_cfr`, `dora_mttr_per_deploy_seconds_median`, `dora_mttr_per_deploy_seconds_p90`, `dora_mttr_per_incident_seconds_median`, `dora_mttr_per_incident_seconds_p90`, `dora_rollback_rate`, `dora_rollbacks_total`, `dora_lead_time_seconds_median`, `dora_deploy_frequency_per_day`, `dora_remediation_distribution`, `dora_cfr_by_origin`, `dora_rollback_rate_by_origin`, `dora_cfr_by_origin_coverage_pct` |
| `analysis/duplicate_detector.py` | `duplicate_block_rate`, `duplicate_block_count`, `duplicate_median_block_size`, `duplicate_by_origin`, `duplicate_by_tool` |
| `analysis/move_detector.py` | `moved_code_pct`, `refactoring_ratio`, `move_by_origin` |
215 changes: 215 additions & 0 deletions iris/analysis/merge_strategy_detector.py
@@ -0,0 +1,215 @@
"""Merge Strategy detection — per-repository classification of how PRs land.

A repository's merge strategy determines how much per-commit signal
survives in history. Squash collapses N commits into 1, discarding commit
counts, temporal distribution, bursts/cascades, and — depending on the
GitHub config — the ``Co-Authored-By`` trailers that AI attribution relies
on. Comparing per-commit metrics across repos that use *different*
strategies is not a like-for-like comparison, and it can make attribution
lost to squashing look like genuinely low AI usage.

This module classifies a repo into one of ``{merge, squash, rebase, mixed,
unknown}`` from its merged PRs, and emits ``commit_metrics_reliable=False``
when the strategy (squash/mixed) erodes per-commit signal.

Privacy / ranking risk
----------------------
Strictly per-repository, by construction. The classification is a property
of repo *configuration* (the merge button), never of people. There is no
author axis anywhere in the output (Principle #2 / Non-Goal: no individual
ranking). The report/UI framing is always "this config affects metric
reliability", never "this team/dev does X".

Signals, in order of confidence
--------------------------------
1. **Ground truth** (when ``merge_commit_parent_count`` is available):
   - parent count ``== 2`` → the PR landed as a true merge commit → MERGE.
2. **Commit-ref presence in the local main history:**
   - *all* of the PR's ``commit_refs`` appear in main → the original
     commits landed verbatim (merge or fast-forward) → MERGE.
   - *none* of the PR's ``commit_refs`` appear in main → the commits were
     collapsed or rewritten:
     * GitHub's squash default stamps the landed commit subject with
       ``(#<pr-number>)`` — a main-branch subject carrying this PR's
       number corroborates SQUASH.
     * else a single original commit → SQUASH (1→1 collapse/rewrite).
     * else multiple original commits, none preserved → REBASE (replayed).
3. No ``commit_refs`` at all → SQUASH if the ``(#<pr-number>)`` stamp is
   present on a main-branch subject. Anything else ambiguous (partial
   presence, no refs and no stamp) → ``unknown`` for that PR; excluded
   from the dominant computation.

Aggregation
-----------
The dominant strategy over the *classified* merged PRs.
``dominant_share >= DOMINANT_SHARE_THRESHOLD`` → that strategy; below that
→ ``mixed``; fewer than ``MIN_CLASSIFIED_PRS`` classified → ``unknown``
(the reason is carried on the result and logged by the caller, never
invented).

Window / edge cases
-------------------
Uses the same ``commits`` window the rest of the pipeline runs on. A PR
merged near the window's start whose commits predate the window will read
as "refs absent" — the same window limitation Acceptance Rate lives with.
Repo with no merged PRs → ``unknown``. Open/closed-without-merge PRs are
ignored (they never landed).
"""

import re
from collections import defaultdict
from dataclasses import dataclass, field

from iris.models.commit import Commit
from iris.models.pull_request import PullRequest

# Minimum classifiable merged PRs before we trust a dominant strategy.
MIN_CLASSIFIED_PRS = 5

# Share of classified PRs the leading strategy must reach to "own" the repo;
# below it (with a real split) the repo is ``mixed``. Hypothesis pending
# calibration — see Principle #4 (metrics are hypotheses).
DOMINANT_SHARE_THRESHOLD = 0.8

# Strategies that erode per-commit signal — these flip commit_metrics_reliable.
_UNRELIABLE_STRATEGIES = frozenset({"squash", "mixed"})

# GitHub's squash default titles the landed commit "<PR title> (#<number>)".
_SQUASH_SUBJECT_RE = re.compile(r"\(#(\d+)\)\s*$")


@dataclass(frozen=True)
class MergeStrategyResult:
    """Per-repository merge-strategy classification.

    ``distribution`` and ``reason`` are diagnostics (not persisted to the
    metrics schema) — they let the aggregator log *why* a repo came back
    ``unknown`` instead of inventing a strategy.
    """

    merge_strategy: str  # merge | squash | rebase | mixed | unknown
    dominant_share: float | None  # None when unknown
    commit_metrics_reliable: bool
    classified_pr_count: int
    distribution: dict[str, int] = field(default_factory=dict)
    reason: str | None = None


def detect_merge_strategy(
    prs: list[PullRequest],
    commits: list[Commit],
    *,
    min_classified: int = MIN_CLASSIFIED_PRS,
) -> MergeStrategyResult:
    """Classify a repo's dominant merge strategy from its merged PRs.

    Args:
        prs: PRs from github_reader (any state — only merged ones count).
        commits: local main-branch commits from git_reader (the window the
            rest of the pipeline analyses). Used to test whether a PR's
            commit_refs landed verbatim and to read squash subject stamps.
        min_classified: minimum classifiable merged PRs before a dominant
            strategy is trusted; below it the repo is ``unknown``.

    Returns:
        ``MergeStrategyResult`` — always non-None (``unknown`` when there
        isn't enough signal). ``commit_metrics_reliable`` is False only for
        squash/mixed; merge/rebase/unknown stay True (we never flag what we
        can't determine).
    """
    merged = [pr for pr in prs if pr.state == "merged"]
    if not merged:
        return MergeStrategyResult(
            merge_strategy="unknown",
            dominant_share=None,
            commit_metrics_reliable=True,
            classified_pr_count=0,
            distribution={},
            reason="no merged PRs in window",
        )

    main_hashes = {c.hash for c in commits}
    squash_pr_numbers = _squash_pr_numbers(commits)

    distribution: dict[str, int] = defaultdict(int)
    for pr in merged:
        distribution[_classify_pr(pr, main_hashes, squash_pr_numbers)] += 1

    classified = {k: v for k, v in distribution.items() if k != "unknown"}
    classified_count = sum(classified.values())

    if classified_count < min_classified:
        return MergeStrategyResult(
            merge_strategy="unknown",
            dominant_share=None,
            commit_metrics_reliable=True,
            classified_pr_count=classified_count,
            distribution=dict(distribution),
            reason=(
                f"only {classified_count} classifiable merged PRs "
                f"(< {min_classified} required)"
            ),
        )

    dominant = max(classified, key=classified.get)
    dominant_share = round(classified[dominant] / classified_count, 3)
    strategy = dominant if dominant_share >= DOMINANT_SHARE_THRESHOLD else "mixed"

    return MergeStrategyResult(
        merge_strategy=strategy,
        dominant_share=dominant_share,
        commit_metrics_reliable=strategy not in _UNRELIABLE_STRATEGIES,
        classified_pr_count=classified_count,
        distribution=dict(distribution),
        reason=None,
    )


def _classify_pr(
    pr: PullRequest,
    main_hashes: set[str],
    squash_pr_numbers: set[int],
) -> str:
    """Classify a single merged PR. Returns merge/squash/rebase/unknown."""
    # 1. Ground truth: a true merge commit has two parents.
    if pr.merge_commit_parent_count == 2:
        return "merge"

    refs = [r.hash for r in pr.commit_refs]
    present = sum(1 for h in refs if h in main_hashes)

    # 2a. Every commit landed verbatim → merge / fast-forward (signal intact).
    if refs and present == len(refs):
        return "merge"

    squash_corroborated = pr.number in squash_pr_numbers

    # 2b. No original commit survived → collapsed or rewritten.
    if refs and present == 0:
        if squash_corroborated:
            return "squash"
        if len(refs) == 1:
            return "squash"
        return "rebase"

    # 3. No usable commit_refs — lean only on the squash subject stamp.
    if not refs:
        return "squash" if squash_corroborated else "unknown"

    # Partial presence (force-push, shared base, dropped commits) — ambiguous.
    return "unknown"


def _squash_pr_numbers(commits: list[Commit]) -> set[int]:
    """PR numbers stamped on main-branch commit subjects via ``(#N)``.

    GitHub's squash default writes the landed commit subject as
    ``<title> (#<pr-number>)``. Collecting these lets ``_classify_pr``
    corroborate squash even for multi-commit PRs. Subjects without the
    trailing stamp contribute nothing (no false positives from mid-message
    issue references).
    """
    numbers: set[int] = set()
    for commit in commits:
        match = _SQUASH_SUBJECT_RE.search(commit.message)
        if match:
            numbers.add(int(match.group(1)))
    return numbers
2 changes: 1 addition & 1 deletion iris/cli.py
@@ -15,7 +15,7 @@
from iris.reports.narrative import generate_narrative
from iris.reports.writer import write_output

-VERSION = "v1.1.0"
+VERSION = "v1.2.0"


def _merge_durability(metrics, durability):