##### (1) **Define what a failed PR means.**

A failed PR is a PR that was closed without being merged. As an optional extension (not used for now), a PR that is still open but has had no activity for more than 90 days can also be counted as a stale-failed PR.
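The definition can be sketched as a small classification function. This is a minimal sketch, assuming each PR is a dict with the `state`, `merged_at` and `updated_at` fields of a GitHub REST API pull-request object. It uses `updated_at` as a rough proxy for "last activity", which is only an approximation because label or metadata edits also update it. The name `pr_outcome` and the returned labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)  # inactivity threshold for the optional stale-failed case

def _parse_ts(ts):
    # GitHub REST timestamps look like "2024-01-31T12:00:00Z" (UTC).
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def pr_outcome(pr, now):
    """Return 'merged', 'failed', 'stale_failed' or 'open' for one PR dict."""
    if pr.get("merged_at"):
        return "merged"
    if pr["state"] == "closed":
        return "failed"  # closed without being merged
    # Still open: optionally treat long inactivity as a stale failure.
    if now - _parse_ts(pr["updated_at"]) > STALE_AFTER:
        return "stale_failed"
    return "open"
```

For example, with `now = datetime(2024, 6, 1, tzinfo=timezone.utc)`, an open, unmerged PR last updated on 2024-01-01 is reported as `'stale_failed'`.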

##### (2) **Translate Wang et al.'s categories onto our approach.**

Wang et al.'s paper studies abandoned code changes, particularly on Gerrit, a popular code review tool. In that setting, enough contextual information is available to classify why each change was abandoned.

Our setting is similar but differs in two ways. First, we study GitHub PRs rather than Gerrit changes. Second, we study failed/rejected PRs, which are not necessarily 'abandoned'. Nevertheless, we aim to define a similar taxonomy inspired by Wang et al.

High-level groups:

| High-Level Group               | Description                                                             |
|-------------------------------|-------------------------------------------------------------------------|
| TECHNICAL (T)                 | Code, tests, style, architecture                                        |
| FIT_OR_VALUE (F)              | Feature/requirement mismatch, not needed, wrong direction               |
| PROCESS_OR_OWNERSHIP (P)      | Time, priority, unresponsiveness, social/review issues, policy          |
| REDUNDANCY_OR_OBSOLETE (R)    | Duplicate, superseded, obsolete because project moved on                |
| INVALID_OR_SPAM (S)           | Spam, noise, mistakes, non-meaningful PRs                               |
| UNKNOWN (U)                   | Not enough info                                                          |

----

Mapping of Wang et al.'s categories to our codes:

| Wang et al. category              | Our codes    |
|-----------------------------------|--------------|
| low quality / technical issues    | T1, T2       |
| requirement / design evolution    | F1, R2       |
| priority / time constraints       | P1, P2       |
| social / review process           | P3           |
| process / policy                  | P4           |
| redundant / obsolete              | R1, R2       |
| abandonment by author             | P2           |
| unknown / no info                 | U1           |
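Because P2 and R2 each appear under two Wang et al. categories, the mapping is one-to-many when read from our codes back to Wang's. A minimal sketch, with hypothetical helper names, that inverts the table and aggregates our per-PR codes into Wang's categories, for example to compare distributions:

```python
from collections import Counter

# Mapping from the table above: Wang et al. category -> our codes.
WANG_TO_CODES = {
    "low quality / technical issues": ["T1", "T2"],
    "requirement / design evolution": ["F1", "R2"],
    "priority / time constraints": ["P1", "P2"],
    "social / review process": ["P3"],
    "process / policy": ["P4"],
    "redundant / obsolete": ["R1", "R2"],
    "abandonment by author": ["P2"],
    "unknown / no info": ["U1"],
}

def codes_to_wang(mapping):
    """Invert the mapping; one code may belong to several Wang categories."""
    inverse = {}
    for category, codes in mapping.items():
        for code in codes:
            inverse.setdefault(code, []).append(category)
    return inverse

def wang_counts(labels, mapping=WANG_TO_CODES):
    """Count labelled PRs per Wang category; a multi-category code counts toward each."""
    inverse = codes_to_wang(mapping)
    counts = Counter()
    for code in labels:
        # Codes with no Wang counterpart in the table (e.g. S1) are skipped.
        for category in inverse.get(code, []):
            counts[category] += 1
    return counts
```

Counting a multi-category code toward every matching category means the per-category totals can exceed the number of PRs; splitting the weight (e.g. 0.5 each) is an alternative design choice.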




In [None]:
## T1 –> Technical: CI/test failures not addressed

{
  "id": "T1_CI_OR_TEST_FAILURE",
  "group": "TECHNICAL",
  "definition": "The PR is not merged mainly because tests, builds, or CI checks fail due to issues introduced or exposed by the PR, and the author never fixes them.",
  "typical_signals": [
    "ci_failed = True",
    "Timeline events like 'ci_failure', 'workflow_run.completed: failure'",
    "Review comments: 'tests are failing', 'please fix CI', 'build is broken'",
    "No later commit that fixes the failures before close"
  ]
}
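A minimal sketch of how the T1 signals might be checked. It assumes each PR context is a dict carrying the `ci_failed` flag listed above plus a hypothetical `events` list of `{"type", "time"}` entries; the `"ci_failure"` type follows the signal list, while the `"commit"` type and the integer times are assumptions (any comparable timestamp works).

```python
def has_unaddressed_ci_failure(pr):
    """Heuristic for T1: CI failed and no commit was pushed after the last failure."""
    if not pr.get("ci_failed"):
        return False
    events = sorted(pr.get("events", []), key=lambda e: e["time"])
    failure_times = [e["time"] for e in events if e["type"] == "ci_failure"]
    if not failure_times:
        # No timeline detail: fall back to the ci_failed flag alone.
        return True
    last_failure = failure_times[-1]
    # A later commit suggests the author at least attempted a fix.
    return not any(e["type"] == "commit" and e["time"] > last_failure for e in events)
```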


In [None]:
## T2 –> Technical: Code quality/correctness/style

{
  "id": "T2_CODE_QUALITY_OR_CORRECTNESS",
  "group": "TECHNICAL",
  "definition": "The PR is rejected due to code-level problems: bugs, unsafe changes, poor design, missing tests, or not following project conventions, even if CI passes.",
  "typical_signals": [
    "Review comments pointing to logic bugs, missing edge cases",
    "Mentions of 'too complex', 'not maintainable', 'needs refactoring'",
    "CHANGES_REQUESTED with code-quality arguments",
    "No strong signals about CI or requirements mismatch"
  ]
}
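The T2 signals are mostly textual, so a first approximation is phrase matching over reviews. A minimal sketch, assuming each review is a dict with GitHub's review `state` (e.g. `"CHANGES_REQUESTED"`) and `body`; the pattern list is illustrative, not exhaustive. Since T2 also requires the absence of strong CI or requirement-mismatch signals, a full classifier would check those categories before this one.

```python
import re

# Illustrative phrases drawn from the T2 signals; real reviews word these many ways.
QUALITY_PATTERNS = [
    r"\btoo complex\b", r"\bnot maintainable\b", r"\bneeds? refactor(?:ing)?\b",
    r"\bmissing tests?\b", r"\bedge cases?\b", r"\bbugs?\b",
]
QUALITY_RE = re.compile("|".join(QUALITY_PATTERNS), re.IGNORECASE)

def code_quality_hits(reviews):
    """Heuristic for T2: CHANGES_REQUESTED reviews whose text mentions code-quality problems."""
    return [
        r for r in reviews
        if r.get("state") == "CHANGES_REQUESTED" and QUALITY_RE.search(r.get("body") or "")
    ]
```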

In [None]:
## F1 –> Fit: Wrong feature, not needed, or misaligned
{
  "id": "F1_REQUIREMENT_MISMATCH_OR_NOT_NEEDED",
  "group": "FIT_OR_VALUE",
  "definition": "The PR is closed because the proposed change does not align with project goals, design decisions, or feature priorities (even if technically sound).",
  "typical_signals": [
    "Comments: 'we don't want to support this', 'out of scope', 'not on our roadmap'",
    "Labels: wontfix, invalid-feature, design-discussion-needed",
    "Maintainers suggest a different direction or existing alternative"
  ]
}
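The F1 signals combine labels with scope-related phrases. A minimal sketch, assuming labels are given as a list of names and comments as a list of strings; the function names are hypothetical and the phrase list is taken from the signals above.

```python
F1_LABELS = {"wontfix", "invalid-feature", "design-discussion-needed"}
F1_PHRASES = ("we don't want to support this", "out of scope", "not on our roadmap")

def _normalize(text):
    # Lower-case and map curly apostrophes to straight ones so "don’t" matches "don't".
    return text.lower().replace("\u2019", "'")

def f1_signals(labels, comments):
    """Heuristic for F1: return the matching labels and the comments with scope phrases."""
    label_hits = sorted(F1_LABELS & {name.lower() for name in labels})
    phrase_hits = [c for c in comments if any(p in _normalize(c) for p in F1_PHRASES)]
    return {"labels": label_hits, "comments": phrase_hits}
```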

In [None]:
## P1 –> Process: Low priority/backlog/no one picked it up
{
  "id": "P1_LOW_PRIORITY_OR_STALE",
  "group": "PROCESS_OR_OWNERSHIP",
  "definition": "The PR is not merged because it is treated as low priority or left in the backlog until it becomes stale, without clear technical objections.",
  "typical_signals": [
    "Long periods with no comments or updates",
    "Stale bot comments, 'This PR has been inactive for X days'",
    "Maintainer closes with 'closing due to inactivity', 'no bandwidth'"
  ]
}
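The "long periods with no comments or updates" signal reduces to the largest gap between consecutive activity timestamps. A minimal sketch, assuming the caller collects the times of comments, commits and reviews; the 60-day `min_gap` default is a hypothetical threshold, not part of the definition. Because P1 requires no clear technical objections, T and F checks would run before this one.

```python
from datetime import datetime, timedelta

STALE_PHRASES = ("has been inactive", "closing due to inactivity", "no bandwidth")

def longest_inactive_gap(event_times):
    """Longest gap between consecutive activity timestamps."""
    times = sorted(event_times)
    if len(times) < 2:
        return timedelta(0)
    return max(later - earlier for earlier, later in zip(times, times[1:]))

def looks_low_priority(event_times, comments, min_gap=timedelta(days=60)):
    """Heuristic for P1: a long silent stretch or a stale-bot/maintainer inactivity message."""
    quiet = longest_inactive_gap(event_times) >= min_gap
    stale_msg = any(p in c.lower() for c in comments for p in STALE_PHRASES)
    return quiet or stale_msg
```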

In [None]:
## P2 –> Process: Author unresponsive / withdrew
{
  "id": "P2_AUTHOR_UNRESPONSIVE_OR_WITHDREW",
  "group": "PROCESS_OR_OWNERSHIP",
  "definition": "The PR fails because the author stops responding to review/CI feedback or explicitly closes/abandons the PR.",
  "typical_signals": [
    "changes_requested = True and no follow-up commits",
    "Comments: 'any update?', 'ping?' from maintainers",
    "Author comment: 'I won't have time', 'closing this for now'",
    "PR closed by author without other clear reason"
  ]
}
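A minimal sketch of the P2 check, assuming a PR dict with the `changes_requested` flag listed above, plus hypothetical `author`, `closed_by` (a login string) and `events` fields, where each event has `type`, `time` and optionally `author`. Maintainer pings such as "any update?" could be added as supporting evidence but are left out here for brevity.

```python
def author_unresponsive(pr):
    """Heuristic for P2: the author closed the PR, or changes were requested
    and the author never pushed a commit or commented afterwards."""
    # Author-closed PRs count only if no other category explains the failure,
    # so callers are expected to check the other categories first.
    if pr.get("closed_by") == pr["author"]:
        return True
    if not pr.get("changes_requested"):
        return False
    events = sorted(pr.get("events", []), key=lambda e: e["time"])
    request_times = [e["time"] for e in events if e["type"] == "changes_requested"]
    if not request_times:
        return False
    last_request = request_times[-1]
    # Any author commit or comment after the last change request counts as a response.
    return not any(
        e.get("author") == pr["author"]
        and e["type"] in ("commit", "comment")
        and e["time"] > last_request
        for e in events
    )
```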

In [None]:
## P3 –> Process: Review/social/coordination conflict
{
  "id": "P3_REVIEW_OR_SOCIAL_CONFLICT",
  "group": "PROCESS_OR_OWNERSHIP",
  "definition": "The PR is not merged due to unresolved disagreement, confusion about ownership, or non-technical conflicts in the review process.",
  "typical_signals": [
    "Long back-and-forth arguments in comments",
    "Reviewers and author disagree on approach, style, or responsibility",
    "Comments: 'we can't agree on this', 'let's not move forward', 'needs broader design discussion'"
  ]
}
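"Long back-and-forth arguments" can be approximated by counting how often the speaker switches between the PR author and other participants. A minimal sketch, assuming chronologically ordered comment dicts with hypothetical `user` and `body` keys; the `min_turns=6` default is an arbitrary illustrative threshold.

```python
CONFLICT_PHRASES = ("can't agree", "let's not move forward", "needs broader design discussion")

def author_reviewer_turns(comments, author):
    """Count speaker switches between the PR author and anyone else (comments in time order)."""
    turns = 0
    previous_is_author = None
    for comment in comments:
        is_author = comment["user"] == author
        if previous_is_author is not None and is_author != previous_is_author:
            turns += 1
        previous_is_author = is_author
    return turns

def looks_like_review_conflict(comments, author, min_turns=6):
    """Heuristic for P3: a long back-and-forth or an explicit "can't agree"-style comment."""
    explicit = any(
        phrase in comment["body"].lower().replace("\u2019", "'")
        for comment in comments
        for phrase in CONFLICT_PHRASES
    )
    return author_reviewer_turns(comments, author) >= min_turns or explicit
```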

In [None]:
## P4 –> Process: Policy/compliance/bureaucracy
{
  "id": "P4_POLICY_OR_COMPLIANCE_ISSUE",
  "group": "PROCESS_OR_OWNERSHIP",
  "definition": "The PR is closed because it violates project/process policies (e.g., missing CLA, no issue linked, wrong branch, template not followed), not because of the technical content itself.",
  "typical_signals": [
    "Comments: 'please sign the CLA', 'target the dev branch, not main'",
    "Labels: 'needs-signoff', 'invalid', 'needs-issue'",
    "Close message about missing legal / policy requirements"
  ]
}
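Policy problems tend to be stated in recognisable wording, so regular expressions are a reasonable first pass. A minimal sketch with illustrative patterns and hypothetical reason names; the `invalid` label is deliberately ignored because it is also an S1 signal.

```python
import re

# Illustrative patterns for the P4 signals; real projects word these differently.
POLICY_PATTERNS = {
    "cla": re.compile(r"\bCLA\b|contributor license agreement", re.IGNORECASE),
    "wrong_branch": re.compile(r"\btarget (?:the )?\S+ branch\b|\bwrong branch\b", re.IGNORECASE),
    "needs_issue": re.compile(r"\b(?:link|open) an issue\b|\bno (?:linked )?issue\b", re.IGNORECASE),
}
# Label -> policy reason. 'invalid' is left out because it is also an S1 signal.
POLICY_LABELS = {"needs-signoff": "signoff", "needs-issue": "needs_issue"}

def policy_reasons(comments, labels):
    """Heuristic for P4: sorted list of policy problems named in comments or labels."""
    found = {
        reason
        for reason, pattern in POLICY_PATTERNS.items()
        for text in comments
        if pattern.search(text)
    }
    found |= {POLICY_LABELS[name] for name in labels if name in POLICY_LABELS}
    return sorted(found)
```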

In [None]:
## R1 –> Redundancy: Duplicate/already fixed
{
  "id": "R1_DUPLICATE_OR_ALREADY_FIXED",
  "group": "REDUNDANCY_OR_OBSOLETE",
  "definition": "The PR is not merged because it duplicates an existing PR/commit or the issue is already fixed elsewhere.",
  "typical_signals": [
    "Comments: 'duplicate of #123', 'already fixed in #456'",
    "Links to other PRs that do the same thing",
    "Close reason explicitly references another PR"
  ]
}
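R1 signals usually point at a concrete `#<number>` reference, which can be extracted directly. A minimal sketch, assuming comments are plain strings; the phrase list is illustrative and the function name is hypothetical.

```python
import re

# Phrases from the R1 signals, followed by a "#<number>" reference.
DUPLICATE_RE = re.compile(
    r"\b(?:duplicate of|dup of|(?:already )?fixed (?:in|by))\s+#(\d+)",
    re.IGNORECASE,
)

def referenced_duplicates(comments):
    """Heuristic for R1: numbers of the PRs/issues that comments name as duplicates or fixes."""
    refs = []
    for text in comments:
        refs.extend(int(number) for number in DUPLICATE_RE.findall(text))
    return refs
```

Returning the referenced numbers, rather than a boolean, lets a later step check whether the referenced PR was actually merged.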

In [None]:
## R2 –> Redundancy: Obsolete due to project changes
{
  "id": "R2_OBSOLETE_DUE_TO_PROJECT_CHANGE",
  "group": "REDUNDANCY_OR_OBSOLETE",
  "definition": "The PR becomes obsolete because the codebase or requirements changed (e.g., big refactor, feature removed), so merging no longer makes sense.",
  "typical_signals": [
    "Comments: 'code has changed too much', 'this area was refactored'",
    "Mentions of 'no longer relevant', 'outdated', 'module removed'",
    "Merge conflicts that persist for a long time before the PR is closed"
  ]
}
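The "persistent merge conflicts" signal can be measured as how long the PR had been continuously unmergeable when it was closed. A minimal sketch, assuming the caller has collected hypothetical time-sorted `(timestamp, mergeable)` snapshots (GitHub PR objects expose a `mergeable` field that may be `null` while it is being computed); the 30-day `min_conflict` default is arbitrary.

```python
from datetime import datetime, timedelta

OBSOLETE_PHRASES = ("no longer relevant", "outdated", "code has changed too much",
                    "was refactored", "module removed")

def conflict_duration_at_close(snapshots, closed_at):
    """How long the PR had been continuously unmergeable when it was closed."""
    conflicted_since = None
    for ts, mergeable in snapshots:
        if ts > closed_at:
            break
        if mergeable is None:
            continue  # mergeability not computed yet; ignore this snapshot
        if mergeable:
            conflicted_since = None  # conflict resolved; reset the streak
        elif conflicted_since is None:
            conflicted_since = ts    # start of a new conflicted streak
    return closed_at - conflicted_since if conflicted_since is not None else timedelta(0)

def looks_obsolete(snapshots, closed_at, comments, min_conflict=timedelta(days=30)):
    """Heuristic for R2: a long-lasting conflict before close, or 'outdated'-style comments."""
    long_conflict = conflict_duration_at_close(snapshots, closed_at) >= min_conflict
    mentioned = any(p in c.lower() for c in comments for p in OBSOLETE_PHRASES)
    return long_conflict or mentioned
```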

In [None]:
## S1 –> Invalid/spam/nonsensical
{
  "id": "S1_INVALID_OR_SPAM",
  "group": "INVALID_OR_SPAM",
  "definition": "The PR is not a legitimate contribution: spam, nonsense changes, automated noise, or clearly opened by mistake.",
  "typical_signals": [
    "Labels: 'spam', 'invalid', 'invalid PR', 'hacktoberfest-spam'",
    "Comments: 'spam PR', 'test PR', 'opened by mistake'",
    "Diff is trivial/no-op or clearly nonsense"
  ]
}
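A minimal sketch of the S1 check, combining labels, comment phrases and an empty-diff test. It assumes `pr` carries the `additions`, `deletions` and `changed_files` counts found on GitHub PR objects; missing counts are treated as unknown rather than as an empty diff. The function name is hypothetical.

```python
import re

SPAM_LABELS = {"spam", "invalid", "invalid pr", "hacktoberfest-spam"}
# Word boundaries avoid false hits such as "latest PR" matching "test pr".
SPAM_RE = re.compile(r"\b(?:spam pr|test pr|opened by mistake)\b", re.IGNORECASE)

def looks_invalid_or_spam(pr, labels, comments):
    """Heuristic for S1: spam labels or comments, or a diff that changes nothing."""
    if any(name.lower() in SPAM_LABELS for name in labels):
        return True
    if any(SPAM_RE.search(text) for text in comments):
        return True
    # Only a known-zero count is treated as a no-op diff.
    return pr.get("changed_files") == 0 or (
        pr.get("additions") == 0 and pr.get("deletions") == 0
    )
```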

In [None]:
## U1 –> Unknown/not enough information
{
  "id": "U1_UNKNOWN",
  "group": "UNKNOWN",
  "definition": "There is not enough information in the PR, comments, CI logs, or timeline to infer a plausible reason for failure.",
  "typical_signals": [
    "No comments or only generic bot comments",
    "Closed without explanation, no CI failures, no reviews",
    "Context text is too sparse or missing key fields"
  ]
}
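U1 works both as a positive label (the context is too sparse) and as the fallback when no other category fires. A minimal sketch of an ordered dispatcher, assuming a PR dict with hypothetical `comments` (each with `body` and an optional `is_bot` flag), `reviews` and `ci_failed` fields; `detectors` is an ordered list of `(code, predicate)` pairs, so list order encodes precedence between categories.

```python
def is_sparse(pr):
    """U1 signals: no human comments, no reviews and no CI failures."""
    human = [c for c in pr.get("comments", []) if not c.get("is_bot")]
    return not human and not pr.get("reviews") and not pr.get("ci_failed")

def classify(pr, detectors):
    """Return the first code whose detector fires; fall back to U1_UNKNOWN."""
    if is_sparse(pr):
        return "U1_UNKNOWN"
    for code, detect in detectors:
        if detect(pr):
            return code
    return "U1_UNKNOWN"
```

For example, with toy detectors:

```python
detectors = [
    ("T1_CI_OR_TEST_FAILURE", lambda pr: pr.get("ci_failed", False)),
    ("R1_DUPLICATE_OR_ALREADY_FIXED",
     lambda pr: any("duplicate of #" in c["body"].lower() for c in pr.get("comments", []))),
]
classify({"comments": [{"body": "Duplicate of #12"}]}, detectors)
```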