Problem Statement
AgentReady's current testing approach has too many tests with insufficient signal:
- Test failures: Unclear what broke and why
- Flaky tests: Tests fail intermittently without code changes
- Slow CI: Tests take too long, slowing development velocity
- Low coverage: ~37% coverage despite many tests
- GHA complexity: Multiple workflows with overlapping responsibilities
Root cause: Focus on quantity over quality. More tests ≠ better testing.
Proposed Solution: Signal-Focused Testing Strategy
Phase 1: Categorize and Audit Existing Tests (Week 1)
Goal: Understand what we have and what provides value.
- Inventory all tests:
  - Count tests by category (unit, integration, e2e)
  - Identify duplicate/overlapping tests
  - Find tests with unclear assertions
  - Flag flaky tests (fail >5% of runs)
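The inventory steps above can be sketched as two small helpers. This is an illustrative sketch, not AgentReady's tooling: it assumes tests live under `tests/unit/`, `tests/integration/`, and `tests/e2e/`, that node IDs come from `pytest --collect-only -q`, and that per-test pass/fail history is available from CI logs.

```python
from collections import Counter

def categorize(node_ids):
    """Count collected tests by top-level category directory
    (assumed layout: tests/unit/, tests/integration/, tests/e2e/)."""
    counts = Counter()
    for nid in node_ids:
        parts = nid.split("/")
        if len(parts) > 1 and parts[0] == "tests":
            counts[parts[1]] += 1
        else:
            counts["uncategorized"] += 1
    return counts

def flaky(history, threshold=0.05):
    """Flag tests failing on more than `threshold` of recorded runs.
    `history` maps test id -> list of bools (True = passed)."""
    return sorted(
        tid for tid, runs in history.items()
        if runs and (runs.count(False) / len(runs)) > threshold
    )
```

Feeding this a week of CI history gives both the category counts and the >5% flaky list for the audit report.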
- Measure signal quality:
  - Which tests catch real bugs?
  - Which tests provide clear failure messages?
  - Which tests are too brittle (fail on safe refactors)?
- Deliverable: Testing audit report
  - List of tests to keep/delete/refactor
  - Signal-to-noise ratio analysis
  - Recommended testing philosophy document
Phase 2: Simplify GitHub Actions (Week 1-2)
Goal: Reduce GHA complexity and improve CI speed.
Current state:
- Multiple workflows with overlapping responsibilities
- Tests run multiple times, wasting compute
- Hard to understand what failed and why
Proposed changes:
- Consolidate workflows:
  - Single PR workflow for all quality checks
  - Separate release workflow (keep existing)
  - Remove redundant/duplicate checks
- Optimize test execution:
  - Run E2E tests first (fast, high signal)
  - Run unit tests in parallel by module
  - Skip slow tests for draft PRs
  - Cache dependencies aggressively
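The "skip slow tests for draft PRs" step could live in a `conftest.py` collection hook. A minimal sketch, assuming tests carry a `slow` marker and the CI job exports `DRAFT_PR=true` for draft pull requests (both the marker name and the variable are assumptions, not existing AgentReady conventions):

```python
import os

def pytest_collection_modifyitems(config, items):
    """Drop `slow`-marked tests when running against a draft PR."""
    if os.environ.get("DRAFT_PR") != "true":
        return
    # In-place edit so pytest sees the reduced selection; a fuller
    # version would also call config.hook.pytest_deselected(items=dropped)
    # so the run summary reports the deselected count.
    items[:] = [item for item in items if "slow" not in item.keywords]
```

The workflow would set `DRAFT_PR` from `github.event.pull_request.draft`, so the full suite still runs once the PR is marked ready for review.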
- Improve failure reporting:
  - Clear job names that explain what they test
  - Fail-fast for E2E failures
  - Annotate PRs with specific failure context
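PR annotation can use GitHub Actions workflow commands: printing a `::error file=…,line=…::message` line from any job step surfaces the message inline on the diff. A small formatter, sketched here as a hypothetical helper:

```python
def annotation(level, message, file=None, line=None):
    """Format a GitHub Actions workflow command that annotates the PR.
    `level` is one of "notice", "warning", "error"; `file`/`line`
    attach the annotation to a specific location in the diff."""
    props = ",".join(
        f"{k}={v}" for k, v in (("file", file), ("line", line)) if v is not None
    )
    if props:
        return f"::{level} {props}::{message}"
    return f"::{level}::{message}"
```

A test-report step could map each pytest failure to one such line, so the failure context lands on the PR instead of being buried in the job log.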
- Deliverable: Simplified GHA configuration
  - Single .github/workflows/pr.yml for all checks
  - Clear job structure with descriptive names
  - <5 minute CI time for typical PRs
Phase 3: Refactor Test Suite (Week 2-3)
Goal: High-signal tests that catch real issues quickly.
Testing pyramid target:
E2E Tests (5-10 tests) ← Critical user journeys only
├─ Happy path: assess current repo
├─ Error handling: invalid config
├─ Security: sensitive directory blocking
└─ Performance: large repo (<5min timeout)
Integration Tests (20-30 tests) ← Module boundaries
├─ Scanner + Assessors
├─ Reporter + Templates
└─ CLI + Services
Unit Tests (100-150 tests) ← Core logic only
├─ Assessment scoring algorithm
├─ Pattern extraction
├─ Research report validation
└─ Edge cases and error handling
Principles:
- Each test has a clear purpose:
  - What does it test? (one thing)
  - What could break? (specific failure mode)
  - How do you fix it? (actionable error message)
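Applying that checklist to the "assessment scoring algorithm" row of the pyramid might look like the sketch below. `score_assessment` is a hypothetical stand-in, not AgentReady's real API; the point is one behavior per test, a named failure mode, and an assertion message that tells the reader how to fix it.

```python
def score_assessment(checks):
    """Weighted pass rate in [0, 100].
    `checks` maps check name -> (passed: bool, weight: float)."""
    total = sum(weight for _, weight in checks.values())
    if total == 0:
        return 0.0
    earned = sum(weight for passed, weight in checks.values() if passed)
    return round(100 * earned / total, 1)

def test_score_is_zero_when_no_checks_defined():
    # Failure mode: division by zero on repos with no applicable checks.
    assert score_assessment({}) == 0.0, (
        "Empty check set must score 0.0, not raise; guard the zero-weight case"
    )

def test_score_weights_failures_proportionally():
    # Failure mode: unweighted averaging that over-penalizes minor checks.
    checks = {"readme": (True, 3), "tests": (False, 1)}
    assert score_assessment(checks) == 75.0
```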
- Avoid testing implementation details:
  - Test behavior, not internal structure
  - Refactors shouldn't break tests
  - Mock only external dependencies
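A behavior-focused test asserts on the observable result and mocks only the external boundary. In this illustrative sketch (`build_report` and `fetch_repo_tree` are hypothetical, not AgentReady code), the internal structure of `build_report` can be refactored freely without breaking the test:

```python
from unittest import mock

def build_report(fetch_repo_tree, repo):
    """Summarize a repo from its file tree; `fetch_repo_tree` is the
    external (network) dependency, injected so tests can fake it."""
    files = fetch_repo_tree(repo)
    return {
        "repo": repo,
        "file_count": len(files),
        "has_readme": "README.md" in files,
    }

def test_report_counts_files_without_touching_the_network():
    # Mock only the external call; assert on the returned report,
    # not on how build_report arrived at it.
    fake_fetch = mock.Mock(return_value=["README.md", "src/main.py"])
    report = build_report(fake_fetch, "org/agentready")
    assert report == {"repo": "org/agentready", "file_count": 2, "has_readme": True}
```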
- Fast feedback:
  - E2E tests: <10s each (total <2min)
  - Integration tests: <1s each
  - Unit tests: <100ms each
  - Full suite: <5min
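The per-test budgets above can be enforced in CI by checking pytest's recorded durations (e.g. from `--durations` output or a JSON report) against per-category limits. A sketch of the core check, with the data shape assumed for illustration:

```python
# Per-test budgets in seconds, matching the targets above.
BUDGETS = {"e2e": 10.0, "integration": 1.0, "unit": 0.1}

def over_budget(durations, budgets=BUDGETS):
    """Return test ids that exceed their category's per-test budget.
    `durations` maps test id -> (category, seconds); unknown
    categories are not budgeted."""
    return sorted(
        tid for tid, (cat, secs) in durations.items()
        if secs > budgets.get(cat, float("inf"))
    )
```

A CI step could fail (or annotate the PR) when this returns a non-empty list, keeping the suite inside the <5min total.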
- Deliverable: Refactored test suite
  - Delete 50%+ of existing tests (low signal)
  - Rewrite 30% with clearer assertions
  - Keep 20% as-is (already good)
  - Target 70% coverage of critical paths
Phase 4: Documentation & Process (Week 3-4)
Goal: Prevent test suite from degrading again.
- Testing guidelines (TESTING.md):
  - When to write unit vs integration vs e2e tests
  - How to write high-signal tests
  - Common anti-patterns to avoid
- PR checklist template:
  - New feature = new test (which category?)
  - Bug fix = regression test first
  - Refactor = tests stay green
- Test review process:
  - Code reviews include test quality check
  - PRs with low-signal tests get feedback
  - Flaky test reports trigger investigation
- Deliverable: Testing culture documentation
  - TESTING.md guide
  - Updated CONTRIBUTING.md with test requirements
  - PR template with test checklist
Success Metrics
| Metric | Current | Target | Measure |
|---|---|---|---|
| CI time | ~15min | <5min | GitHub Actions duration |
| Test count | ~800 tests | 150-200 tests | pytest count |
| Coverage | ~37% | 70% (critical paths) | pytest-cov |
| Flakiness | Unknown | <1% failure rate | Track over 100 runs |
| Signal quality | Low | High | Failure investigation time |
Definition of "high signal":
- When test fails, developer knows what broke immediately
- Fix time: <30 minutes from failure to root cause identified
- False positive rate: <1% (tests fail only when code is broken)
Out of Scope (Not Changing)
- E2E test framework (pytest is fine)
- Assertion library (assert statements are fine)
- Test discovery mechanism (pytest auto-discovery works)
Related Issues
- Replaces fix: resolve 77 test failures across multiple modules #179 (test failures fix - too broad)
- Replaces [P0] Improve Test Coverage to Meet 90% Threshold #103 (coverage target - wrong focus)
- Incorporates Test Reliability: Configurable timeouts and sensitive directory E2E test #192 (test reliability - now completed)
- Blocks feat: Add Codecov integration to release workflow #156 (Codecov integration - wait for coverage improvements)
Acceptance Criteria
- Testing audit report completed
- GHA workflows consolidated to single PR workflow
- CI time reduced to <5 minutes for typical PRs
- Test count reduced to 150-200 high-signal tests
- Coverage reaches 70% of critical code paths
- Flakiness rate <1% (measured over 100 CI runs)
- TESTING.md guide created and reviewed
- PR template updated with test checklist
Priority: P0
Why P0: Testing is infrastructure. Bad tests slow down all development.
Timeline: 3-4 weeks (can be done incrementally in PRs)
Assignee: TBD (could be broken into multiple assignees for phases)
🤖 Generated with Claude Code