
[P0] Comprehensive Testing Strategy Refactor #207

@jeremyeder

Description


Problem Statement

AgentReady's current testing approach has too many tests with insufficient signal:

  • Test failures: Unclear what broke and why
  • Flaky tests: Tests fail intermittently without code changes
  • Slow CI: Tests take too long, slowing development velocity
  • Low coverage: ~37% line coverage despite the large test count
  • GHA complexity: Multiple workflows with overlapping responsibilities

Root cause: Focus on quantity over quality. More tests ≠ better testing.


Proposed Solution: Signal-Focused Testing Strategy

Phase 1: Categorize and Audit Existing Tests (Week 1)

Goal: Understand what we have and what provides value.

  1. Inventory all tests:

    • Count tests by category (unit, integration, e2e)
    • Identify duplicate/overlapping tests
    • Find tests with unclear assertions
    • Flag flaky tests (fail >5% of runs)
  2. Measure signal quality:

    • Which tests catch real bugs?
    • Which tests provide clear failure messages?
    • Which tests are too brittle (fail on safe refactors)?
  3. Deliverable: Testing audit report

    • List of tests to keep/delete/refactor
    • Signal-to-noise ratio analysis
    • Recommended testing philosophy document
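The inventory step above could start from `pytest --collect-only -q` output. A minimal sketch of the categorization, assuming tests live under `tests/unit/`, `tests/integration/`, and `tests/e2e/` (a hypothetical layout; adjust to the repo's actual structure):

```python
# Sketch: count collected test node IDs per category based on file path.
# The tests/{unit,integration,e2e}/ layout is an assumption, not confirmed
# from the repo.
from collections import Counter

def categorize(node_ids):
    """Count tests per category based on their file path prefix."""
    counts = Counter()
    for node_id in node_ids:
        # A pytest node ID looks like "path/to/test_file.py::test_name"
        path = node_id.split("::", 1)[0]
        if path.startswith("tests/unit/"):
            counts["unit"] += 1
        elif path.startswith("tests/integration/"):
            counts["integration"] += 1
        elif path.startswith("tests/e2e/"):
            counts["e2e"] += 1
        else:
            counts["uncategorized"] += 1
    return counts

# node_ids would come from: pytest --collect-only -q
sample = [
    "tests/unit/test_scoring.py::test_weights_sum_to_one",
    "tests/unit/test_scoring.py::test_empty_repo_scores_zero",
    "tests/integration/test_reporter.py::test_html_report",
    "tests/e2e/test_cli.py::test_assess_current_repo",
]
print(categorize(sample))  # Counter({'unit': 2, 'integration': 1, 'e2e': 1})
```

Anything landing in `uncategorized` is itself useful audit signal: those tests have no clear home in the pyramid.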

Phase 2: Simplify GitHub Actions (Week 1-2)

Goal: Reduce GHA complexity and improve CI speed.

Current state:

  • Multiple workflows with overlapping responsibilities
  • Tests run multiple times (wasting compute)
  • Hard to understand what failed and why

Proposed changes:

  1. Consolidate workflows:

    • Single PR workflow for all quality checks
    • Separate release workflow (keep existing)
    • Remove redundant/duplicate checks
  2. Optimize test execution:

    • Run E2E tests first (fast, high signal)
    • Run unit tests in parallel by module
    • Skip slow tests for draft PRs
    • Cache dependencies aggressively
  3. Improve failure reporting:

    • Clear job names that explain what they test
    • Fail-fast for E2E failures
    • Annotate PRs with specific failure context
  4. Deliverable: Simplified GHA configuration

    • Single .github/workflows/pr.yml for all checks
    • Clear job structure with descriptive names
    • <5 minute CI time for typical PRs
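"Skip slow tests for draft PRs" could be wired into pytest roughly like this. A sketch assuming a hypothetical `slow` marker and an `IS_DRAFT_PR` environment variable exported by the workflow (neither exists in the repo today):

```python
# conftest.py -- sketch: skip @pytest.mark.slow tests on draft PRs.
# IS_DRAFT_PR is a hypothetical env var the PR workflow would export, e.g.
# from github.event.pull_request.draft.
import os

def is_draft_pr() -> bool:
    """True when the workflow marked this run as a draft-PR build."""
    return os.environ.get("IS_DRAFT_PR", "false").lower() == "true"

def pytest_configure(config):
    # Register the marker so --strict-markers doesn't reject it.
    config.addinivalue_line("markers", "slow: test is slow; skipped on draft PRs")

def pytest_collection_modifyitems(config, items):
    if not is_draft_pr():
        return
    import pytest
    skip_slow = pytest.mark.skip(reason="slow test skipped on draft PR")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

Individual tests then opt in with `@pytest.mark.slow`, and the workflow controls the behavior purely through the environment.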

Phase 3: Refactor Test Suite (Week 2-3)

Goal: High-signal tests that catch real issues quickly.

Testing pyramid target:

```
E2E Tests (5-10 tests)          ← Critical user journeys only
├─ Happy path: assess current repo
├─ Error handling: invalid config
├─ Security: sensitive directory blocking
└─ Performance: large repo (<5min timeout)

Integration Tests (20-30 tests) ← Module boundaries
├─ Scanner + Assessors
├─ Reporter + Templates
└─ CLI + Services

Unit Tests (100-150 tests)      ← Core logic only
├─ Assessment scoring algorithm
├─ Pattern extraction
├─ Research report validation
└─ Edge cases and error handling
```

Principles:

  1. Each test has clear purpose:

    • What does it test? (one thing)
    • What could break? (specific failure mode)
    • How do you fix it? (actionable error message)
  2. Avoid testing implementation details:

    • Test behavior, not internal structure
    • Refactors shouldn't break tests
    • Mock only external dependencies
  3. Fast feedback:

    • E2E tests: <10s each (total <2min)
    • Integration tests: <1s each
    • Unit tests: <100ms each
    • Full suite: <5min
  4. Deliverable: Refactored test suite

    • Delete 50%+ of existing tests (low signal)
    • Rewrite 30% with clearer assertions
    • Keep 20% as-is (already good)
    • Target 70% coverage of critical paths
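The "clear purpose" and "actionable error message" principles look like this in practice. `score_assessment` below is a toy stand-in, not the real scoring function:

```python
# Sketch: one behavior per test, with an assertion message that tells the
# reader how to fix a failure. score_assessment is hypothetical.

def score_assessment(passed: int, total: int) -> float:
    """Toy scoring function: fraction of checks passed, as a percentage."""
    if total == 0:
        return 0.0
    return round(100.0 * passed / total, 1)

def test_score_is_zero_for_empty_assessment():
    # Behavior under test: an assessment with no checks must not divide by zero.
    assert score_assessment(0, 0) == 0.0, (
        "Empty assessments must score 0.0; check the total == 0 guard "
        "in score_assessment"
    )

def test_score_is_percentage_of_passed_checks():
    # Behavior under test: score is passed/total as a percentage.
    assert score_assessment(3, 4) == 75.0
```

Each test names the behavior in its function name, asserts exactly one thing, and on failure points straight at the code to inspect, so a red CI run identifies the break without spelunking.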

Phase 4: Documentation & Process (Week 3-4)

Goal: Prevent test suite from degrading again.

  1. Testing guidelines (TESTING.md):

    • When to write unit vs integration vs e2e tests
    • How to write high-signal tests
    • Common anti-patterns to avoid
  2. PR checklist template:

    • New feature = new test (which category?)
    • Bug fix = regression test first
    • Refactor = tests stay green
  3. Test review process:

    • Code reviews include test quality check
    • PRs with low-signal tests get feedback
    • Flaky test reports trigger investigation
  4. Deliverable: Testing culture documentation

    • TESTING.md guide
    • Updated CONTRIBUTING.md with test requirements
    • PR template with test checklist

Success Metrics

| Metric | Current | Target | Measure |
|---|---|---|---|
| CI time | ~15min | <5min | GitHub Actions duration |
| Test count | ~800 tests | 150-200 tests | pytest count |
| Coverage | ~37% | 70% (critical paths) | pytest-cov |
| Flakiness | Unknown | <1% failure rate | Track over 100 runs |
| Signal quality | Low | High | Failure investigation time |

Definition of "high signal":

  • When test fails, developer knows what broke immediately
  • Fix time: <30 minutes from failure to identified root cause
  • False positive rate: <1% (tests fail only when code is broken)
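The <1% flakiness target above is easy to compute once per-run pass/fail results are collected (e.g. from the GitHub Actions API; the data shape here, one bool per run, is an assumption):

```python
# Sketch: flakiness = fraction of runs that failed on unchanged code.
# `runs` is a hypothetical list of booleans, True = run passed.

def flakiness_rate(runs):
    """Failure rate over a window of CI runs on unchanged code."""
    if not runs:
        return 0.0
    failures = sum(1 for passed in runs if not passed)
    return failures / len(runs)

# 2 failures in 100 runs -> 2.0%, above the <1% target
history = [True] * 98 + [False] * 2
rate = flakiness_rate(history)
print(f"{rate:.1%}")  # 2.0%
```

Tracked over the 100-run window the table specifies, any test whose rate exceeds 1% would trigger the investigation described in Phase 4.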

Out of Scope (Not Changing)

  • E2E test framework (pytest is fine)
  • Assertion library (assert statements are fine)
  • Test discovery mechanism (pytest auto-discovery works)

Related Issues


Acceptance Criteria

  • Testing audit report completed
  • GHA workflows consolidated to single PR workflow
  • CI time reduced to <5 minutes for typical PRs
  • Test count reduced to 150-200 high-signal tests
  • Coverage reaches 70% of critical code paths
  • Flakiness rate <1% (measured over 100 CI runs)
  • TESTING.md guide created and reviewed
  • PR template updated with test checklist

Priority: P0

Why P0: Testing is infrastructure. Bad tests slow down all development.

Timeline: 3-4 weeks (can be done incrementally in PRs)

Assignee: TBD (could be broken into multiple assignees for phases)


🤖 Generated with Claude Code
