# Technical Challenge - Code Review and Deployment Pipeline Orchestration

**Format:** Structured interview with whiteboarding/documentation  
**Assessment Focus:** Problem decomposition, AI prompting strategy, system design

**Please fill in your responses in the Response markdown boxes.**

---

## Challenge Scenario

You are tasked with creating an AI-powered system that can handle the complete lifecycle of code review and deployment pipeline management for a mid-size software company. The system needs to address the pain points and meet the business requirements listed below.

**Current Pain Points:**
- Manual code reviews take 2-3 days per PR
- Inconsistent review quality across teams
- Deployment failures due to missed edge cases
- Security vulnerabilities slip through reviews
- No standardized deployment process across projects
- Rollback decisions are manual and slow

**Business Requirements:**
- Reduce review time to <4 hours for standard PRs
- Maintain or improve code quality
- Catch 90%+ of security vulnerabilities before deployment
- Standardize deployment across 50+ microservices
- Enable automatic rollback based on metrics
- Support multiple environments (dev, staging, prod)
- Handle both new features and hotfixes

---

## Part A: Problem Decomposition (25 points)

**Question 1.1:** Break this challenge down into discrete, manageable steps that could be handled by AI agents or automated systems. Each step should have:
- Clear input requirements
- Specific output format
- Success criteria
- Failure handling strategy

**Question 1.2:** Which steps can run in parallel? Which are blocking? Where are the critical decision points?

**Question 1.3:** Identify the key handoff points between steps. What data/context needs to be passed between each phase?

## Response Part A:

### Question 1.1: Problem Decomposition into Discrete Steps

#### Step 1: PR Intake & Context Gathering
**Input Requirements:**
- PR metadata (branch, author, files changed, description)
- Commit history and diff
- Project configuration (language, framework, testing setup)
- Related issues/tickets

**Output Format:**
```json
{
  "pr_id": "string",
  "risk_level": "low|medium|high|critical",
  "change_type": "feature|bugfix|hotfix|refactor",
  "affected_modules": ["array"],
  "test_coverage_required": "boolean",
  "security_review_required": "boolean",
  "estimated_complexity": "1-10"
}
```

**Success Criteria:** All metadata extracted, risk level assigned with 95%+ accuracy

**Failure Handling:** Manual classification fallback, notify reviewer for ambiguous cases
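
A minimal sketch of how the intake step could derive part of this output from PR metadata. The rules, the `SENSITIVE_PREFIXES` paths, the `hotfix/` branch convention, and the 20-file threshold are all illustrative assumptions, not values from the requirements; a real classifier would be tuned against historical PRs.

```python
from dataclasses import dataclass

# Hypothetical path prefixes treated as sensitive; a real deployment would
# load these from project configuration.
SENSITIVE_PREFIXES = ("src/payment/", "src/auth/")


@dataclass
class PRContext:
    pr_id: str
    risk_level: str
    change_type: str
    affected_modules: list[str]
    security_review_required: bool


def classify_pr(pr_id: str, branch: str, files_changed: list[str]) -> PRContext:
    """Assign change type and risk level using simple, illustrative rules."""
    # Assumed branch naming convention: hotfix branches start with "hotfix/".
    change_type = "hotfix" if branch.startswith("hotfix/") else "feature"
    touches_sensitive = any(f.startswith(SENSITIVE_PREFIXES) for f in files_changed)
    if touches_sensitive:
        risk = "high"
    elif len(files_changed) > 20:  # assumed size threshold
        risk = "medium"
    else:
        risk = "low"
    # Treat the second path segment as the module name, e.g. src/payment/x.py -> payment.
    modules = sorted({f.split("/")[1] for f in files_changed if f.count("/") >= 2})
    return PRContext(pr_id, risk, change_type, modules, touches_sensitive)
```

Anything the rules cannot classify confidently would still go through the manual fallback described above.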

---

#### Step 2: Static Code Analysis
**Input Requirements:**
- Source code files
- Language-specific linting rules
- Security scanning policies
- Code quality thresholds

**Output Format:**
```json
{
  "syntax_violations": [{"file": "string", "line": "int", "severity": "string", "message": "string"}],
  "security_issues": [{"type": "OWASP category", "severity": "critical|high|medium|low", "cwe_id": "string"}],
  "code_smells": [{"pattern": "string", "location": "string", "suggestion": "string"}],
  "metrics": {"complexity": "int", "duplication": "percentage", "test_coverage": "percentage"}
}
```

**Success Criteria:** Zero false negatives on critical security issues, <5% false positive rate

**Failure Handling:** Fail-safe to manual review, log analysis gaps

---

#### Step 3: AI-Powered Semantic Code Review
**Input Requirements:**
- Code diff with full context
- Project architecture documentation
- Coding standards and best practices
- Historical review feedback patterns

**Output Format:**
```json
{
  "review_comments": [
    {
      "file": "string",
      "line": "int",
      "type": "bug|performance|maintainability|security|style",
      "severity": "blocking|major|minor|suggestion",
      "message": "string",
      "suggested_fix": "string (optional)",
      "confidence": "0.0-1.0"
    }
  ],
  "architectural_concerns": ["array"],
  "breaking_changes": ["array"],
  "approval_recommendation": "approve|request_changes|needs_discussion"
}
```

**Success Criteria:** 90%+ agreement with senior engineer reviews, <10 min processing time

**Failure Handling:** Flag for human review if confidence <0.7, timeout protection

---

#### Step 4: Automated Testing & Validation
**Input Requirements:**
- Test suite configuration
- Code changes
- Test environment specifications
- Coverage requirements (e.g., 80% line coverage)

**Output Format:**
```json
{
  "test_results": {
    "total": "int",
    "passed": "int",
    "failed": "int",
    "skipped": "int",
    "duration_ms": "int"
  },
  "coverage": {"lines": "percentage", "branches": "percentage", "functions": "percentage"},
  "failed_tests": [{"name": "string", "error": "string", "stack_trace": "string"}],
  "performance_benchmarks": [{"test": "string", "baseline_ms": "int", "current_ms": "int", "regression": "boolean"}]
}
```

**Success Criteria:** All tests pass, coverage meets threshold, no performance regressions >15%

**Failure Handling:** Block deployment, notify author with detailed failure report
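
The success criteria above can be expressed as a small gate function. This is a sketch that assumes the Step 4 output has already been parsed into a dictionary with numeric values; the function name and the default thresholds (80% line coverage, 15% regression) mirror the figures in this step but are otherwise hypothetical.

```python
def evaluate_test_gate(results: dict, coverage_threshold: float = 80.0,
                       max_regression_pct: float = 15.0) -> list[str]:
    """Return a list of blocking reasons; an empty list means the gate passes."""
    reasons = []

    failed = results["test_results"]["failed"]
    if failed > 0:
        reasons.append(f"{failed} test(s) failed")

    lines = results["coverage"]["lines"]
    if lines < coverage_threshold:
        reasons.append(f"line coverage {lines}% is below {coverage_threshold}%")

    for bench in results.get("performance_benchmarks", []):
        baseline, current = bench["baseline_ms"], bench["current_ms"]
        if baseline > 0:
            # Percentage slowdown relative to the recorded baseline.
            slowdown = (current - baseline) / baseline * 100
            if slowdown > max_regression_pct:
                reasons.append(f"{bench['test']} regressed by {slowdown:.1f}%")

    return reasons
```

Returning reasons rather than a bare boolean lets the failure report sent to the author list every violated criterion at once.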

---

#### Step 5: Security & Compliance Validation
**Input Requirements:**
- Code changes
- Dependency manifests
- Security policies (OWASP, SANS, company-specific)
- Compliance frameworks (SOC2, HIPAA, PCI-DSS if applicable)

**Output Format:**
```json
{
  "vulnerability_scan": [
    {"cve_id": "string", "severity": "critical|high|medium|low", "component": "string", "fix_available": "boolean"}
  ],
  "dependency_risks": [{"package": "string", "current_version": "string", "vulnerabilities": "int", "recommended_version": "string"}],
  "secrets_detected": [{"type": "api_key|password|token", "file": "string", "line": "int", "entropy": "float"}],
  "compliance_violations": [{"framework": "string", "rule": "string", "severity": "string"}],
  "overall_risk_score": "0-100"
}
```

**Success Criteria:** Zero critical vulnerabilities, no secrets in code, compliance checks pass

**Failure Handling:** Block deployment for critical issues, allow override with approval
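
The `entropy` field in `secrets_detected` could be computed as Shannon entropy over the characters of a candidate string. The sketch below assumes that definition; the 20-character minimum and the 4.0 bits-per-character threshold are illustrative guesses that would need tuning against real repositories.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of a string in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(value: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag long, high-entropy strings as possible secrets (assumed heuristic)."""
    return len(value) >= min_length and shannon_entropy(value) >= threshold
```

Entropy alone produces false positives (hashes, UUIDs, minified assets), so in practice it would be combined with pattern matching on known key formats.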

---

#### Step 6: Build & Artifact Generation
**Input Requirements:**
- Approved code
- Build configuration
- Target environment specifications
- Versioning strategy

**Output Format:**
```json
{
  "build_status": "success|failure",
  "artifact_url": "string",
  "artifact_hash": "string",
  "build_duration_ms": "int",
  "image_size_mb": "float",
  "version": "string",
  "build_logs": "string"
}
```

**Success Criteria:** Build completes in <10 min, artifact size within bounds, reproducible build

**Failure Handling:** Retry once, detailed error logs, rollback to last known good build

---

#### Step 7: Deployment Orchestration
**Input Requirements:**
- Build artifact
- Deployment configuration (canary %, rollout strategy)
- Environment credentials (secured)
- Monitoring baseline metrics

**Output Format:**
```json
{
  "deployment_id": "string",
  "strategy": "blue_green|canary|rolling|recreate",
  "environments": ["dev", "staging", "prod"],
  "current_phase": "string",
  "instances_updated": "int",
  "instances_total": "int",
  "health_check_status": "healthy|degraded|unhealthy",
  "rollback_ready": "boolean"
}
```

**Success Criteria:** Zero downtime, health checks pass, metrics within SLA bounds

**Failure Handling:** Auto-rollback on health check failure, circuit breaker pattern

---

#### Step 8: Post-Deployment Monitoring & Validation
**Input Requirements:**
- Deployment metadata
- Monitoring/observability data (metrics, logs, traces)
- SLA thresholds
- Alerting rules

**Output Format:**
```json
{
  "status": "stable|degraded|failing",
  "metrics": {
    "error_rate": "percentage",
    "latency_p50": "ms",
    "latency_p99": "ms",
    "throughput": "requests/sec",
    "cpu_usage": "percentage",
    "memory_usage": "percentage"
  },
  "anomalies_detected": [{"metric": "string", "severity": "string", "deviation": "percentage"}],
  "recommendation": "continue|rollback|investigate",
  "confidence": "0.0-1.0"
}
```

**Success Criteria:** Metrics within SLA for 30+ minutes, no critical errors

**Failure Handling:** Trigger auto-rollback if error rate >5% or p99 latency >2x baseline

---

### Question 1.2: Parallelization, Blocking Steps, and Critical Decision Points

#### Parallel Execution Groups:

**Group 1 (Post-PR Submission):**
- Static Code Analysis (Step 2)
- AI Semantic Code Review (Step 3)
- Security & Compliance Validation (Step 5)

Steps 2, 3, and 5 can run concurrently: they all analyze the same code but focus on different aspects.

**Group 2 (Post-Review Approval):**
- Build & Artifact Generation (Step 6)
- ⚠️ Integration tests can start while the build is completing

#### Blocking (Sequential) Steps:

1. **PR Intake (Step 1)** → Blocks everything (needs metadata)
2. **Review Results** (Steps 2, 3, 5 merge) → Blocks testing (Step 4)
3. **Testing (Step 4)** → Blocks deployment (needs validation)
4. **Build (Step 6)** → Blocks deployment (needs artifact)
5. **Deployment (Step 7)** → Blocks monitoring (Step 8)
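
The ordering above can be sketched with `asyncio`: the three analysis steps fan out with `asyncio.gather`, and the remaining steps run one after another. `run_step` is a hypothetical stand-in that only sleeps; real steps would call the analyzers, test runner, and deployment tooling.

```python
import asyncio


async def run_step(name: str, delay: float = 0.01) -> dict:
    """Stand-in for a real pipeline step; sleeps briefly and reports success."""
    await asyncio.sleep(delay)
    return {"step": name, "ok": True}


async def run_pipeline() -> list[str]:
    """Run intake, then the parallel analysis group, then the sequential steps."""
    order = []

    # Step 1 blocks everything else.
    intake = await run_step("pr_intake")
    order.append(intake["step"])

    # Steps 2, 3, and 5 run concurrently; gather returns results in call order.
    analysis = await asyncio.gather(
        run_step("static_analysis"),
        run_step("ai_review"),
        run_step("security_scan"),
    )
    order.append("+".join(r["step"] for r in analysis))
    if not all(r["ok"] for r in analysis):
        return order  # review gate failed: stop before testing

    # Remaining steps are blocking and run in sequence.
    for name in ("testing", "build", "deployment", "monitoring"):
        result = await run_step(name)
        order.append(result["step"])
        if not result["ok"]:
            break
    return order


if __name__ == "__main__":
    print(asyncio.run(run_pipeline()))
```

With this shape, the wall-clock time of the analysis phase is bounded by the slowest of the three steps rather than their sum.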

#### Critical Decision Points:

**Decision Point 1: Post-Static Analysis**
- **Condition:** If critical security issues OR >100 code quality violations
- **Action:** Block PR, require fixes before AI review
- **Rationale:** No point in expensive AI review if basic checks fail

**Decision Point 2: Post-AI Review**
- **Condition:** If approval_recommendation = "request_changes" AND at least one review comment has severity = "blocking"
- **Action:** Halt pipeline, notify author + human reviewer
- **Confidence Gate:** If confidence <0.7, escalate to human

**Decision Point 3: Post-Testing**
- **Condition:** Test pass rate <100% OR coverage <80%
- **Action:** Block deployment, require fixes or override approval
- **Override Path:** Principal engineer can approve with documented risk

**Decision Point 4: Pre-Production Deployment**
- **Condition:** Check if hotfix vs. feature
- **Action:** Hotfix → expedited path (skip staging), Feature → full pipeline
- **Additional Validation:** Require 2+ approvals for prod

**Decision Point 5: Post-Deployment Monitoring (30-min observation window)**
- **Condition:** Error rate >5% OR latency >2x baseline OR anomaly detected
- **Action:** Automatic rollback, create incident ticket
- **Human Override:** On-call engineer can force continue with justification
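
The rollback condition in Decision Point 5 reduces to a small predicate. The sketch below assumes the error rate is a fraction (0.05 = 5%) and that the p99 baseline is supplied by the deployment step; the function and parameter names are hypothetical.

```python
def should_rollback(error_rate: float, latency_p99_ms: float, baseline_p99_ms: float,
                    anomaly_detected: bool = False,
                    max_error_rate: float = 0.05,
                    max_latency_ratio: float = 2.0) -> bool:
    """Return True when any rollback trigger from Decision Point 5 fires."""
    if error_rate > max_error_rate:
        return True
    # Compare against the pre-deployment baseline rather than a fixed number.
    if baseline_p99_ms > 0 and latency_p99_ms > max_latency_ratio * baseline_p99_ms:
        return True
    return anomaly_detected
```

A production version would likely require the condition to hold over several consecutive samples to avoid rolling back on a single noisy data point.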

---

### Question 1.3: Key Handoff Points and Context Requirements

#### Handoff 1: PR Intake → Analysis Phase
**Data Passed:**
```json
{
  "pr_context": {
    "pr_id": "string",
    "risk_level": "enum",
    "change_type": "enum",
    "affected_modules": ["array"],
    "author_experience_level": "junior|mid|senior",
    "similar_pr_history": ["pr_ids"]
  },
  "code_snapshot": {
    "diff": "string",
    "full_files": ["array"],
    "dependency_changes": ["array"]
  }
}
```
**Why:** Risk level determines review depth; history helps pattern matching

---

#### Handoff 2: Analysis Phase → Review Consolidation
**Data Passed:**
```json
{
  "analysis_results": {
    "static_analysis": { /* Step 2 output */ },
    "security_scan": { /* Step 5 output */ },
    "ai_review": { /* Step 3 output */ }
  },
  "aggregated_issues": [
    {
      "issue_id": "uuid",
      "severity": "blocking|major|minor",
      "category": "string",
      "confidence": "float",
      "affected_files": ["array"],
      "recommendation": "string"
    }
  ],
  "overall_verdict": {
    "approval_status": "approved|rejected|needs_revision",
    "blocking_issues_count": "int",
    "auto_fixable_issues": ["array"]
  }
}
```
**Why:** Deduplicates overlapping findings, prioritizes issues, enables single review report
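
One way to deduplicate overlapping findings is to key each issue on file, line, and category and keep the most severe entry. This is a sketch under that assumption; the per-issue `file` and `line` keys and the mapping of blocking issues to `needs_revision` are illustrative choices, not part of the schema above.

```python
SEVERITY_RANK = {"blocking": 3, "major": 2, "minor": 1}


def aggregate_issues(*sources: list[dict]) -> list[dict]:
    """Merge findings from several analyzers, one entry per (file, line, category)."""
    merged = {}
    for findings in sources:
        for issue in findings:
            key = (issue["file"], issue["line"], issue["category"])
            current = merged.get(key)
            # Keep whichever report of the same finding is more severe.
            if current is None or SEVERITY_RANK[issue["severity"]] > SEVERITY_RANK[current["severity"]]:
                merged[key] = issue
    # Most severe issues first, so the review report leads with blockers.
    return sorted(merged.values(), key=lambda i: -SEVERITY_RANK[i["severity"]])


def overall_verdict(issues: list[dict]) -> dict:
    """Summarize the merged issues (assumed rule: any blocker needs revision)."""
    blocking = sum(1 for i in issues if i["severity"] == "blocking")
    return {
        "approval_status": "needs_revision" if blocking else "approved",
        "blocking_issues_count": blocking,
    }
```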

---

#### Handoff 3: Review → Testing
**Data Passed:**
```json
{
  "review_approved": "boolean",
  "code_changes": {
    "modified_functions": ["array"],
    "new_dependencies": ["array"],
    "risk_areas": ["array"]
  },
  "test_strategy": {
    "required_tests": ["unit", "integration", "e2e"],
    "focus_areas": ["array"],
    "performance_benchmarks": ["array"],
    "coverage_targets": {"lines": "80", "branches": "75"}
  }
}
```
**Why:** Directs testing to high-risk areas, ensures appropriate test depth

---

#### Handoff 4: Testing → Build
**Data Passed:**
```json
{
  "test_results": { /* Step 4 output */ },
  "build_config": {
    "environment_vars": {"key": "value"},
    "build_flags": ["array"],
    "target_platforms": ["array"],
    "optimization_level": "debug|release"
  },
  "versioning": {
    "semantic_version": "X.Y.Z",
    "git_commit_sha": "string",
    "build_timestamp": "ISO8601"
  }
}
```
**Why:** Test success gates build; version metadata embedded in artifact

---

#### Handoff 5: Build → Deployment
**Data Passed:**
```json
{
  "artifact": {
    "location": "string",
    "hash": "string",
    "size_mb": "float",
    "signature": "string (for verification)"
  },
  "deployment_config": {
    "target_environments": ["dev", "staging", "prod"],
    "rollout_strategy": {
      "type": "canary|blue_green|rolling",
      "canary_percentage": "int",
      "rollout_duration_min": "int"
    },
    "health_checks": {
      "endpoint": "string",
      "expected_status": "200",
      "timeout_sec": "int"
    }
  },
  "rollback_plan": {
    "previous_version": "string",
    "previous_artifact": "string",
    "rollback_triggers": ["array"]
  }
}
```
**Why:** Ensures artifact integrity, defines deployment behavior, enables instant rollback
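
Artifact integrity can be checked by recomputing the hash before deployment. The sketch assumes the `hash` field is a hex-encoded SHA-256 digest, which the schema does not specify; the function names are hypothetical.

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(stream: BinaryIO, expected_hash: str) -> bool:
    """Compare the computed digest with the expected one in constant time."""
    return hmac.compare_digest(sha256_of_stream(stream), expected_hash.lower())


if __name__ == "__main__":
    # In-memory stand-in for a downloaded artifact.
    payload = b"example artifact bytes"
    expected = hashlib.sha256(payload).hexdigest()
    print(verify_artifact(io.BytesIO(payload), expected))
```

A hash check only detects corruption or substitution relative to the recorded digest; verifying the `signature` field would additionally establish who produced the artifact.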

---

#### Handoff 6: Deployment → Monitoring
**Data Passed:**
```json
{
  "deployment_metadata": {
    "deployment_id": "string",
    "version": "string",
    "timestamp": "ISO8601",
    "affected_services": ["array"],
    "deployment_strategy_used": "string"
  },
  "baseline_metrics": {
    "error_rate_baseline": "float",
    "latency_p99_baseline": "int",
    "throughput_baseline": "int"
  },
  "monitoring_config": {
    "alert_thresholds": {
      "error_rate_max": "0.05",
      "latency_p99_max_ms": "500",
      "cpu_max": "0.80"
    },
    "observation_period_min": "30",
    "auto_rollback_enabled": "boolean"
  }
}
```
**Why:** Establishes success criteria, enables anomaly detection, triggers auto-remediation

---

#### Cross-Cutting Context (Passed Through All Steps):
```json
{
  "trace_id": "uuid",
  "pr_metadata": { /* original PR info */ },
  "policy_overrides": [{"step": "string", "reason": "string", "approver": "string"}],
  "audit_trail": [
    {"timestamp": "ISO8601", "step": "string", "actor": "human|ai|system", "action": "string", "result": "string"}
  ]
}
```
**Why:** Enables end-to-end tracing, compliance auditing, troubleshooting
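
A small sketch of how this cross-cutting context might be carried in code: a dataclass that owns the trace ID and appends audit entries as each step completes. The class and method names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

ALLOWED_ACTORS = {"human", "ai", "system"}


@dataclass
class PipelineContext:
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    audit_trail: list[dict] = field(default_factory=list)

    def record(self, step: str, actor: str, action: str, result: str) -> None:
        """Append one audit entry with a UTC ISO 8601 timestamp."""
        if actor not in ALLOWED_ACTORS:
            raise ValueError(f"unknown actor: {actor!r}")
        self.audit_trail.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "step": step,
            "actor": actor,
            "action": action,
            "result": result,
        })
```

Each step would receive the same `PipelineContext` instance, so the trail accumulates in order and shares one `trace_id`.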

---

## Part B: AI Prompting Strategy (30 points)

**Question 2.1:** For 2 consecutive major steps you identified, design specific AI prompts that would achieve the desired outcome. Include:
- System role/persona definition
- Structured input format
- Expected output format
- Examples of good vs bad responses
- Error handling instructions

**Question 2.2:** How would you handle the following challenging scenarios with your AI prompts:
- **Code that uses obscure libraries or frameworks**
- **Security reviews for code**
- **Performance analysis of database queries**
- **Legacy code modifications**

**Question 2.3:** How would you ensure your prompts are working effectively and getting consistent results?

## Response Part B:

### Question 2.1: AI Prompts for Two Consecutive Steps

---

## Prompt 1: AI-Powered Semantic Code Review Agent

### System Role Definition
```yaml
role: Senior Software Engineer & Security Specialist
expertise:
  - 15+ years full-stack development
  - Security best practices (OWASP Top 10)
  - Architecture patterns (microservices, event-driven, DDD)
  - Performance optimization
  - Code maintainability and technical debt assessment
tone: Professional, constructive, educational
output_style: Structured, actionable, evidence-based
```

### Structured Input Format
```json
{
  "context": {
    "pr_id": "PR-12345",
    "repository": "payment-service",
    "language": "Python",
    "framework": "FastAPI",
    "change_type": "feature|bugfix|hotfix|refactor",
    "risk_level": "low|medium|high|critical",
    "author_experience": "junior|mid|senior"
  },
  "code_diff": {
    "files_changed": [
      {
        "path": "src/payment/processor.py",
        "additions": 45,
        "deletions": 12,
        "diff": "<!-- unified diff format -->",
        "full_context": "<!-- complete file with surrounding code -->"
      }
    ],
    "related_files": [
      "<!-- unchanged but contextually relevant files -->"
    ]
  },
  "project_context": {
    "architecture_docs": "<!-- system design, data flow -->",
    "coding_standards": "<!-- team-specific conventions -->",
    "recent_incidents": "<!-- related production issues in last 30 days -->"
  },
  "historical_patterns": {
    "common_mistakes_in_repo": ["<!-- e.g., 'Missing input validation in payment endpoints' -->"],
    "approved_patterns": ["<!-- e.g., 'Use of circuit breaker for external API calls' -->"]
  }
}
```

### Expected Output Format
```json
{
  "review_id": "uuid",
  "overall_assessment": {
    "recommendation": "approve|request_changes|needs_discussion",
    "confidence": 0.92,
    "rationale": "Well-structured implementation with proper error handling. Minor performance concern in database query pattern.",
    "estimated_review_time_saved_hours": 2.5
  },
  "issues": [
    {
      "id": "ISS-001",
      "file": "src/payment/processor.py",
      "line": 47,
      "type": "security",
      "severity": "blocking",
      "title": "SQL Injection Vulnerability",
      "description": "User input directly interpolated into SQL query without parameterization",
      "evidence": "query = f\"SELECT * FROM transactions WHERE user_id = {user_id}\"",
      "impact": "Allows attackers to execute arbitrary SQL commands, potentially exposing sensitive payment data",
      "suggested_fix": "query = \"SELECT * FROM transactions WHERE user_id = %s\"\nresult = db.execute(query, (user_id,))",
      "references": ["CWE-89", "OWASP A03:2021"],
      "confidence": 0.98
    },
    {
      "id": "ISS-002",
      "file": "src/payment/processor.py",
      "line": 89,
      "type": "performance",
      "severity": "major",
      "title": "N+1 Query Problem",
      "description": "Loop executes database query for each item instead of batch loading",
      "evidence": "for transaction in transactions:\n    user = db.query(User).filter_by(id=transaction.user_id).first()",
      "impact": "100 transactions = 101 database queries. Latency scales linearly with data volume.",
      "suggested_fix": "user_ids = [t.user_id for t in transactions]\nusers = db.query(User).filter(User.id.in_(user_ids)).all()\nuser_map = {u.id: u for u in users}",
      "references": ["https://docs.sqlalchemy.org/en/14/orm/loading_relationships.html"],
      "confidence": 0.95
    },
    {
      "id": "ISS-003",
      "file": "src/payment/processor.py",
      "line": 112,
      "type": "maintainability",
      "severity": "minor",
      "title": "Magic Number",
      "description": "Hardcoded timeout value without context",
      "evidence": "response = requests.get(url, timeout=30)",
      "impact": "Reduces code readability and makes configuration changes harder",
      "suggested_fix": "PAYMENT_GATEWAY_TIMEOUT = 30  # seconds - gateway SLA is 25s\nresponse = requests.get(url, timeout=PAYMENT_GATEWAY_TIMEOUT)",
      "references": ["Clean Code - Chapter 17"],
      "confidence": 0.85
    }
  ],
  "positive_observations": [
    "Excellent use of type hints throughout (lines 23-45)",
    "Comprehensive error handling with specific exception types (lines 67-82)",
    "Well-documented complex business logic with examples (lines 102-108)"
  ],
  "architectural_concerns": [
    "Payment retry logic is implemented in controller layer. Consider moving to a dedicated service class for reusability."
  ],
  "breaking_changes": [],
  "test_recommendations": [
    "Add integration test for payment gateway timeout scenario",
    "Add edge case test for zero-amount transactions (currently uncovered)"
  ]
}
```

### Examples of Good vs Bad Responses

**BAD RESPONSE (Vague, Unhelpful):**
```json
{
  "recommendation": "request_changes",
  "issues": [
    {
      "line": 47,
      "severity": "high",
      "description": "Security issue detected",
      "suggested_fix": "Fix the security problem"
    }
  ]
}
```
**Problems:** No specific vulnerability type, no evidence, no actionable fix, no confidence score

**GOOD RESPONSE (Specific, Actionable):**
```json
{
  "recommendation": "request_changes",
  "confidence": 0.97,
  "issues": [
    {
      "file": "payment/processor.py",
      "line": 47,
      "type": "security",
      "severity": "blocking",
      "title": "SQL Injection via String Formatting",
      "evidence": "query = f\"SELECT * FROM transactions WHERE user_id = {user_id}\"",
      "impact": "Attacker can inject: user_id='1 OR 1=1; DROP TABLE transactions--' to execute arbitrary SQL",
      "suggested_fix": "Use parameterized query: cursor.execute('SELECT * FROM transactions WHERE user_id = %s', (user_id,))",
      "references": ["CWE-89", "OWASP A03:2021"],
      "test_case": "assert_raises(ValueError, process_payment, user_id=\"1'; DROP TABLE--\")",
      "confidence": 0.97
    }
  ]
}
```
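
The difference between the two responses can be checked mechanically before a review is posted. The validator below is a sketch; the required-field set is an assumption drawn from the good example, not a fixed schema.

```python
REQUIRED_ISSUE_FIELDS = {
    "file", "line", "type", "severity", "title",
    "evidence", "suggested_fix", "confidence",
}
ALLOWED_SEVERITIES = {"blocking", "major", "minor", "suggestion"}


def validate_review(response: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the response is usable."""
    problems = []
    if "confidence" not in response:
        problems.append("missing top-level confidence")
    for idx, issue in enumerate(response.get("issues", [])):
        missing = sorted(REQUIRED_ISSUE_FIELDS - set(issue))
        if missing:
            problems.append(f"issue {idx}: missing {', '.join(missing)}")
        if issue.get("severity") not in ALLOWED_SEVERITIES:
            problems.append(f"issue {idx}: unexpected severity {issue.get('severity')!r}")
    return problems
```

Run against the bad response above, this reports the missing confidence score, the missing per-issue fields, and the non-standard `high` severity; a response that fails validation can be retried or routed to a human.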

### Error Handling Instructions

**If code uses unfamiliar libraries:**
```
1. Identify the library and version from imports
2. Mark confidence as <0.6 for library-specific issues
3. Add to output: "review_caveats": ["Limited knowledge of library X v2.3 - recommend specialist review"]
4. Focus on language-agnostic issues (logic, security, performance patterns)
```

**If context is insufficient:**
```json
{
  "status": "incomplete_review",
  "missing_context": [
    "Unable to access database schema - cannot validate query correctness",
    "No test files provided - cannot assess test coverage"
  ],
  "partial_review": { /* issues found with available context */ },
  "recommendation": "needs_discussion"
}
```

**If confidence threshold not met:**
```
IF confidence < 0.7 for blocking issues:
  - Escalate to human reviewer
  - Include in output: "requires_human_review": true, "escalation_reason": "Complex async pattern - needs architecture review"
```
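
The confidence rule above can be applied as a post-processing step on the model output. This sketch assumes each issue carries `id`, `severity`, and `confidence` as in the expected output format; the wording of the escalation reason is illustrative.

```python
def escalation_for(issues: list[dict], threshold: float = 0.7) -> dict:
    """Flag the review for a human when any blocking issue is below the threshold."""
    uncertain = [
        i for i in issues
        if i["severity"] == "blocking" and i["confidence"] < threshold
    ]
    if not uncertain:
        return {"requires_human_review": False}
    ids = ", ".join(i["id"] for i in uncertain)
    return {
        "requires_human_review": True,
        "escalation_reason": f"Low confidence on blocking issue(s): {ids}",
    }
```

Applying the rule in code rather than relying on the model to self-report keeps the escalation behavior deterministic.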

---

## Prompt 2: Automated Testing Strategy Generator

### System Role Definition
```yaml
role: Test Automation Architect & Quality Engineer
expertise:
  - Test pyramid principles (unit > integration > e2e)
  - Coverage analysis and quality metrics
  - Test design patterns (AAA, Given-When-Then)
  - Performance and load testing
  - Mutation testing and fault injection
tone: Methodical, thorough
output_style: Structured test plans with concrete examples
```

### Structured Input Format
```json
{
  "code_review_output": {
    "review_id": "uuid",
    "issues": [ /* from previous step */ ],
    "architectural_concerns": [],
    "risk_areas": ["authentication", "payment_processing"]
  },
  "code_changes": {
    "new_functions": [
      {
        "name": "process_refund",
        "file": "payment/processor.py",
        "lines": "156-203",
        "complexity": 8,
        "branches": 12,
        "external_dependencies": ["stripe_api", "database", "email_service"]
      }
    ],
    "modified_functions": [],
    "deleted_functions": []
  },
  "existing_tests": {
    "test_files": ["tests/test_payment.py"],
    "coverage": {
      "lines": 73,
      "branches": 68,
      "functions": 82
    },
    "recent_failures": ["test_concurrent_payments - flaky, passes 60% of time"]
  },
  "project_config": {
    "test_framework": "pytest",
    "language": "Python 3.11",
    "performance_requirements": {
      "p99_latency_ms": 200,
      "max_database_queries_per_request": 5
    }
  }
}
```

### Expected Output Format
```json
{
  "test_strategy_id": "uuid",
  "overall_strategy": {
    "test_types_required": ["unit", "integration", "security", "performance"],
    "estimated_test_count": 27,
    "estimated_execution_time_sec": 45,
    "coverage_targets": {
      "lines": 85,
      "branches": 80,
      "functions": 90,
      "critical_paths": 100
    }
  },
  "unit_tests": [
    {
      "test_name": "test_process_refund_success_full_amount",
      "target_function": "process_refund",
      "file": "tests/unit/test_payment_processor.py",
      "purpose": "Verify successful full refund with valid transaction ID",
      "test_type": "happy_path",
      "priority": "critical",
      "pseudocode": "# Arrange\nmock_transaction = create_mock_transaction(amount=100, status='completed')\nmock_stripe.refund.return_value = {'status': 'succeeded', 'id': 'ref_123'}\n\n# Act\nresult = processor.process_refund(transaction_id='txn_123', amount=100)\n\n# Assert\nassert result.status == 'refunded'\nassert result.refund_id == 'ref_123'\nmock_stripe.refund.assert_called_once_with(charge_id='ch_123', amount=10000)\nmock_db.commit.assert_called_once()",
      "edge_cases_covered": ["full_refund"],
      "dependencies_to_mock": ["stripe_api", "database"]
    },
    {
      "test_name": "test_process_refund_partial_amount",
      "target_function": "process_refund",
      "purpose": "Verify partial refund calculation and processing",
      "test_type": "edge_case",
      "priority": "high",
      "pseudocode": "# Test partial refund of $30 from $100 transaction\n# Assert: remaining balance = $70, refund status = 'partial'"
    },
    {
      "test_name": "test_process_refund_invalid_transaction_id",
      "target_function": "process_refund",
      "purpose": "Verify error handling for non-existent transaction",
      "test_type": "error_case",
      "priority": "high",
      "pseudocode": "# Arrange: mock_db.query returns None\n# Assert: raises TransactionNotFoundError with message 'Transaction txn_invalid not found'"
    },
    {
      "test_name": "test_process_refund_already_refunded",
      "target_function": "process_refund",
      "purpose": "Prevent double refunds (idempotency check)",
      "test_type": "business_logic",
      "priority": "critical",
      "pseudocode": "# Arrange: transaction with status='refunded'\n# Assert: raises RefundError with message 'Transaction already refunded'"
    },
    {
      "test_name": "test_process_refund_exceeds_original_amount",
      "target_function": "process_refund",
      "purpose": "Prevent refunding more than original charge",
      "test_type": "validation",
      "priority": "critical",
      "pseudocode": "# Arrange: transaction amount=$100, refund_request=$150\n# Assert: raises ValidationError('Refund amount exceeds original transaction')"
    }
  ],
  "integration_tests": [
    {
      "test_name": "test_refund_end_to_end_with_real_database",
      "scope": "payment_processor + database",
      "purpose": "Verify database transaction rollback on Stripe API failure",
      "priority": "high",
      "pseudocode": "# Setup: real test database with seeded transaction\n# Arrange: mock Stripe API to return error\n# Act: call process_refund\n# Assert: database transaction rolled back, original status unchanged\n# Cleanup: rollback test database"
    },
    {
      "test_name": "test_refund_triggers_email_notification",
      "scope": "payment_processor + email_service",
      "purpose": "Verify customer receives refund confirmation email",
      "priority": "medium",
      "pseudocode": "# Assert: email_service.send called with template='refund_confirmation', recipient=customer.email"
    }
  ],
  "security_tests": [
    {
      "test_name": "test_refund_authorization_user_owns_transaction",
      "purpose": "Prevent unauthorized refunds (IDOR vulnerability)",
      "attack_vector": "User A attempts to refund User B's transaction",
      "priority": "critical",
      "pseudocode": "# Arrange: user_a authenticated, transaction belongs to user_b\n# Act: process_refund(transaction_id=user_b_txn, user=user_a)\n# Assert: raises UnauthorizedError('Cannot refund transaction owned by another user')"
    },
    {
      "test_name": "test_refund_sql_injection_transaction_id",
      "purpose": "Verify parameterized queries prevent SQL injection",
      "attack_vector": "Malicious transaction_id with SQL payload",
      "priority": "critical",
      "pseudocode": "# Act: process_refund(transaction_id=\"1' OR '1'='1\")\n# Assert: raises ValidationError or returns None (not database error)"
    }
  ],
  "performance_tests": [
    {
      "test_name": "test_concurrent_refunds_race_condition",
      "purpose": "Verify database locking prevents double-refund race condition",
      "priority": "high",
      "pseudocode": "# Arrange: single transaction\n# Act: spawn 10 concurrent threads calling process_refund\n# Assert: only 1 succeeds, 9 raise 'Already refunded' error\n# Measure: assert total execution time < 500ms"
    },
    {
      "test_name": "test_refund_database_query_count",
      "purpose": "Verify refund operation uses ≤3 database queries",
      "priority": "medium",
      "pseudocode": "# Count queries with a query-counting fixture (for example, a SQLAlchemy 'before_cursor_execute' event listener)\n# Assert: query_count <= 3 (1=select transaction, 2=update status, 3=insert audit log)"
    }
  ],
  "test_data_requirements": [
    {
      "entity": "Transaction",
      "scenarios": [
        {"status": "completed", "amount": 100, "description": "Standard refundable transaction"},
        {"status": "refunded", "amount": 50, "description": "Already refunded (idempotency test)"},
        {"status": "pending", "amount": 200, "description": "Non-refundable status"}
      ]
    }
  ],
  "mutation_testing_recommendations": [
    "Change '>' to '>=' in amount validation (line 167) - test should catch",
    "Remove database commit call (line 189) - integration test should catch",
    "Change status from 'refunded' to 'pending' (line 195) - assertion should catch"
  ],
  "ci_pipeline_config": {
    "test_execution_order": ["unit", "integration", "security", "performance"],
    "failure_handling": "fail_fast on critical tests, continue on medium/low",
    "parallel_execution": {
      "unit_tests": "max_workers=4",
      "integration_tests": "sequential (database conflicts)"
    },
    "coverage_gates": {
      "minimum_total": 80,
      "minimum_new_code": 90,
      "block_pr_if_below": true
    }
  }
}
```
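
For reference, the first unit test in the strategy could look like the following when written against `unittest.mock`. The `process_refund` function here is a hypothetical stand-in with an assumed signature (the real one is not shown in this document); the point is the Arrange-Act-Assert shape and the use of `assert_called_once_with`, which raises `AssertionError` on a mismatch.

```python
from unittest.mock import Mock


def process_refund(stripe, db, charge_id: str, amount_dollars: int) -> dict:
    """Hypothetical stand-in for the function under test."""
    # Amounts are sent to the payment API in cents (assumption for this sketch).
    response = stripe.refund(charge_id=charge_id, amount=amount_dollars * 100)
    db.commit()
    return {"status": "refunded", "refund_id": response["id"]}


def test_process_refund_success_full_amount():
    # Arrange
    mock_stripe = Mock()
    mock_stripe.refund.return_value = {"status": "succeeded", "id": "ref_123"}
    mock_db = Mock()

    # Act
    result = process_refund(mock_stripe, mock_db, charge_id="ch_123", amount_dollars=100)

    # Assert
    assert result == {"status": "refunded", "refund_id": "ref_123"}
    mock_stripe.refund.assert_called_once_with(charge_id="ch_123", amount=10000)
    mock_db.commit.assert_called_once()
```

Mock verification methods must use the `assert_` prefix: an attribute such as `called_once_with` is not an assertion, so a check written that way does not verify the call.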

### Examples of Good vs Bad Test Strategies

**BAD TEST PLAN:**
```json
{
  "tests": [
    {"name": "test_refund", "type": "unit"},
    {"name": "test_refund_error", "type": "unit"}
  ]
}
```
**Problems:** No test details, vague names, missing edge cases, no assertions

**GOOD TEST PLAN:**
```json
{
  "tests": [
    {
      "name": "test_process_refund_exceeds_transaction_amount_raises_validation_error",
      "type": "unit",
      "priority": "critical",
      "covers_code_lines": "167-170",
      "covers_branch": "amount_validation_failure",
      "pseudocode": "assert_raises(ValidationError, process_refund, amount=150, transaction_amount=100)",
      "rationale": "Addresses ISS-004 from code review (missing amount validation)"
    }
  ]
}
```

### Error Handling Instructions

**If insufficient context:**
```json
{
  "status": "incomplete_strategy",
  "reason": "Cannot generate performance tests - no baseline metrics provided",
  "generated_tests": { /* partial output */ },
  "required_context": ["historical_p99_latency", "expected_load_rps"]
}
```

**If test generation confidence is low:**
```
IF code uses async/await patterns AND no existing async tests:
  - Generate basic test structure only
  - Add warning: "requires_specialist_review": "Async test patterns need verification"
```

---

### Question 2.2: Handling Challenging Scenarios

#### Scenario 1: Code Using Obscure Libraries/Frameworks

**Prompt Enhancement:**
```yaml
system_instruction: |
  When encountering unfamiliar libraries:
  
  1. IDENTIFY: Extract library name, version from imports
     Example: "from obscure_lib import SomeClass" → Research 'obscure_lib'
  
  2. INFER: Analyze usage patterns in context
     - What parameters are passed? (suggests purpose)
     - What's the return value used for?
     - Are there error handlers? (indicates failure modes)
  
  3. APPLY GENERAL PRINCIPLES:
     - Input validation (is user data sanitized before passing to lib?)
     - Error handling (are exceptions from lib caught?)
     - Resource management (are connections/files properly closed?)
     - Performance (is lib called in loops? cached?)
  
  4. FLAG UNCERTAINTY:
     Output: {
       "library_specific_review": {
         "library": "obscure_lib v2.3",
         "confidence": 0.45,
         "general_observations": ["Input validation missing", "No error handling"],
         "requires_specialist": true,
         "specialist_type": "obscure_lib expert",
         "research_links": ["https://obscure-lib-docs.com"]
       }
     }
  
  5. COMPENSATE: Focus on surrounding code quality
     - Review how results from lib are used
     - Check integration patterns
     - Validate configuration

example_input:
  code: |
    from hl7apy.parser import parse_message
    msg = parse_message(raw_hl7_data)  # HL7 medical data format
    patient_id = msg.PID.PID_3.value

example_output:
  - "Low confidence (0.4) on HL7-specific validation - recommend healthcare IT specialist review"
  - "General concern: raw_hl7_data not validated before parsing (potential crash on malformed input)"
  - "Recommendation: Wrap in try/except with specific error message for data quality issues"
```
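The IDENTIFY and FLAG UNCERTAINTY steps can be partly automated before the prompt runs. This is a rough sketch, assuming a hand-maintained allowlist of well-understood libraries; `KNOWN_LIBRARIES` and the fixed 0.45 confidence are placeholders, not calibrated values.

```python
import ast

# Hypothetical allowlist of libraries the reviewer is assumed to handle well.
KNOWN_LIBRARIES = {"os", "json", "requests", "sqlalchemy"}


def find_unfamiliar_imports(source: str) -> list:
    """Return top-level package names imported by `source` that are not in KNOWN_LIBRARIES."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return sorted(found - KNOWN_LIBRARIES)


def uncertainty_flags(source: str) -> list:
    """Build one low-confidence flag per unfamiliar library (placeholder confidence)."""
    return [
        {"library": name, "confidence": 0.45, "requires_specialist": True}
        for name in find_unfamiliar_imports(source)
    ]


code = (
    "import json\n"
    "from hl7apy.parser import parse_message\n"
    "msg = parse_message(raw_hl7_data)\n"
)
print(uncertainty_flags(code))
```

The flags can then be injected into the prompt context so the model knows up front which imports need the "infer from usage" treatment.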

---

#### Scenario 2: Security Reviews for Code

**Specialized Security Review Prompt:**
```yaml
system_role: "Security Engineer (OWASP, SANS Top 25, CWE specialist)"

security_checklist:
  authentication:
    - "Are credentials ever logged or stored in plaintext?"
    - "Is multi-factor authentication enforced for sensitive operations?"
    - "Are session tokens securely generated (cryptographically random, sufficient entropy)?"
  
  authorization:
    - "Is there object-level authorization (IDOR prevention)?"
    - "Are role checks performed server-side (not just client-side)?"
    - "Is the principle of least privilege applied?"
  
  input_validation:
    - "Is all user input validated against an allowlist (not just blocklist)?"
    - "Are SQL queries parameterized (preventing injection)?"
    - "Is file upload type checked via content analysis (not just extension)?"
  
  data_protection:
    - "Is sensitive data encrypted at rest (using AES-256 or better)?"
    - "Is TLS enforced for data in transit (minimum TLS 1.2)?"
    - "Are encryption keys stored securely (not hardcoded)?"
  
  cryptography:
    - "Are strong algorithms used (avoid MD5, SHA1, DES)?"
    - "Is randomness cryptographically secure (secrets.token_bytes, not random.random)?"
    - "Are passwords hashed with adaptive algorithms (bcrypt, Argon2, PBKDF2)?"
  
  error_handling:
    - "Do error messages avoid leaking sensitive information (stack traces, DB schema)?"
    - "Are errors logged with sufficient detail for forensics?"
  
  dependencies:
    - "Are all dependencies scanned for known vulnerabilities (CVEs)?"
    - "Is dependency pinning used (exact versions, not ranges)?"

output_format:
  security_issues:
    - cwe_id: "CWE-89"
      owasp_category: "A03:2021 - Injection"
      severity: "critical"
      attack_scenario: "Attacker sends user_id='1 OR 1=1' to bypass authentication"
      exploitability: "easy (requires only HTTP client)"
      impact: "Complete database compromise, PII exposure"
      remediation: "Use parameterized queries: cursor.execute('SELECT * FROM users WHERE id=%s', (user_id,))"
      verification: "Test with payload: user_id=\"1' OR '1'='1\""
```

**Example Security Review Output:**
```json
{
  "security_assessment": {
    "overall_risk": "HIGH",
    "critical_issues": 2,
    "exploitability_score": 8.5,
    "owasp_categories_violated": ["A03:Injection", "A07:Identification and Authentication Failures"]
  },
  "critical_findings": [
    {
      "id": "SEC-001",
      "cwe": "CWE-798",
      "title": "Hardcoded Database Credentials",
      "evidence": "db_password = 'admin123'  # Line 23",
      "attack_scenario": "Source code leaked via GitHub → attacker gains direct database access",
      "business_impact": "Complete data breach, regulatory fines (GDPR: up to 4% revenue)",
      "remediation": "Use environment variables: db_password = os.getenv('DB_PASSWORD')\nStore in vault (AWS Secrets Manager, HashiCorp Vault)",
      "immediate_action": "BLOCK DEPLOYMENT - Rotate database password immediately"
    }
  ]
}
```
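The CWE-89 remediation in the prompt above can be demonstrated end to end. The sketch below uses the standard-library `sqlite3` module with an in-memory database; the table and function names are illustrative. Note that `sqlite3` uses `?` placeholders, whereas the `%s` form in the prompt is the style used by drivers such as psycopg2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")])


def find_user_unsafe(user_id: str) -> list:
    # Vulnerable: user input is concatenated into the SQL text (CWE-89).
    return conn.execute(f"SELECT name FROM users WHERE id = {user_id}").fetchall()


def find_user_safe(user_id: str) -> list:
    # Parameterized: the driver binds the value, so it is never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchall()


payload = "1 OR 1=1"
print("unsafe rows:", len(find_user_unsafe(payload)))
print("safe rows:", len(find_user_safe(payload)))
```

The same payload doubles as the `verification` step: the unsafe variant returns every row, while the parameterized variant treats the payload as a single non-matching value.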

---

#### Scenario 3: Performance Analysis of Database Queries

**Database Performance Review Prompt:**
```yaml
system_role: "Database Performance Engineer & Query Optimizer"

analysis_framework:
  1_query_pattern_analysis:
    n_plus_one:
      detection: "Loop contains database query (for x in items: db.query(...))"
      impact: "O(n) queries instead of O(1)"
      fix: "Eager loading: db.query(Parent).options(joinedload(Parent.children))"
    
    missing_indexes:
      detection: "WHERE clause on non-indexed column"
      impact: "Full table scan (O(n) instead of O(log n))"
      fix: "CREATE INDEX idx_users_email ON users(email)"
    
    select_star:
      detection: "SELECT * FROM large_table"
      impact: "Transfers unnecessary data, cache pollution"
      fix: "SELECT id, name, email FROM users"
    
    cartesian_product:
      detection: "JOIN without ON clause or multiple tables without relationships"
      impact: "Result set = table1_rows × table2_rows (multiplicative growth)"
      fix: "Add proper JOIN conditions"
  
  2_execution_plan_inference:
    - "Estimate query complexity based on JOINs, subqueries, aggregations"
    - "Identify potential table scans"
    - "Flag correlated subqueries (execute once per row)"
  
  3_scalability_analysis:
    - "How does query perform with 10x data? 100x?"
    - "Are there unbounded result sets (missing LIMIT)?"
    - "Is pagination implemented correctly (keyset vs offset)?"

output_format:
  performance_issues:
    - type: "N+1 Query"
      location: "user_service.py:45-52"
      current_complexity: "O(n) - 1 query + n queries in loop"
      optimized_complexity: "O(1) - single query with JOIN"
      impact_at_scale:
        current: "100 users = 101 queries, ~2000ms"
        optimized: "100 users = 1 query, ~50ms"
      code_diff: |
        - for user in users:
        -     orders = db.query(Order).filter_by(user_id=user.id).all()
        + users_with_orders = db.query(User).options(joinedload(User.orders)).all()
```

**Example Output:**
```json
{
  "query_performance_analysis": {
    "query_id": "get_user_dashboard_data",
    "current_performance": {
      "estimated_time_ms": 1500,
      "database_calls": 8,
      "data_transferred_kb": 420,
      "scalability": "poor - degrades linearly with user count"
    },
    "issues": [
      {
        "type": "N+1 Query Pattern",
        "severity": "critical",
        "location": "line 67-72",
        "detection_evidence": "for order in orders: customer = db.query(Customer).get(order.customer_id)",
        "performance_impact": "10 orders = 11 queries. 1000 orders = 1001 queries.",
        "fix": "Use eager loading: orders = db.query(Order).options(joinedload(Order.customer)).all()",
        "expected_improvement": "1001 queries → 1 query (99.9% reduction)"
      }
    ],
    "optimized_performance": {
      "estimated_time_ms": 120,
      "database_calls": 2,
      "improvement_percentage": 92
    }
  }
}
```
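The N+1 detection rule ("loop contains database query") lends itself to a static pre-check. Below is a minimal sketch, assuming query calls can be recognized by method name; the `QUERY_METHODS` set is a hypothetical heuristic and would need tuning per ORM.

```python
import ast

# Hypothetical heuristic: method names treated as "issues a database query".
QUERY_METHODS = {"query", "execute"}


def find_queries_in_loops(source: str) -> list:
    """Return line numbers of query-like method calls located inside a loop body."""
    hits = set()
    for loop in ast.walk(ast.parse(source)):
        if not isinstance(loop, (ast.For, ast.AsyncFor, ast.While)):
            continue
        for stmt in loop.body:
            for node in ast.walk(stmt):
                if (
                    isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr in QUERY_METHODS
                ):
                    hits.add(node.lineno)
    return sorted(hits)


snippet = (
    "orders = db.query(Order).all()\n"
    "for order in orders:\n"
    "    customer = db.query(Customer).get(order.customer_id)\n"
)
print(find_queries_in_loops(snippet))
```

A hit from this check is only a candidate: the AI review still has to confirm that the loop is data-dependent and propose the eager-loading fix.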

---

#### Scenario 4: Legacy Code Modifications

**Legacy Code Review Prompt:**
```yaml
system_role: "Legacy System Specialist (brownfield projects, technical debt management)"

legacy_code_principles:
  1_risk_assessment:
    - "Is this code currently working in production? (if yes, changes are HIGH RISK)"
    - "Are there existing tests? (test coverage indicates safe refactoring zone)"
    - "When was this last modified? (age indicates brittleness)"
    - "Are there recent production incidents related to this code?"
  
  2_change_impact_analysis:
    - "What's the blast radius? (how many callers/dependents?)"
    - "Are there hidden dependencies? (globals, side effects)"
    - "Is behavior documented anywhere? (comments, specs, tests)"
  
  3_legacy_specific_checks:
    preserve_existing_behavior:
      - "Do changes maintain backward compatibility?"
      - "Are there undocumented edge cases being handled?"
      - "Could this break existing integrations?"
    
    technical_debt_awareness:
      - "Is new code following old bad patterns (consistency) or new good patterns (improvement)?"
      - "Recommendation: Isolate new code in separate functions to avoid contaminating legacy"
    
    testing_strategy:
      - "CRITICAL: Characterization tests before refactoring (capture current behavior)"
      - "Add integration tests (unit tests may miss system-level issues)"
  
  4_migration_path:
    - "Can this be done incrementally? (strangler fig pattern)"
    - "Is feature flagging possible? (gradual rollout)"

output_additions:
  legacy_risk_assessment:
    risk_level: "high|medium|low"
    risk_factors: ["no existing tests", "undocumented behavior", "last modified 5 years ago"]
    mitigation_strategy: "Add characterization tests capturing current behavior before proceeding"
  
  backward_compatibility_check:
    breaking_changes: []
    migration_required: false
  
  recommended_approach:
    - "Phase 1: Add tests around existing behavior (no code changes)"
    - "Phase 2: Refactor with test safety net"
    - "Phase 3: Deploy with feature flag, monitor for anomalies"
```
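Phases 1 and 2 of the recommended approach can be illustrated with a small characterization test. Everything here is hypothetical: `legacy_shipping_cost` stands in for an undocumented legacy function, and the cases would normally come from production traffic or existing fixtures.

```python
def legacy_shipping_cost(weight_kg, express=False):
    """Stand-in for an undocumented legacy function (hypothetical)."""
    cost = 5 + weight_kg * 2
    if weight_kg > 20:
        cost += 10  # undocumented surcharge that a refactor must preserve
    if express:
        cost = cost * 1.5
    return cost


def record_behavior(func, cases):
    """Phase 1: run `func` over `cases` and record whatever it returns today."""
    return {case: func(*case) for case in cases}


def refactored_shipping_cost(weight_kg, express=False):
    """Phase 2 candidate: same behavior, clearer structure."""
    surcharge = 10 if weight_kg > 20 else 0
    base = 5 + 2 * weight_kg + surcharge
    return base * 1.5 if express else base


CASES = [(0, False), (20, False), (21, False), (21, True), (3, True)]
golden = record_behavior(legacy_shipping_cost, CASES)

# The refactor is accepted only if it reproduces every recorded output.
for case, expected in golden.items():
    assert refactored_shipping_cost(*case) == expected, case
print("refactor preserves recorded behavior")
```

The recorded outputs deliberately encode current behavior, including quirks such as the surcharge boundary at 20 kg, rather than what the code "should" do.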

---

### Question 2.3: Ensuring Prompt Effectiveness & Consistency

#### Strategy 1: Automated Prompt Testing Framework

```python
from statistics import mean


class PromptValidator:
    """Validates AI prompt outputs for consistency and quality"""
    
    def __init__(self):
        self.test_cases = self.load_golden_dataset()
        self.metrics = MetricsTracker()
    
    def load_golden_dataset(self):
        """Curated set of code samples with expert-reviewed expected outputs"""
        return [
            {
                "id": "sec-sqli-001",  # illustrative ID; validate_prompt_version reads test_case["id"]
                "input": {
                    "code": "<!-- SQL injection example -->",
                    "context": {...}
                },
                "expected_output": {
                    "must_detect": ["SQL injection", "CWE-89"],
                    "severity": "critical|blocking",
                    "confidence": ">0.9"
                },
                "test_type": "security_detection"
            },
            # 50+ test cases covering various scenarios
        ]
    
    def validate_prompt_version(self, prompt_version: str) -> ValidationReport:
        """Run all test cases against prompt version"""
        results = []
        
        for test_case in self.test_cases:
            ai_response = self.call_ai_with_prompt(
                prompt_version,
                test_case["input"]
            )
            
            validation = self.compare_with_expected(
                ai_response,
                test_case["expected_output"]
            )
            
            results.append({
                "test_id": test_case["id"],
                "passed": validation.passed,
                "discrepancies": validation.diffs,
                "confidence_score": ai_response.get("confidence", 0)
            })
        
        # Compute the pass rate first so the recommendation below can reference it
        pass_rate = sum(r["passed"] for r in results) / len(results)
        
        return ValidationReport(
            prompt_version=prompt_version,
            pass_rate=pass_rate,
            avg_confidence=mean(r["confidence_score"] for r in results),
            failed_tests=[r for r in results if not r["passed"]],
            recommendation="approve" if pass_rate > 0.95 else "revise"
        )
```
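The `compare_with_expected` step is left abstract above. One possible shape is sketched below, assuming the AI response is a dict with a `confidence` value and an `issues` list whose entries carry `title`, `cwe`, and `severity`; that response shape and the `Validation` dataclass are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Validation:
    passed: bool
    diffs: list = field(default_factory=list)


def compare_with_expected(ai_response: dict, expected: dict) -> Validation:
    """Check one AI review against a golden-dataset expectation."""
    diffs = []
    issues = ai_response.get("issues", [])

    # must_detect: every term has to appear in some issue title or CWE ID.
    text = " ".join(i.get("title", "") + " " + i.get("cwe", "") for i in issues).lower()
    for term in expected.get("must_detect", []):
        if term.lower() not in text:
            diffs.append(f"missed: {term}")

    # severity: "critical|blocking" means at least one issue has one of these levels.
    if expected.get("severity"):
        allowed = set(expected["severity"].split("|"))
        if not allowed & {i.get("severity") for i in issues}:
            diffs.append(f"severity not in {sorted(allowed)}")

    # confidence: ">0.9" means the reported confidence must exceed 0.9.
    min_conf = float(expected.get("confidence", ">0").lstrip(">"))
    if ai_response.get("confidence", 0) <= min_conf:
        diffs.append(f"confidence <= {min_conf}")

    return Validation(passed=not diffs, diffs=diffs)


expected = {"must_detect": ["SQL injection", "CWE-89"], "severity": "critical|blocking", "confidence": ">0.9"}
good = {"confidence": 0.95, "issues": [{"title": "SQL injection in login query", "cwe": "CWE-89", "severity": "critical"}]}
print(compare_with_expected(good, expected))
```

Matching on required terms rather than exact text keeps the golden dataset stable across harmless wording changes in the model output.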

#### Strategy 2: Continuous Monitoring & Feedback Loop

```python
class PromptPerformanceMonitor:
    """Tracks real-world prompt performance over time"""
    
    def track_review_quality(self, ai_review, human_review):
        """Compare AI reviews with human reviews (ground truth)"""
        metrics = {
            "precision": self.calculate_precision(ai_review, human_review),
            "recall": self.calculate_recall(ai_review, human_review),
            # Rates (0.0-1.0), not raw counts, so they can be compared with the 0.10 threshold below
            "false_positive_rate": self.calculate_false_positive_rate(ai_review, human_review),
            "false_negative_rate": self.calculate_false_negative_rate(ai_review, human_review),
            "severity_accuracy": self.compare_severity_ratings(ai_review, human_review),
            "time_saved_hours": human_review.time_spent - ai_review.processing_time
        }
        
        self.log_metrics(metrics)
        
        # Alert if performance degraded
        if metrics["false_negative_rate"] > 0.10:  # Missing >10% of issues
            self.alert("AI review quality degraded - missing critical issues")
            self.trigger_prompt_review()
    
    def track_deployment_outcomes(self, deployment_id):
        """Correlate AI predictions with actual deployment results"""
        prediction = self.get_deployment_prediction(deployment_id)
        actual_outcome = self.monitor_deployment_for_24h(deployment_id)
        
        accuracy = {
            "predicted_risk": prediction.risk_level,
            "actual_incidents": actual_outcome.incidents_count,
            "prediction_correct": prediction.risk_level == actual_outcome.risk_level,
            "false_alarm": prediction.risk_level == "high" and actual_outcome.risk_level == "low"
        }
        
        self.store_outcome(deployment_id, accuracy)
        
        # Use outcomes to retrain risk assessment model
        if self.accumulated_samples > 100:
            self.retrain_risk_predictor()
```
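The precision and recall helpers referenced above reduce to set arithmetic once issues are given comparable keys. This is a minimal sketch, assuming each issue is identified by a string such as `type:file:line`; that keying scheme is an assumption, and real matching would likely need fuzzier comparison.

```python
def review_metrics(ai_issues: set, human_issues: set) -> dict:
    """Compare AI-reported issues with human-reported issues (treated as ground truth)."""
    true_pos = len(ai_issues & human_issues)
    false_pos = len(ai_issues - human_issues)
    false_neg = len(human_issues - ai_issues)

    # Empty sets are treated as perfect scores to avoid division by zero.
    precision = true_pos / (true_pos + false_pos) if ai_issues else 1.0
    recall = true_pos / (true_pos + false_neg) if human_issues else 1.0
    return {
        "precision": precision,
        "recall": recall,
        "false_negative_rate": 1.0 - recall,
    }


ai = {"sqli:auth.py:42", "n+1:orders.py:67", "style:utils.py:3"}
human = {"sqli:auth.py:42", "n+1:orders.py:67", "idor:api.py:10", "secrets:cfg.py:23"}
print(review_metrics(ai, human))
```

In this example the AI misses two of the four human-found issues, so the false negative rate is well above the 0.10 alert threshold used in the monitor.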

#### Strategy 3: Structured Output Validation

```python
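from typing import List, Literal, Optional

# Pydantic v1-style validators; Pydantic v2 provides field_validator as the replacement
from pydantic import BaseModel, Field, ValidationError, validator


class ReviewIssue(BaseModel):
    """Minimal issue shape assumed by the validators below"""
    id: str
    severity: str
    suggested_fix: Optional[str] = None


class CodeReviewOutput(BaseModel):
    """Enforces strict output schema for consistency"""
    
    review_id: str
    # Declared before `confidence` so the confidence validator can read it via `values`
    estimated_complexity: int
    confidence: float = Field(ge=0.0, le=1.0)
    recommendation: Literal["approve", "request_changes", "needs_discussion"]
    issues: List[ReviewIssue]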
    
    @validator("issues")
    def validate_critical_issues_have_fixes(cls, issues):
        """Ensure all blocking issues have suggested fixes"""
        for issue in issues:
            if issue.severity == "blocking" and not issue.suggested_fix:
                raise ValueError(f"Blocking issue {issue.id} missing suggested_fix")
        return issues
    
    @validator("confidence")
    def validate_confidence_matches_complexity(cls, confidence, values):
        """Low confidence on simple issues is suspicious"""
        if values.get("estimated_complexity", 0) < 3 and confidence < 0.8:
            raise ValueError("Suspiciously low confidence on simple code")
        return confidence

# Usage
try:
    validated_output = CodeReviewOutput(**ai_raw_response)
except ValidationError as e:
    log_error("AI output failed validation", errors=e.errors())
    trigger_prompt_revision()
```

#### Strategy 4: A/B Testing Prompt Variations

```python
import random
from statistics import mean


class PromptExperiment:
    """Compare multiple prompt variations to find optimal version"""
    
    def run_ab_test(self, prompt_a: str, prompt_b: str, sample_size: int = 100):
        """Split traffic between two prompt versions"""
        results = {"A": [], "B": []}
        
        for i in range(sample_size):
            pr = self.get_random_pr_from_queue()
            
            # Randomly assign to A or B
            variant = "A" if random.random() < 0.5 else "B"
            prompt = prompt_a if variant == "A" else prompt_b
            
            ai_review = self.generate_review(pr, prompt)
            human_review = self.get_human_review(pr)  # Ground truth
            
            accuracy = self.calculate_accuracy(ai_review, human_review)
            processing_time = ai_review.duration_seconds
            
            results[variant].append({
                "accuracy": accuracy,
                "time": processing_time,
                "developer_satisfaction": self.get_feedback(pr.author, ai_review)
            })
        
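        # Statistical analysis: assumed to return "A", "B", or None when the difference is not significant
        winner = self.statistical_test(results["A"], results["B"])
        
        return ABTestReport(
            # Average the accuracy field; the result entries are dicts and cannot be averaged directly
            prompt_a_performance=mean(r["accuracy"] for r in results["A"]),
            prompt_b_performance=mean(r["accuracy"] for r in results["B"]),
            winner=winner,
            confidence_level=0.95,
            recommendation=(
                f"Deploy prompt {winner} (statistically significant improvement)"
                if winner else "No significant difference - keep the current prompt"
            )
        )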
```
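The `statistical_test` call is not specified above. One option that needs no external libraries is a permutation test on the accuracy difference, sketched below; the function names, the fixed seed, and the 0.05 significance threshold are assumptions for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, rounds=2000, seed=0):
    """Two-sided permutation test on the difference in means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(rounds):
        # Reassign samples to the two groups at random and re-measure the gap.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (rounds + 1)


def pick_winner(acc_a, acc_b, alpha=0.05):
    """Return "A" or "B" if the difference is significant at `alpha`, else None."""
    if permutation_test(acc_a, acc_b) >= alpha:
        return None
    return "A" if mean(acc_a) > mean(acc_b) else "B"


acc_a = [0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.78, 0.82]
acc_b = [0.90, 0.91, 0.89, 0.92, 0.93, 0.90, 0.88, 0.91]
print(pick_winner(acc_a, acc_b))
```

A permutation test makes no normality assumption, which suits bounded accuracy scores from small samples; with larger samples a t-test would be a reasonable alternative.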

#### Strategy 5: Human-in-the-Loop Validation

```yaml
validation_workflow:
  1_sampling:
    - "Randomly sample 5% of AI reviews for human verification"
    - "Always sample: low-confidence reviews (<0.7), critical security issues, large PRs (>500 lines)"
  
  2_expert_review:
    - "Senior engineers review sampled AI outputs"
    - "Rate on scale: Excellent, Good, Acceptable, Poor, Dangerous"
    - "Flag discrepancies (missed issues, false positives)"
  
  3_feedback_incorporation:
    - "Poor/Dangerous ratings trigger immediate prompt revision"
    - "Collect examples of good vs bad outputs → add to test suite"
    - "Update golden dataset with new edge cases"
  
  4_continuous_improvement:
    - "Monthly prompt review based on accumulated feedback"
    - "Track improvement trend: target 95% 'Good' or better ratings"
```
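The sampling rules in step 1 translate directly into a routing predicate. A minimal sketch, assuming each review is a dict with `confidence`, `has_critical_security_issue`, and `lines_changed` keys (field names are illustrative):

```python
import random


def needs_human_review(review: dict, sample_rate: float = 0.05, rng=random) -> bool:
    """Return True if the AI review should be routed to a senior engineer."""
    always_sample = (
        review["confidence"] < 0.7
        or review["has_critical_security_issue"]
        or review["lines_changed"] > 500
    )
    # Everything else is sampled at random at the configured base rate.
    return always_sample or rng.random() < sample_rate


low_confidence = {"confidence": 0.55, "has_critical_security_issue": False, "lines_changed": 40}
routine = {"confidence": 0.92, "has_critical_security_issue": False, "lines_changed": 40}
print(needs_human_review(low_confidence))
```

Passing `rng` explicitly allows a seeded `random.Random` instance in tests, so sampling decisions are reproducible.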

#### Strategy 6: Confidence Calibration

```python
from statistics import mean


def calibrate_confidence_scores():
    """Ensure confidence scores match actual accuracy"""
    
    # Collect historical data
    predictions = db.query("""
        SELECT confidence_score, human_validated_correct
        FROM ai_reviews
        WHERE human_review_completed = true
    """)
    
    # Group by confidence buckets (counters start at zero so they can be incremented)
    buckets = {
        "0.9-1.0": {"predicted_confidence": 0.95, "count": 0, "correct": 0},
        "0.8-0.9": {"predicted_confidence": 0.85, "count": 0, "correct": 0},
        # ...
    }
    
    for pred in predictions:
        bucket = get_bucket(pred.confidence_score)
        buckets[bucket]["count"] += 1
        if pred.human_validated_correct:
            buckets[bucket]["correct"] += 1
    
    # Calculate actual accuracy per bucket, skipping empty buckets to avoid division by zero
    populated = [b for b in buckets.values() if b["count"] > 0]
    if not populated:
        return
    for bucket in populated:
        bucket["actual_accuracy"] = bucket["correct"] / bucket["count"]
    
    # Check calibration: mean absolute gap between stated confidence and observed accuracy
    calibration_error = mean(
        abs(b["predicted_confidence"] - b["actual_accuracy"])
        for b in populated
    )
    
    if calibration_error > 0.1:
        alert("Confidence scores are poorly calibrated - stated confidence does not match observed accuracy")
        # Apply calibration correction or retrain
```

**Target Metrics for Prompt Effectiveness:**
- **Precision:** >90% (low false positive rate)
- **Recall:** >90% (catches most issues)
- **Confidence Calibration:** <0.05 error (confidence scores match reality)
- **Consistency:** <5% variance on repeated same-input tests
- **Processing Time:** <10 minutes per review
- **Human Agreement:** >85% agreement with senior engineer reviews

---

## Part C: System Architecture & Reusability (25 points)

**Question 3.1:** How would you make this system reusable across different projects/teams? Consider:
- Configuration management
- Language/framework variations
- Different deployment targets (cloud providers, on-prem)
- Team-specific coding standards
- Industry-specific compliance requirements

**Question 3.2:** How would the system get better over time based on:
- False positive/negative rates in reviews
- Deployment success/failure patterns
- Developer feedback
- Production incident correlation

## Response Part C:

### Question 3.1: Making the System Reusable Across Projects/Teams

---

## 1. Configuration Management Architecture

### Hierarchical Configuration Strategy
```yaml
# Structure: Organization → Team → Project → Environment
config_hierarchy:
  organization:  # Broadest scope
    default_policies: "org-wide-security.yaml"
    compliance_frameworks: ["SOC2", "GDPR"]
    approved_tools: ["GitHub", "Jenkins", "AWS"]
  
  team:  # Team-specific overrides
    coding_standards: "team-python-style.yaml"
    review_thresholds:
      min_reviewers: 2
      require_security_review_if: "touches_authentication"
    deployment_windows:
      production: ["tue-thu 10am-4pm EST"]
  
  project:  # Project-specific configuration
    language: "Python 3.11"
    framework: "FastAPI"
    test_framework: "pytest"
    deployment_target: "AWS ECS"
    performance_requirements:
      p99_latency_ms: 200
      error_rate_max: 0.01
  
  environment:  # Environment-specific overrides
    dev:
      auto_deploy: true
      require_approval: false
    staging:
      auto_deploy: true
      require_approval: "tech_lead"
    production:
      auto_deploy: false
      require_approval: ["tech_lead", "senior_engineer"]
      rollback_enabled: true
```

### Configuration Schema Definition
```python
from pydantic import BaseModel, Field
from typing import Literal, Optional, List, Dict

class ReviewPolicyConfig(BaseModel):
    """Defines code review policies"""
    min_reviewers: int = Field(ge=1, default=1)
    require_security_review: bool = False
    security_review_triggers: List[str] = [
        "authentication", "payment", "pii_handling"
    ]
    auto_approve_threshold: Optional[float] = Field(ge=0.0, le=1.0, default=None)
    block_on_critical_issues: bool = True
    max_review_time_hours: int = 24

class DeploymentConfig(BaseModel):
    """Deployment strategy configuration"""
    target_platform: Literal["AWS", "GCP", "Azure", "on-prem", "kubernetes"]
    strategy: Literal["blue_green", "canary", "rolling", "recreate"]
    canary_percentage: int = Field(ge=0, le=100, default=10)
    health_check_url: str
    rollback_on_error_rate: float = Field(ge=0.0, le=1.0, default=0.05)
    environments: List[str] = ["dev", "staging", "production"]

class LanguageConfig(BaseModel):
    """Language/framework specific settings"""
    language: str
    version: str
    framework: Optional[str] = None
    package_manager: str  # npm, pip, maven, etc.
    linter: str  # eslint, pylint, checkstyle, etc.
    linter_config_path: str  # .eslintrc.json, .pylintrc, etc.
    test_framework: str  # jest, pytest, junit, etc.
    test_command: str  # "npm test", "pytest", etc.
    build_command: str

class ProjectConfig(BaseModel):
    """Top-level project configuration"""
    project_id: str
    team_id: str
    language_config: LanguageConfig
    review_policy: ReviewPolicyConfig
    deployment_config: DeploymentConfig
    compliance_requirements: List[str] = []
    custom_rules: Dict[str, object] = {}  # arbitrary values; the builtin any() is a function, not a type
```

### Configuration Loading with Inheritance
```python
class ConfigurationManager:
    """Manages hierarchical configuration with inheritance"""
    
    def __init__(self, config_store: ConfigStore):
        self.store = config_store
        self.cache = {}
    
    def get_project_config(self, project_id: str) -> ProjectConfig:
        """Loads config with hierarchy: org → team → project"""
        
        # Load all levels
        org_config = self.store.get_org_config()
        team_config = self.store.get_team_config(project_id)
        project_config = self.store.get_project_config(project_id)
        
        # Merge with precedence: project > team > org
        merged = self._deep_merge(
            org_config,
            team_config,
            project_config
        )
        
        # Validate against schema
        return ProjectConfig(**merged)
    
    def _deep_merge(self, *configs):
        """Deep merge configs with right-most taking precedence"""
        result = {}
        for config in configs:
            for key, value in config.items():
                if isinstance(value, dict) and isinstance(result.get(key), dict):
                    result[key] = self._deep_merge(result[key], value)
                else:
                    result[key] = value
        return result

# Example usage
config_manager = ConfigurationManager(config_store)
project_config = config_manager.get_project_config("payment-service")

# Apply config to review pipeline
review_agent = CodeReviewAgent(
    language=project_config.language_config.language,
    framework=project_config.language_config.framework,
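    # coding_standards_url is not a ReviewPolicyConfig field; assumed here to live in custom_rules
    coding_standards=project_config.custom_rules.get("coding_standards_url"),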
    security_enabled=project_config.review_policy.require_security_review
)
```
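To make the precedence rules concrete, here is a standalone version of the merge applied to three small config dicts. The config values are invented for illustration; the function mirrors `_deep_merge` but is written as a free function so it can run on its own.

```python
def deep_merge(*configs: dict) -> dict:
    """Merge dicts left to right; later configs win, nested dicts are merged recursively."""
    result = {}
    for config in configs:
        for key, value in config.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = deep_merge(result[key], value)
            else:
                # Scalars and lists are replaced wholesale, not combined.
                result[key] = value
    return result


org = {
    "review_policy": {"min_reviewers": 1, "block_on_critical_issues": True},
    "compliance_requirements": ["SOC2", "GDPR"],
}
team = {"review_policy": {"min_reviewers": 2}}
project = {
    "review_policy": {"max_review_time_hours": 4},
    "compliance_requirements": ["PCI-DSS"],
}

merged = deep_merge(org, team, project)
print(merged)
```

One consequence worth deciding explicitly: lists are overwritten rather than concatenated, so a project that sets `compliance_requirements` drops the organization-level entries unless it repeats them. If org-level compliance must always apply, list keys need a union rule instead.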

---

## 2. Language/Framework Variation Handling

### Plugin-Based Architecture
```python
import subprocess
from abc import ABC, abstractmethod
from typing import List

class LanguagePlugin(ABC):
    """Base class for language-specific implementations"""
    
    @abstractmethod
    def parse_code(self, code: str) -> AST:
        """Parse code into Abstract Syntax Tree"""
        pass
    
    @abstractmethod
    def run_linter(self, files: List[str]) -> LintResults:
        """Run language-specific linter"""
        pass
    
    @abstractmethod
    def run_tests(self, test_command: str) -> TestResults:
        """Execute tests"""
        pass
    
    @abstractmethod
    def analyze_dependencies(self, manifest_file: str) -> List[Dependency]:
        """Parse dependency manifest"""
        pass
    
    @abstractmethod
    def get_security_rules(self) -> List[SecurityRule]:
        """Language-specific security patterns"""
        pass

class PythonPlugin(LanguagePlugin):
    def parse_code(self, code: str) -> AST:
        import ast
        return ast.parse(code)
    
    def run_linter(self, files: List[str]) -> LintResults:
        # Run pylint (flake8 and black could be invoked the same way)
        return subprocess.run(["pylint"] + files, capture_output=True)
    
    def run_tests(self, test_command: str) -> TestResults:
        return subprocess.run(test_command.split(), capture_output=True)
    
    def analyze_dependencies(self, manifest_file: str) -> List[Dependency]:
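        # Parse requirements.txt line by line (pyproject.toml would need a TOML parser)
        with open(manifest_file) as f:
            return [
                Dependency.from_requirement(line.strip())
                for line in f
                if line.strip() and not line.lstrip().startswith("#")  # skip blanks and comments
            ]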
    
    def get_security_rules(self) -> List[SecurityRule]:
        return [
            SecurityRule(
                id="PY-001",
                pattern=r"pickle\.loads\(",
                message="Avoid pickle.loads() - can execute arbitrary code",
                severity="high"
            ),
            SecurityRule(
                id="PY-002",
                pattern=r"\b(?:eval|exec)\(",  # word boundary avoids matching ast.literal_eval(
                message="Avoid eval/exec - code injection risk",
                severity="critical"
            ),
            # ... more Python-specific rules
        ]

class JavaScriptPlugin(LanguagePlugin):
    def parse_code(self, code: str) -> AST:
        # Use esprima or acorn
        return esprima.parseScript(code)
    
    def run_linter(self, files: List[str]) -> LintResults:
        return subprocess.run(["eslint"] + files, capture_output=True)
    
    # ... implement other methods

class PluginRegistry:
    """Manages language plugins"""
    
    def __init__(self):
        self.plugins = {
            "python": PythonPlugin(),
            "javascript": JavaScriptPlugin(),
            "typescript": TypeScriptPlugin(),
            "java": JavaPlugin(),
            "go": GoPlugin(),
            "rust": RustPlugin(),
        }
    
    def get_plugin(self, language: str) -> LanguagePlugin:
        plugin = self.plugins.get(language.lower())
        if not plugin:
            raise ValueError(f"Unsupported language: {language}")
        return plugin
    
    def register_plugin(self, language: str, plugin: LanguagePlugin):
        """Allow custom language plugins"""
        self.plugins[language.lower()] = plugin

# Usage in pipeline
plugin = PluginRegistry().get_plugin(project_config.language_config.language)
lint_results = plugin.run_linter(changed_files)
dependencies = plugin.analyze_dependencies("requirements.txt")
```
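The rules returned by `get_security_rules` still need something to apply them. The sketch below shows one simple way, assuming `SecurityRule` is a plain record with the four fields used above and that a line-by-line regex scan is acceptable. It uses a word-boundary variant of the eval/exec pattern so that `ast.literal_eval(` is not flagged.

```python
import re
from dataclasses import dataclass


@dataclass
class SecurityRule:
    id: str
    pattern: str
    message: str
    severity: str


RULES = [
    SecurityRule("PY-001", r"pickle\.loads\(", "Avoid pickle.loads() - can execute arbitrary code", "high"),
    SecurityRule("PY-002", r"\b(?:eval|exec)\(", "Avoid eval/exec - code injection risk", "critical"),
]


def scan(source: str, rules=RULES) -> list:
    """Return (line_number, rule_id, severity) for every rule match in `source`."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Crude comment stripping; it would misfire on a '#' inside a string literal.
        code = line.split("#", 1)[0]
        for rule in rules:
            if re.search(rule.pattern, code):
                findings.append((lineno, rule.id, rule.severity))
    return findings


sample = (
    "import pickle\n"
    "data = pickle.loads(blob)\n"
    "value = ast.literal_eval(text)\n"
    "result = eval(expr)  # risky\n"
)
print(scan(sample))
```

Regex rules are cheap and language-agnostic but blind to context; matches are best treated as hints for the AI review rather than as final findings.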

---

## 3. Deployment Target Abstraction

### Cloud Provider Agnostic Interface
```python
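from abc import ABC, abstractmethod

import boto3
from kubernetes import client, config  # official Kubernetes Python client


class DeploymentProvider(ABC):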
    """Abstract interface for deployment targets"""
    
    @abstractmethod
    def deploy(self, artifact: Artifact, config: DeploymentConfig) -> DeploymentResult:
        pass
    
    @abstractmethod
    def get_health(self, deployment_id: str) -> HealthStatus:
        pass
    
    @abstractmethod
    def rollback(self, deployment_id: str, target_version: str) -> RollbackResult:
        pass
    
    @abstractmethod
    def get_metrics(self, service: str, time_range: TimeRange) -> Metrics:
        pass

class AWSDeploymentProvider(DeploymentProvider):
    def __init__(self, region: str, credentials: AWSCredentials):
        self.ecs_client = boto3.client('ecs', region_name=region)
        self.elb_client = boto3.client('elbv2', region_name=region)
        self.cloudwatch = boto3.client('cloudwatch', region_name=region)
    
    def deploy(self, artifact: Artifact, config: DeploymentConfig) -> DeploymentResult:
        if config.strategy == "blue_green":
            return self._blue_green_deploy(artifact, config)
        elif config.strategy == "canary":
            return self._canary_deploy(artifact, config)
        # ... other strategies
    
    def _blue_green_deploy(self, artifact: Artifact, config: DeploymentConfig):
        # 1. Create new task definition
        new_task_def = self.ecs_client.register_task_definition(
            family=config.service_name,
            containerDefinitions=[{
                'name': config.container_name,
                'image': artifact.image_url,
                # ... other configs
            }]
        )
        
        # 2. Create new target group (green)
        green_tg = self.elb_client.create_target_group(
            Name=f"{config.service_name}-green",
            # ... config
        )
        
        # 3. Update service to use new task definition
        self.ecs_client.update_service(
            cluster=config.cluster,
            service=config.service_name,
            taskDefinition=new_task_def['taskDefinition']['taskDefinitionArn']
        )
        
        # 4. Wait for health checks
        waiter = self.ecs_client.get_waiter('services_stable')
        waiter.wait(cluster=config.cluster, services=[config.service_name])
        
        # 5. Switch traffic (blue → green)
        self.elb_client.modify_listener(
            ListenerArn=config.listener_arn,
            DefaultActions=[{
                'Type': 'forward',
                'TargetGroupArn': green_tg['TargetGroups'][0]['TargetGroupArn']
            }]
        )
        
        return DeploymentResult(success=True, deployment_id=new_task_def['taskDefinition']['taskDefinitionArn'])

class KubernetesDeploymentProvider(DeploymentProvider):
    def __init__(self, kubeconfig_path: str):
        config.load_kube_config(kubeconfig_path)
        self.apps_v1 = client.AppsV1Api()
        self.core_v1 = client.CoreV1Api()
    
    def deploy(self, artifact: Artifact, config: DeploymentConfig):
        if config.strategy == "canary":
            return self._canary_deploy_k8s(artifact, config)
        # ... other strategies
    
    def _canary_deploy_k8s(self, artifact: Artifact, config: DeploymentConfig):
        # 1. Create canary deployment (canary_percentage of replicas, at least 1)
        canary_replicas = max(1, int(config.total_replicas * config.canary_percentage / 100))
        
        canary_deployment = self.apps_v1.create_namespaced_deployment(
            namespace=config.namespace,
            body={
                "metadata": {"name": f"{config.service_name}-canary"},
                "spec": {
                    "replicas": canary_replicas,
                    "selector": {"matchLabels": {"app": config.service_name, "version": "canary"}},
                    "template": {
                        "metadata": {"labels": {"app": config.service_name, "version": "canary"}},
                        "spec": {
                            "containers": [{
                                "name": config.container_name,
                                "image": artifact.image_url
                            }]
                        }
                    }
                }
            }
        )
        
        # 2. Monitor canary metrics (in practice, wait for the observation window to elapse before reading them)
        metrics = self.get_metrics(f"{config.service_name}-canary", TimeRange(minutes=15))
        
        if metrics.error_rate > config.rollback_on_error_rate:
            self.rollback(canary_deployment.metadata.name, config.previous_version)
            return DeploymentResult(success=False, reason="Canary failed health checks")
        
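        # 3. Promote canary to full deployment
        # A strategic merge patch matches containers by name, so the name must be included
        self.apps_v1.patch_namespaced_deployment(
            name=config.service_name,
            namespace=config.namespace,
            body={"spec": {"template": {"spec": {"containers": [
                {"name": config.container_name, "image": artifact.image_url}
            ]}}}}
        )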
        
        return DeploymentResult(success=True)

class DeploymentProviderFactory:
    """Factory for creating deployment providers"""
    
    @staticmethod
    def create(provider_type: str, **kwargs) -> DeploymentProvider:
        providers = {
            "aws": AWSDeploymentProvider,
            "gcp": GCPDeploymentProvider,
            "azure": AzureDeploymentProvider,
            "kubernetes": KubernetesDeploymentProvider,
            "on-prem": OnPremDeploymentProvider,
        }
        
        provider_class = providers.get(provider_type.lower())
        if not provider_class:
            raise ValueError(f"Unknown provider: {provider_type}")
        
        return provider_class(**kwargs)

# Usage
provider = DeploymentProviderFactory.create(
    project_config.deployment_config.target_platform,
    region="us-east-1",
    credentials=credentials
)

result = provider.deploy(build_artifact, project_config.deployment_config)
```
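
The canary step above reads metrics once, right after the canary is created. A minimal sketch of watching the canary over an observation window instead is shown below. The `watch_canary` helper, its parameters, and the injected `fetch_error_rate` callable are hypothetical names introduced for illustration; they are not part of the provider classes above.

```python
import time
from typing import Callable


def watch_canary(
    fetch_error_rate: Callable[[], float],
    max_error_rate: float,
    checks: int = 5,
    interval_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the canary stays healthy for every check, False on the first breach."""
    for attempt in range(checks):
        if fetch_error_rate() > max_error_rate:
            return False  # caller is expected to roll back the canary
        if attempt < checks - 1:
            sleep(interval_seconds)  # wait before sampling again
    return True
```

Injecting `sleep` keeps the loop testable without real waiting; a provider could call this between creating the canary and promoting it.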

---

## 4. Team-Specific Coding Standards Integration

### Standards as Code
```python
class CodingStandardsLoader:
    """Loads and applies team-specific coding standards"""
    
    def load_standards(self, team_id: str) -> CodingStandards:
        """Load standards from team configuration"""
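        config_url = f"https://config-server/teams/{team_id}/standards.yaml"
        response = requests.get(config_url, timeout=10)
        response.raise_for_status()
        # The file is YAML, so parse it with a YAML loader rather than .json()
        standards_data = yaml.safe_load(response.text) or {}
        
        # Keys match the example standards file below
        return CodingStandards(
            naming_conventions=standards_data.get("naming_conventions", {}),
            complexity_limits=standards_data.get("complexity_limits", {}),
            custom_rules=standards_data.get("custom_rules", []),
            forbidden_patterns=standards_data.get("forbidden_patterns", []),
            required_patterns=standards_data.get("required_patterns", [])
        )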
    
    def apply_to_review_prompt(self, standards: CodingStandards, base_prompt: str) -> str:
        """Inject standards into AI review prompt"""
        standards_section = f"""
        TEAM-SPECIFIC CODING STANDARDS:
        
        Naming Conventions:
        {yaml.dump(standards.naming_conventions)}
        
        Complexity Limits:
        - Max function length: {standards.complexity_limits.get('max_function_lines', 50)} lines
        - Max cyclomatic complexity: {standards.complexity_limits.get('max_complexity', 10)}
        - Max parameters: {standards.complexity_limits.get('max_parameters', 5)}
        
        Forbidden Patterns:
        {chr(10).join(f"- {pattern}" for pattern in standards.forbidden_patterns)}
        
        Required Patterns:
        {chr(10).join(f"- {pattern}" for pattern in standards.required_patterns)}
        
        Custom Rules:
        {yaml.dump(standards.custom_rules)}
        
        IMPORTANT: Flag violations of these team-specific standards as 'style' issues.
        """
        
        return base_prompt.replace("{{TEAM_STANDARDS}}", standards_section)

# Example team standards file (teams/fintech-team/standards.yaml)
"""
naming_conventions:
  classes: PascalCase
  functions: snake_case
  constants: SCREAMING_SNAKE_CASE
  private_methods: _leading_underscore

complexity_limits:
  max_function_lines: 30
  max_complexity: 8
  max_parameters: 4
  max_nesting: 3

forbidden_patterns:
  - pattern: "print\\("
    reason: "Use logging instead of print statements"
  - pattern: "time\\.sleep\\("
    reason: "Avoid blocking sleep in async code"
  - pattern: "TODO"
    reason: "Create JIRA tickets instead of TODO comments"

required_patterns:
  - pattern: "type annotations on all function signatures"
    check: "all functions must have return type hints"
  - pattern: "docstrings on all public methods"
    check: "all public methods must have docstrings"

custom_rules:
  - id: "FINTECH-001"
    description: "All monetary amounts must use Decimal, not float"
    pattern: "amount.*=.*float\\("
    severity: "critical"
  - id: "FINTECH-002"
    description: "All database writes must be within transactions"
    pattern: "db\\.commit\\(\\)"
    requires_context: "with.*transaction"
    severity: "major"
"""
```
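
Regex-based rules such as the `forbidden_patterns` entries in the example standards file can also be checked deterministically before the AI review runs, which may reduce reliance on the prompt alone. The sketch below assumes the rules have already been loaded into plain dictionaries; `find_forbidden` and `FORBIDDEN_PATTERNS` are hypothetical names used only for illustration.

```python
import re
from typing import Dict, List

# Two rules copied from the example standards file, already parsed into dicts
FORBIDDEN_PATTERNS = [
    {"pattern": r"print\(", "reason": "Use logging instead of print statements"},
    {"pattern": r"time\.sleep\(", "reason": "Avoid blocking sleep in async code"},
]


def find_forbidden(source: str, rules: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return one finding per (line, rule) match, using 1-based line numbers."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for rule in rules:
            if re.search(rule["pattern"], line):
                findings.append(
                    {"line": line_no, "pattern": rule["pattern"], "reason": rule["reason"]}
                )
    return findings
```

Findings produced this way could be attached to the review as high-confidence 'style' issues, leaving the AI prompt to handle rules that need semantic judgment.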

---

## 5. Industry-Specific Compliance Requirements

### Compliance Framework Plugin System
```python
class ComplianceFramework(ABC):
    """Base class for compliance frameworks"""
    
    @abstractmethod
    def get_security_requirements(self) -> List[SecurityRequirement]:
        pass
    
    @abstractmethod
    def validate_deployment(self, deployment_plan: DeploymentPlan) -> ComplianceReport:
        pass
    
    @abstractmethod
    def get_audit_requirements(self) -> AuditRequirements:
        pass

class SOC2ComplianceFramework(ComplianceFramework):
    def get_security_requirements(self) -> List[SecurityRequirement]:
        return [
            SecurityRequirement(
                id="SOC2-CC6.1",
                category="Logical and Physical Access Controls",
                description="Implement MFA for production access",
                validation_method=self._check_mfa_enabled
            ),
            SecurityRequirement(
                id="SOC2-CC7.2",
                category="System Monitoring",
                description="Log all production deployments with approver identity",
                validation_method=self._check_deployment_logging
            ),
            # ... more requirements
        ]
    
    def validate_deployment(self, deployment_plan: DeploymentPlan) -> ComplianceReport:
        violations = []
        
        # Check: Production deployments require approval
        if deployment_plan.environment == "production":
            if not deployment_plan.approvers or len(deployment_plan.approvers) < 2:
                violations.append(Violation(
                    requirement="SOC2-CC8.1",  # CC8.1 covers change management approvals
                    message="Production deployment requires 2+ approvals",
                    severity="critical"
                ))
        
        # Check: All changes are auditable
        if not deployment_plan.audit_trail:
            violations.append(Violation(
                requirement="SOC2-CC7.2",
                message="Deployment must include audit trail",
                severity="high"
            ))
        
        return ComplianceReport(
            framework="SOC2",
            compliant=len(violations) == 0,
            violations=violations
        )

class HIPAAComplianceFramework(ComplianceFramework):
    def get_security_requirements(self) -> List[SecurityRequirement]:
        return [
            SecurityRequirement(
                id="HIPAA-164.312(a)(1)",
                category="Access Control",
                description="Unique user identification required",
                validation_method=self._check_unique_user_ids
            ),
            SecurityRequirement(
                id="HIPAA-164.312(e)(1)",
                category="Transmission Security",
                description="Encrypt PHI in transit (TLS 1.2+)",
                validation_method=self._check_encryption_in_transit
            ),
            SecurityRequirement(
                id="HIPAA-164.308(a)(1)(ii)(D)",
                category="Information System Activity Review",
                description="Review audit logs for unauthorized access",
                validation_method=self._check_audit_log_review
            ),
            # ... more HIPAA requirements
        ]
    
    def validate_deployment(self, deployment_plan: DeploymentPlan) -> ComplianceReport:
        violations = []
        
        # Check: PHI data must be encrypted at rest
        if deployment_plan.handles_phi:
            if not deployment_plan.encryption_at_rest_enabled:
                violations.append(Violation(
                    requirement="HIPAA-164.312(a)(2)(iv)",
                    message="PHI must be encrypted at rest (AES-256)",
                    severity="critical"
                ))
        
        # Check: Access to PHI must be logged
        if deployment_plan.handles_phi:
            if not deployment_plan.audit_logging_enabled:
                violations.append(Violation(
                    requirement="HIPAA-164.312(b)",
                    message="All PHI access must be logged",
                    severity="critical"
                ))
        
        return ComplianceReport(framework="HIPAA", compliant=len(violations) == 0, violations=violations)

class PCI_DSSComplianceFramework(ComplianceFramework):
    """Payment Card Industry Data Security Standard"""
    
    def get_security_requirements(self) -> List[SecurityRequirement]:
        return [
            SecurityRequirement(
                id="PCI-DSS-6.2",
                category="Develop and maintain secure systems",
                description="All critical security patches applied within 30 days",
                validation_method=self._check_patch_compliance
            ),
            SecurityRequirement(
                id="PCI-DSS-6.5.1",
                category="Injection Flaws",
                description="Protect against SQL injection",
                validation_method=self._check_sql_injection_prevention
            ),
            # ... more PCI-DSS requirements
        ]

class ComplianceOrchestrator:
    """Applies all required compliance frameworks"""
    
    def __init__(self, frameworks: List[str]):
        self.frameworks = [self._load_framework(f) for f in frameworks]
    
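    def _load_framework(self, framework_name: str) -> ComplianceFramework:
        # Map names to classes so only the requested frameworks are instantiated
        framework_map = {
            "SOC2": SOC2ComplianceFramework,
            "HIPAA": HIPAAComplianceFramework,
            "PCI-DSS": PCI_DSSComplianceFramework,
            "GDPR": GDPRComplianceFramework,
        }
        framework_class = framework_map.get(framework_name)
        if framework_class is None:
            raise ValueError(f"Unknown compliance framework: {framework_name}")
        return framework_class()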
    
    def validate_deployment(self, deployment_plan: DeploymentPlan) -> List[ComplianceReport]:
        """Validate against all applicable frameworks"""
        reports = []
        for framework in self.frameworks:
            report = framework.validate_deployment(deployment_plan)
            reports.append(report)
        
        return reports
    
    def get_all_security_requirements(self) -> List[SecurityRequirement]:
        """Aggregate requirements from all frameworks"""
        all_requirements = []
        for framework in self.frameworks:
            all_requirements.extend(framework.get_security_requirements())
        
        # Deduplicate by requirement ID
        return list({req.id: req for req in all_requirements}.values())

# Usage
compliance = ComplianceOrchestrator(
    frameworks=project_config.compliance_requirements  # ["SOC2", "HIPAA"]
)

compliance_reports = compliance.validate_deployment(deployment_plan)

for report in compliance_reports:
    if not report.compliant:
        print(f"{report.framework} violations detected:")
        for violation in report.violations:
            print(f"  - {violation.requirement}: {violation.message}")
        
        if any(v.severity == "critical" for v in report.violations):
            raise DeploymentBlockedError("Critical compliance violations - deployment blocked")
```
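
The usage loop above raises as soon as it meets the first report with a critical violation, so later reports are never shown. One possible alternative is to collect every blocking violation first and decide once. The sketch below uses simplified stand-ins for `Violation` and `ComplianceReport` (their fields are assumptions), and `blocking_violations` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Violation:
    requirement: str
    message: str
    severity: str


@dataclass
class ComplianceReport:
    framework: str
    violations: List[Violation] = field(default_factory=list)

    @property
    def compliant(self) -> bool:
        return not self.violations


def blocking_violations(
    reports: Sequence[ComplianceReport],
    blocking_severities: Tuple[str, ...] = ("critical",),
) -> List[Tuple[str, Violation]]:
    """Collect every blocking violation across all reports before deciding."""
    return [
        (report.framework, violation)
        for report in reports
        for violation in report.violations
        if violation.severity in blocking_severities
    ]
```

A caller could log the full list and then raise a single `DeploymentBlockedError` if the list is non-empty, so developers see all blocking findings in one pass.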

---

### Question 3.2: Continuous Improvement Through Feedback Loops

## 1. False Positive/Negative Rate Tracking

```python
class ReviewFeedbackTracker:
    """Tracks accuracy of AI code reviews"""
    
    def __init__(self, db: Database):
        self.db = db
    
    def record_human_feedback(self, review_id: str, feedback: HumanFeedback):
        """Developers/reviewers mark AI findings as correct/incorrect"""
        
        for issue in feedback.ai_issues:
            self.db.execute("""
                INSERT INTO review_feedback (
                    review_id, issue_id, issue_category, issue_pattern,
                    ai_severity, human_verdict, human_severity,
                    feedback_comment, timestamp
                )
                VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
            """, (
                review_id,
                issue.id,
                issue.category,  # stored so analyze_accuracy can GROUP BY issue_category
                issue.pattern,   # stored so analyze_accuracy can GROUP BY issue_pattern
                issue.ai_severity,
                feedback.verdict,  # "true_positive", "false_positive", "false_negative"
                feedback.human_severity,
                feedback.comment,
                datetime.now()
            ))
    
    def analyze_accuracy(self, time_window: timedelta = timedelta(days=30)) -> AccuracyReport:
        """Calculate precision, recall, F1 score"""
        
        results = self.db.query("""
            SELECT
                COUNT(*) FILTER (WHERE human_verdict = 'true_positive') as tp,
                COUNT(*) FILTER (WHERE human_verdict = 'false_positive') as fp,
                COUNT(*) FILTER (WHERE human_verdict = 'false_negative') as fn
            FROM review_feedback
            WHERE timestamp > ?
        """, (datetime.now() - time_window,))
        
        tp, fp, fn = results[0]
        
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        
        # Analyze by category
        category_breakdown = self.db.query("""
            SELECT
                issue_category,
                COUNT(*) FILTER (WHERE human_verdict = 'true_positive') * 1.0 /
                COUNT(*) as precision
            FROM review_feedback
            WHERE timestamp > ?
            GROUP BY issue_category
        """, (datetime.now() - time_window,))
        
        # Identify problematic patterns
        high_false_positive_patterns = self.db.query("""
            SELECT issue_pattern, COUNT(*) as fp_count
            FROM review_feedback
            WHERE human_verdict = 'false_positive'
              AND timestamp > ?
            GROUP BY issue_pattern
            HAVING COUNT(*) > 5
            ORDER BY fp_count DESC
        """, (datetime.now() - time_window,))
        
        return AccuracyReport(
            precision=precision,
            recall=recall,
            f1_score=f1_score,
            category_breakdown=dict(category_breakdown),
            high_fp_patterns=high_false_positive_patterns,
            recommendation=self._generate_recommendation(precision, recall)
        )
    
    def _generate_recommendation(self, precision: float, recall: float) -> str:
        if precision < 0.85:
            return "High false positive rate - review prompt to be more conservative"
        elif recall < 0.85:
            return "High false negative rate - review prompt to be more thorough"
        else:
            return "Performance within acceptable thresholds"

class AdaptivePromptTuner:
    """Automatically adjusts prompts based on feedback"""
    
    def tune_prompt_based_on_feedback(self, accuracy_report: AccuracyReport, current_prompt: str) -> str:
        """Modify prompt to address accuracy issues"""
        
        modifications = []
        
        # If security findings score poorly. Note: category_breakdown holds per-category
        # precision; a true recall check would need false negatives grouped by category.
        if accuracy_report.category_breakdown.get("security", 1.0) < 0.85:
            modifications.append("""
            ENHANCED SECURITY FOCUS:
            - Double-check for OWASP Top 10 vulnerabilities
            - Review all user input handling
            - Verify authentication/authorization on all endpoints
            - Check for hardcoded secrets or credentials
            """)
        
        # If "style" appears among the high false-positive patterns
        # (rows come from the issue_pattern/fp_count query in analyze_accuracy)
        if "style" in [p.issue_pattern for p in accuracy_report.high_fp_patterns]:
            modifications.append("""
            REDUCE STYLE FALSE POSITIVES:
            - Only flag style issues that violate documented team standards
            - Confidence threshold for style issues: minimum 0.90
            - Ignore minor formatting issues if code is consistent
            """)
        
        # Inject modifications into prompt
        enhanced_prompt = current_prompt + "\n\n" + "\n".join(modifications)
        
        return enhanced_prompt
```
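
As a worked example of the precision, recall, and F1 arithmetic used in `analyze_accuracy`, the sketch below applies the same formulas to illustrative counts. The numbers are made up for the example and `precision_recall_f1` is a hypothetical standalone helper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts, returning 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Illustrative counts: 80 confirmed findings, 20 false alarms, 10 missed issues
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With these counts, precision is 80/100 and recall is 80/90, so under the 0.85 threshold in `_generate_recommendation` this review would be flagged for its false positive rate first.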

## 2. Deployment Success/Failure Pattern Learning

```python
class DeploymentOutcomeAnalyzer:
    """Correlates deployment decisions with outcomes"""
    
    def record_deployment_outcome(self, deployment_id: str):
        """Track deployment for 24 hours post-deploy"""
        
        deployment = self.db.get_deployment(deployment_id)
        
        # Collect metrics over 24 hours
        metrics_24h = self.monitoring.get_metrics(
            service=deployment.service,
            time_range=TimeRange(hours=24, start=deployment.timestamp)
        )
        
        # Determine outcome
        outcome = DeploymentOutcome(
            deployment_id=deployment_id,
            success=metrics_24h.error_rate < 0.05 and metrics_24h.rollback_count == 0,
            error_rate=metrics_24h.error_rate,
            latency_p99=metrics_24h.latency_p99,
            incidents_count=metrics_24h.incidents_count,
            rollback_occurred=metrics_24h.rollback_count > 0,
            ai_predicted_risk=deployment.predicted_risk_level
        )
        
        self.db.save_outcome(outcome)
        
        # Update prediction model
        self.update_risk_predictor(deployment, outcome)
    
    def analyze_failure_patterns(self) -> FailurePatternReport:
        """Find common characteristics of failed deployments"""
        
        failed_deployments = self.db.query("""
            SELECT d.*, o.error_rate, o.incidents_count
            FROM deployments d
            JOIN deployment_outcomes o ON d.id = o.deployment_id
            WHERE o.success = false
            AND d.timestamp > ?
        """, (datetime.now() - timedelta(days=90),))
        
        # Analyze patterns
        patterns = {
            "by_pr_size": self._group_by(failed_deployments, "lines_of_code_changed"),
            "by_file_types": self._group_by(failed_deployments, "file_types_modified"),
            "by_complexity": self._group_by(failed_deployments, "cyclomatic_complexity"),
            "by_test_coverage": self._group_by(failed_deployments, "test_coverage"),
            "by_review_score": self._group_by(failed_deployments, "ai_review_confidence"),
        }
        
        # Find correlations
        correlations = {
            "large_prs_fail_more": self._correlation(patterns["by_pr_size"], "failure_rate"),
            "low_coverage_fail_more": self._correlation(patterns["by_test_coverage"], "failure_rate"),
        }
        
        return FailurePatternReport(
            patterns=patterns,
            correlations=correlations,
            recommendations=self._generate_recommendations(patterns)
        )
    
    def _generate_recommendations(self, patterns) -> List[str]:
        recommendations = []
        
        if patterns["by_pr_size"]["large"]["failure_rate"] > 0.20:
            recommendations.append(
                "PRs >500 lines have 20%+ failure rate - recommend breaking into smaller changes"
            )
        
        if patterns["by_test_coverage"]["<80%"]["failure_rate"] > 0.15:
            recommendations.append(
                "Deployments with <80% test coverage have 15%+ failure rate - enforce higher threshold"
            )
        
        return recommendations
    
    def update_risk_predictor(self, deployment: Deployment, outcome: DeploymentOutcome):
        """Retrain ML model to predict deployment risk"""
        
        # Extract features
        features = {
            "lines_of_code_changed": deployment.lines_changed,
            "files_modified": deployment.files_count,
            "test_coverage": deployment.test_coverage,
            "review_issues_count": deployment.review_issues_count,
            "critical_issues_count": deployment.critical_issues,
            "author_experience_level": deployment.author_experience,
            "time_since_last_deploy": deployment.time_since_last_deploy,
            "deployment_day_of_week": deployment.timestamp.weekday(),
        }
        
        label = 1 if outcome.success else 0
        
        # Add to training dataset
        self.training_data.append((features, label))
        
        # Retrain every 100 samples
        if len(self.training_data) % 100 == 0:
            self.retrain_model()
```
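
The `_group_by` helper is not defined above. A minimal sketch of one way to bucket deployments by PR size and compute a failure rate per bucket follows. Note that a failure rate needs successful deployments in the denominator as well, so this sketch takes all deployments rather than only failed ones. The 500-line boundary comes from the recommendation text above; the 100-line boundary and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def size_bucket(lines_changed: int) -> str:
    """Classify a PR by size; the 100-line boundary is an illustrative assumption."""
    if lines_changed > 500:
        return "large"
    if lines_changed > 100:
        return "medium"
    return "small"


def failure_rate_by_size(
    deployments: Iterable[Tuple[int, bool]],
) -> Dict[str, Dict[str, float]]:
    """Take (lines_changed, success) pairs and return failure rate and count per bucket."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for lines_changed, success in deployments:
        bucket = size_bucket(lines_changed)
        totals[bucket] += 1
        if not success:
            failures[bucket] += 1
    return {
        bucket: {"failure_rate": failures[bucket] / totals[bucket], "count": totals[bucket]}
        for bucket in totals
    }
```

The result has the same `patterns["by_pr_size"]["large"]["failure_rate"]` shape that `_generate_recommendations` reads.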

## 3. Developer Feedback Integration

```python
from statistics import mean


class DeveloperFeedbackCollector:
    """Collects and acts on developer feedback"""
    
    def collect_feedback(self, review_id: str, developer_id: str):
        """Prompt developer for feedback after review"""
        
        feedback_form = {
            "review_quality": "1-5 stars",
            "helpfulness": "1-5 stars",
            "false_positives": "List of issue IDs",
            "missed_issues": "Describe issues AI missed",
            "suggested_improvements": "Free text",
        }
        
        # Send feedback request via Slack/email
        response = self.send_feedback_request(developer_id, review_id, feedback_form)
        
        return DeveloperFeedback(**response)
    
    def analyze_sentiment(self, time_window: timedelta = timedelta(days=30)) -> SentimentReport:
        """Analyze developer satisfaction trends"""
        
        feedback = self.db.query("""
            SELECT review_quality, helpfulness, suggested_improvements
            FROM developer_feedback
            WHERE timestamp > ?
        """, (datetime.now() - time_window,))
        
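        # mean() raises StatisticsError on empty input, so guard the no-feedback case
        avg_quality = mean(f.review_quality for f in feedback) if feedback else 0.0
        avg_helpfulness = mean(f.helpfulness for f in feedback) if feedback else 0.0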
        
        # Sentiment analysis on free-text feedback
        suggestions = [f.suggested_improvements for f in feedback if f.suggested_improvements]
        common_themes = self._extract_themes(suggestions)
        
        return SentimentReport(
            avg_quality_score=avg_quality,
            avg_helpfulness_score=avg_helpfulness,
            response_rate=len(feedback) / self.total_reviews_sent,
            common_improvement_themes=common_themes,
            trend="improving" if avg_quality > self.previous_period_quality else "declining"
        )
    
    def _extract_themes(self, suggestions: List[str]) -> Dict[str, int]:
        """Use NLP to extract common themes from feedback"""
        from collections import Counter
        import re
        
        # Simple keyword extraction (could use more sophisticated NLP)
        keywords = []
        for suggestion in suggestions:
            words = re.findall(r'\b\w+\b', suggestion.lower())
            keywords.extend([w for w in words if len(w) > 4])  # Words longer than 4 chars
        
        return dict(Counter(keywords).most_common(10))
```

## 4. Production Incident Correlation

```python
class IncidentCorrelationEngine:
    """Correlates production incidents with code changes"""
    
    def analyze_incident(self, incident_id: str):
        """When incident occurs, find related deployments"""
        
        incident = self.incident_tracker.get_incident(incident_id)
        
        # Find deployments in time window before incident
        suspect_deployments = self.db.query("""
            SELECT * FROM deployments
            WHERE service = ?
            AND timestamp BETWEEN ? AND ?
        """, (
            incident.service,
            incident.timestamp - timedelta(hours=48),
            incident.timestamp
        ))
        
        for deployment in suspect_deployments:
            # Check if AI review missed anything
            ai_review = self.get_ai_review(deployment.pr_id)
            
            # Retrospective analysis
            retrospective = self.perform_retrospective_review(
                deployment.code_changes,
                incident.root_cause,
                ai_review
            )
            
            if retrospective.ai_should_have_caught:
                self.log_missed_issue(
                    review_id=ai_review.id,
                    issue=retrospective.missed_issue,
                    incident_id=incident_id,
                    severity="critical"  # It caused production incident
                )
                
                # Update AI prompt to catch this pattern in future
                self.prompt_tuner.add_pattern_to_watch(retrospective.missed_pattern)
    
    def perform_retrospective_review(self, code_changes, root_cause, original_ai_review):
        """Re-analyze code with benefit of hindsight"""
        
        retrospective_prompt = f"""
        A production incident occurred with the following root cause:
        {root_cause}
        
        Code changes that were deployed:
        {code_changes}
        
        Original AI review found these issues:
        {original_ai_review.issues}
        
        Question: Should the AI review have caught this issue? If yes, what pattern should it look for?
        """
        
        analysis = self.ai_analyze(retrospective_prompt)
        
        return RetrospectiveAnalysis(
            ai_should_have_caught=analysis.should_have_caught,
            missed_issue=analysis.issue_description,
            missed_pattern=analysis.detection_pattern,
            suggested_prompt_enhancement=analysis.prompt_enhancement
        )
```
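
The suspect-deployment lookup can also be expressed in memory, which may help when unit-testing the correlation logic without a database. This is a sketch under assumptions: deployments are plain dictionaries with `service` and `timestamp` keys, and `suspect_deployments` is a hypothetical helper. Ordering the most recent deployment first is a heuristic, not something the design above specifies.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def suspect_deployments(
    deployments: List[Dict],
    service: str,
    incident_time: datetime,
    lookback: timedelta = timedelta(hours=48),
) -> List[Dict]:
    """Return deployments of `service` inside the lookback window, most recent first."""
    window_start = incident_time - lookback
    hits = [
        d for d in deployments
        if d["service"] == service and window_start <= d["timestamp"] <= incident_time
    ]
    return sorted(hits, key=lambda d: d["timestamp"], reverse=True)
```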

---

## Part D: Implementation Strategy (20 points)

**Question 4.1:** Prioritize your implementation. What would you build first? Create a 6-month roadmap with:
- MVP definition (what's the minimum viable system?)
- Pilot program strategy
- Rollout phases
- Success metrics for each phase

**Question 4.2:** Risk mitigation. What could go wrong and how would you handle:
- AI making incorrect review decisions
- System downtime during critical deployments
- Integration failures with existing tools
- Resistance from development teams
- Compliance/audit requirements

**Question 4.3:** Tool selection. What existing tools/platforms would you integrate with or build upon:
- Code review platforms (GitHub, GitLab, Bitbucket)
- CI/CD systems (Jenkins, GitHub Actions, GitLab CI)
- Monitoring tools (Datadog, New Relic, Prometheus)
- Security scanning tools (SonarQube, Snyk, Veracode)
- Communication tools (Slack, Teams, Jira)

## Response Part D:

### Question 4.1: Implementation Prioritization & 6-Month Roadmap

---

## MVP Definition (Minimum Viable System)

**Core Value Proposition:** Reduce code review time from 2-3 days to <4 hours while maintaining quality

### MVP Scope (Month 1-2)

**What's IN the MVP:**
1. **AI-Powered Code Review Agent**
   - Static code analysis (linters, security scanners)
   - AI semantic review for one language (Python - most common)
   - Integration with GitHub PRs
   - Basic structured output (issues, severity, suggested fixes)
   - Confidence scoring

2. **Automated Testing Pipeline**
   - Run existing tests automatically on PR submission
   - Coverage reporting
   - Pass/fail gates

3. **Manual Deployment Orchestration**
   - Deploy to dev/staging with single command
   - Health check validation
   - Manual approval for production

4. **Basic Metrics Dashboard**
   - Review time savings
   - Issue detection rate
   - Deployment success rate

**What's NOT in MVP (deferred to later phases):**
- Multi-language support (add incrementally)
- Automatic rollback (manual rollback only)
- Advanced deployment strategies (canary, blue-green)
- Production deployment automation
- Compliance framework integration
- AI test generation
- Cross-project learning

**Success Criteria for MVP:**
- Review time reduced to <12 hours (not <4h yet, but significant improvement)
- 70%+ of developers rate review quality as "good" or better
- Zero critical security vulnerabilities deployed in pilot projects
- 90%+ deployment success rate to dev/staging

---

## Pilot Program Strategy

### Phase 1: Single Team Pilot (Month 2)

**Target:** 1 team, 1 project (8-10 developers)

**Selection Criteria:**
- Team has high PR volume (>20 PRs/week)
- Python codebase (MVP supports Python first)
- Team is tech-forward and willing to provide feedback
- Non-critical service (lower risk)

**Pilot Setup:**
```yaml
pilot_configuration:
  team: backend-platform-team
  project: user-service (Python/FastAPI)
  duration: 4 weeks
  
  success_metrics:
    - avg_review_time_hours: target <12h, baseline 48h
    - developer_satisfaction: target 4/5 stars
    - false_positive_rate: target <15%
    - issues_caught: target 80% of what human reviewers find
  
  feedback_loop:
    - weekly_retrospectives: true
    - daily_slack_feedback_channel: true
    - issue_tracker: 'Jira label "pilot-feedback"'
  
  safety_net:
    - human_review_override: always_available
    - rollback_plan: disable_ai_review_if_satisfaction <3/5
```

**Week-by-Week Pilot Plan:**
- **Week 1:** Enable AI review, but don't block PRs (shadow mode)
  - Collect data on accuracy without impacting workflow
  - Developers can ignore AI suggestions

- **Week 2:** AI reviews visible, but optional
  - Encourage developers to act on high-confidence issues
  - Track adoption rate

- **Week 3:** AI reviews required, but human can override
  - Block PRs with critical security issues unless overridden
  - Measure false positive feedback

- **Week 4:** Full integration
  - AI reviews are primary, human reviews for complex cases only
  - Collect comprehensive feedback for next phase
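
The week-by-week plan amounts to a small enforcement state machine. A minimal sketch of how the merge gate might be encoded is shown below. `ReviewMode`, `mode_for_week`, and `blocks_merge` are hypothetical names, and keeping the human override available in week 4 is an assumption based on the `human_review_override: always_available` safety net.

```python
from enum import Enum


class ReviewMode(Enum):
    SHADOW = "shadow"                    # Week 1: data collected, never blocking
    OPTIONAL = "optional"                # Week 2: visible but advisory
    REQUIRED_WITH_OVERRIDE = "required"  # Week 3: blocks critical issues unless overridden
    FULL = "full"                        # Week 4 onward: AI review is primary


def mode_for_week(week: int) -> ReviewMode:
    """Map a 1-based pilot week to its enforcement mode."""
    if week < 1:
        raise ValueError("pilot weeks start at 1")
    schedule = {
        1: ReviewMode.SHADOW,
        2: ReviewMode.OPTIONAL,
        3: ReviewMode.REQUIRED_WITH_OVERRIDE,
    }
    return schedule.get(week, ReviewMode.FULL)


def blocks_merge(mode: ReviewMode, has_critical_security_issue: bool, human_override: bool) -> bool:
    """Decide whether the AI review should block the PR from merging."""
    if mode in (ReviewMode.SHADOW, ReviewMode.OPTIONAL):
        return False
    return has_critical_security_issue and not human_override
```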

### Phase 2: Multi-Team Expansion (Month 3-4)

**Expand to 3-5 teams** based on pilot success

**Selection Criteria:**
- Different tech stacks to test language plugin system (add JavaScript, Java)
- Mix of service criticality (1 production-critical service to stress-test)
- Geographic distribution if applicable (test timezone handling)

**Rollout Strategy:**
```python
expansion_plan = {
    "month_3": {
        "teams": ["mobile-backend-team (Java)", "frontend-team (JavaScript)"],
        "enhancements": [
            "Add Java language plugin",
            "Add JavaScript/TypeScript language plugin",
            "Improve AI prompt based on pilot feedback",
            "Add team-specific coding standards configuration"
        ],
        "risk_mitigation": [
            "Keep pilot team as control group (monitor for regression)",
            "Gradual rollout (1 team per week)",
            "Dedicated support channel"
        ]
    },
    "month_4": {
        "teams": ["payment-service (Python, critical)", "analytics-service (Go)"],
        "enhancements": [
            "Add Go language plugin",
            "Enhanced security scanning for payment service",
            "Compliance framework integration (PCI-DSS for payments)",
            "Improved test strategy generation"
        ],
        "validation": [
            "Payment service requires 100% human review for first month",
            "AI suggestions marked as 'advisory' for critical services",
            "Measure: zero critical issues missed"
        ]
    }
}
```

---

## Rollout Phases (6-Month Roadmap)

### Month 1-2: MVP Development + Pilot

**Engineering Focus:**
- Core AI review agent
- GitHub integration
- Basic testing pipeline
- Metrics collection

**Success Metrics:**
- MVP deployed to pilot team
- First PR reviewed by AI successfully
- Positive early feedback (>3.5/5 stars)

---

### Month 3-4: Multi-Language + Multi-Team Expansion

**Engineering Focus:**
- Language plugins (Java, JavaScript, Go)
- Team-specific configuration system
- Enhanced security scanning
- Improved UI/UX based on feedback

**New Features:**
- Configurable coding standards per team
- Integration with Slack/Teams for notifications
- Review analytics dashboard
- False positive feedback mechanism

**Success Metrics:**
- 5+ teams using the system
- 4+ languages supported
- Review time <8 hours average
- Developer satisfaction >4.0/5 (80%+)

---

### Month 5: Production Deployment Automation

**Engineering Focus:**
- Automated deployment to production (with approvals)
- Canary deployment strategy
- Automatic rollback based on metrics
- Production monitoring integration

**New Features:**
- Deployment approval workflow
- Health check monitoring
- Automatic rollback triggers
- Deployment metrics (MTTR, MTBF, change failure rate)

**Pilot Approach:**
- Start with non-critical services
- Require 2+ human approvals for production deploy
- Monitor for 1 week before enabling auto-rollback

**Success Metrics:**
- 10+ services using automated deployment
- <5% deployment failure rate
- Zero production incidents caused by missed AI review issues
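
The automatic rollback trigger listed above can be sketched as a pure decision function. This is a minimal illustration, assuming error-rate samples are collected at a fixed interval after each deploy. The names `RollbackPolicy` and `should_rollback`, the 2x baseline multiplier, the 1% floor, and the three-consecutive-breach rule are all assumptions made for the example, not part of the plan above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RollbackPolicy:
    """Hypothetical policy knobs; values would be tuned per service."""
    baseline_multiplier: float = 2.0  # breach if error rate > baseline * multiplier
    min_error_rate: float = 0.01      # floor so tiny baselines do not trigger on noise
    consecutive_breaches: int = 3     # require N breaches in a row to avoid flapping


def should_rollback(
    samples: Sequence[float],
    baseline: float,
    policy: RollbackPolicy = RollbackPolicy(),
) -> bool:
    """Return True once enough consecutive samples exceed the threshold."""
    threshold = max(baseline * policy.baseline_multiplier, policy.min_error_rate)
    streak = 0
    for rate in samples:
        # Reset the streak whenever a sample is back under the threshold
        streak = streak + 1 if rate > threshold else 0
        if streak >= policy.consecutive_breaches:
            return True
    return False
```

Requiring several consecutive breaches trades a slightly slower rollback for fewer false alarms from a single noisy sample.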

---

### Month 6: Compliance + Advanced Features

**Engineering Focus:**
- Compliance framework integration (SOC2, HIPAA, PCI-DSS)
- AI-powered test generation (experimental)
- Cross-project learning
- Advanced analytics

**New Features:**
- Compliance validation before deployment
- Audit trail for regulatory requirements
- Incident correlation engine
- Predictive deployment risk scoring

**Rollout:**
- Compliance features enabled for regulated teams (finance, healthcare)
- Test generation in beta for select teams
- Advanced analytics available to all teams

**Success Metrics:**
- 100% compliance validation coverage for regulated services
- Review time <4 hours (target achieved)
- 90%+ issue detection rate
- 50+ microservices onboarded

---

## Success Metrics by Phase

| Phase | Timeline | Review Time | Teams | Services | Developer Satisfaction | Deployment Success |
|-------|----------|-------------|-------|----------|----------------------|-------------------|
| MVP Pilot | Month 1-2 | <12h | 1 | 1 | >3.5/5 | 90% |
| Expansion | Month 3-4 | <8h | 5 | 10 | >4.0/5 | 92% |
| Prod Auto | Month 5 | <6h | 10 | 25 | >4.2/5 | 95% |
| Full Scale | Month 6 | <4h | 20+ | 50+ | >4.5/5 | 95% |

---

### Question 4.2: Risk Mitigation Strategies

## Risk 1: AI Making Incorrect Review Decisions

### Scenario A: False Positives (AI flags good code as problematic)

**Impact:** Developer frustration, reduced trust, wasted time

**Mitigation Strategies:**
1. **Confidence Thresholds:**
   ```python
   # Only block PRs for high-confidence issues
   if issue.severity == "blocking" and issue.confidence < 0.85:
       issue.severity = "warning"  # Downgrade to non-blocking
   ```

2. **Human Override Mechanism:**
   - All AI-flagged issues can be dismissed with explanation
   - Dismissals are tracked and feed back into AI training
   ```python
   def dismiss_ai_issue(issue_id: str, reason: str, developer_id: str):
       db.record_false_positive(issue_id, reason, developer_id)
       # Auto-train: if same issue dismissed 3+ times, lower confidence
   ```

3. **Gradual Enforcement:**
   - Week 1-2: AI suggestions are advisory only
   - Week 3-4: Block on critical issues only
   - Month 2+: Full enforcement with override

4. **False Positive Feedback Loop:**
   - Dedicated "Not an issue" button on every AI comment
   - Weekly analysis of false positives
   - Adjust prompts to reduce common false positive patterns

**Monitoring:**
- Alert if false positive rate >20% in any week
- Automatic prompt rollback if satisfaction drops below 3/5
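
The weekly false positive alert could be computed along these lines. This is a sketch only: `ReviewedIssue` and both function names are hypothetical, and it assumes each AI-flagged issue is recorded with its date and whether a developer dismissed it via the "Not an issue" button.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Tuple

Week = Tuple[int, int]  # (ISO year, ISO week number)


@dataclass(frozen=True)
class ReviewedIssue:
    """One AI-flagged issue and whether it was dismissed as a false positive."""
    flagged_on: date
    dismissed_as_false_positive: bool


def weekly_false_positive_rates(issues: Iterable[ReviewedIssue]) -> Dict[Week, float]:
    """Map each ISO week to the share of flagged issues that were dismissed."""
    totals: Dict[Week, int] = defaultdict(int)
    dismissed: Dict[Week, int] = defaultdict(int)
    for issue in issues:
        iso = issue.flagged_on.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if issue.dismissed_as_false_positive:
            dismissed[week] += 1
    return {week: dismissed[week] / totals[week] for week in totals}


def weeks_over_threshold(issues: Iterable[ReviewedIssue], threshold: float = 0.20) -> List[Week]:
    """Return the weeks whose false positive rate exceeds the alert threshold."""
    rates = weekly_false_positive_rates(issues)
    return sorted(week for week, rate in rates.items() if rate > threshold)
```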

---

### Scenario B: False Negatives (AI misses real issues)

**Impact:** Security vulnerabilities, bugs in production

**Mitigation Strategies:**
1. **Layered Defense:**
   ```python
   review_pipeline = [
       StaticAnalysisLayer(),  # Catches obvious issues
       AISemanticReview(),     # Catches complex issues
       HumanSamplingReview(),  # Catches AI misses
       ProductionMonitoring()  # Catches escaped issues
   ]
   ```

2. **Mandatory Human Review for High-Risk Changes:**
   ```python
   if pr.touches_files(["auth.py", "payment.py"]) or pr.risk_level == "critical":
       require_human_review(pr, min_reviewers=2, required_role="senior_engineer")
   ```

3. **Retrospective Analysis:**
   - When production incident occurs, trace back to PR
   - Identify what AI should have caught
   - Add to training dataset as negative example

4. **Periodic Human Audits:**
   - Random sample 5% of AI-approved PRs for human review
   - Measure false negative rate
   - Target: <10% false negative rate

**Monitoring:**
- Track production incidents correlated with recent deployments
- If incident rate increases, pause auto-approval
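
The 5% audit sample can be drawn deterministically so the same PR is always in or out of the sample, which makes audits reproducible. A minimal sketch, assuming PRs have stable string identifiers; `selected_for_audit`, `audit_miss_rate`, and the `salt` parameter are hypothetical names. The miss rate here is a proxy (audited PRs with a missed issue divided by audited PRs), not a true per-issue false negative rate.

```python
import hashlib


def selected_for_audit(pr_id: str, sample_rate: float = 0.05, salt: str = "audit-v1") -> bool:
    """Deterministically select roughly `sample_rate` of PRs by hashing the PR id."""
    digest = hashlib.sha256(f"{salt}:{pr_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


def audit_miss_rate(audited_prs: int, prs_with_missed_issue: int) -> float:
    """Share of audited AI-approved PRs where a human found an issue the AI missed."""
    if audited_prs <= 0:
        raise ValueError("audited_prs must be positive")
    return prs_with_missed_issue / audited_prs
```

Changing the salt rotates the sample, which avoids auditing the same PR population every quarter.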

---

## Risk 2: System Downtime During Critical Deployments

### Scenario: AI review service crashes during urgent hotfix deployment

**Impact:** Deployment blocked, cannot ship critical fix

**Mitigation Strategies:**
1. **Bypass Mode:**
   ```python
   @emergency_override
   def bypass_ai_review(pr_id: str, approver: User, reason: str):
       # Requires VP+ approval; approver is a user object exposing .role (not a plain string)
       if approver.role in ["vp_engineering", "cto"]:
           db.log_override(pr_id, approver, reason, severity="emergency")
           return allow_deployment()
       else:
           raise PermissionDenied("Emergency override requires VP+ approval")
   ```

2. **Degraded Mode:**
   - If AI service down, fall back to static analysis only
   - If static analysis down, require 2+ human reviews
   ```python
   def get_review_strategy():
       if ai_service.is_healthy():
           return AIReviewStrategy()
       elif static_analysis.is_healthy():
           return StaticAnalysisOnlyStrategy()
       else:
           return RequireHumanReviewsStrategy(min_reviewers=2)
   ```

3. **High Availability Architecture:**
   ```yaml
   infrastructure:
     ai_review_service:
       replicas: 3
       auto_scaling: true
       health_checks:
         path: /health
         interval: 10s
       circuit_breaker: enabled
     
     database:
       primary: postgres-master
       read_replicas: 2
       failover: automatic
     
     deployment_service:
       replicas: 2
       stateless: true  # can scale horizontally
   ```

4. **Caching:**
   ```python
   @cache(ttl=3600)
   def get_ai_review(pr_id: str, code_hash: str):
       # Cache review results for 1 hour
       # If service down, return cached result if available
       pass
   ```

**Monitoring:**
- 99.9% uptime SLA for review service
- Page on-call engineer if service down >5 minutes
- Automated failover to degraded mode
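
The automated failover to degraded mode can be driven by a circuit breaker in front of the AI service. The sketch below is one possible shape, not the system's actual implementation: `CircuitBreaker`, its thresholds, and the injectable `clock` parameter are assumptions introduced for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stops calling a failing dependency and routes callers to a fallback."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once the timeout has elapsed, one trial call is allowed (half-open)
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open:
            return fallback()
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            return fallback()
        # A success closes the circuit and clears the failure count
        self._failures = 0
        self._opened_at = None
        return result
```

In this design the `primary` callable would wrap the AI review call and `fallback` would run static analysis only, matching the degraded mode described above.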

---

## Risk 3: Integration Failures with Existing Tools

### Scenario: GitHub webhook stops firing, PRs not reviewed

**Impact:** PRs stuck in pending state, workflow broken

**Mitigation Strategies:**
1. **Health Checks & Monitoring:**
   ```python
   class IntegrationHealthMonitor:
       def check_github_webhooks(self):
           # Simulate PR creation every hour
           test_pr = self.create_test_pr()
           received_webhook = self.wait_for_webhook(test_pr, timeout=60)
           if not received_webhook:
               self.alert("GitHub webhook integration broken")
               self.attempt_auto_heal()
   ```

2. **Fallback Polling:**
   ```python
   # If webhooks fail, fall back to polling
   if not webhook_integration_healthy:
       # Assumes an APScheduler-style scheduler; polls every 5 minutes
       scheduler.add_job(poll_github_for_new_prs, "interval", minutes=5)
   ```

3. **Idempotency:**
   - Reviewing same PR twice is safe
   - Use PR commit SHA as idempotency key
   ```python
   @idempotent(key=lambda pr: f"{pr.id}:{pr.head_commit_sha}")
   def review_pr(pr: PullRequest):
       # Safe to call multiple times
       pass
   ```

4. **Integration Tests:**
   ```python
   @pytest.mark.integration
   def test_end_to_end_pr_workflow():
       # Create PR in test GitHub repo
       pr = github_test_client.create_pull_request(...)
       
       # Wait for AI review to be posted
       assert wait_for_review_comment(pr, timeout=300)
       
       # Verify deployment triggered
       assert deployment_initiated(pr)
   ```

**Monitoring:**
- Alert if no PRs reviewed in last 2 hours (during business hours)
- Integration test suite runs every 30 minutes
- Dashboards for webhook delivery rate
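
The `@idempotent` decorator used above is not a standard library feature, so it needs an implementation. A minimal in-memory sketch follows; the `PullRequest` dataclass is a stand-in defined only for the example, and a real deployment would presumably keep results in a shared store (database or cache) rather than a process-local dict.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


def idempotent(key: Callable[..., str]):
    """Run the wrapped function once per key; repeat calls return the stored result."""

    def decorator(func):
        results: Dict[str, Any] = {}  # process-local store, for illustration only

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            cache_key = key(*args, **kwargs)
            if cache_key not in results:
                results[cache_key] = func(*args, **kwargs)
            return results[cache_key]

        return wrapper

    return decorator


@dataclass(frozen=True)
class PullRequest:
    """Stand-in for the PR model, defined here so the example runs on its own."""
    id: int
    head_commit_sha: str


review_calls: List[str] = []


@idempotent(key=lambda pr: f"{pr.id}:{pr.head_commit_sha}")
def review_pr(pr: PullRequest) -> str:
    review_calls.append(pr.head_commit_sha)  # records how often real work happens
    return f"reviewed {pr.id}@{pr.head_commit_sha}"
```

Keying on the head commit SHA means a duplicate webhook and a polling pass for the same commit collapse into one review, while a new push triggers a fresh one.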

---

## Risk 4: Resistance from Development Teams

### Scenario: Developers bypass or ignore AI reviews, adoption fails

**Impact:** System unused, ROI not achieved

**Mitigation Strategies:**
1. **Involve Developers Early:**
   - Form "AI Review Advisory Group" with developers from each team
   - Collect requirements, pain points, wishlist features
   - Monthly office hours for feedback

2. **Demonstrate Value Quickly:**
   - Start with easy wins (catch obvious bugs, security issues)
   - Show time savings metrics weekly
   - Highlight catches: "AI found SQL injection that was missed in manual review"

3. **Make it Easy to Use:**
   - Zero configuration required (auto-detect language, framework)
   - Integrates into existing workflow (GitHub comments)
   - One-click dismissal of false positives

4. **Gamification (Optional):**
   ```python
   # Show developer stats
   developer_stats = {
       "time_saved_hours": 12.5,
       "bugs_caught_before_production": 8,
       "security_issues_prevented": 2,
       "streak": "15 PRs reviewed in <4 hours"
   }
   ```

5. **Transparency:**
   - Publish AI accuracy metrics (precision, recall, false positive rate)
   - Show how feedback improves the system
   - Open to suggestions and feature requests

6. **Enforcement with Flexibility:**
   - Make AI review required, but allow dismissal with reason
   - Track dismissal patterns (if one team dismisses 80% of issues, investigate)

**Monitoring:**
- Developer satisfaction surveys (monthly)
- Adoption metrics (% of PRs reviewed by AI)
- Dismissal rate by team (high dismissal = poor fit or false positives)
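
Tracking dismissal patterns by team can be sketched as a small aggregation. The function names, the `(team, was_dismissed)` event shape, and the `min_issues` guard are assumptions for illustration; the 80% threshold comes from the enforcement rule above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

DismissalEvent = Tuple[str, bool]  # (team name, whether the AI issue was dismissed)


def dismissal_rates(events: Iterable[DismissalEvent]) -> Dict[str, float]:
    """Return the share of AI-flagged issues each team dismissed."""
    flagged: Counter = Counter()
    dismissed: Counter = Counter()
    for team, was_dismissed in events:
        flagged[team] += 1
        if was_dismissed:
            dismissed[team] += 1
    return {team: dismissed[team] / flagged[team] for team in flagged}


def teams_to_investigate(
    events: Iterable[DismissalEvent],
    threshold: float = 0.80,
    min_issues: int = 20,
) -> List[str]:
    """List teams at or above the dismissal threshold with enough data to judge."""
    events = list(events)
    counts = Counter(team for team, _ in events)
    rates = dismissal_rates(events)
    return sorted(
        team for team, rate in rates.items()
        if rate >= threshold and counts[team] >= min_issues
    )
```

The `min_issues` guard keeps a team with three dismissed comments from being flagged on too little evidence.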

---

## Risk 5: Compliance/Audit Requirements

### Scenario: Auditor asks "How do you ensure AI reviews meet SOC2 requirements?"

**Impact:** Failed audit, regulatory fines

**Mitigation Strategies:**
1. **Comprehensive Audit Trail:**
   ```python
   class AuditLog:
       def log_event(self, event_type: str, details: dict):
           db.insert({
               "timestamp": datetime.now(timezone.utc),  # timezone-aware UTC for unambiguous audit records
               "event_type": event_type,  # "pr_reviewed", "deployment_approved", etc.
               "actor": details.get("actor"),  # human or AI
               "pr_id": details.get("pr_id"),
               "decision": details.get("decision"),
               "confidence": details.get("confidence"),
               "overridden": details.get("overridden", False),
               "override_reason": details.get("override_reason"),
               "ip_address": details.get("ip_address"),
               "review_details": json.dumps(details.get("review"))
           })
   ```

2. **Immutable Audit Logs:**
   - Store audit logs in append-only database
   - Cryptographically sign each entry
   - Retention: 7 years (typical compliance requirement)

3. **Human-in-the-Loop for Critical Decisions:**
   ```python
   if deployment.environment == "production" and deployment.compliance_required:
       require_human_approval(
           approvers=2,
           roles=["senior_engineer", "security_engineer"],
           audit_reason="SOC 2 CC8.1 (change management) requires documented approval for production changes"
       )
   ```

4. **Explainability:**
   - Every AI decision includes explanation
   - Can trace decision back to specific prompt, model version, input data
   ```python
   ai_decision = {
       "decision": "block_deployment",
       "reason": "Critical security vulnerability detected (SQL injection at line 47)",
       "evidence": "query = f'SELECT * FROM users WHERE id={user_id}'",
       "model_version": "gpt-4-2024-01-15",
       "prompt_version": "security_review_v2.3",
       "confidence": 0.97,
       "human_reviewable": True
   }
   ```

5. **Regular Compliance Audits:**
   - Quarterly internal audit of AI decisions
   - Sample 100 PRs: verify all required checks performed
   - Validate audit logs are complete and tamper-proof
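
The "cryptographically sign each entry" idea can be illustrated with an HMAC hash chain, where each entry's signature covers the previous signature. This is a sketch under assumptions: the function names and entry layout are hypothetical, the secret would live in a key management service, and chain verification alone does not detect removal of the most recent entries (that needs an externally stored head signature).

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous signature" for the first entry


def _sign(secret: bytes, prev_signature: str, event: Dict[str, Any]) -> str:
    # Canonical JSON so the same event always produces the same bytes
    payload = prev_signature + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log: List[Dict[str, Any]], event: Dict[str, Any], secret: bytes) -> None:
    """Append an event whose signature also covers the previous entry's signature."""
    prev_signature = log[-1]["signature"] if log else GENESIS
    log.append({
        "event": dict(event),
        "prev": prev_signature,
        "signature": _sign(secret, prev_signature, event),
    })


def verify_log(log: List[Dict[str, Any]], secret: bytes) -> bool:
    """Recompute the chain; any edited, reordered, or removed middle entry fails."""
    prev_signature = GENESIS
    for entry in log:
        expected = _sign(secret, prev_signature, entry["event"])
        if entry["prev"] != prev_signature:
            return False
        if not hmac.compare_digest(entry["signature"], expected):
            return False
        prev_signature = entry["signature"]
    return True
```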

**Documentation for Auditors:**
- "AI Review System Control Documentation"
  - How AI decisions are made
  - What controls are in place (human oversight, override mechanisms)
  - How audit trails are maintained
  - Evidence of effectiveness (metrics, validation results)

---

### Question 4.3: Tool Selection & Integration Strategy

## Code Review Platforms

### GitHub (Primary)
**Integration Points:**
- **Webhooks:** PR events (opened, updated, closed)
- **GitHub API:** Post review comments, request changes, approve
- **GitHub Actions:** Trigger AI review workflow
- **GitHub Checks API:** Show review status in PR UI

**Implementation:**
```python
from github import Github

class GitHubIntegration:
    def __init__(self, token: str):
        self.client = Github(token)
    
    def post_review_comments(self, pr: PullRequest, review: AIReview):
        repo = self.client.get_repo(pr.repository)
        pull = repo.get_pull(pr.number)
        
        # Post line-specific comments
        for issue in review.issues:
            pull.create_review_comment(
                body=f"**{issue.title}** (AI Confidence: {issue.confidence})\n\n{issue.description}\n\nSuggested fix:\n```python\n{issue.suggested_fix}\n```",
                commit=repo.get_commit(pull.head.sha),  # PyGithub expects a Commit object, not a SHA string
                path=issue.file,
                line=issue.line
            )
        
        # Submit overall review
        if review.recommendation == "approve":
            pull.create_review(event="APPROVE", body=review.summary)
        elif review.recommendation == "request_changes":
            pull.create_review(event="REQUEST_CHANGES", body=review.summary)
```

**Why GitHub:**
- Most popular platform (90% of teams already use it)
- Excellent API and webhook support
- Mature ecosystem (Actions, Apps, Checks)

**Alternatives:** GitLab, Bitbucket (add via plugin architecture)

---

## CI/CD Systems

### GitHub Actions (Primary for GitHub users)
**Use Case:** Trigger AI review, run tests, build artifacts

**Example Workflow:**
```yaml
# .github/workflows/ai-review.yml
name: AI Code Review and Deploy
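on:
  pull_request:
    # `closed` is required; without it the merged check in deploy-staging can never be true
    types: [opened, synchronize, closed]

jobs:
  ai-review:
    # Skip the review when the PR is being closed or merged
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Run AI Code Review
        uses: company/ai-review-action@v1
        with:
          api_key: ${{ secrets.AI_REVIEW_API_KEY }}
          language: python
          framework: fastapi
      
      - name: Run Tests
        run: pytest --cov=. --cov-report=xml
      
      - name: Security Scan
        uses: snyk/actions/python@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          args: --severity-threshold=high
  
  deploy-staging:
    # Runs only when a PR is merged. The review gate is assumed to be enforced
    # before merge through branch protection (ai-review as a required check).
    if: github.event.action == 'closed' && github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.base.ref }}  # deploy the merged base branch

      - name: Deploy to Staging
        run: ./scripts/deploy.sh staging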
```

### Jenkins (For enterprises with existing Jenkins infrastructure)
**Integration:**
```groovy
// Jenkinsfile
pipeline {
    agent any
    
    stages {
        stage('AI Review') {
            steps {
                script {
                    def review = sh(
                        script: "curl -X POST https://ai-review-api/review -d @pr-data.json",
                        returnStdout: true
                    )
                    
                    if (review.contains('"blocking": true')) {
                        error("AI review found blocking issues")
                    }
                }
            }
        }
        
        stage('Test') {
            steps {
                sh 'pytest'
            }
        }
        
        stage('Deploy') {
            when {
                branch 'main'
            }
            steps {
                sh './deploy.sh production'
            }
        }
    }
}
```

**Why Multi-CI Support:**
- GitHub Actions: Modern, cloud-native teams
- Jenkins: Enterprises with on-prem infrastructure
- GitLab CI: Teams using GitLab
- CircleCI, Travis CI: Via generic webhook interface

---

## Monitoring Tools

### Datadog (Primary)
**Use Case:** Post-deployment monitoring, anomaly detection, auto-rollback triggers

**Integration:**
```python
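import os
from datetime import datetime, timedelta

from datadog import initialize, api

class DatadogMonitoring:
    def __init__(self):
        # Metric queries and monitor creation need an application key as well as the API key
        initialize(api_key=os.getenv('DD_API_KEY'), app_key=os.getenv('DD_APP_KEY'))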
    
    def get_deployment_health(self, service: str, deployment_time: datetime) -> HealthMetrics:
        # Query metrics from Datadog
        query = f"avg:http.server.error_rate{{service:{service}}}"
        
        metrics = api.Metric.query(
            start=int(deployment_time.timestamp()),
            end=int((deployment_time + timedelta(minutes=30)).timestamp()),
            query=query
        )
        
        error_rate = metrics['series'][0]['pointlist'][-1][1]
        
        # Compare to baseline
        baseline = self.get_baseline_error_rate(service)
        
        return HealthMetrics(
            error_rate=error_rate,
            baseline_error_rate=baseline,
            anomaly_detected=error_rate > baseline * 2,
            recommendation="rollback" if error_rate > baseline * 2 else "continue"
        )
    
    def set_rollback_monitor(self, service: str, deployment_id: str):
        """Create Datadog monitor that triggers rollback"""
        api.Monitor.create(
            type="metric alert",
            query=f"avg(last_5m):avg:http.server.error_rate{{service:{service}}} > 0.05",
            name=f"Auto-Rollback Monitor - {service} - {deployment_id}",
            message=f"Error rate exceeded threshold. Triggering rollback. @webhook-rollback-{deployment_id}",
            tags=[f"deployment:{deployment_id}", "auto-rollback:enabled"]
        )
```

### Prometheus + Grafana (Open-source alternative)
**Use Case:** Self-hosted monitoring for cost-conscious teams

**Integration:**
```python
from prometheus_client import Counter, Histogram, Gauge
import requests

# Metrics
deployment_counter = Counter('deployments_total', 'Total deployments', ['service', 'environment', 'status'])
review_time = Histogram('review_duration_seconds', 'Time spent in code review', ['language'])
error_rate_gauge = Gauge('error_rate', 'Current error rate', ['service'])

class PrometheusMonitoring:
    def query(self, promql: str) -> dict:
        response = requests.get(
            f"{PROMETHEUS_URL}/api/v1/query",
            params={'query': promql}
        )
        return response.json()
    
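    def get_error_rate(self, service: str) -> float:
        # Ratio of 5xx responses to all responses over the last 5 minutes
        promql = (
            f'sum(rate(http_requests_total{{service="{service}", status=~"5.."}}[5m]))'
            f' / sum(rate(http_requests_total{{service="{service}"}}[5m]))'
        )
        result = self.query(promql)['data']['result']
        # An empty result means no matching series (for example, no 5xx responses yet)
        return float(result[0]['value'][1]) if result else 0.0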
```

**Why Both:**
- Datadog: Best-in-class, low setup time, great for startups/cloud-native
- Prometheus: Open-source, cost-effective, great for Kubernetes environments

---

## Security Scanning Tools

### Snyk (Dependency Scanning)
**Use Case:** Detect vulnerabilities in dependencies (npm, pip, maven, etc.)

**Integration:**
```python
import requests

class SnykIntegration:
    def scan_dependencies(self, manifest_file: str, language: str) -> SecurityReport:
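        # Read the manifest with a context manager so the file handle is closed
        with open(manifest_file) as f:
            manifest_contents = f.read()
        
        response = requests.post(
            "https://snyk.io/api/v1/test",
            headers={"Authorization": f"token {SNYK_TOKEN}"},
            json={
                "encoding": "plain",
                "files": {
                    manifest_file: manifest_contents
                }
            },
            timeout=30
        )
        # Fail loudly on auth or server errors instead of parsing an error body
        response.raise_for_status()
        
        result = response.json()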
        
        vulnerabilities = [
            Vulnerability(
                package=vuln['package'],
                severity=vuln['severity'],
                cve=vuln.get('identifiers', {}).get('CVE', []),
                fix_available=vuln.get('isUpgradable', False),
                recommended_version=vuln.get('upgradePath', [])[-1] if vuln.get('upgradePath') else None
            )
            for vuln in result.get('vulnerabilities', [])
        ]
        
        return SecurityReport(
            vulnerabilities=vulnerabilities,
            critical_count=len([v for v in vulnerabilities if v.severity == 'critical']),
            block_deployment=any(v.severity == 'critical' for v in vulnerabilities)
        )
```

### SonarQube (Static Code Analysis)
**Use Case:** Code quality metrics, code smells, security hotspots

**Integration:**
```python
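import subprocess

import requests

class SonarQubeIntegration:
    def analyze_pr(self, pr: PullRequest) -> CodeQualityReport:
        # Trigger SonarQube scan (argument list avoids shell interpolation)
        subprocess.run(
            ["sonar-scanner", f"-Dsonar.pullrequest.key={pr.number}"],
            check=True
        )
        
        # Retrieve issues raised on this PR
        response = requests.get(
            f"{SONARQUBE_URL}/api/issues/search",
            params={
                'pullRequest': pr.number,
                'types': 'BUG,VULNERABILITY,CODE_SMELL',
                'severities': 'BLOCKER,CRITICAL,MAJOR'
            },
            timeout=30
        )
        response.raise_for_status()
        issues = response.json()['issues']
        
        # Quality gate status comes from a separate endpoint, not the issue search.
        # SONAR_PROJECT_KEY is an assumed configuration value.
        gate = requests.get(
            f"{SONARQUBE_URL}/api/qualitygates/project_status",
            params={'projectKey': SONAR_PROJECT_KEY, 'pullRequest': pr.number},
            timeout=30
        )
        gate.raise_for_status()
        
        return CodeQualityReport(
            bugs=len([i for i in issues if i['type'] == 'BUG']),
            vulnerabilities=len([i for i in issues if i['type'] == 'VULNERABILITY']),
            code_smells=len([i for i in issues if i['type'] == 'CODE_SMELL']),
            quality_gate_passed=gate.json()['projectStatus']['status'] == 'OK'
        )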
```

### Veracode (Enterprise SAST/DAST)
**Use Case:** Enterprise-grade security scanning for regulated industries

**When to Use:**
- Companies requiring SOC2, ISO 27001, PCI-DSS compliance
- Need for attestation reports
- Comprehensive vulnerability database

---

## Communication Tools

### Slack Integration
**Use Case:** Real-time notifications, feedback collection

**Implementation:**
```python
import os

from slack_sdk import WebClient

class SlackNotifier:
    def __init__(self):
        self.client = WebClient(token=os.getenv('SLACK_BOT_TOKEN'))
    
    def notify_review_complete(self, pr: PullRequest, review: AIReview):
        channel = self.get_team_channel(pr.team)
        
        self.client.chat_postMessage(
            channel=channel,
            text=f"PR #{pr.number} reviewed by AI",
            blocks=[
                {
                    "type": "section",
                    "text": {
                        "type": "mrkdwn",
                        "text": f"*PR #{pr.number}: {pr.title}*\nAuthor: <@{pr.author}>"
                    }
                },
                {
                    "type": "section",
                    "fields": [
                        {"type": "mrkdwn", "text": f"*Issues Found:*\n{review.issues_count}"},
                        {"type": "mrkdwn", "text": f"*Recommendation:*\n{review.recommendation}"}
                    ]
                },
                {
                    "type": "actions",
                    "elements": [
                        {
                            "type": "button",
                            "text": {"type": "plain_text", "text": "View PR"},
                            "url": pr.url
                        },
                        {
                            "type": "button",
                            "text": {"type": "plain_text", "text": "Report False Positive"},
                            "action_id": "report_false_positive"
                        }
                    ]
                }
            ]
        )
    
    def notify_deployment_status(self, deployment: Deployment):
        if deployment.status == "failed":
            # Urgent notification with @channel mention
            self.client.chat_postMessage(
                channel="#deployments",
                text=f"<!channel> Deployment FAILED: {deployment.service} to {deployment.environment}\n"
                     f"Error: {deployment.error}\n"
                     f"Rollback initiated automatically."
            )
```

### Jira Integration
**Use Case:** Create tickets for issues, track remediation

**Implementation:**
```python
from jira import JIRA

class JiraIntegration:
    def __init__(self):
        self.client = JIRA(server=JIRA_URL, basic_auth=(JIRA_USER, JIRA_TOKEN))
    
    def create_issue_for_security_finding(self, finding: SecurityIssue, pr: PullRequest):
        """Create Jira ticket for security vulnerabilities"""
        issue = self.client.create_issue(
            project='SEC',
            summary=f"Security Vulnerability: {finding.title} in {pr.repository}",
            description=f"""
                *Detected by AI Security Review*
                
                *Severity:* {finding.severity}
                *CWE:* {finding.cwe_id}
                *File:* {finding.file}:{finding.line}
                
                *Description:*
                {finding.description}
                
                *Evidence:*
                {{code}}{finding.evidence}{{code}}
                
                *Remediation:*
                {finding.suggested_fix}
                
                *PR Link:* {pr.url}
            """,
            issuetype={'name': 'Bug'},
            priority={'name': 'High' if finding.severity == 'critical' else 'Medium'},
            labels=['security', 'ai-detected', pr.repository]
        )
        
        return issue.key
```

---

## Integration Architecture Overview

```python
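import os
import time
from concurrent.futures import ThreadPoolExecutor

class OrchestrationEngine:
    """Central orchestrator that coordinates all integrations"""
    
    def __init__(self):
        self.github = GitHubIntegration(token=os.getenv("GITHUB_TOKEN"))
        self.ai_review_agent = AIReviewAgent()  # Assumed name for the review agent used below
        self.ci_cd = CICDFactory.create()  # Auto-detect GitHub Actions, Jenkins, etc.
        self.monitoring = MonitoringFactory.create()  # Datadog, Prometheus, etc.
        # Scanners are assumed to be wrapped in adapters exposing a common scan(pr) method
        self.security = [SnykIntegration(), SonarQubeIntegration()]
        self.notifications = [SlackNotifier(), JiraIntegration()]
    
    def handle_pr_event(self, pr: PullRequest):
        """Main workflow when PR is created/updated"""
        
        # 1. Trigger CI/CD pipeline
        self.ci_cd.trigger_workflow(pr, workflow='ai-review-and-test')
        
        # 2. Run AI review
        review = self.ai_review_agent.review(pr)
        
        # 3. Run security scans in parallel (one worker thread per scanner)
        with ThreadPoolExecutor(max_workers=len(self.security)) as pool:
            security_reports = list(pool.map(lambda scanner: scanner.scan(pr), self.security))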
        
        # 4. Post results to GitHub
        self.github.post_review_comments(pr, review)
        for report in security_reports:
            self.github.post_security_report(pr, report)
        
        # 5. Notify team
        self.notifications[0].notify_review_complete(pr, review)  # Slack
        
        # 6. Create Jira tickets for critical issues
        for issue in review.issues:
            if issue.severity == 'critical':
                self.notifications[1].create_issue_for_security_finding(issue, pr)
    
    def handle_deployment_event(self, deployment: Deployment):
        """Monitor deployment and trigger rollback if needed"""
        
        # 1. Deploy
        result = self.ci_cd.deploy(deployment)
        
        # 2. Set up monitoring
        self.monitoring.set_rollback_monitor(deployment.service, deployment.id)
        
        # 3. Check health after 15 minutes
        time.sleep(15 * 60)
        health = self.monitoring.get_deployment_health(deployment.service, deployment.timestamp)
        
        # 4. Auto-rollback if unhealthy
        if health.recommendation == "rollback":
            self.ci_cd.rollback(deployment)
            self.notifications[0].notify_deployment_status(deployment)
```

**Key Principle:** A plugin-based architecture allows new integrations to be added without modifying the core system.
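
One way to realize the plugin principle is a small registry that the core system queries by name. The sketch below is illustrative only: `SecurityScanner`, `register_scanner`, `create_scanners`, and the toy `HardcodedSecretScanner` are hypothetical names, not components defined elsewhere in this design.

```python
from typing import Callable, Dict, List, Protocol


class SecurityScanner(Protocol):
    """Interface every scanner plugin is assumed to implement."""
    name: str

    def scan(self, changed_files: List[str]) -> List[str]: ...


_SCANNERS: Dict[str, Callable[[], SecurityScanner]] = {}


def register_scanner(name: str):
    """Class decorator that makes a scanner available to the core by name."""

    def decorator(factory):
        if name in _SCANNERS:
            raise ValueError(f"scanner already registered: {name}")
        _SCANNERS[name] = factory
        return factory

    return decorator


def create_scanners(enabled: List[str]) -> List[SecurityScanner]:
    """Instantiate the scanners a team has enabled in its configuration."""
    unknown = [name for name in enabled if name not in _SCANNERS]
    if unknown:
        raise KeyError(f"unknown scanners: {unknown}")
    return [_SCANNERS[name]() for name in enabled]


@register_scanner("secrets")
class HardcodedSecretScanner:
    """Toy plugin: flags committed .env files."""
    name = "secrets"

    def scan(self, changed_files: List[str]) -> List[str]:
        return [
            f"{path}: possible committed secret file"
            for path in changed_files
            if path.endswith(".env")
        ]
```

Adding a new integration then means writing one decorated class; the orchestrator code that calls `create_scanners` stays unchanged.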

---