[Phase 1 Issue #4] MongoDB-Based Scan Service with Multi-Scanner Routing by remyluslosius · Pull Request #101 · Hanalyx/OpenWatch

remyluslosius · 2025-10-15T12:02:25Z

Summary

This PR implements Issue #100: MongoDB-Based Scan Service with Multi-Scanner Routing, completing the core scanning infrastructure for Phase 1 of the hybrid scanning architecture.

Phase 1 Progress: 4/7 tasks completed (57%)

What's New

1. MongoDB Scan Result Models (`backend/app/models/scan_models.py`)

ScanResult: Document model for complete scan execution records
ScanConfiguration: Scan parameters (target, framework, variables)
ScanTarget: Multi-target support (SSH hosts, Kubernetes, cloud accounts)
RuleResult: Rule-level check results with scanner metadata
ScanResultSummary: Aggregated compliance statistics

Key Features:

Scan status tracking: PENDING → RUNNING → COMPLETED/FAILED
Rule-level granularity with pass/fail/error states
Scanner version tracking for audit trails
Compliance percentage calculation
Breakdown by severity and scanner type

2. Abstract Scanner Interface (`backend/app/services/scanners/base_scanner.py`)

BaseScanner: Abstract base class for all scanner types
Standardized scan() method signature
Built-in summary calculation
Version detection
Custom exception hierarchy: ScannerNotAvailableError, ScannerExecutionError, UnsupportedTargetError

Design Pattern: Strategy pattern for pluggable scanner implementations

3. OSCAP Scanner (`backend/app/services/scanners/oscap_scanner.py`)

OVAL-based compliance scanning with OpenSCAP
Remote scanning via oscap-ssh
Local scanning with oscap
XCCDF benchmark generation from MongoDB rules
Tailoring file generation for variable overrides
XCCDF results XML parsing

Supported Targets: SSH hosts, local system

Key Features:

Automatic benchmark assembly from rules
Variable customization via tailoring files
Structured result parsing with severity mapping
Temporary file management for benchmark/tailoring/results

4. Kubernetes Scanner (`backend/app/services/scanners/kubernetes_scanner.py`)

YAML-based compliance checks for Kubernetes/OpenShift
JSONPath queries via kubectl get -o jsonpath
Multi-condition evaluation (equals, contains, exists, any_exist)
Kubeconfig support for multi-cluster scanning

Supported Targets: Kubernetes clusters, OpenShift

Check Types:

resource_type: "image.config.openshift.io"
resource_name: "cluster"
yamlpath: ".spec.allowedRegistriesForImport[:].insecure"
expected_value: false
condition: "equals"

5. Scanner Factory (`backend/app/services/scanners/init.py`)

Registry-based scanner instantiation
Runtime scanner registration
Available scanner listing
Type validation

Current Scanners: oscap, kubernetes
Future: aws_api, azure_api, gcp_api, python, bash

6. Scan Orchestrator (`backend/app/services/scan_orchestrator_service.py`)

Central coordinator for multi-scanner execution:

Workflow:

Query rules from MongoDB matching framework/version
Group rules by scanner_type
Execute scanners in parallel via asyncio.gather()
Aggregate results from all scanners
Calculate overall compliance summary
Store complete scan record in MongoDB

Key Methods:

execute_scan(): Complete scan lifecycle
_get_rules(): MongoDB rule querying with framework filters
_group_by_scanner(): Rule routing by scanner_type
_execute_scanner(): Single scanner execution
_calculate_overall_summary(): Multi-scanner result aggregation

7. Scan Execution API (`backend/app/api/v1/endpoints/scans_api.py`)

RESTful endpoints for scan management:

POST /api/v1/scans/execute - Execute compliance scan
GET /api/v1/scans/{scan_id} - Get scan result
GET /api/v1/scans/ - List scans (with filters)
DELETE /api/v1/scans/{scan_id} - Delete scan
GET /api/v1/scans/statistics/summary - Scan statistics

Features:

User authentication required
Role-based access control (admins see all scans)
Pagination support
Status filtering
Aggregated statistics

Scan Execution Flow

graph TD
    A[API Request] --> B[ScanOrchestrator.execute_scan]
    B --> C[Query MongoDB Rules]
    C --> D{Group by scanner_type}
    D --> E[OSCAP Rules]
    D --> F[Kubernetes Rules]
    D --> G[Future: Cloud Rules]
    
    E --> H[OSCAPScanner.scan]
    F --> I[KubernetesScanner.scan]
    G --> J[CloudScanner.scan]
    
    H --> K[Generate XCCDF Benchmark]
    K --> L[Generate Tailoring File]
    L --> M[Execute oscap-ssh]
    M --> N[Parse XCCDF Results]
    
    I --> O[Validate kubectl connection]
    O --> P[Query resources via JSONPath]
    P --> Q[Evaluate conditions]
    
    N --> R[Aggregate Results]
    Q --> R
    J --> R
    
    R --> S[Calculate Summary]
    S --> T[Store ScanResult in MongoDB]
    T --> U[Return to API]

Scanner Capabilities Matrix

Scanner Type	Target Types	Check Method	Variable Support	Remediation
`oscap`	SSH hosts, local	OVAL definitions	✅ XCCDF tailoring	✅ Ansible/Bash
`kubernetes`	K8s clusters	JSONPath queries	✅ Template vars	✅ Kubectl apply
`aws_api` (future)	AWS accounts	boto3 API calls	✅ Config params	✅ Terraform
`azure_api` (future)	Azure subscriptions	Azure SDK	✅ Config params	✅ ARM templates
`gcp_api` (future)	GCP projects	Google Cloud SDK	✅ Config params	✅ Terraform

API Usage Examples

Execute Scan (SSH Host)

curl -X POST http://localhost:8000/api/v1/scans/execute \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "target": {
        "type": "ssh_host",
        "identifier": "prod-web-01.example.com",
        "credentials": {
          "username": "root",
          "ssh_key": "-----BEGIN OPENSSH PRIVATE KEY-----\n..."
        }
      },
      "framework": "nist",
      "framework_version": "800-53r5",
      "variable_overrides": {
        "xccdf_com.hanalyx.openwatch_value_var_accounts_tmout": "300",
        "xccdf_com.hanalyx.openwatch_value_var_password_pam_minlen": "14"
      }
    },
    "scan_name": "Production Web Server - NIST 800-53r5"
  }'

Execute Scan (Kubernetes)

curl -X POST http://localhost:8000/api/v1/scans/execute \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "target": {
        "type": "kubernetes",
        "identifier": "production-cluster",
        "credentials": {
          "kubeconfig": "/home/user/.kube/config"
        }
      },
      "framework": "cis",
      "framework_version": "kubernetes-1.8"
    },
    "scan_name": "Production K8s - CIS Benchmark"
  }'

Get Scan Result

curl http://localhost:8000/api/v1/scans/{scan_id} \
  -H "Authorization: Bearer $TOKEN"

List Recent Scans

curl "http://localhost:8000/api/v1/scans/?skip=0&limit=10&status=completed" \
  -H "Authorization: Bearer $TOKEN"

Scan Statistics

curl "http://localhost:8000/api/v1/scans/statistics/summary?days=30&framework=nist" \
  -H "Authorization: Bearer $TOKEN"

Implementation Details

Parallel Scanner Execution

Scanners execute concurrently using asyncio:

scanner_tasks = [
    self._execute_scanner(scanner_type, rules, config)
    for scanner_type, rules in rules_by_scanner.items()
]
scanner_results = await asyncio.gather(*scanner_tasks, return_exceptions=True)

Error Handling

Scanner exceptions captured per-scanner (doesn't fail entire scan)
Errors recorded in ScanResult.errors[]
Failed rules get status=error with error message
Partial results still saved to MongoDB

Result Aggregation

def _calculate_overall_summary(self, all_results: List[RuleResult]) -> ScanResultSummary:
    summary = ScanResultSummary(total_rules=len(all_results))
    
    # Count by status
    for result in all_results:
        if result.status == "pass":
            summary.passed += 1
        elif result.status == "fail":
            summary.failed += 1
    
    # Calculate compliance %
    evaluated = summary.passed + summary.failed
    if evaluated > 0:
        summary.compliance_percentage = (summary.passed / evaluated) * 100
    
    # Breakdown by severity and scanner
    summary.by_severity = self._group_by_severity(all_results)
    summary.by_scanner = self._group_by_scanner_summary(all_results)
    
    return summary

Testing Requirements

Unit Tests Needed

Scanner Tests:
- Mock oscap-ssh execution and result parsing
- Mock kubectl execution and JSONPath queries
- Test condition evaluation (equals, contains, any_exist)
Orchestrator Tests:
- Rule grouping by scanner_type
- Parallel execution with mixed success/failure
- Summary calculation accuracy
- MongoDB storage verification
API Tests:
- Authentication/authorization
- Request validation
- Pagination
- Statistics aggregation

Integration Tests Needed

End-to-End Scan:
- Real OSCAP scan against test VM
- Real Kubernetes scan against test cluster
- Variable override verification
- Result accuracy validation
Multi-Scanner Scan:
- Mixed ruleset (OSCAP + K8s rules)
- Verify parallel execution
- Verify result aggregation
Error Scenarios:
- Invalid credentials
- Network timeouts
- Missing scanner binaries
- Malformed XCCDF

Dependencies

Python Packages (already in requirements.txt)

motor: Async MongoDB driver
beanie: MongoDB ODM
fastapi: REST API framework
asyncio: Async execution

External Tools Required

oscap (OpenSCAP command-line tool)
oscap-ssh (Remote scanning wrapper)
kubectl (Kubernetes CLI)

Breaking Changes

None - this is new functionality.

Migration Required

None - new collections are created automatically via Beanie.

Known Limitations

XCCDF Benchmark Generation: Currently generates simplified benchmark structure. Full implementation needs integration with XCCDFGeneratorService from PR [Phase 1] XCCDF Data-Stream Generator from MongoDB #99.
OSCAP Scanner: Placeholder benchmark generation. Production version should call XCCDFGeneratorService.generate_benchmark() (requires passing db instance to scanner).
Kubernetes Scanner: Missing import os at top of file (added at bottom as workaround - should be moved).
No Celery Integration Yet: Scans execute synchronously. Phase 2 will add async task queue for long-running scans.

Future Enhancements (Phase 2+)

Async Task Queue: Celery integration for background scanning
Real-time Progress: WebSocket updates during scan execution
Scan Scheduling: Cron-based periodic scans
Result Comparison: Diff between scan results
Trend Analysis: Historical compliance tracking
Cloud Scanners: AWS, Azure, GCP implementations
Custom Scanners: Python/Bash script execution

Phase 1 Status

✅ Completed (4/7 tasks):

✅ Enhanced ComplianceRule Model (PR [Phase 1] Add XCCDFVariable Model and XCCDF Variables Support #95 - merged)
✅ Enhanced SCAP Converter (PR [Phase 1] Enhanced SCAP Converter with Variable and Remediation Extraction #97 - pending)
✅ XCCDF Generator (PR [Phase 1] XCCDF Data-Stream Generator from MongoDB #99 - pending)
✅ Scan Service (PR [Phase 1 Issue #4] MongoDB-Based Scan Service with Multi-Scanner Routing #101 - this PR)

🚧 Remaining (3/7 tasks):
5. ⏳ ORSA Plugin Architecture (5-7 days)
6. ⏳ Scan Configuration API (3-4 days)
7. ⏳ Frontend Variable Customization UI (5-7 days)

Related Issues

Implements: MongoDB-Based Scan Service with Multi-Scanner Routing #100
Depends on: Enhanced SCAP Converter with Variable Extraction #96 (PR [Phase 1] Enhanced SCAP Converter with Variable and Remediation Extraction #97), XCCDF Data-Stream Generator from MongoDB #98 (PR [Phase 1] XCCDF Data-Stream Generator from MongoDB #99)
Blocks: chore(deps): bump node from 18-alpine to 24-alpine in /docker #5 (ORSA Plugin Architecture)

Checklist

Code follows project style guidelines
Self-review completed
Code commented where necessary
Documentation updated (this PR description)
Unit tests added
Integration tests added
No new warnings generated
Dependent PRs identified ([Phase 1] Enhanced SCAP Converter with Variable and Remediation Extraction #97, [Phase 1] XCCDF Data-Stream Generator from MongoDB #99)

Reviewers

@Hanalyx - Please review scan orchestration logic and scanner architecture

Additional Notes

Integration with PR #99: This PR uses simplified XCCDF generation. After PR #99 merges, update OSCAPScanner._generate_benchmark() to call XCCDFGeneratorService.generate_benchmark().

MongoDB Bundle: Current bundle lacks Phase 1 fields (scanner_type, xccdf_variables, remediation). After PR #97 merges, regenerate bundle with:

python -m backend.app.cli.scap_to_openwatch_converter_enhanced convert \
  --extract-variables --extract-remediation --create-bundle --bundle-version 2.0.0

Security Note: SSH credentials in ScanTarget.credentials should be encrypted at rest. Consider using Fernet encryption before MongoDB storage.

Phase 1, Task 1.1: Enhanced ComplianceRule Model with XCCDF Variables Implements Solution A (XCCDF Variables) for hybrid scanning architecture, enabling scan-time customization of compliance checks. New Model: XCCDFVariable - Supports string, number, and boolean variable types - Validation for constraints (min/max, choices, patterns) - Sensitive variable handling (encrypted storage, masked UI) - Interactive flag for UI/API customization ComplianceRule Enhancements: 1. xccdf_variables: Dict[str, XCCDFVariable] - Variables for scan-time customization 2. scanner_type: str - Route rules to appropriate scanner (oscap, python, aws_api, etc.) 3. remediation: Dict - ORSA (Open Remediation Standard Adapter) plugin content MongoDB Indexes: - Added 'scanner_type' index for routing performance - Added compound index: (scanner_type, is_latest) for latest rules by scanner Testing: - XCCDFVariable creation and validation ✅ - Constraint validation (numeric, string, regex) ✅ - Model serialization (exclude_none) ✅ - ComplianceRule backward compatibility ✅ References: - Issue #94 - /docs/REMEDIATION_WITH_XCCDF_VARIABLES.md - /docs/ADVANCED_SCANNING_ARCHITECTURE.md - /docs/IMPLEMENTATION_PLAN_7_PHASES.md 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

Implements enhanced SCAP converter with XCCDF variable and remediation content extraction for Solution A hybrid scanning architecture. ## New Components ### 1. SCAP YAML Parser Service (scap_yaml_parser_service.py) - **XCCDFVariableExtractor**: Extracts XCCDF variables from SCAP YAML rules - Variable type inference (string, number, boolean) - Constraint detection (min/max values, string lengths, regex patterns) - Sensitive variable marking (passwords, keys, credentials) - Template variable extraction - **RemediationExtractor**: Extracts remediation content - Ansible task generation from templates (sysctl, file, service, package) - Bash script generation for simple remediation - Supports separate remediation files (ansible/shared.yml, bash/shared.sh) - **ScannerTypeDetector**: Detects scanner type from rule metadata - kubernetes: yamlfile_value templates with ocp_data - aws_api, azure_api, gcp_api: Cloud-specific templates - oscap: Traditional OVAL-based checks (default) ### 2. Enhanced SCAP Converter Updates - Added --extract-variables flag for XCCDF variable extraction - Added --extract-remediation flag for remediation content extraction - Integrated scap_yaml_parser_service for metadata extraction - Populates xccdf_variables, remediation, scanner_type fields (from Issue #94) ## Testing Tested with real SCAP content: **Kubernetes Rule** (ocp_insecure_allowed_registries_for_import): - Scanner Type: kubernetes ✅ - Variables: check_existence, values ✅ **Mock Sysctl Rule** (net.ipv4.ip_forward): - Scanner Type: oscap ✅ - Variables: sysctlvar (string), sysctlval (boolean) ✅ - Ansible Remediation: ansible.posix.sysctl task ✅ - Bash Remediation: sysctl -w command + /etc/sysctl.conf ✅ ## Usage ```bash # Extract variables only python -m backend.app.cli.scap_to_openwatch_converter_enhanced convert \ --scap-path /path/to/scap/content \ --output-path /output/dir \ --format json \ --extract-variables # Extract variables + remediation python -m backend.app.cli.scap_to_openwatch_converter_enhanced convert \ --scap-path /path/to/scap/content \ --output-path /output/dir \ --format json \ --extract-variables \ --extract-remediation ``` ## Implementation Notes - Variable extraction handles template vars (most reliable source) - Type inference from naming conventions (timeout→number, banner→string) - Remediation extraction supports 12 template types (sysctl, file, service, etc.) - Scanner detection based on template name and file path - All Phase 1 fields are optional with defaults (backward compatible) ## Related Issues - Closes: #96 (Enhanced SCAP Converter with Variable Extraction) - Requires: #94 (XCCDFVariable Model) - merged - Blocks: #97 (XCCDF Data-Stream Generator) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Implements XCCDF 1.2 compliant benchmark and tailoring file generation from MongoDB compliance rules for Solution A hybrid scanning architecture. ## New Components ### 1. XCCDF Generator Service (xccdf_generator_service.py - 550 lines) - **generate_benchmark()**: Creates XCCDF 1.2 Benchmark XML from MongoDB rules - Generates XCCDF Value elements for variables with constraints - Creates Groups for rule categorization - Generates Profiles per framework (NIST, CIS, STIG, etc.) - Proper XCCDF 1.2 ID formatting: xccdf_<reverse-DNS>_<type>_<name> - **generate_tailoring()**: Creates XCCDF Tailoring files for variable overrides - Allows environment-specific customization (dev, staging, prod) - set-value elements for variable overrides - Extends base profiles without modifying benchmark - **XCCDF 1.2 Compliance**: - Benchmark IDs: xccdf_com.hanalyx.openwatch_benchmark_{name} - Rule IDs: xccdf_com.hanalyx.openwatch_rule_{name} - Value IDs: xccdf_com.hanalyx.openwatch_value_{name} - Group IDs: xccdf_com.hanalyx.openwatch_group_{category} - Profile IDs: xccdf_com.hanalyx.openwatch_profile_{framework}_{version} ### 2. XCCDF API Endpoints (xccdf_api.py - 220 lines) - **POST /api/v1/xccdf/generate-benchmark**: Generate XCCDF Benchmark - **POST /api/v1/xccdf/generate-tailoring**: Generate XCCDF Tailoring file - **GET /api/v1/xccdf/frameworks**: List available frameworks and versions - **GET /api/v1/xccdf/variables**: List customizable XCCDF variables ### 3. Pydantic Schemas (xccdf_schemas.py - 150 lines) - XCCDFBenchmarkRequest/Response - XCCDFTailoringRequest/Response - XCCDFValidationRequest/Response ## XCCDF Structure Generated ### Benchmark XML \`\`\`xml <xccdf:Benchmark id="xccdf_com.hanalyx.openwatch_benchmark_nist_800_53r5"> <xccdf:status>draft</xccdf:status> <xccdf:title>NIST 800-53r5 Benchmark</xccdf:title> <xccdf:version>1.0.0</xccdf:version>  <xccdf:Value id="xccdf_com.hanalyx.openwatch_value_var_accounts_tmout" type="number"> <xccdf:title>Session Timeout</xccdf:title> <xccdf:value>600</xccdf:value> <xccdf:lower-bound>60</xccdf:lower-bound> <xccdf:upper-bound>3600</xccdf:upper-bound> </xccdf:Value>  <xccdf:Group id="xccdf_com.hanalyx.openwatch_group_authentication"> <xccdf:Rule id="xccdf_com.hanalyx.openwatch_rule_accounts_tmout"> <xccdf:title>Set Interactive Session Timeout</xccdf:title> <xccdf:check system="http://oval.mitre.org/XMLSchema/oval-definitions-5"> <xccdf:check-content-ref href="oscap-definitions.xml"/> </xccdf:check> </xccdf:Rule> </xccdf:Group>  <xccdf:Profile id="xccdf_com.hanalyx.openwatch_profile_nist_800_53r5"> <xccdf:title>NIST 800-53r5</xccdf:title> <xccdf:select idref="xccdf_com.hanalyx.openwatch_rule_accounts_tmout" selected="true"/> </xccdf:Profile> </xccdf:Benchmark> \`\`\` ### Tailoring XML \`\`\`xml <xccdf:Tailoring id="production_tailoring"> <xccdf:version>1.0</xccdf:version> <xccdf:benchmark href="benchmark.xml"/> <xccdf:Profile id="nist_800_53r5_production" extends="nist_800_53r5"> <xccdf:title>Production Environment</xccdf:title> <xccdf:set-value idref="xccdf_com.hanalyx.openwatch_value_var_accounts_tmout">300</xccdf:set-value> </xccdf:Profile> </xccdf:Tailoring> \`\`\` ## Usage ### Generate Benchmark \`\`\`python POST /api/v1/xccdf/generate-benchmark { "benchmark_id": "nist-800-53r5", "title": "NIST SP 800-53 Revision 5", "description": "Security controls for federal information systems", "version": "1.0.0", "framework": "nist", "framework_version": "800-53r5" } \`\`\` ### Generate Tailoring \`\`\`python POST /api/v1/xccdf/generate-tailoring { "tailoring_id": "prod_tailoring", "benchmark_href": "nist-800-53r5.xml", "profile_id": "xccdf_com.hanalyx.openwatch_profile_nist_800_53r5", "variable_overrides": { "xccdf_com.hanalyx.openwatch_value_var_accounts_tmout": "300" } } \`\`\` ### Scan with Generated Files \`\`\`bash # Generate benchmark via API, save to file curl -X POST /api/v1/xccdf/generate-benchmark ... > benchmark.xml # Generate tailoring via API curl -X POST /api/v1/xccdf/generate-tailoring ... > tailoring.xml # Scan with oscap oscap xccdf eval \\ --profile xccdf_com.hanalyx.openwatch_profile_nist_800_53r5 \\ --tailoring-file tailoring.xml \\ --results results.xml \\ benchmark.xml \`\`\` ## Implementation Notes ### XCCDF 1.2 ID Requirements - All IDs must follow pattern: `xccdf_<reverse-DNS>_<type>_<name>` - Reverse DNS: `com.hanalyx.openwatch` - Types: benchmark, profile, group, rule, value - Names: Derived from MongoDB rule_id, category, framework, etc. ### Variable Constraints - Number types: lower-bound, upper-bound elements - String types: choice elements for enums, match for regex patterns - Boolean types: No additional constraints needed ### Profile Generation - One profile per framework version found in rules - Profiles automatically select matching rules via xccdf:select - Tailoring extends profiles without modifying benchmark ## Testing Manual XCCDF validation shows ID format requirements working correctly: - ✅ Benchmark IDs properly formatted - ✅ Value IDs with xccdf_ prefix - ✅ Rule IDs with xccdf_ prefix - ✅ Group IDs with xccdf_ prefix - ✅ Profile IDs with xccdf_ prefix Integration testing with real MongoDB data pending PR #97 merge. ## Related Issues - Closes: #98 - Requires: #96 (SCAP converter with variables) - PR #97 pending - Blocks: #99 (MongoDB-Based Scan Service) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…Routing Implements complete scan orchestration system that routes compliance checks to appropriate scanners (OSCAP, Kubernetes, cloud APIs) and stores results in MongoDB. ## New Components ### 1. MongoDB Models (scan_models.py - 280 lines) - **ScanResult**: Complete scan execution record with rule-level results - **ScanConfiguration**: Scan settings (target, framework, variables) - **RuleResult**: Individual rule check result with status and message - **ScanResultSummary**: Aggregated statistics (pass/fail/error counts) - **ScanTarget**: Target system definition (SSH host, K8s cluster, cloud account) - **ScanSchedule**: Future enhancement for recurring scans ### 2. Scanner Interface (base_scanner.py - 180 lines) - **BaseScanner**: Abstract base class for all scanners - **scan()**: Execute compliance checks against target - **_calculate_summary()**: Aggregate rule results into summary - **_group_by_severity()**: Breakdown by high/medium/low - **_group_by_scanner()**: Breakdown by scanner type - Custom exceptions: ScannerNotAvailableError, ScannerExecutionError ### 3. OSCAP Scanner (oscap_scanner.py - 380 lines) - **OSCAPScanner**: Traditional OVAL-based compliance scanning - Generates XCCDF benchmark from MongoDB rules - Creates tailoring files for variable overrides - Executes oscap (local) or oscap-ssh (remote) - Parses XCCDF results XML into RuleResult objects - Supports SSH-based remote scanning ### 4. Kubernetes Scanner (kubernetes_scanner.py - 280 lines) - **KubernetesScanner**: YAML-based checks for K8s/OpenShift - Queries Kubernetes API using kubectl + JSONPath - Evaluates conditions: equals, not_equals, contains, exists, any_exist - Supports OpenShift-specific resources (image.config.openshift.io) - Compatible with kubeconfig-based authentication ### 5. Scanner Factory (scanners/__init__.py - 60 lines) - **ScannerFactory**: Registry and factory for scanner instances - get_scanner(scanner_type): Create scanner on demand - register_scanner(): Plugin support for custom scanners - Available scanners: oscap, kubernetes (more coming: aws_api, azure_api, gcp_api) ### 6. Scan Orchestrator (scan_orchestrator_service.py - 280 lines) - **ScanOrchestrator**: Central coordinator for multi-scanner execution - execute_scan(): Main entry point for scan execution - Queries MongoDB for rules matching framework/version - Groups rules by scanner_type - Executes scanners in parallel with asyncio.gather() - Aggregates results from all scanners - Stores complete results in MongoDB ### 7. Scan API Endpoints (scans_api.py - 220 lines) - **POST /api/v1/scans/execute**: Execute compliance scan - **GET /api/v1/scans/{scan_id}**: Get scan result details - **GET /api/v1/scans**: List scans with filters (status, pagination) - **DELETE /api/v1/scans/{scan_id}**: Delete scan result - **GET /api/v1/scans/statistics/summary**: Aggregated scan statistics ## Scan Execution Flow ``` 1. User submits ScanConfiguration via API ↓ 2. ScanOrchestrator queries MongoDB for rules ↓ 3. Rules grouped by scanner_type: - oscap: 45 rules - kubernetes: 12 rules ↓ 4. Scanners execute in parallel: ├─ OSCAPScanner: │ ├─ Generate XCCDF benchmark │ ├─ Generate tailoring file (if variables provided) │ ├─ Execute oscap-ssh on target │ └─ Parse results XML → RuleResults │ └─ KubernetesScanner: ├─ For each rule: │ ├─ Query K8s API via kubectl │ ├─ Evaluate condition │ └─ Create RuleResult └─ Return results ↓ 5. Orchestrator aggregates results: - Combine all RuleResults - Calculate summary statistics - Store in MongoDB ↓ 6. Return ScanResult to user ``` ## Scanner Capabilities | Scanner | Target Types | Capabilities | Status | |---------|-------------|--------------|--------| | oscap | SSH host, local | OVAL checks, XCCDF variables, tailoring | ✅ Implemented | | kubernetes | K8s cluster | YAML checks, JSONPath queries, API access | ✅ Implemented | | aws_api | AWS account | S3, IAM, VPC compliance | 🔜 Planned | | azure_api | Azure subscription | Resource Manager checks | 🔜 Planned | | gcp_api | GCP project | Cloud API checks | 🔜 Planned | ## Usage ### Execute Scan via API ```bash curl -X POST http://localhost:8000/api/v1/scans/execute \\ -H "Authorization: Bearer $TOKEN" \\ -H "Content-Type: application/json" \\ -d '{ "target": { "type": "ssh_host", "identifier": "prod-web-01.example.com", "credentials": {"username": "root"} }, "framework": "nist", "framework_version": "800-53r5", "variable_overrides": { "xccdf_com.hanalyx.openwatch_value_var_accounts_tmout": "300" } }' ``` ### Check Scan Status ```bash curl http://localhost:8000/api/v1/scans/{scan_id} \\ -H "Authorization: Bearer $TOKEN" ``` ### Response ```json { "scan_id": "a1b2c3d4-...", "status": "completed", "started_at": "2025-10-15T08:00:00Z", "completed_at": "2025-10-15T08:05:23Z", "duration_seconds": 323.5, "summary": { "total_rules": 57, "passed": 45, "failed": 10, "error": 2, "compliance_percentage": 81.8, "by_severity": { "high": {"total": 15, "passed": 10, "failed": 5}, "medium": {"total": 30, "passed": 28, "failed": 2}, "low": {"total": 12, "passed": 7, "failed": 3} }, "by_scanner": { "oscap": {"total": 45, "passed": 35, "failed": 8}, "kubernetes": {"total": 12, "passed": 10, "failed": 2} } } } ``` ## Implementation Details ### Variable Override Application OSCAP scanner generates tailoring files: ```xml <xccdf:Tailoring> <xccdf:Profile id="customized" extends="nist_800_53_r5"> <xccdf:set-value idref="xccdf_...value_var_accounts_tmout">300</xccdf:set-value> </xccdf:Profile> </xccdf:Tailoring> ``` ### Kubernetes Query Example Rule check_content: ```json { "resource_type": "image.config.openshift.io", "resource_name": "cluster", "yamlpath": ".spec.allowedRegistriesForImport[:].insecure", "expected_value": "false", "condition": "not_equals" } ``` Scanner execution: ```bash kubectl get image.config.openshift.io cluster \\ -o jsonpath='{.spec.allowedRegistriesForImport[:].insecure}' ``` ### Parallel Scanner Execution ```python scanner_tasks = [ oscap_scanner.scan(oscap_rules, target, variables), k8s_scanner.scan(k8s_rules, target, variables) ] results = await asyncio.gather(*scanner_tasks) ``` ## Testing Integration testing requires: - MongoDB with compliance rules (Issue #96) - OSCAP installed (`oscap --version`) - Test SSH target or local system - Optional: Kubernetes cluster for K8s scanner tests ## Next Steps After this PR merges: 1. **Issue #5**: ORSA Plugin Architecture (5-7 days) - Execute remediation content (Ansible, Bash) - Track remediation status - Rollback support 2. **Issue #6**: Scan Configuration API (3-4 days) - UI for benchmark selection - Variable customization interface - Tailoring file management 3. **Issue #7**: Frontend Variable Customization UI (5-7 days) - Framework selection - Variable override forms - Real-time scan status ## Related Issues - Closes: #100 - Requires: #98 (XCCDF generator) - PR #99 pending - Blocks: #5 (ORSA Plugin Architecture) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

sonarqubecloud · 2025-10-15T12:03:43Z

Quality Gate failed

Failed conditions
16 Security Hotspots
D Security Rating on New Code (required ≥ A)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

+            f"User {current_user.get('username')} initiating scan: "
+            f"framework={config.framework}, target={config.target.identifier}"


To fix the log injection vulnerability, all user-controlled data that is interpolated into log entries should be sanitized to remove any newline (\n) and carriage return (\r) characters (and ideally, other problematic formatting sequences, but newlines are the critical issue). In this context, the log entry on line 65 contains current_user.get('username'), config.framework, and config.target.identifier, all of which are potentially user-controlled. The fix is to create a small sanitization function (e.g., sanitize_for_log) that removes or replaces these characters in these values before logging, and to sanitize each value before it is included in the formatted log entry. The implementation should be in backend/app/api/v1/endpoints/scans_api.py, affecting only the code shown (lines 63–67, which compose the log line and its arguments).

+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Error retrieving scan {scan_id}: {e}")


To fix this problem, sanitize the user-provided value before including it in log entries. For plain-text logs, this means removing line breaks (\r, \n) and optionally other control characters from scan_id before logging. The best approach here is to use str.replace or a regular expression to strip or replace these characters with safe alternatives (such as an empty string). Only sanitize the value as it is passed into the logging call, leaving the downstream application logic unchanged so as not to affect business logic or ID lookup. You only need to change the logging line on line 110 in backend/app/api/v1/endpoints/scans_api.py; insert the sanitization inline or just before the log statement.

+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Error deleting scan {scan_id}: {e}")


To safely log user-supplied values, remove dangerous characters that could affect the log format, like \n, \r, and other control characters, before writing them to the logs. The single best way to fix this is to sanitize the scan_id variable right before logging—remove all \n and \r occurrences from scan_id by using scan_id.replace('\n', '').replace('\r', ''). Do this only inside the except block before logging, to ensure the log entry does not contain unexpected line breaks or control characters. Importing external libraries is unnecessary; Python's built-in str.replace method suffices. This change is only needed in the code that writes scan_id to the log (inside the delete_scan endpoint).

+    """
+    try:
+        logger.info(
+            f"User {current_user.get('username')} generating benchmark: {request.benchmark_id}"


To resolve the log injection issue, ensure any values logged from user input are sanitized to remove log control characters, especially newlines (\n, \r) that could allow log entry forging. The best practice for plain-text logs is to strip or replace those characters. In this context, update the logging line in the generate_benchmark endpoint to clean both current_user.get('username') and request.benchmark_id before logging. All that is required is to add inline .replace(...) calls or to assign to sanitized variables or apply a small helper function.

Only modify lines in backend/app/api/v1/endpoints/xccdf_api.py, focusing on the log on line 48. No new files or external packages are required; use built-in Python string replacement. If used in multiple places, a short helper function may be defined for clarity within the shown snippet.

+    """
+    try:
+        logger.info(
+            f"User {current_user.get('username')} generating tailoring: {request.tailoring_id}"


To fix this problem, any user-supplied input that is written to the logs should be sanitized to eliminate dangerous characters that could alter the structure of the logs. Specifically, newline (\n), carriage return (\r), and potentially other non-printable characters should be stripped. For this code, you should modify line 124 in backend/app/api/v1/endpoints/xccdf_api.py to sanitize both current_user.get('username') and request.tailoring_id before including them in the log entry.

The best way is to:

Add a helper function to sanitize user-provided strings that removes newlines and carriage returns.

Use this function to sanitize both username and tailoring_id before logging.

You only need edits within the lines you have been shown in backend/app/api/v1/endpoints/xccdf_api.py.

+        Returns:
+            XCCDF Tailoring XML as string
+        """
+        logger.info(f"Generating XCCDF Tailoring: {tailoring_id}")


The correct fix for this log injection vulnerability is to sanitize the user input (tailoring_id) before it is inserted into the log entry. The recommended procedure is to remove any characters that could be interpreted as line breaks—specifically the \n and \r characters.

To implement this:

Before logging tailoring_id, use replace() to strip out any potential carriage return (\r) or newline (\n) characters from the value.

This should be done inline where the log entry is created.

The fix should only apply to the problematic logging statement on line 147 of backend/app/services/xccdf_generator_service.py.

No new imports or dependencies are required since string replacement is a built-in feature of Python.

+        version_elem.text = "1.0"
+
+        # Add benchmark reference
+        benchmark_elem = ET.SubElement(


The best fix is to remove the assignment to the unused local variable benchmark_elem on line 169. The function call ET.SubElement(...) must remain to ensure the benchmark subelement is still added to the tailoring element tree. Only remove the left-hand side of the assignment, leaving the function call by itself as a statement. No imports or other changes are necessary. Only lines around 169 in backend/app/services/xccdf_generator_service.py need to be edited.

+            {'system': check_system}
+        )
+
+        check_ref = ET.SubElement(


To fix this problem without affecting existing functionality, we should remove the unnecessary assignment to check_ref and instead call ET.SubElement() as a standalone statement. That is, simply remove check_ref = from line 401 so that ET.SubElement() is called for its side effect of adding the subelement to check. This change should be made in backend/app/services/xccdf_generator_service.py at line 401. There are no additional methods or imports needed.

+        # Add variable exports if rule has variables
+        if rule.get('xccdf_variables'):
+            for var_id in rule['xccdf_variables'].keys():
+                export = ET.SubElement(


The best way to fix the problem in this scenario is to remove the assignment to the unused local variable export, while retaining the call to ET.SubElement(...) so that the side effect of creating the subelement is preserved. This means simply dropping the export = part of the assignment and calling ET.SubElement(...) as a stand-alone statement. This change should be made on line 413 within the _create_xccdf_rule method in backend/app/services/xccdf_generator_service.py. No additional imports, method changes, or variable definitions are required.

+                rule_name = rule_id.replace('ow-', '')
+                rule_id = f"xccdf_com.hanalyx.openwatch_rule_{rule_name}"
+
+            select = ET.SubElement(


To fix this problem, simply remove the assignment to the variable select and call ET.SubElement(...) as a standalone statement. This preserves the side-effect of adding a new subelement to profile but does not create an unused variable. The change is localized to the for-loop at line 526 and does not require adding imports, creating new methods, or changing calls elsewhere.

Only one statement needs to be modified: replace select = ET.SubElement(... with ET.SubElement(.... All other functionality remains exactly as before.

…ash Executors Implements complete ORSA (OpenWatch Remediation and Security Automation) plugin architecture for automated remediation execution after compliance scans. ## New Components (7 files, ~2,500 lines) ### 1. MongoDB Models (backend/app/models/remediation_models.py - 330 lines) **Purpose**: Track remediation execution lifecycle and results **Key Models**: - RemediationResult: Complete execution record with rollback support - Tracks status: pending → running → completed/failed/rolled_back - Stores stdout/stderr, changes made, variables applied - Audit log with timestamped actions - Rollback content and execution tracking - BulkRemediationJob: Batch remediation coordinator - Progress tracking (total, completed, failed counts) - Success rate calculation - Individual remediation ID tracking - RemediationRequest/BulkRemediationRequest: API request schemas - RemediationSummary: Aggregated statistics **Indexes**: remediation_id, rule_id, status, executed_by, scan_id, created_at ### 2. Base Executor Interface (backend/app/services/remediators/base_executor.py - 280 lines) **Purpose**: Abstract interface for all remediation executors **Design Pattern**: Strategy pattern for pluggable executors **Key Features**: - Abstract methods: execute(), rollback(), validate_content(), supports_target() - Capability system: DRY_RUN, ROLLBACK, IDEMPOTENT, VARIABLE_SUBSTITUTION, REMOTE_EXECUTION - Custom exceptions: ExecutorNotAvailableError, ExecutorValidationError, ExecutorExecutionError - Variable substitution: {{var}} and ${var} patterns - Change extraction for rollback tracking - Async context manager support **ExecutorMetadata**: Version, capabilities, supported targets for discovery ### 3. Ansible Executor (backend/app/services/remediators/ansible_executor.py - 380 lines) **Purpose**: Execute Ansible playbooks from string content **Capabilities**: - ✅ Dry-run (--check mode) - ✅ Rollback support - ✅ Idempotent operations - ✅ Variable substitution (--extra-vars) - ✅ Remote execution (SSH) **Features**: - Dynamic inventory generation (local or SSH targets) - YAML validation via yaml.safe_load() - SSH key management (temp files with 0600 permissions) - Ansible output parsing (PLAY RECAP, changed counts) - Change extraction from task output - Timeout support with process cleanup **Execution Flow**: 1. Validate playbook YAML structure 2. Generate inventory file (local or remote) 3. Write SSH key if provided 4. Build ansible-playbook command with --extra-vars, --check, --private-key 5. Execute with timeout 6. Parse PLAY RECAP for change counts 7. Extract changes for rollback tracking **Example Command**: ```bash ansible-playbook playbook.yml -i inventory.ini \ --extra-vars '{"var_tmout": "300"}' \ --check \ --private-key ssh_key \ -e ansible_host_key_checking=False ``` ### 4. Bash Executor (backend/app/services/remediators/bash_executor.py - 380 lines) **Purpose**: Execute bash scripts from string content **Capabilities**: - ✅ Variable substitution (export statements) - ✅ Remote execution (SSH) - ⚠️ Limited dry-run (syntax check only via bash -n) - ⚠️ NOT idempotent by default **Features**: - Syntax validation: bash -n (noexec mode) - Variable preparation: Prepends export statements - Local execution: Direct bash script_file - Remote execution: SSH with script piped to stdin - Change extraction: Looks for "Changed:", "Modified:", "Created:" patterns - Timeout support **Script Preparation**: ```bash #!/bin/bash set -e # Exit on error # Environment variables export var_tmout='300' export var_password_minlen='14' # Remediation script [original content] ``` **Remote Execution**: ```bash ssh -i ssh_key -o StrictHostKeyChecking=no \ root@192.168.1.100 'bash -s' < script.sh ``` ### 5. Executor Factory (backend/app/services/remediators/__init__.py - 180 lines) **Purpose**: Registry and factory for executor instantiation **Pattern**: Factory pattern with runtime registration **Registry**: - ansible: AnsibleExecutor - bash: BashExecutor - (Future): terraform, kubernetes, python **Methods**: - get_executor(type): Instantiate executor - register_executor(type, class): Runtime plugin registration - list_executors(available_only): Discovery - get_executor_metadata(type): Capabilities, version, targets - get_all_executor_metadata(): All registered executors **Example**: ```python executor = RemediationExecutorFactory.get_executor('ansible') metadata = RemediationExecutorFactory.get_executor_metadata('ansible') # {"name": "ansible", "version": "2.14.3", "capabilities": [...]} ``` ### 6. Remediation Orchestrator (backend/app/services/remediation_orchestrator_service.py - 420 lines) **Purpose**: Central coordinator for remediation lifecycle **Workflow**: 1. Query rule from MongoDB 2. Extract remediation content and type 3. Prepare variables (defaults + overrides) 4. Get executor from factory 5. Execute remediation 6. Track status: PENDING → RUNNING → COMPLETED/FAILED 7. Generate rollback content if successful 8. Store RemediationResult in MongoDB 9. Add audit log entries **Key Methods**: - execute_remediation(): Single rule execution - execute_bulk_remediation(): Batch execution from scan results - Query failed rules from scan - Execute each remediation sequentially - Track job progress (completed/failed counts) - Calculate success rate - rollback_remediation(): Execute rollback content - get_remediation_result(): Query by ID - list_remediations(): Pagination and filtering - get_remediation_statistics(): Aggregated stats **Bulk Remediation Sources**: 1. scan_id: Remediate all failed rules from scan 2. rule_ids: Explicit list of rules 3. rule_filter: Query filter (e.g., {"severity": ["high"]}) **Variable Preparation**: ```python # Start with rule defaults if 'xccdf_variables' in rule: for var_id, var_def in rule['xccdf_variables'].items(): variables[var_id] = var_def['default'] # Apply user overrides variables.update(overrides) ``` ### 7. Remediation API (backend/app/api/v1/endpoints/remediation_api.py - 420 lines) **Purpose**: REST API for remediation operations **Endpoints**: - POST /api/v1/remediation/execute - Execute single remediation - POST /api/v1/remediation/execute-bulk - Batch remediation - GET /api/v1/remediation/{id} - Get result - GET /api/v1/remediation/ - List with filters (status, scan_id, pagination) - POST /api/v1/remediation/{id}/rollback - Rollback remediation - DELETE /api/v1/remediation/{id} - Delete result - GET /api/v1/remediation/statistics/summary - Aggregated stats - GET /api/v1/remediation/executors/available - List executors - GET /api/v1/remediation/jobs/{id} - Get bulk job status **Authorization**: - Users can only view/execute/rollback their own remediations - Admins can access all remediations **Query Parameters**: - skip/limit: Pagination - status: Filter by RemediationStatus - scan_id: Filter by source scan ## API Usage Examples ### Execute Single Remediation ```bash curl -X POST http://localhost:8000/api/v1/remediation/execute \ -H "Authorization: Bearer $TOKEN" \ -H "Content-Type: application/json" \ -d '{ "rule_id": "xccdf_com.hanalyx.openwatch_rule_accounts_tmout", "target": { "type": "ssh_host", "identifier": "192.168.1.100", "credentials": { "username": "root", "ssh_key": "-----BEGIN OPENSSH PRIVATE KEY-----\n..." } }, "variable_overrides": { "var_accounts_tmout": "300" }, "dry_run": false }' ``` ### Bulk Remediation (Failed Rules from Scan) ```bash curl -X POST http://localhost:8000/api/v1/remediation/execute-bulk \ -H "Authorization: Bearer $TOKEN" \ -H "Content-Type: application/json" \ -d '{ "scan_id": "550e8400-e29b-41d4-a716-446655440000", "rule_filter": { "severity": ["high", "critical"] }, "target": { "type": "ssh_host", "identifier": "prod-web-01.example.com", "credentials": {...} }, "dry_run": true }' ``` ### Rollback Remediation ```bash curl -X POST http://localhost:8000/api/v1/remediation/{id}/rollback \ -H "Authorization: Bearer $TOKEN" ``` ### List Remediations ```bash curl "http://localhost:8000/api/v1/remediation/?status=completed&limit=10" \ -H "Authorization: Bearer $TOKEN" ``` ### Get Statistics ```bash curl "http://localhost:8000/api/v1/remediation/statistics/summary?days=30" \ -H "Authorization: Bearer $TOKEN" # Response: { "total": 150, "completed": 120, "failed": 20, "success_rate": 80.0, "by_executor": {"ansible": 100, "bash": 50} } ``` ### List Available Executors ```bash curl http://localhost:8000/api/v1/remediation/executors/available \ -H "Authorization: Bearer $TOKEN" # Response: [ { "name": "ansible", "version": "2.14.3", "capabilities": ["dry_run", "rollback", "idempotent"], "available": true } ] ``` ## Remediation Execution Flow ``` API Request ↓ RemediationOrchestrator.execute_remediation() ↓ Query rule from MongoDB ↓ Extract remediation: {type: "ansible", content: "..."} ↓ Prepare variables: defaults + overrides ↓ RemediationExecutorFactory.get_executor(type) ↓ AnsibleExecutor.execute(content, target, variables, dry_run) ↓ 1. Validate playbook YAML 2. Generate inventory file 3. Write SSH key (0600 permissions) 4. Build command: ansible-playbook --extra-vars --check --private-key 5. Execute with timeout 6. Parse PLAY RECAP for changes ↓ RemediationExecutionResult: {success, stdout, stderr, changes_made, duration} ↓ Update RemediationResult: - status: RUNNING → COMPLETED/FAILED - execution_result: {...} - rollback_available: true (if changes made) - rollback_content: "..." (future: auto-generated) ↓ Save to MongoDB ↓ Add audit log entry: "Completed" ↓ Return RemediationResult ``` ## Executor Capabilities Matrix | Executor | Dry-Run | Rollback | Idempotent | Variables | Remote | Version Detection | |----------|---------|----------|------------|-----------|--------|-------------------| | Ansible | ✅ Full | ✅ Yes | ✅ Yes | ✅ Yes | ✅ SSH | ansible-playbook --version | | Bash | ⚠️ Syntax only | ✅ Yes | ❌ No | ✅ Yes | ✅ SSH | bash --version | | Terraform (future) | ✅ Plan | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Providers | terraform version | | Kubernetes (future) | ✅ --dry-run | ✅ Yes | ✅ Yes | ✅ Yes | ✅ kubectl | kubectl version | ## Security Considerations 1. **SSH Key Storage**: - Temporary files with 0600 permissions - Deleted after execution - TODO: Encrypt in MongoDB (currently plaintext in credentials dict) 2. **Script Execution**: - Bash scripts run with 'set -e' (exit on error) - Ansible uses --check for dry-run validation - Timeout enforcement prevents runaway processes 3. **Authorization**: - Users can only access their own remediations - Admins can access all - Rollback requires original execution ownership 4. **Audit Trail**: - All status changes logged with timestamps - Username tracking: executed_by - Complete stdout/stderr capture ## Dependencies **Python Packages** (add to requirements.txt): - ansible-core>=2.14.0 (for Ansible executor) - paramiko>=3.0.0 (for SSH in Bash executor) - pyyaml>=6.0 (for Ansible playbook parsing) **System Tools**: - ansible-playbook (Ansible executor) - bash (Bash executor) - ssh (remote execution) ## Known Limitations 1. **Rollback Generation**: Not yet implemented - Currently returns None in _generate_rollback_content() - Future: Parse changes_made and generate inverse operations 2. **Bash Idempotency**: Scripts are NOT idempotent by default - Users must write idempotent scripts manually - Consider adding idempotency wrappers (check-before-change) 3. **Ansible Vault**: Not yet implemented - No support for encrypted playbooks or variables - Future: Add --vault-password-file support 4. **Parallel Bulk Execution**: Sequential only - execute_bulk_remediation() runs remediations one-by-one - Future: Use asyncio.gather() for parallel execution 5. **Credential Encryption**: SSH keys stored plaintext in MongoDB - TODO: Integrate with Fernet encryption (like scan credentials) ## Testing Strategy **Unit Tests Needed**: - Executor validation methods - Variable substitution - Change extraction from output - Error handling (timeouts, invalid content) **Integration Tests Needed**: - Ansible playbook execution against test VM - Bash script execution (local and remote) - Rollback functionality - Bulk remediation workflow **E2E Tests**: 1. Scan → Identify Failed Rules 2. Remediate (dry-run=true) → Verify no changes 3. Remediate (dry-run=false) → Apply changes 4. Re-scan → Verify rules now pass 5. Rollback → Restore original state 6. Re-scan → Verify rules fail again ## Future Enhancements 1. **Additional Executors**: - TerraformExecutor: Infrastructure remediation - KubernetesExecutor: Manifest application - PythonExecutor: Custom Python scripts 2. **Intelligent Rollback**: - Parse execution output for changes - Generate inverse operations automatically - Store pre-remediation state snapshots 3. **Approval Workflow**: - Require manual approval for production systems - Multi-stage approval (dev → staging → prod) 4. **Parallel Execution**: - Bulk remediation with asyncio.gather() - Dependency resolution for ordered execution 5. **Remediation Templates**: - Pre-built remediation content library - Template variables with validation ## Phase 1 Status ✅ Completed (5/7 tasks): 1. ✅ Enhanced ComplianceRule Model (PR #95 - merged) 2. ✅ Enhanced SCAP Converter (PR #97 - pending) 3. ✅ XCCDF Generator (PR #99 - pending) 4. ✅ Scan Service (PR #101 - pending) 5. ✅ **ORSA Remediation Engine (this commit)** 🚧 Remaining (2/7 tasks): 6. ⏳ Scan Configuration API (3-4 days) 7. ⏳ Frontend Variable Customization UI (5-7 days) ## Related Issues - Implements: #102 - Depends on: #96 (PR #97 - remediation field) - Blocks: #6 (Scan Configuration API) Co-Authored-By: Claude <noreply@anthropic.com>

…very & Template Management Implements complete scan configuration API for framework selection, variable management, and template-based scan configurations. New Components (4 files, ~1,800 lines) 1. Scan Configuration Models (backend/app/models/scan_config_models.py - 280 lines) Purpose: MongoDB models and API schemas for templates and framework metadata Key Models: - ScanTemplate: Saved scan configuration with framework, variables, filters, sharing - VariableDefinition: XCCDF variable with type and constraints - FrameworkMetadata: Framework discovery with counts and versions - API Schemas: Request/response models 2. Framework Metadata Service (backend/app/services/framework_metadata_service.py - 420 lines) Purpose: Discover frameworks and validate variable values Key Methods: - list_frameworks(): Aggregate framework metadata - get_framework_details(): Complete framework/version info - get_variables(): Extract variables from rules - validate_variable_value(): Constraint validation (type, range, choices, regex) - validate_variables(): Batch validation 3. Scan Template Service (backend/app/services/scan_template_service.py - 380 lines) Purpose: CRUD operations for scan templates Key Methods: - create_template(), list_templates(), update_template(), delete_template() - apply_template(): Generate scan configuration - set_as_default(), clone_template() - share_template()/unshare_template(): Access control 4. Scan Configuration API (backend/app/api/v1/endpoints/scan_config_api.py - 720 lines) Purpose: REST API with 14 endpoints Framework Discovery: - GET /frameworks - GET /frameworks/{framework}/{version} - GET /frameworks/{framework}/{version}/variables - POST /frameworks/{framework}/{version}/validate Template Management: - POST /templates - GET /templates - GET /templates/{id} - PUT /templates/{id} - DELETE /templates/{id} - POST /templates/{id}/apply - POST /templates/{id}/clone - POST /templates/{id}/set-default - GET /statistics Phase 1 Status: 6/7 tasks completed (86%) Related Issues: - Implements: #104 - Depends on: #98 (PR #99), #100 (PR #101) - Blocks: #7 (Frontend UI) Co-Authored-By: Claude <noreply@anthropic.com>

remyluslosius and others added 4 commits October 14, 2025 22:04

github-advanced-security AI found potential problems Oct 15, 2025

View reviewed changes

remyluslosius mentioned this pull request Oct 15, 2025

[Phase 1 Issue #5] ORSA Plugin Architecture - Remediation Execution Engine #102

Closed

11 tasks

This was referenced Oct 15, 2025

[Phase 1 Issue #5] ORSA Remediation Engine - Ansible & Bash Executors #103

Merged

[Phase 1 Issue #6] Scan Configuration API - Framework Selection & Variable Management #104

Closed

This was referenced Oct 15, 2025

[Phase 1 Issue #6] Scan Configuration API - Framework Discovery & Template Management #105

Merged

[Phase 1 Issue #7] Frontend Variable Customization UI - Complete Scan Configuration Interface #106

Closed

remyluslosius merged commit 437a097 into main Oct 15, 2025
16 of 27 checks passed

remyluslosius deleted the feature/scan-service branch October 15, 2025 14:37

@@ -61,9 +61,19 @@
                 ```
                 """
                 try:
+                    def sanitize_for_log(value):
+                        if isinstance(value, str):
+                            # Remove CR and LF to prevent log injection
+                            return value.replace("\r", "").replace("\n", "")
+                        return str(value)
+                    username_safe = sanitize_for_log(current_user.get('username', 'unknown'))
+                    framework_safe = sanitize_for_log(config.framework)
+                    target_safe = sanitize_for_log(config.target.identifier)
                     logger.info(
-                        f"User {current_user.get('username')} initiating scan: "
-                        f"framework={config.framework}, target={config.target.identifier}"
+                        f"User {username_safe} initiating scan: "
+                        f"framework={framework_safe}, target={target_safe}"
                     )
                     # Create orchestrator

@@ -44,8 +44,12 @@
                 The generated benchmark can be used with oscap for scanning.
                 """
                 try:
+                    def sanitize_log_field(val):
+                        if isinstance(val, str):
+                            return val.replace('\r', '').replace('\n', '')
+                        return val
                     logger.info(
-                        f"User {current_user.get('username')} generating benchmark: {request.benchmark_id}"
+                        f"User {sanitize_log_field(current_user.get('username'))} generating benchmark: {sanitize_log_field(request.benchmark_id)}"
                     )
                     # Create generator service

@@ -26,6 +26,13 @@
             logger = logging.getLogger(__name__)
+            def sanitize_for_log(s: str) -> str:
+                """Remove characters that may break log entry boundaries."""
+                if not isinstance(s, str):
+                    s = str(s)
+                return s.replace('\r', '').replace('\n', '')
             @router.post("/generate-benchmark", response_model=XCCDFBenchmarkResponse)
             async def generate_benchmark(
                 request: XCCDFBenchmarkRequest,
@@ -120,8 +127,10 @@
                 The tailoring file is passed to oscap with --tailoring-file flag.
                 """
                 try:
+                    sanitized_username = sanitize_for_log(current_user.get('username'))
+                    sanitized_tailoring_id = sanitize_for_log(request.tailoring_id)
                     logger.info(
-                        f"User {current_user.get('username')} generating tailoring: {request.tailoring_id}"
+                        f"User {sanitized_username} generating tailoring: {sanitized_tailoring_id}"
                     )
                     # Create generator service

@@ -144,7 +144,8 @@
                     Returns:
                         XCCDF Tailoring XML as string
                     """
-                    logger.info(f"Generating XCCDF Tailoring: {tailoring_id}")
+                    safe_tailoring_id = tailoring_id.replace('\n', '').replace('\r', '')
+                    logger.info(f"Generating XCCDF Tailoring: {safe_tailoring_id}")
                     # Create root Tailoring element
                     tailoring = ET.Element(

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Phase 1 Issue #4] MongoDB-Based Scan Service with Multi-Scanner Routing#101

[Phase 1 Issue #4] MongoDB-Based Scan Service with Multi-Scanner Routing#101
remyluslosius merged 4 commits into
mainfrom
feature/scan-service

remyluslosius commented Oct 15, 2025

Uh oh!

sonarqubecloud Bot commented Oct 15, 2025

Uh oh!

Check failure

Copilot Autofix

Check failure

Copilot Autofix

Check failure

Copilot Autofix

Check failure

Copilot Autofix

Check failure

Copilot Autofix

Check failure

Copilot Autofix

Check notice

Copilot Autofix

Check notice

Copilot Autofix

Check notice

Copilot Autofix

Check notice

Copilot Autofix

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		f"User {current_user.get('username')} initiating scan: "
		f"framework={config.framework}, target={config.target.identifier}"

@@ -107,7 +107,8 @@
                 except HTTPException:
                     raise
                 except Exception as e:
-                    logger.error(f"Error retrieving scan {scan_id}: {e}")
+                    safe_scan_id = scan_id.replace('\r', '').replace('\n', '')
+                    logger.error(f"Error retrieving scan {safe_scan_id}: {e}")
                     raise HTTPException(status_code=500, detail=str(e))

Conversation

remyluslosius commented Oct 15, 2025

Summary

What's New

1. MongoDB Scan Result Models (backend/app/models/scan_models.py)

2. Abstract Scanner Interface (backend/app/services/scanners/base_scanner.py)

3. OSCAP Scanner (backend/app/services/scanners/oscap_scanner.py)

4. Kubernetes Scanner (backend/app/services/scanners/kubernetes_scanner.py)

5. Scanner Factory (backend/app/services/scanners/__init__.py)

6. Scan Orchestrator (backend/app/services/scan_orchestrator_service.py)

7. Scan Execution API (backend/app/api/v1/endpoints/scans_api.py)

Scan Execution Flow

Scanner Capabilities Matrix

API Usage Examples

Execute Scan (SSH Host)

Execute Scan (Kubernetes)

Get Scan Result

List Recent Scans

Scan Statistics

Implementation Details

Parallel Scanner Execution

Error Handling

Result Aggregation

Testing Requirements

Unit Tests Needed

Integration Tests Needed

Dependencies

Python Packages (already in requirements.txt)

External Tools Required

Breaking Changes

Migration Required

Known Limitations

Future Enhancements (Phase 2+)

Phase 1 Status

Related Issues

Checklist

Reviewers

Additional Notes

Uh oh!

sonarqubecloud Bot commented Oct 15, 2025

Quality Gate failed

Uh oh!

Check failure

Uh oh!

Uh oh!

Copilot Autofix

Check failure

Uh oh!

Copilot Autofix

Check failure

Uh oh!

Copilot Autofix

Check failure

Uh oh!

Uh oh!

Copilot Autofix

Check failure

Uh oh!

Uh oh!

Copilot Autofix

Check failure

Uh oh!

Copilot Autofix

Check notice

Copilot Autofix

Check notice

Copilot Autofix

Check notice

Copilot Autofix

Check notice

Copilot Autofix

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. MongoDB Scan Result Models (`backend/app/models/scan_models.py`)

2. Abstract Scanner Interface (`backend/app/services/scanners/base_scanner.py`)

3. OSCAP Scanner (`backend/app/services/scanners/oscap_scanner.py`)

4. Kubernetes Scanner (`backend/app/services/scanners/kubernetes_scanner.py`)

5. Scanner Factory (`backend/app/services/scanners/init.py`)

6. Scan Orchestrator (`backend/app/services/scan_orchestrator_service.py`)

7. Scan Execution API (`backend/app/api/v1/endpoints/scans_api.py`)