[RESILIENCE] Chaos testing framework — built-in chaos engineering for resilience validation #236

@ElioNeto

Description

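ApexStore should be able to test its own resilience. This issue proposes built-in chaos engineering tools that inject failures on demand during testing, so that recovery paths can be validated deliberately instead of being discovered in production.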

Implementation

# Simulate various failure scenarios
apexstore-cli chaos inject --type disk-latency --duration 30s --delay 500ms
apexstore-cli chaos inject --type disk-full --size 10MB
apexstore-cli chaos inject --type panic-compaction
apexstore-cli chaos inject --type kill-wal-fsync --probability 0.1

# List active experiments
apexstore-cli chaos list

# Stop experiment
apexstore-cli chaos stop <experiment-id>
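
The inject / list / stop lifecycle above could be backed by a small in-process registry of active experiments. The following is only a minimal sketch in Python to illustrate the bookkeeping. The names `ChaosRegistry`, `Experiment`, `inject`, `active`, and `stop` are hypothetical and not part of ApexStore, and the ID format and expiry handling are assumptions.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    """One running fault injection (all field names are hypothetical)."""
    id: str
    fault_type: str              # e.g. "disk-latency"
    params: dict                 # e.g. {"delay": 0.5}
    expires_at: Optional[float]  # clock deadline; None means "until stopped"


class ChaosRegistry:
    """Tracks active experiments; the clock is injectable for testing."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._ids = itertools.count(1)
        self._active = {}

    def inject(self, fault_type, duration=None, **params):
        """Register an experiment and return its generated ID."""
        exp_id = f"exp-{next(self._ids)}"
        deadline = None if duration is None else self._clock() + duration
        self._active[exp_id] = Experiment(exp_id, fault_type, params, deadline)
        return exp_id

    def active(self):
        """Return experiments that have not expired, dropping expired ones."""
        now = self._clock()
        self._active = {
            exp_id: exp
            for exp_id, exp in self._active.items()
            if exp.expires_at is None or exp.expires_at > now
        }
        return list(self._active.values())

    def stop(self, exp_id):
        """Remove an experiment; return False if the ID is unknown."""
        return self._active.pop(exp_id, None) is not None
```

Under these assumptions, `chaos inject --type disk-latency --duration 30s --delay 500ms` would map to `registry.inject("disk-latency", duration=30, delay=0.5)`, `chaos list` to `registry.active()`, and `chaos stop <experiment-id>` to `registry.stop(...)`.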

Failure modes to implement

  • disk-latency: add artificial delay to I/O operations
  • disk-full: simulate ENOSPC
  • panic-compaction: trigger a panic in the compaction thread intermittently
  • kill-wal-fsync: skip WAL fsync calls at random (simulates a crash before data reaches disk)
  • memory-pressure: reduce available memory
  • corrupt-sstable: flip random bits in SSTable files
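
Several of these modes can be modeled as a thin wrapper around the storage engine's file handle. The sketch below is illustrative Python, not ApexStore code. `FaultyFile`, its attributes, and `flip_random_bit` are hypothetical names, and it covers only disk-latency, disk-full, kill-wal-fsync, and corrupt-sstable. panic-compaction and memory-pressure depend on engine internals and are left out.

```python
import errno
import os
import random
import time


class FaultyFile:
    """Wraps a binary file-like object and injects faults on write/fsync."""

    def __init__(self, raw, rng=None, sleep=time.sleep):
        self.raw = raw
        self.rng = rng if rng is not None else random.Random()
        self.sleep = sleep
        self.write_delay = 0.0      # disk-latency: seconds added to each write
        self.capacity = None        # disk-full: max total bytes; None = unlimited
        self.skip_fsync_prob = 0.0  # kill-wal-fsync: probability of dropping an fsync
        self.written = 0
        self.fsyncs_skipped = 0

    def write(self, data):
        if self.write_delay > 0:
            self.sleep(self.write_delay)
        if self.capacity is not None and self.written + len(data) > self.capacity:
            # Same error the OS reports when the device is out of space.
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.written += len(data)
        return self.raw.write(data)

    def fsync(self):
        """Return True if data was synced, False if the fsync was dropped."""
        if self.rng.random() < self.skip_fsync_prob:
            self.fsyncs_skipped += 1
            return False
        self.raw.flush()
        os.fsync(self.raw.fileno())
        return True


def flip_random_bit(data, rng):
    """corrupt-sstable: return a copy of data with exactly one bit flipped."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf))
    buf[pos] ^= 1 << rng.randrange(8)
    return bytes(buf)
```

Passing a seeded `random.Random` makes a failing run reproducible, which matters when a chaos test exposes a recovery bug that has to be replayed.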
