Troubleshooting Guide
Comprehensive troubleshooting guide for common issues in the DevStack Core infrastructure.
- Service Profile Issues (NEW v1.3)
- Startup Issues
- Service Health Check Failures
- Network Connectivity Problems
- Database Issues
- Redis Cluster Issues
- Vault Issues
- AppRole Authentication Issues (NEW - Phase 1)
- Docker/Colima Issues
- Testing Failures
- Performance Issues
- TLS/Certificate Issues
- Diagnostic Commands
Symptoms:
# Expected 10 services (standard profile)
$ docker ps
# But seeing 18 services (full profile) or 5 services (minimal)
Cause: Profile not specified or wrong profile used
Solution:
# Check which services are defined for a profile
docker compose --profile standard config --services
# Stop all and restart with correct profile
docker compose down
./devstack start --profile standard
# Verify correct services running
./devstack status
Symptoms:
$ ./devstack redis-cluster-init
Error: redis-2 and redis-3 not found
Cause: Minimal profile only has redis-1 (standalone mode)
Solution: Minimal profile uses standalone Redis, no cluster initialization needed:
# Use standard or full profile for Redis cluster
docker compose down
./devstack start --profile standard
./devstack redis-cluster-init
Symptoms:
$ ./devstack start --profile standard
ModuleNotFoundError: No module named 'click'
Cause: Python dependencies not installed
Solution:
# Install dependencies using uv (recommended)
uv venv
uv pip install -r scripts/requirements.txt
# The wrapper script automatically uses the venv
# No manual activation needed!
# Verify installation
python3 -c "import click, rich, yaml; print('All dependencies installed!')"
# Try again
./devstack start --profile standard
Symptoms:
# Started with minimal profile but Redis cluster is enabled
$ ./devstack start --profile minimal
$ docker exec dev-redis-1 redis-cli INFO cluster | grep cluster_enabled
cluster_enabled:1 # Should be 0 for minimal
Cause: Environment variables from a previous session or the shell environment overriding the profile
Solution:
# Check for conflicting env vars
env | grep REDIS
# Unset any conflicting variables
unset REDIS_CLUSTER_ENABLED
# Stop and restart with profile
docker compose down
./devstack start --profile minimal
# Verify
docker exec dev-redis-1 redis-cli INFO cluster | grep cluster_enabled
# Should show: cluster_enabled:0
Symptoms:
$ ./devstack start --profile minimal --profile standard
# Only seeing standard services, minimal services ignored
Cause: Docker Compose uses all specified profiles, but standard includes all minimal services
Explanation:
- minimal ⊂ standard ⊂ full (profiles are hierarchical)
- Specifying both is redundant
- Standard profile includes all minimal services plus additional ones
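The subset relation can be checked by diffing the two service lists. A sketch using `comm` — the service names below are placeholders; in practice substitute the output of `docker compose --profile <name> config --services | sort`:

```shell
# Hypothetical, pre-sorted service lists standing in for real docker compose output
printf 'postgres\nredis-1\nvault\n' > /tmp/minimal-services.txt
printf 'mysql\npostgres\nrabbitmq\nredis-1\nvault\n' > /tmp/standard-services.txt

# comm -13 prints lines unique to the second file:
# the services standard adds on top of minimal
comm -13 /tmp/minimal-services.txt /tmp/standard-services.txt
```

An empty result in the other direction (`comm -23`) confirms minimal really is a subset of standard.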
Solution:
# Just use the larger profile
./devstack start --profile standard
Symptoms:
# Previously running full profile, switched to minimal
$ ./devstack start --profile minimal
# Prometheus, Grafana still showing as stopped in docker ps -a
Cause: Old containers from the previous profile still exist (stopped)
Solution:
# Clean shutdown removes old containers
docker compose down # Removes stopped containers
./devstack start --profile minimal
# Or remove all containers manually
docker compose down -v # WARNING: Removes volumes too
Symptoms:
$ ./devstack start --profile reference
# Reference APIs failing to start, database connection errors
Cause: Reference profile requires infrastructure services (must combine with standard/full)
Solution:
# Stop and restart with combined profiles
docker compose down
./devstack start --profile standard --profile reference
# Verify both profiles running
docker ps | grep -E "dev-postgres|dev-reference-api"
Symptoms:
# Started with standard profile
$ curl http://localhost:9090
Connection refused
Cause: Prometheus/Grafana only included in full profile
Solution:
# Use full profile for observability
docker compose down
./devstack start --profile full
# Access services
curl http://localhost:9090 # Prometheus
curl http://localhost:3001 # Grafana
Symptoms:
$ ./devstack --help
# No output or error
Cause: Script not executable or Python dependencies missing
Solution:
# Make script executable
chmod +x devstack
# Check shebang line
head -1 devstack
# Should be: #!/usr/bin/env python3
# Install dependencies
uv venv && uv pip install -r scripts/requirements.txt
# Try again (wrapper script auto-uses venv)
./devstack --help
Symptoms:
# Scripts failing with unexpected errors
$ ./scripts/check-markdown-links.sh
rg: error parsing flag -E: grep config error: unknown encoding: \[([^]]+)\]\(([^)]+)\)
# Or grep/find commands behaving differently than expected
Cause: Shell has aliased standard Unix tools to Rust alternatives (e.g., grep→rg, find→fd)
Background:
- Modern Rust-based CLI tools (ripgrep, fd, bat, eza) are popular replacements for grep, find, cat, ls
- These tools have different syntax and command-line flags
- DevStack Core scripts explicitly use /usr/bin/grep, /usr/bin/awk, etc. to avoid conflicts
- However, some edge cases may still exist
Verification:
# Check what tools are aliased
which grep # May show: grep: aliased to rg
which find # May show: find: aliased to fd
which ls # May show: ls: aliased to eza
which cat # May show: cat: aliased to bat
# Check actual tool locations
/usr/bin/grep --version # Should show: grep (BSD grep) or GNU grep
Solution 1: Scripts Already Handle This (Recommended)
All DevStack Core scripts are designed to work with Rust tool aliases:
# Scripts use full paths to avoid aliases
# No action needed - just run the script normally
./scripts/check-markdown-links.sh
./scripts/generate-certificates.sh
./tests/test-vault.sh
Solution 2: Verify Script Compatibility
If a script still fails:
# Check if script uses full paths
grep -n "grep\|find\|awk\|sed" scripts/<script-name>.sh
# Scripts should use:
# - /usr/bin/grep (not just grep)
# - /usr/bin/awk (not just awk)
# Standard paths for awk, sed (these aren't typically aliased)
Solution 3: Temporary Alias Removal (Last Resort)
# Only if a specific script fails - run without aliases
unalias grep find ls cat
./scripts/<problematic-script>.sh
# Re-alias after (add to ~/.zshrc if desired)
alias grep='rg'
alias find='fd'
# etc.
Note: Rust CLI tools are optional and not required for DevStack Core. See INSTALLATION.md for details on optional tools.
# List available profiles
./devstack profiles
# Check which services are in a profile
docker compose --profile minimal config --services
docker compose --profile standard config --services
docker compose --profile full config --services
# Verify profile .env files exist
ls -la configs/profiles/*.env
# Check environment loading
set -a
source configs/profiles/standard.env
set +a
env | grep REDIS_CLUSTER_ENABLED
Symptoms:
$ docker ps
CONTAINER ID NAME STATUS
abc123 dev-postgres-1 Restarting (1) 30 seconds ago
def456 dev-redis-1 Restarting (1) 25 seconds ago
Logs show:
Installing jq...
Installing jq...
Installing jq...
(repeating in a loop)
Root Cause: Vault is unsealed but NOT bootstrapped with service credentials.
Diagnostic Flow:
graph TD
Start["Service continuously restarting?"]
CheckVault["Check: Is Vault healthy?"]
VaultUnsealed["Vault unsealed?"]
CheckCreds["Try to fetch credentials"]
Bootstrap["Vault needs bootstrap"]
OtherIssue["Different issue - see logs"]
Start --> CheckVault
CheckVault -->|Healthy| VaultUnsealed
CheckVault -->|Not healthy| OtherIssue
VaultUnsealed -->|Yes| CheckCreds
VaultUnsealed -->|No| OtherIssue
CheckCreds -->|404 or permission denied| Bootstrap
CheckCreds -->|Success| OtherIssue
Solution:
- Verify Vault is unsealed:
docker exec dev-vault-1 vault status
Expected output:
Sealed false
- Check if credentials exist:
VAULT_ADDR=http://localhost:8200 \
VAULT_TOKEN=$(cat ~/.config/vault/root-token) \
vault kv get secret/postgresql
If you get 404 or permission denied, Vault needs bootstrap.
- Run Vault bootstrap:
VAULT_ADDR=http://localhost:8200 \
VAULT_TOKEN=$(cat ~/.config/vault/root-token) \
bash configs/vault/scripts/vault-bootstrap.sh
Expected output:
✓ KV secrets engine enabled
✓ Root CA created
✓ Intermediate CA created
✓ Certificate roles created (9 services)
✓ Credentials generated and stored
✓ Policies created
✓ CA certificates exported
- Restart failing services:
docker compose restart postgres redis-1 redis-2 redis-3 mysql mongodb rabbitmq
- Verify all services are healthy:
docker ps --format "table {{.Names}}\t{{.Status}}"
All services should show Up and healthy within 60 seconds.
Prevention:
Add automatic bootstrap check to devstack start (see Future Enhancements).
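Until that lands, the check can be scripted by hand. A sketch of a hypothetical pre-flight hook (assumes VAULT_ADDR and VAULT_TOKEN are exported as in the steps above):

```shell
# Succeeds (exit 0) when the given probe command FAILS,
# i.e. when the secret cannot be read and bootstrap is needed.
needs_bootstrap() {
  ! "$@" >/dev/null 2>&1
}

if needs_bootstrap vault kv get secret/postgresql; then
  echo "credentials missing - run: bash configs/vault/scripts/vault-bootstrap.sh"
fi
```

Wiring this into a wrapper before `docker compose up` would make the restart loop above impossible to hit.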
Symptoms:
$ ./devstack start
ERRO[0000] error starting vm: error creating instance
Common Causes:
- Insufficient disk space
# Check available space
df -h ~
# Solution: Free up disk space or adjust COLIMA_DISK
COLIMA_DISK=100 ./devstack start
- VZ framework issue (macOS Ventura+)
# Check Colima version
colima version
# Update Colima if outdated
brew upgrade colima
# Restart with fresh config
colima delete
./devstack start
- Port conflicts
# Check if ports are already in use
lsof -i :8200 # Vault
lsof -i :5432 # PostgreSQL
lsof -i :3001 # Grafana
# Stop conflicting services or change ports in docker-compose.yml
Symptoms:
$ docker compose up -d
ERROR: yaml.parser.ParserError
Solutions:
- YAML syntax error:
# Validate docker-compose.yml
docker compose config
# Look for the error line and fix indentation/syntax
- Invalid environment variables:
# Check .env file exists and has correct format
cat .env
# No spaces around = signs
# Correct: VAR=value
# Incorrect: VAR = value
- Missing Docker network:
# Create network manually if needed
docker network create dev-services --subnet 172.20.0.0/16
Check health status:
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
Inspect specific service health:
docker inspect dev-postgres-1 --format='{{json .State.Health}}' | jq
View health check logs:
docker logs dev-postgres-1 --tail 50
Diagnostic:
# Check if PostgreSQL is running
docker exec dev-postgres-1 pg_isready -U postgres
# Check if credentials were fetched
docker logs dev-postgres-1 | grep "Fetched credentials"
# Try connecting manually
docker exec -it dev-postgres-1 psql -U dev_admin -d dev_database
Common fixes:
- Vault not bootstrapped (see Vault Bootstrap)
- Incorrect credentials in Vault
- Database initialization failed (check logs)
Diagnostic:
# Check cluster status
docker exec dev-redis-1 redis-cli -a $(VAULT_ADDR=http://localhost:8200 VAULT_TOKEN=$(cat ~/.config/vault/root-token) vault kv get -field=password secret/redis-1) cluster info
# Check cluster nodes
docker exec dev-redis-1 redis-cli -a <password> cluster nodes
Common issues:
- Not all 3 nodes started yet (wait 30s)
- Cluster not formed (check configs/redis/scripts/init.sh)
- Network issue between nodes (check docker network inspect dev-services)
Diagnostic:
# Check if MySQL is ready
docker exec dev-mysql-1 mysqladmin ping
# Check logs
docker logs dev-mysql-1 | grep -i error
# Try connecting
docker exec -it dev-mysql-1 mysql -u dev_user -p dev_database
Symptoms:
- APIs can't connect to databases
- "Connection refused" or "Host not found" errors
Diagnostic:
graph TD
Start["Network connectivity issue?"]
CheckNetwork["Check: Docker network exists?"]
CheckIPs["Check: Services have static IPs?"]
CheckDNS["Test: DNS resolution works?"]
CheckFirewall["Check: Firewall blocking?"]
Fixed["Issue resolved"]
Start --> CheckNetwork
CheckNetwork -->|No| CreateNetwork["Create network"]
CheckNetwork -->|Yes| CheckIPs
CreateNetwork --> Fixed
CheckIPs -->|No| RestartServices["Restart services"]
CheckIPs -->|Yes| CheckDNS
RestartServices --> Fixed
CheckDNS -->|Fails| CheckFirewall
CheckDNS -->|Works| OtherIssue["Check service logs"]
CheckFirewall --> Fixed
Solutions:
- Verify Docker network:
docker network ls | grep dev-services
docker network inspect dev-services
Expected: 172.20.0.0/16 subnet with all services listed.
- Check service IP assignments:
docker inspect dev-vault-1 --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
# Should be: 172.20.0.5
docker inspect dev-postgres-1 --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
# Should be: 172.20.0.10
- Test DNS resolution:
# From one container to another
docker exec dev-reference-api-1 ping -c 2 postgres
docker exec dev-reference-api-1 ping -c 2 vault
- Test connectivity:
# Test database connectivity from API container
docker exec dev-reference-api-1 nc -zv postgres 5432
docker exec dev-reference-api-1 nc -zv redis-1 6379
- Restart network if needed:
docker compose down
docker network rm dev-services
docker network create dev-services --subnet 172.20.0.0/16
docker compose up -d
Symptoms:
- curl http://localhost:8000 fails
- psql -h localhost -p 5432 times out
Solutions:
- Check port bindings:
docker ps --format "table {{.Names}}\t{{.Ports}}"
Expected: 0.0.0.0:8000->8000/tcp for exposed services.
- Verify Colima VM networking:
colima status
# Check VZ networking is enabled
colima list
- Test with explicit IP:
# Get Colima VM IP
colima list -j | jq -r '.[0].address'
# Try connecting to service via VM IP
curl http://<vm-ip>:8000/health
Error: FATAL: password authentication failed
Solutions:
- Verify credentials in Vault:
VAULT_ADDR=http://localhost:8200 \
VAULT_TOKEN=$(cat ~/.config/vault/root-token) \
vault kv get secret/postgresql
- Check pg_hba.conf:
docker exec dev-postgres-1 cat /var/lib/postgresql/data/pg_hba.conf
Should include:
host all all 172.20.0.0/16 md5
- Reset PostgreSQL:
docker compose stop postgres
docker volume rm devstack-core_postgres_data
docker compose up -d postgres
Error: Access denied for user 'dev_user'@'172.20.0.100'
Solutions:
- Check user grants:
docker exec dev-mysql-1 mysql -u root -p$(VAULT_ADDR=http://localhost:8200 VAULT_TOKEN=$(cat ~/.config/vault/root-token) vault kv get -field=root_password secret/mysql) -e "SELECT user,host FROM mysql.user;"
- Recreate user:
docker exec -it dev-mysql-1 mysql -u root -p
mysql> DROP USER IF EXISTS 'dev_user'@'%';
mysql> CREATE USER 'dev_user'@'%' IDENTIFIED BY '<password_from_vault>';
mysql> GRANT ALL PRIVILEGES ON dev_database.* TO 'dev_user'@'%';
mysql> FLUSH PRIVILEGES;
Error: Authentication failed
Solutions:
- Verify MongoDB credentials:
VAULT_ADDR=http://localhost:8200 \
VAULT_TOKEN=$(cat ~/.config/vault/root-token) \
vault kv get secret/mongodb
- Test connection:
docker exec -it dev-mongodb-1 mongosh -u dev_user -p <password> --authenticationDatabase admin dev_database
- Reset MongoDB:
docker compose stop mongodb
docker volume rm devstack-core_mongodb_data
docker compose up -d mongodb
Symptoms:
$ docker logs dev-redis-1 | grep cluster
[ERR] Node is not in cluster mode
Solutions:
- Check all 3 nodes are running:
docker ps | grep redis
All three (redis-1, redis-2, redis-3) must be healthy.
- Check cluster configuration:
docker exec dev-redis-1 redis-cli -a <password> cluster info
Expected:
cluster_state:ok
cluster_slots_assigned:16384
cluster_known_nodes:3
- Manually create cluster:
docker exec dev-redis-1 redis-cli -a <password> --cluster create \
172.20.0.13:6379 \
172.20.0.16:6379 \
172.20.0.17:6379 \
--cluster-replicas 0
- Reset cluster:
docker compose stop redis-1 redis-2 redis-3
docker volume rm devstack-core_redis_data_1 devstack-core_redis_data_2 devstack-core_redis_data_3
docker compose up -d redis-1 redis-2 redis-3
Error: MOVED 12345 172.20.0.16:6379
Explanation: This is normal behavior for Redis Cluster. Clients must follow redirects.
Solutions:
- Use cluster-aware client:
# Python example
from redis.cluster import RedisCluster
rc = RedisCluster(host='172.20.0.13', port=6379, password='...')
- Test cluster operations:
docker exec dev-redis-1 redis-cli -a <password> -c
127.0.0.1:6379> SET mykey myvalue
-> Redirected to slot [14687] located at 172.20.0.16:6379
OK
Note the -c flag enables cluster mode (follows redirects).
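Non-cluster-aware clients see the raw MOVED error and must follow the redirect themselves. Parsing it is simple string splitting; a minimal sketch in plain shell:

```shell
# Split "MOVED <slot> <host>:<port>" into its parts
parse_moved() {
  set -- $1              # word-split the error string: MOVED / slot / host:port
  slot=$2
  host=${3%:*}           # drop the trailing :port
  port=${3##*:}          # keep only the port
}

parse_moved "MOVED 12345 172.20.0.16:6379"
echo "slot=$slot host=$host port=$port"
```

A client would then reconnect to that host:port and retry the command; cluster-aware libraries do exactly this internally.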
Symptoms:
$ curl http://localhost:8200/v1/sys/health
{"sealed":true}
Solutions:
- Check Vault status:
docker exec dev-vault-1 vault status
- Unseal Vault manually:
docker exec dev-vault-1 vault operator unseal $(cat ~/.config/vault/unseal-key-1)
docker exec dev-vault-1 vault operator unseal $(cat ~/.config/vault/unseal-key-2)
docker exec dev-vault-1 vault operator unseal $(cat ~/.config/vault/unseal-key-3)
- Or use auto-unseal script:
docker exec dev-vault-1 /vault/scripts/vault-auto-unseal.sh
- Restart Vault (auto-unseals on startup):
docker compose restart vault
Error: permission denied
Solutions:
- Check token TTL:
VAULT_ADDR=http://localhost:8200 \
VAULT_TOKEN=$(cat ~/.config/vault/root-token) \
vault token lookup
- Use root token:
# Root token never expires
export VAULT_TOKEN=$(cat ~/.config/vault/root-token)
- Create new token:
VAULT_ADDR=http://localhost:8200 \
VAULT_TOKEN=$(cat ~/.config/vault/root-token) \
vault token create -policy=admin
Error: * permission denied
Diagnostic:
# Check if secret exists
VAULT_ADDR=http://localhost:8200 \
VAULT_TOKEN=$(cat ~/.config/vault/root-token) \
vault kv list secret/
# Try reading specific secret
vault kv get secret/postgresql
Solutions:
- If secret doesn't exist - run bootstrap:
VAULT_ADDR=http://localhost:8200 \
VAULT_TOKEN=$(cat ~/.config/vault/root-token) \
bash configs/vault/scripts/vault-bootstrap.sh
- If permission denied - check policy:
vault token lookup
# Check "policies" field includes necessary policies
Symptoms:
$ docker compose logs postgres
Error: AppRole credentials not found: /vault-approles/postgres/role-id
Cause: AppRole credentials not generated or not mounted properly
Solutions:
- Run vault-bootstrap to generate AppRole credentials:
export VAULT_ADDR=http://localhost:8200
export VAULT_TOKEN=$(cat ~/.config/vault/root-token)
./devstack vault-bootstrap
- Verify AppRole credentials exist:
ls -la ~/.config/vault/approles/postgres/
# Should show: role-id and secret-id files
- Check file permissions:
chmod 600 ~/.config/vault/approles/postgres/secret-id
chmod 644 ~/.config/vault/approles/postgres/role-id
- Verify volume mount in docker-compose.yml:
volumes:
- ${HOME}/.config/vault/approles/postgres:/vault-approles/postgres:ro
Symptoms:
$ docker compose logs postgres
Error: AppRole authentication failed: invalid role_id
Cause: AppRole not created in Vault or mismatched credentials
Solutions:
- Verify AppRole exists in Vault:
export VAULT_ADDR=http://localhost:8200
export VAULT_TOKEN=$(cat ~/.config/vault/root-token)
vault list auth/approle/role
# Should show: postgres, mysql, mongodb, redis, rabbitmq, forgejo, reference-api
- Regenerate AppRole for specific service:
./devstack vault-bootstrap
# Or manually:
vault write auth/approle/role/postgres \
token_ttl=1h \
token_max_ttl=4h \
policies=postgres-policy
- Get new role-id and secret-id:
# Get role-id
vault read auth/approle/role/postgres/role-id
# Generate new secret-id
vault write -f auth/approle/role/postgres/secret-id
Symptoms:
$ docker compose logs postgres
Error: permission denied (token expired)
Cause: Service token TTL is 1 hour, so tokens may expire if a service runs long-term
Solutions:
- Restart service to get new token:
docker compose restart postgres
- Check token TTL policy:
vault read auth/approle/role/postgres
# Look for: token_ttl and token_max_ttl
- For long-running services, consider token renewal in the init script
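A renewal loop inside an init script might look like the sketch below (hypothetical; assumes the vault CLI and a valid VAULT_TOKEN inside the container, and renews at half the 1h TTL so one missed renewal isn't fatal):

```shell
# Renew at half the token TTL
TOKEN_TTL_SECONDS=3600
RENEW_INTERVAL=$(( TOKEN_TTL_SECONDS / 2 ))

renew_loop() {
  while true; do
    sleep "$RENEW_INTERVAL"
    # 'vault token renew' renews the token in VAULT_TOKEN; on failure the
    # service restart path re-authenticates via AppRole as described above
    vault token renew >/dev/null 2>&1 \
      || echo "token renewal failed; restart will re-authenticate" >&2
  done
}
# renew_loop &   # run in the background alongside the service process
```

Note the renewal can only extend the token up to token_max_ttl (4h above); past that, a fresh AppRole login is required.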
Symptoms:
$ docker exec dev-postgres psql -U devuser -c "SELECT 1"
Error: Could not fetch MySQL credentials from Vault
Cause: This is expected behavior - each service can only access its own secrets
Explanation:
AppRole policies enforce least-privilege access. PostgreSQL's AppRole can only read secret/postgres, not secret/mysql.
If you need cross-service access (NOT recommended):
# Modify policy (development only!)
vault policy write postgres-policy - <<EOF
path "secret/data/postgres" {
capabilities = ["read"]
}
path "secret/data/mysql" {
capabilities = ["read"]
}
EOF
Better solution: Use a service mesh or API gateway for cross-service communication.
Symptoms:
$ ls ~/.config/vault/approles/postgres/
ls: cannot access: No such file or directory
Cause: Vault data not persisted or vault-bootstrap not run after a fresh install
Solutions:
- Check if Vault data volume exists:
docker volume ls | grep vault
# Should show: devstack-core_vault_data
- Re-run vault-bootstrap:
./devstack vault-init # If Vault is uninitialized
./devstack vault-bootstrap # Regenerate AppRole credentials
- Backup AppRole credentials (recommended):
# Backup
tar -czf vault-approles-backup-$(date +%Y%m%d).tar.gz ~/.config/vault/approles/
# Restore
tar -xzf vault-approles-backup-YYYYMMDD.tar.gz -C ~/
Symptoms:
$ docker compose logs postgres
Error reading secret/postgres: permission denied
Cause: Service policy not attached to AppRole or policy doesn't allow read
Solutions:
- Verify policy exists:
vault policy list
# Should include: postgres-policy
- Check policy contents:
vault policy read postgres-policy
# Should show path "secret/data/postgres" with read capability
- Verify AppRole has policy attached:
vault read auth/approle/role/postgres
# Check "policies" field includes "postgres-policy"
- Re-create policy and AppRole:
./devstack vault-bootstrap
Symptoms:
$ docker compose up postgres
Error: exec /init/init-approle.sh: no such file or directory
Cause: Volume mount missing or script doesn't exist
Solutions:
- Verify script exists:
ls -la configs/postgres/scripts/init-approle.sh
- Check docker-compose.yml volume mount:
volumes:
- ./configs/postgres/scripts/init-approle.sh:/init/init-approle.sh:ro
- Ensure script has execute permissions:
chmod +x configs/postgres/scripts/init-approle.sh
Enable debug logging:
# In init-approle.sh, add before authentication:
set -x # Enable debug mode
# View full authentication flow:
docker compose logs postgres 2>&1 | grep -A 10 "AppRole"
Manual AppRole test:
# Get credentials from files
ROLE_ID=$(cat ~/.config/vault/approles/postgres/role-id)
SECRET_ID=$(cat ~/.config/vault/approles/postgres/secret-id)
# Test login
curl -X POST http://localhost:8200/v1/auth/approle/login \
-d "{\"role_id\":\"$ROLE_ID\",\"secret_id\":\"$SECRET_ID\"}"
# Should return: {"auth":{"client_token":"hvs.CAESIE..."}}
Check which services use AppRole:
# List all AppRole credentials
ls -la ~/.config/vault/approles/
# Should show: postgres, mysql, mongodb, redis, rabbitmq, forgejo, reference-api
# Services NOT using AppRole (use VAULT_TOKEN):
# - pgbouncer, api-first, golang-api, nodejs-api, rust-api
# - redis-exporter-1/2/3, vector
Symptoms:
$ docker ps
Cannot connect to the Docker daemon
Solutions:
- Check Colima status:
colima status
- Start Colima:
colima start
Or use the manage script:
./devstack start
- If stuck, force restart:
colima stop -f
colima startSymptoms:
$ docker system df
Images 50GB
Containers 10GB
Volumes 20GB
Solutions:
- Clean up unused resources:
# Remove unused containers, networks, images
docker system prune -a
# Remove unused volumes (WARNING: deletes data!)
docker volume prune
- Remove specific volumes:
# List volumes
docker volume ls
# Remove specific volume
docker volume rm devstack-core_prometheus_data
- Increase Colima disk:
colima stop
# Edit ~/.colima/default/colima.yaml
# Change disk: 100G
colima start
Symptoms:
- Containers are slow
- High CPU usage
Solutions:
- Increase Colima resources:
# Stop Colima
colima stop
# Start with more resources
COLIMA_CPU=8 COLIMA_MEMORY=16 ./devstack start
- Check resource usage:
docker stats
- Limit individual containers:
# In docker-compose.yml
services:
postgres:
deploy:
resources:
limits:
cpus: '2'
memory: 2G
Common test failures and solutions:
- Test: Vault health check
# Manual test
curl -s http://localhost:8200/v1/sys/health | jq
# Should return: "sealed": false
Fix: Unseal Vault (see Vault is Sealed)
- Test: Service connectivity
# Manual test
docker exec dev-reference-api-1 curl -s http://postgres:5432
Fix: Check network connectivity (see Network Issues)
- Test: Redis cluster
# Run specific test
bash tests/infrastructure/test_redis_cluster.sh
Fix: See Redis Cluster Issues
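Infrastructure tests can fail transiently while services finish starting. A generic retry wrapper (a sketch, not part of the repo's tooling) avoids declaring failure on the first timing-sensitive attempt:

```shell
# retry <attempts> <command...> : rerun a command until it succeeds,
# pausing 2s between attempts, and fail only after all attempts
retry() {
  n=$1; shift
  i=1
  while ! "$@"; do
    [ "$i" -ge "$n" ] && return 1
    i=$(( i + 1 ))
    sleep 2
  done
}

# Example: retry 3 bash tests/infrastructure/test_redis_cluster.sh
```

Persistent failure across retries points to a real problem in the sections above rather than startup timing.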
Running specific test suites:
- FastAPI unit tests:
cd reference-apps/fastapi
pytest tests/ -v
- With coverage:
pytest tests/ --cov=app --cov-report=term-missing
- Specific test file:
pytest tests/test_api.py::test_health_check -v
Common issues:
- Import errors: Check virtual environment is activated
- Connection errors: Ensure services are healthy
- Credential errors: Run Vault bootstrap
Symptoms:
$ docker compose build reference-api
ERROR: Cannot install -r requirements.txt (line 32) and pytest==9.0.0 because these package versions have conflicting dependencies.
The user requested pytest==9.0.0
pytest-asyncio 1.2.0 depends on pytest<9 and >=8.2
# OR
ERROR: Cannot install fastapi-cache2[redis]==0.2.2 and redis==7.0.1 because these package versions have conflicting dependencies.
fastapi-cache2[redis] 0.2.2 depends on redis<5.0.0 and >=4.2.0rc1; extra == "redis"
Cause:
- Pinned dependency versions with incompatible requirements
- pytest-asyncio 1.2.0 incompatible with pytest 9.0.0
- fastapi-cache2[redis] extra requires redis<5.0.0 but we need redis 7.0.1
Solution (without downgrades):
- Fix pytest-asyncio conflict:
# In requirements.txt, change:
pytest-asyncio==1.2.0
# To (let pip resolve compatible version):
pytest-asyncio # Let pip resolve compatible version
- Fix fastapi-cache2 redis conflict:
# In requirements.txt, change:
fastapi-cache2[redis]==0.2.2
# To (install without redis extra):
fastapi-cache2==0.2.2 # Install without [redis] extra to avoid dependency conflict
- Rebuild Docker image:
docker compose build reference-api
docker compose up -d reference-api
Result: Build succeeds while maintaining pytest 9.0.0 and redis 7.0.1
Reference: Fixed in November 2025 - see reference-apps/fastapi/requirements.txt
Symptoms:
$ curl http://localhost:8000/health/all
{"error":"InternalServerError","message":"An unexpected error occurred","status_code":500}
# Logs show:
AssertionError: You must call init first!
Cause:
- Redis connection fails at startup (e.g., Redis not running or credentials missing)
- Startup exception handler catches Redis error but doesn't initialize FastAPICache
- /health/all endpoint has a @cache() decorator
- Decorator tries to use uninitialized FastAPICache → AssertionError
Solution:
Add InMemoryBackend fallback in startup exception handler:
# In reference-apps/fastapi/app/main.py startup event
try:
# Try to initialize Redis cache
redis_creds = await vault_client.get_secret("redis-1")
redis_password = redis_creds.get("password", "")
redis_url = f"redis://:{redis_password}@{settings.REDIS_HOST}:{settings.REDIS_PORT}"
await cache_manager.init(redis_url, prefix="cache:")
except Exception as e:
logger.error(f"Failed to initialize cache: {e}")
logger.warning("Application will continue without caching")
# Initialize FastAPICache with InMemoryBackend as fallback to prevent decorator errors
from fastapi_cache import FastAPICache
from fastapi_cache.backends.inmemory import InMemoryBackend
FastAPICache.init(InMemoryBackend(), prefix="cache:")
Result: Endpoint returns 200 with "degraded" status instead of crashing
Verification:
# Test endpoint
curl http://localhost:8000/health/all | jq
# Should return:
{
"status": "degraded", # or "healthy" if all services up
"services": { ... }
}
Reference: Fixed in November 2025 - see reference-apps/fastapi/app/main.py:369-375
Running Go tests:
cd reference-apps/golang
go test ./... -v
Common issues:
- Build errors:
go mod tidy
go mod download
- Test timeout:
go test ./... -v -timeout 5m
Diagnostic:
- Check API metrics:
curl http://localhost:8000/metrics | grep http_request_duration
- Check database connection pooling:
docker logs dev-reference-api-1 | grep "pool"
- Monitor with Grafana:
- Open http://localhost:3001
- Check "Application Metrics" dashboard
- Look for slow queries or high latency
Solutions:
- Increase connection pool size:
# In reference-apps/fastapi/app/database.py
POOL_SIZE = 20 # Increase from default 10
- Add caching:
# Use Redis for caching
from redis import Redis
cache = Redis(host='redis-1', port=6379)
- Optimize database queries:
# Check slow queries in PostgreSQL
docker exec dev-postgres-1 psql -U dev_admin -d dev_database -c \
"SELECT query, calls, total_time FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;"
Diagnostic:
docker stats --no-stream
Solutions:
- Restart high-memory services:
docker compose restart <service>
- Add memory limits:
# docker-compose.yml
services:
mongodb:
deploy:
resources:
limits:
memory: 2G
- Increase Colima memory:
COLIMA_MEMORY=16 ./devstack start
Symptoms:
$ vault write pki_int/issue/postgres-role common_name="postgres" ttl=1h
Error writing data to pki_int/issue/postgres-role: Error making API request.
URL: PUT http://localhost:8200/v1/pki_int/issue/postgres-role
Code: 400. Errors:
* common name postgres not allowed by this role
Cause:
- Vault PKI role has an allowed_domains configuration
- Common name must match one of the allowed patterns
- For postgres role: allowed domains are postgres.dev-services.local and localhost
- Using bare postgres doesn't match the allowed pattern
Solution:
Use the correct common name matching PKI role configuration:
# Check allowed domains for a role
vault read pki_int/roles/postgres-role
# Should show:
# allowed_domains: [postgres.dev-services.local, localhost]
# Issue certificate with correct common name
vault write pki_int/issue/postgres-role \
common_name="postgres.dev-services.local" \
ttl=1h
# OR for localhost
vault write pki_int/issue/postgres-role \
common_name="localhost" \
ttl=1h
Common allowed domains by service:
- postgres: postgres.dev-services.local, localhost
- mysql: mysql.dev-services.local, localhost
- redis: redis.dev-services.local, localhost
- mongodb: mongodb.dev-services.local, localhost
- rabbitmq: rabbitmq.dev-services.local, localhost
Verification:
# List all PKI roles
vault list pki_int/roles
# Check specific role configuration
vault read pki_int/roles/<service>-role
Reference: Fixed in tests/test-vault.sh:483 (November 2025)
Error: SSL certificate problem: unable to get local issuer certificate
Solutions:
- Check CA certificate exists:
ls -la ~/.config/vault/ca/
# Should have: ca-chain.pem, root-ca.pem, intermediate-ca.pem
- Re-export CA certificates:
VAULT_ADDR=http://localhost:8200 \
VAULT_TOKEN=$(cat ~/.config/vault/root-token) \
bash configs/vault/scripts/vault-bootstrap.sh
- Trust CA certificate (macOS):
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ~/.config/vault/ca/root-ca.pem
- Use CA in API calls:
curl --cacert ~/.config/vault/ca/ca-chain.pem https://localhost:8443/health
Diagnostic:
openssl s_client -connect localhost:8443 -CAfile ~/.config/vault/ca/ca-chain.pem
Solutions:
- Check TLS is enabled in Vault:
VAULT_ADDR=http://localhost:8200 \
VAULT_TOKEN=$(cat ~/.config/vault/root-token) \
vault kv get -field=tls_enabled secret/postgresql
- Verify certificate validity:
docker exec dev-postgres-1 openssl x509 -in /etc/ssl/certs/postgres.crt -text -noout
- Regenerate certificates:
# Re-run bootstrap
VAULT_ADDR=http://localhost:8200 \
VAULT_TOKEN=$(cat ~/.config/vault/root-token) \
bash configs/vault/scripts/vault-bootstrap.sh
# Restart service
docker compose restart postgres
Run all health checks:
bash tests/infrastructure/run_all_tests.sh
Vault:
docker exec dev-vault-1 vault status
docker logs dev-vault-1 --tail 50
curl http://localhost:8200/v1/sys/health | jq
PostgreSQL:
docker exec dev-postgres-1 pg_isready
docker logs dev-postgres-1 --tail 50
docker exec dev-postgres-1 psql -U dev_admin -d dev_database -c "SELECT version();"
Redis Cluster:
docker exec dev-redis-1 redis-cli -a <password> cluster info
docker exec dev-redis-1 redis-cli -a <password> cluster nodes
docker logs dev-redis-1 --tail 50
MySQL:
docker exec dev-mysql-1 mysqladmin ping
docker logs dev-mysql-1 --tail 50
docker exec dev-mysql-1 mysql -u dev_user -p<password> -e "SHOW DATABASES;"
MongoDB:
docker exec dev-mongodb-1 mongosh --eval "db.adminCommand('ping')"
docker logs dev-mongodb-1 --tail 50
RabbitMQ:
docker exec dev-rabbitmq-1 rabbitmqctl status
docker logs dev-rabbitmq-1 --tail 50
curl -u dev_user:<password> http://localhost:15672/api/overview | jq
Check all IPs:
docker network inspect dev-services --format='{{range .Containers}}{{.Name}}: {{.IPv4Address}}{{"\n"}}{{end}}'
Test DNS resolution:
docker exec dev-reference-api-1 nslookup vault
docker exec dev-reference-api-1 nslookup postgres
Test connectivity matrix:
for service in vault postgres mysql mongodb redis-1 rabbitmq; do
echo "Testing $service..."
docker exec dev-reference-api-1 nc -zv $service $(docker port dev-$service-1 | head -1 | cut -d: -f2) 2>&1 | grep -q "succeeded" && echo "✓ $service" || echo "✗ $service"
done
Resource usage:
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"
API response time:
time curl -s http://localhost:8000/health
Database query performance:
docker exec dev-postgres-1 psql -U dev_admin -d dev_database -c \
"SELECT schemaname,tablename,n_tup_ins,n_tup_upd,n_tup_del FROM pg_stat_user_tables;"Planned improvements to reduce troubleshooting:
- Automatic Vault Bootstrap Detection
  - Add check to devstack start
  - Auto-run bootstrap if credentials missing
  - Make startup truly "one command"
- Health Dashboard
  - Quick status showing all 23 services
  - Vault bootstrap status indicator
  - Add to ./devstack status
- Enhanced Error Messages
  - Better service log output
  - Clearer init script errors
  - Actionable error messages
- Automated Recovery
  - Auto-restart unhealthy services
  - Self-healing for common issues
  - Notification system for failures
If you encounter an issue not covered here:
- Check service logs:
docker logs <container-name> --tail 100
- Run diagnostic tests:
bash tests/infrastructure/run_all_tests.sh
- Check GitHub Issues:
  - See if similar issue already reported
  - Open new issue with logs and diagnostics
- Consult related documentation:
- ARCHITECTURE.md - System design
- PERFORMANCE_TUNING.md - Optimization
- VAULT_SECURITY.md - Vault specifics
- TEST_RESULTS.md - Expected test behavior
Last updated: 2025-10-27