crew.aws.batch for province-wide parallel pipeline #76
Open
Description
crew.aws.batch for province-wide habitat pipeline
Vision
Replace local parallel execution with crew.aws.batch workers that query an RDS PostgreSQL instance. 350 watershed groups (WSGs) × 7 species = 2,450 tasks. Spread across 50 workers, the entire province should take ~2 minutes (vs ~9 hours run sequentially on a local machine).
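A quick sanity check of the numbers above, as a sketch that assumes ideal linear scaling (no worker startup, queueing, or database contention). Dividing the ~9 hour sequential figure evenly over 50 workers gives roughly 10.8 minutes, so the ~2 minute target corresponds to about 2.4 s per task. The per-task times here are derived from the figures in this issue, not measured.

```python
import math

WSGS = 350
SPECIES = 7
WORKERS = 50

tasks = WSGS * SPECIES  # 2,450 WSG x species combinations


def wall_clock_minutes(seconds_per_task, workers=WORKERS, n_tasks=tasks):
    """Ideal wall-clock time: tasks run in waves of `workers`, zero overhead."""
    waves = math.ceil(n_tasks / workers)
    return waves * seconds_per_task / 60


# Per-task time implied by ~9 hours of sequential local execution.
seq_seconds_per_task = 9 * 3600 / tasks

# Per-task time that a ~2 minute run on 50 workers would imply.
target_seconds_per_task = 2 * 60 / math.ceil(tasks / WORKERS)

print(f"tasks: {tasks}")
print(f"sequential rate: {seq_seconds_per_task:.1f} s/task")
print(f"50 workers at that rate: {wall_clock_minutes(seq_seconds_per_task):.1f} min")
print(f"rate needed for ~2 min: {target_seconds_per_task:.2f} s/task")
```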
Architecture
Controller (laptop or EC2)
└── crew.aws.batch launcher
    ├── Worker 1 (Batch) ──→ RDS PostgreSQL (fwapg)
    ├── Worker 2 (Batch) ──→ ...
    └── Worker 50 (Batch) ──→ RDS PostgreSQL (fwapg)
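The fan-out in the diagram can be sketched locally. This is only an illustration of the task grid and dispatch pattern: it uses a Python thread pool as a stand-in for the crew.aws.batch launcher, and `run_task`, the `WSG001`-style codes, and the `SP1`-style codes are hypothetical placeholders, not real identifiers from fwapg or fresh.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

N_WSGS, N_SPECIES, N_WORKERS = 350, 7, 50

# Placeholder identifiers; real WSG and species codes would come from the database.
wsgs = [f"WSG{i:03d}" for i in range(1, N_WSGS + 1)]
species = [f"SP{i}" for i in range(1, N_SPECIES + 1)]


def run_task(task):
    """Hypothetical stand-in for one habitat run against the shared database."""
    wsg, sp = task
    return f"{wsg}:{sp}"


# One task per (WSG, species) pair, handed to a fixed pool of workers.
tasks = list(product(wsgs, species))

with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    results = list(pool.map(run_task, tasks))

print(len(results))
```

Each task is independent and only reads from the shared database, which is why the controller can hand tasks to whichever worker is free.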
What fresh needs
- frs_habitat() already supports the workers param: swap mirai for a crew controller
- Workers need frs_db_conn() params passed via crew's data argument
- Docker image: current docker/Dockerfile + R + fresh installed → push to ECR
- break_sources tables live in RDS, all workers read them
What awshak needs
- RDS/Aurora PostgreSQL with fwapg loaded
- Batch compute environment (Fargate or EC2)
- ECR repository for the worker Docker image
- VPC, security groups, IAM roles
- See NewGraphEnvironment/awshak#64
Cost estimate
- 50 workers × 2 min = 100 vCPU-minutes ≈ $0.50/run (Batch)
- RDS db.r6g.xlarge (~$0.40/hr) — shared infrastructure
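The arithmetic behind these estimates, as a rough sketch. It assumes 1 vCPU per worker (implied by 50 workers × 2 min = 100 vCPU-minutes) and takes the $0.50/run and $0.40/hr figures from this issue as given. The usage numbers passed to `monthly_cost` are hypothetical.

```python
WORKERS = 50
RUN_MINUTES = 2
VCPUS_PER_WORKER = 1        # assumption: implied by the 100 vCPU-minute figure
BATCH_COST_PER_RUN = 0.50   # USD, estimate from this issue
RDS_HOURLY = 0.40           # USD/hr, db.r6g.xlarge estimate from this issue

vcpu_minutes = WORKERS * RUN_MINUTES * VCPUS_PER_WORKER

# Effective price per vCPU-hour that the $0.50/run estimate implies.
implied_rate_per_vcpu_hour = BATCH_COST_PER_RUN / (vcpu_minutes / 60)

# Share of the RDS hourly price attributable to a single 2 minute run.
rds_cost_during_run = RDS_HOURLY * RUN_MINUTES / 60


def monthly_cost(runs_per_month, rds_hours_per_month):
    """Batch cost scales with runs; RDS cost scales with hours the instance is up."""
    return runs_per_month * BATCH_COST_PER_RUN + rds_hours_per_month * RDS_HOURLY


print(vcpu_minutes)
print(round(implied_rate_per_vcpu_hour, 2))
print(round(rds_cost_during_run, 4))
# Hypothetical usage: 10 runs in a month with RDS up for 730 hours.
print(round(monthly_cost(10, 730), 2))
```

If the RDS instance stays up continuously, its hourly charge dominates the per-run Batch cost, which is worth keeping in mind given that it is shared infrastructure.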
Prerequisites
- mirai integration in fresh (this comes first)
- awshak infrastructure (RDS, Batch, ECR)
Blocked by
- awshak infrastructure issue (to be filed)