Seamless AWS cloud bursting for parallel R workloads
staRburst lets you run parallel R code on AWS with zero infrastructure management. Scale from your laptop to 100+ cloud workers with a single function call. Supports both EC2 (recommended for performance and cost) and Fargate (serverless) backends.
- Simple Setup: One-time configuration (~2 minutes), then seamless operation
- Simple API: Direct `starburst_map()` function - no new concepts to learn
- Flexible Backends: EC2 (recommended - faster, cheaper, spot support) and Fargate (serverless)
- Detached Sessions: Submit long-running jobs and detach - retrieve results anytime
- Automatic Environment Sync: Your packages and dependencies automatically available on workers
- Smart Quota Management: Automatically handles AWS quota limits with wave execution
- Cost Transparent: See estimated and actual costs for every run
- Auto Cleanup: Workers shut down automatically when done
CRAN submission in progress for v0.3.6 (expected within 2-4 weeks).
Once available:
install.packages("starburst")

Development version from GitHub:

remotes::install_github("scttfrdmn/starburst")

library(starburst)
# One-time setup (2 minutes)
starburst_setup()
# Run parallel computation on AWS
results <- starburst_map(
1:1000,
function(x) expensive_computation(x),
workers = 50
)
#> 🚀 Starting starburst cluster with 50 workers
#> 💰 Estimated cost: ~$2.80/hour
#> 📊 Processing 1000 items with 50 workers
#> 📦 Created 50 chunks (avg 20 items per chunk)
#> 🚀 Submitting tasks...
#> ✓ Submitted 50 tasks
#> ⏳ Progress: 50/50 tasks (3.2 minutes elapsed)
#>
#> ✓ Completed in 3.2 minutes
#> 💰 Estimated cost: $0.15

library(starburst)
# Define simulation
simulate_portfolio <- function(seed) {
set.seed(seed)
returns <- rnorm(252, mean = 0.0003, sd = 0.02)
prices <- cumprod(1 + returns)
list(
final_value = prices[252],
sharpe_ratio = mean(returns) / sd(returns) * sqrt(252)
)
}
# Run 10,000 simulations on 100 AWS workers
results <- starburst_map(
1:10000,
simulate_portfolio,
workers = 100
)
#> 🚀 Starting starburst cluster with 100 workers
#> 💰 Estimated cost: ~$5.60/hour
#> 📊 Processing 10000 items with 100 workers
#> ⏳ Progress: 100/100 tasks (3.1 minutes elapsed)
#>
#> ✓ Completed in 3.1 minutes
#> 💰 Estimated cost: $0.29
# Extract results
final_values <- sapply(results, function(x) x$final_value)
sharpe_ratios <- sapply(results, function(x) x$sharpe_ratio)
# Summary
mean(final_values) # Average portfolio outcome
quantile(final_values, c(0.05, 0.95)) # Risk range
# Comparison:
# Local (single core): ~4 hours
# Cloud (100 workers): 3 minutes, $0.29

# Create cluster once
cluster <- starburst_cluster(workers = 50, cpu = 4, memory = "8GB")
# Run multiple analyses
results1 <- cluster$map(dataset1, analysis_function)
results2 <- cluster$map(dataset2, processing_function)
results3 <- cluster$map(dataset3, modeling_function)
# All use the same Docker image and configuration

# For memory-intensive workloads
results <- starburst_map(
large_datasets,
memory_intensive_function,
workers = 20,
cpu = 8,
memory = "16GB"
)
# For CPU-intensive workloads
results <- starburst_map(
cpu_tasks,
cpu_intensive_function,
workers = 50,
cpu = 4,
memory = "8GB"
)

Run long jobs and disconnect - results persist in S3:
# Start detached session
session <- starburst_session(workers = 50, detached = TRUE)
# Submit work and get session ID
session$submit(quote({
results <- starburst_map(huge_dataset, expensive_function)
saveRDS(results, "results.rds")
}))
session_id <- session$session_id
# Disconnect - job continues running
# Later (hours/days), reconnect:
session <- starburst_session_attach(session_id)
status <- session$status() # Check progress
results <- session$collect() # Get results
# Cleanup when done
session$cleanup(force = TRUE)

- Environment Snapshot: Captures your R packages using renv
- Container Build: Creates Docker image with your environment, cached in ECR
- Task Distribution: Splits data into chunks across workers
- Task Submission: Launches Fargate tasks (or sequential batches if quota-limited)
- Data Transfer: Serializes task data to S3 using fast qs format
- Execution: Workers pull data, execute function on chunk items, push results
- Result Collection: Downloads and combines results in correct order
- Cleanup: Automatically shuts down workers
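The chunking and serialization steps above can be sketched roughly as follows. This is a hypothetical illustration, not staRburst's actual internals; the `make_chunks` helper is invented for this sketch, and the `qs` package is the serialization format the pipeline names:

```r
# Split inputs into one chunk per worker, as in the Task Distribution step
make_chunks <- function(x, workers) {
  split(x, cut(seq_along(x), breaks = workers, labels = FALSE))
}

chunks <- make_chunks(1:1000, workers = 50)
length(chunks)        # 50 chunks
lengths(chunks)[[1]]  # 20 items in the first chunk

# Serialize one chunk as a worker payload with qs (staRburst would
# upload the file to S3; skipped here if qs is not installed)
if (requireNamespace("qs", quietly = TRUE)) {
  payload <- tempfile(fileext = ".qs")
  qs::qsave(chunks[[1]], payload)
  stopifnot(identical(qs::qread(payload), chunks[[1]]))
}
```

Because each chunk keeps its index as its name, results can be reassembled in the original order during Result Collection regardless of which worker finishes first.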
# Set cost limits
starburst_config(
max_cost_per_job = 10, # Hard limit
cost_alert_threshold = 5 # Warning at $5
)
# Costs shown transparently
results <- starburst_map(data, fn, workers = 100)
#> 💰 Estimated cost: ~$3.50/hour
#> ✓ Completed in 23 minutes
#> 💰 Estimated cost: $1.34

staRburst automatically handles AWS Fargate quota limitations:
results <- starburst_map(data, fn, workers = 100, cpu = 4)
#> ⚠ Requested 100 workers (400 vCPUs) but quota allows 25 workers (100 vCPUs)
#> ⚠ Using 25 workers instead
#> 💰 Estimated cost: ~$1.40/hour

Your work still completes, just with fewer workers. You can request quota increases through AWS Service Quotas.
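The worker clamp in that output follows directly from the quota arithmetic. A minimal sketch of the idea, using a hypothetical `plan_waves` helper that is not part of the staRburst API:

```r
# Clamp workers to what the vCPU quota allows, then run the
# remaining tasks in sequential waves
plan_waves <- function(tasks, workers, cpu, vcpu_quota) {
  max_workers <- min(workers, vcpu_quota %/% cpu)
  list(workers = max_workers, waves = ceiling(tasks / max_workers))
}

# 100 requested workers at 4 vCPUs each against a 100-vCPU quota,
# matching the example output: clamped to 25 workers
plan <- plan_waves(tasks = 100, workers = 100, cpu = 4, vcpu_quota = 100)
plan$workers  # 25
plan$waves    # 4
```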
- `starburst_map(.x, .f, workers, ...)` - Parallel map over data
- `starburst_cluster(workers, cpu, memory)` - Create reusable cluster
- `starburst_setup()` - Initial AWS configuration
- `starburst_config(...)` - Update configuration
- `starburst_status()` - Check cluster status
starburst_config(
region = "us-east-1",
max_cost_per_job = 10,
cost_alert_threshold = 5
)

Full documentation available at starburst.ing
- Getting Started Guide
- Detached Sessions
- Example Vignettes
- API Reference
- Security Guide
- Troubleshooting
| Feature | staRburst | RStudio Server on EC2 | Coiled (Python) |
|---|---|---|---|
| Setup time | 2 minutes | 30+ minutes | 5 minutes |
| Infrastructure management | Zero | Manual | Zero |
| Learning curve | Minimal | Medium | Medium |
| Auto scaling | Yes | No | Yes |
| Cost optimization | Automatic | Manual | Automatic |
| R-native | Yes | Yes | No (Python) |
- R >= 4.0
- AWS account with:
  - AWS CLI configured or `AWS_PROFILE` set
  - IAM permissions for ECS, ECR, S3, VPC
  - Two IAM roles (created during setup):
    - `starburstECSExecutionRole` - for ECS/ECR access
    - `starburstECSTaskRole` - for S3 access
For detailed setup instructions, see the Getting Started guide.
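If setup cannot find credentials, one option is pointing the AWS SDK at a named profile from your R session before running `starburst_setup()`. The profile name below is a placeholder; substitute one from your `~/.aws/credentials` file:

```r
# Point the AWS SDK at a named profile and region before setup
# ("my-profile" is a placeholder, not a real profile name)
Sys.setenv(AWS_PROFILE = "my-profile")
Sys.setenv(AWS_REGION = "us-east-1")
Sys.getenv("AWS_PROFILE")  # "my-profile"
```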
- ✅ Direct API (`starburst_map`, `starburst_cluster`)
- ✅ AWS Fargate integration
- ✅ EC2 backend support with spot instances
- ✅ Detached session mode for long-running jobs
- ✅ Automatic environment management
- ✅ Cost tracking and quota handling
- ✅ Full `future` backend integration
- ✅ Support for `future.apply`, `furrr`, `targets`
- ✅ Comprehensive AWS integration testing
- ✅ CRAN-ready (0 errors, 0 notes)
- Performance optimizations
- Enhanced error recovery
- Interactive progress monitoring
- Multi-region support
Contributions welcome! See the GitHub repository for contribution guidelines.
Apache License 2.0 - see LICENSE
Copyright 2026 Scott Friedman
@software{starburst,
title = {staRburst: Seamless AWS Cloud Bursting for R},
author = {Scott Friedman},
year = {2026},
version = {0.3.6},
url = {https://starburst.ing},
license = {Apache-2.0}
}

Built using the paws AWS SDK for R.
Container management with renv and rocker.
Inspired by Coiled for Python/Dask.
