Halldyll Starter RunPod


A comprehensive Rust library for managing RunPod GPU pods with automatic provisioning, state management, and orchestration.

Features

  • REST API Client - Create, start, and stop pods via the RunPod REST API
  • GraphQL Client - Full access to the RunPod GraphQL API for advanced operations
  • State Management - Persist pod state and compute idempotent action plans
  • Orchestration - High-level pod management with automatic reconciliation
  • Fully Configurable - All settings via environment variables (.env file)
  • Strict Linting - Production-ready code with comprehensive lint rules

Installation

From crates.io (recommended)

[dependencies]
halldyll_starter_runpod = "0.2"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

From GitHub

[dependencies]
halldyll_starter_runpod = { git = "https://github.com/Mr-soloDev/halldyll-starter" }
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

Configuration

Create a .env file in your project root:

# Required
RUNPOD_API_KEY=your_api_key_here
RUNPOD_IMAGE_NAME=runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel

# Optional - Pod Configuration
RUNPOD_POD_NAME=my-gpu-pod
RUNPOD_GPU_TYPE_IDS=NVIDIA A40
RUNPOD_GPU_COUNT=1
RUNPOD_CONTAINER_DISK_GB=20
RUNPOD_VOLUME_GB=50
RUNPOD_VOLUME_MOUNT_PATH=/workspace
RUNPOD_PORTS=22/tcp,8888/http

# Optional - Timeouts
RUNPOD_HTTP_TIMEOUT_MS=30000
RUNPOD_READY_TIMEOUT_MS=300000
RUNPOD_POLL_INTERVAL_MS=5000

# Optional - API URLs
RUNPOD_REST_URL=https://rest.runpod.io/v1
RUNPOD_GRAPHQL_URL=https://api.runpod.io/graphql

# Optional - Behavior
RUNPOD_RECONCILE_MODE=reuse

Environment Variables Reference

| Variable | Required | Default | Description |
|---|---|---|---|
| RUNPOD_API_KEY | ✓ | - | RunPod API key |
| RUNPOD_IMAGE_NAME | ✓ | - | Container image (e.g., runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel) |
| RUNPOD_POD_NAME | | halldyll-pod | Name for the pod |
| RUNPOD_GPU_TYPE_IDS | | NVIDIA A40 | Comma-separated GPU types (e.g., NVIDIA A40,NVIDIA RTX 4090) |
| RUNPOD_GPU_COUNT | | 1 | Number of GPUs |
| RUNPOD_CONTAINER_DISK_GB | | 20 | Container disk size in GB |
| RUNPOD_VOLUME_GB | | 0 | Persistent volume size in GB (0 = no volume) |
| RUNPOD_VOLUME_MOUNT_PATH | | /workspace | Mount path for the persistent volume |
| RUNPOD_PORTS | | 22/tcp,8888/http | Exposed ports (format: port/protocol) |
| RUNPOD_HTTP_TIMEOUT_MS | | 30000 | HTTP request timeout (ms) |
| RUNPOD_READY_TIMEOUT_MS | | 300000 | Pod ready timeout (ms) |
| RUNPOD_POLL_INTERVAL_MS | | 5000 | Poll interval for readiness checks (ms) |
| RUNPOD_REST_URL | | https://rest.runpod.io/v1 | Base URL for the REST API |
| RUNPOD_GRAPHQL_URL | | https://api.runpod.io/graphql | URL for the GraphQL API |
| RUNPOD_RECONCILE_MODE | | reuse | reuse or recreate existing pods |

Pod Naming & Multiple Pods

The orchestrator uses the pod name to identify and reuse existing pods:

  • Same name → Reuses the existing pod (starts it if stopped)
  • Different name → Creates a new pod

To run multiple pods simultaneously, simply use different names:

# Development pod
RUNPOD_POD_NAME=dev-pod

# Production pod  
RUNPOD_POD_NAME=prod-pod

# ML training pod
RUNPOD_POD_NAME=training-pod

Each unique name creates a separate pod on RunPod.
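
For example, a single process can provision several pods by overriding RUNPOD_POD_NAME before building each orchestrator. This is a minimal sketch, assuming RunpodOrchestratorConfig::from_env() re-reads the process environment on each call (set_var is used here for brevity; separate .env files per deployment are cleaner in practice):

use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    for name in ["dev-pod", "training-pod"] {
        // Assumption: from_env() reads the environment at call time,
        // so each override yields a distinct pod.
        std::env::set_var("RUNPOD_POD_NAME", name);
        let cfg = RunpodOrchestratorConfig::from_env()?;
        let orchestrator = RunpodOrchestrator::new(cfg)?;
        let pod = orchestrator.ensure_ready_pod().await?;
        println!("{} -> pod {}", name, pod.id);
    }
    Ok(())
}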

Usage

Quick Start with Orchestrator

The orchestrator provides the simplest way to get a ready-to-use pod:

use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load config from .env
    let cfg = RunpodOrchestratorConfig::from_env()?;
    let orchestrator = RunpodOrchestrator::new(cfg)?;

    // Get a ready pod (creates, starts, or reuses as needed)
    let pod = orchestrator.ensure_ready_pod().await?;

    println!("Pod ready: {} at {}", pod.name, pod.public_ip);

    // Get SSH connection info
    if let Some((host, port)) = pod.ssh_endpoint() {
        println!("SSH: ssh -p {} user@{}", port, host);
    }

    // Get Jupyter URL
    if let Some(url) = pod.jupyter_endpoint() {
        println!("Jupyter: {}", url);
    }

    Ok(())
}

Stopping & Terminating Pods

use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodOrchestratorConfig::from_env()?;
    let orchestrator = RunpodOrchestrator::new(cfg)?;

    // Get a ready pod
    let pod = orchestrator.ensure_ready_pod().await?;
    println!("Pod running: {}", pod.id);

    // Do your work...

    // Stop the pod (keeps config, can restart later, stops billing)
    orchestrator.stop_pod(&pod.id).await?;
    println!("Pod stopped!");

    // Or stop by name (uses RUNPOD_POD_NAME from .env)
    // orchestrator.stop_current_pod().await?;

    // Or terminate completely (deletes the pod)
    // orchestrator.terminate(&pod.id).await?;
    // orchestrator.terminate_current_pod().await?;

    Ok(())
}

Auto-stop after timeout

use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodOrchestratorConfig::from_env()?;
    let orchestrator = RunpodOrchestrator::new(cfg)?;

    let pod = orchestrator.ensure_ready_pod().await?;
    println!("Pod running for max 1 hour...");

    // Auto-stop after 1 hour
    tokio::select! {
        _ = tokio::time::sleep(Duration::from_secs(3600)) => {
            println!("Timeout reached, stopping pod...");
            orchestrator.stop_pod(&pod.id).await?;
        }
        // Or wait for your task to complete
        // result = your_long_running_task() => { ... }
    }

    Ok(())
}
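
To fill in the commented branch above, here is a sketch that races the workload against the deadline and stops the pod in either case. your_long_running_task is a hypothetical placeholder for your own future; only the orchestrator calls shown earlier are used:

use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};
use std::time::Duration;

// Hypothetical placeholder for your real workload.
async fn your_long_running_task() -> String {
    tokio::time::sleep(Duration::from_secs(10)).await; // pretend to work
    "done".to_string()
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodOrchestratorConfig::from_env()?;
    let orchestrator = RunpodOrchestrator::new(cfg)?;
    let pod = orchestrator.ensure_ready_pod().await?;

    tokio::select! {
        _ = tokio::time::sleep(Duration::from_secs(3600)) => {
            println!("Deadline reached before the task finished");
        }
        result = your_long_running_task() => {
            println!("Task finished first: {}", result);
        }
    }

    // Stop the pod in either case so billing stops.
    orchestrator.stop_pod(&pod.id).await?;
    Ok(())
}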

Low-Level Provisioner

For direct pod creation:

use halldyll_starter_runpod::{RunpodProvisioner, RunpodProvisionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodProvisionConfig::from_env()?;
    let provisioner = RunpodProvisioner::new(cfg)?;

    let pod = provisioner.create_pod().await?;
    println!("Created pod: {}", pod.id);

    Ok(())
}

Pod Starter (Start/Stop)

For managing existing pods:

use halldyll_starter_runpod::{RunpodStarter, RunpodStarterConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodStarterConfig::from_env()?;
    let starter = RunpodStarter::new(cfg)?;

    // Start a pod
    let status = starter.start("pod_id_here").await?;
    println!("Pod status: {}", status.desired_status);

    // Stop a pod
    let status = starter.stop("pod_id_here").await?;
    println!("Pod stopped: {}", status.desired_status);

    Ok(())
}

GraphQL Client

For advanced operations:

use halldyll_starter_runpod::{RunpodClient, RunpodClientConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodClientConfig::from_env()?;
    let client = RunpodClient::new(cfg)?;

    // List all pods
    let pods = client.list_pods().await?;
    for pod in pods {
        println!("Pod: {} ({})", pod.id, pod.desired_status);
    }

    // List available GPU types
    let gpus = client.list_gpu_types().await?;
    for gpu in gpus {
        println!("GPU: {} - Available: {}", gpu.display_name, gpu.available_count);
    }

    Ok(())
}

State Management

For persistent state and reconciliation:

use halldyll_starter_runpod::{RunPodState, JsonFileStateStore, PlannedAction};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store = JsonFileStateStore::new("./pod_state.json");
    
    // Load existing state
    let mut state = store.load()?.unwrap_or_default();

    // Record a pod
    state.record_pod("pod-123", "my-pod", "runpod/pytorch:latest");

    // Compute reconciliation plan
    let action = state.reconcile("my-pod", "runpod/pytorch:latest");
    match action {
        PlannedAction::DoNothing(id) => println!("Pod {} is ready", id),
        PlannedAction::Start(id) => println!("Need to start pod {}", id),
        PlannedAction::Create => println!("Need to create new pod"),
    }

    // Save state
    store.save(&state)?;

    Ok(())
}
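
The planned action can then be wired to the low-level clients from the previous sections. This is a rough sketch of what the orchestrator automates for you, assuming the field and argument types match the examples above:

use halldyll_starter_runpod::{
    JsonFileStateStore, PlannedAction, RunpodProvisionConfig, RunpodProvisioner,
    RunpodStarter, RunpodStarterConfig,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store = JsonFileStateStore::new("./pod_state.json");
    let mut state = store.load()?.unwrap_or_default();

    // Turn the plan into API calls (roughly what the orchestrator automates).
    match state.reconcile("my-pod", "runpod/pytorch:latest") {
        PlannedAction::Create => {
            let provisioner = RunpodProvisioner::new(RunpodProvisionConfig::from_env()?)?;
            let pod = provisioner.create_pod().await?;
            state.record_pod(&pod.id, "my-pod", "runpod/pytorch:latest");
        }
        PlannedAction::Start(id) => {
            let starter = RunpodStarter::new(RunpodStarterConfig::from_env()?)?;
            starter.start(&id).await?;
        }
        PlannedAction::DoNothing(id) => println!("Pod {} is already running", id),
    }

    store.save(&state)?;
    Ok(())
}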

Modules

| Module | Description |
|---|---|
| runpod_provisioner | Create new pods via the REST API |
| runpod_starter | Start/stop existing pods via the REST API |
| runpod_state | State persistence and reconciliation |
| runpod_client | GraphQL client for advanced operations |
| runpod_orchestrator | High-level pod management |

GPU Types

Common GPU types available on RunPod:

| GPU | ID |
|---|---|
| NVIDIA A40 | NVIDIA A40 |
| NVIDIA A100 80GB | NVIDIA A100 80GB PCIe |
| NVIDIA RTX 4090 | NVIDIA GeForce RTX 4090 |
| NVIDIA RTX 3090 | NVIDIA GeForce RTX 3090 |
| NVIDIA L40S | NVIDIA L40S |

Use client.list_gpu_types() to get the full list with availability.
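
For instance, here is a sketch that picks the first GPU type with capacity, assuming available_count is a numeric count as in the GraphQL example above:

use halldyll_starter_runpod::{RunpodClient, RunpodClientConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = RunpodClientConfig::from_env()?;
    let client = RunpodClient::new(cfg)?;

    // Find the first GPU type that currently has capacity.
    let gpus = client.list_gpu_types().await?;
    match gpus.iter().find(|g| g.available_count > 0) {
        Some(gpu) => println!("Available GPU type: {}", gpu.display_name),
        None => println!("No GPUs currently available"),
    }

    Ok(())
}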

Running the Example

# Clone the repository
git clone https://github.com/Mr-soloDev/halldyll-starter.git
cd halldyll-starter

# Create your .env file
cp .env.example .env
# Edit .env with your API key and settings

# Run the example
cargo run

Building

# Debug build
cargo build

# Release build
cargo build --release

# Check without building
cargo check

# Run with all lints
cargo clippy -- -D warnings

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

MIT
