# S3-Only Backend Quick Start: Versioned Artifacts Management

This guide walks you through using the versioned artifacts system's S3-only backend to manage binary artifacts. The examples use small in-memory byte strings, but the same workflow applies to any binary file: SQLite databases, machine learning models, configuration archives, documentation snapshots, and so on.

## What is the S3-Only Backend?

The S3-Only Backend is a lightweight artifact management system that uses Amazon S3 as the sole storage for both binary content and metadata. Unlike the hybrid S3+DynamoDB approach, this backend stores everything in S3, using a structured folder hierarchy and a filename scheme that encodes each version number.

### When to Use S3-Only Backend

**Well suited to binary artifacts such as:**
- SQLite databases (1MB - 50MB)
- Machine learning model files
- Configuration archives
- Documentation snapshots
- Small to medium binary assets
- Compiled applications
- Asset bundles

**Key Benefits:**
- Simpler setup (only S3 required)
- Cost-effective for moderate file sizes
- Built-in versioning and rollback
- No DynamoDB complexity

## Core Concepts

### Artifact

Any binary file that you want to version - SQLite databases, ML models, configuration files, compiled applications, or any other binary content that needs version management.

### Version

A snapshot of your artifact's content. There are two kinds:

- **LATEST**: Mutable development version (always points to newest content)
- **Numbered Versions**: Immutable snapshots (1, 2, 3, ... up to 999,999)

### Alias

A named pointer to specific versions that enables deployment patterns:

- **Simple Alias**: Points to one version (e.g., `prod` → version 5)
- **Traffic Splitting**: Routes a percentage of requests to a second version, which enables canary deployments

## Example Use Case: Binary Content Versioning

To demonstrate the versioned artifacts system, we'll use simple binary content as our example. Imagine you're managing application assets, configuration files, or any binary content that needs:
- Version control and history
- Environment-specific deployments
- Rollback capabilities
- Traffic splitting for testing

The same principles apply to any binary artifacts - ML models, configuration bundles, compiled assets, databases, etc.

## Setup and Initialization

Before we can start managing artifacts, we need to set up our development environment and initialize the versioned artifacts repository. In this section, we'll use moto to create a mock AWS environment that simulates S3 services locally - this allows you to experiment with the system without needing real AWS credentials or incurring costs. We'll configure a repository that will store all our binary artifacts with proper organization and naming conventions.

In [1]:
from versioned.api import s3_only_backend

Artifact = s3_only_backend.Artifact
Alias = s3_only_backend.Alias
Repository = s3_only_backend.Repository

In [2]:
import moto
from boto_session_manager import BotoSesManager

# Start mocking AWS services
mock_aws = moto.mock_aws()
mock_aws.start()

# Configure AWS session (mocked)
bsm = BotoSesManager(region_name="us-east-1")
print(f"{bsm.aws_account_id = }")

bsm.aws_account_id = '123456789012'


In [3]:
# Use a simple bucket name
bucket = "mybucket"

# Create repository for binary artifacts
repo = s3_only_backend.Repository(
    aws_region=bsm.aws_region,
    s3_bucket=bucket,
    s3_prefix="artifacts",
    suffix=".bin"  # All artifacts will have .bin extension
)

# Initialize S3 bucket
repo.bootstrap(bsm=bsm)

print("✅ Repository initialized successfully!")

✅ Repository initialized successfully!


## Creating Your First Artifact

Now that our repository is set up, let's create and upload our first binary artifact. This section demonstrates the fundamental operation of the versioned artifacts system - taking binary content and storing it as a managed artifact. We'll create simple binary content in memory (avoiding file system operations) and upload it to the repository with metadata. This represents the core workflow you'll use whether you're managing ML models, configuration files, or any other binary content.

In [4]:
# Create binary content (could be any binary data)
def create_binary_content(version_tag: str) -> bytes:
    """Create sample binary content for demonstration"""
    content = f"Binary content for {version_tag} - timestamp: 2024-01-01"
    return content.encode('utf-8')

# Create initial binary content
artifact_name = "my_application"
content_v1 = create_binary_content("version 1.0")

print(f"📦 Created binary content: {len(content_v1)} bytes")
print(f"📋 Content preview: {content_v1[:50]}...")

📦 Created binary content: 54 bytes
📋 Content preview: b'Binary content for version 1.0 - timestamp: 2024-0'...


In [5]:
from rich import print as rprint

# Upload binary content as LATEST version
artifact = repo.put_artifact(
    bsm=bsm, 
    name=artifact_name, 
    content=content_v1,
    content_type="application/octet-stream",
    metadata={
        "description": "Application binary with core functionality",
        "version": "1.0",
        "created_by": "dev_team"
    }
)

rprint(artifact)
print(f"🔗 View in S3: {artifact.s3path.console_url}")

# Verify we can download the content
downloaded_content = artifact.get_content(bsm=bsm)
print(f"✅ Successfully uploaded and verified ({len(downloaded_content)} bytes)")
print(f"📋 Downloaded content matches: {downloaded_content == content_v1}")

🔗 View in S3: https://console.aws.amazon.com/s3/object/mybucket?prefix=artifacts/my_application/versions/000000_LATEST.bin
✅ Successfully uploaded and verified (54 bytes)
📋 Downloaded content matches: True
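
The byte-for-byte comparison above works for small payloads. For larger artifacts it is often more convenient to compare digests, which are also easy to log. The following is a minimal sketch using only the standard library; the helper names `sha256_hex` and `verify_download` are hypothetical and not part of the `versioned` API.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def verify_download(uploaded: bytes, downloaded: bytes) -> bool:
    """Return True when both byte strings have the same SHA-256 digest."""
    return sha256_hex(uploaded) == sha256_hex(downloaded)


# Same 54-byte payload that create_binary_content("version 1.0") produces.
original = b"Binary content for version 1.0 - timestamp: 2024-01-01"

print(verify_download(original, original))         # identical content
print(verify_download(original, original + b"!"))  # corrupted content
```

In the notebook, `downloaded_content` and `content_v1` would take the place of the two arguments.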


## Understanding S3 Folder Structure

The S3-only backend organizes artifacts in S3 using a hierarchical folder layout and a filename scheme that encodes the version number. Each artifact gets its own folder with `versions/` and `aliases/` subfolders. In the version filenames, the numeric prefix counts down as the version number goes up, so a plain lexicographic listing returns LATEST first, followed by numbered versions from newest to oldest. Understanding this structure makes it easier to navigate the bucket, find artifacts, and reason about listing order.

In [6]:
# List all artifacts in the repository
artifact_names = repo.list_artifact_names(bsm=bsm)
print(f"📁 Artifacts in repository: {artifact_names}")

# Get detailed information about our artifact
artifact_info = repo.get_artifact_version(bsm=bsm, name=artifact_name)
rprint(artifact_info)

print(f"""
🏗️  S3 Folder Structure:
Repository Root: s3://{repo.s3_bucket}/{repo.s3_prefix}/

Your artifact structure:
{artifact_name}/
├── versions/
│   └── 000000_LATEST.bin          # ← Your current binary content
└── aliases/
    └── (no aliases yet)

The filename encoding ensures proper chronological sorting:
- 000000_LATEST.bin: Always appears first (development version)
- 999999_000001.bin: Version 1 (when published)
- 999998_000002.bin: Version 2 (when published)
- etc.
""")

📁 Artifacts in repository: ['my_application']



🏗️  S3 Folder Structure:
Repository Root: s3://mybucket/artifacts/

Your artifact structure:
my_application/
├── versions/
│   └── 000000_LATEST.bin          # ← Your current binary content
└── aliases/
    └── (no aliases yet)

The filename encoding ensures proper chronological sorting:
- 000000_LATEST.bin: Always appears first (development version)
- 999999_000001.bin: Version 1 (when published)
- 999998_000002.bin: Version 2 (when published)
- etc.
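
The naming pattern can be reproduced with a few lines of Python. This is a sketch inferred from the three example filenames above (prefix = 1,000,000 minus the version number); the function name `encode_filename` is hypothetical and the library's internal implementation may differ.

```python
from typing import Union

MAX_VERSION = 999_999  # highest numbered version mentioned in this guide


def encode_filename(version: Union[int, str], suffix: str = ".bin") -> str:
    """Build a version filename following the pattern shown in this guide.

    Assumption: the prefix is 1_000_000 minus the version number, which
    matches the examples 999999_000001 and 999998_000002.
    """
    if version == "LATEST":
        return f"000000_LATEST{suffix}"
    number = int(version)
    if not 1 <= number <= MAX_VERSION:
        raise ValueError(f"version must be between 1 and {MAX_VERSION}")
    return f"{1_000_000 - number:06d}_{number:06d}{suffix}"


names = [encode_filename(v) for v in (1, 2, 3, "LATEST")]

# Lexicographic order: LATEST first, then newest numbered version to oldest.
for name in sorted(names):
    print(name)
```

Because S3 lists keys in lexicographic order, this encoding lets a single list call return the most recent versions first without any client-side sorting.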



## Creating Immutable Versions

The LATEST version we created is mutable - it can be updated and changed as you develop your artifact. However, for production deployments and stable releases, you need immutable versions that never change. This section demonstrates how to "publish" the current LATEST version as a numbered, immutable snapshot. This is a critical concept in artifact management: development happens on LATEST, but deployments use numbered versions that provide consistency and enable reliable rollbacks.

In [7]:
# Publish the LATEST version as version 1
print("📸 Creating immutable version 1...")
published_version = repo.publish_artifact_version(bsm=bsm, name=artifact_name)

rprint(published_version)
print(f"🔗 Version 1 in S3: {published_version.s3path.console_url}")

# List all versions
versions = repo.list_artifact_versions(bsm=bsm, name=artifact_name)
print(f"\n📚 All versions ({len(versions)} total):")
for version in versions:
    print(f"  - Version {version.version}: {version.update_at} ({version.sha256[:8]}...)")

📸 Creating immutable version 1...


🔗 Version 1 in S3: https://console.aws.amazon.com/s3/object/mybucket?prefix=artifacts/my_application/versions/999999_000001.bin

📚 All versions (2 total):
  - Version LATEST: 2025-06-26T23:57:20+00:00 (88f0dfa9...)
  - Version 1: 2025-06-26T23:57:20+00:00 (88f0dfa9...)


## Updating and Publishing New Versions

In real development workflows, you'll continuously update your artifacts with new features, bug fixes, or improvements. This section demonstrates the iterative development cycle: updating the LATEST version with new content, then publishing stable snapshots when ready for release. This workflow separates ongoing development (which uses the mutable LATEST version) from production releases (which use immutable numbered versions), allowing you to work safely without affecting deployed systems.

In [8]:
# Create updated binary content
content_v2 = create_binary_content("version 2.0 - enhanced features")

print(f"📦 Updated content size: {len(content_v2)} bytes")
print(f"📈 Size change: {len(content_v2) - len(content_v1)} bytes")
print(f"📋 Content preview: {content_v2[:50]}...")

📦 Updated content size: 74 bytes
📈 Size change: 20 bytes
📋 Content preview: b'Binary content for version 2.0 - enhanced features'...


In [9]:
# Update the LATEST version
updated_artifact = repo.put_artifact(
    bsm=bsm,
    name=artifact_name,
    content=content_v2,
    content_type="application/octet-stream",
    metadata={
        "description": "Enhanced application binary with new features",
        "version": "2.0", 
        "created_by": "dev_team",
        "changes": "Added enhanced features and optimizations"
    }
)

print("🔄 LATEST version updated")
rprint(updated_artifact)

# Publish as version 2
version_2 = repo.publish_artifact_version(bsm=bsm, name=artifact_name)
print("📸 Published version 2")
rprint(version_2)

🔄 LATEST version updated


📸 Published version 2


## Working with Aliases for Deployment

While versions provide immutable snapshots of your artifacts, aliases provide a flexible deployment layer that maps environment names to specific versions. This section introduces aliases - named pointers that allow you to manage different environments (development, staging, production) without hardcoding version numbers in your applications. Aliases enable clean environment management and make it easy to promote versions through your deployment pipeline while maintaining clear separation between environments.

In [10]:
# Create development alias pointing to LATEST
dev_alias = repo.put_alias(
    bsm=bsm,
    name=artifact_name,
    alias="development"
    # No version specified = points to LATEST
)

print("🚧 Development alias created")
rprint(dev_alias)

# Create production alias pointing to stable version 1
prod_alias = repo.put_alias(
    bsm=bsm,
    name=artifact_name,
    alias="production",
    version=1
)

print("🚀 Production alias created")
rprint(prod_alias)

# Create staging alias pointing to version 2
staging_alias = repo.put_alias(
    bsm=bsm,
    name=artifact_name,
    alias="staging", 
    version=2
)

print("🧪 Staging alias created")
rprint(staging_alias)

🚧 Development alias created


🚀 Production alias created


🧪 Staging alias created


## Blue/Green Deployment

Blue/Green deployment enables zero-downtime releases by maintaining two identical production environments and switching traffic between them. With versioned artifacts, the two "environments" are two published versions, and the switch is an update of the production alias. Because the alias is repointed in a single atomic operation, you can deploy a new version or roll back to the previous one without redeploying the applications that consume it.

In [11]:
# Current production setup
current_prod = repo.get_alias(bsm=bsm, name=artifact_name, alias="production")
print(f"🔵 Current production points to version: {current_prod.version}")

# Simulate testing version 2 in staging
staging = repo.get_alias(bsm=bsm, name=artifact_name, alias="staging")
print(f"🧪 Staging is using version: {staging.version}")

# After testing passes, promote staging to production (Blue/Green switch)
print("\n🔄 Performing Blue/Green deployment...")
new_prod = repo.put_alias(
    bsm=bsm,
    name=artifact_name,
    alias="production",
    version=2  # Switch from version 1 to version 2
)

print(f"✅ Production switched to version {new_prod.version}")
rprint(new_prod)

# Applications using the production alias now get version 2 instantly
prod_content = new_prod.get_version_content(bsm=bsm)
print(f"📦 Production content size: {len(prod_content)} bytes")
print(f"📋 Production content: {prod_content[:50]}...")

🔵 Current production points to version: 1
🧪 Staging is using version: 2

🔄 Performing Blue/Green deployment...
✅ Production switched to version 2


📦 Production content size: 74 bytes
📋 Production content: b'Binary content for version 2.0 - enhanced features'...


## Canary Deployment

Canary deployment is a risk-mitigation strategy that gradually rolls out new versions by directing a small percentage of traffic to the new version while most traffic continues using the stable version. This section demonstrates the advanced alias feature of traffic splitting, which allows you to route a controlled percentage of requests between two versions. This approach enables you to test new versions with real production traffic, monitor their performance, and gradually increase adoption while maintaining the ability to quickly abort if issues arise.

In [12]:
# Create version 3 for canary testing
content_v3 = create_binary_content("version 3.0 - performance optimizations")

print(f"📦 Canary content size: {len(content_v3)} bytes")
print(f"📋 Content preview: {content_v3[:50]}...")

# Upload and publish version 3
repo.put_artifact(
    bsm=bsm, 
    name=artifact_name, 
    content=content_v3,
    content_type="application/octet-stream",
    metadata={
        "description": "Optimized application binary",
        "version": "3.0",
        "changes": "Performance optimizations and bug fixes"
    }
)
version_3 = repo.publish_artifact_version(bsm=bsm, name=artifact_name)

print("📸 Created version 3 with optimizations")
rprint(version_3)

📦 Canary content size: 82 bytes
📋 Content preview: b'Binary content for version 3.0 - performance optim'...
📸 Created version 3 with optimizations


In [13]:
# Start canary deployment: 90% traffic to stable v2, 10% to new v3
canary_alias = repo.put_alias(
    bsm=bsm,
    name=artifact_name,
    alias="production",
    version=2,                    # Stable version (90% traffic)
    secondary_version=3,          # Canary version (10% traffic)
    secondary_version_weight=10   # 10% goes to version 3
)

print("🕊️ Canary deployment started: 10% traffic to version 3")
rprint(canary_alias)

# Simulate traffic distribution
print("\n📊 Simulating 1000 requests to see traffic distribution:")
version_counts = {2: 0, 3: 0}

for i in range(1000):
    selected_uri = canary_alias.random_artifact()
    if "000002" in selected_uri:  # Version 2
        version_counts[2] += 1
    else:  # Version 3
        version_counts[3] += 1

print(f"Version 2 (stable): {version_counts[2]} requests ({version_counts[2]/1000*100:.1f}%)")
print(f"Version 3 (canary): {version_counts[3]} requests ({version_counts[3]/1000*100:.1f}%)")

🕊️ Canary deployment started: 10% traffic to version 3



📊 Simulating 1000 requests to see traffic distribution:
Version 2 (stable): 908 requests (90.8%)
Version 3 (canary): 92 requests (9.2%)
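
The observed split (90.8% / 9.2%) is close to the configured 90/10 but not exact, which is what you would expect from independent random draws. A weighted choice of this kind can be sketched as follows. The function `pick_version` is hypothetical; the library's `random_artifact()` may be implemented differently.

```python
import random
from collections import Counter


def pick_version(
    primary: int,
    secondary: int,
    secondary_weight: int,
    rng: random.Random,
) -> int:
    """Return `secondary` with probability secondary_weight/100, else `primary`."""
    if not 0 <= secondary_weight <= 100:
        raise ValueError("secondary_weight must be between 0 and 100")
    # rng.random() is uniform on [0, 1), so weight 0 never selects the
    # secondary version and weight 100 always does.
    return secondary if rng.random() * 100 < secondary_weight else primary


rng = random.Random(42)  # fixed seed so repeated runs give the same counts
counts = Counter(pick_version(2, 3, 10, rng) for _ in range(10_000))

for version in (2, 3):
    print(f"Version {version}: {counts[version] / 10_000:.1%}")
```

Each call is independent, so a given client can receive different versions on consecutive requests. If your application needs a stable assignment per user or session, that logic has to live in the caller.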


In [14]:
# Gradually increase canary traffic
print("\n📈 Increasing canary traffic to 50%...")
canary_50 = repo.put_alias(
    bsm=bsm,
    name=artifact_name,
    alias="production",
    version=2,
    secondary_version=3,
    secondary_version_weight=50  # Now 50% traffic to version 3
)

# After monitoring shows good results, complete the rollout
print("✅ Canary successful! Completing rollout...")
final_prod = repo.put_alias(
    bsm=bsm,
    name=artifact_name,
    alias="production",
    version=3  # 100% traffic to version 3
)

print(f"🚀 Production fully migrated to version {final_prod.version}")
rprint(final_prod)


📈 Increasing canary traffic to 50%...
✅ Canary successful! Completing rollout...
🚀 Production fully migrated to version 3


## Emergency Rollback

Despite careful testing, production issues can still occur after deployment. When they do, speed of recovery is critical. This section demonstrates emergency rollback procedures using aliases to instantly revert to a previous stable version. The versioned artifacts system makes rollbacks as simple as updating an alias pointer - no complex deployment processes, no waiting for builds, just an immediate switch back to a known-good version. This capability provides confidence to deploy frequently, knowing that rollback is always fast and reliable.

In [15]:
# Simulate an emergency: rollback production to version 2
print("🚨 Emergency detected! Rolling back to stable version...")

rollback_alias = repo.put_alias(
    bsm=bsm,
    name=artifact_name,
    alias="production",
    version=2  # Instant rollback to version 2
)

print(f"⚡ Emergency rollback completed to version {rollback_alias.version}")
rprint(rollback_alias)

# Verify rollback
current_prod = repo.get_alias(bsm=bsm, name=artifact_name, alias="production")
print(f"✅ Production verified at version {current_prod.version}")

# Check the content is correct
rollback_content = current_prod.get_version_content(bsm=bsm)
print(f"📋 Rollback content: {rollback_content[:50]}...")

🚨 Emergency detected! Rolling back to stable version...
⚡ Emergency rollback completed to version 2


✅ Production verified at version 2
📋 Rollback content: b'Binary content for version 2.0 - enhanced features'...
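
In the cell above the rollback target (version 2) is hardcoded. In practice you need to know which version the alias pointed to before the last switch. One way to track that is to record the previous target on every change, as in this in-memory sketch. `ToyAliasStore` is a hypothetical helper, not part of the `versioned` API; a real setup might keep the same information in a deployment log.

```python
from typing import Dict, List


class ToyAliasStore:
    """Tracks alias targets plus a history stack for rollbacks (illustrative)."""

    def __init__(self) -> None:
        self._current: Dict[str, int] = {}
        self._history: Dict[str, List[int]] = {}

    def point(self, alias: str, version: int) -> None:
        # Remember the old target before overwriting it.
        if alias in self._current:
            self._history.setdefault(alias, []).append(self._current[alias])
        self._current[alias] = version

    def rollback(self, alias: str) -> int:
        # Restore the most recent previous target.
        history = self._history.get(alias)
        if not history:
            raise LookupError(f"no previous version recorded for {alias!r}")
        previous = history.pop()
        self._current[alias] = previous
        return previous

    def resolve(self, alias: str) -> int:
        return self._current[alias]


aliases = ToyAliasStore()
for version in (1, 2, 3):  # same promotion sequence as in this guide
    aliases.point("production", version)

restored = aliases.rollback("production")
print(restored, aliases.resolve("production"))
```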


## Lifecycle Management

Over time, you'll accumulate many versions of your artifacts, and storage costs grow if old versions are never removed. This section demonstrates the purge operation, which cleans up old versions while preserving the ones you still need. The system provides flexible policies to keep the most recent versions, preserve versions based on age, and maintain versions that are still referenced by aliases. Regular cleanup keeps storage costs under control while retaining the versions you need for rollbacks and historical reference.

In [16]:
# List all current versions
all_versions = repo.list_artifact_versions(bsm=bsm, name=artifact_name)
print(f"📚 Total versions before cleanup: {len(all_versions)}")

for version in all_versions:
    print(f"  - Version {version.version}: {version.update_at}")

# Clean up old versions (keep last 2, delete versions older than 0 seconds for demo)
purge_time, deleted_versions = repo.purge_artifact_versions(
    bsm=bsm,
    name=artifact_name,
    keep_last_n=2,                    # Keep latest 2 versions + LATEST
    purge_older_than_secs=0           # Delete immediately for demo
)

print(f"\n🧹 Cleanup completed at {purge_time}")
print(f"🗑️  Deleted {len(deleted_versions)} old versions:")
for deleted in deleted_versions:
    print(f"  - Version {deleted.version}")

# Verify remaining versions
remaining_versions = repo.list_artifact_versions(bsm=bsm, name=artifact_name)
print(f"✅ Remaining versions: {len(remaining_versions)}")

📚 Total versions before cleanup: 4
  - Version LATEST: 2025-06-26T23:57:20+00:00
  - Version 3: 2025-06-26T23:57:20+00:00
  - Version 2: 2025-06-26T23:57:20+00:00
  - Version 1: 2025-06-26T23:57:20+00:00

🧹 Cleanup completed at 2025-06-26 23:57:20.875823+00:00
🗑️  Deleted 1 old versions:
  - Version 1
✅ Remaining versions: 3
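
The interaction between `keep_last_n` and `purge_older_than_secs` can be sketched as a pure function. This is an assumption-laden illustration: it treats a version as purgeable only when it falls outside the newest N *and* is older than the age threshold, which is consistent with the output above. The function `select_versions_to_purge` and its `protected` parameter (for alias-referenced versions) are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, FrozenSet, List


def select_versions_to_purge(
    versions: Dict[int, datetime],
    keep_last_n: int,
    purge_older_than_secs: int,
    now: datetime,
    protected: FrozenSet[int] = frozenset(),
) -> List[int]:
    """Return numbered versions eligible for deletion, newest first.

    `versions` maps version number -> last update time. LATEST is not
    numbered, so it is never part of the input and never purged.
    """
    newest_first = sorted(versions, reverse=True)
    kept_by_count = set(newest_first[:keep_last_n])
    cutoff = now - timedelta(seconds=purge_older_than_secs)
    return [
        v
        for v in newest_first
        if v not in kept_by_count
        and v not in protected
        and versions[v] <= cutoff
    ]


now = datetime(2025, 6, 27, tzinfo=timezone.utc)
versions = {n: now - timedelta(days=4 - n) for n in (1, 2, 3)}

# Same settings as the notebook cell: keep 2, no minimum age.
print(select_versions_to_purge(versions, 2, 0, now))
# Version 1 is spared when something still references it.
print(select_versions_to_purge(versions, 2, 0, now, protected=frozenset({1})))
```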


## Working with Binary Content

The final piece of the artifact management workflow is consuming the stored content in your applications. This section demonstrates how to retrieve binary content through an alias and print basic statistics about it. Whether you're loading ML models, configuration files, or other binary assets, the pattern is the same: resolve the alias, download the bytes, then hand them to whatever loader your application uses. The example also shows a defensive decode for content that may or may not be text.
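
As one concrete case of handing downloaded bytes to a loader, the sketch below opens a SQLite database from a `bytes` object using only the standard library. It builds its own sample bytes so that it runs standalone; in the notebook, the bytes would come from `get_version_content`. Both helper functions are hypothetical names introduced for illustration.

```python
import os
import sqlite3
import tempfile


def build_sample_db_bytes() -> bytes:
    """Create a tiny SQLite file and return its raw bytes (demo data only)."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
            conn.execute("INSERT INTO settings VALUES ('theme', 'dark')")
            conn.commit()
        finally:
            conn.close()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def open_sqlite_from_bytes(content: bytes) -> sqlite3.Connection:
    """Load SQLite bytes into an in-memory database and return the connection."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        disk = sqlite3.connect(path)
        try:
            memory = sqlite3.connect(":memory:")
            disk.backup(memory)  # copy every page into the in-memory database
        finally:
            disk.close()
    finally:
        os.remove(path)  # the temporary file is no longer needed
    return memory


db = open_sqlite_from_bytes(build_sample_db_bytes())
row = db.execute("SELECT value FROM settings WHERE key = 'theme'").fetchone()
print(row)
```

Copying into an in-memory database means the temporary file can be deleted immediately, at the cost of holding the whole database in RAM, which is reasonable for the 1MB to 50MB range this guide targets.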

In [17]:
# Get production content
prod_alias = repo.get_alias(bsm=bsm, name=artifact_name, alias="production")
binary_content = prod_alias.get_version_content(bsm=bsm)

print(f"""
📊 Production Binary Stats:
  - Version: {prod_alias.version}
  - Size: {len(binary_content)} bytes
  - Content Type: application/octet-stream
  - SHA256: {prod_alias.version.sha256 if hasattr(prod_alias.version, 'sha256') else 'N/A'}
""")

# Decode and display content (since we know it's text)
try:
    decoded_content = binary_content.decode('utf-8')
    print(f"📋 Content: {decoded_content}")
except UnicodeDecodeError:
    print(f"📋 Binary content (first 50 bytes): {binary_content[:50]}")

print(f"""
✅ Tutorial completed! You've learned to:
  - Upload and version binary artifacts
  - Create immutable snapshots  
  - Use aliases for environment management
  - Perform Blue/Green deployments
  - Execute canary rollouts
  - Handle emergency rollbacks
  - Manage artifact lifecycle
""")

# Clean up mocked AWS
mock_aws.stop()


📊 Production Binary Stats:
  - Version: 2
  - Size: 74 bytes
  - Content Type: application/octet-stream
  - SHA256: N/A

📋 Content: Binary content for version 2.0 - enhanced features - timestamp: 2024-01-01

✅ Tutorial completed! You've learned to:
  - Upload and version binary artifacts
  - Create immutable snapshots  
  - Use aliases for environment management
  - Perform Blue/Green deployments
  - Execute canary rollouts
  - Handle emergency rollbacks
  - Manage artifact lifecycle



## Summary

The S3-Only Backend provides a powerful yet simple solution for managing any binary artifacts. Key takeaways:

**✅ Version Control**: Immutable snapshots ensure reliable rollbacks  
**✅ Environment Management**: Aliases enable clean dev/staging/prod workflows  
**✅ Deployment Patterns**: Blue/Green and canary deployments minimize risk  
**✅ Cost Effective**: S3-only storage optimizes costs for moderate file sizes  
**✅ Simple Operations**: No DynamoDB complexity, just S3 and smart file organization  
**✅ Universal**: Works with any binary content - databases, models, assets, configurations

This approach is suited to artifacts of roughly 1MB to 50MB and fits into CI/CD pipelines that automate publishing and alias promotion for any binary content.