# Simple VACE Video Editing

Clean, focused notebook for video-to-video editing with VACE.

## What is VACE?
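VACE ([ali-vilab/VACE](https://github.com/ali-vilab/VACE), "All-in-One Video Creation and Editing") is a unified model for video generation and editing. This notebook uses its Wan2.1 integration (`wan.vace.WanVace`) for text-guided video-to-video editing.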

## Key Parameters
- **`context_scale`**: Scales how strongly the encoded source video conditions generation (lower = more change; 0.0 effectively ignores the source, 1.0 applies the full source conditioning)
  - Try **0.3-0.5** for strong edits
  - Try **0.6-0.8** for subtle edits
- **`guide_scale`**: Classifier-free guidance scale (higher = follows the prompt more closely)
- **`shift`**: Timestep shift of the flow-matching noise schedule (16.0 works well for editing)
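To make the two less obvious knobs concrete, here is a toy numeric sketch in pure Python (no Wan2.1 imports). It assumes, based on our reading of Wan2.1's VACE code, that the context blocks' hints are added to the DiT hidden states as `x + hint * context_scale`, and that the flow-matching solver shifts each noise level with `sigma' = shift * sigma / (1 + (shift - 1) * sigma)`. Real tensors are replaced by short lists.

```python
def apply_context_hint(hidden, hint, context_scale):
    """Add a VACE-style context hint to a hidden state, scaled by context_scale.

    Assumption: mirrors the residual form x = x + hint * context_scale;
    lists stand in for the real tensors.
    """
    return [h + context_scale * c for h, c in zip(hidden, hint)]


def shift_sigma(sigma, shift):
    """Apply the flow-matching timestep shift sigma' = s*sigma / (1 + (s-1)*sigma).

    A larger shift pushes sigmas toward 1.0, so more of the sampling steps
    are spent at high noise levels.
    """
    return shift * sigma / (1 + (shift - 1) * sigma)


if __name__ == "__main__":
    hidden, hint = [1.0, 2.0], [0.5, -0.5]
    print(apply_context_hint(hidden, hint, 0.0))  # source ignored: [1.0, 2.0]
    print(apply_context_hint(hidden, hint, 1.0))  # full hint: [1.5, 1.5]
    for s in (5.0, 16.0):
        print(s, round(shift_sigma(0.5, s), 3))   # 5.0 0.833, then 16.0 0.941
```

Under these assumptions, `context_scale=0.0` removes the source video's influence entirely, and `shift=16.0` keeps the sampler at high noise for longer than a smaller shift such as 5.0, which leaves more room for large changes.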

## Resolution: 480p (832×480)

In [1]:
# Imports
import os
import sys
import time
import glob
from datetime import datetime

import torch
from IPython.display import Video, display, HTML

sys.path.insert(0, '/workspace/wan2.1/Wan2.1')

from wan.vace import WanVace
from wan.configs.wan_t2v_14B import t2v_14B
from wan.configs.wan_t2v_1_3B import t2v_1_3B
from wan.utils.utils import cache_video

print("✓ Imports successful")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A'}")



✓ Imports successful
CUDA available: True
GPU: NVIDIA B200


## Configuration

In [None]:
CONFIG = {
    # Model Selection: '1.3B' or '14B'
    'model': '14B',
    
    # Paths
    'checkpoint_dir_14B': '/workspace/wan2.1/Wan2.1/Wan2.1-VACE-14B',
    'checkpoint_dir_1.3B': '/workspace/wan2.1/Wan2.1/Wan2.1-VACE-1.3B',
    'output_dir': '/workspace/wan2.1/Wan2.1/outputs',
    
    # Source video (set to '' to auto-select one in the next cell)
    'source_video': '/workspace/wan2.1/Wan2.1/outputs/baseline_14B_832x480_20260106_183542.mp4',

    # Resolution
    'width': 832,
    'height': 480,
    'frame_num': 81,
    
    # Editing Parameters
    'edit_prompt': 'Change style to animated cartoon',
    'negative_prompt': '',
    
    # CRITICAL: Context Scale (how much to preserve original video)
    # Lower = more changes | Higher = fewer changes
    'context_scale': 0.4,  # Try 0.3-0.5 for strong edits, 0.6-0.8 for subtle
    
    # Sampling
    'sampling_steps': 50,
    'sample_solver': 'unipc',
    'shift': 16.0,
    'guide_scale': 5.0,
    'seed': 42,
    
    # Device
    'device_id': 0,
}

os.makedirs(CONFIG['output_dir'], exist_ok=True)
print('✓ Configuration set')
print(f"  Model: VACE-{CONFIG['model']}")
print(f"  Context scale: {CONFIG['context_scale']} ({'strong edit' if CONFIG['context_scale'] <= 0.5 else 'subtle edit'})")

✓ Configuration set
  Model: VACE-14B
  Context scale: 0.4 (strong edit)
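The `width`, `height` and `frame_num` values above determine how large the latent video is. The sketch below computes that shape, assuming Wan2.1's VAE stride of (4, 8, 8) over (time, height, width) and a DiT patch size of (1, 2, 2); under those assumptions `frame_num` must be of the form 4n+1 and both spatial sizes multiples of 16. `wan_latent_shape` is a hypothetical helper, and `prepare_source` may resize the source video to a matching area, so treat the result as an estimate.

```python
def wan_latent_shape(width, height, frame_num,
                     vae_stride=(4, 8, 8), patch_size=(1, 2, 2)):
    """Return (latent_frames, latent_h, latent_w, seq_len) for a Wan2.1-style clip.

    Assumed defaults: VAE stride (time, h, w) = (4, 8, 8) and DiT patch size
    (1, 2, 2); check your model config before relying on them.
    """
    if (frame_num - 1) % vae_stride[0] != 0:
        raise ValueError(f"frame_num must be {vae_stride[0]}n+1, got {frame_num}")
    step_h = vae_stride[1] * patch_size[1]
    step_w = vae_stride[2] * patch_size[2]
    if height % step_h or width % step_w:
        raise ValueError(f"height/width must be multiples of {step_h}/{step_w}")
    lat_f = (frame_num - 1) // vae_stride[0] + 1   # temporal compression
    lat_h = height // vae_stride[1]                # spatial compression
    lat_w = width // vae_stride[2]
    # Number of transformer tokens after patchifying the latent
    seq_len = lat_f * (lat_h // patch_size[1]) * (lat_w // patch_size[2])
    return lat_f, lat_h, lat_w, seq_len


if __name__ == "__main__":
    print(wan_latent_shape(832, 480, 81))  # (21, 60, 104, 32760)
    print(f"{81 / 16:.2f} s at 16 fps")     # 5.06 s at 16 fps
```

At 832×480 with 81 frames this gives 21 latent frames and roughly 32.8k tokens per sample, and the saved clip lasts about 5 s at the 16 fps used when saving.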


## Select Source Video

In [3]:
# List available videos
videos = sorted(glob.glob(os.path.join(CONFIG['output_dir'], '*.mp4')), 
                key=os.path.getmtime, reverse=True)

print("Available videos:")
print("="*80)
for i, v in enumerate(videos[:10]):
    name = os.path.basename(v)
    size = os.path.getsize(v) / 1024 / 1024
    mtime = datetime.fromtimestamp(os.path.getmtime(v)).strftime('%Y-%m-%d %H:%M')
    print(f"{i}: {name} ({size:.1f}MB) - {mtime}")
print("="*80)

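# Auto-select the most recent video, skipping this notebook's own edited outputs
if not CONFIG['source_video']:
    candidates = [v for v in videos if not os.path.basename(v).startswith('edited_')]
    if candidates:
        CONFIG['source_video'] = candidates[0]
        print(f"\n✓ Auto-selected: {os.path.basename(CONFIG['source_video'])}")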

# Preview
if CONFIG['source_video'] and os.path.exists(CONFIG['source_video']):
    print(f"\nSource video: {CONFIG['source_video']}")
    display(Video(CONFIG['source_video'], embed=True, width=832))
else:
    print("\n⚠️  No source video found!")
    print("   Please set CONFIG['source_video'] = '/path/to/your/video.mp4'")

Available videos:
0: edited_v2v_1.3B_20260112_190134.mp4 (3.6MB) - 2026-01-12 19:01
1: edited_v2v_1.3B_20260112_184554.mp4 (3.7MB) - 2026-01-12 18:45
2: edited_v2v_14B_20260112_182209.mp4 (3.3MB) - 2026-01-12 18:41
3: edited_v2v_1.3B_20260112_182209.mp4 (3.7MB) - 2026-01-12 18:27
4: edited_v2v_1.3B_20260112_182032.mp4 (3.7MB) - 2026-01-12 18:20
5: edited_v2v_1.3B_20260112_181337.mp4 (4.7MB) - 2026-01-12 18:13
6: edited_v2v_1.3B_20260112_173744.mp4 (3.7MB) - 2026-01-12 17:37
7: edited_v2v_1.3B_20260112_163908.mp4 (3.5MB) - 2026-01-12 16:39
8: edited_v2v_1.3B_20260112_163545.mp4 (3.7MB) - 2026-01-12 16:35
9: edited_v2v_1.3B_20260112_160030.mp4 (3.7MB) - 2026-01-12 16:00

Source video: /workspace/wan2.1/Wan2.1/outputs/baseline_14B_832x480_20260106_074824.mp4
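Loading the 14B checkpoint takes a while, so it can pay to fail fast on simple mistakes first. The sketch below is a hypothetical `preflight` helper that checks only what the `CONFIG` dict above can tell us: a known model name, an existing checkpoint directory, an existing source file, and a 4n+1 frame count (the last rule is an assumption about Wan2.1). It does not open the video or validate the checkpoint contents.

```python
import os


def preflight(config):
    """Return a list of problems found in a CONFIG-style dict (empty list = OK).

    Hypothetical helper: checks paths and frame count only.
    """
    problems = []
    model = config.get('model')
    if model not in ('1.3B', '14B'):
        problems.append(f"unknown model {model!r}; expected '1.3B' or '14B'")
    else:
        ckpt = config.get(f'checkpoint_dir_{model}', '')
        if not os.path.isdir(ckpt):
            problems.append(f"checkpoint dir not found: {ckpt!r}")
    src = config.get('source_video') or ''
    if not os.path.isfile(src):
        problems.append(f"source video not found: {src!r}")
    if (config.get('frame_num', 0) - 1) % 4 != 0:
        problems.append("frame_num should be 4n+1 (e.g. 81)")
    return problems
```

In the notebook it could run just before the load cell, e.g. `for p in preflight(CONFIG): print('⚠️', p)`.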


## Load VACE Model

In [4]:
model_name = CONFIG['model']
checkpoint_dir = CONFIG[f'checkpoint_dir_{model_name}']

print(f"Loading VACE-{model_name}...")
print(f"Checkpoint: {checkpoint_dir}")

# Select config
vace_config = t2v_14B if model_name == '14B' else t2v_1_3B

# Load model
vace_model = WanVace(
    config=vace_config,
    checkpoint_dir=checkpoint_dir,
    device_id=CONFIG['device_id'],
    rank=0,
    t5_fsdp=False,
    dit_fsdp=False,
    use_usp=False,
    t5_cpu=False
)

mem = torch.cuda.memory_allocated() / 1e9
print(f"\n✓ VACE-{model_name} loaded")
print(f"  GPU Memory: {mem:.2f} GB")

Loading VACE-14B...
Checkpoint: /workspace/wan2.1/Wan2.1/Wan2.1-VACE-14B



## Run Video Editing

In [None]:
print("="*80)
print("VACE VIDEO EDITING")
print("="*80)
print(f"Source: {os.path.basename(CONFIG['source_video'])}")
print(f"Model: VACE-{CONFIG['model']}")
print(f"Edit prompt: {CONFIG['edit_prompt']}")
print(f"Context scale: {CONFIG['context_scale']}")
print("="*80)

start_time = time.time()

# Step 1: Prepare source (load and encode video)
print("\nStep 1: Preparing source video...")
prepared_video, prepared_mask, prepared_refs = vace_model.prepare_source(
    src_video=[CONFIG['source_video']],
    src_mask=[None],  # Full-frame editing
    src_ref_images=[None],  # No reference images
    num_frames=CONFIG['frame_num'],
    image_size=(CONFIG['width'], CONFIG['height']),
    device=vace_model.device
)
print(f"✓ Source prepared - Video shape: {prepared_video[0].shape}")

# Step 2: Run VACE editing
print("\nStep 2: Running VACE editing...")
print(f"  Sampling steps: {CONFIG['sampling_steps']}")
print(f"  Guide scale: {CONFIG['guide_scale']}")
print(f"  Context scale: {CONFIG['context_scale']}")

edited_video = vace_model.generate(
    input_prompt=CONFIG['edit_prompt'],
    input_frames=prepared_video,
    input_masks=prepared_mask,
    input_ref_images=prepared_refs,
    size=(CONFIG['width'], CONFIG['height']),
    frame_num=CONFIG['frame_num'],
    context_scale=CONFIG['context_scale'],
    shift=CONFIG['shift'],
    sample_solver=CONFIG['sample_solver'],
    sampling_steps=CONFIG['sampling_steps'],
    guide_scale=CONFIG['guide_scale'],
    n_prompt=CONFIG['negative_prompt'],
    seed=CONFIG['seed'],
    offload_model=False
)

total_time = time.time() - start_time
print(f"\n✓ Editing complete - Output shape: {edited_video.shape}")
print(f"  Total time: {total_time:.2f}s")

# Step 3: Save
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_path = os.path.join(
    CONFIG['output_dir'],
    f"edited_{CONFIG['model']}_ctx{CONFIG['context_scale']}_{timestamp}.mp4"
)
cache_video(edited_video[None], save_file=output_path, fps=16, nrow=1,
            normalize=True, value_range=(-1, 1))

print(f"\n✓ Saved to: {output_path}")
print("="*80)
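As far as we can tell, `generate` returns a single (C, F, H, W) tensor with values in [-1, 1], and `[None]` adds the batch dimension that `cache_video` expects. The sketch below shows, for one value, what `normalize=True, value_range=(-1, 1)` amounts to under our reading of `cache_video`: clamp to the range, rescale to [0, 1], multiply by 255 and truncate. `to_uint8` is a hypothetical helper for illustration; verify the exact behavior against `wan/utils/utils.py`.

```python
def to_uint8(x, value_range=(-1.0, 1.0)):
    """Map one model-output value to an 8-bit pixel value.

    Assumption: mirrors cache_video(normalize=True, value_range=(-1, 1)):
    clamp, rescale to [0, 1], then multiply by 255 and truncate.
    """
    lo, hi = value_range
    x = min(max(x, lo), hi)                   # clamp out-of-range values
    return int((x - lo) / (hi - lo) * 255)    # rescale, then truncate toward zero


if __name__ == "__main__":
    print([to_uint8(v) for v in (-1.0, 0.0, 1.0, 1.5)])  # [0, 127, 255, 255]
```

The clamp matters in practice: decoded values slightly outside [-1, 1] are saturated rather than wrapping around when cast to 8 bits.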

## View Results

In [None]:
# Side-by-side comparison
# Browsers cannot load absolute filesystem paths such as /workspace/..., so embed
# both videos as base64 data URIs (the same idea as Video(..., embed=True) above).
import base64

def video_data_uri(path):
    with open(path, 'rb') as f:
        return 'data:video/mp4;base64,' + base64.b64encode(f.read()).decode('ascii')

source_uri = video_data_uri(CONFIG['source_video'])
output_uri = video_data_uri(output_path)

html_content = f'''
<style>
    .comparison {{
        display: grid;
        grid-template-columns: 1fr 1fr;
        gap: 20px;
        max-width: 1600px;
        margin: 0 auto;
    }}
    .video-box {{
        text-align: center;
        padding: 15px;
        border: 2px solid #ddd;
        border-radius: 10px;
        background: #f9f9f9;
    }}
    .video-box h3 {{
        margin-top: 0;
        color: #333;
    }}
    .controls {{
        grid-column: 1 / -1;
        text-align: center;
        padding: 20px;
        background: #e8f4f8;
        border-radius: 10px;
        margin-bottom: 20px;
    }}
    .btn {{
        padding: 12px 30px;
        margin: 5px;
        font-size: 16px;
        background: #4CAF50;
        color: white;
        border: none;
        border-radius: 5px;
        cursor: pointer;
    }}
    .btn:hover {{ background: #45a049; }}
    .info {{
        font-size: 12px;
        color: #666;
        margin-top: 10px;
    }}
</style>

<div class="comparison">
    <div class="controls">
        <h3>🎬 Synchronized Playback</h3>
        <button class="btn" onclick="playAll()">▶️ Play All</button>
        <button class="btn" onclick="pauseAll()">⏸️ Pause All</button>
        <button class="btn" onclick="restartAll()">⏮️ Restart</button>
    </div>
    
    <div class="video-box">
        <h3>📹 Original</h3>
        <video class="sync-video" width="800" controls>
            <source src="{source_uri}" type="video/mp4">
        </video>
        <div class="info">Source video</div>
    </div>
    
    <div class="video-box">
        <h3>🎨 Edited (VACE-{CONFIG['model']})</h3>
        <video class="sync-video" width="800" controls>
            <source src="{output_uri}" type="video/mp4">
        </video>
        <div class="info">
            Context scale: {CONFIG['context_scale']}<br>
            Time: {total_time:.1f}s<br>
            Prompt: {CONFIG['edit_prompt']}
        </div>
    </div>
</div>

<script>
function playAll() {{
    document.querySelectorAll('.sync-video').forEach(v => v.play());
}}
function pauseAll() {{
    document.querySelectorAll('.sync-video').forEach(v => v.pause());
}}
function restartAll() {{
    document.querySelectorAll('.sync-video').forEach(v => {{
        v.currentTime = 0;
        v.pause();
    }});
}}
</script>
'''

display(HTML(html_content))

## 💡 Tips for Better Results

### If the video isn't changing enough:
1. **Lower `context_scale`** (try 0.3 or 0.2)
2. **Increase `guide_scale`** (try 7.0 or 8.0)
3. Make your prompt **more specific and detailed**

### If the video is changing too much:
1. **Raise `context_scale`** (try 0.6 or 0.7)
2. **Decrease `guide_scale`** (try 3.0 or 4.0)

### Try different context scales:
- **0.2-0.3**: Maximum editing (might lose consistency)
- **0.4-0.5**: Strong editing (good balance) ⭐
- **0.6-0.7**: Subtle editing (preserves more original)
- **0.8-0.9**: Minimal editing (very conservative)
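Comparing several context scales is easiest as a small sweep. The sketch below assumes a hypothetical `run_fn` callable that takes a config dict and returns an output path, for example a wrapper around the `prepare_source`, `generate` and `cache_video` steps in the Run Video Editing cell. Each run works on its own copy of the config, so the base `CONFIG` is never mutated and the seed stays fixed, which means differences between outputs come from `context_scale` alone.

```python
import copy


def sweep_context_scale(run_fn, base_config, scales):
    """Run one edit per context_scale value and collect (scale, result) pairs.

    run_fn is a hypothetical callable: config dict -> output path (or any result).
    """
    results = []
    for scale in scales:
        cfg = copy.deepcopy(base_config)  # keep base_config (and its seed) untouched
        cfg['context_scale'] = scale
        results.append((scale, run_fn(cfg)))
    return results


if __name__ == "__main__":
    fake_run = lambda cfg: f"edited_ctx{cfg['context_scale']}.mp4"  # stand-in for real editing
    print(sweep_context_scale(fake_run, {'context_scale': 0.4, 'seed': 42}, [0.3, 0.5, 0.7]))
```

With the 14B model each run takes minutes, so a sweep of three or four values spanning the ranges above is usually enough to find a good setting.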

### Notes on VACE ([repo](https://github.com/ali-vilab/VACE)):
- VACE is well suited to **editing existing videos** while preserving their motion
- `context_scale` is the main control for edit strength in this notebook
- VACE also supports masked (regional) editing through the `src_mask` argument of `prepare_source`; this notebook passes `None` for full-frame editing

### Example prompts:
- "Two orange anthropomorphic cats in anime style boxing in a traditional Japanese dojo"
- "Cartoon style anthropomorphic cats with blue boxing gloves sparring in a gym"
- "Watercolor painting of two elegant cats dancing ballet in a theater"

## Clean Up Memory (Optional)

In [None]:
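# Optional: free GPU memory held by the model and intermediate tensors
import gc

for name in ('vace_model', 'prepared_video', 'prepared_mask', 'prepared_refs', 'edited_video'):
    globals().pop(name, None)  # skip names never created (e.g. after an interrupted cell)
gc.collect()
torch.cuda.empty_cache()
print("✓ GPU memory cleared")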