# GC-SEG Pipeline Notebook

This notebook demonstrates the complete GeometryCrafter Segmented (GC-SEG) pipeline for processing long videos.

## Pipeline Phases

1. **Phase 1**: Video metadata probing and segment mapping
2. **Phase 2**: Run AI inference on each segment
3. **Phase 3**: Extract disparity PNGs from NPZ segments
4. **Phase 4**: Compute scale/shift alignment between segments
5. **Phase 5**: Merge aligned frames with seamless blending
6. **Phase 6**: Apply bilateral filter and encode to video

In [1]:
# Import required modules
import os
from pathlib import Path

from gc_seg import (
    create_segment_mapping,
    process_video_segments,
    extract_disparity_frames,
    compute_alignment,
    merge_segments,
    process_video,
    Encoder,
)

print("All modules imported successfully!")

All modules imported successfully!


## Configuration

Set up your video path and parameters below.

In [None]:
# === CONFIGURATION ===
VIDEO_PATH = "workspace/input/AWO_clip2_V1-1520.mp4"  # Path to input video

# Segment parameters
WINDOW_SIZE = 80  # Frames per segment
OVERLAP = 20  # Overlap between segments

# Processing parameters
HEIGHT = 320  # Output height (must be divisible by 64)
WIDTH = 512  # Output width (must be divisible by 64)

# Round up to nearest multiple of 64 if needed
HEIGHT = ((HEIGHT + 63) // 64) * 64
WIDTH = ((WIDTH + 63) // 64) * 64
print(f"Processing resolution: {WIDTH}x{HEIGHT}")

MODEL_TYPE = "diff"  # 'diff' or 'determ'
NUM_INFERENCE_STEPS = 5
LOW_MEMORY_USAGE = True
DECODE_CHUNK_SIZE = 6

# Downsample ratio applied to the source video before inference.
# Example: for a 1920x1080 source and a target height of 320,
# downsample_ratio = 1080 / 320 = 3.375
# Note: the Phase 2 cell overwrites this value.
DOWNSAMPLE_RATIO = 1.0

# Output folders
WORKSPACE = Path("workspace")
SEGMENTS_FOLDER = WORKSPACE / "segments"
TEMP_FRAMES_FOLDER = WORKSPACE / "temp_frames"
MERGED_FRAMES_FOLDER = WORKSPACE / "merged_frames"
OUTPUT_PATH = WORKSPACE / "output.mp4"


# Alignment JSON path
ALIGNMENT_PATH = WORKSPACE / "alignment.json"

# Create workspace folder
WORKSPACE.mkdir(exist_ok=True)

# Phase 5: Merge options
BYPASS_ALIGNMENT = True  # Set True to skip scale/shift alignment
# if BYPASS_ALIGNMENT and not ALIGNMENT_PATH.exists():
#      ALIGNMENT_PATH.write_text('{"overlap": 25, "alignments": []}')
    
CLEAN_MERGED_OUTPUT = True  # Clean merged folder before processing

# Phase 6: Video encoding options
ENCODER = "x265"  # Options: "x265", "h264", "nvenc_h264", "nvenc_hevc"
# For NVENC: use preset "p1" (fastest) to "p7" (best quality), e.g., "p4"
# For CPU: use preset "ultrafast" to "veryslow", e.g., "medium"
ENCODER_PRESET = "medium"
BYPASS_BILATERAL = True
UPSCALE_RES = None  # Keep low res: 512x276
# UPSCALE_RES = (1912, 1036)  # Upscale to original video resolution

print(f"Video: {VIDEO_PATH}")
print(f"Window: {WINDOW_SIZE}, Overlap: {OVERLAP}")
print(f"Output: {OUTPUT_PATH}")

Processing resolution: 512x320
Video: workspace/input/AWO_clip2_V1-1520.mp4
Window: 80, Overlap: 20
Output: workspace\output.mp4


## Phase 1: Metadata & Segment Mapping

Probe video metadata and create segment mapping with specified window size and overlap.
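
The mapping arithmetic can be sketched in a few lines. This is a minimal illustration, assuming segments advance by `window_size - overlap` frames and the last segment is truncated at the end of the video. `plan_segments` is a hypothetical helper, not part of `gc_seg`, whose actual logic may differ.

```python
def plan_segments(num_frames, window_size, overlap):
    """Return inclusive (start_frame, end_frame) pairs covering the video.

    Hypothetical helper for illustration; not the gc_seg implementation.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be in [0, window_size)")
    stride = window_size - overlap  # frames advanced per segment
    segments = []
    start = 0
    while start < num_frames:
        # Clamp the last segment to the final frame of the video
        end = min(start + window_size, num_frames) - 1
        segments.append((start, end))
        if end == num_frames - 1:
            break
        start += stride
    return segments


print(plan_segments(113, 80, 20))
```

For a 113-frame clip with a window of 80 and an overlap of 20, this yields frames 0-79 and 60-112.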

In [3]:
# Phase 1: Create segment mapping
print("=" * 50)
print("PHASE 1: Metadata & Segment Mapping")
print("=" * 50)

mapping = create_segment_mapping(
    VIDEO_PATH,
    window_size=WINDOW_SIZE,
    overlap=OVERLAP,
)

print(f"\nVideo: {mapping.metadata.num_frames} frames @ {mapping.metadata.fps:.2f} FPS")
print(f"Resolution: {mapping.metadata.width}x{mapping.metadata.height}")
print(f"Created {len(mapping.segments)} segments:\n")

for i, seg in enumerate(mapping.segments):
    print(f"  Segment {i}: frames {seg.start_frame}-{seg.end_frame} ({seg.frame_count} frames)")

# Save mapping to JSON
mapping_json_path = WORKSPACE / "segment_mapping.json"
mapping.to_json(mapping_json_path)
print(f"\nMapping saved to: {mapping_json_path}")

PHASE 1: Metadata & Segment Mapping

Video: 113 frames @ 24.00 FPS
Resolution: 1912x1036
Created 2 segments:

  Segment 0: frames 0-79 (80 frames)
  Segment 1: frames 60-112 (53 frames)

Mapping saved to: workspace\segment_mapping.json


## Phase 2: Run AI Inference on Segments

Process each segment through the GeometryCrafter model to generate point maps.
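
The resize factor passed to inference can be reasoned about with a small sketch. It assumes the frame dimensions are divided by `downsample_ratio` and then floored to even values. How `process_video_segments` applies the ratio internally is not shown in this notebook, so treat that as an assumption. `fit_ratio` is a hypothetical helper.

```python
def fit_ratio(orig_w, orig_h, max_w, max_h):
    """Return (ratio, (new_w, new_h)) so the frame fits inside max_w x max_h.

    Hypothetical helper: assumes dimensions are divided by the ratio and
    then floored to an even number of pixels.
    """
    # The larger of the two ratios guarantees both sides fit the target box
    ratio = max(orig_w / max_w, orig_h / max_h)
    new_w = round(orig_w / ratio) // 2 * 2
    new_h = round(orig_h / ratio) // 2 * 2
    return ratio, (new_w, new_h)


ratio, size = fit_ratio(1912, 1036, 512, 320)
print(f"{ratio:.2f}", size)
```

For the 1912x1036 clip and a 512x320 target, this gives a 512x276 frame, consistent with the `512x276` noted in the configuration cell.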

In [None]:
# Reset segments folder (run this to clear npz files)
import shutil
if SEGMENTS_FOLDER.exists():
    shutil.rmtree(SEGMENTS_FOLDER)
SEGMENTS_FOLDER.mkdir(exist_ok=True)
print(f"Reset {SEGMENTS_FOLDER}")

In [None]:
# Phase 2: Run AI inference on each segment
print("=" * 50)
print("PHASE 2: AI Inference on Segments")
print("=" * 50)
# Check if video has odd dimensions and force resize if needed
orig_width = mapping.metadata.width
orig_height = mapping.metadata.height
print(f"Original video: {orig_width}x{orig_height}")
if orig_width % 2 != 0 or orig_height % 2 != 0:
    print("WARNING: Video has odd dimensions, forcing resize...")
    # Calculate downsample ratio to get to target while ensuring even dims
    ratio_w = orig_width / WIDTH
    ratio_h = orig_height / HEIGHT
    DOWNSAMPLE_RATIO = max(ratio_w, ratio_h)
    print(f"Using downsample_ratio: {DOWNSAMPLE_RATIO:.2f}")
else:
    DOWNSAMPLE_RATIO = None  # Let it process at original res if even
segment_paths = process_video_segments(
    video_path=VIDEO_PATH,
    segment_mapping=mapping,
    save_folder=str(SEGMENTS_FOLDER),
    height=HEIGHT,
    width=WIDTH,
    model_type=MODEL_TYPE,
    num_inference_steps=NUM_INFERENCE_STEPS,
    low_memory_usage=LOW_MEMORY_USAGE,
    decode_chunk_size=DECODE_CHUNK_SIZE,
    downsample_ratio=DOWNSAMPLE_RATIO,  # Force resize if odd
    keep_low_res=True,
)

print(f"\nProcessed {len(segment_paths)} segments:")
for p in segment_paths:
    print(f"  {p.name}")

## Phase 3: Extract Disparity PNGs

Extract the Z-channel (depth/disparity) from point maps and save as 16-bit PNGs.
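
A minimal sketch of the depth-to-disparity conversion, assuming `invert=True` means taking the reciprocal of depth before quantizing to 16 bits. `depth_to_disparity_u16` is a hypothetical helper, and the min-max normalization over a single array is an assumption; the range `extract_disparity_frames` actually uses is not shown in this notebook.

```python
import numpy as np


def depth_to_disparity_u16(z, invert=True, eps=1e-6):
    """Map a float depth map to a normalized 16-bit image.

    Hypothetical helper: with invert=True the depth is converted to
    disparity (1 / depth) before min-max normalization to 0..65535.
    """
    z = np.asarray(z, dtype=np.float64)
    # Guard against division by zero for invalid or zero depth values
    values = 1.0 / np.maximum(z, eps) if invert else z
    lo, hi = values.min(), values.max()
    scaled = (values - lo) / max(hi - lo, eps)
    return np.round(scaled * 65535).astype(np.uint16)


depth = np.array([[1.0, 2.0], [4.0, 8.0]])
print(depth_to_disparity_u16(depth))
```

With inversion, the nearest pixel (smallest depth) maps to 65535 and the farthest to 0.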

In [None]:
# Reset temp frames folder (run this to clear extracted PNGs without deleting NPZ segments)
import shutil
if TEMP_FRAMES_FOLDER.exists():
    shutil.rmtree(TEMP_FRAMES_FOLDER)
TEMP_FRAMES_FOLDER.mkdir(exist_ok=True)
print(f"Reset {TEMP_FRAMES_FOLDER}")

In [None]:
# Phase 3: Extract disparity frames
print("=" * 50)
print("PHASE 3: Extract Disparity PNGs")
print("=" * 50)

extract_disparity_frames(
    segments_folder=str(SEGMENTS_FOLDER),
    output_folder=str(TEMP_FRAMES_FOLDER),
    invert=True,
)

# Count extracted frames
total_frames = 0
for seg_dir in sorted(TEMP_FRAMES_FOLDER.glob("part_*")):
    num_frames = len(list(seg_dir.glob("*.png")))
    print(f"  {seg_dir.name}: {num_frames} frames")
    total_frames += num_frames

print(f"\nTotal extracted: {total_frames} frames")

## Phase 4: Compute Alignment

Calculate scale (s) and shift (t) coefficients to align overlapping frames between segments.
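
The fit itself is an ordinary least-squares problem. The sketch below assumes the later segment is mapped onto the earlier one, i.e. it finds `s` and `t` minimizing the squared error of `s * b + t - a` over the overlapping frames. The direction of the mapping and the helper name `fit_scale_shift` are assumptions, not the `compute_alignment` implementation.

```python
import numpy as np


def fit_scale_shift(source, target):
    """Least-squares s, t minimizing ||s * source + t - target||^2."""
    x = np.asarray(source, dtype=np.float64).ravel()
    y = np.asarray(target, dtype=np.float64).ravel()
    # Design matrix [x, 1] so the solution vector is (s, t)
    design = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(design, y, rcond=None)
    rmse = float(np.sqrt(np.mean((s * x + t - y) ** 2)))
    return float(s), float(t), rmse


rng = np.random.default_rng(0)
seg_a_overlap = rng.random((20, 4, 4))       # overlap frames from segment A
seg_b_overlap = (seg_a_overlap - 0.1) / 2.0  # same frames, other scale/shift
s, t, rmse = fit_scale_shift(seg_b_overlap, seg_a_overlap)
print(f"s={s:.3f}, t={t:.3f}, RMSE={rmse:.2e}")
```

On this synthetic data the fit recovers `s` close to 2 and `t` close to 0.1, since segment B was built as `(a - 0.1) / 2`.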

In [None]:
# Phase 4: Compute alignment
print("=" * 50)
print("PHASE 4: Compute Scale/Shift Alignment")
print("=" * 50)

if len(mapping.segments) < 2:
    print("Only one segment - no alignment needed!")
    alignments = []
else:
    alignments = compute_alignment(
        segments_folder=str(TEMP_FRAMES_FOLDER),
        segment_mapping=mapping,
        output_path=ALIGNMENT_PATH,
    )

    print(f"\nAlignment coefficients saved to: {ALIGNMENT_PATH}")
    print("\nSummary:")
    for a in alignments:
        print(f"  Segment {a.segment_a_index} -> {a.segment_b_index}: s={a.scale:.6f}, t={a.shift:.6f}, RMSE={a.rmse:.6f}")

## Phase 5: Merge & Blend Frames

Apply alignment transformations and blend overlapping regions to create seamless output.
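
A minimal sketch of the cross-fade across an overlap. It assumes `blend_sigma` acts as the steepness of a sigmoid centered on the middle of the overlap; the exact meaning of that parameter in `merge_segments` is not shown here. `blend_weights` and `blend` are hypothetical helpers.

```python
import math


def blend_weights(overlap, mode="sigmoid", sigma=6.0):
    """Weight of the incoming segment for each overlap frame (rising 0 -> 1).

    Hypothetical helper: `sigma` is treated as the sigmoid steepness.
    """
    weights = []
    for i in range(overlap):
        x = (i + 0.5) / overlap  # frame position inside the overlap, in (0, 1)
        if mode == "linear":
            w = x
        elif mode == "sigmoid":
            w = 1.0 / (1.0 + math.exp(-sigma * (2.0 * x - 1.0)))
        else:
            raise ValueError(f"unknown blend mode: {mode}")
        weights.append(w)
    return weights


def blend(prev_value, next_value, w):
    """Cross-fade two aligned disparity values (scalars or arrays)."""
    return (1.0 - w) * prev_value + w * next_value


w = blend_weights(20)
print(f"first={w[0]:.3f}, middle={w[10]:.3f}, last={w[-1]:.3f}")
```

Compared with the linear mode, the sigmoid keeps each segment nearly untouched at its own end of the overlap and concentrates the transition around the middle frames.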

In [None]:
# Reset merged frames folder (run this to clear previously merged PNGs)
import shutil
if MERGED_FRAMES_FOLDER.exists():
    shutil.rmtree(MERGED_FRAMES_FOLDER)
MERGED_FRAMES_FOLDER.mkdir(exist_ok=True)
print(f"Reset {MERGED_FRAMES_FOLDER}")

In [None]:
# Phase 5: Merge and blend frames
print("=" * 50)
print("PHASE 5: Merge & Blend Frames")
print("=" * 50)
merged_paths = merge_segments(
    segments_folder=str(TEMP_FRAMES_FOLDER),
    segment_mapping=mapping,
    alignment_path=str(ALIGNMENT_PATH),
    output_folder=str(MERGED_FRAMES_FOLDER),
    blend_mode="sigmoid",  # or "linear"
    blend_sigma=6.0,
    bypass_alignment=BYPASS_ALIGNMENT,  # Skip scale/shift if True
    clean_output=CLEAN_MERGED_OUTPUT,   # Clean old frames before merging
)

print(f"\nMerged {len(merged_paths)} frames to: {MERGED_FRAMES_FOLDER}")

## Phase 6: Post-Process & Encode

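Optionally apply a joint bilateral filter for edge-preserving smoothing (skipped when `BYPASS_BILATERAL` is `True`), then encode to 10-bit video with the configured encoder (x265 in this configuration).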
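
For reference, a 10-bit x265 encode of a PNG sequence can be expressed as an ffmpeg command. This sketch assumes encoding is delegated to the ffmpeg CLI and that frames follow a `frame_%05d.png` naming pattern; neither is confirmed by this notebook, and `build_x265_command` is a hypothetical helper rather than part of `gc_seg`.

```python
def build_x265_command(frame_pattern, fps, output_path,
                       preset="medium", crf=18):
    """Assemble an ffmpeg command line for 10-bit x265 encoding.

    Hypothetical helper; the frame naming pattern is an assumption.
    """
    return [
        "ffmpeg", "-y",                # overwrite the output if it exists
        "-framerate", str(fps),        # input frame rate of the PNG sequence
        "-i", frame_pattern,
        "-c:v", "libx265",
        "-preset", preset,
        "-crf", str(crf),
        "-pix_fmt", "yuv420p10le",     # 10-bit 4:2:0 output
        str(output_path),
    ]


cmd = build_x265_command("workspace/merged_frames/frame_%05d.png", 24,
                         "workspace/output.mp4")
print(" ".join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

The preset and CRF arguments mirror the `encoder_preset` and `encoder_crf` values passed to `process_video` in the next cell.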

In [6]:
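# Reload the post-processor so local edits to gc_seg.post_processor are
# picked up without restarting the kernel. Names already bound with
# `from gc_seg import ...` still refer to the pre-reload objects.
import importlib
import gc_seg.post_processor
importlib.reload(gc_seg.post_processor)
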
# Phase 6: Post-process and encode
print("=" * 50)
print("PHASE 6: Post-Process & Encode")
print("=" * 50)
print(f"Bypass bilateral: {BYPASS_BILATERAL}")

output_video = process_video(
    merged_frames_folder=str(MERGED_FRAMES_FOLDER),
    original_video=VIDEO_PATH,
    output_path=str(OUTPUT_PATH),
    bilateral_d=9,
    bilateral_sigma_color=0.1,
    bilateral_sigma_space=0.1,
    encoder=ENCODER,
    encoder_preset=ENCODER_PRESET,
    encoder_crf=18,
    bitdepth=10,
    bypass_bilateral=BYPASS_BILATERAL,
    upscale_resolution=UPSCALE_RES,
)

print(f"\n{'=' * 50}")
print("Pipeline complete!")
print(f"Output: {output_video}")
print(f"{'=' * 50}")

PHASE 6: Post-Process & Encode
Bypass bilateral: False
Applying joint bilateral filter to 113 frames...
  Processed 50/113 frames
  Processed 100/113 frames
Saving filtered frames to workspace\filtered_frames
Upscaling frames to 1912x1036...
Upscaled 113 frames
Encoded video to workspace\output.mp4

Pipeline complete!
Output: workspace\output.mp4


## Summary

The GC-SEG pipeline has completed all 6 phases:

1. **Metadata & Mapping**: Created segment mapping
2. **AI Inference**: Generated point maps for each segment
3. **Extraction**: Converted to 16-bit disparity PNGs
4. **Alignment**: Computed scale/shift coefficients
5. **Merge & Blend**: Created seamless frame sequence
6. **Post-Process**: Applied bilateral filter and encoded to video

Output video saved to: `workspace/output.mp4`