# AV Workshop: Cosmos Transfer 2.5
**Authors:** Aiden Chang, Akul Santhosh


This notebook is a hands-on guide for Milestone data. The goal is for you to understand, create, and use the multi-control modalities that power Cosmos Transfer 2.5 (CT 2.5).

In [4]:
!huggingface-cli login --token "YOUR TOKEN HERE"


The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `hf`CLI if you want to set the git credential as well.
Token is valid (permission: read).
The token `read_token` has been saved to /home/nvidia/.cache/huggingface/stored_tokens
Your token has been saved to /home/nvidia/.cache/huggingface/token
Login successful.
The current active token is: `read_token`


In [1]:
import os
os.makedirs("prompts", exist_ok=True)
os.makedirs("outputs", exist_ok=True)
os.makedirs("control_modalities", exist_ok=True)

## 1. Augmenting real AV data

### Control Modalities

We start with the following control modalities:

| Original Video | Edge | Seg | Depth | Vis |
|----------|----------|----------|----------|----------|
| <video src="av_data/output_fixed.mp4" controls width="300"></video> | <video src="av_data/0_edge.mp4" controls width="300"></video> | <video src="av_data/0_seg.mp4" controls width="300"></video> | <video src="av_data/0_depth.mp4" controls width="300"></video> | <video src="av_data/0_vis.mp4" controls width="300"></video> |


### Recipe 


| Task | Suggested Controls & Settings| Example Results | Prompt |
|--|--|---|-|
|Original Video| N/A | <video src="av_data/output_fixed.mp4" width="300" controls></video> | N/A |
|Fog|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_fog_3_5_0_f_0_10.mp4" width="300" controls></video> | [Prompt Location](av_data/clip_0_easier_prompts/fog.txt) |
|Morning Sunlight|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/av_assets/av_realistic_morning_sun_3_10_0_f_0_9.mp4" width="300" controls></video> | [Prompt Location](av_data/clip_0_easier_prompts/morning_sun.txt) |
|Night|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_night_3_5_0_f_0_10.mp4" width="300" controls></video> | [Prompt Location](av_data/clip_0_easier_prompts/night.txt) |
|Rain|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_rain_3_9_0_f_0_10.mp4" width="300" controls></video> | [Prompt Location](simulation_data/rain.txt) |
|No Snow|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/av_assets/av_realistic_no_snow_3_10_0_f_0_9.mp4" width="300" controls></video> | [Prompt Location](av_data/clip_0_easier_prompts/no_snow.txt) |
|Wooden Road|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.2, 'depth': 0.0}`| <video src="assets/av_assets/av_realistic_wooden_road_3_10_0_f_2_0.mp4" width="300" controls></video> | [Prompt Location](av_data/clip_0_easier_prompts/wooden_road.txt) |
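
The example file names in these tables appear to encode the settings used for each clip, in the order guidance, edge, seg, seg_mask, vis, depth, with weights written in tenths. The sketch below reproduces that tag from a settings dict. The layout is inferred from the file names in this notebook rather than from any documented convention, and `settings_tag` is a hypothetical helper name.

```python
# Assumption: the tag layout (guidance_edge_seg_segmask_vis_depth) is inferred
# from the example file names in this notebook; it is not an official format.

def settings_tag(settings: dict) -> str:
    """Encode a control-settings dict as a short tag such as '3_5_0_f_0_10'."""
    def tenths(weight: float) -> str:
        # Weights are written in tenths: 0.5 -> "5", 1.0 -> "10".
        return str(int(round(weight * 10)))

    return "_".join([
        str(int(settings["guidance"])),
        tenths(settings["edge"]),
        tenths(settings["seg"]),
        # Only "f" appears in the example file names; "t" for True is a guess.
        "t" if settings["seg_mask"] else "f",
        tenths(settings["vis"]),
        tenths(settings["depth"]),
    ])

fog = {'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}
wooden_road = {'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.2, 'depth': 0.0}

print(settings_tag(fog))          # 3_5_0_f_0_10
print(settings_tag(wooden_road))  # 3_10_0_f_2_0
```

Reading the tag this way makes it easier to check that a row's settings match the clip it links to.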


### 1.1 Different Fog Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_fog_3_5_0_f_0_10.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_fog_7_5_0_f_0_10-2.mp4" width="300" controls></video> |



### 1.2 Different Morning Sunlight Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_morning_sun_3_9_0_f_0_10-2.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_morning_sun_3_5_0_f_0_10.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/av_assets/av_realistic_morning_sun_3_10_0_f_0_9.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_morning_sun_7_9_0_f_0_10.mp4" width="300" controls></video> |

### 1.3 Different Night Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_night_3_5_0_f_0_10.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_night_3_9_0_f_0_10.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_night_7_5_0_f_0_10.mp4" width="300" controls></video> |

### 1.4 Different Rain Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_rain_3_9_0_f_0_10.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/av_assets/av_realistic_rain_7_10_0_f_0_9.mp4" width="300" controls></video> |


### 1.5 Different No Snow Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.0}`| <video src="assets/av_assets/av_realistic_no_snow_3_10_0_f_0_0.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/av_assets/av_realistic_no_snow_3_10_0_f_0_9.mp4" width="300" controls></video> |


### 1.6 Different Wooden Road Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.2, 'depth': 0.0}`| <video src="assets/av_assets/av_realistic_wooden_road_3_10_0_f_2_0.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 0.6, 'seg': 0.4, 'seg_mask': False, 'vis': 0.0, 'depth': 0.0}`| <video src="assets/av_assets/av_realistic_wooden_road_7_6_4_f_0_0.mp4" width="300" controls></video> |
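
The variations in sections 1.1 to 1.6 mostly change one or two settings (guidance, edge, depth) while holding the rest fixed. A minimal sketch for enumerating such variations is shown below. The `sweep` helper is hypothetical and only builds settings dicts; it does not run CT 2.5. The baseline is the Fog recipe, and the grid values are ones that appear in the tables above.

```python
from itertools import product

# Baseline taken from the Fog recipe in this notebook.
BASE = {'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}

def sweep(base: dict, **axes) -> list:
    """Return one settings dict per combination of the given axis values."""
    names = list(axes)
    variants = []
    for values in product(*(axes[name] for name in names)):
        variant = dict(base)                 # copy so the baseline stays untouched
        variant.update(zip(names, values))   # override only the swept keys
        variants.append(variant)
    return variants

# Hypothetical grid: two guidance values crossed with two edge weights.
variants = sweep(BASE, guidance=[3.0, 7.0], edge=[0.5, 0.9])
for v in variants:
    print(v)
```

Each resulting dict can then be paired with a prompt file and passed to whatever inference entry point you use.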

## 2. Generating Realistic Data from Omniverse

An important robotics workflow is "Sim-to-Real." NVIDIA Omniverse can generate synthetic data, but we can use CT 2.5 to add real-world domain randomization (new lighting, textures, backgrounds) and generate photorealistic scenes.

The Workflow:
1. Generate in Omniverse: Create a base scenario (e.g., cars driving around) and export the video.
2. Extract Ground Truth: From Omniverse, also export the perfect ground-truth modalities (Depth, Segmentation, Edge).
3. Augment with CT 2.5: Use these perfect synthetic controls to run CT 2.5 with a new prompt (e.g., "on a dimly lit snowy day").
4. Package with Cosmos Writer: Save the new, augmented video alongside the original ground-truth controls. This teaches a downstream model to associate the ground-truth controls with the new, realistic style.
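
Step 4 amounts to keeping each augmented video linked to the controls and prompt that produced it. The sketch below records that pairing in a small JSON manifest. The manifest layout and the `build_manifest` helper are hypothetical illustrations, not the Cosmos Writer format; the file paths are the ones used in this section's tables.

```python
import json
from pathlib import Path

def build_manifest(augmented_video: str, controls: dict, prompt_file: str) -> dict:
    """Pair one augmented video with the ground-truth controls that produced it."""
    return {
        "augmented_video": augmented_video,
        "prompt_file": prompt_file,
        "controls": dict(controls),  # copy so later edits do not leak back
    }

manifest = build_manifest(
    augmented_video="assets/av_assets/omniverse_generations_av_fog_3_0_0_f_0_10.mp4",
    controls={
        "edge": "simulation_data/edge.mp4",
        "seg": "simulation_data/seg.mp4",
        "depth": "simulation_data/depth.mp4",
    },
    prompt_file="simulation_data/fog.txt",
)

# Hypothetical output location; "outputs" matches the directory created earlier.
out_path = Path("outputs") / "fog_manifest.json"
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(json.dumps(manifest, indent=2))
```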


### Omniverse Control Modalities

We start with the following control modalities:

| Original Video | Edge | Seg | Depth |
|----------|----------|----------|----------|
| <video src="simulation_data/simulator_rgb_input.mp4" controls width="300"></video> | <video src="simulation_data/edge.mp4" controls width="300"></video> | <video src="simulation_data/seg.mp4" controls width="300"></video> | <video src="simulation_data/depth.mp4" controls width="300"></video> |


### Recipe 


| Task | Suggested Controls & Settings| Example Results | Prompt |
|--|--|---|-|
|Original Video| N/A | <video src="simulation_data/simulator_rgb_input.mp4" controls width="300"></video> | N/A |
|Fog|`{'guidance': 3.0, 'edge': 0.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/omniverse_generations_av_fog_3_0_0_f_0_10.mp4" width="300" controls></video> | [Prompt Location](simulation_data/fog.txt) |
|Morning Sunlight|`{'guidance': 3.0, 'edge': 0.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/omniverse_generations_av_morning_sun_3_0_0_f_0_10.mp4" width="300" controls></video> | [Prompt Location](simulation_data/morning_sun.txt) |


## 3. Prompt Generator for Scene Conditions
This module provides a configurable system for automatically generating natural-language prompts based on selected environmental, weather, and road-surface conditions. It is designed for data generation, augmentation workflows, or any pipeline where you want consistent, high-quality scene descriptions without manually rewriting prompts.

#### How It Works

The system uses:
- A SceneConfig dataclass
- Three condition dictionaries:
    - ENV_LIGHTING
    - WEATHER
    - ROAD_SURFACE
- A single function: generate_prompt(config)

It takes your base scene, inserts the selected conditions, and returns a polished final prompt. 

#### Code Structure:

```python
from dataclasses import dataclass
from typing import Optional, List

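# Each dictionary maps a short key (e.g. "sunrise", "fog", "wooden") to a
# descriptive phrase. The entries are elided in this excerpt.
ENV_LIGHTING = { ... }
WEATHER = { ... }
ROAD_SURFACE = { ... }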

@dataclass
class SceneConfig:
    base_scene: str
    env_lighting: Optional[str] = None
    weather: Optional[str] = None
    road_surface: Optional[str] = None
    extra_tags: Optional[List[str]] = None

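def generate_prompt(config: SceneConfig) -> str:
    """Build a prompt from the base scene plus any selected conditions."""
    # Note: extra_tags is not used in this excerpt.
    parts = [config.base_scene.strip()]
    # Each condition is optional; unset ones are skipped.
    if config.env_lighting:
        parts.append(f"The scene is {ENV_LIGHTING[config.env_lighting]}.")
    if config.weather:
        parts.append(WEATHER[config.weather])
    if config.road_surface:
        parts.append(ROAD_SURFACE[config.road_surface])
    parts.append("All visual elements should be consistent with these conditions.")
    # Drop empty strings and join the sentences with single spaces.
    return " ".join(p for p in parts if p)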
```


You can find the full codebase at [src/prompt_generation.py](src/prompt_generation.py)

#### Example:

```python
config = SceneConfig(
    base_scene="A busy urban intersection with multiple vehicles.",
    env_lighting="sunrise",
    weather="fog",
    road_surface="wooden"
)

print(generate_prompt(config))
```

Output (a single line, since the parts are joined with spaces; wrapped here for readability):
```
A busy urban intersection with multiple vehicles.
The scene is bathed in warm morning light.
A layer of fog softens distant structures.
The road surface is made of wooden planks.
All visual elements should be consistent with these conditions.
```
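
The code structure shown earlier elides the dictionary contents, so it will not run as printed. Below is a minimal self-contained sketch that fills in only the three entries implied by the example output (`sunrise`, `fog`, `wooden`). The actual dictionaries in `src/prompt_generation.py` presumably contain more entries, and their exact wording is not shown in this notebook.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: only the entries implied by the example output are filled in.
ENV_LIGHTING = {"sunrise": "bathed in warm morning light"}
WEATHER = {"fog": "A layer of fog softens distant structures."}
ROAD_SURFACE = {"wooden": "The road surface is made of wooden planks."}

@dataclass
class SceneConfig:
    base_scene: str
    env_lighting: Optional[str] = None
    weather: Optional[str] = None
    road_surface: Optional[str] = None
    extra_tags: Optional[List[str]] = None  # not consumed by this sketch

def generate_prompt(config: SceneConfig) -> str:
    """Join the base scene and any selected conditions into one prompt."""
    parts = [config.base_scene.strip()]
    if config.env_lighting:
        parts.append(f"The scene is {ENV_LIGHTING[config.env_lighting]}.")
    if config.weather:
        parts.append(WEATHER[config.weather])
    if config.road_surface:
        parts.append(ROAD_SURFACE[config.road_surface])
    parts.append("All visual elements should be consistent with these conditions.")
    return " ".join(p for p in parts if p)

config = SceneConfig(
    base_scene="A busy urban intersection with multiple vehicles.",
    env_lighting="sunrise",
    weather="fog",
    road_surface="wooden",
)
prompt = generate_prompt(config)
print(prompt)

# Conditions left as None are skipped.
print(generate_prompt(SceneConfig(base_scene="An empty highway.")))
```

An unknown key (for example `weather="hail"` in this sketch) raises `KeyError`, which surfaces typos early instead of silently dropping a condition.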

## 4. Additional Recipes
Didn't find what you were looking for? There are many more examples in the [Cosmos Cookbook](https://nvidia-cosmos.github.io/cosmos-cookbook/).