# AV Workshop: Cosmos Transfer 2.5
**Authors:** Aiden Chang, Akul Santhosh


This notebook is a hands-on guide for Milestone data. The goal is for you to understand, create, and use the multi-control modalities that power Cosmos Transfer 2.5 (CT 2.5).

In [4]:
!huggingface-cli login --token "YOUR TOKEN HERE"


The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `hf`CLI if you want to set the git credential as well.
Token is valid (permission: read).
The token `read_token` has been saved to /home/nvidia/.cache/huggingface/stored_tokens
Your token has been saved to /home/nvidia/.cache/huggingface/token
Login successful.
The current active token is: `read_token`


In [1]:
import os
os.makedirs("prompts", exist_ok=True)
os.makedirs("outputs", exist_ok=True)
os.makedirs("control_modalities", exist_ok=True)

## 1. Augmenting real AV data

### Control Modalities

We start with the following control modalities:

| Original Video | Edge | Seg | Depth | Vis |
|----------|----------|----------|----------|----------|
| <video src="av_data/output_fixed.mp4" controls width="300"></video> | <video src="av_data/0_edge.mp4" controls width="300"></video> | <video src="av_data/0_seg.mp4" controls width="300"></video> | <video src="av_data/0_depth.mp4" controls width="300"></video> | <video src="av_data/0_vis.mp4" controls width="300"></video> |


### Recipe 

| Task | Suggested Controls & Settings| Example Results | Prompt |
|--|--|---|-|
|Original Video| N/A | <video src="av_data/output_fixed.mp4" width="300" controls></video> | N/A |
|Fog|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_fog_3_5_0_f_0_10.mp4" width="300" controls></video> | [Prompt Location](av_data/clip_0_easier_prompts/fog.txt) |
|Morning Sunlight|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/av_assets/av_realistic_morning_sun_3_10_0_f_0_9.mp4" width="300" controls></video> | [Prompt Location](av_data/clip_0_easier_prompts/morning_sun.txt) |
|Night|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_night_3_5_0_f_0_10.mp4" width="300" controls></video> |[Prompt Location](av_data/clip_0_easier_prompts/night.txt) |
|Rain|`{'guidance': 7.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_rain_3_9_0_f_0_10.mp4" width="300" controls></video> | [Prompt Location](simulation_data/rain.txt) |
|No Snow|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/av_assets/av_realistic_no_snow_3_10_0_f_0_9.mp4" width="300" controls></video> | [Prompt Location](av_data/clip_0_easier_prompts/no_snow.txt) |
|Wooden Road|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.2, 'depth': 0.0}`| <video src="assets/av_assets/av_realistic_wooden_road_3_10_0_f_2_0.mp4" width="300" controls></video> | [Prompt Location](av_data/clip_0_easier_prompts/wooden_road.txt) |
|Small Car|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.4, 'seg_mask': True, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_vehicle_small_car_3_5_4_t_0_10.mp4" width="300" controls></video> | [Prompt Location](av_data/clip_0_easier_prompts/small_car.txt) |
|Van|`{'guidance': 7.0, 'edge': 0.5, 'seg': 0.8, 'seg_mask': True, 'vis': 0.0, 'depth': 0.5}`| <video src="assets/av_assets/av_vehicle_van_7_5_8_t_0_5.mp4" width="300" controls></video> | [Prompt Location](av_data/clip_0_easier_prompts/van.txt) |





| Task | Suggested Controls & Settings| Example Results | Prompt |
|--|--|---|-|
|Original Video| N/A | <video src="assets/other_av_assets/clip_2_2.mp4" width="300" controls></video> | N/A |
|Color Change|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.2, 'depth': 0.0}`| <video src="assets/av_assets/clip_0_color_change.mp4" width="300" controls></video> | [Prompt Location](av_data/color_change.txt) |
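
The recipe dictionaries in the tables above use a flat form (`'edge': 0.5`), while the job files written in Section 2 nest each weight under `control_weight` and leave unused controls out. A minimal sketch of that conversion follows. The names `recipe_to_spec`, `CONTROL_KEYS`, and `fog_example` are hypothetical, and dropping zero-weight controls is an assumption based on how the Section 2 specs are written. `seg_mask` is not carried over because this notebook does not show how it is expressed in a job file.

```python
import json

# Control names used in the recipe tables (hypothetical helper constant).
CONTROL_KEYS = ("edge", "seg", "vis", "depth")


def recipe_to_spec(name, prompt_path, video_path, recipe):
    """Turn a flat recipe dict into the nested job-spec form used in Section 2."""
    spec = {
        "name": name,
        "prompt_path": prompt_path,
        "video_path": video_path,
        "guidance": recipe["guidance"],
    }
    for key in CONTROL_KEYS:
        weight = recipe.get(key, 0.0)
        # Assumption: a control with weight 0 is simply omitted from the spec.
        if weight > 0:
            spec[key] = {"control_weight": weight}
    return spec


fog_recipe = {'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}
fog_spec = recipe_to_spec(
    "fog_example",  # hypothetical job name
    "../av_data/clip_0_easier_prompts/fog.txt",
    "../av_data/output_fixed.mp4",
    fog_recipe,
)
print(json.dumps(fog_spec))
```

Each spec produced this way can be written as one line of a `.jsonl` file, as the Section 2 cells do.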

### 1.1 Different Fog Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 7.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_fog_7_5_0_f_0_10-2.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_fog_3_5_0_f_0_10.mp4" width="300" controls></video> |



### 1.2 Different Morning Sunlight Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_morning_sun_3_9_0_f_0_10-2.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_morning_sun_3_5_0_f_0_10.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/av_assets/av_realistic_morning_sun_3_10_0_f_0_9.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_morning_sun_7_9_0_f_0_10.mp4" width="300" controls></video> |

### 1.3 Different Night Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_night_3_5_0_f_0_10.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_night_3_9_0_f_0_10.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_night_7_5_0_f_0_10.mp4" width="300" controls></video> |

### 1.4 Different Rain Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/av_realistic_rain_3_9_0_f_0_10.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/av_assets/av_realistic_rain_7_10_0_f_0_9.mp4" width="300" controls></video> |


### 1.5 Different No Snow Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.0}`| <video src="assets/av_assets/av_realistic_no_snow_3_10_0_f_0_0.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/av_assets/av_realistic_no_snow_3_10_0_f_0_9.mp4" width="300" controls></video> |


### 1.6 Different Wooden Road Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.2, 'depth': 0.0}`| <video src="assets/av_assets/av_realistic_wooden_road_3_10_0_f_2_0.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 0.6, 'seg': 0.4, 'seg_mask': False, 'vis': 0.0, 'depth': 0.0}`| <video src="assets/av_assets/av_realistic_wooden_road_7_6_4_f_0_0.mp4" width="300" controls></video> |
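
The example video filenames in this section appear to encode their settings as a suffix of the form `guidance_edge_seg_segmask_vis_depth`, with weights written in tenths and the mask flag as `t` or `f`. This convention is inferred from the tables, not documented anywhere in the notebook, so treat the sketch below as an assumption. The helper name `settings_from_filename` is hypothetical.

```python
import re

# Assumed suffix layout: _<guidance>_<edge>_<seg>_<t|f>_<vis>_<depth>[-N].mp4
_SUFFIX = re.compile(r"_(\d+)_(\d+)_(\d+)_([tf])_(\d+)_(\d+)(?:-\d+)?\.mp4$")


def settings_from_filename(path):
    """Decode the settings suffix of an example video filename."""
    match = _SUFFIX.search(path)
    if match is None:
        raise ValueError(f"no settings suffix found in {path!r}")
    guidance, edge, seg, mask, vis, depth = match.groups()
    return {
        "guidance": float(guidance),
        "edge": int(edge) / 10,   # weights are stored in tenths
        "seg": int(seg) / 10,
        "seg_mask": mask == "t",
        "vis": int(vis) / 10,
        "depth": int(depth) / 10,
    }


van = settings_from_filename("assets/av_assets/av_vehicle_van_7_5_8_t_0_5.mp4")
print(van)
```

Decoding a filename this way is a quick check that a table row and its video were generated with the same settings.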

## 2. Hands-On Examples

We will be generating these examples using some of the AV data. 


| Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 |
|----------|----------|----------|----------|
| <video src="assets/other_av_assets/clip_2_1.mp4" controls width="300"></video> | <video src="assets/other_av_assets/clip_2_2.mp4" controls width="300"></video> | <video src="assets/other_av_assets/clip_3_1.mp4" controls width="300"></video> | <video src="assets/other_av_assets/clip_4_1.mp4" controls width="300"></video> |


### 2.1 Generating Snow

We will use the recommended recipe from section 1 to generate a snow augmentation. Feel free to tweak the parameters and prompt. 

Original Video:

<video src="assets/other_av_assets/clip_2_1.mp4" controls width="600"></video>

In [None]:
import os, json

## Recipe
prompt = '''
This video clip features an overhead, slightly elevated perspective capturing a complex, busy intersection controlled by traffic lights and multiple turning lanes during a snowy winter day. Various vehicles traverse the snow-dusted, icy pavement: a large black pickup truck drives across the foreground, quickly followed by a silver/gray pickup truck making a turn from the left, a dark blue/gray SUV proceeding straight through, and finally a white sedan crossing the frame. The intersection is framed on the left by snow-covered commercial buildings, but the majority of the background is dominated by a moderately steep, heavily wooded hillside completely blanketed in deep, untouched snow, emphasizing a stark, cold atmosphere under a gray, overcast sky.
'''

snow_augmentation = {
        "name": "ex_1_snow_augmentation", 
        "prompt_path": "../assets/other_av_assets/clip_1_1_snow_augment.txt", 
        "video_path": "../assets/other_av_assets/clip_2_1.mp4", 
        "guidance": 7, 
        "edge": {"control_weight": 0.5},
        "depth": {"control_weight": 1.0}
    }

dest_prompt_path = "assets/other_av_assets/clip_1_1_snow_augment.txt"
dest_script_path = "scripts/clip_1_1_snow_augment.jsonl"

os.makedirs(os.path.dirname(dest_prompt_path), exist_ok=True)
os.makedirs(os.path.dirname(dest_script_path), exist_ok=True)

with open(dest_prompt_path, "w", encoding="utf-8") as f:
    f.write(prompt.strip() + "\n")

with open(dest_script_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(snow_augmentation) + "\n")

print("Files written successfully!")



Files written successfully!
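
Every example in this section repeats the same steps: create the directories, write the prompt, and write a one-line JSONL job file. A sketch of a reusable helper is below. The name `write_job` and the demo values are hypothetical. It assumes, based on the `../` prefixes used in this notebook, that paths inside the job file are resolved relative to the job file's own directory, so it derives `prompt_path` with `os.path.relpath`. The demo writes into a temporary directory so it does not touch the workshop files.

```python
import json
import os
import tempfile


def write_job(spec, prompt, prompt_path, script_path):
    """Write the prompt text and a one-line JSONL job file; return the spec written."""
    script_dir = os.path.dirname(script_path) or "."
    os.makedirs(os.path.dirname(prompt_path) or ".", exist_ok=True)
    os.makedirs(script_dir, exist_ok=True)

    job = dict(spec)
    # Assumption: the job file refers to the prompt relative to its own location.
    job["prompt_path"] = os.path.relpath(prompt_path, script_dir)

    with open(prompt_path, "w", encoding="utf-8") as f:
        f.write(prompt.strip() + "\n")
    with open(script_path, "w", encoding="utf-8") as f:
        f.write(json.dumps(job) + "\n")
    return job


with tempfile.TemporaryDirectory() as root:
    script_path = os.path.join(root, "scripts", "demo_job.jsonl")
    job = write_job(
        {
            "name": "demo_job",  # hypothetical job name
            "video_path": "../assets/other_av_assets/clip_2_1.mp4",
            "guidance": 7,
            "edge": {"control_weight": 0.5},
            "depth": {"control_weight": 1.0},
        },
        prompt="A snowy intersection under an overcast sky.",
        prompt_path=os.path.join(root, "assets", "demo_prompt.txt"),
        script_path=script_path,
    )
    with open(script_path, encoding="utf-8") as f:
        written = json.loads(f.readline())

print(job["prompt_path"])
```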


In [None]:
# Run CT2.5 
# WARNING: This should take a couple of minutes
# !python cosmos_transfer2_5/examples/inference.py -i scripts/clip_1_1_snow_augment.jsonl -o outputs/clip_1_1_snow_augment 
!torchrun --nproc_per_node=8 --master_port=12341 cosmos_transfer2_5/examples/inference.py -i scripts/clip_1_1_snow_augment.jsonl -o outputs/clip_1_1_snow_augment --disable-guardrails

In [1]:
from IPython.display import HTML

HTML(f"""
<video width="600" controls>
  <source src="outputs/clip_1_1_snow_augment/ex_1_snow_augmentation.mp4" type="video/mp4">
</video>
""")

### 2.2 Generating Morning Sunlight

We will use the recommended recipe from section 1 to generate a lighting augmentation. Feel free to tweak the parameters and prompt. 

Original Video:

<video src="assets/other_av_assets/clip_2_2.mp4" controls width="600"></video>

In [12]:
## Recipe
prompt = '''
This video clip presents an overhead, slightly elevated view capturing a multi-lane intersection with active traffic lights, basking in the bright, warm glow of morning sunlight. Various modern SUVs and vans are moving through the intersection on dry pavement: a dark blue/gray Ford Edge crosses first, followed by a white Jeep Grand Cherokee, which is quickly succeeded by a white U.S. Postal Service (USPS) delivery van, and finally, a black Kia Carnival minivan. The scene is framed on the left by commercial buildings that catch the morning light, while the background is dominated by a moderately steep, dry wooded hillside, now illuminated by the gentle, golden rays of the early sun, giving the entire urban landscape a crisp, well-lit appearance.
'''

lighting_augmentation = {
    "name": "ex_2_lighting_augmentation",
    "prompt_path": "../assets/other_av_assets/clip_1_2_lighting_augment.txt",
    "video_path": "../assets/other_av_assets/clip_2_2.mp4",
    "guidance": 3,
    "edge": {"control_weight": 1.0},
    "depth": {"control_weight": 0.9}
}

dest_prompt_path = "assets/other_av_assets/clip_1_2_lighting_augment.txt"
dest_script_path = "scripts/clip_1_2_lighting_augment.jsonl"


os.makedirs(os.path.dirname(dest_prompt_path), exist_ok=True)
os.makedirs(os.path.dirname(dest_script_path), exist_ok=True)

with open(dest_prompt_path, "w", encoding="utf-8") as f:
    f.write(prompt.strip() + "\n")

with open(dest_script_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(lighting_augmentation) + "\n")

print("Files written successfully!")

Files written successfully!


In [None]:
# Run CT2.5 
# WARNING: This should take a couple of minutes
# !python cosmos_transfer2_5/examples/inference.py -i scripts/clip_1_2_lighting_augment.jsonl -o outputs/clip_1_2_lighting_augment 
!torchrun --nproc_per_node=8 --master_port=12341 cosmos_transfer2_5/examples/inference.py -i scripts/clip_1_2_lighting_augment.jsonl -o outputs/clip_1_2_lighting_augment --disable-guardrails

In [14]:
HTML(f"""
<video width="600" controls>
  <source src="outputs/clip_1_2_lighting_augment/ex_2_lighting_augmentation.mp4" type="video/mp4">
</video>
""")

### 2.3 Generating Night Conditions

We will use the recommended recipe from section 1 to generate a night augmentation. Feel free to tweak the parameters and prompt. 

Original Video:

<video src="assets/other_av_assets/clip_3_1.mp4" controls width="600"></video>

In [16]:
## Recipe
prompt = '''
This video clip captures a low-angle, eye-level view of heavy, slow-moving traffic stopped at a red light on a multi-lane highway on-ramp or service road, set dramatically at night. The road surface is wet, reflecting the harsh glare of streetlights, headlights, and the red glow of the traffic signal. The foreground is dominated by a tight queue of vehicles, including a dark pickup truck towing a trailer of equipment, a large semi-trailer truck, and several passenger cars. In the right lane of the main highway, a dark SUV is seen driving past the stopped traffic. The background consists of dark, indistinct commercial areas and a parking lot full of cars on the left, with the entire scene lit only by artificial lights against a dark, night sky, creating a contrast between the bright light sources and the deep shadows.
'''

night_augmentation = {
    "name": "ex_3_night_augmentation",
    "prompt_path": "../assets/other_av_assets/clip_3_1_night_augment.txt",
    "video_path": "../assets/other_av_assets/clip_3_1.mp4",
    "guidance": 3,
    "edge": {"control_weight": 0.5},
    "depth": {"control_weight": 1.0}
}


dest_prompt_path = "assets/other_av_assets/clip_3_1_night_augment.txt"
dest_script_path = "scripts/clip_3_1_night_augment.jsonl"


os.makedirs(os.path.dirname(dest_prompt_path), exist_ok=True)
os.makedirs(os.path.dirname(dest_script_path), exist_ok=True)

with open(dest_prompt_path, "w", encoding="utf-8") as f:
    f.write(prompt.strip() + "\n")

with open(dest_script_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(night_augmentation) + "\n")

print("Files written successfully!")

Files written successfully!


In [None]:
# Run CT2.5 
# WARNING: This should take a couple of minutes
# !python cosmos_transfer2_5/examples/inference.py -i scripts/clip_3_1_night_augment.jsonl -o outputs/clip_3_1_night_augment 
!torchrun --nproc_per_node=8 --master_port=12341 cosmos_transfer2_5/examples/inference.py -i scripts/clip_3_1_night_augment.jsonl -o outputs/clip_3_1_night_augment --disable-guardrails

In [18]:
HTML(f"""
<video width="600" controls>
  <source src="outputs/clip_3_1_night_augment/ex_3_night_augmentation.mp4" type="video/mp4">
</video>
""")

### 2.4 Generating Rain Conditions

We will use the recommended recipe from section 1 to generate a rain augmentation. Feel free to tweak the parameters and prompt. 

Original Video:

<video src="assets/other_av_assets/clip_4_1.mp4" controls width="600"></video>

In [22]:
## Recipe
prompt = '''
This video clip captures a mid-level view of a multi-lane, curved urban street with heavy, slow-moving traffic in both directions, set during a downpour of heavy rain. The road is completely soaked, with streams and puddles of water visible, reflecting the distorted lights of the surrounding vehicles. The focus is on the vehicles: a white luxury crossover, a dark gray sedan, and an older, dark pickup truck are prominent in the foreground of the lane heading toward the viewer. The opposite lane is a solid queue of cars waiting in traffic. Pedestrians carrying umbrellas are visible walking along the sidewalks on both sides of the street, which is lined with wet, dark-barked trees and utility poles. The heavy rain creates a dark, saturated, and highly reflective atmosphere across the entire scene.
'''

rain_augmentation = {
    "name": "ex_4_rain_augmentation",
    "prompt_path": "../assets/other_av_assets/clip_4_1_rain_augment.txt",
    "video_path": "../assets/other_av_assets/clip_4_1.mp4",
    "guidance": 7,
    "edge": {"control_weight": 0.5},
    "depth": {"control_weight": 1.0}
}


dest_prompt_path = "assets/other_av_assets/clip_4_1_rain_augment.txt"
dest_script_path = "scripts/clip_4_1_rain_augment.jsonl"


os.makedirs(os.path.dirname(dest_prompt_path), exist_ok=True)
os.makedirs(os.path.dirname(dest_script_path), exist_ok=True)

with open(dest_prompt_path, "w", encoding="utf-8") as f:
    f.write(prompt.strip() + "\n")

with open(dest_script_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(rain_augmentation) + "\n")

print("Files written successfully!")

Files written successfully!


In [None]:
# Run CT2.5 
# WARNING: This should take a couple of minutes
# !python cosmos_transfer2_5/examples/inference.py -i scripts/clip_4_1_rain_augment.jsonl -o outputs/clip_4_1_rain_augment 
!torchrun --nproc_per_node=8 --master_port=12341 cosmos_transfer2_5/examples/inference.py -i scripts/clip_4_1_rain_augment.jsonl -o outputs/clip_4_1_rain_augment --disable-guardrails

In [24]:
HTML(f"""
<video width="600" controls>
  <source src="outputs/clip_4_1_rain_augment/ex_4_rain_augmentation.mp4" type="video/mp4">
</video>
""")

### 2.5 Changing the Colors of the Cars

We will use the recommended recipe from section 1 to generate a color augmentation. Feel free to tweak the parameters and prompt. 

Original Video:

<video src="assets/other_av_assets/clip_2_2.mp4" controls width="600"></video>

In [None]:
import os, json

## Recipe
prompt = '''
The video captures a busy multi-lane intersection where vehicles of different types and colors move through in various directions. Pickup trucks, sedans, and other cars wait at traffic lights or make turns as signals change. The scene shows the typical flow of urban traffic, orderly yet dynamic, with each vehicle maneuvering through the intersection according to the lights and lanes. There are multiple blue SUVs moving through the intersection. The surroundings include open roads, nearby buildings, and some grassy areas, reflecting a routine moment of daytime traffic activity.
'''

color_augmentation = {
    "name": "ex_5_color_augmentation",
    "prompt_path": "../av_data/color_change.txt",
    "video_path": "../assets/other_av_assets/clip_2_2.mp4",
    "guidance": 3,
    "edge": {"control_weight": 1.0},
    "vis": {"control_weight": 0.2}
}


dest_prompt_path = "av_data/color_change.txt"
dest_script_path = "scripts/clip_2_2_color_augment.jsonl"


os.makedirs(os.path.dirname(dest_prompt_path), exist_ok=True)
os.makedirs(os.path.dirname(dest_script_path), exist_ok=True)

with open(dest_prompt_path, "w", encoding="utf-8") as f:
    f.write(prompt.strip() + "\n")

with open(dest_script_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(color_augmentation) + "\n")

print("Files written successfully!")

Files written successfully!

In [None]:
# Run CT2.5 
# WARNING: This should take a couple of minutes
# !python cosmos_transfer2_5/examples/inference.py -i scripts/clip_2_2_color_augment.jsonl -o outputs/clip_2_2_color_augment 
!torchrun --nproc_per_node=8 --master_port=12341 cosmos_transfer2_5/examples/inference.py -i scripts/clip_2_2_color_augment.jsonl -o outputs/clip_2_2_color_augment --disable-guardrails

In [2]:
HTML(f"""
<video width="600" controls>
  <source src="outputs/clip_2_2_color_augment/ex_5_color_augmentation.mp4" type="video/mp4">
</video>
""")

### 2.6 Changing the Cars

We will use the recommended recipe from section 1 to generate a car augmentation. Specifically, we will transform the car in the video to a smaller car. 

Original Video:

<video src="av_data/output_fixed.mp4" controls width="600"></video>

In [None]:
## Recipe
prompt = '''
A high-angle, static wide shot captures a suburban residential street on a gloomy winter day, where lawns and sidewalks are blanketed in snow and plowed drifts line the curbs. The worn asphalt road, marked by double yellow lines and a large yellow triangular symbol in the foreground, stretches into the distance flanked by tall, bare trees. Traffic is light but active; a compact blue Toyota Corolla hatchback, small and distinctly low to the ground, drives down the center lane toward the camera. A brown delivery truck heads away into the background on the right, while a white sedan briefly passes through the bottom-left corner out of frame. A silver car remains parked in a driveway on the left, completing this scene of everyday winter life under flat, overcast lighting.
'''

car_augmentation = {
    "name": "ex_6_car_augmentation",
    "prompt_path": "../av_data/clip_0_easier_prompts/small_car.txt",
    "video_path": "../av_data/output_fixed.mp4",
    "guidance": 3,
    "edge": {"control_weight": 0.5},
    "seg": {"control_weight": 0.4},
    "depth": {"control_weight": 1.0}
}

dest_prompt_path = "av_data/clip_0_easier_prompts/small_car.txt"
dest_script_path = "scripts/clip_0_car_augment.jsonl"


os.makedirs(os.path.dirname(dest_prompt_path), exist_ok=True)
os.makedirs(os.path.dirname(dest_script_path), exist_ok=True)

with open(dest_prompt_path, "w", encoding="utf-8") as f:
    f.write(prompt.strip() + "\n")

with open(dest_script_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(car_augmentation) + "\n")

print("Files written successfully!")

Files written successfully!


In [None]:
# Run CT2.5 
# WARNING: This should take a couple of minutes
# !python cosmos_transfer2_5/examples/inference.py -i scripts/clip_0_car_augment.jsonl -o outputs/clip_0_car_augment 
!torchrun --nproc_per_node=8 --master_port=12341 cosmos_transfer2_5/examples/inference.py -i scripts/clip_0_car_augment.jsonl -o outputs/clip_0_car_augment --disable-guardrails

In [4]:
HTML(f"""
<video width="600" controls>
  <source src="outputs/clip_0_car_augment/ex_6_car_augmentation.mp4" type="video/mp4">
</video>
""")

## 3. Generating Realistic Data from Omniverse

An important robotics workflow is "Sim-to-Real." NVIDIA Omniverse can generate synthetic data, but we can use CT 2.5 to add real-world domain randomization (new lighting, textures, backgrounds) and generate photorealistic scenes.

The Workflow:
1. Generate in Omniverse: Create a base scenario (e.g., cars driving around) and export the video.
2. Extract Ground Truth: From Omniverse, also export the perfect ground-truth modalities (Depth, Segmentation, Edge).
3. Augment with CT 2.5: Use these perfect synthetic controls to run CT 2.5 with a new prompt (e.g., "on a dimly lit snowy day").
4. Package with Cosmos Writer: Save the new, augmented video alongside the original, ground-truth controls. This teaches a downstream model to associate the ground-truth controls with the new, realistic style.


### Omniverse Control Modalities

We start with the following control modalities:

| Original Video | Edge | Seg | Depth |
|----------|----------|----------|----------|
| <video src="simulation_data/simulator_rgb_input.mp4" controls width="300"></video> | <video src="simulation_data/edge.mp4" controls width="300"></video> | <video src="simulation_data/seg.mp4" controls width="300"></video> | <video src="simulation_data/depth.mp4" controls width="300"></video> |


### Recipe 


| Task | Suggested Controls & Settings| Example Results | Prompt |
|--|--|---|-|
|Original Video| N/A | <video src="simulation_data/simulator_rgb_input.mp4" controls width="300"></video> | N/A |
|Fog|`{'guidance': 3.0, 'edge': 0.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/omniverse_generations_av_fog_3_0_0_f_0_10.mp4" width="300" controls></video> | [Prompt Location](simulation_data/fog.txt) |
|Morning Sunlight|`{'guidance': 3.0, 'edge': 0.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/av_assets/omniverse_generations_av_morning_sun_3_0_0_f_0_10.mp4" width="300" controls></video> | [Prompt Location](simulation_data/morning_sun.txt) |


## 4. Prompt Generator for Scene Conditions
This module provides a configurable system for automatically generating natural-language prompts based on selected environmental, weather, and road-surface conditions. It is designed for data generation, augmentation workflows, or any pipeline where you want consistent, high-quality scene descriptions without manually rewriting prompts.

#### How It Works

The system uses:
- A SceneConfig dataclass
- Three condition dictionaries:
    - ENV_LIGHTING
    - WEATHER
    - ROAD_SURFACE
- A single function: generate_prompt(config)

It takes your base scene, inserts the selected conditions, and returns a polished final prompt. 

#### Code Structure:

```python
from dataclasses import dataclass
from typing import Optional, List

ENV_LIGHTING = { ... }
WEATHER = { ... }
ROAD_SURFACE = { ... }

@dataclass
class SceneConfig:
    base_scene: str
    env_lighting: Optional[str] = None
    weather: Optional[str] = None
    road_surface: Optional[str] = None
    extra_tags: Optional[List[str]] = None

def generate_prompt(config: SceneConfig) -> str:
    parts = [config.base_scene.strip()]
    if config.env_lighting: parts.append(f"The scene is {ENV_LIGHTING[config.env_lighting]}.")
    if config.weather: parts.append(WEATHER[config.weather])
    if config.road_surface: parts.append(ROAD_SURFACE[config.road_surface])
    parts.append("All visual elements should be consistent with these conditions.")
    return " ".join(p for p in parts if p)
```


You can find the full codebase at [src/prompt_generation.py](src/prompt_generation.py)

#### Example:

```python
config = SceneConfig(
    base_scene="A busy urban intersection with multiple vehicles.",
    env_lighting="sunrise",
    weather="fog",
    road_surface="wooden"
)

print(generate_prompt(config))
```

Output:
```
A busy urban intersection with multiple vehicles. The scene is bathed in warm morning light. A layer of fog softens distant structures. The road surface is made of wooden planks. All visual elements should be consistent with these conditions.
```
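
The code structure above elides the three condition dictionaries, so it cannot run on its own. The sketch below fills in one entry per dictionary, reconstructed from the example output (the full dictionaries live in `src/prompt_generation.py` and will contain more entries, possibly with different wording). Because `generate_prompt` joins its parts with spaces, the result is a single line.

```python
from dataclasses import dataclass
from typing import List, Optional

# One entry per dictionary, reconstructed from the example output above.
# Assumption: the real dictionaries in src/prompt_generation.py hold more entries.
ENV_LIGHTING = {"sunrise": "bathed in warm morning light"}
WEATHER = {"fog": "A layer of fog softens distant structures."}
ROAD_SURFACE = {"wooden": "The road surface is made of wooden planks."}


@dataclass
class SceneConfig:
    base_scene: str
    env_lighting: Optional[str] = None
    weather: Optional[str] = None
    road_surface: Optional[str] = None
    extra_tags: Optional[List[str]] = None


def generate_prompt(config: SceneConfig) -> str:
    """Append the selected condition sentences to the base scene."""
    parts = [config.base_scene.strip()]
    if config.env_lighting:
        parts.append(f"The scene is {ENV_LIGHTING[config.env_lighting]}.")
    if config.weather:
        parts.append(WEATHER[config.weather])
    if config.road_surface:
        parts.append(ROAD_SURFACE[config.road_surface])
    parts.append("All visual elements should be consistent with these conditions.")
    return " ".join(p for p in parts if p)


config = SceneConfig(
    base_scene="A busy urban intersection with multiple vehicles.",
    env_lighting="sunrise",
    weather="fog",
    road_surface="wooden",
)
prompt = generate_prompt(config)
print(prompt)
```

A key that is missing from a dictionary raises `KeyError`, which surfaces typos in condition names early.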

## 5. Additional Recipes
Didn't find what you were looking for? The [Cosmos Cookbook](https://nvidia-cosmos.github.io/cosmos-cookbook/) has many more examples.