# Intelligent Cities Workshop: Cosmos Transfer 2.5
**Authors:** Aiden Chang, Akul Santhosh


This notebook is a hands-on guide to augmenting real city data. The goal is for you to understand, create, and use the multiple control modalities that power Cosmos Transfer 2.5 (CT 2.5).

In [4]:
!huggingface-cli login --token "YOUR TOKEN HERE"


The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `hf`CLI if you want to set the git credential as well.
Token is valid (permission: read).
The token `read_token` has been saved to /home/nvidia/.cache/huggingface/stored_tokens
Your token has been saved to /home/nvidia/.cache/huggingface/token
Login successful.
The current active token is: `read_token`


In [2]:
import os
os.makedirs("prompts", exist_ok=True)
os.makedirs("outputs", exist_ok=True)
os.makedirs("control_modalities", exist_ok=True)

## 1. Augmenting real city data

### Control Modalities

We start with the following control modalities:

| Original Video | Edge | Seg | Depth | Vis |
|----------|----------|----------|----------|----------|
| <video src="assets/vs_assets/clip_1_short.mp4" controls width="300"></video> | <video src="assets/vs_assets/clip_1_edge.mp4" controls width="300"></video> | <video src="assets/vs_assets/clip_1_seg.mp4" controls width="300"></video> | <video src="assets/vs_assets/clip_1_depth.mp4" controls width="300"></video> | <video src="assets/vs_assets/clip_1_vis.mp4" controls width="300"></video> |


### Recipe 

| Task | Suggested Controls & Settings| Example Results | Prompt |
|--|--|---|-|
|Original Video| N/A | <video src="assets/vs_assets/clip_1_short.mp4" width="300" controls></video> | N/A |
|Fog|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_fog.mp4" width="300" controls></video> | [Prompt Location](assets/vs_assets/clip_1_fog.txt) |
|Morning Sunlight|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/vs_assets/results/vs_morning_sun.mp4" width="300" controls></video> | [Prompt Location](assets/vs_assets/clip_1_morning_sun.txt) |
|Night|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_night.mp4" width="300" controls></video> |[Prompt Location](assets/vs_assets/clip_1_night.txt) |
|Rain|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_rain.mp4" width="300" controls></video> | [Prompt Location](assets/vs_assets/clip_1_rain.txt) |
|Snow|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_snow.mp4" width="300" controls></video> | [Prompt Location](assets/vs_assets/clip_1_snow.txt) |
|Wooden Road|`{'guidance': 7.0, 'edge': 0.6, 'seg': 0.4, 'seg_mask': False, 'vis': 0.0, 'depth': 0.0}`| <video src="assets/vs_assets/results/wooden_road_1.mp4" width="300" controls></video> | [Prompt Location](assets/vs_assets/clip_1_road.txt) |
|Small Car|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.4, 'seg_mask': True, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/small_car.mp4" width="300" controls></video> | [Prompt Location](assets/vs_assets/clip_1_small_car.txt) |
|Van|`{'guidance': 7.0, 'edge': 0.5, 'seg': 0.8, 'seg_mask': True, 'vis': 0.0, 'depth': 0.5}`| <video src="assets/vs_assets/results/van.mp4" width="300" controls></video> | [Prompt Location](assets/vs_assets/clip_1_van.txt) |
|People Generation|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.7, 'seg_mask': False, 'vis': 0.0, 'depth': 0.5}`| <video src="assets/vs_assets/results/people_generation.mp4" width="300" controls></video> | [Prompt Location](assets/vs_assets/clip_1_people_generation.txt) |
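
The table lists each recipe as a flat dictionary, while the inference cells in section 2 write every control as a nested `{"control_weight": ...}` entry. The conversion can be sketched as below. `recipe_to_spec` is a hypothetical helper, not part of CT 2.5. It assumes a control with weight 0.0 can simply be left out (as the section 2 specs do), and it does not map `seg_mask`, because this notebook does not show how that flag is passed to inference.

```python
import json

# Controls that appear in the recipe table.
CONTROL_KEYS = ("edge", "seg", "vis", "depth")

def recipe_to_spec(name, prompt_path, video_path, recipe):
    """Convert a flat recipe dict from the table into a section-2-style spec."""
    spec = {
        "name": name,
        "prompt_path": prompt_path,
        "video_path": video_path,
        "guidance": recipe["guidance"],
    }
    for key in CONTROL_KEYS:
        weight = recipe.get(key, 0.0)
        if weight > 0:  # assumption: a zero weight means the control is unused
            spec[key] = {"control_weight": weight}
    # 'seg_mask' is intentionally not mapped; its spec field is not shown here.
    return spec

fog_recipe = {'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}
fog_spec = recipe_to_spec(
    "fog_example",                            # hypothetical job name
    "../assets/vs_assets/clip_1_fog.txt",     # prompt file linked in the table
    "../assets/vs_assets/clip_1_short.mp4",
    fog_recipe,
)
print(json.dumps(fog_spec))
```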




### 1.1 Different Fog Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_fog.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_fog_1.mp4" width="300" controls></video> |



### 1.2 Different Morning Sunlight Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_morning_sun_1.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_morning_sun_2.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/vs_assets/results/vs_morning_sun.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_morning_sun_3.mp4" width="300" controls></video> |

### 1.3 Different Night Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_night.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_night_1.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 0.5, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_night_2.mp4" width="300" controls></video> |

### 1.4 Different Rain Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 0.9, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 1.0}`| <video src="assets/vs_assets/results/vs_rain.mp4" width="300" controls></video> |
|`{'guidance': 7.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/vs_assets/results/vs_rain_1.mp4" width="300" controls></video> |


### 1.5 Different Snow Generations

| Controls & Settings| Example Results |
|--|--|
|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.0}`| <video src="assets/vs_assets/results/vs_snow_1.mp4" width="300" controls></video> |
|`{'guidance': 3.0, 'edge': 1.0, 'seg': 0.0, 'seg_mask': False, 'vis': 0.0, 'depth': 0.9}`| <video src="assets/vs_assets/results/vs_snow.mp4" width="300" controls></video> |
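
Sections 1.1 to 1.5 vary `guidance` and the control weights a little at a time. Variants like these could be generated programmatically, as sketched below. `sweep_specs` and the job names are hypothetical. Writing several specs into one JSONL file assumes the inference script treats each line as a separate job, which this notebook does not demonstrate.

```python
import itertools
import json

def sweep_specs(base_name, prompt_path, video_path, guidances, edge_weights, depth_weight=1.0):
    """Build one spec per (guidance, edge weight) combination."""
    specs = []
    for guidance, edge in itertools.product(guidances, edge_weights):
        specs.append({
            # Encode the settings in the name so each output is easy to identify.
            "name": f"{base_name}_g{guidance}_e{round(edge * 100)}",
            "prompt_path": prompt_path,
            "video_path": video_path,
            "guidance": guidance,
            "edge": {"control_weight": edge},
            "depth": {"control_weight": depth_weight},
        })
    return specs

specs = sweep_specs(
    "snow",
    "../assets/vs_assets/clip_1_snow.txt",
    "../assets/vs_assets/clip_1_short.mp4",
    guidances=[3, 7],
    edge_weights=[0.5, 0.9],
)

# One JSON object per line, matching the .jsonl files written in section 2.
jsonl_text = "\n".join(json.dumps(spec) for spec in specs) + "\n"
print(jsonl_text)
```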


## 2. Hands-on Examples

In this section we generate augmentations from the following example clips.


| Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 |
|----------|----------|----------|----------|
| <video src="assets/vs_assets/clip_1_short.mp4" controls width="300"></video> | <video src="assets/vs_assets/clip_2_short.mp4" controls width="300"></video> | <video src="assets/vs_assets/clip_3_short.mp4" controls width="300"></video> | <video src="assets/vs_assets/clip_4_short.mp4" controls width="300"></video> |


### 2.1 Generating Snow

This example starts from the snow recipe in section 1 but lowers the edge weight to 0.5 (the table lists 0.9). Feel free to tweak the parameters and prompt.

Original Video:

<video src="assets/vs_assets/clip_1_short.mp4" controls width="600"></video>

In [None]:
import os, json, sys

## Recipe
prompt = '''
A video of a winding four-lane divided highway cutting through a rural landscape of rolling hills blanketed in snow, with patches of icy pavement and snowbanks lining the shoulders. Leafless trees are dusted with fresh snow. A white sedan travels away from the camera in the right lane. The scene captures the flow of light traffic under a cold, gray, overcast winter sky.
'''

snow_augmentation = {
        "name": "ex_1_snow_augmentation", 
        "prompt_path": "../assets/other_av_assets/clip_1_snow_augment.txt", 
        "video_path": "../assets/vs_assets/clip_1_short.mp4", 
        "guidance": 3, 
        "edge": {"control_weight": 0.5},
        "depth": {"control_weight": 1.0}
    }

dest_prompt_path = "assets/other_av_assets/clip_1_snow_augment.txt"
dest_script_path = "scripts/clip_1_snow_augment.jsonl"

os.makedirs(os.path.dirname(dest_prompt_path), exist_ok=True)
os.makedirs(os.path.dirname(dest_script_path), exist_ok=True)

with open(dest_prompt_path, "w", encoding="utf-8") as f:
    f.write(prompt.strip() + "\n")

with open(dest_script_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(snow_augmentation) + "\n")

print("Files written successfully!")



Files written successfully!
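
The `prompt_path` and `video_path` values begin with `../`, which suggests they are resolved relative to the folder containing the JSONL file (`scripts/`), not the notebook's working directory. A minimal check for missing files under that assumption is sketched below. `missing_spec_files` is a hypothetical helper, not part of CT 2.5.

```python
import json
from pathlib import Path

def missing_spec_files(jsonl_path):
    """Return (name, key, resolved_path) for every spec path that does not exist.

    Assumption: paths in the spec are relative to the JSONL file's directory.
    """
    jsonl_path = Path(jsonl_path)
    missing = []
    for line in jsonl_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        spec = json.loads(line)
        for key in ("prompt_path", "video_path"):
            resolved = (jsonl_path.parent / spec[key]).resolve()
            if not resolved.exists():
                missing.append((spec["name"], key, str(resolved)))
    return missing
```

Calling `missing_spec_files("scripts/clip_1_snow_augment.jsonl")` after the cell above should return an empty list when both the prompt and the video are in place.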


In [None]:
# Run CT2.5 
# WARNING: This should take a couple of minutes
!{sys.executable} cosmos_transfer2_5/examples/inference.py -i scripts/clip_1_snow_augment.jsonl -o outputs/clip_1_snow_augment 
# !torchrun --nproc_per_node=8 --master_port=12341 cosmos_transfer2_5/examples/inference.py -i scripts/clip_1_snow_augment.jsonl -o outputs/clip_1_snow_augment --disable-guardrails

In [2]:
from IPython.display import HTML

HTML(f"""
<video width="600" controls>
  <source src="outputs/clip_1_snow_augment/ex_1_snow_augmentation.mp4" type="video/mp4">
</video>
""")

### 2.2 Generating Morning Sunlight

We will use the recommended recipe from section 1 to generate a lighting augmentation. Feel free to tweak the parameters and prompt. 

Original Video:

<video src="assets/vs_assets/clip_2_short.mp4" controls width="600"></video>

In [6]:
## Recipe
prompt = '''
A video overlooking a roadway intersection in bright morning sunlight. Soft golden light casts long, gentle shadows across the pavement, replacing the earlier overcast atmosphere. In the foreground, a black SUV navigates a sweeping curved lane moving from right to left. Beyond a grassy median, a silver sedan travels along a multi-lane main road that runs past a large concrete building and leafless trees. The scene captures a quiet suburban traffic flow, with crisp visibility and the highway stretching into the distance under a clear early-day sky.
'''

# Spec for the morning sunlight (lighting) augmentation.
lighting_augmentation = {
    "name": "ex_2_lighting_augmentation",
    "prompt_path": "../assets/other_av_assets/clip_2_lighting_augment.txt",
    "video_path": "../assets/vs_assets/clip_2_short.mp4",
    "guidance": 3,
    "edge": {"control_weight": 1.0},
    "depth": {"control_weight": 0.9}
}

dest_prompt_path = "assets/other_av_assets/clip_2_lighting_augment.txt"
dest_script_path = "scripts/clip_2_lighting_augment.jsonl"

os.makedirs(os.path.dirname(dest_prompt_path), exist_ok=True)
os.makedirs(os.path.dirname(dest_script_path), exist_ok=True)

with open(dest_prompt_path, "w", encoding="utf-8") as f:
    f.write(prompt.strip() + "\n")

with open(dest_script_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(lighting_augmentation) + "\n")

print("Files written successfully!")

Files written successfully!


In [None]:
# Run CT2.5 
# WARNING: This should take a couple of minutes
!{sys.executable} cosmos_transfer2_5/examples/inference.py -i scripts/clip_2_lighting_augment.jsonl -o outputs/clip_2_lighting_augment 
# !torchrun --nproc_per_node=8 --master_port=12341 cosmos_transfer2_5/examples/inference.py -i scripts/clip_2_lighting_augment.jsonl -o outputs/clip_2_lighting_augment --disable-guardrails

In [9]:
HTML(f"""
<video width="600" controls>
  <source src="outputs/clip_2_lighting_augment/ex_2_lighting_augmentation.mp4" type="video/mp4">
</video>
""")

### 2.3 Generating Night Conditions

We will use the recommended recipe from section 1 to generate a night augmentation. Feel free to tweak the parameters and prompt. 

Original Video:

<video src="assets/vs_assets/clip_3_short.mp4" controls width="600"></video>

In [10]:
## Recipe
prompt = '''
A video looking down at a busy multi-lane intersection at night. Streetlights and traffic signals illuminate the scene, casting pools of warm light and reflections on the dark asphalt. Traffic accelerates forward from the stop line, led by a dark gray sedan and a silver sedan, followed closely by a black muscle car with distinctive white racing stripes. To the right, a black SUV turns onto the cross street, passing a red pickup truck parked on the shoulder. In the distance, a large white FedEx truck travels beneath a metal overhead gantry, its headlights and taillights glowing against embankments of dry grass and leafless trees silhouetted in the darkness.
'''

# Spec for the night augmentation.
night_augmentation = {
    "name": "ex_3_night_augmentation",
    "prompt_path": "../assets/other_av_assets/clip_3_night_augment.txt",
    "video_path": "../assets/vs_assets/clip_3_short.mp4",
    "guidance": 3,
    "edge": {"control_weight": 0.5},
    "depth": {"control_weight": 1.0}
}

dest_prompt_path = "assets/other_av_assets/clip_3_night_augment.txt"
dest_script_path = "scripts/clip_3_night_augment.jsonl"

os.makedirs(os.path.dirname(dest_prompt_path), exist_ok=True)
os.makedirs(os.path.dirname(dest_script_path), exist_ok=True)

with open(dest_prompt_path, "w", encoding="utf-8") as f:
    f.write(prompt.strip() + "\n")

with open(dest_script_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(night_augmentation) + "\n")

print("Files written successfully!")

Files written successfully!


In [None]:
# Run CT2.5 
# WARNING: This should take a couple of minutes
!{sys.executable} cosmos_transfer2_5/examples/inference.py -i scripts/clip_3_night_augment.jsonl -o outputs/clip_3_night_augment 
# !torchrun --nproc_per_node=8 --master_port=12341 cosmos_transfer2_5/examples/inference.py -i scripts/clip_3_night_augment.jsonl -o outputs/clip_3_night_augment --disable-guardrails

In [12]:
HTML(f"""
<video width="600" controls>
  <source src="outputs/clip_3_night_augment/ex_3_night_augmentation.mp4" type="video/mp4">
</video>
""")

### 2.4 Generating Rain Conditions

This example starts from the rain recipe in section 1 but uses an edge weight of 1.0 and a depth weight of 0.9 (the table lists 0.9 and 1.0). Feel free to tweak the parameters and prompt.

Original Video:

<video src="assets/vs_assets/clip_4_short.mp4" controls width="600"></video>

In [6]:
import os, json
## Recipe
prompt = '''A video overlooking a wide bridge during steady rain. The roadway is darkened and slick with water, reflecting headlights and taillights across multiple lanes of traffic. The weather is gloomy and rainy. 
'''

# Spec for the rain augmentation.
rain_augmentation = {
    "name": "ex_4_rain_augmentation",
    "prompt_path": "../assets/other_av_assets/clip_4_rain_augment.txt",
    "video_path": "../assets/vs_assets/clip_4_short.mp4",
    "guidance": 3,
    "edge": {"control_weight": 1.0},
    "depth": {"control_weight": 0.9}
}

dest_prompt_path = "assets/other_av_assets/clip_4_rain_augment.txt"
dest_script_path = "scripts/clip_4_rain_augment.jsonl"

os.makedirs(os.path.dirname(dest_prompt_path), exist_ok=True)
os.makedirs(os.path.dirname(dest_script_path), exist_ok=True)

with open(dest_prompt_path, "w", encoding="utf-8") as f:
    f.write(prompt.strip() + "\n")

with open(dest_script_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(rain_augmentation) + "\n")

print("Files written successfully!")

Files written successfully!


In [7]:
# Run CT2.5 
# WARNING: This should take a couple of minutes
# !python cosmos_transfer2_5/examples/inference.py -i scripts/clip_4_rain_augment.jsonl -o outputs/clip_4_rain_augment 
!torchrun --nproc_per_node=8 --master_port=12341 cosmos_transfer2_5/examples/inference.py -i scripts/clip_4_rain_augment.jsonl -o outputs/clip_4_rain_augment --disable-guardrails

W1217 00:15:01.907000 142133 torch/distributed/run.py:766] 
W1217 00:15:01.907000 142133 torch/distributed/run.py:766] *****************************************
W1217 00:15:01.907000 142133 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W1217 00:15:01.907000 142133 torch/distributed/run.py:766] *****************************************
[12-17 00:15:09|INFO|cosmos_transfer2_5/cosmos_transfer2/_src/imaginaire/utils/checkpoint_db.py:171:path] Downloading checkpoint nvidia/Cosmos-Transfer2.5-2B/general/edge(ecd0ba00-d598-4f94-aa09-e8627899c431)
[12-17 00:15:09|INFO|cosmos_transfer2_5/cosmos_transfer2/_src/imaginaire/utils/checkpoint_db.py:96:_download] Downloading checkpoint file from Hugging Face with {'repo_id': 'nvidia/Cosmos-Transfer2.5-2B', 're

In [8]:
HTML(f"""
<video width="600" controls>
  <source src="outputs/clip_4_rain_augment/ex_4_rain_augmentation.mp4" type="video/mp4">
</video>
""")

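### 2.5 Adding Debris

This example uses the prompt to place an object on the roadway (here, a large brown bear). It combines edge, seg, and depth controls with a guidance of 7, close to the Van recipe in section 1.
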
Original Video:

<video src="assets/vs_assets/clip_1_short.mp4" controls width="600"></video>

In [9]:
## Recipe
prompt = '''
A video of a winding four-lane divided highway cutting through a rural landscape of rolling hills, dry brown grass, and leafless trees under a gray, overcast sky. A large brown bear stands in the middle of the roadway near the center divide, facing slightly toward the oncoming lanes.
'''

# Spec for the object-on-road augmentation.
object_augmentation = {
    "name": "ex_5_object_augmentation",
    "prompt_path": "../av_data/other_av_assets/object_on_road.txt",
    "video_path": "../assets/vs_assets/clip_1_short.mp4",
    "guidance": 7,
    "edge": {"control_weight": 0.5},
    "seg": {"control_weight": 0.8},
    "depth": {"control_weight": 0.4}
}

dest_prompt_path = "av_data/other_av_assets/object_on_road.txt"
dest_script_path = "scripts/object_on_road.jsonl"

os.makedirs(os.path.dirname(dest_prompt_path), exist_ok=True)
os.makedirs(os.path.dirname(dest_script_path), exist_ok=True)

with open(dest_prompt_path, "w", encoding="utf-8") as f:
    f.write(prompt.strip() + "\n")

with open(dest_script_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(object_augmentation) + "\n")

print("Files written successfully!")

Files written successfully!


In [None]:
# Run CT2.5 
# WARNING: This should take a couple of minutes
!{sys.executable} cosmos_transfer2_5/examples/inference.py -i scripts/object_on_road.jsonl -o outputs/object_on_road 
# !torchrun --nproc_per_node=8 --master_port=12341 cosmos_transfer2_5/examples/inference.py -i scripts/object_on_road.jsonl -o outputs/object_on_road --disable-guardrails

W1216 17:26:51.291000 3966712 torch/distributed/run.py:766] 
W1216 17:26:51.291000 3966712 torch/distributed/run.py:766] *****************************************
W1216 17:26:51.291000 3966712 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W1216 17:26:51.291000 3966712 torch/distributed/run.py:766] *****************************************
[12-16 17:26:59|INFO|cosmos_transfer2_5/cosmos_transfer2/_src/imaginaire/utils/checkpoint_db.py:171:path] Downloading checkpoint nvidia/Cosmos-Transfer2.5-2B/general/edge(ecd0ba00-d598-4f94-aa09-e8627899c431)
[12-16 17:26:59|INFO|cosmos_transfer2_5/cosmos_transfer2/_src/imaginaire/utils/checkpoint_db.py:96:_download] Downloading checkpoint file from Hugging Face with {'repo_id': 'nvidia/Cosmos-Transfer2.5-2B',

In [3]:
HTML(f"""
<video width="600" controls>
  <source src="outputs/object_on_road/ex_5_object_augmentation.mp4" type="video/mp4">
</video>
""")

## 3. Prompt Generator for Scene Conditions
This module provides a configurable system for automatically generating natural-language prompts based on selected environmental, weather, and road-surface conditions. It is designed for data generation, augmentation workflows, or any pipeline where you want consistent, high-quality scene descriptions without manually rewriting prompts.

#### How It Works

The system uses:
- A `SceneConfig` dataclass
- Three condition dictionaries:
    - `ENV_LIGHTING`
    - `WEATHER`
    - `ROAD_SURFACE`
- A single function: `generate_prompt(config)`

It takes your base scene, inserts the selected conditions, and returns a polished final prompt. 

#### Code Structure:

```python
from dataclasses import dataclass
from typing import Optional, List

# The dictionary contents are elided here; see src/prompt_generation.py for the full mappings.
ENV_LIGHTING = { ... }
WEATHER = { ... }
ROAD_SURFACE = { ... }

@dataclass
class SceneConfig:
    base_scene: str
    env_lighting: Optional[str] = None
    weather: Optional[str] = None
    road_surface: Optional[str] = None
    extra_tags: Optional[List[str]] = None  # not used in this excerpt

def generate_prompt(config: SceneConfig) -> str:
    # Start from the base scene, then append one sentence per selected condition.
    parts = [config.base_scene.strip()]
    if config.env_lighting:
        parts.append(f"The scene is {ENV_LIGHTING[config.env_lighting]}.")
    if config.weather:
        parts.append(WEATHER[config.weather])
    if config.road_surface:
        parts.append(ROAD_SURFACE[config.road_surface])
    parts.append("All visual elements should be consistent with these conditions.")
    # Join with single spaces, skipping any empty parts.
    return " ".join(p for p in parts if p)
```


The full implementation is in [src/prompt_generation.py](src/prompt_generation.py).

#### Example:

```python
config = SceneConfig(
    base_scene="A busy urban intersection with multiple vehicles.",
    env_lighting="sunrise",
    weather="fog",
    road_surface="wooden"
)

print(generate_prompt(config))
```

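Output (shown one sentence per line for readability; `generate_prompt` joins the parts with single spaces, so the returned string is a single line):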
```
A busy urban intersection with multiple vehicles.
The scene is bathed in warm morning light.
A layer of fog softens distant structures.
The road surface is made of wooden planks.
All visual elements should be consistent with these conditions.
```
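
The code structure above elides the dictionary contents, so it does not run on its own. For a self-contained run, the sketch below fills each dictionary with only the one entry implied by the example output. These entries are reconstructed from that output, not copied from `src/prompt_generation.py`, so the real dictionaries may differ and will contain more keys. `extra_tags` is left out because its handling is not shown.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed entries, inferred from the example output above.
ENV_LIGHTING = {"sunrise": "bathed in warm morning light"}
WEATHER = {"fog": "A layer of fog softens distant structures."}
ROAD_SURFACE = {"wooden": "The road surface is made of wooden planks."}

@dataclass
class SceneConfig:
    base_scene: str
    env_lighting: Optional[str] = None
    weather: Optional[str] = None
    road_surface: Optional[str] = None

def generate_prompt(config: SceneConfig) -> str:
    """Append one sentence per selected condition to the base scene."""
    parts = [config.base_scene.strip()]
    if config.env_lighting:
        parts.append(f"The scene is {ENV_LIGHTING[config.env_lighting]}.")
    if config.weather:
        parts.append(WEATHER[config.weather])
    if config.road_surface:
        parts.append(ROAD_SURFACE[config.road_surface])
    parts.append("All visual elements should be consistent with these conditions.")
    return " ".join(parts)

prompt = generate_prompt(SceneConfig(
    base_scene="A busy urban intersection with multiple vehicles.",
    env_lighting="sunrise",
    weather="fog",
    road_surface="wooden",
))
print(prompt)
```

An unknown key (for example `weather="hail"` with these reduced dictionaries) raises a `KeyError`, which surfaces typos in a configuration early.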

## 4. Additional Recipes
Didn't find what you were looking for? The [Cosmos Cookbook](https://nvidia-cosmos.github.io/cosmos-cookbook/) has many more examples.