# Synthetic Data Generation for Smart Cities using CARLA and Cosmos

## Overview 

This guide demonstrates how to use the open-source CARLA simulator to simulate a range of traffic patterns and incidents at a variety of map locations.

The simulations are then augmented to generate the datasets needed for training and finetuning models.


<img src="../data/docs/main_workflow.png" width="1156" height="408">

This notebook assumes that the required NIMs and the CARLA server have already been started, as in the Docker Compose example. The checks below confirm that each required service is reachable before we proceed.

In [None]:
! curl -fsS http://$NIM_HOST:$VLM_PORT/v1/health/ready | grep -q "Service is ready." && echo "VLM is healthy" || echo "VLM is unhealthy"
! curl -fsS http://$NIM_HOST:$LLM_PORT/v1/health/ready | grep -q "Service is ready." && echo "LLM is healthy" || echo "LLM is unhealthy"
! curl -fsS http://$NIM_HOST:$TRANSFER_GRADIO_PORT/ | grep -q 'window.gradio_config' && echo "Transfer gradio is healthy" || echo "Transfer gradio is unhealthy"
! nc -zv $CARLA_HOST $CARLA_PORT && echo "Carla server is healthy" || echo "Carla server is unhealthy"
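
If `nc` is not installed in your container, the CARLA port check can be approximated in Python. The following is a minimal sketch, not part of the shipped workflow; the helper name `tcp_port_open` is hypothetical, and it only tests that the TCP port accepts connections, not that the simulator is fully initialized.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

In the notebook this could be called as `tcp_port_open(os.environ["CARLA_HOST"], int(os.environ["CARLA_PORT"]))`, assuming both variables are set as in the shell checks above.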

In [None]:
# Import necessary packages for the overall flow 
import os
import json
import yaml
import glob 
import matplotlib.pyplot as plt
from PIL import Image
import sys
sys.path.append('/workspace/modules/carla-ground-truth-generation')
from som import process_video

## Stage 1 - Generate Ground Truth 

This workflow uses the open-source [CARLA](https://carla.org/) simulator to simulate various kinds of traffic patterns and incidents at a variety of map locations. The current SDG release is based on CARLA 0.9.16. This stage takes three inputs: an Unreal Engine map to run the simulation in, a scenario log containing the actor playback information (car and pedestrian movements), and a sensor config that defines where the cameras are placed and what information they record. Samples of all three files can be found in this repo; see the README for more information on creating your own.

<img src="../data/docs/Stage1.png" width="438" height="324">

The notebook comes with a default global config that controls the parameters used for ground truth generation. Adjust it to match your SDG requirements.

In [None]:
%%writefile /tmp/wf-config.json
{
    "host": "localhost",
    "port": 2000,
    "timeout": 360.0,
    "time_factor": 1.0,
    "generate_videos": true,
    "limit_distance": 100.0,
    "area_threshold": 100,
    "class_filter_config": "config/filter_semantic_classes.yaml",
    "ignore_hero": false,
    "move_spectator": false,
    "detect_collisions": true,
    "output_dir": ""
}
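
The default config above points at `localhost:2000`, while the health check earlier uses `CARLA_HOST` and `CARLA_PORT`. If those differ in your deployment, you may want the config to follow the environment. The sketch below shows one way to do that; the helper name `apply_env_overrides` is hypothetical, and whether the ground truth scripts read `host` and `port` from this file or from the environment is an assumption you should verify against the README.

```python
import json


def apply_env_overrides(config: dict, env: dict) -> dict:
    """Return a copy of `config` whose host/port follow CARLA_HOST/CARLA_PORT when set."""
    updated = dict(config)  # shallow copy so the caller's dict is left untouched
    if env.get("CARLA_HOST"):
        updated["host"] = env["CARLA_HOST"]
    if env.get("CARLA_PORT"):
        updated["port"] = int(env["CARLA_PORT"])
    return updated


# Example with a trimmed-down config and a hypothetical environment.
example = {"host": "localhost", "port": 2000, "timeout": 360.0}
print(json.dumps(apply_env_overrides(example, {"CARLA_HOST": "carla-server"}), indent=4))
```

Passing `os.environ` as `env` applies the same logic to the real environment; the result could then be written back to the config file with `json.dump`.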

In [None]:
# Setup some environment variables that will be needed later
%env SCENARIO_DIR=/workspace/data/examples
%env CARLA_OUTPUT_DIR=/workspace/data/outputs/CARLA
%env COSMOS_OUTPUT_DIR=/workspace/data/outputs/Cosmos
%env POSTPROCESS_OUTPUT_DIR=/workspace/data/outputs/
%env RUN_ID=default_run

Now that the prerequisites are in place, we can run the simulation and generate ground truth data. This runs a simulation for every log file in SCENARIO_DIR.

In [None]:
!/workspace/modules/carla-ground-truth-generation/batch_processing.sh

Once our generation is done we can view a snapshot of the data.

In [None]:
# Set the base directory to your ground truth folder
GROUND_TRUTH_DIR = os.path.join(os.getenv('CARLA_OUTPUT_DIR'), os.getenv('RUN_ID'), 'scenario_1')

def visualize_image_grid(base_dir):
    subfolders = sorted(d for d in os.listdir(base_dir) if os.path.isdir(os.path.join(base_dir, d)) and d != "odvg")[:9]
    fig, axes = plt.subplots(3, 3, figsize=(10, 10))
    
    for ax, folder in zip(axes.flatten(), subfolders):
        folder_path = os.path.join(base_dir, folder)
        images = sorted(f for f in os.listdir(folder_path) if f.lower().endswith(('.png', '.jpg', '.jpeg', '.gif', '.bmp')))
        if images:
            ax.imshow(Image.open(os.path.join(folder_path, images[-1])))
            ax.set_title(folder)
        else:
            ax.text(0.5, 0.5, 'No images found', ha='center', va='center', transform=ax.transAxes)
        ax.axis('off')
    
    for ax in axes.flatten()[len(subfolders):]:
        ax.axis('off')
    
    plt.tight_layout()
    plt.show()

visualize_image_grid(GROUND_TRUTH_DIR)

## Stage 2 - Augmentation with Cosmos Transfer2.5

In Stage 2, we take the ground truth data generated by CARLA and augment it to expand the variety of our dataset. Below, you will define a list of variables and the number of augmentations to create. For each augmentation, one random condition is chosen per variable. These conditions are applied to the original scenario, creating a brand-new video that maintains the core information from the ground truth simulation.
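
To make the sampling rule concrete, here is a minimal sketch of "one random condition per variable" for each augmentation. It illustrates the idea only and is not the augmentation module's actual code; the function name `sample_conditions` and the `seed` parameter are hypothetical.

```python
import random


def sample_conditions(variables: dict, num_augmentations: int, seed=None) -> list:
    """Pick one condition per variable for each augmentation."""
    rng = random.Random(seed)  # a local generator keeps the draw reproducible
    return [
        {name: rng.choice(options) for name, options in variables.items()}
        for _ in range(num_augmentations)
    ]


example_variables = {
    "weather_condition": ["clear_sky", "overcast", "raining"],
    "lighting_condition": ["sunrise", "night"],
}
for combo in sample_conditions(example_variables, num_augmentations=2, seed=0):
    print(combo)
```

Each returned dictionary holds exactly one value per variable, so two augmentations of the same scenario can differ in weather, lighting, or both.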

<img src="../data/docs/Stage2.png" width="542" height="326">


Let us now set up the configuration needed to perform augmentation. These settings determine what Cosmos Transfer generates. If you are using the default files, no changes are needed.

In [None]:
# USER-EDITABLE VARIABLES
# ==============================================================================
BASE_INPUT_DIR = os.path.join(os.getenv('CARLA_OUTPUT_DIR'), os.getenv('RUN_ID'))
BASE_OUTPUT_DIR = os.path.join(os.getenv('COSMOS_OUTPUT_DIR'), os.getenv('RUN_ID'))
CONFIG_FILE_PATH = "/workspace/modules/augmentation/configs/config_carla.yaml"
NUM_AUGMENTATIONS = 2

# Endpoint and model configuration
VLM_URL = f"http://{os.getenv('NIM_HOST')}:{os.getenv('VLM_PORT')}/v1" 
LLM_URL = f"http://{os.getenv('NIM_HOST')}:{os.getenv('LLM_PORT')}/v1"
COSMOS_URL = f"http://{os.getenv('NIM_HOST')}:{os.getenv('TRANSFER_GRADIO_PORT')}/"

#Cosmos variables
variables = {
            'weather_condition': ['clear_sky', 'overcast', 'snow_falling', 'raining', 'foggy'],
            'lighting_condition': ['sunrise', 'sunset', 'twilight', 'mid_morning', 'afternoon', 'zenith', 'golden_hour', 'blue_hour', 'night'],
            'road_condition': ['dry', 'snow', 'sand', 'puddles', 'flooding']
            }

with open(CONFIG_FILE_PATH) as f:
    default_config = yaml.load(f.read(), Loader=yaml.FullLoader)
print('Default configuration: ', default_config)
# ==============================================================================

Finally we'll input our variables and write a new config. 

In [None]:
data_list = []
scenario_dirs = sorted([
    d for d in os.listdir(BASE_INPUT_DIR)
    if os.path.isdir(os.path.join(BASE_INPUT_DIR, d)) and d.startswith('scenario_')
])
for scenario_name in scenario_dirs:
    video_dir = os.path.join(BASE_INPUT_DIR, scenario_name, "videos")
    
    if os.path.exists(os.path.join(video_dir, "rgb.mp4")):
        for i in range(NUM_AUGMENTATIONS):
            entry = {
                "inputs": {
                    "rgb": os.path.join(video_dir, "rgb.mp4"),
                    "controls": {
                        "edge": os.path.join(video_dir, "edges.mp4"),
                        "depth": os.path.join(video_dir, "depth.mp4"),
                        "seg": os.path.join(video_dir, "semantic_segmentation.mp4"),
                    }
                },
                "output": {
                    "video": os.path.join(BASE_OUTPUT_DIR, scenario_name, f"Augmentation_{i}", f"output.mp4"),
                    "caption": os.path.join(BASE_OUTPUT_DIR, scenario_name, f"Augmentation_{i}", f"output.txt"),
                    "metadata": os.path.join(BASE_OUTPUT_DIR, scenario_name, f"Augmentation_{i}", f"output.json"),
                }
            }
            data_list.append(entry)

config = default_config
config['data']=data_list 
config['endpoints'] = {
    'vlm': {'url': VLM_URL, 'model': 'nvidia/cosmos-reason1-7b'},
    'llm': {'url': LLM_URL, 'model': 'nvidia/nvidia-nemotron-nano-9b-v2'},
    'cosmos': {'url': COSMOS_URL, 'model': 'Cosmos-Transfer2.5-2B'}
}

# Optionally, switch to larger public NIMs for the VLM/LLM via build.nvidia.com to get better prompts for your data generation
# config['endpoints']['llm']['url'] = 'https://integrate.api.nvidia.com/v1'
# config['endpoints']['llm']['model'] = 'nvidia/llama-3.3-nemotron-super-49b-v1'

config['template_generation']['system_prompt_file']='/workspace/modules/augmentation/configs/prompts/carla_template_generation_system_prompt.txt'
config['cosmos']['configuration']='/workspace/modules/augmentation/configs/cosmos_configs/config_template.toml'
config['cosmos']['parameters']['inference_name']='nvidia/Cosmos-Transfer2.5-2B'

CONFIG_FILE_PATH='/workspace/data/outputs/augmentation_config.yaml'
with open(CONFIG_FILE_PATH, 'w') as f:
    yaml.dump(config, f, sort_keys=False, indent=2)

print(f"Successfully wrote config to {CONFIG_FILE_PATH}")
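
The cell above only checks that `rgb.mp4` exists before adding an entry, so a missing edge, depth, or segmentation video would surface later, during augmentation. As a sketch, the hypothetical helper below reports every referenced input file that is absent; it assumes entries shaped like those built in `data_list`.

```python
import os


def find_missing_inputs(data_list: list) -> list:
    """Return the sorted, de-duplicated input paths that do not exist on disk."""
    missing = set()
    for entry in data_list:
        inputs = entry["inputs"]
        # The RGB video plus every control video (edge, depth, seg).
        paths = [inputs["rgb"], *inputs["controls"].values()]
        missing.update(p for p in paths if not os.path.exists(p))
    return sorted(missing)
```

Calling `find_missing_inputs(data_list)` before launching the augmentation and confirming that it returns an empty list can save a long run from failing partway through.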

### Run the Augmentation

Run the augmentation script using the config defined above. Depending on your hardware configuration this step may take a while.

In [None]:
%%bash 
uv run /workspace/modules/augmentation/modules/cli.py --config /workspace/data/outputs/augmentation_config.yaml 

### View The Augmented Data

In [None]:
from IPython.display import Video

# Replace the path with the augmentation and scenario you'd like to view
Video("../data/outputs/Cosmos/default_run/scenario_1/Augmentation_0/output.mp4", width=600)

## Stage 3 - Post-Processing for Finetuning

At this point, we have successfully created a ground truth dataset and augmented it to increase variety. The final step is to package all of this information for use in model training or finetuning. To do this, we'll perform two actions: generate SOM overlays and generate Q&A pairs.

SOM (set of marks) is a structured labeling approach where points of interest are annotated with discrete marks or identifiers. In our case we will add bounding boxes as well as numeric IDs to specific cars involved in the incident. These additional labels help ground the VLM, improving the quality of fine-tuning.
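
As a rough illustration of the idea, the sketch below selects which objects receive a mark and where its numeric label would be placed. It is not how `process_video` is implemented. The input layout (a mapping from numeric ID to a pixel bounding box) is hypothetical, and treating `area_threshold` as a minimum box area in pixels is an assumption.

```python
def select_marks(boxes: dict, area_threshold: float) -> list:
    """Keep boxes whose pixel area meets the threshold and attach a label position.

    `boxes` maps a numeric ID to (x_min, y_min, x_max, y_max) in pixels.
    """
    marks = []
    for track_id, (x0, y0, x1, y1) in sorted(boxes.items()):
        area = max(x1 - x0, 0) * max(y1 - y0, 0)
        if area >= area_threshold:
            # Anchor the numeric label at the box's top-left corner.
            marks.append({"id": track_id, "box": (x0, y0, x1, y1), "label_xy": (x0, y0)})
    return marks


example_boxes = {7: (100, 50, 220, 130), 12: (400, 300, 405, 306)}
print(select_marks(example_boxes, area_threshold=100))
```

In this example the second box covers only 30 pixels, so it is dropped and only vehicle 7 is marked.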

Q&A pairs are text prompts and responses automatically generated from the ground-truth data. They provide a useful mechanism for fine-tuning VLMs by enabling the model to learn from the dataset in a semi-supervised or self-supervised manner.

<img src="../data/docs/Stage3.png" width="469" height="234">




### Generate SOM Videos
When generating ground truth data with CARLA, collision events are detected and saved. Using this data, we can create specific overlays for vehicles involved in incidents.


In [None]:
import json

# Prepare SOM overlays for CARLA ground truth 
for scenario in os.listdir(BASE_INPUT_DIR):
    rgb_video_path = os.path.join(BASE_INPUT_DIR, scenario, 'videos/rgb.mp4')
    for event in glob.glob(os.path.join(BASE_INPUT_DIR, scenario, "events*.json")):
        event_name = os.path.basename(event).split('.')[0]
        target_som_path = os.path.join(BASE_INPUT_DIR, scenario, f'{event_name}_rgb_som.mp4')
        odvg_path = os.path.join(BASE_INPUT_DIR, scenario, "odvg")
        if not os.path.exists(target_som_path):
            process_video(rgb_video_path, odvg_path, target_som_path)

# Prepare SOM overlays for Cosmos augmented videos 
for video in glob.glob(os.path.join(BASE_OUTPUT_DIR, 'scenario*', 'Augmentation*', 'output.mp4')):
    metadata_path = video.replace('.mp4', '.json')
    if os.path.exists(metadata_path):
        with open(metadata_path) as f:
            metadata = json.loads(f.read())
            gt_video_path = metadata['original_video_path']
            gt_path = os.path.dirname(os.path.dirname(gt_video_path))
            odvg_path = os.path.join(gt_path, 'odvg')
            for event in glob.glob(os.path.join(gt_path, "events*.json")):
                som_video_name = os.path.basename(event.replace('.json', '.mp4')).split('.')
                som_video_name[0] += '_som'
                som_video_name = '.'.join(som_video_name)
                target_som_path = os.path.join(os.path.dirname(video), som_video_name)
                event_name = os.path.basename(event).split('.')[0]
                if not os.path.exists(target_som_path):
                    process_video(video, odvg_path, target_som_path)
            # All bbox som 
            target_som_path = os.path.join(os.path.dirname(video), 'output_allcars_som.mp4')
            if not os.path.exists(target_som_path):
                process_video(video, odvg_path, target_som_path, area_threshold=5000)
                
print('Completed generating SOM overlays for Carla and Cosmos videos!')


### View SOM data

In [None]:
import cv2
from PIL import Image
from IPython.display import display

video_path = "../data/outputs/Cosmos/default_run/scenario_1/Augmentation_0/output_allcars_som.mp4"

cap = cv2.VideoCapture(video_path)

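# Jump to the last frame of the video (the frame count is returned as a float)
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.set(cv2.CAP_PROP_POS_FRAMES, max(total_frames - 1, 0))
ret, frame = cap.read()
cap.release()

# Fail with a clear message if the video is missing or unreadable
if not ret:
    raise RuntimeError(f"Could not read a frame from {video_path}")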

frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
img = Image.fromarray(frame_rgb)
display(img.resize((800, int(img.height * 800 / img.width)), Image.LANCZOS))

### Generate Q&A pairs
Using our overlaid videos, we can generate a Q&A dataset for finetuning a VLM. Since we know which vehicles are involved in incidents, we can create a large number of simple yes-or-no questions grounded in our videos.

In [None]:
%%bash 
uv run /workspace/modules/postprocess/postprocess_for_vlm.py --carla_folder /workspace/data/outputs/CARLA \
                                                             --cosmos_folder /workspace/data/outputs/Cosmos \
                                                             --output_folder /workspace/data/outputs/postprocess \
                                                             --run_id $RUN_ID
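
To illustrate what "simple yes or no questions" can look like, here is a minimal sketch that turns a list of marked vehicle IDs and the subset involved in an incident into Q&A pairs. The question wording, the record layout, and the function name `build_yes_no_pairs` are all hypothetical; the actual format written by `postprocess_for_vlm.py` may differ.

```python
def build_yes_no_pairs(visible_ids, involved_ids) -> list:
    """Create one yes/no question per marked vehicle."""
    involved = set(involved_ids)
    pairs = []
    for mark_id in sorted(set(visible_ids)):
        pairs.append({
            "question": f"Is the vehicle marked {mark_id} involved in an incident?",
            "answer": "yes" if mark_id in involved else "no",
        })
    return pairs


for pair in build_yes_no_pairs(visible_ids=[1, 2, 3], involved_ids=[2]):
    print(pair["question"], "->", pair["answer"])
```

Because every marked vehicle yields a question, negative examples ("no") come for free alongside the positives, which helps keep the dataset from containing only incident-positive samples.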

## Next Steps
This concludes the content for this repo. For more information on finetuning Cosmos Reason, see the [Cosmos Cookbook post-training recipe](https://nvidia-cosmos.github.io/cosmos-cookbook/recipes/post_training/reason1/intelligent-transportation/post_training.html).