# Data Sanity Checklist for Autonomous Vehicle Datasets

Autonomous vehicle datasets (e.g., nuScenes, Waymo, Wayve) often require conversion into a common format. After conversion, it’s crucial to verify the integrity and consistency of the data. Below is a comprehensive sanity-check checklist for multimodal autonomous driving data (images, LiDAR, etc.), with explanations and example code (using Python and the FiftyOne tool along with other libraries) for each check.

## Step 0: Load Your FiftyOne Dataset

In [None]:
import fiftyone as fo

dataset = fo.load_dataset("nuscenes-rerun-fo")

## Step 1: Ensure Metadata Completeness and Consistency

What to check:

 Verify that each frame (sample) in the dataset has all essential metadata fields. This typically includes timestamps, sensor identifiers, sensor calibration parameters (intrinsics and extrinsics), and ego-vehicle pose (position and orientation). Such metadata makes the dataset “complete” and is needed for downstream tasks. 
 
For example, nuScenes stores localization (ego pose), timestamps, and calibration data for each frame as part of its metadata, and FiftyOne likewise allows attaching any metadata you need (time of day, device ID, location, weather, etc.) to each sample. 

Without these, it’s difficult to fuse sensor data or interpret coordinates properly.

Why it’s important:

 Incomplete metadata can lead to misinterpretation of the data (e.g., misaligning sensors or time). Ensuring consistency (e.g., units and coordinate frames) is equally important. Every sample should at least have a timestamp and ego-vehicle pose so that frames can be ordered and spatially related. Calibration info (camera intrinsics, sensor extrinsics relative to the car) must be present to project between sensor frames correctly. If any of these are missing or incorrect, the subsequent sanity checks and any sensor fusion algorithm might fail. 

How to check: 

 Use FiftyOne to iterate through the dataset and assert the presence of required fields. For instance, suppose the dataset’s Sample schema includes fields like timestamp, ego_pose (with position/orientation), and per-sensor calibration entries (intrinsics/extrinsics). We can programmatically verify these. If something is missing, we should flag it before proceeding.

In [None]:
# Add the names of your corresponding fields as the values in the dictionaries below.

metadata_dict = {
    "timestamp" : "timestamp",         # Mandatory
    "intrinsics" : "intrinsics",       # Mandatory
    "T_rig_world" : "T_rig_world",     # Mandatory
    "T_sensor_rig" : "T_sensor_rig",   # Mandatory
}

optional_metadata_dict = {
    "location" : "location",                          # Optional
    "velocity" : "velocity",                          # Optional
    "angular_velocity" : "angular_velocity",          # Optional
    "s_timestamp" : "s_timestamp",                    # Optional
    "e_timestamp" : "e_timestamp",                    # Optional
    "rig_timestamp" : "rig_timestamp",                # Optional
    "ground_truth_cuboids" : "ground_truth_cuboids",  # Optional
    "ego_mask" : "ego_mask",                          # Optional
    "ground_truth_masks" : "ground_truth_masks",      # Optional
    "ground_truth_lidar_segmentation" : "ground_truth_lidar_segmentation",  # Optional
    "ego_rotation" : "ego_rotation",                  # Optional
    "cs_rotation" : "cs_rotation",                    # Optional
    "ego_translation" : "ego_translation",            # Optional
    "cs_translation" : "cs_translation",              # Optional
}

expected_sensors = ["CAM_FRONT", "CAM_BACK", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT", "CAM_BACK_LEFT", "CAM_BACK_RIGHT", "3D"] # Mandatory
img_sensors = ["CAM_FRONT", "CAM_BACK", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT", "CAM_BACK_LEFT", "CAM_BACK_RIGHT"]  # Mandatory
pcd_sensors = ["LIDAR_TOP", "RADAR_FRONT", "RADAR_BACK_LEFT", "RADAR_FRONT_RIGHT", "RADAR_FRONT_LEFT", "RADAR_BACK_RIGHT"]  


### Step 1.1: Check for Missing Metadata Fields

In this step, we programmatically verify the presence of required metadata fields in the dataset schema. Missing metadata fields can lead to issues in downstream tasks, such as sensor fusion or coordinate interpretation.

The `check_metadata_fields` function iterates through the dataset's schema and checks for the presence of mandatory fields defined in `metadata_dict`. If any required fields are missing, they are flagged for review.

In [None]:
def check_metadata_fields(dataset, metadata_dict, sensors=expected_sensors):
    """Print which mandatory fields are missing from the schema of each slice view."""
    for sensor in sensors:
        # Build a view that contains only this sensor's slice
        view = dataset.select_group_slices(sensor)
        field_schema = view.get_field_schema()

        # Collect the keys whose mapped field name is absent from the schema
        missing_fields = []
        for key, field in metadata_dict.items():
            if field not in field_schema:
                missing_fields.append(key)

        # Report the result for this slice
        if missing_fields:
            print(f"Missing metadata fields: {missing_fields} in {sensor} slice.")
        else:
            print(f"All required metadata fields are present in the schema for the {sensor} slice.")

check_metadata_fields(dataset, metadata_dict)
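
A field that exists in the schema can still be empty on individual samples, so a per-sample pass is a useful complement to the schema check. The sketch below is a hypothetical helper (`find_unpopulated_fields` is not a FiftyOne function) that accepts any records supporting `record[field]` access. Plain dicts stand in for samples here; the assumption is that samples from a slice view could be passed in the same way.

```python
def find_unpopulated_fields(records, required_fields):
    """Return {record_index: [field, ...]} for fields that are absent or None.

    `records` is any iterable of mapping-like objects that support
    `record[field]` (an assumption; plain dicts are used in the demo).
    """
    problems = {}
    for index, record in enumerate(records):
        missing = []
        for field in required_fields:
            try:
                value = record[field]
            except KeyError:
                value = None  # field is not defined on this record at all
            if value is None:
                missing.append(field)
        if missing:
            problems[index] = missing
    return problems


# Illustrative records standing in for the samples of one sensor slice
demo_records = [
    {"timestamp": 1532402927647951, "intrinsics": [[1.0, 0.0, 0.0]] * 3},
    {"timestamp": None, "intrinsics": [[1.0, 0.0, 0.0]] * 3},
    {"timestamp": 1532402928147847},
]
demo_problems = find_unpopulated_fields(demo_records, ["timestamp", "intrinsics"])
print(demo_problems)
```

Returning the offending indices rather than printing inside the loop keeps the helper reusable: the caller decides whether to log, tag, or drop the affected samples.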

### Step 1.2: Identify Present Optional Metadata Fields

Optional metadata fields can provide additional context or insights but are not strictly necessary for all tasks. The `get_present_optional_fields` function checks which optional fields, as defined in `optional_metadata_dict`, are present in the dataset schema. This helps in understanding the dataset's completeness and potential for extended analysis.

The variable `present_optional_fields` stores the list of optional fields that are available in the dataset.

In [None]:
def get_present_optional_fields(dataset, optional_metadata_dict, sensors=expected_sensors):
    """Return {sensor: [optional keys present in the schema of that slice view]}."""
    present_fields_dict = {}
    for sensor in sensors:
        # Build a view that contains only this sensor's slice
        view = dataset.select_group_slices(sensor)
        field_schema = view.get_field_schema()

        # Collect the keys whose mapped field name exists in the schema
        present_fields = []
        for key, field in optional_metadata_dict.items():
            if field in field_schema:
                present_fields.append(key)

        # Report the result for this slice
        if present_fields:
            print(f"Optional metadata fields present in {sensor} slice: {present_fields}")
        else:
            print(f"No optional metadata fields are present in the schema for the {sensor} slice.")
        present_fields_dict[sensor] = present_fields
    return present_fields_dict

present_optional_fields = get_present_optional_fields(dataset, optional_metadata_dict)
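
Step 1 also asks for consistency, not just presence. As a minimal sketch of what that could mean for calibration data, the hypothetical helpers below check that a stored 4x4 transform (such as `T_sensor_rig` or `T_rig_world`) is a valid rigid transform and that a 3x3 intrinsic matrix looks like a pinhole camera matrix. The function names, the tolerance, and the demo values are assumptions for illustration.

```python
import numpy as np


def check_rigid_transform(T, atol=1e-6):
    """Return a list of problems found in a 4x4 homogeneous rigid transform."""
    T = np.asarray(T, dtype=float)
    if T.shape != (4, 4):
        return [f"expected shape (4, 4), got {T.shape}"]
    if not np.all(np.isfinite(T)):
        return ["contains NaN or inf"]
    problems = []
    R = T[:3, :3]
    if not np.allclose(R @ R.T, np.eye(3), atol=atol):
        problems.append("rotation block is not orthonormal")
    elif not np.isclose(np.linalg.det(R), 1.0, atol=atol):
        problems.append("rotation block has determinant != +1 (possible reflection)")
    if not np.allclose(T[3], [0.0, 0.0, 0.0, 1.0], atol=atol):
        problems.append("bottom row is not [0, 0, 0, 1]")
    return problems


def check_intrinsics(K, atol=1e-6):
    """Return a list of problems found in a 3x3 pinhole intrinsic matrix."""
    K = np.asarray(K, dtype=float)
    if K.shape != (3, 3):
        return [f"expected shape (3, 3), got {K.shape}"]
    if not np.all(np.isfinite(K)):
        return ["contains NaN or inf"]
    problems = []
    if K[0, 0] <= 0 or K[1, 1] <= 0:
        problems.append("focal lengths must be positive")
    if not np.allclose(K[2], [0.0, 0.0, 1.0], atol=atol):
        problems.append("bottom row is not [0, 0, 1]")
    return problems


# Synthetic demo values (not taken from any dataset)
demo_reflection = np.diag([1.0, 1.0, -1.0, 1.0])  # flips the z axis
demo_K = [[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]]

print(check_rigid_transform(np.eye(4)))       # no problems
print(check_rigid_transform(demo_reflection))  # flagged as a reflection
print(check_intrinsics(demo_K))                # no problems
```

A reflection or a scaled rotation block often points to a row-major versus column-major mix-up or a handedness change during conversion, which is why the rotation block is tested separately from the bottom row.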

## Step 2: Verify Sensor Data Presence and Synchronization

What to check: 

 Ensure that for each timestamp (or frame index), data from all expected sensors is present and correctly synchronized. For example, if the dataset should have 6 camera images and 1 LiDAR scan per frame, verify none of these modalities are missing. Each sample should contain data from every sensor in the rig (unless the sensor intentionally did not record for that frame). Additionally, confirm that the timestamps of different sensor data in the same frame are either identical or within an acceptable sync tolerance (depending on dataset specification). 
 
Why it’s important: 

 Missing sensor data would create a “blind spot” in that frame’s perception. If one camera frame is dropped or out-of-sync, downstream algorithms might make incorrect assumptions. Many datasets ensure sensors are hardware-synchronized (e.g. Wayve’s rig of 5 cameras is time-synchronized at 10 Hz), so a missing or unsynced frame suggests an issue in conversion. Synchronized timestamps ensure that an image and LiDAR from the same sample depict the same moment in time. 
 
How to check: 

Iterate through the dataset and check each group for the presence of each sensor’s data. In a grouped FiftyOne dataset, each sensor is a group slice, so a missing slice means missing sensor data. If your sensor data is instead stored in subfields (e.g., sample.camera_front, sample.camera_back, sample.lidar), verify that those fields are populated. Also, check timestamp consistency across sensors in a frame (if sensors have separate timestamps). For synchronization, one approach is to ensure that the spread between the earliest and latest sensor timestamp in a frame is below a threshold.
 

In [None]:
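max_sync_tolerance = 100000  # 100 ms sync tolerance, assuming timestamps are in microseconds

timestamp_field = metadata_dict["timestamp"]
group_field = dataset.group_field  # name of the field that holds each sample's fo.Group

missing_sensor_groups = []
# iter_groups() yields one dict per group, mapping slice name -> sample
for group in dataset.iter_groups():
    group_id = next(iter(group.values()))[group_field].id
    timestamp_list = []

    # Check presence of each sensor and grab its timestamp
    for sensor in expected_sensors:
        sample = group.get(sensor)
        if sample is None:
            print(f"Missing data: {sensor} is not present for group {group_id}")
            missing_sensor_groups.append((group_id, sensor))
        elif sample[timestamp_field] is None:
            print(f"Missing timestamp: {sensor} has no timestamp in group {group_id}")
        else:
            timestamp_list.append(sample[timestamp_field])

    # Check synchronization: spread between the earliest and latest sensor timestamp
    if len(timestamp_list) > 1:
        min_timestamp = min(timestamp_list)
        max_timestamp = max(timestamp_list)

        if max_timestamp - min_timestamp > max_sync_tolerance:
            print(f"Timestamp sync issue for group {group_id}: "
                  f"min {min_timestamp}, max {max_timestamp}, "
                  f"difference {max_timestamp - min_timestamp}")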

This code checks that each expected sensor is present in every group. If a sensor is missing, it prints a message (indicating a potential conversion issue causing a blind spot) and records the group and sensor in missing_sensor_groups. It then compares the earliest and latest sensor timestamps within the group; if the spread exceeds max_sync_tolerance, it flags a synchronization issue. In a correctly converted dataset, we expect each frame to have all sensors present and timestamps closely aligned (often exactly the same timestamp for all modalities if the data were captured simultaneously).
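
The tolerance of 100000 only means 100 ms if the timestamps are stored in microseconds. The sketch below is a rough heuristic, not a dataset guarantee: it guesses the unit of a Unix-epoch timestamp from its order of magnitude and converts a tolerance given in milliseconds into that unit. Both function names are hypothetical, and the heuristic assumes absolute epoch timestamps rather than offsets from the start of a recording.

```python
# Lower bound of each 10x-wide magnitude band for epoch timestamps (roughly 2001 to 2286)
_EPOCH_UNIT_BANDS = [("s", 1e9), ("ms", 1e12), ("us", 1e15), ("ns", 1e18)]

# Multipliers that convert a value in milliseconds into each unit
_MS_TO_UNIT = {"s": 1e-3, "ms": 1.0, "us": 1e3, "ns": 1e6}


def guess_epoch_unit(timestamp):
    """Guess 's', 'ms', 'us' or 'ns' from the magnitude; None if it fits no band."""
    for unit, lower in _EPOCH_UNIT_BANDS:
        if lower <= timestamp < lower * 10:
            return unit
    return None


def tolerance_in_unit(tolerance_ms, unit):
    """Express a tolerance given in milliseconds in the timestamp's unit."""
    return tolerance_ms * _MS_TO_UNIT[unit]


demo_timestamp = 1532402927647951  # illustrative 16-digit epoch timestamp
demo_unit = guess_epoch_unit(demo_timestamp)
demo_tolerance = tolerance_in_unit(100, demo_unit)
print(demo_unit, demo_tolerance)
```

If the guess comes back as `None`, the timestamps are probably relative or use a different epoch, and the tolerance should be set by hand from the dataset documentation.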

## Step 3: LiDAR-to-Image Projection Alignment (Extrinsic Calibration Check)

What to check: 

 Project the 3D LiDAR points into the 2D camera image and verify they align with the visual scene (i.e., points fall on the corresponding objects in the image). This checks that the LiDAR–camera extrinsic calibration (and the camera intrinsics) are correctly applied in the conversion. In practice, you overlay the point cloud on the image and see if, for example, points hit the surfaces of vehicles, pedestrians, and road in the image, rather than being offset. 
 
Why it’s important: 

Misalignments between LiDAR and camera indicate calibration errors or coordinate transform issues. If the transformation from LiDAR coordinates to camera coordinates is wrong, the projected points will not correspond to the correct image features. For instance, a wrong rotation or translation could cause LiDAR points to appear shifted (e.g., hovering beside the actual object). Visualizing LiDAR over images is a common sanity check – misprojections can reveal calibration or time-sync problems. [In research literature](https://www.researchgate.net/figure/Examples-of-projecting-lidar-points-to-images-before-top-row-and-after-bottom-row_fig1_339813627), it’s noted that misalignment of LiDAR points and image features will be clearly visible if calibration is off or if motion distortion isn’t handled.
 
How to check: 

Use the known calibration parameters to transform LiDAR points to the camera frame and then apply the camera intrinsics to project to pixel coordinates. Then, visualize or statistically verify alignment. Using FiftyOne, one can write a custom loop or use utility functions if available. The example that follows uses open3d to load the point cloud, NumPy for the transforms, and Pillow to read the image. It reads the camera intrinsic matrix K from the image sample and either composes a 4x4 homogeneous transform T_lidar_to_cam from the stored sensor-to-rig and rig-to-world transforms, or applies the per-sensor rotation quaternions and translations step by step:
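
Before running the projection on real data, it can help to see the two steps (extrinsic transform, then intrinsic projection) on synthetic points where the answer is known. This is a minimal sketch under stated assumptions: `project_points` is a hypothetical helper, the intrinsics are made up, and the LiDAR frame is set equal to the camera frame so that the expected pixels can be computed by hand.

```python
import numpy as np


def project_points(points_lidar, T_lidar_to_cam, K, min_depth=1.0):
    """Project (N, 3) LiDAR-frame points to (M, 2) pixels and (M,) depths."""
    # Step 1: extrinsics, using homogeneous coordinates
    points_hom = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    points_cam = (T_lidar_to_cam @ points_hom.T).T[:, :3]

    # Drop points behind (or too close to) the camera
    points_cam = points_cam[points_cam[:, 2] > min_depth]

    # Step 2: intrinsics, followed by the perspective divide by depth
    uvw = (K @ points_cam.T).T
    pixels = uvw[:, :2] / uvw[:, 2:3]
    return pixels, points_cam[:, 2]


# Synthetic setup: focal length 1000 px, principal point (800, 450)
demo_K = np.array([[1000.0, 0.0, 800.0],
                   [0.0, 1000.0, 450.0],
                   [0.0, 0.0, 1.0]])
demo_T = np.eye(4)  # assumption: LiDAR frame coincides with the camera frame

demo_points = np.array([
    [0.0, 0.0, 10.0],   # on the optical axis -> principal point
    [1.0, 0.0, 10.0],   # 1 m to the right at 10 m -> 100 px right of center
    [0.0, 0.0, -5.0],   # behind the camera -> filtered out
])
demo_pixels, demo_depths = project_points(demo_points, demo_T, demo_K)
print(demo_pixels)
print(demo_depths)
```

A point on the optical axis must land on the principal point regardless of depth, which makes it a convenient first test when debugging a projection.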

In [None]:
import numpy as np
import open3d as o3d
import matplotlib.pyplot as plt
from PIL import Image
from pyquaternion import Quaternion



def transform_points(points, transform):
    points_hom = np.hstack((points, np.ones((points.shape[0], 1))))
    
    if not np.all(np.isfinite(points_hom)):
        raise ValueError("Non-finite values in homogeneous points.")

    if not np.all(np.isfinite(transform)):
        raise ValueError("Non-finite values in transform matrix.")

    # Check max magnitude (overflow-safe)
    if np.abs(points_hom).max() > 1e6:
        print("⚠️ Warning: Unusually large point values detected in transform_points input.")

    result = transform @ points_hom.T  # shape (4, N)

    if not np.all(np.isfinite(result)):
        print("⚠️ Warning: Non-finite values found after matrix multiplication.")
    
    return result.T[:, :3]

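def translate(points: np.ndarray, x: np.ndarray) -> np.ndarray:
    """
    Applies a translation to the points in place and returns them.
    :param points: <np.float: 3, n>. Points, one (x, y, z) per column.
    :param x: <np.float: 3, 1>. Translation in x, y, z.
    """
    for i in range(3):
        points[i, :] = points[i, :] + x[i]

    return points

def rotate(points: np.ndarray, rot_matrix: np.ndarray) -> np.ndarray:
    """
    Applies a rotation to the points in place and returns them.
    :param points: <np.float: 3, n>. Points, one (x, y, z) per column.
    :param rot_matrix: <np.float: 3, 3>. Rotation matrix.
    """
    points[:3, :] = np.dot(rot_matrix, points[:3, :])

    return points

def transform(points: np.ndarray, transf_matrix: np.ndarray) -> np.ndarray:
    """
    Applies a homogeneous transform to the points in place and returns them.
    :param points: <np.float: 3, n>. Points, one (x, y, z) per column.
    :param transf_matrix: <np.float: 4, 4>. Homogeneous transformation matrix.
    """
    points[:3, :] = transf_matrix.dot(np.vstack((points[:3, :], np.ones(points.shape[1]))))[:3, :]

    return points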

def view_points(points: np.ndarray, view: np.ndarray, normalize: bool) -> np.ndarray:
    """
    This is a helper function that maps 3d points to a 2d plane. It can be used to implement both perspective and
    orthographic projections. It first applies the dot product between the points and the view. By convention,
    the view should be such that the data is projected onto the first 2 axis. It then optionally applies a
    normalization along the third dimension.

    For a perspective projection the view should be a 3x3 camera matrix, and normalize=True
    For an orthographic projection with translation the view is a 3x4 matrix and normalize=False
    For an orthographic projection without translation the view is a 3x3 matrix (optionally 3x4 with last columns
     all zeros) and normalize=False

    :param points: <np.float32: 3, n> Matrix of points, where each point (x, y, z) is along each column.
    :param view: <np.float32: n, n>. Defines an arbitrary projection (n <= 4).
        The projection should be such that the corners are projected onto the first 2 axis.
    :param normalize: Whether to normalize the remaining coordinate (along the third axis).
    :return: <np.float32: 3, n>. Mapped point. If normalize=False, the third coordinate is the height.
    """

    assert view.shape[0] <= 4
    assert view.shape[1] <= 4
    assert points.shape[0] == 3

    viewpad = np.eye(4)
    viewpad[:view.shape[0], :view.shape[1]] = view

    nbr_points = points.shape[1]

    # Do operation in homogenous coordinates.
    points = np.concatenate((points, np.ones((1, nbr_points))))
    points = np.dot(viewpad, points)
    points = points[:3, :]

    if normalize:
        points = points / points[2:3, :].repeat(3, 0).reshape(3, nbr_points)

    return points


def load_point_cloud_pcd(path):
    pcd = o3d.io.read_point_cloud(path)
    return np.asarray(pcd.points)

def project_lidar_to_image_fiftyone(
        img_sample,
        pcd_sample,
        pcd_path,
        min_depth=1.0,
        metadata_dict=metadata_dict,
        optional_metadata_dict=optional_metadata_dict,
        r_and_t_transform = False,
        verbose=False):
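    """
    Projects a LiDAR point cloud into a camera image.

    With r_and_t_transform=True, returns (points, coloring, im): pixel coordinates
    of shape (3, M), per-point depths, and the PIL image.
    With r_and_t_transform=False, returns (pts_2d, pts_cam): pixel coordinates of
    shape (M, 2) and the camera-frame points in front of the camera.
    """
    # Load point cloud
    pts = load_point_cloud_pcd(pcd_path)

    if verbose:
        print("Raw point cloud shape:", pts.shape)
        print("Any NaNs or infs?", not np.all(np.isfinite(pts)))

    if not r_and_t_transform:
        # Option 1: compose the 4x4 homogeneous transforms stored on the samples
        T_lidar_to_rig = np.array(pcd_sample[metadata_dict["T_sensor_rig"]])  # 4x4
        T_rig_to_world_lidar = np.array(pcd_sample[metadata_dict["T_rig_world"]])  # 4x4

        T_cam_to_rig = np.array(img_sample[metadata_dict["T_sensor_rig"]])  # 4x4
        T_rig_to_world_cam = np.array(img_sample[metadata_dict["T_rig_world"]])  # 4x4

        # Look up the intrinsics under the configured field name
        K = np.array(img_sample[metadata_dict["intrinsics"]])  # 3x3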

        # Full chain: lidar -> rig -> world -> rig(cam) -> cam
        T_world_to_rig_cam = np.linalg.inv(T_rig_to_world_cam)
        T_rig_to_cam = np.linalg.inv(T_cam_to_rig)

        T_lidar_to_cam = T_rig_to_cam @ T_world_to_rig_cam @ T_rig_to_world_lidar @ T_lidar_to_rig
        pts_cam = transform_points(pts, T_lidar_to_cam)

    else:

        pts = pts.T # shape (3, N)
        # Points live in the point sensor frame. So they need to be transformed via global to the image plane.
        # First step: transform the pointcloud to the ego vehicle frame for the timestamp of the sweep.
        pcd_sensor_rotation = pcd_sample[optional_metadata_dict["cs_rotation"]]  
        pcd_sensor_translation = pcd_sample[optional_metadata_dict["cs_translation"]]
        pts = rotate(pts, Quaternion(pcd_sensor_rotation).rotation_matrix)
        pts = translate(pts, np.array(pcd_sensor_translation))

        # Second step: transform from ego to the global frame.
        pcd_rig_rotation = pcd_sample[optional_metadata_dict["ego_rotation"]]
        pcd_rig_translation = pcd_sample[optional_metadata_dict["ego_translation"]]
        pts = rotate(pts, Quaternion(pcd_rig_rotation).rotation_matrix)
        pts = translate(pts, np.array(pcd_rig_translation))

        # Third step: transform from global into the ego vehicle frame for the timestamp of the image.
        img_rig_rotation = img_sample[optional_metadata_dict["ego_rotation"]]
        img_rig_translation = img_sample[optional_metadata_dict["ego_translation"]]
        pts = translate(pts, -np.array(img_rig_translation))
        pts = rotate(pts, Quaternion(img_rig_rotation).rotation_matrix.T)

        # Fourth step: transform from ego into the camera.
        img_sensor_rotation = img_sample[optional_metadata_dict["cs_rotation"]]  
        img_sensor_translation = img_sample[optional_metadata_dict["cs_translation"]]
        pts = translate(pts, -np.array(img_sensor_translation))
        pts = rotate(pts, Quaternion(img_sensor_rotation).rotation_matrix.T)

        # Fifth step: actually take a "picture" of the point cloud.
        # Grab the depths (camera frame z axis points away from the camera).
        depths = pts[2, :]

        coloring = depths

        points = view_points(pts[:3, :], np.array(img_sample[metadata_dict["intrinsics"]]), normalize=True)



        im = Image.open(img_sample.filepath)
        mask = np.ones(depths.shape[0], dtype=bool)
        mask = np.logical_and(mask, depths > min_depth)
        mask = np.logical_and(mask, points[0, :] > 1)
        mask = np.logical_and(mask, points[0, :] < im.size[0] - 1)
        mask = np.logical_and(mask, points[1, :] > 1)
        mask = np.logical_and(mask, points[1, :] < im.size[1] - 1)
        points = points[:, mask]
        coloring = coloring[mask]


        return points, coloring, im



    if verbose:
        print("Camera-frame Z stats:")
        print("  Min Z:", np.min(pts_cam[:, 2]))
        print("  Max Z:", np.max(pts_cam[:, 2]))
        print("  Any Z ≤ 0?", np.any(pts_cam[:, 2] <= 0))
        print("  Any NaN or inf?", not np.all(np.isfinite(pts_cam)))

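    # Keep only points at least min_depth in front of the camera
    mask = pts_cam[:, 2] > min_depth
    pts_cam = pts_cam[mask]

    # Project to the image plane: (u*z, v*z, z) = K @ (x, y, z)
    projected = (K @ pts_cam.T).T
    # Z = depth (3rd column)
    z = projected[:, 2]
    valid = np.isfinite(z) & (z > min_depth)

    # Apply the same filter to both outputs so their rows stay aligned
    projected = projected[valid]
    pts_cam = pts_cam[valid]
    z = z[valid][:, None]

    # Perspective divide by depth gives pixel coordinates
    pts_2d = projected[:, :2] / z

    return pts_2d, pts_cam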
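
The function supports two representations of the same calibration: 4x4 matrices, or a rotation quaternion plus a translation applied step by step. The sketch below shows how the two relate, assuming quaternions are ordered (w, x, y, z), which is the order pyquaternion's constructor takes for a 4-element sequence. `quaternion_to_rotation_matrix` and `make_transform` are hypothetical helpers written in plain NumPy, and the demo values are synthetic.

```python
import numpy as np


def quaternion_to_rotation_matrix(q):
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # normalize to guard against drift
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def make_transform(rotation_wxyz, translation):
    """Build the 4x4 matrix equivalent to 'rotate, then translate'."""
    T = np.eye(4)
    T[:3, :3] = quaternion_to_rotation_matrix(rotation_wxyz)
    T[:3, 3] = translation
    return T


# Synthetic example: 90 degree rotation about z, then a shift of (1, 2, 3)
demo_q = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
demo_t = np.array([1.0, 2.0, 3.0])
demo_T = make_transform(demo_q, demo_t)

point = np.array([1.0, 0.0, 0.0])

# Forward: rotate then translate, versus one matrix product
step_by_step = quaternion_to_rotation_matrix(demo_q) @ point + demo_t
via_matrix = (demo_T @ np.append(point, 1.0))[:3]

# Inverse: translate by -t, then rotate by R^T, versus inv(T)
back_step = quaternion_to_rotation_matrix(demo_q).T @ (step_by_step - demo_t)
back_matrix = (np.linalg.inv(demo_T) @ np.append(step_by_step, 1.0))[:3]

print(step_by_step, via_matrix)
print(back_step, back_matrix)
```

The inverse direction is the one that is easy to get wrong: the translation must be undone before the transposed rotation is applied, which is the order used in the third and fourth steps of the function above.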

In [None]:
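group_field = dataset.group_field  # name of the field that holds each sample's fo.Group

# iter_groups() yields one dict per group, mapping slice name -> sample
for group in dataset.iter_groups():
    group_id = next(iter(group.values()))[group_field].id
    pcd_sample = group.get("3D")

    for img_sensor in img_sensors:
        img_sample = group.get(img_sensor)

        if img_sample is None or pcd_sample is None:
            print(f"Missing data: {img_sensor} or 3D for group {group_id}")
            continue

        # The 3D slice stores an .fo3d scene; find the LiDAR point cloud it references
        scene = fo.Scene.from_fo3d(pcd_sample.filepath)
        pcd_path = next((path for path in scene.get_asset_paths() if "LIDAR_TOP" in path), None)
        if pcd_path is None:
            print(f"No LIDAR_TOP asset found in the 3D scene for group {group_id}")
            continue

        pts_2d, coloring, im = project_lidar_to_image_fiftyone(
            img_sample,
            pcd_sample,
            pcd_path,
            metadata_dict=metadata_dict,
            r_and_t_transform=True
        )

        if pts_2d.size == 0:
            print(f"No valid points projected for {img_sensor} in group {group_id}")
            continue

        # Keypoints use relative coordinates in [0, 1]; pts_2d has shape (3, N) in pixels.
        # The image size comes from the loaded image, so sample metadata is not required.
        width, height = im.size
        img_sample["lidar_points"] = fo.Keypoints(
            keypoints=[
                fo.Keypoint(
                    label="LIDAR_TOP",
                    points=list(zip((pts_2d[0] / width).tolist(), (pts_2d[1] / height).tolist())),
                    colors=coloring.tolist(),  # custom attribute holding per-point depth
                )
            ]
        )
        img_sample.save()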
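
Visual inspection catches subtle offsets, but a coarse statistic can flag gross errors across a whole dataset without opening every image. The sketch below is one possible smoke test, not an established metric: it computes the fraction of points in front of the camera that project inside the image bounds. The helper name, the intrinsics, and the points are all made up for illustration. A value that drops to zero, or changes sharply between frames of the same camera, would suggest an inverted or mismatched transform; it says nothing about small misalignments.

```python
import numpy as np


def in_frame_fraction(points_cam, K, width, height, min_depth=1.0):
    """Fraction of camera-frame points beyond min_depth that land inside the image."""
    points_cam = np.asarray(points_cam, dtype=float)
    in_front = points_cam[points_cam[:, 2] > min_depth]
    if len(in_front) == 0:
        return 0.0

    # Pinhole projection followed by the perspective divide
    uvw = (np.asarray(K, dtype=float) @ in_front.T).T
    u = uvw[:, 0] / uvw[:, 2]
    v = uvw[:, 1] / uvw[:, 2]

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return float(inside.mean())


# Synthetic 1600x900 camera with focal length 1000 px
demo_K = [[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]]
demo_points_cam = [
    [0.0, 0.0, 10.0],    # projects to (800, 450): inside
    [5.0, 0.0, 10.0],    # projects to (1300, 450): inside
    [20.0, 0.0, 10.0],   # projects to (2800, 450): outside
    [0.0, 0.0, -3.0],    # behind the camera: ignored
]
demo_fraction = in_frame_fraction(demo_points_cam, demo_K, width=1600, height=900)
print(demo_fraction)
```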