# Getting started 🚀

Welcome to the ICRA 2024 Cloth Competition! In this notebook we will load and explore the data.

Run the cell below to download a part of the dataset (10 samples, this is ~1 GB) and unzip it.
You only need to run the cell once, then you can comment it out.

☁️ For the full dataset, see: https://cloud.ilabt.imec.be/index.php/s/Sy945rbamg8JMgR

## 1. Directory structure 📂

In [None]:
import os
from dataclasses import fields
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
import open3d as o3d
from airo_camera_toolkit.point_clouds.conversions import point_cloud_to_open3d
from airo_dataset_tools.data_parsers.pose import Pose
from cloth_tools.annotation.grasp_annotation import GraspAnnotation
from cloth_tools.dataset.format import load_competition_observation
from cloth_tools.dataset.download import download_and_extract_dataset
from cloth_tools.visualization.opencv import draw_pose


data_dir = Path("data")
dataset_dir = data_dir / "cloth_competition_dataset_0000_0-9"

In [None]:
os.path.exists(dataset_dir)

In the cell below we download a small part (10 episodes) of the dataset if no dataset was found.

In [None]:
def emoji(directory: Path, file: str) -> str:
    """Return an emoji that indicates the type of `file` inside `directory`."""
    if os.path.isdir(os.path.join(directory, file)):
        return "📁"
    elif file.endswith((".jpg", ".png")):
        return "🖼️"
    elif file.endswith(".mp4"):
        return "🎥"
    return "📄"

print("First directories in the dataset:")
for f in sorted(os.listdir(dataset_dir))[:5]:
    print(emoji(dataset_dir, f), f)

One sample in the dataset corresponds to one episode. 
An episode consists of one attempt at unfolding a piece of hanging cloth by grasping it at a human-annotated point.

A sample directory contains the following files:

In [None]:
sample_dir = dataset_dir / "sample_000000"

for f in os.listdir(sample_dir):
    print(emoji(sample_dir, f), f)

One sample thus contains two observations, the **start** and **result**, a **grasp** annotation and a video of the episode.

* 🔎 The **start** observation is taken after the cloth has been grasped by its lowest point.
* 🔎 The **result** observation is taken after the attempt to unfold it.
* 👉 The **grasp** pose annotation used to unfold the garment. Currently these are human-annotated.
* 🎥 The **video** of the entire episode.

Participants of the ICRA 2024 Cloth Competition will be asked to predict a good **grasp**, given the **start** observation.

The grasp will be evaluated based on the **result** observation, using the surface area of the cloth.

## 2. Start Observation 🔎

In this section we explore some of the data contained in the start observation.

In [None]:
observation_start_dir = sample_dir / "observation_start"

observation = load_competition_observation(observation_start_dir)

print("Overview of the fields in a Cloth Competition Observation:")
for field in fields(observation):
    field_name = field.name + ":"
    field_value = getattr(observation, field.name)
    if isinstance(field_value, np.ndarray):
        print(f" - {field_name:<34} np.ndarray {field_value.shape} {field_value.dtype}")
    else:
        print(f" - {field_name:<34} {field.type}")

## 3. Grasp 👉

In [None]:
grasp_dir = sample_dir / "grasp"

for f in os.listdir(grasp_dir):
    print(emoji(grasp_dir, f), f)

In [None]:
grasp_pose_file = grasp_dir / "grasp_pose.json"
grasp_annotation_file = grasp_dir / "grasp_annotation.json"

with open(grasp_pose_file, "r") as f:
    grasp_pose = Pose.model_validate_json(f.read()).as_homogeneous_matrix()


with open(grasp_annotation_file, "r") as f:
    grasp_annotation = GraspAnnotation.model_validate_json(f.read())

with np.printoptions(precision=3, suppress=True):
    print("Grasp pose:\n", grasp_pose)
    print("\nGrasp annotation:\n", grasp_annotation)
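
The grasp pose above is loaded as a 4x4 homogeneous matrix. As a minimal sketch of how such a matrix can be taken apart, the snippet below splits one into its rotation and position. The matrix `example_pose` is a made-up value for illustration, not a pose from the dataset.

```python
import numpy as np

# Hypothetical pose for illustration only: a 90-degree rotation about Z,
# with its origin at (0.1, 0.2, 0.7) in the world frame.
example_pose = np.array(
    [
        [0.0, -1.0, 0.0, 0.1],
        [1.0, 0.0, 0.0, 0.2],
        [0.0, 0.0, 1.0, 0.7],
        [0.0, 0.0, 0.0, 1.0],
    ]
)

rotation = example_pose[:3, :3]  # 3x3 rotation matrix; its columns are the frame's X, Y, Z axes
position = example_pose[:3, 3]  # origin of the frame, expressed in the world frame

# A valid rotation matrix is orthonormal with determinant +1.
assert np.allclose(rotation @ rotation.T, np.identity(3))
assert np.isclose(np.linalg.det(rotation), 1.0)

print("Position:", position)
print("Z-axis in world frame:", rotation[:, 2])
```

The same slicing should apply to `grasp_pose`, since it is also a 4x4 NumPy array.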

## 4. Result Observation 🎉

In [None]:
observation_result_dir = sample_dir / "observation_result"

observation_result = load_competition_observation(observation_result_dir)

plt.figure(figsize=(10, 5))
plt.imshow(observation_result.image_left)
plt.title("Result: image of cloth after grasping and stretching")
plt.show()

ℹ️ The precise calculation of the evaluation metric will be released at a later date.

## 5. Coordinate frames 📐

In [None]:
X_W_C = observation.camera_pose_in_world
X_W_TCPL = observation.arm_left_tcp_pose_in_world
X_W_TCPR = observation.arm_right_tcp_pose_in_world
X_W_LB = observation.arm_left_pose_in_world
X_W_RB = observation.arm_right_pose_in_world
intrinsics = observation.camera_intrinsics

X_W_GRASP = grasp_pose

image_bgr = cv2.cvtColor(observation.image_left, cv2.COLOR_RGB2BGR)

draw_pose(image_bgr, np.identity(4), intrinsics, X_W_C, 0.25)
draw_pose(image_bgr, X_W_LB, intrinsics, X_W_C)
draw_pose(image_bgr, X_W_RB, intrinsics, X_W_C)
draw_pose(image_bgr, X_W_TCPL, intrinsics, X_W_C, 0.05)
draw_pose(image_bgr, X_W_TCPR, intrinsics, X_W_C, 0.05)
draw_pose(image_bgr, X_W_GRASP, intrinsics, X_W_C, 0.05)

image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(10, 5))
plt.imshow(image_rgb)
plt.title("Coordinate frames visualization")
plt.show()
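
`draw_pose` takes care of projecting the frames into the image. To make the frame convention explicit, here is a minimal sketch of a pinhole projection, assuming that `X_W_C` is the camera pose in the world frame (as the field name `camera_pose_in_world` suggests) and that the intrinsics are a standard 3x3 matrix. The helper `project_world_point` and all numeric values are hypothetical, and lens distortion is ignored.

```python
import numpy as np


def project_world_point(point_in_world, X_W_C, intrinsics):
    """Project a 3D world-frame point to pixel coordinates (u, v) with a pinhole model."""
    X_C_W = np.linalg.inv(X_W_C)  # transform that maps world coordinates to camera coordinates
    point_in_camera = (X_C_W @ np.append(point_in_world, 1.0))[:3]
    u, v, w = intrinsics @ point_in_camera
    return np.array([u / w, v / w])


# Made-up values for illustration only.
example_intrinsics = np.array(
    [
        [1000.0, 0.0, 640.0],
        [0.0, 1000.0, 360.0],
        [0.0, 0.0, 1.0],
    ]
)
example_X_W_C = np.identity(4)  # camera axes aligned with the world axes
example_X_W_C[:3, 3] = [0.0, 0.0, -1.0]  # camera placed 1 unit behind the world origin along Z

# The world origin lies on the optical axis, so it projects onto the principal point.
pixel = project_world_point(np.array([0.0, 0.0, 0.0]), example_X_W_C, example_intrinsics)
print(pixel)
```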


❔ If you have any questions, feel free to ask on the [GitHub Discussions page](https://github.com/Victorlouisdg/cloth-competition/discussions)!

In [None]:
import matplotlib.pyplot as plt
import open3d as o3d
from airo_camera_toolkit.point_clouds.conversions import point_cloud_to_open3d
from cloth_tools.visualization.opencv import draw_pose

# Load the start observation
observation_start_dir = sample_dir / "observation_start"
observation_start = load_competition_observation(observation_start_dir)

# Load the result observation
observation_result_dir = sample_dir / "observation_result"
observation_result = load_competition_observation(observation_result_dir)

# Get the grasp pose
with open(grasp_pose_file, "r") as f:
    grasp_pose = Pose.model_validate_json(f.read()).as_homogeneous_matrix()

# Create the figure with 2 columns and 3 rows
fig, axes = plt.subplots(3, 2, figsize=(5, 7), dpi=100, gridspec_kw={'height_ratios': [1, 1, 1]})

# --- First column: Start observation ---

# RGB image with grasp pose
image_bgr = cv2.cvtColor(observation_start.image_left, cv2.COLOR_RGB2BGR)
draw_pose(image_bgr, grasp_pose, intrinsics, X_W_C, 0.1)  # Visualize grasp pose
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)


from airo_camera_toolkit.image_transforms.transforms.crop import Crop

x_middle = observation_start.depth_map.shape[1] // 2
width = 352  # multiple of 32, which is generally preferred for neural networks

x = x_middle - width // 2
y = 140
height = 750 

width_result = 1000
x_result = x_middle - width_result // 2

crop_depth_start = Crop(observation_start.depth_map.shape, x=x, y=y, w=width, h=height)
crop_depth_result = Crop(observation_result.depth_map.shape, x=x_result, y=y, w=width_result, h=height)

crop_rgb_left_start = Crop(observation_start.image_left.shape, x=x, y=y, w=width, h=height)
crop_rgb_left_result = Crop(observation_result.image_left.shape, x=x_result, y=y, w=width_result, h=height)

image_start_cropped = crop_rgb_left_start.transform_image(image_rgb)
image_result_cropped = crop_rgb_left_result.transform_image(observation_result.image_left)


depth_cropped_start = crop_depth_start.transform_image(observation_start.depth_map)
depth_cropped_result = crop_depth_result.transform_image(observation_result.depth_map)


distance_max = 1.55

segmentation_mask = observation_start.depth_map < distance_max
segmentation_mask_cropped = crop_depth_start.transform_image(segmentation_mask)

# save cropped RGB start and result images
cv2.imwrite("start_RGB.png", cv2.cvtColor(image_start_cropped, cv2.COLOR_RGB2BGR))
cv2.imwrite("result_RGB.png", cv2.cvtColor(image_result_cropped, cv2.COLOR_RGB2BGR))


# turn off all axes
# for ax in axes.flatten():
#     ax.axis("off")


# axes[0, 0].set_title("Start")
# axes[0, 0].imshow(image_start_cropped)
# depth_map_im_start = axes[1, 0].imshow(depth_cropped_start, cmap='viridis_r')
# depth_map_im_start.set_clim(vmin=1.20, vmax=1.55)  # 



# axes[2, 0].imshow(depth_cropped_start)

# # RGB image without grasp
# axes[0, 1].set_title("Result")
# axes[0, 1].imshow(image_result_cropped)
# axes[1, 1].imshow(depth_cropped_result)
# axes[2, 1].imshow(depth_cropped_result)

# # Adjust layout and show the figure
# plt.tight_layout()
# plt.show()
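
The cell above centers a crop window horizontally and thresholds the depth map to obtain a rough segmentation mask. A minimal NumPy-only sketch of the same two ideas follows. It uses a synthetic depth map and plain slicing instead of the `Crop` class, and all sizes and depth values are made up.

```python
import numpy as np

# Synthetic depth map: background at 2.0, a "cloth" rectangle at 1.3.
depth_map = np.full((100, 200), 2.0)
depth_map[20:80, 90:110] = 1.3

# Horizontally centered crop window, analogous to x, y, width and height above.
width, height, y = 64, 70, 15
x = depth_map.shape[1] // 2 - width // 2
depth_cropped = depth_map[y : y + height, x : x + width]

# Every pixel closer than distance_max is treated as foreground.
distance_max = 1.55
segmentation_mask = depth_map < distance_max
mask_cropped = segmentation_mask[y : y + height, x : x + width]

print(depth_cropped.shape, mask_cropped.sum())
```

Thresholding on depth only works when nothing else in view is closer than `distance_max`, so the threshold has to be chosen for the specific camera setup.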

In [None]:
import cv2
import matplotlib.pyplot as plt
from cloth_tools.visualization.opencv import draw_pose

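def colorize_depth_and_draw_pose(depth_map, grasp_pose, intrinsics, X_W_C, output_filename, vmin, vmax, crop):
    """
    Colorizes a depth map with the 'viridis_r' colormap, optionally draws a pose on it,
    saves the full-size result as a PNG image, and returns a cropped copy.

    Args:
        depth_map: The depth map as a NumPy array.
        grasp_pose: The grasp pose as a 4x4 homogeneous matrix, or None to skip drawing.
        intrinsics: The camera intrinsics matrix.
        X_W_C: The camera pose in the world frame as a 4x4 homogeneous matrix.
        output_filename: The filename for the output PNG image.
        vmin: Lower bound used to clip the depth values before normalizing.
        vmax: Upper bound used to clip the depth values before normalizing.
        crop: Crop transform applied to the colorized image before it is returned.

    Returns:
        The cropped, colorized depth image as an RGB uint8 array.
    """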

    depth_map = np.clip(depth_map, vmin, vmax)


    # Normalize depth map to 0-1 range
    normalized_depth = (depth_map - depth_map.min()) / (depth_map.max() - depth_map.min())

    # Apply 'viridis_r' colormap using Matplotlib
    cmap = plt.get_cmap('viridis_r')
    colored_depth = (cmap(normalized_depth)[:, :, :3] * 255).astype(np.uint8)

    # Draw the pose on the colored depth map
    if grasp_pose is not None:
        # Convert RGB to BGR for drawing with OpenCV, then back to RGB
        colored_depth = cv2.cvtColor(colored_depth, cv2.COLOR_RGB2BGR)
        draw_pose(colored_depth, grasp_pose, intrinsics, X_W_C, 0.1)
        colored_depth = cv2.cvtColor(colored_depth, cv2.COLOR_BGR2RGB)

    colored_depth_crop = crop.transform_image(colored_depth)

    # Save the full (uncropped) image; OpenCV expects BGR channel order
    cv2.imwrite(output_filename, cv2.cvtColor(colored_depth, cv2.COLOR_RGB2BGR))
    return colored_depth_crop


# Example usage: colorize the full start depth map, draw the grasp pose, then crop.

colored_depth_crop = colorize_depth_and_draw_pose(
    observation_start.depth_map, grasp_pose, intrinsics, X_W_C, "output_depth_with_pose.png", 1.17, 1.54, crop_depth_start
)

plt.imshow(colored_depth_crop)
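
When several depth images need to be compared, it can help to map depth to color with fixed bounds rather than each image's own minimum and maximum. The following is a minimal sketch of that normalization step only, under the assumption that `vmin` and `vmax` are meant as the color limits. `normalize_depth` is a hypothetical helper, not part of `cloth_tools`.

```python
import numpy as np


def normalize_depth(depth_map, vmin, vmax):
    """Clip depth to [vmin, vmax] and rescale so that vmin maps to 0 and vmax maps to 1."""
    clipped = np.clip(depth_map, vmin, vmax)
    return (clipped - vmin) / (vmax - vmin)


# Made-up depth values: one below vmin, one at vmin, one in the middle, one above vmax.
example_depth = np.array([[1.0, 1.17], [1.355, 2.0]])
normalized = normalize_depth(example_depth, vmin=1.17, vmax=1.54)
print(normalized)
```

The normalized array can then be passed to a Matplotlib colormap in the same way as in the function above.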

In [None]:
colored_depth_crop2 = colorize_depth_and_draw_pose(
    observation_result.depth_map, None, intrinsics, X_W_C, "output_depth_with_pose2.png", 0.75, 1.20, crop_depth_result
)

plt.imshow(colored_depth_crop2)

In [None]:
# save cropped colored depth start and result images
cv2.imwrite("start_depth.png", cv2.cvtColor(colored_depth_crop, cv2.COLOR_RGB2BGR))
cv2.imwrite("result_depth.png", cv2.cvtColor(colored_depth_crop2, cv2.COLOR_RGB2BGR))

In [None]:
import rerun as rr

observation_start_dir = dataset_dir / "sample_000000" / "observation_start"


observation = load_competition_observation(observation_start_dir)

confidence_map = observation.confidence_map
point_cloud = observation.point_cloud

In [None]:
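# NOTE: filter_and_crop_point_cloud is called below; it is assumed here to be defined
# in the same module as filter_point_cloud.
from cloth_tools.point_clouds.operations import filter_and_crop_point_cloud, filter_point_cloud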
from cloth_tools.bounding_boxes import BBOX_CLOTH_IN_THE_AIR, bbox_to_mins_and_sizes


confidence_threshold = 1.0
confidence_mask = (confidence_map <= confidence_threshold).reshape(-1)  # Threshold and flatten
# point_cloud_filtered = filter_point_cloud(point_cloud, confidence_mask)

# bbox = BBOX_CLOTH_IN_THE_AIR
bbox = (-1.0, -1.0, -0.5), (0.5, 1.0, 1.0)


point_cloud_cropped = filter_and_crop_point_cloud(point_cloud, confidence_map, bbox)
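
Conceptually, this step keeps only the points whose confidence value passes the threshold and that fall inside an axis-aligned bounding box. A minimal NumPy sketch of that idea on synthetic data follows. It is not the `cloth_tools` implementation, and `filter_and_crop_points` is a hypothetical helper. It assumes that the bounding box is given as a pair of (minimum corner, maximum corner) and, as the `<=` comparison above suggests, that lower confidence values indicate more reliable points.

```python
import numpy as np


def filter_and_crop_points(points, confidence, bbox, confidence_threshold=1.0):
    """Keep points with confidence <= threshold that lie inside the (mins, maxs) box."""
    mins, maxs = np.asarray(bbox[0]), np.asarray(bbox[1])
    confident = confidence.reshape(-1) <= confidence_threshold  # one flag per point
    inside = np.all((points >= mins) & (points <= maxs), axis=1)
    return points[confident & inside]


# Synthetic example: four points and a 2x2 confidence map with one value per point.
points = np.array(
    [
        [0.0, 0.0, 0.5],  # kept
        [0.0, 0.0, 2.0],  # outside the box (z too large)
        [0.2, 0.5, 0.0],  # rejected by the confidence threshold
        [-0.5, -0.5, 0.9],  # kept
    ]
)
confidence = np.array([[0.5, 0.2], [30.0, 1.0]])
bbox = (-1.0, -1.0, -0.5), (0.5, 1.0, 1.0)

points_kept = filter_and_crop_points(points, confidence, bbox)
print(points_kept)
```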

In [None]:
window_name = "start_result"
rr.init(window_name, spawn=True)

In [None]:
rr_point_cloud = rr.Points3D(positions=point_cloud_cropped.points, colors=point_cloud_cropped.colors)
rr.log("world/point_cloud", rr_point_cloud)