# Fog-X Demo

In this demo, we show how to use Fog-X to collect and manage your robotics learning dataset. We cover the following aspects of Fog-X:
* Support for existing Open-X datasets
* Data analytics and management
* Use with PyTorch for learning
* Export and sharing with Open-X (TensorFlow RLDS) and Hugging Face

At the end, we also compare disk usage: in this demo the exported RLDS copy takes about 43% more space than the Fog-X copy (2.3G vs. 1.6G).

In [2]:
import fog_x 

dataset = fog_x.dataset.Dataset(
    name="demo_ds",
    path="~/test_dataset",
)

## Loading From Existing Open-X/RT-X datasets

In [None]:
dataset.load_rtx_episodes(
    name="berkeley_autolab_ur5",
    split="train[:10]",
)

### Trajectory Metadata and Data

Fog-X makes a distinction between trajectory metadata and the actual data. 
* **Metadata**: information that is constant across a trajectory, such as the language command or tags
* **Data**: the values recorded at each individual step within a trajectory
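
The split between the two levels can be pictured with plain Python records. This is only an illustrative sketch: the field names `collector` and `num_steps`, the sample values, and the helper `steps_for` are hypothetical and are not part of the Fog-X API. It shows the general pattern of filtering on the small per-episode table first and only then touching the larger per-step table.

```python
# Hypothetical in-memory stand-ins for the two levels of information.
# Episode-level metadata: one record per trajectory.
episode_metadata = [
    {"episode_id": 0, "collector": "User 1", "num_steps": 3},
    {"episode_id": 1, "collector": "User 2", "num_steps": 2},
]

# Step-level data: one record per step, linked back by episode_id.
step_data = [
    {"episode_id": 0, "step": 0, "reward": 0.0},
    {"episode_id": 0, "step": 1, "reward": 0.0},
    {"episode_id": 0, "step": 2, "reward": 1.0},
    {"episode_id": 1, "step": 0, "reward": 0.0},
    {"episode_id": 1, "step": 1, "reward": 1.0},
]


def steps_for(selected_episodes, steps):
    """Return only the steps that belong to the selected episodes."""
    wanted = {ep["episode_id"] for ep in selected_episodes}
    return [s for s in steps if s["episode_id"] in wanted]


# Filter on the small metadata table first, then fetch the matching steps.
by_user_2 = [ep for ep in episode_metadata if ep["collector"] == "User 2"]
selected_steps = steps_for(by_user_2, step_data)
print(len(selected_steps))
```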

In [4]:
# metadata
trajectory_metadata = dataset.get_episode_info()
trajectory_metadata

episode_id,Finished,feature_gripper_closedness_action_type,feature_gripper_closedness_action_shape,gripper_closedness_action_count,feature_rotation_delta_type,feature_rotation_delta_shape,rotation_delta_count,feature_terminate_episode_type,feature_terminate_episode_shape,terminate_episode_count,feature_world_vector_type,feature_world_vector_shape,world_vector_count,feature_is_first_type,feature_is_first_shape,is_first_count,feature_is_last_type,feature_is_last_shape,is_last_count,feature_is_terminal_type,feature_is_terminal_shape,is_terminal_count,feature_hand_image_type,feature_hand_image_shape,hand_image_count,feature_image_type,feature_image_shape,image_count,feature_image_with_depth_type,feature_image_with_depth_shape,image_with_depth_count,feature_natural_language_embedding_type,feature_natural_language_embedding_shape,natural_language_embedding_count,feature_natural_language_instruction_type,feature_natural_language_instruction_shape,natural_language_instruction_count,feature_robot_state_type,feature_robot_state_shape,robot_state_count,feature_reward_type,feature_reward_shape,reward_count
i64,bool,str,str,f64,str,str,f64,str,str,f64,str,str,f64,str,str,f64,str,str,f64,str,str,f64,str,str,f64,str,str,f64,str,str,f64,str,str,f64,str,str,f64,str,str,f64,str,str,f64
0,true,"""float32""","""()""",71.0,"""float32""","""(3,)""",71.0,"""float32""","""()""",71.0,"""float32""","""(3,)""",71.0,"""bool""","""()""",71.0,"""bool""","""()""",71.0,"""bool""","""()""",71.0,"""uint8""","""(480, 640, 3)""",71.0,"""uint8""","""(480, 640, 3)""",71.0,"""float32""","""(480, 640, 1)""",71.0,"""float32""","""(512,)""",71.0,"""string""","""()""",71.0,"""float32""","""(15,)""",71.0,"""float32""","""()""",71.0
1,true,"""float32""","""()""",71.0,"""float32""","""(3,)""",71.0,"""float32""","""()""",71.0,"""float32""","""(3,)""",71.0,"""bool""","""()""",71.0,"""bool""","""()""",71.0,"""bool""","""()""",71.0,"""uint8""","""(480, 640, 3)""",71.0,"""uint8""","""(480, 640, 3)""",71.0,"""float32""","""(480, 640, 1)""",71.0,"""float32""","""(512,)""",71.0,"""string""","""()""",71.0,"""float32""","""(15,)""",71.0,"""float32""","""()""",71.0
2,true,"""float32""","""()""",76.0,"""float32""","""(3,)""",76.0,"""float32""","""()""",76.0,"""float32""","""(3,)""",76.0,"""bool""","""()""",76.0,"""bool""","""()""",76.0,"""bool""","""()""",76.0,"""uint8""","""(480, 640, 3)""",76.0,"""uint8""","""(480, 640, 3)""",76.0,"""float32""","""(480, 640, 1)""",76.0,"""float32""","""(512,)""",76.0,"""string""","""()""",76.0,"""float32""","""(15,)""",76.0,"""float32""","""()""",76.0
3,true,"""float32""","""()""",81.0,"""float32""","""(3,)""",81.0,"""float32""","""()""",81.0,"""float32""","""(3,)""",81.0,"""bool""","""()""",81.0,"""bool""","""()""",81.0,"""bool""","""()""",81.0,"""uint8""","""(480, 640, 3)""",81.0,"""uint8""","""(480, 640, 3)""",81.0,"""float32""","""(480, 640, 1)""",81.0,"""float32""","""(512,)""",81.0,"""string""","""()""",81.0,"""float32""","""(15,)""",81.0,"""float32""","""()""",81.0
4,true,"""float32""","""()""",80.0,"""float32""","""(3,)""",80.0,"""float32""","""()""",80.0,"""float32""","""(3,)""",80.0,"""bool""","""()""",80.0,"""bool""","""()""",80.0,"""bool""","""()""",80.0,"""uint8""","""(480, 640, 3)""",80.0,"""uint8""","""(480, 640, 3)""",80.0,"""float32""","""(480, 640, 1)""",80.0,"""float32""","""(512,)""",80.0,"""string""","""()""",80.0,"""float32""","""(15,)""",80.0,"""float32""","""()""",80.0
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
6,true,"""float32""","""()""",103.0,"""float32""","""(3,)""",103.0,"""float32""","""()""",103.0,"""float32""","""(3,)""",103.0,"""bool""","""()""",103.0,"""bool""","""()""",103.0,"""bool""","""()""",103.0,"""uint8""","""(480, 640, 3)""",103.0,"""uint8""","""(480, 640, 3)""",103.0,"""float32""","""(480, 640, 1)""",103.0,"""float32""","""(512,)""",103.0,"""string""","""()""",103.0,"""float32""","""(15,)""",103.0,"""float32""","""()""",103.0
7,true,"""float32""","""()""",110.0,"""float32""","""(3,)""",110.0,"""float32""","""()""",110.0,"""float32""","""(3,)""",110.0,"""bool""","""()""",110.0,"""bool""","""()""",110.0,"""bool""","""()""",110.0,"""uint8""","""(480, 640, 3)""",110.0,"""uint8""","""(480, 640, 3)""",110.0,"""float32""","""(480, 640, 1)""",110.0,"""float32""","""(512,)""",110.0,"""string""","""()""",110.0,"""float32""","""(15,)""",110.0,"""float32""","""()""",110.0
8,true,"""float32""","""()""",118.0,"""float32""","""(3,)""",118.0,"""float32""","""()""",118.0,"""float32""","""(3,)""",118.0,"""bool""","""()""",118.0,"""bool""","""()""",118.0,"""bool""","""()""",118.0,"""uint8""","""(480, 640, 3)""",118.0,"""uint8""","""(480, 640, 3)""",118.0,"""float32""","""(480, 640, 1)""",118.0,"""float32""","""(512,)""",118.0,"""string""","""()""",118.0,"""float32""","""(15,)""",118.0,"""float32""","""()""",118.0
9,true,"""float32""","""()""",84.0,"""float32""","""(3,)""",84.0,"""float32""","""()""",84.0,"""float32""","""(3,)""",84.0,"""bool""","""()""",84.0,"""bool""","""()""",84.0,"""bool""","""()""",84.0,"""uint8""","""(480, 640, 3)""",84.0,"""uint8""","""(480, 640, 3)""",84.0,"""float32""","""(480, 640, 1)""",84.0,"""float32""","""(512,)""",84.0,"""string""","""()""",84.0,"""float32""","""(15,)""",84.0,"""float32""","""()""",84.0


In [5]:
# data for ALL trajectories
# step data is loaded lazily: only the data that is actually used is read into memory
all_step_data = dataset.get_step_data()
# use .describe to get the summary of the information
all_step_data.describe() 

statistic,episode_id,Timestamp,gripper_closedness_action,rotation_delta,terminate_episode,world_vector,is_first,is_last,is_terminal,hand_image,image,image_with_depth,natural_language_embedding,natural_language_instruction,robot_state,reward
str,f64,f64,f64,str,f64,str,f64,f64,f64,str,str,str,str,str,str,f64
"""count""",1014.0,1014.0,1014.0,"""1014""",1014.0,"""1014""",1014.0,1014.0,1014.0,"""1014""","""1014""","""1014""","""1014""","""1014""","""1014""",1014.0
"""null_count""",0.0,0.0,0.0,"""0""",0.0,"""0""",0.0,0.0,0.0,"""0""","""0""","""0""","""0""","""0""","""0""",0.0
"""mean""",5.383629,1.7127e+18,0.0,,0.021696,,0.010848,0.021696,0.021696,,,,,,,0.010848
"""std""",3.017515,130230000000.0,0.108839,,0.145762,,,,,,,,,,,0.103639
"""min""",0.0,1.7127e+18,-1.0,"""b""\x93NUMPY\x0…",0.0,"""b""\x93NUMPY\x0…",0.0,0.0,0.0,"""b'\x93NUMPY\x0…","""b'\x93NUMPY\x0…","""b'\x93NUMPY\x0…","""b'\x93NUMPY\x0…","""b'pick up the …","""b""\x93NUMPY\x0…",0.0
"""25%""",3.0,1.7127e+18,0.0,,0.0,,,,,,,,,,,0.0
"""50%""",6.0,1.7127e+18,0.0,,0.0,,,,,,,,,,,0.0
"""75%""",8.0,1.7127e+18,0.0,,0.0,,,,,,,,,,,0.0
"""max""",10.0,1.7127e+18,1.0,"""b""\x93NUMPY\x0…",1.0,"""b""\x93NUMPY\x0…",1.0,1.0,1.0,"""b'\x93NUMPY\x0…","""b'\x93NUMPY\x0…","""b'\x93NUMPY\x0…","""b'\x93NUMPY\x0…","""b'sweep the gr…","""b""\x93NUMPY\x0…",1.0


### Lazy Loading Step Data
All step data is loaded on demand to save memory. The two timings below compare building a lazy frame with eagerly loading the same episodes from disk (`as_lazy_frame=False`).

In [6]:
# data for individual episode 
%timeit dataset.get_step_data_by_episode_ids([1,2,3])

3.2 µs ± 368 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [7]:
%timeit dataset.get_step_data_by_episode_ids([1,2,3], as_lazy_frame=False)

2.48 s ± 291 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
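
The gap between the two timings above is what one would expect if the lazy call only builds a query plan and defers the disk read. The toy class below is a minimal sketch of that idea under stated assumptions: `LazyTable` and `slow_loader` are hypothetical names invented for illustration, and this is not how Fog-X or polars is implemented.

```python
import time


class LazyTable:
    """Toy lazy table: records a query plan, loads rows only on collect()."""

    def __init__(self, loader, filters=()):
        self._loader = loader           # callable that performs the expensive read
        self._filters = tuple(filters)  # predicates to apply after loading

    def filter(self, predicate):
        # Cheap: returns a new plan and touches no data.
        return LazyTable(self._loader, self._filters + (predicate,))

    def collect(self):
        # Expensive: runs the loader and applies every recorded filter.
        rows = self._loader()
        for predicate in self._filters:
            rows = [row for row in rows if predicate(row)]
        return rows


load_calls = []


def slow_loader():
    """Hypothetical stand-in for reading step data from disk."""
    load_calls.append(1)
    time.sleep(0.05)
    return [{"episode_id": i % 4, "step": i} for i in range(20)]


plan = LazyTable(slow_loader).filter(lambda row: row["episode_id"] in (1, 2, 3))
calls_before_collect = len(load_calls)  # nothing has been read yet
rows = plan.collect()                   # the read happens here
print(calls_before_collect, len(load_calls), len(rows))
```

Building `plan` costs almost nothing because no rows exist yet. The cost is paid once, when `collect()` is called.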


## Data Analytics and Management


### Example 1: Add New Episode Metadata and Filter

Suppose another person collects a second batch of data and you want to keep track of who collected what.


In [8]:
# this loads another 2 episodes 
dataset.load_rtx_episodes(
    name="berkeley_autolab_ur5",
    split="train[3:5]",
    additional_metadata={"collector": "User 2", "custom_tag": "Partition_2"},
)

2024-04-10 05:59:42.147783: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
2024-04-10 06:00:06.033397: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
2024-04-10 06:00:08.650303: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence


Now the metadata table looks like this:

In [9]:
dataset.get_episode_info().select(["episode_id", "collector", "custom_tag"])

episode_id,collector,custom_tag
i64,str,str
0,,
1,,
2,,
3,,
4,,
…,…,…
8,,
9,,
10,,
11,"""User 2""","""Partition_2"""


In [10]:
episode_info = dataset.get_episode_info()
# querying non-existent metadata 
metadata = episode_info.filter(episode_info["collector"] == "User_Do_No_Exist")
episodes = dataset.read_by(metadata)

In [11]:
metadata = episode_info.filter(episode_info["custom_tag"] == "Partition_2")
episodes = dataset.read_by(metadata)
episodes, episodes[0].describe()

([<LazyFrame [16 cols, {"episode_id": Int64 … "reward": Float32}] at 0x7F9D34D62350>,
  <LazyFrame [16 cols, {"episode_id": Int64 … "reward": Float32}] at 0x7F9CA2F0CC70>],
 shape: (9, 17)
 ┌───────────┬───────────┬───────────┬───────────┬───┬───────────┬───────────┬───────────┬──────────┐
 │ statistic ┆ episode_i ┆ Timestamp ┆ gripper_c ┆ … ┆ natural_l ┆ natural_l ┆ robot_sta ┆ reward   │
 │ ---       ┆ d         ┆ ---       ┆ losedness ┆   ┆ anguage_e ┆ anguage_i ┆ te        ┆ ---      │
 │ str       ┆ ---       ┆ f64       ┆ _action   ┆   ┆ mbedding  ┆ nstructio ┆ ---       ┆ f64      │
 │           ┆ f64       ┆           ┆ ---       ┆   ┆ ---       ┆ n         ┆ str       ┆          │
 │           ┆           ┆           ┆ f64       ┆   ┆ str       ┆ ---       ┆           ┆          │
 │           ┆           ┆           ┆           ┆   ┆           ┆ str       ┆           ┆          │
 ╞═══════════╪═══════════╪═══════════╪═══════════╪═══╪═══════════╪═══════════╪═══════════╪═══════

### Example 2: Extract and Search Natural Language Instructions from Step Data

Existing Open-X datasets store the natural language instruction with every step, which is inefficient and complicates management. This example shows
1. how to extract the natural language instruction from existing Open-X datasets
2. how to search the instructions for keywords or a **regex**
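
Before the polars version, here is the same two-step idea in plain Python, as a rough sketch only. The `steps` records are hypothetical sample data modelled on the byte-string instructions shown in this demo, and Python's `re` module stands in for the regex engine that polars uses, so pattern syntax may differ in edge cases.

```python
import re

# Hypothetical step records: the instruction is repeated on every step
# and stored as bytes, as in the outputs shown in this demo.
steps = [
    {"episode_id": 0, "natural_language_instruction": b"sweep the green cloth to the left side of the table"},
    {"episode_id": 0, "natural_language_instruction": b"sweep the green cloth to the left side of the table"},
    {"episode_id": 10, "natural_language_instruction": b"put the ranch bottle into the pot"},
    {"episode_id": 10, "natural_language_instruction": b"put the ranch bottle into the pot"},
]

# 1. Extract: keep one decoded instruction per episode.
instruction_by_episode = {}
for step in steps:
    text = step["natural_language_instruction"].decode("utf-8")
    instruction_by_episode[step["episode_id"]] = text

# 2. Search: keep the episodes whose instruction matches a regex.
pattern = re.compile(r"green|red")
matching_ids = sorted(
    episode_id
    for episode_id, text in instruction_by_episode.items()
    if pattern.search(text)
)
print(matching_ids)
```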

In [12]:
id_to_language_instruction = (
    dataset.get_step_data()
    .select("episode_id", "natural_language_instruction")  # only the episode id and language columns are needed
    .collect()  # the lazy frame is only evaluated when collect() is called
)

# print out unique natural_language_instructions 
# https://docs.pola.rs/py-polars/html/reference/dataframe/api/polars.DataFrame.unique.html 
id_to_language_instruction.unique(subset=["natural_language_instruction"], maintain_order=True)

episode_id,natural_language_instruction
i64,binary
0,"b""sweep\x20the\x20green\x20cloth\x20to\x20the\x20left\x20side\x20of\x20the\x20table"""
10,"b""put\x20the\x20ranch\x20bottle\x20into\x20the\x20pot"""
12,"b""pick\x20up\x20the\x20blue\x20cup\x20and\x20put\x20it\x20into\x20the\x20brown\x20cup.\x20"""


In [13]:
all_step_data = dataset.get_step_data() # get lazy frame of the entire step-level dataset
id_to_language_instruction = (
    all_step_data
    .select("episode_id", "natural_language_instruction") 
    .group_by("episode_id")  # group by episode id, since the language instruction is stored for every step
    .last()  # the instruction is the same for all steps in an episode, so we can take the last one
    .collect()  # the lazy frame is only evaluated when collect() is called
)

# join with the metadata 
episode_metadata = dataset.get_episode_info().join(id_to_language_instruction, on="episode_id")

In [14]:
import polars as pl 
# Decode byte strings to strings
episode_metadata = episode_metadata.with_columns(
    episode_metadata["natural_language_instruction"]
    .map_elements(lambda x: x.decode("utf-8"), return_dtype=pl.String)
    .alias("decoded")
)

# Filter rows whose decoded instruction mentions "green" or "red"
result = episode_metadata.filter(
    pl.col("decoded").str.contains("green|red")  # supports regex!
)
print(result.select(["episode_id", "decoded"]))

shape: (6, 2)
┌────────────┬───────────────────────────────────┐
│ episode_id ┆ decoded                           │
│ ---        ┆ ---                               │
│ i64        ┆ str                               │
╞════════════╪═══════════════════════════════════╡
│ 9          ┆ sweep the green cloth to the lef… │
│ 4          ┆ sweep the green cloth to the lef… │
│ 1          ┆ sweep the green cloth to the lef… │
│ 2          ┆ sweep the green cloth to the lef… │
│ 0          ┆ sweep the green cloth to the lef… │
│ 11         ┆ sweep the green cloth to the lef… │
└────────────┴───────────────────────────────────┘



We use polars as the backend for data processing and management. This example demonstrates its capability and flexibility. Please refer to https://docs.pola.rs/py-polars/html/reference/lazyframe/index.html for all the available interfaces.

## Use, Export and Share

### Hugging Face Dataset

In [15]:
import datasets

huggingface_ds = dataset.get_as_huggingface_dataset()

print(f"Hugging face dataset: {huggingface_ds}")

Generating train split: 0 examples [00:00, ? examples/s]

Hugging face dataset: DatasetDict({
    train: Dataset({
        features: ['episode_id', 'Timestamp', 'gripper_closedness_action', 'rotation_delta', 'terminate_episode', 'world_vector', 'is_first', 'is_last', 'is_terminal', 'hand_image', 'image', 'image_with_depth', 'natural_language_embedding', 'natural_language_instruction', 'robot_state', 'reward'],
        num_rows: 1217
    })
})


### PyTorch Dataset

In [16]:
import torch 

metadata = dataset.get_episode_info()
metadata = metadata.filter(metadata["collector"] == "User 2")
pytorch_ds = dataset.pytorch_dataset_builder(
    metadata=metadata
)

# get samples from the dataset
for data in torch.utils.data.DataLoader(
    pytorch_ds,
    batch_size=2,
    collate_fn=lambda x: x,
    sampler=torch.utils.data.RandomSampler(pytorch_ds),
):
    print(data)


Retrieving episode at index 0
Retrieving episode at index 1
[    episode_id            Timestamp  gripper_closedness_action  \
0           11  1712728768601166160                        0.0   
1           11  1712728768839768104                        0.0   
2           11  1712728768983350023                        0.0   
3           11  1712728769119575319                        0.0   
4           11  1712728769256151909                        0.0   
..         ...                  ...                        ...   
75          11  1712728781218967667                        0.0   
76          11  1712728781437725750                        0.0   
77          11  1712728781613065131                        0.0   
78          11  1712728781822132558                        0.0   
79          11  1712728781969148910                        0.0   

                                       rotation_delta  terminate_episode  \
0   b"\x93NUMPY\x01\x00v\x00{'descr': '<f4', 'fort...                0
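
The `b"\x93NUMPY\x01\x00v\x00{'descr': '<f4', ...` prefix in the output above matches the magic string and header of NumPy's `.npy` format. That suggests array-valued columns are stored as serialized `.npy` bytes, although this is an inference from the output and not something the demo states. The sketch below builds and parses such a payload with the standard library only, for a 1-D little-endian float32 array. The helper names `build_npy_v1` and `parse_npy_v1` are hypothetical.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"


def build_npy_v1(values):
    """Serialize a list of floats as a version 1.0 .npy payload (1-D, '<f4')."""
    header = "{'descr': '<f4', 'fortran_order': False, 'shape': (%d,), }" % len(values)
    # Pad with spaces and end with a newline so that
    # magic + version + length field + header is a multiple of 64 bytes.
    unpadded = len(MAGIC) + 2 + 2 + len(header) + 1
    header += " " * (-unpadded % 64) + "\n"
    header_bytes = header.encode("latin1")
    return (
        MAGIC
        + bytes([1, 0])                          # format version 1.0
        + struct.pack("<H", len(header_bytes))   # little-endian header length
        + header_bytes
        + struct.pack("<%df" % len(values), *values)
    )


def parse_npy_v1(blob):
    """Decode a version 1.0 .npy payload holding a 1-D '<f4' array."""
    if blob[:6] != MAGIC or blob[6:8] != bytes([1, 0]):
        raise ValueError("not a version 1.0 .npy payload")
    (header_len,) = struct.unpack("<H", blob[8:10])
    header_text = blob[10:10 + header_len].decode("latin1").strip()
    header = ast.literal_eval(header_text)
    if header["descr"] != "<f4" or len(header["shape"]) != 1:
        raise ValueError("this sketch only handles 1-D '<f4' arrays")
    count = header["shape"][0]
    data = blob[10 + header_len:]
    return header, list(struct.unpack("<%df" % count, data[:4 * count]))


blob = build_npy_v1([0.0, 0.5, -1.0])
header, values = parse_npy_v1(blob)
print(blob[:10], header["shape"], values)
```

In practice, if the payloads are indeed `.npy` bytes, `numpy.load(io.BytesIO(blob))` would decode them directly and handle every dtype and shape.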

### As Open-X Dataset
Export in the TensorFlow RLDS dataset format:

In [None]:
dataset.export(format="open-x")

In [7]:
!ls ~/test_dataset/export

dataset_info.json	      demo_ds-train.tfrecord-00004
demo_ds-train.tfrecord-00000  demo_ds-train.tfrecord-00005
demo_ds-train.tfrecord-00001  demo_ds-train.tfrecord-00006
demo_ds-train.tfrecord-00002  features.json
demo_ds-train.tfrecord-00003


In [6]:
!cat ~/test_dataset/export/dataset_info.json

{
  "fileFormat": "tfrecord",
  "name": "demo_ds",
  "splits": [
    {
      "filepathTemplate": "{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_INDEX}",
      "name": "train",
      "numBytes": "2417903909",
      "shardLengths": [
        "1",
        "1",
        "1",
        "1",
        "1",
        "1",
        "1"
      ]
    }
  ],
  "version": "0.0.1"
}
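
The `filepathTemplate` and `shardLengths` fields above are enough to reconstruct the shard file names seen in the directory listing. The following sketch does that with the standard library. The helper `shard_filenames` is hypothetical, and the five-digit zero padding of the shard index is an assumption inferred from the listing, not taken from a specification.

```python
import json

# Abridged copy of the dataset_info.json shown above.
dataset_info = json.loads("""
{
  "fileFormat": "tfrecord",
  "name": "demo_ds",
  "splits": [
    {
      "filepathTemplate": "{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_INDEX}",
      "name": "train",
      "shardLengths": ["1", "1", "1", "1", "1", "1", "1"]
    }
  ]
}
""")


def shard_filenames(info):
    """Expand each split's file path template into one name per shard."""
    names = []
    for split in info["splits"]:
        for index in range(len(split["shardLengths"])):
            names.append(
                split["filepathTemplate"].format(
                    DATASET=info["name"],
                    SPLIT=split["name"],
                    FILEFORMAT=info["fileFormat"],
                    # Assumption: five-digit zero padding, inferred from the listing.
                    SHARD_INDEX=f"{index:05d}",
                )
            )
    return names


names = shard_filenames(dataset_info)
# shardLengths holds the number of records in each shard, stored as strings.
total_records = sum(
    int(length)
    for split in dataset_info["splits"]
    for length in split["shardLengths"]
)
print(names[0], names[-1], total_records)
```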

## Disk Comparison

In [9]:
# file size of generated rlds 
!du -sh ~/test_dataset/export/

2.3G	/root/test_dataset/export/


In [11]:
# file size of Fog-X dataset
!du -sh ~/test_dataset/demo_ds/

1.6G	/root/test_dataset/demo_ds/
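
The two `du` figures can be turned into percentages in two ways, depending on which size is used as the baseline. The short sketch below computes both. The inputs are the rounded values printed by `du -sh`, so the results are approximate, and the function name `disk_comparison` is made up for illustration.

```python
def disk_comparison(rlds_size, fogx_size):
    """Return (saving relative to RLDS, extra space RLDS needs relative to Fog-X)."""
    difference = rlds_size - fogx_size
    saving = difference / rlds_size    # fraction of the RLDS size that Fog-X avoids
    overhead = difference / fogx_size  # how much larger RLDS is than Fog-X
    return saving, overhead


# Sizes in GB as reported by `du -sh` above (rounded by du).
saving, overhead = disk_comparison(rlds_size=2.3, fogx_size=1.6)
print(f"Fog-X uses about {saving:.0%} less space than the RLDS export")
print(f"The RLDS export is about {overhead:.0%} larger than the Fog-X copy")
```

With these inputs the RLDS export is a little under 44% larger than the Fog-X copy, which corresponds to a saving of roughly 30% when measured against the RLDS size.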
