This notebook shows how we preprocess the raw simulation data into the DR00NE format.
You do not need to run it; it is for documentation purposes only.

## Visualize the raw data

```text
before_processing/
└── episode_data_e0_box/
    ├── cam_0_df<datetime>_frame<framenumber>.bmp   # RGB images
    ├── cam_1_df<datetime>_frame<framenumber>.npy   # Depth/segmentation arrays
    └── follow_data.pkl                             # Pickled drone metadata (speed, ...)
```
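
Because the frame number is embedded in each filename, a plain alphabetical sort can misorder frames (for example `frame10` before `frame2`). The sketch below shows one way to order the files of a camera numerically. It assumes the `<framenumber>` part is a plain decimal integer; the helper names and the sample filenames (with `dfA` standing in for the datetime part) are hypothetical and not taken from `simulation_to_drone_utils.py`.

```python
import re

# Assumption: filenames end with "_frame<digits>.bmp" or "_frame<digits>.npy".
FRAME_RE = re.compile(r"_frame(\d+)\.(bmp|npy)$")


def frame_number(filename):
    """Extract the integer frame number from a frame filename."""
    match = FRAME_RE.search(filename)
    if match is None:
        raise ValueError(f"no frame number found in {filename!r}")
    return int(match.group(1))


def sorted_frames(filenames, camera_prefix):
    """Keep the files of one camera and sort them by numeric frame number."""
    selected = [name for name in filenames if name.startswith(camera_prefix)]
    return sorted(selected, key=frame_number)


# Illustrative names only; "dfA" is a placeholder for the real datetime part.
names = [
    "cam_0_dfA_frame10.bmp",
    "cam_0_dfA_frame2.bmp",
    "cam_1_dfA_frame2.npy",
    "follow_data.pkl",
]
rgb_frames = sorted_frames(names, "cam_0_")
print(rgb_frames)
```

Sorting on the parsed integer rather than on the string keeps the frames in recording order regardless of how the frame number is padded.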

In [4]:
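import pandas as pd

# Load the pickled per-frame metadata of episode 0
raw = pd.read_pickle("data_track_hawk/before_processing/episode_data_e0_box/follow_data.pkl")

# Wrap the loaded object in a DataFrame (a no-op copy if it already is one)
df = pd.DataFrame(raw)
print(df.shape)  # (number of frames, number of columns)
df.head(6)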

(18121, 10)


Unnamed: 0,timestamp,drone_position,drone_velocity,drone_orientation,drone_angular_velocity,target_position,relative_position,distance,direction,action
0,1747689000.0,"[0.0, 0.0, -0.6795402]","[0.0, 0.0, 0.017774258]","[0.0, 0.0, 0.0]","[0.0, 0.0, 0.0]","[0.5043983, -0.6310591, 0.74028504]","[-0.4956017, -0.6310591, 1.4198253]",1.630877,"[-0.3038866, -0.3869446, 0.8705899]","[-0.6685505, -0.8512781, 1.9152979, -268.38477]"
1,1747689000.0,"[-0.00026987414, -0.0003810866, -0.6724362]","[-0.010574108, -0.014459001, 0.13672169]","[0.032587077, -0.044612013, -0.030317433]","[-0.4266491, 0.35720235, -0.66227627]","[0.5043983, -0.6310591, 0.74028504]","[-0.49533185, -0.630678, 1.4127212]",1.624466,"[-0.3049198, -0.38823715, 0.8696527]","[-0.6708236, -0.85412174, 1.9132359, -264.75015]"
2,1747689000.0,"[-0.003396512, -0.0044257888, -0.64988106]","[-0.053015754, -0.06722088, 0.29360485]","[0.05000335, -0.08290108, -0.14293681]","[-0.21585421, 0.13591205, -1.4569824]","[0.5043983, -0.6310591, 0.74028504]","[-0.4922052, -0.62663335, 1.390166]",1.602341,"[-0.3071788, -0.3910736, 0.86758435]","[-0.67579335, -0.86036193, 1.9086856, -251.24174]"
3,1747689000.0,"[-0.012636595, -0.015997177, -0.6078961]","[-0.11510456, -0.14334653, 0.46314162]","[0.036995392, -0.10749053, -0.35148278]","[-0.06451831, 0.014390424, -2.30332]","[0.5043983, -0.6310591, 0.74028504]","[-0.4829651, -0.61506194, 1.3481811]",1.558573,"[-0.30987653, -0.39463153, 0.8650101]","[-0.6817284, -0.8681894, 1.9030222, -226.198]"
4,1747689000.0,"[-0.02734306, -0.03429038, -0.54136026]","[-0.16102979, -0.20030564, 0.8599328]","[0.005643457, -0.11524651, -0.6249931]","[0.026583519, -0.010699692, -2.884274]","[0.5043983, -0.6310591, 0.74028504]","[-0.46825865, -0.59676874, 1.2816453]",1.4893,"[-0.3144153, -0.40070423, 0.86056906]","[-0.6917137, -0.88154936, 1.8932519, -193.3342]"
5,1747689000.0,"[-0.04623399, -0.05782004, -0.4248522]","[-0.19860667, -0.24778861, 1.3567665]","[-0.029834589, -0.10614508, -0.9381215]","[0.07118423, 0.011546697, -3.0767367]","[0.5043983, -0.6310591, 0.74028504]","[-0.44936773, -0.5732391, 1.1651373]",1.374074,"[-0.32703316, -0.41718212, 0.8479436]","[-0.71947294, -0.91780066, 1.865476, -155.70328]"


The `follow_data.pkl` file contains the drone's metadata (position, velocity, orientation, and so on) along with the action taken at each frame.
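
As a quick sanity check on how the columns relate, the sketch below recomputes `distance` and `direction` from `relative_position` for the first row printed above. It assumes, based only on the rows shown, that `distance` is the Euclidean norm of `relative_position` and that `direction` is the corresponding unit vector; the simulation code itself may define them differently.

```python
import numpy as np

# Values copied from row 0 of the table above.
relative_position = np.array([-0.4956017, -0.6310591, 1.4198253])
logged_distance = 1.630877
logged_direction = np.array([-0.3038866, -0.3869446, 0.8705899])

# Assumption: distance is the Euclidean norm of the relative position,
# and direction is the relative position normalized to unit length.
distance = float(np.linalg.norm(relative_position))
direction = relative_position / distance

distance_matches = bool(np.isclose(distance, logged_distance, atol=1e-5))
direction_matches = bool(np.allclose(direction, logged_direction, atol=1e-5))
print(distance_matches, direction_matches)
```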

Gr00t expects the data in a very specific format: Parquet files for the drone actions and states, and MP4 videos for the RGB, depth, and segmentation images. The script below converts the raw data into this format.

In [2]:
from simulation_to_drone_utils import process_episode

In [5]:
# See simulation_to_drone_utils.py for the implementation.
# The script creates videos with the required encoding and FPS (we recorded the data every 2 seconds).
# Preprocessing steps:
# - convert all the images to RGB videos
# - apply log scaling to the depth images and clip at 5000 (in the raw depth the object is barely visible; after log scaling it shows up much better)
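
To illustrate the depth step, here is a minimal sketch of log scaling a depth array into an 8-bit image. It is not the code from `simulation_to_drone_utils.py`: the function name `depth_to_uint8`, the choice to clip before taking the log, the use of `log1p`, and the mapping to the 0-255 range are all assumptions made for this example.

```python
import numpy as np


def depth_to_uint8(depth, clip_max=5000.0):
    """Clip a depth array to [0, clip_max], log-scale it, and map it to 0..255.

    Assumption: clipping happens before the log. log1p is used so that a
    depth of 0 maps to 0 instead of -inf.
    """
    clipped = np.clip(np.asarray(depth, dtype=np.float32), 0.0, clip_max)
    scaled = np.log1p(clipped) / np.log1p(clip_max)  # now in [0, 1]
    return np.round(scaled * 255.0).astype(np.uint8)


# Small values are spread out, very large values saturate at 255.
sample = np.array([0.0, 10.0, 5000.0, 1e6])
image_values = depth_to_uint8(sample)
print(image_values)
```

The log compresses the large background depths and stretches the small ones, which is why a nearby object becomes easier to see after scaling.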

In [3]:
BEFORE = "data_track_hawk/before_processing"
DATA_OUT = "data_track_hawk/dataset_drone_control/data/chunk-000"
VIDEO_OUT = "data_track_hawk/dataset_drone_control/videos/chunk-000"
# Process episodes 0 through 5
for i in range(6):
    result = process_episode(i, BEFORE, DATA_OUT, VIDEO_OUT)
    print("Processed episode", result)

                                   observation.state  \
0  [-0.23470455, -0.30573353, -0.04231875, 0.0666...   
1  [0.06265601, 0.07886859, 0.21626657, 0.0112875...   
2  [0.028106226, 0.043671258, -0.5838261, -0.0194...   
3  [0.29830185, 0.9936367, 0.05800056, 0.03864696...   
4  [1.2330841, -0.78998166, -0.027963877, 0.08182...   

                                              action  timestamp  \
0  [0.42617872, 0.55622643, -0.074567065, -263.61...          2   
1  [-0.28168797, -0.37319124, -0.5663822, -210.64...          4   
2  [0.14818843, 0.19872326, -0.11666328, -100.60335]          6   
3   [1.4737692, -0.25660047, 0.11007726, -171.01186]          8   
4      [0.7117931, 0.10794208, 0.05497241, 89.80819]         10   

   annotation.human.action.task_description  task_index  \
0                                         0           0   
1                                         0           0   
2                                         0           0   
3                       

The data is now saved in the right format. The remaining step is to manually create the `meta` folder, which specifies the task descriptions, the cameras, and other dataset information.
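
As an illustration of one such metadata file, the sketch below writes a `tasks.jsonl` that maps each `task_index` to a text description, one JSON object per line. The file name and the `task_index`/`task` keys follow the LeRobot-style layout as we understand it and should be treated as an assumption to check against the Gr00t documentation; the helper `write_tasks` and the placeholder description are hypothetical. The example writes to a temporary directory rather than to the real dataset.

```python
import json
import tempfile
from pathlib import Path


def write_tasks(meta_dir, descriptions):
    """Write one JSON line per task: {"task_index": i, "task": description}."""
    meta_dir = Path(meta_dir)
    meta_dir.mkdir(parents=True, exist_ok=True)
    path = meta_dir / "tasks.jsonl"
    with path.open("w", encoding="utf-8") as handle:
        for index, description in enumerate(descriptions):
            record = {"task_index": index, "task": description}
            handle.write(json.dumps(record) + "\n")
    return path


# Demo in a temporary directory; the description is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    tasks_path = write_tasks(Path(tmp) / "meta", ["<task description>"])
    written_lines = tasks_path.read_text(encoding="utf-8").splitlines()

print(written_lines)
```

The index written here is the one the `task_index` column in the Parquet output refers to (0 for every row shown above).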