Data Organization in pytorch-dense-correspondence

This doc describes how data logs are stored: their format, directory structure, and so on.

Highest level directory structure

At the highest level, we use this directory name to identify where data is located:

pdc/

Most data will live in data logs in this location:

pdc/
    logs_proto/

Within logs_proto, there is one directory for each RGBD video log and associated data.

Each log has a unique name given by the year, month, day, hour, minute, and second at which it was created (YYYY-MM-DD-hh-mm-ss). For example, logs might be located at:

pdc/
    logs_proto/
        2018-04-06-11-34-13/
        2018-04-06-11-37-44/
        2018-04-03-19-56-58/
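
Since the log names encode a timestamp, they can be parsed and ordered programmatically. The following is a minimal sketch, not code from this repository; `parse_log_name` and `sort_logs_by_time` are hypothetical helper names, and the format string is inferred from the YYYY-MM-DD-hh-mm-ss convention described above.

```python
from datetime import datetime

# Format string inferred from the YYYY-MM-DD-hh-mm-ss naming convention.
LOG_NAME_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_log_name(name):
    """Return the creation time encoded in a log directory name.

    Raises ValueError if the name does not follow the convention.
    """
    return datetime.strptime(name, LOG_NAME_FORMAT)


def sort_logs_by_time(names):
    """Sort log directory names from oldest to newest."""
    return sorted(names, key=parse_log_name)
```

Because every field is zero-padded, a plain lexicographic sort of the names gives the same order; parsing is mainly useful for validating names or computing time differences between logs.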

Log directories

A "log" for our purposes is the folder and all of the data inside of it. Inside each log are two subdirectories, raw and processed:

pdc/
    logs_proto/
        2018-04-06-11-34-13/
            raw/
            processed/

There are a couple of reasons for having separate raw/ and processed/ subdirectories:

  1. The separation makes it very easy to:
    • Copy only processed data, and no raw data, to different locations for training and testing
    • Run rm -rf ./processed, or even rm -rf ./logs_proto/*/processed, so that we can iterate on and redo how we process the raw data
  2. Because of the downsampling we apply to the data, the two differ greatly in size:
    • Raw data for a typical log is, for example at the time of writing, 5.2 GB
    • Processed and downsampled data for downstream training is, for example at the time of writing, only 115 MB
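
The first point can be illustrated with a short sketch that copies only the processed/ folder of each log to another location. This is an illustrative example under assumptions, not part of the repository: `copy_processed_only` is a hypothetical helper, and both arguments are assumed to be paths to a logs_proto-style directory.

```python
import shutil
from pathlib import Path


def copy_processed_only(src_logs_root, dst_logs_root):
    """Copy each log's processed/ folder, skipping raw/ entirely.

    Returns the names of the logs that were copied.
    """
    src_root = Path(src_logs_root)
    dst_root = Path(dst_logs_root)
    copied = []
    for log_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        processed = log_dir / "processed"
        if not processed.is_dir():
            continue  # this log has not been processed yet
        # copytree creates missing parent directories of the destination
        # and raises FileExistsError if the destination already exists.
        shutil.copytree(processed, dst_root / log_dir.name / "processed")
        copied.append(log_dir.name)
    return copied
```

Keeping raw/ out of the copy is what makes this cheap: only the small, downsampled data is transferred.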

Data within a log

Within these two folders, each log contains the following files at the time of writing:

pdc/
    logs_proto/
        2018-04-06-11-34-13/
            raw/
                fusion_2018-04-06-11-34-13.bag
            processed/
                fusion_mesh.ply
                fusion_pointcloud.ply
                tsdf.bin
                images/
                image_masks/
                rendered_images/

The items above are described in the table below. Note that they are generated by different parts of the data processing pipeline. How this pipeline fits together is described in data_pipeline.md.

| Subdirectory | Item | Description | Generated by |
| --- | --- | --- | --- |
| raw/ | fusion_YYYY-MM-DD-hh-mm-ss.bag | Raw rosbag of all data needed for fusion. The redundant date is a double check that logs are in the right place. | spartan |
| processed/ | fusion_mesh.ply | Mesh of the scene generated from tsdf-fusion, then marching cubes | spartan |
| processed/ | fusion_pointcloud.ply | Pointcloud of the scene generated from tsdf-fusion | tsdf-fusion (subrepo of spartan) |
| processed/ | tsdf.bin | TSDF scene generated from tsdf-fusion | tsdf-fusion (subrepo of spartan) |
| processed/ | images/ | sub-directory of extracted images and other metadata | spartan |
| processed/ | image_masks/ | masks of objects of interest | pytorch-dense-correspondence |
| processed/ | rendered_images/ | rendered depth images against the fused scene mesh | pytorch-dense-correspondence |
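
The layout above lends itself to a simple sanity check before training. The sketch below is an assumption-laden illustration rather than code from the repository: `missing_items` is a hypothetical helper, and the expected items are taken directly from the listing above.

```python
from pathlib import Path

# Items from the listing above, relative to the log directory.
EXPECTED_FILES = [
    "processed/fusion_mesh.ply",
    "processed/fusion_pointcloud.ply",
    "processed/tsdf.bin",
]
EXPECTED_DIRS = [
    "processed/images",
    "processed/image_masks",
    "processed/rendered_images",
]


def missing_items(log_dir):
    """Return the expected files and folders that are absent from a log."""
    log_dir = Path(log_dir)
    # The rosbag name repeats the log name, which doubles as a check
    # that the bag sits in the right log folder.
    bag = f"raw/fusion_{log_dir.name}.bag"
    missing = [rel for rel in [bag] + EXPECTED_FILES
               if not (log_dir / rel).is_file()]
    missing += [rel for rel in EXPECTED_DIRS
                if not (log_dir / rel).is_dir()]
    return missing
```

An empty return value means the log has every item listed; anything else names what still needs to be generated by the relevant stage of the pipeline.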

Data within image folders

Within the image folders we have the following layout, which completes the layout of a log folder structure:

pdc/
    logs_proto/
        2018-04-06-11-34-13/
            raw/
                fusion_2018-04-06-11-34-13.bag
            processed/
                fusion_mesh.ply
                fusion_pointcloud.ply
                tsdf.bin
                images/
                    000000_rgb.png
                    000000_depth.png
                    ...
                    002553_rgb.png
                    002553_depth.png
                    ...
                    camera_info.yaml
                    pose_data.yaml
                image_masks/
                    000000_mask.png
                    000000_visible_mask.png
                    ...
                    002553_mask.png
                    002553_visible_mask.png
                rendered_images/
                    000000_depth.png
                    000000_depth_cropped.png
                    ...
                    002553_depth.png
                    002553_depth_cropped.png

Note that the images in processed/ have already been downsampled, so not every index exists: for example, 000000_rgb.png may be followed directly by 000017_rgb.png. All original images still exist, however, in the original rosbag within raw/.
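
Because the indices are not contiguous, code that iterates over frames should discover which indices exist rather than assume a range. A minimal sketch, assuming the six-digit `######_rgb.png` naming shown above (`available_frame_indices` is a hypothetical helper, not a function from the repository):

```python
import re
from pathlib import Path

# Six-digit frame index followed by the _rgb.png suffix.
RGB_PATTERN = re.compile(r"^(\d{6})_rgb\.png$")


def available_frame_indices(images_dir):
    """Return the sorted frame indices that have an RGB image on disk."""
    indices = []
    for path in Path(images_dir).iterdir():
        match = RGB_PATTERN.match(path.name)
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)
```

Files that do not match the pattern, such as the depth images and the YAML metadata, are simply ignored.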

The files within the image folders serve different purposes and are generated by different components, as follows:

| Subdirectory | Item | Description | Generated by |
| --- | --- | --- | --- |
| processed/images/ | ######_rgb.png | Raw RGB image. | spartan |
| processed/images/ | ######_depth.png | "Raw" depth image. Note that this has been rectified and registered to be in the frame of the RGB image. | spartan |
| processed/images/ | camera_info.yaml | Camera intrinsics for the RGBD camera (rectification and color-depth registration have already happened, so only the intrinsics matrix is needed at this point in the pipeline). | spartan |
| processed/images/ | pose_data.yaml | Pose of the camera in robot base frame as calculated from forward kinematics of the 7-dof robot arm. | spartan |
| processed/image_masks/ | ######_mask.png | Mask of objects of interest, with 0 as background | change detection code inside pytorch-dense-correspondence |
| processed/image_masks/ | ######_visible_mask.png | Masked parts are made to look white, so they are easily human-readable | change detection code inside pytorch-dense-correspondence |
| processed/rendered_images/ | ######_depth.png | Rendered depth image against the full fused mesh of the scene | change detection code inside pytorch-dense-correspondence |
| processed/rendered_images/ | ######_depth_cropped.png | Rendered depth image against the change-detected part of the fused mesh of the scene | change detection code inside pytorch-dense-correspondence |
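
Given a frame index, the naming scheme above determines where every per-frame file lives. The following sketch collects those paths in one place; `frame_paths` and its dictionary keys are hypothetical names chosen for illustration, and only the file-name patterns come from the listing above.

```python
from pathlib import Path


def frame_paths(log_dir, index):
    """Return the per-frame file paths for one frame index of a log."""
    processed = Path(log_dir) / "processed"
    stem = f"{index:06d}"  # frame indices are zero-padded to six digits
    return {
        "rgb": processed / "images" / f"{stem}_rgb.png",
        "depth": processed / "images" / f"{stem}_depth.png",
        "mask": processed / "image_masks" / f"{stem}_mask.png",
        "visible_mask": processed / "image_masks" / f"{stem}_visible_mask.png",
        "rendered_depth": processed / "rendered_images" / f"{stem}_depth.png",
        "rendered_depth_cropped":
            processed / "rendered_images" / f"{stem}_depth_cropped.png",
    }
```

Note that images/ and rendered_images/ both contain a `######_depth.png`, so the folder, not the file name, distinguishes the measured depth image from the rendered one.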