# Merge a set of bag files into a single large bag file

Description: The beobot robot has 5 camera streams and a pointcloud stream. The data is recorded as rosbag files and consists of a stream of images and pointclouds along with other metadata and streams. The images are recorded at 15Hz and the pointclouds at 10Hz. Since the bandwidth of a single computer is limited, we placed 5 computers on the robot. The LiDAR and one camera are connected to one computer, and the remaining 4 cameras are connected to 4 other computers (one camera per computer). A data collection on a given day consists of 5 folders: cam1, cam2, cam3, cam4 and cam5. Each of these folders contains a set of .bag files, each corresponding to a data collection "session" on that day. Ideally, you should find an equal number of bag files in each of these folders, but the connection to a specific camera can be disrupted, for example by a connector coming loose. The objective of this issue is to combine the streams of all the bag files belonging to a specific session into a single bag file by aligning the timestamps, since the bag files are all recorded separately. There are many ways of solving this problem, so please state your assumptions. Your program has to be robust to missing bag files for a camera in a given session, to different bag file lengths or numbers of messages across cameras, and so on.

Github Issue link: https://github.com/klekkala/vision_toolkit/issues/2

Input folder: raw_data/2023_\*_\*/cam{1..5}/*.bag

Output folder: merged_bag/2023_\*_\*/session*.bag



In [None]:
!pip install bagpy
!pip install rosbags
#Use pcl_ros

In [None]:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/USCilab3D/data

## Primary objective
The goal of this program is to merge multiple bag files into a single bag file.

Our approach is to implement our own Python script that merges multiple bag files using the rosbag Python API. We also import bagpy to read the contents of these bag files.

The first step is to create a directory called *merged_bag/*, where we will store our results, and then move into it.

In [None]:
%cd /content/drive/MyDrive
%mkdir merged_bag
%cd /content/drive/MyDrive/merged_bag
%ls

/content/drive/MyDrive
/content/drive/MyDrive/merged_bag


## The Python script
***IMPORTANT***: Ensuring that the merged bag file is valid is up to the user. The script only merges the bag files it is given and stores the output in the designated directory; choosing input files that belong to the same session is the user's responsibility.
\
\
The following code is the script. I have written it and stored it in the merged_bag/ directory we created earlier. Comment out the `%%writefile` line (line 2 of the cell) to see the Python code with syntax highlighting.

I used *os* and *argparse* to parse user input and create the destination directory. *tqdm* displays a progress bar so the user can see that the program is still running.

The offset is calculated against a reference timestamp, taken from the first message of the first bag file given as input. Each bag's offset is the difference between the reference timestamp and that bag's own first timestamp, and it is added to every message in the bag. This aligns the timestamps so that the script treats the bags as one session even when the recording computers have different system times.
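The offset arithmetic can be illustrated without any ROS dependency. The following is a minimal sketch, not the script itself: the bag names and nanosecond timestamps are made up for illustration, and `compute_offsets` and `align` are hypothetical helper names.

```python
def compute_offsets(start_times_ns, reference):
    """Return, per bag, the shift that maps its first timestamp onto the reference's."""
    ref_start = start_times_ns[reference]
    return {name: ref_start - start for name, start in start_times_ns.items()}


def align(stamp_ns, offset_ns):
    """Shift a single message timestamp onto the reference clock."""
    return stamp_ns + offset_ns


# Hypothetical first-message timestamps (nanoseconds) for three bags of one session.
start_times = {
    "cam1.bag": 1_680_000_000_000_000_000,
    "cam2.bag": 1_703_900_000_000_000_000,  # this computer's clock runs months ahead
    "cam3.bag": 1_677_700_000_000_000_000,  # this computer's clock runs weeks behind
}

offsets = compute_offsets(start_times, reference="cam1.bag")
aligned_starts = {name: align(t, offsets[name]) for name, t in start_times.items()}
```

After alignment every bag starts at the reference timestamp, and a message recorded two seconds into cam2.bag lands two seconds after the reference start. Note that this treats the first message of every bag as simultaneous, so any real difference in when each computer started recording is folded into the offset.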


In [None]:
## Feel free to comment out the write file line to see the code with color
# %%writefile merge.py
## Author: Leo Lee
## Date: 1/19/2024
import bagpy
import rosbag
import os
import argparse
from tqdm import tqdm

## I have used following list of resources as inspiration for my code
## https://gist.github.com/NikolausDemmel/8944211#file-rosbag_merge-py
## https://answers.ros.org/question/318536/understanding-rosbag-timestamps/
## https://answers.ros.org/question/10683/is-there-a-way-to-merge-bag-files/
## https://wiki.ros.org/rosbag/Cookbook
## https://chat.openai.com/
## https://docs.python.org/3/library/os.html#os.makedirs
## https://jmscslgroup.github.io/bagpy/Reading_bagfiles_from_cloud.html
## The bagpy documentation is absolutely atrocious ^^

def combine_bag_files(output_bag_file, *input_bag_files):
  ## First we define a reference time stamp for baseline.
  ## We use the first input file in the parameter
  reference_bag = input_bag_files[0]
  with rosbag.Bag(reference_bag, 'r') as ref_bag:
    _, _, ref_start_time = next(ref_bag.read_messages())

  ## Then we create an offset for each of the input files
  ## The difference of the time will be stored in a hash map
  ## This offset will allow us to align the time correctly
  offsets = {}
  for input_bag in tqdm(input_bag_files, desc='Calculating offset'):
    with rosbag.Bag(input_bag, 'r') as bag:
      _, _, bag_start_time = next(bag.read_messages())
      offset = ref_start_time - bag_start_time
      offsets[input_bag] = offset

  ## Open the output file with write permission
  ## Write the topic, message, and timestamp into the new output file
  ## Adjust the time by the offset
  with rosbag.Bag(output_bag_file, 'w') as output:
    for infile in input_bag_files:
      with rosbag.Bag(infile, 'r') as in_bag:
        for topic, msg, t in tqdm(in_bag, desc='Writing new messages from {}'.format(infile)):
          new_t = t + offsets[infile]
          output.write(topic, msg, new_t)

if __name__ == "__main__":
    # Set up command-line argument parser
    parser = argparse.ArgumentParser(description='Merge and synchronize multiple bag files.')
    parser.add_argument('-s', '--session', nargs=1, required = True, help='Session number')
    parser.add_argument('-o', '--output', nargs=1, required=True, help='Output directory')
    parser.add_argument('-i', '--input', nargs='+', required=True, help='Input bag files')

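    # Parse command-line arguments
    args = parser.parse_args()
    output_dir = args.output[0]
    # Create the output directory if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)
    # Combine the bag files; os.path.join works with or without a trailing slash
    output_path = os.path.join(output_dir, 'session{}.bag'.format(args.session[0]))
    combine_bag_files(output_path, *args.input)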

Writing merge.py


## Access the data
We will move into the folder where the bag files are located. This is usually located at */content/drive/MyDrive/USCilab3D/data/raw_data/*. You will need to choose the folder yourself.
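As an alternative to changing into each camera folder by hand, the bag files can be collected in one pass. This is a small sketch under the folder layout described at the top of the notebook (`cam1` to `cam5` inside a day folder); `list_bags_per_camera` is a hypothetical helper and is not part of merge.py.

```python
from pathlib import Path


def list_bags_per_camera(day_dir, cameras=("cam1", "cam2", "cam3", "cam4", "cam5")):
    """Map each camera folder name to its sorted list of .bag file paths.

    A missing camera folder yields an empty list instead of an error.
    """
    day_dir = Path(day_dir)
    result = {}
    for cam in cameras:
        cam_dir = day_dir / cam
        if cam_dir.is_dir():
            # Keep regular files only; extracted folders sit next to the bags.
            result[cam] = sorted(p for p in cam_dir.glob("*.bag") if p.is_file())
        else:
            result[cam] = []
    return result


if __name__ == "__main__":
    for cam, bags in list_bags_per_camera("raw_data/2023_03_28").items():
        print(cam, len(bags), "bag file(s)")
```

Comparing the list lengths across cameras is a quick way to spot a camera that is missing a session.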

In [None]:
%ls
%cd /content/drive/MyDrive/USCilab3D/data/raw_data/2023_03_28/cam1/
%ls -l
%cd /content/drive/MyDrive/USCilab3D/data/raw_data/2023_03_28/cam2/
%ls -l
%cd /content/drive/MyDrive/USCilab3D/data/raw_data/2023_03_28/cam3/
%ls -l
%cd /content/drive/MyDrive/USCilab3D/data/raw_data/2023_03_28/cam4/
%ls -l
%cd /content/drive/MyDrive/USCilab3D/data/raw_data/2023_03_28/cam5/
%ls -l
%cd /content/drive/MyDrive/USCilab3D/data/raw_data/2023_03_28/

merge.py
/content/drive/.shortcut-targets-by-id/186BsJkqMgvSzbdsvqM3ay8XFeupZu6Yg/USCilab3D/data/raw_data/2023_03_28/cam1
total 16554414
-r-------- 1 root root 9506924179 Jan 12 00:31 test_2023-03-28-11-29-55.bag
dr-x------ 2 root root       4096 Jan 12 19:00 test_2023-03-28-11-39-28/
-r-------- 1 root root 3329860226 Jan 12 00:28 test_2023-03-28-11-39-28.bag
-r-------- 1 root root 4114930354 Jan 12 00:29 test_2023-03-28-11-46-59.bag
/content/drive/.shortcut-targets-by-id/186BsJkqMgvSzbdsvqM3ay8XFeupZu6Yg/USCilab3D/data/raw_data/2023_03_28/cam2
total 1519184
-r-------- 1 root root 1009320349 Jan 12 00:31 test_2023-12-30-16-46-21.bag
dr-x------ 2 root root       4096 Jan 12 19:00 test_2023-12-30-16-55-53/
-r-------- 1 root root  288579182 Jan 12 00:31 test_2023-12-30-16-55-53.bag
-r-------- 1 root root  257739879 Jan 12 00:31 test_2023-12-30-17-03-25.bag
/content/drive/.shortcut-targets-by-id/186BsJkqMgvSzbdsvqM3ay8XFeupZu6Yg/USCilab3D/data/raw_data/2023_

## Running the script
The script has 3 required arguments: **Session**, **Output** and **Input**. Session and Output each take exactly one value; Input takes one or more.

**Session**: An integer. It will represent the merged bag file's session.

**Output**: A directory. This should be the directory where the output will be. Usually it should be in /content/drive/MyDrive/merged_bag/2023_\*_\*/. The user needs write permission to this folder.

**Input**: (.bag) files. These are the files the user wants to merge. Any number of files can follow this flag as long as they are all (.bag) files.

**Output file**: The output file is called 'session\*.bag'. It can be found in the output directory the user specified when running the script.
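To make the argument handling concrete, the sketch below rebuilds the same parser and feeds it an argument list directly instead of reading the command line. The file names `cam1/a.bag` and `cam2/b.bag` are placeholders, and `build_parser` is a hypothetical wrapper used only for this illustration.

```python
import argparse


def build_parser():
    """Recreate the three flags that merge.py expects."""
    parser = argparse.ArgumentParser(description="Merge and synchronize multiple bag files.")
    parser.add_argument("-s", "--session", nargs=1, required=True, help="Session number")
    parser.add_argument("-o", "--output", nargs=1, required=True, help="Output directory")
    parser.add_argument("-i", "--input", nargs="+", required=True, help="Input bag files")
    return parser


# Parse an explicit argument list rather than sys.argv.
args = build_parser().parse_args(
    ["-s", "1", "-o", "merged_bag/2023_03_28/", "-i", "cam1/a.bag", "cam2/b.bag"]
)
```

Because `nargs=1` is used, `args.session` and `args.output` are one-element lists, which is why the script indexes them with `[0]`. The session value is kept as a string and is not checked to be an integer.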

Running the first-session command in the next cell should create a file of around 12GB in the merged_bag/2023_\*_\*/ directory.

In [None]:
## Testing, only 3 bag files
#!python3 /content/drive/MyDrive/merged_bag/merge.py -s 1 -o /content/drive/MyDrive/merged_bag/2023_03_28/ -i ./cam1/test_2023-03-28-11-29-55.bag ./cam2/test_2023-12-30-16-46-21.bag ./cam3/test_2023-12-30-16-46-21.bag

## First session
!python3 /content/drive/MyDrive/merged_bag/merge.py -s 1 -o /content/drive/MyDrive/merged_bag/2023_03_28/ -i ./cam1/test_2023-03-28-11-29-55.bag ./cam2/test_2023-12-30-16-46-21.bag ./cam3/test_2023-12-30-16-46-21.bag ./cam4/test_2023-03-02-05-01-04.bag ./cam5/test_2023-03-02-05-00-52.bag

## Second session
# !python3 /content/drive/MyDrive/merged_bag/merge.py -s 2 -o /content/drive/MyDrive/merged_bag/2023_03_28/ -i ./cam1/test_2023-03-28-11-39-28.bag ./cam2/test_2023-12-30-16-55-53.bag ./cam3/test_2023-12-30-16-55-53.bag ./cam4/test_2023-03-02-05-10-37.bag ./cam5/test_2023-03-02-05-10-25.bag

## Third session
# !python3 /content/drive/MyDrive/merged_bag/merge.py -s 3 -o /content/drive/MyDrive/merged_bag/2023_03_28/ -i ./cam1/test_2023-03-28-11-46-59.bag ./cam2/test_2023-12-30-17-03-25.bag ./cam3/test_2023-12-30-17-03-25.bag ./cam4/test_2023-03-02-05-18-08.bag ./cam5/test_2023-03-02-05-17-56.bag

Calculating offset: 100% 5/5 [00:27<00:00,  5.53s/it]
Writing new messages from ./cam1/test_2023-03-28-11-29-55.bag: 323504it [04:05, 1317.96it/s]
Writing new messages from ./cam2/test_2023-12-30-16-46-21.bag: 223737it [01:25, 2617.61it/s]
Writing new messages from ./cam3/test_2023-12-30-16-46-21.bag: 226434it [01:19, 2862.91it/s]
Writing new messages from ./cam4/test_2023-03-02-05-01-04.bag: 221630it [01:30, 2453.65it/s]
Writing new messages from ./cam5/test_2023-03-02-05-00-52.bag: 224150it [01:19, 2806.04it/s]


## Checking the result
Great! Now the files have been merged and created. We will now check the result using an online bag file viewer called Foxglove. It can be found at https://foxglove.dev/ros. Download the file from Google Drive and upload it to the website.

In [None]:
%cd /content/drive/MyDrive/merged_bag/2023_03_28/
%ls -l

/content/drive/MyDrive/merged_bag/2023_03_28
total 13215761
-rw------- 1 root root 13532938455 Jan 20 01:59 session1.bag


## Conclusion
This script merges the bag files supplied by the user into a single output file in the output directory the user specifies. The file name is based on the session number the user supplies. The main drawback of this script is the lack of automation. I was hoping to write a script that opens the 2023\_\*\_\*/ folder automatically and merges the files inside each of the 5 camera folders. However, the main difficulties I encountered are:

- Timestamps inside the folders are far apart. Some of the files are dated in March and some in December. I emailed Henghui, who said this is the result of the hardware having different onboard system times. To resolve this issue, I calculated an offset for each bag and applied it to that bag's messages.

- If a camera folder, say cam2, is missing its session 2 bag file, how can the script tell that a bag file with a timestamp in December 2023 belongs to the same session as the session 3 bag files from the other camera folders? The timestamps are so far off to begin with that I found an automated option difficult. My workaround is to do this step manually: because the user supplies the inputs, the script can process the bag files without checking whether they belong together. The burden of producing a valid bag file falls on the user and not the script.
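One possible direction for the missing-session problem, offered only as a sketch: although the clocks disagree on the absolute date, the gaps between sessions in the file names appear nearly identical across cameras. Assuming each computer's clock error stays constant over the day and that one reference camera recorded every session, a bag can be matched to a session by searching for the clock difference that explains the most bags. `start_time` and `match_sessions` are hypothetical helpers, and the 30 second tolerance is an arbitrary choice.

```python
import re
from datetime import datetime

STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})\.bag$")


def start_time(bag_name):
    """Parse the start time embedded in a name like test_2023-03-28-11-29-55.bag."""
    match = STAMP.search(bag_name)
    if match is None:
        raise ValueError(f"no timestamp in {bag_name!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d-%H-%M-%S")


def match_sessions(reference_bags, camera_bags, tolerance_s=30.0):
    """Assign each camera bag to a 0-based index into reference_bags.

    Tries every clock difference implied by pairing one camera bag with one
    reference bag and keeps the one that matches the most camera bags.
    """
    ref_times = [start_time(b) for b in reference_bags]
    cam_times = [start_time(b) for b in camera_bags]
    best = {}
    for ref_t in ref_times:
        for cam_t in cam_times:
            offset = ref_t - cam_t  # candidate clock difference
            assignment = {}
            for bag, t in zip(camera_bags, cam_times):
                shifted = t + offset
                for idx, r in enumerate(ref_times):
                    if abs((shifted - r).total_seconds()) <= tolerance_s:
                        assignment[bag] = idx
                        break
            if len(assignment) > len(best):
                best = assignment
    return best


# cam1 has all three sessions; cam2 is pretending to have lost its second bag.
cam1 = [
    "test_2023-03-28-11-29-55.bag",
    "test_2023-03-28-11-39-28.bag",
    "test_2023-03-28-11-46-59.bag",
]
cam2 = ["test_2023-12-30-16-46-21.bag", "test_2023-12-30-17-03-25.bag"]
sessions = match_sessions(cam1, cam2)
```

With the file names from the listing above, the two remaining cam2 bags are assigned to the first and third reference sessions. The approach cannot disambiguate a camera that has only one bag, or a day whose sessions are evenly spaced, so manual selection would still be needed as a fallback.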

## Possible Errors
Some errors occur when using Foxglove Studio to view the merged bag file, since the file is so large. An error message such as "The requested file could not be read, typically due to permission problems that have occurred after a reference to a file was acquired." could appear. This is mainly due to the size limit of the ArrayBuffer built into the browser. More info can be found here: https://stackoverflow.com/questions/63376248/domexception-the-requested-file-could-not-be-read-typically-due-to-permission. I believe this is an issue with the ROS viewer by Foxglove rather than an issue with the script. Moreover, I am aware of the compression flag for writing the bag file, but I am not sure how it will affect the quality of the data stored inside each bag file.
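On the compression question: the rosbag Python API accepts a `compression` argument when a bag is opened for writing (for example `rosbag.Compression.BZ2` or `rosbag.Compression.LZ4`). Both are lossless general-purpose compressors, so the stored messages should decode to the same bytes, with the cost being extra CPU time when writing and reading. The sketch below uses Python's standard `bz2` module rather than rosbag itself, with a made-up byte string standing in for serialized message data, to show the round trip.

```python
import bz2

# Stand-in for serialized message bytes; real image data will compress differently.
payload = bytes(range(256)) * 1024

compressed = bz2.compress(payload)
restored = bz2.decompress(compressed)

# Lossless: the decompressed bytes are identical to the input.
unchanged = restored == payload
```

How much space is saved depends on the data; already-compressed image formats may shrink very little.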

**Update**: Using the desktop version of Foxglove Studio seems to work a lot better.


## Future goals
I am hoping to alter the script so that it takes file sizes into account and groups files of similar size into a session. However, similarly sized files could be placed in the wrong bucket. Alternatively, the script could use the timestamps of when the files were created to sort them into buckets. Interestingly, the online viewer Foxglove can still display the image messages even if they are out of sequence. I believe this is because it uses SQL to sort the messages before displaying them; I read this somewhere on the ROS forum but am not sure. I would also like to sort the messages after merging to decrease the load on the Studio.
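For the last goal, one way to produce a chronologically ordered output is to interleave the streams while merging instead of sorting afterwards. The sketch below uses `heapq.merge` from the standard library on plain tuples; it assumes each input stream is already in time order after its offset has been applied. The topic names and integer timestamps are invented for the example, and `merge_sorted_streams` is a hypothetical helper.

```python
import heapq


def merge_sorted_streams(*streams):
    """Lazily merge (timestamp, topic, msg) streams that are each sorted by time."""
    return heapq.merge(*streams, key=lambda item: item[0])


# Stand-in streams: timestamps in milliseconds, payloads as short strings.
camera = [(0, "/cam1/image", "a0"), (66, "/cam1/image", "a1"), (133, "/cam1/image", "a2")]
lidar = [(10, "/points", "p0"), (110, "/points", "p1")]

merged = list(merge_sorted_streams(camera, lidar))
```

Because `heapq.merge` only holds one pending item per stream, this pattern keeps memory use low even for very large inputs. Adapting it to rosbag would mean reordering the `(topic, msg, t)` tuples that `read_messages()` yields so the timestamp can serve as the key.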