# Example 1: Grab metadata from a dataset

In this example, we will learn how to get the metadata of each file in a dataset.  
The metadata contains deterministic information (e.g. recording date and duration)
as well as heuristic information such as tags.

## Grab a DataFrame of metadata from a database

First, let's import a metadata handler from the toolkit and initialize it.

In [1]:
from dwtk.db import V3DBHandler as DBHandler

db_handler = DBHandler(
    db_class='meta',
    db_host='/data_pool_1/small_DrivingBehaviorDatabase/dwtk.db',
    base_dir_path='/data_pool_1/small_DrivingBehaviorDatabase',
    read_on_init=True
)

If you set `read_on_init` to `True`, the entire contents of the database are loaded
into local memory as a pandas DataFrame.  
You can access the contents as follows.

In [2]:
db_handler.df

Unnamed: 0,description,database_id,record_id,data_type,path,start_timestamp,end_timestamp,content_type,contents,msg_type,msg_md5sum,count,frequency,tags,uuid_in_df
0,Driving Database,Driving Behavior Database,W07_17000000020000001255,raw_data,records/W07_17000000020000001255/data/records.bag,1.519881e+09,1.519885e+09,application/rosbag,/vehicle/analog/speed_pulse,std_msgs/UInt8,7c8164229e7d2c17eb95e9231617fdee,73200.0,20.000019,vehicle:analog:speed:pulse,014d15e7b0287f2ae6d632029c3515ae
1,Driving Database,Driving Behavior Database,016_00000000030000000240,raw_data,records/016_00000000030000000240/data/records.bag,1.489728e+09,1.489729e+09,application/rosbag,/vehicle/analog/speed_pulse,std_msgs/UInt8,7c8164229e7d2c17eb95e9231617fdee,4800.0,20.000019,vehicle:analog:speed:pulse,034d9659e19ae6450a3ca8ffb15d0341
2,Driving Database,Driving Behavior Database,016_00000000030000000240,raw_data,records/016_00000000030000000240/data/camera_0...,1.489728e+09,1.489729e+09,video/mp4,camera/driver-foot,,,,,camera:driver:foot:image,037128ab35dff0c96c30b6d9df55b68d
3,Driving Database,Driving Behavior Database,253_16000000080000000747,raw_data,records/253_16000000080000000747/data/records.bag,1.506060e+09,1.506064e+09,application/rosbag,/gps_m2/vehicle/direction,std_msgs/Float32,73fcbf46b49191e672908e50842a83d4,45160.0,10.000010,gps:m2:vehicle:direction,046baa361040ff81730714c50a60b5be
4,Driving Database,Driving Behavior Database,253_16000000080000000747,raw_data,records/253_16000000080000000747/data/records.bag,1.506060e+09,1.506064e+09,application/rosbag,/driver/face_yaw,std_msgs/Float32,73fcbf46b49191e672908e50842a83d4,110441.0,33.333365,driver:face:yaw,0566b705a3443d7f18ac7435c9245e8e
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
187,Driving Database,Driving Behavior Database,253_16000000080000000747,raw_data,records/253_16000000080000000747/data/camera_0...,1.506060e+09,1.506064e+09,text/csv,camera/rear-center,,,,,camera:rear:center:timestamps,fc99d95061e9648ca85f05c7f68dae2f
188,Driving Database,Driving Behavior Database,W07_17000000020000001255,raw_data,records/W07_17000000020000001255/data/camera_0...,1.519881e+09,1.519885e+09,video/mp4,camera/front-center,,,,,camera:front:center:image,fd6e8b96e8ca96dfe23373f4fabddde3
189,Driving Database,Driving Behavior Database,016_00000000030000000240,raw_data,records/016_00000000030000000240/data/camera_0...,1.489728e+09,1.489729e+09,video/mp4,camera/rear-right,,,,,camera:rear:right:image,fe09dc6169fd6ea290c07b733ec70356
190,Driving Database,Driving Behavior Database,016_00000000030000000215,raw_data,records/016_00000000030000000215/data/records....,1.489455e+09,1.489456e+09,application/rosbag,/vehicle/extracted_can/speed,std_msgs/UInt8,7c8164229e7d2c17eb95e9231617fdee,3514.0,2.000000,vehicle:extracted:can:speed,fe7450a89dd571464161723be5014b18
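
Once the metadata is in memory, ordinary pandas operations apply to it. The following is a minimal sketch rather than part of the toolkit: `df` is a hand-built stand-in for `db_handler.df` that holds a few columns of three rows copied from the output above, and the filtering uses plain substring matching on the colon-separated `tags` string.

```python
import pandas as pd

# Stand-in for db_handler.df: three rows copied from the output above.
df = pd.DataFrame({
    'record_id': [
        'W07_17000000020000001255',
        '016_00000000030000000240',
        '253_16000000080000000747',
    ],
    'content_type': ['application/rosbag', 'video/mp4', 'application/rosbag'],
    'tags': [
        'vehicle:analog:speed:pulse',
        'camera:driver:foot:image',
        'gps:m2:vehicle:direction',
    ],
})

# Keep rows whose tag string contains the substring "vehicle".
vehicle_df = df[df['tags'].str.contains('vehicle', regex=False)]
print(vehicle_df['record_id'].tolist())

# Count how many files there are per content type.
print(df['content_type'].value_counts().to_dict())
```

Note that substring matching would also accept a longer tag that merely contains the search term, so an exact match needs the tag string to be split first.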


When you handle a very large dataset, the metadata contains a huge amount of information, and as a result
it takes a long time to load all of it.  
If you only need a limited scope (e.g. metadata of files tagged 'camera' and 'front'),
it is costly to load the whole database and then search the loaded DataFrame.  
Therefore, the toolkit provides a way to run an SQL query before loading the database,
which limits the items that are loaded.  
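
The general idea of filtering inside the database before anything reaches pandas can be illustrated without the toolkit. The sketch below is an assumption-laden illustration, not the toolkit's implementation: it uses the standard `sqlite3` module, an in-memory database, a hypothetical table name `file_meta`, and a hypothetical helper `read_where`, with three rows copied from the output above.

```python
import sqlite3

import pandas as pd

# In-memory database; the table name "file_meta" is hypothetical.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE file_meta (record_id TEXT, start_timestamp REAL, tags TEXT)'
)
conn.executemany(
    'INSERT INTO file_meta VALUES (?, ?, ?)',
    [
        ('W07_17000000020000001255', 1.519881e+09, 'camera:front:center:image'),
        ('016_00000000030000000240', 1.489728e+09, 'camera:rear:right:image'),
        ('253_16000000080000000747', 1.506060e+09, 'gps:m2:vehicle:direction'),
    ],
)


def read_where(where):
    """Load only the rows matching the WHERE condition into a DataFrame."""
    # The condition is pasted into the query, so it must come from a trusted source.
    return pd.read_sql_query('SELECT * FROM file_meta WHERE ' + where, conn)


recent = read_where('start_timestamp > 1500000000')
front_camera = read_where("tags LIKE '%camera%' AND tags LIKE '%front%'")
print(len(recent), len(front_camera))
```

Only the matching rows are materialized as a DataFrame, which is what makes this approach cheaper than loading everything and filtering afterwards.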

To run an SQL query before loading the metadata, set the `read_on_init` option to `False` and pass the condition to `read` via the `where` argument, as follows.

In [3]:
db_handler = DBHandler(
    db_class='meta',
    db_host='/data_pool_1/small_DrivingBehaviorDatabase/dwtk.db',
    base_dir_path='/data_pool_1/small_DrivingBehaviorDatabase',
    read_on_init=False
)
# Each call below loads only the rows that match the given condition.
# Files whose start_timestamp is later than 1500000000
db_handler.read(where='start_timestamp > 1500000000')
print('# of metadata: {}'.format(len(db_handler.df)))
# Files whose tags contain both 'camera' and 'front'
db_handler.read(where='tags like "%camera%" and tags like "%front%"')
print('# of metadata: {}'.format(len(db_handler.df)))
# Files whose tags contain both 'can' and 'steering'
db_handler.read(where='tags like "%can%" and tags like "%steering%"')
print('# of metadata: {}'.format(len(db_handler.df)))
# Files whose tags contain either 'can' or 'camera'
db_handler.read(where='tags like "%can%" or tags like "%camera%"')
print('# of metadata: {}'.format(len(db_handler.df)))

# of metadata: 89
# of metadata: 24
# of metadata: 4
# of metadata: 84
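
A threshold such as `1500000000` is easier to get right when it is derived from a date. The helper below is a small sketch under the assumption that `start_timestamp` holds UNIX time in seconds; the function name `started_after` is hypothetical and not part of the toolkit.

```python
from datetime import datetime, timezone


def started_after(moment):
    """Build a condition that selects files whose start is later than `moment`."""
    if moment.tzinfo is None:
        # A naive datetime would be interpreted in the local time zone.
        raise ValueError('pass a timezone-aware datetime')
    return 'start_timestamp > {}'.format(int(moment.timestamp()))


where = started_after(datetime(2017, 7, 14, 2, 40, tzinfo=timezone.utc))
print(where)  # start_timestamp > 1500000000
```

The resulting string can then be passed as the `where` argument of `db_handler.read`, in the same way as the literal conditions in the cell above.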


## Get the list of record IDs corresponding to the metadata

Each row of the DataFrame acquired above corresponds to a file in the dataset.  
If you want to know which record ID each file belongs to, you can get a DataFrame of records as follows.

In [4]:
db_handler.record_id_df

Unnamed: 0,record_id,start_timestamp,end_timestamp,tags,duration
0,016_00000000030000000215,1489455000.0,1489456000.0,"[driver, center, angle, front, brake, position...",1755.957
1,016_00000000030000000240,1489728000.0,1489729000.0,"[driver, center, angle, front, brake, position...",240.957
2,253_16000000080000000747,1506060000.0,1506064000.0,"[driver, center, angle, front, brake, position...",4512.957
3,W07_17000000020000001255,1519881000.0,1519885000.0,"[timestamps, center, camera, image, front, rig...",3478.957
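
How the toolkit builds `record_id_df` is not shown here. As a rough illustration of the relationship between per-file rows and per-record rows, the sketch below groups a hand-built stand-in for the file-level DataFrame (three rows copied from the first output) by `record_id` and collects the union of the tags of each record. The names `files`, `tag_list`, and `record_tags` are introduced only for this example.

```python
import pandas as pd

# Stand-in for the per-file metadata: three rows copied from the first output.
files = pd.DataFrame({
    'record_id': [
        '016_00000000030000000240',
        '016_00000000030000000240',
        '253_16000000080000000747',
    ],
    'tags': [
        'vehicle:analog:speed:pulse',
        'camera:driver:foot:image',
        'gps:m2:vehicle:direction',
    ],
})

# Split each colon-separated tag string into a list of individual tags.
files['tag_list'] = files['tags'].str.split(':')

# For every record, take the union of the tags of all its files.
record_tags = {
    record_id: sorted({tag for tags in group['tag_list'] for tag in tags})
    for record_id, group in files.groupby('record_id')
}
print(record_tags)
```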


You can get the list of contents as well.

In [5]:
db_handler.content_df



Unnamed: 0,record_id,path,content,msg_type,tag
0,016_00000000030000000240,/data_pool_1/small_DrivingBehaviorDatabase/rec...,camera/driver-foot,,"[camera, image, foot, driver]"
1,016_00000000030000000215,/data_pool_1/small_DrivingBehaviorDatabase/rec...,camera/driver-face-black_and_white,,"[black_and_white, driver, camera, image, face]"
2,016_00000000030000000240,/data_pool_1/small_DrivingBehaviorDatabase/rec...,camera/rear-right,,"[camera, timestamps, rear, right]"
3,016_00000000030000000240,/data_pool_1/small_DrivingBehaviorDatabase/rec...,camera/rear-center,,"[camera, timestamps, rear, center]"
4,016_00000000030000000215,/data_pool_1/small_DrivingBehaviorDatabase/rec...,camera/front-left,,"[camera, image, front, left]"
...,...,...,...,...,...
79,253_16000000080000000747,/data_pool_1/small_DrivingBehaviorDatabase/rec...,camera/rear-center,,"[camera, timestamps, rear, center]"
80,W07_17000000020000001255,/data_pool_1/small_DrivingBehaviorDatabase/rec...,camera/front-center,,"[camera, image, front, center]"
81,016_00000000030000000240,/data_pool_1/small_DrivingBehaviorDatabase/rec...,camera/rear-right,,"[camera, image, rear, right]"
82,016_00000000030000000215,/data_pool_1/small_DrivingBehaviorDatabase/rec...,/vehicle/extracted_can/speed,std_msgs/UInt8,"[speed, extracted, can, vehicle]"
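
Because the `tag` column holds lists rather than strings, exact tag matching is straightforward. The following sketch assumes a DataFrame shaped like the output above; `content_df` here is a hand-built stand-in with three rows copied from that output, and the `path` column is left out because it is truncated in the display.

```python
import pandas as pd

# Stand-in for db_handler.content_df: three rows copied from the output above.
content_df = pd.DataFrame({
    'record_id': [
        '016_00000000030000000240',
        '016_00000000030000000215',
        '016_00000000030000000215',
    ],
    'content': [
        'camera/driver-foot',
        'camera/front-left',
        '/vehicle/extracted_can/speed',
    ],
    'tag': [
        ['camera', 'image', 'foot', 'driver'],
        ['camera', 'image', 'front', 'left'],
        ['speed', 'extracted', 'can', 'vehicle'],
    ],
})

# Select contents that carry every tag in `wanted` (exact matches, not substrings).
wanted = {'camera', 'front'}
mask = content_df['tag'].apply(lambda tags: wanted.issubset(tags))
print(content_df.loc[mask, 'content'].tolist())
```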
