# Preprocessing
This file is structured to guide through the preprocessing and setup stages required for training a model on a dataset.

## 1. Preparation 
Our task this step is to change the raw data into several temporal dynamic graphs saved in CSV.

- Download the **Annotations/Activities** from [Drive&Act](https://driveandact.com/), save the file `iccv_activities_3s\activities_3s\kinect_color\objectlevel.chunks_90.csv` and `iccv_activities_3s\activities_3s\kinect_color\tasklevel.chunks_90.csv` in the folder Dataset.

- run this file with the environment `dgb` we create earlier
- download all the dependency we need, cuda 11.3 is required in advance

In [4]:
# ! pip install numba transformers
import pandas as pd # type: ignore
import numpy as np
import json
import time

## 2. dictionary generation & graph example
we need to extract the dynamic graph from the raw data. Let's take a brief look at our data.

In [5]:
df = pd.read_csv('Dataset/objectlevel.chunks_90.csv')

file_ids = df['file_id'].unique()
participant_id = df['participant_id'].unique()
print(f'there are {len(participant_id)} participants, along with {len(file_ids)} videos in the dataset.')


df = df[df['object'] != 'no_object']
display(df)

there are 15 participants, along with 29 videos in the dataset.


Unnamed: 0,participant_id,file_id,annotation_id,frame_start,frame_end,activity,object,location,chunk_id
0,1,vp1/run1b_2018-05-29-14-02-47.kinect_color,0,175,188,reaching_for,seatbelt,left_backseat,0
1,1,vp1/run1b_2018-05-29-14-02-47.kinect_color,1,189,208,retracting_from,seatbelt,left_backseat,0
2,1,vp1/run1b_2018-05-29-14-02-47.kinect_color,2,208,230,interacting,seatbelt,no_location,0
3,1,vp1/run1b_2018-05-29-14-02-47.kinect_color,3,240,258,reaching_for,multimedia_display,center_console_front,0
4,1,vp1/run1b_2018-05-29-14-02-47.kinect_color,4,3011,3031,reaching_for,automation_button,center_console_back,0
...,...,...,...,...,...,...,...,...,...
9957,15,vp15/run2_2018-05-30-13-34-33.kinect_color,169,17848,17864,reaching_for,multimedia_display,no_location,0
9959,15,vp15/run2_2018-05-30-13-34-33.kinect_color,172,19035,19052,reaching_for,multimedia_display,no_location,0
9960,15,vp15/run2_2018-05-30-13-34-33.kinect_color,173,19052,19058,interacting,multimedia_display,no_location,0
9962,15,vp15/run2_2018-05-30-13-34-33.kinect_color,175,19360,19371,reaching_for,automation_button,no_location,0


To creat a dynamic graph suitable for further learning, we need to build a table whose head is showon as below:
| u | v | timestamp | state_label | feature |
|---|---|-----------|-------------|---------|
| 1 | 2 | 127       | 0           | 0.27    |

- **u:**            the acvtion generator, usually the participate
- **v:**            the object operated
- **timestamp:**:   frame the action happens
- **stable_label:** stable vector of the edge
- **feature:**      the weight vector of the edge, here it is the appearance percentage value of the `location` column.
### - State
now we need to extract the state of each nodec from `tasklevel.chunks_90.csv` and print them out

In [8]:
df_task = pd.read_csv('Dataset/midlevel.chunks_90.csv')

activity_tasks = df_task['activity'].unique()
print(f"there are {len(activity_tasks)} different behaviors")
display(activity_tasks)

there are 39 different behaviors


array(['standing_by_the_door', 'closing_door_outside',
       'opening_door_outside', 'entering_car', 'closing_door_inside',
       'fastening_seat_belt', 'using_multimedia_display', 'sitting_still',
       'pressing_automation_button', 'fetching_an_object',
       'opening_laptop', 'working_on_laptop', 'interacting_with_phone',
       'closing_laptop', 'placing_an_object', 'unfastening_seat_belt',
       'putting_on_jacket', 'opening_bottle', 'drinking',
       'closing_bottle', 'looking_or_moving_around (e.g. searching)',
       'preparing_food', 'eating', 'looking_back_left_shoulder',
       'taking_off_sunglasses', 'putting_on_sunglasses',
       'reading_newspaper', 'writing', 'talking_on_phone',
       'reading_magazine', 'taking_off_jacket', 'opening_door_inside',
       'exiting_car', 'opening_backpack', 'closing_backpack',
       'putting_laptop_into_backpack', 'looking_back_right_shoulder',
       'taking_laptop_from_backpack', 'moving_towards_door'], dtype=object)

To eusure a efficient and general behavior predicting result, we would like to classify these behaviors into different states

In [16]:
from transformers import pipeline

classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')

candidate_labels = ('driving ', 'independent','eat or drink','others object concerned')
results = {}
for task in activity_tasks:
    results = classifier(task, candidate_labels)
    results[task] = results['labels'][0]
    print(f'{task} belongs to {results["labels"][0]}')
  

standing_by_the_door belongs to others object concerned
closing_door_outside belongs to others object concerned
opening_door_outside belongs to others object concerned
entering_car belongs to driving 
closing_door_inside belongs to others object concerned
fastening_seat_belt belongs to others object concerned
using_multimedia_display belongs to others object concerned
sitting_still belongs to independent
pressing_automation_button belongs to others object concerned
fetching_an_object belongs to others object concerned
opening_laptop belongs to others object concerned
working_on_laptop belongs to others object concerned
interacting_with_phone belongs to others object concerned
closing_laptop belongs to others object concerned
placing_an_object belongs to others object concerned
unfastening_seat_belt belongs to others object concerned
putting_on_jacket belongs to others object concerned
opening_bottle belongs to eat or drink
drinking belongs to eat or drink
closing_bottle belongs to othe

In [4]:

# driving concerned behaviors
state_1 = ['standing_by_the_door', 'closing_door_outside','opening_door_outside', 'entering_car', 'closing_door_inside','fastening_seat_belt', 'moving_towards_door', 'unfastening_seat_belt','opening_door_inside','exiting_car']
# independent behaviors
state_2 = ['sitting_still','looking_or_moving_around (e.g. searching)','looking_back_left_shoulder', 'looking_back_right_shoulder']

# eat or drink concerned behaviors
state_3 = ['preparing_food', 'eating','opening_bottle', 'drinking','closing_bottle',]

# other object concerned behaviors
state_4 = ['using_multimedia_display', 'sitting_still','using_multimedia_display','pressing_automation_button', 'fetching_an_object','opening_laptop', 'working_on_laptop', 'interacting_with_phone','closing_laptop', 'placing_an_object','putting_on_jacket','taking_off_sunglasses', 'putting_on_sunglasses', 'reading_newspaper', 'writing', 'talking_on_phone','reading_magazine', 'taking_off_jacket', 'opening_backpack', 'closing_backpack','putting_laptop_into_backpack','taking_laptop_from_backpack']

df_task['state'] = df_task['activity'].apply(lambda x: 1 if x in state_1 else (2 if x in state_2 else (3 if x in state_3 else (4 if x in state_4 else None))))
df_task['state'] = df_task['state'].astype(int)
display(df_task)

Unnamed: 0,participant_id,file_id,annotation_id,frame_start,frame_end,activity,chunk_id,state
0,1,vp1/run1b_2018-05-29-14-02-47.kinect_color,0,40,58,standing_by_the_door,0,1
1,1,vp1/run1b_2018-05-29-14-02-47.kinect_color,1,58,82,closing_door_outside,0,1
2,1,vp1/run1b_2018-05-29-14-02-47.kinect_color,2,83,102,standing_by_the_door,0,1
3,1,vp1/run1b_2018-05-29-14-02-47.kinect_color,3,102,130,opening_door_outside,0,1
4,1,vp1/run1b_2018-05-29-14-02-47.kinect_color,4,130,156,entering_car,0,1
...,...,...,...,...,...,...,...,...
10328,15,vp15/run2_2018-05-30-13-34-33.kinect_color,125,19595,19604,unfastening_seat_belt,1,1
10329,15,vp15/run2_2018-05-30-13-34-33.kinect_color,126,19604,19635,opening_door_inside,0,1
10330,15,vp15/run2_2018-05-30-13-34-33.kinect_color,127,19635,19680,exiting_car,0,1
10331,15,vp15/run2_2018-05-30-13-34-33.kinect_color,127,19680,19702,exiting_car,1,1


### - Nodes and features
Here we need to create dictionary for embedded nodes and feature.
To clearify the process, here we would take the first video as a explain and explain each step resprectively.

In [5]:
# take the first video as example
filtered_df = df[df['file_id'] == file_ids[0]]
filtered_df_task = df_task[df_task['file_id'] == file_ids[0]]
num_rows = filtered_df.shape[0]
print(f"There are {num_rows} rows in the video {file_ids[0]}.")

# NOTE: Node dictionary
# collection of objects dictinoary
objects = filtered_df['object'].unique()
objects = np.insert(objects, 0, 'participant')
nodes_dict = {object: index for index, object in enumerate(objects)}
print("numerate all the objects included and particitant in the video as well as their index in the nodes dictionary.")
print(nodes_dict)

print("*"*200)

#NOTE: feature dictionary
# collection of locations dictionary
locations = filtered_df['location'].unique()
location_counts = filtered_df['location'].value_counts()
print("the dictionary of locations and their counts percentage in the video.")
location_counts_dict = {location: count for location, count in (location_counts/num_rows).items()}
print(location_counts_dict)

There are 261 rows in the video vp1/run1b_2018-05-29-14-02-47.kinect_color.
numerate all the objects included and particitant in the video as well as their index in the nodes dictionary.
{'participant': 0, 'seatbelt': 1, 'multimedia_display': 2, 'automation_button': 3, 'laptop': 4, 'phone': 5, 'jacket': 6, 'bottle': 7, 'food': 8, 'glasses_case': 9, 'glasses': 10, 'newspaper': 11, 'writing_pad': 12, 'magazine': 13}
********************************************************************************************************************************************************************************************************
the dictionary of locations and their counts percentage in the video.
{'steering_wheel': 0.41762452107279696, 'no_location': 0.16091954022988506, 'lap': 0.1111111111111111, 'center_console_back': 0.09578544061302682, 'center_console_front': 0.0842911877394636, 'codriver_footwell': 0.038314176245210725, 'right_backseat': 0.03065134099616858, 'left_backseat': 0.02681992337164751, 

Now we could build up the new dictionary. Notice that the given data only mark the timestamp with `frame_start` and `frame_end` therefore we should fill the rows in between as well.

In [6]:
data = {'u': [], 'v': [], 'timestamp': [],'state_label':[],'feature':[]}
graph_df = pd.DataFrame(data)
for index, row in filtered_df.iterrows():

    if row['object'] != 'no_object' :
        # new_row = {'u': 0, 'v': nodes_dict[str(row['object'])], 'timestamp':row['frame_start'],'state_label':0,'feature':location_counts_dict[row['location']]}
        
        frame_start = row['frame_start']
        frame_end = row['frame_end']

        for timestamp in range(frame_start, frame_end+1):
            new_row = {'u': 0, 'v': nodes_dict[str(row['object'])], 'timestamp':timestamp,'state_label':0,'feature':location_counts_dict[row['location']]}
            filtered_tasks = df_task[(df_task['frame_start'] <= timestamp) & (df_task['frame_end'] >= timestamp)]['state'].values
            if len(filtered_tasks) > 0:
                state_label = filtered_tasks[0]
                new_row['state_label'] = state_label
            else:
                new_row['state_label'] = state_label
                print(f"No matching task for timestamp {timestamp},use the former one.")
            graph_df.loc[len(graph_df)] = new_row

    
    pass

graph_df.drop_duplicates(inplace=True)
display(graph_df)
graph_df.to_csv('data4train/graph_data.csv', index=False)



Unnamed: 0,u,v,timestamp,state_label,feature
0,0,1,175,1,0.02682
1,0,1,176,1,0.02682
2,0,1,177,1,0.02682
3,0,1,178,1,0.02682
4,0,1,179,1,0.02682
...,...,...,...,...,...
9026,0,1,18939,1,0.02682
9027,0,1,18940,1,0.02682
9028,0,1,18941,1,0.02682
9029,0,1,18942,1,0.02682


## 3. Graph generation
Now we could extract the dynamic graph of each video respectively, and save them all in `data4train`. 
The dictionary of features and nodes will be stored in the folder `dict`.

In [7]:


start_time = time.time()
combined_nodes = {}
combined_features = {}
for count, file_id in enumerate(file_ids):
    filtered_df = df[df['file_id'] == file_id]
    num_rows = filtered_df.shape[0]
    print(f"There are {num_rows} rows in the video {file_id}.")

    # list the state
    filtered_df_task = df_task[df_task['file_id'] == file_id]

    # set name for this graph data
    file_id = file_id.replace('/', '')
    nameid = file_id[:8].replace('b', '').replace('_', '')

    # NOTE: Node dictionary
    # collection of objects dictionary
    objects = filtered_df['object'].unique()
    objects = np.insert(objects, 0, 'participant')
    nodes_dict = {object: index for index, object in enumerate(objects)}
    
    combined_nodes.update({nameid: nodes_dict})

    #NOTE: feature dictionary
    # collection of locations dictionary
    locations = filtered_df['location'].unique()
    location_counts = filtered_df['location'].value_counts()
    location_counts_dict = {location: count for location, count in (location_counts/num_rows).items()}

    combined_features.update({nameid: location_counts_dict})

    data = {'u': [], 'v': [], 'timestamp': [],'state_label':[],'feature':[]}
    graph_df = pd.DataFrame(data)
    for index, row in filtered_df.iterrows():

        if row['object'] != 'no_object' :
            # new_row = {'u': 0, 'v': nodes_dict[str(row['object'])], 'timestamp':row['frame_start'],'state_label':0,'feature':location_counts_dict[row['location']]}
            
            frame_start = row['frame_start']
            frame_end = row['frame_end']

            for timestamp in range(frame_start, frame_end+1):
                new_row = {'u': 0, 'v': nodes_dict[str(row['object'])], 'timestamp':timestamp,'state_label':0,'feature':location_counts_dict[row['location']]}
                filtered_tasks = df_task[(df_task['frame_start'] <= timestamp) & (df_task['frame_end'] >= timestamp)]['state'].values
                if len(filtered_tasks) > 0:
                    state_label = filtered_tasks[0]
                    new_row['state_label'] = state_label
                else:
                    new_row['state_label'] = state_label
                    print(f"No matching task for timestamp {timestamp},use the former one.")
                graph_df.loc[len(graph_df)] = new_row

        
        pass

    graph_df.drop_duplicates(inplace=True)
    directory = 'data4train'
 
    
    
    graph_df.to_csv(f'{directory}/graph_data_{nameid}.csv', index=False)

    print(f"save the graph data of video {file_id} to {directory}/graph_data_{nameid}.csv")

    
    
# Convert dictionaries to JSON strings
combined_nodes_json = json.dumps(combined_nodes)
combined_features_json = json.dumps(combined_features)
# Save JSON strings to text files
with open('dict/combined_nodes.json', 'w') as f:
    f.write(combined_nodes_json)

with open('dict/combined_features.json', 'w') as f:
    f.write(combined_features_json)
print("save the combined nodes and features dictionary to dict/combined_nodes.txt and dict/combined_features.txt")

end_time = time.time()
time_cost = end_time - start_time
print(f"Time cost: {time_cost} seconds")
print('done')


There are 261 rows in the video vp1/run1b_2018-05-29-14-02-47.kinect_color.
save the graph data of video vp1run1b_2018-05-29-14-02-47.kinect_color to data4train/graph_data_vp1run1.csv
There are 259 rows in the video vp1/run2_2018-05-29-14-33-44.kinect_color.
save the graph data of video vp1run2_2018-05-29-14-33-44.kinect_color to data4train/graph_data_vp1run2.csv
There are 231 rows in the video vp2/run1_2018-05-03-14-08-31.kinect_color.
save the graph data of video vp2run1_2018-05-03-14-08-31.kinect_color to data4train/graph_data_vp2run1.csv
There are 337 rows in the video vp2/run2_2018-05-24-17-22-26.kinect_color.
save the graph data of video vp2run2_2018-05-24-17-22-26.kinect_color to data4train/graph_data_vp2run2.csv
There are 305 rows in the video vp3/run1b_2018-05-08-08-46-01.kinect_color.
save the graph data of video vp3run1b_2018-05-08-08-46-01.kinect_color to data4train/graph_data_vp3run1.csv
There are 222 rows in the video vp3/run2_2018-05-29-16-03-37.kinect_color.
No matching

here is the command to search for the dictionary of specific graph

In [8]:
# with open("combined_nodes.json", "r") as f:
#     combined_nodes = combined_nodes.load(f)
# with open("combined_features.json", "r") as f:
#     combined_nodes = combined_features.load(f)

inquired_file_id =  "vp2run2"

print(f"the nodes dictionary of video {inquired_file_id} is:")
print(combined_nodes[inquired_file_id])
print(f"the features dictionary of video {inquired_file_id} is:")
print(combined_features[inquired_file_id])


the nodes dictionary of video vp2run2 is:
{'participant': 0, 'seatbelt': 1, 'gearstick': 2, 'automation_button': 3, 'multimedia_display': 4, 'newspaper': 5, 'writing_pad': 6, 'pen': 7, 'magazine': 8, 'jacket': 9, 'food': 10, 'bottle': 11, 'laptop': 12, 'phone': 13}
the features dictionary of video vp2run2 is:
{'front_area': 0.4629080118694362, 'no_location': 0.20474777448071216, 'lap': 0.08902077151335312, 'steering_wheel': 0.08605341246290801, 'head': 0.05637982195845697, 'codriver_seat': 0.04747774480712166, 'center_console_back': 0.020771513353115726, 'trouser_pocket': 0.017804154302670624, 'right_backseat': 0.01483679525222552}


here we would also generate a graph with all the participants as comparasion

In [11]:
start_time = time.time()
nodes_participants = {}
nodes_objects = {}
features = {}
# for count, file_id in enumerate(file_ids):

num_rows = df.shape[0]
print(f"There are {num_rows} rows here.")

# # set name for this graph data
# file_id = file_id.replace('/', '')
# nameid = file_id[:8].replace('b', '').replace('_', '')

# NOTE: Node dictionary
# collection of objects dictionary
nodes_objects = df['object'].unique()
nodes_obj_dict = {object: index + 100 for index, object in enumerate(nodes_objects)}
print(nodes_obj_dict)


#NOTE: feature dictionary
# collection of locations dictionary
locations = df['location'].unique()
location_counts = df['location'].value_counts()
location_counts_dict = {location: count for location, count in (location_counts/num_rows).items()}
print(location_counts_dict)


data = {'u': [], 'v': [], 'timestamp': [],'state_label':[],'feature':[]}
graph_df = pd.DataFrame(data)
for index, row in df.iterrows():

    if row['object'] != 'no_object' :
        
        frame_start = row['frame_start']
        frame_end = row['frame_end']

        for timestamp in range(frame_start, frame_end+1):
            new_row = {'u': row['participant_id'], 'v': nodes_obj_dict[str(row['object'])], 'timestamp':timestamp,'state_label':0,'feature':location_counts_dict[row['location']]}
            filtered_tasks = df_task[(df_task['frame_start'] <= timestamp) & (df_task['frame_end'] >= timestamp)]['state'].values
            if len(filtered_tasks) > 0:
                state_label = filtered_tasks[0]
                new_row['state_label'] = state_label
            else:
                new_row['state_label'] = state_label
                print(f"No matching task for timestamp {timestamp},use the former one.")
            graph_df.loc[len(graph_df)] = new_row

    
    pass
directory = 'data4train'
 
graph_df = graph_df.sort_values('timestamp')
    
graph_df.to_csv(f'{directory}/graph_data_combined.csv', index=False)

display(graph_df)

print(f"save the graph data of all participants to {directory}/graph_data_combined.csv")

    
    
# Convert dictionaries to JSON strings
with open('dict/nodes.json', 'w') as f:
    f.write(json.dumps(nodes_obj_dict))

with open('dict/features.json', 'w') as f:
    f.write(json.dumps(location_counts_dict))
print("save the combined nodes and features dictionary to dict/nodes.txt and dict/features.txt")

end_time = time.time()
time_cost = end_time - start_time
print(f"Time cost: {time_cost} seconds")
print('done')

There are 8717 rows here.
{'seatbelt': 100, 'multimedia_display': 101, 'automation_button': 102, 'laptop': 103, 'phone': 104, 'jacket': 105, 'bottle': 106, 'food': 107, 'glasses_case': 108, 'glasses': 109, 'newspaper': 110, 'writing_pad': 111, 'magazine': 112, 'gearstick': 113, 'pen': 114, 'backpack': 115}
{'front_area': 0.37765286222324196, 'no_location': 0.2258804634622003, 'codriver_seat': 0.09292187679247448, 'steering_wheel': 0.07491109326603189, 'lap': 0.06103017093036595, 'head': 0.055408970976253295, 'center_console_back': 0.033038889526213144, 'codriver_footwell': 0.020075714121830904, 'center_console_front': 0.018584375358494894, 'right_backseat': 0.017207754961569347, 'driver_door': 0.00986577951129976, 'left_backseat': 0.005506481587702191, 'codriver_door': 0.005047608122060342, 'trouser_pocket': 0.0028679591602615577}
No matching task for timestamp 2981,use the former one.
No matching task for timestamp 2982,use the former one.
No matching task for timestamp 2983,use the f

Unnamed: 0,u,v,timestamp,state_label,feature
198891,12,100,106,1,0.22588
198892,12,100,107,1,0.22588
198893,12,100,108,1,0.22588
198894,12,100,109,1,0.22588
198895,12,100,110,1,0.22588
...,...,...,...,...,...
68360,4,100,33081,1,0.22588
68361,4,100,33082,1,0.22588
68362,4,100,33083,1,0.22588
68363,4,100,33084,1,0.22588


save the graph data of all participants to data4train/graph_data_combined.csv
save the combined nodes and features dictionary to dict/nodes.txt and dict/features.txt
Time cost: 1067.8795053958893 seconds
done
