# Video 3D Tabletop Task Generation
In this notebook, we show how to generate 3D tabletop VideoQA test cases in TaskVerse using `WhatMovementVideoGridTaskGenerator` and `WhatRotationVideoGridTaskGenerator`.

## Requirements

To run this notebook, add the repository root (`../..` relative to this notebook) to the Python package search path by running the following cell:

In [1]:
import sys
sys.path.append("../..")

## Download TaskVerse source data from Hugging Face
The source metadata of TaskVerse is available on [Huggingface](https://huggingface.co/datasets/weikaih/Dynabench). You can download it by uncommenting and running the following cell:

In [2]:
# from huggingface_hub import snapshot_download

# path = "../dynabench-source"
# snapshot_download(repo_id="weikaih/dynabench-source", repo_type="dataset", local_dir=path)

## Using TaskGenerator to enumerate task plans and store them into local parquet file

* `TaskGenerator` is a class that enumerates all possible task plans from the source metadata and generates VQA test cases from those plans (e.g., `WhatMovementVideoGridTaskGenerator`).
* `TaskStore` is a class that stores the task plans produced by a task generator and can output them as a pandas dataframe.
  
In this notebook, instead of a single type of generator, we use multiple generators to produce test cases, so we introduce a new class, `JointTaskGenerator`:
* `JointTaskGenerator` assembles multiple `TaskGenerator`s into a single joint task generator.

After downloading the source data, we first create an `ObjaverseVideoMetaData` object to load the Objaverse metadata. We then build a dict that maps task-type names to `TaskGenerator` classes and pass it to `JointTaskGenerator`. The joint generator exposes the same interface as a single task generator, but it generates tasks from every generator in the dict.
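To make the joint-generator idea concrete, here is a minimal, self-contained sketch. It is not `tma`'s implementation: `ToyGenerator`, `ToyRotateGenerator`, and `ToyJointGenerator` are hypothetical stand-ins, and the metadata is just a list of category QIDs. It mirrors the notebook's usage (a dict of name to generator class, instantiated with shared metadata), tags each plan with its dict key as the `task type`, and counts plans per key in a `stats` dict, similar to the `stats` attribute shown later.

```python
from typing import Dict, Iterator, List, Type


class ToyGenerator:
    """Hypothetical single generator: one 'move' plan per metadata item."""

    def __init__(self, metadata: List[str]):
        self.metadata = metadata

    def enumerate_task_plans(self) -> Iterator[dict]:
        for item in self.metadata:
            yield {"target category": item}


class ToyRotateGenerator(ToyGenerator):
    """Hypothetical second generator: two plans (one per direction) per item."""

    def enumerate_task_plans(self) -> Iterator[dict]:
        for item in self.metadata:
            for direction in ("clockwise", "counterclockwise"):
                yield {"target category": item, "rotating direction": direction}


class ToyJointGenerator:
    """Illustrative joint generator: instantiates each class with the shared
    metadata, tags every plan with its dict key, and counts plans per key."""

    def __init__(self, metadata: List[str], generators: Dict[str, Type[ToyGenerator]]):
        self.generators = {name: cls(metadata) for name, cls in generators.items()}
        self.stats: Dict[str, int] = {}

    def enumerate_task_plans(self, sink: List[dict]) -> None:
        for name, gen in self.generators.items():
            count = 0
            for plan in gen.enumerate_task_plans():
                sink.append({"task type": name, **plan})
                count += 1
            self.stats[name] = count


plans: List[dict] = []
joint = ToyJointGenerator(
    ["Q11422", "Q36539"],
    {"toy move": ToyGenerator, "toy rotate": ToyRotateGenerator},
)
joint.enumerate_task_plans(plans)
print(joint.stats)  # {'toy move': 2, 'toy rotate': 4}
```

Because the joint object exposes the same `enumerate_task_plans` method as a single generator, calling code does not need to know how many generators sit behind it.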

In [3]:
from tma.videoqa.tabletop_3d import *
from tma.base import JointTaskGenerator
from tma.videoqa.metadata import ObjaverseVideoMetaData


# replace these placeholders with your local Blender executable and 3D asset directory
blender_path = '/your_path/blender-4.0.1-linux-x64/blender'
assets_path = '/your_path/TaskMeAnything-v1-source/3d_assets'
metadata = ObjaverseVideoMetaData('../../annotations', blender_path=blender_path, assets_path=assets_path)

generators = {
    'what move video': WhatMovementVideoGridTaskGenerator,
    'what rotate video': WhatRotationVideoGridTaskGenerator
}
generator = JointTaskGenerator(metadata, generators)

Once the metadata is passed to the task generator, we can enumerate all possible task plans with the `generator.enumerate_task_plans()` method. We also need to initialize a `TaskStore` to hold the task plans. When `TaskStore` is initialized with an `output_file`, it writes the task plans to a local parquet file instead of keeping them all in memory. After enumeration, call `task_store.close()` to make sure the parquet writer is properly closed.

Note that `buffer_size` is the maximum number of task plans held in memory before they are written to the parquet file on disk. If your memory is limited, set a smaller `buffer_size` to avoid running out of memory. Typically, `1e6` is a good choice for a server with less than 32GB of memory.
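The buffering behavior described above can be illustrated with a small sketch. `ToyBufferedStore` is hypothetical and is not `TaskStore`'s actual code; the `chunks` list stands in for data already flushed to disk. It shows two points: memory use is bounded by `buffer_size` rows, and a final `close()` is needed so the last partially filled buffer is not lost.

```python
from typing import List


class ToyBufferedStore:
    """Hypothetical buffered store: holds at most `buffer_size` rows in memory
    and flushes them to `chunks` (a stand-in for writes to a parquet file)."""

    def __init__(self, buffer_size: float):
        self.buffer_size = int(buffer_size)  # accept floats such as 1e6
        self.buffer: List[dict] = []
        self.chunks: List[List[dict]] = []

    def add(self, row: dict) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.buffer_size:
            self._flush()

    def _flush(self) -> None:
        if self.buffer:
            self.chunks.append(self.buffer)
            self.buffer = []

    def close(self) -> None:
        # Without this final flush, rows in the last partial buffer would be lost.
        self._flush()


store = ToyBufferedStore(buffer_size=3)
for i in range(7):
    store.add({"plan id": i})
store.close()
print([len(c) for c in store.chunks])  # [3, 3, 1]
```

A smaller `buffer_size` means more, smaller flushes: lower peak memory at the cost of more write operations.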


In [4]:
from tma.task_store import TaskStore

save_path = '../cache/video_3d_tabletop.parquet'  # path to save task plans to a parquet
task_store = TaskStore(output_file=save_path, schema=generator.schema, buffer_size=1e6)
generator.enumerate_task_plans(task_store)
task_store.close()

Writing to ../cache/video_3d_tabletop.parquet


enumerating [what move video] task: 100%|██████████| 465/465 [00:01<00:00, 248.33it/s]


Generated [78752] what move video tasks


enumerating [what rotate video] task: 100%|██████████| 465/465 [00:01<00:00, 270.21it/s]


Generated [39376] what rotate video tasks


The `JointTaskGenerator` also maintains a `stats` attribute, which is a dictionary that records the number of tasks generated by each task generator.

In [5]:
generator.stats

{'what move video': 78752, 'what rotate video': 39376}

## Sample task plans from local parquet file and generate VQA test cases

We can load the task plans from the local parquet file into a pandas dataframe with `pq.read_table(save_path, filters=f).to_pandas().astype(get_pd_schema(generator.schema))`. After that, we can use the `generator.generate()` method to generate VQA test cases from each task plan.
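The `filters` argument of `pyarrow.parquet.read_table` uses disjunctive normal form: a flat list of `(column, op, value)` tuples is a single conjunction (AND), while a list of such lists is a disjunction (OR) of conjunctions. The sketch below is a hypothetical pure-Python evaluator for that format on a single row. It covers only the comparison operators (not `in` / `not in`) and is meant to explain the semantics, not to replace pyarrow's filtering.

```python
import operator
from typing import Any, Dict, List, Tuple, Union

Predicate = Tuple[str, str, Any]

# Comparison operators accepted in pyarrow-style filter tuples.
_OPS = {"=": operator.eq, "==": operator.eq, "!=": operator.ne,
        "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def matches(row: Dict[str, Any],
            filters: Union[List[Predicate], List[List[Predicate]]]) -> bool:
    """Return True if `row` satisfies a DNF filter.

    A flat list of tuples is treated as one AND group; a list of lists is
    an OR over AND groups.
    """
    if filters and isinstance(filters[0], tuple):
        filters = [filters]  # flat list -> single conjunction
    return any(all(_OPS[op](row[col], val) for col, op, val in group)
               for group in filters)


row = {"task type": "what move video", "attribute type": "shape"}
print(matches(row, [("task type", "==", "what move video")]))  # True
```

This is why the notebook reads one task type per iteration with `filters=[('task type', '==', t)]`; passing all of `filters` at once as a list of lists would instead load the union of both task types in one table.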

In [6]:
import pandas as pd
import pyarrow.parquet as pq
from tma.task_store import get_pd_schema
from tqdm import tqdm
from pprint import pprint  # use pprint to print the task
from IPython.display import Video, display
import os

types = ['what move video', 'what rotate video']  # select the types of task plans you want
filters = [[('task type', '==', t)] for t in types]  # one filter per task type

pool = []
sample_num = 100  # number of task plans to sample per task type
idx = 0  # index of the task plan to render as an example
save_path = '../cache/video_3d_tabletop.parquet'
for f in tqdm(filters):
    # load only the task plans of the current type
    df = pq.read_table(save_path, filters=f).to_pandas().astype(get_pd_schema(generator.schema))
    # drop the <NA> columns of this task plan before generating the test case
    task = generator.generate(df.iloc[idx].dropna().to_dict(), return_data=True)

    # the video is returned as raw bytes; write it to a file so it can be displayed
    binary_video_data = task['video']
    task['video'] = "see the video below"
    pprint(task)
    with open("temp_video.mp4", "wb") as fp:
        fp.write(binary_video_data)

    display(Video("temp_video.mp4", embed=True, width=600))

    # keep a random sample of task plans for inspection
    if len(df) > sample_num:
        df = df.sample(sample_num)
    pool.append(df)

# clean up the temporary video file
if os.path.exists("temp_video.mp4"):
    os.remove("temp_video.mp4")

  0%|          | 0/2 [00:00<?, ?it/s]

{'answer': 'vacuum cleaner',
 'options': ['vacuum cleaner', 'pinecone', 'beverage', 'mobility aid'],
 'question': 'What is the object that is moving up in the video?',
 'task_plan': 'task type: what move video\n'
              'grid number: 2\n'
              'target category: vacuum cleaner\n'
              'absolute position: back left\n'
              'attribute type: shape\n'
              'attribute value: cylinder\n'
              'moving direction: up\n'
              'are other objects moving: Yes',
 'video': 'see the video below',
 'video_metadata': {'VIDEO_H': 224,
                    'VIDEO_W': 224,
                    'blender_config': {'fill_light_horizontal_angle': -0.8791728494759601,
                                       'fill_light_vertical_angle': -0.23582152562881042,
                                       'hdri_path': '/mnt/sdb1/dynabench-source/3d_assets/hdri/evening_road_01_puresky_4k.exr',
                                       'key_light_horizontal_angle': 0.66

 50%|█████     | 1/2 [01:00<01:00, 60.58s/it]

{'answer': 'vacuum cleaner',
 'options': ['vacuum cleaner', 'pinecone', 'beverage', 'mobility aid'],
 'question': 'What is the object that is rotating clockwise in the video?',
 'task_plan': 'task type: what rotate video\n'
              'grid number: 2\n'
              'target category: vacuum cleaner\n'
              'absolute position: back left\n'
              'attribute type: shape\n'
              'attribute value: cylinder\n'
              'rotating direction: clockwise\n'
              'are other objects rotating: Yes',
 'video': 'see the video below',
 'video_metadata': {'VIDEO_H': 224,
                    'VIDEO_W': 224,
                    'blender_config': {'fill_light_horizontal_angle': -1.028051390944157,
                                       'fill_light_vertical_angle': -0.4115823077845404,
                                       'hdri_path': '/mnt/sdb1/dynabench-source/3d_assets/hdri/st_peters_square_night_4k.exr',
                                       'key_light_hori

100%|██████████| 2/2 [02:07<00:00, 64.00s/it]


Above are two VideoQA test cases, one from each type of generator.

The dataframe below shows the task plans we sampled from the parquet file:

Values starting with "Q" in the `target category` column of the returned dataframe are category QIDs, each corresponding to a Wikidata entry (e.g., `Q11422` corresponds to https://www.wikidata.org/wiki/Q11422).
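To look up a category, a QID can be turned into its Wikidata page URL using the pattern given above. The helper below is a hypothetical convenience function, not part of `tma`; it assumes QIDs are exactly "Q" followed by digits and rejects anything else, such as a plain category name.

```python
import re

# Assumed QID format: the letter Q followed by one or more digits.
QID_PATTERN = re.compile(r"Q\d+")


def wikidata_url(value: str) -> str:
    """Map a QID such as 'Q11422' to its Wikidata entity page; reject non-QIDs."""
    if not QID_PATTERN.fullmatch(value):
        raise ValueError(f"not a Wikidata QID: {value!r}")
    return f"https://www.wikidata.org/wiki/{value}"


print(wikidata_url("Q11422"))  # https://www.wikidata.org/wiki/Q11422
```

Applied to the sampled dataframe, something like `df['target category'].map(wikidata_url)` would add a clickable URL per row.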

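Note that different task types have different schemas. For example, "what move video" tasks involve no rotation, so their `rotating direction` and `are other objects rotating` columns are empty (`<NA>`); likewise, "what rotate video" tasks leave the `moving direction` and `are other objects moving` columns empty.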
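One way to see which columns each task type actually uses is to collect, per task type, the columns holding a non-null value. The sketch below does this in plain Python on two rows copied (and trimmed) from the sampled dataframe, with `None` standing in for `<NA>`; `columns_used` is a hypothetical helper. With pandas, a comparable result could be obtained by calling `dropna(axis=1, how="all")` on each task type's slice of the dataframe.

```python
from typing import Dict, List, Optional, Set

# Two rows from the sampled dataframe (subset of columns); None marks <NA>.
rows: List[Dict[str, Optional[str]]] = [
    {"task type": "what move video", "target category": "Q18696213",
     "moving direction": "up", "are other objects moving": "Yes",
     "rotating direction": None, "are other objects rotating": None},
    {"task type": "what rotate video", "target category": "Q36539",
     "moving direction": None, "are other objects moving": None,
     "rotating direction": "counterclockwise", "are other objects rotating": "Yes"},
]


def columns_used(rows: List[Dict[str, Optional[str]]]) -> Dict[str, Set[str]]:
    """For each task type, collect the columns that are non-null in some row."""
    used: Dict[str, Set[str]] = {}
    for row in rows:
        cols = used.setdefault(row["task type"], set())
        cols.update(col for col, val in row.items() if val is not None)
    return used


for task_type, cols in columns_used(rows).items():
    print(task_type, sorted(cols))
```

This is also why the sampling cell calls `.dropna()` on a task plan before passing it to `generator.generate()`: the empty columns belong to the other task type's schema.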

In [7]:
df = pd.concat(pool)
df

| | task type | grid number | target category | absolute position | attribute type | attribute value | moving direction | are other objects moving | rotating direction | are other objects rotating |
|---|---|---|---|---|---|---|---|---|---|---|
| 32568 | what move video | 2 | Q18696213 | front right | shape | sphere | up | Yes | | |
| 50069 | what move video | 2 | Q3067542 | front left | material | leather | right | No | | |
| 1301 | what move video | 2 | Q106106 | front left | material | wood | right | No | | |
| 30536 | what move video | 2 | Q1798603 | back right | material | wood | up | Yes | | |
| 30180 | what move video | 2 | Q1780509 | back left | color | green | right | Yes | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 26082 | what rotate video | 2 | Q36539 | back left | material | metal | | | counterclockwise | Yes |
| 29796 | what rotate video | 2 | Q4498085 | back right | shape | torus | | | clockwise | Yes |
| 37913 | what rotate video | 2 | Q93189 | front left | shape | ellipsoid | | | clockwise | No |
| 4928 | what rotate video | 2 | Q123691907 | back left | color | gray | | | clockwise | Yes |
