Paper link / 2s-DAS: Two-Stream Diffusion with Multi-Modal Fusion for Temporal Action Segmentation
This project is an open-source implementation of 2s-DAS: Two-Stream Diffusion with Multi-Modal Fusion for Temporal Action Segmentation, including the full training code, inference scripts, and the dataset. It aims to provide an efficient and reproducible research framework for temporal action segmentation.
- Python == 3.8
- PyTorch == 1.10
- CUDA == 11.3
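To confirm that a local environment matches the versions listed above, a small check along these lines may help. This is a sketch and not part of the repository: the helper names `version_matches` and `check_environment` are hypothetical, and the comparison only looks at the leading version components.

```python
import sys


def version_matches(actual, expected):
    """Return True when `actual` begins with the dotted components of `expected`.

    Local build suffixes such as "+cu113" are ignored.
    """
    actual_parts = actual.split("+")[0].split(".")
    expected_parts = expected.split(".")
    return actual_parts[: len(expected_parts)] == expected_parts


def check_environment():
    """Compare the running interpreter and PyTorch build with the listed versions.

    Each value is True or False, or None when the component is not available.
    """
    python_version = "{}.{}".format(*sys.version_info[:2])
    report = {
        "python": version_matches(python_version, "3.8"),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        return report
    report["torch"] = version_matches(torch.__version__, "1.10")
    # torch.version.cuda is None for CPU-only builds.
    if torch.version.cuda is not None:
        report["cuda"] = version_matches(torch.version.cuda, "11.3")
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(name, "OK" if ok else "missing or mismatch")
```

Other version combinations may also work; the check only reports deviations from the versions stated in this README.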
The dataset is available at the links above.
Raw video files are needed to extract features. Please download the datasets with RGB videos from the official websites (Breakfast / GTEA / 50Salads) and save them under the folder ./data/(name_dataset).
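Before extracting features, it can be useful to verify that each dataset folder is in place. The sketch below assumes the folders are named `breakfast`, `gtea`, and `50salads`; these names are an assumption, since the README only specifies ./data/(name_dataset), and the helper `find_missing_datasets` is hypothetical.

```python
from pathlib import Path

# Assumed folder names; adjust them to match the names used in your setup.
ASSUMED_DATASET_NAMES = ("breakfast", "gtea", "50salads")


def find_missing_datasets(data_root="./data", names=ASSUMED_DATASET_NAMES):
    """Return the dataset names that have no folder under `data_root`."""
    root = Path(data_root)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = find_missing_datasets()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All dataset folders found.")
```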
Extract features for 50Salads, GTEA, and Breakfast using Br-Prompt and I3D.
You can retrain the model yourself with the following commands:
- Generate config files with `python default_configs.py`.
- Train with `python main_two_stream.py --config configs/some_config.json --device gpu_id`.
- Trained models and logs are saved in the result folder.
- Test with `python eval.py`.
- Test a single model with `python predict.py --config configs/some_config.json --device gpu_id`.
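When several config files have been generated, the training command can be repeated for each of them. The following is a minimal sketch, assuming the generated configs are JSON files in the `configs` folder and that the device is given as an integer GPU id. The helpers `build_command` and `train_all` are hypothetical and not part of the repository; only the script names and the `--config` and `--device` flags come from the commands above.

```python
import subprocess
import sys
from pathlib import Path


def build_command(script, config_path, device):
    """Build the argument list for a script that takes --config and --device."""
    return [sys.executable, script, "--config", str(config_path), "--device", str(device)]


def train_all(config_dir="configs", device=0, dry_run=True):
    """Train once per JSON config in `config_dir`.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = [
        build_command("main_two_stream.py", path, device)
        for path in sorted(Path(config_dir).glob("*.json"))
    ]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the loop if a training run fails.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    train_all(dry_run=True)
```

The same `build_command` helper can be pointed at `predict.py` to test a single model, for example `build_command("predict.py", "configs/some_config.json", 0)`.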
Our model is adapted from DiffAct.