MTRSAP - Multimodal Transformer for Real-time Surgical Activity Recognition and Prediction

This repository provides the code for the ICRA 2024 paper "Multimodal Transformer for Real-time Surgical Activity Recognition and Prediction".

Introduction

MTRSAP is the official implementation of the ICRA 2024 paper "Multimodal Transformer for Real-time Surgical Activity Recognition and Prediction".

Getting Started

Follow the instructions below to set up the code in your environment.

Prerequisites

Anaconda: Make sure to have Anaconda installed on your system. You can download it from Anaconda's official website.
Preprocessed Dataset: Obtain the preprocessed dataset required by this project. Refer to the Usage section for details on acquiring and incorporating the dataset.
Operating System: While the project is designed to be compatible with various operating systems, Ubuntu is the preferred environment.

Installation

Create the conda environment using the environment file:
    conda env create -f environment.yml
Verify that PyTorch was installed correctly.
Place the preprocessed data in the ProcessedData directory.
Verify that the configuration in config.py is as required. Learning parameters are also defined in config.py.
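The PyTorch and data checks in the steps above can be sketched as a small script. This is a minimal sketch rather than part of the repository: the function name check_environment is hypothetical, and the data location is assumed to be a ProcessedData directory relative to where the script is run.

```python
import importlib.util
from pathlib import Path


def check_environment(data_dir="ProcessedData"):
    """Report whether PyTorch is importable and the data directory exists.

    Hypothetical helper: `data_dir` is assumed to be relative to the
    current working directory (the repository root).
    """
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": None,
        "data_dir_exists": Path(data_dir).is_dir(),
    }
    if report["torch_installed"]:
        import torch  # imported only when the package is present

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated conda environment from the repository root. A missing PyTorch install or data directory shows up as False in the printed report.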

Usage

To reproduce the gesture recognition results, run the following command with the original configuration.

python train_recognition.py --model transformer --dataloader v1 --modality 16

Note that the required dataset is not publicly available. Please reach out to the original authors to obtain the data used for this work.

The model parameters and dataloader scripts need to be changed to suit custom datasets. The current dataloader and config are designed for the above dataset.
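The training command above can also be launched from Python, for example when scripting several runs. The following is a minimal sketch: the helper names build_train_command and run_training are hypothetical, and only the flags and values shown in the command above are used. Other accepted values for these flags are not documented here.

```python
import subprocess
import sys


def build_train_command(model="transformer", dataloader="v1", modality=16):
    """Return the argv list for the recognition training script.

    Defaults mirror the command given in this README.
    """
    return [
        sys.executable, "train_recognition.py",
        "--model", str(model),
        "--dataloader", str(dataloader),
        "--modality", str(modality),
    ]


def run_training(**kwargs):
    """Run training from the repository root; raises if the script exits with an error."""
    subprocess.run(build_train_command(**kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(build_train_command()))
```

Using sys.executable keeps the run inside the interpreter of the active conda environment rather than whichever python is first on the PATH.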

Results are written to the results folder, specifically to the following files.

train_results.json : Detailed results for each subject in the leave-one-user-out (LOUO) setup.
Train_{task}_{model}_{date-time}.csv : Final results of the run.
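The two result files listed above can be read back with the standard library. This is a minimal sketch under stated assumptions: the function name load_latest_results is hypothetical, the JSON layout is not documented here so it is returned as parsed, and the CSV is assumed to start with a header row.

```python
import csv
import json
from pathlib import Path


def load_latest_results(results_dir="results"):
    """Load train_results.json and the newest Train_*.csv from results_dir."""
    results_dir = Path(results_dir)

    # Per-subject LOUO results; returned as parsed because the layout is not documented.
    with open(results_dir / "train_results.json") as f:
        per_subject = json.load(f)

    # Pick the most recently modified run summary.
    csv_files = sorted(results_dir.glob("Train_*.csv"), key=lambda p: p.stat().st_mtime)
    if not csv_files:
        raise FileNotFoundError(f"no Train_*.csv files in {results_dir}")

    # Assumption: the first CSV row holds the column names.
    with open(csv_files[-1], newline="") as f:
        final_rows = list(csv.DictReader(f))

    return per_subject, csv_files[-1].name, final_rows
```

Selecting by modification time avoids having to parse the date-time part of the file name, whose exact format is not specified.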

Contributing

Please feel free to improve the model, add features and use this for research purposes.

If you have any questions, please feel free to reach out at the following email addresses: cjh9fw@virginia.edu, ydq9ag@virginia.edu.

License

The code for this project is made available to the public via the MIT License.

Acknowledgements

Special thanks to Colin Lea for providing features for the dataset and inspiring further development in action segmentation.
