
Towards Holistic Surgical Scene Understanding

Natalia Valderrama1, Paola Ruiz Puentes1*, Isabela Hernández1*, Nicolás Ayobi1, Mathilde Verlyck1, Jessica Santander2, Juan Caicedo2, Nicolás Fernández3,4, Pablo Arbeláez1

*Equal contribution.
1 Center for Research and Formation in Artificial Intelligence (CINFONIA), Universidad de los Andes, Bogotá 111711, Colombia.
2 Fundación Santafé de Bogotá, Bogotá, Colombia
3 Seattle Children’s Hospital, Seattle, USA
4 University of Washington, Seattle, USA

  • Oral presentation and best paper nominee at Medical Image Computing and Computer Assisted Intervention (MICCAI) 2022. Proceedings available at Springer Link
  • Preprint available at arXiv.

Visit the project on our website and our YouTube channel.


We present a new experimental framework towards holistic surgical scene understanding. First, we introduce the Phase, Step, Instrument, and Atomic Visual Action recognition (PSI-AVA) Dataset. PSI-AVA includes annotations for both long-term (Phase and Step recognition) and short-term reasoning (Instrument detection and novel Atomic Action recognition) in robot-assisted radical prostatectomy videos. Second, we present Transformers for Action, Phase, Instrument, and steps Recognition (TAPIR) as a strong baseline for surgical scene understanding. TAPIR leverages our dataset’s multi-level annotations as it benefits from the learned representation on the instrument detection task to improve its classification capacity. Our experimental results in both PSI-AVA and other publicly available databases demonstrate the adequacy of our framework to spur future research on holistic surgical scene understanding.

This repository provides instructions to download the PSI-AVA dataset and to run the PyTorch implementation of TAPIR, both presented in the paper Towards Holistic Surgical Scene Understanding, an oral presentation at MICCAI 2022.

GraSP dataset and TAPIS

Check out GraSP, an extended version of our PSI-AVA dataset that provides surgical instrument segmentation annotations and more data. Also check out TAPIS, the improved version of our method. GraSP and TAPIS are presented in this arXiv paper.

PSI-AVA

At this link, you will find the sampled frames of the original radical prostatectomy surgical videos and the annotations that compose the Phases, Steps, Instruments, and Atomic Visual Actions recognition dataset. You will also find the preprocessed data we used to train TAPIR, the instrument detector's predictions, and the trained model weights for each task. The data at the link is organized as follows.

PSI-AVA:
|
|_TAPIR_trained_models
|      |_ACTIONS
|      |    |_Fold1
|      |    |   |_checkpoint_best_actions.pyth
|      |    |_Fold2
|      |        |_checkpoint_best_actions.pyth
|      |_INSTRUMENTS
|      |    ...
|      |_PHASES
|      |    ...
|      |_STEPS
|           ...
|
|_def_DETR_box_ftrs
|     |_fold1
|     |   |_train
|     |   |   |_box_features.pth
|     |   |_val
|     |       |_box_features.pth
|     |_fold2
|         ...
|
|_images_8_frames_per_second
|       |_keyframes
|       |     |_CASE001
|       |     |    |_000000.jpg
|       |     |    |_000006.jpg
|       |     |    |_000011.jpg
|       |     |    ...
|       |     |_CASE002
|       |     |    ...
|       |     ...
|       |_RobotSegSantaFe_v3_dense.json 
|       |_RobotSegSantaFe_v3_dense_fold1.json
|       |_RobotSegSantaFe_v3_dense_fold2.json 
|
|_keyframes
        |_CASE001
        |     |_00000.jpg
        |     |_00001.jpg
        |     |_00002.jpg
        |     ...
        |_CASE002
        |     ...
          ...

We recommend downloading the data recursively with the following command:

$ wget -r http://157.253.243.19/PSI-AVA

You will find PSI-AVA's data partitions and annotations in the outputs/data_annotations directory.
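
As a quick sanity check after downloading, you can inspect one of the precomputed feature files. This is a minimal sketch of ours, assuming the .pth files are standard PyTorch-serialized objects; the internal structure of the contents is an assumption, so the snippet only prints what it finds (adjust the path to where you downloaded the data).

import torch

# Load one of the downloaded Deformable DETR feature files on CPU.
# The path follows the PSI-AVA organization shown above.
features = torch.load(
    "PSI-AVA/def_DETR_box_ftrs/fold1/train/box_features.pth",
    map_location="cpu",
)
print(type(features))
if isinstance(features, dict):
    # Peek at a few entries to see how features are indexed (assumption).
    for key in list(features)[:5]:
        value = features[key]
        print(key, getattr(value, "shape", type(value)))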

TAPIR


Installation

Please follow these steps to run TAPIR:

$ conda create --name tapir python=3.8 -y
$ conda activate tapir
$ conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia

$ conda install av -c conda-forge
$ pip install -U iopath
$ pip install -U opencv-python
$ pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
$ pip install 'git+https://github.com/facebookresearch/fvcore'
$ pip install 'git+https://github.com/facebookresearch/fairscale'
$ python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

$ git clone https://github.com/BCV-Uniandes/TAPIR
$ cd TAPIR
$ pip install -r requirements.txt
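
Optionally, you can confirm that the main dependencies resolved correctly. This quick check is ours, not part of the official setup:

# Verify that the versions installed above import and report as expected.
import torch, torchvision, cv2, detectron2

print("torch:", torch.__version__)               # expected 1.9.0
print("torchvision:", torchvision.__version__)   # expected 0.10.0
print("OpenCV:", cv2.__version__)
print("detectron2:", detectron2.__version__)
print("CUDA available:", torch.cuda.is_available())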

Our code builds upon Multiscale Vision Transformers [1]. For more information, please refer to this work.

Preparing data

Download the "keyframes" folder in PSI-AVA in the repository's folder ./outputs/PSIAVA/

PSI-AVA/keyframes/* ===> ./outputs/PSIAVA/keyframes/

Download the instrument features computed by Deformable DETR from the "def_DETR_box_ftrs" folder in PSI-AVA as follows:

PSI-AVA/def_DETR_box_ftrs/fold1/* ===> ./outputs/data_annotations/psi-ava/fold1/*

PSI-AVA/def_DETR_box_ftrs/fold2/* ===> ./outputs/data_annotations/psi-ava/fold2/*

In the end, the outputs directory must have the following structure.

outputs
|_data_annotations
|      |_psi-ava
|      |     |_fold1
|      |     |    |_annotations
|      |     |    |    ...
|      |     |    |_coco_anns
|      |     |    |    ...
|      |     |    |_frame_lists
|      |     |    |    ...
|      |     |    |_train
|      |     |    |    |_box_features.pth
|      |     |    |_val
|      |     |         |_box_features.pth
|      |     |_fold2
|      |          ...
|      |_psi-ava_extended
|            ...
|_PSIAVA
       |_keyframes 
               |_CASE001
               |      |_00000.jpg
               |      |_00001.jpg
               |      ...
               |_CASE002
                      ...
               ...
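
Before training, it can help to confirm that everything landed in the right place. The following small helper is ours (not part of the repository) and simply checks a few of the paths from the tree above:

from pathlib import Path

# Paths taken from the expected outputs structure above.
expected = [
    "outputs/data_annotations/psi-ava/fold1/train/box_features.pth",
    "outputs/data_annotations/psi-ava/fold1/val/box_features.pth",
    "outputs/data_annotations/psi-ava/fold2/train/box_features.pth",
    "outputs/data_annotations/psi-ava/fold2/val/box_features.pth",
    "outputs/PSIAVA/keyframes/CASE001",
]

for path in expected:
    status = "ok" if Path(path).exists() else "MISSING"
    print(f"{status:8s} {path}")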

Running the code

First, add this repository to your $PYTHONPATH:

$ export PYTHONPATH=/path/to/TAPIR/slowfast:$PYTHONPATH

To train TAPIR on:

# the Instrument detection or Atomic Action recognition task
$ bash run_examples/mvit_short_term.sh

# the Phases or Steps recognition task
$ bash run_examples/mvit_long_term.sh

Evaluating models

Task         mAP             config    run file      model
Phases       56.55 ± 2.31    PHASES    long_term     phases
Steps        45.56 ± 0.004   STEPS     long_term     steps
Instruments  80.85 ± 1.54    TOOLS     short_term    tools
Actions      28.68 ± 1.33    ACTIONS   short_term    actions

Download our pretrained models from the PSI-AVA link above.

Add the checkpoint path to the run_examples/mvit_*.sh file corresponding to the task you want to evaluate, and enable testing by setting TEST.ENABLE True in the config.
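
If you want to inspect a downloaded checkpoint before evaluation, here is a hedged sketch of ours; the "model_state" key is an assumption based on common PySlowFast checkpoint conventions, so adjust it if the keys differ:

import torch

# Load a trained checkpoint on CPU; the path follows the PSI-AVA
# organization shown above ("TAPIR_trained_models/ACTIONS/Fold1/...").
ckpt = torch.load(
    "PSI-AVA/TAPIR_trained_models/ACTIONS/Fold1/checkpoint_best_actions.pyth",
    map_location="cpu",
)
if isinstance(ckpt, dict):
    print("checkpoint keys:", list(ckpt.keys()))
    # PySlowFast-style checkpoints usually store weights under "model_state"
    # (assumption); fall back to the full object otherwise.
    state = ckpt.get("model_state", ckpt)
    print("number of parameter tensors:", len(state))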

Contact

If you have any doubts, questions, issues, corrections, or comments, please email n.ayobi@uniandes.edu.co.

Citing TAPIR

If you use PSI-AVA or TAPIR (or their extended versions, GraSP and TAPIS) in your research, please include the following BibTeX citations in your papers.

@article{ayobi2024pixelwise,
      title={Pixel-Wise Recognition for Holistic Surgical Scene Understanding}, 
      author={Nicolás Ayobi and Santiago Rodríguez and Alejandra Pérez and Isabela Hernández and Nicolás Aparicio and Eugénie Dessevres and Sebastián Peña and Jessica Santander and Juan Ignacio Caicedo and Nicolás Fernández and Pablo Arbeláez},
      year={2024},
      url={https://arxiv.org/abs/2401.11174},
      eprint={2401.11174},
      journal={arXiv},
      primaryClass={cs.CV}
}

@InProceedings{valderrama2020tapir,
      author={Natalia Valderrama and Paola Ruiz and Isabela Hern{\'a}ndez and Nicol{\'a}s Ayobi and Mathilde Verlyck and Jessica Santander and Juan Caicedo and Nicol{\'a}s Fern{\'a}ndez and Pablo Arbel{\'a}ez},
      title={Towards Holistic Surgical Scene Understanding},
      booktitle={Medical Image Computing and Computer Assisted Intervention -- MICCAI 2022},
      year={2022},
      publisher={Springer Nature Switzerland},
      address={Cham},
      pages={442--452},
      isbn={978-3-031-16449-1}
}

References

[1] H. Fan, Y. Li, B. Xiong, W.-Y. Lo, C. Feichtenhofer, ‘PySlowFast’, 2020. https://github.com/facebookresearch/slowfast.
