
Pixel-wise Recognition for Holistic Surgical Scene Understanding

Nicolás Ayobi1, Santiago Rodríguez1*, Alejandra Pérez1*, Isabela Hernández1*, Nicolás Aparicio1, Eugénie Dessevres1, Sebastián Peña2, Jessica Santander2, Juan Ignacio Caicedo2, Nicolás Fernández3,4, Pablo Arbeláez1

*Equal contribution.
1 Center for Research and Formation in Artificial Intelligence (CinfonIA), Universidad de los Andes, Bogotá 111711, Colombia.
2 Fundación Santafé de Bogotá, Bogotá, Colombia
3 Seattle Children’s Hospital, Seattle, USA
4 University of Washington, Seattle, USA

Preprint available at arXiv
Visit the project on our website.


We present the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies (GraSP) dataset, a curated benchmark that models surgical scene understanding as a hierarchy of complementary tasks with varying levels of granularity. Our approach enables a multi-level comprehension of surgical activities, encompassing long-term tasks such as surgical phase and step recognition and short-term tasks including surgical instrument segmentation and atomic visual action detection. To exploit our proposed benchmark, we introduce the Transformers for Actions, Phases, Steps, and Instrument Segmentation (TAPIS) model, a general architecture that combines a global video feature extractor with localized region proposals from an instrument segmentation model to tackle the multi-granularity of our benchmark. Through extensive experimentation, we demonstrate the impact of including segmentation annotations in short-term recognition tasks, highlight the varying granularity requirements of each task, and establish TAPIS's superiority over previously proposed baselines and conventional CNN-based models. Additionally, we validate the robustness of our method across multiple public benchmarks, confirming the reliability and applicability of our dataset. This work represents a significant step forward in Endoscopic Vision, offering a novel and comprehensive framework for future research towards a holistic understanding of surgical procedures.

This repository provides instructions to download the GraSP dataset and run the PyTorch implementation of TAPIS, both presented in the paper Pixel-Wise Recognition for Holistic Surgical Scene Understanding.

Previous works

This work is an extended and consolidated version of three previous works:

Please check these works.

GraSP

At this link, you will find the sampled frames of the original Radical Prostatectomy videos and the annotations that compose the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies (GraSP) dataset. The data is organized as follows:

GraSP:
|
|--GraSP_1fps
|         |---frames
|         |    |---CASE001
|         |    |    |--00000.jpg
|         |    |    |--00001.jpg
|         |    |    |--00002.jpg
|         |    |    ...
|         |    |---CASE002
|         |    |    ...
|         |    ...
|         |
|         |---original_frames
|         |    |---CASE001
|         |    |    |--00000.jpg
|         |    |    |--00001.jpg
|         |    |    |--00002.jpg
|         |    |    ...
|         |    |---CASE002
|         |    |    ...
|         |    ...
|         |
|         |---annotations
|              |--segmentations
|              |    |---CASE001
|              |    |    |--00000.png
|              |    |    |--00001.png
|              |    |    |--00002.png
|              |    |    ...
|              |    |---CASE002
|              |    |    ...
|              |    ...
|              |    
|              |--grasp_long-term_fold1.json
|              |--grasp_long-term_fold2.json
|              |--grasp_long-term_train.json
|              |--grasp_long-term_test.json
|              |--grasp_short-term_fold1.json
|              |--grasp_short-term_fold2.json
|              |--grasp_short-term_train.json
|              |--grasp_short-term_test.json
|
|--GraSP_30fps
|         |---frames
|         |    |---CASE001
|         |    |    |--000000000.jpg
|         |    |    |--000000001.jpg
|         |    |    |--000000002.jpg
|         |    |    ...
|         |    |---CASE002
|         |    |    ...
|         |    ...
|         |
|         |---annotations
|              |--grasp_short-term_fold1.json
|              |--grasp_short-term_fold2.json
|              |--grasp_short-term_train.json
|              |--grasp_short-term_test.json
|
|--1fps_to_30fps_association.json
|--README.txt

In the GraSP_1fps directory, you will find our video frames sampled at 1fps and all the annotations for long-term tasks (surgical phase and step recognition) and short-term tasks (instrument segmentation and atomic action detection). In the GraSP_30fps directory, you will find all the frames of our videos sampled at 30fps and their corresponding annotations for short-term tasks. The link also includes a README file that briefly explains our annotations. We recommend downloading the data recursively with the following command:

$ wget -r http://157.253.243.19/GraSP

If you only need the dataset with the video frames sampled at 1fps or 30fps, you can download the directory of the version that you need:

# 1fps version
$ wget -r http://157.253.243.19/GraSP/GraSP_1fps

# 30fps version
$ wget -r http://157.253.243.19/GraSP/GraSP_30fps
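
Once the data is downloaded, you can take a quick look at its contents. The following is a minimal Python sketch, assuming the short-term annotation JSONs follow a COCO-style layout with "images", "annotations", and "categories" keys and that the segmentation PNGs store one label id per pixel; the exact schema is described in the README.txt included in the download, so adjust the keys accordingly.

import json
from collections import Counter

import numpy as np
from PIL import Image  # pip install pillow

# Paths follow the directory layout shown above.
ANN_FILE = "GraSP/GraSP_1fps/annotations/grasp_short-term_train.json"
MASK_FILE = "GraSP/GraSP_1fps/annotations/segmentations/CASE001/00000.png"

# Load a short-term annotation file (assumed to be COCO-style).
with open(ANN_FILE) as f:
    data = json.load(f)

print("top-level keys:", list(data.keys()))
if "images" in data and "annotations" in data:
    print("annotated frames:", len(data["images"]))
    print("annotations:", len(data["annotations"]))
    categories = {c["id"]: c.get("name", str(c["id"])) for c in data.get("categories", [])}
    counts = Counter(a["category_id"] for a in data["annotations"] if "category_id" in a)
    for cat_id, n in counts.most_common():
        print(f"  {categories.get(cat_id, cat_id)}: {n}")

# Inspect a segmentation mask (assumed to encode one label id per pixel).
mask = np.array(Image.open(MASK_FILE))
print("mask shape:", mask.shape, "label ids present:", np.unique(mask))

If you work with both releases, the 1fps_to_30fps_association.json file at the dataset root relates each 1fps frame to its counterpart in the 30fps release. The sketch below assumes the file is a plain JSON dictionary keyed by case and 1fps frame name; again, check README.txt for the actual structure before relying on it.

import json
import os

ASSOC_FILE = "GraSP/1fps_to_30fps_association.json"
FRAMES_30FPS = "GraSP/GraSP_30fps/frames"

with open(ASSOC_FILE) as f:
    association = json.load(f)

def frame_30fps_path(case, frame_1fps):
    """Return the 30fps frame path associated with a 1fps frame, if listed."""
    frame_30fps = association.get(case, {}).get(frame_1fps)
    if frame_30fps is None:
        return None
    return os.path.join(FRAMES_30FPS, case, frame_30fps)

print(frame_30fps_path("CASE001", "00000.jpg"))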

Go to the TAPIS directory to find our source code and instructions to run our TAPIS model.

Contact

If you have any doubts, questions, issues, corrections, or comments, please email n.ayobi@uniandes.edu.co.

Citing GraSP

If you find GraSP or TAPIS useful for your research, please include the following BibTeX citations in your papers.

@article{ayobi2024pixelwise,
      title={Pixel-Wise Recognition for Holistic Surgical Scene Understanding}, 
      author={Nicolás Ayobi and Santiago Rodríguez and Alejandra Pérez and Isabela Hernández and Nicolás Aparicio and Eugénie Dessevres and Sebastián Peña and Jessica Santander and Juan Ignacio Caicedo and Nicolás Fernández and Pablo Arbeláez},
      year={2024},
      url={https://arxiv.org/abs/2401.11174},
      eprint={2401.11174},
      journal={arXiv},
      primaryClass={cs.CV}
}

@InProceedings{valderrama2020tapir,
      author={Natalia Valderrama and Paola Ruiz and Isabela Hern{\'a}ndez and Nicol{\'a}s Ayobi and Mathilde Verlyck and Jessica Santander and Juan Caicedo and Nicol{\'a}s Fern{\'a}ndez and Pablo Arbel{\'a}ez},
      title={Towards Holistic Surgical Scene Understanding},
      booktitle={Medical Image Computing and Computer Assisted Intervention -- MICCAI 2022},
      year={2022},
      publisher={Springer Nature Switzerland},
      address={Cham},
      pages={442--452},
      isbn={978-3-031-16449-1}
}
