This repository provides the official PyTorch implementation of the paper "Multi-view Masked Contrastive Representation Learning for Endoscopic Video Analysis".

Install the required packages with the provided environment.yaml:
cd MMCRL
conda env create -f environment.yaml
conda activate MMCRL

We use the datasets provided by Endo-FM and are grateful for their valuable work.
Pre-trained weights:
Downstream weights:
cd MMCRL
wget -P checkpoints/ https://github.com/kahnchana/svt/releases/download/v1.0/kinetics400_vitb_ssl.pth
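Before launching pre-training, it can help to confirm that the download actually produced a non-empty file, since an interrupted `wget` may leave a missing or truncated checkpoint. The following is a minimal sketch, not part of the original repository: the helper name `checkpoint_ready` and the `min_bytes` threshold are hypothetical, and the check only looks at file presence and size, not at whether the weights load correctly.

```python
from pathlib import Path


def checkpoint_ready(path, min_bytes=1):
    """Return True if `path` is a regular file of at least `min_bytes` bytes.

    Hypothetical helper for illustration; it does not validate the contents.
    """
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes


if __name__ == "__main__":
    # Path taken from the wget command above (run from the MMCRL directory).
    ckpt = "checkpoints/kinetics400_vitb_ssl.pth"
    if checkpoint_ready(ckpt):
        print(f"found {ckpt}")
    else:
        print(f"missing or empty: {ckpt}; consider re-running the wget command")
```

A size check like this is cheap and avoids importing PyTorch; loading the file would be a stronger test but depends on the installed environment.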
bash scripts/pretrain.sh

# PolypDiag (Classification)
cd MMCRL
bash scripts/eval_finetune_polypdiag.sh
# CVC (Segmentation)
cd MMCRL/TransUNet
python train.py
# KUMC (Detection)
cd MMCRL/STMT
bash script/train_stft.sh

Our code is based on Endo-FM, DINO, TimeSformer, SVT, TransUNet, and STFT. We thank the authors for releasing their code.

@article{hu2024one,
title={Multi-view Masked Contrastive Representation Learning for Endoscopic Video Analysis},
author={Hu, Kai and Xiao, Ye and Zhang, Yuan and Gao, Xieping},
journal={Advances in Neural Information Processing Systems},
year={2024}
}