This repository contains code and resources for implementing and evaluating diversity sampling techniques, including Frame Variation Index (FVI) and entropy-based sampling, to optimize dataset selection for AI model training.
- Reduce data annotation costs by selecting representative and informative samples.
- Experiment with FVI, entropy metrics, and hybrid approaches.
- Compare diversity sampling methods with random sampling.
Ultimately, we want to develop a model (one that integrates into SMI-SAMNet) that performs well in surgical scene segmentation with minimal annotated data. Manual annotation of surgical videos is expensive and time-consuming, so we use diversity sampling to select the most informative frames for annotation, producing a smaller but highly representative ground truth.
The input is a dataset of raw video files or image frames from surgical procedures (e.g. dAVF, MVD, EndoVis18). We first:
- Extract the frames: Convert videos into individual frames at a standardized frame rate (e.g. 10 fps), and save the frames as a sequence of images.
- Preprocess frames: Resize frames (e.g. to 224 x 224) and normalize pixel values if needed.
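The two steps above can be sketched as follows. This is a minimal illustration rather than the repository's actual script: it assumes OpenCV (`cv2`) is installed, and the function names `keep_indices` and `extract_and_preprocess`, the PNG output format, and the `frame_000000.png` naming scheme are all hypothetical.

```python
import math
from pathlib import Path
from typing import Iterator


def keep_indices(native_fps: float, target_fps: float) -> Iterator[int]:
    """Yield the source-frame indices to keep when resampling to target_fps."""
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = max(native_fps / target_fps, 1.0)  # never upsample
    position = 0.0
    while True:
        yield math.ceil(position)
        position += step


def extract_and_preprocess(video_path, out_dir, target_fps=10.0, size=(224, 224)):
    """Save resized frames of one video at roughly target_fps; return the count."""
    import cv2  # assumption: OpenCV is available; imported lazily on purpose

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise IOError(f"cannot open {video_path}")

    # Fall back to the target rate if the container reports no frame rate.
    native_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
    wanted = keep_indices(native_fps, target_fps)
    next_wanted = next(wanted)

    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index == next_wanted:
            # size is (width, height), as cv2.resize expects.
            frame = cv2.resize(frame, size, interpolation=cv2.INTER_AREA)
            cv2.imwrite(str(out_dir / f"frame_{saved:06d}.png"), frame)
            saved += 1
            next_wanted = next(wanted)
        index += 1
    cap.release()
    return saved
```

Pixel normalization is left out here because 8-bit image files cannot store normalized floats; in this sketch it would be applied when the frames are loaded for sampling or training.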
Once we have a directory of preprocessed frames ready for sampling, we can begin the diversity sampling process. The objective of diversity sampling is to select a subset of frames from the dataset that represents the diversity and variability of the entire video sequence, reducing the number of frames requiring annotation whilst maximizing coverage of unique surgical scenarios.
There are three techniques that we can explore and deploy, individually or in combination:
- Frame Variation Index (FVI): Compute the difference between consecutive frames to identify those with the most significant visual changes, and use high-FVI frames as annotation candidates.
- Entropy metrics: Use a pretrained model (e.g. SAM) to make predictions on all frames, then calculate the entropy of each prediction to identify the frames where the model is most uncertain. High-entropy frames become annotation candidates.
- Clustering: Apply dimensionality reduction (e.g. UMAP) and clustering (e.g. k-means) to group frames by visual similarity, and sample a representative frame from each cluster.
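As a rough illustration of the first two scores, the sketch below uses NumPy only. It makes several assumptions that are not stated above: FVI is taken to be the mean absolute pixel difference to the previous frame, the entropy score is the Shannon entropy of per-pixel class probabilities averaged over the frame, and the model output is assumed to already be available as a probability array. The function names are hypothetical.

```python
import numpy as np


def frame_variation_index(frames: np.ndarray) -> np.ndarray:
    """Mean absolute pixel difference between each frame and its predecessor.

    frames: array of shape (n_frames, height, width[, channels]).
    The first frame has no predecessor and is given a score of 0.
    """
    frames = frames.astype(np.float32)
    diffs = np.abs(frames[1:] - frames[:-1])
    per_frame = diffs.mean(axis=tuple(range(1, diffs.ndim)))
    return np.concatenate([[0.0], per_frame])


def mean_prediction_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-frame average of the per-pixel Shannon entropy (in nats).

    probs: array of shape (n_frames, n_classes, height, width) whose values
    sum to 1 over the class axis (an assumed output format).
    """
    pixel_entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return pixel_entropy.mean(axis=(1, 2))


def top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring frames, returned in temporal order."""
    order = np.argsort(scores)[::-1][:k]
    return np.sort(order)
```

A hybrid score could be built by rescaling both arrays to a common range and taking a weighted sum before calling `top_k`; the weighting would need to be tuned experimentally.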
Hence we obtain a subset of frames selected for annotation, optimized for diversity and informativeness.
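The clustering technique described above can be sketched in the same spirit. To keep the example dependency-free, it uses a small hand-written Lloyd's k-means in NumPy as a stand-in for a library implementation, and it skips the UMAP step entirely: `features` is assumed to be an array of per-frame embeddings that have already been reduced. The representative of each cluster is taken to be the frame closest to the centroid, which is one possible choice among several.

```python
import numpy as np


def kmeans(features: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    features = np.asarray(features, dtype=np.float64)
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()

    def assign(cents):
        dists = np.linalg.norm(features[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        updated = centroids.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(centroids), centroids


def cluster_representatives(features: np.ndarray, k: int, seed: int = 0) -> list:
    """Index of the frame nearest to each cluster centroid, sorted."""
    features = np.asarray(features, dtype=np.float64)
    labels, centroids = kmeans(features, k, seed=seed)
    chosen = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if len(members) == 0:
            continue
        dists = np.linalg.norm(features[members] - centroids[j], axis=1)
        chosen.append(int(members[dists.argmin()]))
    return sorted(chosen)
```

With `k` set to the annotation budget, this returns at most one frame per cluster, so the selected subset covers visually distinct groups rather than concentrating on one part of the video.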
Once the sampled frames have been annotated to form the ground truth, we train our model (in this case SOLOv2) by loading the pretrained weights and fine-tuning on the annotated frames. We then evaluate the effectiveness of this approach by comparing the performance of the model trained on the diversity-sampled ground truth against one trained on a randomly sampled ground truth.
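For the comparison to be fair, the random baseline should use the same annotation budget as the diversity-sampled set. The sketch below shows one way to draw that baseline and score predictions. The metric is an assumption: the text above does not name one, so mean intersection-over-union (mIoU) on integer label maps is used here purely as an example, and both function names are hypothetical.

```python
import numpy as np


def random_baseline(n_frames: int, budget: int, seed: int = 0) -> np.ndarray:
    """Draw `budget` distinct frame indices uniformly at random, sorted."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_frames, size=budget, replace=False))


def mean_iou(pred: np.ndarray, target: np.ndarray, n_classes: int) -> float:
    """Mean IoU over the classes present in the prediction or the target.

    pred and target are integer label maps of the same shape.
    """
    ious = []
    for c in range(n_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps, so it is skipped
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")
```

Repeating the random draw with several seeds and reporting the spread gives a more reliable baseline than a single random subset.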
- Frame Variation Index (FVI) computation.
- Entropy-based sampling.
- Experimental pipelines for testing sampling methods.
- Integration-ready scripts for deep learning models.
- data/: Example datasets and instructions on data preparation.
- notebooks/: Jupyter notebooks for exploratory analysis and experiments.
- scripts/: Scripts for sampling, utility functions, and processing pipelines.
- models/: Pretrained weights and model training scripts.
- tests/: Unit tests for the implemented algorithms.
- docs/: Detailed documentation of methods and usage.
- Clone the repository:
    git clone https://github.com/yourusername/diversity-sampling-project.git
    cd diversity-sampling-project
- Install the dependencies:
    pip install -r requirements.txt
  If you get a permission error, use:
    pip install --user -r requirements.txt