Diversity Sampling Project

This repository contains code and resources for implementing and evaluating diversity sampling techniques, including Frame Variation Index (FVI) and entropy-based sampling, to optimize dataset selection for AI model training.

Project Goals

Reduce data annotation costs by selecting representative and informative samples.
Experiment with FVI, entropy metrics, and hybrid approaches.
Compare diversity sampling methods with random sampling.

Ultimately, we want to develop a model that integrates into SMI-SAMNet and performs well in surgical scene segmentation with minimal annotated data. Manual annotation of surgical videos is expensive and time-consuming, so we use diversity sampling to select the most informative frames for annotation, producing a smaller but highly representative ground truth.

Workflow

The input is a dataset of raw video files or image frames from surgical procedures (e.g. dAVF, MVD, EndoVis18). We first:

Extract the frames: Convert videos into individual frames at a standardized frame rate (e.g. 10 fps) and save them as a sequence of images.
Preprocess frames: Resize frames (e.g. to 224 x 224) and normalize pixel values if needed.
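
The two preparation steps above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names indices_at_target_fps and normalize_frame are hypothetical, and it assumes frames have already been decoded into NumPy arrays. Decoding the video and resizing would typically be handled by an imaging library and are not shown.

```python
import numpy as np


def indices_at_target_fps(n_frames, native_fps, target_fps):
    """Return the indices of the frames to keep when resampling a video.

    Hypothetical helper: keeps roughly every (native_fps / target_fps)-th frame.
    """
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= native_fps:
        # Cannot upsample by dropping frames; keep everything.
        return list(range(n_frames))
    step = native_fps / target_fps
    indices = []
    k = 0
    while int(k * step) < n_frames:
        indices.append(int(k * step))
        k += 1
    return indices


def normalize_frame(frame):
    """Scale a uint8 image to float32 values in [0, 1]."""
    return np.asarray(frame).astype(np.float32) / 255.0


# Example: a 30 fps clip with 10 frames, resampled to 10 fps.
kept = indices_at_target_fps(n_frames=10, native_fps=30, target_fps=10)
print(kept)
```

Whether normalization to [0, 1] is appropriate depends on what the downstream model expects, so treat the scaling constant as an assumption.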

Once we have a directory of preprocessed frames ready for sampling, we can begin the diversity sampling process. The objective of diversity sampling is to select a subset of frames from the dataset that represents the diversity and variability of the entire video sequence, reducing the number of frames requiring annotation whilst maximizing coverage of unique surgical scenarios.

There are three techniques that we can explore and combine:

Frame Variation Index (FVI): Compute the difference between consecutive frames to identify those with the most significant visual changes; high-FVI frames become annotation candidates.
Entropy metrics: Use a pretrained model (e.g. SAM) to make predictions on all frames and calculate the entropy of each prediction to identify frames where the model is most uncertain; high-entropy frames become annotation candidates.
Clustering: Apply dimensionality reduction (e.g. UMAP) and clustering (e.g. k-means) to group frames by visual similarity, then sample a representative frame from each cluster.
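
As a rough sketch of the first two scores, the snippet below computes a frame-difference score and a prediction-entropy score with NumPy. Several things here are assumptions rather than the project's definitions: FVI is taken to be the mean absolute pixel difference between a frame and its predecessor, the entropy function assumes per-pixel class probabilities are already available (how they are obtained from a model such as SAM is not shown), and all function names are hypothetical.

```python
import numpy as np


def frame_variation_index(frames):
    """Mean absolute difference between each frame and the previous one.

    frames: array of shape (N, H, W) or (N, H, W, C).
    Returns N scores; the first frame has no predecessor and scores 0.
    Assumed definition of FVI, for illustration only.
    """
    frames = np.asarray(frames, dtype=np.float32)
    diffs = np.abs(frames[1:] - frames[:-1])
    per_frame = diffs.mean(axis=tuple(range(1, diffs.ndim)))
    return np.concatenate([[0.0], per_frame])


def prediction_entropy(probs, eps=1e-12):
    """Mean per-pixel Shannon entropy (in nats) for each frame.

    probs: array of shape (N, K, H, W) with class probabilities
    that sum to 1 over the K axis.
    """
    probs = np.asarray(probs, dtype=np.float64)
    pixel_entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return pixel_entropy.mean(axis=(1, 2))


def top_k(scores, k):
    """Indices of the k highest-scoring frames, highest first."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    return order[:k].tolist()


# Toy data: the scene changes at frame 2 and stays changed.
frames = np.zeros((4, 2, 2))
frames[2:] = 10.0
fvi = frame_variation_index(frames)
candidates = top_k(fvi, k=1)  # the frame where the change occurs
```

A hybrid approach could combine the two score arrays (for example after rescaling each to a common range) before calling top_k; the weighting would need to be chosen experimentally.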

Hence we obtain a subset of frames selected for annotation, optimized for diversity and informativeness.
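
The clustering route can be illustrated with a small NumPy-only k-means that picks, for each cluster, the frame nearest the centroid. This is a sketch under stated assumptions: each frame is assumed to already have a feature vector (the dimensionality-reduction step, e.g. UMAP, is skipped), the farthest-point initialization is chosen here only to keep the example deterministic, and the function names are hypothetical. A library implementation of k-means would normally be preferable in practice.

```python
import numpy as np


def kmeans(features, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    features = np.asarray(features, dtype=np.float64)
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    chosen = [0]
    for _ in range(1, k):
        d = np.linalg.norm(features[:, None, :] - features[chosen][None, :, :], axis=2)
        chosen.append(int(d.min(axis=1).argmax()))
    centroids = features[chosen].copy()

    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break

    # Final assignment against the final centroids.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


def cluster_representatives(features, labels, centroids):
    """For each non-empty cluster, return the index of the frame closest to its centroid."""
    features = np.asarray(features, dtype=np.float64)
    reps = []
    for j in range(len(centroids)):
        members = np.flatnonzero(labels == j)
        if len(members) == 0:
            continue
        d = np.linalg.norm(features[members] - centroids[j], axis=1)
        reps.append(int(members[d.argmin()]))
    return sorted(reps)


# Toy features: two visually distinct groups of three frames each.
feats = np.array([[0, 0], [0.1, 0], [0, 0.1],
                  [10, 10], [10.1, 10], [10, 10.1]], dtype=float)
labels, centroids = kmeans(feats, k=2)
selected = cluster_representatives(feats, labels, centroids)
```

Choosing the frame nearest the centroid is one of several reasonable ways to pick a representative; sampling a random member of each cluster is another.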

Once the sampled frames have been annotated to form the ground truth, we train our model (in this case SOLOv2) by loading the pretrained weights and fine-tuning on the annotated frames. We then evaluate the effectiveness of the sampling by comparing the model's performance when trained on the sampled ground truth versus a randomly sampled ground truth.
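
For the comparison to be fair, the random baseline should use the same annotation budget as the diversity-sampled set. A minimal sketch of such a baseline is shown below; the function name random_baseline and the fixed seed are illustrative assumptions, not part of the repository.

```python
import random


def random_baseline(n_frames, budget, seed=0):
    """Pick `budget` distinct frame indices uniformly at random.

    A fixed seed keeps the baseline reproducible across runs.
    """
    if budget > n_frames:
        raise ValueError("budget cannot exceed the number of frames")
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_frames), budget))


# Example: if diversity sampling selected 20 of 500 frames,
# draw a random subset of the same size for comparison.
baseline_indices = random_baseline(n_frames=500, budget=20)
```

Repeating the baseline with several seeds and averaging the resulting model scores would give a less noisy comparison than a single random draw.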

Features

Frame Variation Index (FVI) computation.
Entropy-based sampling.
Experimental pipelines for testing sampling methods.
Integration-ready scripts for deep learning models.

Repository Structure

data/: Example datasets and instructions on data preparation.
notebooks/: Jupyter notebooks for exploratory analysis and experiments.
scripts/: Scripts for sampling, utility functions, and processing pipelines.
models/: Pretrained weights and model training scripts.
tests/: Unit tests for the implemented algorithms.
docs/: Detailed documentation of methods and usage.

Getting Started

Clone the repository:

git clone https://github.com/yourusername/diversity-sampling-project.git
cd diversity-sampling-project

Install the dependencies:

pip install -r requirements.txt

If you get a permission error, use:

pip install --user -r requirements.txt

