Instruct Me More! Random Prompting for Visual In-Context Learning (InMeMo)

Environment Setup

conda create -n inmemo python=3.8 -y
conda activate inmemo

PyTorch must be version 1.8.0 or later and built for a CUDA version that your GPU supports.

For an NVIDIA GeForce RTX 4090, the installation commands are:

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt
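
After installation, a quick check can confirm that the environment meets the version requirement stated above. The following is a minimal sketch, not part of the repository: the helper names parse_version and meets_minimum are hypothetical, and the script only reports what the installed PyTorch build says about itself.

```python
import re

MIN_TORCH = (1, 8, 0)  # minimum version stated in this README


def parse_version(version):
    """Turn a string such as '1.12.1+cu116' into (1, 12, 1), ignoring build suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version, minimum=MIN_TORCH):
    """Return True if the version string is at least the required minimum."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("torch", torch.__version__, "meets minimum:", meets_minimum(torch.__version__))
        print("CUDA build:", torch.version.cuda, "GPU available:", torch.cuda.is_available())
```

If the last line reports that no GPU is available, the installed build probably does not match the local driver or CUDA version.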

Preparation

Dataset

Download the Pascal-5ⁱ dataset from Volumetric-Aggregation-Transformer, place it under the InMeMo/ directory, and rename it to pascal-5i.
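
To confirm that the dataset ended up where the training commands expect it (--base_dir ./pascal-5i), a small check like the one below may help. It is only a sketch: the function name dataset_ready is hypothetical, and it tests only that the directory exists and is not empty, since the internal layout of the dataset is not described here.

```python
from pathlib import Path


def dataset_ready(repo_root):
    """Return True if <repo_root>/pascal-5i is a directory with at least one entry."""
    dataset_dir = Path(repo_root) / "pascal-5i"
    return dataset_dir.is_dir() and any(dataset_dir.iterdir())


if __name__ == "__main__":
    # Assumes the script is run from inside the InMeMo/ directory.
    print("pascal-5i ready:", dataset_ready("."))
```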

Pre-trained weights for Large-scale Vision Model

Follow the instructions in Visual Prompting to prepare the model, and download the CVF 1000-epoch pre-trained checkpoint.

Prompt Retriever

Foreground Segmentation Prompt Retriever

Single Object Detection Prompt Retriever

Training

For foreground segmentation:

# Change the fold for training each split.
python train_vp_segmentation.py --mode spimg_spmask --output_dir output_samples --fold 3 --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
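
Since each split needs its own training run, the command above can be wrapped in a small loop. The sketch below reuses the flags exactly as shown; it assumes the Pascal-5ⁱ folds are numbered 0 to 3 (the README only shows --fold 3), and the helper name train_command and the --run switch are hypothetical additions of this example, not options of the repository. Without --run it only prints the commands.

```python
import shlex
import subprocess
import sys

FOLDS = range(4)  # assumption: folds are numbered 0, 1, 2, 3


def train_command(fold, device="cuda:0"):
    """Build the segmentation training command from this README for one fold."""
    return [
        sys.executable, "train_vp_segmentation.py",
        "--mode", "spimg_spmask",
        "--output_dir", "output_samples",
        "--fold", str(fold),
        "--device", device,
        "--base_dir", "./pascal-5i",
        "--batch-size", "32",
        "--lr", "40",
        "--epoch", "100",
        "--scheduler", "cosinewarm",
        "--optimizer", "Adam",
        "--arr", "a1",
        "--vp-model", "pad",
        "--p-eps", "1",
    ]


if __name__ == "__main__":
    run = "--run" in sys.argv[1:]  # hypothetical switch: print only unless given
    for fold in FOLDS:
        cmd = train_command(fold)
        print(shlex.join(cmd))
        if run:
            subprocess.run(cmd, check=True)  # stop at the first failing fold
```

All folds share the same --output_dir here, as in the original command; use a separate directory per fold if the outputs would otherwise overwrite each other.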

For single object detection:

python train_vp_detection.py --mode spimg_spmask --output_dir output_samples --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1

Inference

For foreground segmentation

With prompt enhancer

# Change the fold for testing each split.
python val_vp_segmentation.py --mode spimg_spmask --batch-size 16 --fold 3 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH

Without prompt enhancer

python val_vp_segmentation.py --mode no_vp --batch-size 16 --fold 3 --arr a1 --output_dir visual_examples
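
The two segmentation commands above differ only in --mode and in the flags tied to the trained prompt enhancer. As a sketch of that difference, the hypothetical helper val_command below builds either variant for a given fold; MODEL_SAVE_PATH is the README's own placeholder and must be replaced with a real checkpoint path before running anything.

```python
import shlex
import sys


def val_command(fold, checkpoint=None):
    """Build the validation command; checkpoint=None gives the 'no_vp' baseline."""
    cmd = [
        sys.executable, "val_vp_segmentation.py",
        "--mode", "spimg_spmask" if checkpoint else "no_vp",
        "--batch-size", "16",
        "--fold", str(fold),
        "--arr", "a1",
    ]
    if checkpoint:
        cmd += ["--vp-model", "pad"]  # only passed with the prompt enhancer
    cmd += ["--output_dir", "visual_examples"]
    if checkpoint:
        cmd += ["--save_model_path", checkpoint]
    return cmd


if __name__ == "__main__":
    # Prints both variants for fold 3, matching the commands in this README.
    print(shlex.join(val_command(3, "MODEL_SAVE_PATH")))
    print(shlex.join(val_command(3)))
```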

For single object detection

With prompt enhancer

python val_vp_detection.py --mode spimg_spmask --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH

Without prompt enhancer

python val_vp_detection.py --mode no_vp --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples

Performance

Visual Examples

Citation

If you find this work useful, please consider citing us as:

@inproceedings{zhang2024instruct,
  title={Instruct Me More! Random Prompting for Visual In-Context Learning},
  author={Zhang, Jiahao and Wang, Bowen and Li, Liangzhi and Nakashima, Yuta and Nagahara, Hajime},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2597--2606},
  year={2024}
}

Acknowledgments

Part of the code is borrowed from Visual Prompting, visual_prompt_retrieval, timm, and ILM-VP.
