CuriseJia/FreeStyleRet

If you like our project, please give us a star ⭐ on GitHub to get the latest updates.

📰 News

  • [2024.7.24] Our Diverse-Style Retrieval Dataset has been released.
  • [2024.7.24] Added a new retrieval task evaluation tool that reports R1 and R5.
  • [2024.7.1] Our FreestyleRet has been accepted by ECCV 2024!
  • [2023.11.29] Code is available now! Welcome to watch 👀 this repository for the latest updates.
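
For readers unfamiliar with the R1 and R5 metrics mentioned above, here is a minimal sketch of how Recall@K is commonly computed from a query-gallery similarity matrix. It assumes R1 and R5 mean Recall@1 and Recall@5. The function name `recall_at_k`, its arguments, and the toy data are hypothetical and are not taken from this repository's evaluator.

```python
import numpy as np

def recall_at_k(similarity, gt_index, k):
    """Fraction of queries whose ground-truth gallery item is in the top-k results.

    similarity: (num_queries, num_gallery) score matrix, higher means more similar.
    gt_index:   (num_queries,) index of the correct gallery item for each query.
    NOTE: hypothetical helper for illustration, not the repository's evaluator.
    """
    similarity = np.asarray(similarity, dtype=float)
    gt_index = np.asarray(gt_index)
    # Gallery indices sorted by descending score, truncated to the first k.
    topk = np.argsort(-similarity, axis=1)[:, :k]
    # A query is a hit if its ground-truth index appears anywhere in its top-k.
    hits = (topk == gt_index[:, None]).any(axis=1)
    return float(hits.mean())

# Toy data: 3 queries against a gallery of 4 images.
sim = np.array([
    [0.9, 0.1, 0.3, 0.2],  # ground truth 0 is ranked 1st
    [0.2, 0.4, 0.8, 0.1],  # ground truth 1 is ranked 2nd
    [0.5, 0.6, 0.7, 0.1],  # ground truth 3 is ranked 4th
])
gt = np.array([0, 1, 3])

r1 = recall_at_k(sim, gt, 1)
r2 = recall_at_k(sim, gt, 2)
print(f"R@1 = {r1:.3f}, R@2 = {r2:.3f}")
```

With real data, `similarity` would typically be the cosine similarity between query embeddings and gallery image embeddings, and `k` would be 1 and 5.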

😮 Highlights

💡 High performance, plug-and-play, and lightweight

FreestyleRet is the first multi-style retrieval model and focuses on the precision search field. You can transfer our Gram-based style block to any other pre-trained model with only 28M trainable parameters.
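
As background on the "Gram-based" idea, the sketch below computes a Gram matrix from a feature map, which is a common way to summarize style through channel-wise correlations. It is not the repository's style block. The function name `gram_matrix`, the `(C, H, W)` layout, and the normalization by `H * W` are assumptions made for illustration.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlation (Gram) matrix of a feature map.

    features: array of shape (C, H, W).
    Returns a (C, C) matrix. Dividing by H * W is an assumed normalization;
    other conventions (e.g. dividing by C * H * W) are also common.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # each row is one flattened channel
    return flat @ flat.T / (h * w)      # inner products between channels

# Random stand-in for a backbone feature map with 8 channels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
g = gram_matrix(feat)
print(g.shape)
```

Because the spatial dimensions are summed out, the Gram matrix discards layout and keeps only which channels co-activate, which is why it is widely used as a style descriptor.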

⚡️ A multi-style, fully aligned and gained dataset

We propose the precision search task and its first corresponding dataset. The following figure shows our proposed Diverse-Style Retrieval Dataset (DSR), which includes five styles: origin, sketch, art, mosaic, and text.

🚀 Main Results

FreestyleRet achieves state-of-the-art (SOTA) performance on the DSR dataset and the ImageNet-X dataset. * denotes the results of prompt tuning.

🤗 Visualization

Each sample has three images to compare the retrieval performance between our FreestyleRet and the BLIP baseline on the DSR dataset. The left images are the queries randomly selected from different styles. The middle and the right images are the retrieval results of our FreestyleRet-BLIP model and the original BLIP model, respectively.

🛠️ Requirements and Installation

  • Python >= 3.9
  • PyTorch >= 1.9.0
  • CUDA Version >= 11.3
  • Install required packages:
git clone https://github.com/YanhaoJia/FreeStyleRet
cd FreeStyleRet
pip install -r requirements.txt

💥 DSR dataset & FreestyleRet Checkpoints

Both the dataset and the model checkpoints have been released.

🗝️ Training & Validating

The training and validation instructions are in train.py and test.py.

👍 Acknowledgement

  • OpenCLIP An open source pretraining framework.
  • LanguageBind Bind five modalities through Language.
  • ImageBind Bind five modalities through Image.
  • FSCOCO An open source Sketch-Text retrieval dataset.

🔒 License

  • The majority of this project is released under the MIT license as found in the LICENSE file.
  • The dataset of this project is released under the CC-BY-NC 4.0 license as found in the DATASET_LICENSE file.

✏️ Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝.

@misc{li2023freestyleret,
      title={FreestyleRet: Retrieving Images from Style-Diversified Queries}, 
      author={Hao Li and Curise Jia and Peng Jin and Zesen Cheng and Kehan Li and Jialu Sui and Chang Liu and Li Yuan},
      year={2023},
      eprint={2312.02428},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
