An Evaluation of Cross-Modal Text-to-Image Retrieval Pipelines
This project benchmarks cross-modal retrieval pipelines, focusing on text-to-image retrieval. Each pipeline has three stages: generating embeddings with state-of-the-art models (CLIP, FLAVA, UniIR, PreFLMR), building search indexes over those embeddings (FAISS, ColBERT), and serving retrieval requests through dedicated services.
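
To make the embed, index, and retrieve flow concrete, here is a minimal sketch of text-to-image retrieval and its evaluation in plain Python. It is an illustration under stated assumptions, not this project's implementation: the three-dimensional vectors stand in for real model embeddings (for example from CLIP), the brute-force cosine search stands in for an index such as FAISS, and the names `l2_normalize`, `search`, `recall_at_k`, `image_index`, and `text_queries` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(
    query: Sequence[float], index: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k image ids most similar to the query, best first.

    Brute-force scan; a real pipeline would delegate this to a vector index.
    Assumes the vectors stored in `index` are already unit length.
    """
    q = l2_normalize(query)
    scored = [
        (image_id, sum(a * b for a, b in zip(q, emb)))
        for image_id, emb in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def recall_at_k(
    queries: Sequence[Tuple[Sequence[float], str]],
    index: Dict[str, List[float]],
    k: int,
) -> float:
    """Fraction of queries whose single relevant image appears in the top k."""
    hits = 0
    for query_vec, relevant_id in queries:
        retrieved = {image_id for image_id, _ in search(query_vec, index, k)}
        hits += relevant_id in retrieved
    return hits / len(queries)


# Toy data (assumption): placeholders for image embeddings from a real encoder.
image_index = {
    "img_cat": l2_normalize([0.9, 0.1, 0.0]),
    "img_dog": l2_normalize([0.1, 0.9, 0.0]),
    "img_car": l2_normalize([0.0, 0.1, 0.9]),
}

# Each entry pairs a placeholder text embedding with its ground-truth image id.
text_queries = [
    ([1.0, 0.2, 0.0], "img_cat"),
    ([0.0, 1.0, 0.1], "img_dog"),
    ([0.5, 0.6, 0.0], "img_cat"),  # ambiguous query: ranks img_dog first
]

if __name__ == "__main__":
    for k in (1, 2):
        print(f"Recall@{k}: {recall_at_k(text_queries, image_index, k):.3f}")
```

On this toy data the third query ranks `img_dog` ahead of its ground-truth `img_cat`, so Recall@1 is 2/3 while Recall@2 is 1.0. Recall@K is used here only as a common retrieval metric; the metrics this project actually reports may differ.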