Existing information retrieval (IR) models often assume a homogeneous format, limiting their applicability to diverse user needs, such as searching for images with text descriptions, searching for a news article with a headline image, or finding a similar photo with a query image. To approach such different information-seeking demands, we introduce UniIR, a unified instruction-guided multimodal retriever capable of handling eight distinct retrieval tasks across modalities. UniIR, a single retrieval system jointly trained on ten diverse multimodal-IR datasets, interprets user instructions to execute various retrieval tasks, demonstrating robust performance across existing datasets and zero-shot generalization to new tasks. Our experiments highlight that multi-task training and instruction tuning are keys to UniIR's generalization ability. Additionally, we construct the M-BEIR, a multimodal retrieval benchmark with comprehensive results, to standardize the evaluation of universal multimodal information retrieval.
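The core idea of instruction-guided retrieval — steering one retriever toward different tasks by conditioning on a task instruction — can be sketched as follows. This is a toy illustration, not the UniIR implementation: the `embed` function is a stand-in bag-of-words encoder (the paper uses learned multimodal encoders), and all names and example strings are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "encoder" standing in for a real multimodal encoder;
    # returns an L2-normalized sparse vector of token weights.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {w: v / norm for w, v in counts.items()}

def score(q, d):
    # Cosine similarity between two normalized sparse vectors.
    return sum(v * d.get(w, 0.0) for w, v in q.items())

def retrieve(instruction, query, candidates):
    # The instruction is prepended to the query so a single retriever
    # can be steered toward different tasks (text->image, image->text, ...).
    q = embed(instruction + " " + query)
    return sorted(candidates, key=lambda c: score(q, embed(c)), reverse=True)

results = retrieve(
    "Find a news image matching the headline:",   # hypothetical instruction
    "storm hits the coast",
    ["photo of a storm surge hitting the coast", "recipe for soup"],
)
print(results[0])
```

In the actual system the instruction conditioning happens inside learned encoders trained jointly on the ten M-BEIR datasets; this sketch only shows the interface: (instruction, query) in, ranked candidates out.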