QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations, Chaitanya Malaviya+, N/A, arXiv'23 #701

AkihikoWatanabe · 2023-05-22T10:48:33Z

URL

https://arxiv.org/abs/2305.11694

Affiliations

Chaitanya Malaviya, N/A
Peter Shaw, N/A
Ming-Wei Chang, N/A
Kenton Lee, N/A
Kristina Toutanova, N/A

Abstract

Formulating selective information needs results in queries that implicitly specify set operations, such as intersection, union, and difference. For instance, one might search for "shorebirds that are not sandpipers" or "science-fiction films shot in England". To study the ability of retrieval systems to meet such information needs, we construct QUEST, a dataset of 3357 natural language queries with implicit set operations, that map to a set of entities corresponding to Wikipedia documents. The dataset challenges models to match multiple constraints mentioned in queries with corresponding evidence in documents and correctly perform various set operations. The dataset is constructed semi-automatically using Wikipedia category names. Queries are automatically composed from individual categories, then paraphrased and further validated for naturalness and fluency by crowdworkers. Crowdworkers also assess the relevance of entities based on their documents and highlight attribution of query constraints to spans of document text. We analyze several modern retrieval systems, finding that they often struggle on such queries. Queries involving negation and conjunction are particularly challenging and systems are further challenged with combinations of these operations.

Translation (by gpt-3.5-turbo)

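Formulating selective information needs produces queries that implicitly specify set operations such as intersection, union, and difference. For example, one can search for "shorebirds that are not sandpipers" or "science-fiction films shot in England". To study the ability of retrieval systems to meet such information needs, we built the QUEST dataset, which contains 3357 natural language queries that map to sets of entities corresponding to Wikipedia documents. The dataset requires models to match the multiple constraints mentioned in a query with the corresponding evidence in documents and to perform various set operations correctly. The dataset is constructed semi-automatically using Wikipedia category names. Queries are composed automatically from individual categories, then paraphrased by crowdworkers and further validated for naturalness and fluency. Crowdworkers assess the relevance of entities based on their documents and highlight the spans of document text to which query constraints are attributed. We analyzed several modern retrieval systems and found that they often struggle with such queries. Queries involving negation or conjunction are especially difficult, and combinations of these operations challenge the systems further.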

Summary (by gpt-3.5-turbo)

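The QUEST dataset was constructed by formulating selective information needs in order to generate queries that implicitly specify set operations such as intersection, union, and difference. The dataset maps to sets of entities corresponding to Wikipedia documents, and requires models to match the multiple constraints mentioned in a query with the corresponding evidence in documents and to perform various set operations correctly. The queries, which were paraphrased by crowdworkers and further validated for naturalness and fluency, turned out to be difficult for several modern retrieval systems.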
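
As a rough illustration of what "implicit set operations" over categories means, here is a minimal Python sketch. It is not the authors' pipeline: the category contents, the film titles, and the `answer_set` helper are all invented for this note, and the real dataset derives its entity sets from Wikipedia categories.

```python
# Toy category-to-entity mapping. The contents are invented for illustration
# and are NOT taken from QUEST or from Wikipedia category data.
CATEGORIES = {
    "shorebirds": {"Dunlin", "Killdeer", "Sanderling"},
    "sandpipers": {"Dunlin", "Sanderling"},
    "science-fiction films": {"Film A", "Film B"},
    "films shot in England": {"Film B", "Film C"},
}


def answer_set(operation: str, left: str, right: str) -> set[str]:
    """Combine two category entity sets with one set operation.

    `operation` is a hypothetical label: "intersection", "union" or "difference".
    """
    a, b = CATEGORIES[left], CATEGORIES[right]
    if operation == "intersection":  # e.g. "A that are also B"
        return a & b
    if operation == "union":  # e.g. "A or B"
        return a | b
    if operation == "difference":  # e.g. "A that are not B"
        return a - b
    raise ValueError(f"unknown operation: {operation}")


# "shorebirds that are not sandpipers" corresponds to a set difference.
not_sandpipers = answer_set("difference", "shorebirds", "sandpipers")

# "science-fiction films shot in England" corresponds to an intersection.
scifi_in_england = answer_set(
    "intersection", "science-fiction films", "films shot in England"
)
```

A retrieval system sees only the natural language query, not the operation label, so it has to recover both the constraints and how they combine from the text.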

AkihikoWatanabe added the Pocket label May 22, 2023

AkihikoWatanabe changed the title あ QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations, Chaitanya Malaviya+, N/A, arXiv'23 May 22, 2023

AkihikoWatanabe mentioned this issue Oct 29, 2023

Chain-of-Verification Reduces Hallucination in Large Language Models, Shehzaad Dhuliawala+, N/A, arXiv'23 #1044

Open
