Open
This is something I expected to be quite straightforward (or at least better documented in the API), but it doesn't seem to be.
Say I want to gather all doc_ids from a given corpus (for instance, to use a random negative sampler at run time).
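For context, a run-time random negative sampler of that kind might look like the minimal sketch below, assuming the doc_ids are already in memory as a list. `sample_negatives` and its parameters are hypothetical names, not part of ir_datasets. It uses plain rejection sampling with Python's `random` module: draw uniformly and skip known positives and repeats.

```python
import random
from typing import List, Optional, Sequence, Set


def sample_negatives(
    all_doc_ids: Sequence[str],
    positive_ids: Set[str],
    k: int,
    rng: Optional[random.Random] = None,
    max_tries: int = 1000,
) -> List[str]:
    """Draw k distinct doc_ids uniformly at random, skipping known positives.

    Rejection sampling is cheap when positives are a small fraction of the
    corpus; max_tries guards against looping forever when k is too large.
    """
    rng = rng or random.Random()
    negatives: List[str] = []
    seen: Set[str] = set()
    tries = 0
    while len(negatives) < k:
        if tries >= max_tries:
            raise ValueError("could not find enough negatives; is k too large?")
        tries += 1
        # O(1) random access is why the doc_ids need to be a list, not a stream.
        candidate = all_doc_ids[rng.randrange(len(all_doc_ids))]
        if candidate in positive_ids or candidate in seen:
            continue
        seen.add(candidate)
        negatives.append(candidate)
    return negatives
```

Passing a seeded `random.Random` makes the draws reproducible across runs.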
Currently, this is what I do:
```python
import ir_datasets
data = ir_datasets.load("msmarco-document/train")
all_doc_ids = list(data.docs._handler.docs_store().lookup.idx())
```
This works, but, as far as I can tell, it triggers an iteration over all docs in the collection (and it is not very intuitive either).
Is there a better way to achieve this?
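One possible workaround, assuming a single full pass is acceptable: collect the IDs once through the public `docs_iter()` API and cache them to disk. This does not avoid the initial iteration; it only avoids repeating it on later runs. `load_or_build_doc_ids` and the cache path are hypothetical, and JSON is just one convenient storage choice.

```python
import json
import os
from typing import Callable, Iterable, List


def load_or_build_doc_ids(
    cache_path: str, iter_doc_ids: Callable[[], Iterable[str]]
) -> List[str]:
    """Return the corpus doc_ids, paying for a full pass only on the first call."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    doc_ids = list(iter_doc_ids())
    # Write to a temp file, then rename, so an interrupted run
    # never leaves a truncated cache behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(doc_ids, f)
    os.replace(tmp_path, cache_path)
    return doc_ids


# Example wiring with ir_datasets (not run here):
# import ir_datasets
# dataset = ir_datasets.load("msmarco-document/train")
# doc_ids = load_or_build_doc_ids(
#     "msmarco-document.doc_ids.json",  # hypothetical cache path
#     lambda: (doc.doc_id for doc in dataset.docs_iter()),
# )
```

Taking a callable rather than an iterator means the expensive `docs_iter()` pass is never even started when the cache already exists.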