Direct access to all doc_ids

I expected this to be quite straightforward (or at least better documented in the API), but it doesn't seem to be.
Say I want to gather all doc_ids from a given corpus (for instance, to use a random negative sampler at run time).
Currently, this is what I do:
```
import ir_datasets

data = ir_datasets.load("msmarco-document/train")
all_doc_ids = list(data.docs._handler.docs_store().lookup.idx())
```
which works, but as far as I can tell it triggers an iteration over all docs in the collection (and is also not very intuitive).
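
For comparison, the only route I know of that stays on the public API is to walk `docs_iter()` and keep each record's `doc_id`, which is also a full pass over the corpus. Below is a minimal sketch of that, plus the kind of sampler I have in mind. The names `Doc`, `collect_doc_ids`, and `sample_negatives` are hypothetical helpers of mine, not part of ir_datasets, and the sampler assumes doc_ids are unique.

```python
import random
from typing import Iterable, List, NamedTuple, Sequence


class Doc(NamedTuple):
    """Hypothetical stand-in for a document record; only doc_id is used."""
    doc_id: str
    text: str


def collect_doc_ids(docs: Iterable) -> List[str]:
    """Collect doc_id from every record. This is one full pass over the docs."""
    return [doc.doc_id for doc in docs]


def sample_negatives(
    all_doc_ids: Sequence[str],
    positive_ids: Iterable[str],
    k: int,
    rng: random.Random,
) -> List[str]:
    """Draw k distinct doc_ids that are not positives, by rejection sampling.

    Assumes all_doc_ids contains no duplicates.
    """
    positives = set(positive_ids)
    # Conservative bound: guarantees the loop below can terminate.
    if k > len(all_doc_ids) - len(positives):
        raise ValueError("not enough non-positive documents to sample from")
    negatives = {}  # dict keeps insertion order and rejects duplicates
    while len(negatives) < k:
        candidate = rng.choice(all_doc_ids)
        if candidate not in positives:
            negatives[candidate] = None
    return list(negatives)
```

With ir_datasets, I would call this as `collect_doc_ids(ir_datasets.load("msmarco-document/train").docs_iter())` once at start-up and reuse the resulting list for every sampling call.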

Is there a better way to achieve this?

Direct access to all doc_ids #184