I was able to index the whole CW09 dataset with slight modifications to the IndexClueWeb09b class. With 15 threads and a 50 GB heap, it took about 20 hours and produced a 650 GB index.
iorixxx added a commit to iorixxx/Anserini that referenced this issue (Dec 1, 2015)
Quite impressively, I was able to index all of ClueWeb09 (English):
Indexing took ~18 hours:
Index size (note: no positions):