New MS MARCO (V1) passage regressions #1730

Closed
lintool opened this issue Jan 12, 2022 · 3 comments · Fixed by #1732

lintool commented Jan 12, 2022

Re: #1721, we are going to rebuild the MS MARCO passage regressions using the same scripts, for consistency.

The raw file that @ronakice has prepped on orca is here:

$ md5sum /store/scratch/rpradeep/msmarco-v1/collections/msmarco_v1_passage_d2q-t5_40/docs.jsonl
1f4859d1258df07b8ac2277a986e162b  /store/scratch/rpradeep/msmarco-v1/collections/msmarco_v1_passage_d2q-t5_40/docs.jsonl

This is the starting point. I'll repackage it the same way as the doc regressions.
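For anyone who wants to check a copy of the raw file without `md5sum`, here is a minimal Python sketch that computes the same digest. The helper names `md5_of_file` and `verify_md5` are hypothetical and not part of the regression scripts; the path and checksum in the comment are the ones quoted above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a multi-gigabyte file never sits in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()


# Example call (hypothetical helper, path and checksum taken from this thread):
# verify_md5(
#     "/store/scratch/rpradeep/msmarco-v1/collections/"
#     "msmarco_v1_passage_d2q-t5_40/docs.jsonl",
#     "1f4859d1258df07b8ac2277a986e162b",
# )
```

The digest is compared as lowercase hex, which is the format `md5sum` prints.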

lintool self-assigned this Jan 12, 2022

ronakice commented

Yep, looks good from my end. @MXueguang should check whether it looks fine on his!

lintool commented Jan 13, 2022

Repackaged and stored on orca at /store/collections/msmarco/tarballs:

$ md5sum msmarco-passage-docTTTTTquery.tar
ee1a1d9e89d26bae4c0382d75a361576  msmarco-passage-docTTTTTquery.tar

Unpacked at /store/collections/msmarco/msmarco-passage-docTTTTTquery:

$ ls -l
total 3156412
-r--r--r-- 1 jimmylin jimmylin 180510918 Jan 13 13:12 docs-aa.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 180464659 Jan 13 13:12 docs-ab.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 180596698 Jan 13 13:12 docs-ac.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 180995768 Jan 13 13:12 docs-ad.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 181972324 Jan 13 13:12 docs-ae.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 183045317 Jan 13 13:12 docs-af.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 183262993 Jan 13 13:12 docs-ag.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 183505225 Jan 13 13:12 docs-ah.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 183467390 Jan 13 13:12 docs-ai.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 183563636 Jan 13 13:12 docs-aj.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 183872883 Jan 13 13:12 docs-ak.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 184319850 Jan 13 13:12 docs-al.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 182176900 Jan 13 13:12 docs-am.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 183767988 Jan 13 13:12 docs-an.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 183210261 Jan 13 13:12 docs-ao.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 184168860 Jan 13 13:13 docs-ap.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 183566373 Jan 13 13:13 docs-aq.jsonl.gz
-r--r--r-- 1 jimmylin jimmylin 125664655 Jan 13 13:13 docs-ar.jsonl.gz
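The listing above shows the collection as 18 gzipped JSONL shards with two-letter suffixes (`aa` through `ar`). The thread does not say how the split was produced, so the following is only a sketch of one way to get the same layout, assuming a fixed number of lines per shard. The function names and the `lines_per_shard` parameter are hypothetical, and the shard size actually used is not stated here.

```python
import gzip
import itertools
import os
import string


def shard_suffixes():
    """Yield 'aa', 'ab', ..., 'zz', the two-letter suffix order seen in the listing."""
    for a, b in itertools.product(string.ascii_lowercase, repeat=2):
        yield a + b


def split_jsonl(src, out_dir, lines_per_shard):
    """Split src into gzipped shards named docs-XX.jsonl.gz and return their paths.

    lines_per_shard is an assumed parameter; at most 676 shards can be named.
    """
    os.makedirs(out_dir, exist_ok=True)
    suffixes = shard_suffixes()
    paths = []
    out = None
    try:
        with open(src, "rb") as f:
            for i, line in enumerate(f):
                # Start a new shard every lines_per_shard lines.
                if i % lines_per_shard == 0:
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"docs-{next(suffixes)}.jsonl.gz")
                    out = gzip.open(path, "wb")
                    paths.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

Lines are streamed one at a time, so memory use stays flat regardless of the input size. Concatenating the decompressed shards in suffix order should reproduce the original file byte for byte.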

lintool commented Jan 13, 2022

Confirmed on orca: the regression passes without issue with:

$ python src/main/python/run_regression.py --index --verify --search --regression msmarco-passage-docTTTTTquery
