Add an entry in Pyserini's Passage Ranking Experiment Reproduction Log for ugrad onboarding #1599

Merged
merged 1 commit into from
Aug 26, 2023

@yilinjz
Contributor

yilinjz commented Aug 25, 2023

OS: macOS Ventura Version 13.5

Hardware: M2 MacBook Pro (2023), 16 GB RAM

Conda: 23.7.2

Python (Conda environment): 3.8.17
Java (Conda environment): 11.0.13
Maven (Conda environment): 3.9.4

Result: I was able to replicate (most of) the Indexing, Retrieval & Evaluation steps with Pyserini on my machine with the above settings. Outputs are the same as listed in the document. 


Additional Comment: I encountered two (and only two) failed unit tests (from running “python -m unittest”) when following the “pyserini/docs/installation.md” file. Namely,

  • test_remove_undjudged (tests.test_trectools.TestTrecTools)
  • test_undjudged_keep (tests.test_trectools.TestTrecTools)

This seems to be caused by the TREC evaluation tool. When running the following command:

python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap \
  collections/msmarco-passage/qrels.dev.small.trec \
  runs/run.msmarco-passage.bm25tuned.trec

I got:

Running command: ['java', '-jar', '/Users/jasonzhang/.cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-c', '-mrecall.1000', '-mmap', 'collections/msmarco-passage/qrels.dev.small.trec', 'runs/run.msmarco-passage.bm25tuned.trec']
Exception in thread "main" java.lang.UnsupportedOperationException: Unsupported os/arch: trec_eval-macosx-aarch64
at uk.ac.gla.terrier.jtreceval.trec_eval.getTrecEvalBinary(trec_eval.java:80)
at uk.ac.gla.terrier.jtreceval.trec_eval.<init>(trec_eval.java:130)
at uk.ac.gla.terrier.jtreceval.trec_eval.main(trec_eval.java:262)

which seems to be related to Apple's M1/M2 chips. There is surprisingly little information about this error on Google (it needs a better IR system!). As a side note, judging from previous commits, others seem to have succeeded on M1 MacBooks (but there is no commit from an M2).
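
The error message suggests that jtreceval picks its bundled trec_eval binary from the operating system and architecture reported by the JVM. As a rough way to see what your JVM reports, here is a minimal sketch. It assumes "java" is on your PATH and relies on "java -XshowSettings:properties -version", which prints the JVM's system properties (to stderr). The helper names parse_os_arch and jvm_os_arch are made up for this illustration and are not part of Pyserini or jtreceval.

```python
import re
import subprocess


def parse_os_arch(settings_text):
    """Return the os.arch value found in `java -XshowSettings:properties` output, or None."""
    match = re.search(r"^\s*os\.arch\s*=\s*(\S+)", settings_text, re.MULTILINE)
    return match.group(1) if match else None


def jvm_os_arch():
    """Ask the `java` on PATH for its os.arch; return None if java is not found."""
    try:
        result = subprocess.run(
            ["java", "-XshowSettings:properties", "-version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return None
    # The settings are normally written to stderr; stdout is included just in case.
    return parse_os_arch(result.stderr + result.stdout)
```

Calling jvm_os_arch() on the machine that produced the error above would presumably report "aarch64", matching the "trec_eval-macosx-aarch64" name in the exception.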

On the other hand, the official MS MARCO evaluation script worked perfectly fine for me.

As a next step, I'm planning to replicate this experiment again on my old laptop (which has an Intel chip), but before that I thought I'd record this issue here as a potential reference for future users.

@yilinjz
Contributor Author

yilinjz commented Aug 25, 2023

Update: Reran this experiment on my Intel-chip MacBook. Everything worked smoothly. Outputs are the same as listed in the document.


OS: macOS Ventura Version 13.4.1
Hardware: MacBook Pro (2019), 2.8GHz Quad-Core Intel Core i7, 16 GB RAM

@lintool
Member

lintool commented Aug 26, 2023

Interesting. From what I understand, Rosetta is supposed to handle translation of x86 instructions... but for some reason it's not kicking in? I have an Apple M2 MacBook, and it seems to work fine for me? #shrug

Thanks for noting, and let's keep an eye out on this issue.

@yilinjz
Contributor Author

yilinjz commented Aug 30, 2023

@lintool Found a solution to this issue. Leaving a comment here for future reference.

If you encounter this issue:
This issue seems to be caused by the Conda 23.7.2 ARM64 distribution. If you are unsure which Conda distribution you have, run "conda info" in your terminal and check the "platform" field.
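
The same check can be scripted. This is a minimal sketch, assuming "conda" is on your PATH; it reads the "platform" field from "conda info --json". Native Apple-silicon builds report "osx-arm64" and Intel builds report "osx-64". The function names are hypothetical and only for illustration.

```python
import json
import subprocess


def conda_platform(info_json):
    """Extract the "platform" field from the text printed by `conda info --json`."""
    return json.loads(info_json).get("platform")


def is_arm64_conda(platform_name):
    """True for native Apple-silicon builds, which conda labels "osx-arm64"."""
    return platform_name == "osx-arm64"


def current_conda_platform():
    """Run `conda info --json`; return None if conda is missing or the call fails."""
    try:
        result = subprocess.run(
            ["conda", "info", "--json"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return conda_platform(result.stdout)
```

If is_arm64_conda(current_conda_platform()) is True, you are on the ARM64 distribution described above.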


Steps to fix this issue:

  1. Uninstall your current conda distribution (https://docs.anaconda.com/free/anaconda/install/uninstall/)
  2. If you do not have Rosetta installed on your Mac, install Rosetta (https://osxdaily.com/2020/12/04/how-install-rosetta-2-apple-silicon-mac/)
  3. Install the latest Intel Mac distribution (https://www.anaconda.com/). Rosetta handles the x86 translation, so you can run the Intel distribution on your M1/M2 Mac.
  4. Go through the Pyserini Installation again and it should work fine (https://github.com/castorini/pyserini/blob/master/docs/installation.md). Try the Development Installation if you encounter issues with the Pip Installation.
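
After reinstalling, you may want to confirm that the Python in your new Conda environment really runs as x86_64 under Rosetta. Below is a minimal sketch, not an official check: it combines platform.machine() with the macOS "sysctl.proc_translated" value, which is "1" for a process translated by Rosetta and "0" for a native one (the key may not exist on Intel Macs). The helper names are made up for this example.

```python
import platform
import subprocess


def describe_environment(machine, proc_translated):
    """Summarize the architecture from platform.machine() and sysctl.proc_translated."""
    if machine == "arm64":
        return "native ARM64"
    if machine == "x86_64" and proc_translated == "1":
        return "x86_64 under Rosetta"
    if machine == "x86_64":
        return "x86_64 (native Intel, or translation status unknown)"
    return "unrecognized architecture: " + str(machine)


def read_proc_translated():
    """Return the sysctl.proc_translated value as a string, or None if unavailable."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def current_environment():
    """Describe the architecture of the Python interpreter running this code."""
    return describe_environment(platform.machine(), read_proc_translated())
```

With the Intel distribution installed on an M1/M2 Mac, current_environment() should return "x86_64 under Rosetta"; "native ARM64" would mean the environment still comes from an ARM64 distribution.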

Side note:
It's unclear whether earlier Conda ARM64 distributions such as 23.7.1 or 23.7.0 also have this issue. If you prefer installing an ARM64 distribution, try the earlier Conda versions.
