Refactor installation instructions #1611

Closed
lintool opened this issue Aug 31, 2023 · 4 comments

Comments

@lintool
Member

lintool commented Aug 31, 2023

@yilinjz @UShivani3 et al. recently had issues getting Pyserini installed... I think we should refactor the installation instructions?

@sahel-sh since you did the onboarding recently, could you share your experience as well?

cc @Andrwyl

@lintool
Member Author

lintool commented Aug 31, 2023

Also, something to think about - should we move some dependencies into "optional"?
https://setuptools.pypa.io/en/latest/userguide/dependency_management.html
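
As a rough sketch of what "optional" could look like with setuptools extras: the grouping below is purely hypothetical (the package names come from this thread, but which ones would be core and which optional is an assumption, not Pyserini's actual setup.py), and `example-package` is a placeholder name.

```python
# Hypothetical split of dependencies into core and optional groups.
# This is NOT Pyserini's real setup.py; the grouping is an assumption
# made only to illustrate setuptools' `extras_require`.

CORE_REQUIRES = ["numpy", "pandas", "pyjnius"]

EXTRAS_REQUIRE = {
    # Assumed group: packages only needed for dense retrieval.
    "dense": ["faiss-cpu", "sentencepiece", "torch", "transformers"],
    # Assumed group: packages only needed for learning-to-rank.
    "ltr": ["lightgbm", "scikit-learn", "spacy"],
}

# Convenience extra that pulls in every optional group.
EXTRAS_REQUIRE["all"] = sorted(
    {pkg for group in EXTRAS_REQUIRE.values() for pkg in group}
)


def setup_kwargs():
    """Return keyword arguments that could be passed to setuptools.setup()."""
    return {
        "name": "example-package",  # placeholder name
        "install_requires": CORE_REQUIRES,
        "extras_require": EXTRAS_REQUIRE,
    }
```

In a real setup.py this would be used as `setup(**setup_kwargs())`, and users would opt in with something like `pip install "example-package[dense]"`.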

@lintool
Member Author

lintool commented Sep 1, 2023

On Ubuntu 18.04.6, I did the following:

conda env list
conda create -n pyserini-pypi-test python=3.9
conda activate pyserini-pypi-test

conda install -c pytorch pytorch faiss-cpu

pip install pyserini

The important package versions:

$ pip list | egrep '(numpy|pyjnius|transformers|torch|sentencepiece|faiss|scikit-learn|lightgbm|spacy|pandas)\s'
faiss              1.7.4
lightgbm           4.0.0
numpy              1.25.2
pandas             2.1.0
pyjnius            1.5.0
scikit-learn       1.3.0
sentencepiece      0.1.99
spacy              3.6.1
torch              2.0.1
transformers       4.32.1
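
For anyone who wants the same report without `pip list` and `egrep`, here is a small sketch using the standard library's `importlib.metadata` (Python 3.8+). The package list mirrors the one above; note that distribution names can differ by install method (for example `faiss` from conda vs. `faiss-cpu` from pip), so treat the names as assumptions about your environment.

```python
from importlib import metadata

# Distribution names taken from the listing above; adjust for your setup.
PACKAGES = [
    "numpy", "pyjnius", "transformers", "torch", "sentencepiece",
    "faiss", "scikit-learn", "lightgbm", "spacy", "pandas",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in sorted(installed_versions(PACKAGES).items()):
        print(f"{name:<18} {version or 'not installed'}")
```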

With the above configuration, importing faiss fails with this error:

ImportError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory

Issue described here: facebookresearch/faiss#2890

The fix is to install mkl separately:

conda install mkl=2021
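
A quick way to confirm the fix took effect is a smoke test that tries the import and reports the error text. This is only a sketch: `try_import` is a hypothetical helper, and the hint it prints simply repeats the fix described in this comment.

```python
import importlib


def try_import(module_name):
    """Return (True, "") if the module imports, else (False, error message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    ok, error = try_import("faiss")
    if ok:
        print("faiss imported successfully")
    else:
        print(f"faiss import failed: {error}")
        if "libmkl" in error:
            # Hint based on the fix reported in this thread.
            print("MKL library missing; try: conda install mkl=2021")
```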

@Andrwyl in the student cs env, did you have any issues?

@Andrwyl
Contributor

Andrwyl commented Sep 1, 2023

For the student env, there were no issues from beginning to end. Following the instructions exactly should work all the way to the end of onboarding. The only issue you run into is that every student account has a CPU time limit, which causes you to be kicked off while doing dense retrieval. Running ulimit -t unlimited gives you unlimited CPU time.
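
To see what the CPU time limit actually is from inside Python, the standard `resource` module (Unix only) exposes the same limit that `ulimit -t` controls. The sketch below is an illustration, not part of the onboarding instructions; the helper names are hypothetical, and raising the soft limit this way only affects the current process and its children, never beyond the hard limit.

```python
import resource


def describe(limit):
    """Render an rlimit value in seconds, or 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} s"


def cpu_time_limit():
    """Return the (soft, hard) CPU time limit for this process."""
    return resource.getrlimit(resource.RLIMIT_CPU)


def raise_soft_cpu_limit():
    """Raise the soft CPU limit to the hard limit for this process only."""
    _, hard = cpu_time_limit()
    resource.setrlimit(resource.RLIMIT_CPU, (hard, hard))
    return cpu_time_limit()


if __name__ == "__main__":
    soft, hard = cpu_time_limit()
    print(f"CPU time limit: soft={describe(soft)}, hard={describe(hard)}")
```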

https://uwaterloo.ca/computer-science-computing-facility/teaching-hosts shows the specs of the student computers. The resources are shared, so it's possible that on high-usage days you may end up failing (this has not happened to me, though).

To access them, any uwaterloo student can just run ssh userid@linux.student.cs.uwaterloo.ca, but I'm pretty sure nearly everyone has used this for a 2nd-year CS class.

@lintool
Member Author

lintool commented Sep 4, 2023

Added a reference to the student teaching hosts in the onboarding guide: https://github.com/castorini/onboarding/blob/master/ura.md#initial-screening
