reproduced results and updated pygaggle/docs/experiments-msmarco-passage-subset.md #309

Open: wants to merge 5 commits into base: master

farazkh80
Member

Successfully reproduced the numerical results in pygaggle/docs/experiments-msmarco-passage-subset.md in a Colab environment with a T4 GPU.

Encountered a small issue with the Python dependencies needed to evaluate with monoBERT when running:

python -um pygaggle.run.evaluate_passage_ranker --split dev \
                                                --method seq_class_transformer \
                                                --model castorini/monobert-large-msmarco \
                                                --dataset data/msmarco_ans_small/ \
                                                --index-dir indexes/index-msmarco-passage-20191117-0ed488 \
                                                --task msmarco \
                                                --output-file runs/run.monobert.ans_small.dev.tsv

The error log was:

2022-12-26 02:37:05.453924: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-12-26 02:37:08 [INFO] utils: NumExpr defaulting to 2 threads.
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/content/pygaggle/pygaggle/run/evaluate_passage_ranker.py", line 13, in <module>
    from pygaggle.rerank.base import Reranker
  File "/content/pygaggle/pygaggle/rerank/base.py", line 5, in <module>
    from pyserini.search import JLuceneSearcherResult
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/__init__.py", line 19, in <module>
    from .lucene import JLuceneSearcherResult, LuceneSimilarities, LuceneFusionSearcher, LuceneSearcher
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/lucene/__init__.py", line 18, in <module>
    from ._impact_searcher import JImpactSearcherResult, LuceneImpactSearcher
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/lucene/_impact_searcher.py", line 28, in <module>
    from pyserini.encode import QueryEncoder, TokFreqQueryEncoder, UniCoilQueryEncoder, \
  File "/usr/local/lib/python3.8/dist-packages/pyserini/encode/__init__.py", line 17, in <module>
    from ._base import DocumentEncoder, QueryEncoder, JsonlCollectionIterator,\
  File "/usr/local/lib/python3.8/dist-packages/pyserini/encode/_base.py", line 19, in <module>
    import faiss
ModuleNotFoundError: No module named 'faiss'

Fix:

pip install faiss-cpu
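
A quick way to confirm the fix before re-running the evaluation is to check that `faiss` can be found in the current environment. This is only a minimal sketch using the standard library; the helper name `missing_modules` is made up for illustration and is not part of pygaggle or Pyserini.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # `faiss` is the module named in the ModuleNotFoundError above.
    missing = missing_modules(["faiss"])
    if missing:
        print("Missing modules:", ", ".join(missing), "(try: pip install faiss-cpu)")
    else:
        print("faiss is importable")
```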

@farazkh80 farazkh80 changed the title reproduced results for pygaggle/docs/experiments-msmarco-passage-subset.md reproduced results and updated pygaggle/docs/experiments-msmarco-passage-subset.md Dec 29, 2022
@farazkh80
Member Author

Added "What's going on?" toggle blocks to illustrate how re-ranking changes the relevance of the top hit for a given qid.

For each "What's going on?" toggle block:

  1. Show the head of each generated run file
  2. Choose the first line of the run file
  3. Grep the qid and docid to show the corresponding text of the query and the passage
  4. Check the actual relevance by looking up the qrels file and checking whether the qid and docid appear together as a match.
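
The steps above can also be sketched in Python instead of `head` and `grep`. This is only an illustration under assumed file layouts: the run file is taken to be tab-separated with qid and docid as the first two columns, the query and passage files to be `id<TAB>text`, and the qrels file to be whitespace-separated `qid 0 docid label`. The function names are hypothetical and not part of pygaggle.

```python
def first_hit(run_lines):
    """Return (qid, docid) from the first line of a run file.

    Assumes tab-separated lines whose first two columns are qid and docid.
    """
    for line in run_lines:
        qid, docid, *_ = line.rstrip("\n").split("\t")
        return qid, docid
    raise ValueError("run file is empty")


def lookup_text(tsv_lines, key):
    """Return the text for `key` in `id<TAB>text` lines, or None if absent."""
    for line in tsv_lines:
        ident, _, text = line.rstrip("\n").partition("\t")
        if ident == key:
            return text
    return None


def is_relevant(qrels_lines, qid, docid):
    """Return True if the qrels list (qid, docid) with a positive label.

    Assumes whitespace-separated `qid 0 docid label` lines.
    """
    for line in qrels_lines:
        parts = line.split()
        if len(parts) == 4 and parts[0] == qid and parts[2] == docid:
            return int(parts[3]) > 0
    return False
```

Each function accepts any iterable of lines, so an open file handle works directly, for example `with open("runs/run.monobert.ans_small.dev.tsv") as f: qid, docid = first_hit(f)`.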

@rodrigonogueira4
Member

Thanks for doing this! Could you please also add pip install faiss-cpu to the instructions?

@farazkh80
Member Author

Added faiss-cpu installation!

scores
```

Let's also download the MS MARCO passage dataset to visualize the actual passages after re-ranking.
Member

Are we only downloading it to visualize it?

Member Author

Yes, since data/msmarco_ans_small itself does not include the passages. Any suggestions on how we can download only the passages corresponding to data/msmarco_ans_small?

Member

How were we reranking without this though?

@farazkh80 farazkh80 requested review from ronakice and rodrigonogueira4 and removed request for rodrigonogueira4 and ronakice January 9, 2023 18:48
3 participants