reproduced results and updated pygaggle/docs/experiments-msmarco-passage-subset.md #309

farazkh80 · 2022-12-26T02:38:00Z

Successfully reproduced the same numerical results for pygaggle/docs/experiments-msmarco-passage-subset.md on a Colab env with a T4 GPU.

Encountered a small issue with the Python dependencies needed to evaluate using monoBERT when running:

python -um pygaggle.run.evaluate_passage_ranker --split dev \
                                                --method seq_class_transformer \
                                                --model castorini/monobert-large-msmarco \
                                                --dataset data/msmarco_ans_small/ \
                                                --index-dir indexes/index-msmarco-passage-20191117-0ed488 \
                                                --task msmarco \
                                                --output-file runs/run.monobert.ans_small.dev.tsv

The error log was:

2022-12-26 02:37:05.453924: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-12-26 02:37:08 [INFO] utils: NumExpr defaulting to 2 threads.
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/content/pygaggle/pygaggle/run/evaluate_passage_ranker.py", line 13, in <module>
    from pygaggle.rerank.base import Reranker
  File "/content/pygaggle/pygaggle/rerank/base.py", line 5, in <module>
    from pyserini.search import JLuceneSearcherResult
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/__init__.py", line 19, in <module>
    from .lucene import JLuceneSearcherResult, LuceneSimilarities, LuceneFusionSearcher, LuceneSearcher
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/lucene/__init__.py", line 18, in <module>
    from ._impact_searcher import JImpactSearcherResult, LuceneImpactSearcher
  File "/usr/local/lib/python3.8/dist-packages/pyserini/search/lucene/_impact_searcher.py", line 28, in <module>
    from pyserini.encode import QueryEncoder, TokFreqQueryEncoder, UniCoilQueryEncoder, \
  File "/usr/local/lib/python3.8/dist-packages/pyserini/encode/__init__.py", line 17, in <module>
    from ._base import DocumentEncoder, QueryEncoder, JsonlCollectionIterator,\
  File "/usr/local/lib/python3.8/dist-packages/pyserini/encode/_base.py", line 19, in <module>
    import faiss
ModuleNotFoundError: No module named 'faiss'

Fix:

pip install faiss-cpu
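
To catch this before launching a long evaluation run, a quick preflight check along the following lines may help. This is only a sketch: the helper name missing_modules is hypothetical and is not part of pygaggle or pyserini; it relies solely on importlib.util.find_spec from the Python standard library, and the only module checked is the one named in the traceback above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # 'faiss' is the module named in the ModuleNotFoundError above.
    missing = missing_modules(["faiss"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If faiss is reported as missing, installing faiss-cpu as shown above should resolve it.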

farazkh80 · 2022-12-29T06:48:26Z

Added "What's going on?" toggle blocks that illustrate how re-ranking changes the relevance of the top hit for a given qid.

Each "What's going on?" toggle block does the following:

1. Shows the head of each generated run file.
2. Takes the first line of the run file.
3. Greps for the qid and docid to show the corresponding query and passage text.
4. Checks relevance by looking up the qrels file and checking whether the qid and docid appear together as a match.
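
The same check can be sketched in Python. This is an illustration rather than what the doc itself does (the doc uses shell commands such as head and grep). The function names first_hit and is_relevant are hypothetical, and the file layouts are assumptions: a tab-separated run file with qid, docid, rank per line, and a qrels file with qid, 0, docid, relevance per line.

```python
import csv


def first_hit(run_path):
    """Return (qid, docid) from the first line of a run file.

    Assumes tab-separated lines of the form: qid, docid, rank.
    """
    with open(run_path, newline="") as f:
        row = next(csv.reader(f, delimiter="\t"))
    return row[0], row[1]


def is_relevant(qrels_path, qid, docid):
    """Check whether (qid, docid) is judged relevant in a qrels file.

    Assumes whitespace-separated lines of the form: qid, 0, docid, relevance.
    """
    with open(qrels_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 4 and parts[0] == qid and parts[2] == docid:
                return int(parts[3]) > 0
    # No matching judgment found: treat the pair as not relevant.
    return False
```

For example, first_hit could be pointed at runs/run.monobert.ans_small.dev.tsv and its result passed to is_relevant together with the path to the downloaded qrels file.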

rodrigonogueira4 · 2022-12-29T10:58:22Z

Thanks for doing this! Could you please also add pip install faiss-cpu to the instructions?

farazkh80 · 2023-01-02T23:40:36Z

Added faiss-cpu installation!

ronakice · 2023-01-04T17:55:51Z

docs/experiments-msmarco-passage-subset.md

+scores
+``` 
+
+Let's also download MS MARCO passage dataset to visualize the actual passages after re-ranking.


Are we only downloading it to visualize it?

farazkh80 · Jan 4, 2023

Yes, since data/msmarco_ans_small itself does not include the passages. Any suggestions on how we could download only the passages corresponding to data/msmarco_ans_small?

ronakice · Jan 10, 2023

How were we reranking without this, though?

farazkh80 and others added 4 commits December 25, 2022 21:29

reproduced results for pygaggle/docs/experiments-msmarco-passage-subs…

9805e79

…et.md

added pre and post re-rank example visualization

8dfd94f

fixed some spelling errors

0abab7b

formatted

eda23ea

farazkh80 changed the title ~~reproduced results for pygaggle/docs/experiments-msmarco-passage-subset.md~~ reproduced results and updated pygaggle/docs/experiments-msmarco-passage-subset.md Dec 29, 2022

rodrigonogueira4 approved these changes Dec 29, 2022

added faiss instalation

f71e5f0

farazkh80 requested a review from rodrigonogueira4 January 2, 2023 23:40

ronakice reviewed Jan 4, 2023

ronakice reviewed Jan 4, 2023

farazkh80 requested review from ronakice and rodrigonogueira4 and removed request for rodrigonogueira4 and ronakice January 9, 2023 18:48

ronakice Jan 4, 2023

farazkh80 Jan 4, 2023

ronakice Jan 10, 2023
