Implement the MUNI/33/0769/2022 R&D project #8

Merged 67 commits on Mar 20, 2023
67 commits
e8df577
evaluation metrics module (#7)
MarekToma Feb 27, 2023
22c0220
ensembles module (#6)
MarekToma Feb 27, 2023
8c327ab
Datasets module (#1)
MarekToma Feb 27, 2023
62a77ce
Preprocessing (#5)
VojtechKalivoda Feb 27, 2023
e5d5d7d
fixed setupfile and changed default eval depth
MarekToma Feb 27, 2023
6726f07
use DocPreprocessingBase
VojtechKalivoda Mar 4, 2023
9119521
removed forking for num_processes=1
MarekToma Mar 5, 2023
2c3a04d
adressed some requested changes
MarekToma Mar 5, 2023
071841b
added preprocessing module to setup
MarekToma Mar 5, 2023
092e4da
added cqadupstack class into datasets module, defined query ordering …
MarekToma Mar 6, 2023
5cd3aa4
updated tests for new ids and fixed judgments loading
MarekToma Mar 6, 2023
f05953d
fixed style checks
MarekToma Mar 6, 2023
fec508f
fixed line breaks in test.beir.test_loader
MarekToma Mar 6, 2023
7ab2ae8
updated beir_cqadupstack notebook, added some docstring to beir.loade…
MarekToma Mar 9, 2023
d467235
fixed some typechecks
MarekToma Mar 9, 2023
d9c2dcd
fixed eval call in beir notebook
MarekToma Mar 9, 2023
f52936c
added cache_location atribute to arqmathdataset
MarekToma Mar 9, 2023
2ff7188
set default cach download location in datasets module
MarekToma Mar 9, 2023
e931ac9
Fix NumPy docstring style
Witiko Mar 10, 2023
c3cf5ff
changed data loading to use datasets module in arqmath notebook
MarekToma Mar 13, 2023
eaa41f9
updated beir notebook and eval with new min MAP and added beir notebo…
MarekToma Mar 16, 2023
9af8d8e
add requirements file with pinned versions of libs
VojtechKalivoda Mar 16, 2023
d12473c
rename core of BM25Plus
VojtechKalivoda Mar 16, 2023
a3bceaf
use lib for BM25
VojtechKalivoda Mar 17, 2023
6dfc34a
update requirements
VojtechKalivoda Mar 17, 2023
94b76be
clean BM25 and readme
VojtechKalivoda Mar 17, 2023
5442bfb
use systems in notebooks
VojtechKalivoda Mar 17, 2023
721f09e
use datasets module in cranfield and trec notebooks
VojtechKalivoda Mar 18, 2023
be70ece
updated description of IR system in cqadupstack notebook
MarekToma Mar 19, 2023
9468d37
changed the multiprocessing implementation ov map in evaluation metri…
MarekToma Mar 19, 2023
237ae96
updated multiprocessing implementation of other metrics and set a chu…
MarekToma Mar 19, 2023
fe92126
changed imap to map
MarekToma Mar 19, 2023
dd50dc9
reverted some changes
MarekToma Mar 19, 2023
8c8c0f8
temporarely reverted some changes for sanity check
MarekToma Mar 19, 2023
8845b2f
put back in changes I made earlier
MarekToma Mar 19, 2023
7ad873b
put back in changes I made earlier
MarekToma Mar 19, 2023
fd28b63
fixed some type errors
MarekToma Mar 19, 2023
1cf35ea
made map eval faster using technique from original implementation
MarekToma Mar 19, 2023
e18360d
fixed some style errors
MarekToma Mar 19, 2023
da8e45d
use datasets module in trec
VojtechKalivoda Mar 20, 2023
ac246de
update notebooks and their scores in README
VojtechKalivoda Mar 20, 2023
33bd92c
update beir notebook
VojtechKalivoda Mar 20, 2023
901a358
In `notebooks/cranfield.ipynb`, replace tfidf with bow and clear outputs
Witiko Mar 20, 2023
028cb5e
Add more details to system section in `notebooks/cranfield.ipynb`
Witiko Mar 20, 2023
0395582
update arqmath and beir notebook
VojtechKalivoda Mar 20, 2023
4bc5b58
In `notebooks/arqmath.ipynb`, improve text and clear outputs
Witiko Mar 20, 2023
923dce3
In `notebooks/trec.ipynb`, improve text and clear outputs
Witiko Mar 20, 2023
43dbc49
In `notebooks/beir*.ipynb`, clear outputs and use `NoneDocPreprocessing`
Witiko Mar 20, 2023
1fd757b
Fix wrong MAP score for `notebooks/beir*.ipynb` in `README.md`
Witiko Mar 20, 2023
26509e3
fixed error in typing
MarekToma Mar 20, 2023
ac2c340
updated min map for arqmath
MarekToma Mar 20, 2023
dc22c72
Update minimum MAP score for `notebooks/beir.ipynb`
Witiko Mar 20, 2023
9cb5f84
Remove duplicate paragraph in `notebooks/arqmath.ipynb`
Witiko Mar 20, 2023
7019e4e
add tutorial notebook
VojtechKalivoda Mar 20, 2023
8042bf8
Use `find_packages` and `parse_requirements` in `setup.py`
Witiko Mar 20, 2023
a4cefb7
Add newline to code block in `notebooks/arqmath.ipynb`
Witiko Mar 20, 2023
bd09ae1
Merge branch 'developer' of github.com:MIR-MU/pv211-utils into developer
Witiko Mar 20, 2023
ebe564c
Fix the use of `parse_requirements` in `setup.py`
Witiko Mar 20, 2023
82f3f17
Pull `preprocessing.*.*` members to `preprocessing`
Witiko Mar 20, 2023
fd5bee7
Loosen `requirements.txt`
Witiko Mar 20, 2023
0761fa7
Loosen `requirements.txt`
Witiko Mar 20, 2023
5df3176
Loosen `requirements.txt`
Witiko Mar 20, 2023
a8a7efa
Add `rank-bm25` to `requirements.txt`
Witiko Mar 20, 2023
1a57376
Minimize parameters passed to `*Evaluation` in `notebooks/*.ipynb`
Witiko Mar 20, 2023
acfb437
Revert "Fix wrong MAP score for `notebooks/beir*.ipynb` in `README.md`"
Witiko Mar 20, 2023
a6aff58
Revert "In `notebooks/beir*.ipynb`, clear outputs and use `NoneDocPre…
Witiko Mar 20, 2023
1a8c40b
Pin `notebooks/cranfield.ipynb` and `trec.ipynb`
Witiko Mar 20, 2023
16 changes: 11 additions & 5 deletions README.md
@@ -10,7 +10,7 @@
[dockerhub]: https://hub.docker.com/repository/docker/miratmu/pv211-utils

This is a Python library that provides an object-oriented interface for
-Cranfield, TREC 6–8, and ARQMath collections. The library also provides an
+Cranfield, TREC 6–8, ARQMath, and Beir collections. The library also provides an
object-oriented interface for building and evaluating information retrieval
search engines for these collections as a part of the [PV211: Introduction to
Information Retrieval][pv211] course taught at [the Faculty of Informatics,
@@ -21,22 +21,28 @@ Masaryk University, Brno, Czech Republic][fimu].

Here are some examples of how you can use the PV211 Utils library:

-- First Term Project: Cranfield Collection (20.75% MAP score)
+- First Term Project: Cranfield Collection (23.24% MAP score)
[![Open in Colab][colab-badge]][cranfield]
[![Open in Jupyter Hub][jupyter-badge]][jupyter]

-- Second Term Project: TREC Collection (10.37% MAP score)
-[![Open in Colab][colab-badge]][trec]
+- Second Term Project: Beir CQADupStack Collection (21.96% MAP score)
+[![Open in Colab][colab-badge]][beir]
[![Open in Jupyter Hub][jupyter-badge]][jupyter]

-- Alternative Second Term Project: ARQMath Collection (0.71% MAP score)
+- Alternative Second Term Project: ARQMath Collection (6.62% MAP score)
Since the new MAP score (6.62%) passes the lower threshold (1.2%), we need to set a new threshold (see also how we set the 1.2% threshold in spring 2022).

[![Open in Colab][colab-badge]][arqmath]
[![Open in Jupyter Hub][jupyter-badge]][jupyter]

+- Pre-2023 Second Term Project: TREC Collection (43.06% MAP score)
+[![Open in Colab][colab-badge]][trec]
+[![Open in Jupyter Hub][jupyter-badge]][jupyter]


[colab-badge]: https://colab.research.google.com/assets/colab-badge.svg
[jupyter-badge]: https://github.com/MIR-MU/pv211-utils/raw/main/jupyterhub-badge.svg

[jupyter]: https://iirhub.cloud.e-infra.cz/
[cranfield]: https://colab.research.google.com/github/MIR-MU/pv211-utils/blob/main/notebooks/cranfield.ipynb
[trec]: https://colab.research.google.com/github/MIR-MU/pv211-utils/blob/main/notebooks/trec.ipynb
[arqmath]: https://colab.research.google.com/github/MIR-MU/pv211-utils/blob/main/notebooks/arqmath.ipynb
+[beir]: https://colab.research.google.com/github/MIR-MU/pv211-utils/blob/main/notebooks/beir_cqadupstack.ipynb
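
For reference, the MAP scores quoted in the README diff above are mean average precision values. The following is a minimal, generic sketch of how such a score and a minimum-score check could be computed. It is not the pv211-utils evaluation code: the function names, the toy rankings and judgments, and the `MINIMUM_MAP` constant are all hypothetical and chosen only for illustration (the constant merely mirrors the 1.2% threshold mentioned in the review comment).

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list of document ids.

    Assumes the ranking contains no duplicate document ids.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]],
    judgments: Dict[str, Set[str]],
) -> float:
    """Mean of the per-query average precision over all judged queries."""
    if not judgments:
        return 0.0
    total = sum(
        average_precision(rankings.get(query_id, []), relevant)
        for query_id, relevant in judgments.items()
    )
    return total / len(judgments)


# Hypothetical toy data: two queries with ranked results and relevance judgments.
rankings = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
judgments = {"q1": {"d1", "d3"}, "q2": {"d5"}}

# Hypothetical minimum score, expressed as a fraction (1.2%).
MINIMUM_MAP = 0.012

map_score = mean_average_precision(rankings, judgments)
passes_threshold = map_score >= MINIMUM_MAP
print(f"MAP: {map_score:.2%}, passes threshold: {passes_threshold}")
```

In this toy case, query `q1` has an average precision of (1/1 + 2/3) / 2 and query `q2` has 1/2, so the mean lands at two thirds. A baseline that scores well above the minimum, as in the ARQMath review comment, is a signal that the minimum may need to be raised.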