Docs update
AmenRa committed Sep 2, 2023
1 parent 542fa42 commit f2dde80
Showing 6 changed files with 135 additions and 49 deletions.
52 changes: 36 additions & 16 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,6 @@

## 🔥 News

- 📌 [April 4, 2023] [ranxhub](https://amenra.github.io/ranxhub), the [ranx](https://github.com/AmenRa/ranx)'s companion repository, will be featured in [SIGIR 2023](https://sigir.org/sigir2023)!
On [ranxhub](https://amenra.github.io/ranxhub), you can download and share pre-computed runs for Information Retrieval datasets, such as [MSMARCO Passage Ranking](https://arxiv.org/abs/1611.09268).

- [August 3 2023] `ranx` `0.3.16` is out!
This release adds support for importing Qrels and Runs from `parquet` files, exporting them as `pandas.DataFrame`, and saving them as `parquet` files.
Any dependence on `trec_eval` has been removed to make `ranx` truly MIT-compliant.
Expand All @@ -34,9 +31,13 @@ Any dependence on `trec_eval` have been removed to make `ranx` truly MIT-complia
It offers a user-friendly interface to evaluate and compare [Information Retrieval](https://en.wikipedia.org/wiki/Information_retrieval) and [Recommender Systems](https://en.wikipedia.org/wiki/Recommender_system).
[ranx](https://github.com/AmenRa/ranx) allows you to perform statistical tests and export [LaTeX](https://en.wikipedia.org/wiki/LaTeX) tables for your scientific publications.
Moreover, [ranx](https://github.com/AmenRa/ranx) provides several [fusion algorithms](https://amenra.github.io/ranx/fusion) and [normalization strategies](https://amenra.github.io/ranx/normalization), and an automatic [fusion optimization](https://amenra.github.io/ranx/fusion/#optimize-fusion) functionality.
[ranx](https://github.com/AmenRa/ranx) was featured in [ECIR 2022](https://ecir2022.org) and [CIKM 2022](https://www.cikm2022.org).
[ranx](https://github.com/AmenRa/ranx) also has a companion repository of pre-computed runs to facilitate model comparisons called [ranxhub](https://amenra.github.io/ranxhub).
On [ranxhub](https://amenra.github.io/ranxhub), you can download and share pre-computed runs for Information Retrieval datasets, such as [MSMARCO Passage Ranking](https://arxiv.org/abs/1611.09268).
[ranx](https://github.com/AmenRa/ranx) was featured in [ECIR 2022](https://ecir2022.org), [CIKM 2022](https://www.cikm2022.org), and [SIGIR 2023](https://sigir.org/sigir2023).

If you use [ranx](https://github.com/AmenRa/ranx) to evaluate results or conducting experiments involving fusion for your scientific publication, please consider citing it: [evaluation bibtex](https://dblp.org/rec/conf/ecir/Bassani22.html?view=bibtex), [fusion bibtex](https://dblp.org/rec/conf/cikm/BassaniR22.html?view=bibtex).
If you use [ranx](https://github.com/AmenRa/ranx) to evaluate results or conduct experiments involving fusion for your scientific publication, please consider citing it: [evaluation bibtex](https://dblp.org/rec/conf/ecir/Bassani22.html?view=bibtex), [fusion bibtex](https://dblp.org/rec/conf/cikm/BassaniR22.html?view=bibtex), [ranxhub bibtex](https://dblp.org/rec/conf/sigir/Bassani23.html?view=bibtex).

NB: `ranx` is not suited for evaluating classifiers. Please refer to the [FAQ](https://amenra.github.io/ranx/faq) for further details.

For a quick overview, follow the [Usage](#-usage) section.

Expand Down Expand Up @@ -219,15 +220,16 @@ If you use [ranx](https://github.com/AmenRa/ranx) to evaluate results for your s
<summary>BibTeX</summary>

```bibtex
@inproceedings{DBLP:conf/ecir/Bassani22,
author = {Elias Bassani},
title = {ranx: {A} Blazing-Fast Python Library for Ranking Evaluation and Comparison},
booktitle = {{ECIR} {(2)}},
series = {Lecture Notes in Computer Science},
volume = {13186},
pages = {259--264},
publisher = {Springer},
year = {2022}
@inproceedings{ranx,
author = {Elias Bassani},
title = {ranx: {A} Blazing-Fast Python Library for Ranking Evaluation and Comparison},
booktitle = {{ECIR} {(2)}},
series = {Lecture Notes in Computer Science},
volume = {13186},
pages = {259--264},
publisher = {Springer},
year = {2022},
doi = {10.1007/978-3-030-99739-7\_30}
}
```
</details>
Expand All @@ -237,18 +239,36 @@ If you use the fusion functionalities provided by [ranx](https://github.com/Amen
<summary>BibTeX</summary>

```bibtex
@inproceedings{DBLP:conf/cikm/BassaniR22,
@inproceedings{ranx.fuse,
author = {Elias Bassani and
Luca Romelli},
title = {ranx.fuse: {A} Python Library for Metasearch},
booktitle = {{CIKM}},
pages = {4808--4812},
publisher = {{ACM}},
year = {2022}
year = {2022},
doi = {10.1145/3511808.3557207}
}
```
</details>

If you use pre-computed runs from [ranxhub](https://amenra.github.io/ranxhub) to make comparisons for your scientific publication, please consider citing our [SIGIR 2023](https://sigir.org/sigir2023) paper:
<details>
<summary>BibTeX</summary>

```bibtex
@inproceedings{ranxhub,
author = {Elias Bassani},
title = {ranxhub: An Online Repository for Information Retrieval Runs},
booktitle = {{SIGIR}},
pages = {3210--3214},
publisher = {{ACM}},
year = {2023},
doi = {10.1145/3539618.3591823}
}
```
</details>

## 🎁 Feature Requests
Would you like to see other features implemented? Please, open a [feature request](https://github.com/AmenRa/ranx/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=%5BFeature+Request%5D+title).

Expand Down
9 changes: 9 additions & 0 deletions docs/faq.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# FAQ

## Is `ranx` suited for evaluating classification tasks?
No, it's not. `ranx` is meant for ranking tasks. Although some metrics are commonly used to evaluate both tasks (e.g., `precision` and `recall`), the relevance scores stored in `runs` should not be confused with the predicted class labels of a classification task. Relevance scores are used by `ranx` to sort results before computing the metrics, regardless of their actual values.

## Are zero and negative scored results filtered out by `ranx`?
Zero and negative scored results are NOT filtered out by `ranx`.
Relevance scores are used only for sorting, and there is no constraint on the values produced by ranking models, although some of them only output positive values.
Therefore, if you think that zero and negative scored results should be filtered out, you should do it before passing the `runs` to `ranx`.
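
The filtering step mentioned above can be sketched in plain Python. This is only an illustration, assuming the run is still a nested dictionary of the form `{q_id: {doc_id: score}}`; the helper name `drop_non_positive` is hypothetical and is not part of `ranx`.

```python
from typing import Dict

RunDict = Dict[str, Dict[str, float]]


def drop_non_positive(run_dict: RunDict) -> RunDict:
    """Return a copy of run_dict keeping only results with a score above zero.

    Hypothetical helper for illustration; ranx does not provide it.
    """
    return {
        q_id: {doc_id: score for doc_id, score in docs.items() if score > 0}
        for q_id, docs in run_dict.items()
    }


# Made-up scores, including zero and negative values.
run_dict = {
    "q_1": {"d_1": 1.5, "d_2": 0.0, "d_3": -0.7},
    "q_2": {"d_4": 0.2, "d_5": -1.0},
}

filtered = drop_non_positive(run_dict)
```

The filtered dictionary can then be converted to a `Run` as usual. Note that a query whose results are all removed keeps an empty dictionary in this sketch; whether to keep or drop such queries is a separate decision.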
52 changes: 36 additions & 16 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,6 @@

## 🔥 News

- 📌 [April 4, 2023] [ranxhub](https://amenra.github.io/ranxhub), the [ranx](https://github.com/AmenRa/ranx)'s companion repository, will be featured in [SIGIR 2023](https://sigir.org/sigir2023)!
On [ranxhub](https://amenra.github.io/ranxhub), you can download and share pre-computed runs for Information Retrieval datasets, such as [MSMARCO Passage Ranking](https://arxiv.org/abs/1611.09268).

- [August 3 2023] `ranx` `0.3.16` is out!
This release adds support for importing Qrels and Runs from `parquet` files, exporting them as `pandas.DataFrame`, and saving them as `parquet` files.
Any dependence on `trec_eval` has been removed to make `ranx` truly MIT-compliant.
Expand All @@ -34,9 +31,13 @@ Any dependence on `trec_eval` have been removed to make `ranx` truly MIT-complia
It offers a user-friendly interface to evaluate and compare [Information Retrieval](https://en.wikipedia.org/wiki/Information_retrieval) and [Recommender Systems](https://en.wikipedia.org/wiki/Recommender_system).
[ranx](https://github.com/AmenRa/ranx) allows you to perform statistical tests and export [LaTeX](https://en.wikipedia.org/wiki/LaTeX) tables for your scientific publications.
Moreover, [ranx](https://github.com/AmenRa/ranx) provides several [fusion algorithms](https://amenra.github.io/ranx/fusion) and [normalization strategies](https://amenra.github.io/ranx/normalization), and an automatic [fusion optimization](https://amenra.github.io/ranx/fusion/#optimize-fusion) functionality.
[ranx](https://github.com/AmenRa/ranx) was featured in [ECIR 2022](https://ecir2022.org) and [CIKM 2022](https://www.cikm2022.org).
[ranx](https://github.com/AmenRa/ranx) also has a companion repository of pre-computed runs to facilitate model comparisons called [ranxhub](https://amenra.github.io/ranxhub).
On [ranxhub](https://amenra.github.io/ranxhub), you can download and share pre-computed runs for Information Retrieval datasets, such as [MSMARCO Passage Ranking](https://arxiv.org/abs/1611.09268).
[ranx](https://github.com/AmenRa/ranx) was featured in [ECIR 2022](https://ecir2022.org), [CIKM 2022](https://www.cikm2022.org), and [SIGIR 2023](https://sigir.org/sigir2023).

If you use [ranx](https://github.com/AmenRa/ranx) to evaluate results or conducting experiments involving fusion for your scientific publication, please consider citing it: [evaluation bibtex](https://dblp.org/rec/conf/ecir/Bassani22.html?view=bibtex), [fusion bibtex](https://dblp.org/rec/conf/cikm/BassaniR22.html?view=bibtex).
If you use [ranx](https://github.com/AmenRa/ranx) to evaluate results or conduct experiments involving fusion for your scientific publication, please consider citing it: [evaluation bibtex](https://dblp.org/rec/conf/ecir/Bassani22.html?view=bibtex), [fusion bibtex](https://dblp.org/rec/conf/cikm/BassaniR22.html?view=bibtex), [ranxhub bibtex](https://dblp.org/rec/conf/sigir/Bassani23.html?view=bibtex).

NB: `ranx` is not suited for evaluating classifiers. Please refer to the [FAQ](https://amenra.github.io/ranx/faq) for further details.

For a quick overview, follow the [Usage](#-usage) section.

Expand Down Expand Up @@ -219,15 +220,16 @@ If you use [ranx](https://github.com/AmenRa/ranx) to evaluate results for your s
<summary>BibTeX</summary>

```bibtex
@inproceedings{DBLP:conf/ecir/Bassani22,
author = {Elias Bassani},
title = {ranx: {A} Blazing-Fast Python Library for Ranking Evaluation and Comparison},
booktitle = {{ECIR} {(2)}},
series = {Lecture Notes in Computer Science},
volume = {13186},
pages = {259--264},
publisher = {Springer},
year = {2022}
@inproceedings{ranx,
author = {Elias Bassani},
title = {ranx: {A} Blazing-Fast Python Library for Ranking Evaluation and Comparison},
booktitle = {{ECIR} {(2)}},
series = {Lecture Notes in Computer Science},
volume = {13186},
pages = {259--264},
publisher = {Springer},
year = {2022},
doi = {10.1007/978-3-030-99739-7\_30}
}
```
</details>
Expand All @@ -237,18 +239,36 @@ If you use the fusion functionalities provided by [ranx](https://github.com/Amen
<summary>BibTeX</summary>

```bibtex
@inproceedings{DBLP:conf/cikm/BassaniR22,
@inproceedings{ranx.fuse,
author = {Elias Bassani and
Luca Romelli},
title = {ranx.fuse: {A} Python Library for Metasearch},
booktitle = {{CIKM}},
pages = {4808--4812},
publisher = {{ACM}},
year = {2022}
year = {2022},
doi = {10.1145/3511808.3557207}
}
```
</details>

If you use pre-computed runs from [ranxhub](https://amenra.github.io/ranxhub) to make comparisons for your scientific publication, please consider citing our [SIGIR 2023](https://sigir.org/sigir2023) paper:
<details>
<summary>BibTeX</summary>

```bibtex
@inproceedings{ranxhub,
author = {Elias Bassani},
title = {ranxhub: An Online Repository for Information Retrieval Runs},
booktitle = {{SIGIR}},
pages = {3210--3214},
publisher = {{ACM}},
year = {2023},
doi = {10.1145/3539618.3591823}
}
```
</details>

## 🎁 Feature Requests
Would you like to see other features implemented? Please, open a [feature request](https://github.com/AmenRa/ranx/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=%5BFeature+Request%5D+title).

Expand Down
29 changes: 23 additions & 6 deletions docs/qrels.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,15 +25,16 @@ Qrels can also be loaded from TREC-style and JSON files, from [ir-datasets](http

## Load from files
Parse a qrels file into `ranx.Qrels`.
Supported formats are JSON and TREC qrels format.
Correct import behavior is inferred from the file extension: `.json` `json`, `.trec` `trec`, `.txt` `trec`.
Supported formats are JSON, TREC qrels, and gzipped TREC qrels.
Correct import behavior is inferred from the file extension: `.json` -> `json`, `.trec` -> `trec`, `.txt` -> `trec`, `.gz` -> `gzipped trec`.
Use the `kind` argument to override the default behavior.


```python
qrels = Qrels.from_file("path/to/qrels.json") # JSON file
qrels = Qrels.from_file("path/to/qrels.trec") # TREC-Style file
qrels = Qrels.from_file("path/to/qrels.txt") # TREC-Style file with txt extension
qrels = Qrels.from_file("path/to/qrels.gz") # Gzipped TREC-Style file
qrels = Qrels.from_file("path/to/qrels.custom", kind="json") # Loaded as JSON file
```
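
The extension-based inference described above can be pictured roughly as follows. This is a minimal sketch, not the actual `ranx` implementation: the function name `infer_qrels_kind` and the error handling are assumptions, and only the extension mapping is taken from the documentation text.

```python
from pathlib import Path
from typing import Optional

# Mapping copied from the documentation text; everything else is hypothetical.
QRELS_KIND_BY_EXTENSION = {
    ".json": "json",
    ".trec": "trec",
    ".txt": "trec",
    ".gz": "gzipped trec",
}


def infer_qrels_kind(path: str, kind: Optional[str] = None) -> str:
    """Return the explicit kind if given, otherwise infer it from the extension."""
    if kind is not None:
        # An explicit `kind` overrides the default behavior.
        return kind
    suffix = Path(path).suffix  # last extension only, e.g. ".gz"
    if suffix not in QRELS_KIND_BY_EXTENSION:
        raise ValueError(f"Cannot infer kind from extension {suffix!r}; pass `kind`.")
    return QRELS_KIND_BY_EXTENSION[suffix]
```

For example, `infer_qrels_kind("path/to/qrels.custom", kind="json")` returns `"json"`, while the same path without `kind` raises a `ValueError` in this sketch.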

Expand Down Expand Up @@ -62,14 +63,30 @@ qrels = Qrels.from_df(
)
```

## Load from Parquet files
`ranx` can load `qrels` from Parquet files, even from remote sources.
You can control the behavior of the underlying `pandas.read_parquet` function by passing additional arguments through the `pd_kwargs` argument (see https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html).

```python
qrels = Qrels.from_parquet(
path="/path/to/parquet/file",
q_id_col="q_id",
doc_id_col="doc_id",
score_col="score",
pd_kwargs=None,
)
```

## Save
Write `qrels` to `path` as JSON file or TREC qrels format.
File type is automatically inferred form the filename extension: `.json` `json`, `.trec` `trec`, `.txt` `trec`.
File type is automatically inferred from the filename extension: `.json` -> `json`, `.trec` -> `trec`, `.txt` -> `trec`, `.parq` -> `parquet`, `.parquet` -> `parquet`.
Use the `kind` argument to override the default behavior.

```python
qrels.save("path/to/qrels.json") # Save as JSON file
qrels.save("path/to/qrels.trec") # Save as TREC-Style file
qrels.save("path/to/qrels.txt") # Save as TREC-Style file with txt extension
qrels.save("path/to/qrels.json") # Save as JSON file
qrels.save("path/to/qrels.trec") # Save as TREC-Style file
qrels.save("path/to/qrels.txt") # Save as TREC-Style file with txt extension
qrels.save("path/to/qrels.parq") # Save as Parquet file
qrels.save("path/to/qrels.parquet") # Save as Parquet file
qrels.save("path/to/qrels.custom", kind="json") # Save as JSON file
```
40 changes: 30 additions & 10 deletions docs/run.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
# Run

`Run` stores the relevance scores estimated by the model under evaluation.
`Run` stores the relevance scores estimated by the model under evaluation.
There is no constraint on the score values, i.e., zero and negative scores are not removed.
The preferred way for creating a `Run` instance is converting a Python dictionary as follows:

```python
Expand All @@ -25,14 +26,16 @@ run = Run(run_dict, name="bm25")

## Load from Files
Parse a run file into `ranx.Run`.
Supported formats are JSON and TREC run format.
Correct import behavior is inferred from the file extension: `.json` `json`, `.trec` `trec`, `.txt` `trec`.
Supported formats are JSON, TREC run, gzipped TREC run, and LZ4.
Correct import behavior is inferred from the file extension: `.json` -> `json`, `.trec` -> `trec`, `.txt` -> `trec`, `.gz` -> `trec`, `.lz4` -> `lz4`.
Use the `kind` argument to override the default behavior.

```python
run = Run.from_file("path/to/run.json") # JSON file
run = Run.from_file("path/to/run.trec") # TREC-Style file
run = Run.from_file("path/to/run.txt") # TREC-Style file with txt extension
run = Run.from_file("path/to/run.gz") # Gzipped TREC-Style file
run = Run.from_file("path/to/run.lz4") # lz4 file produced by saving a ranx.Run as lz4
run = Run.from_file("path/to/run.custom", kind="json") # Loaded as JSON file
```
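
To make the gzipped TREC case concrete, the sketch below decompresses a small in-memory example and parses it into the nested dictionary that `Run` is built from. It assumes the standard six-column TREC run layout (`q_id Q0 doc_id rank score tag`); the function `parse_gzipped_trec_run` is hypothetical and is not how `ranx` parses files internally.

```python
import gzip
from collections import defaultdict
from typing import Dict


def parse_gzipped_trec_run(data: bytes) -> Dict[str, Dict[str, float]]:
    """Parse gzip-compressed TREC run lines into {q_id: {doc_id: score}}."""
    run: Dict[str, Dict[str, float]] = defaultdict(dict)
    text = gzip.decompress(data).decode("utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Columns: q_id, "Q0", doc_id, rank, score, run tag
        q_id, _, doc_id, _, score, _ = line.split()
        run[q_id][doc_id] = float(score)
    return dict(run)


# Made-up run lines, compressed in memory instead of read from disk.
sample = "\n".join(
    [
        "q_1 Q0 d_1 1 0.9 bm25",
        "q_1 Q0 d_2 2 0.4 bm25",
        "q_2 Q0 d_3 1 -0.1 bm25",
    ]
)
run_dict = parse_gzipped_trec_run(gzip.compress(sample.encode("utf-8")))
```

The negative score for `d_3` is kept as-is, in line with the note that score values are not constrained.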

Expand All @@ -46,23 +49,40 @@ run_df = DataFrame.from_dict({
"score": [ 0.5, 0.3, 0.6, 0.1 ],
})

run = Runs.from_df(
run = Run.from_df(
df=run_df,
q_id_col="q_id",
doc_id_col="doc_id",
score_col="score",
)
```
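
As a rough picture of what the three column arguments select, the sketch below groups row records into the nested dictionary form (`Dict[str, Dict[str, float]]`) accepted by `Run.from_dict`. It uses plain dictionaries instead of a DataFrame so that it runs without pandas; `rows_to_run_dict` and the IDs are hypothetical, and this is not the code `ranx` uses.

```python
from typing import Dict, Iterable, Mapping


def rows_to_run_dict(
    rows: Iterable[Mapping[str, object]],
    q_id_col: str = "q_id",
    doc_id_col: str = "doc_id",
    score_col: str = "score",
) -> Dict[str, Dict[str, float]]:
    """Group rows by query ID into {q_id: {doc_id: score}}."""
    run_dict: Dict[str, Dict[str, float]] = {}
    for row in rows:
        q_id = str(row[q_id_col])
        doc_id = str(row[doc_id_col])
        run_dict.setdefault(q_id, {})[doc_id] = float(row[score_col])
    return run_dict


# Made-up rows; each one corresponds to one DataFrame row.
rows = [
    {"q_id": "q_1", "doc_id": "d_1", "score": 0.5},
    {"q_id": "q_1", "doc_id": "d_2", "score": 0.3},
    {"q_id": "q_2", "doc_id": "d_3", "score": 0.6},
    {"q_id": "q_2", "doc_id": "d_4", "score": 0.1},
]

run_dict = rows_to_run_dict(rows)
```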

## Load from Parquet files
`ranx` can load `runs` from Parquet files, even from remote sources.
You can control the behavior of the underlying `pandas.read_parquet` function by passing additional arguments through the `pd_kwargs` argument (see https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html).

```python
run = Run.from_parquet(
path="/path/to/parquet/file",
q_id_col="q_id",
doc_id_col="doc_id",
score_col="score",
pd_kwargs=None,
)
```

## Save
Write `run` to `path` as JSON file or TREC run format.
File type is automatically inferred form the filename extension: `.json` `json`, `.trec` `trec`, `.txt` `trec`.
Use the `kind` argument to override the default behavior.
Write `run` to `path` as JSON file, TREC run, LZ4 file, or Parquet file.
File type is automatically inferred from the filename extension: `.json` -> `json`, `.trec` -> `trec`, `.txt` -> `trec`, `.lz4` -> `lz4`, `.parq` -> `parquet`, `.parquet` -> `parquet`.
Use the `kind` argument to override this behavior.

```python
run.save("path/to/run.json") # Save as JSON file
run.save("path/to/run.trec") # Save as TREC-Style file
run.save("path/to/run.txt") # Save as TREC-Style file with txt extension
run.save("path/to/run.json") # Save as JSON file
run.save("path/to/run.trec") # Save as TREC-Style file
run.save("path/to/run.txt") # Save as TREC-Style file with txt extension
run.save("path/to/run.lz4") # Save as lz4 file
run.save("path/to/run.parq") # Save as Parquet file
run.save("path/to/run.parquet") # Save as Parquet file
run.save("path/to/run.custom", kind="json") # Save as JSON file
```

Expand Down
2 changes: 1 addition & 1 deletion ranx/data_structures/run.py
Original file line number Diff line number Diff line change
Expand Up @@ -253,7 +253,7 @@ def from_dict(d: Dict[str, Dict[str, float]]):

@staticmethod
def from_file(path: str, kind: str = None, name: str = None):
"""Parse a run file into ranx.Run. Supported formats are JSON, TREC run, gzipped TREC run, and LZ4. Correct import behavior is inferred from the file extension: ".json" -> "json", ".trec" -> "trec", ".txt" -> "trec", ".lz4" -> "lz4". Use the "kind" argument to override this behavior.
"""Parse a run file into ranx.Run. Supported formats are JSON, TREC run, gzipped TREC run, and LZ4. Correct import behavior is inferred from the file extension: ".json" -> "json", ".trec" -> "trec", ".txt" -> "trec", ".gz" -> "gzipped trec", ".lz4" -> "lz4". Use the "kind" argument to override this behavior.
Args:
path (str): File path.
Expand Down
