Create results class #54
Conversation
src/rank_llm/result.py
values = []
for info in result.ranking_exec_summary:
    values.append(info.__dict__)
exec_summary[key] = values
Would this work if we have duplicate queries in our dataset? Feels like things might be a bit off in that case.
No, it would not, since the query is the key. But this is identical to the current behavior in the input token file, where the query is also the key:
{
    "How much impact do masks have on preventing the spread of the COVID-19?": [
        35947,
        810
    ],
    ...
}
I can change this to an array of dictionaries instead, with "query" and "ranking_exec_summary" as keys. WDYT?
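The array-of-dicts serialization proposed above might look like the following sketch. The `RankingExecInfo` field names here are assumptions for illustration; only the "query"/"ranking_exec_summary" key shape comes from the discussion.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class RankingExecInfo:
    # Hypothetical fields; the real class in the repo likely differs.
    prompt: str
    response: str


@dataclass
class Result:
    query: str
    ranking_exec_summary: list = field(default_factory=list)


def to_exec_summary(results):
    """Serialize results as an array of dicts so that duplicate queries
    each keep their own entry instead of overwriting a shared dict key."""
    return [
        {
            "query": r.query,
            "ranking_exec_summary": [asdict(info) for info in r.ranking_exec_summary],
        }
        for r in results
    ]
```

With a list, two results for the same query yield two entries, which is exactly what the query-keyed dictionary could not represent.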
done
Yep, that array idea works better! Thanks.
@@ -169,28 +170,21 @@ def retrieve_and_store(
Path(f"retrieve_results/{self._retrieval_method.name}").mkdir(
Can this go in the writer somewhere too?
The writer is used after the reranking too. That's why I let the retriever and reranker identify the target file.
    f"{hit['qid']} Q0 {hit['docid']} {hit['rank']} {hit['score']} rank\n"
)
writer.write_in_trec_eval_format(
    f"retrieve_results/{self._retrieval_method.name}/trec_results_{self._dataset}.txt"
Can this naming be automated in writer too?
I prefer to keep it this way since: 1- the writer is shared between the retriever and reranker; 2- all of this information about the dataset, retrieval method, etc. would have to be passed to the writer for it to be able to generate the names properly.
LGTM!
This CL:
1- creates a `Result` class and integrates it into the retriever. In all retrieval modes the retriever now returns a list[Result] rather than a list[dict].
2- creates a helper `ResultsWriter` class to take care of writing results to files.
Changing the reranker to read from this new result type will be submitted in a follow-up CL.
TESTED=ran all demos of the different retrieval modes.
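A rough sketch of the shapes described in this CL, under stated assumptions: the class names `Result` and `ResultsWriter` and the method `write_in_trec_eval_format` appear in the discussion, but the field names, the JSON method, and the hit layout are guesses for illustration only.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Result:
    query: str
    hits: list = field(default_factory=list)  # ranked hits as dicts (assumed layout)
    ranking_exec_summary: list = field(default_factory=list)


class ResultsWriter:
    """Helper that takes a list[Result] and writes it to files.

    The retriever/reranker still chooses the target path, per the
    discussion above; the writer only handles formatting.
    """

    def __init__(self, results):
        self._results = results

    def write_in_json_format(self, path):
        data = [{"query": r.query, "hits": r.hits} for r in self._results]
        with open(path, "w") as f:
            json.dump(data, f, indent=2)

    def write_in_trec_eval_format(self, path):
        # Standard six-column TREC run format: qid Q0 docid rank score tag
        with open(path, "w") as f:
            for r in self._results:
                for hit in r.hits:
                    f.write(
                        f"{hit['qid']} Q0 {hit['docid']} {hit['rank']} {hit['score']} rank\n"
                    )
```

Callers would build a writer from the retriever's list[Result] and pick the output format per file, keeping path construction with the retriever or reranker.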