Enhanced data type handling and improved code stability in computing functions #3
Hi @Martyniqo, this is great feedback! We will review your code and get back to you soon! Also, would you consider creating a pull request for your changes? We can review and merge your code accordingly. Thanks for your effort!
Hi,
Thank you for your response!
I’ll create the pull request asap
Hi @Martyniqo, are you running the example here with only the backbone LLM changed? I ran the code below but couldn't reproduce the error.

```python
from ragchecker import RAGResults, RAGChecker
from ragchecker.metrics import all_metrics

# initialize ragresults from json/dict
with open("examples/checking_inputs.json") as fp:
    rag_results = RAGResults.from_json(fp.read())

# set up the evaluator
evaluator = RAGChecker(
    extractor_name='bedrock/meta.llama3-1-8b-instruct-v1:0',
    checker_name='bedrock/meta.llama3-1-8b-instruct-v1:0',
    batch_size_extractor=32, batch_size_checker=32
)

evaluator.evaluate(rag_results, all_metrics)
print(rag_results)
```

Since the input of functions in […]. If you are running RAGChecker on your own data, could you provide some samples leading to the error?
The bug has been fixed by modifying the output formats of the dependency RefChecker. Please install the latest version to avoid the error. Feel free to reopen this issue if you find anything wrong with your data.
Problem:
I encountered an issue in the computation.py file where the computation functions don't always handle different input data types properly. Specifically, when the input is a one-dimensional array or a plain list, the functions can raise errors that prevent them from running correctly.
What happened:
While running the following:

```python
evaluator = RAGChecker(
    extractor_name='bedrock/meta.llama3-1-8b-instruct-v1:0',
    checker_name='bedrock/meta.llama3-1-8b-instruct-v1:0',
    batch_size_extractor=32, batch_size_checker=32
)
evaluator.evaluate(rag_results, all_metrics)
```

I received this error message while trying to compute `retriever_metrics` and `generator_metrics`:

```
Error during evaluation: object of type 'numpy.bool_' has no len()
```
This error suggests that the code tried to take the length of a boolean scalar, which shouldn't happen. It seems the input data wasn't shaped as expected, leading to this issue.
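For illustration, here is a minimal sketch of this failure mode in plain numpy (not RAGChecker's actual call path): reducing a 1D boolean array yields a scalar `numpy.bool_`, and calling `len()` on that scalar raises exactly this error.

```python
import numpy as np

# Minimal sketch, not RAGChecker's actual code path: an axis reduction
# over a 1D boolean array collapses it to a scalar numpy.bool_.
retrieved2answer = np.array([True, False, True])   # 1D instead of the expected 2D
claim_recalled = np.max(retrieved2answer, axis=0)  # -> numpy.bool_, a scalar
len(claim_recalled)  # TypeError: object of type 'numpy.bool_' has no len()
```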
What I changed:
Added Type Checks: I updated several functions (such as compute_precision, compute_recall, and compute_retrieval) to include type checks. The code now verifies that the input is either a numpy array or a list before proceeding, which should help prevent similar errors in the future.
Better Handling of 1D Arrays: In functions such as compute_retrieval and compute_context_utilization, I added logic to check whether the input is a one-dimensional array. If it is, the code adjusts accordingly, which should avoid errors from axis-based operations like np.max (see the sketch below).
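To make the second point concrete, here is a small illustration in plain numpy (outside RAGChecker; the 2D layout of rows as answer claims and columns as retrieved chunks is inferred from the axis arguments in the code below) of why axis-based reductions need the dimensionality guard:

```python
import numpy as np

# Assumed layout: rows = answer claims, columns = retrieved chunks.
matrix = np.array([[True, False], [False, True]])
print(np.max(matrix, axis=1))  # [ True  True] -- per-claim reduction works

# An unexpected 1D input has no axis 1 to reduce over:
vector = np.array([True, False])
# np.max(vector, axis=1) raises an axis-out-of-bounds error, which is
# why the patched code falls back to using the array itself.
```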
Impact on computation accuracy:
While these changes should make the code more robust, there's a chance they could affect the accuracy of the computations: if the input data doesn't match the expected format, the new conditions change which branch the computation takes. I've tested the changes, but I can't rule out that they alter results in some cases. If you notice any issues, or if the changes affect the computations in unintended ways, please advise.
Changes in code:
In `evaluate_precision`:

```python
if isinstance(answer2response, (np.ndarray, list)) and len(answer2response) > 0:
    result.metrics[metrics.precision] = np.mean(answer2response)
else:
    result.metrics[metrics.precision] = 0.
```
In `evaluate_retrieval`:

```python
if isinstance(retrieved2answer, (np.ndarray, list)) and len(retrieved2answer) > 0:
    if isinstance(retrieved2answer[0], (np.ndarray, list)) and len(retrieved2answer[0]) > 0:
        claim_recalled = np.max(retrieved2answer, axis=1)
        result.metrics[metrics.claim_recall] = np.mean(claim_recalled)
        psg_useful = np.max(retrieved2answer, axis=0)
        result.metrics[metrics.context_precision] = np.mean(psg_useful)
    else:
        claim_recalled = retrieved2answer
        result.metrics[metrics.claim_recall] = np.mean(claim_recalled)
        result.metrics[metrics.context_precision] = 0.
else:
    result.metrics[metrics.claim_recall] = 0.
    result.metrics[metrics.context_precision] = 0.
```
In `evaluate_context_utilization`:

```python
if isinstance(retrieved2answer, (np.ndarray, list)) and len(retrieved2answer) > 0:
    if np.ndim(retrieved2answer) == 1 or (np.ndim(retrieved2answer) > 1 and len(retrieved2answer[0]) > 0):
        claim_recalled = np.max(retrieved2answer, axis=1) if np.ndim(retrieved2answer) > 1 else retrieved2answer
        if np.sum(claim_recalled) > 0:
            claim_used = claim_recalled & response2answer
            result.metrics[metrics.context_utilization] = np.sum(claim_used) / np.sum(claim_recalled)
        else:
            result.metrics[metrics.context_utilization] = 0.
    else:
        result.metrics[metrics.context_utilization] = 0.
else:
    result.metrics[metrics.context_utilization] = 0.
```
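As a quick sanity check of the 1D fallback, here is a hypothetical, self-contained version of the utilization logic (the function name is mine, and plain arrays stand in for RAGChecker's result/metrics objects):

```python
import numpy as np

# Hypothetical standalone version of the patched utilization logic,
# with plain arrays standing in for RAGChecker's result/metrics objects.
def context_utilization(retrieved2answer, response2answer):
    retrieved2answer = np.asarray(retrieved2answer)
    response2answer = np.asarray(response2answer)
    if retrieved2answer.size == 0:
        return 0.
    # Collapse chunks per claim for 2D input; use the array as-is for 1D.
    claim_recalled = (np.max(retrieved2answer, axis=1)
                      if retrieved2answer.ndim > 1 else retrieved2answer)
    if np.sum(claim_recalled) == 0:
        return 0.
    claim_used = claim_recalled & response2answer
    return np.sum(claim_used) / np.sum(claim_recalled)

# 2D input (claims x chunks) behaves as before:
print(context_utilization([[True, False], [False, False]], [True, True]))  # 1.0
# 1D input no longer crashes with an axis error:
print(context_utilization([True, False], [True, True]))                    # 1.0
```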
computation-v2.zip