Enhanced data type handling and improved code stability in computing functions #3
Hi @Martyniqo, this is great feedback! We will review your code and get back to you soon! Also, would you consider creating a pull request for your changes? We can review and merge your code accordingly. Thanks for your effort!
Hi,
Thank you for your response!
I’ll create the pull request asap
Hi @Martyniqo, are you running the example here with only the backbone LLM changed? I ran the code below but couldn't reproduce the error.

```python
from ragchecker import RAGResults, RAGChecker
from ragchecker.metrics import all_metrics

# initialize ragresults from json/dict
with open("examples/checking_inputs.json") as fp:
    rag_results = RAGResults.from_json(fp.read())

# set up the evaluator
evaluator = RAGChecker(
    extractor_name='bedrock/meta.llama3-1-8b-instruct-v1:0',
    checker_name='bedrock/meta.llama3-1-8b-instruct-v1:0',
    batch_size_extractor=32, batch_size_checker=32
)

evaluator.evaluate(rag_results, all_metrics)
print(rag_results)
```

Since the input of functions in […]. If you are running RAGChecker on your own data, could you provide some samples leading to the error?
The bug has been fixed by modifying the output formats of the dependency RefChecker. Please install the latest version to avoid the error. Feel free to reopen this issue if you find anything wrong with your data.
Problem:
I encountered an issue in the computation.py file where the computation functions don't always handle different input data types properly. Specifically, when the input is a one-dimensional array or a plain list, the functions can raise errors that prevent them from running correctly.
What happened:
While running the following:

```python
evaluator = RAGChecker(
    extractor_name='bedrock/meta.llama3-1-8b-instruct-v1:0',
    checker_name='bedrock/meta.llama3-1-8b-instruct-v1:0',
    batch_size_extractor=32, batch_size_checker=32
)
evaluator.evaluate(rag_results, all_metrics)
```

I received this error message while trying to compute `retriever_metrics` and `generator_metrics`:

```
Error during evaluation: object of type 'numpy.bool_' has no len()
```
This error suggests that the code tried to take the length of a boolean scalar, which shouldn't happen. It seems the input data wasn't shaped as expected, leading to this issue.
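For illustration, here is a minimal sketch of this failure mode in plain numpy (not RAGChecker's actual call path): reducing a 1D boolean array yields a scalar `numpy.bool_`, and calling `len()` on that scalar raises exactly this error.

```python
import numpy as np

# Minimal sketch, not RAGChecker's actual code path: an axis reduction
# over a 1D boolean array collapses it to a scalar numpy.bool_.
retrieved2answer = np.array([True, False, True])   # 1D instead of the expected 2D
claim_recalled = np.max(retrieved2answer, axis=0)  # -> numpy.bool_, a scalar
len(claim_recalled)  # TypeError: object of type 'numpy.bool_' has no len()
```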
What I changed:
Added Type Checks: I updated several functions (such as compute_precision, compute_recall, and compute_retrieval) to include type checks. The code now verifies that the input is either a numpy array or a list before proceeding, which should help prevent similar errors in the future.
Better Handling of 1D Arrays: In functions such as compute_retrieval and compute_context_utilization, I added logic to check whether the input is a one-dimensional array. If it is, the code adjusts accordingly, which should avoid errors from axis-based operations like np.max (see the sketch below).
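To make the second point concrete, here is a small illustration in plain numpy (outside RAGChecker; the 2D layout of rows as answer claims and columns as retrieved chunks is inferred from the axis arguments in the code below) of why axis-based reductions need the dimensionality guard:

```python
import numpy as np

# Assumed layout: rows = answer claims, columns = retrieved chunks.
matrix = np.array([[True, False], [False, True]])
print(np.max(matrix, axis=1))  # [ True  True] -- per-claim reduction works

# An unexpected 1D input has no axis 1 to reduce over:
vector = np.array([True, False])
# np.max(vector, axis=1) raises an axis-out-of-bounds error, which is
# why the patched code falls back to using the array itself.
```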
Impact on computation accuracy:
While these changes should make the code more robust, there's a chance they could affect the accuracy of the computations: if the input data doesn't match the expected format, the new conditions change which branch the computation takes. I've tested the changes, but I can't rule out that they alter results in some cases. If you notice any issues, or if the changes affect the computations in unintended ways, please advise.
Changes in code:
In `evaluate_precision`:

```python
if isinstance(answer2response, (np.ndarray, list)) and len(answer2response) > 0:
    result.metrics[metrics.precision] = np.mean(answer2response)
else:
    result.metrics[metrics.precision] = 0.
```
In `evaluate_retrieval`:

```python
if isinstance(retrieved2answer, (np.ndarray, list)) and len(retrieved2answer) > 0:
    if isinstance(retrieved2answer[0], (np.ndarray, list)) and len(retrieved2answer[0]) > 0:
        claim_recalled = np.max(retrieved2answer, axis=1)
        result.metrics[metrics.claim_recall] = np.mean(claim_recalled)
        psg_useful = np.max(retrieved2answer, axis=0)
        result.metrics[metrics.context_precision] = np.mean(psg_useful)
    else:
        claim_recalled = retrieved2answer
        result.metrics[metrics.claim_recall] = np.mean(claim_recalled)
        result.metrics[metrics.context_precision] = 0.
else:
    result.metrics[metrics.claim_recall] = 0.
    result.metrics[metrics.context_precision] = 0.
```
In `evaluate_context_utilization`:

```python
if isinstance(retrieved2answer, (np.ndarray, list)) and len(retrieved2answer) > 0:
    if np.ndim(retrieved2answer) == 1 or (np.ndim(retrieved2answer) > 1 and len(retrieved2answer[0]) > 0):
        claim_recalled = np.max(retrieved2answer, axis=1) if np.ndim(retrieved2answer) > 1 else retrieved2answer
        if np.sum(claim_recalled) > 0:
            claim_used = claim_recalled & response2answer
            result.metrics[metrics.context_utilization] = np.sum(claim_used) / np.sum(claim_recalled)
        else:
            result.metrics[metrics.context_utilization] = 0.
    else:
        result.metrics[metrics.context_utilization] = 0.
else:
    result.metrics[metrics.context_utilization] = 0.
```
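As a quick sanity check of the 1D fallback, here is a hypothetical, self-contained version of the utilization logic (the function name is mine, and plain arrays stand in for RAGChecker's result/metrics objects):

```python
import numpy as np

# Hypothetical standalone version of the patched utilization logic,
# with plain arrays standing in for RAGChecker's result/metrics objects.
def context_utilization(retrieved2answer, response2answer):
    retrieved2answer = np.asarray(retrieved2answer)
    response2answer = np.asarray(response2answer)
    if retrieved2answer.size == 0:
        return 0.
    # Collapse chunks per claim for 2D input; use the array as-is for 1D.
    claim_recalled = (np.max(retrieved2answer, axis=1)
                      if retrieved2answer.ndim > 1 else retrieved2answer)
    if np.sum(claim_recalled) == 0:
        return 0.
    claim_used = claim_recalled & response2answer
    return np.sum(claim_used) / np.sum(claim_recalled)

# 2D input (claims x chunks) behaves as before:
print(context_utilization([[True, False], [False, False]], [True, True]))  # 1.0
# 1D input no longer crashes with an axis error:
print(context_utilization([True, False], [True, True]))                    # 1.0
```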
computation-v2.zip