Support asynchronous API call of OpenAIBasedEvaluator.get_score()
#104
Conversation
I ran the following code in my local environment and got this result:

```python
import time
from dotenv import load_dotenv
from langcheck.metrics.en import toxicity

load_dotenv()

generated_outputs = [
    "You look beautiful",
    "You look ugly",
    "You are a good person",
    "You are a bad person",
    "I have a pen",
    "I have an apple",
    "I have a pineapple",
    # "I have a gun",  # Intentionally causes an error.
]

# Async
start = time.time()
toxicity_value = toxicity(
    generated_outputs,
    model_type="azure_openai",
    openai_args={"model": "gpt-4-turbo"},
    use_async=True,
)
print(f'Async call: {time.time() - start}[s]')

# No async
start = time.time()
toxicity_value = toxicity(
    generated_outputs,
    model_type="azure_openai",
    openai_args={"model": "gpt-4-turbo"},
    use_async=False,
)
print(f'No async call: {time.time() - start}[s]')
```
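The speedup comes from issuing the per-output evaluation requests concurrently instead of one after another, so total wall time is roughly one request's latency rather than the sum of all of them. Below is a minimal sketch of that pattern using `asyncio.gather`; the `get_score_async` coroutine is a hypothetical stand-in (it just sleeps to simulate network I/O), not langcheck's actual API.

```python
import asyncio
import time

async def get_score_async(text: str) -> float:
    # Hypothetical stand-in for one OpenAI evaluation request;
    # the real evaluator would await an API client call here.
    await asyncio.sleep(0.1)  # simulate network I/O latency
    return 0.0  # placeholder score

async def score_all(outputs: list[str]) -> list[float]:
    # Launch all requests concurrently and wait for every result.
    return await asyncio.gather(*(get_score_async(o) for o in outputs))

outputs = ["You look beautiful", "I have a pen", "I have an apple"]
start = time.time()
scores = asyncio.run(score_all(outputs))
elapsed = time.time() - start
print(f"{len(scores)} scores in {elapsed:.2f}s")  # ~0.1s concurrent, vs ~0.3s sequential
```

With three simulated 0.1 s requests, the concurrent version finishes in about 0.1 s, while a sequential loop would take about 0.3 s; the gap grows with the number of outputs being evaluated.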
Nice! It worked well in my environment too!
Could you also edit the remaining metrics (e.g. Japanese toxicity) too?
Sure, I overlooked them. I updated the code.
@liwii I updated all the metric functions that accept `use_async`.
I made a final check and added some missing `use_async` arguments. LGTM! Thanks for the work!!
The tests are failing due to the server that stores the Chinese tokenizers. Let me think about how to resolve that.
Seemingly the server is back, so let me merge this PR now. Thanks!!
Motivation
The bottleneck of the OpenAI-based evaluator is I/O time. This PR introduces an asynchronous API call option for each metric evaluation.