Extend OpenAI finish_reason handling #1985

Open
wants to merge 7 commits into master
Conversation

@steven-solomon commented May 10, 2024

As part of #1979, I added:

  • explicit handling for the finish_reason values: length, content_filter, and stop
  • fallback handling of finish_reason for any unspecified cases
  • logging of repr_doc_ids in all error cases to improve debugging

Fixes #1979
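
A minimal sketch of that branching (the helper name `label_from_choice` and the fallback labels are illustrative assumptions, not the exact diff):

```python
import logging

logger = logging.getLogger(__name__)

def label_from_choice(choice, repr_doc_ids):
    """Hypothetical helper: map one chat completion choice to a topic label."""
    if choice.finish_reason == "stop":
        return choice.message.content.strip().replace("topic: ", "")
    if choice.finish_reason == "length":
        # Truncated at the token limit; keep whatever text did come back.
        logger.warning(f"Length limit reached for doc_ids ({repr_doc_ids})")
        if choice.message.content:
            return choice.message.content.strip().replace("topic: ", "")
        return "Incomplete output due to token limit being reached"
    if choice.finish_reason == "content_filter":
        logger.warning(f"Content filtered for doc_ids ({repr_doc_ids})")
        return None  # assumed fallback; the PR may use a placeholder label instead
    # Anything unspecified still gets logged together with the document IDs.
    logger.warning(f"No label due to finish_reason {choice.finish_reason} for doc_ids ({repr_doc_ids})")
    return None
```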

    label = choice.message.content.strip().replace("topic: ", "")
elif choice.finish_reason == "length":
    logger.warn(f"Extracing Topics - Length limit reached for doc_ids ({repr_doc_ids})")
    if hasattr(response.choices[0].message, "content"):
Author (steven-solomon):

I am not sure of the API's behavior when hitting the token limit, so I kept the check for content.
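
For what it's worth, in the v1 openai Python client `message.content` is typed as `Optional[str]`, so the attribute is present but may be `None`; a truthiness check covers both cases. A minimal sketch, reusing the names from the diff:

```python
# content is Optional[str] in the v1 client: the attribute exists but can be
# None, e.g. when the completion carries tool calls instead of text.
content = getattr(choice.message, "content", None)
if content:
    label = content.strip().replace("topic: ", "")
else:
    label = "Incomplete output due to token limit being reached"
```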

Owner (MaartenGr):

For now, this is fine I guess. Also, you can replace response.choices[0] with choice. And, though this might be just me, could we replace the variable name choice with output? I find the term choice not very readable/explicit.
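
Applied to the hunk above, that suggestion would read roughly as follows (a sketch, not the final diff):

```python
output = response.choices[0]
if output.finish_reason == "length":
    logger.warning(f"Extracting Topics - Length limit reached for doc_ids ({repr_doc_ids})")
    if hasattr(output.message, "content"):
        label = output.message.content.strip().replace("topic: ", "")
```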

else:
    logger.warn(f"Extracing Topics - No label due to finish_reason {choice.finish_reason} for doc_ids ({repr_doc_ids})")
Author (steven-solomon):

I am not in love with exposing the finish_reason concept in the error message but I wonder whether folks will have to learn about it when debugging the issue anyway. Wyt?

Author (steven-solomon):

I also think I need to index into repr_doc_ids based on the index. Kudos to @sdehm for pointing that out.

Owner (MaartenGr):

> I am not in love with exposing the finish_reason concept in the error message but I wonder whether folks will have to learn about it when debugging the issue anyway. Wyt?

If that reason didn't trigger any of the other if statements, then to me it makes sense to log it, since we didn't expect it and there is otherwise no way for the user to know which one it is.

Also, perhaps make the warning a bit more explicit. For instance, "OpenAI Topic Representation - Couldn't create a label due to {choice.finish_reason} for the following document IDs"

I think being explicit here might improve any potential debugging.
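
As a log call, that suggested wording might look like this (sketch; `output` assumes the rename proposed above):

```python
logger.warning(
    "OpenAI Topic Representation - Couldn't create a label due to "
    f"{output.finish_reason} for the following document IDs: ({repr_doc_ids})"
)
```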

@steven-solomon marked this pull request as ready for review May 10, 2024 14:58
Owner (MaartenGr) left a comment:

Apologies for the delay, I left some comments here and there.


logger = MyLogger("WARNING")
Owner (MaartenGr):

I'm not sure whether importing the logger will work with the current logging procedure. My guess is that it will create a separate logger that emits everything twice. Also, the model should control the verbosity so that this warning can be suppressed by the user if needed.

I believe, but I will have to check, that the functionality of the logger will change in an open PR.

Author (steven-solomon):

That makes total sense. Since we want to leave verbosity controls to the model, I could use the logger instance from topic_model.logger. Wyt?
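
A sketch of that idea; whether the fitted model actually exposes a logger attribute is an assumption from this thread, hence the guarded lookup and fallback:

```python
import logging

def extract_topics(self, topic_model, documents, c_tf_idf, topics):
    # Reuse the verbosity-controlled logger from the fitted model rather than
    # creating a second module-level MyLogger; fall back to a standard logger
    # if the attribute is not there.
    logger = getattr(topic_model, "logger", None) or logging.getLogger(__name__)
    ...
```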

@@ -37,7 +38,7 @@
Topic name:"""

DEFAULT_CHAT_PROMPT = """
I have a topic that contains the following documents:
I have a topic that contains the following documents:
Owner (MaartenGr):

Although I expect no issues, it would be interesting to see the effect of removing a \n from this prompt. In my experience with recent LLMs (Phi-3/Llama-3), this can actually have a decent influence on performance (especially with respect to following instructions). Either way, good change!

Author (steven-solomon):

That's a great point. I am tempted to undo this change then, since I don't want to modify any behavior other than logging.

if hasattr(response.choices[0].message, "content"):
    label = choice.message.content.strip().replace("topic: ", "")
else:
    label = "Incomple output due to token limit being reached"
Owner (MaartenGr):

Should be "Incomplete" instead of "Incomple"

    else:
        label = "Incomple output due to token limit being reached"
elif choice.finish_reason == "content_filter":
    logger.warn(f"Extracing Topics - Content filtered for doc_ids ({repr_doc_ids})")
Owner (MaartenGr):

Perhaps make the warning a bit more explicit? For instance, "The content filter of OpenAI was triggered for the following document IDs:"
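
As a log call (sketch):

```python
logger.warning(
    "OpenAI Topic Representation - The content filter of OpenAI was triggered "
    f"for the following document IDs: ({repr_doc_ids})"
)
```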

Author (steven-solomon):

@MaartenGr, I made the following updates based on your feedback (sketched below):

  • Renamed choice to output
  • Introduced the logging prefix OpenAI Topic Representation
  • Improved the wording of the log messages:
    • Changed references to doc_ids to document IDs
    • Fixed typos
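
Taken together, the updated branch might look like this sketch (not the exact merged diff; the default and placeholder labels are assumptions):

```python
output = response.choices[0]
label = None  # assumed default when no usable text comes back
if output.finish_reason == "stop":
    label = output.message.content.strip().replace("topic: ", "")
elif output.finish_reason == "length":
    logger.warning(f"OpenAI Topic Representation - Length limit reached for the following document IDs: ({repr_doc_ids})")
    if output.message.content:
        label = output.message.content.strip().replace("topic: ", "")
    else:
        label = "Incomplete output due to token limit being reached"
elif output.finish_reason == "content_filter":
    logger.warning(f"OpenAI Topic Representation - The content filter of OpenAI was triggered for the following document IDs: ({repr_doc_ids})")
else:
    logger.warning(f"OpenAI Topic Representation - Couldn't create a label due to {output.finish_reason} for the following document IDs: ({repr_doc_ids})")
```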

Open questions:

Successfully merging this pull request may close: Handle Responsible AI scenarios for OpenAI (#1979)