Extend OpenAI finish_reason handling #1985
base: master
Conversation
bertopic/representation/_openai.py
Outdated
label = choice.message.content.strip().replace("topic: ", "")
elif choice.finish_reason == "length":
logger.warn(f"Extracing Topics - Length limit reached for doc_ids ({repr_doc_ids})")
if hasattr(response.choices[0].message, "content"):
I am not sure of the API's behavior when hitting the token limit, so I kept the check for content.
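As a small illustration of the check being discussed: with the OpenAI SDK, `message.content` can exist yet be `None` when generation is cut off, so a truthiness check via `getattr` is a slightly safer guard than `hasattr` alone. The objects below are stand-ins, not real API responses.

```python
# Hedged sketch of the content guard discussed above. SimpleNamespace
# objects stand in for OpenAI response messages, not the real SDK types.
from types import SimpleNamespace

def safe_label(message):
    # content may be missing or None when the token limit cuts generation off
    content = getattr(message, "content", None)
    if content:
        return content.strip().replace("topic: ", "")
    return None
```

With this shape, a message whose `content` is `None` falls through to the fallback branch instead of raising.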
For now, this is fine I guess. Also, you can replace response.choices[0] with choice. Also, and this might be just me, could we replace the variable name choice with output? I find the term choice not easily readable/explicit.
bertopic/representation/_openai.py
Outdated
else:
logger.warn(f"Extracing Topics - No label due to finish_reason {choice.finish_reason} for doc_ids ({repr_doc_ids})")
I am not in love with exposing the finish_reason concept in the error message, but I wonder whether folks will have to learn about it when debugging the issue anyway. Wyt?
I also think I need to index into repr_doc_ids based on the index. Kudos to @sdehm for pointing that out.
> I am not in love with exposing the finish_reason concept in the error message but I wonder whether folks will have to learn about it when debugging the issue anyway. Wyt?

If that reason didn't trigger any of the other if statements, then to me it makes sense to log it, since we didn't expect it and there is no reason for the user to know which one it is.
Also, perhaps make the warning a bit more explicit. For instance: "OpenAI Topic Representation - Couldn't create a label due to {choice.finish_reason} for the following document IDs". I think being explicit here might improve any potential debugging.
Apologies for the delay, I left some comments here and there.
logger = MyLogger("WARNING")
I'm not sure whether importing the logger will work with the current logging procedure. My guess is that it will create a separate logger which does things twice. Also, the model should control the verbosity so that this warning is suppressed if needed by the user.
I believe, but I will have to check, that the functionality of the logger will change in an open PR.
That makes total sense. Since we want to leave verbosity controls to the model, I could use the logger instance from topic_model.logger. Wyt?
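As a sketch of that idea (class and parameter names here are hypothetical, not BERTopic's actual API), the representation model could accept an injected logger and only fall back to its own when none is supplied, leaving verbosity under the model's control:

```python
# Hypothetical sketch: a representation model that reuses a logger handed
# to it (e.g. the topic model's own logger) rather than creating a second
# module-level logger that would duplicate output.
import logging

class OpenAIRepresentation:
    def __init__(self, logger=None):
        # Reuse the injected logger so the model controls verbosity;
        # fall back to a module logger only when none is supplied.
        self.logger = logger if logger is not None else logging.getLogger(__name__)

    def warn_length(self, repr_doc_ids):
        self.logger.warning(
            f"Extracting Topics - Length limit reached for doc_ids ({repr_doc_ids})"
        )
```

This way a warning is suppressed or shown according to whatever level the caller configured on the shared logger.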
@@ -37,7 +38,7 @@
Topic name:"""
DEFAULT_CHAT_PROMPT = """
I have a topic that contains the following documents:
I have a topic that contains the following documents:
Although I expect no issues, it would be interesting to see the effect of removing a \n from this prompt. In my experience with recent LLMs (Phi-3/Llama-3), this can actually have a decent influence on performance (especially with respect to following instructions). Either way, good change!
That's a great point. I am tempted to undo this change then, since I don't necessarily want to modify any behavior other than logging.
bertopic/representation/_openai.py
Outdated
if hasattr(response.choices[0].message, "content"):
label = choice.message.content.strip().replace("topic: ", "")
else:
label = "Incomple output due to token limit being reached"
Should be "Incomplete" instead of "Incomple"
bertopic/representation/_openai.py
Outdated
else:
label = "Incomple output due to token limit being reached"
elif choice.finish_reason == "content_filter":
logger.warn(f"Extracing Topics - Content filtered for doc_ids ({repr_doc_ids})")
Perhaps make the warning a bit more explicit? For instance: "The content filter of OpenAI was triggered for the following document IDs:"
@MaartenGr, I made the following updates based on your feedback:
- Replace all uses of choices[0] with output

Open questions:
As part of #1979, I added:
- Handling for the finish_reason values: length, content_filter, and stop
- A warning that logs the finish_reason for any unspecified cases
- Logging of repr_doc_ids in all error cases to improve debugging

Fixes #1979
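Putting the thread together, the branching this PR describes might look roughly as follows. This is a hedged sketch, not the merged implementation: the mock response objects and the exact fallback label strings are illustrative, and the real code operates on OpenAI ChatCompletion responses inside bertopic/representation/_openai.py.

```python
# Sketch of finish_reason handling as discussed in this PR (illustrative).
import logging
from types import SimpleNamespace

logger = logging.getLogger("BERTopic")

def label_from_output(output, repr_doc_ids):
    """Return a topic label, or a fallback string, based on finish_reason."""
    if output.finish_reason == "stop":
        return output.message.content.strip().replace("topic: ", "")
    if output.finish_reason == "length":
        logger.warning(
            f"Extracting Topics - Length limit reached for doc_ids ({repr_doc_ids})"
        )
        # The API may still return partial content when the token limit is hit
        if getattr(output.message, "content", None):
            return output.message.content.strip().replace("topic: ", "")
        return "Incomplete output due to token limit being reached"
    if output.finish_reason == "content_filter":
        logger.warning(
            f"Extracting Topics - Content filtered for doc_ids ({repr_doc_ids})"
        )
        return "Output blocked by content filter"  # illustrative fallback
    # Any unspecified finish_reason: log it so users can debug unexpected cases
    logger.warning(
        f"Extracting Topics - No label due to finish_reason "
        f"{output.finish_reason} for doc_ids ({repr_doc_ids})"
    )
    return "No label returned"  # illustrative fallback
```

Note that each warning includes repr_doc_ids, per the review feedback, so a user can trace exactly which documents produced the degraded label.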