simplify prompt (langchain-ai#13)

Simplify to always try to start with the "objects" task, to avoid using the image name as a valid tag.
Start with OCR if the input is a PDF.
Tweak the prompt for photo editing.
dashesy · Mar 30, 2023 · db57ac6
1 parent 6d6e537

Showing 2 changed files with 14 additions and 8 deletions.
diff --git a/langchain/agents/assistant/base.py b/langchain/agents/assistant/base.py
@@ -130,7 +130,10 @@ def _extract_tool_and_input(self, llm_output: str, tries=0) -> Optional[Tuple[st
             elif "brand" in sub_cmd:
                 action = "Bing Search"
             elif "objects" in sub_cmd:
-                action = "Image Understanding"
+                if action_input.lower().endswith(".pdf"):
+                    action = "OCR Understanding"
+                else:
+                    action = "Image Understanding"
             if not action_input:
                 if not action:
                     if cmd.endswith("?"):
@@ -143,7 +146,8 @@ def _extract_tool_and_input(self, llm_output: str, tries=0) -> Optional[Tuple[st
             assert action_input
             if not action and is_face:
                 action = "Celebrity Understanding"
-            if not action and " text" in sub_cmd:
+            # TODO: separate llm to decide the task
+            if not action and (" is written" in sub_cmd or " text" in sub_cmd or sub_cmd.endswith(" say?")):
                 action = "OCR Understanding"
             if not action:
                 if tries < 4:
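The keyword routing that this change adds to `_extract_tool_and_input` can be sketched as a standalone function. This is a simplified, hypothetical reconstruction: `route_image_task` is not part of the codebase. It lowercases `sub_cmd` itself, which is an assumption. It also omits the agent's other branches (brand → Bing Search, faces → Celebrity Understanding, LLM retries).

```python
from typing import Optional


def route_image_task(sub_cmd: str, action_input: str) -> Optional[str]:
    """Hypothetical sketch of the keyword routing in this commit.

    Returns a tool name, or None when no rule matches (the real agent
    would then retry the LLM call).
    """
    # Assumption: matching is done on a lowercased command.
    sub_cmd = sub_cmd.lower()
    if "objects" in sub_cmd:
        # New in this commit: a PDF goes straight to OCR instead of image understanding.
        if action_input.lower().endswith(".pdf"):
            return "OCR Understanding"
        return "Image Understanding"
    # Broadened OCR trigger: " is written", " text", or a question ending in " say?".
    if " is written" in sub_cmd or " text" in sub_cmd or sub_cmd.endswith(" say?"):
        return "OCR Understanding"
    return None
```

Because the "objects" check runs first, a question that mentions both objects and text is sent to image understanding. This matches the prompt's new instruction to ask about objects first and request OCR in a follow-up.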

diff --git a/langchain/agents/assistant/prompt.py b/langchain/agents/assistant/prompt.py
@@ -5,15 +5,16 @@
 Any time there is an image in our conversation that you want to know about objects description, texts, OCR (optical character recognition), people, celebrities inside of the image you could ask Assistant by addressing him. 
 
 These are the tasks that Assistant can handle for an image: photo editing, celebrities, business card, receipt, objects, OCR, Bing
-If the task does not fit any of the above, make sure the question has the word objects in it.
-For example to ask about an image without any description, make sure the question has the word objects in it.
+Ask Assistant about the objects in the image.
+Then if there is text in the image, ask Assistant to do OCR
 For example to ask about an image that could be a business card, make sure the question has the word business card in it.
 For example to ask about an image that could be a receipt, make sure the question has the word receipt in it.
-For example other image types that may have text (sign, label, plan, invoice, money), and require OCR.
-For example if there is a person's face in the image find if there are celebrities in the image.
+Other image types that may have text (sign, label, plan, invoice, money), and require OCR.
+* Ask to do OCR if pdf
 <|im_end|>
 
-Gather your thoughts and observations in a list then if needed ask Assistant a task it can handle.
+Keep tasks Assistant can handle in mind. 
+Gather your thoughts and observations in a short list then if needed ask Assistant a task it can handle.
 Finally summerize the information and answer the question.
 For example:
 <|im_start|>Human
@@ -39,7 +40,8 @@
 <|im_start|>Human
 Move the logo in this image to the right
 <|im_sep|>{ai_prefix}
-1. This is a photo editing task
+1. The image should be edited
+2. This is a photo editing task
 Assistant, Move the logo in this business card image to the right  https://i.ibb.co/tsQ0Myn/00.jpg
 EXAMPLE END
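The few-shot examples in `prompt.py` follow a fixed layout: the human turn, the `<|im_sep|>{ai_prefix}` separator, a short numbered list of observations, and then a request addressed to Assistant. Below is a hypothetical helper that renders one example in that layout. `render_example` and the default `ai_prefix` value `"AI"` are assumptions and do not come from the repository.

```python
from typing import List


def render_example(question: str, thoughts: List[str], request: str, ai_prefix: str = "AI") -> str:
    """Hypothetical helper: render one few-shot example in the prompt's layout."""
    lines = ["<|im_start|>Human", question, f"<|im_sep|>{ai_prefix}"]
    # Number the observations starting at 1, matching the "short list" the prompt asks for.
    lines += [f"{i}. {t}" for i, t in enumerate(thoughts, start=1)]
    # The final line addresses Assistant with the task and the image URL.
    lines.append(request)
    return "\n".join(lines)


example = render_example(
    "Move the logo in this image to the right",
    ["The image should be edited", "This is a photo editing task"],
    "Assistant, Move the logo in this business card image to the right https://i.ibb.co/tsQ0Myn/00.jpg",
)
print(example)
```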