feat: enable image modality for `ChatAgent` #473

zechengz · 2024-03-17T07:43:18Z

Description

Enable image modality for ChatAgent. Notice that only tested with single step chat agent, which means that the assistant agent just perform one step given some images etc.

Motivation and Context

Part of #454

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of example)

Implemented Tasks

Enable image modality in BaseMessage
Create a new ChatGPTVisionConfig as the vision model config is different from the text one
Update token counting for images
Create object recognition task type and example

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide. (required)
My change requires a change to the documentation.
I have updated the tests accordingly. (required for a bug fix or a new feature)
I have updated the documentation accordingly.

coderabbitai · 2024-03-17T07:43:23Z

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai generate interesting stats about this repository and render them as a table.
- @coderabbitai show all the console.log statements in this repository.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

dandansamax

Thanks @zechengz, it looks awesome. However, becuase it affects main data model BaseMessage, we may want all maintainers to look into it. @camel-ai/camel-maintainers

test/agents/test_chat_agent.py

Wendong-Fan

LGTM, normally we also require test for the example file, could you also add this?

zechengz · 2024-04-07T08:40:31Z

@Wendong-Fan I create a mock test instead as it costs relatively a lot for using the vision model.

camel/configs.py

Appointat

Good job. I reviewed it and have left some comments.

camel/agents/chat_agent.py

camel/messages/base.py

camel/utils/token_counting.py

camel/configs.py

examples/vision/object_recognition.py

Appointat

One more comment.

camel/configs.py

Appointat

Reviewed

Appointat

It looks ok to me.

lightaime · 2024-04-18T10:53:46Z

Thank @zechengz for the amazing implementation!!

Some questions to discuss:

Is ChatGPTVisionConfig needed? Could we use ChatGPTConfig instead?
Could we also merge FunctionCallingConfig with ChatGPTConfig also well? There may be some extra or not supported keys when we call the API endpoint for different models. But I guess we can filter the ones that are different from the default values when we revert them to dict. By doing this, we can hugely simplify the abstraction.
Since now gpt-4-turbo supports vision and function calling, should we just remove ModelType.GPT_4_TURBO_VISION and use ModelType.GPT_4_TURBO instead?

gpt-4-vision-preview GPT-4 model with the ability to understand images, in addition to all other GPT-4 Turbo capabilities. This is a preview model, we recommend developers to now use gpt-4-turbo which includes vision capabilities. Currently points to gpt-4-1106-vision-preview.

c.c. @Wendong-Fan @dandansamax @ocss884

Wendong-Fan · 2024-04-19T15:51:28Z

Thank @zechengz for the amazing implementation!!

Some questions to discuss:

Is ChatGPTVisionConfig needed? Could we use ChatGPTConfig instead?

Could we also merge FunctionCallingConfig with ChatGPTConfig also well? There may be some extra or not supported keys when we call the API endpoint for different models. But I guess we can filter the ones that are different from the default values when we revert them to dict. By doing this, we can hugely simplify the abstraction.

Since now gpt-4-turbo supports vision and function calling, should we just remove ModelType.GPT_4_TURBO_VISION and use ModelType.GPT_4_TURBO instead?

gpt-4-vision-preview GPT-4 model with the ability to understand images, in addition to all other GPT-4 Turbo capabilities. This is a preview model, we recommend developers to now use gpt-4-turbo which includes vision capabilities. Currently points to gpt-4-1106-vision-preview.

c.c. @Wendong-Fan @dandansamax @ocss884

Hey @lightaime , I agree with you, we can remove ModelType.GPT_4_TURBO_VISION since currently we can get ride of gpt-4-1106-preview and gpt-4-vision-preview by using gpt-4-turbo. Further, it would be great to remove ChatGPTVisionConfig , merge FunctionCallingConfig into ChatGPTConfig to make the abstraction tidy.

One more suggestion after I read the latest OpenAI doc, we can also add parameter detail to give user better flexibility. @zechengz

By controlling the detail parameter, which has three options, low, high, or auto, you have control over how the model processes the image and generates its textual understanding. By default, the model will use the auto setting which will look at the image input size and decide if it should use the low or high setting.
low will enable the "low res" mode. The model will receive a low-res 512px x 512px version of the image, and represent the image with a budget of 65 tokens. This allows the API to return faster responses and consume fewer input tokens for use cases that do not require high detail.
high will enable "high res" mode, which first allows the model to see the low res image and then creates detailed crops of input images as 512px squares based on the input image size. Each of the detailed crops uses twice the token budget (65 tokens) for a total of 129 tokens.

zechengz · 2024-04-21T02:18:55Z

@lightaime @Wendong-Fan SGTM, will create another PR for these.

Wendong-Fan · 2024-04-23T15:13:13Z

Hey @zechengz , I create a new issue, let's work on this together with @ocss884 to refactor it
#526
cc @lightaime

Add image for OpenAI model

545bd13

zechengz added Agent Related to camel agents Prompt Related to camel prompts Example New Feature labels Mar 17, 2024

zechengz requested review from lightaime and dandansamax March 17, 2024 07:43

zechengz self-assigned this Mar 17, 2024

zechengz added 3 commits March 24, 2024 05:19

Update

7929968

Merge branch 'master' into zecheng_gpt_with_vision

6747948

Update

00bd900

zechengz changed the title ~~[Draft] feat: enable image modality for ChatAgent~~ feat: enable image modality for ChatAgent Mar 24, 2024

zechengz marked this pull request as ready for review March 24, 2024 12:28

Wendong-Fan requested a review from a team March 25, 2024 13:22

dandansamax requested changes Mar 25, 2024

View reviewed changes

test/agents/test_chat_agent.py Show resolved Hide resolved

ocss884 self-assigned this Mar 25, 2024

dandansamax requested a review from a team March 26, 2024 16:13

Wendong-Fan reviewed Mar 27, 2024

View reviewed changes

ocss884 self-requested a review March 28, 2024 15:52

zechengz added 2 commits April 5, 2024 15:09

Merge branch 'master' into zecheng_gpt_with_vision

adc2073

Update

6315aca

zechengz requested review from dandansamax and Wendong-Fan April 7, 2024 08:39

ocss884 reviewed Apr 8, 2024

View reviewed changes

camel/configs.py Show resolved Hide resolved

Appointat reviewed Apr 8, 2024

View reviewed changes

camel/configs.py Show resolved Hide resolved

zechengz added 2 commits April 15, 2024 01:54

Merge branch 'master' into zecheng_gpt_with_vision

09e8f11

Update

e072603

zechengz requested review from ocss884 and Appointat April 15, 2024 09:55

zechengz added 5 commits April 15, 2024 03:01

Update

29eb0d7

Update

1bd5261

Update

b6bc53e

Update

6e13520

Update

a8fe062

Appointat reviewed Apr 15, 2024

View reviewed changes

Update

c34ee2f

zechengz requested a review from Appointat April 15, 2024 16:23

Appointat approved these changes Apr 15, 2024

View reviewed changes

zechengz and others added 2 commits April 16, 2024 13:03

Merge branch 'master' into zecheng_gpt_with_vision

8ca8f4e

Merge branch 'master' into zecheng_gpt_with_vision

60ffdba

Wendong-Fan merged commit 57c700a into master Apr 17, 2024
6 checks passed

Wendong-Fan deleted the zecheng_gpt_with_vision branch April 17, 2024 15:53

lightaime mentioned this pull request Apr 22, 2024

Update ModelType and remove legacy models #511

Merged

13 tasks

Wendong-Fan mentioned this pull request Apr 23, 2024

[Feature Request] Refactor ChatAgent #526

Closed

2 tasks

Wendong-Fan mentioned this pull request Apr 25, 2024

fix: retriever module follow LSP, utils function use GPT_3_5_TURBO #532

Merged

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: enable image modality for `ChatAgent` #473

feat: enable image modality for `ChatAgent` #473

zechengz commented Mar 17, 2024 •

edited by zhangzaibin

Loading

coderabbitai bot commented Mar 17, 2024 •

edited

Loading

Auto Review Skipped

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (`.coderabbit.yaml`)

Documentation and Community

dandansamax left a comment

Wendong-Fan left a comment

zechengz commented Apr 7, 2024

Appointat left a comment

Appointat left a comment

Appointat left a comment

Appointat left a comment

lightaime commented Apr 18, 2024 •

edited

Loading

Wendong-Fan commented Apr 19, 2024

zechengz commented Apr 21, 2024

Wendong-Fan commented Apr 23, 2024

feat: enable image modality for ChatAgent #473

feat: enable image modality for ChatAgent #473

Conversation

zechengz commented Mar 17, 2024 • edited by zhangzaibin Loading

Description

Motivation and Context

Types of changes

Implemented Tasks

Checklist

coderabbitai bot commented Mar 17, 2024 • edited Loading

Auto Review Skipped

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (.coderabbit.yaml)

Documentation and Community

dandansamax left a comment

Choose a reason for hiding this comment

Wendong-Fan left a comment

Choose a reason for hiding this comment

zechengz commented Apr 7, 2024

Appointat left a comment

Choose a reason for hiding this comment

Appointat left a comment

Choose a reason for hiding this comment

Appointat left a comment

Choose a reason for hiding this comment

Appointat left a comment

Choose a reason for hiding this comment

lightaime commented Apr 18, 2024 • edited Loading

Wendong-Fan commented Apr 19, 2024

zechengz commented Apr 21, 2024

Wendong-Fan commented Apr 23, 2024

feat: enable image modality for `ChatAgent` #473

feat: enable image modality for `ChatAgent` #473

zechengz commented Mar 17, 2024 •

edited by zhangzaibin

Loading

coderabbitai bot commented Mar 17, 2024 •

edited

Loading

CodeRabbit Configration File (`.coderabbit.yaml`)

lightaime commented Apr 18, 2024 •

edited

Loading