
[Bug]: Dalle-Critic not working #2510

Closed
nazkhan-8451 opened this issue Apr 25, 2024 · 29 comments · May be fixed by #2586
Labels: bug (Something isn't working)
Comments

nazkhan-8451 commented Apr 25, 2024

Describe the bug

Followed the notebook https://github.com/microsoft/autogen/blob/main/notebook/agentchat_image_generation_capability.ipynb, but got the following response:

(screenshot attached in the original issue)

Code:

import autogen
from PIL.Image import Image
from autogen.agentchat.contrib import img_utils
from autogen.agentchat.contrib.capabilities import generate_images

# CRITIC_SYSTEM_MESSAGE is defined as in the notebook (omitted here)

config_list_gpt4 = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["wag-gpt4-128k"],
    },
)

config_list_gpt4_vision = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt4-vision"],
    },
)

config_list_dalle = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["dall-e-3"],
    },
)

gpt_config = {
    "cache_seed": 42,  # change the cache_seed for different trials
    "temperature": 0.7,
    "config_list": config_list_gpt4,
    "timeout": 300,
}

gpt_vision_config = {
    "cache_seed": 42,  # change the cache_seed for different trials
    "temperature": 0.7,
    "config_list": config_list_gpt4_vision,
    "timeout": 300,
}

dalle_config = {
    "cache_seed": 42,  # change the cache_seed for different trials
    "temperature": 0.7,
    "config_list": config_list_dalle,
    "timeout": 300,
}

def _is_termination_message(msg) -> bool:
    # Detects if we should terminate the conversation
    if isinstance(msg.get("content"), str):
        return msg["content"].rstrip().endswith("TERMINATE")
    elif isinstance(msg.get("content"), list):
        for content in msg["content"]:
            if isinstance(content, dict) and "text" in content:
                return content["text"].rstrip().endswith("TERMINATE")
    return False


def critic_agent() -> autogen.ConversableAgent:
    return autogen.ConversableAgent(
        name="critic",
        llm_config=gpt_vision_config,
        system_message=CRITIC_SYSTEM_MESSAGE,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
        is_termination_msg=lambda msg: _is_termination_message(msg),
    )


def image_generator_agent() -> autogen.ConversableAgent:
    # Create the agent
    agent = autogen.ConversableAgent(
        name="dalle",
        llm_config=gpt_vision_config,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
        is_termination_msg=lambda msg: _is_termination_message(msg),
    )

    # Add image generation ability to the agent
    dalle_gen = generate_images.DalleImageGenerator(llm_config=dalle_config)
    image_gen_capability = generate_images.ImageGeneration(
        image_generator=dalle_gen, text_analyzer_llm_config=gpt_config
    )

    image_gen_capability.add_to_agent(agent)
    return agent

def extract_images(sender: autogen.ConversableAgent, recipient: autogen.ConversableAgent) -> list[Image]:
    images = []
    all_messages = sender.chat_messages[recipient]

    for message in reversed(all_messages):
        # The GPT-4V format, where the content is an array of data
        contents = message.get("content", [])
        for content in contents:
            if isinstance(content, str):
                continue
            if content.get("type", "") == "image_url":
                img_data = content["image_url"]["url"]
                images.append(img_utils.get_pil_image(img_data))

    if not images:
        raise ValueError("No image data found in messages.")

    return images

###################################################

dalle = image_generator_agent()
critic = critic_agent()

img_prompt = "robot"

result = dalle.initiate_chat(critic, message=img_prompt)

Steps to reproduce

No response

Model Used

No response

Expected Behavior

No response

Screenshots and logs

No response

Additional Information

# Name                    Version                   Build  Channel
aiohttp                   3.9.5                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
annotated-types           0.6.0                    pypi_0    pypi
anyio                     4.3.0                    pypi_0    pypi
appnope                   0.1.4              pyhd8ed1ab_0    conda-forge
asgiref                   3.8.1                    pypi_0    pypi
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
attrs                     23.2.0                   pypi_0    pypi
azure-core                1.30.1                   pypi_0    pypi
azure-identity            1.16.0                   pypi_0    pypi
backoff                   2.2.1                    pypi_0    pypi
bcrypt                    4.1.2                    pypi_0    pypi
beautifulsoup4            4.12.3                   pypi_0    pypi
build                     1.2.1                    pypi_0    pypi
bzip2                     1.0.8                h80987f9_5  
ca-certificates           2024.2.2             hf0a4a13_0    conda-forge
cachetools                5.3.3                    pypi_0    pypi
certifi                   2024.2.2                 pypi_0    pypi
cffi                      1.16.0                   pypi_0    pypi
chardet                   5.2.0                    pypi_0    pypi
charset-normalizer        3.2.0                    pypi_0    pypi
chroma-hnswlib            0.7.3                    pypi_0    pypi
chromadb                  0.4.24                   pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
coloredlogs               15.0.1                   pypi_0    pypi
comm                      0.2.2              pyhd8ed1ab_0    conda-forge
contourpy                 1.2.1                    pypi_0    pypi
cryptography              42.0.5                   pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
dataclasses-json          0.6.4                    pypi_0    pypi
dataclasses-json-speakeasy 0.5.11                   pypi_0    pypi
debugpy                   1.6.7           py311h313beb8_0  
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
deprecated                1.2.14                   pypi_0    pypi
dirtyjson                 1.0.8                    pypi_0    pypi
diskcache                 5.6.3                    pypi_0    pypi
distro                    1.9.0                    pypi_0    pypi
docker                    7.0.0                    pypi_0    pypi
emoji                     2.11.0                   pypi_0    pypi
exceptiongroup            1.2.0              pyhd8ed1ab_2    conda-forge
executing                 2.0.1              pyhd8ed1ab_0    conda-forge
fastapi                   0.110.1                  pypi_0    pypi
filelock                  3.13.4                   pypi_0    pypi
filetype                  1.2.0                    pypi_0    pypi
flaml                     2.1.2                    pypi_0    pypi
flatbuffers               24.3.25                  pypi_0    pypi
fonttools                 4.51.0                   pypi_0    pypi
frozenlist                1.4.1                    pypi_0    pypi
fsspec                    2024.3.1                 pypi_0    pypi
google-auth               2.29.0                   pypi_0    pypi
googleapis-common-protos  1.63.0                   pypi_0    pypi
greenlet                  3.0.3                    pypi_0    pypi
grpcio                    1.62.1                   pypi_0    pypi
h11                       0.14.0                   pypi_0    pypi
httpcore                  1.0.5                    pypi_0    pypi
httptools                 0.6.1                    pypi_0    pypi
httpx                     0.27.0                   pypi_0    pypi
huggingface-hub           0.22.2                   pypi_0    pypi
humanfriendly             10.0                     pypi_0    pypi
idna                      3.7                      pypi_0    pypi
importlib-metadata        7.0.0                    pypi_0    pypi
importlib-resources       6.4.0                    pypi_0    pypi
importlib_metadata        7.1.0                hd8ed1ab_0    conda-forge
ipykernel                 6.29.3             pyh3cd1d5f_0    conda-forge
ipython                   8.22.2             pyh707e725_0    conda-forge
jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
jinja2                    3.1.3                    pypi_0    pypi
joblib                    1.4.0                    pypi_0    pypi
jsonpatch                 1.33                     pypi_0    pypi
jsonpath-python           1.0.6                    pypi_0    pypi
jsonpointer               2.4                      pypi_0    pypi
jupyter_client            8.6.1              pyhd8ed1ab_0    conda-forge
jupyter_core              5.5.0           py311hca03da5_0  
kiwisolver                1.4.5                    pypi_0    pypi
kubernetes                29.0.0                   pypi_0    pypi
langchain                 0.1.16                   pypi_0    pypi
langchain-community       0.0.33                   pypi_0    pypi
langchain-core            0.1.44                   pypi_0    pypi
langchain-text-splitters  0.0.1                    pypi_0    pypi
langdetect                1.0.9                    pypi_0    pypi
langsmith                 0.1.49                   pypi_0    pypi
libcxx                    16.0.6               h4653b0c_0    conda-forge
libffi                    3.4.4                hca03da5_0  
libsodium                 1.0.18               h27ca646_1    conda-forge
llama-index               0.10.30                  pypi_0    pypi
llama-index-agent-openai  0.2.2                    pypi_0    pypi
llama-index-cli           0.1.12                   pypi_0    pypi
llama-index-core          0.10.30                  pypi_0    pypi
llama-index-embeddings-azure-openai 0.1.7                    pypi_0    pypi
llama-index-embeddings-openai 0.1.7                    pypi_0    pypi
llama-index-indices-managed-llama-cloud 0.1.5                    pypi_0    pypi
llama-index-legacy        0.9.48                   pypi_0    pypi
llama-index-llms-azure-openai 0.1.6                    pypi_0    pypi
llama-index-llms-openai   0.1.15                   pypi_0    pypi
llama-index-multi-modal-llms-openai 0.1.5                    pypi_0    pypi
llama-index-program-openai 0.1.5                    pypi_0    pypi
llama-index-question-gen-openai 0.1.3                    pypi_0    pypi
llama-index-readers-file  0.1.19                   pypi_0    pypi
llama-index-readers-llama-parse 0.1.4                    pypi_0    pypi
llama-parse               0.4.1                    pypi_0    pypi
llamaindex-py-client      0.1.18                   pypi_0    pypi
lxml                      5.2.1                    pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markdownify               0.12.1                   pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
marshmallow               3.21.1                   pypi_0    pypi
matplotlib                3.8.4                    pypi_0    pypi
matplotlib-inline         0.1.7              pyhd8ed1ab_0    conda-forge
mdurl                     0.1.2                    pypi_0    pypi
mmh3                      4.1.0                    pypi_0    pypi
monotonic                 1.6                      pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
msal                      1.28.0                   pypi_0    pypi
msal-extensions           1.1.0                    pypi_0    pypi
multidict                 6.0.5                    pypi_0    pypi
mypy-extensions           1.0.0                    pypi_0    pypi
ncurses                   6.4                  h313beb8_0  
nest-asyncio              1.6.0              pyhd8ed1ab_0    conda-forge
networkx                  3.3                      pypi_0    pypi
nltk                      3.8.1                    pypi_0    pypi
numpy                     1.26.4                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
onnxruntime               1.17.3                   pypi_0    pypi
openai                    1.21.2                   pypi_0    pypi
openssl                   1.1.1w               h53f4e23_0    conda-forge
opentelemetry-api         1.24.0                   pypi_0    pypi
opentelemetry-exporter-otlp-proto-common 1.24.0                   pypi_0    pypi
opentelemetry-exporter-otlp-proto-grpc 1.24.0                   pypi_0    pypi
opentelemetry-instrumentation 0.45b0                   pypi_0    pypi
opentelemetry-instrumentation-asgi 0.45b0                   pypi_0    pypi
opentelemetry-instrumentation-fastapi 0.45b0                   pypi_0    pypi
opentelemetry-proto       1.24.0                   pypi_0    pypi
opentelemetry-sdk         1.24.0                   pypi_0    pypi
opentelemetry-semantic-conventions 0.45b0                   pypi_0    pypi
opentelemetry-util-http   0.45b0                   pypi_0    pypi
orjson                    3.10.1                   pypi_0    pypi
overrides                 7.7.0                    pypi_0    pypi
packaging                 23.2                     pypi_0    pypi
pandas                    2.2.2                    pypi_0    pypi
parso                     0.8.4              pyhd8ed1ab_0    conda-forge
pexpect                   4.9.0              pyhd8ed1ab_0    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    10.3.0                   pypi_0    pypi
pip                       23.3.1          py311hca03da5_0  
platformdirs              4.2.0              pyhd8ed1ab_0    conda-forge
portalocker               2.8.2                    pypi_0    pypi
posthog                   3.5.0                    pypi_0    pypi
prompt-toolkit            3.0.42             pyha770c72_0    conda-forge
protobuf                  4.25.3                   pypi_0    pypi
psutil                    5.9.0           py311h80987f9_0  
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pulsar-client             3.5.0                    pypi_0    pypi
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pyasn1                    0.6.0                    pypi_0    pypi
pyasn1-modules            0.4.0                    pypi_0    pypi
pyautogen                 0.2.25                   pypi_0    pypi
pycparser                 2.22                     pypi_0    pypi
pydantic                  2.7.0                    pypi_0    pypi
pydantic-core             2.18.1                   pypi_0    pypi
pygments                  2.17.2             pyhd8ed1ab_0    conda-forge
pyjwt                     2.8.0                    pypi_0    pypi
pyparsing                 3.1.2                    pypi_0    pypi
pypdf                     4.2.0                    pypi_0    pypi
pypika                    0.48.9                   pypi_0    pypi
pyproject-hooks           1.0.0                    pypi_0    pypi
python                    3.11.0               hc0d8a6c_3  
python-dateutil           2.9.0.post0              pypi_0    pypi
python-dotenv             1.0.1                    pypi_0    pypi
python-iso639             2024.2.7                 pypi_0    pypi
python-magic              0.4.27                   pypi_0    pypi
pytz                      2024.1                   pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
pyzmq                     25.1.2          py311h313beb8_0  
rapidfuzz                 3.8.1                    pypi_0    pypi
readline                  8.2                  h1a28f6b_0  
regex                     2024.4.16                pypi_0    pypi
replicate                 0.25.2                   pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         2.0.0                    pypi_0    pypi
rich                      13.7.1                   pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
safetensors               0.4.3                    pypi_0    pypi
scikit-learn              1.4.2                    pypi_0    pypi
scipy                     1.13.0                   pypi_0    pypi
sentence-transformers     2.7.0                    pypi_0    pypi
setuptools                68.2.2          py311hca03da5_0  
shellingham               1.5.4                    pypi_0    pypi
six                       1.16.0             pyh6c4a22f_0    conda-forge
sniffio                   1.3.1                    pypi_0    pypi
soupsieve                 2.5                      pypi_0    pypi
sqlalchemy                2.0.29                   pypi_0    pypi
sqlite                    3.41.2               h80987f9_0  
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
starlette                 0.37.2                   pypi_0    pypi
striprtf                  0.0.26                   pypi_0    pypi
sympy                     1.12                     pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tenacity                  8.2.3                    pypi_0    pypi
termcolor                 2.4.0                    pypi_0    pypi
threadpoolctl             3.4.0                    pypi_0    pypi
tiktoken                  0.6.0                    pypi_0    pypi
tk                        8.6.12               hb8d0fd4_0  
tokenizers                0.19.1                   pypi_0    pypi
torch                     2.2.2                    pypi_0    pypi
tornado                   6.3.3           py311h80987f9_0  
tqdm                      4.66.2                   pypi_0    pypi
traitlets                 5.14.2             pyhd8ed1ab_0    conda-forge
transformers              4.40.0                   pypi_0    pypi
typer                     0.12.3                   pypi_0    pypi
typing-inspect            0.9.0                    pypi_0    pypi
typing_extensions         4.11.0             pyha770c72_0    conda-forge
tzdata                    2024.1                   pypi_0    pypi
unstructured              0.13.2                   pypi_0    pypi
unstructured-client       0.18.0                   pypi_0    pypi
urllib3                   1.26.18                  pypi_0    pypi
uvicorn                   0.29.0                   pypi_0    pypi
uvloop                    0.19.0                   pypi_0    pypi
watchfiles                0.21.0                   pypi_0    pypi
wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge
websocket-client          1.7.0                    pypi_0    pypi
websockets                12.0                     pypi_0    pypi
wheel                     0.41.2          py311hca03da5_0  
wrapt                     1.16.0                   pypi_0    pypi
xz                        5.4.6                h80987f9_0  
yarl                      1.9.4                    pypi_0    pypi
zeromq                    4.3.5                hebf3989_1    conda-forge
zipp                      3.18.1                   pypi_0    pypi
nazkhan-8451 added the bug label on Apr 25, 2024

ekzhu commented Apr 25, 2024

Looks like the message output from the critic is incomplete. @WaelKarkoub, do you know of a possible cause for this?


WaelKarkoub commented Apr 25, 2024

@ekzhu this is new to me; maybe the API provider is limiting the number of output tokens.

@nazkhan-8451 I ran your code and it works perfectly fine for me. I'm not sure how you set up your OAI_CONFIG_LIST, but check whether you have max_tokens set, which could limit the number of output tokens. I would also recommend "cache_seed": None when testing with autogen, which should make it easier for you to debug.
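To make that max_tokens point concrete (the cap value below is made up for illustration and is not from the original config), a token limit placed in the llm_config is enough to cut every reply short:

```python
# Hypothetical llm_config: a low max_tokens cap truncates every agent reply
gpt_config = {
    "cache_seed": None,   # disable caching while debugging
    "temperature": 0.7,
    "max_tokens": 16,     # illustrative cap; replies would stop after ~16 tokens
}

# Removing the cap (or raising it) restores full-length replies
gpt_config.pop("max_tokens", None)
```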

Here is my version (I don't know what wag-gpt4-128k is, and gpt-4 now supports vision as well):

import os

from PIL.Image import Image

import autogen
from autogen.agentchat.contrib import img_utils
from autogen.agentchat.contrib.capabilities import generate_images

CRITIC_SYSTEM_MESSAGE = """You need to improve the prompt of the figures you saw.
How to create an image that is better in terms of color, shape, text (clarity), and other things.
Reply with the following format:

CRITICS: the image needs to improve...
PROMPT: here is the updated prompt!

If you have no critique or a prompt, just say TERMINATE
"""

config_list_gpt4 = [
    {
        "model": "gpt-4-turbo-2024-04-09",
        "api_key": os.environ["OPENAI_API_KEY"],
    }
]

config_list_gpt4_vision = config_list_gpt4

config_list_dalle = [
    {
        "model": "dall-e-3",
        "api_key": os.environ["OPENAI_API_KEY"],
    }
]

gpt_config = {
    "cache_seed": None,  # change the cache_seed for different trials
    "temperature": 0.7,
    "config_list": config_list_gpt4,
    "timeout": 300,
}

gpt_vision_config = {
    "cache_seed": None,  # change the cache_seed for different trials
    "temperature": 0.7,
    "config_list": config_list_gpt4_vision,
    "timeout": 300,
}

dalle_config = {
    "cache_seed": None,  # change the cache_seed for different trials
    "temperature": 0.7,
    "config_list": config_list_dalle,
    "timeout": 300,
}


def _is_termination_message(msg) -> bool:
    # Detects if we should terminate the conversation
    if isinstance(msg.get("content"), str):
        return msg["content"].rstrip().endswith("TERMINATE")
    elif isinstance(msg.get("content"), list):
        for content in msg["content"]:
            if isinstance(content, dict) and "text" in content:
                return content["text"].rstrip().endswith("TERMINATE")
    return False


def critic_agent() -> autogen.ConversableAgent:
    return autogen.ConversableAgent(
        name="critic",
        llm_config=gpt_vision_config,
        system_message=CRITIC_SYSTEM_MESSAGE,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
        is_termination_msg=lambda msg: _is_termination_message(msg),
    )


def image_generator_agent() -> autogen.ConversableAgent:
    # Create the agent
    agent = autogen.ConversableAgent(
        name="dalle",
        llm_config=gpt_vision_config,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
        is_termination_msg=lambda msg: _is_termination_message(msg),
    )

    # Add image generation ability to the agent
    dalle_gen = generate_images.DalleImageGenerator(llm_config=dalle_config)
    image_gen_capability = generate_images.ImageGeneration(
        image_generator=dalle_gen, text_analyzer_llm_config=gpt_config
    )

    image_gen_capability.add_to_agent(agent)
    return agent


def extract_images(sender: autogen.ConversableAgent, recipient: autogen.ConversableAgent) -> list[Image]:
    images = []
    all_messages = sender.chat_messages[recipient]

    for message in reversed(all_messages):
        # The GPT-4V format, where the content is an array of data
        contents = message.get("content", [])
        for content in contents:
            if isinstance(content, str):
                continue
            if content.get("type", "") == "image_url":
                img_data = content["image_url"]["url"]
                images.append(img_utils.get_pil_image(img_data))

    if not images:
        raise ValueError("No image data found in messages.")

    return images


###################################################

dalle = image_generator_agent()
critic = critic_agent()

img_prompt = "robot"

result = dalle.initiate_chat(critic, message=img_prompt)

nazkhan-8451 (Author):

@WaelKarkoub wag-gpt4-128k is the deployed model name of gpt-4-turbo in Azure. I don't know what I am doing wrong here. If your code and mine are the same, then I have no clue why this is happening. Is there any library version mismatch that could cause this?

WaelKarkoub (Collaborator):

@nazkhan-8451 try updating to the latest autogen version; I'm not certain whether that would change anything. In your OAI_CONFIG_LIST, I know you have set your model and the API key; do you have anything else set up?


nazkhan-8451 commented Apr 26, 2024

@WaelKarkoub here is my file. I have checked the models individually; the API key and URL are correct.

[
        {
            "model": "wag-gpt4-128k",
            "api_key": "api-key",
            "api_type": "azure",
            "base_url": "url",
            "api_version": "2024-02-15-preview",
            "tags": ["wag-gpt4-128k"]
        },
        {
            "model": "gpt-35-turbo-16k",
            "api_key": "api-key",
            "api_type": "azure",
            "base_url": "url",
            "api_version": "2024-02-15-preview",
            "tags": ["gpt-35"]
        },

        {
            "model": "gpt4-vision",
            "api_key": "api-key",
            "api_type": "azure",
            "base_url": "url",
            "api_version": "2023-12-01-preview",
            "tags": ["gpt-vision"]
        },

        {
            "model": "dall-e-3",
            "api_key": "api-key",
            "api_type": "azure",
            "base_url": "url/",
            "api_version": "2023-12-01-preview",
            "tags": ["dalle"]
        }
]
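Incidentally, the entries above carry a "tags" field, so deployments could also be selected by tag rather than by model name. A minimal sketch of that filtering in plain Python (the entries here are trimmed stand-ins for the real file):

```python
# Sketch: select Azure deployments by their "tags" field instead of model name
config_list = [
    {"model": "wag-gpt4-128k", "tags": ["wag-gpt4-128k"]},
    {"model": "gpt4-vision", "tags": ["gpt-vision"]},
    {"model": "dall-e-3", "tags": ["dalle"]},
]

def filter_by_tag(configs, tag):
    """Keep only entries whose tags list contains the given tag."""
    return [c for c in configs if tag in c.get("tags", [])]

dalle_configs = filter_by_tag(config_list, "dalle")
```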

nazkhan-8451 (Author):

@WaelKarkoub I changed the code to "cache_seed": None and upgraded to the latest pyautogen. There are two problems I am seeing:

  • messages are truncated
  • the dalle agent says that, as a text-based AI model, it can't generate images
dalle (to critic):

robot

--------------------------------------------------------------------------------
critic (to dalle):

CRITICS: the image needs to improve the depiction of the robot to make

--------------------------------------------------------------------------------
dalle (to critic):

I'm sorry for any confusion, but as an AI text-based model, I

--------------------------------------------------------------------------------
critic (to dalle):

TERMINATE

WaelKarkoub (Collaborator):

@nazkhan-8451 your config looks correct. Your timeout is high enough that it shouldn't cause a problem. I'll make an Azure account and test your script again.

WaelKarkoub (Collaborator):

@nazkhan-8451 I couldn't reproduce this bug; does this still happen for you?

WaelKarkoub self-assigned this on Apr 29, 2024
nazkhan-8451 (Author):

@WaelKarkoub It does. Not sure what I am doing wrong or how to work around it.

WaelKarkoub (Collaborator):

> @WaelKarkoub It does. Not sure what am i doing wrong or how to go around it.

@nazkhan-8451 check whether you set hard limits in Azure; I'm not sure what that would look like. And if possible, check whether this happens with OpenAI.

nazkhan-8451 (Author):

@WaelKarkoub the dall-e deployment works fine, because I can generate an image with this:

import os
from openai import AzureOpenAI
import json
from autogen.agentchat.contrib import img_utils

client = AzureOpenAI(
    api_version="2024-02-01",
    azure_endpoint="",
    api_key="",
)

result = client.images.generate(
    model="dall-e-3", # the name of your DALL-E 3 deployment
    prompt="""Create image based on ice-cream description. Just create ice-cream image. Do NOT include name, words and description. Make it photorealistic, enhance its clarity. focus on ice-cream.

    Name: """ + str(recipe_name[0]) +  # recipe_name is defined elsewhere in my notebook

    """Description:""" + str(recipe_description[0]),  # as is recipe_description
    n=1
)

image_url = json.loads(result.model_dump_json())['data'][0]['url']
pil_img = img_utils.get_pil_image(image_url)
pil_img

nazkhan-8451 (Author):

I don't have an OpenAI DALL-E deployment to test with.

WaelKarkoub (Collaborator):

@nazkhan-8451 my concern is not the image generation part, but the chat completion side of things (i.e. using the GPT models). See if you can still generate large texts with GPT.
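One quick way to check that is to look at the finish_reason of a chat completion: the API reports "length" when a token cap cut the reply off. A sketch with made-up response dicts shaped like the Chat Completions payload:

```python
# Sketch: detect a token-capped reply from the finish_reason field
def was_truncated(response: dict) -> bool:
    """True if the first choice stopped because of a token limit."""
    return response["choices"][0]["finish_reason"] == "length"

# Illustrative payloads shaped like Chat Completions responses
full_reply = {"choices": [{"finish_reason": "stop", "message": {"content": "..."}}]}
cut_reply = {"choices": [{"finish_reason": "length", "message": {"content": "..."}}]}
```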

WaelKarkoub (Collaborator):

@nazkhan-8451 https://github.com/microsoft/autogen/blob/main/notebook/agentchat_image_generation_capability.ipynb Does this work for you? just change the model name, API key, etc... accordingly

nazkhan-8451 (Author):

@WaelKarkoub this is giving the error:

dalle (to critic):

robot


critic (to dalle):

CRITICS: the image needs to improve the depiction of the robot to make


dalle (to critic):

I'm sorry for any confusion, but as an AI text-based model, I


critic (to dalle):

TERMINATE


WaelKarkoub commented Apr 30, 2024

@nazkhan-8451 Disable the cache again by adjusting the configs; the output is the same because it's reading from your cache.
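A minimal way to be sure no stale cache entry is replayed (a sketch over the config names from the scripts above, not the only approach):

```python
# Sketch: force cache_seed to None on every config so autogen skips its cache
gpt_config = {"cache_seed": 42, "temperature": 0.7}
gpt_vision_config = {"cache_seed": 42, "temperature": 0.7}
dalle_config = {"cache_seed": 42, "temperature": 0.7}

for cfg in (gpt_config, gpt_vision_config, dalle_config):
    cfg["cache_seed"] = None  # None disables response caching entirely
```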

WaelKarkoub (Collaborator):

@nazkhan-8451 just making sure: the prompt in the notebook is different from the console output you pasted in your comment. Can you run the notebook as-is and see what the output looks like? Make sure you disable the cache seed as well.

nazkhan-8451 (Author):

@WaelKarkoub I ran the notebook as is.

dalle (to critic):

A happy dog wearing a shirt saying 'I Love AutoGen'. Make sure the text is clear.

--------------------------------------------------------------------------------
critic (to dalle):

CRITICS: the image needs to improve the visibility and readability of the text

--------------------------------------------------------------------------------
dalle (to critic):

I'm sorry for any confusion, but as an AI text-based model, I

--------------------------------------------------------------------------------
critic (to dalle):

TERMINATE

WaelKarkoub (Collaborator):

@nazkhan-8451 yeah, I'm stumped. Would you mind posting it on Discord? https://aka.ms/autogen-dc. If not, I can post the issue myself as well.

nazkhan-8451 (Author):

@WaelKarkoub I don't have discord. If you could post, we can continue to collaborate here. Thank you for all the help.

WaelKarkoub (Collaborator):

@nazkhan-8451 can you try using MultimodalConversableAgent in your test script instead of ConversableAgent? It's in autogen/agentchat/contrib/multimodal_conversable_agent.py.

nazkhan-8451 (Author):

@WaelKarkoub Converted both of them to multimodal agents.

def critic_agent() -> MultimodalConversableAgent:
    return MultimodalConversableAgent(
        name="critic",
        llm_config=gpt_vision_config,
        system_message=CRITIC_SYSTEM_MESSAGE,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
        is_termination_msg=lambda msg: _is_termination_message(msg),
    )


def image_generator_agent() -> MultimodalConversableAgent:
    # Create the agent
    agent = MultimodalConversableAgent(
        name="dalle",
        llm_config=gpt_vision_config,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
        is_termination_msg=lambda msg: _is_termination_message(msg),
    )

    # Add image generation ability to the agent
    dalle_gen = generate_images.DalleImageGenerator(llm_config=dalle_config)
    image_gen_capability = generate_images.ImageGeneration(
        image_generator=dalle_gen, text_analyzer_llm_config=gpt_config
    )

    image_gen_capability.add_to_agent(agent)
    return agent
    

Needed to fix an error in /autogen/agentchat/contrib/capabilities/generate_images.py to run it. I changed the system_message handling to:

# system_messages = "\n".join([msg['message'] for msg in agent.system_message if 'message' in msg])
     # agent.update_system_message(system_messages + "\n" + SYSTEM_MESSAGE)

still got:

dalle (to critic):

A happy dog wearing a shirt saying 'I Love AutoGen'. Make sure the text is clear.

--------------------------------------------------------------------------------
critic (to dalle):

CRITICS: the image needs to improve the visibility of the text on the

--------------------------------------------------------------------------------
dalle (to critic):

I'm sorry for any confusion, but I am unable to generate images. If

--------------------------------------------------------------------------------
critic (to dalle):

TERMINATE
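
The workaround above suggests the string concatenation in generate_images.py breaks when an agent's system_message is a list of content dicts rather than a plain string. A hypothetical helper that handles both shapes might look like this (the "message" key mirrors the snippet above; the actual key used by autogen's multimodal agents may differ):

```python
def merge_system_message(system_message, addition):
    """Append `addition` to a system message that may be either a plain
    string or a list of content dicts (as with multimodal agents)."""
    if isinstance(system_message, str):
        return system_message + "\n" + addition
    # List-of-dicts case: keep only textual parts, mirroring the
    # workaround above (key name is an assumption).
    parts = [msg["message"] for msg in system_message if "message" in msg]
    return "\n".join(parts) + "\n" + addition
```

This is only a sketch of the shape such a fix could take, not the code autogen ships.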

@nazkhan-8451 (Author)

@WaelKarkoub I figured out where the bug is: it's the code that creates DalleImageGenerator. If I create a class manually like the following and call the Azure DALL-E client directly, it works. So basically DalleImageGenerator doesn't know how to build a client from an Azure DALL-E config and only works with OpenAI (or that's my understanding).

This works (https://github.com/microsoft/autogen/blob/main/notebook/agentchat_dalle_and_gpt4v.ipynb):

from openai import AzureOpenAI

from autogen import Agent, ConversableAgent

dalle_client = AzureOpenAI(
    api_version="2024-02-01",
    azure_endpoint="",
    api_key="",
)

class DALLEAgent(ConversableAgent):
    def __init__(self, name, llm_config: dict, **kwargs):
        super().__init__(name, llm_config=llm_config, **kwargs)

        # try:
        #     config_list = llm_config["config_list"]
        #     api_key = config_list[0]["api_key"]
        # except Exception as e:
        #     print("Unable to fetch API Key, because", e)
        #     api_key = os.getenv("OPENAI_API_KEY")

        # I had to remove the code that creates an OpenAI client and force the Azure client instead
        self._dalle_client = dalle_client
        self.register_reply([Agent, None], DALLEAgent.generate_dalle_reply)
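
A minimal sketch of how DalleImageGenerator could dispatch on the config entry instead of always assuming OpenAI (field names follow autogen's usual OAI_CONFIG_LIST shape; this is an illustration, not the library's actual code):

```python
def resolve_dalle_client(cfg):
    """Pick the client class name and constructor kwargs for one config
    entry, so Azure configs are not silently treated as vanilla OpenAI
    (the bug described above)."""
    if cfg.get("api_type") == "azure":
        return "AzureOpenAI", {
            "api_key": cfg["api_key"],
            "azure_endpoint": cfg["base_url"],
            "api_version": cfg.get("api_version", "2024-02-01"),
        }
    return "OpenAI", {"api_key": cfg["api_key"]}
```

The caller would then instantiate the corresponding class from the openai package with the returned kwargs.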

@WaelKarkoub (Collaborator)

@nazkhan-8451 Great catch! It's interesting how this bug affected the text output of the other agents; I'll have to take a look at it. Do you want to submit a PR for a fix? I don't mind doing that as well.

@nazkhan-8451 (Author)

Please do that; I will close this issue. Thank you.

@whiskyboy (Collaborator) commented May 13, 2024

@WaelKarkoub @nazkhan-8451 I've faced the same text-output cut-off issue when testing the image generation capability. I'm also using an AzureOpenAI deployment, and finally found that it may be a limitation of the AzureOpenAI GPT-4 Turbo with Vision deployment.

From the documentation, it looks like we have to set a max_tokens value in the request, otherwise the response will be cut off:

[screenshot of the relevant Azure OpenAI documentation]

After adding a max_tokens field to the llm_config when constructing the critic and dalle agents, I got the expected output:

def critic_agent() -> autogen.ConversableAgent:
    return autogen.ConversableAgent(
        name="critic",
        llm_config={"config_list": config_list_gpt4v, "temperature": 0.7, "max_tokens": 400},
        system_message=CRITIC_SYSTEM_MESSAGE,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
    )

@nazkhan-8451 (Author)

@whiskyboy max_tokens solved the cutoff problem!
Are you using Azure DALL-E? For DALL-E I get the following error. The API key itself is fine, because when I forced the client to use it (shown in my comment above), it works.

AuthenticationError                       Traceback (most recent call last)
Cell In[11], line 7
      4 img_prompt = "A happy dog wearing a shirt saying 'I Love AutoGen'. Make sure the text is clear."
      5 # img_prompt = "Ask me how I'm doing"
----> 7 result = dalle.initiate_chat(critic, message=img_prompt)

AuthenticationError: Error code: 401 - {'error': {'code': 'invalid_api_key', 'message': 'Incorrect API key provided: f***************************4e23. You can find your API key at https://platform.openai.com/account/api-keys.', 'param': None, 'type': 'invalid_request_error'}}
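
The 401 against platform.openai.com suggests the generator built a plain OpenAI client from the Azure key. For comparison, an OAI_CONFIG_LIST entry for an Azure DALL-E deployment would normally carry the Azure-specific fields (values below are placeholders, and whether DalleImageGenerator honors api_type is exactly the open question in this thread):

```python
# Hypothetical Azure DALL-E config entry, following autogen's usual
# config shape; endpoint, key, and version values are placeholders.
azure_dalle_entry = {
    "model": "dall-e-3",
    "api_type": "azure",
    "api_key": "<your-azure-key>",
    "base_url": "https://<your-resource>.openai.azure.com/",
    "api_version": "2024-02-01",
}
```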

@whiskyboy (Collaborator)

Are you using Azure Dalle? Because for Dalle I get the following error. But the api key is fine because when I forced it to accept it (showed above comment) it works.

@nazkhan-8451 No, I'm not using Azure DALL-E; I'm testing with HuggingFace text-to-image models instead (see #2599). I'll try Azure DALL-E later.
