<a href="https://colab.research.google.com/github/Rripped/PromptRefining/blob/main/PromptRefining.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

TODO: add system message (WhatsApp); in the system message: original prompt, image closer to the prompt, new prompt more precise

--> assistant, (moderation?)

-----------------------------------------------------------------------------


This notebook generates an image from a prompt with DALL-E 2, asks GPT-4 Vision to list the differences between the image and the prompt, and then has GPT-4 write a refined prompt for the next image. (See the print statements at the end of the notebook.)

If the command below fails, go to Runtime > Change runtime type and select the T4 GPU.

In [84]:
!nvidia-smi

zsh:1: command not found: nvidia-smi


necessary installs

In [85]:
!pip install diffusers==0.11.1
!pip install transformers scipy ftfy accelerate
!pip install openai cohere tiktoken jq requests



necessary imports

In [86]:
import requests
import shutil
import os
from openai import OpenAI
from sklearn.metrics.pairwise import cosine_similarity
from functools import cache

In [87]:
client = OpenAI(api_key=os.environ["OAI"])

## Embedding
Initialize the embedding function.

In [88]:
@cache
def get_embeddings(input: str):
    model = "text-embedding-ada-002"
    embeddings = client.embeddings.create(input=input, model=model)
    return embeddings.data[0].embedding

In [89]:
@cache
def generate_image(prompt):
    # Cached per prompt: a repeated prompt reuses the first response.
    return client.images.generate(
        model="dall-e-2", prompt=prompt, size="256x256", quality="standard", n=1
    )

In [90]:
def get_image_url(image):
    return str(image.data[0].url)

In [91]:
def create_differences_for_image(initial_prompt, url):
    system_message = ("Compare an AI generated image with a matching prompt. "
        "State all differences of a prompt to the image in bulletpoints. "
        "Ignore artifacts not specified in the prompt. "
        "Return \"DONE\" if there are no differences.")
    generation = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "system",
                "content": system_message
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": initial_prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": url,
                        },
                    },
                ]
            }
        ],
        max_tokens=150,
        #temperature=0.2,
        #top_p=0.3
    )
    return generation.choices[0].message.content


In [92]:
def generate_prompt_for_differences(previous_messages):
    system_message = "You are a prompt generator for images. You will get a list of differences for the generated image. Create a new prompt for a new image that eliminates the differences and satisfies the initial prompt."
    messages = [
        {
            "role": "system",
            "content": system_message
        }
    ]
    messages += previous_messages
    generation = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=messages
    )
    return generation.choices[0].message.content

In [93]:
def download(url, file_name):
    # Stream the image to disk; report failure on any non-200 response.
    res = requests.get(url, stream=True)

    if res.status_code == 200:
        with open(file_name, 'wb') as f:
            shutil.copyfileobj(res.raw, f)
        print('Image successfully downloaded: ', file_name)
    else:
        print('Image couldn\'t be retrieved')

You will have to compare an AI generated image with an original prompt. The prompt is marked by the preceding tag [PROMPT].

The goal is to identify all differences between the prompt and the image. Under the tag [DIFFERENCES], return a list of bullet points with short differences between the prompt and the image. Be brief; if no differences exist, leave the list empty.

In a second paragraph marked with [NEW PROMPT], write a new prompt that precisely addresses the differences, so that it can be used to create a new image that is closer to the original prompt. In the end, the picture should contain nothing more than what the original prompt asks for.

Provide no more information than defined here. Please answer with "Yes" if you understood.
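
A reply in the tagged format described above could be split back into its two parts with a small helper. This is a minimal sketch and not part of the notebook's pipeline: `parse_tagged_reply` and the sample reply are hypothetical, and the code assumes the model follows the [DIFFERENCES] / [NEW PROMPT] layout exactly.

```python
def parse_tagged_reply(reply: str):
    """Split a reply into (differences, new_prompt).

    Assumes the reply contains the [DIFFERENCES] and [NEW PROMPT] tags
    in that order; raises ValueError otherwise.
    """
    diff_tag, prompt_tag = "[DIFFERENCES]", "[NEW PROMPT]"
    diff_start = reply.find(diff_tag)
    prompt_start = reply.find(prompt_tag)
    if diff_start == -1 or prompt_start == -1 or prompt_start < diff_start:
        raise ValueError("reply does not follow the expected tagged format")

    diff_block = reply[diff_start + len(diff_tag):prompt_start]
    new_prompt = reply[prompt_start + len(prompt_tag):].strip()

    # Keep only bullet lines and drop the leading "-" or "*" marker.
    differences = [
        line.strip().lstrip("-*").strip()
        for line in diff_block.splitlines()
        if line.strip().startswith(("-", "*"))
    ]
    return differences, new_prompt


# Hypothetical reply, made up for illustration.
sample_reply = """[DIFFERENCES]
- The mouse is holding the cheese instead of eating it.
- The cheese has no bite marks.

[NEW PROMPT]
A mouse taking a bite out of a piece of cheese with visible bite marks."""

differences, new_prompt = parse_tagged_reply(sample_reply)
print(differences)
print(new_prompt)
```

An empty `differences` list would then play the same role as the "DONE" reply used in the loop further down.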


You are given a prompt and an image. Your task is to improve the wording of the prompt so that Dall-E can generate a more realistic image of the given prompt. Respond with the newly generated prompt only.

Collect the embeddings of each iteration's differences in a list. We can then compare the embeddings across steps and compute their similarity, e.g., the cosine similarity.
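
As a rough sketch of that comparison, the helper below computes the cosine similarity between the embeddings of consecutive iterations. The function names and the toy vectors are made up for illustration; real `text-embedding-ada-002` vectors are much longer, and the `cosine_similarity` function imported from scikit-learn above could be used instead of the hand-written version.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def consecutive_similarities(embeddings):
    """Similarity between each embedding and the one from the next iteration."""
    return [cosine(a, b) for a, b in zip(embeddings, embeddings[1:])]


# Toy 3-dimensional vectors standing in for real embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]
similarities = consecutive_similarities(toy_embeddings)
print(similarities)
```

Values close to 1 between consecutive steps would suggest that the listed differences barely changed from one iteration to the next.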

In [94]:
def loop(initial_prompt, max_iteration_count):
    embeddings = []
    messages = [{
        "role": "user",
        "content": ""
    }, {
        "role": "assistant",
        "content": initial_prompt
    }]
    image = generate_image(initial_prompt)
    for i in range(max_iteration_count):
        url = get_image_url(image)
        #download(url, "output.png")
        print(f"Iteration {i + 1}:\n Image URL: {url}")
        differences = create_differences_for_image(initial_prompt, url)
        embeddings.append(get_embeddings(differences))
        print(f"Differences: {differences}")
        # Stop early when no differences remain, but still return what was collected.
        if differences.strip() == "DONE":
            return embeddings
        messages += [{
            "role": "user",
            "content": differences
        }]
        new_prompt = generate_prompt_for_differences(messages)
        messages += [{
            "role": "assistant",
            "content": new_prompt
        }]
        print(f"New prompt: {new_prompt}")
        image = generate_image(new_prompt)
    print(messages)
    return embeddings

embeddings = loop(initial_prompt="A mouse eating cheese", max_iteration_count=4)

Iteration 1:
 Image URL: https://oaidalleapiprodscus.blob.core.windows.net/private/org-0LvQq7RldYeEDw3JYyCX5ilq/user-Rt4RZD8kTYYetKF0cCAm0OzW/img-J7m7wvsZsQWXKh5VoljommVT.png?st=2023-12-01T13%3A50%3A49Z&se=2023-12-01T15%3A50%3A49Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-01T03%3A17%3A25Z&ske=2023-12-02T03%3A17%3A25Z&sks=b&skv=2021-08-06&sig=bvDU8IfDLLpDVBozVSfkFZMVo0pAv8D7j4GLAxKZPYM%3D
Differences: - The mouse is not actively eating the cheese; it appears to be hugging or holding the cheese instead.
- The cheese is missing bite marks, which would be expected if the mouse were eating it.
New prompt: Create an image of a mouse with its mouth open, taking a bite out of a piece of cheese, with clear bite marks visible on the cheese.
Iteration 2:
 Image URL: https://oaidalleapiprodscus.blob.core.windows.net/private/org-0LvQq7RldYeEDw3JYyCX5ilq/user-Rt4RZD8kTYYetKF0cCAm0OzW/img-PlUo3a

In [95]:
embeddings