# Image to image with Azure OpenAI

In this lab we will use the **Azure AI Vision service** and the **Azure OpenAI GPT-4o model** to get a rich description of an image.
Then we will use this description as a prompt to generate an artificial image with the **Azure OpenAI DALL-E 3** model.

You will be able to compare the quality and precision of images generated from different types of description.

In [None]:
# Importing the necessary libraries
import datetime
import openai
import os
import requests

from dotenv import load_dotenv
from io import BytesIO
from IPython.display import Image as viewimage
from matplotlib import pyplot as plt
from PIL import Image

Load environment variables

In [None]:
load_dotenv(override=True)

# Azure OpenAI
openai.api_type = "azure"
openai.api_key = os.getenv("AZURE_OPENAI_API_KEY")
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_version = os.getenv("AZURE_OPENAI_API_VERSION")

model = os.getenv("AZURE_OPENAI_MODEL")  # Deployment name of your GPT-4o model in Azure OpenAI Studio
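Every later cell depends on these variables, and `os.getenv` silently returns `None` when one is missing. The sketch below is one possible way to check them up front. The helper name `missing_env_vars` is hypothetical, and the list of names is collected from the cells in this notebook.

```python
import os

# Variable names collected from the cells in this notebook
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_MODEL",
    "AZURE_AI_SERVICES_API_KEY",
    "AZURE_AI_SERVICES_ENDPOINT",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv` points at a misconfigured `.env` file before any request is sent.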


Create a folder for generated images

In [None]:
RESULTS_DIR = "../data/artificial_images"

os.makedirs(RESULTS_DIR, exist_ok=True)

## 1. Functions

- Function to create a rich image description using the **Azure AI Vision** service.
 We will use the `denseCaptions` feature. Note that in classical image analysis you cannot tell the model where to focus when it creates an image description.

In [None]:
def get_caption(image_file):
    """
    Get caption from an image using Azure Computer Vision 4
    """
    # Get key and endpoint for Azure Computer Vision service
    load_dotenv()
    key = os.getenv("AZURE_AI_SERVICES_API_KEY")
    endpoint = os.getenv("AZURE_AI_SERVICES_ENDPOINT")

    # settings
    #options = "&features=caption,tags"
    options = "&features=denseCaptions,tags" 
    model = "?api-version=2023-02-01-preview&modelVersion=latest"
    url = endpoint + "/computervision/imageanalysis:analyze" + model + options
    headers = {
        "Content-type": "application/octet-stream",
        "Ocp-Apim-Subscription-Key": key,
    }

    # Read the image file
    with open(image_file, "rb") as f:
        data = f.read()

    # Sending the request and failing early on HTTP errors
    response = requests.post(url, data=data, headers=headers)
    response.raise_for_status()
    results = response.json()

    # Parsing the results
    #image_caption = results["captionResult"]["text"]
    image_caption = "; ".join(item["text"] for item in results["denseCaptionsResult"]["values"])

    tags = results["tagsResult"]["values"]
    tags_string = ", ".join(item["name"] for item in tags)
    caption = image_caption + ", " + tags_string

    return caption
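To see how the parsing step turns a response into a prompt, here is a small sketch that applies the same joins to a sample dictionary. The payload is made up for illustration and only contains the two fields that `get_caption` reads; real responses carry more fields. The helper name `build_caption` is hypothetical.

```python
def build_caption(results):
    """Join dense captions and tags into one prompt string (mirrors the parsing in get_caption)."""
    dense = "; ".join(item["text"] for item in results["denseCaptionsResult"]["values"])
    tags = ", ".join(item["name"] for item in results["tagsResult"]["values"])
    return dense + ", " + tags


# Made-up payload for illustration only
sample = {
    "denseCaptionsResult": {
        "values": [{"text": "a woman in a red coat"}, {"text": "a red coat with buttons"}]
    },
    "tagsResult": {"values": [{"name": "clothing"}, {"name": "coat"}]},
}

print(build_caption(sample))
# a woman in a red coat; a red coat with buttons, clothing, coat
```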

- Function to create a rich image description focused on fashion features with the **Azure OpenAI GPT-4o** model

Here you can define what kind of description you want to get from a picture

In [None]:
import base64
import json

model = os.getenv("AZURE_OPENAI_MODEL")

def gpt4V_fashion(image_file):
    """
    Describe a fashion item in an image with GPT-4o
    """
    # Checking if file exists
    if not os.path.exists(image_file):
        print(f"[Error] Image file {image_file} does not exist.")
        return None

    # Endpoint
    base_url = f"{openai.api_base}/openai/deployments/{model}"
    gpt4vision_endpoint = f"{base_url}/chat/completions?api-version={openai.api_version}"


    # Header
    headers = {"Content-Type": "application/json", "api-key": openai.api_key}

    # Encoded image (the file is closed as soon as it has been read)
    with open(image_file, "rb") as f:
        encoded_image = base64.b64encode(f.read()).decode("ascii")

    context = """ 
    You are a fashion expert, familiar with identifying features of fashion articles from images.
    A user uploads an image and asks you to describe one particular piece in the shot: jacket, shoes, pants, \
    watches, etc.
    """

    prompt = """
    You respond with your analysis of the following fields:

    1. ITEM'S TYPE: Identify if it's a top, bottom, dress, outerwear, footwear, bag, jewelry...
    2. BRAND: Identify the brand of the item.
    3. COLOR: Note the main color(s) and any secondary colors.
    4. PATTERN: Identify any visible patterns such as stripes, florals, animal print, or geometric designs.\
    Feel free to use any other patterns here.
    5. MATERIAL: Best guess at the material that the item is made from.
    6. FEATURES: Note any unique details or embellishments, like embroidery, sequins, studs, fringes, buttons,
    zippers...
    7. ITEM TYPE SPECIFIC: For each type of item, feel free to add any additional descriptions that are relevant \
    to help describe the item. For example, for a jacket you can include the neck and sleeve design, plus the length.
    8. MISC.: Anything else important that you notice.
    9. SIZE: Print the size of the item if you get it from the image.
    10. ITEM SUMMARY: Write a one line summary for this item.
    11. ITEM CLASSIFICATION: Classify this item into CLOTHES, BAG, SHOES, WATCH or OTHERS.
    12. ITEM TAGS: Generate 10 tags to describe this item. Each tag should be separated with a comma.
    13. STORIES: Write multiple stories about this product in 5 lines.

    The output should be a numbered list. Print an empty line between items, starting at item 12.
    """

    # Prompt
    json_data = {
        "messages": [
            {
                "role": "system",
                "content": [{"type": "text", "text": context}],
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
                    },
                ],
            },
        ],
        "max_tokens": 4000,
        "temperature": 0.7,
    }

    # Results
    response = requests.post(
        gpt4vision_endpoint, headers=headers, data=json.dumps(json_data)
    )

    if response.status_code == 200:
        return response.json()["choices"][0]["message"]["content"]

    if response.status_code == 429:
        print(
            "[429 Error] Too many requests. Please wait a couple of seconds and try again."
        )
    else:
        print("[Error] Error code:", response.status_code)

    # No description is available when the request failed
    return None
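The function above always labels the image as `image/jpeg` in the data URL, whatever the file type. If your test images include other formats, a sketch like the following could derive the MIME type from the file name instead. The helper name `to_data_url` is hypothetical, and whether the service is strict about the declared type is not covered here.

```python
import base64
import mimetypes


def to_data_url(image_bytes, filename):
    """Build a base64 data URL, guessing the MIME type from the file name (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"  # fall back to the type hard-coded in gpt4V_fashion
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


print(to_data_url(b"abc", "shirt.png"))
# data:image/png;base64,YWJj
```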

- Function to generate an image with the **Azure OpenAI DALL-E 3** model

Here we send the image description as a prompt to the DALL-E 3 model to generate an image. Generated images are stored in the **data/artificial_images** folder.

In [None]:
import json
import os

from openai import AzureOpenAI


def dalle3(prompt, image_file, size=1024):
    """
    Generate an image from a prompt with DALL-E 3
    """
    # Append style and safety hints to the prompt
    extra_prompt = "full view, detailed, 8K, do not generate any provocative content"
    prompt = prompt + ", " + extra_prompt

    # Note: DALL-E 3 requires version 1.0.0 of the openai-python library or later.
    # Environment variables were loaded at the top of the notebook; the key is read
    # from AZURE_OPENAI_API_KEY, the same variable used by the other cells.
    openai_client = AzureOpenAI(
        api_version=os.getenv("AZURE_OPENAI_API_VERSION") or "2024-02-15-preview",
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    )

    result = openai_client.images.generate(
        model="dall-e-3",  # the name of your DALL-E 3 deployment
        prompt=prompt,
        n=1,
        size=f"{size}x{size}",  # "1024x1024" with the default size
    )
    image_url = result.data[0].url

    # Download the generated image
    response = requests.get(image_url)
    img = Image.open(BytesIO(response.content))

    # Saving the artificial image
    filename = os.path.basename(image_file)
    output_file = os.path.join(RESULTS_DIR, "dalle3_" + filename)
    img.save(output_file)

    return img, output_file
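A description returned by GPT-4o can be long (the request above allows up to 4000 tokens), while image models cap the prompt length. The sketch below trims the description so that it still fits once the extra instructions are appended. It assumes a limit of 4000 characters for DALL-E 3, which you should confirm against the current service documentation; the helper name `fit_prompt` is hypothetical. It also rejects an empty description, which is what you get when the description call fails.

```python
def fit_prompt(description, suffix, limit=4000):
    """Trim description so that description + ", " + suffix fits in `limit` characters.

    The default limit is an assumption, not a value taken from this notebook.
    """
    if not description:
        raise ValueError("No image description to use as a prompt")
    separator = ", "
    room = limit - len(separator) - len(suffix)
    if room <= 0:
        raise ValueError("Suffix alone exceeds the prompt limit")
    return description[:room].rstrip() + separator + suffix


print(fit_prompt("red wool coat", "full view"))
# red wool coat, full view
```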

- Function to display two images side by side

In [None]:
def display_images(imagefile1, imagefile2):
    """
    Display two images side by side
    """
    # Reading images
    img1 = plt.imread(imagefile1)
    img2 = plt.imread(imagefile2)

    # Subplots
    f, ax = plt.subplots(1, 2, figsize=(15, 8))
    ax[0].imshow(img1)
    ax[1].imshow(img2)

    # Titles
    ax[0].set_title("Original image")
    ax[1].set_title("DALL·E 3 artificial image")
    title = "Image to image with Azure Open AI"
    f.suptitle(title, fontsize=15)
    plt.tight_layout()

    # Removing the axis
    ax[0].axis("off")
    ax[1].axis("off")

    plt.show()

## 2. Testing

#### Test 1. With image description from **Azure AI Vision**

In [None]:
image_file='../data/fashion/image1.jpg'

print(image_file)
viewimage(filename=image_file, width=512)

In [None]:
prompt = get_caption(image_file)
prompt

In [None]:
artificial_img, artificial_image = dalle3(prompt, image_file)

display_images(image_file, artificial_image)

#### Test 2. With a rich fashion caption from **Azure OpenAI GPT-4o**

In [None]:
prompt = gpt4V_fashion(image_file)

artificial_img, artificial_image = dalle3(prompt, image_file)
display_images(image_file, artificial_image)
print(prompt)
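Both tests use the same `image_file`, and `dalle3` derives the output name from it, so the image from Test 2 overwrites the one from Test 1 on disk. To keep both results for comparison, one option is to copy each result to a name that includes the description source. The helper `tagged_output_path` and its naming scheme are hypothetical.

```python
import os


def tagged_output_path(results_dir, image_file, tag):
    """Build an output path of the form <results_dir>/dalle3_<tag>_<name> (hypothetical scheme)."""
    filename = os.path.basename(image_file)
    return os.path.join(results_dir, f"dalle3_{tag}_{filename}")


print(tagged_output_path("../data/artificial_images", "../data/fashion/image1.jpg", "vision"))
```

After each test you could call `shutil.copy(artificial_image, tagged_output_path(RESULTS_DIR, image_file, "vision"))`, using a different tag such as `"gpt4o"` for the second test.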

## 3. Challenge

1. Generate artificial images for other examples in the data/fashion folder, based on a simple or a rich description.


In [None]:
image_file = "..." # path to the image file
# viewimage(filename=image_file, width=512)
prompt = ... # call gpt4V_fashion(image_file) or get_caption(image_file) to get an image description. Alternatively, provide your own prompt


artificial_img, artificial_image = dalle3(prompt, image_file)
display_images(image_file, artificial_image)
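To work through the whole folder rather than one file at a time, a sketch like this could collect the candidate images first. The set of file extensions is an assumption about what the data/fashion folder contains, and `list_images` is a hypothetical helper.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed extensions


def list_images(folder):
    """Return the image files found directly in `folder`, sorted by name."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


# Usage in the notebook (commented out because it calls the Azure services):
# for path in list_images("../data/fashion"):
#     prompt = get_caption(str(path))
#     _, artificial_image = dalle3(prompt, str(path))
#     display_images(str(path), artificial_image)
```

Each iteration sends one request per service, so expect rate limits to apply on larger folders.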