# Gemini for images

References:

* https://ai.google.dev/gemini-api/docs/image-generation#before_you_begin
* https://ai.google.dev/gemini-api/docs/image-understanding

# Setting up

In [15]:
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
import base64

In [16]:
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
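
The cell above only works inside Colab, because `google.colab.userdata` is not importable elsewhere. A minimal sketch of a fallback, assuming the key is also exported as an environment variable with the same name; `load_api_key` is a hypothetical helper and not part of any SDK:

```python
import os

def load_api_key(name="GOOGLE_API_KEY"):
    """Return the key from Colab secrets if possible, else from the environment."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
        key = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing or not shared
        # with this notebook.
        key = None
    # Fall back to an environment variable of the same name (assumption).
    return key or os.environ.get(name)
```

Under these assumptions, `GOOGLE_API_KEY = load_api_key()` could replace the Colab-only lookup. It returns `None` when neither source has the key.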

In [17]:
client = genai.Client(api_key=GOOGLE_API_KEY)

In [5]:
# Uncomment to list the available model names:
# for model_info in client.models.list():
#     print(model_info.name)

# Generate an image

In [18]:
# Prompt
contents = ('Hi, can you create a 3d rendered image of a cat '
            'with long ears on top of an airplane '
            'realistic and bright')

In [19]:
response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=contents,
    config=types.GenerateContentConfig(
      response_modalities=['TEXT', 'IMAGE']
    )
)

In [20]:
for part in response.candidates[0].content.parts:
  if part.text is not None:
    print(part.text)
  elif part.inline_data is not None:
    image = Image.open(BytesIO(part.inline_data.data))
    image.save('gemini-native-image2.png')  # output file name
    image.show()

I will generate a 3D rendering of a realistic, brightly lit scene featuring a cat with unusually long ears perched on the wing of an airplane.
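
The loop above indexes `response.candidates[0].content.parts` directly, which would raise if a response came back without candidates or without content. A minimal defensive sketch, assuming only the attribute layout used in that loop; `split_parts` is a hypothetical helper, and the stand-in object below is made-up data, not a real API response:

```python
from types import SimpleNamespace

def split_parts(response):
    """Collect text strings and raw image bytes from a response-like object."""
    texts, images = [], []
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return texts, images
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    for part in parts:
        if getattr(part, "text", None) is not None:
            texts.append(part.text)
        elif getattr(part, "inline_data", None) is not None:
            images.append(part.inline_data.data)
    return texts, images

# Stand-in mimicking the attribute layout used in the loop above (hypothetical data).
fake = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=[
    SimpleNamespace(text="hello", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"\x89PNG")),
]))])
texts, images = split_parts(fake)
```

Each entry in `images` could then be passed to `Image.open(BytesIO(...))` as in the cell above.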




# Describe a local image

In [29]:
my_file = client.files.upload(file="/content/horse.jpg")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[my_file, "Caption this image in less than 5 words."],
)

print(response.text)

Two majestic blue roan horses.


In [21]:
my_file = client.files.upload(file="/content/gemini-native-image2.png")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[my_file, "Caption this image."],
)

print(response.text)

Here are some captions for the image of the cat on the plane wing:

**Funny:**

* "I always take the window seat, but this is ridiculous."
* "Just testing out my new wingwalker career."
* "I told the pilot to take the scenic route."
* "When you book a flight and forget to mention your emotional support animal."

**Simple:**

* "Cat on a plane... wing!"
* "Wing Walker Cat"
* "Up in the air!"

**Slightly More Descriptive:**

* "A tabby cat takes a stroll on the wing of an airplane in flight."
* "This adventurous feline is enjoying the ultimate view!"

I think the funny options will probably resonate with most people!



# Describe an image from a URL

In [24]:
import requests

image_path = "https://goo.gle/instrument-img"
image_bytes = requests.get(image_path).content
image = types.Part.from_bytes(
  data=image_bytes, mime_type="image/jpeg"
)

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=["What is this image?", image],
)

print(response.text)

The image shows the console of a pipe organ. It has multiple keyboards (manuals), foot pedals, stop knobs/tabs, and other controls for playing the instrument.
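
The cell above hardcodes `mime_type="image/jpeg"`, which would mislabel the bytes if the URL served another format. A minimal sketch that guesses the type from the leading magic bytes; `sniff_image_mime` is a hypothetical helper, it covers only four common formats, and which formats the model accepts should be checked against the API documentation:

```python
def sniff_image_mime(data, default="application/octet-stream"):
    """Guess an image MIME type from the first bytes of the payload."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return default
```

The result could be passed as `mime_type=sniff_image_mime(image_bytes)` in the `types.Part.from_bytes` call.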


# Edit an image

In [25]:
image = Image.open('/content/gemini-native-image2.png')  # path to the input file

# Edit prompt (a plain string; note the space between the two sentences)
text_input = ('Hi, this is a picture of a cat on an airplane. '
              'Can you add a llama next to it?')

response = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",
    contents=[text_input, image],
    config=types.GenerateContentConfig(
      response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.candidates[0].content.parts:
  if part.text is not None:
    print(part.text)
  elif part.inline_data is not None:
    image = Image.open(BytesIO(part.inline_data.data))
    image.show()
    image.save('gemini-edited-image.png')  # new file name

A fluffy white llama with long eyelashes will be standing calmly next to the tabby cat on the airplane wing, both looking out over the clouds under a clear blue sky.



# Differences between images

In [30]:
# Upload the first image
image1_path = "/content/horse.jpg"
uploaded_file = client.files.upload(file=image1_path)

# Prepare the second image as inline data
image2_path = "/content/gemini-native-image2.png"
with open(image2_path, 'rb') as f:
    img2_bytes = f.read()

# Create the prompt with text and multiple images
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        "What is different between these two images?",
        uploaded_file,  # Use the uploaded file reference
        types.Part.from_bytes(
            data=img2_bytes,
            mime_type='image/png'
        )
    ]
)

print(response.text)

Here's a breakdown of the differences between the two images:

*   **Content:** The first image contains a scene with two horses in a field. The second image depicts a cat standing on the wing of an airplane.
*   **Subject Matter:** The first image features animals in a natural environment, while the second has a domestic animal in an unnatural situation.
