###### **© 2025** | Licensed under the [MIT License](LICENSE)

# Color Code Adherence in AI Image Generation Models: A Case Study on Gemini

In this notebook, we evaluate how well Gemini 2.5 Flash Image (aka "Nano Banana") reproduces target colors specified in HEX, RGB, and HSL formats. For that, we analyze 50 generated images created with the model.

We test two methods:
1.  **Text-only color instructions:** Giving the model a contextual prompt plus a text hex code (e.g., `"The color is #d81515"`).
2.  **Color Image Reference:** Giving the model the same contextual prompt plus a visual image swatch of that color.

We evaluate the results using the **Delta E 2000** formula, the industry standard for perceptual color difference.


**Install Dependencies**

In [None]:
!pip install --q --no-cache-dir google-genai pillow numpy pandas scikit-image tqdm

**Import Libraries**

In [None]:
from google import genai
from PIL import Image, ImageDraw, ImageFont, ImageOps
import io
import os
import numpy as np
import pandas as pd
from skimage import color
from google.api_core import exceptions as google_exceptions
import getpass
import time
import base64
from IPython.display import display, HTML
import urllib.request
import matplotlib.pyplot as plt
import numpy as np

**Download Font**

We download an open-source font (DejaVu Sans) so we can draw clear, readable titles directly onto our test images for the report.

In [None]:
FONT_URL = "https://github.com/dejavu-fonts/dejavu-fonts/blob/main/ttf/DejaVuSans.ttf?raw=true"
FONT_FILE = "DejaVuSans.ttf"

try:
    urllib.request.urlretrieve(FONT_URL, FONT_FILE)
    print(f"Successfully downloaded font: {FONT_FILE}")
    # Test loading the font to make sure it works
    test_font = ImageFont.truetype(FONT_FILE, size=16)
    print("Font loaded successfully.")
except Exception as e:
    print(f"Error downloading or loading font: {e}")
    print("Titles may not appear correctly.")

**Configure API Key**

In [None]:
# --- 1. Securely get the API key ---
try:
    # Check if key is already set in the environment
    API_KEY = os.environ["GEMINI_API_KEY"]
    print("Found API Key in GEMINI_API_KEY environment variable.")
except KeyError:
    try:
        API_KEY = os.environ["GOOGLE_API_KEY"]
        print("Found API Key in GOOGLE_API_KEY environment variable.")
    except KeyError:
        print("API Key not found in environment. Please enter it below.")
        API_KEY = getpass.getpass("Enter your Gemini API Key: ")

if not API_KEY:
    print("❌ Error: API Key is empty. Please provide a key.")
else:
    # --- 2. Initialize and VALIDATE the client ---
    try:
        print("Validating API Key...")

        # Initialize the client (this defines 'client')
        client = genai.Client(api_key=API_KEY)

        # Make a light, fast test call to validate the key
        list(client.models.list())

        print("✅ Success! API Key is valid.")

        # Define the model to use for image generation
        IMAGE_MODEL = "gemini-2.5-flash-image-preview"

    except google_exceptions.PermissionDenied as e:
        print("❌ Validation Failed: Invalid API Key. Please check your key and try again.")
        # We stop the notebook by raising the error
        raise e
    except Exception as e:
        print(f"❌ An error occurred during validation: {e}")
        # We stop the notebook by raising the error
        raise e

**Helper Functions**

This cell contains all our core helper functions for image processing, color math, and adding titles.

In [None]:
FONT_PATH = "DejaVuSans.ttf"  # Font file we downloaded
TITLE_BAR_HEIGHT = 30         # Height in pixels for the title bar

def add_title_to_image(image, title_text):
    """
    Takes a PIL Image and returns a new image with a black title bar
    and white text drawn on it.
    """
    try:
        font = ImageFont.truetype(FONT_PATH, size=16)
    except IOError:
        print(f"Font not found at {FONT_PATH}. Using default font.")
        font = ImageFont.load_default()

    # Create a new blank canvas (black) that is taller than the original image
    new_width = image.width
    new_height = image.height + TITLE_BAR_HEIGHT
    titled_image = Image.new("RGB", (new_width, new_height), color="black")

    # Paste the original generated image onto the canvas, below the title bar area
    titled_image.paste(image, (0, TITLE_BAR_HEIGHT))

    # Prepare to draw text
    draw = ImageDraw.Draw(titled_image)

    # Simple text centering
    text_position = (10, 7) # (x, y) padding from top-left

    draw.text(text_position, title_text, font=font, fill="white")

    return titled_image

def display_side_by_side(*args, titles=[]):
    html_str = '<table>'
    if titles:
        html_str += '<tr>' + ''.join(f'<th style="text-align:center; padding: 10px; color: white;">{title}</th>' for title in titles) + '</tr>'
    html_str += '<tr>'
    for img in args:
        buffered = io.BytesIO()
        img.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
        html_str += f'<td style="padding: 10px;"><img src="data:image/png;base64,{img_str}" style="border: 1px solid #555; border-radius: 8px; max-width: 300px;" /></td>'
    html_str += '</tr></table>'
    display(HTML(html_str))

def hex_to_rgb(h):
    h = h.lstrip("#")
    return tuple(int(h[i : i + 2], 16) for i in (0, 2, 4))

def create_color_swatch_image(hex_color, size=(256, 256)):
    rgb_color = hex_to_rgb(hex_color)
    return Image.new("RGB", size, color=rgb_color)

def calculate_delta_e(rgb1, rgb2):
    rgb1_array = np.array([[rgb1]], dtype=np.uint8)
    rgb2_array = np.array([[rgb2]], dtype=np.uint8)
    lab1 = color.rgb2lab(rgb1_array)
    lab2 = color.rgb2lab(rgb2_array)
    delta = color.deltaE_ciede2000(lab1, lab2)
    return delta.item()

def get_average_rgb(image):
    avg_img = image.resize((1, 1), Image.Resampling.LANCZOS)
    return avg_img.getpixel((0, 0))

**Core Evaluation Functions**

This is the heart of our experiment.
* `run_generation`: A function to call the Gemini API, with built-in retries for rate limits.
* `evaluate_combination`: This performs our 2-way test for a single color, runs the analysis, and displays the titled results.

In [None]:
def run_generation(prompt_parts):
    """
    Calls the Gemini API and returns the generated PIL Image.
    Includes retry logic for rate limiting.
    On failure, returns None.
    """
    retries = 3
    wait_time = 10

    for i in range(retries):
        try:
            response = client.models.generate_content(
                model=IMAGE_MODEL,
                contents=prompt_parts
            )

            if response.candidates:
                for part in response.candidates[0].content.parts:
                    if part.inline_data:
                        return Image.open(io.BytesIO(part.inline_data.data)) # <-- Success

            error_text = response.text if response.text else "No image data found in response."
            print(f"  Generation complete, but no image returned. Response: {error_text}")
            break

        except google_exceptions.ResourceExhausted as e:
            print(f"  Hit rate limit. Waiting {wait_time}s... (Attempt {i+1}/{retries})")
            time.sleep(wait_time)
            wait_time *= 2

        except google_exceptions.InternalServerError as e:
            print(f"  API internal error. Waiting {wait_time}s... (Attempt {i+1}/{retries})")
            time.sleep(wait_time)
            wait_time *= 2

        except Exception as e:
            print(f"  Generation failed (non-retriable): {e}")
            break

    print("  All retries failed.")
    return None

def evaluate_combination(prompt, hex_color): # Takes base prompt + color
    """
    Runs the 2-way test:
    1. Text-Only (Contextual)
    2. Image-Only (Contextual)
    """
    print(f"--- Evaluating: '{prompt}' with color {hex_color} ---")

    # --- Setup ---
    reference_color_rgb = hex_to_rgb(hex_color)
    color_swatch_img = create_color_swatch_image(hex_color)

    # --- Run 1: Text-Only (Contextual) ---
    print("Generating (Run 1/2: Text-Only)...")
    start_time = time.time()
    parts_text = [f"{prompt}. The color is {hex_color}."]
    img_text = run_generation(prompt_parts=parts_text)
    time_text = time.time() - start_time

    if img_text is None:
        print("  Generation failed for Run 1. Skipping this color.")
        return None

    avg_color_text = get_average_rgb(img_text)
    delta_e_text = calculate_delta_e(reference_color_rgb, avg_color_text)

    print("  Pausing 10s...")
    time.sleep(10)

    # --- Run 2: Image-Only (Contextual) ---
    print("Generating (Run 2/2: Image-Only)...")
    start_time = time.time()
    parts_image = [f"{prompt}. The color should match the attached swatch.", color_swatch_img]
    img_image = run_generation(prompt_parts=parts_image)
    time_image = time.time() - start_time

    if img_image is None:
        print("  Generation failed for Run 2. Skipping this color.")
        return None

    avg_color_image = get_average_rgb(img_image)
    delta_e_image = calculate_delta_e(reference_color_rgb, avg_color_image)

    print("Evaluation Complete. Generating titled display...")

    display_img_target = add_title_to_image(color_swatch_img.resize((256, 256)), f"Target: {hex_color}")
    display_img_run1 = add_title_to_image(img_text.resize((256, 256)), "Run 1: Text-only")
    display_img_run2 = add_title_to_image(img_image.resize((256, 256)), "Run 2: Image reference")

    avg_swatch_text = create_color_swatch_image(f"#{'%02x%02x%02x' % avg_color_text}")
    avg_swatch_image = create_color_swatch_image(f"#{'%02x%02x%02x' % avg_color_image}")

    display_side_by_side(
        display_img_target,
        display_img_run1,
        display_img_run2,
        titles=[]
    )

    display_side_by_side(
        create_color_swatch_image(hex_color, (256, 50)),
        avg_swatch_text.resize((256, 50)),
        avg_swatch_image.resize((256, 50)),
        titles=[
            f"Reference Color",
            f"Avg. Color (Run 1)",
            f"Avg. Color (Run 2)"
        ]
    )

    print(f"Delta E (Run 1: Text-Only):   {delta_e_text:.2f}")
    print(f"Delta E (Run 2: Image-Only):  {delta_e_image:.2f}")

    scores = {
        "Text-Only": delta_e_text,
        "Image-Only": delta_e_image
    }
    best_method = min(scores, key=scores.get)
    print(f"✅ Best Method: {best_method} (ΔE: {scores[best_method]:.2f})")

    return {
        "hex_color": hex_color,
        "delta_e_text": delta_e_text,
        "delta_e_image": delta_e_image,
        "best_method": best_method,
        "best_score": scores[best_method],
        "img_text": img_text,
        "img_image": img_image,
        "time_text": time_text,
        "time_image": time_image
    }

**Generate Color List**

Here we define our base prompt and generate our test data: a list of 100 diverse colors, providing a large buffer to ensure we get 50 successful test runs.

In [None]:
import colorsys
import numpy as np
from IPython.display import display

def generate_diverse_colors(n=150): # <-- Increased buffer to 100
    colors_hex = []
    saturation = 0.9
    value = 0.85

    for hue in np.linspace(0, 1, n, endpoint=False):
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        r_255, g_255, b_255 = int(r * 255), int(g * 255), int(b * 255)
        hex_code = f"#{r_255:02x}{g_255:02x}{b_255:02x}"
        colors_hex.append(hex_code)
    return colors_hex

color_list = generate_diverse_colors(100) # <-- Generates 100 colors
print(f"Generated {len(color_list)} diverse colors (to provide a buffer for any failures).")

# Visualize the color palette
print("Test Palette:")
swatches = [create_color_swatch_image(c, size=(10, 50)) for c in color_list]
full_swatch = Image.new('RGB', (len(swatches) * 10, 50))
x_offset = 0
for swatch in swatches:
    full_swatch.paste(swatch, (x_offset, 0))
    x_offset += 10
display(full_swatch)

**Run the 50-Example Loop**

This is the main execution cell. It runs the loop until it collects 50 successful test results. It skips any failures and shows a progress bar. This may take several minutes to complete.

In [None]:
from tqdm.notebook import tqdm # Use notebook-friendly progress bar

all_results = []

prompt = "A full-frame, solid-color background. The entire image should be a single, flat, unified color. No texture, no lighting, no gradients."


print(f"Starting evaluation... will run until 50 successful examples are collected.")
print(f"Using base prompt: '{prompt}'")

color_iterator = iter(color_list)
pbar = tqdm(total=50, desc="Gathering successful examples")

while len(all_results) < 50:
    try:
        hex_color = next(color_iterator)
    except StopIteration:
        print("!!! Ran out of colors before reaching 100 successes.") # Note: This should match your 'total' in pbar
        print(f"Proceeding with {len(all_results)} successful examples.")
        break

    # Correctly pass BOTH arguments
    results = evaluate_combination(prompt, hex_color)

    if results is not None:
        all_results.append(results)
        pbar.update(1)
    else:
        print(f"  Skipping failed color {hex_color}, trying next...")

pbar.close()
print(f"Evaluation complete. Collected {len(all_results)} successful results.")

**Final Summary Table**

Here we compile all the successful runs into a clean pandas DataFrame, calculate the overall statistics, and display the final report-ready table.

In [None]:
df_data = []
for r in all_results:
    data = r.copy()
    # Remove all image objects to make a clean dataframe
    del data["img_text"]
    del data["img_image"]
    df_data.append(data)

df = pd.DataFrame(df_data)

# --- New High-Level Analysis ---
print(f"--- Overall Statistics ({len(df_data)} Successful Runs) ---")
avg_de_text = df['delta_e_text'].mean()
avg_de_image = df['delta_e_image'].mean()

# Count wins for each method
wins_text = (df['best_method'] == 'Text-Only').sum()
wins_image = (df['best_method'] == 'Image-Only').sum()

avg_scores = {
    "Text-Only": avg_de_text,
    "Image-Only": avg_de_image
}
overall_best_method = min(avg_scores, key=avg_scores.get)

print(f"Average Delta E (Text-Only):    {avg_de_text:.2f}")
print(f"Average Delta E (Image-Only):   {avg_de_image:.2f}")
print("---")
print(f"Total Wins (Text-Only):     {wins_text} / {len(df_data)}")
print(f"Total Wins (Image-Only):    {wins_image} / {len(df_data)}")
print("---")
print(f"✅ Overall Best Method: {overall_best_method} (Avg. ΔE: {avg_scores[overall_best_method]:.2f})")


# --- Full Data Table ---
print(f"\n--- Full Data Table ({len(df_data)} Runs) ---")

# Select only the columns we need for the report
df_to_style = df.set_index('hex_color')[[
    'delta_e_text',
    'delta_e_image',
    'best_method',
    'best_score'
]]

df_renamed = df_to_style.rename(columns={
    "delta_e_text": "ΔE (Text-Only)",
    "delta_e_image": "ΔE (Image-Only)",
    "best_method": "Best Method",
    "best_score": "Best Score (ΔE)"
})

df_styled = df_renamed.style.background_gradient(
    cmap='viridis_r', subset=['Best Score (ΔE)']  # Green for low scores
).format({
    'ΔE (Text-Only)': '{:.2f}',
    'ΔE (Image-Only)': '{:.2f}',
    'Best Score (ΔE)': '{:.2f}',
}).set_caption("2-Way Instruction Method Comparison (Lower ΔE is Better)")

display(df_styled)

**Color Difference Distribution**

In [None]:
print(f"Generating distribution plot for all {len(df)} samples...")

plt.style.use('default')

# Sort the DataFrame to make the chart easier to read.
df_sorted = df.sort_values(by='delta_e_text', ascending=False)

# Get the data from the final DataFrame
labels = df_sorted['hex_color']
de_text_values = df_sorted['delta_e_text']
de_image_values = df_sorted['delta_e_image']

x = np.arange(len(labels))  # the label locations
width = 0.4  # the width of the bars

fig, ax = plt.subplots(figsize=(20, 12))

rects1 = ax.bar(x - width/2, de_text_values, width,
                label='ΔE (Text-Only Color Instruction)',
                color='#808080',  # Medium Gray
                hatch='//')       # Add stripes

rects2 = ax.bar(x + width/2, de_image_values, width,
                label='ΔE (Color Image Reference)',
                color='#000000')

# --- Add Labels, Title, and Legend ---
ax.set_ylabel('Delta E 2000 Score (Lower is Better)', fontsize=16)
ax.set_title(f'Delta E Distribution: Text-Only Color Instruction vs. Color Image Reference (n={len(df_sorted)})', fontsize=20)
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=90, fontsize=11) #

# Set legend and grid
ax.legend(fontsize=16)
ax.grid(axis='y', linestyle='--', alpha=0.7)

fig.tight_layout()
plt.show()

## Final Conclusion & Interpretation

The data from this 50-sample test provides two key insights:

**Image reference based generation is near-perfect:** The color image reference method resulted in an average Delta E score that is close to or below 1.0. This means the model can reproduce a visual color reference with a level of precision that is imperceptible to the human eye. The plots show that this is consistent across the entire color spectrum.

**Final takeaway:** For creative, fast generation, a text prompt is fine. For any professional or brand-accurate work that requires precise color replication, you must use a visual image reference to unlock the model's powerful and precise matching capabilities.