# Zero-Training Visual Defect Detection Using Amazon Nova Pro

In manufacturing, visual quality inspection is critical for ensuring product reliability<br/>
and compliance. However, traditional approaches—whether manual review or custom-trained<br/>
computer vision models—are costly, slow to adapt, and difficult to scale across diverse<br/>
product lines.

This Jupyter notebook introduces a zero-training, no-dataset-required visual inspection<br/>
system using **Amazon Nova Pro**, a multimodal foundation model accessed via **Amazon Bedrock**.<br/>
Using only a **Jupyter notebook**, you can detect manufacturing defects in product images with structured<br/>
natural language prompts, with no computer vision expertise or labeled data required.

By the end of this notebook, you'll be able to:

* Upload and analyze product images in a Jupyter notebook
* Detect visual defects using Amazon Nova Pro
* Automatically return bounding boxes, failure reasons, and QC status
* Visualize defect overlays on product images

For more background, consult the README in the same repository.

## Pipeline Architecture in this Notebook

This inspection pipeline operates entirely within a local Jupyter notebook and AWS serverless infrastructure:

1. Image Capture: Use widgets in the notebook to upload a product image (and optionally a reference image).
2. Image Preprocessing: Images are resized, converted to Base64, and prepared for inference.
3. AI Inference: Amazon Bedrock invokes Nova Pro to analyze the image and return structured defect data.
4. Visualization: Bounding boxes and defect reasons are drawn using matplotlib.

User → Jupyter Notebook → Amazon Bedrock (Nova Pro) → JSON Defect Output → Matplotlib Overlay
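
As a rough preview of steps 2 and 3, the flow above can be sketched as one function. This is a minimal sketch, not the notebook's actual code: `run_inspection` and its `invoke` parameter are hypothetical names, and `invoke` is assumed to be any callable that takes the JSON request body as a string and returns the decoded model response as a dict. The request layout mirrors the one used later in this notebook.

```python
import base64
import json


def run_inspection(image_bytes, instruction, system_prompt, invoke):
    """Encode the image, build the request, call the model, parse its JSON reply."""
    # Step 2: Base64-encode the (already resized) PNG bytes.
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")

    # Step 3: assemble the request body in the shape used later in this notebook.
    request = {
        "schemaVersion": "messages-v1",
        "system": [{"text": system_prompt}],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"image": {"format": "png", "source": {"bytes": image_b64}}},
                    {"text": instruction},
                ],
            }
        ],
    }

    # `invoke` is a hypothetical stand-in for the Bedrock call.
    response = invoke(json.dumps(request))
    raw_text = response["output"]["message"]["content"][0]["text"]

    # The structured defect data is expected to be plain JSON text.
    return json.loads(raw_text)
```

In the notebook itself, `invoke` could wrap the real client, for example `lambda body: json.loads(bedrock_runtime.invoke_model(modelId=MODEL_ID, body=body)["body"].read())`. Passing the call in as a parameter keeps the sketch testable without AWS credentials.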

## Step-by-Step Implementation

The notebook requires only an environment with internet access and AWS credentials that allow access to Amazon Bedrock.<br/>
You can therefore run it locally on your laptop or in an **Amazon SageMaker AI** notebook.

Install and import the required libraries: boto3 to invoke *Amazon Nova Pro* via Amazon Bedrock, Pillow<br/>
(the Python imaging library) to load and resize images, ipywidgets for the upload widgets, and matplotlib to draw the detected areas on the images.

In [1]:
%pip install boto3 pillow matplotlib ipywidgets --quiet


In [11]:
import boto3
import json
import base64
import io
from PIL import Image
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display
from textwrap import dedent

Define the Bedrock runtime client and the model ID (so far we have only tested the **Amazon Nova** *Lite* and *Pro* variants).<br/>
Make sure beforehand that you have access to the model in the specified AWS Region.

In [12]:
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "amazon.nova-pro-v1:0"
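
One way to check the Region before running the rest of the notebook is to look the model up in the Bedrock model catalog. The helper below is a minimal sketch: `model_is_listed` is a hypothetical name, and it assumes a boto3 `bedrock` (control plane) client, whose `list_foundation_models` call returns a `modelSummaries` list. A listed model is not necessarily one your account has been granted access to, so treat this as a first sanity check only.

```python
def model_is_listed(bedrock_client, model_id):
    """Return True if model_id appears in the Region's foundation-model catalog."""
    response = bedrock_client.list_foundation_models()
    listed_ids = {summary["modelId"] for summary in response.get("modelSummaries", [])}
    return model_id in listed_ids


# Usage with real credentials (not executed here):
# import boto3
# control_plane = boto3.client("bedrock", region_name="us-east-1")
# print(model_is_listed(control_plane, "amazon.nova-pro-v1:0"))
```

Note that this uses the `bedrock` client rather than the `bedrock-runtime` client defined above, because the catalog call lives on the control-plane API.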

Define the upload widgets and the helper functions that read the uploaded images, resize them, and encode them as Base64.

In [39]:
from PIL import Image
import io
import base64
import ipywidgets as widgets
from IPython.display import display, clear_output

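# Placeholders so later cells can test for a missing upload without a NameError
base64_image = image_data = img = None
base64_reference_image = reference_image_data = ref_img = None

# Upload widgets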
qc_uploader = widgets.FileUpload(accept='image/*', multiple=False)
ref_uploader = widgets.FileUpload(accept='image/*', multiple=False)

# Output areas for images
qc_display = widgets.Output()
ref_display = widgets.Output()

# Show the widgets
display(widgets.VBox([
    widgets.Label("📸 Upload QC Image:"), qc_uploader, qc_display,
    widgets.Label("🧾 Upload Reference Image (optional):"), ref_uploader, ref_display
]))

# Convert to PNG, cap the size sent to the model, and build a smaller preview
def get_png_base64_from_bytes(original_bytes, max_size=(1024, 1024), scale=0.3):
    img = Image.open(io.BytesIO(original_bytes))
    # Shrink in place so neither side exceeds max_size (aspect ratio is preserved)
    img.thumbnail(max_size, Image.Resampling.LANCZOS)

    # Preview for display in the notebook only: `scale` times the thumbnail size
    width, height = img.size
    resized_img = img.resize((int(width * scale), int(height * scale)))

    # The full thumbnail (not the preview) is what gets encoded for the model
    with io.BytesIO() as output:
        img.save(output, format="PNG")
        png_bytes = output.getvalue()
        base64_str = base64.b64encode(png_bytes).decode("utf-8")

    return base64_str, png_bytes, resized_img

# Callback: QC Image Upload
def on_qc_upload(change):
    if qc_uploader.value:
        file = qc_uploader.value[0]
        b64, raw, preview = get_png_base64_from_bytes(file['content'])
        global base64_image, image_data, img
        base64_image, image_data, img = b64, raw, preview
        with qc_display:
            clear_output(wait=True)
            display(img)

# Callback: Reference Image Upload
def on_ref_upload(change):
    if ref_uploader.value:
        file = ref_uploader.value[0]
        b64, raw, preview = get_png_base64_from_bytes(file['content'])
        global base64_reference_image, reference_image_data, ref_img
        base64_reference_image, reference_image_data, ref_img = b64, raw, preview
        with ref_display:
            clear_output(wait=True)
            display(ref_img)

# Attach callbacks
qc_uploader.observe(on_qc_upload, names='value')
ref_uploader.observe(on_ref_upload, names='value')



Define the system prompt. It gives the model the spatial awareness it needs so that we can draw bounding<br/>boxes on the images after the model has run.

The instruction prompt also requests a specific JSON structure so that the model output can be processed automatically.

We then add the Base64-encoded images to the request and send it to Bedrock.

The response from Bedrock is cleaned in case it contains Markdown fencing, so that it can be parsed as JSON.
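
The cleaning step described above can also be sketched as a standalone helper. This is an alternative under stated assumptions, not the approach used in the cell below: `extract_json` is a hypothetical name, and it assumes the reply contains exactly one top-level JSON object, possibly wrapped in a Markdown fence or surrounded by extra text.

```python
import json


def extract_json(raw_text):
    """Parse the JSON object spanning the first '{' to the last '}' in raw_text."""
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw_text[start:end + 1])


# Build a fenced reply without typing a literal fence in this example.
fence = "`" * 3
fenced_reply = fence + 'json\n{"text": "ok", "objects": []}\n' + fence
print(extract_json(fenced_reply))
```

Slicing between the outermost braces also tolerates a sentence of prose before or after the JSON, which plain string replacement of the fence markers does not.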

In [78]:
# System prompt
default_system_prompt = dedent("""\
    You are an object detector. When the user provides you with an image, provide a JSON with the following content:
    1. Describe the color and features of the objects on the image.
    Give an explanation of the object from a quality control perspective, from a manufacturing point of view.
    That object is a cube, can be blue, yellow, green, or red. Give the color and the quality analysis in a simple and quick way.
    The color must be consistent. If you see that it has other colors, or any other object obstructing or on top of it, consider it NOK.
    Check irregular edges too.
    Provide coordinates in pixels for bounding boxes to detect the defects.
    Always measure based on X being the wider part of the image.
    Always use x_min, y_min, x_max, y_max.
    Give me only the coordinates of the defects.
    When multiple defects appear, give me the bounding boxes of all of them.
    Do not group them. Do not create bounding boxes from the reference image.
    When the image is not clear or unusual, consider it NOK.
    Differentiate between color and obstruction. Check any image differences.
    Limit the answer to:
        QC: OK — if it looks fine
        QC: NOK — if it has defects (with a short reason)
    Use the tag "text".
""")

# Prompt that requests a structured JSON
instruction = dedent("""\
    Provide me the JSON with the written description in 'text' and the list of objects in 'objects'.
    This list must include: name, color, qc, reason, bounding_box of the defect (x_min, y_min, x_max, y_max).
    Clean JSON only — no markdown, no extra characters.
    If you describe a defect, include its bounding box.
    Do not group bounding boxes or include any from the reference image.
    Example JSON:
    {
        "text": "The object is a blue and green sponge. The green part has white spots...",
        "objects": [
            {
                "name": "defect",
                "color": "green",
                "qc": "NOK",
                "reason": "white spots",
                "bounding_box": {
                    "x_min": 195, "y_min": 475, "x_max": 285, "y_max": 595
                }
            },
            {
                "name": "defect",
                "color": "blue",
                "qc": "NOK",
                "reason": "white spot and small hole",
                "bounding_box": {
                    "x_min": 690, "y_min": 420, "x_max": 760, "y_max": 500
                }
            }
        ]
    }
""")

# Construct the request
system_list = [{"text": default_system_prompt}]
message_list = [
    {
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": base64.b64encode(image_data).decode()}}},
            {"text": instruction}
        ]
    }
]

if reference_image_data:
    message_list.append({
        "role": "user",
        "content": [
            {"text": "This is the reference image. Do not include bounding boxes for this image."},
            {"image": {"format": "png", "source": {"bytes": base64.b64encode(reference_image_data).decode()}}}
        ]
    })

native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": {
        "max_new_tokens": 2500,
        "top_p": 0.1,
        "top_k": 20,
        "temperature": 0.1
    }
}

try:
    response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=json.dumps(native_request))
    model_response = json.loads(response["body"].read())
    print("✅ Inference completed successfully.")
except Exception as e:
    print(f"❌ Model invocation error: {str(e)}")
    raise  # stop here: the lines below need model_response

raw_text = model_response["output"]["message"]["content"][0]["text"]

try:
    # Remove Markdown code fences in case the model added them
    clean_json = raw_text.replace("```json", "").replace("```", "").strip()
    parsed = json.loads(clean_json)
    print("✅ Parsed QC Output:")
    print(json.dumps(parsed, indent=2))
except json.JSONDecodeError as e:
    print("❌ JSON parsing failed:", str(e))
    raise  # stop here: the next cell needs parsed

✅ Inference completed successfully.
✅ Parsed QC Output:
{
  "text": "The image shows a box of cigars. Each cigar is wrapped in a black and gold wrapper with a logo. The cigars are neatly arranged in rows. There is a yellow piece of tape on one of the cigars, which is not standard and indicates a potential issue with that particular cigar.",
  "objects": [
    {
      "name": "cigar",
      "color": "brown",
      "qc": "NOK",
      "reason": "yellow tape",
      "bounding_box": {
        "x_min": 290,
        "y_min": 500,
        "x_max": 380,
        "y_max": 680
      }
    }
  ]
}
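
Because the structure is only requested through the prompt, it can be worth checking the parsed result before drawing anything. The following is a minimal sketch of such a check: `validate_objects` is a hypothetical helper, and the required keys are taken from the instruction prompt above.

```python
REQUIRED_KEYS = ("name", "color", "qc", "reason", "bounding_box")
BOX_KEYS = ("x_min", "y_min", "x_max", "y_max")


def validate_objects(parsed):
    """Return a list of problems; an empty list means the structure is usable."""
    objects = parsed.get("objects")
    if not isinstance(objects, list):
        return ["'objects' is missing or not a list"]

    problems = []
    for index, obj in enumerate(objects):
        if not isinstance(obj, dict):
            problems.append(f"object {index}: not a JSON object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in obj]
        if missing:
            problems.append(f"object {index}: missing {missing}")
            continue
        box = obj["bounding_box"]
        if not isinstance(box, dict) or any(key not in box for key in BOX_KEYS):
            problems.append(f"object {index}: incomplete bounding_box")
            continue
        # A box with zero or negative width or height cannot be drawn sensibly.
        if box["x_min"] >= box["x_max"] or box["y_min"] >= box["y_max"]:
            problems.append(f"object {index}: bounding_box has no area")
        if obj["qc"] not in ("OK", "NOK"):
            problems.append(f"object {index}: unexpected qc value {obj['qc']!r}")
    return problems
```

Calling `validate_objects(parsed)` on the output shown above should return an empty list, since every object carries all five keys and a box with positive area.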


Now that we have the response, we draw the bounding boxes of the detected defects over the image.
Matplotlib handles the drawing.

Afterwards we show the reference image, the resulting image with bounding boxes drawn around any detected defects,<br/>
and the result description given by the model.

In [None]:
def draw_bounding_boxes(base64_img, objects):
    img = Image.open(io.BytesIO(base64.b64decode(base64_img)))
    img_width, img_height = img.size

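    # Coordinate grid the model used for its boxes. The prompts above do not request an
    # "image_resolution" field, so this normally falls back to the 1000x1000 default.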
    resolution = objects[0].get("image_resolution", "1000x1000")
    model_width, model_height = map(int, resolution.lower().split("x"))

    scale_x = img_width / model_width
    scale_y = img_height / model_height

    fig, ax = plt.subplots(figsize=(10, 10))
    ax.imshow(img)

    for obj in objects:
        box = obj["bounding_box"]
        x_min = box["x_min"] * scale_x
        y_min = box["y_min"] * scale_y
        x_max = box["x_max"] * scale_x
        y_max = box["y_max"] * scale_y
        label = f"{obj['name']} ({obj['qc']})"
        color = "green" if obj["qc"] == "OK" else "red"

        rect = plt.Rectangle((x_min, y_min), x_max - x_min, y_max - y_min,
                             linewidth=2, edgecolor=color, facecolor='none')
        ax.add_patch(rect)
        ax.text(x_min, y_min - 10, label, color=color, fontsize=12, weight='bold')

    plt.axis('off')
    plt.show()
    
if base64_reference_image:
    ref_img = Image.open(io.BytesIO(base64.b64decode(base64_reference_image)))
    w, h = ref_img.size
    resized = ref_img.resize((w // 2, h // 2))
    print("🧾 Reference Image:")
    display(resized)

print("🧾 Quality Control Report:")
if parsed.get("objects"):
    report_text = f"""QC: {parsed['objects'][0]['qc']}

Description:
{parsed['text']}

Defects:"""

    for obj in parsed["objects"]:
        box = obj["bounding_box"]
        report_text += f"""
- {obj['name']} ({obj['color']}): {obj['qc']} — reason: {obj['reason']}
  Bounding Box: x_min={box['x_min']}, y_min={box['y_min']}, x_max={box['x_max']}, y_max={box['y_max']}
"""
    draw_bounding_boxes(base64_image, parsed["objects"])
else:
    report_text = f"""QC: OK

Description:
{parsed['text']}

No defects were detected.
"""
    if base64_image:
        qc_img = Image.open(io.BytesIO(base64.b64decode(base64_image)))
        w, h = qc_img.size
        resized = qc_img.resize((w // 2, h // 2))
        print("🧾 QC Image:")
        display(resized)

print(report_text)
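
The coordinate scaling inside `draw_bounding_boxes` can be illustrated on its own. The sketch below is a worked example under stated assumptions: `scale_box` is a hypothetical helper, the 1000x1000 grid is the default used in the cell above, and the 500x250 image size is made up for the arithmetic.

```python
def scale_box(box, image_size, model_size=(1000, 1000)):
    """Map a box from the model's coordinate grid to image pixel coordinates."""
    img_width, img_height = image_size
    model_width, model_height = model_size
    scale_x = img_width / model_width
    scale_y = img_height / model_height
    return {
        "x_min": box["x_min"] * scale_x,
        "y_min": box["y_min"] * scale_y,
        "x_max": box["x_max"] * scale_x,
        "y_max": box["y_max"] * scale_y,
    }


# Box from the sample output above, mapped onto a hypothetical 500x250 image:
# scale_x = 500 / 1000 = 0.5 and scale_y = 250 / 1000 = 0.25.
box = {"x_min": 290, "y_min": 500, "x_max": 380, "y_max": 680}
print(scale_box(box, image_size=(500, 250)))
# → {'x_min': 145.0, 'y_min': 125.0, 'x_max': 190.0, 'y_max': 170.0}
```

The x and y axes are scaled independently, so a non-square image stretches the model's square grid by a different factor in each direction.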

With these simple steps, you have seen how the whole pipeline works.