# Zero-Training Visual Defect Detection Using Amazon Nova Pro

In manufacturing, visual quality inspection is critical for ensuring product reliability
and compliance. However, traditional approaches, whether manual review or custom-trained
computer vision models, are costly, slow to adapt, and difficult to scale across diverse
product lines.

This Jupyter notebook introduces a zero-training visual inspection system that needs no dataset.
It uses **Amazon Nova Pro**, a multimodal foundation model accessed via **Amazon Bedrock**.
From the notebook alone, you can detect manufacturing defects in product images with structured
natural language prompts; no computer vision expertise or labeled data is required.

By the end of this notebook, you'll be able to:

* Upload and analyze product images in a Jupyter notebook
* Detect visual defects using Amazon Nova Pro
* Automatically return bounding boxes, failure reasons, and QC status
* Visualize defect overlays on product images

For more background, consult the README in the same repository.

## Pipeline Architecture in this Notebook

This inspection pipeline operates entirely within a local Jupyter notebook and AWS serverless infrastructure:

1. Image Capture: Use widgets in the notebook to upload a product image (and optionally a reference image).
2. Image Preprocessing: Images are resized, converted to Base64, and prepared for inference.
3. Generative AI Inference: Amazon Bedrock invokes Nova Pro to analyze the image and return structured defect data.
4. Visualization: Bounding boxes and defect reasons are drawn using matplotlib.

User → Jupyter Notebook → Amazon Bedrock (Nova Pro) → JSON Defect Output → Matplotlib Overlay

## Step-by-Step Implementation

The notebook requires only an environment with internet access and AWS credentials that allow access to Amazon Bedrock,
so you can run it locally on your laptop or in an **Amazon SageMaker AI** notebook.

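Install and import the required libraries: boto3 to invoke *Amazon Nova Pro* via Amazon Bedrock, Pillow
(the maintained fork of the Python Imaging Library) to load and resize images, and matplotlib to draw the detected areas on the images.
The notebook also imports ipywidgets to display the images.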

In [None]:
!pip install boto3 pillow matplotlib ipywidgets --quiet

In [None]:
import boto3
import json
import base64
import io
from PIL import Image
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display
from textwrap import dedent

Define the model and the endpoint. So far we have only tested the **Amazon Nova** *Lite* and *Pro* variants.
Make sure beforehand that you have access to the model in the specified AWS Region.
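
One way to do a first sanity check is to list the foundation models that the Region offers. The sketch below assumes a Bedrock control-plane client (`boto3.client("bedrock")`, not `bedrock-runtime`) and its `list_foundation_models` call; `model_is_listed` is a hypothetical helper name used only for illustration. A model being listed does not by itself prove that your account has been granted access, so treat the result as a hint rather than a guarantee.

```python
def model_is_listed(bedrock_client, model_id):
    """Return True if model_id appears among the Region's foundation models."""
    response = bedrock_client.list_foundation_models()
    summaries = response.get("modelSummaries", [])
    return any(summary.get("modelId") == model_id for summary in summaries)

# Usage (needs AWS credentials, so it is left commented out here):
# import boto3
# bedrock = boto3.client("bedrock", region_name="us-east-1")
# print(model_is_listed(bedrock, "amazon.nova-pro-v1:0"))
```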

In [None]:
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "amazon.nova-pro-v1:0"

For the first test we will reference two images. The `ref` image is the reference image, and the `qc` image is the one we use to test whether the model detects anomalies.

In [None]:
# ==== Replace these with your file paths ====
qc_image_path = "./sample-dataset/metalplate_nok.jpg"
ref_image_path = "./sample-dataset/metalplate_reference.jpg"

Define helper functions that load the images, resize them, encode them as Base64, and display them.
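
To make the preprocessing ideas concrete, here is a small dependency-free sketch of two of them: shrinking an image to fit a maximum size without changing its aspect ratio, and the size overhead that Base64 encoding adds. `fit_within` and `base64_length` are hypothetical names for illustration only. In the notebook itself Pillow's `thumbnail` does the actual resizing, and its rounding may differ from this sketch by a pixel.

```python
import base64
import math

def fit_within(size, max_size=(1024, 1024)):
    """Scale (width, height) down to fit max_size, keeping the aspect ratio.

    Sizes that already fit are returned unchanged (no upscaling).
    """
    width, height = size
    max_w, max_h = max_size
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))

def base64_length(num_bytes):
    """Length of the Base64 text produced for num_bytes of raw data."""
    return 4 * math.ceil(num_bytes / 3)

print(fit_within((4000, 3000)))  # large landscape image is shrunk
print(fit_within((800, 600)))    # small image is left as-is
print(len(base64.b64encode(bytes(300))), base64_length(300))
```

Base64 turns every 3 bytes into 4 characters, so the encoded payload is roughly a third larger than the PNG bytes. That is one reason to resize before encoding.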

In [None]:
# Image, io, base64, widgets and display are already imported above; only clear_output is new
from IPython.display import clear_output

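def get_png_base64_from_file(filepath, max_size=(1024, 1024)):
    """Load an image, shrink it to fit max_size, and return it as PNG.

    Returns a tuple (base64_str, png_bytes, img) where img is the resized PIL image.
    """
    img = Image.open(filepath)
    # thumbnail() resizes in place, keeps the aspect ratio, and never enlarges
    img.thumbnail(max_size, Image.Resampling.LANCZOS)

    # Re-encode as PNG in memory so the request format is always "png"
    with io.BytesIO() as output:
        img.save(output, format="PNG")
        png_bytes = output.getvalue()
        base64_str = base64.b64encode(png_bytes).decode("utf-8")

    return base64_str, png_bytes, img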

# Load and display QC image
def load_qc_image():
    global base64_image, image_data, img
    b64, raw, preview = get_png_base64_from_file(qc_image_path)
    base64_image, image_data, img = b64, raw, preview
    with qc_display:
        clear_output(wait=True)
        w, h = img.size
        img_rez = img.resize((w // 3, h // 3))
        display(img_rez)

# Load and display Reference image
def load_ref_image():
    global base64_reference_image, reference_image_data, ref_img
    b64, raw, preview = get_png_base64_from_file(ref_image_path)
    base64_reference_image, reference_image_data, ref_img = b64, raw, preview
    with ref_display:
        clear_output(wait=True)
        w, h = ref_img.size
        ref_img_rez = ref_img.resize((w // 3, h // 3))
        display(ref_img_rez)

qc_display = widgets.Output()
ref_display = widgets.Output()
display(widgets.VBox([
    widgets.Label("🧾 Reference Image:"), ref_display,
    widgets.Label("📸 QC Image:"), qc_display
]))

# Load both images
load_qc_image()
load_ref_image()

Define the system prompt. As you can see, it establishes the spatial awareness we need so that bounding
boxes can be drawn on the images after the model has run.

We also request a specific JSON structure so that the model output can be processed automatically.
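
Because later steps index into this structure by key, it can help to check the shape of the parsed output before using it. The following is a minimal sketch, assuming the field names from the example JSON in the prompt; `find_schema_problems` is a hypothetical helper and not part of the notebook.

```python
REQUIRED_OBJECT_KEYS = {"name", "color", "qc", "reason", "bounding_box"}
REQUIRED_BOX_KEYS = {"x_min", "y_min", "x_max", "y_max"}

def find_schema_problems(parsed):
    """Return a list of problems; an empty list means the shape is as expected."""
    if not isinstance(parsed, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    if not isinstance(parsed.get("text"), str):
        problems.append("'text' is missing or not a string")
    objects = parsed.get("objects")
    if not isinstance(objects, list):
        return problems + ["'objects' is missing or not a list"]
    for i, obj in enumerate(objects):
        if not isinstance(obj, dict):
            problems.append(f"objects[{i}] is not a JSON object")
            continue
        missing = REQUIRED_OBJECT_KEYS - set(obj)
        if missing:
            problems.append(f"objects[{i}] lacks {sorted(missing)}")
        box = obj.get("bounding_box")
        if isinstance(box, dict):
            missing_box = REQUIRED_BOX_KEYS - set(box)
            if missing_box:
                problems.append(f"objects[{i}].bounding_box lacks {sorted(missing_box)}")
        elif "bounding_box" in obj:
            problems.append(f"objects[{i}].bounding_box is not a JSON object")
    return problems
```

An empty result means the output can be passed on safely; anything else is worth printing before attempting to draw boxes.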

Then there are functions for
* building the request from the prompt and the images
* sending the request to Bedrock
* parsing and cleaning the response (cleaning removes any Markdown fences so the response can be processed automatically)
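
The cleaning step can also be made more tolerant of replies that wrap the JSON in prose or Markdown fences. The sketch below is one possible alternative, not the notebook's method: it scans for the first position where a JSON object can be decoded, using the standard library's `json.JSONDecoder.raw_decode`. The name `extract_json_object` is hypothetical.

```python
import json

def extract_json_object(raw_text):
    """Return the first JSON object embedded in raw_text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = raw_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows
            value, _end = decoder.raw_decode(raw_text, start)
            return value
        except json.JSONDecodeError:
            start = raw_text.find("{", start + 1)
    return None

fence = "`" * 3  # built this way only to keep this example's own formatting intact
reply = f"Here is the result:\n{fence}json\n" + '{"text": "ok", "objects": []}' + f"\n{fence}\nDone."
print(extract_json_object(reply))
```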

In [None]:
# System prompt
default_system_prompt = dedent("""\
    Compare the current image with the reference image 
    and highlight any visual differences. Mark as NOK if differences exist; otherwise, mark as OK. 
    Be very strict with the differences, give me all the differences. Check for any missing or visual differences.
    Scratches and dents are also important. Search for any aesthetic problem, any barcode missing, if the objects have some 
    missing labels, or anything aesthetically different, blurry text or logos. Any material differences, scratches on surface,
    color difference between reference and QC objects.
""")

# Prompt that requests a structured JSON
instruction = dedent("""\
    Provide me the JSON with the written description in 'text' and the list of objects in 'objects'.
    This list must include: name, color, qc, reason, bounding_box of the defect (x_min, y_min, x_max, y_max).
    Clean JSON only — no markdown, no extra characters.
    If you describe a defect, include its bounding box.
    Do not group bounding boxes or include any from the reference image. Be strict on the JSON based on the example,
    keep the same names for the objects because I am parsing those later. Make bounding boxes for all defects.
    Example JSON:
    {
        "text": "The object is a blue and green sponge. The green part has white spots...",
        "objects": [
            {
                "name": "defect_1",
                "color": "green",
                "qc": "NOK",
                "reason": "white spots",
                "bounding_box": {
                    "x_min": 195, "y_min": 475, "x_max": 285, "y_max": 595
                }
            },
            {
                "name": "defect_2",
                "color": "blue",
                "qc": "NOK",
                "reason": "white spot and small hole",
                "bounding_box": {
                    "x_min": 690, "y_min": 420, "x_max": 760, "y_max": 500
                }
            }
        ]
    }
""")

def build_qc_request(image_data, instruction, default_system_prompt, reference_image_data=None):
    system_list = [{"text": default_system_prompt}]
    
    message_list = [
        {
            "role": "user",
            "content": [
                {"image": {"format": "png", "source": {"bytes": base64.b64encode(image_data).decode()}}},
                {"text": instruction}
            ]
        }
    ]

    if reference_image_data:
        message_list.append({
            "role": "user",
            "content": [
                {"text": "This is the reference image. Do not include bounding boxes for this image."},
                {"image": {"format": "png", "source": {"bytes": base64.b64encode(reference_image_data).decode()}}}
            ]
        })

    return {
        "schemaVersion": "messages-v1",
        "messages": message_list,
        "system": system_list,
        "inferenceConfig": {
            "max_new_tokens": 2500,
            "top_p": 0.1,
            "top_k": 20,
            "temperature": 0.1
        }
    }

def invoke_qc_model(native_request, model_id):
    try:
        response = bedrock_runtime.invoke_model(modelId=model_id, body=json.dumps(native_request))
        model_response = json.loads(response["body"].read())
        print("✅ Inference completed successfully.")
        return model_response
    except Exception as e:
        print(f"❌ Model invocation error: {str(e)}")
        return None

def parse_qc_response(model_response):
    if not model_response:
        return None
    try:
        raw_text = model_response["output"]["message"]["content"][0]["text"]
        clean_json = raw_text.strip().replace("```json", "").replace("```", "")
        parsed = json.loads(clean_json)
        print("✅ Parsed QC Output:")
        print(json.dumps(parsed, indent=2))
        return parsed
    except (KeyError, IndexError, TypeError, json.JSONDecodeError) as e:
        print(f"❌ Response parsing failed: {str(e)}")
        return None

native_request = build_qc_request(image_data, instruction, default_system_prompt, reference_image_data)
model_response = invoke_qc_model(native_request, MODEL_ID)
parsed_output = parse_qc_response(model_response)

Now that we have the response, we need to draw the bounding boxes of the detected faults over the images.
Matplotlib helps to do this.
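
The coordinate conversion behind the overlay can be sketched on its own. The example below assumes the model reports boxes on a fixed grid; the default `grid_size` mirrors the 1000 by 800 values used in this notebook's drawing code and may need adjusting for your model. `scale_box` is a hypothetical helper, and the corner ordering and clamping are extra defensive steps that the notebook does not perform.

```python
def scale_box(box, image_size, grid_size=(1000, 800)):
    """Map a box from the model's coordinate grid to pixel coordinates."""
    img_w, img_h = image_size
    grid_w, grid_h = grid_size
    scale_x, scale_y = img_w / grid_w, img_h / grid_h

    def clamp(value, upper):
        # Keep the coordinate inside the image
        return max(0.0, min(float(value), float(upper)))

    # Order the corners in case the model swapped min and max
    x_lo, x_hi = sorted((box["x_min"] * scale_x, box["x_max"] * scale_x))
    y_lo, y_hi = sorted((box["y_min"] * scale_y, box["y_max"] * scale_y))
    return {
        "x_min": clamp(x_lo, img_w), "y_min": clamp(y_lo, img_h),
        "x_max": clamp(x_hi, img_w), "y_max": clamp(y_hi, img_h),
    }

print(scale_box({"x_min": 100, "y_min": 80, "x_max": 500, "y_max": 400}, (2000, 1600)))
```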

Afterwards we show the reference image, the resulting image with bounding boxes drawn on it if defects were detected,
and the result description given by the model.
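
The report code in this notebook takes the headline QC status from the first listed object. If you would rather derive it from all objects, a minimal sketch could look like this; `overall_qc_status` is a hypothetical helper, and it assumes the `qc` field holds either "OK" or "NOK" as requested in the prompt.

```python
def overall_qc_status(parsed):
    """Return "NOK" if any reported object is not "OK", otherwise "OK"."""
    objects = (parsed or {}).get("objects") or []
    return "NOK" if any(obj.get("qc") != "OK" for obj in objects) else "OK"

print(overall_qc_status({"text": "clean part", "objects": []}))
print(overall_qc_status({"text": "scratched", "objects": [{"qc": "OK"}, {"qc": "NOK"}]}))
```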

In [None]:
def draw_bounding_boxes(base64_img, objects):
    img = Image.open(io.BytesIO(base64.b64decode(base64_img)))
    img_width, img_height = img.size

    model_width = 1000
    model_height = 800

    scale_x = img_width / model_width
    scale_y = img_height / model_height

    fig, ax = plt.subplots(figsize=(10, 10))
    ax.imshow(img)

    for obj in objects:
        box = obj["bounding_box"]
        x_min = box["x_min"] * scale_x
        y_min = box["y_min"] * scale_y
        x_max = box["x_max"] * scale_x
        y_max = box["y_max"] * scale_y
        label = f"{obj['name']} ({obj['qc']})"
        color = "green" if obj["qc"] == "OK" else "red"

        rect = plt.Rectangle((x_min, y_min), x_max - x_min, y_max - y_min,
                             linewidth=2, edgecolor=color, facecolor='none')
        ax.add_patch(rect)
        ax.text(x_min, y_min - 10, label, color=color, fontsize=12, weight='bold')

    plt.axis('off')
    plt.show()
    
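def display_qc_report(parsed, base64_image, base64_reference_image=None):
    # Nothing to report if invocation or parsing failed upstream
    if not parsed:
        print("❌ No parsed QC output available.")
        return

    if base64_reference_image: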
        ref_img = Image.open(io.BytesIO(base64.b64decode(base64_reference_image)))
        w, h = ref_img.size
        resized_ref = ref_img.resize((w // 2, h // 2))
        print("🧾 Reference Image:")
        display(resized_ref)

    if parsed.get("objects"):
        print("🧾 QC Image with Defects:")
        draw_bounding_boxes(base64_image, parsed["objects"])

        report_text = f"""QC: {parsed['objects'][0]['qc']}

Description:
{parsed['text']}

Defects:"""

        for obj in parsed["objects"]:
            box = obj["bounding_box"]
            report_text += f"""
- {obj['name']} ({obj['color']}): {obj['qc']} — reason: {obj['reason']}
  Bounding Box: x_min={box['x_min']}, y_min={box['y_min']}, x_max={box['x_max']}, y_max={box['y_max']}
"""
    else:
        qc_img = Image.open(io.BytesIO(base64.b64decode(base64_image)))
        w, h = qc_img.size
        resized_qc = qc_img.resize((w // 2, h // 2))
        print("🧾 QC Image:")
        display(resized_qc)

        report_text = f"""QC: OK

Description:
{parsed['text']}

No defects were detected.
"""

    print("\n🧾 Quality Control Report:")
    print(report_text)
    
display_qc_report(parsed_output, base64_image, base64_reference_image)

## 2. Example

Let's now try some different images. To process these and any further images, let's also define a function that calls the whole pipeline.
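
If you want to process many image pairs in one go, a small loop around such a pipeline function is enough. The sketch below is an assumption-laden illustration: `run_batch` is a hypothetical helper, and `runner` stands for any callable that accepts a QC image path and a reference image path.

```python
def run_batch(image_pairs, runner):
    """Call runner(qc_path, ref_path) for each pair and record what happened."""
    results = []
    for qc_path, ref_path in image_pairs:
        try:
            runner(qc_path, ref_path)
            results.append((qc_path, "done"))
        except Exception as exc:  # keep going so one bad image does not stop the batch
            results.append((qc_path, f"failed: {exc}"))
    return results

# Usage, once a pipeline function with this signature exists:
# pairs = [("./sample-dataset/metalplate_nok2.jpg", "./sample-dataset/metalplate_reference.jpg")]
# print(run_batch(pairs, run_qc_full_pipeline))
```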

In [None]:
qc_image_path = "./sample-dataset/metalplate_nok2.jpg"
ref_image_path = "./sample-dataset/metalplate_reference.jpg"

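def run_qc_full_pipeline(qc_path, ref_path):
    # The loader functions read the module-level paths, so sync them with the arguments
    global qc_image_path, ref_image_path
    qc_image_path, ref_image_path = qc_path, ref_path
    load_qc_image()
    load_ref_image()
    native_request = build_qc_request(image_data, instruction, default_system_prompt, reference_image_data)
    model_response = invoke_qc_model(native_request, MODEL_ID)
    parsed_output = parse_qc_response(model_response)
    # Skip the report if invocation or parsing failed
    if parsed_output is not None:
        display_qc_report(parsed_output, base64_image, base64_reference_image)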

run_qc_full_pipeline(qc_image_path, ref_image_path)

## 3. Example

In [None]:
qc_image_path = "./sample-dataset/metalpiece_nok.jpg"
ref_image_path = "./sample-dataset/metalpiece_reference.jpg"
run_qc_full_pipeline(qc_image_path, ref_image_path)

## 4. Example

In [None]:
qc_image_path = "./sample-dataset/plasticpart_nok.jpg"
ref_image_path = "./sample-dataset/plasticpart_reference.jpg"
run_qc_full_pipeline(qc_image_path, ref_image_path)

## 5. Example

In [None]:
qc_image_path = "./sample-dataset/cubesponge2_nok.jpg"
ref_image_path = "./sample-dataset/cubesponge2_reference.jpg"
run_qc_full_pipeline(qc_image_path, ref_image_path)

## 6. Example

In [None]:
qc_image_path = "./sample-dataset/cubesponge2_nok2.jpg"
ref_image_path = "./sample-dataset/cubesponge2_reference.jpg"
run_qc_full_pipeline(qc_image_path, ref_image_path)

## 7. Example

In [None]:
qc_image_path = "./sample-dataset/cubesponge_nok.jpg"
ref_image_path = "./sample-dataset/cubesponge_reference.jpg"
run_qc_full_pipeline(qc_image_path, ref_image_path)