# AWS SageMaker JupyterLab & SiMa.ai's Palette Software

To kickstart our workflow, we need to ensure that our environment is **configured correctly**. Specifically, we'll be verifying the integration of the necessary software libraries with NVIDIA’s CUDA platform, as this will be essential for re-training our model efficiently.

***

In [None]:
!nvidia-smi
import torch
print(torch.__version__)
print(torch.cuda.is_available())

***

With the packages verified and configured, let's proceed by cloning the **YOLOv7** repository, which we’ll use to re-train and export our model.

***

In [None]:
!git clone https://github.com/WongKinYiu/yolov7.git

***

**Use Case: PPE Detection**  
We’re addressing a **Personal Protective Equipment (PPE)** scenario, focusing on detecting people wearing helmets. This means our chosen classes (`Person`, `Human head`, `Helmet`, and `Glasses`) allow us to track whether a person has a helmet on while in a work environment, helping ensure safety compliance.

**Data Splits (75% Train, 25% Val, 0% Test)**  
We’re dedicating \(75\%\) of our dataset to training and \(25\%\) to validation. We have intentionally set the test split to \(0\%\) because our final pipeline validation is performed on a separate video (instead of a typical test dataset). This approach keeps our training process focused on building a robust model while leaving the ultimate performance verification to real-world video input.
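
As a rough illustration of how these fractions turn into sample counts, the sketch below mirrors the arithmetic used later in the notebook: train and val are rounded down and test receives the remainder. The helper name `split_counts` and the dataset size of 1,600 are hypothetical and chosen only for the example.

```python
def split_counts(total, percentages):
    """Return (train, val, test) sample counts for the given fractions.

    Train and val are rounded down; test receives whatever is left,
    so the three counts always add up to `total`.
    """
    train_frac, val_frac, _ = percentages
    n_train = int(train_frac * total)
    n_val = int(val_frac * total)
    n_test = total - n_train - n_val
    return n_train, n_val, n_test


# Hypothetical dataset size, for illustration only
counts = split_counts(1600, [0.75, 0.25, 0.00])
print(counts)  # (1200, 400, 0)
```

With a test fraction of zero, the test split only picks up samples if rounding leaves a remainder.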

**Choosing `model_flavor`: `yolov7` vs. `yolov7-tiny`**  
By default, we set `model_flavor` to `"yolov7-tiny"`, a smaller, faster model that generally requires less training time and fewer computing resources. You can switch to `"yolov7"` for an increase in accuracy and robustness, but the full YOLOv7 model takes **substantially longer to train**, which is important to consider if you’re working under time constraints.

***
One of the powerful features here is the flexibility to change the `classes` variable to any of the **600** available classes. This allows your model to be re-trained and deployed to the edge, automatically tuned to detect only the specified classes, running seamlessly on a real device.

***

In [None]:
# Set dataset settings

classes = ["Person", "Human head", "Helmet", "Glasses"]
split_percentages = [0.75, 0.25, 0.00]  # Train, Val, Test splits
model_flavor = "yolov7-tiny"
s3_bucket = "<PASTE YOUR S3 NAME>"
print(f"S3 Bucket: {s3_bucket}")
if s3_bucket == "<PASTE YOUR S3 NAME>":
    raise Exception("Please update `s3_bucket` with your s3 bucket name")

***
## Dataset Creation
To begin fine-tuning our model, we'll need a dataset. Here, we'll use the open-source dataset **Open Images V7**. With FiftyOne's seamless integration, we can easily select any class from this dataset to tailor our training process.

The next cell lists all 600 classes you can choose from:
***

In [None]:
from pathlib import Path
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F
import fiftyone.utils.openimages
import shutil
import os

fo.utils.openimages.get_classes("v7")

***
The next cells prepare our dataset for model fine-tuning. They load up to a fixed number of samples per class from the Open Images V7 dataset, filtered by the chosen `classes`, export the labels in `YOLOv5` format, and organize the images and labels into structured folders (`train`, `val`, and `test`). This setup ensures that each data split is correctly formatted, with images aligned to their corresponding labels. Finally, the image paths of each split are written to a `.txt` file for streamlined access during model training and deployment.
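
To make the target label format concrete, here is a minimal sketch of how one detection maps to one line of a YOLO label file. FiftyOne stores boxes as relative `[top-left-x, top-left-y, width, height]`, while YOLO label files use `class x_center y_center width height`, also relative to the image size. The helper `to_yolo_line` and the example box are hypothetical; the FiftyOne exporter does this conversion for us, and its exact number formatting may differ.

```python
def to_yolo_line(class_index, box):
    """Convert a relative [x, y, width, height] box (top-left origin)
    into a YOLO label line: `class x_center y_center width height`."""
    x, y, w, h = box
    x_center = x + w / 2
    y_center = y + h / 2
    return f"{class_index} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}"


# Class order decides the integer written at the start of each line
example_classes = ["Person", "Human head", "Helmet", "Glasses"]
line = to_yolo_line(example_classes.index("Helmet"), [0.25, 0.5, 0.5, 0.25])
print(line)  # 2 0.500000 0.625000 0.500000 0.250000
```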

Before we load the subsets for each class, there are two important points to highlight regarding *dataset balancing*:

1. We **exclude the `Person` class** from our `target_classes`. Since “Helmet” and “Human head” labels naturally involve people, we don’t need to explicitly include “Person” again. Doing so would lead to “Person” being over-represented in the dataset, skewing the class distribution and potentially biasing our model. By removing “Person,” we ensure that images already containing helmets or heads (which imply people) are sufficient to capture person-related data.

2. We **limit the number of samples per class** (`samples_per_class`). By setting `max_samples=samples_per_class` for each target class, we make sure each class is represented evenly in the final dataset. This step prevents one class from dominating the training set and helps maintain a balanced distribution—an essential aspect for models that must accurately detect multiple object categories.

Below is the code snippet demonstrating these two actions:

***

In [None]:
# Parameters
BATCH_SIZE = 16  # Define your batch size here
target_classes = classes.copy()
target_classes.remove("Person")
samples_per_class = 8000  # Load at most 8,000 samples per class (passed as max_samples)

# Ensure dataset size is divisible by BATCH_SIZE
# total_images = (total_images // BATCH_SIZE) * BATCH_SIZE

# Define paths
root = Path("new_data")
r_lab = root / "labels"
r_img = root / "images"

dataset_name = "open-images-v7"
open_images_split = "train"

tr, va, te = 'train', 'val', 'test'
fold_names = [tr, va, te]

# Create directories for labels and images
for path in [r_lab, r_img]:
    path.mkdir(parents=True, exist_ok=True)
    for split in fold_names:
        (path / split).mkdir(parents=True, exist_ok=True)

# Load subsets for each class
datasets = []
for cls in target_classes:
    print(f"Loading {samples_per_class} images for class: {cls}")
    dataset = foz.load_zoo_dataset(
        dataset_name,
        split=open_images_split,
        label_types=["detections"],
        classes=[cls],
        max_samples=samples_per_class,
        dataset_name=f"open-images-v7-{cls}-{samples_per_class}",
    )
    datasets.append(dataset)

In this stage, we **merge** all the previously loaded subsets (`datasets`) into a single `full_dataset`. We also enforce consistency in two ways: First, we remove existing duplicates by deleting a prior dataset named `"full_dataset"`—if any—and re-creating it. Then we take care to ensure the dataset’s total size remains divisible by our specified `BATCH_SIZE`. This maintains cleaner batch boundaries in downstream training steps.  
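
The batch-size trimming is plain integer arithmetic. A small sketch, with a hypothetical helper name and a made-up sample count:

```python
def trim_to_batch_multiple(num_samples, batch_size):
    """Largest multiple of `batch_size` that does not exceed `num_samples`."""
    return (num_samples // batch_size) * batch_size


# Hypothetical count: 1,003 merged samples with a batch size of 16
print(trim_to_batch_multiple(1003, 16))  # 992
```

In this example the 11 leftover samples are simply not used.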

Next, we perform a series of **label manipulations and dataset exports**. We relabel “Man” and “Woman” to “Person,” and convert “Goggles”/“Sunglasses” into “Glasses.” We also add a mechanism to identify if a `Helmet`-bounding box is fully inside a `Person` bounding box that **does not** already contain a `Human head`—in which case we treat that Helmet detection as a new `Human head` (capturing a head + helmet situation). After finalizing these bounding-box labels, we remove duplicates and split our dataset into train, val, and test sets according to `split_percentages`. Finally, we export each set to the YOLOv5 format, copy its images, and write file paths into `.txt` split files. This results in a well-structured, deduplicated dataset—ready for model training and evaluation.
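
The helmet-to-head rule is easier to follow on toy data. The sketch below is a simplified, FiftyOne-free version that works on plain `(label, box)` tuples; the function `infer_heads_from_helmets` and the sample boxes are hypothetical. Unlike the notebook cell, it stops after the first matching person, so it never produces the duplicates that the notebook removes afterwards.

```python
def is_within_box(inner, outer):
    """True if `inner` ([x, y, w, h]) lies fully inside `outer`."""
    ix, iy, iw, ih = inner
    ox, oy, ow, oh = outer
    return ix >= ox and iy >= oy and ix + iw <= ox + ow and iy + ih <= oy + oh


def infer_heads_from_helmets(detections):
    """Return new ("Human head", box) tuples for helmets worn by a person
    whose bounding box does not already contain a labeled head."""
    persons = [box for label, box in detections if label == "Person"]
    heads = [box for label, box in detections if label == "Human head"]
    new_heads = []
    for label, box in detections:
        if label != "Helmet":
            continue
        for person in persons:
            has_head = any(is_within_box(head, person) for head in heads)
            if is_within_box(box, person) and not has_head:
                new_heads.append(("Human head", list(box)))
                break  # one inferred head per helmet is enough
    return new_heads


sample = [
    ("Person", [0.1, 0.1, 0.4, 0.8]),          # person without a head label
    ("Helmet", [0.2, 0.1, 0.2, 0.15]),         # helmet inside that person
    ("Person", [0.6, 0.1, 0.3, 0.8]),          # person with a head label
    ("Human head", [0.65, 0.12, 0.1, 0.1]),
    ("Helmet", [0.65, 0.1, 0.12, 0.1]),
]
result = infer_heads_from_helmets(sample)
print(result)  # [('Human head', [0.2, 0.1, 0.2, 0.15])]
```

Only the first helmet yields a new head, because the second person already has a `Human head` box.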

In [None]:
import warnings
warnings.filterwarnings("ignore")
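# Combine all subsets into one dataset, deleting any stale copy from a previous run
if fo.dataset_exists("full_dataset"):
    fo.delete_dataset("full_dataset")
full_dataset = fo.Dataset("full_dataset")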
for dataset in datasets:
    full_dataset.merge_samples(dataset)

# Ensure dataset size is divisible by BATCH_SIZE
final_size = (full_dataset.count() // BATCH_SIZE) * BATCH_SIZE
full_dataset = full_dataset.take(final_size)


# Helper function to check if one bounding box is within another
def is_within_box(inner_box, outer_box):
    """
    Check if a bounding box (inner_box) is fully within another bounding box (outer_box).

    Args:
        inner_box (list): [x, y, width, height] of the inner box.
        outer_box (list): [x, y, width, height] of the outer box.

    Returns:
        bool: True if the inner box is within the outer box, False otherwise.
    """
    ix, iy, iw, ih = inner_box
    ox, oy, ow, oh = outer_box

    return ix >= ox and iy >= oy and (ix + iw) <= (ox + ow) and (iy + ih) <= (oy + oh)

# Relabel Goggles to Glasses and process Human head and Person labels
for sample in full_dataset:
    person_boxes = []
    human_head_boxes = []
    new_detections = []

    for detection in sample.ground_truth.detections:
        # Relabel "Man" and "Woman" to "Person"
        if detection.label in ["Man", "Woman"]:
            detection.label = "Person"

        # Relabel "Goggles" and "Sunglasses" to "Glasses"
        if detection.label in ["Goggles", "Sunglasses"]:
            detection.label = "Glasses"

        # Collect "Person" and "Human head" bounding boxes
        if detection.label == "Person":
            person_boxes.append(detection.bounding_box)
        elif detection.label == "Human head":
            human_head_boxes.append(detection.bounding_box)

    # Process "Helmet" labels and check conditions
    for detection in sample.ground_truth.detections:
        if detection.label == "Helmet":
            helmet_box = detection.bounding_box
            for person_box in person_boxes:
                if is_within_box(helmet_box, person_box) and not any(
                    is_within_box(human_head_box, person_box) for human_head_box in human_head_boxes
                ):
                    # Create a new detection for Human head
                    new_detection = detection.copy()  # Copy the Helmet detection
                    new_detection.label = "Human head"  # Change label
                    new_detections.append(new_detection)

    # Add the new detections to the sample
    sample.ground_truth.detections.extend(new_detections)
    sample.save()  # Save changes for each sample


n_train_split = int(split_percentages[0] * final_size)
n_val_split = int(split_percentages[1] * final_size)
n_test_split = final_size - n_train_split - n_val_split

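# Split dataset into disjoint views. `take()` draws a random sample, so chaining
# it after `skip()` could place the same image in more than one split;
# `limit()` slices the already shuffled view in order instead.
splits = {
    tr: full_dataset.limit(n_train_split),
    va: full_dataset.skip(n_train_split).limit(n_val_split),
    te: full_dataset.skip(n_train_split + n_val_split).limit(n_test_split),
}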

# Define a function to copy images to their respective split folders
def copy_images_to_split(split_name, split_view):
    split_img_dir = r_img / split_name
    # Ensure the target directory exists
    split_img_dir.mkdir(parents=True, exist_ok=True)
    
    for sample in split_view:
        img_src_path = Path(sample.filepath)
        img_dst_path = split_img_dir / img_src_path.name
        # Check if the source file exists
        if not img_src_path.exists():
            print(f"Source file does not exist: {img_src_path}")
            continue  # Skip missing files
        shutil.copy(img_src_path, img_dst_path)

# Remove duplicate labels from the dataset
def remove_duplicate_labels(dataset, label_field):
    for sample in dataset:
        if label_field in sample and sample[label_field] is not None:
            detections = sample[label_field]
            # Use a set of (label, bounding box) keys to keep only unique detections
            unique_detections = []
            seen = set()
            for det in detections.detections:  # Access the detections list
                key = (det.label, tuple(det.bounding_box))
                if key not in seen:
                    seen.add(key)
                    unique_detections.append(det)
            # Update the sample with all unique detections
            detections.detections = unique_detections
            sample[label_field] = detections
            sample.save()

# Apply the function to the merged dataset
remove_duplicate_labels(full_dataset, "ground_truth")

# Export and create split files
for split_name, split_view in splits.items():
    split_img_dir = r_img / split_name
    labels_dir = r_lab / split_name
    split_txt = root / f"{split_name}.txt"

    # Export labels to YOLOv5 format
    split_view.export(
        dataset_type=fo.types.YOLOv5Dataset,
        labels_path=str(labels_dir),
        classes=classes,  # Correctly ensure target classes are used
    )

    # Copy images to split directory
    print(f"Copying images for split: {split_name}")
    copy_images_to_split(split_name, split_view)

    # Write image paths to split file
    with open(split_txt, "w") as f:
        for sample in split_view:
            img_relative_path = f"./images/{split_name}/{Path(sample.filepath).name}"
            f.write(f"{img_relative_path}\n")

    # Debugging: Check split sizes
    print(f"Split '{split_name}' contains {split_view.count()} images.")

print("Dataset split and export completed.")

***
# YOLOv7 Model Training

This section configures the YOLOv7 model and downloads its pre-trained weights if necessary:

1. **Configuration File Setup**  
   - We define `config_path` to point to a specific YOLOv7 `.yaml` file, either `yolov7.yaml` or `yolov7-tiny.yaml` depending on the selected `model_flavor`. We also build a URL (`file_url`) for the corresponding pre-trained weights (`model_flavor.pt`) and set a `file_path` location to store them locally.  
   - Next, we list our detection classes (`["Person", "Human head", "Helmet", "Glasses", "Woman"]`) and then create a `custom.yaml` that YOLOv7 will use. This file references our train/val/test `.txt` splits and specifies the number of classes (`nc`) plus the class names (`names`). Storing this in `yolov7/data/custom.yaml` allows YOLOv7 to find our dataset details when we begin training.

2. **Updating the Number of Classes and Managing Model Weights**  
   - We load the original `model_flavor.yaml` file (in `yolov7/cfg/training/`) and modify the line that sets `nc:` (number of classes), ensuring it matches `len(classes)` from our custom dataset. This step guarantees the model architecture is aligned with our label set.  
   - Finally, we check whether the corresponding `.pt` weight file already exists. If it doesn’t, we create the `yolov7` directory (if not present) and download the specified weights via `wget`. Otherwise, we skip the download and log that the file is already available. This process prepares our environment with both the dataset configuration (`custom.yaml`) and the correct pre-trained weights (`.pt`), ready for fine-tuning or training.
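
The `nc:` rewrite can be tried in isolation on a string instead of a file. This is a minimal sketch: the helper `set_num_classes` is hypothetical, and `example_cfg` is a made-up fragment rather than the real contents of a YOLOv7 config file.

```python
def set_num_classes(config_text, num_classes):
    """Return `config_text` with every line that starts with `nc:` rewritten."""
    updated = []
    for line in config_text.splitlines():
        if line.strip().startswith("nc:"):
            updated.append(f"nc: {num_classes}  # number of classes")
        else:
            updated.append(line)
    return "\n".join(updated) + "\n"


# Illustrative config fragment (not copied from the repository)
example_cfg = "# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0\n"
new_cfg = set_num_classes(example_cfg, 5)
print(new_cfg)
```

All other lines pass through unchanged, so the model architecture itself is not touched.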
***

In [None]:
# Path to the config file
config_path = f"yolov7/cfg/training/{model_flavor}.yaml"  # Update this with the path to your config file
file_url = f"https://github.com/WongKinYiu/yolov7/releases/download/v0.1/{model_flavor}.pt"
file_path = f"yolov7/{model_flavor}.pt"

classes = ["Person", "Human head", "Helmet", "Glasses", "Woman"]

custom_yaml = f"{tr}: ../{root}/{tr}.txt \n{va}: ../{root}/{va}.txt \n{te}: ../{root}/{te}.txt \n # number of classes \nnc: {len(classes)} \n # class names \nnames: {classes}"

# Write the dataset description; the `with` block closes the file even on errors
with open("yolov7/data/custom.yaml", "w") as file:
    file.write(custom_yaml)

num_classes = len(classes)

# Read the file, update the 'nc' line, and rewrite the file
with open(config_path, 'r') as file:
    lines = file.readlines()

with open(config_path, 'w') as file:
    for line in lines:
        # Check if the line starts with 'nc:' and update it
        if line.strip().startswith("nc:"):
            file.write(f"nc: {num_classes}  # number of classes\n")
        else:
            file.write(line)

print(f"Num classes updated to {num_classes} in the config file.")


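# Download the pre-trained weights only if they are not already present
if not os.path.exists(file_path):
    # Ensure the yolov7 directory exists
    os.makedirs("yolov7", exist_ok=True)

    # Download the file; os.system returns a non-zero status on failure
    status = os.system(f"cd yolov7 && wget {file_url}")
    if status == 0:
        print("File downloaded successfully.")
    else:
        print(f"Download failed with exit status {status}.")
else:
    print("File already exists. No download necessary.")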

This snippet configures the key runtime parameters for training the YOLOv7 model, adapting to the environment’s available hardware:

1. **Device & Batch Size**  
   - We check if there is a CUDA-compatible GPU (`torch.cuda.is_available()`). If so, we set:
     - `DEVICE = 0` (indicating GPU index 0),
     - `NUM_WORKERS = 1`, a single data-loading worker, and
     - `BATCH_SIZE = 64`, which is a reasonable size when a GPU is present for faster processing.  
   - If no GPU is found, the code falls back to the CPU (`DEVICE = 'cpu'`), keeps `NUM_WORKERS = 1`, and reduces `BATCH_SIZE` to `2` to accommodate the lower computational throughput on CPU-based systems.

2. **Image Size**  
   - `IMG_SIZE = 640` defines the width and height to which the images will be resized. In YOLO-type models, 640×640 is a typical default for balanced performance and accuracy.

3. **Setting Environment Variables**  
   - We store these values (device, batch size, image size, etc.) in environment variables (`os.environ`) such as `YOLOV7_DEVICE`, `YOLOV7_BATCH_SIZE`, etc. This makes them easily accessible for downstream scripts and ensures consistent usage of the same hyperparameters when running training and inference commands.

By configuring these parameters automatically based on CUDA availability, the training pipeline can seamlessly adjust to different hardware setups without manually changing code.
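
The same selection logic can be written as two small functions, which makes it easy to check without a GPU. This is only a sketch: `select_runtime`, `to_env`, and the GPU batch size passed in the example are hypothetical placeholders, not part of the notebook.

```python
import os


def select_runtime(cuda_available, gpu_batch_size, cpu_batch_size=2):
    """Pick device, worker count, and batch size from CUDA availability."""
    if cuda_available:
        return {"DEVICE": 0, "NUM_WORKERS": 1, "BATCH_SIZE": gpu_batch_size}
    return {"DEVICE": "cpu", "NUM_WORKERS": 1, "BATCH_SIZE": cpu_batch_size}


def to_env(settings, prefix="YOLOV7_"):
    """Environment variables must be strings, so convert every value."""
    return {f"{prefix}{key}": str(value) for key, value in settings.items()}


# Simulate a machine without CUDA; gpu_batch_size is a placeholder value
env = to_env(select_runtime(cuda_available=False, gpu_batch_size=32))
os.environ.update(env)
print(env)
# {'YOLOV7_DEVICE': 'cpu', 'YOLOV7_NUM_WORKERS': '1', 'YOLOV7_BATCH_SIZE': '2'}
```

In the notebook, `cuda_available` would come from `torch.cuda.is_available()`.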

In [None]:
if torch.cuda.is_available():
    DEVICE = 0
    NUM_WORKERS = 1
    BATCH_SIZE = 64
else:
    DEVICE = 'cpu'
    NUM_WORKERS = 1
    BATCH_SIZE = 2

IMG_SIZE = 640

os.environ['YOLOV7_DEVICE'] = str(DEVICE)
os.environ['YOLOV7_NUM_WORKERS'] = str(NUM_WORKERS)
os.environ['YOLOV7_BATCH_SIZE'] = str(BATCH_SIZE)
os.environ['YOLOV7_IMG_SIZE'] = str(IMG_SIZE)
os.environ['YOLOV7_FLAVOR'] = model_flavor

***
This command launches a YOLOv7 training run from the **`yolov7`** directory with several key arguments:

1. **Change Directory**  
   - `cd yolov7` ensures that we move into the `yolov7` folder, where the main training script and configurations reside.

2. **W&B Offline**  
   - `python3 -m wandb offline` disables Weights & Biases (W&B) online logging, forcing local/offline logging instead. This is helpful if you don’t want to upload logs or metrics to W&B servers.

3. **Training Script (`train.py`)**  
   - `--workers $YOLOV7_NUM_WORKERS`: Sets the number of data-loading workers using the environment variable `YOLOV7_NUM_WORKERS`.
   - `--device $YOLOV7_DEVICE`: Defines the device for training (either GPU index or `'cpu'`), pulled from `YOLOV7_DEVICE`.
   - `--batch-size $YOLOV7_BATCH_SIZE`: Uses the `YOLOV7_BATCH_SIZE` environment variable, which might differ depending on GPU vs. CPU mode.
   - `--data data/custom.yaml`: Points to the custom dataset configuration file, which tells YOLOv7 where to find images/labels and how many classes to expect.
   - `--img $YOLOV7_IMG_SIZE $YOLOV7_IMG_SIZE`: Sets the image size. YOLOv7's `train.py` reads the two values as the train and test image sizes (both equal to `YOLOV7_IMG_SIZE` here).
   - `--cfg cfg/training/${YOLOV7_FLAVOR}.yaml`: Selects the YOLOv7 configuration file (e.g., `yolov7.yaml` or `yolov7-tiny.yaml`) based on `YOLOV7_FLAVOR`.
   - `--weights ${YOLOV7_FLAVOR}.pt`: Loads pre-trained weights (e.g., `yolov7.pt` or `yolov7-tiny.pt`) before fine-tuning on the new dataset.
   - `--name sima-${YOLOV7_FLAVOR}`: Assigns a custom name to this training run, useful for organizing results in logs.
   - `--hyp data/hyp.scratch.custom.yaml`: Provides custom hyperparameters (learning rate, momentum, etc.) for training.
   - `--epochs 10`: Sets the number of training epochs.

Putting it all together, this command **trains** the YOLOv7 model on our dataset for 10 epochs, with the user-defined batch size, device selection, and custom dataset/hyperparameters, while storing all outputs in the `runs/` directory under a distinctive run name.
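
For readers who prefer to assemble the command in Python rather than in a shell one-liner, the sketch below builds the same argument list from plain values. The function `build_train_command` is hypothetical and the values in the example call are illustrative; only flags that appear in the command above are used.

```python
import shlex


def build_train_command(flavor, device, batch_size, img_size, workers=1, epochs=10):
    """Assemble the train.py invocation as an argument list."""
    return [
        "python3", "train.py",
        "--workers", str(workers),
        "--device", str(device),
        "--batch-size", str(batch_size),
        "--data", "data/custom.yaml",
        "--img", str(img_size), str(img_size),
        "--cfg", f"cfg/training/{flavor}.yaml",
        "--weights", f"{flavor}.pt",
        "--name", f"sima-{flavor}",
        "--hyp", "data/hyp.scratch.custom.yaml",
        "--epochs", str(epochs),
    ]


# Illustrative values; in the notebook they come from the environment variables
command = build_train_command("yolov7-tiny", device=0, batch_size=16, img_size=640)
print(shlex.join(command))
# To launch it from Python: subprocess.run(command, cwd="yolov7", check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, and `cwd="yolov7"` replaces the `cd yolov7` step.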
***

In [None]:
!cd yolov7 && python3 -m wandb offline && python3 train.py --workers $YOLOV7_NUM_WORKERS --device $YOLOV7_DEVICE --batch-size $YOLOV7_BATCH_SIZE --data data/custom.yaml --img $YOLOV7_IMG_SIZE $YOLOV7_IMG_SIZE --cfg cfg/training/${YOLOV7_FLAVOR}.yaml --weights ${YOLOV7_FLAVOR}.pt --name sima-${YOLOV7_FLAVOR} --hyp data/hyp.scratch.custom.yaml --epochs 10

# ML Model ONNX Export

This code Cell retrieves the path of the most recent training run for `YOLOv7`.

1. `path` is defined as `'yolov7/runs/train/'`, where all training run results are stored.
2. `all_runs` is a list comprehension that gathers paths to all directories within `path`, filtering for those that are directories.
3. `latest_yolov7_run` finds the latest directory by selecting the one with the most recent modification time, using `max()` with `os.path.getmtime` as the key function.

This process provides the path to the latest `YOLOv7` training output, which is useful for accessing model results, logs, or performance metrics from the most recent run.
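
An equivalent lookup can be sketched with `pathlib` and tried against a temporary directory, so it runs without any training output. The helper `latest_run` and the run folder names are hypothetical; the explicit error for an empty directory is an addition that the notebook cell does not have.

```python
import os
import tempfile
from pathlib import Path


def latest_run(runs_dir):
    """Return the most recently modified sub-directory of `runs_dir`."""
    run_dirs = [p for p in Path(runs_dir).iterdir() if p.is_dir()]
    if not run_dirs:
        raise FileNotFoundError(f"No training runs found in {runs_dir}")
    return max(run_dirs, key=lambda p: p.stat().st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    # Create two fake run folders with explicit modification times
    for name, mtime in [("sima-yolov7-tiny", 1_000), ("sima-yolov7-tiny2", 2_000)]:
        run = Path(tmp) / name
        run.mkdir()
        os.utime(run, (mtime, mtime))
    newest = latest_run(tmp).name

print(newest)  # sima-yolov7-tiny2
```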


In [None]:
path = 'yolov7/runs/train/'
all_runs = [path + d for d in os.listdir(path) if os.path.isdir(path + d)]
latest_yolov7_run = max(all_runs, key=os.path.getmtime)
latest_yolov7_run

This code Cell sets up additional parameters and environment variables for evaluating `YOLOv7` model predictions.

1. `TOP_K` is set to 100, specifying the maximum number of detections to retain per image. This limits the number of predictions to the top 100 with the highest confidence scores.
2. `IOU_THR` (Intersection over Union threshold) is set to 0.65. It is the overlap threshold used by non-maximum suppression: a predicted box that overlaps a higher-scoring predicted box by more than this IoU is treated as a duplicate and dropped.
3. `CONF_THR` (Confidence Threshold) is set to 0.30, filtering out predictions with confidence scores below this threshold.

Each parameter is stored as an environment variable (`YOLOV7_TOP_K`, `YOLOV7_IOU_THR`, and `YOLOV7_CONF_THR`) to make them accessible in downstream evaluation scripts.

Additionally, `YOLOV7_FILES` is assigned `latest_yolov7_run`, linking it to the path of the most recent training run. This setup enables streamlined access to the model’s outputs and allows for consistency across different evaluation stages.
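
To see how the three values interact, here is a simplified, class-agnostic sketch of confidence filtering followed by greedy non-maximum suppression and a top-k cut. The functions `iou` and `filter_detections` and the toy boxes are hypothetical; the post-processing inside the exported model may differ in detail, for example by treating each class separately. The threshold values follow the code cell below.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf_thr, iou_thr, top_k):
    """dets: list of (score, box). Keep confident, non-duplicate boxes."""
    candidates = sorted(
        (d for d in dets if d[0] >= conf_thr), key=lambda d: d[0], reverse=True
    )
    kept = []
    for score, box in candidates:
        # Keep a box only if it does not overlap a better box too much
        if all(iou(box, kept_box) <= iou_thr for _, kept_box in kept):
            kept.append((score, box))
        if len(kept) == top_k:
            break
    return kept


dets = [
    (0.90, (10, 10, 110, 110)),
    (0.80, (12, 12, 112, 112)),    # near-duplicate of the first box
    (0.60, (200, 200, 260, 260)),
    (0.20, (300, 300, 340, 340)),  # below the confidence threshold
]
kept = filter_detections(dets, conf_thr=0.30, iou_thr=0.65, top_k=100)
print([score for score, _ in kept])  # [0.9, 0.6]
```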

In [None]:
TOP_K = 100
IOU_THR = 0.65
CONF_THR = 0.30

os.environ['YOLOV7_TOP_K'] = str(TOP_K)
os.environ['YOLOV7_IOU_THR'] = str(IOU_THR)
os.environ['YOLOV7_CONF_THR'] = str(CONF_THR)

os.environ['YOLOV7_FILES'] = latest_yolov7_run

***
This code Cell exports the trained `YOLOv7` model for optimized deployment.

1. The command `cd yolov7` navigates to the `yolov7` directory, where the `export.py` script is located.
2. `python3 export.py` initiates the export process, taking several command-line arguments for customization:
   - `--weights` specifies the path to the best model weights from the latest training run (`best.pt`).
   - `--grid` exports the model together with its `Detect()` layer, so the decoding of raw outputs into box coordinates on each grid is part of the exported graph.
   - `--end2end` exports an end-to-end model, appending the non-maximum suppression post-processing to the ONNX graph.
   - `--simplify` simplifies the model’s computation graph, enhancing inference efficiency.
   - `--topk-all`, `--iou-thres`, and `--conf-thres` set the maximum detections per image, Intersection Over Union threshold, and confidence threshold, respectively, based on previously defined environment variables.
   - `--img-size` defines the input image size for the exported model, set to `YOLOV7_IMG_SIZE`.
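   - `--max-wh` is set to `YOLOV7_IMG_SIZE`. Passing an integer here selects the ONNX Runtime variant of the NMS post-processing, which uses the value as a per-class coordinate offset so that boxes of different classes are not suppressed against each other.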

This setup exports an optimized, end-to-end `YOLOv7` model ready for deployment with enhanced performance, tailored thresholds, and grid alignment.
***

In [None]:
!cd yolov7 && python3 export.py --weights "../${YOLOV7_FILES}/weights/best.pt" --grid --end2end --simplify --topk-all $YOLOV7_TOP_K --iou-thres $YOLOV7_IOU_THR --conf-thres $YOLOV7_CONF_THR --img-size $YOLOV7_IMG_SIZE $YOLOV7_IMG_SIZE --max-wh $YOLOV7_IMG_SIZE

***
# Graph surgery

In this cell, we perform a **“graph surgery”** on our YOLOv7 ONNX model. The initial goal is to remove certain layers in the post-processing head (such as `Reshape`, `Transpose`, `Split`, and `Concat`) that may be **unsupported** in downstream frameworks or accelerators. These layers often manipulate detection outputs into the final bounding-box format but aren’t strictly necessary if we can maintain a 4D layout through simpler operations.

**First**, we locate the node prefix using `find_prefix_for_operation()` so we can systematically identify and remove the unwanted layers (`remove_nodes()`). By carefully removing the legacy YOLO layers (e.g., `Reshape` and `Transpose`), we make room for new 1×1 convolutions that achieve the same effect. This step streamlines the graph so that each detection scale (80×80, 40×40, and 20×20) remains a **4D tensor** without tricky reshapes.

**Second**, we create and insert **point-wise convolution** nodes (`insert_pointwise_conv()`) to transform the original detection output channels (`xy`, `wh`, `conf`) in place of the removed layers. These 1×1 convolutions are easier to optimize on certain hardware, and they keep the network consistent. We also update constant tensors (`update_elmtwise_const()`) so the re-labeled graph accurately processes bounding-box predictions for the newly defined structure.

**Finally**, we eliminate any references to **hardcoded dimensions** like 255 in the model. This can be crucial for certain hardware or for converting the model into different inference formats that don’t support fixed shape constraints. In the end, we run shape inference to confirm the model’s integrity (`onnx.shape_inference.infer_shapes(model)`) and save the refined ONNX file. With these modifications, the final graph becomes more portable and hardware-friendly, while still producing the same detection outputs in a more streamlined format.
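
The key idea behind the point-wise convolutions is that a 1×1 convolution whose weight has a single `1` per output channel just copies one input channel, so it can stand in for the reshape and split. The sketch below reproduces only the index arithmetic, in plain Python, to show which input channel each output channel copies. It assumes the layout used in the code that follows (three anchors per scale, each with `x, y, w, h, obj` followed by the class scores); the helper `selection_indices` is hypothetical.

```python
def selection_indices(num_classes, num_anchors=3):
    """Map each output channel of the xy / wh / conf convolutions to the
    input channel it copies (the position of the single 1 in its weight)."""
    chan_per_det = 5 + num_classes  # x, y, w, h, obj + class scores
    groups = {"xy": 2, "wh": 2, "conf": 1 + num_classes}
    mapping, base = {}, 0
    for name, size_group in groups.items():
        mapping[name] = [
            base + (idx // size_group) * chan_per_det + (idx % size_group)
            for idx in range(num_anchors * size_group)
        ]
        base += size_group
    return mapping


mapping = selection_indices(num_classes=5)
print(mapping["xy"])  # [0, 1, 10, 11, 20, 21]
print(mapping["wh"])  # [2, 3, 12, 13, 22, 23]
```

Taken together, the three groups cover every input channel exactly once, so no prediction value is lost or duplicated.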
***

In [None]:
import numpy as np
import onnx
from onnx import numpy_helper

# Goal
#   Replace the (reshape + transpose + split) with point-wise convolution.
#   Keep at 4D tensors for all layer outputs. 
#   Remove p3, p4, p5 model outputs.

def find_prefix_for_operation(model, op_type):
    """
    Find the prefix for nodes of a specific operation type.

    Args:
        model (onnx.ModelProto): Loaded ONNX model.
        op_type (str): Operation type to search for (e.g., "Reshape").

    Returns:
        str: One prefix shared by nodes of the specified operation type.
    """
    prefixes = set()

    # Iterate over nodes and check operation type
    for node in model.graph.node:
        if node.op_type == op_type and "/" in node.name:
            # Extract prefix up to the second slash
            parts = node.name.split("/")
            if len(parts) > 2:
                prefix = "/".join(parts[:3])  # Example: '/model/model.xxx'
                prefixes.add(prefix)

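    if not prefixes:
        raise ValueError(f"No '{op_type}' node with a prefixed name was found")
    # The detection-head nodes are expected to share a single prefix
    return prefixes.pop()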

# Remove reshape + transpose + split
def remove_nodes(model):
    remove_node_list = [
        f"{MODEL_PREFIX}/Reshape",
        f"{MODEL_PREFIX}/Constant_2",
        f"{MODEL_PREFIX}/Transpose",
        f"{MODEL_PREFIX}/Split",
        f"{MODEL_PREFIX}/Concat",
        f"{MODEL_PREFIX}/Reshape_1",
        f"{MODEL_PREFIX}/Constant_7",
        f"{MODEL_PREFIX}/Reshape_2",
        f"{MODEL_PREFIX}/Constant_8",
        f"{MODEL_PREFIX}/Transpose_1",
        f"{MODEL_PREFIX}/Split_1",
        f"{MODEL_PREFIX}/Concat_1",
        f"{MODEL_PREFIX}/Reshape_3",
        f"{MODEL_PREFIX}/Reshape_4",
        f"{MODEL_PREFIX}/Constant_14",
        f"{MODEL_PREFIX}/Transpose_2",
        f"{MODEL_PREFIX}/Split_2",
        f"{MODEL_PREFIX}/Concat_2",
        f"{MODEL_PREFIX}/Reshape_5",
        f"{MODEL_PREFIX}/Constant",
        f"{MODEL_PREFIX}/Constant_1",
    ]

    remove_after_node_id = None
    for node_id, node in enumerate(list(model.graph.node)):
        if node.name in remove_node_list:
            model.graph.node.remove(node)
        elif node.name == f"{MODEL_PREFIX}/Concat_3":
            remove_after_node_id = node_id
            model.graph.node.remove(node)
        elif remove_after_node_id is not None and node_id > remove_after_node_id:
            model.graph.node.remove(node)


# Create point-wise convolution nodes.
def insert_pointwise_conv(model):
    def _create_conv_node(base_node_name, size, input_name, output_name):
        weight_name_prefix = base_node_name
        node_name = f"{base_node_name}_{size}"
        node = onnx.helper.make_node(
            name=node_name, op_type="Conv",
            inputs=[input_name, f"{weight_name_prefix}/weight:0"],
            outputs=[output_name], kernel_shape=(1, 1), dilations=(1, 1),
            strides=(1, 1), pads=(0, 0, 0, 0))
        return node

    NUM_CLASSES = len(classes)
    CHAN_PER_DET = 5 + NUM_CLASSES  # (x, y, w, h, obj + NUM_CLASSES)
    base = 0

    for idx, size in enumerate([80, 40, 20]):
        # Node names depend on the detection head
        node_name = {
            80: f"{MODEL_PREFIX}/Sigmoid",
            40: f"{MODEL_PREFIX}/Sigmoid_1",
            20: f"{MODEL_PREFIX}/Sigmoid_2",
        }[size]

        conv_node_name = {
            80: f"{MODEL_PREFIX}/m.0/Conv",
            40: f"{MODEL_PREFIX}/m.1/Conv",
            20: f"{MODEL_PREFIX}/m.2/Conv",
        }[size]

        for node_id, node in enumerate(list(model.graph.node)):
            if node.name != node_name:
                continue

            node.input[0] = f"{conv_node_name}_output_0"

            input_name = f"{node_name}_output_0"
            output_name = f"{MODEL_PREFIX}/Split_{idx}" if idx > 0 else f"{MODEL_PREFIX}/Split"

            # Add convolution nodes for xy, wh, and conf
            model.graph.node.insert(
                node_id + 1,
                _create_conv_node("xy_conv", size, input_name, f"{output_name}_output_0"))
            model.graph.node.insert(
                node_id + 2,
                _create_conv_node("wh_conv", size, input_name, f"{output_name}_output_1"))
            model.graph.node.insert(
                node_id + 3,
                _create_conv_node("conf1_conv", size, input_name, f"{output_name}_output_2"))
            break
    
    base = 0
    # Create and append weight tensors
    for size_group in [2, 2, (1 + NUM_CLASSES)]:
        data = np.zeros((3 * size_group, 3 * CHAN_PER_DET, 1, 1), dtype=np.float32)
        for idx in range(3 * size_group):
            src_idx = base + (idx // size_group) * CHAN_PER_DET + (idx % size_group)
            dst_idx = idx
            data[dst_idx, src_idx, 0, 0] = 1

        name = {
            0: "xy_conv/weight:0",
            2: "wh_conv/weight:0",
            4: "conf1_conv/weight:0",
        }[base]

        model.graph.initializer.append(onnx.helper.make_tensor(
            name=name,
            data_type=onnx.TensorProto.FLOAT,
            dims=data.shape,
            vals=data.flatten().tolist()))
        base += size_group


def update_elmtwise_const(model):
    for node in model.graph.initializer:
        if node.name in [
                f"{MODEL_PREFIX}/Constant_4_output_0",
                f"{MODEL_PREFIX}/Constant_6_output_0",
                f"{MODEL_PREFIX}/Constant_10_output_0",
                f"{MODEL_PREFIX}/Constant_12_output_0",
                f"{MODEL_PREFIX}/Constant_16_output_0",
                f"{MODEL_PREFIX}/Constant_18_output_0",
            ]:
            data = numpy_helper.to_array(node)
            shape = data.shape
            data = data.transpose(0, 1, 4, 2, 3).reshape(-1, shape[2], shape[3])

            if node.name in [
                    f"{MODEL_PREFIX}/Constant_4_output_0",
                    f"{MODEL_PREFIX}/Constant_10_output_0",
                    f"{MODEL_PREFIX}/Constant_16_output_0",
                ]:
                data = np.tile(data, (3, 1, 1))
            node.CopyFrom(numpy_helper.from_array(data, node.name))


def update_output_nodes(model):
    NUM_CLASSES = len(classes)
    CHAN_PER_DET = 1 + NUM_CLASSES

    for node in list(model.graph.output):
        model.graph.output.remove(node)

    for size in [80, 40, 20]:
        model.graph.output.append(onnx.helper.make_tensor_value_info(
            f"xy_{size}", onnx.TensorProto.FLOAT,
            (1, 3 * 2, size, size)))
        model.graph.output.append(onnx.helper.make_tensor_value_info(
            f"wh_{size}", onnx.TensorProto.FLOAT,
            (1, 3 * 2, size, size)))
        model.graph.output.append(onnx.helper.make_tensor_value_info(
            f"conf1_{size}", onnx.TensorProto.FLOAT,
            (1, 3 * CHAN_PER_DET, size, size)))

    # Map the new outputs
    name_map = {
        f"{MODEL_PREFIX}/Add": "xy_80",
        f"{MODEL_PREFIX}/Mul_1": "wh_80",
        "conf1_conv_80": "conf1_80",

        f"{MODEL_PREFIX}/Add_1": "xy_40",
        f"{MODEL_PREFIX}/Mul_3": "wh_40",
        "conf1_conv_40": "conf1_40",

        f"{MODEL_PREFIX}/Add_2": "xy_20",
        f"{MODEL_PREFIX}/Mul_5": "wh_20",
        "conf1_conv_20": "conf1_20",
    }
    for node in model.graph.node:
        if node.name in name_map:
            node.output[0] = name_map[node.name]


def remove_io_shape_constraints(model, old_dim=255):
    """
    Removes (or sets to dynamic) any dimension == old_dim in graph input/output shapes.
    """
    # --- Fix graph inputs ---
    for inp in model.graph.input:
        tensor_type = inp.type.tensor_type
        if not tensor_type.HasField("shape"):
            continue
        for d in tensor_type.shape.dim:
            if d.HasField("dim_value") and d.dim_value == old_dim:
                # Option A: remove entire input if you no longer want to fix its shape
                # model.graph.input.remove(inp)
                # break  # or continue if you have multiple references

                # Option B: set that dimension to dynamic
                d.ClearField("dim_value")

    # --- Fix graph outputs ---
    for out in model.graph.output:
        tensor_type = out.type.tensor_type
        if not tensor_type.HasField("shape"):
            continue
        for d in tensor_type.shape.dim:
            if d.HasField("dim_value") and d.dim_value == old_dim:
                # remove or set dynamic
                d.ClearField("dim_value")

def remove_node_attribute_shapes(model, old_dim=255):
    for node in model.graph.node:
        for attr in node.attribute:
            # Many shape-related attributes are stored as 'ints' or 'tensors'.
            if attr.type == onnx.AttributeProto.INTS:
                # E.g. Reshape's "shape" attribute might be here
                shape_list = list(attr.ints)
                # If 255 is in shape_list, we remove or fix it
                if old_dim in shape_list:
                    # Option A: remove 255 entirely
                    # shape_list = [x for x in shape_list if x != old_dim]
                    
                    # Option B: replace 255 with dynamic (-1)
                    shape_list = [(-1 if x == old_dim else x) for x in shape_list]
                    attr.ints[:] = shape_list
            elif attr.type == onnx.AttributeProto.TENSOR:
                # E.g., a Constant node that might define a shape tensor
                arr = onnx.numpy_helper.to_array(attr.t)
                if old_dim in arr:
                    # fix the array
                    arr = np.where(arr == old_dim, -1, arr)
                    attr.t.CopyFrom(onnx.numpy_helper.from_array(arr))                

def remove_value_info_shapes(model, old_dim=255):
    # We can either remove them entirely or fix that dimension.
    to_remove = []
    for vi in model.graph.value_info:
        tensor_type = vi.type.tensor_type
        if not tensor_type.HasField("shape"):
            continue

        # Check if shape includes old_dim
        found_255 = False
        for d in tensor_type.shape.dim:
            if d.HasField("dim_value") and d.dim_value == old_dim:
                found_255 = True
                break

        if found_255:
            # Option A: remove the entire value_info entry
            to_remove.append(vi)
            # Option B: fix dimension to dynamic
            # for d in tensor_type.shape.dim:
            #     if d.HasField("dim_value") and d.dim_value == old_dim:
            #         d.ClearField("dim_value")

    # Remove them from the graph
    for vi in to_remove:
        model.graph.value_info.remove(vi)

def remove_initializer_shapes(model, old_dim=255):
    to_remove = []
    for init in model.graph.initializer:
        dims = list(init.dims)
        if old_dim in dims:
            # Option A: remove the initializer
            to_remove.append(init)
            # Option B: fix dims from 255 -> -1 or the new dimension
            # dims = [new_dim if x == old_dim else x for x in dims]
            # Re-assign
            # init.ClearField("dims")
            # init.dims.extend(dims)

    for init in to_remove:
        model.graph.initializer.remove(init)

def remove_all_255_shapes(model, old_dim=255):
    # 1) Fix or remove I/O shapes
    remove_io_shape_constraints(model, old_dim)

    # 2) Fix or remove ValueInfo
    remove_value_info_shapes(model, old_dim)

    # 3) Fix or remove any attribute shapes
    remove_node_attribute_shapes(model, old_dim)

    # 4) Fix or remove initializers referencing 255
    remove_initializer_shapes(model, old_dim)

model_name = latest_yolov7_run + "/weights/best"
model = onnx.load(f"{model_name}.onnx")

MODEL_PREFIX = find_prefix_for_operation(model, "Reshape")

ONNX_MODEL_NAME = latest_yolov7_run + "/yolov7.onnx"

remove_nodes(model)
insert_pointwise_conv(model)
update_elmtwise_const(model)
update_output_nodes(model)

# Remove the existing shapes.
for node in list(model.graph.value_info):
    model.graph.value_info.remove(node)

remove_all_255_shapes(model, old_dim=255)

model = onnx.shape_inference.infer_shapes(model)
onnx.checker.check_model(model)
onnx.save(model, ONNX_MODEL_NAME)

print(model_name)
print("Graph surgery completed successfully!")

***
# Quantized Model Inference

This section imports the necessary libraries and modules, including `argparse`, `os`, `cv2`, `numpy`, and `torch`. It also imports functions and classes from the `afe` and `sima_utils` modules for loading, quantizing, and evaluating the model, together with `onnxruntime`, `torchvision`, and the display utilities used later in the notebook.
***

In [None]:
import argparse
import os

import cv2
import numpy as np
import torch

from afe.ir.defines import InputName
from afe.ir.tensor_type import ScalarType
from afe.load.importers.general_importer import onnx_source
from sima_utils.data.data_generator import DataGenerator
import onnxruntime as ort

from afe.apis.defines import default_quantization, HistogramMSEMethod, quantization_scheme, dataclasses
from afe.apis.loaded_net import load_model
from afe.core.utils import convert_data_generator_to_iterable
from afe.core.evaluate_networks import GraphEvaluatorLogger

import torchvision
from IPython.display import display
from PIL import Image

***
Here, we define constants for the dataset path and the number of samples to be processed. The `DATASET_PATH` variable points to the location of validation labels, while `NUM_OF_SAMPLES` specifies how many samples will be utilized.
***

In [None]:
DATASET_PATH =  'new_data/labels/val/'
NUM_OF_SAMPLES = 50

***
This function `clip_coords` adjusts the bounding box coordinates so they don't go outside the image boundaries. It takes `boxes` as input, which are in the format `[x1, y1, x2, y2]`, and ensures that all coordinates are within the valid range defined by `img_shape`.
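
As a quick illustration of the clamping rule, here is a minimal plain-Python sketch that applies it to a single box without PyTorch. The helper name `clip_box` and the sample values are hypothetical and exist only for this example; the notebook itself uses the in-place tensor version defined in the next cell.

```python
def clip_box(box, img_shape):
    """Clamp one [x1, y1, x2, y2] box to an image of shape (height, width)."""
    height, width = img_shape[0], img_shape[1]

    def clamp(value, upper):
        # Keep the value inside the closed range [0, upper]
        return max(0, min(value, upper))

    x1, y1, x2, y2 = box
    return [clamp(x1, width), clamp(y1, height), clamp(x2, width), clamp(y2, height)]


# A box that spills past the left and bottom edges of a 480x640 (h, w) image
print(clip_box([-12, 30, 200, 500], (480, 640)))  # [0, 30, 200, 480]
```

Note that x coordinates are limited by the width (`img_shape[1]`) and y coordinates by the height (`img_shape[0]`).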
***

In [None]:
def clip_coords(boxes, img_shape):
    # Clip xyxy bounding boxes to image shape (height, width)
    boxes[:, 0].clamp_(0, img_shape[1])  # x1
    boxes[:, 1].clamp_(0, img_shape[0])  # y1
    boxes[:, 2].clamp_(0, img_shape[1])  # x2
    boxes[:, 3].clamp_(0, img_shape[0])  # y2

***
The `scale_coords` function rescales bounding box coordinates from one image size to another. It calculates the scaling factor and padding based on the dimensions of the source (`img1_shape`) and destination (`img0_shape`) images. The `clip_coords` function is called at the end to ensure the scaled coordinates are valid.
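
To make the arithmetic concrete, the sketch below repeats the same gain and padding calculation for a single box in plain Python. The helper name `scale_box` and the example frame size (a 1280x720 frame fed to a 640x640 input) are assumptions chosen for illustration.

```python
def scale_box(model_shape, box, original_shape):
    """Map one [x1, y1, x2, y2] box from the letterboxed input back to the original image.

    Both shapes are (height, width), matching the convention used by scale_coords.
    """
    gain = min(model_shape[0] / original_shape[0], model_shape[1] / original_shape[1])
    pad_x = (model_shape[1] - original_shape[1] * gain) / 2
    pad_y = (model_shape[0] - original_shape[0] * gain) / 2

    x1, y1, x2, y2 = box
    scaled = [(x1 - pad_x) / gain, (y1 - pad_y) / gain,
              (x2 - pad_x) / gain, (y2 - pad_y) / gain]

    # Clamp to the original image, as clip_coords does
    limits = [original_shape[1], original_shape[0], original_shape[1], original_shape[0]]
    return [max(0.0, min(value, float(limit))) for value, limit in zip(scaled, limits)]


# 1280x720 frame letterboxed into 640x640: gain = 0.5, pad = (0, 140)
print(scale_box((640, 640), [100, 200, 300, 400], (720, 1280)))
# [200.0, 120.0, 600.0, 520.0]
```

The 140-pixel vertical padding is removed first, and the result is then divided by the gain of 0.5.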
***

In [None]:
def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):
    # Rescale coords (xyxy) from img1_shape to img0_shape
    if ratio_pad is None:  # calculate from img0_shape
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain  = old / new
        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    coords[:, [0, 2]] -= pad[0]  # x padding
    coords[:, [1, 3]] -= pad[1]  # y padding
    coords[:, :4] /= gain
    clip_coords(coords, img0_shape)
    return coords

***
The `xywh2xyxy` function converts bounding box coordinates from the format `[center_x, center_y, width, height]` to `[x1, y1, x2, y2]`, where `x1, y1` is the top-left corner and `x2, y2` is the bottom-right corner. This format is often required for further processing.
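
For a single box, the conversion reduces to subtracting and adding half of the width and height. The sketch below uses a hypothetical helper name (`center_to_corners`) and made-up values.

```python
def center_to_corners(box):
    """Convert one [center_x, center_y, width, height] box to [x1, y1, x2, y2]."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


# A 20x10 box centered at (50, 40)
print(center_to_corners([50, 40, 20, 10]))  # [40.0, 35.0, 60.0, 45.0]
```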
***

In [None]:
def xywh2xyxy(x):
    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # top left x
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # top left y
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # bottom right x
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # bottom right y
    return y

***
The `box_iou` function calculates the Intersection over Union (IoU) between two sets of bounding boxes. It first computes the area of each box and then determines the intersection area. The IoU is returned as a measure of how much the boxes overlap.
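
The same calculation for one pair of boxes can be sketched in plain Python. The helper name `iou` and the two sample boxes are illustrative only; the notebook's `box_iou` computes this for every pair at once using tensors.

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# Two 10x10 boxes overlapping in a 5x5 region: 25 / (100 + 100 - 25)
print(round(iou([0, 0, 10, 10], [5, 5, 15, 15]), 4))  # 0.1429
```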
***

In [None]:
def box_iou(box1, box2):
    # https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py
    """
    Return intersection-over-union (Jaccard index) of boxes.
    Both sets of boxes are expected to be in (x1, y1, x2, y2) format.
    Arguments:
        box1 (Tensor[N, 4])
        box2 (Tensor[M, 4])
    Returns:
        iou (Tensor[N, M]): the NxM matrix containing the pairwise
            IoU values for every element in box1 and box2
    """

    def box_area(box):
        # box = 4xn
        return (box[2] - box[0]) * (box[3] - box[1])

    area1 = box_area(box1.T)
    area2 = box_area(box2.T)

    # inter(N,M) = (rb(N,M,2) - lt(N,M,2)).clamp(0).prod(2)
    inter = (torch.min(box1[:, None, 2:], box2[:, 2:]) - torch.max(box1[:, None, :2], box2[:, :2])).clamp(0).prod(2)
    return inter / (area1[:, None] + area2 - inter)  # iou = inter / (area1 + area2 - inter)


***
The `non_max_suppression` function applies the NMS algorithm to filter out overlapping bounding boxes based on confidence scores and IoU thresholds. It processes each image's predictions, applying constraints and aggregating the results into a final output.
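
The core idea can be sketched as a greedy loop in plain Python: keep the highest-scoring box, then drop every remaining box whose IoU with a kept box exceeds the threshold. This is a simplified illustration with hypothetical helper names, not the implementation used below, which delegates this step to `torchvision.ops.nms` and also handles confidence filtering and per-class offsets.

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, iou_thres):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for idx in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[idx], boxes[k]) <= iou_thres for k in keep):
            keep.append(idx)
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, iou_thres=0.45))  # [0, 2]
```

In the notebook's function, each box is additionally shifted by `class_index * max_wh` before NMS (unless `agnostic` is set), so boxes of different classes never suppress each other.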
***

In [None]:
def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=()):
    """Runs Non-Maximum Suppression (NMS) on inference results

    Returns:
         list of detections, one (n,6) tensor per image [xyxy, conf, cls]
    """

    nc = prediction.shape[2] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Settings
    min_wh, max_wh = 2, 4096  # (pixels) minimum and maximum box width and height
    max_det = 300  # maximum number of detections per image
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 10.0  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    output = [torch.zeros((0, 6), device='cpu')] * prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            l = labels[xi]
            v = torch.zeros((len(l), nc + 5), device=x.device)
            v[:, :4] = l[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(l)), l[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        if nc == 1:
            x[:, 5:] = x[:, 4:5]  # for models with one class, cls_loss is 0 and cls_conf is always 0.5,
            # so there is no need to multiply.
        else:
            x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(axis=1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]

    return output


***
This code cell defines two functions, `preprocess` and `_load_model`, which prepare an image for model inference and load the `YOLOv7` ONNX model for evaluation.

1. **`preprocess` function:** This function processes an input image to match the model's expected input shape.
   - `img_h` and `img_w` capture the original image dimensions, while `new_h` and `new_w` are adjusted according to the specified `input_shape`.
   - If `letter_box` is `True`, it applies "letterbox" resizing, preserving the original aspect ratio by adding padding. The resized image is placed on a grey (127) background to match `input_shape`, centering it with calculated offsets (`offset_h`, `offset_w`).
   - If `letter_box` is `False`, the image is resized directly to `input_shape`.
   - The image is then converted from BGR to RGB, transposed to channel-first format, normalized (dividing by 255.0), and expanded to a 4D tensor for model input.

2. **`_load_model` function:** This function loads the ONNX model for inference.
   - `model_path` is set to the path of the `YOLOv7` ONNX model (`ONNX_MODEL_NAME`).
   - `shapes_dict` specifies the input shape of the model as `(1, 3, IMG_SIZE, IMG_SIZE)`, while `dtype_dict` sets the data type for the input.
   - `onnx_source` retrieves model information, and `load_model` loads the ONNX model for use in inference.

Finally, `loaded_net` calls `_load_model` to initialize the model, preparing it for inference on preprocessed images.
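
The letterbox geometry can be checked in isolation. The sketch below repeats only the size and offset calculation from `preprocess`, without touching any pixels; the helper name `letterbox_geometry` and the example frame sizes are assumptions made for illustration.

```python
def letterbox_geometry(img_h, img_w, input_shape):
    """Return (new_h, new_w, offset_h, offset_w) for an aspect-preserving resize."""
    new_h, new_w = input_shape[0], input_shape[1]
    offset_h, offset_w = 0, 0
    if (new_w / img_w) <= (new_h / img_h):
        # Width is the limiting side: shrink the height and pad top/bottom
        new_h = int(img_h * new_w / img_w)
        offset_h = (input_shape[0] - new_h) // 2
    else:
        # Height is the limiting side: shrink the width and pad left/right
        new_w = int(img_w * new_h / img_h)
        offset_w = (input_shape[1] - new_w) // 2
    return new_h, new_w, offset_h, offset_w


print(letterbox_geometry(720, 1280, (640, 640)))  # (360, 640, 140, 0)
print(letterbox_geometry(1280, 720, (640, 640)))  # (640, 360, 0, 140)
```

A landscape frame receives grey bars above and below, while a portrait frame receives them on the left and right.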
***

In [None]:
def preprocess(img, input_shape, letter_box=True):
    if letter_box:
        img_h, img_w, _ = img.shape
        new_h, new_w = input_shape[0], input_shape[1]
        offset_h, offset_w = 0, 0
        if (new_w / img_w) <= (new_h / img_h):
            new_h = int(img_h * new_w / img_w)
            offset_h = (input_shape[0] - new_h) // 2
        else:
            new_w = int(img_w * new_h / img_h)
            offset_w = (input_shape[1] - new_w) // 2
        resized = cv2.resize(img, (new_w, new_h))
        img = np.full((input_shape[0], input_shape[1], 3), 127, dtype=np.uint8)
        img[offset_h:(offset_h + new_h), offset_w:(offset_w + new_w), :] = resized
    else:
        img = cv2.resize(img, (input_shape[1], input_shape[0]))

    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.transpose((2, 0, 1)).astype(np.float32)
    img /= 255.0
    img = np.expand_dims(img, 0)
    return img


def _load_model():
    model_path = ONNX_MODEL_NAME
    shapes_dict = {"images": (1, 3, IMG_SIZE, IMG_SIZE)}
    dtype_dict = {"images": ScalarType.float32}

    importer_params = onnx_source(model_path=model_path, shape_dict=shapes_dict, dtype_dict=dtype_dict)

    loaded_net = load_model(importer_params)
    return loaded_net

from afe.apis.error_handling_variables import enable_verbose_error_messages

enable_verbose_error_messages()

loaded_net = _load_model()

***
# ML Model Quantization

Now we create calibration data and perform model quantization to improve performance.

- **`_make_calibration_data()`**: This function gathers calibration images from our own validation split. It lists the label files in `DATASET_PATH`, maps each one to its matching image (replacing `labels` with `images` and `.txt` with `.jpg`), and processes the first 10 of them: each image is read, resized to the model input size of `IMG_SIZE` x `IMG_SIZE` (640x640 pixels) by the previously defined `preprocess` function, and converted to channel-last layout before being stored in a list. After processing, we concatenate all the images into a single array and convert it into an iterable suitable for calibration using `convert_data_generator_to_iterable`.

- **Quantization Process**: After generating the calibration data, we set up the quantization configuration using the default settings with calibration based on the Histogram Mean Squared Error method. We then call `_make_calibration_data()` to retrieve our prepared calibration images. With this data in hand, we can quantize our loaded model (`loaded_net`) by passing in the calibration data and the quantization configuration, resulting in a more efficient model ready for deployment.
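
The calibration loop finds each image by rewriting the path of its label file. The sketch below isolates that string manipulation; the helper name `label_to_image_path` and the file name `sample_0001.txt` are hypothetical.

```python
import os

DATASET_PATH = 'new_data/labels/val/'


def label_to_image_path(label_filename, dataset_path=DATASET_PATH):
    """Map a label file name to the path of its image, as the calibration loop does."""
    label_path = os.path.join(dataset_path, label_filename)
    return label_path.replace("labels", "images").replace(".txt", ".jpg")


print(label_to_image_path("sample_0001.txt"))  # new_data/images/val/sample_0001.jpg
```

Because `str.replace` acts on the whole path, this assumes that `labels` and `.txt` do not appear anywhere else in the path, and that every image is stored as a `.jpg`.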

***

In [None]:
def _make_calibration_data():
    """
    Make calibration data from the first 10 validation images matching the label files in DATASET_PATH.
    """

    images = []
    for filename in os.listdir(DATASET_PATH)[:10]:
        image_path = os.path.join(DATASET_PATH, filename).replace("labels", "images").replace(".txt", ".jpg")
        image = cv2.imread(image_path)
        preprocessed_image1 = preprocess(img=image, input_shape=(IMG_SIZE, IMG_SIZE)).transpose(0, 2, 3, 1)
        images.append(preprocessed_image1)

    cal_images = np.concatenate(images, axis=0)

    inputs = {InputName('images'): cal_images}

    calibration_data = convert_data_generator_to_iterable(DataGenerator(inputs))

    return calibration_data

# Quantize model
quant_configs = default_quantization.with_calibration(HistogramMSEMethod(num_bins=1024))
calibration_data = _make_calibration_data()
quantized_net = loaded_net.quantize(calibration_data=calibration_data, quantization_config=quant_configs, model_name=latest_yolov7_run.split("/")[-1])

***

This section executes the model on a set of images, processes the predictions, and displays the detections drawn on each image.

- **Initialization**: We start by setting the `QUANTIZED` and `ONNX` flags to determine which model execution path to follow. We also initialize a logger (`GraphEvaluatorLogger`) to track the progress of the evaluation.

- **Processing Each Sample**: We iterate through a specified number of image samples from the dataset:
  - The logger updates a progress bar to provide feedback on the current processing state.
  - Each image is read and preprocessed to the required input shape of 640x640 pixels.

- **Model Execution**: Based on the flags:
  - If `QUANTIZED` is `True`, we execute the quantized model (`quantized_net`) with the preprocessed input.
  - If `ONNX` is `True`, we prepare the input for the ONNX model and run the inference session using `onnxruntime`.
  - If neither flag is set, we execute the standard loaded network (`loaded_net`).

- **Output Handling**: The optimized model produces multiple outputs that need to be rearranged. This part of the code reshapes and concatenates the outputs to create a single prediction tensor:
  - The predictions are then processed through a non-max suppression function to filter out overlapping bounding boxes based on a confidence threshold.

- **Scaling Predictions**: The detected bounding box coordinates are scaled back to match the original image dimensions for accurate visualization.

- **Final Output Preparation**: We extract bounding boxes, scores, and class labels from the predictions. The boxes stay in `(xmin, ymin, xmax, ymax)` order, which is what `draw_detections` expects when it draws each rectangle and label.

- **Display**: Each annotated image is converted from BGR to RGB and shown inline in the notebook. Saving and compiling the quantized model happens in the next section.

This structured approach ensures that images are processed efficiently and that predictions can be inspected visually.
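
As a sanity check on the output handling, the shape of the final prediction tensor can be derived by hand. The sketch below assumes the three grid sizes used during graph surgery (80, 40, 20), three anchors per grid cell, and one objectness score plus one score per class; the helper name `prediction_shape` and the four-class example are illustrative.

```python
def prediction_shape(num_classes, grid_sizes=(80, 40, 20), anchors_per_cell=3):
    """Shape of the concatenated prediction tensor for a single image."""
    # One row per anchor per grid cell, summed over the three scales
    rows = sum(anchors_per_cell * size * size for size in grid_sizes)
    # xy (2) + wh (2) + objectness (1) + one score per class
    columns = 2 + 2 + (1 + num_classes)
    return (1, rows, columns)


print(prediction_shape(num_classes=4))  # (1, 25200, 9)
```

This matches what `non_max_suppression` expects: it reads the number of classes as `prediction.shape[2] - 5`.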

***

In [None]:
QUANTIZED = True
ONNX = False

ort.set_default_logger_severity(3)

video_state = {}

def load_and_preprocess_image(filename):
    """
    Load and preprocess an image or the next frame from a video.

    Args:
        filename (str): Path to the image or video file.
    
    Returns:
        tuple: (preprocessed_image, original_image/frame)
    
    Raises:
        StopIteration: When no more frames are available in a video.
    """
    if filename.endswith(".mp4"):
        # Check if the video is already opened
        if filename not in video_state:
            cap = cv2.VideoCapture(filename)
            if not cap.isOpened():
                raise FileNotFoundError(f"Cannot open video file: {filename}")
            video_state[filename] = cap
        else:
            cap = video_state[filename]

        # Read the next frame
        ret, frame = cap.read()
        if not ret:  # End of video
            cap.release()
            del video_state[filename]  # Clean up state
            raise StopIteration(f"No more frames in video: {filename}")

        # Preprocess the frame
        preprocessed_frame = preprocess(img=frame, input_shape=(IMG_SIZE, IMG_SIZE)).transpose(0, 2, 3, 1)
        return preprocessed_frame, frame

    else:
        # Handle image files
        image_path = os.path.join(DATASET_PATH, filename).replace("labels", "images").replace(".txt", ".jpg")
        image = cv2.imread(image_path)
        if image is None:
            raise FileNotFoundError(f"Cannot load image file: {image_path}")
        preprocessed_image = preprocess(img=image, input_shape=(IMG_SIZE, IMG_SIZE)).transpose(0, 2, 3, 1)
        return preprocessed_image, image


def execute_model(preprocessed_image):
    """Execute the model based on the selected mode (QUANTIZED or ONNX)."""
    inputs = {InputName('images'): preprocessed_image}
    
    if QUANTIZED:
        return quantized_net.execute(inputs)
    elif ONNX:
        ort_session = ort.InferenceSession(ONNX_MODEL_NAME)
        
        preprocessed_image = preprocessed_image.transpose(0, 3, 1, 2)
        outputs = ort_session.run(None, {'images': preprocessed_image})

        final_output = []
        for output in outputs:
            final_output.append(output.transpose(0, 2, 3, 1))
        return final_output
    else:
        return loaded_net.execute(inputs)

def process_and_draw_outputs(out, original_image):
    """Process the model outputs and draw predictions on the original image."""
    output = []
    for i in range(3):
        data = list()
        for x in out[0 + i*3 : 3 + i*3]:
            shape = x.shape
            x = x.reshape(*shape[0:3], 3, shape[3] // 3).transpose(0, 3, 1, 2, 4)
            x = x.reshape(shape[0], -1, shape[3] // 3)
            data.append(x)
        data = np.concatenate(data, axis=2)
        output.append(data)

    pred = np.concatenate(output, axis=1)

    # Get predictions    
    detections = get_detections(pred, original_image.shape)

    return draw_detections(original_image, detections)

def get_detections(pred, image_shape):
    """Get bounding box predictions and scale them to the original image size."""
    pred = non_max_suppression(prediction=torch.from_numpy(pred), conf_thres=0.18, iou_thres=0.0)

    # The batch holds a single image, so take its (n, 6) detection tensor
    detections = pred[0]

    # Scale coordinates to match the original picture
    detections[:, :4] = scale_coords((IMG_SIZE, IMG_SIZE), detections[:, :4], image_shape).round()

    bboxes = detections[:, :4].detach().cpu().numpy()
    scores = detections[:, 4].detach().cpu().numpy()
    labels = [classes[int(class_n)] for class_n in detections[:, 5]]

    return [bboxes, scores, labels]

def draw_detections(image, detections):
    """Draw bounding boxes and labels on the image."""
    boxes = detections[0]
    for i, box in enumerate(boxes):
        x_min, y_min, x_max, y_max = map(int, box[:4])
        score = detections[1][i]
        label = detections[2][i]
        color = (0, 255, 0)  # Green color for the boxes
        cv2.rectangle(image, (x_min, y_min), (x_max, y_max), color, 2)
        cv2.putText(image, f'Class: {label} | Score: {score:.2f}', 
                    (x_min, y_min - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    
    return image

def run_inference(num_samples=NUM_OF_SAMPLES):
    """Run inference on a set number of samples from the dataset."""
    logger = GraphEvaluatorLogger(True, None)

    filenames = os.listdir(DATASET_PATH)[:num_samples]
    for i, filename in enumerate(filenames):
        logger.print_progressbar(i + 1, len(filenames), "")
        
        # Load and preprocess image
        preprocessed_image, original_image = load_and_preprocess_image(filename)

        # Execute the model and process outputs
        out = execute_model(preprocessed_image)
        image_with_detections = process_and_draw_outputs(out, original_image)

        # Display the image with predictions in Jupyter Notebook
        display(Image.fromarray(image_with_detections[:, :, ::-1]))

# After defining the functions, call run_inference to execute predictions
run_inference()

***
# ML Model Compilation

This code snippet handles saving and compiling the quantized YOLOv7 model, placing the compiled model into a specified directory.

1. **Setting the Directory Path**:
   - `saved_mpk_directory = latest_yolov7_run + "/compiled_yolov7"`:
     - This line constructs the path where the compiled YOLOv7 model will be saved. It uses `latest_yolov7_run` to get the path to the most recent YOLOv7 training run and appends `/compiled_yolov7` to create a subdirectory for the compiled model.

2. **Saving the Quantized Model**:
   - `quantized_net.save("yolov7", output_directory=saved_mpk_directory)`:
     - This line saves the quantized YOLOv7 model (represented by `quantized_net`) to the specified directory (`saved_mpk_directory`). The model is saved under the name `"yolov7"`.

3. **Compiling the Quantized Model**:
   - `quantized_net.compile(output_path=saved_mpk_directory, compress=False)`:
     - This line compiles the saved quantized model and stores the compiled files in the `saved_mpk_directory`. The `compress=False` argument ensures that the model is not compressed during the compilation process.

This sequence of steps ensures that the quantized YOLOv7 model is saved and compiled into a format ready for deployment or further use.
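
To see where the artifacts end up, the sketch below derives the output directory and the archive path from a run directory. The run path `yolov7/runs/train/exp7` is a hypothetical example, and the `<name>_mpk.tar.gz` file name is an assumption taken from the upload cell later in this notebook rather than from the compiler's documentation.

```python
def compiled_artifact_paths(latest_yolov7_run):
    """Return (run name, output directory, expected archive path) for a training run."""
    name = latest_yolov7_run.split("/")[-1]
    saved_mpk_directory = latest_yolov7_run + "/compiled_yolov7"
    # Assumed archive name: the run name (used as model_name) plus an "_mpk" suffix
    archive_path = f"{saved_mpk_directory}/{name}_mpk.tar.gz"
    return name, saved_mpk_directory, archive_path


name, directory, archive = compiled_artifact_paths("yolov7/runs/train/exp7")
print(name)       # exp7
print(directory)  # yolov7/runs/train/exp7/compiled_yolov7
print(archive)    # yolov7/runs/train/exp7/compiled_yolov7/exp7_mpk.tar.gz
```

Checking that the archive exists (for example with `os.path.isfile`) after compilation is a cheap way to catch a failed compile before the upload step.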

***

In [None]:
saved_mpk_directory = latest_yolov7_run + "/compiled_yolov7"
quantized_net.save("yolov7", output_directory=saved_mpk_directory)
quantized_net.compile(output_path=saved_mpk_directory, compress=False)

***
# Preparing the compiled model for Edgematic

In this final step, we **upload** the compiled YOLOv7 model artifact to an Amazon S3 bucket and **generate a pre-signed URL** for easy access:

1. **Extracting the Model Name and Path**  
   - We parse `latest_yolov7_run` to get the run name (e.g., `exp7`) and construct the path to the compiled `.tar.gz` file. This archive contains the quantized and compiled YOLOv7 model.

2. **Uploading to S3 with boto3**  
   - We import the `boto3` library, which provides a Python interface to AWS services. We create an S3 resource (`boto3.resource("s3")`) and call `upload_file(...)` on its underlying client (`s3.meta.client`) to transfer the `.tar.gz` file from our local machine to the specified S3 bucket (`s3_bucket`), storing it under a structured key (`models/<name>.tar.gz`).

3. **Ensuring Easy Retrieval**  
   - Once the file is in S3, we generate a **pre-signed URL** that will remain valid for 30 minutes. By calling `generate_presigned_url` on an S3 **client**, we get a unique link that can be shared to grant temporary download permission—no need to make the bucket publicly accessible.  
   - This approach allows you or other collaborators to retrieve the YOLOv7 artifact without manually logging into AWS or altering S3 permissions.

By **storing the compiled model in S3** and distributing a **time-limited pre-signed URL**, you ensure a secure, controlled mechanism for accessing the compiled YOLOv7 model—ideal for subsequent steps such as inference on AWS services or local testing in Edgematic.
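
Before sharing a link, it can be useful to confirm when it expires. The sketch below parses the expiry out of a pre-signed URL using only the standard library. It assumes a Signature Version 4 style URL, which carries `X-Amz-Date` and `X-Amz-Expires` query parameters; depending on how the client is configured, the URL may instead carry a single `Expires` timestamp, in which case this sketch does not apply. The helper name and the example URL are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url):
    """Return the UTC expiry time of a SigV4-style pre-signed URL."""
    query = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


# Hypothetical URL with a 1800-second (30-minute) lifetime
example_url = (
    "https://example-bucket.s3.amazonaws.com/models/exp7.tar.gz"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20240101T000000Z"
    "&X-Amz-Expires=1800"
    "&X-Amz-Signature=abc"
)
print(presigned_url_expiry(example_url).isoformat())  # 2024-01-01T00:30:00+00:00
```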
***

In [None]:
import boto3

# --------------------------------
# 1) Upload the compiled tar.gz file
# --------------------------------

name = latest_yolov7_run.split("/")[-1]
file_name = f"{latest_yolov7_run}/compiled_yolov7/{name}_mpk.tar.gz"
object_key = f"models/{name}.tar.gz"

s3 = boto3.resource('s3')
s3.meta.client.upload_file(file_name, s3_bucket, object_key)

print(f"Uploaded {file_name} to s3://{s3_bucket}/{object_key}")

# --------------------------------
# 2) Generate a 30-minute pre-signed URL
# --------------------------------

s3_client = boto3.client("s3")

presigned_url = s3_client.generate_presigned_url(
    ClientMethod="get_object",
    Params={
        "Bucket": s3_bucket,
        "Key": object_key
    },
    ExpiresIn=1800  # 30 minutes
)

print("Pre-Signed URL (valid for 30 min):")
print(presigned_url)