## Running multiple models with warm-up

This notebook demonstrates how to load and run multiple AI models using DeGirum PySDK on a Hailo-8 or Hailo-8L device.

It showcases the model warm-up technique, which involves running a single dummy inference on each model after loading. This step ensures all runtime resources and tensor buffers are initialized, which avoids latency spikes during the first real inference.
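
The effect can be illustrated without any hardware. The sketch below is only an illustration, not PySDK code: `LazyModel` is a hypothetical stand-in whose first call pays a simulated one-time setup cost, and `warm_up` and `timed_call` are hypothetical helpers. It shows that a throwaway call moves that cost out of the first real inference.

```python
import time


class LazyModel:
    """Hypothetical stand-in for a model that allocates resources on first use."""

    def __init__(self, setup_seconds=0.05):
        self.setup_seconds = setup_seconds
        self.initialized = False

    def __call__(self, frame):
        if not self.initialized:
            time.sleep(self.setup_seconds)  # simulated one-time setup cost
            self.initialized = True
        return {"input": frame}


def warm_up(models_with_inputs):
    """Run one throwaway inference per (model, dummy_input) pair."""
    for model, dummy_input in models_with_inputs:
        model(dummy_input)


def timed_call(model, frame):
    """Return the wall-clock duration of a single call in milliseconds."""
    start = time.perf_counter()
    model(frame)
    return (time.perf_counter() - start) * 1000.0


cold = LazyModel()
warm = LazyModel()
warm_up([(warm, "dummy")])

cold_ms = timed_call(cold, "frame")  # pays the simulated setup cost
warm_ms = timed_call(warm, "frame")  # setup already done during warm-up
print(f"cold first call: {cold_ms:.1f} ms, warmed first call: {warm_ms:.1f} ms")
```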

#### Imports and model configuration

In [None]:
import cv2, numpy as np, degirum as dg

# ------------------------------------------------------------------
# 1. SETUP
# ------------------------------------------------------------------
host = "@local"
zoo = "degirum/hailo"
device_type = "HAILORT/HAILO8L"
token = ""  # DeGirum AI Hub token (left empty here)
pose_model_name = "yolov8n_relu6_coco_pose--640x640_quant_hailort_hailo8l_1"
face_model_name = "scrfd_500m--640x640_quant_hailort_hailo8l_1"
face_vec_model_name = "arcface_mobilefacenet--112x112_quant_hailort_hailo8l_1"


#### Comparing latency with and without warm-up

In [None]:
import time, numpy as np, degirum as dg

# Load a Hailo model using the settings from the setup cell
# (never hard-code an access token in a shared notebook)
model = dg.load_model(
    model_name=pose_model_name,
    inference_host_address=host,
    zoo_url=zoo,
    token=token,
    device_type=device_type,
)

dummy_input = np.zeros((640,640,3), dtype=np.uint8)

# --- First inference (no warm-up) ---
# perf_counter() is a monotonic, high-resolution clock suited to short intervals
start = time.perf_counter()
_ = model(dummy_input)
t1 = time.perf_counter() - start
print(f"First inference (no warm-up): {t1*1000:.1f} ms")

# --- Inference after warm-up ---
_ = model(dummy_input)  # extra warm-up step; the timed call above already initialized the model

start = time.perf_counter()
_ = model(dummy_input)
t2 = time.perf_counter() - start
print(f"Subsequent inference (warmed up): {t2*1000:.1f} ms")
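
A single timing sample is noisy, so it can help to compare the first call against the median of several later calls. The following is a minimal sketch under assumptions: `measure_latency` is a hypothetical helper, and `fake_model` is a trivial stand-in so the snippet runs without a device. With PySDK, the `model` object and `dummy_input` from the cell above could be passed in instead.

```python
import statistics
import time


def measure_latency(model, frame, runs=20):
    """Return (first_call_ms, median_ms over `runs` subsequent calls)."""
    start = time.perf_counter()
    model(frame)
    first_ms = (time.perf_counter() - start) * 1000.0

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        model(frame)
        samples.append((time.perf_counter() - start) * 1000.0)
    return first_ms, statistics.median(samples)


calls = []


def fake_model(frame):
    """Trivial stand-in with no cold-start cost; it only records the call."""
    calls.append(frame)
    return frame


first_ms, median_ms = measure_latency(fake_model, frame="dummy", runs=10)
print(f"first: {first_ms:.3f} ms, median of later calls: {median_ms:.3f} ms")
```

Because the stand-in has no setup cost, both numbers will be similar here; on a real device the first value is the one expected to stand out.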


## Multi-model inference pipeline with warm-up


Pose detection runs continuously on every frame. When a placeholder condition is met (here, more than one person detected), the face detection model localizes faces, and the face embedding (vector) model then runs on each detected face. Because every model gets one dummy inference right after loading, switching to the face models mid-stream avoids the first-inference latency spike.

#### Loading models and running warm-up inference

In [None]:

print("Loading models...")
PoseModel = dg.load_model(model_name=pose_model_name, inference_host_address=host, zoo_url=zoo, token=token, device_type=device_type)
FaceModel = dg.load_model(model_name=face_model_name, inference_host_address=host, zoo_url=zoo, token=token, device_type=device_type)
FaceVectorModel = dg.load_model(model_name=face_vec_model_name, inference_host_address=host, zoo_url=zoo, token=token, device_type=device_type)


# Dummy image for warm-up
dummy_pose_img = np.zeros((640,640,3), dtype=np.uint8)
dummy_face_img = np.zeros((640,640,3), dtype=np.uint8)
dummy_face_crop = np.zeros((112,112,3), dtype=np.uint8)

print("Warming up models...")
PoseModel(dummy_pose_img)
FaceModel(dummy_face_img)
FaceVectorModel(dummy_face_crop)

print("Warm-up complete. Models are ready for real-time inference.")
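
Each dummy input has to match its model's input size. The model names used in this notebook embed that size (for example `--640x640_`), so the warm-up shapes could be derived instead of hard-coded. The sketch below assumes that naming pattern holds; `dummy_shape_from_name` is a hypothetical helper, not part of PySDK.

```python
import re


def dummy_shape_from_name(model_name, channels=3):
    """Parse the 'AxB' size token that follows '--' in a model name.

    Assumption: names follow the '<model>--<A>x<B>_...' pattern seen in this
    notebook. The models here are square, so the A/B order is not resolved;
    check the model's documented input for non-square sizes.
    """
    match = re.search(r"--(\d+)x(\d+)_", model_name)
    if match is None:
        raise ValueError(f"no size token found in {model_name!r}")
    a, b = int(match.group(1)), int(match.group(2))
    return (a, b, channels)


names = [
    "yolov8n_relu6_coco_pose--640x640_quant_hailort_hailo8l_1",
    "scrfd_500m--640x640_quant_hailort_hailo8l_1",
    "arcface_mobilefacenet--112x112_quant_hailort_hailo8l_1",
]
shapes = {name: dummy_shape_from_name(name) for name in names}
for name, shape in shapes.items():
    print(name, "->", shape)
```

Each shape could then be passed to `np.zeros(shape, dtype=np.uint8)` to build the dummy frame.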

#### Running inference

In [None]:

cap = cv2.VideoCapture(0)  # change to a file path or RTSP URL if needed
if not cap.isOpened():
  raise RuntimeError("Could not open video source")

while True:
  ret, frame = cap.read()
  if not ret:
    break

  # --- Pose inference ---
  pose_result = PoseModel(frame)
  print("\n========== POSE MODEL ==========")
  # print(f"Raw Output: {pose_result} \n ")
  print(f"Number of people detected: {len(pose_result.results)}")
  overlay = pose_result.image_overlay.copy()

  # Dummy condition: run face inference only if >1 person detected
  if len(pose_result.results) > 1:
    print(f"***** Condition met: {len(pose_result.results)} persons found")

    # --- Run Face Detection ---
    face_result = FaceModel(frame)
    print("\n========== FACE DETECTION MODEL ==========")
    # print(f"Raw Output: {face_result}")
    print(f"Number of faces detected: {len(face_result.results)}")
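    frame_h, frame_w = frame.shape[:2]
    for face in face_result.results:
      # Clamp the box to the frame: a negative start index counts from the end
      # of the array, so an unclamped box at the frame edge gives a wrong or empty crop
      x1, y1, x2, y2 = map(int, face["bbox"])
      x1, y1 = max(x1, 0), max(y1, 0)
      x2, y2 = min(x2, frame_w), min(y2, frame_h)
      face_crop = frame[y1:y2, x1:x2]

      # Resize to 112x112 for the face embedding model (skip empty crops)
      if face_crop.shape[0] > 0 and face_crop.shape[1] > 0: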
        face_resized = cv2.resize(face_crop, (112,112))
        vec_result = FaceVectorModel(face_resized)
        embedding = np.asarray(vec_result.results[0]["data"]).flatten()

        # Optional: print embedding length or preview ID
        print("\n========== FACE VECTOR MODEL ==========")
        # print(f"Raw Output: {vec_result}")
        print(f"Embedding Length: {len(embedding)}")
        print(f"Embedding Vector (first 5): {embedding[:5]}")
        print(f"Embedding Norm: {np.linalg.norm(embedding)}")
        cv2.putText(overlay, f"VecID: {embedding[0]:.2f}", (x1, y1 - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 0), 1)

  # Show overlay
  cv2.imshow("Pose + Optional Face Pipeline", overlay)
  if cv2.waitKey(1) & 0xFF in (ord('q'), ord('x')):
    break

cap.release()
cv2.destroyAllWindows()
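
To check that model switching stays fast, per-stage latencies can be recorded inside the loop. This is a minimal sketch, assuming a hypothetical `StageTimer` helper (not part of PySDK or OpenCV) built on the standard library; the `time.sleep` calls are placeholders for model inference.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Collect wall-clock durations (ms) per named pipeline stage."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[name].append((time.perf_counter() - start) * 1000.0)

    def summary(self):
        """Return {stage: (first_ms, max_ms, call_count)}."""
        return {
            name: (values[0], max(values), len(values))
            for name, values in self.samples.items()
        }


timer = StageTimer()
for frame_index in range(3):
    with timer.stage("pose"):
        time.sleep(0.002)  # placeholder for PoseModel(frame)
    if frame_index == 2:  # placeholder for the "more than one person" condition
        with timer.stage("face"):
            time.sleep(0.002)  # placeholder for FaceModel(frame)

for name, (first_ms, max_ms, count) in timer.summary().items():
    print(f"{name}: first={first_ms:.2f} ms, max={max_ms:.2f} ms, calls={count}")
```

In the loop above, wrapping `PoseModel(frame)`, `FaceModel(frame)`, and `FaceVectorModel(face_resized)` in `timer.stage(...)` would show whether the first triggered face inference is slower than later ones.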