# APEX API Demo: Evaluating Object Detection Models

This notebook demonstrates how to use YRIKKA's APEX API to evaluate object detection models in specific contexts. We'll walk through:

1. Setting up your environment
2. Packaging your model
3. Uploading it to YRIKKA
4. Submitting an evaluation job
5. Getting and interpreting results

## Setup

First, let's import our required libraries and set up our API key. Make sure you have a `.env` file with your `YRIKKA_API_KEY`.

In [1]:
import os
import tarfile
import time
import requests
from dotenv import load_dotenv

# Load API key from .env file
load_dotenv()
api_key = os.getenv("YRIKKA_API_KEY")

if not api_key:
    print("⚠️ Please set your YRIKKA_API_KEY in .env file")
else:
    print("✅ API key loaded successfully")

✅ API key loaded successfully


## Configuration

Let's set up our configuration variables. We'll use the YOLOv8 cherry detection example model included in the repository.

In [2]:
# Configuration
my_model_dir = "../examples/yolo_v8_cherry"  # Directory containing model files
output_filename = "model_package.tar.gz"  # Name for our tarball

# API endpoints
presigned_url = "https://api.yrikka.com/v1/presigned"
submit_job_url = "https://api.yrikka.com/v1/submit-job"
job_status_url = "https://api.yrikka.com/v1/job-status"

# Headers for API requests
headers = {
    "x-api-key": api_key,
    "Content-Type": "application/json"
}

## Step 1: Create Model Package

First, we'll create a tarball of our model directory. This package must include:
- `inference.py` (implementing the required functions)
- the model weights file
- `manifest.json`

The example model directory structure looks like:
```
yolo_v8_cherry/
├── inference.py     # Implements required interface functions
├── model.pt        # Your model weights
└── manifest.json   # Configuration specifying entry points
```
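
Before building the tarball, it can help to confirm that the directory actually contains these files. The following is a minimal sketch, not part of the APEX API: `missing_package_files` is a hypothetical helper, and it checks only the two fixed file names from the list above, since the name of the weights file (`model.pt` in this example) may differ between models.

```python
import os

# File names taken from the package requirements listed above.
REQUIRED_FILES = ("inference.py", "manifest.json")


def missing_package_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir.

    Hypothetical helper for illustration; a missing directory is treated
    as if every required file were absent.
    """
    present = set(os.listdir(model_dir)) if os.path.isdir(model_dir) else set()
    return [name for name in required if name not in present]
```

Calling `missing_package_files(my_model_dir)` before packaging, and stopping when the returned list is non-empty, avoids uploading an incomplete package.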

In [3]:
def create_tarball(directory, output_file):
    with tarfile.open(output_file, "w:gz") as tar:
        tar.add(directory, arcname=os.path.basename(directory))
    return os.path.getsize(output_file)

# Create the tarball
size_bytes = create_tarball(my_model_dir, output_filename)
size_mb = size_bytes / (1024 * 1024)

print(f"✅ Created tarball: {output_filename}")
print(f"📦 Package size: {size_mb:.1f} MB")

if size_mb > 4000:
    print("⚠️ Warning: Package is larger than 4GB limit!")

✅ Created tarball: model_package.tar.gz
📦 Package size: 45.9 MB


## Step 2: Get Upload URL

Now we'll request a pre-signed URL from YRIKKA to upload our model package. The URL is valid for a limited time (the sample output in this step shows `X-Amz-Expires=3600`, i.e. one hour) and lets us upload the model securely to YRIKKA's storage.

The response includes:

- **Upload URL**: A temporary URL for uploading your model package
- **Package URI**: Your model's unique identifier that you'll use when submitting evaluation jobs


In [4]:
def get_presigned_url():
    response = requests.get(presigned_url, headers=headers)
    response.raise_for_status()
    payload = response.json()  # Parse the JSON body once
    return payload["upload_url"], payload["s3_uri"]

# Get the upload URL
upload_url, model_package_uri = get_presigned_url()

print("✅ Received pre-signed URL")
print(f"📤 Upload URL: {upload_url}")
print(f"📫 Model package will be stored at: {model_package_uri}")

✅ Received pre-signed URL
📤 Upload URL: https://yrikka-public.s3.amazonaws.com/uploads/b7e223e35cdf470cb8f2615b6845a844/model_package.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA4FQXMVW7CRN4UKTG%2F20250403%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20250403T143843Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEIf%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMiJHMEUCIQC45y9CfRpO4SSnVTG7gTqP1TwZlH84YkYJuQ%2BJN3uvwQIgRPhj8cm77UFV0C8CDa0gM0EOpEBeKHAWDhJl44nE9fYqhQMI8P%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARAAGgw4MzY0OTM5NDYzMDIiDGwR1PktlmdVbv9fwCrZApK%2BPXVtmAvEi7AsSOsn9GbvDHjisBofI0W6sUwGmVDF0BLWe7bjvpWvk232fr8cRjBwZQHksP48gpnDWHCvdj0Lj4x3bbQTYaTISXDggzeqzPjBPwr5DtDGpmaYr4zz2Qbc8VH5%2BqJK9lzgXR2A07irbaVfPYWafHELedNCYy0%2FpRxeslGIs7wR9q%2Bo2LH8kZFRDRrCf9FwF%2FS3SDEFMF03X1VO3cKw%2BpxUEQfCTcnxwGY4iy8WBJG7EzYW66t6%2FZUYKvSMCZvzDKxqjT5BwkBUFJdkNm2KJCCO8A1iHV81VJfvXfLTSYFMtb7RZlUJ8Q%2FuIlIrmYX7e0v9fmazEU2JUPNPoaRC2XDIVfzi0AytkiZ2K6TO5e6o08cRsw6Myk47NJ

## Step 3: Upload Model Package

Let's upload our model package using the pre-signed URL. This step might take a few minutes depending on your model size and internet connection.

⚠️ **Important**: Your model package must be less than 4GB in size. Larger packages will be rejected.

In [5]:
def upload_tarball(upload_url, file_path):
    print("📤 Uploading model package...")
    with open(file_path, "rb") as f:
        response = requests.put(upload_url, data=f)
    response.raise_for_status()

# Upload the package
upload_tarball(upload_url, output_filename)
print("✅ Model package uploaded successfully!")

📤 Uploading model package...
✅ Model package uploaded successfully!


## Step 4: Submit Evaluation Job

Now we'll submit our evaluation job with a detailed context description. The context description is crucial as it determines what scenarios your model will be tested in.

For this example, we're testing the model's ability to detect two types of cherries in different conditions.
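
Because the request body drives the whole evaluation, it may be worth assembling it in one place and rejecting obviously empty inputs before sending anything. This is a sketch under assumptions: `build_job_payload` is a hypothetical helper, the field names are copied from the request cell in this step, and the checks are common-sense guards rather than documented API constraints.

```python
def build_job_payload(model_package_uri, target_classes, context_description):
    """Assemble the submit-job request body, rejecting obviously empty inputs."""
    # Drop blank class names and trim surrounding whitespace.
    classes = [name.strip() for name in target_classes if name and name.strip()]
    # Collapse runs of whitespace (including newlines) into single spaces.
    context = " ".join(context_description.split())
    if not model_package_uri:
        raise ValueError("model_package_uri is empty")
    if not classes:
        raise ValueError("target_classes must name at least one class")
    if not context:
        raise ValueError("context_description is empty")
    return {
        "s3_model_package_uri": model_package_uri,
        "target_classes": classes,
        "context_description": context,
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post`.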

In [6]:
def submit_job(model_package_uri):
    data = {
        "s3_model_package_uri": model_package_uri,
        "target_classes": ["dark_brown_cherry", "green_cherry"],
        "context_description": (
            "Test the cherry detection model under dark and light conditions at various times of day "
            "and in different weather conditions. Additionally, evaluate the model's performance in "
            "detecting cherries lying on both green grass and brown dirt."
        )
    }
    response = requests.post(submit_job_url, headers=headers, json=data)
    response.raise_for_status()
    return response.json()["job_id"]

# Submit the job
job_id = submit_job(model_package_uri)
print("✅ Job submitted successfully!")
print(f"📋 Job ID: {job_id}")

✅ Job submitted successfully!
📋 Job ID: 00cd433e-ac78-4c42-8f67-e7b2d2bad5d4


## Step 5: Monitor Job Progress

Let's check the status of our job. Evaluation typically takes 20-60 minutes because APEX:
1. Generates custom test images based on your context
2. Runs your model on these images
3. Analyzes performance across different scenarios

The cell below polls the job status every 5 minutes until the job succeeds or fails.

In [7]:
def check_job_status(job_id):
    while True:
        response = requests.get(job_status_url, headers=headers, params={"job_id": job_id})
        response.raise_for_status()
        
        data = response.json()
        status = data.get("status")
        message = data.get("message")

        print(f"🔍 Job status message: {message}")
        
        if status == "SUCCESS":
            print("✅ Evaluation completed successfully!")
            return data.get("results")
        elif status in ["FAIL", "ERROR"]:
            print(f"❌ Job failed: {message}")
            return None
        
        print("⏳ Still processing...")
        time.sleep(300)  # Wait 5 minutes between checks

# Check status and get results
results = check_job_status(job_id)

🔍 Job status message: Currently processing at node: generate_images.
⏳ Still processing...
🔍 Job status message: Currently processing at node: generate_images.
⏳ Still processing...
🔍 Job status message: Currently processing at node: evaluation_agent.
⏳ Still processing...
🔍 Job status message: Currently processing at node: evaluation_agent.
⏳ Still processing...
🔍 Job status message: Currently processing at node: evaluation_agent.
⏳ Still processing...
🔍 Job status message: Currently processing at node: evaluation_agent.
⏳ Still processing...
🔍 Job status message: Currently processing at node: evaluation_agent.
⏳ Still processing...
🔍 Job status message: Currently processing at node: evaluation_agent.
⏳ Still processing...
🔍 Job status message: None
✅ Evaluation completed successfully!
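
The polling cell above keeps looping if a job never reaches a terminal state. A bounded variant is sketched below, assuming the same `SUCCESS`, `FAIL`, and `ERROR` status values. `poll_until_done` and its `fetch_status` callable are hypothetical names, and the two-hour default timeout is an arbitrary choice, not an APEX limit.

```python
import time

# Failure statuses handled by the polling cell in this notebook.
TERMINAL_FAILURES = {"FAIL", "ERROR"}


def poll_until_done(fetch_status, interval_s=300, timeout_s=2 * 60 * 60,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job finishes or timeout_s elapses.

    fetch_status is a hypothetical zero-argument callable that returns the
    decoded JSON body of a job-status response. sleep and clock are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "SUCCESS":
            return data.get("results")
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Job failed: {data.get('message')}")
        # Give up if the next check would land past the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"Job still running after {timeout_s} seconds")
        sleep(interval_s)
```

In this notebook, `fetch_status` could be a small function that issues the same `requests.get` call as the cell above and returns `response.json()`. Raising on failure instead of returning `None` separates a failed job from a successful job that has no results.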


## Results Analysis

Let's examine our results in detail. APEX provides both aggregate metrics and granular breakdowns by context.

The metrics include:
- **Precision**: Fraction of the model's detections that are correct (0 to 1)
- **Recall**: Fraction of the actual objects that the model detected (0 to 1)
- **F1-score**: Harmonic mean of precision and recall
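
To make these definitions concrete, here is a minimal sketch that computes all three metrics from raw detection counts. `precision_recall_f1` is a hypothetical helper for illustration. It does not claim to reproduce how APEX matches detections to ground truth or how it aggregates scores across images, neither of which is specified in this notebook.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 from detection counts.

    Returns 0.0 for any metric whose denominator is zero.
    """
    detected = true_positives + false_positives   # everything the model reported
    actual = true_positives + false_negatives     # everything that was really there
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, one correct detection, no false detections, and one missed object give precision 1.0, recall 0.5, and F1 of about 0.667, the same combination as the `night` row in the output of this section. Not every reported row satisfies this identity: the harmonic mean of the aggregate precision (0.384) and recall (0.534) is about 0.447, while the reported aggregate F1-score is 0.387. One possible explanation is that the scores are averaged over images or classes rather than derived from pooled counts.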

In [9]:
if results:
    # Print aggregate metrics
    print("📊 Overall Performance:")
    agg = results["Aggregate"]
    print(f"   Precision: {agg['Precision']:.3f}")
    print(f"   Recall: {agg['Recall']:.3f}")
    print(f"   F1-score: {agg['F1-score']:.3f}")
    
    print("\n📈 Detailed Performance by Context:")
    for category in results["Granular"]:
        print(f"\n{category['Category'].upper()}:")
        for item in category["Items"]:
            print(f"\n   {item['Context']}:")
            print(f"      Precision: {item['Precision']:.3f}")
            print(f"      Recall: {item['Recall']:.3f}")
            print(f"      F1-Score: {item['F1-Score']:.3f}")
            
    print("\n💡 Analysis Tips:")
    print("- Look for significant variations in performance across different contexts")
    print("- Pay attention to contexts where F1-score is particularly low")
else:
    print("❌ No results available")

📊 Overall Performance:
   Precision: 0.384
   Recall: 0.534
   F1-score: 0.387

📈 Detailed Performance by Context:

LIGHTING CONDITION:

   light:
      Precision: 0.352
      Recall: 0.551
      F1-Score: 0.374

   dark:
      Precision: 0.593
      Recall: 0.444
      F1-Score: 0.508

BACKGROUND:

   green_grass:
      Precision: 0.422
      Recall: 0.559
      F1-Score: 0.429

   brown_dirt:
      Precision: 0.349
      Recall: 0.469
      F1-Score: 0.348

TIME OF DAY:

   afternoon:
      Precision: 0.392
      Recall: 0.524
      F1-Score: 0.438

   morning:
      Precision: 0.287
      Recall: 0.235
      F1-Score: 0.226

   night:
      Precision: 1.000
      Recall: 0.500
      F1-Score: 0.667

   evening:
      Precision: 0.400
      Recall: 0.400
      F1-Score: 0.400

   not_specified:
      Precision: 0.615
      Recall: 0.652
      F1-Score: 0.606

WEATHER CONDITION:

   foggy:
      Precision: 0.556
      Recall: 0.333
      F1-Score: 0.417

   cloudy:
      Precision: 0.