# Project Notes (Sanitized for Git)

This repository contains a **sanitized** version of the Gracity Insects YOLOv8 Classification notebooks.
All tenant-specific identifiers (bucket names, namespaces, OCIDs, local absolute paths) have been replaced by placeholders.

**Author:** Cristina Varas Menadas  
**Last updated:** 2026-02-19

> To run these notebooks, set the configuration values in the first "Configuration" section of each notebook.


# Gracity Insects (YOLO Classification) - 00. Environment & OCI Access

This notebook verifies your environment, installs dependencies (if needed), and validates Object Storage access using **Resource Principals**.


## Configuration

Update these variables for your tenancy/project.

- **Bucket**: `<BUCKET_NAME>`
- **Dataset prefix** (images): `<PROJECT_PREFIX>/v1/raw/datasets/insects_kaggle_v1/`
- **Labels prefix** (metadata/manifests): `<PROJECT_PREFIX>/v1/labels/insects_kaggle_v1/`
- **Runs prefix** (artifacts): `<PROJECT_PREFIX>/yolo/runs/insects_kaggle_v1/`

We intentionally use the **`test/` folder as the validation split** for this starter project, to match your current bucket structure.
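
One way to keep these prefixes consistent is to derive them all from the bucket name and project prefix. The following is a minimal sketch, not part of the original notebooks: `StorageConfig` and its property names are hypothetical, and the path layout is copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical container for the configuration values listed above."""

    bucket: str
    project_prefix: str
    dataset: str = "insects_kaggle_v1"

    @property
    def dataset_prefix(self) -> str:
        return f"{self.project_prefix}/v1/raw/datasets/{self.dataset}/"

    @property
    def labels_prefix(self) -> str:
        return f"{self.project_prefix}/v1/labels/{self.dataset}/"

    @property
    def runs_prefix(self) -> str:
        return f"{self.project_prefix}/yolo/runs/{self.dataset}/"

    @property
    def train_prefix(self) -> str:
        return self.dataset_prefix + "train/"

    @property
    def val_prefix(self) -> str:
        # test/ doubles as the validation split in this starter project.
        return self.dataset_prefix + "test/"


# Placeholders are kept as-is; replace them with your own values.
cfg = StorageConfig(bucket="<BUCKET_NAME>", project_prefix="<PROJECT_PREFIX>")
print(cfg.dataset_prefix)
print(cfg.val_prefix)
```

Deriving every path from two inputs means a change of tenancy or project only touches one line.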

## 0.1 Imports

In [None]:
from __future__ import annotations

import os
import sys
import platform
from typing import Any, Dict

## 0.2 Verify runtime

In [None]:
print("Python:", sys.version)
print("Platform:", platform.platform())
print("Working dir:", os.getcwd())

## 0.3 Install / verify dependencies

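If your notebook image already has the latest versions of these packages installed, the command is a fast no-op. Because of `--upgrade`, pip otherwise upgrades the listed packages to the newest available release.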

In [None]:
%pip -q install --upgrade "oci>=2.120.0" "ultralytics>=8.0.0" "scikit-learn>=1.3.0" "pandas>=2.0.0" "matplotlib>=3.7.0" "tqdm>=4.66.0" "opencv-python-headless>=4.9.0"
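
To confirm what actually ended up in the environment, the installed versions can be read back with the standard library. This is a small sketch rather than part of the original notebook: `REQUIRED` and `installed_versions` are hypothetical names, and the distribution names are copied from the `%pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Distribution names taken from the %pip command above.
REQUIRED: List[str] = [
    "oci",
    "ultralytics",
    "scikit-learn",
    "pandas",
    "matplotlib",
    "tqdm",
    "opencv-python-headless",
]


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:>24}: {ver or 'NOT INSTALLED'}")
```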

## 0.4 Authenticate to Object Storage with Resource Principals

In [None]:
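import oci
from oci.object_storage import ObjectStorageClient

# Resource Principals: the notebook session authenticates as itself,
# so no API key or config file is needed (hence the empty config dict).
signer = oci.auth.signers.get_resource_principals_signer()
os_client = ObjectStorageClient(config={}, signer=signer)

# The Object Storage namespace is required by every bucket and object call.
namespace: str = os_client.get_namespace().data
print("Namespace:", namespace)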
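
If the signer call fails, it can help to check whether the runtime exposes a resource principal at all. The sketch below assumes that the OCI SDK discovers resource principals through the `OCI_RESOURCE_PRINCIPAL_VERSION` environment variable; treat that as an assumption to verify against your SDK version. `resource_principal_version` is a hypothetical helper, not an SDK function.

```python
import os
from typing import Mapping, Optional


def resource_principal_version(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the resource principal version advertised by the environment, if any."""
    env = os.environ if env is None else env
    # Assumption: the SDK reads this variable to decide how to build the signer.
    value = env.get("OCI_RESOURCE_PRINCIPAL_VERSION", "").strip()
    return value or None


rp_version = resource_principal_version()
if rp_version is None:
    print("No resource principal detected; the signer call above will likely fail here.")
else:
    print("Resource principal version:", rp_version)
```

Outside an OCI-managed runtime (for example, on a laptop) the variable is normally absent, and a different authentication method is needed.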

## 0.5 Basic bucket listing

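This is a quick check of both connectivity and policy: the listing succeeds only if the notebook can reach Object Storage and the resource principal is allowed to list objects in the bucket.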

In [None]:
from typing import List

BUCKET_NAME: str = "<BUCKET_NAME>"
PREFIX: str = "<PROJECT_PREFIX>/v1/raw/datasets/insects_kaggle_v1/"

resp = os_client.list_objects(namespace, BUCKET_NAME, prefix=PREFIX, limit=10)
objects: List[str] = [o.name for o in resp.data.objects]
print("Sample objects:")
for name in objects:
    print(" -", name)

## 0.6 (Optional) Print full counts under train/ and test/

This can take time if there are many objects: the listing is paginated, so counting needs one request per 1,000 objects.

In [None]:
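def count_objects(prefix: str) -> int:
    """Count all objects under `prefix`, following pagination to the last page."""
    total: int = 0
    next_start: str | None = None
    while True:
        # Fetch one page of object summaries, starting where the previous page ended.
        r = os_client.list_objects(namespace, BUCKET_NAME, prefix=prefix, start=next_start, limit=1000)
        total += len(r.data.objects)
        # next_start_with is empty once the last page has been returned.
        next_start = r.data.next_start_with
        if not next_start:
            break
    return total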

train_prefix = PREFIX + "train/"
test_prefix  = PREFIX + "test/"

print("train count:", count_objects(train_prefix))
print("test (val) count:", count_objects(test_prefix))
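
Total counts hide class imbalance, so a per-class breakdown can be a useful follow-up. The sketch below assumes that objects are laid out as `<split prefix><class name>/<file>`, the usual folder-per-class convention for classification datasets; this notebook does not confirm that layout. `count_by_class` is a hypothetical helper, and the sample object names are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_class(object_names: Iterable[str], split_prefix: str) -> Dict[str, int]:
    """Count objects per class folder, assuming names like <split_prefix><class>/<file>."""
    counts: Counter = Counter()
    for name in object_names:
        if not name.startswith(split_prefix):
            continue
        relative = name[len(split_prefix):]
        class_name, sep, filename = relative.partition("/")
        # Skip objects directly under the split and empty folder placeholders.
        if not sep or not filename:
            continue
        counts[class_name] += 1
    return dict(counts)


# Made-up object names, for illustration only.
sample = [
    "p/train/ants/001.jpg",
    "p/train/ants/002.jpg",
    "p/train/bees/001.jpg",
    "p/train/bees/",        # folder placeholder, ignored
    "p/train/README.txt",   # not inside a class folder, ignored
]
print(count_by_class(sample, "p/train/"))  # prints {'ants': 2, 'bees': 1}
```

In the notebook, the names would come from the `name` attribute of the objects returned by each `list_objects` page.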