# Training on the Crowd Human Dataset

The model is first trained on the [Crowd Human](https://www.crowdhuman.org/) dataset, a benchmark for detecting people in crowded scenes. This stage teaches the model the general features it needs to detect human bodies and heads in images.

Note: This notebook is set up to run on Google Colab with a GPU runtime, which significantly speeds up training. The paths used below assume that Google Drive is mounted at `/content/drive`.
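
Before training, it can help to confirm that the runtime actually has a GPU and that Google Drive is mounted. The following is a minimal sketch and not part of the original notebook: `pick_device` and `mount_drive_if_colab` are hypothetical helper names, and the GPU check assumes that `nvidia-smi` is on the PATH whenever a GPU runtime is active.

```python
import shutil
import subprocess


def pick_device():
    """Return 0 (first GPU) if nvidia-smi lists a GPU, otherwise 'cpu'."""
    if shutil.which("nvidia-smi") is None:
        return "cpu"
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return "cpu"
    # `nvidia-smi -L` prints one line per GPU, each starting with "GPU".
    has_gpu = result.returncode == 0 and "GPU" in result.stdout
    return 0 if has_gpu else "cpu"


def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running on Colab; do nothing elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


mounted = mount_drive_if_colab()
device = pick_device()
print(f"Drive mounted: {mounted}, training device: {device}")
```

The value returned by `pick_device` can be passed as the `device` argument of `model.train`.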

In [None]:
# Install the library with YOLO models
%pip install ultralytics

In [None]:
from ultralytics import settings, YOLO
settings.update({"wandb": False})

# Load a COCO-pretrained YOLO11n model
model = YOLO("yolo11n.pt")

# Train the model on the Crowd Human dataset for 100 epochs
model.train(
    data="/content/drive/MyDrive/crowd_human/data.yaml", # Path to the dataset configuration file in YAML format
    project="/content/drive/MyDrive/crowd_human/runs",   # Directory to save training results and logs
    epochs=100,                                          # Number of training epochs
    batch=16,                                            # Batch size for training
    imgsz=640,                                           # Input image size (width and height)
    device=0,                                            # GPU device ID (use 0 for the first GPU, or 'cpu' for CPU training)
    save=True,                                           # Whether to save the trained model and checkpoints
)
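
The `data` argument points to a dataset configuration file in the Ultralytics YAML format, which lists the dataset root, the image folders, and the class names. The sketch below writes such a file; the folder layout and the class names (`head`, `person`) are assumptions chosen for illustration, and `write_data_yaml` is a hypothetical helper, so adjust both to match the actual dataset export.

```python
from pathlib import Path

# Assumed layout and class names; adjust to match the actual dataset export.
DATA_YAML = """\
path: /content/drive/MyDrive/crowd_human  # dataset root
train: images/train  # training images, relative to path
val: images/val      # validation images, relative to path
names:
  0: head
  1: person
"""


def write_data_yaml(target, text=DATA_YAML):
    """Write the dataset configuration to `target` and return its path."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

On Colab this would be called as `write_data_yaml("/content/drive/MyDrive/crowd_human/data.yaml")` before starting training.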

# Fine-Tuning on Custom Dataset

The next step is to fine-tune the model using a custom-labeled dataset obtained from original museum videos. This process is necessary to address several challenges:

1. Background: The museum environment contains numerous statues and exhibits that the pre-trained model frequently misclassifies as human beings.
2. Site Conditions: The museum videos have distinctive camera angles and lighting conditions that are not adequately represented in the Crowd Human dataset, necessitating adaptation.
3. Crowd Densities: The model needs to see the museum-specific crowd densities to improve its prediction confidence and accuracy.

The custom dataset comprises 230 images, split into 220 images for training and 10 images for validation. Given the small size of the dataset, the validation set has been handcrafted to include both trivial and non-trivial scenarios:

- Trivial Scenarios: All human beings in the room are fully visible.
- Non-Trivial Scenarios: Some human beings are partially obscured (either by other visitors or by exhibition items), requiring the model to handle occlusions effectively.
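
A hand-picked split like the one described above can be expressed in a few lines. This is only a sketch: the file names are placeholders, `split_dataset` is a hypothetical helper, and the validation frames are selected arithmetically here, whereas in practice they would be chosen by inspecting the images.

```python
def split_dataset(image_names, handpicked_val):
    """Split images into train/val using a manually chosen validation set."""
    val_set = set(handpicked_val)
    missing = val_set - set(image_names)
    if missing:
        raise ValueError(f"Validation images not in dataset: {sorted(missing)}")
    train = [name for name in image_names if name not in val_set]
    val = [name for name in image_names if name in val_set]
    return train, val


# Hypothetical file names standing in for the 230 labeled museum frames.
images = [f"frame_{i:03d}.jpg" for i in range(230)]
# Placeholder for the 10 hand-picked validation frames (a mix of fully
# visible and partially occluded scenes would be chosen by inspection).
val_choice = [f"frame_{i:03d}.jpg" for i in range(0, 230, 23)]

train_images, val_images = split_dataset(images, val_choice)
print(len(train_images), len(val_images))  # 220 10
```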

Fine-tuning starts from the best checkpoint of the Crowd Human training run and uses a larger input size (960 instead of 640) to adapt the model to the museum environment.

In [None]:
from ultralytics import settings, YOLO
settings.update({"wandb": False})

# Load the best weights from the Crowd Human training session
model = YOLO("/content/drive/MyDrive/crowd_human/runs/train/weights/best.pt")

model.train(
    data="/content/drive/MyDrive/museum_human/data.yaml", # Path to the dataset configuration file in YAML format
    project="/content/drive/MyDrive/museum_human/runs",   # Directory to save training results and logs
    epochs=100,                                           # Number of training epochs
    batch=16,                                             # Batch size for training
    imgsz=960,                                            # Input image size (width and height)
    device=0,                                             # GPU device ID (use 0 for the first GPU, or 'cpu' for CPU training)
    save=True,                                            # Whether to save the trained model and checkpoints
)

# Results

The model performed well on key object detection metrics:

- mAP50: 0.986  
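  Mean average precision at an IoU threshold of 0.5. The high value indicates that almost all people are detected with few false positives, although this threshold is lenient about exact box placement.  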

- mAP50-95: 0.757  
  Mean average precision averaged over IoU thresholds from 0.5 to 0.95. Shows that box localization remains good under stricter overlap requirements.  

- Box Precision: 0.95  
  The fraction of predicted boxes that match a ground-truth object. A value of 0.95 means that false positives are rare.
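
To make the IoU thresholds behind mAP50 and mAP50-95 concrete, the sketch below computes the intersection over union of two boxes and checks which thresholds the prediction would pass. It is a simplified illustration with made-up boxes, not the evaluation code Ultralytics runs internally.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

ground_truth = (100, 100, 200, 300)  # made-up box for illustration
prediction = (110, 100, 210, 300)    # same size, shifted 10 px to the right

overlap = iou(ground_truth, prediction)
passed = [t for t in THRESHOLDS if overlap >= t]
print(f"IoU = {overlap:.3f}, counts as a match at thresholds {passed}")
```

A prediction that is slightly misplaced still counts as correct at the 0.5 threshold but fails the strictest ones, which is why mAP50-95 is typically lower than mAP50.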