# Pig Behavior Detection - YOLO Training on Colab

This notebook trains a YOLO model to classify pig behaviors:
- **Distress behaviors**: tail_biting, ear_biting, aggression
- **Normal behaviors**: eating, sleeping, rooting

## Setup Instructions

1. **Enable GPU**: Runtime → Change runtime type → GPU → Save
2. **Upload your data**:
   - Upload videos to `videos/` folder
   - Upload JSON annotations to `annotations/` folder
3. **Run all cells** in order

The trained model will be saved as `best.pt` - download it when training completes!
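
Before running the remaining cells, it can help to confirm that the GPU runtime from step 1 is actually active. The sketch below is not part of the original notebook: it is a heuristic that only checks whether the `nvidia-smi` tool is on the PATH and runs successfully, which is assumed to hold on Colab GPU runtimes. `gpu_runtime_active` is a hypothetical helper name.

```python
import shutil
import subprocess

def gpu_runtime_active() -> bool:
    """Return True if `nvidia-smi` exists and exits successfully (heuristic)."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    try:
        result = subprocess.run([smi], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0

if gpu_runtime_active():
    print("GPU runtime detected")
else:
    print("No GPU detected: check Runtime > Change runtime type")
```

If the second message appears, training will still run on CPU, but it is likely to take far longer than the estimate given later in this notebook.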


In [None]:
# Install dependencies
!pip install ultralytics opencv-python-headless pyyaml -q

print("✓ Dependencies installed")


In [None]:
# Create directory structure
import os
from pathlib import Path

directories = [
    'videos',
    'annotations',
    'pig_crops/train',
    'pig_crops/val',
    'yolo_dataset/train',
    'yolo_dataset/val',
]

for dir_path in directories:
    os.makedirs(dir_path, exist_ok=True)

print("✓ Directory structure created")
print("\nNext: Upload your videos to 'videos/' and JSON files to 'annotations/'")


## Step 1: Import from Google Drive (FASTEST METHOD!)

If you uploaded your `videos/` and `annotations/` folders to Google Drive, use this method instead of direct upload.

**Steps:**
1. Upload your folders to Google Drive (anywhere in "My Drive")
2. Run the cell below
3. It will ask for permission to access Drive
4. Enter the path to your folders in Drive
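
The cell below builds the full path by prepending `/content/drive/` to whatever is typed, so a leading slash or an already complete path produces a location that does not exist. A minimal sketch of a more forgiving normalization follows; `to_drive_path` and `DRIVE_ROOT` are hypothetical names and are not used by the original cell.

```python
import posixpath

DRIVE_ROOT = "/content/drive"  # mount point passed to drive.mount()

def to_drive_path(user_input: str) -> str:
    """Accept 'MyDrive/videos', '/MyDrive/videos/' or a full path; return one absolute path."""
    cleaned = user_input.strip().strip("/")
    prefix = DRIVE_ROOT.strip("/") + "/"
    # Drop the mount point if the user already typed it
    if (cleaned + "/").startswith(prefix):
        cleaned = cleaned[len(prefix):]
    return posixpath.join(DRIVE_ROOT, cleaned)

print(to_drive_path("MyDrive/videos"))
print(to_drive_path("/content/drive/MyDrive/Faunavision/videos/"))
```

Both calls resolve to a path under `/content/drive/MyDrive/`, whichever form was entered.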


In [None]:
# Import from Google Drive
from google.colab import drive
import shutil
import os
from pathlib import Path

# Mount Google Drive
print("Mounting Google Drive...")
print("You'll be asked to authorize access - click 'Connect to Google Drive'")
drive.mount('/content/drive')

print("\n✓ Google Drive mounted!")
print("\n" + "="*60)
print("IMPORTANT: Find your folders in Google Drive")
print("="*60)
print("\nYour folders should be somewhere like:")
print("  /content/drive/MyDrive/videos/")
print("  /content/drive/MyDrive/annotations/")
print("\nOR they might be in a subfolder like:")
print("  /content/drive/MyDrive/Faunavision/videos/")
print("  /content/drive/MyDrive/Faunavision/annotations/")
print("\n" + "="*60)

# Ask user for the path
print("\nEnter the path to your videos folder in Google Drive:")
print("Example: MyDrive/videos  or  MyDrive/Faunavision/videos")
drive_videos_path = input("Path to videos folder: ").strip()

print("\nEnter the path to your annotations folder in Google Drive:")
print("Example: MyDrive/annotations  or  MyDrive/Faunavision/annotations")
drive_annotations_path = input("Path to annotations folder: ").strip()

# Construct full paths
full_videos_path = f"/content/drive/{drive_videos_path}"
full_annotations_path = f"/content/drive/{drive_annotations_path}"

# Verify paths exist
if not os.path.exists(full_videos_path):
    print(f"\n❌ Error: Videos folder not found at: {full_videos_path}")
    print("\nAvailable folders in /content/drive/MyDrive/:")
    try:
        for item in os.listdir("/content/drive/MyDrive/")[:10]:
            print(f"  - {item}")
    except OSError:
        # Drive not mounted or not readable; nothing to list
        pass
else:
    print(f"\n✓ Found videos folder: {full_videos_path}")

if not os.path.exists(full_annotations_path):
    print(f"\n❌ Error: Annotations folder not found at: {full_annotations_path}")
else:
    print(f"✓ Found annotations folder: {full_annotations_path}")

# Copy files
if os.path.exists(full_videos_path) and os.path.exists(full_annotations_path):
    print("\n📁 Copying files from Google Drive to Colab...")
    
    # Create local folders
    os.makedirs('videos', exist_ok=True)
    os.makedirs('annotations', exist_ok=True)
    
    # Copy videos
    video_files = list(Path(full_videos_path).glob('*.mp4'))
    print(f"\nCopying {len(video_files)} video files...")
    for video_file in video_files:
        shutil.copy2(video_file, f'videos/{video_file.name}')
    print(f"✓ Copied {len(video_files)} videos to videos/")
    
    # Copy annotations
    json_files = list(Path(full_annotations_path).glob('*.json'))
    print(f"\nCopying {len(json_files)} JSON files...")
    for json_file in json_files:
        shutil.copy2(json_file, f'annotations/{json_file.name}')
    print(f"✓ Copied {len(json_files)} JSON files to annotations/")
    
    print("\n" + "="*60)
    print("✓✓✓ IMPORT COMPLETE! ✓✓✓")
    print("="*60)
    print(f"\nVideos: {len(video_files)} files")
    print(f"Annotations: {len(json_files)} files")
    print("\nYou can now proceed to the next cells!")
else:
    print("\n⚠️  Please check the paths and try again.")
    print("\nTip: You can also list files with:")
    print("  !ls /content/drive/MyDrive/")


## Alternative: Direct Upload (if not using Google Drive)

If you prefer to upload directly to Colab instead of using Google Drive, use the cells below.


### Option A: Upload via Colab file browser
1. Click the folder icon (📁) on the left sidebar
2. Navigate to `videos/` and `annotations/` folders
3. Click "Upload" and select your files

### Option B: Upload via code (run the cell below)


In [None]:
# Upload files directly (alternative method if not using Google Drive)
from google.colab import files
import shutil

print("Upload your video files (MP4 format):")
uploaded_videos = files.upload()
for filename in uploaded_videos.keys():
    shutil.move(filename, f'videos/{filename}')
    print(f"✓ Moved {filename} to videos/")

print("\nUpload your JSON annotation files:")
uploaded_jsons = files.upload()
for filename in uploaded_jsons.keys():
    if filename.endswith('.json'):
        shutil.move(filename, f'annotations/{filename}')
        print(f"✓ Moved {filename} to annotations/")

print("\n✓ Upload complete!")


In [None]:
# Verify uploaded files
import os

videos = [f for f in os.listdir('videos') if f.endswith('.mp4')]
jsons = [f for f in os.listdir('annotations') if f.endswith('.json')]

print(f"Videos found: {len(videos)}")
for v in videos[:5]:
    print(f"  - {v}")
if len(videos) > 5:
    print(f"  ... and {len(videos) - 5} more")

print(f"\nJSON files found: {len(jsons)}")
for j in jsons[:5]:
    print(f"  - {j}")
if len(jsons) > 5:
    print(f"  ... and {len(jsons) - 5} more")

if len(videos) == 0 or len(jsons) == 0:
    print("\n⚠️  Warning: No videos or JSON files found. Please upload your data first!")
else:
    print("\n✓ Files ready for processing")


## Step 2: Parse Annotations and Extract Crops

This step reads your JSON annotations and extracts cropped pig images from videos, organized by behavior class.
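
The parser in the next cell expects a top-level `objects` list, one entry per tracked pig, each with a `frames` list. The sketch below shows a file of that shape and the box conversion the parser applies. The field names are taken from the parser (note the spelling `behaviour`); the values themselves are made up for illustration, and `to_corners` is a hypothetical helper name.

```python
import json

# Hypothetical sample; field names mirror those read by parse_json_annotation
sample = json.loads("""
{
  "objects": [
    {
      "id": 3,
      "frames": [
        {
          "frameNumber": 120,
          "bbox": {"x": 50, "y": 80, "width": 200, "height": 100},
          "behaviour": "eat",
          "visible": true,
          "isGroundTruth": true
        }
      ]
    }
  ]
}
""")

def to_corners(bbox: dict) -> list:
    """Convert {x, y, width, height} to [x1, y1, x2, y2] pixel corners."""
    x, y = bbox["x"], bbox["y"]
    return [x, y, x + bbox["width"], y + bbox["height"]]

frame = sample["objects"][0]["frames"][0]
corners = to_corners(frame["bbox"])
print(corners)  # [50, 80, 250, 180]
```

Files without an `objects` key produce no crops, because the parser returns an empty list for them.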


In [None]:
# Parse annotations and extract crops
import json
import os
import cv2
from pathlib import Path
from collections import Counter

# Behavior mapping from your JSON labels to our classes
BEHAVIOR_MAPPING = {
    'sleep': 'sleeping',
    'lying': 'sleeping',
    'eat': 'eating',
    'drink': 'eating',
    'walk': 'rooting',
    'run': 'rooting',
    'standing': 'rooting',
    'sitting': 'rooting',
    'investigating': 'rooting',
    'playwithtoy': 'rooting',
    'jumpontopof': 'rooting',
    'fight': 'aggression',
    'chase': 'aggression',
    'nose-poke-elsewhere': 'tail_biting',
    'nose-to-nose': 'ear_biting',
    'other': 'rooting',
    # Direct matches
    'tail_biting': 'tail_biting',
    'ear_biting': 'ear_biting',
    'aggression': 'aggression',
    'eating': 'eating',
    'sleeping': 'sleeping',
    'rooting': 'rooting',
}

BEHAVIOR_CLASSES = {
    'tail_biting': 0,
    'ear_biting': 1,
    'aggression': 2,
    'eating': 3,
    'sleeping': 4,
    'rooting': 5,
}

def parse_json_annotation(json_path):
    """Parse JSON annotation file."""
    with open(json_path, 'r') as f:
        data = json.load(f)
    
    if isinstance(data, dict) and 'objects' in data:
        converted = []
        for obj in data['objects']:
            pig_id = obj.get('id', 'unknown')
            frames_list = obj.get('frames', [])
            
            frames = []
            bboxes = []
            behaviors = []
            visibilities = []
            ground_truths = []
            
            for frame_obj in frames_list:
                frames.append(frame_obj.get('frameNumber', 0))
                
                bbox_dict = frame_obj.get('bbox', {})
                if isinstance(bbox_dict, dict):
                    x = bbox_dict.get('x', 0)
                    y = bbox_dict.get('y', 0)
                    w = bbox_dict.get('width', 0)
                    h = bbox_dict.get('height', 0)
                    bboxes.append([x, y, x + w, y + h])
                else:
                    bboxes.append([])
                
                behavior = frame_obj.get('behaviour', '')
                behaviors.append(behavior)
                
                visible = frame_obj.get('visible', True)
                visibilities.append(1.0 if visible else 0.0)
                
                gt = frame_obj.get('isGroundTruth', True)
                ground_truths.append(gt)
            
            converted.append({
                'tracking_id': pig_id,
                'frames': frames,
                'bounding_box': bboxes,
                'behavior_label': behaviors,
                'visibility': visibilities,
                'ground_truth': ground_truths
            })
        
        return converted
    return []

def extract_crops(video_path, annotations, output_dir, min_visibility=0.5):
    """Extract cropped images from video."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        print(f"Error: Could not open {video_path}")
        return 0
    
    saved_count = 0
    
    for pig_annotation in annotations:
        tracking_id = pig_annotation.get('tracking_id', 'unknown')
        frames = pig_annotation.get('frames', [])
        bboxes = pig_annotation.get('bounding_box', [])
        behavior_labels = pig_annotation.get('behavior_label', [])
        visibilities = pig_annotation.get('visibility', [])
        ground_truths = pig_annotation.get('ground_truth', [])
        
        for i, frame_num in enumerate(frames):
            if i >= len(bboxes) or i >= len(behavior_labels):
                continue
            
            bbox = bboxes[i]
            behavior_label = behavior_labels[i] if i < len(behavior_labels) else ''
            visibility = visibilities[i] if i < len(visibilities) else 1.0
            ground_truth = ground_truths[i] if i < len(ground_truths) else True
            
            if not ground_truth or visibility < min_visibility:
                continue
            
            # Map behavior
            behavior_clean = behavior_label.lower().strip()
            mapped_behavior = BEHAVIOR_MAPPING.get(behavior_clean, behavior_clean)
            class_id = BEHAVIOR_CLASSES.get(mapped_behavior)
            
            if class_id is None:
                continue
            
            # Read frame
            cap.set(cv2.CAP_PROP_POS_FRAMES, frame_num)
            ret, frame = cap.read()
            if not ret:
                continue
            
            # Crop
            if len(bbox) == 4:
                x1, y1, x2, y2 = bbox
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
                h, w = frame.shape[:2]
                x1, y1 = max(0, x1), max(0, y1)
                x2, y2 = min(w, x2), min(h, y2)
                
                if x2 > x1 and y2 > y1:
                    crop = frame[y1:y2, x1:x2]
                    
                    # Save
                    behavior_dir = output_dir / mapped_behavior
                    behavior_dir.mkdir(exist_ok=True)
                    image_name = f"{Path(video_path).stem}_pig{tracking_id}_frame{frame_num:06d}.jpg"
                    image_path = behavior_dir / image_name
                    cv2.imwrite(str(image_path), crop)
                    saved_count += 1
    
    cap.release()
    return saved_count

# Process all videos
print("Processing videos and extracting crops...")
total_saved = 0
behavior_counts = Counter()

for video_file in os.listdir('videos'):
    if not video_file.endswith('.mp4'):
        continue
    
    video_path = Path('videos') / video_file
    json_file = video_file.replace('.mp4', '.json')
    json_path = Path('annotations') / json_file
    
    if not json_path.exists():
        print(f"⚠️  No JSON found for {video_file}, skipping...")
        continue
    
    print(f"\nProcessing: {video_file}")
    annotations = parse_json_annotation(json_path)
    print(f"  Found {len(annotations)} pig annotations")
    
    saved = extract_crops(video_path, annotations, 'pig_crops/train', min_visibility=0.5)
    total_saved += saved
    print(f"  Extracted {saved} cropped images")

print(f"\n✓ Total cropped images extracted: {total_saved}")

# Count images per behavior
for behavior in BEHAVIOR_CLASSES.keys():
    behavior_dir = Path('pig_crops/train') / behavior
    if behavior_dir.exists():
        count = len(list(behavior_dir.glob('*.jpg')))
        if count > 0:
            print(f"  {behavior}: {count} images")


## Step 3: Split into Train/Val Sets

Move a random 20% of each behavior's crops from `pig_crops/train/` to `pig_crops/val/`.


In [None]:
# Split into train/val (80/20)
import shutil
import random

random.seed(42)  # For reproducibility

for behavior in BEHAVIOR_CLASSES.keys():
    train_dir = Path('pig_crops/train') / behavior
    val_dir = Path('pig_crops/val') / behavior
    
    if not train_dir.exists():
        continue
    
    val_dir.mkdir(parents=True, exist_ok=True)
    
    images = list(train_dir.glob('*.jpg'))
    random.shuffle(images)
    
    val_count = int(len(images) * 0.2)  # 20% for validation
    
    for img in images[:val_count]:
        shutil.move(str(img), str(val_dir / img.name))
    
    print(f"{behavior}: {len(images) - val_count} train, {val_count} val")

print("\n✓ Train/val split complete")
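
The split cell computes the validation size with `int(len(images) * 0.2)`, which truncates toward zero. As a small worked illustration (with `split_counts` as a hypothetical helper name), the sketch below shows that a class with fewer than five crops ends up with no validation images at all.

```python
def split_counts(n_images: int, val_fraction: float = 0.2) -> tuple:
    """Return (train, val) sizes using the same int() truncation as the split cell."""
    val = int(n_images * val_fraction)
    return n_images - val, val

for n in (4, 5, 23):
    print(n, split_counts(n))
```

For rare behaviors it may be worth checking the printed per-class counts before training.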


## Step 4: Prepare YOLO Dataset Format

Copy the cropped images into the folder-per-class layout that YOLO classification expects, for example `yolo_dataset/train/eating/` and `yolo_dataset/val/eating/`. The class label comes from the folder name, so no label files are written.


In [None]:
# Prepare YOLO classification dataset (one folder per class)
import shutil
from pathlib import Path

def prepare_yolo_dataset(crops_dir, output_dir):
    """Copy crops into output_dir/CLASS_NAME/ and return the image count."""
    crops_dir = Path(crops_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Remove detection-style 'images'/'labels' folders left over from setup;
    # the classification loader would otherwise pick them up as class folders.
    for stale in ('images', 'labels'):
        shutil.rmtree(output_dir / stale, ignore_errors=True)

    total = 0

    for behavior in BEHAVIOR_CLASSES:
        behavior_dir = crops_dir / behavior
        if not behavior_dir.exists():
            continue

        class_dir = output_dir / behavior
        class_dir.mkdir(parents=True, exist_ok=True)

        for img_path in behavior_dir.glob('*.jpg'):
            shutil.copy(img_path, class_dir / img_path.name)
            total += 1

    return total

# Prepare train and val datasets
print("Preparing YOLO dataset format...")
train_count = prepare_yolo_dataset('pig_crops/train', 'yolo_dataset/train')
val_count = prepare_yolo_dataset('pig_crops/val', 'yolo_dataset/val')

print(f"✓ Train images: {train_count}")
print(f"✓ Val images: {val_count}")


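## Step 5: Verify the Dataset

YOLO classification reads class names from the folder names under `yolo_dataset/train/` and `yolo_dataset/val/`, so no `data.yaml` file is needed. The cell below reports how many images each class has.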


In [None]:
# For YOLO classification, we just need the directory path
# The model will automatically detect classes from folder names

print("✓ Dataset ready for classification training")
print("\nDataset path: yolo_dataset/")
print("Classes detected from folder names:")
for behavior in BEHAVIOR_CLASSES.keys():
    train_path = Path('yolo_dataset/train') / behavior
    val_path = Path('yolo_dataset/val') / behavior
    train_count = len(list(train_path.glob('*.jpg'))) if train_path.exists() else 0
    val_count = len(list(val_path.glob('*.jpg'))) if val_path.exists() else 0
    if train_count > 0 or val_count > 0:
        print(f"  - {behavior}: {train_count} train, {val_count} val")


## Step 6: Train YOLO Model

This is the main training step. It will take 1-4 hours depending on your dataset size.
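
The training cell below sets `patience=10`, so a run can end well before the 100-epoch limit. The sketch below is a simplified illustration of patience-based early stopping, not the Ultralytics implementation: it assumes a single per-epoch score where higher is better, and `stopped_epoch` is a hypothetical helper name.

```python
def stopped_epoch(scores, patience=10):
    """Return the 1-based epoch at which training would stop, or None if it runs to the end.

    `scores` is a per-epoch validation metric where higher is better.
    """
    best = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best:
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return None

# Hypothetical accuracy curve: improves for 3 epochs, then plateaus
history = [0.60, 0.70, 0.75] + [0.74] * 20
print(stopped_epoch(history, patience=10))
```

With this made-up curve the best score occurs at epoch 3, so the run stops at epoch 13, ten epochs later.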


In [None]:
# Train YOLO model
from ultralytics import YOLO

print("Starting YOLO classification training...")
print("This may take 1-4 hours depending on your dataset size.")
print("Early stopping: Training will stop if no improvement for 10 epochs.")
print("You can monitor progress below.\n")

# Initialize model (using YOLOv8 classification)
model = YOLO('yolov8n-cls.pt')  # Use 'yolov8s-cls.pt' or 'yolov8m-cls.pt' for larger models

# Train
# For classification, data should be the directory path, not a YAML file
results = model.train(
    data='yolo_dataset',  # Directory path, not YAML file
    epochs=100,
    imgsz=224,
    batch=16,
    name='pig_behavior_classification',
    project='.',
    patience=10,  # Early stopping: stop if no improvement for 10 epochs
    save=True,
    plots=True
)

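print("\n✓ Training complete!")

# model.trainer.save_dir is the run folder created by Ultralytics
import os
save_dir = str(model.trainer.save_dir)
print(f"\nBest model saved at: {save_dir}/weights/best.pt")

# results.csv holds a header row plus one row per completed epoch
results_csv = os.path.join(save_dir, 'results.csv')
if os.path.exists(results_csv):
    with open(results_csv) as f:
        epochs_run = sum(1 for line in f if line.strip()) - 1
    print(f"Training ran for {epochs_run} epochs")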


## Step 7: Download Trained Model

Download the `best.pt` model file to use in your local project.


In [None]:
# Download the trained model
from google.colab import files
import os

# Find the best model
model_path = None
for root, dirs, files_list in os.walk('pig_behavior_classification'):
    if 'best.pt' in files_list:
        model_path = os.path.join(root, 'best.pt')
        break

if model_path and os.path.exists(model_path):
    print(f"✓ Found model at: {model_path}")
    print("\nDownloading model...")
    files.download(model_path)
    print("\n✓ Model downloaded! Save it to your project's models/ folder.")
else:
    print("⚠️  Model not found. Check the training output above.")


## Next Steps

1. **Download the model**: The `best.pt` file should have downloaded above
2. **Save to your project**: Move it to `models/` folder in your FaunaVision project
3. **Set environment variable**: 
   ```bash
   export YOLO_MODEL_PATH="models/best.pt"
   ```
4. **Run your backend**: The model will automatically be used for video analysis!
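
How the FaunaVision backend loads the model is not shown in this notebook, so the following is only a sketch of one way to use `best.pt` locally, assuming the `ultralytics` package is installed. `resolve_model_path` and `classify_image` are hypothetical helper names; `YOLO_MODEL_PATH` is the environment variable from step 3.

```python
import os
from pathlib import Path

def resolve_model_path(default: str = "models/best.pt") -> Path:
    """Read YOLO_MODEL_PATH from the environment, falling back to a default."""
    return Path(os.environ.get("YOLO_MODEL_PATH", default))

def classify_image(image_path: str):
    """Return (class_name, confidence) for one cropped pig image."""
    from ultralytics import YOLO  # imported lazily; requires `pip install ultralytics`

    model = YOLO(str(resolve_model_path()))
    result = model(image_path)[0]   # one Results object per input image
    top1 = result.probs.top1        # index of the highest-scoring class
    return result.names[top1], float(result.probs.top1conf)

print(f"Model path: {resolve_model_path()}")
```

Because the model was trained on cropped pig images, `classify_image` is meant to receive a single-pig crop rather than a full video frame.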

---

**Training complete!** 🎉
