# FarmWise: Farmland Segmentation and Size Classification with YOLOv8

**Date**: April 14, 2025

This notebook implements a farm segmentation system using the YOLOv8 architecture to identify agricultural fields from satellite imagery, calculate their sizes, and classify them for targeted recommendations.

## Project Overview

**Goal**: Create a system that can:
1. Detect and segment farmlands from satellite imagery using YOLOv8
2. Calculate the size/area of each identified farm
3. Classify farms by size (small, medium, large)
4. Enable a recommendation system based on farm size classification

**Approach**: YOLOv8 architecture for instance segmentation

## 1. Business Understanding

### 1.1 Problem Statement

Agricultural recommendations are most effective when tailored to the specific context of a farm, with farm size being a crucial factor. Large farms may benefit from different techniques, equipment, and crop selections compared to small ones. This project aims to automatically classify farms by size from satellite imagery using YOLOv8 segmentation to enable targeted recommendations.

### 1.2 Success Criteria

- **Technical Success**: Achieve strong segmentation quality on the validation split (e.g., mask mAP50-95 > 0.5)
- **Business Success**: Enable accurate size-based classification of farms for targeted recommendations
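For reference, mAP50-95 averages average precision over ten IoU thresholds (0.50, 0.55, up to 0.95). For a segmentation model, the IoU is computed between predicted and ground-truth masks rather than boxes. The sketch below computes mask IoU for one prediction/ground-truth pair with NumPy and lists the thresholds at which that pair would count as a match. It illustrates only the matching criterion; the full AP computation in Ultralytics also ranks predictions by confidence and integrates the precision-recall curve. `mask_iou` is a hypothetical helper, not a library function.

```python
import numpy as np

# The ten IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = np.round(np.arange(0.50, 0.96, 0.05), 2)

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two masks of the same shape (nonzero = foreground)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)

# Toy example: a 10x10 ground-truth field and a prediction shifted by one column
gt = np.zeros((20, 20), dtype=bool)
gt[5:15, 5:15] = True
pred = np.zeros_like(gt)
pred[5:15, 6:16] = True

iou = mask_iou(pred, gt)
passed = [float(t) for t in IOU_THRESHOLDS if iou >= t]
print(f"IoU = {iou:.3f}; counts as a match at thresholds {passed}")
```

A one-pixel boundary shift on a small field already drops IoU below the stricter thresholds, which is why mAP50-95 is a demanding target for field boundaries.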

## 2. Data Acquisition and Understanding

### 2.1 Setup and Environment Preparation

In [None]:
# Install necessary packages: ultralytics for YOLOv8, roboflow for data download, and others
!pip install torch torchvision torchaudio ultralytics roboflow opencv-python matplotlib numpy pillow pyyaml scikit-learn scikit-image tqdm

In [None]:
# Import required libraries
import os
import numpy as np
import matplotlib.pyplot as plt
import torch
from PIL import Image
import cv2
from skimage import measure
from tqdm.notebook import tqdm
from roboflow import Roboflow
from ultralytics import YOLO # Import YOLO
import yaml
import json

# Set random seeds for reproducibility (less critical for YOLO training itself, but good practice)
torch.manual_seed(42)
np.random.seed(42)

# Check for GPU availability and set up CUDA device
print("CUDA Available:", torch.cuda.is_available())
if torch.cuda.is_available():
    num_gpus = torch.cuda.device_count()
    print(f"Number of GPUs available: {num_gpus}")
    for i in range(num_gpus):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
    device = torch.device('cuda:0') # YOLO typically uses device 0 by default, but can be specified
    print("Using GPU")
else:
    device = torch.device('cpu')
    print("No GPU available, using CPU. Training will be significantly slower.")

# Display CUDA version if available
if torch.cuda.is_available():
    print(f"CUDA Version: {torch.version.cuda}")
    print(f"Current CUDA device: {torch.cuda.current_device()}")

### 2.2 Data Acquisition from Roboflow

In [None]:
# Initialize Roboflow and load dataset
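# Note: You will need to provide your Roboflow API key.
# Read it from an environment variable instead of hard-coding it in the notebook.
try:
    rf = Roboflow(api_key=os.environ.get("ROBOFLOW_API_KEY", "YOUR_API_KEY"))  # Set ROBOFLOW_API_KEY or replace the placeholder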
    project = rf.workspace("sid-mp92l").project("final-detectron-2")
    # Ensure you download the 'yolov8' format if available, or a compatible segmentation format
    dataset = project.version(1).download("yolov8")
    dataset_path = dataset.location
    data_yaml_path = os.path.join(dataset_path, "data.yaml")
    print(f"Dataset downloaded to: {dataset_path}")
    print(f"Data YAML path: {data_yaml_path}")
    # Verify data.yaml exists
    if not os.path.exists(data_yaml_path):
        print("\nERROR: data.yaml not found in the downloaded dataset location!")
        print("YOLOv8 training requires this file. Please check the download format and location.")
        data_yaml_path = None # Prevent use of non-existent path
except Exception as e:
    print(f"Error downloading dataset from Roboflow: {e}")
    print("Please check your API key and project details.")
    dataset_path = None
    data_yaml_path = None

### 2.3 Dataset Exploration (Optional)
YOLOv8 handles data loading internally, but we can still explore the structure.

In [None]:
# Explore the dataset structure
def explore_directory(path, level=0):
    if not os.path.exists(path):
        print(f"Directory not found: {path}")
        return
    print('  ' * level + f"|-- {os.path.basename(path)}")
    if os.path.isdir(path):
        items = os.listdir(path)
        count = 0
        for item in items:
            if count >= 10 and level > 0: # Limit display depth for subdirs
                 break
            item_path = os.path.join(path, item)
            if os.path.isdir(item_path):
                explore_directory(item_path, level + 1)
            else:
                print('  ' * (level + 1) + f"|-- {item}")
            count += 1
        if len(items) > count:
            print('  ' * (level + 1) + f"|-- ... ({len(items) - count} more items)")

if dataset_path:
    print("Dataset Structure:")
    explore_directory(dataset_path)
else:
    print("Dataset path not defined. Skipping exploration.")

In [None]:
# Visualize some sample images with their annotations (using YOLO format)
import random

def visualize_yolo_samples(data_yaml_path, num_samples=3):
    if not data_yaml_path or not os.path.exists(data_yaml_path):
        print("Cannot visualize samples: data.yaml path is invalid.")
        return

    try:
        with open(data_yaml_path, 'r') as f:
            data_cfg = yaml.safe_load(f)
    except Exception as e:
        print(f"Error reading data.yaml: {e}")
        return

    # Construct paths relative to the YAML file location
    base_dir = os.path.dirname(data_yaml_path)
    train_rel = data_cfg.get('train', 'train/images')
    train_img_dir = os.path.join(base_dir, train_rel)
    train_label_dir = os.path.join(base_dir, train_rel.replace('images', 'labels'))  # Heuristic: labels/ sits next to images/

    if not os.path.isdir(train_img_dir) or not os.path.isdir(train_label_dir):
        print(f"Error: Training image or label directory not found.")
        print(f"Checked img: {train_img_dir}")
        print(f"Checked label: {train_label_dir}")
        # Try alternative common structures if the first guess failed
        train_img_dir = os.path.join(base_dir, 'train', 'images')
        train_label_dir = os.path.join(base_dir, 'train', 'labels')
        if not os.path.isdir(train_img_dir) or not os.path.isdir(train_label_dir):
             print("Alternative paths also failed. Cannot visualize.")
             return
        else:
             print("Using alternative paths: train/images and train/labels")

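    class_names = data_cfg.get('names', ['Unknown'])
    if isinstance(class_names, dict):  # data.yaml may map ids to names, e.g. {0: 'field'}
        class_names = [class_names[k] for k in sorted(class_names)]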
    print(f"Class names: {class_names}")

    img_files = [f for f in os.listdir(train_img_dir) if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
    if not img_files:
        print(f"No images found in {train_img_dir}")
        return

    sample_files = random.sample(img_files, min(num_samples, len(img_files)))

    plt.figure(figsize=(15, 5 * num_samples))

    for i, img_file in enumerate(sample_files):
        img_path = os.path.join(train_img_dir, img_file)
        label_file = os.path.splitext(img_file)[0] + '.txt'
        label_path = os.path.join(train_label_dir, label_file)

        try:
            img = cv2.imread(img_path)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            h, w, _ = img.shape
        except Exception as e:
            print(f"Error loading image {img_file}: {e}")
            continue

        img_draw = img.copy() # Create a copy to draw on

        if os.path.exists(label_path):
            with open(label_path, 'r') as f:
                lines = f.readlines()

            for line in lines:
                parts = line.strip().split()
                if len(parts) < 5:
                    continue

                class_id = int(parts[0])
                class_name = class_names[class_id] if class_id < len(class_names) else f"Class {class_id}"
                color = plt.get_cmap('tab10')(class_id % 10)[:3]  # plt.cm.get_cmap was removed in Matplotlib 3.9
                color = tuple(int(c * 255) for c in color)

                # Check for segmentation format (class_id x1 y1 x2 y2 ... xN yN)
                if len(parts) > 5 and len(parts) % 2 == 1:
                    points_norm = np.array([float(p) for p in parts[1:]]).reshape(-1, 2)
                    points_pixel = (points_norm * np.array([w, h])).astype(np.int32)
                    cv2.polylines(img_draw, [points_pixel], isClosed=True, color=color, thickness=2)
                    # Optionally fill polygon
                    # overlay = img_draw.copy()
                    # cv2.fillPoly(overlay, [points_pixel], color)
                    # alpha = 0.4
                    # img_draw = cv2.addWeighted(overlay, alpha, img_draw, 1 - alpha, 0)
                    label_pos = points_pixel.min(axis=0)
                    cv2.putText(img_draw, class_name, (label_pos[0], max(0, label_pos[1]-5)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
                # Check for detection format (class_id cx cy w h)
                elif len(parts) == 5:
                    cx, cy, bw, bh = map(float, parts[1:])
                    x1 = int((cx - bw / 2) * w)
                    y1 = int((cy - bh / 2) * h)
                    x2 = int((cx + bw / 2) * w)
                    y2 = int((cy + bh / 2) * h)
                    cv2.rectangle(img_draw, (x1, y1), (x2, y2), color, 2)
                    cv2.putText(img_draw, class_name, (x1, max(0, y1-5)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)

        plt.subplot(num_samples, 1, i + 1)
        plt.imshow(img_draw)
        plt.title(f"Image: {img_file}")
        plt.axis('off')

    plt.tight_layout()
    plt.show()

# Run visualization
visualize_yolo_samples(data_yaml_path, num_samples=3)
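The segmentation label format parsed above (`class_id x1 y1 ... xN yN`, with coordinates normalized to [0, 1]) also allows estimating a ground-truth field's area directly from its polygon. A minimal sketch using the shoelace formula, assuming the polygon is simple (not self-intersecting); `polygon_area_pixels` is a hypothetical helper, not part of Ultralytics:

```python
from typing import Tuple

def polygon_area_pixels(label_line: str, img_w: int, img_h: int) -> Tuple[int, float]:
    """Return (class_id, area in pixels) for one YOLO segmentation label line.

    The label stores normalized coordinates, so they are scaled by the image
    width and height before applying the shoelace formula.
    """
    parts = label_line.split()
    class_id = int(parts[0])
    coords = [float(p) for p in parts[1:]]
    if len(coords) < 6 or len(coords) % 2 != 0:
        raise ValueError("A polygon needs at least 3 (x, y) pairs")
    xs = [x * img_w for x in coords[0::2]]
    ys = [y * img_h for y in coords[1::2]]
    # Shoelace formula: 0.5 * |sum(x_i * y_{i+1} - x_{i+1} * y_i)|, wrapping around
    n = len(xs)
    twice_area = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return class_id, abs(twice_area) / 2.0

# A square covering the central quarter of a 640x640 image
cls, area = polygon_area_pixels("0 0.25 0.25 0.75 0.25 0.75 0.75 0.25 0.75", 640, 640)
print(cls, area)
```

Predicted masks are rasterized, so their pixel counts will differ slightly from this polygon area, mostly along field boundaries.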


### 2.4 Data Preparation (Handled by YOLOv8)

YOLOv8 handles data loading, transformations, and augmentation internally based on the `data.yaml` file and training arguments. We don't need a custom `Dataset` or `DataLoader`.

In [None]:
# No custom Dataset class is needed: YOLOv8 loads images and labels itself from data.yaml.
print("Data preparation is handled internally by YOLOv8 based on the data.yaml file.")
print("Ensure the paths in data.yaml are correct relative to its location.")

# Verify the content of data.yaml
if data_yaml_path and os.path.exists(data_yaml_path):
    print("\n--- Contents of data.yaml ---")
    try:
        with open(data_yaml_path, 'r') as f:
            print(f.read())
        print("---------------------------")
    except Exception as e:
        print(f"Error reading data.yaml: {e}")
else:
    print("\nWarning: data.yaml not found or path is invalid. Cannot verify contents.")

In [None]:
# No DataLoader or transform pipeline is needed: YOLOv8 builds them during training.
print("DataLoader setup and transformations are managed by YOLOv8 during training.")
print("Augmentations can be configured via arguments in the model.train() call.")

## 3. Modeling with YOLOv8

### 3.1 Initialize YOLOv8 Segmentation Model

In [None]:
# Initialize a YOLOv8 segmentation model
# We can start with a pretrained model like 'yolov8n-seg.pt'
# Other options: yolov8s-seg.pt, yolov8m-seg.pt, yolov8l-seg.pt, yolov8x-seg.pt
try:
    model = YOLO('yolov8n-seg.pt')  # Load a pretrained segmentation model
    print("YOLOv8 segmentation model loaded successfully.")
    # Move model to the appropriate device (though YOLO often handles this internally)
    model.to(device)
except Exception as e:
    print(f"Error loading YOLO model: {e}")
    print("Ensure 'yolov8n-seg.pt' is downloadable or provide a local path.")
    model = None

In [None]:
# No separate optimizer or loss setup is needed: YOLOv8's train() configures both.
print("YOLOv8 model initialization done in the previous cell.")
print("Optimizer and loss are handled internally by YOLOv8's train method.")

### 3.2 Training the YOLOv8 Model

In [None]:
# Train the YOLOv8 model

if model and data_yaml_path:
    print(f"Starting YOLOv8 training with data: {data_yaml_path}")
    try:
        # Define training parameters
        epochs = 30 # Adjust as needed
        img_size = 640 # YOLOv8 default, adjust based on dataset/GPU memory
        batch_size = 16 # Adjust based on GPU memory (e.g., 8, 16, 32). Use -1 for auto-batch.

        # Start training
        results = model.train(
            data=data_yaml_path,
            epochs=epochs,
            imgsz=img_size,
            batch=batch_size,
            device=0 if torch.cuda.is_available() else 'cpu', # Specify device
            project='FarmWise_YOLOv8_Training', # Project folder for results
            name='exp', # Experiment name (subfolder)
            exist_ok=True, # Reuse the 'exp' folder instead of creating exp2, exp3, ... (earlier results may be overwritten)
            # Add other augmentations/parameters as needed:
            # degrees=10, # Random rotation
            # flipud=0.5, # Random vertical flip
            # mosaic=1.0, # Mosaic augmentation (usually enabled by default)
            # patience=10 # Early stopping patience
        )
        print("YOLOv8 training completed.")
        print(f"Training results saved in: {results.save_dir}")
        # The best model weights are typically saved as 'best.pt' in the experiment folder
        best_model_path = os.path.join(results.save_dir, 'weights', 'best.pt')
        print(f"Best model saved at: {best_model_path}")

        # Load the best model for subsequent steps
        model = YOLO(best_model_path)
        model.to(device)

    except Exception as e:
        print(f"An error occurred during YOLOv8 training: {e}")
        import traceback
        traceback.print_exc()
        best_model_path = None
else:
    print("Skipping training: Model or data.yaml path not available.")
    best_model_path = None

### 3.3 Validation (Optional)
YOLOv8 automatically validates during training. We can also run validation separately.

In [None]:
# Validate the trained model (optional, as validation happens during training)
if model and data_yaml_path and best_model_path:
    print("\nRunning validation on the best model...")
    try:
        metrics = model.val(
            data=data_yaml_path,
            imgsz=640, # Use the same image size as training
            split='val' # Specify the validation split
        )
        print("Validation Metrics:")
        # Access specific metrics, e.g., segmentation mAP
        print(f"  mAP50-95(Seg): {metrics.seg.map:.4f}")
        print(f"  mAP50(Seg): {metrics.seg.map50:.4f}")
    except Exception as e:
        print(f"An error occurred during validation: {e}")
else:
    print("Skipping validation: Model or data.yaml path not available.")

### 3.4 Visualize Training Results
YOLOv8 saves training plots (like loss curves, metrics) in the results directory.

In [None]:
# Display training results plots saved by YOLOv8
results_dir = None
if 'results' in locals() and hasattr(results, 'save_dir'):
    results_dir = results.save_dir
elif best_model_path:
    # Try to infer results dir from best_model_path
    results_dir = os.path.dirname(os.path.dirname(best_model_path))

if results_dir and os.path.isdir(results_dir):
    print(f"Displaying results from: {results_dir}")
    results_png_path = os.path.join(results_dir, 'results.png')
    confusion_matrix_path = os.path.join(results_dir, 'confusion_matrix.png')

    if os.path.exists(results_png_path):
        print("\n--- Training Metrics Plot ---")
        display(Image.open(results_png_path))
    else:
        print(f"results.png not found in {results_dir}")

    if os.path.exists(confusion_matrix_path):
        print("\n--- Confusion Matrix ---")
        display(Image.open(confusion_matrix_path))
    # else:
        # print(f"confusion_matrix.png not found in {results_dir}")
else:
    print("Could not find YOLOv8 results directory. Cannot display plots.")
    print("Check the 'FarmWise_YOLOv8_Training/exp*' folders.")

## 4. Farm Size Calculation and Classification

Now we'll use our trained YOLOv8 model to segment farms and calculate their sizes.

In [None]:
# Ensure the best model is loaded
if best_model_path and os.path.exists(best_model_path):
    model = YOLO(best_model_path)
    model.to(device)
    print(f"Loaded best model from {best_model_path}")
elif 'model' not in locals() or model is None:
    print("Error: No trained YOLOv8 model available.")
    # Attempt to load a default if needed, but this likely won't be trained on the specific task
    # model = YOLO('yolov8n-seg.pt')
    # model.to(device)
else:
    print("Using the model currently in memory (might not be the best one if training failed).")

# Function to segment farms in an image using YOLOv8
def segment_farms_yolo(model, image_path, confidence_threshold=0.25):
    if model is None:
        print("Model not available for segmentation.")
        return None, None, None
    try:
        # Load image using PIL for original size info
        img_pil = Image.open(image_path).convert("RGB")
        original_size = img_pil.size # (width, height)

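        # Perform prediction; retina_masks=True returns masks at the original image resolution,
        # so they align with the image without undoing YOLO's letterbox padding
        results = model.predict(image_path, conf=confidence_threshold, device=device, retina_masks=True)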

        # Check if results were obtained
        if not results or len(results) == 0:
            print("No results returned from model prediction.")
            return None, np.array(img_pil), None

        # Assuming results[0] contains the prediction for the single image
        pred_result = results[0]

        # Combine masks from all detected instances into a single binary mask
        combined_mask = np.zeros(pred_result.orig_shape, dtype=np.uint8)

        if pred_result.masks is not None:
            print(f"Found {len(pred_result.masks)} potential farm segments.")
            for i, mask_data in enumerate(pred_result.masks):
                # mask_data.data holds this instance's mask as a tensor of 0/1 values;
                # squeeze() drops any leading singleton dimension to give [H, W]
                mask_tensor = mask_data.data.squeeze()
                mask_np = mask_tensor.cpu().numpy().astype(np.uint8)

                # Resize mask to original image size if necessary
                if mask_np.shape != pred_result.orig_shape:
                     mask_np_resized = cv2.resize(mask_np, (original_size[0], original_size[1]), interpolation=cv2.INTER_NEAREST)
                else:
                     mask_np_resized = mask_np

                # Add this mask to the combined mask (use bitwise OR)
                combined_mask = cv2.bitwise_or(combined_mask, mask_np_resized * 255) # Multiply by 255 if mask is 0/1
        else:
            print("No segmentation masks found in the prediction results.")

        return combined_mask, np.array(img_pil), pred_result # Return raw results too

    except FileNotFoundError:
        print(f"Error: Image file not found at {image_path}")
        return None, None, None
    except Exception as e:
        print(f"Error during YOLO segmentation: {e}")
        import traceback
        traceback.print_exc()
        # Try to load image anyway for context
        try:
            img_pil = Image.open(image_path).convert("RGB")
            return None, np.array(img_pil), None
        except Exception:
            return None, None, None

# Function to calculate farm sizes from the combined binary mask
def calculate_farm_sizes(binary_mask, meters_per_pixel=None):
    """Label connected regions of a binary mask and return (farm_areas, labeled_mask).

    meters_per_pixel is the ground sampling distance of the image. When it is
    given, areas are returned in hectares; otherwise they are pixel counts.
    """
    if binary_mask is None:
        return [], None
    # Use connected component analysis to identify individual farms
    # Convert the 0/255 mask to 0/1
    binary_mask_01 = (binary_mask > 128).astype(np.uint8)
    labeled_mask, num_farms = measure.label(binary_mask_01, connectivity=2, return_num=True)

    print(f"Found {num_farms} connected components.")

    # Calculate properties of each labeled region
    regions = measure.regionprops(labeled_mask)

    # Store farm areas
    farm_areas = []

    for region in regions:
        # Skip very small regions (likely noise)
        if region.area < 100:  # Adjust threshold as needed
            continue

        # Calculate area in pixels
        area_pixels = region.area

        # Convert to real-world units if the ground sampling distance is provided
        if meters_per_pixel is not None:
            # Each pixel covers meters_per_pixel ** 2 square meters
            area_sq_meters = area_pixels * (meters_per_pixel ** 2)
            # Convert to hectares (1 hectare = 10,000 sq meters)
            area_hectares = area_sq_meters / 10000
            farm_areas.append(area_hectares)
        else:
            farm_areas.append(area_pixels)

    print(f"Calculated areas for {len(farm_areas)} farms (after filtering small regions).")
    return farm_areas, labeled_mask

# Function to classify farms by size
def classify_farms(farm_areas, unit='pixels'):
    # Define size thresholds (adjust based on your specific context)
    if unit == 'hectares':
        # Real-world thresholds (in hectares)
        small_threshold = 10     # 0-10 hectares = small farm
        medium_threshold = 50    # 10-50 hectares = medium farm
        # > 50 hectares = large farm
    else:
        # Pixel-based thresholds (adjust based on your image resolution)
        small_threshold = 5000     # 0-5000 pixels = small farm
        medium_threshold = 20000   # 5000-20000 pixels = medium farm
        # > 20000 pixels = large farm

    # Classify each farm
    farm_classes = []
    for area in farm_areas:
        if area < small_threshold:
            farm_classes.append('Small')
        elif area < medium_threshold:
            farm_classes.append('Medium')
        else:
            farm_classes.append('Large')

    # Count farms in each category
    class_counts = {
        'Small': farm_classes.count('Small'),
        'Medium': farm_classes.count('Medium'),
        'Large': farm_classes.count('Large')
    }

    return farm_classes, class_counts
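Merging all instance masks into one binary mask and re-splitting it with connected components has a side effect: adjacent fields whose masks touch are counted as a single, larger farm. Because YOLOv8 already returns one mask per instance (`pred_result.masks.data`, shape `(N, H, W)`), areas can instead be measured per instance. The sketch below does this with NumPy, treating `meters_per_pixel` as the ground sampling distance and reusing this notebook's hectare thresholds (10 and 50 ha). `instance_areas` and `classify_area` are hypothetical helpers, and overlapping instances are not de-duplicated:

```python
import numpy as np
from typing import List, Optional

def instance_areas(masks: np.ndarray, meters_per_pixel: Optional[float] = None,
                   min_pixels: int = 100) -> List[float]:
    """Area of each instance mask in an (N, H, W) stack.

    Returns hectares when meters_per_pixel is given, otherwise pixel counts.
    Instances smaller than min_pixels are dropped as noise.
    """
    masks = np.asarray(masks) > 0.5            # binarize float or 0/1 masks
    pixel_counts = masks.reshape(masks.shape[0], -1).sum(axis=1)
    areas = []
    for count in pixel_counts:
        if count < min_pixels:
            continue
        if meters_per_pixel is None:
            areas.append(float(count))
        else:
            # pixels * (m/px)^2 = square meters; 10,000 m^2 = 1 hectare
            areas.append(float(count) * meters_per_pixel ** 2 / 10_000)
    return areas

def classify_area(area_ha: float, small: float = 10, medium: float = 50) -> str:
    """Same hectare cut-offs as classify_farms()."""
    if area_ha < small:
        return "Small"
    if area_ha < medium:
        return "Medium"
    return "Large"

# Two touching 200x200-pixel fields at 0.5 m/px: each is 40,000 * 0.25 m^2 = 1 ha
stack = np.zeros((2, 400, 400), dtype=np.uint8)
stack[0, 0:200, 0:200] = 1
stack[1, 0:200, 200:400] = 1
areas = instance_areas(stack, meters_per_pixel=0.5)
print(areas, [classify_area(a) for a in areas])
```

Connected components on the union of these two masks would report one 80,000-pixel (2 ha) region instead of two 1 ha fields. In `segment_farms_yolo`, the stack would come from `pred_result.masks.data.cpu().numpy()`; those masks are at the model's inference resolution unless `retina_masks=True` is passed to `model.predict`.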

In [None]:
# Function to visualize farm segmentation and classification (using labeled mask)
def visualize_farm_classification(image, labeled_mask, farm_areas, farm_classes):
    if image is None or labeled_mask is None:
        print("Cannot visualize: Image or labeled mask is missing.")
        return

    # Create a colormap for visualization
    # Use specific colors for classes
    colors = {
        'Small': [0, 255, 0, 128], # Green
        'Medium': [255, 255, 0, 128], # Yellow
        'Large': [255, 0, 0, 128] # Red
    }
    legend_colors = {'Small': 'green', 'Medium': 'yellow', 'Large': 'red'}

    # Create a colored overlay based on farm classification
    overlay = np.zeros((image.shape[0], image.shape[1], 4), dtype=np.uint8)

    regions = measure.regionprops(labeled_mask)
    valid_region_indices = [i for i, r in enumerate(regions) if r.area >= 100] # Indices of farms that passed area threshold

    if len(valid_region_indices) != len(farm_classes):
        print(f"Warning: Mismatch between number of classified farms ({len(farm_classes)}) and valid regions ({len(valid_region_indices)}). Visualization might be incomplete.")
        # Attempt to proceed, assuming farm_classes corresponds to the filtered regions

    class_idx = 0
    for i, region in enumerate(regions):
        if region.area < 100:
            continue

        if class_idx < len(farm_classes):
            farm_class = farm_classes[class_idx]
            color_value = colors.get(farm_class, [128, 128, 128, 128]) # Default gray
            # Fill region with corresponding color
            coords = region.coords
            overlay[coords[:, 0], coords[:, 1]] = color_value
            class_idx += 1
        else:
            # Handle mismatch if necessary
            print(f"Warning: No class found for region {i+1}. Skipping color.")

    # Plot results
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 7))

    # Original image with segmentation overlay
    ax1.imshow(image)
    ax1.imshow(overlay)
    ax1.set_title('Farm Segmentation and Classification')
    ax1.axis('off')

    # Create custom legend
    legend_elements = [
        plt.Rectangle((0, 0), 1, 1, color=legend_colors['Small'], alpha=0.5, label='Small Farms'),
        plt.Rectangle((0, 0), 1, 1, color=legend_colors['Medium'], alpha=0.5, label='Medium Farms'),
        plt.Rectangle((0, 0), 1, 1, color=legend_colors['Large'], alpha=0.5, label='Large Farms')
    ]
    ax1.legend(handles=legend_elements, loc='upper right')

    # Pie chart of farm size distribution
    class_counts = {
        'Small': farm_classes.count('Small'),
        'Medium': farm_classes.count('Medium'),
        'Large': farm_classes.count('Large')
    }

    if sum(class_counts.values()) > 0:  # Check if we have any farms
        labels = list(class_counts.keys())
        sizes = list(class_counts.values())
        pie_colors = [legend_colors[l] for l in labels]

        ax2.pie(sizes, labels=labels, colors=pie_colors, autopct='%1.1f%%', startangle=90)
        ax2.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
        ax2.set_title('Farm Size Distribution')
    else:
        ax2.text(0.5, 0.5, 'No farms detected or classified', horizontalalignment='center', verticalalignment='center')
        ax2.axis('off')

    plt.tight_layout()
    plt.show()

In [None]:
# Process a sample image using YOLOv8
def process_sample_image_yolo(image_path, pixel_scale=None):
    if not model:
        print("Model not available. Cannot process image.")
        return None, None, None

    # Segment farms using YOLOv8
    # Use a lower confidence if needed to capture more potential segments
    binary_mask, image, raw_results = segment_farms_yolo(model, image_path, confidence_threshold=0.25)

    if image is None:
        print("Failed to load or process image.")
        return None, None, None

    if binary_mask is None:
        print("Segmentation failed or produced no mask.")
        # Show the original image anyway
        plt.imshow(image)
        plt.title("Original Image (Segmentation Failed)")
        plt.axis('off')
        plt.show()
        return [], [], {'Small': 0, 'Medium': 0, 'Large': 0}

    # Calculate farm sizes
    unit = 'hectares' if pixel_scale else 'pixels'
    farm_areas, labeled_mask = calculate_farm_sizes(binary_mask, pixel_scale)

    # Classify farms
    farm_classes, class_counts = classify_farms(farm_areas, unit=unit)

    # Print results
    print(f"\n--- Analysis for {os.path.basename(image_path)} ---")
    print(f"Number of farms detected and classified: {len(farm_areas)}")
    print(f"Farm size classification ({unit}): {class_counts}")

    # Visualize results using the labeled mask from connected components
    visualize_farm_classification(image, labeled_mask, farm_areas, farm_classes)

    # Optional: Visualize raw YOLO predictions
    if raw_results:
        print("\n--- Raw YOLOv8 Prediction Visualization ---")
        try:
            # Use YOLO's built-in plotting
            pred_plot = raw_results.plot() # Returns numpy array (BGR)
            plt.figure(figsize=(10, 10))
            plt.imshow(cv2.cvtColor(pred_plot, cv2.COLOR_BGR2RGB))
            plt.title("Raw YOLOv8 Detections/Segmentations")
            plt.axis('off')
            plt.show()
        except Exception as plot_e:
            print(f"Could not plot raw YOLO results: {plot_e}")

    return farm_areas, farm_classes, class_counts

# Try with a test image from the validation set
test_img_dir = None
if dataset_path:
    # Construct path relative to dataset location
    base_dir = dataset_path
    # Try finding validation images path from data.yaml if possible
    val_img_path_rel = 'valid/images' # Default guess
    if data_yaml_path and os.path.exists(data_yaml_path):
        try:
            with open(data_yaml_path, 'r') as f:
                data_cfg = yaml.safe_load(f)
                val_img_path_rel = data_cfg.get('val', val_img_path_rel)
        except Exception as e:
            print(f"Warning: Could not read val path from data.yaml: {e}")
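    test_img_dir = os.path.join(base_dir, val_img_path_rel)
    # Roboflow exports may store paths like '../valid/images'; fall back to the conventional layout
    if not os.path.isdir(test_img_dir):
        test_img_dir = os.path.join(base_dir, 'valid', 'images')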

if test_img_dir and os.path.isdir(test_img_dir):
    test_img_files = [f for f in os.listdir(test_img_dir) if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
    if test_img_files:
        # Select a random image or the first one
        # test_img_path = os.path.join(test_img_dir, random.choice(test_img_files))
        test_img_path = os.path.join(test_img_dir, test_img_files[0])
        print(f"\nProcessing test image: {test_img_path}")
        # Note: Provide pixel_scale if known, e.g., pixel_scale=0.5 (meters per pixel)
        farm_areas, farm_classes, class_counts = process_sample_image_yolo(test_img_path, pixel_scale=None)
    else:
        print(f"No image files found in validation directory: {test_img_dir}")
else:
    print("Validation image directory not found or dataset path not set. Cannot process test image.")

## 5. Recommendation System Based on Farm Size
This section depends only on the size-classification output, so it works regardless of which segmentation model produced the masks.

In [None]:
# Define recommendations based on farm size
def get_recommendations_by_size(farm_size):
    recommendations = {
        'Small': {
            'Crop Selection': [
                'Focus on high-value crops (e.g., specialty vegetables, herbs, berries)',
                'Consider intercropping to maximize land use',
                'Explore vertical farming techniques for space optimization'
            ],
            'Equipment': [
                'Invest in versatile, small-scale equipment',
                'Consider equipment sharing programs or cooperatives',
                'Focus on precision hand tools for specialized tasks'
            ],
            'Marketing': [
                'Direct-to-consumer sales (farmers markets, CSA)',
                'Develop value-added products',
                'Leverage organic or specialty certifications'
            ],
            'Sustainability': [
                'Implement intensive organic practices',
                'Consider agroecological approaches',
                'Explore permaculture design principles'
            ]
        },
        'Medium': {
            'Crop Selection': [
                'Balance between specialty and commodity crops',
                'Consider crop rotation systems',
                'Explore diversification strategies'
            ],
            'Equipment': [
                'Invest in mid-sized tractors and implements',
                'Consider precision agriculture technology',
                'Develop efficient irrigation systems'
            ],
            'Marketing': [
                'Develop relationships with local wholesalers and restaurants',
                'Consider cooperative marketing',
                'Explore agritourism opportunities'
            ],
            'Sustainability': [
                'Implement integrated pest management',
                'Consider conservation tillage practices',
                'Develop soil health management plans'
            ]
        },
        'Large': {
            'Crop Selection': [
                'Focus on efficient production of commodity crops',
                'Consider dedicating portions to specialty high-value crops',
                'Implement strategic crop rotation systems'
            ],
            'Equipment': [
                'Invest in large-scale, efficient machinery',
                'Implement precision agriculture and automation',
                'Consider GPS guidance systems and variable rate technology'
            ],
            'Marketing': [
                'Develop contracts with processors and distributors',
                'Consider futures markets and hedging strategies',
                'Explore export opportunities'
            ],
            'Sustainability': [
                'Implement conservation agriculture practices at scale',
                'Consider renewable energy investments',
                'Develop comprehensive nutrient management plans'
            ]
        }
    }

    return recommendations.get(farm_size, {})

# Function to display recommendations for a specific farm
def display_farm_recommendations(farm_class):
    recommendations = get_recommendations_by_size(farm_class)

    if not recommendations:
        print(f"No recommendations available for {farm_class} farms.")
        return

    print(f"\n=== Recommendations for {farm_class} Farms ===\n")

    for category, items in recommendations.items():
        print(f"\n{category}:")
        for item in items:
            print(f"  • {item}")

    print("\n" + "=" * 50)

In [None]:
# Display recommendations for each farm size category
for size in ['Small', 'Medium', 'Large']:
    display_farm_recommendations(size)

## 6. End-to-End Pipeline

In [None]:
# End-to-end pipeline function using YOLOv8
def process_farm_image_pipeline(image_path, pixel_scale=None):
    """
    Process a satellite image using YOLOv8 to detect farms, classify them by size,
    and provide recommendations.

    Args:
        image_path (str): Path to the satellite image
        pixel_scale (float, optional): Scale factor in meters per pixel (if available)

    Returns:
        dict: A dictionary containing the results of the analysis
    """
    if not model:
        print("Model not available. Cannot run pipeline.")
        return None

    print(f"\n=== Running End-to-End Pipeline for: {image_path} ===")

    # Step 1: Segment farms using the trained YOLOv8 model
    binary_mask, image, raw_results = segment_farms_yolo(model, image_path, confidence_threshold=0.25)

    if image is None:
        print("Pipeline failed: Could not load or process image.")
        return None
    if binary_mask is None:
        print("Pipeline failed: Segmentation did not produce a mask.")
        # Show the original image for reference
        plt.imshow(image); plt.title("Original Image (Segmentation Failed)"); plt.axis('off'); plt.show()
        return {'num_farms': 0, 'farm_areas': [], 'farm_classes': [], 'class_counts': {'Small': 0, 'Medium': 0, 'Large': 0}, 'predominant_size': None}

    # Step 2: Calculate farm sizes
    unit = 'hectares' if pixel_scale else 'pixels'
    farm_areas, labeled_mask = calculate_farm_sizes(binary_mask, pixel_scale)

    # Step 3: Classify farms by size
    farm_classes, class_counts = classify_farms(farm_areas, unit)

    # Step 4: Print summary
    print(f"\nFarm Analysis Summary:")
    print(f"Total farms detected and classified: {len(farm_areas)}")
    print(f"Farm size distribution ({unit}): {class_counts}")

    # Step 5: Calculate predominant farm size
    predominant_size = None
    if len(farm_areas) > 0:
        # Find the class with the highest count; ties resolve to the first
        # class in class_counts' key order (max() keeps the first maximum)
        predominant_size = max(class_counts, key=class_counts.get)
        if class_counts[predominant_size] == 0:
            predominant_size = None  # No farms actually classified
        else:
            print(f"Predominant farm size: {predominant_size}")
            # Step 6: Provide recommendations based on predominant farm size
            display_farm_recommendations(predominant_size)
    else:
        print("No farms detected or classified in the image.")

    # Step 7: Visualize results
    if len(farm_areas) > 0:
        visualize_farm_classification(image, labeled_mask, farm_areas, farm_classes)
    else:
        # If no farms, show the original image and the (likely empty) binary mask
        fig, axes = plt.subplots(1, 2, figsize=(12, 6))
        axes[0].imshow(image); axes[0].set_title("Original Image"); axes[0].axis('off')
        axes[1].imshow(binary_mask, cmap='gray'); axes[1].set_title("Segmentation Mask (No Farms Found)"); axes[1].axis('off')
        plt.suptitle("No Farms Detected or Classified")
        plt.show()

    # Step 8: Return results
    results_dict = {
        'num_farms': len(farm_areas),
        'farm_areas': farm_areas,
        'farm_classes': farm_classes,
        'class_counts': class_counts,
        'predominant_size': predominant_size
    }

    # Save results to JSON
    try:
        results_filename = f"farm_analysis_{os.path.splitext(os.path.basename(image_path))[0]}.json"
        with open(results_filename, 'w') as f:
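            # Convert top-level arrays; the default= hook converts NumPy scalars
            # (e.g. np.int64 areas) nested inside lists or dicts
            serializable_results = {k: (v.tolist() if isinstance(v, np.ndarray) else v) for k, v in results_dict.items()}
            json.dump(serializable_results, f, indent=4,
                      default=lambda o: o.item() if isinstance(o, np.generic) else str(o))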
        print(f"\nAnalysis results saved to {results_filename}")
    except Exception as json_e:
        print(f"\nError saving results to JSON: {json_e}")

    print(f"=== Pipeline Finished for: {image_path} ===")
    return results_dict

# Test the end-to-end pipeline on a sample image (if available)
if test_img_dir and os.path.isdir(test_img_dir):
    test_img_files = [f for f in os.listdir(test_img_dir) if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
    if test_img_files:
        # Process the first validation image
        test_img_path_pipeline = os.path.join(test_img_dir, test_img_files[0])
        pipeline_results = process_farm_image_pipeline(test_img_path_pipeline, pixel_scale=None)
        # Process another image if available
        if len(test_img_files) > 1:
            test_img_path_pipeline_2 = os.path.join(test_img_dir, test_img_files[1])
            pipeline_results_2 = process_farm_image_pipeline(test_img_path_pipeline_2, pixel_scale=None)
    else:
        print("No validation images found to test the pipeline.")
else:
    print("Validation image directory not found. Cannot test pipeline.")
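
When `pixel_scale` is supplied, the pipeline reports areas in hectares. The conversion itself happens inside `calculate_farm_sizes`, which is not shown in this section; a minimal sketch of the arithmetic, assuming `pixel_scale` is the ground length of one side of a square pixel in meters, follows. The helper name `pixels_to_hectares` is hypothetical.

```python
def pixels_to_hectares(pixel_count, pixel_scale):
    """Convert a pixel count to hectares.

    Assumes square pixels whose side is `pixel_scale` meters, so each
    pixel covers pixel_scale**2 square meters.
    """
    square_meters = pixel_count * pixel_scale ** 2
    return square_meters / 10_000  # 1 hectare = 10,000 m^2


# Example: a 5,000-pixel field imaged at 3 m/pixel
print(pixels_to_hectares(5_000, 3.0))  # 5000 * 9 / 10000 = 4.5
```

Because the area grows with the square of `pixel_scale`, the same field halves its pixel count only if the resolution changes by a factor of about 1.41, which is why pixel-based size thresholds do not transfer between imagery sources.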

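`max(class_counts, key=class_counts.get)` returns the first key with the highest count, so a tie (for example, equal numbers of Small and Large farms) silently resolves to whichever class appears first in `class_counts`. If ties should be surfaced instead, a minimal sketch could return every tied class; the helper name `predominant_sizes` is hypothetical.

```python
def predominant_sizes(class_counts):
    """Return all size classes that share the highest non-zero count.

    Returns an empty list when no farms were classified.
    """
    if not class_counts:
        return []
    top = max(class_counts.values())
    if top == 0:
        return []
    return [size for size, count in class_counts.items() if count == top]


print(predominant_sizes({'Small': 3, 'Medium': 1, 'Large': 3}))  # ['Small', 'Large']
print(predominant_sizes({'Small': 0, 'Medium': 0, 'Large': 0}))  # []
```

The pipeline could then call `display_farm_recommendations` once per returned class rather than picking one arbitrarily.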
## 7. Conclusion and Next Steps

### 7.1 Summary

In this project, we have:
1. Loaded and processed a dataset of satellite imagery with farm annotations in YOLO format.
2. Initialized and trained a YOLOv8 segmentation model for farmland detection.
3. Developed methods to extract segmentation masks from YOLOv8 predictions and calculate farm sizes using connected components.
4. Created a classification system to categorize farms by size (Small, Medium, Large).
5. Built a recommendation system providing tailored advice based on farm size.
6. Integrated all components into an end-to-end pipeline for processing new images.

### 7.2 Limitations

Current limitations of the system include:
- Segmentation accuracy depends heavily on the quality and diversity of the training data and on the chosen YOLOv8 model size and hyperparameters.
- Size classification thresholds are heuristic and may need adjustment for different regions or image resolutions.
- Areas are reported in real-world units (hectares) only when the image scale (`pixel_scale`, in meters per pixel) is known; otherwise sizes are in pixels and are not comparable across image resolutions.
- Recommendations are general and do not account for specific climate, soil, or market conditions.
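- Because farm sizes are computed from connected components of a single merged binary mask, adjacent farms whose predicted masks touch (for example, due to imperfect segmentation along a shared boundary) are counted as one larger farm.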
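
The merging limitation can be reproduced on toy data. The sketch below uses a small hand-written 4-connected flood fill as a stand-in for the connected-component labeling the notebook presumably does with `skimage.measure` (so the example needs only NumPy), and two hypothetical rectangular "fields" that share an edge. In a real run, the per-instance masks would come from YOLOv8's results (for example `results[0].masks.data`, one mask per detected instance); measuring each instance mask separately avoids the merge, at the cost of double-counting pixels where instances overlap.

```python
from collections import deque

import numpy as np


def count_components(binary_mask):
    """Return the pixel area of each 4-connected foreground region."""
    h, w = binary_mask.shape
    seen = np.zeros(binary_mask.shape, dtype=bool)
    areas = []
    for r in range(h):
        for c in range(w):
            if binary_mask[r, c] and not seen[r, c]:
                # Breadth-first flood fill from this unvisited foreground pixel
                queue = deque([(r, c)])
                seen[r, c] = True
                area = 0
                while queue:
                    y, x = queue.popleft()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary_mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                areas.append(area)
    return areas


# Two hypothetical adjacent fields on a 6x8 grid, touching between columns 3 and 4
field_a = np.zeros((6, 8), dtype=bool)
field_a[1:5, 0:4] = True  # 4 x 4 = 16 pixels
field_b = np.zeros((6, 8), dtype=bool)
field_b[1:5, 4:8] = True  # 4 x 4 = 16 pixels

merged = field_a | field_b
print(count_components(merged))                     # [32]: one "farm"
print([int(m.sum()) for m in (field_a, field_b)])   # [16, 16]: two farms
```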

### 7.3 Future Improvements

Potential next steps for improving the system:
1. **Model Enhancements**:
   - Experiment with larger YOLOv8 models (e.g., yolov8m-seg, yolov8l-seg) if compute resources allow.
   - Tune hyperparameters (learning rate, augmentations, epochs) with systematic hyperparameter sweeps.
   - Train on a larger, more diverse dataset covering various farm types, lighting conditions, and geographical areas.
   - Explore techniques to better separate adjacent fields, such as measuring each YOLOv8 instance mask separately instead of labeling connected components of the merged mask.

2. **Size Calculation**:
   - Integrate with GIS tools or use image metadata (e.g., EXIF, GeoTIFF tags) to obtain accurate geospatial coordinates and determine the image scale (`pixel_scale`) automatically.
   - Account for terrain variations using Digital Elevation Models (DEMs) for more accurate area calculations.

3. **Recommendation System**:
   - Incorporate climate data (temperature, rainfall), soil type data, and market prices for more targeted recommendations.
   - Develop region-specific recommendation models.
   - Create a more interactive interface allowing users to input additional farm context.

4. **Deployment & User Interface**:
   - Export the trained YOLOv8 model to formats like ONNX or TensorRT for optimized inference.
   - Develop a web or mobile application for easier access, allowing users to upload imagery or select areas on a map.
   - Provide visualization tools for farmers to explore segmentation results and recommendations.

5. **Validation**:
   - Conduct field validation comparing calculated areas with actual farm measurements.
   - Collect feedback from agricultural experts and farmers on the accuracy and usefulness of the segmentations and recommendations.
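
Deriving `pixel_scale` from GeoTIFF metadata could start from the raster's affine geotransform, for example as returned by GDAL's `Dataset.GetGeoTransform()` in the six-coefficient order used below. A minimal sketch of the area arithmetic follows; it assumes a projected CRS whose units are meters (such as UTM). For a geographic (latitude/longitude) CRS, the result would be in square degrees and the image would need reprojecting first. The helper names are hypothetical.

```python
def pixel_area_m2(geotransform):
    """Ground area of one pixel from a GDAL-style geotransform.

    geotransform = (x_origin, pixel_width, row_rotation,
                    y_origin, column_rotation, pixel_height)
    Assumes a projected CRS measured in meters.
    """
    _, a, b, _, d, e = geotransform
    # One column step moves (a, d) and one row step moves (b, e) on the
    # ground; the pixel footprint is the parallelogram they span, |det|.
    return abs(a * e - b * d)


def equivalent_pixel_scale(geotransform):
    """Side length (m) of a square pixel with the same ground area."""
    return pixel_area_m2(geotransform) ** 0.5


# Hypothetical north-up 10 m image; pixel_height is negative by convention
gt = (500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0)
print(pixel_area_m2(gt))           # 100.0 square meters per pixel
print(equivalent_pixel_scale(gt))  # 10.0
```

The value from `equivalent_pixel_scale` is what `process_farm_image_pipeline` expects as `pixel_scale`.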

## 8. Model Export (YOLOv8)

YOLOv8 provides a simple method to export the trained model to various formats like ONNX, TensorRT, CoreML, etc., for deployment.

In [None]:
# Export the trained YOLOv8 model

if model and best_model_path:
    print(f"Exporting model from: {best_model_path}")
    try:
        # Export to ONNX format (recommended for cross-platform compatibility)
        onnx_path = model.export(format='onnx', imgsz=640) # Use the same imgsz as training/validation
        print(f"Model exported to ONNX format at: {onnx_path}")

        # Optional: Export to other formats if needed
        # torchscript_path = model.export(format='torchscript')
        # print(f"Model exported to TorchScript format at: {torchscript_path}")

        # For TensorRT export (requires TensorRT installation and GPU)
        # if torch.cuda.is_available():
        #     try:
        #         tensorrt_path = model.export(format='engine', device=0) # Specify GPU device
        #         print(f"Model exported to TensorRT engine at: {tensorrt_path}")
        #     except Exception as trt_e:
        #         print(f"Could not export to TensorRT: {trt_e}")
        # else:
        #     print("Skipping TensorRT export: No GPU available or TensorRT not installed.")

    except Exception as export_e:
        print(f"An error occurred during model export: {export_e}")
else:
    print("Skipping model export: No trained model available.")

## 9. Visualization of Predictions (YOLOv8)

Let's visualize the predictions of the final YOLOv8 model on a few validation images.

In [None]:
# Function to visualize YOLOv8 predictions on an image
def visualize_yolo_predictions(model, image_path, conf_threshold=0.25):
    if not model:
        print("Model not available.")
        return
    if not os.path.exists(image_path):
        print(f"Image not found: {image_path}")
        return

    print(f"Visualizing predictions for: {os.path.basename(image_path)}")
    try:
        # Run prediction
        results = model.predict(image_path, conf=conf_threshold, device=device)

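        # predict() returns one Results object per image even when nothing is
        # detected, so check the number of detected instances instead
        if results and results[0].boxes is not None and len(results[0].boxes) > 0: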
            # Use YOLO's built-in plot function
            pred_plot = results[0].plot() # Returns a BGR numpy array

            # Display the plot
            plt.figure(figsize=(12, 12))
            plt.imshow(cv2.cvtColor(pred_plot, cv2.COLOR_BGR2RGB))
            plt.title(f"YOLOv8 Predictions (Conf: {conf_threshold})")
            plt.axis('off')
            plt.show()
        else:
            print("No predictions found for this image.")
            # Show original image if no predictions
            img = Image.open(image_path)
            plt.imshow(img)
            plt.title("Original Image (No Predictions)")
            plt.axis('off')
            plt.show()

    except Exception as e:
        print(f"Error during prediction visualization: {e}")
        import traceback
        traceback.print_exc()

import random  # used by random.sample below

# Visualize predictions on a few validation images
if test_img_dir and os.path.isdir(test_img_dir):
    test_img_files = [f for f in os.listdir(test_img_dir) if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
    num_viz = min(3, len(test_img_files)) # Visualize up to 3 images
    if num_viz > 0:
        print(f"\n--- Visualizing Predictions on {num_viz} Validation Images ---")
        selected_files = random.sample(test_img_files, num_viz)
        for img_file in selected_files:
            viz_path = os.path.join(test_img_dir, img_file)
            visualize_yolo_predictions(model, viz_path, conf_threshold=0.25)
    else:
        print("No validation images found for visualization.")
else:
    print("Validation image directory not found. Cannot visualize predictions.")
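
`os.listdir()` returns filenames in arbitrary order, and `random.sample` draws from the global random state, so both the pipeline test image and the visualized images can change between runs. A minimal sketch of a reproducible selection, sorting first and using a dedicated generator, is below; the helper name `pick_images` and the seed value 42 are assumptions, not values from the notebook.

```python
import random


def pick_images(filenames, k, seed=42):
    """Choose up to k filenames reproducibly.

    Sorting removes the dependence on os.listdir() order, and a private
    random.Random(seed) is unaffected by other code using the global RNG.
    """
    ordered = sorted(filenames)
    rng = random.Random(seed)
    return rng.sample(ordered, min(k, len(ordered)))


files = ['c.jpg', 'a.png', 'b.jpeg', 'd.jpg']
print(pick_images(files, 3) == pick_images(list(reversed(files)), 3))  # True
```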

### End of Notebook
