One-Class Carrot Detection in Webcam Video

This code uses a pre-trained MobileNetV2 model to extract features from images. Features from carrot images train a One-Class SVM, which then classifies each live webcam frame as carrot or not carrot. Each detection is logged with a timestamp, and the predicted label is drawn on the frame as it is displayed. Finally, all detections from the capture (limited to 100 frames due to size limits) are saved to a CSV file.

Potential Future Improvements
- Add object detection for bounding boxes: To pinpoint where carrots are, not just if they're present.
- Implement tracking to count unique carrots: To avoid counting the same carrot multiple times across frames.
- Use a robustly trained binary classifier (carrot vs. non-carrot) for better performance: A one-class model never sees negative examples, so it can struggle with diverse "not carrot" inputs.
- Optimize for real-time performance (e.g., GPU, quantization): To ensure smooth, high-speed processing for live video.
- Handle multiple carrots in one frame: The current system only indicates presence, not the number of carrots.
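
As a rough illustration of the binary-classifier idea in the list above, the sketch below trains a scikit-learn `LogisticRegression` on pooled feature vectors. The arrays `carrot_feats` and `other_feats` are hypothetical stand-ins filled with random numbers so the example runs on its own; in practice they would hold MobileNetV2 features (1280 values per image) extracted from carrot and non-carrot images. The choice of logistic regression is an assumption, and any binary classifier could take its place.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-ins for pooled MobileNetV2 features (1280 values per image).
# Real features would come from carrot and non-carrot image folders.
carrot_feats = rng.normal(loc=1.0, scale=1.0, size=(200, 1280))
other_feats = rng.normal(loc=0.0, scale=1.0, size=(200, 1280))

X_all = np.vstack([carrot_feats, other_feats])
y_all = np.concatenate([np.ones(200, dtype=int), np.zeros(200, dtype=int)])  # 1 = carrot

# Hold out a quarter of the data to estimate accuracy on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0
)

binary_clf = LogisticRegression(max_iter=1000)
binary_clf.fit(X_train, y_train)

test_accuracy = binary_clf.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {test_accuracy:.2f}")
```

The synthetic classes are separated on purpose, so the accuracy printed here says nothing about real carrot images; it only shows the training and evaluation flow.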

In [None]:
from torchvision import models, transforms
from PIL import Image
import torch
import os

import numpy as np

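# Load pretrained feature extractor (convolutional layers only, no classifier head).
# The weights= argument replaces the deprecated pretrained=True (torchvision >= 0.13).
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1).features
model.eval()

# Resize, convert to a tensor, and apply the ImageNet normalization
# that the pretrained weights were trained with.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])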

def extract_feature(img_path):
    img = Image.open(img_path).convert('RGB')
    img_tensor = transform(img).unsqueeze(0)
    with torch.no_grad():
        feature = model(img_tensor).squeeze().mean([1, 2]).numpy()  # Global avg pooling
    return feature

# Folder of carrot images
carrot_dir = "../fruits-360/Training/Carrot 1"
features = []
for file in sorted(os.listdir(carrot_dir)):
    # Accept .jpg/.png regardless of letter case
    if file.lower().endswith((".jpg", ".png")):
        features.append(extract_feature(os.path.join(carrot_dir, file)))

# One row per training image
X = np.vstack(features)



In [None]:
from sklearn.svm import OneClassSVM

clf = OneClassSVM(kernel='rbf', gamma='scale', nu=0.1)
clf.fit(X)
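
To make the effect of `nu=0.1` more concrete: in scikit-learn, `nu` is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, so roughly up to 10% of the training carrots may end up outside the learned boundary. The sketch below shows this on synthetic data. `demo_X`, `demo_clf`, and `far_point` are hypothetical names, and the random 8-dimensional data is only a stand-in for the real carrot features.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical stand-in for the carrot feature matrix (500 samples, 8 dimensions).
demo_X = rng.normal(size=(500, 8))

demo_clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
demo_clf.fit(demo_X)

# predict() returns +1 for inliers and -1 for outliers.
train_outlier_fraction = float(np.mean(demo_clf.predict(demo_X) == -1))

# decision_function() gives a signed score; negative means outlier.
far_point = np.full((1, 8), 50.0)
far_score = float(demo_clf.decision_function(far_point)[0])
far_label = int(demo_clf.predict(far_point)[0])

print(f"training outlier fraction: {train_outlier_fraction:.3f}")
print(f"far point score: {far_score:.3f}, label: {far_label}")
```

Thresholding `decision_function` scores at a value other than zero is one possible way to trade missed carrots against false alarms without refitting the model.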

In [None]:
import cv2
from IPython.display import display, clear_output
from PIL import Image
import numpy as np
import time

# Setup logging
carrot_log = []

# Show image in notebook
def show_frame(frame):
    img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    clear_output(wait=True)
    display(img)

# Carrot detection + logging
def is_carrot(frame):
    img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    tensor = transform(img).unsqueeze(0)
    with torch.no_grad():
        feature = model(tensor).squeeze().mean([1, 2]).numpy()
    is_detected = clf.predict([feature])[0] == 1

    if is_detected:
        timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
        carrot_log.append((timestamp, "Carrot detected"))

    return is_detected

# Start camera loop
cap = cv2.VideoCapture(0)
try:
    for _ in range(100):  # or use while True for continuous
        ret, frame = cap.read()
        if not ret:  # camera unavailable or stream ended
            break

        label = "Carrot" if is_carrot(frame) else "Not Carrot"
        cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)
        show_frame(frame)
finally:
    cap.release()  # free the camera even if the loop is interrupted



In [None]:
# Save log to CSV
import csv
with open("carrot_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "event"])
    writer.writerows(carrot_log)