<a href="https://colab.research.google.com/github/RovanDhaliwal/Rock-Paper-Scissors/blob/main/VGP338_Rovan_Final_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Final Project: Real-Time Object Detection System

## Project Overview

Congratulations on completing the foundational modules! You've learned Python programming, data manipulation with NumPy and Pandas, built neural networks from scratch (MLPs), understood convolutional architectures (CNNs), and mastered transfer learning techniques. Now it's time to apply everything you've learned to build a **real-world object detection system**.

In this final project, you will leverage **pre-trained models** to create an intelligent system that can detect and classify objects in **images or live video feeds**. Your system should demonstrate both **high accuracy** and **real-time performance**.

---

## Learning Objectives

By completing this project, you will:
- Apply transfer learning to a real-world computer vision problem
- Work with pre-trained models from PyTorch, Hugging Face, or other sources
- Implement real-time video processing and object detection
- Optimize models for both accuracy and performance
- Build a complete end-to-end ML application
- Present and document your work professionally

---

## Project Requirements

### Core Requirements (Mandatory)

#### 1. **Use a Pre-trained Model**
You must use at least one pre-trained model from:
- **PyTorch Hub** (torchvision.models, torch.hub)
- **Hugging Face** (transformers, timm)
- **TensorFlow Hub**
- **ONNX Model Zoo**
- Other reputable sources (YOLO, Detectron2, etc.)

#### 2. **Object Detection Capability**
Your system must be able to:
- Detect objects in static images
- Process live video feed from webcam OR video files
- Draw bounding boxes around detected objects
- Display class labels and confidence scores

#### 3. **Performance Metrics**
You must measure and report:
- **Accuracy**: Precision, Recall, F1-Score, mAP (if applicable)
- **Speed**: FPS (Frames Per Second), inference time
- **Resource Usage**: CPU/GPU utilization, memory consumption
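
For detection tasks, precision and recall are usually computed by matching predicted boxes to ground-truth boxes using intersection over union (IoU). The sketch below shows one minimal way to do that, assuming boxes are `(x1, y1, x2, y2)` tuples and a 0.5 IoU threshold. The names `iou` and `detection_scores` are illustrative, not from any library, and the matching ignores class labels and confidence ranking, both of which a full mAP computation would need.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_scores(predicted, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching, then precision, recall and F1."""
    unmatched = list(ground_truth)
    true_pos = 0
    for pred in predicted:
        # Pick the still-unmatched ground-truth box that overlaps most
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(unmatched)
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy boxes for illustration only
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
truth = [(1, 1, 10, 10), (100, 100, 110, 110)]
scores = detection_scores(preds, truth)
```

In this toy case one prediction matches and one does not, so precision, recall, and F1 all come out to 0.5.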

#### 4. **User Interface**
Implement at least one of:
- Command-line interface with clear instructions
- GUI using OpenCV, Tkinter, or Streamlit (Stretch Goals)
- Web interface using Flask/FastAPI (Stretch Goals)
- Jupyter notebook with interactive widgets

#### 5. **Documentation**
Provide comprehensive documentation including:
- README with setup instructions
- Code comments explaining key sections
- Model selection justification
- Performance analysis and results
- Demo video or screenshots

---

## Project Ideas

Choose **ONE** of the following project ideas or propose your own (subject to approval):

### 🤖 **1. Autonomous Robot Obstacle Detection**
Build a system that detects obstacles for autonomous navigation:
- Detect people, vehicles, furniture, walls
- Calculate distance/proximity warnings
- Real-time processing for navigation decisions
- **Bonus**: Integrate with robot simulator (Gazebo, Webots)

**Suggested Models**: YOLO, Faster R-CNN, MobileNet-SSD

---

### 🤟 **2. Sign Language Translator**
Create a real-time sign language recognition system:
- Detect hand gestures from webcam
- Translate ASL (American Sign Language) to text
- Support alphabet and common words/phrases
- **Bonus**: Text-to-speech output

**Suggested Models**: MediaPipe Hands, Custom CNN with transfer learning, Vision Transformers

---

### ✊✋✌️ **3. Unbeatable Rock-Paper-Scissors AI**
Build an AI that predicts and beats human players:
- Real-time hand gesture recognition
- Predict opponent's move before they complete it
- Track win/loss statistics
- **Bonus**: Add pattern recognition to predict player tendencies

**Suggested Models**: MobileNetV2, EfficientNet, Custom CNN
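
Once a gesture classifier produces a label, the game logic itself is small. The sketch below pairs a counter-move table with a very simple tendency predictor that guesses the player's most frequent past move. The names `COUNTER_MOVE`, `predict_next`, and `choose_move` are illustrative, and frequency counting is only one possible strategy, not a required one.

```python
from collections import Counter

# Each key is beaten by its value
COUNTER_MOVE = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def predict_next(history):
    """Guess the player's next move as their most frequent move so far."""
    if not history:
        return "rock"  # arbitrary default before any data is seen
    return Counter(history).most_common(1)[0][0]


def choose_move(history):
    """Play the move that beats the predicted player move."""
    return COUNTER_MOVE[predict_next(history)]


player_history = ["rock", "rock", "scissors", "rock"]
ai_move = choose_move(player_history)
```

Here the player has favoured rock, so the sketch answers with paper.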

---

### 🚗 **4. Traffic Sign Detection & Recognition**
Develop a system for autonomous driving assistance:
- Detect and classify traffic signs (stop, yield, speed limit, etc.)
- Work with various lighting and weather conditions
- Real-time processing for driving scenarios
- **Bonus**: Add lane detection

**Suggested Models**: YOLO, Faster R-CNN, ResNet with transfer learning

---

### 🔍 **5. Smart Surveillance System**
Create an intelligent security monitoring system:
- Detect people, vehicles, suspicious activities
- Alert on specific events (person entering restricted area)
- Track objects across frames
- **Bonus**: Face recognition, anomaly detection

**Suggested Models**: YOLO, Detectron2, RetinaNet
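
A restricted-area alert can be reduced to a geometric test on each detection. The sketch below flags detections whose box centre falls inside a rectangular zone. It assumes detections are dictionaries with `label` and `box` keys and that both boxes and the zone are `(x1, y1, x2, y2)` tuples in pixel coordinates; that format and the helper names are hypothetical, so adapt them to whatever your detector returns.

```python
def box_center(box):
    """Centre point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def in_zone(box, zone):
    """True if the box centre lies inside the rectangular zone."""
    cx, cy = box_center(box)
    zx1, zy1, zx2, zy2 = zone
    return zx1 <= cx <= zx2 and zy1 <= cy <= zy2


def zone_alerts(detections, zone, label="person"):
    """Detections of the given label whose centre is inside the zone."""
    return [d for d in detections
            if d["label"] == label and in_zone(d["box"], zone)]


# Toy detections for illustration only
detections = [
    {"label": "person", "box": (40, 40, 60, 60)},
    {"label": "person", "box": (200, 200, 220, 220)},
    {"label": "car", "box": (45, 45, 55, 55)},
]
alerts = zone_alerts(detections, zone=(0, 0, 100, 100))
```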

---

### 🍎 **6. Food Recognition & Nutrition Tracker**
Build a dietary assistant application:
- Identify food items from photos
- Estimate portion sizes
- Provide nutritional information
- **Bonus**: Meal logging and calorie tracking

**Suggested Models**: EfficientNet, Vision Transformers, Food-101 pre-trained models

---

### 🐕 **7. Pet Breed Classifier & Detector**
Develop a pet identification system:
- Detect cats/dogs in images or video
- Classify breed with high accuracy
- Provide breed information and characteristics
- **Bonus**: Multi-pet detection in same frame

**Suggested Models**: ResNet, EfficientNet, YOLO for detection

---

### 🎯 **8. Custom Project (Your Idea)**
Propose your own object detection project:
- Must involve real-time or near-real-time processing
- Must use pre-trained models with transfer learning
- Must have clear accuracy and performance metrics
- Submit a 1-page proposal for approval

---

## Technical Specifications

### Minimum Performance Targets

| Metric             | Minimum Target | Excellent Target |
| ------------------ | -------------- | ---------------- |
| **Accuracy**       | 70%            | 90%+             |
| **FPS (Video)**    | 10 FPS         | 30+ FPS          |
| **Inference Time** | <200ms         | <50ms            |
| **Model Size**     | <500MB         | <100MB           |
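
The FPS and inference-time targets are linked: a pipeline that spends `t` seconds on each frame cannot exceed `1 / t` frames per second. The following sketch works through that arithmetic for the numbers in the table; the helper names are illustrative.

```python
def max_fps(latency_ms):
    """Upper bound on FPS if every frame costs latency_ms end to end."""
    return 1000.0 / latency_ms


def frame_budget_ms(target_fps):
    """Time available per frame to sustain target_fps."""
    return 1000.0 / target_fps


budget_min = frame_budget_ms(10)   # per-frame budget for the 10 FPS minimum
budget_best = frame_budget_ms(30)  # per-frame budget for the 30 FPS target
fps_at_200ms = max_fps(200)        # ceiling for a 200 ms frame time
```

A model sitting just under the 200 ms inference limit tops out near 5 FPS, so meeting the 10 FPS minimum needs roughly 100 ms or less per frame, including capture and drawing.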

### Required Technologies

**Core Stack:**
- Python 3.8+
- PyTorch or TensorFlow
- OpenCV for video processing
- NumPy for data manipulation

**Recommended Libraries:**
```python
# Computer Vision
import torch
import torchvision
from torchvision import transforms
import cv2

# Pre-trained Models
from transformers import pipeline  # Hugging Face
import timm  # PyTorch Image Models

# Utilities
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Performance
import time
from collections import deque
```

---

## Project Phases

### Phase 1: Research & Planning
- [ ] Choose your project idea
- [ ] Research available pre-trained models
- [ ] Identify datasets for testing/validation
- [ ] Set up development environment

---

### Phase 2: Model Selection & Integration
- [ ] Download and test pre-trained models
- [ ] Implement basic inference pipeline
- [ ] Test on sample images
- [ ] Benchmark initial performance

---

### Phase 3: Real-Time Implementation
- [ ] Implement video capture and processing
- [ ] Optimize for real-time performance
- [ ] Add visualization (bounding boxes, labels)
- [ ] Implement FPS counter and metrics
- [ ] Handle edge cases and errors

---

### Phase 4: Optimization & Enhancement
- [ ] Fine-tune model if needed
- [ ] Optimize inference speed
- [ ] Improve accuracy through data augmentation
- [ ] Add advanced features (tracking, alerts, etc.)
- [ ] Conduct thorough testing

---

### Phase 5: Documentation & Presentation
- [ ] Write comprehensive README
- [ ] Create demo video
- [ ] Prepare presentation slides
- [ ] Document challenges and solutions
- [ ] Prepare for final presentation

---

## Implementation Guide

### Step 1: Set Up Your Environment

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install torch torchvision opencv-python
pip install transformers timm
pip install numpy pandas matplotlib
pip install streamlit  # Optional: for web UI
```

### Step 2: Load Pre-trained Model

**Example with YOLO:**
```python
import torch

# Load YOLOv5 from PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Set to evaluation mode
model.eval()

# Test on image
results = model('path/to/image.jpg')
results.show()
```

**Example with Hugging Face:**
```python
from transformers import pipeline

# Load object detection pipeline
detector = pipeline("object-detection",
                   model="facebook/detr-resnet-50")

# Detect objects
results = detector("path/to/image.jpg")
print(results)
```

### Step 3: Implement Real-Time Video Processing

```python
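import time
from collections import deque

import cv2
import torch

# Load model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Open webcam and fail early if it is unavailable
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError('Could not open webcam (device 0)')

# Rolling window of the last 30 per-frame FPS values
fps_queue = deque(maxlen=30)

try:
    while True:
        start_time = time.perf_counter()

        # Read frame (OpenCV delivers BGR)
        ret, frame = cap.read()
        if not ret:
            break

        # Run inference; the YOLOv5 hub model expects RGB arrays
        results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

        # Render detections, then convert back to BGR for display
        rendered_frame = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)

        # Calculate FPS averaged over the rolling window
        fps_queue.append(1 / (time.perf_counter() - start_time))
        avg_fps = sum(fps_queue) / len(fps_queue)

        # Display FPS
        cv2.putText(rendered_frame, f'FPS: {avg_fps:.1f}',
                    (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                    1, (0, 255, 0), 2)

        # Show frame
        cv2.imshow('Object Detection', rendered_frame)

        # Exit on 'q'
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    # Release the camera and close windows even if an error occurs
    cap.release()
    cv2.destroyAllWindows()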
```

### Step 4: Optimize for Performance

**Use GPU Acceleration:**
```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
```

**Reduce Input Resolution:**
```python
# Resize frame for faster processing
frame_resized = cv2.resize(frame, (640, 480))
```

**Use Lighter Models:**
```python
# YOLOv5n (nano) is faster than YOLOv5s (small)
model = torch.hub.load('ultralytics/yolov5', 'yolov5n')
```

**Batch Processing (for video files):**
```python
# Process multiple frames at once
results = model(frames_batch)
```
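
The `frames_batch` variable in the snippet above is not defined anywhere. One way to build it, sketched here with a hypothetical `batched` helper, is to group decoded frames into fixed-size lists and pass each list to the model in one call. The integers below stand in for frames so the example runs without a video file.

```python
def batched(frames, batch_size):
    """Yield successive lists of up to batch_size frames."""
    batch = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


# Stand-in "frames"; in practice these would be arrays read with cv2
frames = list(range(10))
batches = list(batched(frames, batch_size=4))
```

Larger batches tend to help throughput on a GPU but add latency, so this suits offline video files better than a live feed.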

### Step 5: Measure Performance

```python
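import time
import numpy as np

def benchmark_model(model, test_images, num_runs=100, warmup=5):
    """Time repeated inference calls and summarize latency and FPS."""
    # Warm-up runs keep one-time setup cost out of the measurement
    for _ in range(warmup):
        model(test_images)

    inference_times = []
    for _ in range(num_runs):
        start = time.perf_counter()
        model(test_images)
        inference_times.append(time.perf_counter() - start)

    mean_time = np.mean(inference_times)
    return {
        'mean_time': mean_time,
        'std_time': np.std(inference_times),
        'fps': 1 / mean_time
    }

# Run benchmark on a sample image
# (a preloaded array avoids timing disk reads on every call)
test_image = 'path/to/image.jpg'
stats = benchmark_model(model, test_image)
print(f"Average inference time: {stats['mean_time']*1000:.2f}ms")
print(f"FPS: {stats['fps']:.1f}")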
```
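
The benchmark above covers speed only, while the requirements also ask for resource usage. As a rough sketch using only the standard library, the hypothetical `measure` helper below records wall time, CPU time, and the peak resident memory of the process around a single call. It assumes a Unix-like system (the `resource` module is not available on Windows), it reports the peak for the whole process lifetime rather than for the one call, and it does not see GPU memory.

```python
import resource
import sys
import time


def measure(fn, *args, **kwargs):
    """Run fn once and report wall time, CPU time and peak RSS."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    peak_mb = peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024
    return result, {"wall_s": wall, "cpu_s": cpu, "peak_rss_mb": peak_mb}


# Stand-in workload; replace with a call such as model(frame)
total, usage = measure(sum, range(1_000_000))
```

A CPU time well above the wall time suggests the work is spread across several cores; a CPU time well below it suggests the call is waiting on I/O or a GPU.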

---

## Evaluation Criteria

Your project will be evaluated on the following criteria:

### 1. **Technical Implementation (40 points)**
- [ ] Correct use of pre-trained models (10 pts)
- [ ] Real-time video processing capability (10 pts)
- [ ] Code quality and organization (10 pts)
- [ ] Error handling and robustness (10 pts)

### 2. **Performance (25 points)**
- [ ] Accuracy metrics (10 pts)
- [ ] Speed/FPS performance (10 pts)
- [ ] Resource efficiency (5 pts)

### 3. **Innovation & Features (15 points)**
- [ ] Creative problem-solving (5 pts)
- [ ] Additional features beyond requirements (5 pts)
- [ ] User experience and interface (5 pts)

### 4. **Documentation (10 points)**
- [ ] Clear README with setup instructions (3 pts)
- [ ] Code comments and documentation (3 pts)
- [ ] Performance analysis and results (4 pts)

### 5. **Presentation (10 points)**
- [ ] Clear explanation of approach (4 pts)
- [ ] Live demo or video demonstration (4 pts)
- [ ] Discussion of challenges and solutions (2 pts)

**Total: 100 points**

**Bonus Points (up to 20):**
- Exceptional performance (>95% accuracy or >60 FPS)
- Novel application or approach
- Deployment (web app, mobile app, Docker container)
- Contribution to open source or dataset creation

---

## Submission Requirements
GitHub repository with a README.

### README.md Template

```markdown
# [Project Title]

## Overview
Brief description of your project and its purpose.

## Features
- Feature 1
- Feature 2
- Feature 3

## Installation

### Prerequisites
- Python 3.8+
- Webcam (for real-time detection)

### Setup
\`\`\`bash
pip install -r requirements.txt
\`\`\`

## Usage

### Run on Images
\`\`\`bash
python main.py --mode image --input path/to/image.jpg
\`\`\`

### Run on Video
\`\`\`bash
python main.py --mode video --input path/to/video.mp4
\`\`\`

### Run on Webcam
\`\`\`bash
python main.py --mode webcam
\`\`\`

## Model Information
- **Model**: [Model name and source]
- **Architecture**: [Brief architecture description]
- **Pre-training**: [What dataset was it pre-trained on]

## Performance
- **Accuracy**: XX%
- **FPS**: XX
- **Inference Time**: XXms

## Results
[Include screenshots or link to demo video]

## Challenges & Solutions
[Discuss main challenges and how you solved them]

## Future Improvements
[What would you add given more time]

## Acknowledgments
[Credit any resources, tutorials, or code you used]
```
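
The usage commands in the template assume a `main.py` that accepts `--mode` and `--input`. A minimal sketch of that command-line interface with `argparse` is shown below; the function names and the rule that `--input` is optional only for webcam mode are assumptions, not requirements of the assignment.

```python
import argparse


def build_parser():
    """Parser for the --mode and --input flags used in the README."""
    parser = argparse.ArgumentParser(description="Object detection demo")
    parser.add_argument("--mode", choices=["image", "video", "webcam"],
                        required=True, help="input source type")
    parser.add_argument("--input", help="path to an image or video file")
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv) and validate flag combinations."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Image and video modes need a file; webcam mode does not
    if args.mode != "webcam" and not args.input:
        parser.error("--input is required for image and video modes")
    return args


args = parse_args(["--mode", "image", "--input", "path/to/image.jpg"])
```

In a real `main.py`, calling `parse_args()` with no arguments reads the flags from the command line.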

---

## Resources

### Pre-trained Models

**Object Detection:**
- [YOLOv5](https://github.com/ultralytics/yolov5) - Fast and accurate
- [Detectron2](https://github.com/facebookresearch/detectron2) - Facebook's detection framework
- [Hugging Face DETR](https://huggingface.co/facebook/detr-resnet-50) - Transformer-based detection
- [TorchVision Models](https://pytorch.org/vision/stable/models.html) - Faster R-CNN, RetinaNet

**Classification (for specific tasks):**
- [timm](https://github.com/rwightman/pytorch-image-models) - 700+ pre-trained models
- [Hugging Face Vision Models](https://huggingface.co/models?pipeline_tag=image-classification)

### Datasets for Testing

- [COCO Dataset](https://cocodataset.org/) - Common objects
- [Open Images](https://storage.googleapis.com/openimages/web/index.html) - Large-scale detection
- [Roboflow Universe](https://universe.roboflow.com/) - Custom datasets
- [Kaggle Datasets](https://www.kaggle.com/datasets) - Various domains

### Tutorials & Documentation

- [PyTorch Object Detection Tutorial](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html)
- [OpenCV Python Tutorials](https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html)
- [Hugging Face Vision Tasks](https://huggingface.co/tasks/object-detection)
- [Real Python - OpenCV](https://realpython.com/face-recognition-with-python/)

---

## Tips for Success

### 🎯 **Start Simple**
1. Get a basic model working on static images first
2. Then add video processing
3. Finally optimize for performance

### ⚡ **Optimize Early**
- Profile your code to find bottlenecks
- Use GPU acceleration when available
- Consider model quantization for speed

### 📊 **Measure Everything**
- Track FPS, inference time, accuracy
- Compare different models
- Document what works and what doesn't

### 🐛 **Debug Systematically**
- Test each component separately
- Use print statements and logging
- Validate input/output at each stage

### 💡 **Be Creative**
- Add unique features that showcase your skills
- Think about real-world applications
- Make it visually appealing

### 🤝 **Ask for Help**
- Use office hours for technical questions
- Collaborate with classmates (but submit individual work)
- Search Stack Overflow and GitHub issues

---

## Presentation Guidelines

Your final presentation should be **5 minutes** and include:

1. **Introduction**
   - Problem statement
   - Why this project matters

2. **Technical Approach**
   - Model selection and justification
   - Architecture overview
   - Key implementation decisions

3. **Live Demo**
   - Show your system in action
   - Demonstrate key features
   - Highlight performance metrics

4. **Results & Analysis**
   - Accuracy and performance metrics
   - Comparison with baselines
   - Challenges and solutions

5. **Q&A**
   - Answer questions from audience

---

## Frequently Asked Questions

**Q: Can I use multiple pre-trained models?**  
A: Yes! Combining models (e.g., detection + classification) is encouraged.

**Q: What if I don't have a GPU?**  
A: Use Google Colab, Kaggle Notebooks, or optimize for CPU with lighter models.

**Q: Can I work in a team?**  
A: This is an individual project, but you can discuss ideas with classmates.

**Q: What if my accuracy is below 70%?**  
A: Document why and what you tried. Focus on learning and problem-solving.

**Q: Can I use a dataset I created?**  
A: Yes! Custom datasets are encouraged, especially for unique applications.

**Q: How do I handle webcam on remote servers?**  
A: Test locally, or use video files or image sequences instead. Document any limitations.

---

## Academic Integrity

- You must write your own code (no copy-pasting entire projects)
- Properly cite any code snippets, tutorials, or resources used
- Pre-trained models are allowed and encouraged
- Using libraries and frameworks is expected
- Collaboration on ideas is okay, but implementation must be individual

---

## Final Notes

This project is your opportunity to showcase everything you've learned and build something impressive for your portfolio. Choose a project you're passionate about, aim for excellence, and don't be afraid to be creative!

Remember:
- **Start early** - Don't underestimate the time needed
- **Iterate often** - Get feedback and improve continuously  
- **Document everything** - Your future self will thank you
- **Have fun** - This is your chance to build something cool!

Good luck! 🚀


In [19]:
import os
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

import torch
from torch import nn
import torchvision.models as models

from torchvision.datasets import ImageFolder
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split, Subset
import torch.optim as optim

from pathlib import Path

In [20]:
!pip install opendatasets

import opendatasets as od
od.download("https://www.kaggle.com/datasets/drgfreeman/rockpaperscissors", data_dir="data")

In [43]:
train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(),
    transforms.Resize(244),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])

test_transform = transforms.Compose([
    transforms.Resize(244),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])

In [55]:
root = Path("data/rockpaperscissors/rps-cv-images")

# Two views of the same folder so train and test can use different transforms
train_base = ImageFolder(root=str(root), transform=train_transform)
test_base = ImageFolder(root=str(root), transform=test_transform)

# Random 80/20 split over sample indices (the two index sets are disjoint)
n = len(train_base)
perm = torch.randperm(n).tolist()
n_train = int(n * 0.8)
idx_train = perm[:n_train]
idx_test = perm[n_train:]

train_dataset = Subset(train_base, idx_train)
test_dataset = Subset(test_base, idx_test)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
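
The `torch.randperm` call above is unseeded, so which images land in the test set changes on every run. The sketch below shows the same 80/20 index split made reproducible with a fixed seed. It uses the standard library so that it runs without the dataset, and `split_indices` is an illustrative name; in PyTorch, passing a seeded `torch.Generator` to `torch.randperm` serves the same purpose.

```python
import random


def split_indices(n, train_fraction=0.8, seed=0):
    """Shuffle range(n) with a fixed seed and split it in two."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * train_fraction)
    return indices[:n_train], indices[n_train:]


# 100 is a placeholder dataset size for illustration
train_idx, test_idx = split_indices(100)
```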

In [56]:
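# Assumed setup: ImageNet-pretrained AlexNet from torchvision
# (the cell that created alex_net_model is not shown in this notebook)
alex_net_model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)

# Replace the classifier head: 9216 flattened features -> 3 classes
alex_net_model.classifier = nn.Sequential(nn.Linear(9216, 1024),
                                 nn.ReLU(),
                                 nn.Dropout(0.4),
                                 nn.Linear(1024, 3),  # rock, paper, scissors
                                 nn.LogSoftmax(dim=1))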

In [57]:
# The classifier already ends in LogSoftmax, so NLLLoss is the matching loss
criterion = nn.NLLLoss()
optimizer = optim.Adam(alex_net_model.classifier.parameters(), lr=0.001)

In [59]:
epochs = 10

train_losses = []

for epoch in range(epochs):
    alex_net_model.train()
    running_loss = 0.0

    for i, batch in enumerate(train_loader, start=1):
        inputs, labels = batch

        output = alex_net_model(inputs)
        loss = criterion(output, labels)
        running_loss += loss.item()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # print batch loss
        if i % 20 == 0:
            print(f'Batch {i} loss: {loss.item():.4f}')

    epoch_loss = running_loss / len(train_loader)
    train_losses.append(epoch_loss)
    print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss:.4f}")

Batch 20 loss: 0.5502
Batch 40 loss: 0.1981
Epoch 1/10, Loss: 0.2866
Batch 20 loss: 0.0038
Batch 40 loss: 0.0308
Epoch 2/10, Loss: 0.0758
Batch 20 loss: 0.0005
Batch 40 loss: 0.0034
Epoch 3/10, Loss: 0.0470
Batch 20 loss: 0.0000
Batch 40 loss: 0.0001
Epoch 4/10, Loss: 0.0280
Batch 20 loss: 0.0091
Batch 40 loss: 0.0000
Epoch 5/10, Loss: 0.0191
Batch 20 loss: 0.0018
Batch 40 loss: 0.0000
Epoch 6/10, Loss: 0.0166
Batch 20 loss: 0.0247
Batch 40 loss: 0.0269
Epoch 7/10, Loss: 0.0505
Batch 20 loss: 0.0003
Batch 40 loss: 0.0000
Epoch 8/10, Loss: 0.0587
Batch 20 loss: 0.0202
Batch 40 loss: 0.0045
Epoch 9/10, Loss: 0.0245
Batch 20 loss: 0.0000
Batch 40 loss: 0.0002
Epoch 10/10, Loss: 0.0496


In [60]:
alex_net_model.eval()
test_loss = 0.0
correct_predictions = 0
total_samples = 0

with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = alex_net_model(inputs)
        loss = criterion(outputs, labels)
        test_loss += loss.item() * inputs.size(0)

        predicted = outputs.argmax(dim=1)
        total_samples += labels.size(0)
        correct_predictions += (predicted == labels).sum().item()

avg_test_loss = test_loss / total_samples
accuracy = (correct_predictions / total_samples) * 100

print(f"Test Loss: {avg_test_loss:.4f}")
print(f"Test Accuracy: {accuracy:.2f}%")

Test Loss: 0.0526
Test Accuracy: 98.86%
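
Overall accuracy can hide one class being confused more often than the others. The sketch below shows a per-class breakdown, assuming the labels and predictions from the evaluation loop have been collected into lists of integer class indices. The helper names are illustrative and the toy values are made up; they are not results from this notebook.

```python
def confusion_matrix(labels, predictions, num_classes):
    """matrix[t][p] counts samples of true class t predicted as p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, predictions):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


# Toy data for illustration only
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
recalls = per_class_recall(cm)
```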


In [62]:
# Install dependencies.
# In Colab, packages are installed straight into the runtime with '!pip install';
# the virtual environment steps from the local setup instructions do not apply here.
!pip install torch torchvision opencv-python
!pip install transformers timm
!pip install numpy pandas matplotlib
!pip install streamlit  # Optional: for web UI

In [64]:
import torch

# Install ultralytics package if not already installed
!pip install ultralytics

# Load YOLOv5 from PyTorch Hub
# The trust_repo=True is added to explicitly trust the repository, addressing the UserWarning.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True, trust_repo=True)

# Set to evaluation mode
model.eval()

# Test on image
# You will need to replace 'path/to/image.jpg' with an actual image file path
# or a numpy array/PIL Image if you want to run this example.
# For now, let's just make sure the model loads correctly.
# results = model('path/to/image.jpg')
# results.show()
print("YOLOv5s model loaded successfully. Ready for inference.")

Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2025-12-19 Python-3.12.12 torch-2.9.0+cpu CPU

Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt to yolov5s.pt...
100%|██████████| 14.1M/14.1M [00:00<00:00, 173MB/s]

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape... 


YOLOv5s model loaded successfully. Ready for inference.


In [65]:
import cv2
import torch
import time

# Load model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Open webcam
cap = cv2.VideoCapture(0)

# FPS calculation
fps_queue = []

while True:
    start_time = time.time()

    # Read frame
    ret, frame = cap.read()
    if not ret:
        break

    # Run inference
    results = model(frame)

    # Render results
    rendered_frame = results.render()[0]

    # Calculate FPS
    fps = 1 / (time.time() - start_time)
    fps_queue.append(fps)
    if len(fps_queue) > 30:
        fps_queue.pop(0)
    avg_fps = sum(fps_queue) / len(fps_queue)

    # Display FPS
    cv2.putText(rendered_frame, f'FPS: {avg_fps:.1f}',
                (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                1, (0, 255, 0), 2)

    # Show frame
    cv2.imshow('Object Detection', rendered_frame)

    # Exit on 'q'
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2025-12-19 Python-3.12.12 torch-2.9.0+cpu CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape... 
