# **📌 Project Analysis: Pizza Store Scooper Violation Detection (Computer Vision + Microservices)**



## 🧠 Problem Statement
The goal is to ensure hygiene protocol compliance in a pizza store using a computer vision system that monitors whether workers use a scooper when picking ingredients from specific **Regions of Interest (ROIs)** — mainly **protein containers**. Any manual interaction without a scooper is a **violation** and should be detected and logged in real-time.

---

## 🎯 Functional Requirements Breakdown

### 1. **Video Ingestion**
- Input source: RTSP stream or pre-recorded video.
- Frame extraction is required at real-time or near-real-time rate.
- **Edge Cases**:
  - Poor lighting, occlusion, multiple people in frame.
  - Different camera angles.

### 2. **ROI-Based Interaction Detection**
- Only specific ROIs (e.g., the **protein container**) are considered for violation detection.
- User should be able to **define and save ROI areas**.
- **Logic**:
  - If a **hand** enters ROI, track whether a **scooper** was used when touching the ingredient and transferring it to **pizza**.
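
As an illustration of the "define and save ROI areas" requirement, here is a minimal sketch that stores rectangular ROIs as JSON and tests whether a detection's box center falls inside one. The `ROI` class, its field names, and the rectangular shape are assumptions made for this example; the project may well use polygons or a different schema.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class ROI:
    """Axis-aligned rectangle in pixel coordinates (hypothetical schema)."""
    name: str
    x1: int
    y1: int
    x2: int
    y2: int

    def contains_point(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2

    def contains_box_center(self, box) -> bool:
        """box is (x1, y1, x2, y2); check whether its center lies in the ROI."""
        bx1, by1, bx2, by2 = box
        return self.contains_point((bx1 + bx2) / 2, (by1 + by2) / 2)


def save_rois(rois, path: str) -> None:
    """Persist a list of ROI objects as a JSON array."""
    Path(path).write_text(json.dumps([asdict(r) for r in rois], indent=2))


def load_rois(path: str):
    """Load ROI objects previously written by save_rois."""
    return [ROI(**item) for item in json.loads(Path(path).read_text())]
```

A frontend ROI editor could post the rectangle corners to the backend, which would call `save_rois`; the detection service would then call `load_rois` at startup and test each hand box with `contains_box_center`.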

### 3. **Scooper Usage Detection**
- Use object detection (YOLOv8) to detect:
  - **Hand**
  - **Scooper**
  - **Person**
  - **Pizza**
- Temporal logic needed to infer interaction:
  - Scooper enters ROI → grabs item → hand moves item to pizza → ✅
  - Hand enters ROI → moves to pizza without scooper → ❌
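
The temporal logic above can be sketched as a small per-hand state machine. This is one possible formulation, not the project's actual rule set: the three boolean inputs are assumed to be computed upstream from the detections and ROIs, and the `timeout_frames` default is an arbitrary placeholder.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()      # hand is outside every ROI
    IN_ROI = auto()    # hand is currently inside an ROI
    CARRYING = auto()  # hand left the ROI and may be heading to a pizza


class HandInteraction:
    """Tracks one hand across frames; reports a verdict when it reaches a pizza."""

    def __init__(self, timeout_frames: int = 60):
        self.timeout_frames = timeout_frames  # assumed value, tune to camera FPS
        self._reset()

    def _reset(self) -> None:
        self.state = State.IDLE
        self.scooper_seen = False
        self.frames_since_exit = 0

    def update(self, hand_in_roi: bool, scooper_near_hand: bool, hand_on_pizza: bool):
        """Feed one frame of observations; returns 'ok', 'violation', or None."""
        if self.state is State.IDLE:
            if hand_in_roi:
                self.state = State.IN_ROI
                self.scooper_seen = scooper_near_hand
        elif self.state is State.IN_ROI:
            # Remember whether a scooper was present at any point during the grab.
            self.scooper_seen = self.scooper_seen or scooper_near_hand
            if not hand_in_roi:
                self.state = State.CARRYING
                self.frames_since_exit = 0
        else:  # State.CARRYING
            self.frames_since_exit += 1
            if hand_on_pizza:
                verdict = "ok" if self.scooper_seen else "violation"
                self._reset()
                return verdict
            if hand_in_roi:
                # Re-entering the ROI is treated as a new interaction.
                self.state = State.IN_ROI
                self.scooper_seen = scooper_near_hand
            elif self.frames_since_exit > self.timeout_frames:
                # Never reached a pizza (e.g., cleaning): no verdict.
                self._reset()
        return None
```

The timeout is what keeps a hand that enters the ROI and then wanders off from ever being counted; only a completed ROI-to-pizza transfer produces a verdict.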

### 4. **Violation Logic**
- A **violation** occurs **only when**:
  - A hand grabs ingredients **from a defined ROI** and places them **on a pizza** **without** using a scooper.
- **Edge Conditions to Handle**:
  - Hand in ROI but not grabbing (e.g., cleaning) → **no violation**.
  - Two persons working simultaneously in frame → track each independently.
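
To track two workers independently, each detected hand has to be attributed to a person. The sketch below shows one simple heuristic, assuming track IDs and `(x1, y1, x2, y2)` boxes are already available from some multi-object tracker: a hand is assigned to the person whose box contains the hand's center, with ties broken by the nearest person center. The function names are hypothetical, and the heuristic may fail when workers overlap heavily in the image.

```python
import math


def center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def contains(box, point) -> bool:
    """True if the point lies inside the box (edges included)."""
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def assign_hands_to_persons(hands: dict, persons: dict) -> dict:
    """Map each hand track ID to a person track ID, or None if unmatched.

    hands and persons are {track_id: (x1, y1, x2, y2)} dictionaries.
    """
    assignment = {}
    for hand_id, hand_box in hands.items():
        hand_center = center(hand_box)
        candidates = [pid for pid, pbox in persons.items() if contains(pbox, hand_center)]
        if candidates:
            assignment[hand_id] = min(
                candidates,
                key=lambda pid: math.dist(hand_center, center(persons[pid])),
            )
        else:
            assignment[hand_id] = None
    return assignment
```

With this mapping, interaction state can be kept per hand track and violations counted per person, so one worker's scooper does not excuse the other worker's bare hand.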
  
### 5. **Real-Time Streaming + Detection Visualization**
- Detected objects and ROIs must be rendered on video frames in **real-time**.
- Metadata (bounding boxes, labels, timestamps, violation flags) sent to frontend.
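
A possible shape for the per-frame metadata message is sketched below. The field names are assumptions for illustration only; the folder structure later in this document places the real schemas in `shared/schemas.py` as Pydantic models, whereas this sketch uses standard-library dataclasses to stay dependency-free.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Detection:
    label: str         # one of: hand, person, pizza, scooper
    confidence: float
    box: tuple         # (x1, y1, x2, y2) in pixels


@dataclass
class FrameMetadata:
    frame_id: int
    timestamp: float                  # seconds since the epoch
    detections: list = field(default_factory=list)
    violation: bool = False           # True if this frame triggered a violation
    violation_count: int = 0          # running total for the stream

    def to_json(self) -> str:
        """Serialize to a JSON string suitable for a WebSocket message."""
        return json.dumps(asdict(self))
```

Sending the running `violation_count` with every frame lets the frontend show the total without keeping its own state.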

---

## 🎯 Objective

Design and implement a microservices-based system that:

1. Ingests video from cameras in the pizza store.
2. Identifies interactions with ROIs such as the protein container. Not all containers matter: only the ones the worker grabs from with a scooper in the videos.
3. Determines whether a scooper was used.
4. Flags any violations (an ingredient grabbed without a scooper, from ROIs only).
5. Handles the case where a worker's hand enters an ROI without taking anything (e.g., cleaning).
6. Handles the case where two workers are at the pizza table at the same time.
7. Displays the video with detection results on a frontend in real time.

---

## 🧱 Microservices Architecture

| Service                | Responsibility                                                                 |
|------------------------|----------------------------------------------------------------------------------|
| **1. Frame Reader**    | Read frames from video or camera and publish to message broker.                 |
| **2. Message Broker**  | Buffer and route frames (e.g., Kafka or RabbitMQ).                              |
| **3. Detection Service**| Detect objects, infer scooper usage, log violations, and push results forward. |
| **4. Streaming Service**| Serve live annotated video stream + REST API for stats.                        |
| **5. Frontend UI**     | Show annotated video, ROI overlays, and violation counts.                       |

---

## 💽 Provided Resources

- 📹 **Videos** (unannotated & annotated): Used for inference/testing.
- 🧠 **YOLOv8 Pretrained Model**:
  - Classes: `['hand', 'person', 'pizza', 'scooper']`
  - Trained on: `1254` annotated images.
- 🔗 Dataset on Roboflow: Available for **fine-tuning**.
- 🧪 **Test Videos**:
  - `Sah w b3dha ghalt.mp4` → 1 violation
  - `Sah w b3dha ghalt (2).mp4` → 2 violations
  - `Sah w b3dha ghalt (3).mp4` → 1 violation

---

## 🛠️ Suggested Technologies

| Component               | Suggested Tools / Frameworks                                           |
|------------------------|------------------------------------------------------------------------|
| **Frame Reader**       | OpenCV, GStreamer, FFmpeg                                               |
| **Message Broker**     | Apache Kafka, RabbitMQ                                                 |
| **Detection Service**  | YOLOv8 (via Ultralytics), PyTorch, FastAPI                             |
| **Streaming Service**  | WebSocket / MJPEG / WebRTC, REST (FastAPI/Flask)                       |
| **Frontend**           | React.js, Vue.js, or HTML/JS; avoid Streamlit for production           |

---

## 🧠 Logical Flow

1. Load or stream video → extract frame.
2. Frame is sent to detection microservice via broker.
3. Detection Service:
   - Detect objects (hand, scooper, pizza, person).
   - Check if:
     - Hand enters ROI.
     - Scooper is NOT present during grab.
     - Item is transferred to pizza.
   - If yes → log violation:
     - Save frame path, bounding boxes, labels, timestamp.
4. Forward annotated frame + metadata to Streaming Service.
5. Stream annotated frames + REST API to Frontend.
6. Frontend:
   - Display bounding boxes and ROIs.
   - Show live alerts (e.g., red box).
   - Show total count of violations.
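
The violation logging in step 3 (frame path, bounding boxes, labels, timestamp) could be sketched with SQLite as follows. The table name, column names, and the choice of SQLite are assumptions for this example; any store that supports an append and a count would serve.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS violations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL NOT NULL,
    frame_path TEXT NOT NULL,
    detections_json TEXT NOT NULL
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the violation log; ':memory:' is handy for tests."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_violation(conn, timestamp: float, frame_path: str, detections: list) -> None:
    """Append one violation; detections is a list of {label, box} dicts."""
    conn.execute(
        "INSERT INTO violations (timestamp, frame_path, detections_json) VALUES (?, ?, ?)",
        (timestamp, frame_path, json.dumps(detections)),
    )
    conn.commit()


def violation_count(conn) -> int:
    """Total number of logged violations, as shown on the frontend."""
    return conn.execute("SELECT COUNT(*) FROM violations").fetchone()[0]
```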

---

## ✅ Deliverables

- [ ] Microservices code (detection, reader, streaming, frontend).
- [ ] README file with setup.
- [ ] Real-time UI with bounding boxes, ROIs, and violation count.
- [ ] Pre-recorded demo of system working.
- [ ] Docker Compose for deployment (preferred).

---

## 🚀 Bonus Points

- Full Dockerized setup using **Docker Compose**.
- Best practice use of:
  - Asynchronous programming (e.g., `asyncio`, `aiohttp`).
  - Model inference optimization (e.g., TensorRT, ONNX).
- Scalable message broker configuration.
- Custom ROI editor tool for dynamic zone definition.

---

## 🔍 Key Challenges and Considerations

| Challenge                          | Recommendation / Handling                                                 |
|-----------------------------------|---------------------------------------------------------------------------|
| Multiple people in frame          | Use **multi-object tracking** to distinguish hands/actions                |
| Hands enter ROI but no grab       | Require **temporal analysis** or sequence matching (e.g., detect trajectory) |
| ROI misalignment                  | Allow ROI definition tool in frontend                                     |
| Real-time performance             | Optimize frame rate and resolution, use async inference                   |
| Lighting/occlusion                | Augment dataset, or fine-tune pretrained model with diverse conditions    |

---


# **🧠 Deep Explanation and Comparison of System Architecture Components**


## 📸 **1. Frame Reader**

| **Tool**   | **Description** |
|------------|-----------------|
| **OpenCV** | Lightweight, easy to use in Python/C++. Good for reading video frames and simple pre-processing. |
| **GStreamer** | High-performance multimedia framework, supports complex pipelines and low latency streaming. |
| **FFmpeg** | Powerful CLI/video library for encoding, decoding, and format conversion. Not ideal for real-time. |

### 🔁 Comparison

| Feature        | OpenCV        | GStreamer      | FFmpeg         |
|----------------|---------------|----------------|----------------|
| Ease of Use    | ✅ Very Easy  | ❌ Complex     | ⚠️ Moderate    |
| Real-Time Use  | ✅ Yes        | ✅ Yes         | ❌ No          |
| Performance    | ⚠️ Medium    | ✅ High        | ✅ High        |
| Ideal For      | Prototypes    | Production RT  | Offline Jobs   |

---

## 📬 **2. Message Broker**

| **Tool**        | **Description** |
|------------------|-----------------|
| **Apache Kafka** | Distributed streaming platform. Great for event-driven systems and analytics. High throughput. |
| **RabbitMQ**     | Traditional message queue. Simple, great for microservices and task-based workflows. |

### 🔁 Comparison

| Feature            | Apache Kafka       | RabbitMQ           |
|--------------------|---------------------|---------------------|
| Messaging Model    | Log-based pub/sub   | Queue-based         |
| Durability         | ✅ Very High        | ⚠️ Configurable     |
| Throughput         | ✅ High             | ⚠️ Medium           |
| Ordering           | ✅ Partition-based  | ✅ Queue-based      |
| Use Case           | Streaming pipelines | Microservice tasks  |

---

## 🧠 **3. Detection Service**

| **Tool**   | **Description** |
|------------|-----------------|
| **YOLOv8** | Real-time object detection model family from Ultralytics. The provided checkpoint detects `hand`, `person`, `pizza`, and `scooper`. |
| **PyTorch** | Deep learning framework for building and deploying models like YOLOv8. |
| **FastAPI** | Lightweight async web framework to wrap detection models as APIs. |

### 🔁 Comparison

| Feature        | YOLOv8         | PyTorch         | FastAPI           |
|----------------|----------------|------------------|--------------------|
| Role           | Pretrained model | ML framework    | API interface      |
| Flexibility    | ⚠️ Limited     | ✅ High         | ✅ High           |
| Speed          | ✅ Real-Time   | Depends on model | ✅ High (async)   |
| Best For       | Inference      | Model training   | Serving models     |

---

## 📡 **4. Streaming Service**

| **Tool**    | **Description** |
|-------------|-----------------|
| **WebSocket** | Full-duplex real-time communication. Ideal for metadata or alert streaming. |
| **MJPEG**     | Streams JPEG frames over HTTP. Easy to implement but bandwidth heavy. |
| **WebRTC**    | Peer-to-peer real-time video/audio streaming. Great for ultra-low latency. |
| **REST (FastAPI)** | Simple endpoints to fetch snapshots or metadata. Not ideal for live video. |

### 🔁 Comparison

| Feature        | WebSocket       | MJPEG           | WebRTC           | REST             |
|----------------|------------------|------------------|------------------|------------------|
| Real-Time      | ✅ Yes          | ⚠️ Limited      | ✅ Ultra-Low     | ❌ No            |
| Audio Support  | ❌ No           | ❌ No           | ✅ Yes           | ❌ No            |
| Complexity     | ✅ Easy         | ✅ Easy         | ❌ High          | ✅ Easy          |
| Best Use Case  | Metadata/events | Simple streaming| Live media       | Snapshot access  |

---

## 🎨 **5. Frontend**

| **Tool**      | **Description** |
|---------------|-----------------|
| **React.js**  | Robust, component-based UI library. Suitable for complex dashboards. |
| **Vue.js**    | Lightweight framework with reactive design. Easier to learn than React. |
| **HTML/JS**   | Vanilla web development. Good for quick prototypes, not scalable. |
| **Streamlit** ⚠️ | Great for ML prototypes, but not production-ready for custom UIs. |

### 🔁 Comparison

| Feature        | React.js        | Vue.js          | HTML/JS          | Streamlit ⚠️     |
|----------------|------------------|------------------|------------------|------------------|
| Scalability    | ✅ Excellent     | ✅ Good         | ❌ Low           | ❌ Low           |
| Developer UX   | ⚠️ Moderate     | ✅ Easy         | ✅ Easy          | ✅ Very Easy     |
| Interactivity  | ✅ High         | ✅ High         | ⚠️ Moderate     | ⚠️ Low           |
| Best Use Case  | Full apps        | Lightweight apps| Small projects   | Demos only       |

---

## ✅ Summary Table

| **Layer**          | **Tool A**        | **Tool B**       | **Tool C**         | **Best For**                                      |
|--------------------|-------------------|------------------|---------------------|---------------------------------------------------|
| Frame Reader       | OpenCV            | GStreamer        | FFmpeg              | OpenCV (ease), GStreamer (prod), FFmpeg (offline) |
| Message Broker     | Kafka             | RabbitMQ         | —                   | Kafka (streaming), RabbitMQ (microservices)       |
| Detection Service  | YOLOv8            | PyTorch          | FastAPI             | All used together for detection API               |
| Streaming Service  | WebSocket         | MJPEG            | WebRTC              | WebRTC (media), WS (metadata), MJPEG (simple)     |
| Frontend           | React.js          | Vue.js           | HTML/JS             | React (scalable), Vue (lightweight), HTML (quick) |

---


# **📬 Message Broker Usage: Apache Kafka vs RabbitMQ**


## What is a Message Broker?
A **message broker** is a system that enables communication between services by **receiving, storing, and forwarding messages**. It helps in building scalable and decoupled systems.

---

## 🔁 1. Decoupling Services
- Producers send messages without knowing the consumers.
- Consumers can subscribe and process messages independently.
- Enables modular and flexible architecture.

---

## 📥📤 2. Asynchronous Communication
- Services communicate without waiting for each other.
- Useful for background processing (e.g., sending emails, generating reports).

---

## 📊 3. Data Streaming and Real-Time Processing
- Ideal for handling continuous data flows.
- **Kafka** excels in real-time analytics, logs, and metrics pipelines.

---

## 🛡️ 4. Reliability and Fault Tolerance
- Messages are stored and acknowledged.
- Helps prevent **data loss** if consumers fail or restart, provided messages are persisted and acknowledged only after processing.

---

## ⏳ 5. Load Buffering
- Handles sudden traffic spikes.
- Queues messages until consumers are ready to process them.

---

## 🔁 6. Retry and Dead Letter Queues
- Failed messages can be retried or sent to a dead-letter queue.
- Helps with debugging and error tracking.
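
The retry and dead-letter pattern can be illustrated in-process with a plain queue. This is only a sketch of the idea, not broker configuration: real brokers implement it on the server side, and the function name and `max_attempts` default here are made up for the example.

```python
from collections import deque


def process_with_retries(messages, handler, max_attempts: int = 3):
    """Run handler on each message, retrying failures up to max_attempts.

    Returns the dead letters: (message, error description) pairs for
    messages that failed on every attempt.
    """
    queue = deque((msg, 1) for msg in messages)
    dead_letters = []
    while queue:
        msg, attempt = queue.popleft()
        try:
            handler(msg)
        except Exception as exc:
            if attempt < max_attempts:
                queue.append((msg, attempt + 1))  # requeue for another try
            else:
                dead_letters.append((msg, repr(exc)))  # give up, keep for inspection
    return dead_letters
```

In this project a dead letter might be a frame that cannot be decoded; parking it aside keeps one corrupt frame from blocking the rest of the stream.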

---

## ✉️ 7. Event-Driven Architectures
- Enables **event sourcing** and **reactive systems**.
- Example: "User registered" event triggers welcome email, etc.

---

## 🧩 Kafka vs RabbitMQ Comparison

| Feature               | **Apache Kafka**                        | **RabbitMQ**                             |
|-----------------------|------------------------------------------|-------------------------------------------|
| Message Model         | Pub/Sub (log-based)                      | Queue-based (message routing)             |
| Ordering              | Strong within partitions                 | FIFO in queues (if configured)            |
| Performance           | High throughput, scalable                | Lower latency, good for real-time tasks   |
| Persistence           | Messages stored on disk (log)            | Optional (persistent or transient queues) |
| Use Case Focus        | Streaming, big data, analytics           | Task queues, short-lived jobs             |
| Message Routing       | Topic-based (via partitions)             | Exchange types (direct, topic, fanout)    |

---

## 🛠 Use Case Examples

### Apache Kafka:
- Website clickstream analysis
- IoT sensor data pipelines
- Log aggregation
- Real-time analytics dashboards

### RabbitMQ:
- Email/task queues
- Real-time chat systems
- Background job processing
- Microservices task coordination

---

## ✅ Summary
Message brokers are essential in:
- Decoupling systems
- Improving scalability and resilience
- Handling real-time and asynchronous communication

Choose **Kafka** for **high-throughput streaming**, and **RabbitMQ** for **low-latency task distribution**.


# **📁 Folder Structure**


```plaintext
vision-monitoring-project/
├── README.md
├── docker-compose.yml
├── .env
├── requirements.txt
│
├── frame_reader/                         # Captures video frames
│   ├── main.py
│   ├── utils.py
│   └── config.py
│
├── detection_service/                   # YOLOv8 + logic to detect scooper, hand, pizza, violations
│   ├── app/
│   │   ├── main.py                      # FastAPI app
│   │   ├── inference.py                 # YOLOv8 wrapper
│   │   ├── violation_logic.py           # Custom rule-based logic
│   │   ├── models/
│   │   │   └── yolov8_model.pt
│   │   └── config.py
│   ├── requirements.txt
│   └── Dockerfile
│
├── message_broker/                      # Kafka or RabbitMQ configurations
│   ├── docker/
│   │   ├── kafka/
│   │   │   └── docker-compose.kafka.yml
│   │   └── rabbitmq/
│   │       └── docker-compose.rabbitmq.yml
│   └── topics/
│       └── frame-topic.txt
│
├── streaming_service/                   # WebSocket / MJPEG / WebRTC server
│   ├── app/
│   │   ├── main.py
│   │   ├── websocket_server.py
│   │   ├── streamer.py                  # MJPEG / WebRTC
│   │   └── utils.py
│   └── Dockerfile
│
├── frontend/                            # React or Vue frontend
│   ├── public/
│   ├── src/
│   │   ├── components/
│   │   ├── views/
│   │   ├── App.jsx
│   │   └── index.jsx
│   ├── package.json
│   └── vite.config.js
│
├── shared/                              # Shared libraries between services
│   ├── logger.py
│   ├── schemas.py                       # Pydantic models, data schemas
│   └── constants.py
│
├── data/                                # Optional: recorded video or images for local testing
│   ├── test_video.mp4
│   └── samples/
│
├── deployments/                         # Kubernetes / Docker Swarm / CI/CD
│   ├── k8s/
│   └── ci-cd/
│
└── tests/                               # Unit and integration tests
    ├── test_frame_reader.py
    ├── test_detection_service.py
    ├── test_streaming.py
    └── test_utils.py
```

# **🍕 Computer Vision Scooper Violation Detection — Step-by-Step Implementation Guide**



## 🧠 Overview

This guide details how to implement a computer vision system to detect hygiene violations in a pizza store, specifically when workers fail to use a scooper to handle ingredients in Regions of Interest (ROIs).

The system will be built using a **microservices-based architecture** to ensure scalability, maintainability, and real-time performance.

---

## 🔧 Prerequisites

### 📦 System Requirements

* Python 3.8+
* Docker & Docker Compose
* Git
* Node.js + npm (for frontend)

```bash
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs
```

### 📁 Dataset & Models

* Annotated dataset on Roboflow
* Pretrained YOLOv8 model
* Sample test videos

---

## 📁 Step 1: Project Initialization

### 1.1 Folder Structure

Use the folder structure shown in the Folder Structure section above.

### 1.2 Setup Git Repository

```bash
git init vision-monitoring-project
cd vision-monitoring-project
```

---

## 🎥 Step 2: Frame Reader Microservice

### 2.1 Objective

Read frames from a video file or RTSP stream and send them to a message broker.

### 2.2 Implementation

* Use OpenCV to extract frames
* Publish each frame as a binary message to Kafka or RabbitMQ

```python
# frame_reader/main.py
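import json  # message metadata
import cv2   # frame capture and JPEG encoding
import pika  # RabbitMQ client
# (Simplified: capture and publish logic omitted)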
```
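
A fuller sketch of the reader loop, assuming RabbitMQ as the broker, might look like the following. The queue name `frames`, the `target_fps` default, and the JPEG-per-message encoding are assumptions for illustration. OpenCV and pika are imported inside `run` so that the frame-skipping helper can be unit-tested without them installed.

```python
import time


def frame_stride(source_fps: float, target_fps: float) -> int:
    """How many source frames to advance per published frame (at least 1)."""
    if source_fps <= 0 or target_fps <= 0:
        return 1  # FPS unknown (common with some streams): publish every frame
    return max(1, round(source_fps / target_fps))


def run(source: str, host: str = "localhost", queue: str = "frames",
        target_fps: float = 10.0) -> None:
    """Read frames from a file path or RTSP URL and publish them as JPEG bytes."""
    import cv2
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue=queue)

    cap = cv2.VideoCapture(source)
    stride = frame_stride(cap.get(cv2.CAP_PROP_FPS), target_fps)
    frame_id = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of file or dropped stream
            if frame_id % stride == 0:
                encoded, jpeg = cv2.imencode(".jpg", frame)
                if encoded:
                    channel.basic_publish(
                        exchange="",
                        routing_key=queue,
                        body=jpeg.tobytes(),
                        properties=pika.BasicProperties(
                            headers={"frame_id": frame_id},
                            timestamp=int(time.time()),
                        ),
                    )
            frame_id += 1
    finally:
        cap.release()
        connection.close()
```

Publishing only every `stride`-th frame is one way to keep the detection service from falling behind; calling `run("data/test_video.mp4")` would replay the sample video from the folder structure.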

### 2.3 Dockerize

```dockerfile
# frame_reader/Dockerfile
FROM python:3.9
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "main.py"]
```

---

## 📨 Step 3: Message Broker Setup

### 3.1 Choose a Broker

* Kafka (higher throughput, more setup)
* RabbitMQ (easier setup)

### 3.2 Docker Compose Service

```yaml
# In docker-compose.yml
rabbitmq:
  image: rabbitmq:3-management
  ports:
    - "5672:5672"
    - "15672:15672"
```

---

## 🔍 Step 4: Detection Service

### 4.1 Objective

* Subscribe to the broker
* Run YOLOv8 inference on the frame
* Apply business logic to detect violations
* Save metadata and publish to streaming

### 4.2 Implementation

* Use Ultralytics YOLOv8 + PyTorch
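* Define logic: a violation is a hand that takes an ingredient from an ROI and places it on a pizza without a scooper. A hand merely entering the ROI (e.g., cleaning) is not a violation.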
* Return: bounding boxes, labels, timestamp, violation status

```python
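# detection_service/app/inference.py
from ultralytics import YOLO

# Load the provided weights; path assumed from the folder structure above
model = YOLO('models/yolov8_model.pt')

# Detect objects: results[0].boxes holds the boxes, class ids, and confidences
# results = model(frame)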
```
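
To turn the raw model output into the "bounding boxes, labels" part of the return value, a wrapper along these lines could be used. It is a sketch under assumptions: the output dictionary keys and the `min_conf` threshold are illustrative choices, and `ultralytics` is imported lazily so the conversion helper can be tested on plain lists.

```python
def load_model(weights_path: str):
    """Load YOLOv8 weights; imported lazily to keep the helpers dependency-free."""
    from ultralytics import YOLO
    return YOLO(weights_path)


def to_detections(xyxy, class_ids, confidences, names, min_conf: float = 0.5):
    """Convert parallel lists of boxes, class ids, and scores into dicts.

    names maps class id to label, e.g. {0: 'hand', 1: 'person', ...}.
    Detections below min_conf are dropped.
    """
    detections = []
    for box, cls_id, conf in zip(xyxy, class_ids, confidences):
        if conf < min_conf:
            continue
        detections.append({
            "label": names[int(cls_id)],
            "confidence": float(conf),
            "box": [float(v) for v in box],
        })
    return detections


def detect(model, frame, min_conf: float = 0.5):
    """Run inference on one frame and return a list of detection dicts."""
    result = model(frame, verbose=False)[0]
    boxes = result.boxes
    return to_detections(
        boxes.xyxy.tolist(),
        boxes.cls.tolist(),
        boxes.conf.tolist(),
        result.names,
        min_conf,
    )
```

Keeping `to_detections` separate from the model call makes the violation logic testable with hand-written boxes instead of real video.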

### 4.3 REST API

Use FastAPI to expose an endpoint for metadata.
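
A minimal sketch of such an endpoint is shown below. The `/violations` path, the `ViolationStore` class, and the response shape are hypothetical; FastAPI is imported inside `create_app` so the store itself can be tested without the framework installed.

```python
import threading


class ViolationStore:
    """Thread-safe in-memory record of violation events (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = []

    def add(self, event: dict) -> None:
        with self._lock:
            self._events.append(event)

    def summary(self) -> dict:
        with self._lock:
            latest = self._events[-1] if self._events else None
            return {"count": len(self._events), "latest": latest}


def create_app(store: ViolationStore):
    """Build a FastAPI app exposing the store over REST."""
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/violations")
    def get_violations():
        # Returned dict is serialized to JSON by FastAPI.
        return store.summary()

    return app
```

The detection loop would call `store.add(...)` whenever a violation is flagged, and the app returned by `create_app(store)` can be served with any ASGI server.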

### 4.4 Dockerize

Create Dockerfile and include all dependencies.

---

## 🌐 Step 5: Streaming Service

### 5.1 Objective

* Send annotated video stream to frontend
* Provide REST API for metadata

### 5.2 Implementation Options

* MJPEG (simplest)
* WebSocket (recommended)
* WebRTC (lowest latency, most complex to set up)

Use FastAPI or Flask for both WebSocket & REST endpoints.
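
For the MJPEG option, the stream is a multipart HTTP response in which each part is one JPEG image. The sketch below builds those parts from already-encoded JPEG bytes; the boundary token `frame` is an arbitrary choice, and the frame source is assumed to be any iterable of JPEG byte strings.

```python
BOUNDARY = "frame"  # arbitrary token; must match the Content-Type header
MEDIA_TYPE = f"multipart/x-mixed-replace; boundary={BOUNDARY}"


def mjpeg_part(jpeg_bytes: bytes) -> bytes:
    """Wrap one JPEG image as a multipart chunk."""
    header = (
        f"--{BOUNDARY}\r\n"
        "Content-Type: image/jpeg\r\n"
        f"Content-Length: {len(jpeg_bytes)}\r\n\r\n"
    ).encode("ascii")
    return header + jpeg_bytes + b"\r\n"


def mjpeg_stream(frames):
    """Yield multipart chunks for an iterable of JPEG-encoded frames."""
    for jpeg in frames:
        yield mjpeg_part(jpeg)
```

With FastAPI, the generator can be returned as `StreamingResponse(mjpeg_stream(frames), media_type=MEDIA_TYPE)` and displayed in the browser with a plain `<img>` tag, while violation metadata travels separately over a WebSocket.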

---

## 💻 Step 6: Frontend

### 6.1 Objective

* Display live video with bounding boxes
* Highlight ROIs
* Show violation count and alerts

### 6.2 Tools

* React.js (preferred)
* Socket.IO or WebSocket client

### 6.3 Sample UI Features

* Bounding boxes
* Violation alerts in red
* Metadata panel

---

## 🧪 Step 7: Testing

### 7.1 Unit Testing

* Test frame extraction, detection logic

### 7.2 Integration Testing

* Run all services together with test video
* Confirm correct detection and alerts

### 7.3 Manual Testing

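* Use the sample test videos and compare the result against the expected violation counts:

  * `Sah w b3dha ghalt.mp4` (1 violation)
  * `Sah w b3dha ghalt (2).mp4` (2 violations)
  * `Sah w b3dha ghalt (3).mp4` (1 violation)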

---

## 🚀 Step 8: Deployment

### 8.1 Docker Compose

Add all services to `docker-compose.yml`

```yaml
version: '3.8'
services:
  frame_reader:
    build: ./frame_reader
  detection_service:
    build: ./detection_service
  streaming_service:
    build: ./streaming_service
  frontend:
    build: ./frontend
  rabbitmq:
    image: rabbitmq:3-management
```

### 8.2 Run All Services

```bash
docker-compose up --build
```

---

## ✅ Deliverables

* Functional microservices-based system
* Live frontend showing bounding boxes, ROIs, and violation count
* REST API for metadata
* Source code with documentation
* Recorded video of system working

---

## 🌟 Bonus Points

* Add Grafana + Prometheus for monitoring
* Log violations to a PostgreSQL/SQLite DB
* Use Redis for frame caching
* Add authentication to frontend/backend

