In [None]:
## Akhilesh Pant (AU FTCA: MCA)


In [14]:
import cv2
import numpy as np
import mediapipe as mp
from ultralytics import YOLO

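def detect_gaze(landmarks):
    """Label gaze from four eye-corner points given as (x, y) pixel tuples.

    Note: this compares the x positions of the two eye centers, not the
    iris position, so for an upright face the first branch is taken
    almost every time.
    """
    left_eye = landmarks[0:2]   # corner points passed first (indices 33, 133)
    right_eye = landmarks[2:4]  # corner points passed last (indices 362, 263)
    # Midpoint of each pair of corners, truncated to integer pixels
    left_center = np.mean(left_eye, axis=0).astype("int")
    right_center = np.mean(right_eye, axis=0).astype("int")

    if left_center[0] < right_center[0]:
        return "Looking Right"
    elif left_center[0] > right_center[0]:
        return "Looking Left"
    else:
        # Reached only when both x values are exactly equal
        return "Looking Straight"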

def main():
    cap = cv2.VideoCapture(0)
    mp_face_mesh = mp.solutions.face_mesh
    face_mesh = mp_face_mesh.FaceMesh()
    model = YOLO("yolov8n.pt")  # Ensure this model file is available

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        # Object Detection
        results = model(frame)
        if results and results[0].boxes is not None:
            for box in results[0].boxes:
                x1, y1, x2, y2 = map(int, box.xyxy[0])
                conf = box.conf[0]
                cls = int(box.cls[0])
                cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
                cv2.putText(frame, f"{model.names[cls]} {conf:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)

        # Gaze Detection
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = face_mesh.process(rgb_frame)

        if results.multi_face_landmarks:
            for face_landmarks in results.multi_face_landmarks:
                landmarks = []
                for idx in [33, 133, 362, 263]:  # Eye landmarks
                    x = int(face_landmarks.landmark[idx].x * frame.shape[1])
                    y = int(face_landmarks.landmark[idx].y * frame.shape[0])
                    landmarks.append((x, y))
                    cv2.circle(frame, (x, y), 2, (0, 255, 0), -1)

                gaze_direction = detect_gaze(landmarks)
                cv2.putText(frame, gaze_direction, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

        cv2.imshow("Defense AI - Object & Gaze Detection", frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()



0: 480x640 1 person, 648.7ms
Speed: 155.2ms preprocess, 648.7ms inference, 79.1ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 1 person, 192.2ms
Speed: 3.3ms preprocess, 192.2ms inference, 2.7ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 1 person, 201.4ms
Speed: 3.3ms preprocess, 201.4ms inference, 3.4ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 1 person, 222.8ms
Speed: 3.4ms preprocess, 222.8ms inference, 2.5ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 1 person, 202.8ms
Speed: 3.7ms preprocess, 202.8ms inference, 2.7ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 1 person, 211.1ms
Speed: 2.7ms preprocess, 211.1ms inference, 6.6ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 1 person, 258.2ms
Speed: 4.4ms preprocess, 258.2ms inference, 2.7ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 1 person, 262.1ms
Speed: 3.5ms preprocess, 262.1ms inference, 3.7ms postprocess per image

## Explanation

Here's an explanation of the code **line by line**, covering the concepts behind each part:

---

## ✅ **Imports**

```python
import cv2
import numpy as np
import mediapipe as mp
from ultralytics import YOLO
```

- **`cv2`**: OpenCV library used for real-time computer vision tasks like video capturing and image processing.  
- **`numpy`**: Used for numerical operations, especially for array and mathematical operations.  
- **`mediapipe`**: A framework by Google for building multimodal machine learning pipelines (e.g., face, hand, and body detection).  
- **`YOLO`**: The model class from the Ultralytics package, used to load and run YOLO (You Only Look Once) models for real-time object detection.

---

## ✅ **Function: Gaze Detection**

```python
def detect_gaze(landmarks):
    left_eye = landmarks[0:2]
    right_eye = landmarks[2:4]
```
- The function **`detect_gaze`** takes a list of 4 eye landmarks.  
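- **`left_eye`**: The first two landmarks (indices 33 and 133 in the caller). In MediaPipe's numbering these are the corners of the subject's right eye, which appears on the left side of the image.  
- **`right_eye`**: The last two landmarks (indices 362 and 263 in the caller), the corners of the subject's left eye, which appears on the right side of the image.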

---

```python
    left_center = np.mean(left_eye, axis=0).astype("int")
    right_center = np.mean(right_eye, axis=0).astype("int")
```
- **`np.mean()`**: Calculates the mean (average) position of the eye landmarks to get the **center of the eyes**.  
- **`astype("int")`**: Converts the result into integers for drawing and comparison.
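
To see what these two lines compute, here is a small standalone sketch with made-up corner coordinates (the pixel values are illustrative, not taken from a real frame). Note that `astype("int")` drops the fractional part rather than rounding.

```python
import numpy as np

# Two hypothetical eye-corner points as (x, y) pixel coordinates
eye_corners = [(101, 200), (140, 205)]

# axis=0 averages column-wise: x values together, y values together
center_float = np.mean(eye_corners, axis=0)
center = center_float.astype("int")  # fractional part is dropped

print(center_float.tolist())  # [120.5, 202.5]
print(center.tolist())        # [120, 202]
```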

---

```python
    if left_center[0] < right_center[0]:
        return "Looking Right"
    elif left_center[0] > right_center[0]:
        return "Looking Left"
    else:
        return "Looking Straight"
```
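- Compares the **horizontal (X-axis)** positions of the two eye centers.  
- If the first eye's center lies to the left of the second's in the image, the function returns **"Looking Right"**; if it lies to the right, it returns **"Looking Left"**.  
- **"Looking Straight"** is returned only when the two X-coordinates are exactly equal, because the code tests equality rather than closeness.  
- Because the first eye normally sits to the left of the second in the image, this test yields **"Looking Right"** almost every time, regardless of where the eyes point. It does not measure pupil or iris position.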
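
A common alternative is to measure where the iris sits between the two corners of a single eye. The sketch below is not part of the original code: it assumes an iris-center x coordinate is available (for example from MediaPipe Face Mesh created with `refine_landmarks=True`, which adds iris landmarks), and the function name `gaze_from_iris` and the 0.4/0.6 thresholds are hypothetical and would need tuning.

```python
def gaze_from_iris(corner_a_x, corner_b_x, iris_x, low=0.4, high=0.6):
    """Classify horizontal gaze from one eye's corner and iris x positions.

    corner_a_x is the corner nearer the image's left edge, corner_b_x the
    other corner. Labels are in image terms; low/high are assumed thresholds.
    """
    width = corner_b_x - corner_a_x
    if width <= 0:
        return "Unknown"  # degenerate or swapped corners
    ratio = (iris_x - corner_a_x) / width  # 0.0 = at corner A, 1.0 = at corner B
    if ratio < low:
        return "Looking Left"
    if ratio > high:
        return "Looking Right"
    return "Looking Straight"


print(gaze_from_iris(100, 140, 120))  # Looking Straight
print(gaze_from_iris(100, 140, 108))  # Looking Left
print(gaze_from_iris(100, 140, 135))  # Looking Right
```

Whether "left" in the image corresponds to the subject's left depends on whether the camera feed is mirrored, so the labels may need swapping for a given setup.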

---

## ✅ **Main Function for Detection**

```python
def main():
    cap = cv2.VideoCapture(0)
```
- Initializes the webcam using **`cv2.VideoCapture(0)`**, where `0` refers to the default camera.

---

### 🔵 **Face Mesh Initialization**

```python
    mp_face_mesh = mp.solutions.face_mesh
    face_mesh = mp_face_mesh.FaceMesh()
```
- **`mp.solutions.face_mesh`**: Loads the **MediaPipe Face Mesh** solution.  
- **`FaceMesh()`**: Initializes the face mesh model for detecting facial landmarks.

---

### 🔵 **YOLO Model Initialization**

```python
    model = YOLO("yolov8n.pt") 
```
- Loads the **YOLOv8n (nano) model** for object detection.  
- Expects the weights file `yolov8n.pt`; Ultralytics typically downloads it automatically on first use if it is not found locally.

---

## ✅ **Main Loop for Live Detection**

```python
    while True:
        ret, frame = cap.read()
        if not ret:
            break
```
- Continuously captures frames from the webcam.  
- **`ret`**: Boolean indicating if the frame was successfully captured.  
- **`frame`**: The actual image frame.  
- **`break`**: Exits the loop if the frame isn't captured.

---

### 🔵 **Object Detection with YOLO**

```python
        results = model(frame)
```
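- Runs the YOLO model on the current **`frame`** to detect objects. The call returns a list with one result object per input image, which is why the following code indexes `results[0]`.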
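
The timing lines that Ultralytics prints for each frame give a rough idea of throughput. The sketch below plugs in one set of values from the output above (3.3 ms preprocess, 192.2 ms inference, 2.7 ms postprocess). The helper name `estimate_fps` is hypothetical, and the result is an upper bound, since Face Mesh processing and drawing add further time per frame.

```python
def estimate_fps(preprocess_ms, inference_ms, postprocess_ms):
    """Upper-bound frames per second from per-frame YOLO timings in ms."""
    total_ms = preprocess_ms + inference_ms + postprocess_ms
    return 1000.0 / total_ms


fps = estimate_fps(3.3, 192.2, 2.7)
print(f"{fps:.1f} FPS")  # 5.0 FPS
```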

---

```python
        if results and results[0].boxes is not None:
            for box in results[0].boxes:
```
- Checks if any detection **`results`** exist and if bounding **`boxes`** are present.

---

```python
                x1, y1, x2, y2 = map(int, box.xyxy[0])
```
- Extracts the **bounding box coordinates** (`x1`, `y1` for the top-left and `x2`, `y2` for the bottom-right) and converts them to integers.

---

```python
                conf = box.conf[0]
                cls = int(box.cls[0])
```
- **`conf`**: Confidence score of the detected object.  
- **`cls`**: Detected object's class index.

---

```python
                cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
                cv2.putText(frame, f"{model.names[cls]} {conf:.2f}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)
```
- Draws a **blue rectangle** around the detected object.  
- Displays the **class label** and **confidence score** above the rectangle.
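
One edge case: when a box touches the top of the frame, `y1 - 10` becomes negative and the label is drawn outside the image. A minimal sketch of one way to handle this, using a hypothetical helper `label_origin` and an assumed text height of about 20 pixels for this font scale:

```python
def label_origin(x1, y1, text_height=20, margin=10):
    """Return a bottom-left text origin for a box whose top-left is (x1, y1).

    Prefers a position above the box; falls back to just inside the box when
    the label would be cut off by the top edge of the frame.
    """
    y = y1 - margin
    if y < text_height:
        y = y1 + text_height + margin
    return max(x1, 0), y


print(label_origin(50, 100))  # (50, 90)
print(label_origin(50, 5))    # (50, 35)
```

The returned tuple could be passed to `cv2.putText` in place of `(x1, y1 - 10)`.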

---

### 🔵 **Gaze Detection Using MediaPipe**

```python
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = face_mesh.process(rgb_frame)
```
- Converts the **BGR frame to RGB** because MediaPipe requires RGB format.  
- Processes the frame to detect face landmarks.

---

```python
        if results.multi_face_landmarks:
            for face_landmarks in results.multi_face_landmarks:
```
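- Checks whether **at least one face** was found: `multi_face_landmarks` is `None` when no face is detected. With the default `FaceMesh()` settings at most one face is tracked, so the loop body usually runs once per frame.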

---

```python
                landmarks = []
                for idx in [33, 133, 362, 263]:
```
- Selects four **landmark indices** from MediaPipe's Face Mesh numbering: 33 and 133 are the corners of one eye, 362 and 263 the corners of the other.

---

```python
                    x = int(face_landmarks.landmark[idx].x * frame.shape[1])
                    y = int(face_landmarks.landmark[idx].y * frame.shape[0])
                    landmarks.append((x, y))
```
- Converts normalized landmark coordinates to **pixel coordinates**: `x` is multiplied by the frame width (`frame.shape[1]`) and `y` by the frame height (`frame.shape[0]`).
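
The same conversion can be written as a small standalone function. This is a sketch, not the original code: the name `to_pixel` is hypothetical, and the clamping step is an added assumption to guard against landmarks that may fall slightly outside the 0 to 1 range when a face is near the frame edge.

```python
def to_pixel(x_norm, y_norm, frame_width, frame_height):
    """Map normalized coordinates to integer pixel coordinates, clamped to the frame."""
    x = int(x_norm * frame_width)
    y = int(y_norm * frame_height)
    x = min(max(x, 0), frame_width - 1)
    y = min(max(y, 0), frame_height - 1)
    return x, y


# For a 640x480 frame, frame.shape is (480, 640, 3): height first, then width
print(to_pixel(0.5, 0.25, 640, 480))  # (320, 120)
print(to_pixel(1.02, 0.5, 640, 480))  # (639, 240)
```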

---

```python
                    cv2.circle(frame, (x, y), 2, (0, 255, 0), -1)
```
- Draws **green circles** at the detected eye landmarks for visualization.

---

```python
                gaze_direction = detect_gaze(landmarks)
                cv2.putText(frame, gaze_direction, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
```
- Calls the **`detect_gaze`** function to determine the gaze direction.  
- Displays the gaze direction on the screen.

---

### 🔵 **Display the Frame**

```python
        cv2.imshow("Defense AI - Object & Gaze Detection", frame)
```
- Shows the final frame with object detection and gaze direction in a window.

---

### 🔵 **Exit Condition**

```python
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
```
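- `cv2.waitKey(1)` waits up to 1 ms for a key press and returns its code, or -1 if no key was pressed. Masking with `0xFF` keeps only the low byte; if that equals `ord('q')`, the loop exits.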
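
The effect of the `& 0xFF` mask can be checked without a camera. In the sketch below, `is_quit_key` is a hypothetical helper, and the value with extra high bits is a made-up example of what some platforms may return.

```python
def is_quit_key(raw_key, quit_char="q"):
    """Return True if the low byte of a waitKey-style return value matches quit_char."""
    return (raw_key & 0xFF) == ord(quit_char)


print(is_quit_key(ord("q")))             # True
print(is_quit_key(0x100000 | ord("q")))  # True: high bits are masked off
print(is_quit_key(-1))                   # False: -1 & 0xFF is 255
```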

---

## ✅ **Release Resources**

```python
    cap.release()
    cv2.destroyAllWindows()
```
- Releases the webcam resource.  
- Closes all OpenCV windows.

---

## ✅ **Script Execution**

```python
if __name__ == "__main__":
    main()
```
- Ensures that the **`main()`** function runs **only if the script is executed** directly, not when the file is imported as a module.

---

### 🔥 **Key Concepts Covered**:
- **Real-Time Object Detection** with YOLO.  
- **Facial Landmark Detection** with MediaPipe.  
- **Gaze Detection Logic** by comparing eye landmark positions.  
- **Drawing and Annotating** images with OpenCV.  
- **Loop Control** for real-time video processing: read a frame, annotate it, display it, and exit on `q` or when capture fails.

---

This code combines **object detection** and **gaze direction detection** in a single webcam loop, for potential defense or surveillance applications.