## Edit to fill this

__Name__: Sulem Bakrawala

__Email id__: sulembakrawala123@gmail.com


To start, go to the File tab and create a copy of this notebook on your own drive.

# __Objective:__ Given the input video file, localize and draw bounding boxes around the faces of the characters.

- Candidates can use any method or platform to tackle this problem. If you are not a fan of Colab, download the video to your system using [video](https://drive.google.com/file/d/1nyeeqBJyDr2zphBDQ9ruh99JBdYm4nPH/view?usp=sharing) and upload the solution back here with your code attached.

- You are free to use any model or module, either trained by you or state-of-the-art.
- The code should be well-documented. You can also use markdown cells to describe your approach for each step.
- In case of plagiarism, the candidate will be immediately rejected. You may use helper code available online, but it must be appropriately referenced.

In [1]:
# run this to download the video file as test_video.mp4
! gdown --fuzzy https://drive.google.com/file/d/1nyeeqBJyDr2zphBDQ9ruh99JBdYm4nPH/view?usp=sharing --o test_video.m4

Downloading...
From: https://drive.google.com/uc?id=1nyeeqBJyDr2zphBDQ9ruh99JBdYm4nPH
To: /content/test_video.m4
  0% 0.00/7.54M [00:00<?, ?B/s] 63% 4.72M/7.54M [00:00<00:00, 27.6MB/s]100% 7.54M/7.54M [00:00<00:00, 41.5MB/s]


### Your approach here

In [2]:
# Install OpenCV without GUI dependencies
!pip install opencv-python-headless

import cv2
import numpy as np
from google.colab import files

# Function to perform face detection on a video and save the results
def process_video_for_face_detection(input_video_path, output_video_path):
    # Load the Haar Cascade face detection model
    face_detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

    # Open the input video file
    video_capture = cv2.VideoCapture(input_video_path)

    # Check if the video was opened successfully
    if not video_capture.isOpened():
        print("Error: Unable to open video file.")
        return

    # Retrieve video properties
    frames_per_second = video_capture.get(cv2.CAP_PROP_FPS)
    frame_width = int(video_capture.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(video_capture.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Create a VideoWriter object for saving the output video
    codec = cv2.VideoWriter_fourcc(*'XVID')
    video_writer = cv2.VideoWriter(output_video_path, codec, frames_per_second, (frame_width, frame_height))

    while video_capture.isOpened():
        success, frame = video_capture.read()
        if not success:
            break

        # Convert the frame to grayscale for face detection
        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Detect faces in the frame
        detected_faces = face_detector.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)

        # Draw rectangles around detected faces
        for (x, y, w, h) in detected_faces:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)

        # Write the modified frame to the output video
        video_writer.write(frame)

    # Release video resources
    video_capture.release()
    video_writer.release()
    print("Face detection complete. Results saved to:", output_video_path)

# Upload a video file
uploaded_files = files.upload()
input_video_file = next(iter(uploaded_files))  # Get the name of the uploaded video file
output_video_file = 'output_video.avi'  # Define the output video file name

# Execute the face detection function
process_video_for_face_detection(input_video_file, output_video_file)



Saving test_video.mp4 to test_video.mp4
Face detection complete. Results saved to: output_video.avi


In [3]:
# result video
files.download(output_video_file)


## Questions

#### __Q1:__ What is the video processing library and localization model you used?

__ANS__ The code uses OpenCV for video processing and a Haar Cascade Classifier model (`haarcascade_frontalface_default.xml`) for face localization. OpenCV handles video reading and writing, while the Haar Cascade model detects faces in each frame.
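To illustrate what a Haar cascade evaluates internally, here is a minimal pure-Python sketch of an integral image and one two-rectangle Haar-like feature. It is a simplified illustration, not OpenCV's implementation; the function names and the toy 4x4 image are assumptions made for this example.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table for a 2D list of pixel values."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return left - right


# Toy 4x4 "image" (hypothetical data): bright left half, dark right half.
toy = [[9, 9, 1, 1] for _ in range(4)]
ii = integral_image(toy)
feature_value = two_rect_feature(ii, 0, 0, 4, 4)
print(feature_value)  # 64: a strong response to the vertical edge
```

Because every rectangle sum costs only four table lookups, a cascade can evaluate many such features per window cheaply, which is one reason the detector is fast enough to run on every frame.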


#### __Q2:__ Given enough resources, time, or data, what better approach might you have implemented?

__ANS__ With sufficient resources, time, and data, a more effective approach to face detection could include:

1. Convolutional Neural Networks (CNNs): A learned face detector, which typically handles non-frontal poses, occlusion, and lighting changes better than a frontal-face Haar cascade.
2. YOLO (You Only Look Once): A single-stage, real-time object detector that can be trained or fine-tuned to detect faces.
3. SSD (Single Shot MultiBox Detector): Another single-stage detector suited to real-time face detection.
4. Data Augmentation: Techniques that artificially increase dataset size and diversity, improving model robustness.
5. Ensemble Methods: Combining the outputs of multiple models to achieve better overall performance.
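One simple way to combine the outputs of several detectors, as the ensemble item suggests, is to pool their boxes and apply non-maximum suppression. The sketch below is a minimal pure-Python illustration under assumed inputs: the `(x, y, w, h)` box format matches `detectMultiScale`, but the scores, the two detector outputs, and the 0.5 threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-scoring box in each group of overlapping boxes.

    detections: list of ((x, y, w, h), score) pairs pooled from all detectors.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not overlap a better box already kept.
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score))
    return kept


# Hypothetical outputs from two detectors run on the same frame.
detector_a = [((10, 10, 50, 50), 0.90), ((200, 40, 60, 60), 0.60)]
detector_b = [((12, 12, 50, 50), 0.80)]
merged = non_max_suppression(detector_a + detector_b)
print(merged)  # two boxes remain; the near-duplicate from detector_b is dropped
```

Note that the Haar cascade call used in this notebook returns boxes without scores, so a confidence value would have to come from detectors that report one.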

#### __Q3:__ Explain some real-life use cases of object detection or localization. If you have a project using these, also explain its problem statement.

__ANS__ I built a project on human action recognition and localization, so I briefly describe its use cases and the project itself here.

Use Cases of Human Action Recognition:
1. Surveillance: Detecting suspicious behaviors.
2. Sports Analytics: Analyzing player movements.
3. Healthcare: Monitoring patient activities.
4. Gaming: Enabling motion-based controls.
5. Robotics: Understanding human actions.
6. Virtual Assistants: Interpreting gestures.

Problem Statement: Develop a model to recognize and classify human actions (e.g., walking, running) in real-time video feeds.

Objectives:
- Detect and classify actions from video.
- Provide real-time feedback.

Approach:
- Use datasets like UCF101.
- Implement a 3D CNN or RNN architecture.
- Train the model and deploy it for real-time processing.
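
For real-time processing, a clip-based model such as a 3D CNN needs the incoming frames grouped into fixed-length windows. The following is a minimal sketch of that step, not the project's actual code; the helper name `sliding_clips`, the clip length of 16, and the stride of 8 are assumptions chosen for illustration.

```python
from collections import deque


def sliding_clips(frames, clip_len=16, stride=8):
    """Yield overlapping clips (lists of frames) of length clip_len."""
    window = deque(maxlen=clip_len)
    for index, frame in enumerate(frames):
        window.append(frame)
        # Emit once the window is full, then every `stride` frames after that.
        if len(window) == clip_len and (index + 1 - clip_len) % stride == 0:
            yield list(window)


# Integers stand in for video frames in this toy example.
clips = list(sliding_clips(range(40), clip_len=16, stride=8))
print(len(clips))                 # 4
print(clips[1][0], clips[1][-1])  # 8 23
```

Each yielded clip would then be passed to the classifier; a smaller stride gives more frequent predictions at a higher compute cost.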

Impact: Enhances security, improves sports training, and assists in healthcare monitoring.




#### __Q4:__ Briefly explain the model architecture of ResNet.

__ANS__ ResNet (Residual Network) has the following main features:

Residual Blocks: Each block stacks two or three convolutional layers and adds the block's input to their output through a skip connection. The layers therefore learn a residual F(x), and the block outputs F(x) + x, which is easier to optimize.

Skip Connections: These shortcuts improve gradient flow during training, helping to overcome the vanishing gradient problem in very deep networks.

Variable Depth: ResNet comes in various depths, such as ResNet-18, ResNet-50, and ResNet-101. The deeper variants (ResNet-50 and above) use bottleneck blocks (1x1, 3x3, and 1x1 convolutions) to reduce computation.

Pooling Layers: A max pooling layer follows the initial convolution; later stages reduce spatial dimensions mainly through strided convolutions.

Final Layers: ResNet typically concludes with global average pooling followed by a fully connected layer with softmax for classification tasks.
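
The skip connection can be sketched in a few lines of pure Python. This is a simplified illustration rather than a real ResNet layer: small dense (matrix) layers stand in for the convolutions, batch normalization is omitted, and the weights are hypothetical.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Multiply a square weight matrix (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F(x) = w2 @ relu(w1 @ x)."""
    fx = linear(w2, relu(linear(w1, x)))
    # The skip connection: add the unchanged input to the learned residual.
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

identity_out = residual_block(x, zeros, zeros)
doubled_out = residual_block(x, eye, eye)
print(identity_out)  # [1.0, 2.0, 3.0]
print(doubled_out)   # [2.0, 4.0, 6.0]
```

With all-zero weights the block simply passes its (non-negative) input through, which shows why adding more residual blocks does not have to degrade the signal: each block only needs to learn a correction on top of the identity.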