<h1 style="text-align: center">Real-Time Object Detection with YOLO and RealSense Depth Camera</h1>

<h2>Introduction</h2>
<p style="text-align: justify">This project implements real-time object detection and segmentation using the YOLO (You Only Look Once) model, integrated with a RealSense camera. The script captures video frames from the RealSense camera, applies object detection, overlays segmentation masks, and visualizes the results in real time.</p>
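
The capture, detect, overlay, and display steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function name `run_detection_loop`, the `yolov8n-seg.pt` checkpoint, and the 640x480 at 30 FPS stream settings are assumptions, and the built-in `plot()` method of the Ultralytics result object stands in for the project's own overlay and box-drawing functions.

```python
def run_detection_loop(weights="yolov8n-seg.pt", width=640, height=480, fps=30):
    """Capture colour frames, run YOLO segmentation, and display the result.

    Hypothetical helper for illustration; press 'q' in the window to stop.
    """
    # Imported lazily so the module can be loaded on a machine without a camera.
    import cv2
    import numpy as np
    import pyrealsense2 as rs
    from ultralytics import YOLO

    model = YOLO(weights)  # assumed checkpoint; any segmentation weights should work

    # Configure and start the colour stream (resolution and rate are assumptions).
    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.color, width, height, rs.format.bgr8, fps)
    pipeline.start(config)

    try:
        while True:
            frames = pipeline.wait_for_frames()
            color_frame = frames.get_color_frame()
            if not color_frame:
                continue  # skip incomplete framesets

            # Convert the RealSense frame to a BGR NumPy image for YOLO and OpenCV.
            image = np.asanyarray(color_frame.get_data())

            # Run inference and let Ultralytics draw boxes and masks.
            results = model(image, verbose=False)
            annotated = results[0].plot()

            cv2.imshow("YOLO + RealSense", annotated)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        # Always release the camera and close windows, even on errors.
        pipeline.stop()
        cv2.destroyAllWindows()
```

Calling `run_detection_loop()` requires a connected RealSense camera and the `pyrealsense2`, `ultralytics`, `opencv-python`, and `numpy` packages.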

<h3>Object Detection</h3>
<table width="100%">
    <tr>
        <td width="50%" bgcolor="#ffffff">
            <p style="text-align: justify">Object detection is a computer vision technique that identifies and locates objects within an image or video frame. This process involves both classifying objects and determining their positions, typically marked with bounding boxes.</p>
        </td>
        <td width="50%" bgcolor="#ffffff">
            <img src="img/ObjDet.jpg" alt="Object detection illustration" />
        </td>
    </tr>
</table>
<hr>
<footer>
    <p style="text-align: right" ><a href="https://www.freepik.com/free-vector/businessman-technology-measuring-eye-position-movement-tiny-people-eye-tracking-technology-gaze-tracking-eye-position-sensor-concept_11669265.htm#query=object%20recognition&position=20&from_view=keyword&track=ais&uuid=ea96fe7c-a24b-456f-b3da-470e6a4a764e#position=20&query=object%20recognition">Image source: freepik.com</a></p>
</footer>
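
To make the idea of a labelled bounding box concrete, here is a small sketch of how a single detection might be represented, together with intersection over union (IoU), a standard measure of how well two boxes overlap. The `Detection` class and the `(x1, y1, x2, y2)` pixel-corner convention are illustrative assumptions, not part of this project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: what it is, how confident, and where (hypothetical type)."""
    label: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2) in pixels, top-left and bottom-right corners


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes, in the range [0, 1]."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


predicted = Detection("cup", 0.91, (0, 0, 10, 10))
ground_truth = Detection("cup", 1.0, (5, 5, 15, 15))
overlap = iou(predicted.box, ground_truth.box)  # 25 / 175, roughly 0.143
```

An IoU of 1.0 means the boxes coincide exactly, and 0.0 means they do not overlap at all.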

<h3>RealSense</h3>
<table width="100%">
    <tr>
        <td width="50%" bgcolor="#ffffff">
            <p style="text-align: justify">RealSense is a family of depth-sensing cameras developed by Intel. The cameras capture 3D spatial data and enable depth perception in applications ranging from virtual and augmented reality to robotics and gesture recognition.</p>
        </td>
        <td width="50%" bgcolor="#ffffff">
            <img src="img/d415_front.png" alt="RealSense D415 Depth Camera" />
        </td>
    </tr>
</table>
<hr>
<footer>
    <p style="text-align: right"><a href="https://www.intelrealsense.com/depth-camera-d415/">Source: intelrealsense.com</a></p>
</footer>
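
As an illustration of what depth perception adds, the sketch below converts a pixel and its measured depth into a 3D point using the ideal pinhole camera model. The `Intrinsics` class and the numeric values are made-up assumptions for the example; a real camera reports its own calibration, lens distortion is ignored here, and `pyrealsense2` provides `rs2_deproject_pixel_to_point` for the same purpose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera parameters in pixels (hypothetical container, made-up values)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def deproject(u, v, depth_m, intr):
    """Map pixel (u, v) with depth in metres to a camera-frame point (x, y, z)."""
    x = (u - intr.cx) * depth_m / intr.fx
    y = (v - intr.cy) * depth_m / intr.fy
    return (x, y, depth_m)


intr = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
centre_point = deproject(320, 240, 1.5, intr)  # on the optical axis: (0.0, 0.0, 1.5)
```

Combined with a detector, this kind of conversion lets a bounding-box centre be turned into an approximate position in metres relative to the camera.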


<h3>Industrial Use Cases</h3>
<table width="100%">
    <tr>
        <td width="50%" bgcolor="#ffffff">
            <ul style="text-align: left">
                <li>Quality Control</li>
                <li>Inventory Management</li>
                <li>Safety Compliance Monitoring</li>
                <li>Automation in Manufacturing</li>
                <li>Robotics and Automated Guided Vehicles (AGVs)</li>
                <li>Agricultural Automation</li>
                <li>Surveillance and Security</li>
                <li>Pharmaceuticals and Healthcare</li>
                <li>Food and Beverage Industry</li>
                <li>Mining and Construction</li>
            </ul>
        </td>
        <td width="50%" bgcolor="#ffffff">
            <img src="img/d415_front.png" alt="RealSense D415 Depth Camera" />
        </td>
    </tr>
</table>
<hr>
<footer>
    <p style="text-align: right"><a href="https://www.intelrealsense.com/depth-camera-d415/">Source: intelrealsense.com</a></p>
</footer>

## Machine Learning Models

Machine learning object detection models identify and locate objects in images or videos. They use algorithms, typically based on convolutional neural networks, to classify objects and pinpoint their positions with bounding boxes. Popular models like YOLO, SSD, and Faster R-CNN vary in speed and accuracy, and are essential in applications like autonomous driving, surveillance, and augmented reality.

<h3>Platforms</h3>

### Model Comparison
<p style="text-align: justify">Faster R-CNN, YOLO, and SSD are three popular object detection systems that use deep learning to locate and classify objects in images. They differ in architecture, speed, and accuracy. Here is a brief comparison of their main features:</p>

<dl>
  <dt>Faster R-CNN:</dt>
  <dd style="text-align: justify">This system consists of two modules: a region proposal network (RPN) that generates candidate regions of interest (RoIs), and a Fast R-CNN detector that classifies and refines the RoIs. Faster R-CNN is accurate and robust, but it is slower than the other two systems because every image passes through two stages: region proposal, then per-region classification and box refinement.</dd>

  <dt>YOLO:</dt>
  <dd style="text-align: justify">This system divides the input image into a grid of cells and predicts bounding boxes and class probabilities for each cell. YOLO is fast and efficient because it performs object detection in a single pass through the network. However, the original design may struggle with small or overlapping objects, since each cell can predict only a limited number of bounding boxes.</dd>

  <dt>SSD:</dt>
  <dd style="text-align: justify">This system also performs object detection in a single pass, but it uses multiple feature maps of different resolutions to generate bounding boxes and class probabilities. SSD is faster than Faster R-CNN and more accurate than the original YOLO, because its multi-scale feature maps let it detect objects of various sizes and shapes. However, it may still miss some small or occluded objects, as it relies on a fixed set of box scales and aspect ratios.</dd>
</dl>
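
The grid idea behind YOLO can be illustrated with a short sketch that finds which cell is responsible for an object, namely the cell containing the centre of its bounding box. The 7x7 grid and 448x448 input follow the original YOLO formulation; the function name `responsible_cell` is hypothetical, and later versions such as YOLOv8 organise their predictions differently.

```python
def responsible_cell(box, image_w, image_h, grid_size=7):
    """Return (row, col) of the grid cell containing the centre of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    centre_x = (x1 + x2) / 2
    centre_y = (y1 + y2) / 2

    # Scale the centre to grid units and clamp so points on the far edge stay in range.
    col = min(int(centre_x / image_w * grid_size), grid_size - 1)
    row = min(int(centre_y / image_h * grid_size), grid_size - 1)
    return row, col


cell = responsible_cell((100, 200, 160, 260), image_w=448, image_h=448)  # (3, 2)
```

Because each cell predicts only a few boxes, two objects whose centres fall in the same cell compete for the same predictions, which is one source of the difficulty with overlapping objects noted above.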
<p style="text-align: justify">In summary, Faster R-CNN suits applications that need high accuracy and can tolerate lower frame rates, such as medical image analysis. YOLO suits applications that need real-time performance and can tolerate some missed or imprecise detections, such as video surveillance or sports analysis. SSD is a compromise between speed and accuracy and can serve general-purpose object detection tasks.</p>
<hr>
<footer>
<p style="text-align: right"><a href="https://medium.com/ibm-data-ai/faster-r-cnn-vs-yolo-vs-ssd-object-detection-algorithms-18badb0e02dc">Source: medium.com</a></p>
</footer>
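
The fixed scales and aspect ratios that SSD relies on can be made concrete with a small sketch. It follows the scale rule from the SSD paper, where scales are spaced evenly between a minimum and a maximum across the feature maps; the function names, the six feature maps, and the three aspect ratios used here are illustrative assumptions rather than a complete SSD configuration.

```python
import math


def ssd_scales(num_maps, s_min=0.2, s_max=0.9):
    """Evenly spaced box scales (fractions of image size), one per feature map."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_box_shapes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(width, height) of default boxes for one scale; each box keeps area scale**2."""
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar)) for ar in aspect_ratios]


scales = ssd_scales(6)                   # fine maps get small boxes, coarse maps large ones
shapes = default_box_shapes(scales[0])   # square, wide, and tall boxes at the smallest scale
```

An object whose size or shape falls between these presets has to be matched by regressing offsets from the nearest default box, which is why very small or unusually shaped objects can be missed.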

### YOLOv8


#### Different functionalities

#### Training

#### COCO training set

#### Pretrained models

#### Prediction

## Implementation


### Prerequisites

### Overlay Function

### Plot Box Function

### Camera Feed Preparation

### Loading Model

### Main Loop

### Clean-up and Resource Management

## Usage