### Go through https://www.datacamp.com/blog/yolo-object-detection-explained
### Write in points what you understood. You can write as many points as you like.
### You can also refer to other blogs related to YOLO.

<h1> Introduction </h1>
<ul>
<li>YOLO is a real-time object detection algorithm. Object detection in computer vision involves identifying and localizing objects within images or videos using bounding boxes.</li>
<li>Fast R-CNN cannot be used in real time because it takes 2-3 seconds per image, whereas YOLO needs only one forward pass through the network to make the final prediction.</li>
<li>YOLO reframes object detection as a single regression problem, going straight from image pixels to bounding box coordinates and class probabilities.</li>

<li>YOLO is a fast convolutional network that predicts multiple bounding boxes and class probabilities for images. It trains on full images and optimizes detection performance, reducing latency to less than 25 milliseconds. </li>

<li>YOLO's convolutional layers are pretrained on the ImageNet 1000-class dataset, reaching 88% top-5 accuracy on the ImageNet 2012 validation set, comparable to GoogLeNet. A faster variant, Fast YOLO, uses fewer layers and filters. Training uses a sum-squared error loss function because it is easy to optimize.</li>

<li>The architecture divides an image into an SxS grid, and each grid cell is responsible for detecting objects whose center falls inside it. Each cell predicts bounding boxes with confidence scores, which indicate how likely the box is to contain an object and how accurately the box coordinates are predicted.</li>

<li>YOLO is a fast, efficient real-time system that processes images at 45 FPS and exceeds the mean Average Precision (mAP) of other real-time systems.</li>

<li>YOLO makes very few background mistakes and has a significantly higher accuracy than other state-of-the-art models.</li>
</ul>
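
<p>The grid idea above can be sketched in a few lines of Python. This is only an illustration, not the reference implementation: the function names are hypothetical, and the defaults <code>S=7</code>, <code>B=2</code>, <code>C=20</code> are assumed from the original YOLO paper's PASCAL VOC setup rather than taken from these notes.</p>

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the box center.

    cx, cy are the center coordinates in pixels. The min() clamp keeps a
    center lying exactly on the right or bottom edge inside the grid.
    """
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


def output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor: each cell predicts B boxes
    (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


# An object centered at (224, 100) in a 448x448 image
print(responsible_cell(224, 100, 448, 448))  # (1, 3)
print(output_shape())                        # (7, 7, 30)
```

<p>Only the cell returned by <code>responsible_cell</code> is expected to predict that object, which is why a single forward pass is enough.</p>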

<h1>Architecture</h1>
<ul>
<li>The YOLO architecture is similar to GoogLeNet. It has 24 convolutional layers, four max-pooling layers, and two fully connected layers.</li>
<li>The input image is resized to 448x448 before going through the convolutional network.</li>
<li>To reduce the number of channels, a 1x1 convolution is applied, followed by a 3x3 convolution to generate a cuboidal output.</li>
<li>The ReLU activation function (a leaky variant in the original paper) is used in every layer except the last, which uses a linear activation function.</li>
<li>Batch normalization and dropout are used to regularize the model and prevent overfitting.</li>
</ul>
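
<p>To see why a 1x1 convolution placed before a 3x3 convolution helps, it is enough to count parameters. The sketch below is plain arithmetic; the channel sizes (512 in, 256 after reduction, 512 out) are illustrative assumptions, and <code>conv_params</code> is a hypothetical helper.</p>

```python
def conv_params(in_ch, out_ch, k):
    """Number of weights plus biases in a k x k convolution layer."""
    return k * k * in_ch * out_ch + out_ch


# A 3x3 convolution applied directly to 512 channels
direct = conv_params(512, 512, 3)

# A 1x1 convolution first reduces 512 channels to 256,
# then the 3x3 convolution expands back to 512
reduced = conv_params(512, 256, 1) + conv_params(256, 512, 3)

print(direct)            # 2359808
print(reduced)           # 1311488
print(reduced / direct)  # roughly 0.56
```

<p>Under these assumed sizes the two-layer version needs a little over half the parameters of the direct 3x3 convolution.</p>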

<h1>How YOLO object detection works</h1>
<ul>
<li>Residual blocks: This step divides the original image into an NxN grid of cells, each of which predicts the class of the object it covers along with a probability/confidence value.</li>
<li>Bounding box regression: YOLO determines the bounding boxes of the objects in an image using a single regression module. Each bounding box is represented as Y = [pc, bx, by, bh, bw, c1, c2], where pc is the probability score, bx and by are the coordinates of the box center, bh and bw are the box height and width, and c1, c2 are the classes (there can be as many as required).</li>
<li>Intersection over Union (IoU): YOLO uses an IoU threshold to discard irrelevant grid box candidates, keeping those with IoU > threshold and ignoring those with IoU ≤ threshold.</li>
<li>Non-Max Suppression (NMS): Setting an IoU threshold may not be sufficient on its own, because several overlapping boxes can remain for the same object. NMS keeps only the boxes with the highest detection probability score.</li>
</ul>
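
<p>The IoU and NMS steps can be sketched as follows. This is a minimal version of the standard greedy NMS procedure, not code taken from any YOLO release. The corner format <code>(x1, y1, x2, y2)</code>, the threshold of 0.5, and the sample boxes are assumptions made for illustration.</p>

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop the
    remaining boxes that overlap it by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two overlapping boxes on the same object and one separate box
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

<p>The second box overlaps the first with an IoU of about 0.68, so it is suppressed; the third box does not overlap either and is kept.</p>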

<h1>Evolution of YOLO</h1>
<ol>
<li>
YOLO or YOLOv1: 
<ul> 
<li>Single-shot object recognition was first introduced by the original YOLO, which divided the image into a grid and predicted bounding boxes and class probabilities in a unified way.</li>
</ul>
</li>
<li>
YOLOv2/YOLO9000:
<ul>
<li>Introduced the Darknet-19 architecture, which uses batch normalization, and improved speed and accuracy by using anchor boxes for better localization.</li>
<li>Presented the idea of multi-scale detection with feature maps from several levels.</li>
<li>Increased the number of object classes to over 9,000 categories by combining WordNet's hierarchical categorization with YOLO.</li>
</ul>
</li>
<li>
YOLOv3:
<ul>
<li>Added more feature pyramid scales to improve accuracy and to detect objects at multiple resolutions.</li>
<li>Uses independent logistic classifiers to predict the class of each bounding box, instead of the softmax used in YOLOv2.</li>
</ul>
</li>
<li>
YOLOv4:
<ul>
<li> By using CSPDarknet53 as the backbone and leveraging Cross-Stage Partial connections to boost information flow, accuracy and speed were further increased.</li>
<li>Used a variety of methods, including spatial pyramid pooling, PANet, and Mish activation.</li>
<li>Performs hyperparameter selection using genetic algorithms.</li>
</ul>
</li>
<li>
YOLOR:
<ul>
<li>It is based on a unified network that combines explicit (conscious) and implicit (subconscious) knowledge approaches.</li>
<li>A more robust architecture is built on feature alignment, prediction alignment for object detection, and canonical representation for multi-task learning.</li>
</ul>
</li>
<li>
YOLOX:
<ul>
<li>Introduced a flexible and scalable anchor-free detector that makes use of a detection head known as Decoupled Head.</li>
<li>Emphasized efficiency and adaptability for a range of deployment scenarios.</li>
</ul>
</li>
<li>
YOLOv5:
<ul>
<li>The first YOLO version implemented in PyTorch; it uses CSPDarknet53 as its backbone.</li>
<li>Includes a Focus layer, reducing layers and parameters while increasing forward and backward speed without significantly impacting the mAP.</li>
</ul>
</li>
<li>
YOLOv6:
<ul>
<li>Meituan, a Chinese e-commerce company, released the YOLOv6 framework, which is designed for industrial applications.</li>
<li>The backbone was inspired by the original one-stage YOLO architecture.</li>
<li>Significant improvements over YOLOv5 include a hardware-friendly backbone, an efficient decoupled head, and a more effective training strategy.</li>
</ul>
</li>
<li>
YOLOv7:
<ul>
<li>At its release, YOLOv7 was reported to surpass all previous models in both accuracy and speed.</li>
<li>YOLOv7 integrates E-ELAN (Extended Efficient Layer Aggregation Network), which lets the model learn more diverse features, and scales its architecture by building on the models it derives from, such as YOLOv4, Scaled YOLOv4, and YOLOR.</li>
<li>YOLOv7's trainable bag-of-freebies consists of training-time techniques that improve detection accuracy without adding inference cost, so the accuracy gains do not come at the expense of speed.</li>
</ul>
</li>
<li>
YOLOv8:
<ul>
<li>By leveraging an efficient neural network architecture, YOLOv8 can process images in real time, enabling quick and reliable object recognition.</li>
<li>YOLOv8 excels at detecting small objects, enhancing its applicability in scenarios where precise detection is crucial.</li>
<li>It has an anchor-free architecture and employs a multi-scale prediction method, allowing it to detect objects at various scales within an image.</li>
</ul>
</li>
</ol>
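
<p>The YOLOv3 switch from softmax to independent logistic classifiers can be illustrated with a toy example. The logits and the class names in the comment are hypothetical; the point of the sketch is that softmax forces the classes to compete, while per-class sigmoids let one box carry several labels at once.</p>

```python
import math


def softmax(logits):
    """Softmax: class probabilities compete and always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function: each class is scored independently."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical logits for overlapping labels, e.g. "person", "woman", "car"
logits = [2.0, 1.5, -3.0]

soft = softmax(logits)
indep = [sigmoid(z) for z in logits]

# With independent classifiers, every class above 0.5 is accepted
labels = [i for i, p in enumerate(indep) if p > 0.5]
print(labels)  # [0, 1]
```

<p>With softmax the first two classes would split the probability mass between them, whereas the independent scores allow both to be reported for the same box.</p>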

<h3>Additional References</h3>
<pre>
https://www.geeksforgeeks.org/yolo-you-only-look-once-real-time-object-detection/
https://towardsdatascience.com/yolo-you-only-look-once-real-time-object-detection-explained-492dc9230006
https://towardsdatascience.com/review-yolov2-yolo9000-you-only-look-once-object-detection-7883d2b02a65
https://keylabs.ai/blog/comparing-yolov8-and-yolov7-whats-new/
</pre>