## 1. What is the objective of using Selective Search in R-CNN?

- Selective Search is the region proposal method used in R-CNN (Region-based Convolutional Neural Network). It over-segments the input image and then greedily merges similar neighboring segments (by color, texture, size, and fill) to produce candidate regions of interest (ROIs) at multiple scales.
- The objective is to propose a diverse but small set of regions (about 2,000 per image in R-CNN) that are likely to contain objects, so that the expensive CNN runs only on promising areas rather than on every possible window.

***

## 2. Explain the following phases involved in R-CNN:


###    a. Region Proposal


`In the region proposal phase, potential regions of interest (ROIs) are generated from the input image. These regions are proposed based on selective search or another region proposal method. The goal is to identify candidate regions that may contain objects.`

###    b. Warping and resizing


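`Each proposed region has an arbitrary size and aspect ratio, but the CNN expects a fixed-size input. Every region is therefore cropped and warped (rescaled without preserving aspect ratio) to that size, for example 227 × 227 pixels for the AlexNet used in the original R-CNN.`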
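
As a rough illustration of the warping step, the sketch below crops a proposal and rescales it to a fixed size with nearest-neighbor sampling. The function name `warp_region`, the `(x, y, w, h)` box format, and the 224 × 224 default are assumptions made for this example; a real pipeline would normally use a library resize with interpolation.

```python
import numpy as np

def warp_region(image, box, out_size=(224, 224)):
    """Crop box = (x, y, w, h) from an H x W x C image and warp it to out_size.

    Uses nearest-neighbor sampling; the aspect ratio is not preserved.
    """
    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    out_h, out_w = out_size
    # Map every output row/column back to a source row/column inside the crop.
    rows = np.arange(out_h) * crop.shape[0] // out_h
    cols = np.arange(out_w) * crop.shape[1] // out_w
    return crop[rows[:, None], cols[None, :]]

image = np.random.rand(480, 640, 3)            # toy image, not real data
patch = warp_region(image, box=(100, 50, 200, 120))
print(patch.shape)  # (224, 224, 3)
```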

###    c. Pre-trained CNN architecture


`The regions of interest, now warped and resized, are passed through a pre-trained CNN (Convolutional Neural Network) architecture. The CNN serves as a feature extractor, capturing relevant features from the proposed regions.`

###    d. Pre-trained SVM models


`Following the CNN feature extraction, a support vector machine (SVM) is trained for each object class. These SVM models are used to classify the extracted features into different classes, determining whether a given region contains an object of interest or not. `

###    e. Clean up


`After classification, a post-processing step is performed to eliminate duplicate or highly overlapping bounding box proposals. Non-maximum suppression (NMS) is commonly used to clean up the bounding box proposals, keeping only the most confident predictions.`

###    f. Implementation of bounding box

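`The final step produces the bounding boxes around the detected objects. In R-CNN, a class-specific bounding-box regressor refines the coordinates of each proposal, and each surviving box is reported together with its class label and confidence score.`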
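
Refining a proposal is commonly done with bounding-box regression: the model predicts four offsets that shift and rescale the proposal. The sketch below applies such offsets using the center-based parameterization from the R-CNN papers. The function name and the sample numbers are illustrative assumptions, not values from this notebook.

```python
import math

def apply_box_deltas(proposal, deltas):
    """Refine a proposal with predicted regression offsets.

    proposal: (cx, cy, w, h) in center format.
    deltas:   (dx, dy, dw, dh); dx, dy are relative shifts, dw, dh are log-scale factors.
    """
    cx, cy, w, h = proposal
    dx, dy, dw, dh = deltas
    new_cx = cx + w * dx          # shift center by a fraction of the width
    new_cy = cy + h * dy          # shift center by a fraction of the height
    new_w = w * math.exp(dw)      # rescale width
    new_h = h * math.exp(dh)      # rescale height
    return new_cx, new_cy, new_w, new_h

refined = apply_box_deltas((100.0, 80.0, 50.0, 40.0), (0.1, -0.05, 0.0, math.log(1.5)))
print(refined)
```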

***

##  3. Which pre-trained CNNs can we use in the pre-trained CNN architecture?


- VGG (Visual Geometry Group) networks: VGG16, VGG19
- Residual networks (ResNet): ResNet50, ResNet101, ResNet152
- Inception family: GoogLeNet (Inception v1), InceptionV3
- Xception
- DenseNet: DenseNet121, DenseNet169, DenseNet201
- AlexNet (the network used in the original R-CNN paper)

***

## 4. How is SVM implemented in the R-CNN framework?

- The region proposals are warped and resized to a fixed size, usually to match the input size expected by a pre-trained Convolutional Neural Network (CNN).
- The warped and resized region proposals are fed through a pre-trained CNN, such as VGG, ResNet, or another architecture, which outputs a fixed-length feature vector for each proposal.
- For each class, a separate SVM is trained using the features extracted by the pre-trained CNN. These SVMs are binary classifiers that determine whether a given region proposal belongs to a particular object class or not.

- The combination of region proposal, feature extraction using a pre-trained CNN, and classification using **SVMs** allows the R-CNN framework to detect and classify objects in images. 
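
At inference time, each per-class linear SVM reduces to a dot product between the CNN feature vector and that class's weight vector, plus a bias. The sketch below shows only this scoring step with random toy values; the feature dimension, the number of classes, and the names are assumptions, and training the SVMs is not shown.

```python
import numpy as np

def svm_scores(features, weights, biases):
    """Score N proposals against K binary linear SVMs.

    features: (N, D) CNN feature vectors; weights: (K, D); biases: (K,).
    Returns an (N, K) array of signed margins.
    """
    return features @ weights.T + biases

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))   # 5 proposals with toy 8-d features
weights = rng.normal(size=(3, 8))    # 3 hypothetical object classes
biases = np.zeros(3)

scores = svm_scores(features, weights, biases)
best_class = scores.argmax(axis=1)       # most confident class per proposal
is_object = scores.max(axis=1) > 0       # a positive margin means some SVM fired
```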

***

## 5. How does Non-Maximum Suppression work?

- Sort all the bounding box proposals in descending order based on their confidence scores. The proposal with the highest confidence score comes first.
- Start with the bounding box proposal that has the highest confidence score. 
- Calculate the Intersection over Union (IoU) between the candidate box and all other remaining boxes in the sorted list. IoU is a measure of the overlap between two bounding boxes and is calculated as the area of intersection divided by the area of the union.
- Compare each IoU against a predefined threshold (e.g., 0.5). Any remaining box whose IoU with the candidate exceeds the threshold is discarded, because it most likely covers the same object with lower confidence.
- Keep the candidate as a final detection, then repeat with the next highest-scoring box that has not been discarded.
- Continue until every box has been either kept or discarded. In R-CNN this is done independently for each class.
- **This cleaning process is called Non-Maximum Suppression (NMS).**
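
The steps above can be sketched in plain Python as follows. This is a minimal greedy version, assuming boxes in `(x1, y1, x2, y2)` corner format and a single class; the function names and the sample boxes are made up for illustration.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that are kept, highest score first."""
    # Sort indices by descending confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 150, 150)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed; the third box does not overlap either and is kept.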

***

## 6. How is Fast R-CNN better than R-CNN?


- Both R-CNN and Fast R-CNN rely on an external algorithm (Selective Search) for region proposals. Fast R-CNN does not move proposal generation into the network (that happens in Faster R-CNN); its gains come from how the proposals are processed afterwards.
- In R-CNN, each of the roughly 2,000 region proposals is passed through the CNN independently, so overlapping regions are recomputed many times. Fast R-CNN runs the convolutional layers once on the whole image and shares that computation across all proposals.
- A region of interest (ROI) pooling layer then extracts a fixed-size feature vector for each proposal directly from the shared feature map, instead of warping and re-encoding every region.
- R-CNN is trained in several separate stages (CNN fine-tuning, per-class SVMs, bounding-box regressors). Fast R-CNN trains the feature extractor, the softmax classifier, and the bounding-box regressor jointly in a single stage with a multi-task loss.

***

## 7. Using mathematical intuition, explain ROI pooling in Fast R-CNN.


- ROI (Region of Interest) pooling is a crucial step in Fast R-CNN that allows the extraction of a fixed-size feature vector from an arbitrary-sized region of the feature map.
- Suppose we have a feature map of size `W × H × C` (width, height, channels).
- A region proposal projected onto this feature map is represented by the coordinates `(x,y,w,h)`, where `(x,y)` is the top-left corner and `w` and `h` are the width and height, respectively.
- Divide the region proposal into a fixed `H' × W'` grid (for example `7 × 7`), so that each grid cell covers roughly `h/H' × w/W'` feature-map positions. Apply max pooling independently in each grid cell and each channel. The result is a fixed-size `H' × W' × C` feature map for the region proposal, regardless of the proposal's size.

$$\text{ROI pooling} = \text{MaxPool}\left(\{\text{FeatureMap}(x_i, y_i) \mid i = 1, 2, \dots, N\}\right)$$
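
To make the pooling concrete, the sketch below splits an ROI into a fixed grid and takes the maximum of each cell per channel. It assumes the ROI is already expressed in feature-map coordinates and lies inside the map; the function name, the integer bin edges, and the 2 × 2 output size are choices made for this example only.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool an ROI to a fixed output size.

    feature_map: (H, W, C) array; roi: (x, y, w, h) in feature-map cells.
    """
    x, y, w, h = roi
    out_h, out_w = output_size
    region = feature_map[y:y + h, x:x + w]
    # Integer bin edges; bins may differ in size by one cell.
    row_edges = [(i * h) // out_h for i in range(out_h + 1)]
    col_edges = [(j * w) // out_w for j in range(out_w + 1)]
    pooled = np.zeros((out_h, out_w, feature_map.shape[2]), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Force each bin to contain at least one cell.
            r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
            c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            pooled[i, j] = region[r0:r1, c0:c1].max(axis=(0, 1))
    return pooled

feature_map = np.arange(36, dtype=np.float32).reshape(6, 6, 1)
pooled = roi_max_pool(feature_map, roi=(1, 1, 4, 4))
print(pooled[..., 0])
```

Whatever the ROI size, the output has shape `output_size + (C,)`, which is what lets the fully connected layers accept proposals of any shape.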

***

## 8. Explain the following processes:

### a. ROI Projection


- ROI projection refers to the mapping of a region proposal from the original image space to the feature map space. 
- In Fast R-CNN, this is essential because the region proposals are generated based on the original image, but the subsequent object detection is performed on a feature map.
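- Mathematically, ROI projection scales the coordinates of the region proposal (x,y,w,h) by the network's cumulative stride s (the total downsampling factor between the image and the feature map, e.g. 16 for the last convolutional layer of VGG16): (x', y', w', h') ≈ (x/s, y/s, w/s, h/s), rounded to whole feature-map cells.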
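
A minimal sketch of this mapping is shown below. The stride of 16 and the choice to floor the top-left corner and ceil the bottom-right corner are assumptions for illustration; implementations differ in their rounding conventions.

```python
def project_roi(box, stride=16):
    """Map an (x, y, w, h) box from image pixels to feature-map cells.

    Assumes integer pixel coordinates and a single cumulative stride.
    """
    x, y, w, h = box
    x1 = x // stride                 # floor the top-left corner
    y1 = y // stride
    x2 = -(-(x + w) // stride)       # ceil the bottom-right corner
    y2 = -(-(y + h) // stride)
    # Guarantee the projected ROI covers at least one cell.
    return x1, y1, max(x2 - x1, 1), max(y2 - y1, 1)

print(project_roi((64, 32, 200, 120)))  # (4, 2, 13, 8)
```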

### b. ROI pooling

- ROI pooling is a process in Fast R-CNN that extracts fixed-size feature maps from the feature map corresponding to a region proposal. 
- This is necessary because region proposals can be of different sizes, and the subsequent fully connected layers require a consistent input size.

***

## 9. In comparison with R-CNN, why did the object classifier activation function change in Fast R-CNN?


- In R-CNN, classification was performed outside the network by class-specific linear SVMs trained on cached CNN features. Fast R-CNN replaced those SVMs with a softmax layer over K + 1 outputs (K object classes plus background) placed on top of the network's fully connected layers.

- Single Classifier over All Classes:
  `R-CNN scored each region proposal with a separate binary SVM per class, so the class scores were not normalized against each other. The softmax layer in Fast R-CNN makes the classes compete and yields one probability distribution per region.`

- Background as an Explicit Class:
  `Fast R-CNN treats background as one of the K + 1 softmax outputs. A region that contains no object is simply assigned to the background class, so no separate mechanism is needed to reject it.`

- Efficiency and Training Simplicity:
  `Because softmax is a differentiable part of the network, the classifier is trained jointly with the feature extractor and the bounding-box regressor through a multi-task loss. This removes the separate stage of writing features to disk and training SVMs, and the Fast R-CNN paper reports that softmax performs on par with or slightly better than the SVMs.`
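
For reference, a softmax classification head over K object classes plus one background class, as described in the Fast R-CNN paper, can be sketched as below. The class names and logit values are invented for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Index 0 is background; the remaining entries are hypothetical object classes.
class_names = ["background", "cat", "dog"]
probs = softmax([0.5, 2.0, 1.0])
predicted = class_names[probs.index(max(probs))]
print(predicted)  # cat
```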

***

## 10. What are the major changes in Faster R-CNN compared to Fast R-CNN?


Faster R-CNN is an extension of Fast R-CNN that introduces a Region Proposal Network (RPN) to generate region proposals, eliminating the need for an external region proposal method like Selective Search.

- Region Proposal Network (RPN):
  `The most significant change is the integration of the Region Proposal Network (RPN) into the overall architecture. The RPN generates region proposals based on anchor boxes and is trained simultaneously with the object detection network.`

- End-to-End Training:
  `Faster R-CNN enables end-to-end training of both the RPN and the object detection network. This unified training approach leads to better optimization and improved overall performance.`

- Anchor Boxes:
  `Faster R-CNN uses anchor boxes (also called default boxes) in the RPN to propose potential regions of interest. These anchor boxes are pre-defined bounding boxes of different scales and aspect ratios. The RPN predicts adjustments to these anchor boxes to refine the proposals.`

- Shared Convolutional Features:
  `The convolutional features are shared between the RPN and the object detection network, reducing redundancy and computational cost.`

***

## 11. Explain the concept of anchor boxes.

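- Anchor boxes (also called default boxes) are pre-defined reference boxes with fixed scales and aspect ratios. In Faster R-CNN, the Region Proposal Network (RPN) places the same set of k anchors at every sliding-window position of the feature map; the original paper uses 3 scales × 3 aspect ratios, so k = 9.
- For each anchor, the RPN predicts an objectness score and four offsets that adjust the anchor toward a nearby object. Because proposals are expressed relative to anchors, the network can cover objects of many sizes and shapes in a single pass, without an image pyramid or an external proposal algorithm.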
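
The sketch below generates the anchors for a single position. The scales and aspect ratios follow the values reported in the Faster R-CNN paper, while the function name, the corner-format output, and the sample center are assumptions for illustration.

```python
import math

def generate_anchors(center, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) boxes centered on `center`.

    Each anchor has area scale**2 and a height/width ratio equal to `ratio`.
    """
    cx, cy = center
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = generate_anchors(center=(300, 200))
print(len(anchors))  # 9
```

In a full RPN the same set is repeated at every feature-map position, so a feature map with `H × W` positions yields `9 × H × W` anchors.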

***

## 12. Implement Fast R-CNN using the COCO 2017 dataset (train, test, and validation splits). Use a pre-trained backbone network such as ResNet or VGG for feature extraction.

In [3]:
!pip install tensorflow


In [4]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.imagenet_utils import preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import load_img, img_to_array