# **COMPUTER VISION - INTRODUCTION**

### What is a Computer Vision Problem?

A computer vision problem is a task in which a computer is trained to interpret, understand, and analyze visual information (such as images or videos) from the world. The goal is to enable machines to "see" and make decisions based on visual data, mimicking human vision capabilities.


### What are the types of Computer Vision Problems?

Examples of computer vision problems include:
* **Image Classification:** Categorizing an image into predefined categories (e.g., determining if an image is of a cat or a dog).
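* **Object Detection:** Identifying and locating specific objects (like cars, animals, or people) within images or videos, typically by drawing a bounding box around each one.
* **Biometric Recognition:** Identifying or verifying individuals based on physical traits such as facial features, fingerprints, or irises.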
* **Image Segmentation:** Dividing an image into distinct regions or objects (e.g., separating a background from a foreground).
* **Optical Character Recognition (OCR):** Extracting text from images or scanned documents.
* **Activity Recognition:** Understanding and identifying human activities in videos.
* **3D Reconstruction:** Creating three-dimensional models of objects or scenes from 2D images.
<br>
<img src="../resources/Types_of_CV_Tasks.png">
<br><br>

### What makes Computer Vision efficient?

Convolutional Neural Networks (CNNs) are the main reason. A CNN is a specialized neural network architecture designed to handle image data effectively and efficiently.

CNNs are powerful for the following reasons:

1. **Feature Extraction with Convolution Layers:** CNNs use convolution layers that automatically detect and extract important features from an image, such as edges, textures, shapes, and more. This eliminates the need for manual feature engineering, saving time and effort.


2. **Spatial Hierarchy of Features:** By using multiple layers, CNNs first capture simple patterns like edges and corners. As the network goes deeper, it identifies more complex patterns, such as objects or faces, creating a hierarchy of features.


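3. **Parameter Sharing and Sparsity:** Convolutional layers apply the same filters (kernels) across every part of an image (parameter sharing), and each output value depends only on a small local patch of the input rather than on every pixel (sparse connectivity). Together, these properties greatly reduce the number of parameters, making CNNs computationally efficient and less prone to overfitting.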


4. **Pooling Layers:** These layers reduce the spatial dimensions of the feature maps while retaining the most important information, which lowers the computational cost of the layers that follow.


5. **Translation Invariance:** Because the same filter slides over the whole image, a CNN can detect a feature wherever it appears, and pooling makes the result tolerant to small shifts. This makes CNNs robust for tasks like object detection and classification.
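
To make the parameter-sharing point concrete, the sketch below compares the parameter count of a fully connected layer with that of a convolutional layer on the same input. The input size (224x224 RGB), the 1000 output units, and the 64 filters of size 3x3 are assumed example values, not figures from this document, and the helper names are illustrative.

```python
def dense_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus biases of a 2D convolution with square kernels."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


# Assumed example sizes (hypothetical, chosen only for illustration).
height, width, channels = 224, 224, 3

# Every pixel value connects to every output unit.
dense_count = dense_params(height * width * channels, 1000)

# 64 filters of size 3x3 are reused at every image position.
conv_count = conv_params(in_channels=channels, out_channels=64, kernel_size=3)

print(f"fully connected: {dense_count:,} parameters")  # 150,529,000
print(f"convolutional:   {conv_count:,} parameters")   # 1,792
```

The convolutional layer's parameter count does not depend on the image height or width, which is why it stays small even for large inputs.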

## What is a Convolutional Neural Network?

A Convolutional Neural Network (CNN) is a type of deep learning model specifically designed for processing grid-structured data, such as images.

<br>

### Key Components of a CNN

##### 1. **Convolutional Layer** 
* This is the core building block of a CNN. 
* It **applies filters (kernels)** to the input image to detect specific features, such as edges, patterns, or textures. 
* The result is a **set of feature maps** that highlight important parts of the image.
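
As a minimal sketch of what a convolutional layer computes, the pure-Python example below slides one 3x3 kernel over a small grayscale image. The `conv2d` helper, the toy image, and the hand-picked vertical-edge kernel are illustrative assumptions; in a real CNN the kernel values are learned during training, and a library such as PyTorch would do this far more efficiently.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like CNN libraries, this computes a cross-correlation:
    the kernel is not flipped before being applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the patch whose top-left corner is (i, j).
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy 4x5 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d(image, vertical_edge)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The feature map is large where the patch covers the dark-to-bright boundary and zero where the patch is uniformly bright, which is the sense in which a filter "detects" a feature.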

##### 2. **ReLU (Rectified Linear Unit) Activation**
Applied after the convolutional layer, ReLU **replaces negative values in the feature maps with zero, adding non-linearity** to the model and enabling it to learn complex patterns.
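
ReLU is simply `max(0, x)` applied to every value. A minimal sketch, using a made-up feature map and an illustrative `relu` helper:

```python
def relu(feature_map):
    """Replace every negative value with zero; keep the rest unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]


# Made-up feature map containing positive and negative responses.
feature_map = [
    [3, -2, 0],
    [-1, 5, -4],
]

activated = relu(feature_map)
print(activated)  # [[3, 0, 0], [0, 5, 0]]
```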

##### 3. **Pooling Layer**
* This layer **reduces the spatial dimensions** of feature maps while retaining important information.
* Common pooling methods include **max pooling** (selecting the maximum value in a region) and **average pooling** (calculating the average value).
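
The two pooling methods can be sketched as follows. The `pool2d` helper and the sample feature map are illustrative assumptions; the sketch uses non-overlapping 2x2 windows (stride equal to the window size), which is a common but not the only configuration.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Pool over non-overlapping size x size windows.

    Rows or columns that do not fill a complete window are dropped.
    """
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 8, 4],
    [2, 2, 4, 0],
]

max_pooled = pool2d(feature_map, size=2, mode="max")
avg_pooled = pool2d(feature_map, size=2, mode="avg")
print(max_pooled)  # [[6, 2], [2, 8]]
print(avg_pooled)  # [[3.5, 1.0], [1.5, 4.0]]
```

Both results are 2x2, a quarter of the original 4x4 area, while the strongest (or average) response of each region is preserved.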

##### 4. **Fully Connected Layer**
At the final stage of the network, the **feature maps are flattened into a vector** and passed through these layers, which combine the extracted features to make predictions, such as classifying images into categories.
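
A minimal sketch of the flatten-then-predict step is shown below. The `flatten` and `dense` helpers, the two tiny feature maps, and the weights and biases are all made-up values for illustration; in a trained network the weights and biases are learned.

```python
def flatten(feature_maps):
    """Turn channels x height x width nested lists into one flat vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(weight_row, vector)) + bias
        for weight_row, bias in zip(weights, biases)
    ]


# Two made-up 2x2 feature maps (2 channels).
feature_maps = [
    [[1, 0], [2, 1]],
    [[0, 3], [1, 0]],
]

vector = flatten(feature_maps)
print(vector)  # [1, 0, 2, 1, 0, 3, 1, 0]

# Hypothetical weights: one row per output class, one column per input value.
weights = [
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
]
biases = [0.5, -1.0]

scores = dense(vector, weights, biases)
print(scores)  # [3.5, 3.0]
```

The output holds one raw score per class; these scores are not yet probabilities.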

##### 5. **Softmax or Sigmoid Function**
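This function converts the **final output into probabilities**, useful for classification tasks. Softmax is typically used for multi-class classification, where the probabilities sum to 1, while sigmoid is typically used for binary or multi-label classification.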
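
Both functions can be sketched in a few lines of standard-library Python. The input scores are made-up values, and the helper names are illustrative; subtracting the maximum score in `softmax` is a common numerical-stability trick that does not change the result.

```python
import math


def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def sigmoid(score):
    """Squash a single raw score into the range (0, 1)."""
    if score >= 0:
        return 1 / (1 + math.exp(-score))
    # Equivalent form that avoids overflow for large negative scores.
    exp_score = math.exp(score)
    return exp_score / (1 + exp_score)


# Made-up raw scores for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probabilities])  # [0.659, 0.242, 0.099]

# A raw score of 0 maps to a probability of exactly 0.5.
print(sigmoid(0.0))  # 0.5
```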

<br>
<img src="../resources/CNN_Architecture.png">

### Learning Computer Vision Basics

1. PyTorch computer vision library: `torchvision`