# Unit 1 - Fundamentals


## Data Types


| Feature                 | Image                                            | Video                                           | Audio                                              | Tabular Data                                                   |
|-------------------------|--------------------------------------------------|-------------------------------------------------|---------------------------------------------------|----------------------------------------------------------------|
| **Type**                | Single moment in time                            | Sequence of images over time                    | Signal sampled over time                          | Structured data organized in rows and columns                   |
| **Data Representation** | Typically a 2D array of pixels (plus a channel axis for color) | Typically a 3D array of frames (4D with color channels) | Typically a 1D array of audio samples             | Typically a 2D array with samples as rows and features as columns (spreadsheet, database tables) |
| **File Types**          | JPEG, PNG, RAW, etc.                             | MP4, AVI, MOV, etc.                             | WAV, MP3, FLAC, etc.                              | CSV, Excel (.xlsx, .xls), Database formats, etc.               |
| **Data Augmentation**   | Flipping, rotating, cropping                     | Temporal jittering, speed variations, occlusion | Background noise addition, reverberation, spectral manipulation | ROSE, SMOTE, ADASYN                                             |
| **Feature Extraction**  | Edges, textures, colors                          | Edges, textures, colors, optical flow, trajectories | Spectrogram, Mel-Frequency Cepstral Coefficients (MFCCs), Chroma features | Statistical analysis, Feature engineering, Data aggregation     |
| **Learning Models**     | CNNs                                             | RNNs, 3D CNNs                                   | CNNs, RNNs                                         | Linear Regression, Decision Trees, Random Forests, Gradient Boosting |
| **Machine Learning Tasks** | Image classification, Segmentation, Object Detection | Video action recognition, temporal modeling, tracking | Speech recognition, speaker identification, music genre classification | Regression, Classification, Clustering                           |
| **Computational Cost**  | Less expensive                                  | More expensive                                  | Moderate to high                                  | Generally less expensive compared to others                     |
| **Applications**        | Facial recognition for security access control  | Sign language interpretation for live communication | Voice assistants, Speech-to-text, Music genre classification | Predictive modeling, Fraud detection, Weather forecasting       |
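
To make the **Data Representation** and **Data Augmentation** rows more concrete, here is a minimal sketch using plain Python nested lists. The sizes, the `shape` helper, and the `horizontal_flip` function are illustrative assumptions rather than part of any library; in practice, an array library such as NumPy or PyTorch would typically hold this data. Color images and videos usually carry an extra channel axis, which is omitted here for brevity.

```python
def shape(array):
    """Return the dimensions of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0] if array else None
    return tuple(dims)


def horizontal_flip(image):
    """Mirror a 2D image left to right (a simple augmentation)."""
    return [row[::-1] for row in image]


# Illustrative sizes only (assumptions, not taken from any dataset).
HEIGHT, WIDTH, FRAMES, SAMPLES = 4, 6, 3, 16

# Grayscale image: 2D array of pixel intensities (height x width).
image = [[(r * WIDTH + c) % 256 for c in range(WIDTH)] for r in range(HEIGHT)]

# Grayscale video: a stack of frames (frames x height x width).
video = [[row[:] for row in image] for _ in range(FRAMES)]

# Mono audio: 1D array of amplitude samples.
audio = [0.0] * SAMPLES

# Tabular data: rows are records, columns are features.
table = [[5.1, 3.5, 1.4], [4.9, 3.0, 1.4]]

flipped = horizontal_flip(image)

print(shape(image))  # (4, 6)
print(shape(video))  # (3, 4, 6)
print(shape(audio))  # (16,)
print(shape(table))  # (2, 3)
print(flipped[0])    # [5, 4, 3, 2, 1, 0]
```

Flipping twice returns the original image, which is a quick sanity check for this kind of augmentation.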

## Understanding Resolution in Digital Imaging

Spatial resolution refers to the smallest distinguishable detail in an image and is often measured in line pairs per unit distance or pixels per unit distance. A spatial resolution figure is only meaningful in context, because it depends on the spatial units used. For example, a 20-megapixel camera can typically resolve finer detail than an 8-megapixel camera capturing the same scene.

Intensity resolution relates to the smallest detectable change in intensity level and is often limited by the hardware's capabilities. It is quantized in powers of two: 8 bits, for example, gives 256 levels. How these intensity changes are perceived is influenced by various factors, including noise, saturation, and the capabilities of human vision.

![](https://huggingface.co/datasets/hf-vision/course-assets/resolve/4def8c412ee6b08f4522e818a0474d155363d87b/pic_7.png)
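
The arithmetic behind megapixel counts and intensity levels can be sketched in a few lines. This is a minimal illustration under stated assumptions: the sensor dimensions are hypothetical values chosen to land near 20 MP and 8 MP, and the function names are made up for this example.

```python
def megapixels(width: int, height: int) -> float:
    """Total pixel count of a sensor, in millions of pixels."""
    return width * height / 1_000_000


def intensity_levels(bits: int) -> int:
    """Number of distinct intensity levels representable with `bits` bits."""
    return 2 ** bits


def quantize(value: float, bits: int) -> int:
    """Map a normalized intensity in [0.0, 1.0] to the nearest integer level."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0.0, 1.0]")
    max_level = intensity_levels(bits) - 1
    return int(value * max_level + 0.5)  # round half up


# Hypothetical sensor dimensions (assumptions for illustration).
print(round(megapixels(5472, 3648), 1))  # 20.0
print(round(megapixels(3264, 2448), 1))  # 8.0

print(intensity_levels(8))  # 256
print(quantize(0.5, 8))     # 128
print(quantize(0.5, 1))     # 1
```

With only 1 bit, every intensity collapses to either 0 or 1, which shows how a lower bit depth discards small intensity changes.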

## Imaging in Real Life

As a species, humans see only a small fraction of the electromagnetic spectrum. We call that fraction the visible spectrum. The image below shows just how narrow it is:

![](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/human_spectrum.jpg)

To see more than what Mother Nature has given us, we need sensors that capture beyond the visible spectrum. In other words, we need to detect things at different wavelengths. Infrared (IR) is used in night vision devices and some astronomical observations. Magnetic resonance imaging (MRI) uses strong magnetic fields and radio waves to image soft human tissues. We have also created ways to see that do not rely on light at all. For instance, electron microscopy uses electrons to achieve much higher resolution than traditional light microscopy. Ultrasound is another great example: it uses sound waves to create detailed, real-time images of internal organs and tissues, offering a non-invasive and dynamic perspective that goes beyond what standard light-based imaging can achieve.

## What Is Computer Vision?

Computer vision is the science and technology of making machines see. It involves the development of theoretical and algorithmic methods to acquire, process, analyze, and understand visual data, and to use this information to produce meaningful representations, descriptions, and interpretations of the world.

![](https://huggingface.co/datasets/hf-vision/course-assets/resolve/743a2a115b53f258c9e6bc7744534d9e03b8a124/CV_in_defintiion.png)

### CV Tasks

We have seen before that computer vision is really hard for computers because they have no previous knowledge of the world. In our example, we start out knowing what a ball is, how to track its movement, how objects usually move in space, how to estimate when the ball will reach us, where our foot is, how a foot moves, and how much force we need to hit the ball. If we were to break this down into specific computer vision tasks, we would have:

- Scene Recognition
- Object Recognition
- Object Detection
- Segmentation (instance, semantic)
- Tracking
- Dynamic Environment Adaptation
- Path Planning

You will read more about the core tasks of computer vision in the Computer Vision Tasks chapter. But there are many more tasks that computer vision can do! Here is a non-exhaustive list:

- Image Captioning
- Image Classification
- Image Description
- Anomaly Detection
- Image Generation
- Image Restoration
- Autonomous Exploration
- Localization

## Challenges in Computer Vision Systems


| Factor                                | Challenges                                                                                                                                                                                |
|---------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Variability in Data                  | Data collected from the real world is highly diverse, with variations in lighting, viewpoint, occlusions, and backgrounds, which makes it challenging to develop reliable computer vision systems. |
| Scalability                           | Computer vision systems need to be scalable to manage large datasets and meet real-time processing requirements due to the continuous increase in visual data.                             |
| Accuracy                              | Achieving high accuracy in object detection, scene interpretation, and tracking is a significant challenge, especially in complex or cluttered scenes, often due to noise, irrelevant features, and poor image quality. |
| Robustness to Noise                  | Real-world data is noisy, containing defects, sensor artifacts, and distortions. Computer vision systems must be robust enough to handle and process such noisy data effectively.          |
| Integration with Other Technologies   | Integrating computer vision with technologies like natural language processing, robotics, or augmented reality raises system interoperability challenges, even as it expands the usability of machine learning and computer vision. |
| Privacy and Ethical Concerns          | Real-world applications of computer vision, especially in surveillance, facial recognition, and data gathering, raise concerns about privacy and ethics, necessitating proper handling of databases and personal information. |
| Real-time Processing                  | Applications like autonomous vehicles and augmented reality require real-time processing, which demands high computational efficiency and often substantial computational power and capable cloud platforms. |
| Long-term Reliability                 | Keeping computer vision systems reliable over extended periods in real-life scenarios is challenging, because accuracy and flexibility must be sustained as conditions change.          |
| Generalization                        | Developing models with good generalization across diverse contexts and domains is a significant challenge, requiring the ability to adapt to changing circumstances without extensive retraining. |
| Calibration and Maintenance           | Calibrating and maintaining hardware, such as cameras and sensors, in real-world settings presents challenges, often due to logistical complications and the need to withstand extreme weather conditions. |
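
One common way to probe the **Robustness to Noise** challenge is to corrupt clean inputs synthetically and observe how far they drift from the original. The sketch below is only an illustration under stated assumptions: the helper names, the flat gray test image, and the noise levels are hypothetical, and zero-mean Gaussian noise is just one simple stand-in for real sensor artifacts.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Return a copy of a grayscale image with zero-mean Gaussian noise added.

    Pixel values are rounded and clamped to the 8-bit range [0, 255].
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    noisy = []
    for row in image:
        noisy.append(
            [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        )
    return noisy


def mean_absolute_error(a, b):
    """Average per-pixel absolute difference between two same-sized images."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


clean = [[128] * 8 for _ in range(8)]  # flat mid-gray test image
for sigma in (0.0, 5.0, 25.0):         # hypothetical noise levels
    noisy = add_gaussian_noise(clean, sigma)
    print(sigma, round(mean_absolute_error(clean, noisy), 2))
```

In a real evaluation, the noisy copies would be fed to a model and its accuracy compared against the clean baseline; here the error between images simply shows that larger `sigma` values distort the input more.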

## Ethical Considerations

![](https://huggingface.co/datasets/hf-vision/course-assets/resolve/743a2a115b53f258c9e6bc7744534d9e03b8a124/ethical_considerations.png)