# 🧠 AI Tasks and Data Domains

---

## 🎯 Overview

In this lesson, we will focus on **AI tasks** and the **types of data** associated with three primary AI domains:

- **Language**
- **Audio and Speech**
- **Vision**

Each of these domains has specific data types, model architectures, and applications. Let's explore each one.

---

## 🗣️ Language Domain

### 🔹 Types of Language AI Tasks

Language-related AI tasks can be:

- **Text-related** (analyzing existing text)
- **Generative AI** (creating new text)

### 🧩 Text-related Tasks
These tasks use **text as input**, and the output can vary depending on the task.

**Examples:**
- Detecting language
- Extracting entities or key phrases
- Translating text between languages

For instance, in a translation tool:
1. Type or paste your text.  
2. Choose source and target languages.  
3. Click “Translate.”

---

### 🧠 Generative AI Tasks

Generative AI tasks **generate new text** using large language models that have been trained on large amounts of text.

**Examples:**
- Writing stories or poems
- Summarizing text
- Answering questions

A well-known example is **ChatGPT**, a chatbot that generates its responses with a **large language model**.

---

### 🔢 How Text as Data Works

Text is **sequential data**. It consists of sentences made of words, which must be converted into **numbers** to train AI models.

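- **Tokenization:** Splits text into tokens (words or sub-words) and maps each token to a numerical ID.  
- **Padding:** Adds filler tokens so that all sequences in a batch have equal length.  
- **Embedding:** Maps each token to a vector so that similar words get similar vectors; the similarity between two vectors can be measured with, e.g., cosine similarity.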
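
The three steps above can be sketched in plain Python. This is a minimal illustration, not how production tokenizers work: the whitespace tokenizer, the helper names (`build_vocab`, `tokenize`, `pad`), and the hand-picked 2-dimensional embedding vectors are all assumptions made for the example. Real models learn their embeddings during training.

```python
import math

def build_vocab(sentences):
    """Map each distinct lowercase word to an integer ID; 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def tokenize(sentence, vocab):
    """Convert a sentence into a list of integer token IDs."""
    return [vocab[word] for word in sentence.lower().split()]

def pad(sequences, pad_id=0):
    """Right-pad every sequence with pad_id up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sentences = ["the cat sat", "the dog sat down"]
vocab = build_vocab(sentences)
token_ids = [tokenize(s, vocab) for s in sentences]
padded = pad(token_ids)  # both rows now have length 4

# Hypothetical embeddings, hand-picked for illustration only
embeddings = {"cat": [0.9, 0.1], "dog": [0.8, 0.2], "down": [0.1, 0.9]}
sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_down = cosine_similarity(embeddings["cat"], embeddings["down"])

print(padded)
print(round(sim_cat_dog, 3), round(sim_cat_down, 3))
```

With these made-up vectors, "cat" and "dog" score much closer to 1.0 than "cat" and "down", which is the property an embedding is meant to capture.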

---

### ⚙️ Language AI Models

Language AI models are trained on large textual datasets for **Natural Language Processing (NLP)** tasks.

**Common Model Architectures:**

| Architecture | Description |
|---------------|--------------|
| **Recurrent Neural Networks (RNNs)** | Process a sequence one element at a time, carrying a hidden state from step to step. |
| **Long Short-Term Memory (LSTM)** | A type of RNN that uses gates to maintain context over longer sequences. |
| **Transformers** | Process data in parallel and use **self-attention** for contextual understanding. |
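
To make the **self-attention** idea from the table more concrete, here is a rough sketch of scaled dot-product self-attention in plain Python. It assumes queries, keys, and values are all equal to the input vectors; real Transformers first apply learned projection matrices, which are left out here for brevity. The function names and the toy input are illustrative only.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = input.

    Learned projection matrices are omitted (an assumption of this sketch).
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score the query against every position in the sequence
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Output is the attention-weighted sum of all value vectors
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy token vectors; the first two point in similar directions
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each output row depends only on the full input, not on the previous output, so all positions could be computed at the same time. That is the contrast with an RNN, which has to finish one step before starting the next.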

---

## 🔊 Audio and Speech Domain

### 🔹 Types of Audio AI Tasks

Audio and speech AI tasks can be:

- **Audio-related**
- **Generative AI**

### 🧩 Audio-related Tasks

These use **audio or speech as input**, with output depending on the goal.

**Examples:**
- Speech-to-text conversion  
- Speaker recognition  
- Voice conversion  

---

### 🎶 Generative Audio AI Tasks

Generative AI in this domain **creates audio output**.

**Examples:**
- Music composition  
- Speech synthesis  

---

### 📈 How Audio Data Works

Audio is **digitized** by sampling: the amplitude of the sound wave is measured as snapshots taken at regular intervals over time.

- **Sample rate:** Number of samples per second (e.g., 44.1 kHz → 44,100 samples/sec).  
- **Bit depth:** Number of bits in each audio sample; it sets how finely amplitude can be represented (e.g., 16 bits → 65,536 possible levels).  

A single sample reveals little; **multiple samples must be correlated** to form meaningful sound data.
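
The sketch below illustrates how sample rate and bit depth determine what digitized audio looks like. It generates one second of a 440 Hz sine tone at 44.1 kHz and quantizes it to 16-bit integers. The 16-bit depth, the 440 Hz tone, and the helper names are assumptions chosen for the example, and the size calculation assumes uncompressed mono audio.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (44.1 kHz, as in the text)
BIT_DEPTH = 16        # bits per sample (assumed value for illustration)

def sine_wave(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return amplitude snapshots in [-1.0, 1.0] taken sample_rate times per second."""
    count = int(sample_rate * duration_s)
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(count)
    ]

def quantize(samples, bit_depth=BIT_DEPTH):
    """Map floats in [-1.0, 1.0] to signed integers of the given bit depth."""
    max_level = 2 ** (bit_depth - 1) - 1  # 32767 for 16-bit audio
    return [round(s * max_level) for s in samples]

samples = sine_wave(440, 1.0)
pcm = quantize(samples)

levels = 2 ** BIT_DEPTH                 # distinct amplitude values available
size_bytes = len(pcm) * BIT_DEPTH // 8  # uncompressed mono size

print(len(samples), "samples,", levels, "levels,", size_bytes, "bytes")
```

Printing `pcm[0]` alone gives a single number with no audible meaning; only the sequence of values over time describes the tone.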

---

### ⚙️ Audio AI Model Architectures

| Architecture | Description |
|---------------|--------------|
| **RNNs** | Capture sequential patterns in sound. |
| **LSTMs** | Retain temporal context in audio sequences. |
| **Transformers** | Enable parallel processing and self-attention in audio signals. |
| **Variational Autoencoders (VAEs)** | Encode audio into a compact latent representation and decode it to reconstruct or generate audio signals. |
| **Waveform Models** | Model raw audio waveforms directly. |
| **Siamese Networks** | Learn similarity between different audio samples. |

---

## 👁️ Vision Domain

### 🔹 Vision AI Tasks

Vision tasks use **images** as input.

**Examples:**
- Image classification  
- Object detection  
- Facial recognition  

Facial recognition is widely used in:
- **Security**
- **Biometrics**
- **Law enforcement**
- **Social media**

---

### 🎨 Generative Vision AI Tasks

These tasks generate new visual content.

**Examples:**
- Creating images from text descriptions  
- Generating stylized or high-resolution images  
- Producing 3D models (e.g., objects, buildings, people)

---

### 🧮 Images as Data

Images are made up of **pixels**. A grayscale pixel holds a single intensity value, while a color pixel typically holds three values (red, green, blue). A single pixel conveys little meaning; models must interpret patterns across many pixels.
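
As a small illustration of images as numeric data, the snippet below stores a tiny color image as nested lists of (R, G, B) triples and converts it to grayscale. The 2×3 image, the 0-255 value range, and the unweighted channel average are assumptions made for the example; image libraries often use weighted averages instead.

```python
# 2x3 color image: each pixel is an (R, G, B) triple with values 0-255
color_image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 255), (0, 0, 0), (128, 128, 128)],
]

def to_grayscale(image):
    """Average the three channels of every pixel (simple, unweighted conversion)."""
    return [[sum(pixel) // 3 for pixel in row] for row in image]

gray_image = to_grayscale(color_image)

height, width = len(color_image), len(color_image[0])
values_per_image = height * width * 3  # three numbers per color pixel

print(gray_image)
print(values_per_image, "numbers describe this", f"{height}x{width}", "color image")
```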

---

### ⚙️ Vision AI Model Architectures

| Architecture | Description |
|---------------|--------------|
| **Convolutional Neural Networks (CNNs)** | Detect patterns in images; learn visual hierarchies. |
| **YOLO (You Only Look Once)** | Detects and locates objects within images in real time. |
| **Generative Adversarial Networks (GANs)** | Generate realistic images and videos. |
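
The pattern detection that CNNs perform is built on the convolution operation: a small grid of weights (a kernel) slides across the image and responds strongly where the local pixels match its pattern. The sketch below is a simplified illustration with no padding and a stride of 1. The kernel is hand-written to respond to a dark-to-bright vertical edge, whereas a CNN would learn its kernel weights from data.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_h)
                for j in range(k_w)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# 4x4 grayscale image: dark left half, bright right half
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-written vertical-edge kernel (illustrative, not learned)
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The resulting feature map is non-zero only in the middle column, where the window straddles the boundary between the dark and bright halves.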

---

## 📊 Other AI Tasks Across Domains

### 🔸 Anomaly Detection
- Detect unusual patterns in **time-series data**.
- Applications: **fraud detection**, **machine failure prediction**.
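
One simple way to flag unusual points in a time series is a z-score check: measure how many standard deviations each value lies from the mean. The sketch below is only one basic approach among many. The sensor readings, the function name, and the threshold of 2.0 are assumptions made for illustration.

```python
import statistics

def find_anomalies(series, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(series)
    stdev = statistics.pstdev(series)
    if stdev == 0:
        return []  # a constant series has no outliers
    return [
        i for i, value in enumerate(series)
        if abs(value - mean) / stdev > threshold
    ]

# Hypothetical machine temperature readings with one spike
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.0, 20.2, 20.1]
anomalies = find_anomalies(readings)
print(anomalies)
```

Here only the spike at index 5 is flagged. A global z-score ignores trends and seasonality, so real systems usually need something more robust.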

### 🔸 Recommendation Systems
- Recommend items using data from **similar users** or **similar products**.
- Used in **e-commerce** and **media platforms**.
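
The "similar users" idea can be sketched as follows: find the user whose ratings look most like the target user's, then suggest items that neighbor rated and the target has not seen. The rating data, user names, and helper functions are hypothetical, and using a single nearest neighbor is a simplification chosen to keep the example short.

```python
import math

# Hypothetical ratings: user -> {item: rating from 1 to 5}
ratings = {
    "alice": {"laptop": 5, "mouse": 4, "keyboard": 5},
    "bob": {"laptop": 5, "mouse": 5, "monitor": 4},
    "carol": {"novel": 5, "cookbook": 4, "mouse": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(a) & set(b)
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, ratings):
    """Suggest items that the most similar other user rated and `user` has not."""
    others = [name for name in ratings if name != user]
    nearest = max(
        others,
        key=lambda name: cosine_similarity(ratings[user], ratings[name]),
    )
    unseen = {
        item: score
        for item, score in ratings[nearest].items()
        if item not in ratings[user]
    }
    # Highest-rated unseen items first
    return nearest, sorted(unseen, key=unseen.get, reverse=True)

nearest, suggestions = recommend("alice", ratings)
print(nearest, suggestions)
```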

### 🔸 Forecasting
- Predict future trends using **time-series data**.
- Applications: **weather prediction**, **stock market forecasting**.
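
Two classic baseline forecasting methods are shown below as a rough sketch: a moving average and simple exponential smoothing. Both predict only the next value from past values. The temperature series, the window size, and the smoothing factor `alpha` are assumptions picked for the example, and neither method models trend or seasonality.

```python
def moving_average_forecast(series, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window

def exponential_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing: recent values get exponentially more weight."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical daily temperatures
temperatures = [18.0, 19.0, 21.0, 20.0, 22.0, 24.0]
ma_forecast = moving_average_forecast(temperatures)
ses_forecast = exponential_smoothing_forecast(temperatures)
print(ma_forecast, ses_forecast)
```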

---

## 🏁 Key Takeaways

- This lesson covers three major AI data domains: **Language**, **Audio/Speech**, and **Vision**.  
- Each domain requires unique data preparation and specialized deep learning models.  
- **Generative AI** creates new text, audio, or images rather than only analyzing existing data.  
- Architectures like **Transformers**, **CNNs**, and **GANs** power modern AI advancements.  

---
