# 1. Zero-Shot Learning (ZSL)

**Definition:**
A model is asked to perform a task or recognize a class **it has never seen during training**, without any labeled examples of that class. Instead, it relies on **auxiliary information** such as natural language descriptions, attributes, or embeddings.

**Key Idea:**
The model transfers knowledge from known tasks/classes to unseen ones using **semantic relationships**.

**How it works (intuition):**

* During training, the model learns a joint representation of **inputs** (e.g., images, text) and **descriptions/labels** (e.g., word embeddings, prompts).
* At test time, it encounters a new class (e.g., "zebra") that was absent from the training data but comes with a description, such as "an animal with black and white stripes".
* The model uses its learned embedding space to match the input image with the description.

**Example:**

* Train on animals like **cats, dogs, horses**.
* At inference, ask it to classify a **zebra**.
* Even though "zebra" images were never seen, the model matches "zebra" with its **textual description or attributes**.
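
The matching step in this example can be sketched with hand-written attribute vectors. This is a toy illustration, not a trained system: the attribute names, the class descriptions, and the "predicted" attribute vector (standing in for the output of an attribute predictor trained only on seen classes) are all assumptions made up for the example.

```python
import numpy as np

# Hypothetical attribute order, chosen only for this illustration.
ATTRIBUTES = ["stripes", "hooves", "mane", "barks", "purrs"]

# Class descriptions (the auxiliary information). "zebra" has no training images;
# it is known only through its attribute description.
class_attributes = {
    "cat":   np.array([0.0, 0.0, 0.0, 0.0, 1.0]),
    "dog":   np.array([0.0, 0.0, 0.0, 1.0, 0.0]),
    "horse": np.array([0.0, 1.0, 1.0, 0.0, 0.0]),
    "zebra": np.array([1.0, 1.0, 1.0, 0.0, 0.0]),
}

def zero_shot_classify(predicted_attributes, class_attributes):
    """Return the class whose description is most similar (cosine) to the prediction."""
    names = list(class_attributes)
    matrix = np.stack([class_attributes[name] for name in names])
    query = predicted_attributes / np.linalg.norm(predicted_attributes)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = matrix @ query
    return names[int(np.argmax(scores))], dict(zip(names, scores))

# Assumed output of an attribute predictor for a zebra photo (made-up numbers).
image_attributes = np.array([0.9, 0.8, 0.7, 0.1, 0.0])
label, scores = zero_shot_classify(image_attributes, class_attributes)
print(label)
```

The classifier never needs zebra images: adding a new class only requires adding one more description row.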

**Applications:**

* **Image classification:** CLIP (OpenAI) → Given text like "a photo of a zebra", it matches unseen images.
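* **Machine translation:** A multilingual model translates a language pair that never appeared together in training (e.g., German → French when every training pair involved English), by pivoting through a shared embedding space.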
* **Text-to-SQL, text-to-code:** LLMs answering queries in domains they were not explicitly trained for.

---

# 2. Few-Shot Learning (FSL)

**Definition:**
A model is asked to perform a task or recognize a class with **only a handful of labeled examples per class** (e.g., 1–10 samples).

**Key Idea:**
Learn to **generalize from very few examples** by leveraging **meta-learning** or pretrained representations.

**How it works (intuition):**

* The model is pretrained on a large dataset (such as ImageNet or a large text corpus).
* During fine-tuning or prompting, it adapts to a new task with very few labeled examples.
* Often done with **meta-learning approaches** (like Prototypical Networks, Matching Networks, MAML).

**Example:**

* You want a model to recognize **penguins**, but you only have 5 penguin images.
* A pretrained CNN + metric-learning approach can cluster embeddings, so even 5 samples are enough to define a "penguin class".
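
A minimal sketch of that metric-learning idea, assuming the embeddings already exist: each class is summarized by the mean of its few support embeddings, and a query is assigned to the nearest mean. The 2-D vectors below are synthetic stand-ins for the output of a pretrained CNN, and the function names are hypothetical.

```python
import numpy as np

def build_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype per class."""
    labels = np.asarray(support_labels)
    classes = sorted(set(support_labels))
    prototypes = np.stack(
        [support_embeddings[labels == c].mean(axis=0) for c in classes]
    )
    return classes, prototypes

def classify(query_embedding, classes, prototypes):
    """Assign the query to the class with the closest prototype (squared Euclidean)."""
    distances = np.sum((prototypes - query_embedding) ** 2, axis=1)
    return classes[int(np.argmin(distances))]

rng = np.random.default_rng(0)
# Synthetic embeddings: 5 "penguin" samples and 5 samples of another class.
penguin_support = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(5, 2))
other_support = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(5, 2))
support = np.vstack([penguin_support, other_support])
labels = ["penguin"] * 5 + ["not_penguin"] * 5

classes, prototypes = build_prototypes(support, labels)
prediction = classify(np.array([1.8, 2.1]), classes, prototypes)
print(prediction)
```

No gradient step is taken here: with a good pretrained embedding, the five samples only have to locate the class in embedding space.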

**Applications:**

* **Medical imaging:** Often, only a few labeled disease samples are available.
* **LLMs:** Few-shot prompting in GPT – give it 3–4 examples in your prompt, and it generalizes to new queries.
* **Speech recognition:** Adapt to a new speaker with a few audio clips.

---

# 3. Zero-Shot vs. Few-Shot (Comparison)

| Aspect                                | Zero-Shot                                               | Few-Shot                             |
| ------------------------------------- | ------------------------------------------------------- | ------------------------------------ |
| **Training exposure to target class** | ❌ Never seen                                            | ✅ Seen a few labeled examples        |
| **Auxiliary info needed**             | Semantic knowledge (text, attributes, prompts)          | Small labeled dataset                |
| **Generalization**                    | From **descriptions/embeddings**                        | From **few examples**                |
| **Example**                           | Classify "zebra" with no zebra images, only description | Classify "zebra" with 5 zebra images |
| **Common in**                         | CLIP, LLMs (zero-shot QA, translation)                  | Meta-learning, LLM few-shot prompts  |

---

# 4. Intuitive Analogy

* **Zero-Shot:** You’ve never eaten "ramen" but someone tells you *"it’s like noodles in soup with toppings."* You recognize it the first time you see it.
* **Few-Shot:** You have tried "ramen" only 3 times, yet you can now reliably recognize it in the future.

---

In deep learning, **pretrained models + embeddings (CLIP, LLMs, transformers)** made **zero-shot and few-shot learning practical**.
Instead of training from scratch, we now rely on **transfer learning + prompt engineering + meta-learning**.

---




# 1. Zero-Shot Learning Networks

These rely heavily on **pretraining + semantic embeddings** (text, attributes, prompts).

### Vision (Images)

* **CLIP (OpenAI, 2021)**
  Learns a joint embedding of image and text → enables zero-shot image classification with natural language prompts.
* **ALIGN (Google, 2021)**
  Similar to CLIP, large-scale image–text contrastive training.
* **DeViSE (2013)**
  Maps images into a word embedding space (Word2Vec/GloVe) for zero-shot classification.
* **Zero-shot GANs** (e.g., *T2F, StyleGAN adaptations*)
  Generate images of categories unseen in training from textual descriptions.
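
The image–text contrastive training used by CLIP-style models can be sketched in NumPy as a symmetric cross-entropy over cosine-similarity logits. This is a simplified illustration under stated assumptions: the embeddings are random stand-ins for encoder outputs, the temperature is fixed at 0.07 rather than learned, and the function names are hypothetical.

```python
import numpy as np

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; row i of each matrix is assumed to be a matching pair."""
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature  # (n_images, n_texts)
    diag = np.arange(len(logits))
    loss_image_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_text_to_image = -log_softmax(logits, axis=0)[diag, diag].mean()
    return (loss_image_to_text + loss_text_to_image) / 2

rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))                            # stand-in text embeddings
aligned_images = text + 0.05 * rng.normal(size=(4, 8))    # images close to their captions
shuffled_images = aligned_images[[1, 2, 3, 0]]            # every pair deliberately mismatched

aligned_loss = contrastive_loss(aligned_images, text)
shuffled_loss = contrastive_loss(shuffled_images, text)
print(aligned_loss < shuffled_loss)
```

Minimizing this loss pulls matching image and text embeddings together, which is what later allows a new class to be recognized from a prompt alone.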

### NLP (Text)

* **GPT family (GPT-3, GPT-4, etc.)**
  Zero-shot text generation & QA with only a prompt (no labeled data).
* **BERT + prompt-based learning**
  Reframing classification as a cloze (masked-token prediction) task lets a masked language model score labels it was never fine-tuned on.
* **T5 (Text-to-Text Transfer Transformer)**
  Zero-shot across tasks (e.g., summarization, translation) by framing everything as text-to-text.

### Multimodal

* **Florence (Microsoft)** – large multimodal foundation model.
* **X-CLIP, BLIP, Flamingo (DeepMind)** – multimodal models with strong zero-shot transfer.

---

# 2. Few-Shot Learning Networks

These are designed to generalize from **very few examples**, often using **meta-learning** or **metric learning**.

### Vision (Images)

* **Matching Networks (2016)**
  Learn to compare query images to few labeled support examples.
* **Prototypical Networks (2017)**
  Represent each class by the mean embedding ("prototype") → classify queries by distance to prototypes.
* **Relation Networks (2018)**
  Learn a similarity function between query and support samples.
* **MAML (Model-Agnostic Meta-Learning, 2017)**
  Learns an initialization that quickly adapts to new tasks with few gradient steps.
* **Siamese Networks (2015)**
  Pairwise comparison of embeddings → good for 1-shot classification.
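
To make the MAML idea concrete, here is a toy sketch on scalar linear regression, where each task is "fit y = slope · x" and the model is a single weight `w`. The problem is deliberately small so the second-order term of the meta-gradient can be written by hand. The task distribution, learning rates, and function names are assumptions for illustration only.

```python
import numpy as np

def loss(w, x, y):
    """Mean squared error of the one-parameter model y_hat = w * x."""
    return np.mean((w * x - y) ** 2)

def grad(w, x, y):
    """Derivative of the loss with respect to w."""
    return 2.0 * np.mean(x * (w * x - y))

def inner_update(w, x, y, inner_lr):
    """One gradient step of task-specific adaptation."""
    return w - inner_lr * grad(w, x, y)

def maml_meta_gradient(w, x_s, y_s, x_q, y_q, inner_lr):
    """Gradient of the post-adaptation query loss with respect to the initial w."""
    w_adapted = inner_update(w, x_s, y_s, inner_lr)
    # Chain rule through the inner step: d(w_adapted)/dw = 1 - inner_lr * d2(loss)/dw2.
    d_adapted_dw = 1.0 - inner_lr * 2.0 * np.mean(x_s ** 2)
    return grad(w_adapted, x_q, y_q) * d_adapted_dw

rng = np.random.default_rng(0)
x_support = np.linspace(-1.0, 1.0, 10)  # few labeled points used for adaptation
x_query = np.linspace(-1.0, 1.0, 15)    # held-out points used for the meta-objective
inner_lr, meta_lr = 0.1, 0.05

w = 0.0  # meta-learned initialization
for _ in range(2000):
    slope = rng.uniform(1.0, 3.0)  # sample a task (assumed task distribution)
    w -= meta_lr * maml_meta_gradient(
        w, x_support, slope * x_support, x_query, slope * x_query, inner_lr
    )

# Adapt to a new task with a single gradient step on its support set.
new_slope = 2.5
before = loss(w, x_query, new_slope * x_query)
w_fast = inner_update(w, x_support, new_slope * x_support, inner_lr)
after = loss(w_fast, x_query, new_slope * x_query)
print(before, after)
```

In this toy setting the learned initialization settles near the middle of the slope range, so one inner step moves it toward whichever task arrives. Real MAML applies the same two-level structure to neural networks, with the meta-gradient computed by automatic differentiation.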

### NLP (Text)

* **GPT-3/4 with Few-Shot Prompting**
  Provide a few Q&A examples in the prompt → model generalizes to new queries.
* **PET (Pattern-Exploiting Training)**
  Fine-tunes masked language models with a few examples.
* **Meta-learning for NLP (e.g., ProtoBERT, FewShotBERT)**.
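
Few-shot prompting needs no training code at all: the labeled examples are simply written into the prompt. A minimal sketch of building such a prompt follows. The `Q:`/`A:` layout, the sentiment task, and the helper name `build_prompt` are assumptions for illustration, not a format required by any particular model.

```python
def build_prompt(instruction, examples, query):
    """Assemble a prompt from an instruction, optional worked examples, and the new query."""
    lines = [instruction, ""]
    for question, answer in examples:
        lines += [f"Q: {question}", f"A: {answer}", ""]
    lines += [f"Q: {query}", "A:"]
    return "\n".join(lines)

instruction = "Classify the sentiment of each review as positive or negative."
examples = [
    ("Great movie, I loved it.", "positive"),
    ("Terrible plot and worse acting.", "negative"),
    ("A delightful surprise from start to finish.", "positive"),
]
query = "I want my two hours back."

few_shot_prompt = build_prompt(instruction, examples, query)
zero_shot_prompt = build_prompt(instruction, [], query)  # same task, no examples
print(few_shot_prompt)
```

Passing an empty example list yields the zero-shot version of the same prompt, which makes the difference between the two settings explicit.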

### Speech & Multimodal

* **Speech few-shot**: Adaptation of wav2vec 2.0 or HuBERT to a new speaker with few samples.
* **Vision-Language few-shot**: CLIP + Prototypical networks for few-shot classification.

---

# 3. Summary (Networks by Type)

| Type          | Networks / Models                                                                                           | Core Idea                                                                        |
| ------------- | ----------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| **Zero-Shot** | CLIP, ALIGN, DeViSE, GPT-3/4, T5, BERT (prompted), Flamingo                                                 | Use pretrained embeddings + semantic info (text/attributes)                      |
| **Few-Shot**  | Prototypical Networks, Matching Nets, Relation Nets, MAML, Siamese Nets, FewShotBERT, GPT-3 with examples   | Learn to generalize from few labeled samples via meta-learning / metric learning |

---

**Rule of thumb:**

* If you have **no samples** of the new task/class → **Zero-Shot (CLIP, GPT-3, T5)**.
* If you have **1–10 samples per class** → **Few-Shot (ProtoNets, Matching Nets, MAML, or LLM prompting with examples)**.

---
