# **1. Neural Architecture Search (NAS)**
**Neural Architecture Search (NAS)** is a subfield of **AutoML (Automated Machine Learning)** that aims to **automatically design neural network architectures** rather than hand-crafting them.

The idea is to let an algorithm *search* through a space of possible network designs to find one that performs best for a given task (e.g., image classification, detection, or segmentation).

---

## 1.1. Motivation

Traditionally, researchers manually design architectures like **ResNet**, **DenseNet**, or **Transformer** based on intuition and trial-and-error.
However, there are many hyperparameters and design decisions:

* Number of layers
* Kernel sizes
* Skip connections
* Width/depth of the network
* Type of blocks (conv, attention, etc.)

This manual process is **time-consuming** and often **sub-optimal**.
NAS automates this design process.

---

## 1.2. General Workflow of NAS

The process can be broken down into **three major components**:

| Component               | Role                                                                                       |
| ----------------------- | ------------------------------------------------------------------------------------------ |
| **Search Space**        | Defines *what* architectures can be explored (e.g., types of layers, connections, etc.)    |
| **Search Strategy**     | Defines *how* architectures are sampled and improved (e.g., RL, evolution, gradient-based) |
| **Evaluation Strategy** | Defines *how* to measure the performance (e.g., train model fully, or use proxy training)  |
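
To see how the three components fit together, here is a minimal sketch using plain random search. Everything in it is hypothetical: the `SEARCH_SPACE` entries, the function names, and especially `evaluate`, which is a made-up scoring formula standing in for actually training a model and measuring validation accuracy.

```python
import random

# Search space: each key is a design decision, each value lists the allowed choices.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6],
    "kernel_size": [3, 5],
    "use_skip": [True, False],
}


def sample_architecture(rng):
    """Search strategy (here: random search), one choice per decision."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}


def evaluate(arch):
    """Evaluation strategy: a made-up score standing in for validation accuracy."""
    score = 0.5 + 0.05 * arch["num_layers"] - 0.02 * arch["kernel_size"]
    return score + (0.1 if arch["use_skip"] else 0.0)


def random_search(num_trials, seed=0):
    """Sample architectures, score each one, and keep the best seen so far."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


best_arch, best_score = random_search(num_trials=50)
print(best_arch, round(best_score, 3))
```

Swapping `sample_architecture` for an RL controller or an evolutionary step changes the search strategy, and swapping `evaluate` for short training runs changes the evaluation strategy, without touching the rest of the loop.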

---

### (a) **Search Space**

This specifies the **building blocks** and **rules** for generating architectures.

Example:

* Convolution types: 3×3, 5×5, depthwise conv, etc.
* Skip connections allowed or not.
* Number of channels, etc.

Formally, NAS tries to find:
$$
\arg\min_{A \in \mathcal{A}} \; \mathcal{L}_{val}(w^*(A), A)
$$

where

$$
w^*(A) = \arg\min_{w} \; \mathcal{L}_{train}(w, A)
$$

Here:

* $A$ = architecture from search space $\mathcal{A}$
* $w$ = its weights
* $\mathcal{L}_{train}$ = training loss
* $\mathcal{L}_{val}$ = validation loss

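So NAS is a **bilevel optimization** problem: it must **optimize both the architecture and its weights**, where the weights $w^*(A)$ are themselves the result of training architecture $A$.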
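
The nested structure can be illustrated with a toy problem small enough to solve exactly. In this sketch (all names and data are hypothetical), an "architecture" is just a choice of feature map, the single weight is fitted in closed form on the training set (the inner problem), and the architecture is chosen by validation loss (the outer problem).

```python
import math

# Toy data generated from y = 2 * x^2, split into training and validation sets.
train = [(x, 2 * x * x) for x in (1.0, 2.0, 3.0)]
val = [(x, 2 * x * x) for x in (1.5, 2.5)]

# "Architectures": each candidate fixes a feature map f; the model is y = w * f(x).
ARCHITECTURES = {
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
    "sqrt": math.sqrt,
}


def fit_weight(feature, data):
    """Inner problem: closed-form least-squares w* for y = w * feature(x)."""
    num = sum(feature(x) * y for x, y in data)
    den = sum(feature(x) ** 2 for x, _ in data)
    return num / den


def mse(feature, w, data):
    """Mean squared error of the model y = w * feature(x) on `data`."""
    return sum((w * feature(x) - y) ** 2 for x, y in data) / len(data)


def search():
    """Outer problem: pick the architecture with the lowest validation loss."""
    results = {}
    for name, feature in ARCHITECTURES.items():
        w_star = fit_weight(feature, train)         # w*(A), from the training loss
        results[name] = mse(feature, w_star, val)   # L_val(w*(A), A)
    best = min(results, key=results.get)
    return best, results


best, results = search()
print(best, results)
```

Real NAS differs mainly in scale: the inner problem is a full training run rather than a one-line formula, which is why the evaluation strategy matters so much.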

---

### (b) **Search Strategy**

The search strategy determines how to explore the search space efficiently.

#### Common Strategies:

1. **Reinforcement Learning (RL) based NAS**

   * A controller (e.g., RNN) generates architectures.
   * Reward = validation accuracy.
   * Examples: the original RL-based NAS (Zoph & Le, 2017) and its successor **NASNet** (Zoph et al., 2018).

2. **Evolutionary Algorithms**

   * Start with random architectures.
   * Mutate and recombine top performers.
   * Example: **AmoebaNet** (Real et al., 2019).

3. **Gradient-Based (Differentiable NAS)**

   * Represent architecture choices as continuous parameters.
   * Optimize with gradient descent.
   * Example: **DARTS (Differentiable Architecture Search)**.
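
As an illustration of the evolutionary strategy, the sketch below follows the aging ("regularized") evolution scheme associated with AmoebaNet: pick the best of a random sample as the parent, mutate it, add the child, and discard the oldest member. It uses mutation only, with no recombination. The search space, the `fitness` formula, and all parameter values are made up for illustration; a real system would train each child to obtain its fitness.

```python
import collections
import random

# Hypothetical search space: one list of allowed values per design decision.
CHOICES = {
    "num_layers": [2, 4, 6, 8],
    "kernel_size": [3, 5, 7],
    "width": [16, 32, 64],
}


def random_arch(rng):
    return {key: rng.choice(values) for key, values in CHOICES.items()}


def mutate(arch, rng):
    """Return a copy of `arch` with one randomly chosen decision resampled."""
    child = dict(arch)
    key = rng.choice(list(CHOICES))
    child[key] = rng.choice(CHOICES[key])
    return child


def fitness(arch):
    """Made-up stand-in for validation accuracy (higher is better)."""
    penalty = abs(arch["kernel_size"] - 5) * 0.03
    return arch["num_layers"] * 0.05 + arch["width"] * 0.005 - penalty


def aging_evolution(cycles=200, population_size=20, sample_size=5, seed=0):
    rng = random.Random(seed)
    population = collections.deque()
    for _ in range(population_size):
        arch = random_arch(rng)
        population.append((arch, fitness(arch)))
    best = max(population, key=lambda item: item[1])
    for _ in range(cycles):
        # Tournament selection: the best of a random sample becomes the parent.
        candidates = rng.sample(list(population), sample_size)
        parent = max(candidates, key=lambda item: item[1])
        child = mutate(parent[0], rng)
        entry = (child, fitness(child))
        population.append(entry)
        population.popleft()  # Discard the oldest member, not the worst one.
        if entry[1] > best[1]:
            best = entry
    return best


best_arch, best_fitness = aging_evolution()
print(best_arch, round(best_fitness, 3))
```

Removing the oldest member rather than the worst keeps the population turning over, so an architecture that scored well by luck does not survive indefinitely.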

---

### (c) **Evaluation Strategy**

Training each candidate model fully is **expensive**.
So evaluation uses approximations:

* **Early stopping:** Train a few epochs only.
* **Weight sharing:** Train a “supernet” that contains all sub-architectures (e.g., **ENAS**).
* **Performance prediction:** Train a small surrogate model that predicts a candidate's final performance from its architecture (or from its early training curve), avoiding full training.
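
One common way to apply the early-stopping idea is successive halving, a technique not named above: train all candidates briefly, keep the better half, double the budget, and repeat. The sketch below simulates it with made-up learning curves; `make_candidates` and `proxy_accuracy` are hypothetical stand-ins for real training runs, and the cost model assumes each round retrains from scratch.

```python
import math
import random


def make_candidates(n, seed=0):
    """Hypothetical candidates with a final accuracy and a convergence rate."""
    rng = random.Random(seed)
    return [
        {"id": i, "final_acc": rng.uniform(0.6, 0.95), "rate": rng.uniform(0.2, 1.0)}
        for i in range(n)
    ]


def proxy_accuracy(candidate, epochs):
    """Simulated accuracy after `epochs` of training (a saturating curve)."""
    return candidate["final_acc"] * (1.0 - math.exp(-candidate["rate"] * epochs))


def successive_halving(candidates, min_epochs=1):
    """Keep the top half each round while doubling the per-candidate budget."""
    survivors, epochs, total_cost = list(candidates), min_epochs, 0
    while len(survivors) > 1:
        total_cost += epochs * len(survivors)  # epochs spent in this round
        survivors.sort(key=lambda c: proxy_accuracy(c, epochs), reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
        epochs *= 2
    return survivors[0], total_cost


candidates = make_candidates(16)
winner, cost = successive_halving(candidates)
full_cost = 16 * len(candidates)  # training every candidate for 16 epochs
print(winner["id"], cost, full_cost)
```

The saving comes at a price: a candidate that learns slowly but ends up strong can be eliminated in an early round, which is the usual risk of any low-fidelity proxy.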

---

## 1.3. Key NAS Variants

| Method                              | Type                            | Example Paper                    |
| ----------------------------------- | ------------------------------- | -------------------------------- |
| **RL-based**                        | Discrete search                 | NASNet (2017)                    |
| **Evolutionary**                    | Discrete search                 | AmoebaNet (2019)                 |
| **Differentiable (Gradient-based)** | Continuous relaxation           | DARTS (2018)                     |
| **One-shot NAS**                    | Weight sharing                  | ENAS (2018), ProxylessNAS (2019) |
| **Hardware-aware NAS**              | Adds latency/energy constraints | FBNet, MnasNet                   |

---




# **2. What Is a “Design Space”?**

Traditionally, people talk about **a single architecture**:

* ResNet-50
* MobileNetV2
* EfficientNet-B0
* ViT-Base
* etc.

A **Design Space** is not one model — it is a **parametrized family of models** that follow a set of rules.

A design space defines:

1. Which hyperparameters are allowed
2. Their ranges
3. How these parameters interact
4. How architecture components can be composed

A model is **one point** inside that space.
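
A minimal sketch of this definition, assuming a hypothetical four-parameter space: `RANGES` lists the allowed parameters and their ranges, `is_valid` encodes one interaction rule, and `sample` draws a single point (one model configuration). The specific ranges and the rule are illustrative only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    depth: int
    width: int
    group_size: int
    bottleneck_ratio: int


# Items 1 and 2: which parameters exist and their ranges (hypothetical values).
RANGES = {
    "depth": list(range(4, 25)),
    "width": list(range(16, 257, 8)),
    "group_size": [1, 2, 4, 8, 16, 32],
    "bottleneck_ratio": [1, 2, 4],
}


def is_valid(cfg):
    """Item 3: an interaction rule. The bottleneck width must split into groups."""
    if cfg.width % cfg.bottleneck_ratio != 0:
        return False
    inner_width = cfg.width // cfg.bottleneck_ratio
    return inner_width % cfg.group_size == 0


def sample(rng, max_tries=1000):
    """Draw one point from the space by rejection sampling."""
    for _ in range(max_tries):
        cfg = ModelConfig(**{name: rng.choice(vals) for name, vals in RANGES.items()})
        if is_valid(cfg):
            return cfg
    raise RuntimeError("no valid configuration found")


cfg = sample(random.Random(0))
print(cfg)
```

The more rules of this kind a space carries, the smaller and more uniform the family of models it describes.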

---

## **2.1. Why Design Spaces?**

Much of deep learning practice is exploration of design choices:

* number of layers
* number of channels
* kernel sizes
* expansion ratios
* attention heads
* squeeze-and-excitation (SE) vs. no SE
* training schedules
* etc.

But this exploration is normally **ad-hoc**, **messy**, and inefficient.

The RegNet authors realized something:

**Instead of searching for a specific architecture that performs well, it is better to design a structured space where *every model* is likely to perform well.**

This is a huge shift in thinking:

**Search for good *rules*, not for a single good architecture.**

---

## **2.2. Analogy**

Think of a design space as:

A **blueprint** defining the rules for building a whole family of houses:

* same materials
* same construction rules
* same constraints

But each house may vary in:

* number of rooms
* size
* layout
* number of floors

In deep learning:

* A model = a house
* A design space = the blueprint for building all houses

---

## **2.3. Why Design Spaces Improve Model Quality**

The idea is that **architecture structure matters more than individual hyperparameters**.

Example:

If models perform better when:

* the number of channels increases smoothly
* bottleneck ratios stay in {1, 2, 4}
* group sizes are bounded
* depth per stage is not chaotic

Then these principles should be **built into the design space**.

This makes it more likely that:

* most sampled models are good
* scaling up/down is consistent
* degenerate or erratic architectures are excluded
* training and inference are efficient

---

## **2.4. How RegNet Discovered a Better Design Space**

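The authors did not run a conventional NAS (Neural Architecture Search). They sampled and trained many random models from a loosely constrained initial space (AnyNet) and, instead of keeping the best architecture, analyzed **population statistics**: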

### **2.4.1 Key empirical findings**

Among the best-performing sampled models:

1. Width increases approximately **linearly**
2. Depth per stage follows **simple distributions**
3. Group sizes are **stable**
4. Bottleneck ratios cluster around **few values**

These “laws” inspired the **RegNet design space**, which formalizes them.
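
One concrete form of population statistics is the error empirical distribution function (EDF), which the RegNet paper uses to compare design spaces: for an error threshold $e$, it gives the fraction of sampled models with error below $e$. The sketch below computes it for two hypothetical spaces; the error values are made up for illustration and are not results from the paper.

```python
def error_edf(errors, thresholds):
    """F(e) = fraction of models with error strictly below e, per threshold."""
    n = len(errors)
    return [sum(1 for err in errors if err < e) / n for e in thresholds]


# Made-up error rates (%) for models sampled from two hypothetical design spaces.
space_a = [42.0, 45.5, 47.0, 51.0, 55.0, 60.0]
space_b = [40.0, 41.5, 43.0, 44.0, 46.0, 52.0]
thresholds = [40, 45, 50, 55]

edf_a = error_edf(space_a, thresholds)
edf_b = error_edf(space_b, thresholds)

for e, fa, fb in zip(thresholds, edf_a, edf_b):
    print(f"error < {e}%: space A {fa:.2f}, space B {fb:.2f}")
```

If one space's curve lies above the other's at every threshold, as space B's does here, a randomly sampled model from it is more likely to be good, which is the sense in which one design space is "better" than another.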

---

## **2.5. Properties of a Good Design Space**

In the spirit of the RegNet paper, a good design space should have:

### **2.5.1 Predictability**

A small change in parameters should produce a small change in model structure.

### **2.5.2 Regularity**

Smooth transitions in:

* width
* depth
* kernel size
* attention heads

### **2.5.3 Scalability**

Models can be:

* very small (1 GFLOP)
* medium (10 GFLOPs)
* huge (100+ GFLOPs)

… while still obeying the same rules.

### **2.5.4 Consistency**

All models in the space should share a similar look:

* architecture layout
* block type
* stage structure

### **2.5.5 Efficiency**

Models from the space should be:

* easy to implement
* friendly to hardware (tensor cores, GPUs)
* well-behaved during training

---

## **2.6. Design Space vs Neural Architecture Search (NAS)**

NAS tries to find:

**a single best architecture**

RegNet’s philosophy is:

**discover general principles that apply to an entire family of models**

How the two approaches compare:

| NAS                              | Design Space                          |
| -------------------------------- | ------------------------------------- |
| very expensive                   | fast, cheap                           |
| produces irregular architectures | produces regular, clean architectures |
| hard to scale                    | naturally scalable                    |
| architecture may not generalize  | rules apply across scales             |

Design spaces rely on **statistical analysis** of populations of sampled models, rather than on a search for one winner, and convert the resulting insights into structured rules.

---

## **2.7. Examples of Design Spaces in Other Models**

### **2.7.1 EfficientNet**

Defines a scaling space with 3 dimensions:

* depth
* width
* resolution

This is a **scaling design space**.
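
EfficientNet ties the three dimensions together with a single compound coefficient $\phi$: depth scales as $\alpha^\phi$, width as $\beta^\phi$, and resolution as $\gamma^\phi$. A minimal sketch, assuming the base values reported in the EfficientNet paper ($\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$); the function names and the base resolution default are illustrative, and the sketch does not reproduce the exact settings of the released B1 to B7 models.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # bases for depth, width, resolution


def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    """Scale depth, width, and input resolution with one coefficient phi."""
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = base_resolution * GAMMA ** phi
    return depth, width, resolution


def flops_multiplier(phi):
    """FLOPs grow roughly with depth * width^2 * resolution^2."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


for phi in range(4):
    depth, width, resolution = compound_scale(phi)
    print(phi, round(depth, 2), round(width, 2), round(resolution), round(flops_multiplier(phi), 2))
```

The bases are chosen so that $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$, meaning each unit increase in $\phi$ roughly doubles the compute.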

### **2.7.2 MobileNetV3**

Defines a space of:

* inverted bottlenecks
* kernel sizes {3, 5}
* squeeze-excitation usage

A search procedure then selects the best combination.

### **2.7.3 ConvNeXt**

Defines a space of:

* patch size
* depth per stage
* spatial downsampling pattern

### **2.7.4 ViT**

Defines a transformer design space:

* depth
* hidden size
* MLP expansion
* number of heads
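
These four parameters can be written down as a small configuration object. The sketch below is illustrative: `ViTConfig` and its methods are hypothetical names, and the weight count is a rough estimate that covers only the encoder's attention and MLP matrices, ignoring biases, layer norms, and embeddings. The example values are the commonly cited ViT-Base settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    depth: int         # number of transformer blocks
    hidden_size: int   # token embedding dimension
    mlp_ratio: int     # MLP expansion factor
    num_heads: int     # attention heads per block

    def __post_init__(self):
        # Interaction rule: the hidden size must split evenly across heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")

    @property
    def head_dim(self):
        return self.hidden_size // self.num_heads

    def encoder_weight_count(self):
        """Approximate count of attention and MLP weight-matrix entries."""
        d = self.hidden_size
        attention = 4 * d * d                # Q, K, V, and output projections
        mlp = 2 * d * (self.mlp_ratio * d)   # two linear layers
        return self.depth * (attention + mlp)


vit_base = ViTConfig(depth=12, hidden_size=768, mlp_ratio=4, num_heads=12)
print(vit_base.head_dim, vit_base.encoder_weight_count())
```

Each named ViT variant is then one point in this space, and the divisibility check is an example of a rule that rules out invalid points.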

---

## **2.8. The RegNet Design Space (Key Components)**

RegNet design space uses:

1. Initial width $w_0$
2. Width slope $\Delta w$ (written $w_a$ in the RegNet paper)
3. Width quantization (controlled by a multiplier $w_m$ in the RegNet paper)
4. Bottleneck ratio $b$
5. Group size $g$
6. Depth $d$ (number of blocks)

These parameters uniquely define:

* stage widths
* stage depths
* block widths
* growth pattern

A model in the RegNet family is simply one sample from this space.
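
The sketch below shows how the width parameters turn into stage widths and depths, following the quantized linear rule described in the RegNet paper: per-block widths grow linearly as $u_j = w_0 + w_a \cdot j$, are snapped to the nearest power of $w_m$ times $w_0$, and are rounded to a multiple of 8. Here $w_a$ is the paper's name for the width slope and $w_m$ for the quantization multiplier. The function name and the example values are chosen for illustration, and the sketch omits the later adjustment that makes widths compatible with the group size.

```python
import math


def regnet_widths(w_0, w_a, w_m, depth, quantum=8):
    """Return (stage_widths, stage_depths) from a quantized linear width rule."""
    block_widths = []
    for j in range(depth):
        u_j = w_0 + w_a * j                         # linear width for block j
        s_j = math.log(u_j / w_0) / math.log(w_m)   # solve u_j = w_0 * w_m ** s_j
        w_j = w_0 * w_m ** round(s_j)               # snap to an integer power
        block_widths.append(int(round(w_j / quantum) * quantum))

    # Consecutive blocks with the same width form one stage.
    stage_widths, stage_depths = [], []
    for width in block_widths:
        if stage_widths and stage_widths[-1] == width:
            stage_depths[-1] += 1
        else:
            stage_widths.append(width)
            stage_depths.append(1)
    return stage_widths, stage_depths


widths, depths = regnet_widths(w_0=24, w_a=36.44, w_m=2.49, depth=13)
print(widths, depths)
```

Because the per-block widths are quantized to a few distinct values, the stage structure is never specified directly: it falls out of the width parameters and the total depth.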

---

## **2.9. Why This Matters in Practice**

### **2.9.1 Practical benefits**

* You can scale models cleanly (RegNet-400MF, 1.6GF, 3.2GF, 8GF, 16GF, 32GF)
* Models are hardware-friendly (regular channel sizes)
* No expensive architecture search needed
* Architectures are interpretable

### **2.9.2 Modern deep-learning perspective**

Design spaces represent the idea that:

**Good architectures follow structural laws.
Our job is to discover those laws, not random architectures.**

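The RegNet paper reports that models drawn from these simple rules are competitive with, and in several comparisons better than, NAS-generated models under comparable training settings.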

---

## **2.10. Summary**

A **Design Space** is a structured, rule-based framework that defines *families of architectures* instead of individual networks.

RegNet found that top CNNs share simple statistical patterns, and converted those patterns into:

* smooth width curves
* fixed bottleneck ratios
* stable group sizes
* predictable stage transitions

Sampling an architecture from this space tends to give a strong model with a good compute–accuracy tradeoff.

---


