## 1. What is MBConv?

**MBConv (Mobile Inverted Bottleneck Convolution)** was introduced in **MobileNetV2** and reused in **EfficientNet**.

It’s called *inverted* because:

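* A normal (ResNet-style) bottleneck first **reduces** channels with a 1×1 convolution, applies the spatial convolution on the narrow representation, and then **expands** back.
* MBConv first **expands** channels, applies a depthwise convolution on the wide representation, and then **projects back** to fewer channels.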

---

### MBConv block structure

1. **Expansion (1×1 convolution)**
   Expands from input channels $C_{in}$ to $t \times C_{in}$, where $t$ is the **expansion factor** (usually 6).

   $$
   X_{expand} = \text{ReLU6}(\text{BN}(\text{Conv}_{1\times1}(X_{in})))
   $$

2. **Depthwise convolution (3×3 convolution per channel)**
   Applies spatial convolution **independently** for each channel.

   $$
   X_{depth} = \text{ReLU6}(\text{BN}(\text{ConvDepthwise}_{3\times3}(X_{expand})))
   $$

3. **Projection (1×1 convolution)**
   Reduces back to $C_{out}$ channels.

   $$
   X_{out} = \text{BN}(\text{Conv}_{1\times1}(X_{depth}))
   $$

4. **Skip connection (optional)**
   If stride = 1 and $C_{in} = C_{out}$:
   $$
   Y = X_{in} + X_{out}
   $$
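
The four steps above can be sketched as a single PyTorch module. This is a minimal illustration, not the reference MobileNetV2 or EfficientNet implementation: the class name `MBConvSketch` and its argument names are made up here, and `bias=False` is an assumption (a common choice when BatchNorm follows the convolution).

```python
import torch
from torch import nn


class MBConvSketch(nn.Module):
    """Illustrative MBConv: expand (1x1) -> depthwise (kxk) -> project (1x1)."""

    def __init__(self, c_in, c_out, expansion=6, kernel_size=3, stride=1):
        super().__init__()
        c_mid = expansion * c_in
        # The skip connection is only valid when input and output shapes match.
        self.use_skip = stride == 1 and c_in == c_out
        self.block = nn.Sequential(
            # 1. Expansion: 1x1 conv + BN + ReLU6
            nn.Conv2d(c_in, c_mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(c_mid),
            nn.ReLU6(inplace=True),
            # 2. Depthwise: one kxk filter per channel (groups == channels)
            nn.Conv2d(c_mid, c_mid, kernel_size, stride=stride,
                      padding=kernel_size // 2, groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid),
            nn.ReLU6(inplace=True),
            # 3. Projection: 1x1 conv + BN, with no activation afterwards
            nn.Conv2d(c_mid, c_out, kernel_size=1, bias=False),
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x):
        out = self.block(x)
        # 4. Optional skip connection
        return x + out if self.use_skip else out


x = torch.randn(1, 4, 8, 8)  # [B, C_in, H, W]
block = MBConvSketch(c_in=4, c_out=4)
y = block(x)
print(y.shape)  # torch.Size([1, 4, 8, 8])
```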

---

**MBConv = Expansion (1×1) → Depthwise (3×3) → Projection (1×1)**


**Depthwise Separable = Depthwise (3×3) → Pointwise (1×1)**

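So MBConv can be viewed as a Depthwise Separable Conv with an extra 1×1 expansion in front: the depthwise filters operate on $t \times C_{in}$ channels instead of $C_{in}$, which gives the block more capacity at the cost of more parameters.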
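
For contrast, a depthwise-separable convolution on a 4-channel input might look like the sketch below. The variable names are illustrative, normalization and activations are left out to keep the structure visible, and `bias=False` is an assumption. Note that the depthwise step here only ever sees 4 channels.

```python
import torch
from torch import nn

# Depthwise-separable conv: depthwise 3x3 followed by pointwise 1x1.
depthwise_separable = nn.Sequential(
    nn.Conv2d(4, 4, kernel_size=3, padding=1, groups=4, bias=False),  # depthwise
    nn.Conv2d(4, 4, kernel_size=1, bias=False),                       # pointwise
)

x = torch.randn(1, 4, 8, 8)  # [B, C, H, W]
out = depthwise_separable(x)
print(out.shape)  # torch.Size([1, 4, 8, 8])
```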

---

## 2. Numerical Example

Let’s take a **tiny example** to see the shapes.

| Parameter        | Symbol       | Value |
| ---------------- | ------------ | ----- |
| Input size       | $H \times W$ | 8 × 8 |
| Input channels   | $C_{in}$     | 4     |
| Output channels  | $C_{out}$    | 4     |
| Expansion factor | $t$          | 6     |
| Kernel size      | $k$          | 3     |
| Stride           | $s$          | 1     |

---

#### Step 1: Expansion (1×1 conv)

$$
C_{expand} = t \times C_{in} = 6 \times 4 = 24
$$

Output tensor shape, written as $[H, W, C]$:
$$
[8, 8, 24]
$$

---

#### Step 2: Depthwise 3×3 conv

Each of the 24 channels gets its own 3×3 filter → no channel mixing.

Output tensor shape:
$$
[8, 8, 24]
$$

(assuming padding = 1, stride = 1)
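
The unchanged spatial size follows from the usual convolution output-size formula. The symbols $p$ (padding) and $s$ (stride) are introduced here for illustration; the same calculation applies to the width.

$$
H_{out} = \left\lfloor \frac{H + 2p - k}{s} \right\rfloor + 1 = \left\lfloor \frac{8 + 2 \cdot 1 - 3}{1} \right\rfloor + 1 = 8
$$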

---

#### Step 3: Projection (1×1 conv)

Reduces back to 4 channels:

$$
[8, 8, 24] \xrightarrow{\text{Conv}_{1\times1}} [8, 8, 4]
$$

---

#### Step 4: Skip connection

Since stride = 1 and input/output channels are the same (4), we add:

$$
Y = X_{in} + X_{out}
$$

Final output shape:
$$
[8, 8, 4]
$$




#### Python Code

```python
import torch

input = torch.randn(1, 4, 8, 8)  # [B, Cin, H, W]
expansion_factor = 6
```

#### 1. Expansion (1×1 convolution)

```python
conv1x1 = torch.nn.Conv2d(in_channels=4,
                          out_channels=expansion_factor * 4,  # 24
                          kernel_size=1,
                          stride=1)
output_expanded = conv1x1(input)
print(output_expanded.shape)  # [1, 24, 8, 8]
```

This expands the number of channels:
$$
C_{expand} = t \times C_{in} = 6 \times 4 = 24
$$

---

#### 2. Depthwise convolution (3×3)

```python
conv_depthwise = torch.nn.Conv2d(in_channels=expansion_factor * 4,
                                 out_channels=expansion_factor * 4,
                                 kernel_size=3,
                                 stride=1,
                                 padding=1,          # keep size same
                                 groups=expansion_factor * 4)  # depthwise
output_depthwise = conv_depthwise(output_expanded)
print(output_depthwise.shape)  # [1, 24, 8, 8]
```

Each channel is convolved **independently** (since `groups=in_channels`).
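
One way to check the "no channel mixing" claim is to perturb a single input channel and see which output channels change. This is a small sketch with illustrative variable names; the seed and the tolerance `1e-6` are arbitrary choices.

```python
import torch

torch.manual_seed(0)
depthwise = torch.nn.Conv2d(24, 24, kernel_size=3, padding=1, groups=24)
# One single-channel 3x3 filter per output channel.
print(depthwise.weight.shape)  # torch.Size([24, 1, 3, 3])

x = torch.randn(1, 24, 8, 8)
x_perturbed = x.clone()
x_perturbed[:, 0] += 1.0  # change input channel 0 only

with torch.no_grad():
    # Largest absolute change per output channel.
    diff = (depthwise(x_perturbed) - depthwise(x)).abs().amax(dim=(0, 2, 3))

changed = (diff > 1e-6).nonzero().flatten().tolist()
print(changed)  # [0]  -> only output channel 0 reacts
```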

---

#### 3. Projection (1×1 convolution)

```python
conv_projection = torch.nn.Conv2d(in_channels=expansion_factor * 4,
                                  out_channels=4,
                                  kernel_size=1,
                                  stride=1)
output_projected = conv_projection(output_depthwise)
print(output_projected.shape)  # [1, 4, 8, 8]
```

This projects back down to the original number of channels.

---

#### 4. Optional skip connection

If stride = 1 and `C_in == C_out`, you can add:

```python
output = input + output_projected
print(output.shape)  # [1, 4, 8, 8]
```
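
To see why the condition matters, here is a hedged sketch of the same three convolutions with `stride=2` in the depthwise step. The names `strided_block` and `can_skip` are illustrative. The spatial size is halved, so the output no longer matches the input and the residual add has to be skipped.

```python
import torch
from torch import nn

x = torch.randn(1, 4, 8, 8)  # [B, C_in, H, W]

# Same three convolutions, but the depthwise step uses stride=2.
strided_block = nn.Sequential(
    nn.Conv2d(4, 24, kernel_size=1),
    nn.Conv2d(24, 24, kernel_size=3, stride=2, padding=1, groups=24),
    nn.Conv2d(24, 4, kernel_size=1),
)
out = strided_block(x)
print(out.shape)  # torch.Size([1, 4, 4, 4])

# The residual add is only defined when the shapes match.
can_skip = out.shape == x.shape
y = x + out if can_skip else out
print(can_skip)  # False
```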

---

#### ✅ Summary of shapes

| Stage      | Operation | Input shape   | Output shape  | Parameters     |
| ---------- | --------- | ------------- | ------------- | -------------- |
| Expansion  | 1×1 conv  | [1, 4, 8, 8]  | [1, 24, 8, 8] | 4×24×1×1 = 96  |
| Depthwise  | 3×3 conv  | [1, 24, 8, 8] | [1, 24, 8, 8] | 24×3×3 = 216   |
| Projection | 1×1 conv  | [1, 24, 8, 8] | [1, 4, 8, 8]  | 24×4×1×1 = 96  |
| **Total**  |           |               |               | **408 params** |

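The counts in the table cover convolution weights only. As a sanity check, the sketch below rebuilds the three layers with `bias=False` and counts their parameters; the helper `count_params` is a name introduced here. With PyTorch's default `bias=True`, each layer would add one bias per output channel (24 + 24 + 4 = 52 extra parameters).

```python
import torch


def count_params(module):
    """Total number of learnable scalars in a module."""
    return sum(p.numel() for p in module.parameters())


# bias=False so that only the convolution weights are counted, as in the table.
expansion = torch.nn.Conv2d(4, 24, kernel_size=1, bias=False)
depthwise = torch.nn.Conv2d(24, 24, kernel_size=3, padding=1, groups=24, bias=False)
projection = torch.nn.Conv2d(24, 4, kernel_size=1, bias=False)

counts = [count_params(m) for m in (expansion, depthwise, projection)]
print(counts, sum(counts))  # [96, 216, 96] 408
```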
---




## 3. Parameter Comparison

Let’s roughly compare MBConv vs Depthwise-Separable Conv for the same example.

| Layer type                         | Parameters                                                                  |
| ---------------------------------- | --------------------------------------------------------------------------- |
| Expansion 1×1 conv                 | $1 \times 1 \times 4 \times 24 = 96$                                        |
| Depthwise 3×3 conv                 | $3 \times 3 \times 24 = 216$                                                |
| Projection 1×1 conv                | $1 \times 1 \times 24 \times 4 = 96$                                        |
| **Total MBConv**                   | **408**                                                                     |
| Depthwise-separable (no expansion) | $3 \times 3 \times 4 + 1 \times 1 \times 4 \times 4 = 36 + 16 = 52$         |

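The same arithmetic can be written as two small helper functions, which makes it easy to try other channel counts. This is a sketch: the function names are introduced here, and only convolution weights are counted (no biases, no BatchNorm parameters), matching the table.

```python
def mbconv_params(c_in, c_out, t=6, k=3):
    """Conv weights in MBConv: expansion + depthwise + projection."""
    c_mid = t * c_in
    return c_in * c_mid + k * k * c_mid + c_mid * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Conv weights in a depthwise-separable conv: depthwise + pointwise."""
    return k * k * c_in + c_in * c_out


mb = mbconv_params(4, 4)
ds = depthwise_separable_params(4, 4)
print(mb, ds)  # 408 52
print(f"MBConv uses about {mb / ds:.1f}x as many weights here")
```

For this tiny example the MBConv block has roughly 8 times as many weights, since most of them sit in the expanded 24-channel representation.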
