### 6.894 Accelerated Computing Lecture 3: Memory

Jonathan Ragan-Kelley



Main Memory

# A Conventional Memory Hierarchy

Tradeoff: small, fast, close vs. large, slow, far

Fundamental constraint: large memories are farther away, on average



[Figure: memory hierarchy diagram, with main memory built from DRAM chips.]







**GPU** Main Memory (20 GB)

### How can we get more bandwidth?

- Faster bus frequency
- Wider interface (parallelism)



DDR5-6400:

- Capacity: 1-4 GBytes x n chips
- Interface: 32 bits
- Speed: 6.4 GT/s
- Bandwidth: 25.6 GB/sec (32 bits x 6.4 GT/s, at 8 bits per byte)

### Increasing memory bus frequency

#### Tradeoff:

faster signaling → shorter wires → fewer chips → less capacity



GDDR6:

- Capacity: 4 GBytes x 1 chip
- Interface: 32 bits
- Speed: 18 GT/s
- Bandwidth: 72 GB/sec (32 bits x 18 GT/s, at 8 bits per byte)



For comparison, DDR5-6400: capacity 1-4 GBytes x n chips, interface 32 bits, speed 6.4 GT/s, bandwidth 25.6 GB/sec.
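Both bandwidth figures follow from the same arithmetic: peak bandwidth is interface width times transfer rate, divided by 8 bits per byte. The sketch below is only a sanity check of the slide numbers; the function name `peak_bandwidth_gb_per_s` is made up for illustration.

```python
def peak_bandwidth_gb_per_s(interface_bits: int, rate_gt_per_s: float) -> float:
    """Peak bandwidth in GB/s: every transfer moves `interface_bits` bits."""
    bits_per_byte = 8
    return interface_bits * rate_gt_per_s / bits_per_byte


# Figures taken from the slides: same 32-bit interface, different signaling rate.
ddr5 = peak_bandwidth_gb_per_s(32, 6.4)    # DDR5-6400
gddr6 = peak_bandwidth_gb_per_s(32, 18.0)  # GDDR6

print(f"DDR5-6400: {ddr5:.1f} GB/s")
print(f"GDDR6:     {gddr6:.1f} GB/s")
print(f"Speedup from faster signaling alone: {gddr6 / ddr5:.2f}x")
```

At equal width, the gain comes entirely from the higher transfer rate, which is what the frequency tradeoff above pays for in capacity.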

### Increasing memory bus width (parallelism!)

Aggregate: 5 chips (GDDR6)

- Capacity: 20 GBytes
- Interface: 160 bits
- Speed: 18 GT/s
- Bandwidth: 360 GB/sec

Limit: processor pins

Practical: ~384-512 bits (12-16 chips)
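Widening the bus scales bandwidth linearly with the number of chips, until the processor runs out of pins. A minimal sketch of that scaling, assuming every chip matches the GDDR6 figures above (32 bits, 18 GT/s, 4 GBytes); the `aggregate` helper is hypothetical, and applying 4 GBytes per chip to the 12- and 16-chip cases is an assumption, not a figure from the slides.

```python
CHIP_INTERFACE_BITS = 32   # per GDDR6 chip (from the slides)
CHIP_RATE_GT_PER_S = 18.0  # per GDDR6 chip (from the slides)
CHIP_CAPACITY_GB = 4       # per GDDR6 chip (assumed for every chip count)


def aggregate(num_chips: int) -> tuple[int, float, int]:
    """Return (interface bits, peak GB/s, capacity GB) for chips used in parallel."""
    width_bits = num_chips * CHIP_INTERFACE_BITS
    bandwidth = width_bits * CHIP_RATE_GT_PER_S / 8  # 8 bits per byte
    capacity = num_chips * CHIP_CAPACITY_GB
    return width_bits, bandwidth, capacity


# 1 chip, the 5-chip example, and the practical pin-limited range (12-16 chips).
for n in (1, 5, 12, 16):
    width, bw, cap = aggregate(n)
    print(f"{n:2d} chips: {width:3d} bits, {bw:6.1f} GB/s, {cap:2d} GB")
```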

[Figure: processor connected to five GDDR6 DRAM chips.]

## Increasing memory bus width further: package-level integration






#### Per-module (HBM3):

- Capacity: 24 GBytes
- Interface: 2048 bits
- Speed: 3.2 GT/s
- Bandwidth: 819 GB/sec

#### Aggregate (4 modules):

- Capacity: 96 GBytes
- Interface: 8192 bits
- Bandwidth: 3.3 TB/sec
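One way to see why this takes package-level integration is to ask how many off-package GDDR6 chips would be needed to match the HBM3 aggregate. The sketch below uses only the slide figures; the module count of 4 is inferred from 8192 / 2048 bits, and the constant names are made up for illustration.

```python
import math

# GDDR6 figures from the slides.
GDDR6_CHIP_BITS = 32
GDDR6_RATE_GT_PER_S = 18.0
PRACTICAL_PIN_LIMIT_BITS = 512  # upper end of the "practical" range

# HBM3 figures from the slides; 4 modules is inferred from 8192 / 2048.
HBM3_MODULE_BITS = 2048
HBM3_RATE_GT_PER_S = 3.2
HBM3_MODULES = 4


def peak_gb_per_s(bits: int, rate_gt_per_s: float) -> float:
    """Peak bandwidth in GB/s for a given interface width and transfer rate."""
    return bits * rate_gt_per_s / 8


hbm_total = peak_gb_per_s(HBM3_MODULE_BITS * HBM3_MODULES, HBM3_RATE_GT_PER_S)
gddr6_chip = peak_gb_per_s(GDDR6_CHIP_BITS, GDDR6_RATE_GT_PER_S)

# Chips (and pins) needed to reach the same bandwidth with GDDR6.
chips_needed = math.ceil(hbm_total / gddr6_chip)
bits_needed = chips_needed * GDDR6_CHIP_BITS

print(f"HBM3 aggregate: {hbm_total:.1f} GB/s")
print(f"GDDR6 chips to match: {chips_needed} ({bits_needed} bits)")
print(f"Exceeds practical pin limit: {bits_needed > PRACTICAL_PIN_LIMIT_BITS}")
```

Each HBM3 pin is slower than a GDDR6 pin, so the bandwidth advantage here comes entirely from interface width.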



# High bandwidth requires wide memory interfaces

How can we keep them fed?

[Figure: a wide memory interface built from many GDDR6 DRAM chips.]









# Striping memory across channels generally gives higher bandwidth
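A minimal sketch of why striping helps, comparing two address-to-channel mappings for a sequential read. The channel count, stripe size, per-channel capacity, and access size are hypothetical values chosen for illustration, and the modulo mapping is an assumption; real memory controllers may use different granularities or more elaborate address hashing.

```python
from collections import Counter
from typing import Callable

NUM_CHANNELS = 8          # hypothetical number of memory channels
STRIPE_BYTES = 256        # hypothetical striping granularity
CHANNEL_BYTES = 1 << 30   # hypothetical capacity per channel (1 GiB)


def striped_channel(address: int) -> int:
    """Consecutive stripes rotate across channels."""
    return (address // STRIPE_BYTES) % NUM_CHANNELS


def blocked_channel(address: int) -> int:
    """Each channel owns one large contiguous address range."""
    return (address // CHANNEL_BYTES) % NUM_CHANNELS


def accesses_per_channel(mapping: Callable[[int], int], start: int,
                         length: int, access_bytes: int = 32) -> Counter:
    """Count how many accesses of a sequential read land on each channel."""
    return Counter(mapping(a) for a in range(start, start + length, access_bytes))


# A 64 KiB sequential read under each mapping.
striped = accesses_per_channel(striped_channel, 0, 64 * 1024)
blocked = accesses_per_channel(blocked_channel, 0, 64 * 1024)

print("striped:", dict(sorted(striped.items())))
print("blocked:", dict(sorted(blocked.items())))
```

Under the striped mapping the read is spread evenly over all channels, so they can transfer in parallel; under the blocked mapping the same read is served by a single channel and is limited to that channel's bandwidth.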

