# INTRODUCTION TO MEMORY SYSTEM

Lecturer: Hui Annie Guo

h.guo@unsw.edu.au

K17-501F

#### Lecture overview

- Topics
  - Memory technology
    - SRAM
    - DRAM
    - Disk

- Suggested reading
  - H&P Chapter 5.2

## **Memory technologies**

- Volatile memory
  - Power is needed to maintain the stored information
  - SRAM: Static Random Access Memory
    - Low density, expensive, fast
    - Static: content will last "forever" (until power is off)
  - DRAM: Dynamic Random Access Memory
    - High density, cheap, slow
    - Dynamic: needs to be "refreshed" regularly
- Non-volatile memory
  - Stored data is retained even when power is off
    - Disk
    - Flash

## A typical SRAM



- Write Enable is usually active low (WE\_L)
- D<sub>in</sub> and D<sub>out</sub> are combined to save pins:
  - A new control signal, output enable (OE\_L) is needed
    - When WE\_L is asserted (low), OE\_L is de-asserted (high)
      - D serves as the data input pin
    - When WE\_L is de-asserted (high), OE\_L is asserted (low)
      - D is the data output pin
    - Both WE\_L and OE\_L are asserted:
      - Result is unknown. Not allowed!
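
The control-signal rules above can be sketched as a small decision function. This is only an illustrative model, not a description of any particular SRAM part: the function name `sram_data_pin_mode` and the `"idle"` result for the case where neither signal is asserted are assumptions, since the slide does not cover that case.

```python
def sram_data_pin_mode(we_l: int, oe_l: int) -> str:
    """Return the role of the shared D pin for active-low WE_L and OE_L.

    Signals are active low, so a value of 0 means "asserted".
    """
    write_asserted = (we_l == 0)
    output_asserted = (oe_l == 0)

    if write_asserted and output_asserted:
        return "invalid"  # both asserted: result is unknown, not allowed
    if write_asserted:
        return "input"    # D carries data into the SRAM
    if output_asserted:
        return "output"   # D carries data out of the SRAM
    return "idle"         # assumption: neither asserted, D is not driven


for we_l, oe_l in [(0, 1), (1, 0), (0, 0), (1, 1)]:
    print(f"WE_L={we_l} OE_L={oe_l} -> {sram_data_pin_mode(we_l, oe_l)}")
```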

#### A 16-word x 4-bit SRAM structure



#### 6-transistor SRAM cell



4. Sense Amp on the column detects the difference between the two bit lines, and has the bit lines charged and discharged with the cell value

#### **DRAM**

- Offers higher capacity than SRAM
- Can be structured with multiple banks
- Each bank contains rows, and each row may have multiple word columns



## DRAM (cont.)

- The DRAM interface usually has a separate data bus and command bus, both shared by the multiple banks
  - Data bus: for transferring data to and from memory
  - Command bus: for sending commands and addresses



## A typical DRAM bank



- Control signals (RAS\_L, CAS\_L, WE\_L, OE\_L) are all active low
- Din and Dout are combined (D):
  - When WE\_L is asserted (low), OE\_L is de-asserted (high)
    - D serves as the data input pin
  - When WE\_L is de-asserted (high), OE\_L is asserted (low)
    - D is the data output pin
- Row and column addresses share the same pins (A)
  - Controlled by row/column address strobe
    - RAS\_L goes low: input to A is latched in as row address
    - CAS\_L goes low: input to A is latched in as column address
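
As a rough sketch of how one set of address pins can serve both row and column, the code below splits an address into the two halves that would be presented on A in turn. The bit widths (`ROW_BITS`, `COL_BITS`), the choice of putting the row in the upper bits, and the function names are assumptions made for illustration; the slide does not specify them.

```python
ROW_BITS = 4  # hypothetical number of row address bits
COL_BITS = 4  # hypothetical number of column address bits


def split_address(addr: int) -> tuple[int, int]:
    """Split an address into (row, column).

    Assumes the row occupies the upper bits and the column the lower bits.
    """
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


def address_pin_sequence(addr: int) -> list[tuple[str, int]]:
    """List what is driven on the shared A pins at each strobe."""
    row, col = split_address(addr)
    return [
        ("RAS_L goes low", row),  # A is latched as the row address
        ("CAS_L goes low", col),  # A is latched as the column address
    ]


for strobe, value in address_pin_sequence(0xA7):
    print(f"{strobe}: A = {value:#x}")
```

Sharing the pins halves the number of address pins needed, at the cost of sending the address in two steps.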

#### 1-transistor DRAM cell

#### Write:

1. Drive bit line
2. Select row

#### Read:

1. Precharge bit lines
2. Select row
3. Cell and bit lines share charges
4. Sense (sense amp)
5. Write: restore the value

(Figure: 1-transistor DRAM cell, with the "bit" line and "row select" line labelled)

#### Refresh

- Do a dummy read of every cell (a row at a time)
  - This costs both power and performance
- Usually done by a dedicated hardware refresh control component

## DRAM access cycle

- DRAM is accessed on a row basis.
- Accessing a memory location (e.g. a word column) follows this access cycle:
  - Open the row related to the memory location
    - When memory is ready (i.e. the bit lines have been precharged)
  - Access the column (read/write)
  - Precharge (bit lines) for the next access to a different row

| row access | column access | precharge |
|------------|---------------|-----------|

## **DRAM** performance

- The time for a memory access is
  - $T_{rac} + T_{cac} + T_{precharge}$
    - rac: row access
    - cac: column access
- DRAM performance is affected by
  - memory design
  - memory operation
  - data transfer
- The performance can be improved by
  - Increasing memory data access throughput
  - Increasing data transfer speed over the memory bus.
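
To make the access-time formula concrete, here is a minimal sketch that adds up row access, column access and precharge time for a sequence of accesses. The cycle counts `T_RAC`, `T_CAC` and `T_PRECHARGE` are made-up values, and the model assumes a row stays open until a different row is needed; neither comes from the lecture.

```python
T_RAC = 3        # hypothetical row access time, in cycles
T_CAC = 2        # hypothetical column access time, in cycles
T_PRECHARGE = 3  # hypothetical precharge time, in cycles


def single_access_time() -> int:
    """Time for one isolated access: Trac + Tcac + Tprecharge."""
    return T_RAC + T_CAC + T_PRECHARGE


def total_time(rows: list[int]) -> int:
    """Total cycles for a sequence of accesses, given the row of each access.

    Assumption: an open row is kept open, so consecutive accesses to the
    same row pay only the column access time.
    """
    total = 0
    open_row = None
    for row in rows:
        if row != open_row:
            if open_row is not None:
                total += T_PRECHARGE  # precharge before opening another row
            total += T_RAC            # open the new row
            open_row = row
        total += T_CAC                # access the column
    if open_row is not None:
        total += T_PRECHARGE          # precharge after the last access
    return total


print(single_access_time())      # one access
print(total_time([5, 5, 5, 5]))  # four accesses to the same row
print(total_time([1, 2, 3, 4]))  # four accesses to different rows
```

Under these assumptions, accesses that stay within one row are much cheaper than accesses spread over different rows, which is the effect the throughput techniques exploit.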

### Increasing data throughput (1)

- Burst mode
  - Consecutive accesses without need to send the address of each word in the row
    - The time to send the address for each word is saved



### Increasing data throughput (2)

- Multi-bank interleaved access



### Increasing data throughput (3)

- Wide memory data bus
  - The bus is wide enough to transfer multiple words at once
  - See next slide for comparison

#### Three memory access organizations



### **Example**

#### For a given memory:

- 1 cycle to send the address,
- 6 cycles to access memory to fetch a word,
- 1 cycle to send a word of data
- To get a block of 4 words
  - One word at a time: = 4 x (1+6+1) = 32 cycles
  - Wide Mem: = 1+6+1 = 8 cycles
  - Interleaved Mem\*: = 1+6+4x1 = 11 cycles
    - Any limitations?

| Address | Bank 0 | Address | Bank 1 | Address | Bank 2 | Address | Bank 3 |
|---------|--------|---------|--------|---------|--------|---------|--------|
| 0       |        | 1       |        | 2       |        | 3       |        |
| 4       |        | 5       |        | 6       |        | 7       |        |
| 8       |        | 9       |        | 10      |        | 11      |        |
| 12      |        | 13      |        | 14      |        | 15      |        |
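
The cycle counts in this example can be reproduced with a short sketch. The function names are illustrative. The interleaved model assumes all banks start their access in parallel after one address is sent and then return their words one per cycle. The `bank_of` mapping (address modulo number of banks) is read off the table above.

```python
def one_word_at_a_time(words: int, t_addr: int = 1, t_access: int = 6, t_xfer: int = 1) -> int:
    """Each word pays the full address + access + transfer cost."""
    return words * (t_addr + t_access + t_xfer)


def wide_memory(words: int, width_words: int, t_addr: int = 1, t_access: int = 6, t_xfer: int = 1) -> int:
    """Memory and bus are width_words wide, so several words move per access."""
    accesses = -(-words // width_words)  # ceiling division
    return accesses * (t_addr + t_access + t_xfer)


def interleaved_memory(words: int, banks: int, t_addr: int = 1, t_access: int = 6, t_xfer: int = 1) -> int:
    """Banks are accessed in parallel; words return one per cycle.

    Simplifying assumption: successive rounds of `banks` words do not overlap.
    """
    total = 0
    remaining = words
    while remaining > 0:
        n = min(banks, remaining)
        total += t_addr + t_access + n * t_xfer
        remaining -= n
    return total


def bank_of(addr: int, banks: int = 4) -> int:
    """Bank holding a word address, matching the table above."""
    return addr % banks


print(one_word_at_a_time(4))     # 32
print(wide_memory(4, 4))         # 8
print(interleaved_memory(4, 4))  # 11
print(bank_of(14))               # 2
```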

## Increasing data transfer rate

- Double data rate (DDR) DRAM
  - Transfer on rising and falling clock edges
- Quad data rate (QDR) DRAM
  - Separate DDR input and output ports
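
A back-of-the-envelope sketch of what these schemes buy is given below. The clock frequency and bus width are hypothetical values chosen only for illustration, and the quad-data-rate figure simply adds the two DDR ports together.

```python
def peak_transfer_rate(clock_hz: float, transfers_per_clock: int, bus_bytes: int) -> float:
    """Peak bytes per second moved over a memory bus."""
    return clock_hz * transfers_per_clock * bus_bytes


CLOCK_HZ = 200e6  # hypothetical 200 MHz bus clock
BUS_BYTES = 8     # hypothetical 64-bit data bus

sdr = peak_transfer_rate(CLOCK_HZ, 1, BUS_BYTES)  # one transfer per clock
ddr = peak_transfer_rate(CLOCK_HZ, 2, BUS_BYTES)  # rising and falling edges
qdr = 2 * ddr  # separate DDR input and output ports, counted together

for name, rate in [("single", sdr), ("double", ddr), ("quad", qdr)]:
    print(f"{name} data rate: {rate / 1e9:.1f} GB/s")
```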

#### Hard disk drive

- A hard disk drive (HDD) has one or a set of hard platters
- Platters are circular disks
  - made of a non-magnetic material
    - typically aluminium alloy, glass or ceramic
  - coated with a thin layer of magnetic material
    - used to hold data



## Hard disk drive (cont.)

- Platters are partitioned into tracks and sectors
  - tracks are concentric circles
  - sectors are pie shaped wedges
- Data is stored digitally in the form of tiny magnetized fields on the platter
  - each field represents a bit
  - each field has two magnetic orientations
    - representing either '0' or '1'



## Hard disk drive (cont.)

- Platters can spin
  - E.g. at 3600 or 7200 rpm



- The arm is able to move the heads from the hub to the edge of the drive
  - E.g. 50 times per second
- To read/write, the head should be placed over the related location.



## Hard disk drive (cont.)

- The electronic controller controls
  - the read/write mechanism and
  - the motor that spins the platters
- The electronic circuits
  - turn bytes into magnetic domains (writing)
  - assemble the magnetic domains on the drive into bytes (reading)

## **HDD** performance

There are three common measures of hard disk performance:

- Seek time
  - the time from when a file is requested until its first byte is sent to the processor. Times between 10 and 120 milliseconds are common.
- Rotational latency
  - the time required for the first bit of the data sector to pass under the read/write head. Times between 2 and 4 ms are common.
- Data rate
  - the number of bytes per second that the drive can deliver to the CPU. Rates between 5 and 40 megabytes per second are common.

## **Seagate Cheetah 15k.4 DISC Drive**

| Geometry attribute          | Value              |
|-----------------------------|--------------------|
| Platters                    |                    |
| Surfaces (read/write heads) | 8                  |
| Surface diameter            | 3.5 in.            |
| Sector size                 |                    |
| Zones                       | 15                 |
| Cylinders                   |                    |
| Recording density (max)     | 628,000 bits/in.   |
| Track density               | 85,000 tracks/in.  |
| Areal density (max)         | 53.4 Gbits/sq. in. |
| Formatted capacity          | 146.8 GB           |

| Performance attribute   | Value      |
|-------------------------|------------|
| Rotational rate         | 15,000 RPM |
| Avg. rotational latency | 2 ms       |
| Avg. seek time          | 4 ms       |
| Sustained transfer rate | 58-96 MB/s |
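
The rotational latency figure for this drive can be checked from its rotational rate, since on average the head waits half a rotation. The sketch below also estimates a total access time. The 4096-byte request size and the use of 1 MB = 10^6 bytes are assumptions for illustration, not values from the data sheet.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the sector to arrive: half of one rotation."""
    ms_per_rotation = 60_000 / rpm
    return ms_per_rotation / 2


def avg_access_time_ms(seek_ms: float, rpm: float, transfer_bytes: int, rate_mb_per_s: float) -> float:
    """Seek time + rotational latency + transfer time, in milliseconds.

    Assumes 1 MB = 10**6 bytes.
    """
    transfer_ms = transfer_bytes / (rate_mb_per_s * 1e6) * 1000
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms


print(avg_rotational_latency_ms(15_000))  # 2.0, matching the table

# Hypothetical 4096-byte request at the low end of the sustained rate
print(round(avg_access_time_ms(4, 15_000, 4096, 58), 2))
```

With these numbers, seek time and rotational latency dominate the access time, and the transfer itself takes well under a millisecond.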

#### SSD\*

- Solid State Disk
  - Storage technology based on Flash memory


## Flash memory\*

- Solid state storage device
  - Everything is electronic (no mechanical moving parts involved in accessing this memory)
  - A type of EEPROM device (Electrically Erasable Programmable Read-Only Memory). It has a grid of columns and rows with a cell at each intersection

## Flash memory\*

- A cell is a modified transistor with two gates
  - floating gate and control gate
- The two gates are separated from each other by a thin oxide layer.
- The floating gate links to the wordline through the control gate with a small threshold
  - If linked, the cell has a value of 1, otherwise, 0



## Flash memory\*

- A blank flash memory has all of the gates linked, giving each cell a value of 1
- To erase a 1 (change the cell to 0), an electrical charge from the bitline is applied to the floating gate.
  - The negative electrons act as a barrier (large threshold) between the control gate and floating gate.

COMP3211/9211 Week 4-2 **30** 

## Processor-DRAM memory speed gap



#### Impact of the speed gap on performance

- Suppose a processor executes at
  - clock rate = 2 GHz
  - CPI = 1.1
  - 50% arith/logic, 30% ld/st, 20% control
- Suppose data memory operations incur a 50-cycle penalty
  - The pipeline has to wait 50 cycles for each memory access
- CPI
  - = ideal CPI + average stalls per instruction
    - $= 1.1 + 0.30 \times 50$
    - = 16.1
- Because memory is slow, the pipeline completes on average only one instruction every 16 clock cycles!
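
The CPI calculation above can be written as a short sketch, which makes it easy to try other instruction mixes or penalties. The function name `effective_cpi` is illustrative. As in the slide, only load/store instructions are assumed to stall, each for the full penalty.

```python
def effective_cpi(ideal_cpi: float, mem_fraction: float, penalty_cycles: float) -> float:
    """Ideal CPI plus the average memory stall cycles per instruction."""
    return ideal_cpi + mem_fraction * penalty_cycles


CLOCK_GHZ = 2.0  # clock rate from the example
IDEAL_CPI = 1.1

cpi = effective_cpi(IDEAL_CPI, mem_fraction=0.30, penalty_cycles=50)
slowdown = cpi / IDEAL_CPI
ns_per_instruction = cpi / CLOCK_GHZ  # cycles divided by cycles-per-ns

print(f"effective CPI      = {cpi:.1f}")
print(f"slowdown vs ideal  = {slowdown:.1f}x")
print(f"time / instruction = {ns_per_instruction:.2f} ns")
```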