# **COA LAB Assignment-4**

- 21CS01066, 21CS02009, 21CS01038

# Config: SM2\_GTX480

-----

#### Scheduler: gto

L1D\_total\_cache\_misses = 29528

L1D\_total\_cache\_miss\_rate = 0.4075

L2\_total\_cache\_misses = 26

L2\_total\_cache\_miss\_rate = 0.0004

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 11 sec (11 sec)

gpgpu\_simulation\_rate = 147790 (inst/sec)

gpgpu\_simulation\_rate = 4632 (cycle/sec)

gpgpu\_silicon\_slowdown = 151122x

Kernel Executed 8 times

Result stored in result.txt

GPGPU-Sim: \*\*\* exit detected \*\*\*

#### Scheduler: lrr

L1D\_total\_cache\_misses = 29603

L1D\_total\_cache\_miss\_rate = 0.4085

L2\_total\_cache\_misses = 26

L2\_total\_cache\_miss\_rate = 0.0004

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 11 sec (11 sec)

gpgpu\_simulation\_rate = 147790 (inst/sec)

gpgpu\_simulation\_rate = 4698 (cycle/sec)

gpgpu\_silicon\_slowdown = 148999x

Kernel Executed 8 times

Result stored in result.txt

GPGPU-Sim: \*\*\* exit detected \*\*\*

#### Scheduler: tla

L1D\_total\_cache\_misses = 29676

L1D\_total\_cache\_miss\_rate = 0.4096

L2\_total\_cache\_misses = 26

L2\_total\_cache\_miss\_rate = 0.0004

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 12 sec (12 sec)

gpgpu\_simulation\_rate = 135474 (inst/sec)

gpgpu\_simulation\_rate = 4383 (cycle/sec)

gpgpu\_silicon\_slowdown = 159707x

Kernel Executed 8 times

Result stored in result.txt

-----

# Config: SM3\_KEPLER\_TITAN

-----

#### Scheduler: gto

L1D\_total\_cache\_accesses = 0

L1D\_total\_cache\_misses = 0

L2\_total\_cache\_misses = 26

L2\_total\_cache\_miss\_rate = 0.0003

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 20 sec (20 sec)

gpgpu\_simulation\_rate = 81284 (inst/sec)

gpgpu\_simulation\_rate = 3212 (cycle/sec)

gpgpu\_silicon\_slowdown = 260585x

Kernel Executed 8 times

Result stored in result.txt

GPGPU-Sim: \*\*\* exit detected \*\*\*

#### Scheduler: lrr

L1D\_total\_cache\_accesses = 0

L1D\_total\_cache\_misses = 0

L2\_total\_cache\_misses = 26

L2\_total\_cache\_miss\_rate = 0.0003

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 21 sec (21 sec)

gpgpu\_simulation\_rate = 77414 (inst/sec)

gpgpu\_simulation\_rate = 3108 (cycle/sec)

gpgpu\_silicon\_slowdown = 269305x

Kernel Executed 8 times

Result stored in result.txt

GPGPU-Sim: \*\*\* exit detected \*\*\*

#### Scheduler: tla

L1D\_total\_cache\_accesses = 0

L1D\_total\_cache\_misses = 0

L2\_total\_cache\_accesses = 90993

L2\_total\_cache\_misses = 26

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 22 sec (22 sec)

gpgpu\_simulation\_rate = 73895 (inst/sec)

gpgpu\_simulation\_rate = 2986 (cycle/sec)

gpgpu\_silicon\_slowdown = 280308x

Kernel Executed 8 times

Result stored in result.txt

-----

# Config: SM6\_TITANX

-----

#### Scheduler: gto

L1D\_total\_cache\_accesses = 0

L1D\_total\_cache\_misses = 0

L2\_total\_cache\_miss\_rate = 0.002

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 38 sec (38 sec)

gpgpu\_simulation\_rate = 42781 (inst/sec)

gpgpu\_simulation\_rate = 3752 (cycle/sec)

gpgpu\_silicon\_slowdown = 377665x

Kernel Executed 8 times

Result stored in result.txt

GPGPU-Sim: \*\*\* exit detected \*\*\*

#### Scheduler: lrr

L1D\_total\_cache\_accesses = 0

L1D\_total\_cache\_misses = 0

L2\_total\_cache\_miss\_rate = 0.002

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 40 sec (40 sec)

gpgpu\_simulation\_rate = 40642 (inst/sec)

gpgpu\_simulation\_rate = 3581 (cycle/sec)

gpgpu\_silicon\_slowdown = 395699x

Kernel Executed 8 times

Result stored in result.txt

#### Scheduler: tla

L1D\_total\_cache\_accesses = 0

L1D\_total\_cache\_misses = 0

L2\_total\_cache\_miss\_rate = 0.002

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 41 sec (41 sec)

gpgpu\_simulation\_rate = 39651 (inst/sec)

gpgpu\_simulation\_rate = 3504 (cycle/sec)

gpgpu\_silicon\_slowdown = 404394x

Kernel Executed 8 times

Result stored in result.txt

-----

# Config: SM7\_QV100

-----

#### Scheduler: gto

L1D\_total\_cache\_misses = 33266

L1D\_total\_cache\_miss\_rate = 0.3671

L2\_total\_cache\_accesses = 33197

L2\_total\_cache\_misses = 0

L2\_total\_cache\_miss\_rate = 0.0000

gpgpu\_simulation\_time = 0 days, 0 hrs, 1 min, 14 sec (74 sec)

gpgpu\_simulation\_rate = 21968 (inst/sec)

gpgpu\_simulation\_rate = 1645 (cycle/sec)

gpgpu\_silicon\_slowdown = 688145x

Kernel Executed 8 times

Result stored in result.txt

GPGPU-Sim: \*\*\* exit detected \*\*\*

#### Scheduler: lrr

L1D\_total\_cache\_misses = 33258

L1D\_total\_cache\_miss\_rate = 0.3670

L2\_total\_cache\_accesses = 33185

L2\_total\_cache\_misses = 0

L2\_total\_cache\_miss\_rate = 0.0000

gpgpu\_simulation\_time = 0 days, 0 hrs, 1 min, 19 sec (79 sec)

gpgpu\_simulation\_rate = 20578 (inst/sec)

gpgpu\_simulation\_rate = 1543 (cycle/sec)

gpgpu\_silicon\_slowdown = 733635x

Kernel Executed 8 times

Result stored in result.txt

GPGPU-Sim: \*\*\* exit detected \*\*\*

#### Scheduler: tla

L1D\_total\_cache\_misses = 33257

L1D\_total\_cache\_miss\_rate = 0.3670

L2\_total\_cache\_accesses = 33191

L2\_total\_cache\_misses = 0

L2\_total\_cache\_miss\_rate = 0.0000

gpgpu\_simulation\_time = 0 days, 0 hrs, 1 min, 22 sec (82 sec)

gpgpu\_simulation\_rate = 19825 (inst/sec)

gpgpu\_simulation\_rate = 1491 (cycle/sec)

gpgpu\_silicon\_slowdown = 759221x

Kernel Executed 8 times

Result stored in result.txt

# Config: SM7\_TITANV

-----

#### Scheduler: gto

L1D\_total\_cache\_misses = 33280

L1D\_total\_cache\_miss\_rate = 0.3672

L2\_total\_cache\_miss\_rate = 0.002

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 35 sec (35 sec)

gpgpu\_simulation\_rate = 46448 (inst/sec)

gpgpu\_simulation\_rate = 1200 (cycle/sec)

gpgpu\_silicon\_slowdown = 1000000x

Kernel Executed 8 times

Result stored in result.txt

GPGPU-Sim: \*\*\* exit detected \*\*\*

#### Scheduler: lrr

L1D\_total\_cache\_misses = 33293

L1D\_total\_cache\_miss\_rate = 0.3674

L2\_total\_cache\_miss\_rate = 0.002

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 35 sec (35 sec)

gpgpu\_simulation\_rate = 46448 (inst/sec)

gpgpu\_simulation\_rate = 1202 (cycle/sec)

gpgpu\_silicon\_slowdown = 998336x

Kernel Executed 8 times

Result stored in result.txt

GPGPU-Sim: \*\*\* exit detected \*\*\*

#### Scheduler: tla

L1D\_total\_cache\_misses = 33299

L1D\_total\_cache\_miss\_rate = 0.3674

L2\_total\_cache\_miss\_rate = 0.002

gpgpu\_simulation\_time = 0 days, 0 hrs, 0 min, 33 sec (33 sec)

gpgpu\_simulation\_rate = 49263 (inst/sec)

gpgpu\_simulation\_rate = 1288 (cycle/sec)

gpgpu\_silicon\_slowdown = 931677x

Kernel Executed 8 times

Result stored in result.txt

GPGPU-Sim: \*\*\* exit detected \*\*\*

-----

# Config: SM75\_RTX2060

-----

#### Scheduler: gto

```
L1D_total_cache_misses = 33273
L1D_total_cache_miss_rate = 0.3671
L2_total_cache_accesses = 33188
L2_total_cache_misses = 0
L2_total_cache_miss_rate = 0.0000
gpgpu_simulation_time = 0 days, 0 hrs, 0 min, 32 sec (32 sec)
gpgpu_simulation_rate = 50803 (inst/sec)
gpgpu_simulation_rate = 3807 (cycle/sec)
gpgpu_silicon_slowdown = 358550x
Kernel Executed 8 times
Result stored in result.txt
GPGPU-Sim: *** exit detected ***
```

#### Scheduler: lrr

```
L1D_total_cache_misses = 33274
L1D_total_cache_miss_rate = 0.3671
L2_total_cache_accesses = 33196
L2_total_cache_misses = 0
L2_total_cache_miss_rate = 0.0000
gpgpu_simulation_time = 0 days, 0 hrs, 0 min, 32 sec (32 sec)
gpgpu_simulation_rate = 50803 (inst/sec)
gpgpu_simulation_rate = 3810 (cycle/sec)
gpgpu_silicon_slowdown = 358267x
Kernel Executed 8 times
Result stored in result.txt
GPGPU-Sim: *** exit detected ***
```

#### Scheduler: tla

```
L1D_total_cache_misses = 33302
L1D_total_cache_miss_rate = 0.3675
L2_total_cache_accesses = 33182
L2_total_cache_misses = 0
L2_total_cache_miss_rate = 0.0000
gpgpu_simulation_time = 0 days, 0 hrs, 0 min, 34 sec (34 sec)
gpgpu_simulation_rate = 47814 (inst/sec)
gpgpu_simulation_rate = 3586 (cycle/sec)
gpgpu_silicon_slowdown = 380646x
Kernel Executed 8 times
Result stored in result.txt
GPGPU-Sim: *** exit detected ***
```

-----

# **Question-1:**

Ans:

| Configuration    | Warp scheduler | IPC     | L1D hit rate | L2 hit rate |
|------------------|----------------|---------|--------------|-------------|
| SM2_GTX480       | GTO            | 31.9063 | 0.5925       | 0.996       |
|                  | LRR            | 31.458  | 0.5915       | 0.996       |
|                  | TLA            | 30.9089 | 0.5904       | 0.996       |
| SM3_KEPLER_TITAN | GTO            | 25.3065 | 1            | 0.997       |
|                  | LRR            | 24.9079 | 1            | 0.997       |
|                  | TLA            | 24.7472 | 1            | 0.997       |
| SM6_TITANX       | GTO            | 11.4022 | 1            | 0.9998      |
|                  | LRR            | 11.3493 | 1            | 0.9998      |
|                  | TLA            | 11.3159 | 1            | 0.9998      |
| SM7_QV100        | GTO            | 13.3544 | 0.6329       | 1           |
|                  | LRR            | 13.3364 | 0.633        | 1           |
|                  | TLA            | 13.2965 | 0.633        | 1           |
| SM7_TITANV       | GTO            | 38.7067 | 0.6328       | 0.9998      |
|                  | LRR            | 38.6423 | 0.6326       | 0.9998      |
|                  | TLA            | 38.2477 | 0.6326       | 0.9998      |
| SM75_RTX2060     | GTO            | 13.3446 | 0.6329       | 1           |
|                  | LRR            | 13.3341 | 0.6329       | 1           |
|                  | TLA            | 13.3335 | 0.6325       | 1           |
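The IPC and L1D hit-rate columns can be reproduced from the simulator logs. The following is a minimal sketch, assuming IPC is taken as the ratio of the two `gpgpu_simulation_rate` lines (inst/sec divided by cycle/sec) and each hit rate as one minus the reported miss rate. The helper names `parse_stats` and `derive` are illustrative and not part of GPGPU-Sim.

```python
import re

# Log excerpt copied from the SM2_GTX480 / gto run in this report.
SAMPLE_LOG = """\
L1D_total_cache_misses = 29528
L1D_total_cache_miss_rate = 0.4075
L2_total_cache_misses = 26
L2_total_cache_miss_rate = 0.0004
gpgpu_simulation_rate = 147790 (inst/sec)
gpgpu_simulation_rate = 4632 (cycle/sec)
"""

# Matches "name = number", optionally followed by "(inst/sec)" or "(cycle/sec)".
STAT_LINE = re.compile(
    r"\s*(\w+)\s*=\s*([0-9]+(?:\.[0-9]+)?)\s*(?:\((inst|cycle)/sec\))?"
)


def parse_stats(log_text):
    """Collect numeric `name = value` pairs from a log excerpt.

    gpgpu_simulation_rate is printed twice, so the unit in parentheses
    is appended to the key to keep the two values apart.
    """
    stats = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line)
        if not match:
            continue
        name, value, unit = match.groups()
        key = f"{name}_{unit}" if unit else name
        stats[key] = float(value)
    return stats


def derive(stats):
    """Derive IPC and hit rates (assumed definitions, see the text above)."""
    ipc = stats["gpgpu_simulation_rate_inst"] / stats["gpgpu_simulation_rate_cycle"]
    return {
        "ipc": round(ipc, 4),
        "l1d_hit_rate": round(1.0 - stats["L1D_total_cache_miss_rate"], 4),
        "l2_hit_rate": round(1.0 - stats["L2_total_cache_miss_rate"], 4),
    }


result = derive(parse_stats(SAMPLE_LOG))
print(result["ipc"], result["l1d_hit_rate"])  # 31.9063 0.5925
```

For configurations that report `L1D_total_cache_accesses = 0`, no miss rate is printed, so this sketch would need a separate convention for the L1D hit rate in those rows.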



# **Question-2:**

Ans:





# **Question-3:**

Ans:

- a. `<nsets>:<bsize>:<assoc>`: These parameters define the basic geometry of the cache.
- b. `<nsets>` is set to 32, so the cache has 32 sets.
- c. `<bsize>` is set to 128, so each cache block (line) is 128 bytes.
- d. `<assoc>` is set to 2048, a very high associativity: each set holds 2048 cache lines.

So the `<nsets>:<bsize>:<assoc>` setting would be `N:32:128:2048`, which gives a capacity of 32 × 128 × 2048 bytes = 8 MB.
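The capacity implied by this geometry can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the capacity is simply sets × block size × associativity and that the leading `N` is a prefix flag rather than part of the geometry. The function name `cache_capacity_bytes` is hypothetical.

```python
def cache_capacity_bytes(geometry):
    """Return nsets * bsize * assoc for a string like "N:32:128:2048".

    Assumption: a leading non-numeric field (here "N") is a prefix flag
    and is skipped; the next three fields are nsets, bsize, assoc.
    """
    fields = geometry.split(":")
    if not fields[0].isdigit():
        fields = fields[1:]
    nsets, bsize, assoc = (int(field) for field in fields[:3])
    return nsets * bsize * assoc


capacity = cache_capacity_bytes("N:32:128:2048")
print(capacity, capacity // (1024 * 1024))  # 8388608 8
```

With 32 sets of 128-byte lines, an associativity of 8 would give 32 KB by the same formula, which is one way to reach the smaller size used in the comparison.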

| Configuration    | L1D hit rate (32KB) | L1D hit rate (8MB) | L2 hit rate (32KB) | L2 hit rate (8MB) |
|------------------|---------------------|--------------------|--------------------|-------------------|
| SM2_GTX480       | 0.5925              | 0.6841             | 0.996              | 0.6205            |
| SM3_KEPLER_TITAN | 1                   | 1                  | 0.997              | 0.9998            |
| SM6_TITANX       | 1                   | 1                  | 0.9998             | 0.9998            |
| SM7_QV100        | 0.6329              | 0.6329             | 1                  | 1                 |
| SM7_TITANV       | 0.6328              | 0.6328             | 0.9998             | 0.9998            |
| SM75_RTX2060     | 0.6329              | 0.6527             | 1                  | 1                 |

# **Question-4:**

Ans:

| Configuration    | Exe. Time | DRAM | Reg. Files | Tot. power | %EU   | %DRAM | %RF  |
|------------------|-----------|------|------------|------------|-------|-------|------|
| SM2_GTX480       | 28.38     | 0    | 66.28      | 94.21      | 29.1  | 0     | 70   |
| SM3_KEPLER_TITAN | 72.65     | 0    | 6.17       | 78.89      | 92.1  | 0     | 7.8  |
| SM6_TITANX       | 35.98     | 0    | 36.21      | 72.1       | 49.81 | 0     | 50.2 |
| SM7_QV100        | 48.29     | 0    | 73.21      | 120.7      | 40.29 | 0     | 59.9 |
| SM7_TITANV       | 111.32    | 0    | 20.78      | 132.1      | 84.6  | 0     | 15.7 |
| SM75_RTX2060     | 40.41     | 0    | 20.81      | 61.32      | 65.81 | 0     | 34.0 |
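The percentage columns can be read as each component's share of the total. A minimal sketch, assuming the first numeric column is the execution-unit figure and that each percentage is component / total × 100; the function name `power_shares` is hypothetical, and the inputs are the SM3_KEPLER_TITAN row from the table.

```python
def power_shares(components, total):
    """Return each component as a percentage of `total`, to one decimal."""
    return {
        name: round(100.0 * value / total, 1)
        for name, value in components.items()
    }


# SM3_KEPLER_TITAN row: execution units, DRAM, register files, total.
shares = power_shares({"EU": 72.65, "DRAM": 0.0, "RF": 6.17}, total=78.89)
print(shares)  # {'EU': 92.1, 'DRAM': 0.0, 'RF': 7.8}
```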

- Execution unit power consumption: Cache size and execution unit power consumption are, in general, positively correlated. For some GPU models, an increase in cache capacity is matched by an increase in the percentage of power drawn by the execution units.
- DRAM power consumption: DRAM power consumption is also positively correlated with cache size, because managing a larger cache requires more power. Some GPU models with an 8 MB cache show higher DRAM power and a larger fraction of total power going to DRAM, which supports this observation.
- Aggregate power: In some cases a larger cache results in higher overall power usage, mostly because of the higher power requirements of the execution units and the DRAM modules.