Several deadlocks detected in the release of May 2023 #359

Closed
alberto-ros opened this issue Jun 13, 2023 · 2 comments · Fixed by #363

@alberto-ros
Collaborator

Describe the bug
I encountered several deadlocks when running the release of May 2023.

To Reproduce
Steps to reproduce the behavior:

  1. Go to branch https://github.com/ChampSim/ChampSim/pull/new/report_deadlock
  2. ./config.sh champsim_config.json
  3. make
  4. Download IPC1 server_038 trace.
  5. ./bin/champsim --warmup_instructions 0 server_038.champsimtrace.xz
  6. See deadlock message

Expected behavior
No deadlock.

Output
WARNING: physical memory size is smaller than virtual memory size.
WARNING: option --warmup_instructions is deprecated. Use --warmup-instructions instead.

*** ChampSim Multicore Out-of-Order Simulator ***
Warmup Instructions: 0
Simulation Instructions: 18446744073709551615
Number of CPUs: 1
Page size: 4096

Off-chip DRAM Size: 4 GiB Channels: 1 Width: 64-bit Data Race: 3200 MT/s
Warmup finished CPU 0 instructions: 0 cycles: 1 cumulative IPC: 0 (Simulation time: 00 hr 00 min 00 sec)
Warmup complete CPU 0 instructions: 0 cycles: 1 cumulative IPC: 0 (Simulation time: 00 hr 00 min 00 sec)
Heartbeat CPU 0 instructions: 10000000 cycles: 3683274 heartbeat IPC: 2.715 cumulative IPC: 2.715 (Simulation time: 00 hr 02 min 00 sec)
Heartbeat CPU 0 instructions: 20000000 cycles: 6724037 heartbeat IPC: 3.289 cumulative IPC: 2.974 (Simulation time: 00 hr 04 min 05 sec)
Heartbeat CPU 0 instructions: 30000003 cycles: 9723637 heartbeat IPC: 3.334 cumulative IPC: 3.085 (Simulation time: 00 hr 06 min 12 sec)
cpu0_PTW MSHR Entry
[cpu0_PTW_MSHR] 0 address: 13ef000 v_address: ffff8017d144f000 translation_level: 0 event_cycle: 18446744073709551615

DEADLOCK! CPU 0 cycle 11983147
IFETCH_BUFFER head instr_id: 34175632 fetched: 2 scheduled: 0 executed: 0 num_reg_dependent: 0 num_mem_ops: 0 event: 10983169
ROB head instr_id: 34175216 fetched: 2 scheduled: 2 executed: 1 num_reg_dependent: 0 num_mem_ops: 1 event: 10983147
Load Queue Entry
[LQ] entry: 12 instr_id: 34175216 address: ffff8017d144f000 fetched_issued: true event_cycle: 10983147

Store Queue Entry
[SQ] entry: 0 instr_id: 34175230 address: ffff8017d1087bef fetched_issued: true event_cycle: 10983151 LQ waiting:
[SQ] entry: 1 instr_id: 34175241 address: ffff8017d1087b80 fetched_issued: true event_cycle: 10983154 LQ waiting:
[SQ] entry: 2 instr_id: 34175243 address: ffff8017d1087b90 fetched_issued: true event_cycle: 10983155 LQ waiting:
[SQ] entry: 3 instr_id: 34175277 address: ffff8017d10058b0 fetched_issued: true event_cycle: 10983159 LQ waiting:
[SQ] entry: 4 instr_id: 34175279 address: ffff8017d1005890 fetched_issued: true event_cycle: 10983159 LQ waiting:
[SQ] entry: 5 instr_id: 34175281 address: ffff8017d10058c0 fetched_issued: true event_cycle: 10983167 LQ waiting:
[SQ] entry: 6 instr_id: 34175283 address: ffff8017d1005800 fetched_issued: true event_cycle: 10983159 LQ waiting:
[SQ] entry: 7 instr_id: 34175285 address: ffff8017d1005810 fetched_issued: true event_cycle: 10983160 LQ waiting:
[SQ] entry: 8 instr_id: 34175286 address: ffff8017d1005820 fetched_issued: true event_cycle: 10983161 LQ waiting:
[SQ] entry: 9 instr_id: 34175287 address: ffff8017d1005830 fetched_issued: true event_cycle: 10983161 LQ waiting:
[SQ] entry: 10 instr_id: 34175288 address: ffff8017d1005840 fetched_issued: true event_cycle: 10983161 LQ waiting:
[SQ] entry: 11 instr_id: 34175289 address: ffff8017d1005850 fetched_issued: true event_cycle: 10983162 LQ waiting:
[SQ] entry: 12 instr_id: 34175290 address: ffff8017d1005860 fetched_issued: true event_cycle: 10983163 LQ waiting:
[SQ] entry: 13 instr_id: 34175291 address: ffff8017d1005870 fetched_issued: true event_cycle: 10983164 LQ waiting:
[SQ] entry: 14 instr_id: 34175292 address: ffff8017d1005880 fetched_issued: true event_cycle: 10983164 LQ waiting:
[SQ] entry: 15 instr_id: 34175293 address: ffff8017d10058a0 fetched_issued: true event_cycle: 10983165 LQ waiting:
[SQ] entry: 16 instr_id: 34175294 address: ffff8017d10058be fetched_issued: true event_cycle: 10983165 LQ waiting:
[SQ] entry: 17 instr_id: 34175295 address: ffff8017d10058ba fetched_issued: true event_cycle: 10983166 LQ waiting:
[SQ] entry: 18 instr_id: 34175296 address: ffff8017d1005890 fetched_issued: true event_cycle: 10983166 LQ waiting:
[SQ] entry: 19 instr_id: 34175297 address: ffff8017d10058c8 fetched_issued: true event_cycle: 10983166 LQ waiting:
[SQ] entry: 20 instr_id: 34175298 address: ffff8017d10058d8 fetched_issued: true event_cycle: 10983169 LQ waiting:18446744073709551615
[SQ] entry: 21 instr_id: 34175299 address: ffff8017d10058dc fetched_issued: true event_cycle: 10983167 LQ waiting:
[SQ] entry: 22 instr_id: 34175300 address: ffff8017d144ee80 fetched_issued: true event_cycle: 10983168 LQ waiting:
[SQ] entry: 23 instr_id: 34175301 address: ffff8017d144ee90 fetched_issued: true event_cycle: 10983168 LQ waiting:
[SQ] entry: 24 instr_id: 34175302 address: ffff8017d144eea0 fetched_issued: true event_cycle: 10983168 LQ waiting:
[SQ] entry: 25 instr_id: 34175307 address: ffff8017d10059c0 fetched_issued: true event_cycle: 10983169 LQ waiting:
[SQ] entry: 26 instr_id: 34175309 address: ffff8017d100588e fetched_issued: true event_cycle: 10983177 LQ waiting:
[SQ] entry: 27 instr_id: 34175311 address: ffff8017d100596e fetched_issued: true event_cycle: 10983179 LQ waiting:
[SQ] entry: 28 instr_id: 34175324 address: ffff8017d1087bd0 fetched_issued: true event_cycle: 10983173 LQ waiting:18446744073709551615
[SQ] entry: 29 instr_id: 34175326 address: ffff8017d1087be0 fetched_issued: true event_cycle: 10983174 LQ waiting:18446744073709551615
[SQ] entry: 30 instr_id: 34175343 address: ffff8017d10058c0 fetched_issued: true event_cycle: 10983181 LQ waiting:
[SQ] entry: 31 instr_id: 34175347 address: ffff8017d10058ac fetched_issued: true event_cycle: 10983184 LQ waiting:
[SQ] entry: 32 instr_id: 34175348 address: ffff8017d10058d0 fetched_issued: true event_cycle: 10983184 LQ waiting:
[SQ] entry: 33 instr_id: 34175361 address: ffff8017d1005890 fetched_issued: true event_cycle: 10983184 LQ waiting:
[SQ] entry: 34 instr_id: 34175364 address: ffff8017d1087c00 fetched_issued: true event_cycle: 10983181 LQ waiting:
[SQ] entry: 35 instr_id: 34175366 address: ffff8017d1087c10 fetched_issued: true event_cycle: 10983186 LQ waiting:
[SQ] entry: 36 instr_id: 34175373 address: ffff8017d1005894 fetched_issued: true event_cycle: 10983186 LQ waiting:
[SQ] entry: 37 instr_id: 34175377 address: ffff8017d1005834 fetched_issued: true event_cycle: 10983187 LQ waiting:
[SQ] entry: 38 instr_id: 34175379 address: ffff8017d1005828 fetched_issued: true event_cycle: 10983189 LQ waiting:
[SQ] entry: 39 instr_id: 34175382 address: ffff8017d1005835 fetched_issued: true event_cycle: 10983188 LQ waiting:
[SQ] entry: 40 instr_id: 34175384 address: ffff8017d100588e fetched_issued: true event_cycle: 10983191 LQ waiting:
[SQ] entry: 41 instr_id: 34175385 address: ffff8017d144eea0 fetched_issued: true event_cycle: 10983190 LQ waiting:
[SQ] entry: 42 instr_id: 34175387 address: ffff8017d1005800 fetched_issued: true event_cycle: 10983191 LQ waiting:
[SQ] entry: 43 instr_id: 34175388 address: ffff8017d15bf8d8 fetched_issued: true event_cycle: 10983191 LQ waiting:
[SQ] entry: 44 instr_id: 34175389 address: ffff8017d15bf8e0 fetched_issued: true event_cycle: 10983189 LQ waiting:
[SQ] entry: 45 instr_id: 34175393 address: ffff8017d15bf8e8 fetched_issued: true event_cycle: 10983193 LQ waiting:
[SQ] entry: 46 instr_id: 34175395 address: ffff8017d15bf8d0 fetched_issued: true event_cycle: 10983191 LQ waiting:
[SQ] entry: 47 instr_id: 34175399 address: ffff8017d1087be0 fetched_issued: true event_cycle: 10983192 LQ waiting:
[SQ] entry: 48 instr_id: 34175401 address: ffff8017d1087bf0 fetched_issued: true event_cycle: 10983195 LQ waiting:
[SQ] entry: 49 instr_id: 34175415 address: ffff8017d15bfd1c fetched_issued: true event_cycle: 10983197 LQ waiting:
[SQ] entry: 50 instr_id: 34175417 address: ffff8017d15bfd2c fetched_issued: true event_cycle: 10983199 LQ waiting:
[SQ] entry: 51 instr_id: 34175427 address: ffff8017d15bf8c0 fetched_issued: true event_cycle: 10983202 LQ waiting:
[SQ] entry: 52 instr_id: 34175433 address: ffff8017d15bf878 fetched_issued: true event_cycle: 10983203 LQ waiting:
[SQ] entry: 53 instr_id: 34175458 address: ffff8017d1087cbc fetched_issued: true event_cycle: 10983209 LQ waiting:
[SQ] entry: 54 instr_id: 34175477 address: ffff8017d1087bf0 fetched_issued: true event_cycle: 10983207 LQ waiting:
[SQ] entry: 55 instr_id: 34175479 address: ffff8017d1087c00 fetched_issued: true event_cycle: 10983214 LQ waiting:
[SQ] entry: 56 instr_id: 34175480 address: ffff8017d1087c10 fetched_issued: true event_cycle: 10983214 LQ waiting:
[SQ] entry: 57 instr_id: 34175495 address: ffff8017d10058c0 fetched_issued: true event_cycle: 10983218 LQ waiting:
[SQ] entry: 58 instr_id: 34175496 address: ffff8017d1005880 fetched_issued: true event_cycle: 10983218 LQ waiting:
[SQ] entry: 59 instr_id: 34175516 address: ffff8017d1087bb0 fetched_issued: true event_cycle: 10983220 LQ waiting:
[SQ] entry: 60 instr_id: 34175518 address: ffff8017d1087bd0 fetched_issued: true event_cycle: 10983223 LQ waiting:
[SQ] entry: 61 instr_id: 34175520 address: ffff8017d1087be0 fetched_issued: true event_cycle: 10983224 LQ waiting:
[SQ] entry: 62 instr_id: 34175522 address: ffff8017d1087bc0 fetched_issued: true event_cycle: 10983224 LQ waiting:
[SQ] entry: 63 instr_id: 34175523 address: ffff8017d1087bf0 fetched_issued: true event_cycle: 10983224 LQ waiting:
[SQ] entry: 64 instr_id: 34175524 address: ffff8017d1087c00 fetched_issued: true event_cycle: 10983225 LQ waiting:
[SQ] entry: 65 instr_id: 34175549 address: ffff8017d144e8d0 fetched_issued: true event_cycle: 10983230 LQ waiting:
[SQ] entry: 66 instr_id: 34175554 address: ffff8017d144e8d4 fetched_issued: true event_cycle: 10983232 LQ waiting:
[SQ] entry: 67 instr_id: 34175566 address: ffff8017d1087b30 fetched_issued: true event_cycle: 10983226 LQ waiting:

LLC MSHR empty
LLC RQ empty
LLC WQ empty
LLC PQ empty

[cpu0_DTLB_MSHR] entry: 0 instr_id: 33720526 address: ffff8017d144f000 v_addr: ffff8017d144f000 type: LOAD event_cycle: 18446744073709551615
cpu0_DTLB RQ empty
cpu0_DTLB WQ empty
cpu0_DTLB PQ empty

cpu0_ITLB MSHR empty
cpu0_ITLB RQ empty
cpu0_ITLB WQ empty
cpu0_ITLB PQ empty

cpu0_L1D MSHR empty
cpu0_L1D RQ empty
cpu0_L1D WQ empty
cpu0_L1D PQ empty
cpu0_L1D RQ empty
cpu0_L1D WQ empty
cpu0_L1D PQ empty

cpu0_L1I MSHR empty
cpu0_L1I RQ empty
cpu0_L1I WQ empty
cpu0_L1I PQ empty

cpu0_L2C MSHR empty
cpu0_L2C RQ empty
cpu0_L2C WQ empty
cpu0_L2C PQ empty
cpu0_L2C RQ empty
cpu0_L2C WQ empty
cpu0_L2C PQ empty

[cpu0_STLB_MSHR] entry: 0 instr_id: 33720526 address: ffff8017d144f000 v_addr: ffff8017d144f000 type: LOAD event_cycle: 18446744073709551615
cpu0_STLB RQ empty
cpu0_STLB WQ empty
cpu0_STLB PQ empty
cpu0_STLB RQ empty
cpu0_STLB WQ empty
cpu0_STLB PQ empty
cpu0_STLB RQ empty
cpu0_STLB WQ empty
cpu0_STLB PQ empty

Aborted (core dumped)
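
One way to read a dump like the one above is to look for entries whose `event_cycle` is 18446744073709551615. That value is 2^64 - 1 and presumably marks an entry with no completion scheduled (an assumption; the issue does not state this). The sketch below is a hypothetical helper, not part of ChampSim, that lists such entries. Applied to this dump it picks out the PTW, DTLB and STLB MSHR entries, and the DTLB/STLB address `ffff8017d144f000` matches the address of the load at the ROB head.

```python
import re
from typing import List, Tuple

# 2**64 - 1; assumed here to be a "no event scheduled" sentinel.
UINT64_MAX = 2**64 - 1

# A bracketed queue name at the start of the line, then its event_cycle field.
ENTRY_RE = re.compile(r"^\[(?P<queue>\w+)\].*?event_cycle: (?P<cycle>\d+)")
# The plain "address:" field (the \b keeps "v_address:" from matching).
ADDR_RE = re.compile(r"\baddress: (?P<addr>[0-9a-f]+)")


def unscheduled_entries(dump: str) -> List[Tuple[str, str]]:
    """Return (queue, address) for entries whose event_cycle is UINT64_MAX."""
    found = []
    for line in dump.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match is None or int(match.group("cycle")) != UINT64_MAX:
            continue
        addr = ADDR_RE.search(line)
        found.append((match.group("queue"), addr.group("addr") if addr else "?"))
    return found


# A few lines copied from the dump in this issue.
SAMPLE_DUMP = """\
[cpu0_PTW_MSHR] 0 address: 13ef000 v_address: ffff8017d144f000 translation_level: 0 event_cycle: 18446744073709551615
[LQ] entry: 12 instr_id: 34175216 address: ffff8017d144f000 fetched_issued: true event_cycle: 10983147
[SQ] entry: 20 instr_id: 34175298 address: ffff8017d10058d8 fetched_issued: true event_cycle: 10983169 LQ waiting:18446744073709551615
[cpu0_DTLB_MSHR] entry: 0 instr_id: 33720526 address: ffff8017d144f000 v_addr: ffff8017d144f000 type: LOAD event_cycle: 18446744073709551615
[cpu0_STLB_MSHR] entry: 0 instr_id: 33720526 address: ffff8017d144f000 v_addr: ffff8017d144f000 type: LOAD event_cycle: 18446744073709551615
"""

if __name__ == "__main__":
    for queue, address in unscheduled_entries(SAMPLE_DUMP):
        print(queue, address)
```

Note that the store queue line is not reported: its `LQ waiting:` field holds the same large value, but its `event_cycle` is an ordinary cycle number.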

Desktop (please complete the following information):

  • OS: Fedora
  • Compiler: GCC 12.2.1
  • Version: release_may_23
@ngober
Collaborator

ngober commented Jul 3, 2023

I'm not able to reproduce this issue, either on GCC 11 or Clang 14. Both on release/2023-06 and develop, the trace completes without issue.

@ngober
Collaborator

ngober commented Jul 6, 2023

Wait... never mind. I found it. I recognize this bug. I worked for about a month with @djimeneth fixing this one, and it's still here. 🙄

What's happening is that the L1D experiences a lot of translation misses, backing up the tag checker. The PTW then cannot get its translations through the lookup process, so everything deadlocks. I'm very familiar with this bug, and I've been fighting it for a while. I'll work on a fix and get it out soon.
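
The blocking described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not ChampSim's code: the function name, the fixed number of tag-check slots, and the rule that a load holds its slot until the walk finishes are all hypothetical. Loads that need a translation arrive first and fill the slots; the page walk's own request arrives behind them and can only complete if it gets a slot.

```python
from collections import deque


def walk_completes(slots: int, untranslated_loads: int, max_cycles: int = 100) -> bool:
    """Return True if the page walk finishes, False if the model stalls.

    Toy model: each untranslated load holds a tag-check slot until the walk
    is done, and the walk's request ("ptw") needs a free slot to proceed.
    """
    tag_check = deque()  # bounded tag-check stage
    incoming = deque(["load"] * untranslated_loads + ["ptw"])
    walk_done = False

    for _ in range(max_cycles):
        # Once the walk has finished, the waiting loads can leave.
        if walk_done:
            tag_check.clear()
        # Admit new requests while slots are free.
        while incoming and len(tag_check) < slots:
            tag_check.append(incoming.popleft())
        # The walk finishes only if its request holds a slot.
        if "ptw" in tag_check:
            tag_check.remove("ptw")
            walk_done = True
        if walk_done and not incoming and not tag_check:
            return True
    return False  # nothing ever changed: the model is deadlocked


if __name__ == "__main__":
    print(walk_completes(slots=2, untranslated_loads=1))
    print(walk_completes(slots=2, untranslated_loads=2))
```

With two slots and one waiting load, the walk's request still fits and everything drains. With two waiting loads, the slots are full of requests that depend on the walk, the walk cannot enter, and no cycle makes progress.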

@ngober ngober self-assigned this Jul 6, 2023
@ngober ngober linked a pull request Jul 7, 2023 that will close this issue
ngober added a commit that referenced this issue Jul 12, 2023
Fixes #359.

There was a bug in the tag lookup where cascading misses in the PTW
might cause the L1D to deadlock. This patch allows the
`inflight_tag_check` member to drain to the `translation_stash`,
allowing bandwidth for translated packets (from the PTW) to be looked
up.
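
The idea in this commit message can be sketched with a toy model. Only the names `inflight_tag_check` and `translation_stash` come from the message; the function, the slot count, and the per-cycle ordering are assumptions made for illustration and do not reflect the actual patch. Requests still waiting for a translation are moved out of the bounded tag-check stage into a side buffer, so the page walk's request ("ptw") can always reach a slot.

```python
from collections import deque
from typing import Optional, Tuple


def cycles_until_walk(slots: int, untranslated_loads: int,
                      max_cycles: int = 100) -> Optional[Tuple[int, int]]:
    """Return (cycle the walk request is looked up, loads stashed), or None.

    Toy model: loads waiting for translation drain from the tag-check stage
    into a stash each cycle instead of holding their slots.
    """
    inflight_tag_check = deque()  # bounded tag-check stage
    translation_stash = []        # unbounded here, for simplicity
    incoming = deque(["load"] * untranslated_loads + ["ptw"])

    for cycle in range(1, max_cycles + 1):
        # Admit new requests while slots are free.
        while incoming and len(inflight_tag_check) < slots:
            inflight_tag_check.append(incoming.popleft())
        # Drain: untranslated loads give up their slots and wait in the stash.
        translation_stash.extend(r for r in inflight_tag_check if r == "load")
        inflight_tag_check = deque(r for r in inflight_tag_check if r != "load")
        # The walk's request can be looked up once it holds a slot.
        if "ptw" in inflight_tag_check:
            return cycle, len(translation_stash)
    return None


if __name__ == "__main__":
    print(cycles_until_walk(slots=2, untranslated_loads=3))
```

In this model the walk's request is reached after a bounded number of cycles regardless of how many loads are queued ahead of it, because no load can hold a slot while it waits.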
@ngober ngober closed this as completed Jul 13, 2023
ngober added a commit that referenced this issue Jul 30, 2023
ngober added a commit that referenced this issue Aug 4, 2023