Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: ARM JTAG regressions #1389

Merged
merged 56 commits into from
Apr 3, 2023
Merged

Fix: ARM JTAG regressions #1389

merged 56 commits into from
Apr 3, 2023

Conversation

dragonmux
Copy link
Member

Detailed description

This PR addresses several regressions in ARM JTAG handling detailed below and necessarily rewrites the ADIv5 high level remote protocol.

The regressions fixed are as follows:

  • Fixed a regression introduced with 6dcc5d4 which broke non-multiple-of-8 TDI-TDO sequences
  • Fixed an issue whereby CMSIS-DAP adaptors such as ORBTrace should never have worked with JTAG ADIv5 devices due to initialising the IR length information too late
  • Fixed a timing issue in the TDI sequence function much like the TDI-TDO sequence one
  • Fixed the remote protocol going "everything's ok" even when dp->fault is set non-zero and everything is not ok
  • Fixed the remote protocol pretending exceptions don't exist and the ADIv5 JTAG and SWD layers can't throw them (they can) resulting in permanent comms loss when one occurs
  • Fixed the BMDA side of the remote protocol never doing protocol recovery and driving the remote into a state due to not handling any errors at all

As part of the remote protocol rewrite we have pre-emptively split the high-level component of the protocol into generic high-level commands and ADIv5-specific ones to make organisational space for the RISC-V specific ones when v2.0 happens. This happens to also make reading BMDA probe logs easier.

Your checklist for this pull request

  • I've read the Code of Conduct
  • I've read the guidelines for contributing to this repository
  • It builds for hardware native (make PROBE_HOST=native)
  • It builds as BMDA (make PROBE_HOST=hosted)
  • I've tested it to the best of my ability
  • My commit messages provide a useful short description of what the commits do

Closing issues

Fixes #858

@dragonmux dragonmux added Bug Confirmed bug Enhancement General project improvement HwIssue Mitigation Solving or mitigating a Hardware issue in Software Regression Bug caused by a regression labels Feb 16, 2023
@dragonmux dragonmux added this to the v1.10 milestone Feb 16, 2023
@dragonmux dragonmux requested a review from esden February 16, 2023 21:54
@tlyu
Copy link
Contributor

tlyu commented Feb 17, 2023

Tested on BMP with v1.9.0, with SWD to a GD32 target. Fallback seems to work. With BMP updated to firmware built from this PR, scan succeeds, but attach fails, with errors similar to what I previously reported on Discord.
bmp-pr-1389-updated-fw.log

@tlyu
Copy link
Contributor

tlyu commented Feb 19, 2023

Update: with ENABLE_DEBUG=1 on the BMP firmware (I had to disable some targets to get it to fit), I get

SWD access resulted in no response
Recovering and re-trying access

on the aux CDC channel around the time that BMDA is showing

swdptap_seq_in          3 clock_cycles: 00000007
!SI20#
       PFFFFFFFF
swdptap_seq_in_parity  32 clock_cycles: ffffffff ERR
DP Error 0x00000000
ap_mem_write_sized @ e000edf0 len 4, align 4: 03 00 5f a0
!AM00002300004002e000edf00000000403005FA0#
Timeout while waiting for BMP response
remote_adiv5_mem_write_bytes comms error: -4
remote_adiv5_mem_write_bytes error around 0xe000edf0

Which reinforces my suspicion that BMP is attempting the recovery procedure, but is somehow becoming uncommunicative with BMDA during or after.

That's about all I have time to debug at the moment.

@dragonmux
Copy link
Member Author

Thank you, that's incredibly helpful as we weren't certain if that wasn't still going to be a problem post-patch (it clearly is) so giving us something more to dig into to finish stabilising this branch before Esden reviews it with a view to merge

@dragonmux
Copy link
Member Author

So, error recovery was still broken under JTAG, which is now fixed, but we haven't yet dived into error recovery under SWD. Unless we can figure out something simple today, we're inclined to punt fixing that into a new PR so Esden gets a chance to review what's here, test it and merge it. This PR is intentionally not touching SWD as such as it was JTAG that was quite so broken

@tlyu
Copy link
Contributor

tlyu commented Feb 20, 2023

So, error recovery was still broken under JTAG, which is now fixed, but we haven't yet dived into error recovery under SWD. Unless we can figure out something simple today, we're inclined to punt fixing that into a new PR so Esden gets a chance to review what's here, test it and merge it. This PR is intentionally not touching SWD as such as it was JTAG that was quite so broken

Thanks for looking into it! I probably won't have time to test your latest changes today, so I'm fine with leaving the SWD fixes for the high-level protocol for separate PR or issue.

@tlyu
Copy link
Contributor

tlyu commented Feb 24, 2023

I just tried the latest changes, and there seems to be a regression that causes blackmagic -t to fail to detach the target after the scan. Connecting directly to BMP in GDB, then attaching and detaching the target, allows the target to resume.

I haven't tracked down what causes it yet, but from preliminary tests, it looks like the recent "target resumption" change is not the cause.

@tlyu
Copy link
Contributor

tlyu commented Feb 24, 2023

I went back to b865bd8 on both BMDA and BMP, which I had previously tested, and the regression on blackmagic -t shows up there as well. I guess I didn't test blackmagic -t at that time.

@dragonmux
Copy link
Member Author

Any chance you could share a src/blackmagic -tv 17 log of that behaviour so we can dig into it? We'll aim to add a fix to this PR if possible

@tlyu
Copy link
Contributor

tlyu commented Feb 24, 2023

I'll try to get to that soon. Another regression is that it's no longer possible to use blackmagic -t -H to disable the high-level remote protocol for scanning.

@tlyu
Copy link
Contributor

tlyu commented Feb 25, 2023

I'll upload logs from blackmagic -tv 17 once I recreate them, because I don't remember which runs caused the target to remain halted or not. I can't see any obvious differences that would cause it.

I do have some suspicions that the clock may be gated off too quickly after sending the detach commands, due to an unintended timing change happening as a side effect of these changes. (ADIv5.2 §B4.1.1 requires at least 8 clocks of idle if the clock is gated off after a command, to ensure that it's fully shifted in to the SW-DP.)

@tlyu
Copy link
Contributor

tlyu commented Feb 25, 2023

Logs from various combinations of BMDA and BMP firmware. The "pull 1405" happened to be what I had flashed most recently. I can repeat it with main if you want, but I think the changes there aren't going to affect this behavior.

I'm almost tempted to hook up a logic analyzer to see if there's something weird with the clock disable with the firmware built from this patch. It would have to be different between native and remote, despite a lot of shared code, though.

I did try adding swd_proc.seq_out(0, 8); in a couple of places after SWD reads, on the theory that a turnaround plus the required idle cycles would make things behave, but it didn't seem to help.

BMDA pull 1405, firmware pull 1405: no halt

bmp-t-pull-1405-no-halt.log

BMDA pull 1389, firmware pull 1405: no halt

bmp-t-pull-1389-fw-pull-1405-no-halt.log

BMDA pull 1389, firmware pull 1405, -H flag: no halt

bmp-t-H-pull-1389-fw-pull-1405-no-halt.log

BMDA pull 1389, firmware pull 1389: halt

bmp-t-pull-1389-fw-pull-1389-halt.log

Copy link
Contributor

@tlyu tlyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See file comment for the possible cause of the regression with high-level SWD.

src/remote.c Outdated Show resolved Hide resolved
@tlyu
Copy link
Contributor

tlyu commented Feb 25, 2023

There's still a regression where blackmagic -t -H fails, but I haven't finished investigating that yet.

Copy link
Contributor

@tlyu tlyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to remote_respond made a substantial change to the protocol with all-zeros responses returning K instead of K0. This causes some things to break, including blackmagic -t -H. Unfortunately, BMDA wasn't adjusted to expect it. It's probably easiest to revert to the previous protocol.

src/remote.c Outdated Show resolved Hide resolved
src/remote.c Show resolved Hide resolved
@dragonmux dragonmux force-pushed the fix/arm-jtag-regressions branch 2 times, most recently from 7d7907e to ebbd87a Compare February 26, 2023 00:23
@tlyu
Copy link
Contributor

tlyu commented Feb 26, 2023

Thanks for the updates! I did some testing, and it does fix the regressions that I found with this PR.

Unfortunately, the timeout during error recovery on SWD still happens. I think I failed to test my remote protocol fixes without the "add idle clocks after reads on SWD" changes, and those were preventing the error conditions that were triggering the timeouts.

That was a pre-existing problem on main, though, so I'm happy to defer it for another PR or issue.

@dragonmux
Copy link
Member Author

Excellent, and yeah - think we'll deal with the SWD problem on a separate PR as mentioned before.

Thank you for your excellent review! Much appreciated and helpful :)

@dragonmux
Copy link
Member Author

Apparently we hadn't gone quite far enough fixing the order of operations in adiv5.c's adiv5_dp_init(), with the new order for reading the DPIDR and all that both slightly accelerating the process if high-level functions are in use, and dealing with sticky errors that bit earlier.

@dragonmux dragonmux force-pushed the fix/arm-jtag-regressions branch 3 times, most recently from 8eaa61a to b8ddca4 Compare February 26, 2023 08:36
…as this call would fail to function properly and screw up probing for parts
…not trigger a write when there is no need for the ABORT DP register
…favour of the one baked into the firmware as the firmware one is more correct
Copy link
Member

@esden esden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn, that is a lot of rework. I tested it to the best of my abilities. It does look good otherwise, and it definitely does fix the JTAG regression we had too. More use/testing needed to make sure, but that should happen in main. :)

@esden esden added this pull request to the merge queue Apr 3, 2023
Merged via the queue into main with commit 8985c2c Apr 3, 2023
@dragonmux dragonmux deleted the fix/arm-jtag-regressions branch April 3, 2023 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Confirmed bug Enhancement General project improvement HwIssue Mitigation Solving or mitigating a Hardware issue in Software Regression Bug caused by a regression
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BMP remote protocoll and exception/timeout handling
3 participants