Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed fix for issue 48. #49

Closed
wants to merge 2 commits into from

Conversation

bhathaway
Copy link
Contributor

After extensive testing at Zendar, we've found it.

The primary culprit for the elusive "magic stopped" problem is a
boundary condition where the starting adjacent descriptor count
written to the dma engine count is wrong and causes the dma engine
to attempt to read across a page boundary.

The primary culprit for the elusive "magic stopped" problem is a
boundary condition where the starting adjacent descriptor count
written to the dma engine count is wrong and causes the dma engine
to attempt to read across a page boundary.
@hmaarrfk
Copy link

It would be good if xilinx actually provided sample FPGA designs to test their drivers. Did you have to make your own block diagram (logic) to do these tests?

@bhathaway
Copy link
Contributor Author

@hmaarrfk Yes, a fellow team mate did the design, and by luck, one of our simple large transfer tests caught this bug very reliably because the user space address being unaligned caused what you might call a striping pattern in the descriptor bus addresses. This kind of work is really difficult to do alone.

@hmaarrfk
Copy link

I totally agree. Xilinx is really bad at leading this collaborative software design though.

There are quite a few PRs and issues that have been raised, with little attention from any developers or Product Managers at Xilinx.

Thanks for sharing!

You know, I think I had some bugs with unaligned memory (using QDMA), but I started to use an aligned allocator since I figured there was likely a bug lurking in there somewhere. I guess I'll circle back to this at a later time thinking about what you found.

Thanks for sharing!

Xilinx has made assumptions about the underlying architecture
regarding the adjacent descriptor fetch scheme. So I've added some
constants that make the code work (at least on aarch64).

Another thing I noticed is that XDMA_TRANSFER_MAX_DESC is too large
for all the hardware the driver pretends to support. I've lowered
it to 512, because that's what matches my hardware. The danger is
FIFO overflow, and I'm not sure what kind of error that would
result in.
(0x1000 - ((le32_to_cpu(desc->next_lo)) & 0xFFF)) / 32 -
1;
max_adj_4k = (PAGE_SIZE_X86 - ((le32_to_cpu(desc->next_lo)) &
PAGE_MASK_X86)) / sizeof(struct xdma_desc) - 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This macro is hardcore. I guess you all have moved on to aarch ;)

Copy link
Contributor Author

@bhathaway bhathaway Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to make sure that the driver works on x86 and aarch architectures, which is why I made this change. The adjacency accesses make x86 page size assumptions, which is why I propose adding these macros instead of just using "PAGE_SIZE" and "PAGE_MASK", which are standard. Either way it's much better than the magic numbers within the current code.

@bhathaway
Copy link
Contributor Author

@woshixiaohuya I found the change necessary on both x86 and aarch architectures, because it's a bug in the boundary condition. The macros are there to help it work on aarch.

@woshixiaohuya
Copy link

@woshixiaohuya I found the change necessary on both x86 and aarch architectures, because it's a bug in the boundary condition. The macros are there to help it work on aarch.

Okay, Thank you!

@@ -66,7 +66,7 @@
#define XDMA_OFS_CONFIG (0x3000UL)

/* maximum number of desc per transfer request */
#define XDMA_TRANSFER_MAX_DESC (2048)
#define XDMA_TRANSFER_MAX_DESC (512)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this a necessary change for the default configuration? can you comment on why this is necessary?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thinks this change isn't nessary. It didn't solve my problem (see #54).

Copy link
Contributor Author

@bhathaway bhathaway Apr 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation from Xilinx (https://www.xilinx.com/support/documentation/ip_documentation/xdma/v4_1/pg195-pcie-dma.pdf) notes that Gen3x16 has a descriptor FIFO depth of 1024 and Gen3x8 (which is what we're using) is 512. I'm having trouble interpretting the language on pages 24 and 25, but it sounds like the engine might be smart about not overflowing this FIFO. What's certain is that the engine can't possibly fetch more than the FIFO allows at once, so I believe the value I've shown is safest for all hardware.

I would be genuinely curious if anyone has benchmarked the performance at different values for this constant.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a message of a race condition here:
https://forums.aws.amazon.com/thread.jspa?threadID=295643

ref to
timeout_ms

Maybe your review and dev knowledge may get some ideas...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would open a new issue for that. It doesn't seem related to this. You need details like which commit hash you're using and the sample program that caused the error. For what it's worth, it seems odd to me that the engine apparently times out after only one descriptor completed (see completed_desc_count). Also, FYI, error 512 is ERESTARTSYS and is explicitly returned by xdma_xfer_submit on timeout.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would open a new issue for that. It doesn't seem related to this. You need details like which commit hash you're using and the sample program that caused the error. For what it's worth, it seems odd to me that the engine apparently times out after only one descriptor completed (see completed_desc_count). Also, FYI, error 512 is ERESTARTSYS and is explicitly returned by xdma_xfer_submit on timeout.

Hello,
I have opened a issue (#54) for I got the error 512 when I transfer 1MB C2H in stream mode.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tried the changes by bhathaway. I also experimented with various numbers down to

XDMA_TRANSFER_MAX_DESC (16).

Job works for a few minutes and hangs eventually.
Ubuntu 18.04.02

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be genuinely curious if anyone has benchmarked the performance at different values for this constant.

Quite an old topic but without benchmarking, I can tell that testing using a loopback channel (3x16), I see the following:

  • If I queue 1024 descriptors on the C2H channel, and then queue write descriptors, the transfers are not done. If queuing 1023 read descriptors, it works. I think it means that the choice of this number needs to be done depending on the application. If there is no way reads could lock writes, then no problem, if on the contrary, in some cases, reads cannot happen before a write is done (like loopback mode), this number should not exceed 1023 (on the 3x16 IP).

  • To test this, fetch my adjacent descriptors fix (without that, loopback transfers for more than 1 page mostly don't work or crash the machine because of crossing page boundaries). Use the dma_streaming script and test with sizes up to 1023 pages (4190208 bytes). They work.

  • Now test with more than 4190208, the transfer locks up.

  • Reducing the max number of descriptors to 1023 makes the transfers work in all cases.

I cannot say for sure that the IP works the way I think it does but I'm just putting these here for others to possibly test/confirm or just to help them.

@hmaarrfk
Copy link

maybe we should just fork, and point people to our own repo. Keeping a few PRs in flight like this is really annoying for downstream users (i.e. ourselves).

@mindbeast
Copy link

mindbeast commented May 1, 2020

I've hit the same issue locally, again only when copy descriptors exceed a single page in size (i.e. over 128 32 byte descriptors). This fix looks legitimate, but haven't had the chance to test yet. Thanks for taking the time to actually debug this.

@lesjokolat
Copy link

Hi i realized I messed up and didn't install all your changes correctly.

I re ran them and now I get the following errors

Makefile:10: XVC_FLAGS: .
make -C /lib/modules/5.3.0-59-generic/build M=/home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma modules
make[1]: Entering directory '/usr/src/linux-headers-5.3.0-59-generic'
/home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma/Makefile:10: XVC_FLAGS: .
CC [M] /home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma/libxdma.o
/home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma/libxdma.c: In function ‘engine_start’:
/home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma/libxdma.c:708:2: error: implicit declaration of function ‘mmiowb’ [-Werror=implicit-function-declaration]
mmiowb();
^~~~~~
/home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma/libxdma.c: In function ‘engine_service_shutdown’:
/home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma/libxdma.c:749:2: error: implicit declaration of function ‘swake_up’; did you mean ‘wake_up’? [-Werror=implicit-function-declaration]
swake_up(&engine->shutdown_wq);
^~~~~~~~
wake_up
/home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma/libxdma.c: In function ‘xdma_xfer_submit’:
/home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma/libxdma.c:3585:4: error: implicit declaration of function ‘swait_event_interruptible_timeout’; did you mean ‘wait_event_interruptible_timeout’? [-Werror=implicit-function-declaration]
swait_event_interruptible_timeout(
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
wait_event_interruptible_timeout
cc1: some warnings being treated as errors
scripts/Makefile.build:288: recipe for target '/home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma/libxdma.o' failed
make[2]: *** [/home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma/libxdma.o] Error 1
Makefile:1656: recipe for target 'module/home/pablo/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma' failed
make[1]: *** [module/home/paul/Downloads/xdma_driver_linux5/dma_ip_drivers-fixLinux5/XDMA/linux-kernel/xdma] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.3.0-59-generic'
Makefile:27: recipe for target 'all' failed
make: *** [all] Error 2

The mmiowb(); was a regression to an earler error that was fixed. he others are new.

jrtc27 added a commit to jrtc27/connectal that referenced this pull request Jun 30, 2020
This is a minimal implementation based on [1] that matches the existing
code style for handling the page crossing edge case. Previously only one
place in the code correctly handled adjacent descriptors crossing a page
boundary, but now that code is copied to the other relevant places too.

We hit this case when attempting to re-load ELF files after a few
attempts, at least with my CheriBSD image.

[1] Xilinx/dma_ip_drivers#49
@jrtc27
Copy link
Contributor

jrtc27 commented Jun 30, 2020

Thanks for finding and fixing this, I just ran into this today. Some comments on your pull request, however:

  1. Try and match the style of existing code, regardless of your opinions of it. There is already code in xdma_desc_adjacent to handle this, so the best thing to do is copy that code rather than do it in your own way.

  2. Don't make unrelated changes. You have a bunch of reformatting and renaming changes that, whilst maybe are a good thing, do not belong in a bug fix, but as their own separate pull request (or requests). Especially changing the max number of descriptors belongs in a separate change. This makes it far easier to tell what you're actually trying to fix, making the review quicker and more likely to actually happen.

  3. Your assertions about the page size being X86-related are wrong. The XDMA product guide very clearly states "A block of adjacent descriptors must not cross a 4K address boundary" ([1] p. 24), so whilst also being noise in the commit it's incorrect.

  4. You missed the instance in transfer_desc_init. That's the third place where MAX_EXTRA_ADJ is consulted, so I assume it also should have the same fix applied to it.

I made my own minimal commit that follows these principles[2] (but, as it's in a local vendored copy, it's surrounded by noise to clearly indicate that this is a local change). Something like that seems to me like it would have a much greater chance of being reviewed and accepted, but ultimately you're the one who found the bug and worked out what needed fixing so this is not me trying to take credit for that, just trying to push this issue forward and tidy it up into a state where it can actually be fixed.

[1] https://www.xilinx.com/support/documentation/ip_documentation/xdma/v4_1/pg195-pcie-dma.pdf
[2] jrtc27/connectal@7a43fc1

jameyhicks pushed a commit to cambridgehackers/connectal that referenced this pull request Jun 30, 2020
This is a minimal implementation based on [1] that matches the existing
code style for handling the page crossing edge case. Previously only one
place in the code correctly handled adjacent descriptors crossing a page
boundary, but now that code is copied to the other relevant places too.

We hit this case when attempting to re-load ELF files after a few
attempts, at least with my CheriBSD image.

[1] Xilinx/dma_ip_drivers#49
@karenx-xilinx
Copy link
Collaborator

This commit:
5faf23e
should fix the issue.

Thanks!

@jrtc27
Copy link
Contributor

jrtc27 commented Jun 30, 2020

This commit:
5faf23e
should fix the issue.

Thanks!

Thanks, that does indeed also appear to fix the issue, at least for my use case.

@ynt1988
Copy link

ynt1988 commented Apr 9, 2021

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants