Proposed fix for issue 48. #49

Closed
wants to merge 2 commits into from
23 changes: 14 additions & 9 deletions XDMA/linux-kernel/libxdma/libxdma.c
@@ -685,8 +685,15 @@ static struct xdma_transfer *engine_start(struct xdma_engine *engine)
 			(unsigned long)(&engine->sgdma_regs));

 	if (transfer->desc_adjacent > 0) {
-		extra_adj = transfer->desc_adjacent - 1;
-		if (extra_adj > MAX_EXTRA_ADJ)
+		u64 next_page_addr;
+		next_page_addr =
+			((transfer->desc_bus >> PAGE_SHIFT_X86) + 1) <<
+			PAGE_SHIFT_X86;
+		extra_adj = (next_page_addr - transfer->desc_bus) /
+			sizeof (struct xdma_desc) - 1;
+		if (extra_adj > transfer->desc_adjacent - 1)
+			extra_adj = transfer->desc_adjacent - 1;
+		else if (extra_adj > MAX_EXTRA_ADJ)
 			extra_adj = MAX_EXTRA_ADJ;
 	}
 	dbg_tfr("iowrite32(0x%08x to 0x%p) (first_desc_adjacent)\n", extra_adj,
@@ -2541,9 +2548,8 @@ static void xdma_desc_adjacent(struct xdma_desc *desc, int next_adjacent)
 		extra_adj = next_adjacent - 1;
 		if (extra_adj > MAX_EXTRA_ADJ)
 			extra_adj = MAX_EXTRA_ADJ;
-		max_adj_4k =
-			(0x1000 - ((le32_to_cpu(desc->next_lo)) & 0xFFF)) / 32 -
-			1;
+		max_adj_4k = (PAGE_SIZE_X86 - ((le32_to_cpu(desc->next_lo)) &
+			PAGE_MASK_X86)) / sizeof(struct xdma_desc) - 1;

This macro is hardcore. I guess you all have moved on to aarch ;)

Contributor Author

@bhathaway bhathaway Feb 27, 2020

I'm trying to make sure that the driver works on both x86 and aarch architectures, which is why I made this change. The adjacency accesses assume the x86 page size of 4 KiB, so I propose adding these macros rather than using the standard "PAGE_SIZE" and "PAGE_MASK", whose values depend on the kernel's page-size configuration. Either way, it's much better than the magic numbers in the current code.
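For readers following the adjacency arithmetic, here is a minimal sketch of the calculation the patch performs, written as a standalone function. The `SKETCH_*` names and `sketch_extra_adj` are hypothetical and not part of the driver. The 32-byte descriptor size is taken from the `/ 32` in the original code, and the limit of 15 for the hardware field is an assumption standing in for the driver's `MAX_EXTRA_ADJ`. Note that this sketch applies the two upper bounds independently, whereas the diff uses an `else if` chain.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones live in libxdma.h. */
#define SKETCH_PAGE_SIZE     0x1000u /* 4 KiB, same value as PAGE_SIZE_X86 */
#define SKETCH_DESC_SIZE     32u     /* descriptor size implied by the old "/ 32" */
#define SKETCH_MAX_EXTRA_ADJ 15u     /* assumption: stand-in for MAX_EXTRA_ADJ */

/*
 * Return how many extra descriptors may be fetched together with the one at
 * desc_bus without crossing a 4 KiB boundary, limited by the number of
 * descriptors actually available and by the assumed hardware field limit.
 */
static uint32_t sketch_extra_adj(uint64_t desc_bus, uint32_t desc_adjacent)
{
    uint64_t bytes_to_boundary;
    uint32_t extra_adj;

    /* Descriptors are assumed to be aligned to their own size. */
    assert(desc_bus % SKETCH_DESC_SIZE == 0);

    if (desc_adjacent == 0)
        return 0;

    /* Distance from this descriptor to the start of the next 4 KiB page. */
    bytes_to_boundary = SKETCH_PAGE_SIZE - (desc_bus & (SKETCH_PAGE_SIZE - 1));
    extra_adj = (uint32_t)(bytes_to_boundary / SKETCH_DESC_SIZE) - 1;

    /* Apply each upper bound on its own so neither can be skipped. */
    if (extra_adj > desc_adjacent - 1)
        extra_adj = desc_adjacent - 1;
    if (extra_adj > SKETCH_MAX_EXTRA_ADJ)
        extra_adj = SKETCH_MAX_EXTRA_ADJ;

    return extra_adj;
}
```

With these assumed constants, a descriptor at the last 32-byte slot of a page gets no extra adjacent descriptors, regardless of how many follow it in the list.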

 		if (extra_adj > max_adj_4k)
 			extra_adj = max_adj_4k;
 		if (extra_adj < 0) {
@@ -2552,7 +2558,7 @@ static void xdma_desc_adjacent(struct xdma_desc *desc, int next_adjacent)
 		}
 	}
 	/* merge adjacent and control field */
-	control |= 0xAD4B0000UL | (extra_adj << 8);
+	control |= DESC_MAGIC | (extra_adj << 8);
 	/* write control and next_adjacent */
 	desc->control = cpu_to_le32(control);
 }
@@ -3232,10 +3238,9 @@ static int transfer_init(struct xdma_engine *engine, struct xdma_request_cb *req

 	/* Contiguous descriptors cannot cross PAGE boundry. Adjust max accordingly */
 	desc_align = engine->desc_idx + desc_max - 1;
-	desc_align = desc_align % 128;
-	if (desc_align < (desc_max - 1)) {
+	desc_align = desc_align % (PAGE_SIZE_X86 / sizeof(struct xdma_desc));
+	if (desc_align < desc_max)
 		desc_align = desc_max - desc_align - 1;
-	}
 	else
 		desc_align = desc_max;

7 changes: 6 additions & 1 deletion XDMA/linux-kernel/libxdma/libxdma.h
@@ -66,7 +66,7 @@
 #define XDMA_OFS_CONFIG (0x3000UL)

 /* maximum number of desc per transfer request */
-#define XDMA_TRANSFER_MAX_DESC (2048)
+#define XDMA_TRANSFER_MAX_DESC (512)

Was this a necessary change for the default configuration? Can you comment on why it is necessary?

I think this change isn't necessary. It didn't solve my problem (see #54).

Contributor Author

@bhathaway bhathaway Apr 3, 2020

The documentation from Xilinx (https://www.xilinx.com/support/documentation/ip_documentation/xdma/v4_1/pg195-pcie-dma.pdf) notes that Gen3x16 has a descriptor FIFO depth of 1024 and Gen3x8 (which is what we're using) has a depth of 512. I'm having trouble interpreting the language on pages 24 and 25, but it sounds like the engine might be smart about not overflowing this FIFO. What's certain is that the engine can't possibly fetch more than the FIFO allows at once, so I believe the value I've shown is the safest for all hardware.

I would be genuinely curious if anyone has benchmarked the performance at different values for this constant.
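To make the trade-off concrete, the sketch below counts how many transfer requests a scatter-gather list needs for a given per-request cap. It assumes the driver splits a long list into requests of at most `XDMA_TRANSFER_MAX_DESC` descriptors. The FIFO depths (512 and 1024) are the ones quoted above; the enum and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical link configurations, limited to the two mentioned above. */
enum sketch_link { SKETCH_GEN3_X8, SKETCH_GEN3_X16 };

/* Descriptor FIFO depth as quoted in the comment above (not re-verified here). */
static size_t sketch_fifo_depth(enum sketch_link link)
{
    return link == SKETCH_GEN3_X16 ? 1024 : 512;
}

/*
 * Number of requests needed to cover total_desc descriptors when each
 * request carries at most max_desc of them (ceiling division).
 */
static size_t sketch_request_count(size_t total_desc, size_t max_desc)
{
    assert(max_desc > 0);
    return (total_desc + max_desc - 1) / max_desc;
}
```

Under that assumption, a 2048-descriptor list takes one request with the old cap and four with a cap of 512, which is the per-request overhead a benchmark would need to measure.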

There is a report of a race condition here:
https://forums.aws.amazon.com/thread.jspa?threadID=295643

It refers to timeout_ms.

Perhaps your review and development experience will give you some ideas about it...

Contributor Author

I would open a new issue for that. It doesn't seem related to this. You need details like which commit hash you're using and the sample program that caused the error. For what it's worth, it seems odd to me that the engine apparently times out after only one descriptor completed (see completed_desc_count). Also, FYI, error 512 is ERESTARTSYS and is explicitly returned by xdma_xfer_submit on timeout.
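As a small illustration of the error code mentioned above, the sketch below maps a returned error number to a readable message. `ERESTARTSYS` (512) is a kernel-internal value that userspace `<errno.h>` does not define, so the sketch declares its own constant. The macro and function names are hypothetical.

```c
#include <string.h>

/* Assumption: mirrors the kernel-internal ERESTARTSYS value, which is not
 * available from userspace <errno.h>. */
#define SKETCH_ERESTARTSYS 512

/*
 * Describe an error number as seen by a caller of the XDMA device.
 * Accepts either the positive code or its negated form.
 */
static const char *sketch_describe_xdma_error(int err)
{
    int code = err < 0 ? -err : err;

    if (code == SKETCH_ERESTARTSYS)
        return "ERESTARTSYS: driver reported a transfer timeout";
    return strerror(code);
}
```

A caller could log `sketch_describe_xdma_error(errno)` after a failed `read()` or `write()` on the device node to tell a driver timeout apart from ordinary I/O errors.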

> I would open a new issue for that. It doesn't seem related to this. You need details like which commit hash you're using and the sample program that caused the error. For what it's worth, it seems odd to me that the engine apparently times out after only one descriptor completed (see completed_desc_count). Also, FYI, error 512 is ERESTARTSYS and is explicitly returned by xdma_xfer_submit on timeout.

Hello,
I have opened an issue (#54) because I get error 512 when I transfer 1 MB C2H in stream mode.

I tried the changes by bhathaway. I also experimented with various values, down to XDMA_TRANSFER_MAX_DESC (16).

The job works for a few minutes and eventually hangs.
Ubuntu 18.04.02

Contributor

> I would be genuinely curious if anyone has benchmarked the performance at different values for this constant.

This is quite an old topic, but even without benchmarking I can report what I see when testing with a loopback channel (3x16):

  • If I queue 1024 descriptors on the C2H channel and then queue write descriptors, the transfers never complete. If I queue 1023 read descriptors, it works. I think this means the number has to be chosen according to the application. If reads can never block writes, there is no problem. If, on the contrary, reads sometimes cannot happen before a write is done (as in loopback mode), this number should not exceed 1023 (on the 3x16 IP).

  • To test this, fetch my adjacent descriptors fix (without it, loopback transfers of more than 1 page mostly don't work, or crash the machine because they cross page boundaries). Use the dma_streaming script and test with sizes up to 1023 pages (4190208 bytes). They work.

  • Now test with more than 4190208 bytes: the transfer locks up.

  • Reducing the max number of descriptors to 1023 makes the transfers work in all cases.

I cannot say for sure that the IP works the way I think it does. I'm just putting these observations here so that others can test or confirm them, or simply find them helpful.
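The size threshold in the list above follows from simple arithmetic, sketched here under the assumption that each 4096-byte page of the transfer consumes one descriptor. The 1023 limit is the value observed above for the 3x16 IP in loopback; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_BYTES        4096u /* one descriptor per 4 KiB page (assumed) */
#define SKETCH_LOOPBACK_MAX_DESC 1023u /* limit observed above on the 3x16 IP */

/* Descriptors needed for a transfer of the given size, rounding up to pages. */
static size_t sketch_desc_for_bytes(size_t bytes)
{
    return (bytes + SKETCH_PAGE_BYTES - 1) / SKETCH_PAGE_BYTES;
}

/* True if the transfer stays within the observed loopback limit. */
static bool sketch_fits_loopback(size_t bytes)
{
    return sketch_desc_for_bytes(bytes) <= SKETCH_LOOPBACK_MAX_DESC;
}
```

Here 1023 pages times 4096 bytes gives 4190208 bytes, so a transfer of exactly that size fits and one byte more needs a 1024th descriptor.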


 /* maximum size of a single DMA transfer descriptor */
 #define XDMA_DESC_BLEN_BITS 28
@@ -161,6 +161,11 @@
 #define XDMA_ID_H2C 0x1fc0U
 #define XDMA_ID_C2H 0x1fc1U

+/* x86 assumptions needed for other architectures */
+#define PAGE_SIZE_X86 0x1000
+#define PAGE_SHIFT_X86 12
+#define PAGE_MASK_X86 0xfff
+
 /* for C2H AXI-ST mode */
 #define CYCLIC_RX_PAGES_MAX 256
