Driver doesn't work on AARCH64 #27

Closed
sploet opened this issue Oct 21, 2017 · 13 comments

@sploet
Contributor

sploet commented Oct 21, 2017

I have cma=256M and am using an UltraScale+ (Xilinx ZCU102 dev board with an ARM64 chip).

Have a look at my log:

[ 0.000000] Linux version 4.9.0-dirty (roman@ws-278) (gcc version 6.2.1 20161114 (Linaro GCC Snapshot 6.2-2016.11) ) #76 SMP Sat Oct 21 12:01:12 +03 2017
[ 0.000000] Boot CPU: AArch64 Processor [410fd034]
[ 0.000000] earlycon: cdns0 at MMIO 0x00000000ff000000 (options '115200n8')
[ 0.000000] bootconsole [cdns0] enabled
[ 0.000000] cma: Reserved 256 MiB at 0x000000006d800000
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: MIGRATE_INFO_TYPE not supported.
[ 0.000000] percpu: Embedded 19 pages/cpu @ffffffc87ff6f000 s40088 r8192 d29544 u77824
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: enabling workaround for ARM erratum 845719
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 1034240
[ 0.000000] Kernel command line: earlycon clk_ignore_unused
[ 0.000000] log_buf_len individual max cpu contribution: 131072 bytes
[ 0.000000] log_buf_len total cpu_extra contributions: 393216 bytes
[ 0.000000] log_buf_len min size: 131072 bytes
[ 0.000000] log_buf_len: 524288 bytes
[ 0.000000] early log buf free: 129092(98%)
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 0.000000] software IO TLB [mem 0x69800000-0x6d800000] (64MB) mapped at [ffffffc069800000-ffffffc06d7fffff]
[ 0.000000] Memory: 3786472K/4194304K available (5822K kernel code, 412K rwdata, 1796K rodata, 320K init, 382K bss, 145688K reserved, 262144K cma-reserved)
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] modules : 0xffffff8000000000 - 0xffffff8008000000 ( 128 MB)
[ 0.000000] vmalloc : 0xffffff8008000000 - 0xffffffbebfff0000 ( 250 GB)
[ 0.000000] .text : 0xffffff8008080000 - 0xffffff8008630000 ( 5824 KB)
[ 0.000000] .rodata : 0xffffff8008630000 - 0xffffff8008800000 ( 1856 KB)
[ 0.000000] .init : 0xffffff8008800000 - 0xffffff8008850000 ( 320 KB)
[ 0.000000] .data : 0xffffff8008850000 - 0xffffff80088b7008 ( 413 KB)
[ 0.000000] .bss : 0xffffff80088b7008 - 0xffffff8008916814 ( 383 KB)
[ 0.000000] fixed : 0xffffffbefe7fd000 - 0xffffffbefec00000 ( 4108 KB)
[ 0.000000] PCI I/O : 0xffffffbefee00000 - 0xffffffbeffe00000 ( 16 MB)
[ 0.000000] vmemmap : 0xffffffbf00000000 - 0xffffffc000000000 ( 4 GB maximum)
[ 0.000000] 0xffffffbf00000000 - 0xffffffbf1dc00000 ( 476 MB actual)
[ 0.000000] memory : 0xffffffc000000000 - 0xffffffc880000000 ( 34816 MB)

After the Linux kernel booted, I successfully inserted the kernel module:

# insmod /axidma.ko

[ 163.886158] axidma: loading out-of-tree module taints kernel.
[ 163.892920] axidma: axidma_dma.c: axidma_dma_init: 706: DMA: Found 1 transmit channels and 1 receive channels.
[ 163.902946] axidma: axidma_dma.c: axidma_dma_init: 708: VDMA: Found 0 transmit channels and 0 receive channels.

Next I tried to run the benchmark test:

# /axidma_benchmark

AXI DMA Benchmark Parameters:
Transmit Buffer Size: 7.91 Mb
Receive Buffer Size: 7.91 Mb
Number of DMA Transfers: 10 transfers
[ 287.820644] axidma: axidma_chrdev.c: axidma_mmap: 277: Unable to allocate contiguous DMA memory region of size 8294400.
[ 287.837036] axidma: axidma_chrdev.c: axidma_mmap: 278: Please make sure that you specified cma= on the kernel command line, and the size is large enough.

axidma_benchmark: tx_size=8294400
libaxidma: mmap size=8294400
Unable to allocate transmit buffer from the AXI DMA device.: Cannot allocate memory

# dmesg

[ 163.886158] axidma: loading out-of-tree module taints kernel.
[ 163.892645] cma: cma_alloc(cma ffffff80088eac40, count 8, align 3)
[ 163.892767] cma: cma_alloc(): returned ffffffbf017f4e00
[ 163.892798] cma: cma_alloc(cma ffffff80088eac40, count 1, align 0)
[ 163.892820] cma: cma_alloc(): returned ffffffbf017f4fc0
[ 163.892849] cma: cma_alloc(cma ffffff80088eac40, count 8, align 3)
[ 163.892872] cma: cma_alloc(): returned ffffffbf017f5180
[ 163.892891] cma: cma_alloc(cma ffffff80088eac40, count 1, align 0)
[ 163.892913] cma: cma_alloc(): returned ffffffbf017f4ff8
[ 163.892920] axidma: axidma_dma.c: axidma_dma_init: 706: DMA: Found 1 transmit channels and 1 receive channels.
[ 163.902946] axidma: axidma_dma.c: axidma_dma_init: 708: VDMA: Found 0 transmit channels and 0 receive channels.
[ 287.820644] axidma: axidma_chrdev.c: axidma_mmap: 277: Unable to allocate contiguous DMA memory region of size 8294400.
[ 287.837036] axidma: axidma_chrdev.c: axidma_mmap: 278: Please make sure that you specified cma= on the kernel command line, and the size is large enough.
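
For anyone following the `cma=` hint in the error message, a minimal sketch of testing a command-line string for that parameter might look like the following. The function name `cmdline_has_cma` is hypothetical and not part of the driver; on a running system the string would typically be read from `/proc/cmdline`. Note that the boot log above prints `Kernel command line: earlycon clk_ignore_unused`, which has no `cma=` entry, yet 256 MiB was still reserved, so in this setup the reservation presumably comes from somewhere else (an assumption, the thread does not say where).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the whitespace-separated kernel
 * command line contains a parameter that starts with "cma=", else 0.
 */
int cmdline_has_cma(const char *cmdline)
{
    const char *p = cmdline;

    while (*p != '\0') {
        /* Skip whitespace between parameters. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;

        /* Only match at the start of a parameter. */
        if (strncmp(p, "cma=", 4) == 0)
            return 1;

        /* Advance to the end of the current parameter. */
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
    }
    return 0;
}
```

Given the command line from the log above it returns 0; given `earlycon cma=256M` it returns 1.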

I found out that the function dma_alloc_coherent always returns NULL.
The behaviour is the same for every buffer size I have tried (16 bytes, 1024 bytes, 8 KB, 16 KB, 128 KB, 1 MB, 16 MB, etc.).

Yesterday I also found the same problem on the Internet, and it seems it was solved: ikwzm/udmabuf@b4ee3e3#diff-80d2ae607f6046e637a24ce2bc06ba30

I am not a kernel guru, can you help me with this?
Do you have any idea?

@sploet
Contributor Author

sploet commented Oct 21, 2017

Maybe this is the solution, I don't know:

I added of_dma_configure(dev->device, NULL);
and replaced
dma_alloc_coherent(NULL, and dma_mmap_coherent(NULL,
with
dma_alloc_coherent(dev->device, and dma_mmap_coherent(dev->device,

Like this:

// Allocate the requested region as contiguous and uncached for DMA
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

of_dma_configure(dev->device, NULL);

// dma_alloc->kern_addr = dma_alloc_coherent(NULL, dma_alloc->size, &dma_alloc->dma_addr, GFP_KERNEL);
dma_alloc->kern_addr = dma_alloc_coherent(dev->device, dma_alloc->size, &dma_alloc->dma_addr, GFP_KERNEL);
if (dma_alloc->kern_addr == NULL)
{
    axidma_err("Unable to allocate contiguous DMA memory region of size %zu.\n", dma_alloc->size);
    axidma_err("Please make sure that you specified cma= on the kernel command line, and the size is large enough.\n");
    rc = -ENOMEM;
    goto free_vma_data;
}

// Map the region into userspace

// rc = dma_mmap_coherent(NULL, vma, dma_alloc->kern_addr, dma_alloc->dma_addr, dma_alloc->size);
rc = dma_mmap_coherent(dev->device, vma, dma_alloc->kern_addr, dma_alloc->dma_addr, dma_alloc->size);

Now I have this situation:

# insmod /axidma.ko

[ 55.577627] axidma: loading out-of-tree module taints kernel.
[ 55.584409] axidma: axidma_dma.c: axidma_dma_init: 706: DMA: Found 1 transmit channels and 1 receive channels.
[ 55.594423] axidma: axidma_dma.c: axidma_dma_init: 708: VDMA: Found 0 transmit channels and 0 receive channels.

# dmesg | grep cma

[ 0.000000] cma: dma_contiguous_reserve(limit 100000000)
[ 0.000000] cma: dma_contiguous_reserve: reserving 256 MiB for global area
[ 0.000000] cma: cma_declare_contiguous(size 0x0000000010000000, base 0x0000000000000000, limit 0x0000000100000000 alignment 0x0000000000000000)
[ 0.000000] cma: Reserved 256 MiB at 0x000000006d800000
[ 0.000000] Memory: 3786472K/4194304K available (5822K kernel code, 412K rwdata, 1796K rodata, 320K init, 382K bss, 145688K reserved, 262144K cma-reserved)
[ 0.193143] cma: cma_alloc(cma ffffff80088eac40, count 64, align 6)
[ 0.193487] cma: cma_alloc(): returned ffffffbf017f4000
[ 6.818795] cma: cma_alloc(cma ffffff80088eac40, count 768, align 8)
[ 6.818973] cma: cma_alloc(): returned ffffffbf017f7800
[ 55.584138] cma: cma_alloc(cma ffffff80088eac40, count 8, align 3)
[ 55.584258] cma: cma_alloc(): returned ffffffbf017f4e00
[ 55.584289] cma: cma_alloc(cma ffffff80088eac40, count 1, align 0)
[ 55.584311] cma: cma_alloc(): returned ffffffbf017f4fc0
[ 55.584339] cma: cma_alloc(cma ffffff80088eac40, count 8, align 3)
[ 55.584362] cma: cma_alloc(): returned ffffffbf017f5180
[ 55.584382] cma: cma_alloc(cma ffffff80088eac40, count 1, align 0)
[ 55.584402] cma: cma_alloc(): returned ffffffbf017f4ff8

# /axidma_benchmark

AXI DMA Benchmark Parameters:
Transmit Buffer Size: 7.91 Mb
Receive Buffer Size: 7.91 Mb
Number of DMA Transfers: 10 transfers

axidma_benchmark: tx_size=8294400
libaxidma: mmap size=8294400
axidma_benchmark: tx_size=8294400
libaxidma: mmap size=8294400
Using transmit channel 0 and receive channel 1.
[ 88.294196] axidma: axidma_dma.c: axidma_start_transfer: 299: DMA receive transaction timed out.
Failed to perform the AXI DMA read-write transfer: Timer expired

# dmesg | grep cma

[ 0.000000] cma: dma_contiguous_reserve(limit 100000000)
[ 0.000000] cma: dma_contiguous_reserve: reserving 256 MiB for global area
[ 0.000000] cma: cma_declare_contiguous(size 0x0000000010000000, base 0x0000000000000000, limit 0x0000000100000000 alignment 0x0000000000000000)
[ 0.000000] cma: Reserved 256 MiB at 0x000000006d800000
[ 0.000000] Memory: 3786472K/4194304K available (5822K kernel code, 412K rwdata, 1796K rodata, 320K init, 382K bss, 145688K reserved, 262144K cma-reserved)
[ 0.193143] cma: cma_alloc(cma ffffff80088eac40, count 64, align 6)
[ 0.193487] cma: cma_alloc(): returned ffffffbf017f4000
[ 6.818795] cma: cma_alloc(cma ffffff80088eac40, count 768, align 8)
[ 6.818973] cma: cma_alloc(): returned ffffffbf017f7800
[ 55.584138] cma: cma_alloc(cma ffffff80088eac40, count 8, align 3)
[ 55.584258] cma: cma_alloc(): returned ffffffbf017f4e00
[ 55.584289] cma: cma_alloc(cma ffffff80088eac40, count 1, align 0)
[ 55.584311] cma: cma_alloc(): returned ffffffbf017f4fc0
[ 55.584339] cma: cma_alloc(cma ffffff80088eac40, count 8, align 3)
[ 55.584362] cma: cma_alloc(): returned ffffffbf017f5180
[ 55.584382] cma: cma_alloc(cma ffffff80088eac40, count 1, align 0)
[ 55.584402] cma: cma_alloc(): returned ffffffbf017f4ff8
[ 77.977792] cma: cma_alloc(cma ffffff80088eac40, count 2025, align 8)
[ 77.977972] cma: cma_alloc(): returned ffffffbf01802000
[ 77.979759] cma: cma_alloc(cma ffffff80088eac40, count 2025, align 8)
[ 77.979907] cma: cma_alloc(): returned ffffffbf0181e000
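
The `count 2025` in the last two `cma_alloc` calls lines up with the 8294400-byte buffers the benchmark requests, assuming 4 KiB pages (consistent with the boot log, where an order-3 PID hash table takes 32768 bytes). A small sketch of that arithmetic, with hypothetical helper names that are not part of the driver:

```c
#include <stddef.h>

/* Number of whole pages needed to hold `bytes`, rounding up. */
size_t pages_needed(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}

/* Number of pages in a CMA region whose size is given in MiB. */
size_t cma_region_pages(size_t mib, size_t page_size)
{
    return (mib * 1024u * 1024u) / page_size;
}
```

`pages_needed(8294400, 4096)` gives 2025 and `cma_region_pages(256, 4096)` gives 65536, so the two benchmark buffers take only a small part of the 256 MiB region.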

Maybe the timeout occurs because the FPGA loopback doesn't work correctly. I will check.

BTW, if you have any comments about this, I would welcome them.

@sploet
Contributor Author

sploet commented Oct 21, 2017

I also replaced, in the same file:

dma_free_coherent(NULL, dma_alloc->size, dma_alloc->kern_addr, dma_alloc->dma_addr);
with
dma_free_coherent(dev->device, dma_alloc->size, dma_alloc->kern_addr, dma_alloc->dma_addr);

@bperez77 bperez77 added the bug label Oct 21, 2017
@bperez77
Owner

It seems like that change should do the trick, based on the issue in the other repository to which you linked. I'll look into this more in the kernel code to see what the issue is.

Unfortunately, I don't have access to an UltraScale device at the moment, but I'll see if I can get a QEMU image up and running to verify the change independently.

@sploet
Contributor Author

sploet commented Oct 23, 2017

It works on UltraScale+:

I used a 128-bit bus width and the same burst value in axi0dma, and a 200 MHz clock for the PL.
The AXI DMA was looped back without any FIFO.

Have a look:

DMA Timing Statistics:
Elapsed Time: 24.68 s
Transmit Throughput: 2835.88 Mb/s
Receive Throughput: 2835.88 Mb/s
Total Throughput: 5671.77 Mb/s

I think this bug is fixed.

@bperez77
Owner

OK, great, I'll add these fixes to the repository. Thanks for your help!

bperez77 pushed a commit that referenced this issue Oct 29, 2017
This closes #27, courtesy of @sploet. This is based on the fix provided
by issue ikwzm/udmabuf#5.  The issue is that for AARCH64, all of the DMA
functions require a non-NULL device as input. Additionally, a call to
`of_dma_configure` is required.
@bperez77
Owner

@sploet Could you verify this fix when you get a chance?

@sploet
Contributor Author

sploet commented Nov 3, 2017

Sorry for keeping you waiting.
I have already checked. It works!

@bperez77
Owner

No problem, glad to hear it works!

@sploet
Contributor Author

sploet commented May 16, 2018

Hi Clack!
I can help you. Please write to me on Skype: rpk___ (3 underscores after rpk)
Or you may email me directly: sploet at gmail.com

@ghost

ghost commented May 17, 2018

Hi all,
We have been using this driver on a ZC706, which has an ARM-based SoC.
Our problem was that dma_alloc_coherent() returned NULL all the time. We finally figured out that the commits corresponding to this issue were the cause.
dma_x_coherent(NULL, dma_alloc->size, dma_alloc->kern_addr, dma_alloc->dma_addr) works for us instead of dma_x_coherent(dev->device,...)

@bperez77
Owner

@sh-ebrahimi I believe this is an issue with differing kernel versions. Originally, the driver did pass NULL to dma_x_coherent, but it seemed to cause an issue on newer kernel versions. However, I imagine that there are certain kernel versions where this change has not been backported.

Out of curiosity, what kernel version are you using?

@ghost

ghost commented May 24, 2018 via email

@bperez77
Owner

Got it, that version likely doesn't have the update to dma_x_coherent that requires a device to be passed in. Since future kernel versions are going to use the newer style, I'm going to keep the change in the repo.

In the meantime, for anyone else who finds this thread, you can move to a newer kernel version, or you can revert commit 4b29989.
