Batched 3D FFT memory access fault #311

upsj · 2021-02-23T01:54:24Z

I am working on a hipFFT wrapper for an HPC library, where I basically hipified our cuFFT wrapper code 1:1 (only replacing 64-bit calls with 32-bit calls). The CUDA and HIP-CUDA tests work and give correct results; only when compiling everything with rocFFT on an AMD device do I get a memory access fault. My guess is that this is related to the "interleaved batch" memory layout we are using, where the innermost dimension is the batch dimension, followed by the 3 FFT dimensions.

Let me know if you need any additional information.

What is the expected behavior

The FFT is executed

What actually happens

The execution fails with: Memory access fault by GPU node-1 (Agent handle: ...) on address .... Reason: Page not present or supervisor privilege.

How to reproduce

Minimal reproducer (setting batch to 1 works correctly):

#include <hip/hip_runtime.h>
#include <hipfft.h>


int main() {
	hipfftDoubleComplex* in;
	hipfftDoubleComplex* out;
	hipfftHandle handle;
	int sizes[] = {16, 32, 64};
	int size = sizes[0] * sizes[1] * sizes[2];
	int batch = 2;
	size_t worksize;
	hipMalloc(&in, sizeof(hipfftDoubleComplex) * size * batch);
	hipMalloc(&out, sizeof(hipfftDoubleComplex) * size * batch);
	hipfftCreate(&handle);
	// Interleaved batches: inembed = onembed = sizes, istride = ostride = batch, idist = odist = 1
	hipfftMakePlanMany(handle, 3, sizes, sizes, batch, 1, sizes, batch, 1, HIPFFT_Z2Z, batch, &worksize);
	hipfftExecZ2Z(handle, in, out, HIPFFT_FORWARD);
	hipDeviceSynchronize();
	hipFree(out);
	hipFree(in);

}

Environment

Hardware	description
GPU	Radeon VII
CPU	AMD Ryzen Threadripper 1920X

Software	version
HIP	4.0.20496-4f163c68
hipFFT	1.0.2.57-be3a15d
rocFFT	1.0.8.966-rocm-rel-4.0-23-2d35fd6
hip-clang	dac2bfceaa8d4a90257dc8a6d58f268e172ce00e


evetsso · 2021-02-23T03:45:19Z

Thanks for the bug report. It looks like a fix for this might already be coming in the next release but I'll confirm.

evetsso · 2021-02-23T04:46:21Z

@upsj After looking closer at your test program, it looks like you've got some errors in it:

istride and ostride must be 1 for contiguous data. Note that if your data is contiguous, you can pass null pointers for inembed and onembed and hipFFT will choose equivalent defaults.
idist and odist must be the number of elements between batches. For contiguous batches, that's the same as the int size variable in your code.
If you really do have 2 batches of size 16x32x64 each, you must allocate at least 16 * 32 * 64 * 2 elements for input and output. Non-contiguous data would require additional memory to be allocated.

This works:

#include "hipfft.h"

int main() {
	hipfftDoubleComplex* in;
	hipfftDoubleComplex* out;
	hipfftHandle handle;
	int sizes[] = {16, 32, 64};
	int size = sizes[0] * sizes[1] * sizes[2];
	int batch = 2;
	size_t worksize;
	hipMalloc(&in, sizeof(hipfftDoubleComplex) * size * batch);
	hipMalloc(&out, sizeof(hipfftDoubleComplex) * size * batch);
	hipfftCreate(&handle);
	// Contiguous batches: istride = ostride = 1, idist = odist = size
	hipfftMakePlanMany(handle, 3, sizes, sizes, 1, size, sizes, 1, size, HIPFFT_Z2Z, batch, &worksize);
	hipfftExecZ2Z(handle, in, out, HIPFFT_FORWARD);
	hipDeviceSynchronize();
	hipFree(out);
	hipFree(in);
}
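
To make the stride and distance parameters concrete, here is a minimal sketch in plain C++ (no GPU required) of the addressing rule documented for cuFFT's advanced data layout, which the hipFFT interface is modeled on: entry (x, y, z) of batch b sits at b * dist + (x * embed[1] * embed[2] + y * embed[2] + z) * stride. The names layout_offset and required_elements are hypothetical helpers for illustration and are not part of hipFFT.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a hipFFT function): element offset of entry
// (x, y, z) of batch b for a rank-3 plan with the given embed/stride/dist.
std::size_t layout_offset(const int embed[3], std::size_t stride, std::size_t dist,
                          std::size_t x, std::size_t y, std::size_t z, std::size_t b) {
    const std::size_t e1 = static_cast<std::size_t>(embed[1]);
    const std::size_t e2 = static_cast<std::size_t>(embed[2]);
    return b * dist + ((x * e1 + y) * e2 + z) * stride;
}

// Hypothetical helper: smallest buffer length (in elements) that covers the
// last entry of the last batch, i.e. the largest offset plus one.
// Assumes positive sizes, stride and dist.
std::size_t required_elements(const int n[3], const int embed[3], std::size_t stride,
                              std::size_t dist, std::size_t batch) {
    assert(batch > 0 && n[0] > 0 && n[1] > 0 && n[2] > 0);
    return layout_offset(embed, stride, dist,
                         static_cast<std::size_t>(n[0] - 1),
                         static_cast<std::size_t>(n[1] - 1),
                         static_cast<std::size_t>(n[2] - 1),
                         batch - 1) + 1;
}
```

For the contiguous call above (stride 1, dist 16 * 32 * 64, batch 2), required_elements evaluates to 65536, which matches the size * batch elements allocated for in and out.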

I'm closing this issue - please feel free to comment if you have any questions. We can open this issue or another issue if you run into additional problems.

upsj · 2021-02-23T08:08:14Z

I think you slightly misunderstood my use case: the interleaved, non-contiguous storage is intended, since for interface reasons we store the FFT for each batch as a column in a row-major matrix.
Formally, with dimensions (n,m,k) and batch count c, the index of the entry (x,y,z) in batch b is x*m*k*c + y*k*c + z*c + b.
The example I posted is minimized; we encounter the same issue in practice, and the identical invocation with cuFFT works.
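
The interleaved indexing can be sketched in plain C++ as follows. This is only an illustration of the formula in this comment; interleaved_index is a hypothetical helper name, not a hipFFT or rocFFT function. It corresponds to the reproducer's plan parameters istride = ostride = batch and idist = odist = 1 with inembed = onembed = sizes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: linear index of entry (x, y, z) of batch b when the
// batch is the innermost (fastest-varying) dimension. m and k are the second
// and third FFT dimensions, c is the batch count. Written in Horner form,
// this equals x*m*k*c + y*k*c + z*c + b.
std::size_t interleaved_index(std::size_t x, std::size_t y, std::size_t z, std::size_t b,
                              std::size_t m, std::size_t k, std::size_t c) {
    assert(y < m && z < k && b < c);
    return ((x * m + y) * k + z) * c + b;
}
```

With (n,m,k) = (16,32,64) and c = 2, consecutive batches of the same entry are adjacent (index 0 and index 1), stepping z by one moves c = 2 elements, and the last entry of the last batch lands at index 16*32*64*2 - 1 = 65535. The layout therefore fits exactly in the size * batch elements the reproducer allocates.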

evetsso · 2021-02-23T16:41:28Z

Ok, I see. I don't have an immediate solution to your problem but will investigate.

upsj · 2021-02-23T16:43:11Z

That's great to hear, thanks! Just let me know when you have a solution; I will disable the offending tests until then.

evetsso · 2021-02-26T20:26:40Z

f9006e4 fixes this in the develop branch. It should be included in the next release. Please comment/reopen if you still see problems.

evetsso self-assigned this Feb 23, 2021

evetsso closed this as completed Feb 23, 2021

evetsso reopened this Feb 23, 2021

upsj mentioned this issue Feb 25, 2021

Support CUDA-only builds ROCm/hipFFT#2

Closed

evetsso closed this as completed Feb 26, 2021

upsj mentioned this issue Sep 14, 2021

Add FFT LinOp ginkgo-project/ginkgo#701

Merged
