Batched 3D FFT memory access fault #311
Thanks for the bug report. It looks like a fix for this might already be coming in the next release, but I'll confirm.
@upsj After looking more closely at your test program, it looks like you've got some errors in it. This works:

```cpp
#include "hipfft.h"

int main() {
    hipfftDoubleComplex* in;
    hipfftDoubleComplex* out;
    hipfftHandle handle;
    int sizes[] = {16, 32, 64};
    int size = sizes[0] * sizes[1] * sizes[2];  // elements per 3D transform
    int batch = 2;
    size_t worksize;
    hipMalloc(&in, sizeof(hipfftDoubleComplex) * size * batch);
    hipMalloc(&out, sizeof(hipfftDoubleComplex) * size * batch);
    hipfftCreate(&handle);
    // Contiguous batches: istride = ostride = 1, idist = odist = size.
    hipfftMakePlanMany(handle, 3, sizes, sizes, 1, size, sizes, 1, size, HIPFFT_Z2Z, batch, &worksize);
    hipfftExecZ2Z(handle, in, out, HIPFFT_FORWARD);
    hipDeviceSynchronize();
    hipfftDestroy(handle);  // release the plan before freeing the buffers
    hipFree(out);
    hipFree(in);
}
```

I'm closing this issue; please feel free to comment if you have any questions. We can reopen this issue or open another one if you run into additional problems.
I think you slightly misunderstood my use case: the interleaved, non-contiguous storage is intentional. Because of interface considerations, we store the FFT for each batch as a column of a row-major matrix.
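To make the difference between the two layouts concrete, here is a small C++ sketch (not from the thread) of the advanced data layout addressing that cuFFT documents for its plan-many calls and that hipFFT mirrors: element `i` of batch `b` lives at `b * idist + i * istride`. It assumes `inembed` equals the transform sizes (no padding), and the names `layout_offset` and `max_offset` are hypothetical. The test program above uses `istride = 1, idist = size`; the interleaved layout described here corresponds to `istride = batch, idist = 1`, which puts element `i` of batch `b` at row `i`, column `b` of a row-major `size x batch` matrix.

```cpp
#include <cassert>
#include <cstddef>

// Offset (in elements) of FFT element `i` in batch `b` under the advanced
// data layout of the cuFFT/hipFFT plan-many calls. For rank 3 the general form is
//   b * idist + ((x * inembed[1] + y) * inembed[2] + z) * istride
// Assumption: inembed == n (no padding), so the inner term reduces to the
// row-major linear index i over the transform box.
std::size_t layout_offset(std::size_t b, std::size_t i,
                          std::size_t istride, std::size_t idist) {
    return b * idist + i * istride;
}

// Largest offset a batched transform touches. Because the offset grows with
// both b and i, it is reached at the last element of the last batch.
std::size_t max_offset(std::size_t size, std::size_t batch,
                       std::size_t istride, std::size_t idist) {
    assert(size > 0 && batch > 0);  // empty transforms have no last element
    return layout_offset(batch - 1, size - 1, istride, idist);
}
```

With `size = 16 * 32 * 64 = 32768` and `batch = 2`, both `max_offset(size, batch, 1, size)` (contiguous) and `max_offset(size, batch, batch, 1)` (interleaved) come out to 65535, so either layout stays inside a buffer of `size * batch` elements like the one allocated in the test program. Under that assumption, a page fault with the interleaved layout suggests a problem in the library rather than an undersized buffer.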
OK, I see. I don't have an immediate solution to your problem, but I will investigate.
That's great to hear, thanks! Just let me know when you have a solution; I will disable the offending tests until then.
f9006e4 fixes this in the develop branch. It should be included in the next release. Please comment or reopen if you still see problems.
I am working on a hipFFT wrapper for an HPC library, for which I hipified our cuFFT wrapper code essentially 1:1 (only replacing 64-bit calls with 32-bit calls). The CUDA and HIP-CUDA tests work and give correct results; only when compiling everything with rocFFT on an AMD device do I get a memory access fault. My guess is that this is related to the "interleaved batch" memory layout we are using, where the innermost dimension is the batch dimension, followed by the 3 FFT dimensions.
Let me know if you need any additional information.
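Porting from 64-bit to 32-bit plan calls means every size, stride and distance has to fit in an `int`. Below is a minimal sketch of a checked conversion, assuming the wrapper keeps its sizes as `long long` (the type cuFFT's `cufftMakePlanMany64` takes); `narrow_to_int` is a hypothetical helper, not part of either library.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical helper: copy 64-bit sizes/strides into `out` as int.
// Returns false (and leaves `out` empty) if any value is out of int range.
bool narrow_to_int(const std::vector<long long>& in, std::vector<int>& out) {
    out.clear();
    out.reserve(in.size());
    for (long long v : in) {
        if (v < INT_MIN || v > INT_MAX) {
            out.clear();
            return false;
        }
        out.push_back(static_cast<int>(v));
    }
    assert(out.size() == in.size());  // every value was converted
    return true;
}
```

The converted vectors can then be passed via `.data()` to `hipfftMakePlanMany`, which takes `int*` for `n`, `inembed` and `onembed`; scalar strides and distances need the same range check.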
What is the expected behavior
What actually happens
Memory access fault by GPU node-1 (Agent handle: ...) on address .... Reason: Page not present or supervisor privilege.
How to reproduce
(… `batch` to 1 works correctly)
Environment