Segmentation Fault in computing 1D FFT for different random sizes of arrays. #224

Closed
Pranay-Reddy-Kommera opened this issue Jul 29, 2019 · 12 comments
@Pranay-Reddy-Kommera
Pranay-Reddy-Kommera commented Jul 29, 2019

Code

What is the expected behavior

  • The code works as expected for any size of the array (for any value of N in the code).

What actually happens

  • The code works only for some values of N and produces a segmentation fault for other values of N. Below is the backtrace from gdb.
Program received signal SIGSEGV, Segmentation fault.
(gdb) where
#0  0x00002aaaac890dd4 in TransformPowX(ExecPlan const&, void**, void**, rocfft_execution_info_t*) () from /opt/rocm/lib/librocfft.so.0
#1  0x00002aaaac889d30 in rocfft_execute () from /opt/rocm/lib/librocfft.so.0
#2  0x0000000000400f74 in main () at hipsample.cpp:38

How to reproduce

  • Use the sample code from the README.md at https://github.com/ROCmSoftwarePlatform/rocFFT and change the value of N. The code works as expected for N = 16, 20, 25, and many other values, and produces a segmentation fault for N = 21, 22, 23, and many other values.

Hardware and library versions

  • CPU Device: AMD_EPYC_7551_32-Core_Processor
  • GPU Device: Radeon_MI25
  • Hip version: HIP_VERSION_MAJOR=1; HIP_VERSION_MINOR=5; HIP_VERSION_PATCH=19211
  • rocFFT Version: 0.9.3.0

Compilation and execution commands

export HIP_PLATFORM=hcc
hipcc -std=c++11 -O3 -g -c hipsample.cpp -lrocfft
hipcc -std=c++11 -O3 -g -o gpuCuda hipsample.o -lrocfft
./gpuCuda

Comments

  • I am not sure whether there is an issue with my environment or what else could be causing the error.

@dmcdougall

Pranay emailed me personally as well to report this issue. I am trying locally to see if I can recreate the problem.

@malcolmroberts
Contributor

This may have been fixed in PR 222, which is commit 923339e. Could you check whether an up-to-date develop branch resolves this issue?

@dmcdougall

I can reproduce the segfault with rocfft v0.9.4.0.

@malcolmroberts I will check with your commit hash.

@malcolmroberts
Contributor

Regarding reproduction: this didn't show up with the ROCm 2.6 compiler; it first appeared with 2.7, and possibly also with hip-clang.

@dmcdougall

Hmmm, I still get a segfault with 923339e. Are you saying that I also need to use a newer compiler? I'm using hip_hcc v1.5.19255 from the radeon repository.

@dmcdougall

Oops, I meant 'older', not 'newer'.

@malcolmroberts
Contributor

@dmcdougall: PR202 dealt with a memory issue with the 2.7 compilers; this issue didn't show up with the 2.6 compilers. You might need the 2.7 compilers in order to generate a segfault, assuming that this is the same issue. Sounds like this isn't a problem that you are having when trying to reproduce the issue.

One thing to check is if you have rocFFT already installed, then the HIP compilers will look first on /opt/rocm/... when linking (even if you do not add this to LD_LIBRARY_PATH). So one must remove all rocFFT libs in /opt/rocm, either by just deleting them or via package manager. One can check which versions are being used by looking at the output of "ldd gpuCuda".

@malcolmroberts self-assigned this Jul 30, 2019

@dmcdougall

You might need the 2.7 compilers in order to generate a segfault, assuming that this is the same issue. Sounds like this isn't a problem that you are having when trying to reproduce the issue.

With ROCm 2.6 and a custom build of rocfft (the tip of develop), I can still reproduce the segfault. So I think you're right about it not being related to the memory issues with 2.7 compilers.

One thing to check is if you have rocFFT already installed, then the HIP compilers will look first on /opt/rocm/... when linking (even if you do not add this to LD_LIBRARY_PATH).

Yep. I discovered that the hard way :)

So one must remove all rocFFT libs in /opt/rocm, either by just deleting them or via package manager. One can check which versions are being used by looking at the output of "ldd gpuCuda"

That's exactly what I did.

@malcolmroberts
Contributor

Thanks for checking the linking; looks like this isn't caused by the issue resolved by PR202.

I've been able to reproduce this issue on my local machine using the latest version.

I think that the problem is due to the fact that transforms for general sizes must be performed using the Bluestein algorithm, and this requires extra work memory. We use this method when the problem size is not composed of factors of 2, 3, and 5. When we use Bluestein and the work memory is not passed to rocfft_execute via a rocfft_execution_info structure, the program segfaults. When one passes the work memory, execution succeeds. A working example for this can be found in docs/samples/complex_1d.cpp.
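
To illustrate the explanation above, here is a minimal C++ sketch; it is not taken from rocFFT itself. The helper only_factors_2_3_5 (a hypothetical name) applies the rule exactly as described in this comment, which is consistent with N = 16, 20, 25 succeeding and N = 21, 22, 23 crashing. The second function, execute_with_work_buffer (also a hypothetical name), sketches one way to pass work memory through a rocfft_execution_info. It assumes an in-place plan operating on a device buffer and the header name rocfft.h, and it is compiled only under hipcc (detected via __HIPCC__) so that the helper still builds with a plain host compiler. docs/samples/complex_1d.cpp remains the reference example.

```cpp
#include <cstddef>

// True when n > 0 and n has no prime factors other than 2, 3 and 5.
// Per the comment above, sizes that fail this test are the ones expected
// to need Bluestein and therefore work memory. This mirrors the thread's
// description only; it is an assumption, not rocFFT's internal logic.
bool only_factors_2_3_5(std::size_t n)
{
    if (n == 0) return false;
    const std::size_t radices[] = {2, 3, 5};
    for (std::size_t r : radices)
        while (n % r == 0) n /= r;
    return n == 1;
}

#ifdef __HIPCC__
#include <hip/hip_runtime_api.h>
#include "rocfft.h" // header name assumed; adjust to your installation

// Hypothetical helper: run an existing in-place plan on the device buffer
// `data`, supplying work memory whenever the plan reports that it needs any.
rocfft_status execute_with_work_buffer(rocfft_plan plan, void* data)
{
    // Ask the plan itself how much work memory it wants, rather than
    // guessing from N.
    std::size_t work_bytes = 0;
    rocfft_status status = rocfft_plan_get_work_buffer_size(plan, &work_bytes);
    if (status != rocfft_status_success) return status;

    rocfft_execution_info info = nullptr;
    status = rocfft_execution_info_create(&info);
    if (status != rocfft_status_success) return status;

    void* work = nullptr;
    if (work_bytes > 0) {
        if (hipMalloc(&work, work_bytes) != hipSuccess) {
            rocfft_execution_info_destroy(info);
            return rocfft_status_failure;
        }
        status = rocfft_execution_info_set_work_buffer(info, work, work_bytes);
    }

    if (status == rocfft_status_success) {
        void* in[] = {data};
        // In-place transform: no separate output buffer is passed.
        status = rocfft_execute(plan, in, nullptr, info);
    }

    // Wait for the transform to finish before releasing the work memory.
    hipDeviceSynchronize();
    if (work != nullptr) hipFree(work);
    rocfft_execution_info_destroy(info);
    return status;
}
#endif // __HIPCC__
```

In the reporter's program, a call such as execute_with_work_buffer(plan, x) would take the place of the rocfft_execute call that passes no execution info. Querying the plan for its work buffer size is likely more robust than checking N by hand, since the size check above only restates the rule given in this thread.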

I'll look into improving the documentation and providing the user with better feedback about work memory.

Please let me know if this resolves the issue.

@Pranay-Reddy-Kommera
Author

@malcolmroberts Thank you for your suggestions. I will try to use rocfft_execution_info and pass the info to rocfft_execute. Will update in the comments if this resolves the issue or not.

@Pranay-Reddy-Kommera
Author

@malcolmroberts @dmcdougall The use of work memory via a rocfft_execution_info has resolved the segfault. I am able to execute the code and obtain expected results for any size of the array.

Thanks!

@malcolmroberts
Contributor

Good to hear!

We'll work on improving the interface and documentation so that this is easier to get working.
