Enable FFTW threading by default (to match up to performance of octave and others) #17000
Comments
Please read: http://docs.julialang.org/en/release-0.4/manual/performance-tips/ |
I had already read both tips. Wrapping fft(R) (which is a single function call) in a second function did nothing to improve performance. I also already ran each timing statement twice -- I just excluded the first, unhelpful, timing result for brevity's sake. |
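A minimal sketch of the timing approach described in this comment: call from inside a function and discard the first, compilation-dominated run. It assumes a current Julia 1.x with the FFTW.jl package installed (in the 0.4 releases discussed in this thread, `fft` lived in Base). The helper name `timed` is hypothetical and only for illustration.

```julia
using FFTW  # assumption: modern Julia, where fft is provided by FFTW.jl

# Hypothetical helper: time f(x) after one discarded warm-up call.
function timed(f, x; runs = 3)
    f(x)  # warm-up: pays for JIT compilation and FFTW planning
    return minimum(@elapsed(f(x)) for _ in 1:runs)
end

R = rand(512, 512)
t = timed(fft, R)
println("fft of a 512x512 matrix: ", t, " s")
```

Taking the minimum over a few runs is one common way to reduce noise; it is a design choice here, not something this thread prescribes.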
Cc @stevengj |
Hello, R=rand(5000,5000) Summary: Julia with two threads is about as fast as Octave. Julia with four threads is Versioninfo: GNU Octave, version 3.8.1 Octave was configured for "x86_64-pc-linux-gnu". |
I upgraded to Julia 0.4.5 from the Ubuntu ppa, but the timing results do not change. |
Hmmm. Yesterday, I had thought I was able to reproduce this on a VM, but I must have been mistaken, because attempting it again now on an Ubuntu Google Cloud instance I get the same results that you have shown. |
You are right, I had a typo. It is 0.17s for the 5000x5000 matrix with Octave. Does this mean that the issue can be closed, or is there still a problem on OS X? |
I'm still experiencing the issue on my computer. Is there any additional information I can provide to help diagnose? |
Since this sort of issue has come up a few times, perhaps we should add documentation to the various FFT functions in Julia about what they are comparable to in other languages (Matlab, R, etc.). |
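As an illustration of the kind of correspondence such documentation could spell out: on a matrix, Octave/Matlab `fft(R)` transforms each column, while Julia's `fft(R)` transforms along every dimension, which matches `fft2(R)`. The sketch below assumes a current Julia 1.x with FFTW.jl installed.

```julia
using FFTW  # assumption: modern Julia with the FFTW.jl package

R = rand(8, 8)

cols = fft(R, 1)  # 1-D FFT of each column, comparable to Octave/Matlab fft(R)
full = fft(R)     # FFT over both dimensions, comparable to Octave/Matlab fft2(R)

# The 2-D transform equals the column transform followed by the row transform.
@assert full ≈ fft(cols, 2)
```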
I agree it should be documented better but I don't think that was ever the issue here. (The very first post of this issue uses |
Alright, I found another Mac OS X computer to test this on, but it was quite old, running OS X 10.8.5. for 512x512 matrix for 5000x5000 matrix Here, Julia seems just slightly slower than Octave. Julia version info:
Octave version info: (running an older version because that was all I could get running quickly on OS 10.8.5.)
My computer running OS X 10.11.5 continues to exhibit the order of magnitude performance difference. Can anyone else reproduce this? |
Here's the result of profiling the rfft of a 5000x5000 matrix on my 10.11.5 computer: http://pastebin.com/nmr5Nyvn |
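For readers comparing `rfft` with `fft` timings: for real input, `rfft` computes only the non-redundant half of the spectrum along the first dimension, so it does roughly half the work. A small sketch, assuming a current Julia 1.x with FFTW.jl installed:

```julia
using FFTW  # assumption: modern Julia with the FFTW.jl package

R = rand(512, 512)

F = fft(R)   # complex result, size (512, 512)
H = rfft(R)  # complex result, size (257, 512): first dimension is 512 ÷ 2 + 1

# The rfft output matches the first 257 rows of the full transform.
@assert size(H) == (257, 512)
@assert H ≈ F[1:257, :]
```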
Julia on my MacBook is 3x slower than Octave
Maybe of interest, size of libraries in
and `/Applications/Julia-0.4.5.app/Contents/Resources/julia/lib/julia`
Used installation binaries |
Replacing the FFTW libraries from the Julia Mac OS X package with the FFTW libraries from the Octave Mac OS X package fixes the issue. Julia is now faster than Octave.
|
It would seem interesting to know the difference in how the two libraries were compiled. |
The speed of
The above test was run on a master build of Julia that is a few days old.
octave:1> R=rand(5000,5000); tic, a=fft2(R); toc
Elapsed time is 0.22766 seconds.
octave:2> R=rand(5000,5000); tic, a=fft2(R); toc
Elapsed time is 0.201882 seconds.
octave:3> R=rand(5000,5000); tic, a=fft2(R); toc
Elapsed time is 0.173801 seconds.
octave:4> ver
----------------------------------------------------------------------
GNU Octave Version 3.8.1
GNU Octave License: GNU General Public License
Operating System: Linux 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64
----------------------------------------------------------------------
Package Name | Version | Installation directory
--------------+---------+-----------------------
io | 2.2.9 | /home/guo/octave/io-2.2.9
mpi *| 1.1.1 | /usr/share/octave/packages/mpi-1.1.1
statistics | 1.2.4 | /home/guo/octave/statistics-1.2.4 |
@zhmz90: Could you please first mention which computer (CPU, clock speed) you use, and second, also test the speed with Octave? |
@ufechner7 I have added my test to the above post. The result shows |
@zhmz90: Is it a Mac or Linux machine? If Linux, which distribution/version? |
|
Who maintains the OS X package distribution? |
Which one? Exactly how did you install Julia, and which package distribution are you referring to? |
@tkelman Uh, the one displayed very prominently on Julia's web page: http://julialang.org/downloads/ Exact version info is in my first post. I have resolved my personal issue by replacing the libraries with libraries from Octave's distribution, but as @timholy noted, "It would seem interesting to know the difference in how the two libraries were compiled." |
Just wanted to check that you weren't getting it from homebrew or similar. So that build is produced from running a complete source build on our mac buildbots. Our makefile flags for fftw can be found under deps (maybe a handful of related flags in |
I can report similar performance issues on my machine (Mac OS X 10.11.5, 4GHz Intel Core i7). Octave is about 4 times faster than Julia to compute FFTs. As @loganwilliams suggested I copied the fftw libraries from the Octave package and this improved things. But Julia is still about 50% slower than Octave. See below for the results. In Octave (freshly installed from: https://sourceforge.net/projects/octave/files/Octave%20MacOSX%20Binary/2016-06-06-binary-octave-4.0.2/octave_gui_402.dmg/download )
Julia with FFTW shipping with Julia package: https://s3.amazonaws.com/julialang/bin/osx/x64/0.4/julia-0.4.6-osx10.7+.dmg
Julia with FFTW from octave package.
|
If it helps anyone, I had the same issue and I copied the fftw3* library files from julia 0.4.5 to julia 0.4.6 and recovered similar runtime to Matlab. |
I have a related question: isn't Julia starting up only with 1 FFTW thread by default? If so, why is that done when OpenBLAS is made to start with multiple threads? |
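For context, the thread count can be inspected and set explicitly. The sketch below assumes a current Julia 1.x with FFTW.jl 1.3 or newer (which provides `get_num_threads`); in the 0.4 releases discussed here only `FFTW.set_num_threads` from Base was available.

```julia
using FFTW  # assumption: FFTW.jl 1.3 or newer

# Request two planner threads, capped at what the machine reports.
n = min(2, Sys.CPU_THREADS)
FFTW.set_num_threads(n)

# Plans created from here on may use up to n threads.
println("FFTW threads: ", FFTW.get_num_threads())
```

The setting applies to plans created after the call, so it should come before the first timed `fft`.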
On MacOS with Julia master built from source, I get
which looks okay. |
With the official Julia 0.4.6 binary on MacOS, I get
which looks like zero compiler optimizations, which will definitely hurt performance. |
This is the same issue as #17751 (comment). The mac buildbot has a whole bunch of profile scripts that set |
(Here is how FFTW picks its cflags: https://github.com/FFTW/fftw3/blob/master/m4/ax_cc_maxopt.m4) |
Though the lack of optimization flags due to buildbot misconfiguration is a separate issue from the current title of
it is contributing to the difference in performance even when using the same number of threads. |
So a second issue should be created, like "Lack of optimization of FFTW due to buildbot misconfiguration" |
Or just change the title of this issue? |
Is a valid, but separate, issue to the buildbot flags problem. The latter should now be resolved, I believe. |
It is not only MacOS.
|
@GaborOszlanyi, that just means that the codelets are compiled with the same flags as the rest of FFTW, which is not necessarily a problem. Look at |
|
I would still like to create a second issue, because these are two different issues: |
@ufechner7 the second issue was already fixed, and does not require any changes in this repository. |
So issue b) will be fixed in the next binary releases of 0.4 and 0.5? That would be nice. You write that fixing the buildbots "does not require any changes in this repository". Is there a separate repository for the buildbots? |
Yep, the main one is wittingly named julia-buildbot |
Should be, if the fix was complete and correct. Once we resolve #18079 it should also be testable with 0.6-dev nightlies. |
As of yesterday the 0.4.6 Linux 64 binaries (julia-2e358ce975) shipped with the slow fftw libraries, which looked the same as the ones in 0.5 binaries. As I noted before, the binaries from 0.4.5 (2ac304d) are good, using those in 0.4.6 buys you a factor 5 speedup. Cheers! |
@tkelman Is this something we can fix in 0.5.x? |
Are you asking about multithreading, or are you asking about the other issue which now has its own #18245? Enabling threading by default is a bit much of a behavior change to backport I think. |
I thought #18245 was referring here for the fix for single-threaded perf. I don't think we should backport threading by default, but we should probably do it on master sooner rather than later for 0.6. |
Let's keep the discussions separate from now on. This issue is titled
and should stay focused on that going forward if we can. |
Just a side question to @tkelman out of curiosity: will FFTW move to a package in favor of another FFT implementation like MKL? Is it related to FFTW's GPL license? Thanks 😃 |
This issue should be reopened on FFTW.jl |
FFTW integration with julia's partr threads: JuliaMath/FFTW.jl#105 |
I've noticed that Julia is an order of magnitude slower to compute FFTs than GNU Octave. This discrepancy in speed confuses me, given that both Octave and Julia ought to be calling the same FFTW library. Is this expected?
Times for Julia:
Times for Octave:
After setting `FFTW.set_num_threads=2` and using `rfft` instead of `fft`, I saw a small improvement in Julia's performance, but a large discrepancy still remains.
I have reproduced this issue on my personal computer (OS X 10.11.5), and on a Google Compute Engine VM running Ubuntu 16.04. Here is my Julia `versioninfo()`: