Different performance results #2172

Closed
zkhatami88 opened this issue May 20, 2016 · 7 comments

Comments

@zkhatami88
Contributor

zkhatami88 commented May 20, 2016

I ran the STREAM benchmark from https://github.com/STEllAR-GROUP/hpx/blob/4324defe863c07f5d9dd8b1b27f11f0747717bee/tests/performance/local/stream.cpp at changeset 4324def. I also built the same benchmark out of tree. However, the two builds report very different results.
CMake file for the out-of-tree build:

cmake_minimum_required(VERSION 2.8)
project(test)

find_package(HPX REQUIRED NO_CMAKE_PACKAGE_REGISTRY)

set(SOURCE_FILES stream.cpp)  

add_hpx_executable(test
        ESSENTIAL
        SOURCES ${SOURCE_FILES}
        COMPONENT_DEPENDENCIES iostreams)

This is the output of the original (in-tree) build:

$ ./bin/stream --vector_size=1000000 --stream-threads=8 --stream-numa-domains=2
-------------------------------------------------------------
Modified STREAM bechmark based on
HPX version: V0.9.12-trunk (AGAS: V3.0), Git: 4324defe86
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000 (elements), Offset = 0 (elements)
Memory per array = 7.62939 MiB (= 0.00745058 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 16
Chunking policy requested: default
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 408 microseconds.
   (= 408 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           62713.2     0.000280     0.000255     0.000322
Scale:          59573.7     0.000296     0.000269     0.000309
Add:            64794.3     0.000439     0.000370     0.000531
Triad:          61374.5     0.000458     0.000391     0.000603

Total time: 0.017369 (per iteration: 0.0017369)
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

And this is the output of the out-of-tree build:

./test --vector_size=1000000 --stream-threads=8 --stream-numa-domains=2
-------------------------------------------------------------
Modified STREAM bechmark based on
HPX version: V0.9.12-trunk (AGAS: V3.0), Git: 4324defe86
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000 (elements), Offset = 0 (elements)
Memory per array = 7.62939 MiB (= 0.00745058 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 16
Chunking policy requested: default
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 6108 microseconds.
   (= 6108 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            6995.0     0.007351     0.002287     0.010746
Scale:            729.3     0.025018     0.021940     0.029070
Add:              748.1     0.037643     0.032082     0.045145
Triad:            739.3     0.034227     0.032464     0.036396

Total time: 1.08643 (per iteration: 0.108643)
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
@zkhatami88
Contributor Author

According to the output, the original build reports "Each test below will take on the order of 408 microseconds", whereas the out-of-tree build reports "Each test below will take on the order of 6108 microseconds".
I don't understand why the two differ: I ran both on the same machine with the same parameters, and both link to the same HPX.
Could it be caused by my CMake file?

@hkaiser hkaiser added this to the 0.9.12 milestone May 20, 2016
@hkaiser
Member

hkaiser commented May 20, 2016

Theoretically there shouldn't be any difference. I can't think of any reason except that there is some mix-up in the used libraries. It could be something else, however. Could you try to reproduce this on a different machine, please?

@zkhatami88
Contributor Author

For the original build, I have:
libhpx.so.0 => /home/zahra/Projects/HPX_Fork_2/build/lib/libhpx.so.0 (0x00007f078a798000)

And for the out-of-tree build, I have:
libhpx.so.0 => /home/zahra/Projects/HPX_Fork_2/build/lib/libhpx.so.0 (0x00007f34af709000)

Both executables link to the same library.

@biddisco
Contributor

Double-check that you didn't build your test with debug flags. Since most of HPX is header-only, linking to the same library doesn't guarantee much ...
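
One way to check this is to print the settings CMake will use for the application's own sources. The lines below are only a sketch, assumed to be added to the out-of-tree CMakeLists.txt after the `project(test)` call. The variables are standard CMake ones, and HPX's package configuration may contribute further flags that are not shown here.

```cmake
# Sketch: report the settings that govern how stream.cpp, and every HPX
# header it includes, is compiled in the out-of-tree build.
# With a single-configuration generator, an empty CMAKE_BUILD_TYPE means
# CMake adds no per-configuration optimization flags of its own.
message(STATUS "CMAKE_BUILD_TYPE:        '${CMAKE_BUILD_TYPE}'")
message(STATUS "CMAKE_CXX_FLAGS:         '${CMAKE_CXX_FLAGS}'")
message(STATUS "CMAKE_CXX_FLAGS_RELEASE: '${CMAKE_CXX_FLAGS_RELEASE}'")
```

If the build type prints as empty or as Debug, the header code is probably being compiled without optimization, even though the executable links against an optimized libhpx.so.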

@zkhatami88
Contributor Author

@biddisco You were right. Thanks! The problem was solved by setting CMAKE_CXX_FLAGS to -O3 -DNDEBUG.

@hkaiser
Member

hkaiser commented May 20, 2016

@zkhatami88 You should build with cmake -DCMAKE_BUILD_TYPE=Release instead.
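
Instead of passing the option on every configure, a project can also default to a Release build when no build type is given. This is a common CMake idiom rather than anything HPX-specific. The sketch below assumes a single-configuration generator (for example Unix Makefiles) and would sit near the top of the out-of-tree CMakeLists.txt.

```cmake
# Sketch, assuming a single-configuration generator: fall back to Release
# when the user did not pass -DCMAKE_BUILD_TYPE=... on the command line.
if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
endif()
```

An explicit `cmake -DCMAKE_BUILD_TYPE=Debug` still takes precedence, because the fallback only applies when the variable is empty.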

@hkaiser hkaiser closed this as completed May 20, 2016
@zkhatami88
Contributor Author

@hkaiser Yes, thank you so much for your suggestion. I applied it and it works.
