Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GMT 5 much slower than GMT 4? #587

Open
alomax opened this issue Mar 29, 2019 · 19 comments
Open

GMT 5 much slower than GMT 4? #587

alomax opened this issue Mar 29, 2019 · 19 comments
Labels
longterm Long standing issues that need to be resolved

Comments

@alomax
Copy link

alomax commented Mar 29, 2019

Equivalent command line scripts take more than twice as long to run in GMT 5 than in GMT 4

We use GMT extensively for maps, earthquake mechanisms and other plots in a time-sensitive, real-time system: http://early-est.alomax.net

The latest versions of our software using GMT 5 are broken because the plotting and reporting module does not complete fast enough. We suspect this may be primarily due to much the longer processing time needed for command line GMT 5 than for GMT 4:

~/temp> gmt --version
GMT Version 4.5.7
~/temp> time ./test_GMT4.bash 
real	0m0.212s
user	0m0.146s
sys	0m0.050s
~/temp> time ./test_GMT4.bash 
real	0m0.194s
user	0m0.136s
sys	0m0.043s
~/temp> time ./test_GMT4.bash 
real	0m0.244s
user	0m0.139s
sys	0m0.044s
~/temp> time ./test_GMT4.bash 
real	0m0.190s
user	0m0.132s
sys	0m0.043s

~/temp> gmt --version
5.4.5
~/temp> time test_GMT5.bash 
real	0m0.455s
user	0m0.597s
sys	0m0.450s
~/temp> time test_GMT5.bash 
real	0m0.477s
user	0m0.530s
sys	0m0.500s
~/temp> time test_GMT5.bash 
real	0m0.415s
user	0m0.703s
sys	0m0.366s
~/temp> time test_GMT5.bash 
real	0m0.420s
user	0m0.643s
sys	0m0.404s

Full scripts that show the difference in run time:

https://www.dropbox.com/s/0asx13o1ahiz1dr/test_GMT4.bash?dl=0
https://www.dropbox.com/s/cz0acpj4cueyatz/test_GMT5.bash?dl=0

System information

  • Operating system:
    System Version: OS X 10.10.5 (14F2511)
    Kernel Version: Darwin 14.5.0

  • Hardware:
    Model Name: iMac
    Model Identifier: iMac14,2
    Processor Name: Intel Core i7
    Processor Speed: 3.5 GHz
    Number of Processors: 1
    Total Number of Cores: 4
    L2 Cache (per Core): 256 KB
    L3 Cache: 8 MB
    Memory: 16 GB

  • Version of GMT: 4.5.7 and 5.4.5

@welcome
Copy link

welcome bot commented Mar 29, 2019

👋 Thanks for opening your first issue here! Please make sure you filled out the template with as much detail as possible. We appreciate that you took the time to contribute!

Please make sure you read our Contributing Guide and abide by our Code of Conduct.

@PaulWessel
Copy link
Member

Hi @alomax, you are right that GMT5 is slower than GMT4 for many tasks, especially table i/o. The key reason for this is that the entire i/o machinery was replaced to enable us to extend GMT to other platforms such as MATLAB, Python, Julia and more via the API. As such, the API is a bit immature and has room for optimizing. It would be helpful to see your actual scripts for the jobs that are timing out as we may be able to offer some advice for the short-term, and hopefully we can also determine which are the most critical bottlenecks. For instance, if you have fixed areas and you are running pscoast, consider running that once, store the PostScript overlay, and simply cat it in place as needed instead of re-running the same command, etc.

@joa-quim
Copy link
Member

But it's nevertheless strange that the script is made up of basically pscoast calls (+ psscale). why would this be slower?

@PaulWessel
Copy link
Member

Yes, those are both unaffected by the i/o changes so perhaps we need to run a time and eventually profiling of the calls.

@anbj
Copy link
Contributor

anbj commented Mar 30, 2019

Not really comparable, but maybe a relevant footnote; http://gmt.soest.hawaii.edu/boards/1/topics/7996.

@alomax
Copy link
Author

alomax commented Apr 1, 2019

Thanks for the extensive comments @PaulWessel, and also @joa-quim and @abanbj

Here are the scripts, but the ensemble is a monster! https://www.dropbox.com/s/8gsszu86ts4k9m9/EarlyEest_GMT5_scripts.tgz?dl=0

We do have a fixed, world map, but for each earthquake detected there is a regional map with center at the current earthquake epicenter.

Is perhaps the main increase in execution time GMT5 vs GMT4 due to the creation and management of a new session GMT_Create_Session() for each gmt command called from the command line in a script?
If so, is the absolute, most efficient way to run GMT5+ to use a single, compiled C driver program which calls all gmt "commands" as module functions through the C API?

Thanks,
Anthony

@joa-quim
Copy link
Member

joa-quim commented Apr 1, 2019

Don't think the GMT_Create_Session() is to blame here. I made a quick benchmark in Julia. Just created functions with GMT_Create_Session() and destroy.

function sess1()
	API = GMT.GMT_Create_Session("GMT", 2, GMT.GMT_SESSION_NOEXIT + GMT.GMT_SESSION_EXTERNAL + GMT.GMT_SESSION_COLMAJOR)
end


function sess2()
	API = GMT.GMT_Create_Session("GMT", 2, GMT.GMT_SESSION_NOEXIT + GMT.GMT_SESSION_EXTERNAL + GMT.GMT_SESSION_COLMAJOR)
	gmt("destroy") 
end

and timings are

julia> @btime sess2()
  23.003 ms (11 allocations: 416 bytes)

julia> @btime sess1()
  10.613 ms (0 allocations: 0 bytes)

So, actually, destroy takes a little longer than create. And these tests were done with a GMT debug build (if that even matter for these tests).

Ah, and remind that these timings include the Julia parsing too, though they presumably very small.

@alomax
Copy link
Author

alomax commented Apr 1, 2019

@joa-quim I was wondering if the whole session architecture and management overhead is the main difference GMT 5 vs GMT 4 when run as individual commands from the shell. Not just the initial GMT_Create_Session() call.

@joa-quim
Copy link
Member

joa-quim commented Apr 1, 2019

Yes, not impossible, but in that case a single, tailored, executable with all the commands for your needs wouldn't be substantially faster since it would use the same architecture anyway.

@alomax
Copy link
Author

alomax commented Apr 1, 2019

But in GMT 5 it looks like each call to gmt <module> through the command line sets up a new session (and presumably manages it and cleans up / destroys later), and then calls the module function.
line 93-: https://github.com/GenericMappingTools/gmt/blob/master/src/gmt.c

While doing all in a single program with the new API would only set up one session.

@joa-quim
Copy link
Member

joa-quim commented Apr 1, 2019

Yes, and that is why I tested the timings for creating and deleting new sessions. Aside from the time consumed in in creating/deleting new sessions, both the single exe and the multiple commands solutions should spend the same time in managing the sessions. Off course I may be wrong and nothing like testing.

BTW, maintaining a single session is what the Julia wrapper does for the GMT scripts, but one has to pay also the parsing time plus some memory copies that sometimes must be made.

@joa-quim
Copy link
Member

joa-quim commented Apr 1, 2019

A bit intriguing. psscale -Z is way slower that discrete.

disc = makecpt("-Crainbow -T-20/60/10");
cont = makecpt("-Crainbow -T-20/60/10 -Z");

tic()
psscale(D="x1i/5i+w4i/0.5i+h", B="+tdiscrete", I=0.5, C=disc)
toc()
tic()
psscale(D="x1i/7i+w4i/0.5i+h", B="+tcontinuous", I=0.5, C=cont)
toc()

timings vary a bit but the relation is close to this

elapsed time: 0.003415336 seconds
elapsed time: 0.152392377 seconds

@alomax
Copy link
Author

alomax commented Apr 2, 2019

But in GMT 5 it looks like each call to gmt <module> through the command line sets up a new session (and presumably manages it and cleans up / destroys later), and then calls the module function.
line 93-: https://github.com/GenericMappingTools/gmt/blob/master/src/gmt.c

While doing all in a single program with the new API would only set up one session.

@PaulWessel
Copy link
Member

I would think the i/o is where the slowness is but we will need to have a look. If you scripts do a bunch of GMT calls building a plot then it would be possible to write a C program that just called Create, a bunch of GMT_Call_Module with strings, and GMT_Destroy. So you would save all those Create/Destroy times which certainly takes some time. But I don't want to send you down that path with the promise of speedup without doing some more testing.

@alomax
Copy link
Author

alomax commented Apr 3, 2019

Yes, it may be that a combination of i/o and Create/Destroy in many gmt calls is a major cause of slowness. In our scripts, there is a set of many psxy calls which shows up in timing tests as a relatively large user of time:

# plot requested and active background colors
gmt psxy ${PLOT_DATA_PATH}/sta.requested.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL[0]} -W0.2,0/0/0 -G${STA_REQUESTED_COLOR}  -O -K >> ${PS_NAME}
gmt psxy ${PLOT_DATA_PATH}/sta.healthy.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL[0]} -W0.2,0/0/0 -G${STA_ACTIVE_COLOR}  -O -K >> ${PS_NAME}
# plot available warning levels - t50
gmt psxy ${PLOT_DATA_PATH}/t50.sta.ltgreen.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_LOWER_RIGHT} -G127/191/127  -O -K >> ${PS_NAME}
gmt psxy ${PLOT_DATA_PATH}/t50.sta.ltyellow.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_LOWER_RIGHT} -G191/191/127  -O -K >> ${PS_NAME}
gmt psxy ${PLOT_DATA_PATH}/t50.sta.ltred.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_LOWER_RIGHT} -G191/127/127  -O -K >> ${PS_NAME}
gmt psxy ${PLOT_DATA_PATH}/t50.sta.green.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_LOWER_RIGHT} -G0/255/0  -O -K >> ${PS_NAME}
gmt psxy ${PLOT_DATA_PATH}/t50.sta.yellow.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_LOWER_RIGHT} -G255/255/0  -O -K >> ${PS_NAME}
gmt psxy ${PLOT_DATA_PATH}/t50.sta.red.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_LOWER_RIGHT} -G255/0/0  -O -K >> ${PS_NAME}
# plot available warning levels - t50
gmt psxy ${PLOT_DATA_PATH}/tauc.sta.ltgreen.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_UPPER_LEFT} -G127/191/127  -O -K >> ${PS_NAME}
gmt psxy ${PLOT_DATA_PATH}/tauc.sta.ltyellow.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_UPPER_LEFT} -G191/191/127  -O -K >> ${PS_NAME}
gmt psxy ${PLOT_DATA_PATH}/tauc.sta.ltred.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_UPPER_LEFT} -G191/127/127  -O -K >> ${PS_NAME}
gmt psxy ${PLOT_DATA_PATH}/tauc.sta.green.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_UPPER_LEFT} -G0/255/0  -O -K >> ${PS_NAME}
gmt psxy ${PLOT_DATA_PATH}/tauc.sta.yellow.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_UPPER_LEFT} -G255/255/0  -O -K >> ${PS_NAME}
gmt psxy ${PLOT_DATA_PATH}/tauc.sta.red.xy ${R_VAL} ${J_VAL} -: ${N_VAL} ${SYMBOL_UPPER_LEFT} -G255/0/0  -O -K >> ${PS_NAME}

It does seem likely that an envelope C program instead of bash script would be (much) more efficient. But I will wait for more testing proof. And we do not currently have €€€€ for such development (€) and testing (€€€) of new envelope C programs...

@stale
Copy link

stale bot commented Jul 2, 2019

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions.

@stale stale bot added the stale This will not be worked on label Jul 2, 2019
@joa-quim joa-quim removed the stale This will not be worked on label Jul 3, 2019
@stale
Copy link

stale bot commented Oct 1, 2019

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions.

@stale stale bot added the stale This will not be worked on label Oct 1, 2019
@seisman
Copy link
Member

seisman commented Oct 2, 2019

It seems there is nothing we can do here. Closing?

@stale stale bot removed the stale This will not be worked on label Oct 2, 2019
@PaulWessel PaulWessel added the longterm Long standing issues that need to be resolved label Oct 2, 2019
@PaulWessel
Copy link
Member

No, simply have not had time to investigate - lets keep it alive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
longterm Long standing issues that need to be resolved
Projects
None yet
Development

No branches or pull requests

5 participants