VERDI_2.1.3 run time issues when creating and saving png of tileplot for each timestep. #296

Open
lizadams opened this issue Apr 7, 2022 · 2 comments

lizadams commented Apr 7, 2022

Describe the bug
A user (Roger Kwok) reported the following issue:

Following CMAS’s advice, we downloaded and installed VERDI 2.1.3.

I ran a C-shell batch script calling VERDI to create hourly spatial plots out of a 25-hour gridded netcdf file.

The runtimes in VERDI 2.1.3 are considerably longer than in version 2.1, as shown below.

 

| Case | usr total (sec) | sys total (sec) | Wall clock total (min) | VERDI version | Model and script locations (see tabs for full paths) |
|------|-----------------|-----------------|------------------------|---------------|------------------------------------------------------|
| Case 1 | 799.4 | 69.8 | 20.5 | VERDI 2.1.3 | Home; home |
| Case 2 | 811.2 | 71.0 | 23.3 | VERDI 2.1.3 | Home; work |
| Case 3 | 1013.6 | 106.5 | 29.9 | VERDI 2.1.3 | Runtime; runtime |
| Case 4 | 409.6 | 31.9 | 3.7 | VERDI 2.1 | Home; home |

 

Cases 1 through 3 run version 2.1.3 with settings that differ only in the locations of the model and the batch script.

Case 4 uses version 2.1.

I also noticed that the latest version takes up 21.0 GB while 2.1 occupies just 1.5 GB.

Our entire branch is switching to 2.1.3 nevertheless. I wonder, though, whether there are ways to reduce its runtime.

To Reproduce
Steps taken to reproduce the behavior:
The batch script, netCDF file, and config file are available in the following Google Drive folder:

https://drive.google.com/drive/folders/1ORTuAqEIjZUoP6Lz11fP7yDu9Qlpmg6S

Download and extract VERDI_2.1.3_linux64_20220106.tar.gz from the CMAS Center website to a remote machine (Azure CycleCloud)
Download and extract VERDI_2.1_linux64_20210920.tar.gz from the CMAS Center website to a remote machine (Azure CycleCloud)

Observed behavior
Modify the batch script to use local paths on the installed machine, and run it for both VERDI_2.1.3 and VERDI_2.1.

VERDI_2.1.3 creates a splash screen when this script is run, whereas VERDI_2.1 does not. If that splash screen is being drawn across a slow network connection, it will increase the time it takes to create each plot.
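One quick check of whether rendering is going over the network is to inspect DISPLAY before timing a run (a sketch; Batch_verdi_NOx_hrly.csh is the script from the report):

echo $DISPLAY                     # e.g. localhost:10.0 indicates SSH-forwarded X11
time ./Batch_verdi_NOx_hrly.csh   # compare against the same run under xvfb-run (see below)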

Here are the timings that I observed for VERDI_2.1:
usr,sys,min,sec,pcent,zzk,io,pfw
18.162u,0.802s,0,12.40,152.9%,0+0k,0+120io,0pf+0w
18.790u,0.777s,0,11.98,163.2%,0+0k,0+112io,0pf+0w
18.113u,0.822s,0,09.73,194.5%,0+0k,0+112io,0pf+0w
17.942u,0.801s,0,09.71,192.9%,0+0k,0+120io,0pf+0w
17.891u,0.807s,0,11.89,157.1%,0+0k,0+112io,0pf+0w
17.790u,0.812s,0,09.84,189.0%,0+0k,0+112io,0pf+0w
18.038u,0.823s,0,10.01,188.3%,0+0k,56+120io,3pf+0w
18.494u,0.827s,0,09.88,195.4%,0+0k,344+112io,1pf+0w
18.383u,0.754s,0,11.07,172.8%,0+0k,0+112io,0pf+0w
18.227u,0.795s,0,10.12,187.8%,0+0k,0+112io,0pf+0w
17.693u,0.800s,0,09.62,192.2%,0+0k,0+120io,0pf+0w
17.451u,0.763s,0,09.42,193.3%,0+0k,0+112io,0pf+0w
17.533u,0.751s,0,09.28,196.9%,0+0k,0+112io,0pf+0w
17.685u,0.829s,0,10.66,173.5%,0+0k,0+112io,0pf+0w
18.035u,0.777s,0,09.99,188.1%,0+0k,0+120io,0pf+0w
17.547u,0.805s,0,09.94,184.5%,0+0k,0+112io,0pf+0w
17.699u,0.772s,0,09.96,185.3%,0+0k,0+112io,0pf+0w
17.635u,0.803s,0,10.01,184.1%,0+0k,0+112io,0pf+0w
18.555u,0.816s,0,10.12,191.3%,0+0k,0+120io,0pf+0w
18.225u,0.838s,0,09.49,200.7%,0+0k,0+112io,0pf+0w
17.556u,0.787s,0,09.30,197.0%,0+0k,0+112io,0pf+0w
17.249u,0.773s,0,10.26,175.5%,0+0k,0+112io,0pf+0w
17.522u,0.834s,0,09.76,188.0%,0+0k,0+120io,0pf+0w
17.120u,0.773s,0,11.98,149.3%,0+0k,0+112io,0pf+0w
Sum of user time: 429.33 s (this is the number I compared against the usr total values in the table Roger Kwok reported above).
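For reference, the user-time column above can be summed with an awk one-liner (a sketch; the file name verdi_2.1_times.csv is hypothetical and is assumed to hold the header plus the lines above; awk coerces "18.162u" to 18.162, so the trailing "u" needs no stripping):

awk -F, 'NR > 1 { sum += $1 } END { print sum }' verdi_2.1_times.csv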

Here are the timings I observed for VERDI_2.1.3 using xvfb-run.
xvfb-run renders the plots on a virtual display on the machine where VERDI runs, rather than displaying back to a local machine (see install instructions below).
usr,sys,min,sec,pcent,zzk,io,pfw
21.798u,0.936s,0,07.28,312.0%,0+0k,32+120io,0pf+0w
22.231u,0.998s,0,08.13,285.6%,0+0k,0+112io,0pf+0w
22.615u,0.901s,0,07.60,309.3%,0+0k,0+112io,0pf+0w
22.056u,0.906s,0,08.16,281.2%,0+0k,0+112io,0pf+0w
21.249u,0.863s,0,07.28,303.5%,0+0k,0+120io,0pf+0w
22.482u,0.993s,0,08.30,282.7%,0+0k,0+112io,0pf+0w
22.708u,0.905s,0,08.31,283.9%,0+0k,0+112io,0pf+0w
21.393u,0.964s,0,07.59,294.4%,0+0k,0+112io,0pf+0w
22.757u,0.902s,0,08.26,286.3%,0+0k,0+112io,0pf+0w
22.811u,0.910s,0,08.27,286.8%,0+0k,0+120io,0pf+0w
23.414u,0.885s,0,08.36,290.5%,0+0k,0+112io,0pf+0w
22.499u,0.922s,0,08.10,289.0%,0+0k,0+112io,0pf+0w
22.340u,0.944s,0,08.02,290.2%,0+0k,0+112io,0pf+0w
21.674u,0.937s,0,07.44,303.7%,0+0k,0+112io,0pf+0w
22.396u,0.900s,0,08.01,290.7%,0+0k,0+120io,0pf+0w
21.469u,0.969s,0,07.48,299.7%,0+0k,0+120io,0pf+0w
22.475u,0.922s,0,08.01,292.0%,0+0k,0+112io,0pf+0w
22.781u,0.866s,0,08.07,292.9%,0+0k,0+176io,0pf+0w
21.669u,0.933s,0,07.42,304.4%,0+0k,0+112io,0pf+0w
21.371u,1.075s,0,07.79,288.0%,0+0k,0+120io,0pf+0w
21.926u,1.150s,0,07.98,289.0%,0+0k,0+112io,0pf+0w
21.342u,1.749s,0,08.02,287.7%,0+0k,0+112io,0pf+0w
22.444u,1.223s,0,08.45,280.0%,0+0k,0+112io,0pf+0w
21.837u,0.841s,0,07.27,311.8%,0+0k,0+120io,0pf+0w
Sum of user time: 531.7 s.

To install Xvfb, I used the following:

sudo yum install Xvfb

Then I ran the script using:

xvfb-run ./Batch_verdi_NOx_hrly.csh
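If the default display number is already in use, or a specific virtual screen size is wanted, xvfb-run's standard options can be passed (a sketch; the geometry is arbitrary):

xvfb-run --auto-servernum --server-args="-screen 0 1280x1024x24" ./Batch_verdi_NOx_hrly.csh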

Here are the timings for VERDI_2.1.3 displaying back to the local machine:
usr,sys,min,sec,pcent,zzk,io,pfw
24.648u,1.095s,0,15.41,166.9%,0+0k,6320+120io,4pf+0w
22.564u,0.812s,0,10.81,216.1%,0+0k,0+112io,0pf+0w
23.066u,0.814s,0,10.95,217.9%,0+0k,0+112io,0pf+0w
22.709u,0.836s,0,10.97,214.4%,0+0k,0+120io,0pf+0w
23.834u,0.945s,0,11.26,219.9%,0+0k,0+112io,0pf+0w
21.841u,0.833s,0,10.39,218.1%,0+0k,0+112io,0pf+0w
24.341u,1.155s,0,11.17,228.2%,0+0k,0+120io,0pf+0w
22.407u,0.840s,0,11.08,209.7%,0+0k,0+120io,0pf+0w
22.835u,0.951s,0,12.14,195.8%,0+0k,0+120io,0pf+0w
22.883u,0.944s,0,12.07,197.3%,0+0k,0+120io,0pf+0w
22.328u,0.877s,0,10.99,211.0%,0+0k,0+128io,0pf+0w
22.647u,0.910s,0,11.14,211.4%,0+0k,0+112io,0pf+0w
22.224u,0.913s,0,10.43,221.7%,0+0k,0+112io,0pf+0w
24.262u,0.988s,0,11.44,220.6%,0+0k,0+112io,0pf+0w
21.695u,1.087s,0,12.89,176.6%,0+0k,0+120io,0pf+0w
23.324u,0.881s,0,12.35,195.9%,0+0k,0+112io,0pf+0w
22.249u,0.898s,0,11.74,197.0%,0+0k,0+120io,0pf+0w
21.077u,0.820s,0,10.08,217.1%,0+0k,0+112io,0pf+0w
22.025u,0.938s,0,10.93,209.9%,0+0k,0+112io,0pf+0w
21.465u,0.860s,0,11.97,186.4%,0+0k,0+120io,0pf+0w
23.580u,1.000s,0,11.21,219.2%,0+0k,0+120io,0pf+0w
21.177u,0.869s,0,11.02,199.9%,0+0k,0+112io,0pf+0w
22.739u,0.870s,0,11.04,213.7%,0+0k,0+112io,0pf+0w
22.899u,0.902s,0,11.68,203.6%,0+0k,0+120io,0pf+0w
Sum of user time: 544.819 s.

See more at: http://elementalselenium.com/tips/38-headless

Running VERDI_2.1.3 with the splash screen displayed back to my local machine gives the worst performance.

Desktop (please complete the following information):

  • OS: Linux

Suggested resolution
Fix VERDI so that it does not display the splash screen when running in a script.

Another alternative is to use the batch script method, but doing so produces a memory-leak warning:
OpenJDK 64-Bit Server VM warning: Option --illegal-access is deprecated and will be removed in a future release.
Apr 07, 2022 4:36:08 PM org.geotools.map.MapContent finalize
SEVERE: Call MapContent dispose() to prevent memory leaks
Apr 07, 2022 4:36:20 PM org.geotools.map.MapContent finalize
SEVERE: Call MapContent dispose() to prevent memory leaks

Perhaps this memory leak is also contributing to the poor runtimes.

The batch script method is not ideal, as you need to specify a task for each hour, or write a C-shell script that iterates over the hours and generates the batch file (a sketch follows).
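For illustration, a minimal C-shell sketch (file name and zero-padding are hypothetical) that appends one <Task> block per hour instead of hand-writing all 24:

#!/bin/csh -f
# Append one <Task> block per hour to the VERDI batch file,
# after the <Global> block has been written.
set out = O3_tileplot_png.txt
@ ts = 1
while ( $ts <= 24 )
    set hh = `printf '%02d' $ts`
    cat >> $out << EOF
<Task>
ts=$ts
gtype=tile
saveImage=png
imageFile=EpaC3_NOx_tile_201900${hh}-th_hr.batch.png
</Task>

EOF
    @ ts++
end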

Batch script:

                ####################################################################################
                # NOTE: Batch Scripting Language
                #
                #    * All parameter/value pairs should be inside one of the two blocks --
                #        <Global/> or <Task/>
                #    * Number of blocks is not limited
                #    * Only one <Global/> block is recommended; if multiple <Global/>
                #        blocks are used, they should contain different items
                #    * Parameter values in <Task/> blocks will override those in <Global/>
                #    * Currently supported parameters (keys, case insensitive):
                #        configFile    -- configuration file full path
                #        f             -- dataset file path/name
                #        dir           -- dataset file folder
                #        pattern       -- dataset file name pattern
                #        gtype         -- plot type (tile, line, bar, vector)
                #        vector        -- vector plot variables
                #        vectorTile    -- vector plot variables
                #        s             -- variable name
                #        ts            -- time step
                #        titleString   -- plot title
                #        subdomain     -- xmin ymin xmax ymax
                #        subTitle1     -- plot subtitle one
                #        subTitle2     -- plot subtitle two
                #        saveImage     -- image file type (png, jpeg, eps, etc.)
                #        imageFile     -- image file path/name
                #        imageDir      -- image file folder
                #        drawGridLines -- draw grid lines on the tile plot if 'yes'
                #        imageWidth    -- image width
                #        imageHeight   -- image height
                #        unitString    -- units
                #
                #
                # Author: IE, UNC at Chapel Hill
                # Date: 10/27/2014
                # Version: 2
               #############################################################################


		<Global>
		configFile=/shared/build/Linear8colors.txt
		dir=/shared/build/
		pattern=*Gridded.hourly.point.20190101.1.4km.St_4k_by19fy19_SIP2019_C3.ncf
		s=NO[1]+NO2[1]
		imageDir=/shared/build
		</Global>

		#######################
		# Data files picked   #
		# up from patterns    #
		# specified for names #
		#######################

               <Task>
                ts=1
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190001-th_hr.batch.png
                </Task>

                <Task>
                ts=2
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190002-th_hr.batch.png
                </Task>

                <Task>
                ts=3
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190003-th_hr.batch.png
                </Task>

                <Task>
                ts=4
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190004-th_hr.batch.png
                </Task>

                <Task>
                ts=5
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190005-th_hr.batch.png
                </Task>

                <Task>
                ts=6
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190006-th_hr.batch.png
                </Task>

                <Task>
                ts=7
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190007-th_hr.batch.png
                </Task>

                <Task>
                ts=8
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190008-th_hr.batch.png
                </Task>

                <Task>
                ts=9
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190009-th_hr.batch.png
                </Task>

		
                <Task>
                ts=10
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190010-th_hr.batch.png
                </Task>

                <Task>
                ts=11
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190011-th_hr.batch.png
                </Task>

                <Task>
                ts=12
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190012-th_hr.batch.png
                </Task>

                <Task>
                ts=13
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190013-th_hr.batch.png
                </Task>

                <Task>
                ts=14
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190014-th_hr.batch.png
                </Task>

                <Task>
                ts=15
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190015-th_hr.batch.png
                </Task>

                <Task>
                ts=16
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190016-th_hr.batch.png
                </Task>

                <Task>
                ts=17
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190017-th_hr.batch.png
                </Task>

                <Task>
                ts=18
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190018-th_hr.batch.png
                </Task>

                <Task>
                ts=19
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190019-th_hr.batch.png
               </Task>

                <Task>
                ts=20
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190020-th_hr.batch.png
                </Task>

                <Task>
                ts=21
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190021-th_hr.batch.png
                </Task>

                <Task>
                ts=22
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190022-th_hr.batch.png
                </Task>

                <Task>
                ts=23
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190023-th_hr.batch.png
                </Task>

                <Task>
                ts=24
                gtype=tile
                saveImage=png
                imageFile=EpaC3_NOx_tile_20190024-th_hr.batch.png
                </Task>

time ../../verdi.sh -batch $cwd/O3_tileplot_png.txt

VERDI_2.1.3 time to create all 24 plots:
308.726u 4.647s 3:01.56 172.5% 0+0k 0+1392io 0pf+0w

time ../../../VERDI_2.1/verdi.sh -batch $cwd/O3_tileplot_png.txt
OpenJDK 64-Bit Server VM warning: Option --illegal-access is deprecated and will be removed in a future release.
Apr 07, 2022 4:42:23 PM org.geotools.map.MapContent finalize
SEVERE: Call MapContent dispose() to prevent memory leaks
Apr 07, 2022 4:42:35 PM org.geotools.map.MapContent finalize
..

VERDI_2.1 time to create all 24 plots:
298.966u 4.176s 3:01.07 167.4% 0+0k 661072+1392io 59pf+0w

There is not much difference in performance between VERDI_2.1 and VERDI_2.1.3 when using the batch method. However, I do think there is a memory leak that may be impacting performance in both versions.
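One way to gauge how often the GeoTools warning fires during a batch run is to count it in the combined output (a sketch; csh's |& pipes stderr along with stdout):

../../verdi.sh -batch $cwd/O3_tileplot_png.txt |& grep -c 'MapContent dispose'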

Uploaded a copy of the batch script that Liz created to try the verdi.sh -batch method; it will need to be edited to update the paths.
O3_tileplot_png.txt


lizadams commented Apr 7, 2022

Used top while the batch script method (verdi.sh -batch) was running, and it seems to indicate that VERDI 2.1.3 has a memory leak: memory usage continues to grow until the program finishes.
(screenshot: VERDI_2.1.3_top)
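A simple way to watch that growth from a second terminal is to sample the Java process's resident memory while the run is active (a sketch; it assumes VERDI runs as a java process, and Ctrl-C stops the loop):

# sample resident memory (RSS, kB) of java processes every 5 seconds
while ( 1 )
    ps -C java -o pid,rss,etime,args | head -5
    sleep 5
end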

Roger's method of running VERDI from the command line doesn't use as much memory:
./Batch_verdi_NOx_hrly.csh

(screenshot: VERDI_2.1.3_top_commandline)

Using xvfb-run with the command line script method appears to be the best option.
xvfb-run $cwd/Batch_verdi_NOx_hrly.csh

To use xvfb-run:

You need sudo permissions to install it on your remote machine. Use one of the following commands (Ubuntu or Red Hat):

sudo apt-get install xvfb

or

sudo yum install Xvfb

Then change directory to where your script is installed and try:

xvfb-run $cwd/Batch_verdi_NOx_hrly.csh


lizadams commented Apr 8, 2022

Follow-up email from Roger:

With the help of Sarika, I got Xvfb installed. The fix you suggested is indeed working. With Xvfb, the runtimes have been shortened to be comparable with those of VERDI 2.1:

 

| Case | usr total (sec) | sys total (sec) | Wall clock total (min) | VERDI version | Model and script locations (see tabs for full paths) |
|------|-----------------|-----------------|------------------------|---------------|------------------------------------------------------|
| Case 4 | 409.6 | 31.9 | 3.7 | VERDI 2.1 | Home; home |
| Case 5 | 551.0 | 41.7 | 3.9 | VERDI 2.1.3 | Home; home |
| Case 6 | 329.3 | 27.1 | 7.2 | VERDI 2.1.3 | Home; home |
| Case 7 | 260.1 | 20.8 | 6.0 | VERDI 2.1.3 | Home; home |

Script for Case 5: Xvfb_verdi_Nox_hrly.csh
Script for Case 6: a SLURM bash shell script that calls the Case 5 run script on a BIGMEM node
Script for Case 7: a SLURM bash shell script that calls the Case 5 run script but on an AMS node
