
Performance of hisfile netcdf3 vs netcdf4 with different chunks #583

Open
veenstrajelmer opened this issue Oct 13, 2023 · 0 comments

veenstrajelmer commented Oct 13, 2023

Since DIMRset 2.26.14 (7 Feb 2024, delft3d PR 637), the hisfile is also written in netCDF4 format instead of netCDF3. This performance test predates that feature, so we had to convert a netCDF3 hisfile to netCDF4 manually with:

module load netcdf/v4.9.0_v4.6.0_intel22.2.0
nccopy -k 'netCDF-4' 'DCSM-FM_0_5nm_0000_his.nc' 'DCSM-FM_0_5nm_0000_his_netcdf4.nc'
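To confirm which format a hisfile actually has before and after conversion, its leading bytes can be inspected (`ncdump -k file.nc` reports the same information). The sketch below uses a hypothetical helper `netcdf_kind` that is not part of dfm_tools. It assumes the HDF5 signature sits at offset 0, which is typical for files written by the netCDF library but not guaranteed by HDF5.

```python
# Leading bytes ("magic numbers") of the netCDF-3 on-disk formats
_NETCDF3_MAGIC = {
    b"CDF\x01": "netCDF-3 classic",
    b"CDF\x02": "netCDF-3 64-bit offset",
    b"CDF\x05": "netCDF-3 64-bit data (CDF5)",
}
# netCDF-4 files are HDF5 files; this is the 8-byte HDF5 signature
_HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def netcdf_kind(path) -> str:
    """Guess the netCDF format of a file from its first bytes (hypothetical helper).

    Assumption: the HDF5 signature is at offset 0 (HDF5 also allows 512, 1024, ...).
    netCDF-4 and netCDF-4 classic model are both reported as "netCDF-4 (HDF5)".
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head == _HDF5_SIGNATURE:
        return "netCDF-4 (HDF5)"
    return _NETCDF3_MAGIC.get(head[:4], "unknown")
```

For example, `netcdf_kind('DCSM-FM_0_5nm_0000_his_netcdf4.nc')` should report `netCDF-4 (HDF5)` after the nccopy conversion.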

Reading/plotting performance differs significantly depending on the chunks and on whether the file is netCDF3 or netCDF4:

import datetime as dt
import dfm_tools as dfmt
import xarray as xr
from dask.diagnostics import ProgressBar

file_nc = r'P:\archivedprojects\11208054-004-dcsm-fm\models\3D_DCSM-FM\2013-2017\B04_EOT20_RHO1_H1_H2\DFM_OUTPUT_DCSM-FM_0_5nm\DCSM-FM_0_5nm_0000_his.nc' # netcdf3 (original hisfile)
file_nc = r'P:\archivedprojects\11208054-004-dcsm-fm\models\3D_DCSM-FM\2013-2017\B04_EOT20_RHO1_H1_H2\DFM_OUTPUT_DCSM-FM_0_5nm\DCSM-FM_0_5nm_0000_his_netcdf4.nc' # netcdf4 (converted with nccopy); the last assignment is the one used

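# select one chunks setting; the last assignment is the one used
chunks = {} # engine's preferred chunks (on-disk chunks for netcdf4, a single chunk per variable for netcdf3)
chunks = {'time':1,'stations':10}
chunks = {'time':1000,'stations':10}
chunks = 'auto' # results in non-unified chunks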

# performance measurements can be influenced by (partial) caching of the file in memory
#netcdf3
#Open: 1581 sec (26 min) #Plot: 12646 sec (210 min)
#Open: 864  sec (14 min) #Plot: 2422 sec (40 min)
#Open: 705  sec (12 min) #Plot: 1323 sec (22 min)
#netcdf4
#Open: 0.3 sec (0 min) #Plot: ?? sec (?? min) #killed after 33 minutes of plotting (0% progress)
#Open: 20  sec (0 min) #Plot: 7505 sec (125 min)
#Open: 0.3 sec (0 min) #Plot: 3405 sec (56 min)

print('>> performance test opening: ',end='')
dtstart = dt.datetime.now()
ds = xr.open_dataset(file_nc,chunks=chunks)#,decode_cf=False,decode_times=False,decode_coords=False)
print(f'{(dt.datetime.now()-dtstart).total_seconds():.2f} sec')

ds = dfmt.preprocess_hisnc(ds)

print('>> performance test plotting: ',end='')
dtstart = dt.datetime.now()
ds_toplot = ds.salinity.isel(laydim=39).sel(stations='HOEKVHLD')
with ProgressBar(): # does not capture all of the elapsed time
    ds_toplot.plot()
print(f'{(dt.datetime.now()-dtstart).total_seconds():.2f} sec')
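One plausible contributor to the large differences is the number of dask chunks (and therefore read tasks) each chunks setting produces. A minimal sketch for counting them, using a hypothetical helper `n_chunks` and hypothetical dimension sizes (the actual sizes of the DCSM-FM hisfile are not stated here), and assuming that dimensions not listed in `chunks` are not split, as for a netCDF3 file without on-disk chunking:

```python
import math


def n_chunks(dim_sizes: dict[str, int], chunks: dict[str, int]) -> int:
    """Count the dask chunks of one variable for a per-dimension chunks dict.

    Assumption: a dimension missing from `chunks` stays a single chunk.
    """
    total = 1
    for dim, size in dim_sizes.items():
        chunk = chunks.get(dim, size)  # unlisted dimension: one chunk spanning it
        total *= math.ceil(size / chunk)
    return total


# hypothetical sizes, for illustration only
sizes = {"time": 100_000, "stations": 500, "laydim": 50}
for chunks in ({}, {"time": 1, "stations": 10}, {"time": 1000, "stations": 10}):
    print(chunks, n_chunks(sizes, chunks))
```

Under these assumptions, plotting a single station and layer still touches every time chunk, so `'time':1` requires one read per time step (100000 here), whereas `'time':1000` needs a thousand times fewer.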
@veenstrajelmer changed the title "Fix workinprogress_xarray_performance.py testcase" to "Performance of his netcdf3 vs netcdf4 with different chunks" on May 15, 2024
@veenstrajelmer changed the title "Performance of his netcdf3 vs netcdf4 with different chunks" to "Performance of hisfile netcdf3 vs netcdf4 with different chunks" on May 15, 2024