First and foremost, thanks for providing this fantastic tool.
I'm using wrf.getvar() to open a time series for several variables (HFX, LH, some of the diagnostic variables) that are stored in WRF-written netCDF files. Each file contains a single temporal value (for a 30-minute period). It's a 4-month WRF run, so there are many of these files (> 5000).
For shorter WRF runs with fewer files, I've passed getvar() a list of netCDF4.Dataset objects.
Now I'm hitting a limit for number of open files (OSError: [Errno 24] Too many open files).
Is there a "best practice" for reading a single variable from lots and lots of netCDF files? It seems that xarray (1) isn't yet supported for getvar() and (2) may not work well anyway because xarray.open_mfdataset seems to want to read every variable from each WRF file and is thus very slow.
My WRF files are netCDF4 (not netCDF4-classic), which seems to rule out netCDF4.MFDataset().
Is my best bet to use something like ncrcat to make a temporary netCDF file containing only the variable I want? This could work but would, I guess, require some digging to supply all the WRF output variables needed for some of the wrf.getvar() diagnostic variables.
This is the code I use. I haven't used it with more than 5000 files, but I think it should work.
import glob
from netCDF4 import Dataset
from wrf import getvar, ALL_TIMES

list_of_paths = glob.glob(r'../wrf/wrfout_d0*')  # list the files
list_of_paths.sort()  # sort the files; glob does not return them in any guaranteed order
wrflist = []
for path in list_of_paths:  # loop over every file, including the last one
    wrflist.append(Dataset(path))
HFX = getvar(wrflist, "HFX", timeidx=ALL_TIMES, method="join")  # join the variable from all files
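Note that the snippet above keeps every Dataset open at the same time, so with several thousand files it may still run into the Errno 24 limit. One possible workaround, sketched here under assumptions rather than offered as a wrf-python recommendation, is to open each file, pull out the variable, and close the file before opening the next one. The helper name `read_per_file` and its `extract` and `opener` parameters are hypothetical, introduced only for this illustration. `extract` should return data that has already been read into memory.

```python
from contextlib import closing


def read_per_file(paths, extract, opener=None):
    """Apply `extract` to each file in turn, with only one file open at a time.

    Hypothetical helper for illustration; not part of wrf-python or netCDF4.
    """
    if opener is None:
        # Imported lazily so the helper can also be tried with a stand-in opener.
        from netCDF4 import Dataset
        opener = Dataset

    pieces = []
    for path in sorted(paths):
        # closing() calls .close() on exit, which releases the file handle
        # before the next file is opened.
        with closing(opener(path)) as nc:
            pieces.append(extract(nc))
    return pieces


# Possible use (assumes numpy, netCDF4 and wrf-python are installed, and that
# each file holds HFX with a leading Time dimension):
#
#   import glob
#   import numpy as np
#   from wrf import getvar, ALL_TIMES
#
#   paths = glob.glob(r'../wrf/wrfout_d0*')
#   # Raw variable: read straight from the file.
#   hfx = np.concatenate(
#       read_per_file(paths, lambda nc: nc.variables["HFX"][:]), axis=0)
#   # Diagnostic variable: let getvar compute it per file, keeping the Time axis.
#   slp = np.concatenate(
#       read_per_file(paths, lambda nc: getvar(nc, "slp", timeidx=ALL_TIMES,
#                                              squeeze=False, meta=False)),
#       axis=0)
```

This trades the single getvar() call for a Python loop, and the result is a plain array without the xarray metadata that getvar() attaches by default.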
Thank you for sharing this code. Could you also add the code for writing the data (HFX) as a time series to a new netCDF file?