getvar() from many wrf output files? #94

Timothy-W-Hilton · 2019-06-14T00:05:48Z

First and foremost, thanks for providing this fantastic tool.

I'm using wrf.getvar() to open a time series for several variables (HFX, LH, some of the diagnostic variables) that are stored in WRF-written netCDF files. Each file contains a single temporal value (for a 30-minute period). It's a 4-month WRF run, so there are many of these files (> 5000).

For shorter WRF runs with fewer files, I've passed getvar() a list of netCDF4.Dataset objects.

Now I'm hitting a limit for number of open files (OSError: [Errno 24] Too many open files).

Is there a "best practice" for reading a single variable from lots and lots of netCDF files? It seems that xarray (1) isn't yet supported for getvar() and (2) may not work well anyway because xarray.open_mfdataset seems to want to read every variable from each WRF file and is thus very slow.

My WRF files are netCDF4 (not netCDF4-classic), which seems to rule out netCDF4.MFDataset().

Is my best bet to use something like ncrcat to make a temporary netCDF file containing only the variable I want? This could work but would, I guess, require some digging to supply all the WRF output variables needed for some of the wrf.getvar() diagnostic variables.

cross85 · 2019-12-20T16:04:06Z

This is the code I use. I haven't used it with more than 5000 files, but I think it should work.

import glob
from netCDF4 import Dataset
from wrf import getvar, ALL_TIMES

# List the files; glob returns them in arbitrary order, so sort them.
list_of_paths = sorted(glob.glob(r'../wrf/wrfout_d0*'))
# Open every file, including the last one.
wrflist = [Dataset(path) for path in list_of_paths]

# Join the variable from all files.
HFX = getvar(wrflist, "HFX", timeidx=ALL_TIMES, method="join")
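
Note that the snippet above keeps every Dataset open at once, so with thousands of files it can still run into the "Too many open files" limit from the original question. One possible workaround, offered only as a sketch, is to process the files in batches and close each batch before opening the next. The helper name `read_in_batches` and its parameters are hypothetical and not part of wrf-python; the sketch also assumes that the extracted result is already held in memory, so it stays valid after the files are closed.

```python
def read_in_batches(paths, open_file, extract, batch_size=100):
    """Apply `extract` to successive batches of opened files.

    Hypothetical helper (not part of wrf-python or netCDF4).
    At most `batch_size` files are open at any one time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    for start in range(0, len(paths), batch_size):
        handles = []
        try:
            # Open only the files belonging to this batch.
            for path in paths[start:start + batch_size]:
                handles.append(open_file(path))
            results.append(extract(handles))
        finally:
            # Close the batch even if opening or extraction failed.
            for handle in handles:
                handle.close()
    return results


# Possible usage with the names from the snippet above (untested assumption):
#
#   import xarray as xr
#   chunks = read_in_batches(
#       list_of_paths,
#       Dataset,
#       lambda files: getvar(files, "HFX", timeidx=ALL_TIMES, method="cat"),
#   )
#   HFX = xr.concat(chunks, dim="Time")  # assumes "cat" yields a "Time" dimension
```

A batch size well below the per-process open-file limit (see `ulimit -n`) should avoid the OSError, at the cost of one getvar() call per batch.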

rajkumar8581 · 2021-11-26T18:18:35Z

Thank you for sharing this code. Could you add the code for writing the data (HFX) as a time series to a new netCDF file?
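
No reply in this thread answers that request. As a hedged sketch: getvar() returns an xarray DataArray by default, and xarray can write it with to_netcdf(), but wrf-python attaches a `projection` attribute that is a Python object and cannot be stored as a netCDF attribute. The helper below, whose name `netcdf_safe_attrs` is hypothetical, conservatively converts every attribute value that is not a plain string or number to its string form. The output file name in the usage comment is also only an illustration.

```python
def netcdf_safe_attrs(attrs):
    """Return a copy of `attrs` whose values are all str, int, or float.

    Hypothetical helper: anything else (for example wrf-python's
    projection object, booleans, or lists) is replaced by str(value).
    """
    safe = {}
    for key, value in attrs.items():
        is_plain = isinstance(value, (str, int, float))
        if is_plain and not isinstance(value, bool):
            safe[key] = value
        else:
            safe[key] = str(value)
    return safe


# Possible usage with the HFX DataArray from the code above (untested assumption):
#
#   HFX.attrs = netcdf_safe_attrs(HFX.attrs)
#   HFX.to_netcdf("hfx_timeseries.nc")  # hypothetical output file name
```

If the coordinate variables carry similar non-serializable attributes, the same helper could be applied to each coordinate's attrs before writing.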

Timothy-W-Hilton mentioned this issue on Jan 9, 2020 in "Improve xarray support #16" (open).

erogluorhan assigned michaelavs Sep 1, 2020

erogluorhan added the "question" and "support" labels Sep 1, 2020

erogluorhan unassigned michaelavs Nov 4, 2022
