getvar() from many wrf output files? #94

Open
Timothy-W-Hilton opened this issue Jun 14, 2019 · 2 comments
Labels
question (Further information is requested), support (Support request opened by outside user/collaborator)

Comments

@Timothy-W-Hilton

First and foremost, thanks for providing this fantastic tool.

I'm using wrf.getvar() to open a time series for several variables (HFX, LH, some of the diagnostic variables) that are stored in WRF-written netCDF files. Each file contains a single temporal value (for a 30-minute period). It's a 4-month WRF run, so there are many of these files (> 5000).

For shorter WRF runs with fewer files I've passed getvar() a list of netCDF4.Dataset objects.

Now I'm hitting a limit for number of open files (OSError: [Errno 24] Too many open files).
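For context, Errno 24 means the process has reached its per-process limit on open file descriptors. Below is a minimal sketch for inspecting that limit (and optionally raising the soft limit up to the hard limit) using Python's standard `resource` module, which is available on Unix-like systems only. The helper names `open_file_limits` and `raise_soft_limit` are hypothetical, chosen for illustration. Raising the limit is a workaround, not a fix: with more than 5000 files the hard limit may still be too low.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Try to raise the soft limit to `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. Leaves the limit unchanged
    if `target` is not higher than the current soft limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some platforms refuse values the hard limit nominally allows.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling `open_file_limits()` before opening the Dataset objects shows how many files can be held open at once; if that number is below the file count, opening them all in one list will fail regardless of how getvar() is called.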

Is there a "best practice" for reading a single variable from lots and lots of netCDF files? It seems that xarray (1) isn't yet supported for getvar() and (2) may not work well anyway because xarray.open_mfdataset seems to want to read every variable from each WRF file and is thus very slow.

My WRF files are netCDF4 (not netCDF4-classic), which seems to rule out netCDF4.MFDataset().

Is my best bet to use something like ncrcat to make a temporary netCDF file containing only the variable I want? This could work but would, I guess, require some digging to supply all the WRF output variables needed for some of the wrf.getvar() diagnostic variables.

@cross85

cross85 commented Dec 20, 2019

This is the code I use. I haven't used it with >5000 files, but I think it should work.

import glob
from netCDF4 import Dataset
from wrf import getvar, ALL_TIMES

# List the files. glob returns paths in arbitrary order, so sort them explicitly.
list_of_paths = sorted(glob.glob(r'../wrf/wrfout_d0*'))

# Open every file in the list, including the last one.
wrflist = [Dataset(path) for path in list_of_paths]

# Join the variable from all files along a new file dimension.
HFX = getvar(wrflist, "HFX", timeidx=ALL_TIMES, method="join")
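The code above keeps every file open at once, so with more than 5000 files it may still hit the "Too many open files" error from the original question. One possible variant, sketched below and not run against real WRF output, opens the files in batches, extracts the variable from each batch, closes the files, and concatenates the pieces at the end. The names `batched` and `read_in_batches` and the default `batch_size=200` are hypothetical. The sketch assumes that getvar() reads the data into memory before the files are closed, and that with `method="cat"` and `squeeze=False` the result has a leading dimension named "Time".

```python
def batched(items, size):
    """Yield successive slices of `items` holding at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def read_in_batches(paths, varname, batch_size=200):
    """Read `varname` from many WRF files with at most `batch_size` files open at once."""
    # Third-party imports are local so that `batched` can be used without them.
    import xarray as xr
    from netCDF4 import Dataset
    from wrf import getvar, ALL_TIMES

    pieces = []
    for batch in batched(sorted(paths), batch_size):
        datasets = [Dataset(path) for path in batch]
        try:
            # method="cat" combines the batch along the Time dimension;
            # squeeze=False keeps that dimension even for a one-file batch.
            pieces.append(getvar(datasets, varname, timeidx=ALL_TIMES,
                                 method="cat", squeeze=False))
        finally:
            # Release the file handles before opening the next batch.
            for ds in datasets:
                ds.close()
    # Assumption: every piece has a leading dimension named "Time".
    return xr.concat(pieces, dim="Time")


# Example call (paths as in the snippet above):
#   import glob
#   HFX = read_in_batches(glob.glob(r'../wrf/wrfout_d0*'), "HFX")
```

The batch size only needs to stay below the open-file limit. This sketch uses `method="cat"` rather than `method="join"` so that each batch extends the time axis instead of adding a file dimension.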

@erogluorhan added the question and support labels on Sep 1, 2020
@rajkumar8581

Thank you for sharing this code. Could you add the code for writing the data (HFX) as a time series to a new netCDF file?
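One possible way to do this, offered as an untested sketch rather than a maintainer-endorsed recipe: with metadata enabled, getvar() returns an xarray.DataArray, which has a `to_netcdf()` method. The catch is that the attributes wrf-python attaches can include a `projection` object, which netCDF cannot store, so the sketch converts every attribute that is not a plain string or number to its string form first. The helper names `serializable_attrs` and `write_timeseries` and the output file name are hypothetical.

```python
def serializable_attrs(attrs):
    """Return a copy of `attrs` in which every value is a str, int, or float."""
    cleaned = {}
    for key, value in attrs.items():
        if isinstance(value, bool):
            cleaned[key] = int(value)      # store booleans as 0/1
        elif isinstance(value, (str, int, float)):
            cleaned[key] = value
        else:
            cleaned[key] = str(value)      # e.g. the wrf-python projection object
    return cleaned


def write_timeseries(data_array, out_path):
    """Write an xarray.DataArray (such as getvar() output) to a netCDF file."""
    out = data_array.copy(deep=False)      # leave the caller's attributes untouched
    out.attrs = serializable_attrs(out.attrs)
    out.to_netcdf(out_path)


# Example call, assuming HFX came from getvar() as above:
#   write_timeseries(HFX, "HFX_timeseries.nc")
```

The variable in the output file takes its name from the DataArray's `name`, which for the example above should be "HFX".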
