Add utilities for building catalogs #47
date_str_regex = r'\d{4}\-\d{4}|\d{6}\-\d{6}|\d{8}\-\d{8}|\d{10}Z\-\d{10}Z|\d{12}Z\-\d{12}Z|\d{10}\-\d{10}|\d{12}\-\d{12}'


def cesm2_cmip6_parser(filepath):
I've put together a file parser for the cesm2_cmip6 collection. However, I am not sure I am getting everything right, especially the experiment attribute for the Decadal Prediction (DCPP) output.
For instance, here's what I get for one file:
/glade/collections/cdg/timeseries-cmip6/DCPP/011-020/b.e11.BDP.f09_g16.1969-11.014/atm/proc/tseries/month_1/b.e11.BDP.f09_g16.1969-11.014.cam.h0.SOLIN.196911-197912.nc
In [1]: from cesm import cesm2_cmip6_parser
In [2]: f = "/glade/collections/cdg/timeseries-cmip6/DCPP/011-020/b.e11.BDP.f09_g16.1969-11.014/atm/proc/tseries/month_1/b.e11.BDP.f09_g16.1969-11.014.cam.h0.SOLIN.196911-197912.nc"
In [3]: cesm2_cmip6_parser(f)
Out[3]:
{'path': '/glade/collections/cdg/timeseries-cmip6/DCPP/011-020/b.e11.BDP.f09_g16.1969-11.014/atm/proc/tseries/month_1/b.e11.BDP.f09_g16.1969-11.014.cam.h0.SOLIN.196911-197912.nc',
'case': 'b.e11.BDP.f09_g16.1969-11.014',
'variable': 'SOLIN',
'date_range': '196911-197912',
'stream': 'cam.h0',
'component': 'atm',
'experiment': '1969-11'}
Note that I am getting experiment=1969-11. Is this right or should we treat DCPP outputs as a special case?
I seem to be getting the right attributes for outputs from other experiments:
In [4]: f2 = '/glade/collections/cdg/timeseries-cmip6/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001/atm/proc/tseries/month_1/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001.cam.h0.CLD_CAL_UN.187001-191912.nc'
In [5]: cesm2_cmip6_parser(f2)
Out[5]:
{'path': '/glade/collections/cdg/timeseries-cmip6/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001/atm/proc/tseries/month_1/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001.cam.h0.CLD_CAL_UN.187001-191912.nc',
'case': 'f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001',
'variable': 'CLD_CAL_UN',
'date_range': '187001-191912',
'stream': 'cam.h0',
'component': 'atm',
'experiment': 'CFMIP-amip-piForcing'}
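To make the behaviour above concrete, here is a minimal sketch of one way a parser could arrive at these values. It is not the code in this PR: the function name sketch_parser is hypothetical, and it assumes the layout <case>/<component>/proc/tseries/<frequency>/<case>.<stream>.<variable>.<date_range>.nc and that the experiment is the second-to-last dot-separated token of the case name. Under that last assumption a DCPP case naturally yields experiment=1969-11.

```python
from pathlib import PurePosixPath


def sketch_parser(filepath):
    """Hypothetical parser; assumes <case>/<component>/proc/tseries/<freq>/<file>."""
    path = PurePosixPath(filepath)
    case = path.parts[-6]       # directory named after the case
    component = path.parts[-5]  # e.g. 'atm'

    # File stem is assumed to be <case>.<stream>.<variable>.<date_range>
    rest, variable, date_range = path.stem.rsplit('.', 2)
    stream = rest[len(case) + 1:]  # drop '<case>.' to leave e.g. 'cam.h0'

    return {
        'path': filepath,
        'case': case,
        'variable': variable,
        'date_range': date_range,
        'stream': stream,
        'component': component,
        # Assumption: experiment is the token just before the trailing member number
        'experiment': case.split('.')[-2],
    }
```

Taking the experiment from a fixed position in the case name works for names like f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001, but for DCPP cases that position holds the start date, which is the special case in question.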
@andersy005 That's a good question. I don't know enough about how the DCPP is analyzed to give very good advice, but my instinct would be to treat 1969-11 as member_id rather than an experiment name. I would think it would be useful to read in multiple runs from the DCPP and align the time axes so that all the runs covering a specified time period can be looked at simultaneously. @sgyeager would be a good person to ask, though you'll probably need to catch him up on the purpose of intake, intake-esm, and intake-esm-datastore.
I guess I don't know what the 014 would be if 1969-11 was the member_id, so obviously I didn't think the above comment through very well...
For CMORIZED output of CMIP6, we ended up creating an extra dcpp_init_year attribute (dimension) for DCPP output.
For instance, /glade/u/home/abanihi/collections/cmip/CMIP6/DCPP/NCAR/CESM1-1-CAM5-CMIP5/dcppA-hindcast/s1968-r2i1p1f1 would end up with dcpp_init_year=1968, member_id=r2i1p1f1. I am now wondering whether we can have b.e11.BDP.f09_g16.1969-11.014 --> dcpp_init_year=1969, member_id=014. I am not sure what the experiment would be in this case though (DCPP maybe?)
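A rough sketch of that mapping, for illustration only. The helper name parse_dcpp_case and the regular expression are hypothetical; the sketch assumes every DCPP case name ends in .<YYYY>-<MM>.<NNN>, and the experiment value 'DCPP' is just the tentative guess from this comment, not a settled convention.

```python
import re

# Assumed shape of a DCPP case name: <prefix>.<YYYY>-<MM>.<NNN>
DCPP_CASE_RE = re.compile(r'^.+\.(?P<year>\d{4})-(?P<month>\d{2})\.(?P<member>\d{3})$')


def parse_dcpp_case(case):
    """Return DCPP attributes for a case name, or None if it does not look like DCPP."""
    match = DCPP_CASE_RE.match(case)
    if match is None:
        return None
    return {
        'dcpp_init_year': int(match.group('year')),
        'member_id': match.group('member'),
        'experiment': 'DCPP',  # tentative placeholder, still an open question
    }
```

With this, b.e11.BDP.f09_g16.1969-11.014 maps to dcpp_init_year=1969 and member_id=014, while a non-DCPP case such as f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001 returns None and can fall through to the generic parsing.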
What does 011-020 in /glade/collections/cdg/timeseries-cmip6/DCPP/011-020/ stand for?
@andersy005 The 011-020 in /glade/collections/cdg/timeseries-cmip6/DCPP/011-020/ is the sub-experiment id. I know you're referencing CESM time series here, but this CMIP6 document might provide some clarification on the terminology: http://goo.gl/v1drZl (page 17 contains the directory structure information and page 14 has the file naming conventions).
I am going to merge this as is. I've created an issue for the CESM2-CMIP6 discussion in #50.
Towards #43