Add utilities for building catalogs #47

Merged
merged 16 commits into from
Jan 7, 2020

andersy005 (Contributor)

Towards #43

```python
date_str_regex = r'\d{4}\-\d{4}|\d{6}\-\d{6}|\d{8}\-\d{8}|\d{10}Z\-\d{10}Z|\d{12}Z\-\d{12}Z|\d{10}\-\d{10}|\d{12}\-\d{12}'


def cesm2_cmip6_parser(filepath):
    ...  # body not shown in this review context
```
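For illustration, the snippet below runs `date_str_regex` through `re.search` on the SOLIN file name discussed in this thread and on a hypothetical daily-resolution file name. `extract_date_range` is a hypothetical helper, not part of the PR.

```python
import re

# Pattern copied verbatim from the diff above.
date_str_regex = r'\d{4}\-\d{4}|\d{6}\-\d{6}|\d{8}\-\d{8}|\d{10}Z\-\d{10}Z|\d{12}Z\-\d{12}Z|\d{10}\-\d{10}|\d{12}\-\d{12}'


def extract_date_range(filename):
    """Return the leftmost date range found in `filename`, or None (hypothetical helper)."""
    match = re.search(date_str_regex, filename)
    return match.group() if match else None


# Monthly file from this thread (YYYYMM-YYYYMM).
monthly = "b.e11.BDP.f09_g16.1969-11.014.cam.h0.SOLIN.196911-197912.nc"
# Hypothetical daily-resolution file name (YYYYMMDD-YYYYMMDD).
daily = "b.e11.BDP.f09_g16.1969-11.014.cam.h1.SOLIN.19691101-19791231.nc"

print(extract_date_range(monthly))  # 196911-197912
print(extract_date_range(daily))    # 19691101-19791231
```

Because `re.search` returns the leftmost match, the `1969-11` start date inside the case name is not mistaken for a date range: `\d{4}\-\d{4}` needs four digits after the hyphen, and only two follow.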
Contributor Author

@mnlevy1981 & @matt-long,

I've put together a file parser for the cesm2_cmip6 collection; however, I am not sure I am getting everything right, especially the experiment attribute for the Decadal Climate Prediction Project (DCPP) output.

For instance, here's what I get for one file:

  • /glade/collections/cdg/timeseries-cmip6/DCPP/011-020/b.e11.BDP.f09_g16.1969-11.014/atm/proc/tseries/month_1/b.e11.BDP.f09_g16.1969-11.014.cam.h0.SOLIN.196911-197912.nc

```python
In [1]: from cesm import cesm2_cmip6_parser

In [2]: f = "/glade/collections/cdg/timeseries-cmip6/DCPP/011-020/b.e11.BDP.f09_g16.1969-11.014/atm/proc/tseries/month_1/b.e11.BDP.f09_g16.1969-11.014.cam.h0.SOLIN.196911-197912.nc"

In [3]: cesm2_cmip6_parser(f)
Out[3]:
{'path': '/glade/collections/cdg/timeseries-cmip6/DCPP/011-020/b.e11.BDP.f09_g16.1969-11.014/atm/proc/tseries/month_1/b.e11.BDP.f09_g16.1969-11.014.cam.h0.SOLIN.196911-197912.nc',
 'case': 'b.e11.BDP.f09_g16.1969-11.014',
 'variable': 'SOLIN',
 'date_range': '196911-197912',
 'stream': 'cam.h0',
 'component': 'atm',
 'experiment': '1969-11'}
```

Note that I am getting experiment=1969-11. Is this right, or should we treat DCPP output as a special case?

I seem to be getting the right attributes for outputs from other experiments:

```python
In [4]: f2 = '/glade/collections/cdg/timeseries-cmip6/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001/atm/proc/tseries/month_1/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001.cam.h0.CLD_CAL_UN.187001-191912.nc'

In [5]: cesm2_cmip6_parser(f2)
Out[5]:
{'path': '/glade/collections/cdg/timeseries-cmip6/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001/atm/proc/tseries/month_1/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001.cam.h0.CLD_CAL_UN.187001-191912.nc',
 'case': 'f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001',
 'variable': 'CLD_CAL_UN',
 'date_range': '187001-191912',
 'stream': 'cam.h0',
 'component': 'atm',
 'experiment': 'CFMIP-amip-piForcing'}
```
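Both outputs are consistent with the experiment being the second-to-last dot-separated field of the case name, which would explain why the DCPP case yields `1969-11`. Below is a minimal sketch under that assumption. It is a hypothetical reconstruction, not the PR's actual `cesm2_cmip6_parser`: the name `parse_timeseries_path` and the assumed `<case>/<component>/proc/tseries/<frequency>/<file>` layout are illustrative only.

```python
import re
from pathlib import PurePosixPath

# Same pattern as date_str_regex in the diff.
date_str_regex = r'\d{4}\-\d{4}|\d{6}\-\d{6}|\d{8}\-\d{8}|\d{10}Z\-\d{10}Z|\d{12}Z\-\d{12}Z|\d{10}\-\d{10}|\d{12}\-\d{12}'


def parse_timeseries_path(filepath):
    """Split a CESM time-series path into catalog attributes (hypothetical sketch).

    Assumes the layout .../<case>/<component>/proc/tseries/<frequency>/<file>,
    where <file> is "<case>.<stream>.<variable>.<date_range>.nc".
    """
    path = PurePosixPath(filepath)
    case = path.parts[-6]
    component = path.parts[-5]
    # Strip the "<case>." prefix and ".nc" suffix, e.g. "cam.h0.SOLIN.196911-197912".
    rest = path.name[len(case) + 1 : -len(".nc")]
    # Assumes a date range is present; .group() raises AttributeError otherwise.
    date_range = re.search(date_str_regex, rest).group()
    # Whatever precedes ".<date_range>" is "<stream>.<variable>".
    stream, variable = rest[: -(len(date_range) + 1)].rsplit(".", 1)
    return {
        "path": filepath,
        "case": case,
        "variable": variable,
        "date_range": date_range,
        "stream": stream,
        "component": component,
        # Second-to-last field of the case name; for DCPP cases this is the start date.
        "experiment": case.split(".")[-2],
    }


f = "/glade/collections/cdg/timeseries-cmip6/DCPP/011-020/b.e11.BDP.f09_g16.1969-11.014/atm/proc/tseries/month_1/b.e11.BDP.f09_g16.1969-11.014.cam.h0.SOLIN.196911-197912.nc"
print(parse_timeseries_path(f)["experiment"])  # 1969-11
```

Under this rule, any DCPP case name ending in `.<YYYY>-<MM>.<member>` will report its start date as the experiment, so DCPP output would need a special case if that is not the desired behavior.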

Contributor

@andersy005 That's a good question. I don't know enough about how the DCPP output is analyzed to give very good advice, but my instinct would be to treat 1969-11 as the member_id rather than an experiment name. I would think it would be useful to read in multiple runs from the DCPP and align their time axes so that all the runs covering a specified time period can be looked at simultaneously. @sgyeager would be a good person to ask, though you'll probably need to catch him up on the purpose of intake, intake-esm, and intake-esm-datastore.
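A catalog-side sketch of that idea, assuming entries shaped like the parser output above (a `case` and a `date_range` field): select the runs whose date range includes a given year so they can be opened together. `covers_year`, `runs_covering`, and the second entry are hypothetical, for illustration only.

```python
def covers_year(date_range, year):
    """True if a 'YYYY...-YYYY...' date range includes `year` (only the leading 4 digits are compared)."""
    start, end = date_range.split("-")
    return int(start[:4]) <= year <= int(end[:4])


def runs_covering(entries, year):
    """Return the sorted, de-duplicated case names whose date range includes `year`."""
    return sorted({e["case"] for e in entries if covers_year(e["date_range"], year)})


entries = [
    {"case": "b.e11.BDP.f09_g16.1969-11.014", "date_range": "196911-197912"},
    # Hypothetical second DCPP run, for illustration only.
    {"case": "b.e11.BDP.f09_g16.1985-11.014", "date_range": "198511-199512"},
]
print(runs_covering(entries, 1975))  # ['b.e11.BDP.f09_g16.1969-11.014']
```

Aligning the time axes themselves would then happen after the selected files are opened; this sketch only covers the selection step.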

Contributor

I guess I don't know what the 014 would be if 1969-11 were the member_id, so obviously I didn't think the above comment through very well...

Contributor Author

For the CMORized CMIP6 output, we ended up creating an extra dcpp_init_year attribute (dimension) for DCPP output.

For instance, /glade/u/home/abanihi/collections/cmip/CMIP6/DCPP/NCAR/CESM1-1-CAM5-CMIP5/dcppA-hindcast/s1968-r2i1p1f1 would end up with dcpp_init_year=1968 and member_id=r2i1p1f1. I am now wondering whether we can map b.e11.BDP.f09_g16.1969-11.014 to dcpp_init_year=1969 and member_id=014. I am not sure what the experiment would be in this case, though (DCPP, maybe?).
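A minimal sketch of the proposed mapping, assuming CESM DCPP case names always end in `.<YYYY>-<MM>.<member>` and CMORized directory names follow `s<YYYY>-r<i>i<i>p<i>f<i>`. `dcpp_attributes` and both patterns are hypothetical; the start month is matched but not kept, following the proposal above.

```python
import re

# CESM case names such as "b.e11.BDP.f09_g16.1969-11.014" (assumed pattern).
CESM_DCPP_CASE = re.compile(r"\.(?P<year>\d{4})-(?P<month>\d{2})\.(?P<member>\d+)$")
# CMORized directory names such as "s1968-r2i1p1f1" (assumed pattern).
CMOR_DCPP_DIR = re.compile(r"^s(?P<year>\d{4})-(?P<member>r\d+i\d+p\d+f\d+)$")


def dcpp_attributes(name):
    """Map a DCPP case or directory name to dcpp_init_year and member_id, or None."""
    for pattern in (CESM_DCPP_CASE, CMOR_DCPP_DIR):
        match = pattern.search(name)
        if match:
            return {"dcpp_init_year": int(match["year"]), "member_id": match["member"]}
    return None  # not a recognizable DCPP name


print(dcpp_attributes("b.e11.BDP.f09_g16.1969-11.014"))
print(dcpp_attributes("s1968-r2i1p1f1"))
```

Returning `None` for non-DCPP names lets a parser fall back to its usual experiment logic for cases such as the CFMIP run above.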

Contributor Author

What does 011-020 in /glade/collections/cdg/timeseries-cmip6/DCPP/011-020/ stand for?


@andersy005 The 011-020 in /glade/collections/cdg/timeseries-cmip6/DCPP/011-020/ is the sub-experiment id. I know you're referencing CESM time series here, but this CMIP6 document might provide some clarification on the terminology: http://goo.gl/v1drZl (page 17 contains the directory structure information and page 14 has the file naming conventions).
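If the sub-experiment id is to become a catalog attribute, it can be read off the directory layout. A sketch, assuming it is always the component directly under `DCPP/`; the helper name `dcpp_sub_experiment` and its `collection_dir` parameter are hypothetical.

```python
from pathlib import PurePosixPath


def dcpp_sub_experiment(filepath, collection_dir="DCPP"):
    """Return the path component right after `collection_dir`, or None if absent."""
    parts = PurePosixPath(filepath).parts
    if collection_dir not in parts:
        return None
    index = parts.index(collection_dir)
    return parts[index + 1] if index + 1 < len(parts) else None


f = "/glade/collections/cdg/timeseries-cmip6/DCPP/011-020/b.e11.BDP.f09_g16.1969-11.014/atm/proc/tseries/month_1/b.e11.BDP.f09_g16.1969-11.014.cam.h0.SOLIN.196911-197912.nc"
print(dcpp_sub_experiment(f))  # 011-020
```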

andersy005 marked this pull request as ready for review January 7, 2020 00:35
andersy005 (Contributor Author)

I am going to merge this as is. I've created issue #50 for the CESM2-CMIP6 discussion.

andersy005 merged commit 3e46a0e into NCAR:master on Jan 7, 2020

3 participants