# The String Constructor


One of the most tedious parts of dealing with CMIP data is constructing paths and file names, usually in loops.

String constructors help build such paths programmatically.

These objects are defined from a template and a set of **keys** in the template.

For example, when I load data as follows:

In [23]:
import cdms2
gfdl_pr_file = cdms2.open("/global/cscratch1/sd/cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/historical/r1i1p1f1/Amon/pr/gr1/v20180701/pr_Amon_GFDL-CM4_historical_r1i1p1f1_gr1_195001-201412.nc")

I follow a local template to locate the NetCDF file. That template can be represented as follows:

[root]/[collection]/[type]/[institution]/[model]/[experiment]/[member]/[table]/[variable]/[grid]/[version]/[variable]_[table]_[model]_[experiment]_[member]_[grid]_[period].nc

We could certainly loop over some of these fields and rebuild the path by hand each time, but that quickly becomes hard to read.

Instead, let's create a string constructor object.

For a `StringConstructor` object, the *keys* must be delimited as `%(key)`:

In [24]:
import genutil
template = '%(root)/%(collection)/%(type)/%(institution)/%(model)/%(experiment)/%(member)/%(table)/%(variable)/%(grid)/%(version)/%(variable)_%(table)_%(model)_%(experiment)_%(member)_%(grid)_%(period).nc'
path = genutil.StringConstructor(template)

We can easily retrieve the defined keys as follows:

In [25]:
path.keys()

['root',
 'collection',
 'type',
 'institution',
 'model',
 'experiment',
 'member',
 'table',
 'variable',
 'grid',
 'version',
 'period']
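To make the mechanics a bit more concrete, here is a minimal standard-library sketch of how such a key list could be pulled out of a `%(key)` template. The helper `extract_keys` is hypothetical and only for illustration; it is not how `genutil` is implemented, it just reproduces the same list for this template.

```python
import re

# Hypothetical helper (not part of genutil): find %(key) placeholders.
KEY_PATTERN = re.compile(r"%\((\w+)\)")


def extract_keys(template):
    """Return placeholder names in order of first appearance, without repeats."""
    keys = []
    for name in KEY_PATTERN.findall(template):
        if name not in keys:
            keys.append(name)
    return keys


template = ('%(root)/%(collection)/%(type)/%(institution)/%(model)/'
            '%(experiment)/%(member)/%(table)/%(variable)/%(grid)/%(version)/'
            '%(variable)_%(table)_%(model)_%(experiment)_%(member)_%(grid)_%(period).nc')

keys = extract_keys(template)
print(keys)
```

Note that keys such as `variable` or `model` appear twice in the template but only once in the list, which is why setting them once is enough.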

In our case the `root` is fixed to `/global/cscratch1/sd/cmip6`.
Our collection in this example will be `CMIP6`, with the `type` `CMIP`. Let's set these:

In [26]:
path.root = '/global/cscratch1/sd/cmip6'  # set in one place, so it is easy to change from one machine to another
path.collection = 'CMIP6'
path.type = 'CMIP'

Going back to our original example (`/global/cscratch1/sd/cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/historical/r1i1p1f1/Amon/pr/gr1/v20180701/pr_Amon_GFDL-CM4_historical_r1i1p1f1_gr1_195001-201412.nc`), let's fill in the remaining keys, this time using the variable `tas` instead of `pr`:

In [27]:
path.institution = 'NOAA-GFDL'
path.model = 'GFDL-CM4'
path.experiment = 'historical'
path.member = 'r1i1p1f1'
path.table = 'Amon'
path.variable = 'tas'
path.grid = 'gr1'
path.version = 'v20180701'
path.period = '195001-201412'

Now let's build that string:

In [28]:
print("File located at:", path())

File located at: /global/cscratch1/sd/cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/historical/r1i1p1f1/Amon/tas/gr1/v20180701/tas_Amon_GFDL-CM4_historical_r1i1p1f1_gr1_195001-201412.nc
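The substitution step itself can be sketched with the standard library alone. The `fill` helper below is a hypothetical stand-in written for this illustration (it is not `genutil`'s code, and leaving unknown keys untouched is a choice made here, not documented `genutil` behaviour). It shows why changing a single key is enough to get a new, consistent path:

```python
import re

KEY_PATTERN = re.compile(r"%\((\w+)\)")


def fill(template, values):
    """Replace each %(key) with values[key]; keys without a value are left as-is."""
    return KEY_PATTERN.sub(
        lambda m: str(values.get(m.group(1), m.group(0))), template)


template = ('%(root)/%(collection)/%(type)/%(institution)/%(model)/'
            '%(experiment)/%(member)/%(table)/%(variable)/%(grid)/%(version)/'
            '%(variable)_%(table)_%(model)_%(experiment)_%(member)_%(grid)_%(period).nc')

values = {
    "root": "/global/cscratch1/sd/cmip6", "collection": "CMIP6", "type": "CMIP",
    "institution": "NOAA-GFDL", "model": "GFDL-CM4", "experiment": "historical",
    "member": "r1i1p1f1", "table": "Amon", "grid": "gr1",
    "version": "v20180701", "period": "195001-201412",
}

# Only the variable changes between iterations; both occurrences are updated.
paths = {}
for variable in ("tas", "pr"):
    paths[variable] = fill(template, {**values, "variable": variable})
    print(paths[variable])
```

The `pr` result is the path opened at the very start, and the `tas` result is the one printed just above.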


This probably looks like overkill for a single file, but let's see where it becomes powerful.

Let's look at all available files:

In [30]:
import glob
path.period = '*'
print(sorted(glob.glob(path())))

['/global/cscratch1/sd/cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/historical/r1i1p1f1/Amon/tas/gr1/v20180701/tas_Amon_GFDL-CM4_historical_r1i1p1f1_gr1_185001-194912.nc', '/global/cscratch1/sd/cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/historical/r1i1p1f1/Amon/tas/gr1/v20180701/tas_Amon_GFDL-CM4_historical_r1i1p1f1_gr1_195001-201412.nc']
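If you don't have access to that file system, the wildcard trick can still be tried on a throwaway directory. The sketch below builds a tiny, hypothetical tree of empty placeholder files (the names mirror the listing above, but nothing here is real data) and shows that a `*` in the period slot picks up every time chunk and nothing else:

```python
import glob
import os
import tempfile

# Hypothetical miniature tree: empty placeholder files created only for this demo.
with tempfile.TemporaryDirectory() as root:
    data_dir = os.path.join(root, "Amon", "tas", "gr1", "v20180701")
    os.makedirs(data_dir)
    stem = "tas_Amon_GFDL-CM4_historical_r1i1p1f1_gr1_"
    for period in ("185001-194912", "195001-201412"):
        open(os.path.join(data_dir, stem + period + ".nc"), "w").close()
    # An unrelated file that the pattern should not match.
    open(os.path.join(data_dir, "README.txt"), "w").close()

    # Same idea as path.period = '*': the wildcard replaces one field only.
    pattern = os.path.join(data_dir, stem + "*" + ".nc")
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matches)
```

`glob.glob` returns its results in arbitrary order, hence the `sorted` call here and in the cell above.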


Now let's list all the files available for the `historical` experiment in the `Amon` table:

In [17]:
path.experiment = 'historical'
path.table = 'Amon'
path.institution = "*"
path.model = "*"
path.grid = '*'
path.period = '*'
path.member = '*'
path.variable = '*'
path.version = '*'
print("PATH GENERAL:", path())
all_hist_Amon = glob.glob(path())
for pth in all_hist_Amon[:10]:  # First 10
    print(pth)
print("Total:", len(all_hist_Amon))

PATH GENERAL: /global/cscratch1/sd/cmip6/CMIP6/CMIP/*/*/historical/*/Amon/*/*/*/*_Amon_*_historical_*_*_*.nc
/global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r1i1p1f1/Amon/prw/gr/v20190711/prw_Amon_EC-Earth3_historical_r1i1p1f1_gr_200401-200412.nc
/global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r1i1p1f1/Amon/prw/gr/v20190711/prw_Amon_EC-Earth3_historical_r1i1p1f1_gr_193701-193712.nc
/global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r1i1p1f1/Amon/prw/gr/v20190711/prw_Amon_EC-Earth3_historical_r1i1p1f1_gr_185601-185612.nc
/global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r1i1p1f1/Amon/prw/gr/v20190711/prw_Amon_EC-Earth3_historical_r1i1p1f1_gr_192201-192212.nc
/global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r1i1p1f1/Amon/prw/gr/v20190711/prw_Amon_EC-Earth3_historical_r1i1p1f1_gr_185201-185212.nc
/global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth

We can now build sets of the values found for some of the keys:

In [None]:
institutions = set()
models = set()
variables = set()
for nm in all_hist_Amon:
    # The leading "/" makes sp[0] empty: sp[1:5] is the root, sp[5] the collection, sp[6] the type
    sp = nm.split("/")
    institutions.add(sp[7])
    models.add(sp[8])
    variables.add(sp[12])
print("Institutions:", institutions)
print("Models:", models)
print("Variables:", variables)
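Splitting on `/` and counting positions is easy to get wrong by one. As an alternative, here is a sketch that turns the template itself into a regular expression with one named group per key, so fields can be read back by name. The `template_to_regex` helper is hypothetical and written for this illustration only; it assumes that no key value contains a `/`, which is why the root is written literally in the template below.

```python
import re

KEY_PATTERN = re.compile(r"%\((\w+)\)")


def template_to_regex(template):
    """Build a regex with one named group per key; repeated keys must repeat the same text."""
    seen = set()
    parts = []
    pos = 0
    for m in KEY_PATTERN.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text between keys
        key = m.group(1)
        if key in seen:
            parts.append(f"(?P={key})")        # backreference to the first occurrence
        else:
            parts.append(f"(?P<{key}>[^/]+)")  # a key never spans a directory separator
            seen.add(key)
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


# The root is written literally because it contains "/" itself.
template = ('/global/cscratch1/sd/cmip6/%(collection)/%(type)/%(institution)/%(model)/'
            '%(experiment)/%(member)/%(table)/%(variable)/%(grid)/%(version)/'
            '%(variable)_%(table)_%(model)_%(experiment)_%(member)_%(grid)_%(period).nc')

# One of the paths printed in the listing above.
name = ('/global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/'
        'r1i1p1f1/Amon/prw/gr/v20190711/prw_Amon_EC-Earth3_historical_r1i1p1f1_gr_200401-200412.nc')

fields = template_to_regex(template).fullmatch(name).groupdict()
print(fields["institution"], fields["model"], fields["variable"], fields["period"])
```

Because the file name part is matched with backreferences, a path whose directory and file name disagree (say, a `pr` file stored under `tas/`) would simply not match.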

Let's focus only on `r1i1p1f1` members for the variable `pr`:

In [20]:
path.member = 'r1i1p1f1'
path.variable = 'pr'
all_pr = glob.glob(path())
for i, name in enumerate(all_pr):
    f = cdms2.open(name)
    print(i, name, "shape:", f["pr"].shape)
    f.close()  # release the file handle before opening the next one

0 /global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r1i1p1f1/Amon/pr/gr/v20190711/pr_Amon_EC-Earth3_historical_r1i1p1f1_gr_198401-198412.nc shape: (12, 256, 512)
1 /global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r1i1p1f1/Amon/pr/gr/v20190711/pr_Amon_EC-Earth3_historical_r1i1p1f1_gr_185301-185312.nc shape: (12, 256, 512)
2 /global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r1i1p1f1/Amon/pr/gr/v20190711/pr_Amon_EC-Earth3_historical_r1i1p1f1_gr_196801-196812.nc shape: (12, 256, 512)
3 /global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r1i1p1f1/Amon/pr/gr/v20190711/pr_Amon_EC-Earth3_historical_r1i1p1f1_gr_191901-191912.nc shape: (12, 256, 512)
4 /global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r1i1p1f1/Amon/pr/gr/v20190711/pr_Amon_EC-Earth3_historical_r1i1p1f1_gr_186801-186812.nc shape: (12, 256, 512)
5 /global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Ea
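Notice that the files come back in no particular order (1984, then 1853, then 1968, ...). Before concatenating anything in time, it helps to group the paths by model and sort each group by period. The sketch below does this with the standard library on the five complete paths listed above; the helper `period_of` is hypothetical, and the code assumes that no field in the file name contains an underscore and that periods are fixed-width `YYYYMM-YYYYMM` strings.

```python
from collections import defaultdict
import os

base = ("/global/cscratch1/sd/cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/"
        "historical/r1i1p1f1/Amon/pr/gr/v20190711/")
# The five complete file names from the output above, in the order they were listed.
paths = [base + "pr_Amon_EC-Earth3_historical_r1i1p1f1_gr_" + period + ".nc"
         for period in ("198401-198412", "185301-185312", "196801-196812",
                        "191901-191912", "186801-186812")]


def period_of(path):
    """Return the trailing period field of a CMIP-style file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.rsplit("_", 1)[1]


by_model = defaultdict(list)
for p in paths:
    model = os.path.basename(p).split("_")[2]  # third field of the file name
    by_model[model].append(p)

for model, files in by_model.items():
    files.sort(key=period_of)  # fixed-width periods sort chronologically as strings
    print(model, [period_of(f) for f in files])
```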