Loading in initial-condition ('r') but not other variants #3

Timh37 · 2023-01-23T10:47:33Z

Since we want ensembles of initial condition members, we need a way to distinguish variants of CMIP6 models that differ in their initial condition (i.e., 'r' of 'ripf') from variants that differ in other regards ('ipf'). This may mean we need to prescribe a dictionary with each CMIP6 model and the corresponding 'ipf' to use?
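As a rough illustration of the dictionary idea, here is a minimal sketch that splits a member id into its 'r' and 'ipf' parts and checks it against a hand-written lookup. The model names and the contents of `PRESCRIBED_IPF` are hypothetical placeholders, and the regular expression assumes the usual `r<N>i<N>p<N>f<N>` layout of CMIP6 variant labels.

```python
import re

# Assumed variant-label layout: r<realization>i<initialization>p<physics>f<forcing>
VARIANT_RE = re.compile(r"^(r\d+)(i\d+p\d+f\d+)$")


def split_variant(member_id):
    """Split e.g. 'r3i1p1f2' into ('r3', 'i1p1f2')."""
    match = VARIANT_RE.match(member_id)
    if match is None:
        raise ValueError(f"unexpected member_id: {member_id!r}")
    return match.group(1), match.group(2)


# Hypothetical hand-written lookup: which 'ipf' to keep for each model.
PRESCRIBED_IPF = {"MODEL-A": "i1p1f1", "MODEL-B": "i1p1f2"}


def keep_member(source_id, member_id, default_ipf="i1p1f1"):
    """Return True if member_id carries the 'ipf' prescribed for source_id."""
    _, ipf = split_variant(member_id)
    return ipf == PRESCRIBED_IPF.get(source_id, default_ipf)
```

The drawback of this approach is that the lookup has to be maintained by hand for every model.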

jbusecke · 2023-01-23T22:43:18Z

This part of xmip will help you here: https://cmip6-preprocessing.readthedocs.io/en/latest/postprocessing.html#Custom-combination-functions

I would try the following:
Match all attributes except 'member_id' (which will group the datasets with different members together), and then define a custom function like this:

import xarray as xr

def concat_only_realization_members(ds_list, ipf='i1p1f1'):
    # 'ipf' is a placeholder default; ideally decide which 'ipf' gives the most members/variants
    # collect the member ids, e.g. 'r3i1p1f1' (assumes a length-1 'member_id' coordinate)
    member_ids = [str(ds.member_id.data[0]) for ds in ds_list]
    # pick only the datasets whose member id ends with the chosen 'ipf'
    ds_pick = [ds for ds, m in zip(ds_list, member_ids) if m.endswith(ipf)]
    return xr.concat(ds_pick, dim='member_id')
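
To make the "match all attributes except 'member_id'" step concrete, here is a small sketch of the grouping logic using plain dictionaries in place of dataset attributes. It is only an illustration of the idea, not xmip's implementation; the function name `group_ignoring_member` and the example attribute names are assumptions.

```python
from collections import defaultdict


def group_ignoring_member(attrs_by_key, ignore=("member_id",)):
    """Group dataset keys whose attributes agree on everything except `ignore`."""
    groups = defaultdict(list)
    for key, attrs in attrs_by_key.items():
        # The signature is every (name, value) pair that should match within a group.
        signature = tuple(sorted((k, v) for k, v in attrs.items() if k not in ignore))
        groups[signature].append(key)
    return list(groups.values())


# Hypothetical attributes standing in for three datasets.
example = {
    "a": {"source_id": "MODEL-A", "experiment_id": "historical", "member_id": "r1i1p1f1"},
    "b": {"source_id": "MODEL-A", "experiment_id": "historical", "member_id": "r2i1p1f1"},
    "c": {"source_id": "MODEL-B", "experiment_id": "historical", "member_id": "r1i1p1f1"},
}
grouped = group_ignoring_member(example)
```

Each resulting group is what a custom combine function would then receive as its list of datasets.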

Timh37 · 2023-01-24T10:14:28Z

The following works, but only if each variant of a model contains the same variables.

from collections import Counter

import xarray as xr

def concat_realizations_most_common_ipf(ds_list):
    # sort so that ties between 'ipf's are broken deterministically (often i1 is the baseline?)
    member_ids = sorted(str(ds.member_id.data[0]) for ds in ds_list)

    ipf_ids = [s[s.find('i'):] for s in member_ids]  # separate 'ipf' from 'r'
    most_common_ipf = Counter(ipf_ids).most_common(1)[0][0]

    # pick only the datasets with the most common 'ipf' that contain both variables;
    # endswith avoids partial matches such as 'i1p1f1' inside 'r1i1p1f11'
    ds_pick = [
        ds for ds in ds_list
        if str(ds.member_id.data[0]).endswith(most_common_ipf)
        and 'sfcWind' in ds.variables
        and 'psl' in ds.variables
    ]

    return xr.concat(ds_pick, dim='member_id')

When I do

reqVars = ['sfcWind', 'psl']
ddict_filtered = {k: v for k, v in ddict_merged.items() if set(reqVars).issubset(v.variables)}

first, it works well. I guess what the custom function does not address is models that have several 'ipf's, each with only a single 'r'. In my experience 'i1p1f1' is often the baseline experiment, but that is not always the case.
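
One possible way to handle that tie case is sketched below: pick the 'ipf' with the most realizations and, when several 'ipf's are tied, prefer a nominated baseline. The function name `choose_ipf` and the `baseline` default are assumptions for illustration, and the sketch assumes a non-empty list of well-formed member ids.

```python
from collections import Counter


def choose_ipf(member_ids, baseline="i1p1f1"):
    """Pick the 'ipf' with the most realizations; break ties in favour of `baseline`."""
    # Everything from the first 'i' onwards is the 'ipf' part, e.g. 'r3i1p1f2' -> 'i1p1f2'.
    ipf_ids = [m[m.find("i"):] for m in member_ids]
    counts = Counter(ipf_ids)
    top = max(counts.values())
    # All 'ipf's that share the highest count, in a deterministic order.
    tied = sorted(ipf for ipf, n in counts.items() if n == top)
    return baseline if baseline in tied else tied[0]
```

When the baseline is not among the tied candidates, the sketch falls back to the alphabetically first one, which is an arbitrary but reproducible choice.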

Timh37 · 2023-01-25T11:51:27Z

Note that both this and xMIP's concat_members are too slow to work with when concatenating ~50 members for a single model; maybe this means we need to store files per variant?

jbusecke · 2023-01-26T03:34:34Z

I think #6 might help with getting only the source/members with all variables?

> Note that both this and xMIP's concat_members are too slow to work with when concatenating ~50 members

Can you try this in your combine_function?

return xr.concat(ds_pick, dim='member_id', join='override', coords='minimal')

I suspect that might make it faster.

Timh37 · 2023-01-30T12:47:54Z

Yes, #6 resolves this.

And concatenation seems to speed up by a lot with those kwargs. Thanks!

Timh37 added the "question" label (Further information is requested) on Jan 25, 2023

Timh37 closed this as completed Jan 30, 2023

Timh37 mentioned this issue Feb 2, 2023

Function to combine/drop matching datasets with different grid labels #8

Closed
