Loading in initial-condition ('r') but not other variants #3

Closed
Timh37 opened this issue Jan 23, 2023 · 5 comments
Labels
question Further information is requested

@Timh37
Owner

Timh37 commented Jan 23, 2023

Since we want ensembles of initial condition members, we need a way to distinguish variants of CMIP6 models that differ in their initial condition (i.e., 'r' of 'ripf') from variants that differ in other regards ('ipf'). This may mean we need to prescribe a dictionary with each CMIP6 model and the corresponding 'ipf' to use?
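For reference, a variant label such as 'r3i1p1f2' combines the realization index ('r') with the initialization, physics and forcing indices ('ipf'). A minimal sketch of separating the two parts is shown here; `split_member_id` is a hypothetical helper, not part of xMIP, and it assumes plain 'ripf' labels without any sub-experiment prefix.

```python
import re

# Hypothetical helper (not part of xMIP): split a CMIP6 variant label such as
# 'r3i1p1f2' into its realization part ('r3') and its 'ipf' part ('i1p1f2').
# Assumption: the label is a plain 'ripf' string with no prefix.
_RIPF = re.compile(r"^(r\d+)(i\d+p\d+f\d+)$")


def split_member_id(member_id):
    match = _RIPF.match(member_id)
    if match is None:
        raise ValueError(f"not a valid 'ripf' label: {member_id!r}")
    return match.group(1), match.group(2)


print(split_member_id("r3i1p1f2"))
```

Grouping variants by the second element of the returned tuple would give one candidate initial-condition ensemble per 'ipf'.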

@jbusecke
Collaborator

This part of xmip will help you here: https://cmip6-preprocessing.readthedocs.io/en/latest/postprocessing.html#Custom-combination-functions

I would try the following: match all attributes except 'member_id' (which will group the datasets with different members together), and then define a custom combine function like this:

import xarray as xr

def concat_only_realization_members(ds_list, ipf='i1p1f1'):
    # the default 'ipf' is only a placeholder: inspect the member_ids to find
    # unique members and decide which 'ipf' gives the most members/variants
    member_ids = [str(ds.member_id.item()) for ds in ds_list]
    # pick only the datasets whose member_id (e.g. 'r3i1p1f1') ends with that 'ipf'
    ds_pick = [ds for ds, m in zip(ds_list, member_ids) if m.endswith(ipf)]
    return xr.concat(ds_pick, dim='member_id')

@Timh37
Owner Author

Timh37 commented Jan 24, 2023

The following works, but only if each variant of a model contains the same variables.

from collections import Counter

import xarray as xr

def concat_realizations_most_common_ipf(ds_list):
    member_ids = [str(ds.member_id.data[0]) for ds in ds_list]

    # separate 'ipf' from 'r'; sort so that ties resolve to the first 'ipf'
    # in sorted order (often i1 is the baseline?)
    ipf_ids = sorted(s[s.find('i'):] for s in member_ids)
    most_common_ipf = Counter(ipf_ids).most_common(1)[0][0]

    # pick only the datasets with the most common 'ipf' that contain both variables
    ds_pick = [
        ds for ds in ds_list
        if str(ds.member_id.data[0]).endswith(most_common_ipf)
        and 'sfcWind' in ds.variables
        and 'psl' in ds.variables
    ]

    return xr.concat(ds_pick, dim='member_id')

When I first filter the merged dictionary like this:

reqVars = ['sfcWind','psl']
ddict_filtered = {k: v for k, v in ddict_merged.items() if set(reqVars).issubset(v.variables)}

it works well. I guess what the custom function does not address is models that have multiple 'ipf's with only a single 'r' each. In my experience 'i1p1f1' is often the baseline experiment, but that is not always the case.
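To make the tie case concrete, here is a small sketch that works on plain member_id strings. `most_common_ipf` and its `preferred` argument are hypothetical names, and falling back to 'i1p1f1' on a tie is only an assumption based on the observation that it is often, but not always, the baseline.

```python
from collections import Counter


def most_common_ipf(member_ids, preferred="i1p1f1"):
    """Return the 'ipf' shared by the most members.

    Ties go to `preferred` if it is among the tied candidates (an assumption,
    not a CMIP6 rule), otherwise to the first candidate in sorted order.
    """
    ipf_ids = [m[m.find("i"):] for m in member_ids]  # drop the leading 'r<N>'
    counts = Counter(ipf_ids)
    best = max(counts.values())
    tied = sorted(ipf for ipf, n in counts.items() if n == best)
    return preferred if preferred in tied else tied[0]


# Several realizations of one 'ipf': the choice is unambiguous.
print(most_common_ipf(["r1i1p1f1", "r2i1p1f1", "r1i1p2f1"]))
# Multiple 'ipf's with a single 'r' each: only the tie-break decides.
print(most_common_ipf(["r1i1p1f2", "r1i1p1f3"]))
```

For models of the second kind the result depends entirely on the tie-break rule, so a per-model override (such as the dictionary suggested at the top of this issue) may still be needed.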

@Timh37
Owner Author

Timh37 commented Jan 25, 2023

Note that both this and xMIP's concat_members are too slow to work with when concatenating ~50 members for a single model; maybe this means we need to store files per variant?

@Timh37 Timh37 added the question (Further information is requested) label Jan 25, 2023
@jbusecke
Collaborator

I think #6 might help with getting only the source/members with all variables?

> Note that both this and xMIP's concat_members are too slow to work with when concatenating ~50 members

Can you try this in your combine_function?

return xr.concat(ds_pick, dim='member_id', join='override', coords='minimal')

I suspect that might make it faster.
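As a rough illustration of what those keyword arguments do, the sketch below concatenates three synthetic single-member datasets. `make_member` and the data values are made up for the example. `join='override'` copies the indexes from the first dataset instead of aligning them, which assumes all members share the same grid and time axis. `coords='minimal'` only concatenates coordinates that already contain the 'member_id' dimension.

```python
import numpy as np
import xarray as xr


def make_member(member_id, value):
    # Synthetic stand-in for one CMIP6 variant: a single 'psl' variable with a
    # length-1 'member_id' dimension and a shared 'time' axis (made-up data).
    return xr.Dataset(
        {"psl": (("member_id", "time"), np.full((1, 3), value))},
        coords={"member_id": [member_id], "time": np.arange(3)},
    )


members = [make_member(f"r{i}i1p1f1", float(i)) for i in range(1, 4)]

# Skip index alignment and coordinate comparison along the non-concatenated dims.
combined = xr.concat(members, dim="member_id", join="override", coords="minimal")
print(combined.sizes)
```

Because `join='override'` does not check that the indexes actually agree, it is only safe when the members are known to be on identical coordinates.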

@Timh37
Owner Author

Timh37 commented Jan 30, 2023

Yes, #6 resolves this.

And concatenation speeds up a lot with those kwargs. Thanks!
