# CMIP6 catalog: remove unused grid gr1

when: 2023.11.14

The CNRM CMIP6 datasets listed below are available on both the "gr1" and "gn" grids. As discussed, we should keep only "gn", because it covers the whole period for the affected scenarios. For consistency, we should also select "gn" for the corresponding historical runs.

I checked the whole C3S-CMIP6 archive and there are no other datasets that have both "gr1" and "gn".

To keep:
```
C3S-CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r1i1p1f2/Omon/tos/gn
C3S-CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp370/r1i1p1f2/Omon/tos/gn
C3S-CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp585/r1i1p1f2/Omon/tos/gn
C3S-CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/Omon/tos/gn
C3S-CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/Omon/sos/gn
C3S-CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/historical/r1i1p1f2/Omon/sos/gn
C3S-CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/historical/r1i1p1f2/Omon/tos/gn
```

To remove:
```
C3S-CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp126/r1i1p1f2/Omon/tos/gr1
C3S-CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp370/r1i1p1f2/Omon/tos/gr1
C3S-CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-CM6-1/ssp585/r1i1p1f2/Omon/tos/gr1
C3S-CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/Omon/tos/gr1
C3S-CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/Omon/sos/gr1
C3S-CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/historical/r1i1p1f2/Omon/sos/gr1
C3S-CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/historical/r1i1p1f2/Omon/tos/gr1
```

In [1]:
import intake
import pandas as pd

In [2]:
cat_url = "https://raw.githubusercontent.com/cp4cds/c3s_34g_manifests/master/intake/catalogs/c3s.yaml"

cat = intake.open_catalog(cat_url)
list(cat)

['c3s-cmip5',
 'c3s-cmip5-daily-pressure-level',
 'c3s-cmip5-daily-single-level',
 'c3s-cmip5-monthly-pressure-level',
 'c3s-cmip5-monthly-single-level',
 'c3s-cmip6',
 'c3s-cmip6-decadal',
 'c3s-cordex',
 'c3s-ipcc-atlas']

## Load cmip6 catalog

In [3]:
df_cmip6 = cat['c3s-cmip6'].read()
df_cmip6

Unnamed: 0,ds_id,path,size,mip_era,activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label,version,start_time,end_time,bbox,level
0,c3s-cmip6.ScenarioMIP.MOHC.UKESM1-0-LL.ssp245....,ScenarioMIP/MOHC/UKESM1-0-LL/ssp245/r1i1p1f2/A...,28037112,c3s-cmip6,ScenarioMIP,MOHC,UKESM1-0-LL,ssp245,r1i1p1f2,Amon,ts,gn,v20190507,2015-01-16T00:00:00,2049-12-16T00:00:00,"0.94, -89.38, 359.06, 89.38",
1,c3s-cmip6.ScenarioMIP.MOHC.UKESM1-0-LL.ssp245....,ScenarioMIP/MOHC/UKESM1-0-LL/ssp245/r1i1p1f2/A...,38838222,c3s-cmip6,ScenarioMIP,MOHC,UKESM1-0-LL,ssp245,r1i1p1f2,Amon,ts,gn,v20190507,2050-01-16T00:00:00,2100-12-16T00:00:00,"0.94, -89.38, 359.06, 89.38",
2,c3s-cmip6.ScenarioMIP.NCAR.CESM2.ssp370.r4i1p1...,ScenarioMIP/NCAR/CESM2/ssp370/r4i1p1f1/Amon/pr...,104081588,c3s-cmip6,ScenarioMIP,NCAR,CESM2,ssp370,r4i1p1f1,Amon,pr,gn,v20200528,2015-01-15T12:00:00,2064-12-15T12:00:00,"0.00, -90.00, 358.75, 90.00",
3,c3s-cmip6.ScenarioMIP.NCAR.CESM2.ssp370.r4i1p1...,ScenarioMIP/NCAR/CESM2/ssp370/r4i1p1f1/Amon/pr...,74977662,c3s-cmip6,ScenarioMIP,NCAR,CESM2,ssp370,r4i1p1f1,Amon,pr,gn,v20200528,2065-01-15T12:00:00,2100-12-15T12:00:00,"0.00, -90.00, 358.75, 90.00",
4,c3s-cmip6.ScenarioMIP.AS-RCEC.TaiESM1.ssp370.r...,ScenarioMIP/AS-RCEC/TaiESM1/ssp370/r1i1p1f1/Am...,144277888,c3s-cmip6,ScenarioMIP,AS-RCEC,TaiESM1,ssp370,r1i1p1f1,Amon,rlut,gn,v20201014,2015-01-16T12:00:00,2100-12-16T12:00:00,"0.00, -90.00, 358.75, 90.00",
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
133154,c3s-cmip6.CMIP.MIROC.MIROC-ES2H.historical.r1i...,CMIP/MIROC/MIROC-ES2H/historical/r1i1p4f2/Amon...,204026121,c3s-cmip6,CMIP,MIROC,MIROC-ES2H,historical,r1i1p4f2,Amon,tauu,gn,v20220322,1850-01-16T12:00:00,2014-12-16T12:00:00,"0.00, -88.93, 358.59, 88.93",
133155,c3s-cmip6.CMIP.MIROC.MIROC-ES2H.historical.r1i...,CMIP/MIROC/MIROC-ES2H/historical/r1i1p4f2/Amon...,164344947,c3s-cmip6,CMIP,MIROC,MIROC-ES2H,historical,r1i1p4f2,Amon,rlus,gn,v20220322,1850-01-16T12:00:00,2014-12-16T12:00:00,"0.00, -88.93, 358.59, 88.93",
133156,c3s-cmip6.CMIP.MIROC.MIROC-ES2H.historical.r1i...,CMIP/MIROC/MIROC-ES2H/historical/r1i1p4f2/Amon...,3420481903,c3s-cmip6,CMIP,MIROC,MIROC-ES2H,historical,r1i1p4f2,Amon,ua,gn,v20220322,1850-01-16T12:00:00,2014-12-16T12:00:00,"0.00, -88.93, 358.59, 88.93",100.00 500.00 1000.00 2000.00 3000.00 5000.00 ...
133157,c3s-cmip6.CMIP.MIROC.MIROC-ES2H.historical.r1i...,CMIP/MIROC/MIROC-ES2H/historical/r1i1p4f2/Amon...,143468073,c3s-cmip6,CMIP,MIROC,MIROC-ES2H,historical,r1i1p4f2,Amon,tasmin,gn,v20220322,1850-01-16T12:00:00,2014-12-16T12:00:00,"0.00, -88.93, 358.59, 88.93",2.00


In [4]:
df_cmip6.nunique()

ds_id              10433
path              133159
size              122429
mip_era                1
activity_id            2
institution_id        31
source_id             58
experiment_id          9
member_id             17
table_id               8
variable_id           51
grid_label             3
version              242
start_time           633
end_time             674
bbox                  54
level                  6
dtype: int64

In [5]:
list(df_cmip6.grid_label.unique())

['gn', 'gr1', 'gr']

In [6]:
# add a new id column: the ds_id without its trailing grid label and version

def gen_new_id(ds_id):
    new_id = ".".join(ds_id.split(".")[:-2])
    return new_id

df = df_cmip6.copy()

df["_new_id"] = df["ds_id"].apply(gen_new_id)
df.nunique()

ds_id              10433
path              133159
size              122429
mip_era                1
activity_id            2
institution_id        31
source_id             58
experiment_id          9
member_id             17
table_id               8
variable_id           51
grid_label             3
version              242
start_time           633
end_time             674
bbox                  54
level                  6
_new_id            10426
dtype: int64
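
As a small worked example of what `gen_new_id` does, the sketch below applies it to the two historical `tos` ids of CNRM-CM6-1 that appear later in this notebook. It assumes that the last two dot-separated components of a `ds_id` are always the grid label and the version, which holds for the ids shown here.

```python
def gen_new_id(ds_id):
    # drop the last two dot-separated components (grid label and version)
    return ".".join(ds_id.split(".")[:-2])

sample_ids = [
    "c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gn.v20180917",
    "c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gr1.v20180917",
]

# both ids collapse to the same grid-independent id
new_ids = [gen_new_id(ds_id) for ds_id in sample_ids]
print(new_ids[0])
print(new_ids[0] == new_ids[1])
```

Because the version is stripped as well, two ids that differ only in version would also share the same `_new_id`.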

In [7]:
# print the datasets that have more than one ds_id per _new_id, i.e. more than one grid label (or version)

grouped = df.groupby("_new_id")
for name, group in grouped:
    if len(list(group.ds_id.unique())) > 1:
        print(f'Group {name}:')
        print(group.ds_id.unique())
        print('\n')

Group c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.sos:
['c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.sos.gn.v20180917'
 'c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.sos.gr1.v20180917']


Group c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos:
['c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gn.v20180917'
 'c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gr1.v20180917']


Group c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.Omon.sos:
['c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.Omon.sos.gn.v20181206'
 'c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.Omon.sos.gr1.v20181206']


Group c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.Omon.tos:
['c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.Omon.tos.gn.v20181206'
 'c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.Omon.tos.gr1.v20181206']


Group 
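
The same check can be sketched without an explicit loop by counting distinct `grid_label` values per dataset. This is only an illustration on a toy DataFrame (`toy` is a hypothetical stand-in for the catalog, built from ids that appear in this notebook); counting on `grid_label` rather than on `ds_id` also separates "several grids" from "several versions".

```python
import pandas as pd

# toy stand-in for the catalog (the real catalog has one row per file)
toy = pd.DataFrame({
    "ds_id": [
        "c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gn.v20180917",
        "c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gr1.v20180917",
        "c3s-cmip6.CMIP.MIROC.MIROC-ES2H.historical.r1i1p4f2.Amon.tauu.gn.v20220322",
    ],
    "grid_label": ["gn", "gr1", "gn"],
})

# same result as gen_new_id: strip the trailing grid label and version
toy["_new_id"] = toy["ds_id"].str.rsplit(".", n=2).str[0]

# number of distinct grid labels per dataset
grids_per_dataset = toy.groupby("_new_id")["grid_label"].nunique()
multi_grid = grids_per_dataset[grids_per_dataset > 1].index.tolist()
print(multi_grid)
```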

In [8]:
# collect the gr1 ds_ids of datasets that are available with more than one ds_id

filtered_ids = []

for name, group in grouped:
    # unique() avoids duplicates, since the catalog has one row per file
    ds_ids = group.ds_id.unique()
    if len(ds_ids) > 1:
        for ds_id in ds_ids:
            if '.gr1.' in ds_id:
                filtered_ids.append(ds_id)

filtered_ids

['c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.sos.gr1.v20180917',
 'c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gr1.v20180917',
 'c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.Omon.sos.gr1.v20181206',
 'c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.Omon.tos.gr1.v20181206',
 'c3s-cmip6.ScenarioMIP.CNRM-CERFACS.CNRM-CM6-1.ssp126.r1i1p1f2.Omon.tos.gr1.v20190219',
 'c3s-cmip6.ScenarioMIP.CNRM-CERFACS.CNRM-CM6-1.ssp370.r1i1p1f2.Omon.tos.gr1.v20190219',
 'c3s-cmip6.ScenarioMIP.CNRM-CERFACS.CNRM-CM6-1.ssp585.r1i1p1f2.Omon.tos.gr1.v20190219']
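
To cross-check these ids against the "To remove" list at the top, a `ds_id` can be turned into the corresponding archive directory. The helper below is a hypothetical sketch: the name `ds_id_to_dir` and the `C3S-CMIP6` root are assumptions taken from the path lists in this note, not from any archive tool.

```python
def ds_id_to_dir(ds_id, root="C3S-CMIP6"):
    # drop the leading project name and the trailing version,
    # then join the remaining facets as directory levels
    facets = ds_id.split(".")[1:-1]
    return "/".join([root, *facets])

example_id = "c3s-cmip6.ScenarioMIP.CNRM-CERFACS.CNRM-CM6-1.ssp126.r1i1p1f2.Omon.tos.gr1.v20190219"
example_dir = ds_id_to_dir(example_id)
print(example_dir)
```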

## Drop redundant datasets with gr1

In [9]:
df_new = df_cmip6[~df_cmip6['ds_id'].isin(filtered_ids)]
df_new

Unnamed: 0,ds_id,path,size,mip_era,activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label,version,start_time,end_time,bbox,level
0,c3s-cmip6.ScenarioMIP.MOHC.UKESM1-0-LL.ssp245....,ScenarioMIP/MOHC/UKESM1-0-LL/ssp245/r1i1p1f2/A...,28037112,c3s-cmip6,ScenarioMIP,MOHC,UKESM1-0-LL,ssp245,r1i1p1f2,Amon,ts,gn,v20190507,2015-01-16T00:00:00,2049-12-16T00:00:00,"0.94, -89.38, 359.06, 89.38",
1,c3s-cmip6.ScenarioMIP.MOHC.UKESM1-0-LL.ssp245....,ScenarioMIP/MOHC/UKESM1-0-LL/ssp245/r1i1p1f2/A...,38838222,c3s-cmip6,ScenarioMIP,MOHC,UKESM1-0-LL,ssp245,r1i1p1f2,Amon,ts,gn,v20190507,2050-01-16T00:00:00,2100-12-16T00:00:00,"0.94, -89.38, 359.06, 89.38",
2,c3s-cmip6.ScenarioMIP.NCAR.CESM2.ssp370.r4i1p1...,ScenarioMIP/NCAR/CESM2/ssp370/r4i1p1f1/Amon/pr...,104081588,c3s-cmip6,ScenarioMIP,NCAR,CESM2,ssp370,r4i1p1f1,Amon,pr,gn,v20200528,2015-01-15T12:00:00,2064-12-15T12:00:00,"0.00, -90.00, 358.75, 90.00",
3,c3s-cmip6.ScenarioMIP.NCAR.CESM2.ssp370.r4i1p1...,ScenarioMIP/NCAR/CESM2/ssp370/r4i1p1f1/Amon/pr...,74977662,c3s-cmip6,ScenarioMIP,NCAR,CESM2,ssp370,r4i1p1f1,Amon,pr,gn,v20200528,2065-01-15T12:00:00,2100-12-15T12:00:00,"0.00, -90.00, 358.75, 90.00",
4,c3s-cmip6.ScenarioMIP.AS-RCEC.TaiESM1.ssp370.r...,ScenarioMIP/AS-RCEC/TaiESM1/ssp370/r1i1p1f1/Am...,144277888,c3s-cmip6,ScenarioMIP,AS-RCEC,TaiESM1,ssp370,r1i1p1f1,Amon,rlut,gn,v20201014,2015-01-16T12:00:00,2100-12-16T12:00:00,"0.00, -90.00, 358.75, 90.00",
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
133154,c3s-cmip6.CMIP.MIROC.MIROC-ES2H.historical.r1i...,CMIP/MIROC/MIROC-ES2H/historical/r1i1p4f2/Amon...,204026121,c3s-cmip6,CMIP,MIROC,MIROC-ES2H,historical,r1i1p4f2,Amon,tauu,gn,v20220322,1850-01-16T12:00:00,2014-12-16T12:00:00,"0.00, -88.93, 358.59, 88.93",
133155,c3s-cmip6.CMIP.MIROC.MIROC-ES2H.historical.r1i...,CMIP/MIROC/MIROC-ES2H/historical/r1i1p4f2/Amon...,164344947,c3s-cmip6,CMIP,MIROC,MIROC-ES2H,historical,r1i1p4f2,Amon,rlus,gn,v20220322,1850-01-16T12:00:00,2014-12-16T12:00:00,"0.00, -88.93, 358.59, 88.93",
133156,c3s-cmip6.CMIP.MIROC.MIROC-ES2H.historical.r1i...,CMIP/MIROC/MIROC-ES2H/historical/r1i1p4f2/Amon...,3420481903,c3s-cmip6,CMIP,MIROC,MIROC-ES2H,historical,r1i1p4f2,Amon,ua,gn,v20220322,1850-01-16T12:00:00,2014-12-16T12:00:00,"0.00, -88.93, 358.59, 88.93",100.00 500.00 1000.00 2000.00 3000.00 5000.00 ...
133157,c3s-cmip6.CMIP.MIROC.MIROC-ES2H.historical.r1i...,CMIP/MIROC/MIROC-ES2H/historical/r1i1p4f2/Amon...,143468073,c3s-cmip6,CMIP,MIROC,MIROC-ES2H,historical,r1i1p4f2,Amon,tasmin,gn,v20220322,1850-01-16T12:00:00,2014-12-16T12:00:00,"0.00, -88.93, 358.59, 88.93",2.00


In [10]:
# compare with the original: the difference should be exactly the removed gr1 ds_ids
set(df_cmip6.ds_id.unique()) - set(df_new.ds_id.unique())

{'c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.sos.gr1.v20180917',
 'c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gr1.v20180917',
 'c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.Omon.sos.gr1.v20181206',
 'c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.historical.r1i1p1f2.Omon.tos.gr1.v20181206',
 'c3s-cmip6.ScenarioMIP.CNRM-CERFACS.CNRM-CM6-1.ssp126.r1i1p1f2.Omon.tos.gr1.v20190219',
 'c3s-cmip6.ScenarioMIP.CNRM-CERFACS.CNRM-CM6-1.ssp370.r1i1p1f2.Omon.tos.gr1.v20190219',
 'c3s-cmip6.ScenarioMIP.CNRM-CERFACS.CNRM-CM6-1.ssp585.r1i1p1f2.Omon.tos.gr1.v20190219'}
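
One further sanity check that could be added here is to confirm that every removed "gr1" dataset still has a "gn" counterpart in the new catalog. The sketch below uses two of the historical ids printed earlier. It assumes that the "gn" and "gr1" datasets share the same version string, which is the case for the historical ids shown above but may not hold in general.

```python
# ids taken from the notebook output above (historical CNRM-CM6-1 only)
removed_ids = {
    "c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.sos.gr1.v20180917",
    "c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gr1.v20180917",
}
kept_ids = {
    "c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.sos.gn.v20180917",
    "c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gn.v20180917",
}

def gn_counterpart(ds_id):
    # assumption: the gn dataset has the same version as the gr1 dataset
    return ds_id.replace(".gr1.", ".gn.")

# removed datasets without a gn counterpart among the kept ids
orphans = sorted(i for i in removed_ids if gn_counterpart(i) not in kept_ids)
print(orphans)
```

In the notebook itself, `removed_ids` would be `filtered_ids` and `kept_ids` would be `set(df_new.ds_id)`.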

## Write new catalog

In [11]:
import datetime

last_updated = datetime.datetime.now(datetime.UTC)
version = last_updated.strftime('v%Y%m%d')

print(version)

cat_name = f"c3s-cmip6_{version}.csv.gz"
cat_path = f"../intake/catalogs/c3s-cmip6/{cat_name}"

df_new.to_csv(cat_path, index=False, compression="gzip")


v20231114
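
After writing, the new file can be read back to confirm that it round-trips. The following is a minimal sketch on a toy DataFrame written to a temporary directory, so it does not touch the real catalog path; pandas infers the gzip compression from the `.gz` suffix when reading.

```python
import tempfile
from pathlib import Path

import pandas as pd

# toy stand-in for df_new
toy = pd.DataFrame({
    "ds_id": [
        "c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gn.v20180917",
    ],
    "grid_label": ["gn"],
})
removed_id = "c3s-cmip6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Omon.tos.gr1.v20180917"

with tempfile.TemporaryDirectory() as tmp:
    cat_path = Path(tmp) / "c3s-cmip6_v20231114.csv.gz"
    toy.to_csv(cat_path, index=False, compression="gzip")
    # compression is inferred from the file extension
    df_check = pd.read_csv(cat_path)

# same shape, and the removed gr1 id is not present
print(df_check.shape)
print(removed_id in set(df_check["ds_id"]))
```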
