In [1]:
import intake

The conda environments already have this catalogue installed, so intake knows where to find it.

In [2]:
cat = intake.cat.nci

Use `list` to see all the available datasets.

In [3]:
list(cat)

['era5', 'era5_land', 'ecmwf', 'esgf', 'cosima', 'erai']

Select the era5 dataset. It behaves much like a pandas DataFrame, so you can use the `df` accessor to explore it.

In [4]:
era5 = cat.era5

In [5]:
era5.df.head()

Unnamed: 0,sub_collection,dataset,product_type,parameter,year,month,startdate,file_variable,path
0,era5-1,pressure-levels,monthly-averaged-by-day,cc,2000,1,20000101,cc,/g/data/rt52/era5-1/pressure-levels/monthly-av...
1,era5-1,pressure-levels,monthly-averaged-by-day,cc,2000,2,20000201,cc,/g/data/rt52/era5-1/pressure-levels/monthly-av...
2,era5-1,pressure-levels,monthly-averaged-by-day,cc,2000,3,20000301,cc,/g/data/rt52/era5-1/pressure-levels/monthly-av...
3,era5-1,pressure-levels,monthly-averaged-by-day,cc,2000,4,20000401,cc,/g/data/rt52/era5-1/pressure-levels/monthly-av...
4,era5-1,pressure-levels,monthly-averaged-by-day,cc,2000,5,20000501,cc,/g/data/rt52/era5-1/pressure-levels/monthly-av...


`unique` shows all the unique values for each available attribute.

In [12]:
print(era5.unique()['sub_collection'])
print(era5.unique()['dataset'])
print(era5.unique()['product_type'])

{'count': 2, 'values': ['era5', 'era5-1']}
{'count': 2, 'values': ['single-levels', 'pressure-levels']}
{'count': 4, 'values': ['monthly-averaged-by-day', 'monthly-averaged', 'reanalysis', 'monthly-averaged-by-hour']}
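Since the catalogue is exposed as a DataFrame, the same kind of summary can also be built with ordinary pandas calls. The sketch below uses a small hand-made stand-in table (the rows are illustrative and not read from the catalogue) so it runs without access to the NCI filesystem; on the real catalogue you would presumably use `era5.df` in place of `df`.

```python
import pandas as pd

# Hypothetical stand-in for era5.df: made-up rows using the same column names.
df = pd.DataFrame(
    {
        "sub_collection": ["era5-1", "era5", "era5", "era5"],
        "dataset": ["pressure-levels", "single-levels", "single-levels", "pressure-levels"],
        "product_type": ["monthly-averaged-by-day", "reanalysis", "monthly-averaged", "reanalysis"],
    }
)

# For each attribute column, collect the number of distinct values and the sorted values.
summary = {
    col: {"count": df[col].nunique(), "values": sorted(df[col].unique())}
    for col in df.columns
}
print(summary["product_type"])
```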


`search` subsets the catalogue based on the selected attributes.<br>
NB. `range` is used here to build the list of years you want; its upper bound is exclusive, so 2001 stops at the year 2000.<br>

In [33]:
sub = era5.search(sub_collection='era5', dataset='single-levels', product_type='reanalysis',
                  parameter='10v', year=list(range(1979, 2001)))
sub

Unnamed: 0,unique
sub_collection,1
dataset,1
product_type,1
parameter,1
year,22
month,12
startdate,264
file_variable,1
path,264
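The counts in this summary can be sanity-checked by hand. The short sketch below assumes one file per month for the selected parameter, which is what the summary suggests (22 years, 12 months, 264 paths) rather than something the catalogue guarantees.

```python
# Years requested in the search: 1979 to 2000 inclusive (range excludes its stop value).
years = list(range(1979, 2001))

# Assumption read off the summary table: one file per month for the chosen parameter.
months_per_year = 12
expected_files = len(years) * months_per_year

print(years[0], years[-1], len(years), expected_files)
```

If the number of paths differs from this product for another search, some months are probably missing from the collection or several files exist per month.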


In [34]:
sub.df.head()

Unnamed: 0,sub_collection,dataset,product_type,parameter,year,month,startdate,file_variable,path
0,era5,single-levels,reanalysis,10v,1979,1,19790101,v10,/g/data/rt52/era5/single-levels/reanalysis/10v...
1,era5,single-levels,reanalysis,10v,1979,2,19790201,v10,/g/data/rt52/era5/single-levels/reanalysis/10v...
2,era5,single-levels,reanalysis,10v,1979,3,19790301,v10,/g/data/rt52/era5/single-levels/reanalysis/10v...
3,era5,single-levels,reanalysis,10v,1979,4,19790401,v10,/g/data/rt52/era5/single-levels/reanalysis/10v...
4,era5,single-levels,reanalysis,10v,1979,5,19790501,v10,/g/data/rt52/era5/single-levels/reanalysis/10v...
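Conceptually, the search behaves like a row filter on the underlying table: every attribute must match, and a list of values matches any of its members. The sketch below illustrates that idea with plain pandas on a hypothetical stand-in table (made-up rows, not catalogue data); it is not how the catalogue implements the search.

```python
import pandas as pd

# Hypothetical stand-in for era5.df with made-up rows.
df = pd.DataFrame(
    {
        "parameter": ["10v", "10v", "10u", "10v"],
        "product_type": ["reanalysis", "reanalysis", "reanalysis", "monthly-averaged"],
        "year": [1979, 2005, 1980, 1990],
    }
)

# Keep rows that match every attribute; the list of years matches any listed year.
mask = (
    (df["parameter"] == "10v")
    & (df["product_type"] == "reanalysis")
    & df["year"].isin(list(range(1979, 2001)))
)
subset = df[mask]
print(subset)
```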


`to_dataset_dict` aggregates the files and creates a dictionary containing the resulting datasets. The aggregation is preset in the catalogue definition; you can disable it by passing `aggregate=False` to the same method.<br>
To see the details of how the data is aggregated, use<br>
`era5.aggregation_info`

In [35]:
ds_dict = sub.to_dataset_dict()


--> The keys in the returned dictionary of datasets are constructed as follows:
	'sub_collection.dataset.product_type'


In [36]:
ds_dict.keys()

dict_keys(['era5.single-levels.reanalysis'])
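The key can be predicted from the search attributes. The sketch below simply joins the three attribute values with dots, following the pattern printed in the message above; the `attrs` and `key_fields` names are illustrative.

```python
# Attribute values used in the search above.
attrs = {
    "sub_collection": "era5",
    "dataset": "single-levels",
    "product_type": "reanalysis",
}

# Key pattern taken from the printed message: the three values joined with dots.
key_fields = ["sub_collection", "dataset", "product_type"]
key = ".".join(attrs[field] for field in key_fields)
print(key)
```

A search that spans several values of any of these three attributes would therefore be expected to return one dictionary entry per combination.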

In this case the result is one dataset so you can simply load it using the key.

In [39]:
ds = ds_dict['era5.single-levels.reanalysis']
ds

Unnamed: 0,Array,Chunk
Bytes,745.95 GiB,2.88 GiB
Shape,"(192864, 721, 1440)","(744, 721, 1440)"
Count,792 Tasks,264 Chunks
Type,float32,numpy.ndarray
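The numbers in this summary follow directly from the selection. As a quick check, the sketch below recomputes the number of hourly time steps for 1979 to 2000 and the size in GiB of the full array and of one 31-day chunk, assuming 4 bytes per float32 value.

```python
import calendar

# Hourly time steps from 1979 to 2000 inclusive, accounting for leap years.
n_days = sum(366 if calendar.isleap(y) else 365 for y in range(1979, 2001))
n_hours = n_days * 24

n_lat, n_lon = 721, 1440   # grid size read from the Shape row
bytes_per_value = 4        # float32

total_gib = n_hours * n_lat * n_lon * bytes_per_value / 2**30
# The chunk shown holds a 31-day month of hourly steps (31 * 24 = 744).
chunk_gib = 31 * 24 * n_lat * n_lon * bytes_per_value / 2**30

print(n_hours, round(total_gib, 2), round(chunk_gib, 2))
```

At roughly 746 GiB the full array will not fit in memory, so it is worth subsetting before computing anything.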


Now you can treat this like any other xarray Dataset, for example by selecting a region and a variable.

In [40]:
myregion = ds.sel(latitude=slice(0,-60),longitude=slice(100,180))

In [44]:
v10 = myregion['v10']
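Note that the latitude slice runs from 0 down to -60. That order suggests the latitude coordinate is stored from north to south, and label-based slices have to follow the coordinate order. The sketch below shows the behaviour on a small synthetic dataset (a coarse 10-degree grid filled with zeros, not ERA5 data).

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the grid: 10-degree spacing, latitude stored north to south.
lat = np.arange(90, -91, -10)
lon = np.arange(0, 360, 10)
demo = xr.Dataset(
    {"v10": (("latitude", "longitude"), np.zeros((lat.size, lon.size), dtype="float32"))},
    coords={"latitude": lat, "longitude": lon},
)

# Slice bounds follow the coordinate order: 0 down to -60 for a descending latitude.
region = demo.sel(latitude=slice(0, -60), longitude=slice(100, 180))
# Reversing the bounds on a descending coordinate selects nothing.
empty = demo.sel(latitude=slice(-60, 0))

print(region["v10"].shape, empty.sizes["latitude"])
```

If a selection unexpectedly comes back empty, checking the order of `ds.latitude` is a good first step.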