# Download model data using climetlab

----

This notebook shows how to use the climetlab data store to download model data from ECMWF, ECCC, and NCEP.

More information about climetlab, including which models and variables are available, can be found here:

https://github.com/ecmwf-lab/climetlab-s2s-ai-challenge

In [1]:
import climetlab as cml
import xarray as xr
xr.set_options(keep_attrs=True)

from dask.distributed import Client
import dask.config
dask.config.set({"array.slicing.split_large_chunks": False})

<dask.config.set at 0x2b47b2e49c70>

In [2]:
client = Client("tcp://10.12.206.54:34204")

## This command shows the climetlab settings.

Depending on how much data you are downloading, you may want to change `cache-directory`, `maximum-cache-size`, and `number-of-download-threads`. The cells after the settings output show a few examples.

In [3]:
cml.settings

| Setting | Current value | Default value |
|---|---|---|
| cache-directory | '/glade/scratch/jaye/climetlab-jaye' | '/glade/scratch/jaye/climetlab-jaye' |
| datasets-directories | ['/glade/u/home/jaye/.climetlab/datasets'] | ['/glade/u/home/jaye/.climetlab/datasets'] |
| layers-directories | ['/glade/u/home/jaye/.climetlab/layers'] | ['/glade/u/home/jaye/.climetlab/layers'] |
| maximum-cache-size | '1800GB' | '10GB' |
| number-of-download-threads | 53 | 5 |
| plotting-options | {} | {} |
| projections-directories | ['/glade/u/home/jaye/.climetlab/projections'] | ['/glade/u/home/jaye/.climetlab/projections'] |
| styles-directories | ['/glade/u/home/jaye/.climetlab/styles'] | ['/glade/u/home/jaye/.climetlab/styles'] |


In [4]:
# Set a large maximum cache size so the cache does not delete downloaded files
#cml.settings.set("maximum-cache-size", "1800GB")

In [5]:
# Increase the number of parallel downloads (default: 5)
#cml.settings.set("number-of-download-threads", 53)

## What are you downloading?

Choose the variable, model, and pressure level (if applicable) here. The notebook downloads one variable at a time, so rerun it for each additional variable you need.

In [6]:
var = 'v'
model = 'eccc'
plev = 850

In [7]:
ds = cml.load_dataset('s2s-ai-challenge-training-input', origin=model, parameter=var, format='netcdf').to_xarray()

By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data. 


  0%|          | 0/53 [00:00<?, ?it/s]

distributed.client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client
ERROR:asyncio:_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=CancelledError()>
asyncio.exceptions.CancelledError
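Because the notebook loads one variable per run, one way to gather several variables is to wrap the load call in a loop. The following is only a sketch under stated assumptions: `load_variables` and `climetlab_loader` are hypothetical helper names (not part of climetlab), and the loader simply repeats the `cml.load_dataset` arguments used in the cell above.

```python
def load_variables(variables, model, loader):
    """Load each variable separately and return a dict keyed by variable name."""
    datasets = {}
    for name in variables:
        # One call per variable, mirroring the single-variable workflow.
        datasets[name] = loader(name, model)
    return datasets


def climetlab_loader(parameter, origin):
    """Hypothetical wrapper around the load call used in this notebook."""
    # Imported lazily so the helper above can be used without climetlab installed.
    import climetlab as cml

    return cml.load_dataset(
        "s2s-ai-challenge-training-input",
        origin=origin,
        parameter=parameter,
        format="netcdf",
    ).to_xarray()


# Usage (commented out because it triggers a large download):
# datasets = load_variables(["u", "v"], "eccc", climetlab_loader)
```

Passing the loader in as an argument keeps the loop independent of climetlab, so it can be tried with a stand-in function before starting a real download.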


- Rename the variables and dimensions to the names climpred expects, and drop coordinates we don't need.

In [9]:
names = {"realization": "member", "forecast_time": "init", "lead_time": "lead", "latitude": "lat", "longitude": "lon"}
ds = ds.rename(names).drop_vars("valid_time")

In [11]:
if var in ('gh', 'u', 'v'):
    # Select the requested pressure level and tag the variable name with it, e.g. v_850
    ds = ds.sel(plev=plev).drop_vars("plev").rename({var: f"{var}_{plev}"})
elif var == "ttr":
    # Rename ttr to olr and drop the unused nominal_top coordinate
    ds = ds.rename({"ttr": "olr"}).drop_vars("nominal_top")
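The naming convention applied in the cell above (pressure-level variables get a `_<plev>` suffix, `ttr` becomes `olr`, everything else keeps its name) can be written as a small pure-Python helper. This is an illustrative sketch only; `output_name` and `PRESSURE_LEVEL_VARS` are hypothetical names introduced here, not part of the notebook or of climetlab.

```python
# Variables that the notebook subsets to a single pressure level.
PRESSURE_LEVEL_VARS = {"gh", "u", "v"}


def output_name(var, plev=None):
    """Return the variable name used after renaming, e.g. 'v' at 850 -> 'v_850'."""
    if var in PRESSURE_LEVEL_VARS:
        if plev is None:
            raise ValueError(f"{var!r} needs a pressure level")
        return f"{var}_{plev}"
    if var == "ttr":
        # The notebook stores ttr under the name olr.
        return "olr"
    # Any other variable keeps its original name.
    return var


print(output_name("v", 850))
print(output_name("ttr"))
```

A helper like this makes it easy to check which variable name a given `var` and `plev` pair will end up under in the zarr store.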

In [12]:
ds

| | Array | Chunk |
|---|---|---|
| Bytes | 14.68 GiB | 283.59 MiB |
| Shape | (4, 1060, 32, 121, 240) | (4, 20, 32, 121, 240) |
| Count | 212 Tasks | 53 Chunks |
| Type | float32 | numpy.ndarray |
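The sizes reported for the array can be reproduced with simple arithmetic: multiply the shape entries together and by 4 bytes per `float32` value. The sketch below uses only the shapes shown in the output; the variable names are illustrative.

```python
from math import prod

FLOAT32_BYTES = 4  # each float32 value occupies 4 bytes

array_shape = (4, 1060, 32, 121, 240)
chunk_shape = (4, 20, 32, 121, 240)

# Total size in GiB (2**30 bytes) and per-chunk size in MiB (2**20 bytes).
array_gib = prod(array_shape) * FLOAT32_BYTES / 2**30
chunk_mib = prod(chunk_shape) * FLOAT32_BYTES / 2**20

# Only the second axis is split (1060 entries in chunks of 20).
n_chunks = array_shape[1] // chunk_shape[1]

print(f"array: {array_gib:.2f} GiB")  # array: 14.68 GiB
print(f"chunk: {chunk_mib:.2f} MiB")  # chunk: 283.59 MiB
print(f"chunks: {n_chunks}")          # chunks: 53
```

The 53 chunks match the 53 items in the download progress bar, which suggests one chunk per downloaded file.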


In [13]:
ds.lead

In [14]:
# Each model provides a different range of lead days
if model == 'ecmwf':
    ds = ds.sel(lead=slice("1 days", "47 days"))
elif model == 'ncep':
    ds = ds.sel(lead=slice("2 days", "44 days"))
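The per-model lead ranges can also be kept in a lookup table, which avoids adding another `elif` branch for each model. This is a sketch, not the notebook's method: `LEAD_WINDOWS` and `lead_slice` are hypothetical names, and the windows are copied from the cell above. Models without an entry (such as `eccc` here) get `slice(None)`, which keeps every lead when passed to `ds.sel`.

```python
# Lead windows taken from the notebook cell; other models are left untrimmed.
LEAD_WINDOWS = {
    "ecmwf": ("1 days", "47 days"),
    "ncep": ("2 days", "44 days"),
}


def lead_slice(model):
    """Return the slice of lead times to keep for a given model."""
    window = LEAD_WINDOWS.get(model)
    if window is None:
        # No trimming configured: select the full lead axis.
        return slice(None)
    start, stop = window
    return slice(start, stop)


# Usage with the dataset from this notebook:
# ds = ds.sel(lead=lead_slice(model))
print(lead_slice("ncep"))
```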

In [15]:
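ds = ds.sortby('init')
# Keep init and lead in one chunk each; let dask choose chunk sizes for the other dimensions
ds = ds.chunk({'init': -1, 'lead': -1, 'lon': 'auto', 'lat': 'auto', 'member': 'auto'})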



## Write to zarr

In [16]:
# Use mode='w' to create a new zarr store, or mode='a' to add a variable to an existing one
%time ds.to_zarr('/glade/campaign/mmm/c3we/jaye/S2S_zarr/ECCC.uvolr.raw.daily.geospatial.zarr', mode="a", consolidated=True)
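Choosing between `mode='w'` and `mode='a'` by hand is easy to get wrong when the notebook is rerun for several variables. One option is to pick the mode from whether the store already exists. The helper below is a hypothetical sketch (`zarr_write_mode` is not an xarray or zarr function), demonstrated on a temporary directory instead of the real store path.

```python
import tempfile
from pathlib import Path


def zarr_write_mode(store_path):
    """Return 'a' if the store directory already exists, otherwise 'w'."""
    return "a" if Path(store_path).exists() else "w"


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "example.zarr"
    mode_before = zarr_write_mode(store)  # store not created yet
    store.mkdir()
    mode_after = zarr_write_mode(store)   # store now exists

print(mode_before, mode_after)  # w a

# Usage with the dataset from this notebook (path is a placeholder):
# ds.to_zarr(path, mode=zarr_write_mode(path), consolidated=True)
```

Note that this only checks for the directory; it does not verify that the existing store has compatible dimensions and coordinates, so appending can still fail if they differ.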

CPU times: user 478 ms, sys: 34 ms, total: 512 ms
Wall time: 51.4 s


<xarray.backends.zarr.ZarrStore at 0x2af174599b20>