# Uploading data to `pangeo-data` bucket

An example of uploading a `zarr` dataset to the Pangeo bucket on Google Cloud.

Author: Charles Blackmon-Luca

## Getting started

We start by importing the necessary packages:

In [1]:
import xarray as xr

Note that we are using a development version of `xarray`:

In [2]:
print(xr.__version__)

0.11.1+64.g612d390


We connect a client to a running `dask` cluster:

In [3]:
from dask.distributed import Client

client = Client("tcp://127.0.0.1:37511")
client

| Client | Cluster |
| --- | --- |
| Scheduler: tcp://127.0.0.1:37511 | Workers: 4 |
| Dashboard: http://127.0.0.1:8787/status | Cores: 16 |
| | Memory: 135.44 GB |


## Google Cloud Authentication

Once we have converted a netCDF4 dataset to `zarr`, we want to upload it to the Pangeo cloud bucket located at `gs://pangeo-data/`. First we generate a sample `zarr` dataset:

In [6]:
!rm -rf /data2/tracmip/zarr/test/

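# Open all monthly netCDF files as one dataset
monthly = xr.open_mfdataset('/data2/tracmip/ECHAM-6.3/LandOrbit/Amon/*.nc')
# Let dask pick chunk sizes along the time and pressure-level dimensions
monthly = monthly.chunk(chunks={'time': 'auto', 'plev': 'auto'})
# Write the zarr store with consolidated metadata
monthly.to_zarr('/data2/tracmip/zarr/test/', consolidated=True)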

!ls /data2/tracmip/zarr/test/

cl     clwvi	hus   prc   rlds    rsds    rsut     tauu  uas
cli    evspsbl	lat   prsn  rldscs  rsdscs  rsutcs   tauv  va
clivi  hfls	lon   prw   rlus    rsdt    sfcWind  time  vas
clt    hfss	plev  ps    rlut    rsus    ta	     ts    wap
clw    hur	pr    psl   rlutcs  rsuscs  tas      ua    zg
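
Because the store was written with `consolidated=True`, a quick sanity check before uploading is to confirm that the consolidated metadata file exists and lists the expected variables. The following is a minimal sketch using only the standard library; it assumes the zarr v2 layout, in which a `.zmetadata` JSON file sits at the store root and holds one `<name>/.zarray` key per array. The helper name `list_consolidated_arrays` is hypothetical and not part of `zarr` or `xarray`.

```python
import json
from pathlib import Path


def list_consolidated_arrays(store_path):
    """Return sorted array names recorded in a store's consolidated metadata.

    Assumes the zarr v2 layout: a `.zmetadata` JSON file at the store root
    whose "metadata" mapping has one "<name>/.zarray" key per array.
    """
    meta_file = Path(store_path) / ".zmetadata"
    if not meta_file.is_file():
        raise FileNotFoundError(f"no consolidated metadata at {meta_file}")
    with meta_file.open() as f:
        metadata = json.load(f)["metadata"]
    suffix = "/.zarray"
    # Strip the "/.zarray" suffix to recover each array's name
    return sorted(key[: -len(suffix)] for key in metadata if key.endswith(suffix))
```

If the write completed, calling `list_consolidated_arrays('/data2/tracmip/zarr/test/')` should return the same variable and coordinate names as the directory listing above.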


Once we have our `zarr` dataset, we must get our Google Cloud credentials by running the following in the terminal:
```bash
gcloud auth login
```
This directs us to a login page, which then provides the verification code we need to receive credentials. Once this is done, we can upload our data to `gs://pangeo-data`:

In [None]:
!gsutil -m cp -r /data2/tracmip/zarr/test/ gs://pangeo-data/

The upload can take some time, and for large datasets it may be advantageous to run it in the background.
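
One way to run the upload in the background from Python is to launch the same `gsutil` command with `subprocess.Popen` and redirect its output to a log file. This is only a sketch: the helper names `build_upload_command` and `run_in_background` and the log file name `upload.log` are hypothetical, and it assumes `gsutil` is on the `PATH` and that `gcloud auth login` has already been completed.

```python
import subprocess


def build_upload_command(source, bucket="gs://pangeo-data/"):
    """Build the gsutil invocation used above as an argument list."""
    return ["gsutil", "-m", "cp", "-r", source, bucket]


def run_in_background(command, log_path):
    """Start `command` without blocking and send its output to `log_path`.

    Returns the Popen handle so the caller can check on it later.
    """
    with open(log_path, "w") as log_file:
        # The child process keeps its own handle to the log file,
        # so the parent can close its copy right after launching.
        process = subprocess.Popen(
            command, stdout=log_file, stderr=subprocess.STDOUT
        )
    return process


# Example usage (not executed here; 'upload.log' is a hypothetical file name):
# process = run_in_background(
#     build_upload_command('/data2/tracmip/zarr/test/'), 'upload.log'
# )
# process.poll()  # None while the upload is still running
```

Passing the command as a list avoids shell quoting issues, and the log file keeps the output of `gsutil` available for inspection after the notebook cell has returned.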