# `zarr` Conversion and Uploading

With our TRACMIP datasets uploaded to the GCS bucket, we can now download the files directly to our Lamont machine to perform the conversion. This takes a single, long-running command:

```
gsutil -m cp -r gs://pangeo-data/tracmip_temp/* /d3/charles/tracmip/
```

It is recommended to run this process in the background using either `bg` or `nohup`, as it **will** take hours. With the data downloaded, we can now convert and upload each _individual variable_ separately; this cuts down on the computational time that would otherwise be spent merging the different grids of each variable, and it prevents grid size conflicts.

In [None]:
import json
import zarr
import xarray as xr
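
The conversion loop in this notebook resumes from a status file, `converted.json`, that maps each local NetCDF path to a boolean. The notebook does not show how that file is first created, so the following is only a minimal sketch under stated assumptions: the helper names `build_status` and `write_status` are hypothetical, the files are assumed to end in `.nc`, and the keys are assumed to be the same path strings the loop later splits on `/`.

```python
import json
from pathlib import Path


def build_status(root, existing=None):
    """Map every NetCDF file under `root` to a 'converted' flag.

    Hypothetical helper: flags already present in `existing` are kept,
    so re-running it does not reset finished conversions.
    """
    existing = existing or {}
    status = {}
    # assumption: the downloaded files use the .nc extension
    for nc_file in sorted(Path(root).rglob("*.nc")):
        key = nc_file.as_posix()
        status[key] = existing.get(key, False)
    return status


def write_status(status, filename="converted.json"):
    """Write the status mapping in the format the conversion loop reads."""
    with open(filename, "w") as f:
        json.dump(status, f, indent=2)


# Example usage (the root directory is a placeholder):
# write_status(build_status("path/to/downloaded/tracmip"))
```

Because the loop picks path components by position, whatever root is passed here has to produce keys whose fourth through eighth `/`-separated components are the time, experiment, model, version, and file name.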

Once again, this process will take a long time, so we keep track of each variable's conversion status using a JSON file:

In [None]:
with open("converted.json", "r") as f:
    d = json.load(f)

for path in d:
    if not d[path]:
        
        # get GCS path from the components of the local file path
        parts    = path.split("/")
        time     = parts[3]
        exp      = parts[4]
        model    = parts[5]
        version  = parts[6]
        variable = parts[7].split("_")[0]  # variable name is the first "_"-separated token of the file name
        gcs_path = "gs://pangeo-data/tracmip/%s/%s/%s/%s/%s/" % (time, exp, model, variable, version)
        
        print(gcs_path)
                
        # open dataset lazily as dask arrays, then let dask pick the chunk size along time
        ds = xr.open_dataset(path, chunks={})
        ds[variable] = ds[variable].chunk({"time" : "auto"})
        
        # convert to zarr (Blosc with zstd at level 3; shuffle=2 selects the bit-shuffle filter)
        compressor = zarr.Blosc(cname='zstd', clevel=3, shuffle=2)
        encoding = {var: {'compressor': compressor} for var in ds.data_vars}
        ds.to_zarr("zarr/", mode="w", encoding=encoding, consolidated=True)
        
        # upload to bucket
        !/home/charles/google-cloud-sdk/bin/gsutil -m cp -r zarr/* zarr/.z* {gcs_path}
        
        # mark as uploaded and save immediately, so an interrupted run can resume here
        d[path] = True
        with open("converted.json", "w") as f:
            json.dump(d, f)
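
The path handling in the loop above can be checked on its own before starting a multi-hour run. The sketch below pulls the same indexing into a function; `gcs_path_for` is a hypothetical name, and the example path is built from placeholder components (`TIME`, `EXP`, and so on) rather than real TRACMIP file names.

```python
def gcs_path_for(path, bucket_prefix="gs://pangeo-data/tracmip"):
    """Build the destination prefix using the same indices as the loop above.

    Hypothetical helper: expects at least eight "/"-separated components,
    with time, experiment, model, version, and file name at positions 3 to 7.
    """
    parts = path.split("/")
    if len(parts) < 8:
        raise ValueError("expected at least 8 path components, got %d" % len(parts))
    time, exp, model, version = parts[3:7]
    variable = parts[7].split("_")[0]
    return "%s/%s/%s/%s/%s/%s/" % (bucket_prefix, time, exp, model, variable, version)


# Placeholder path, not a real file:
example = "x/y/z/TIME/EXP/MODEL/VERSION/VAR_rest_of_name.nc"
print(gcs_path_for(example))
# gs://pangeo-data/tracmip/TIME/EXP/MODEL/VAR/VERSION/
```

Note that the variable and version swap places between the local layout and the bucket layout, which is easy to miss when reading the format string alone.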