# Calculating differences between World Ocean Atlas 2023 and GFDL-MOM6-COBALT2 model outputs
**Author:** Denisse Fierro Arcos  
**Date:** 2024-10-03  
  
Cloud-optimised WOA23 data (i.e., `zarr` files produced in the [02P_WOA_netcdf_to_zarr.ipynb](02P_WOA_netcdf_to_zarr.ipynb) script) are regridded to match the GFDL-MOM6-COBALT2 grid (horizontally and vertically). The regridded WOA23 data are saved as `zarr` files for future use.

In [1]:
import xarray as xr
import os
from glob import glob
from dask.distributed import Client
import xesmf as xe
import pandas as pd

## Starting a cluster
This will allow us to automatically parallelise tasks on large datasets.

In [2]:
client = Client(threads_per_worker = 1)

Perhaps you already have a cluster running?
Hosting the HTTP server on port 43185 instead


In [2]:
base_folder = '/g/data/vf71/fishmip_inputs/ISIMIP3a/global_inputs/obsclim/025deg'
base_woa = '/g/data/vf71/WOA_data/global'
out_folder = os.path.join(base_folder, 'comp_clim_woa')
os.makedirs(out_folder, exist_ok = True)

## Loading GFDL model outputs
The GFDL outputs provide the target grid for the regridding process.

In [3]:
gfdl_temp = xr.open_zarr(
    os.path.join(out_folder, 
                 'gfdl-mom6-cobalt2_obsclim_global_clim_mean_temp_1981_2010.zarr')).thetao

## Loading temperature data from World Ocean Atlas (WOA)
We will interpolate depth levels in WOA to match GFDL outputs.

In [8]:
temp_woa = xr.open_zarr(
    os.path.join(base_woa, 
                 'woa23_clim_mean_temp_1981-2010.zarr/')).t_an
temp_woa = temp_woa.interp({'depth': gfdl_temp.depth.values})
temp_woa

| | Array | Chunk |
|---|---|---|
| Bytes | 138.43 MiB | 3.85 MiB |
| Shape | (35, 720, 1440) | (35, 120, 240) |
| Dask graph | 36 chunks in 11 graph layers | |
| Data type | float32 numpy.ndarray | |
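As a rough illustration of the depth interpolation step, the sketch below uses `numpy` instead of `xarray` on a single made-up temperature profile. All depth levels and temperatures are hypothetical and are not taken from WOA23 or GFDL. `DataArray.interp` is linear by default and returns `NaN` for targets outside the source range, which `np.interp` is asked to mimic here through `left` and `right`.

```python
import numpy as np

# Hypothetical source profile (not real WOA23 values)
woa_depth = np.array([0.0, 10.0, 20.0, 50.0])   # metres
woa_temp = np.array([20.0, 18.0, 16.0, 10.0])   # degrees Celsius

# Hypothetical target levels standing in for `gfdl_temp.depth.values`
gfdl_depth = np.array([5.0, 15.0, 35.0, 60.0])

# Linear interpolation; targets outside the source range become NaN
temp_on_gfdl = np.interp(gfdl_depth, woa_depth, woa_temp,
                         left=np.nan, right=np.nan)
print(temp_on_gfdl)
```

The last target level lies below the deepest source level, so it receives no interpolated value.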


## Calculating regridder

In [9]:
#Calculate regridder
reg = xe.Regridder(temp_woa, gfdl_temp, method = 'conservative')
reg



xESMF Regridder 
Regridding algorithm:       conservative 
Weight filename:            conservative_720x1440_720x1440.nc 
Reuse pre-computed weights? False 
Input grid shape:           (720, 1440) 
Output grid shape:          (720, 1440) 
Periodic in longitude?      False
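To give an intuition for what the `conservative` method does, here is a minimal one-dimensional sketch written with `numpy`. It is not how xESMF computes its weights (xESMF works with cell areas on the sphere); the function name `conservative_weights_1d`, the cell bounds and the values are all hypothetical. Each destination value is the overlap-weighted mean of the source cells it covers, so the integral of the field is preserved.

```python
import numpy as np

def conservative_weights_1d(src_bounds, dst_bounds):
    """Overlap-based weights mapping source cells to destination cells."""
    src_lo, src_hi = src_bounds[:-1], src_bounds[1:]
    dst_lo, dst_hi = dst_bounds[:-1], dst_bounds[1:]
    # Overlap length between each destination cell (rows)
    # and each source cell (columns)
    overlap = np.clip(
        np.minimum(dst_hi[:, None], src_hi[None, :])
        - np.maximum(dst_lo[:, None], src_lo[None, :]),
        0, None)
    # Normalise each row by the destination cell width
    return overlap / (dst_hi - dst_lo)[:, None]

# Hypothetical grids: four source cells, two destination cells
src_bounds = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
dst_bounds = np.array([0.0, 1.5, 4.0])
src_values = np.array([10.0, 20.0, 30.0, 40.0])

weights = conservative_weights_1d(src_bounds, dst_bounds)
dst_values = weights @ src_values

# Integrals (value times cell width) before and after regridding
src_integral = np.sum(src_values * np.diff(src_bounds))
dst_integral = np.sum(dst_values * np.diff(dst_bounds))
print(dst_values, src_integral, dst_integral)
```

The weight matrix plays the same role as the weights stored by the regridder: once calculated, it can be applied to any field defined on the same source grid.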

## Saving regridder 

In [None]:
reg.to_netcdf(os.path.join(base_woa, 'regridder.nc'))

### *Optional: Loading regridder*
The regridding weights only need to be calculated once. The saved weights can then be loaded from disk instead of being recalculated.

In [4]:
#Loading saved regridder weights
fn = xr.open_dataset(os.path.join(base_woa, 'regridder.nc'))

## Regridding temperature data

In [24]:
temp_woa_reg = reg(temp_woa, output_chunks = (144, 288))
temp_woa_reg.name = 'temperature'
temp_woa_reg

| | Array | Chunk |
|---|---|---|
| Bytes | 138.43 MiB | 5.54 MiB |
| Shape | (35, 720, 1440) | (35, 144, 288) |
| Dask graph | 25 chunks in 19 graph layers | |
| Data type | float32 numpy.ndarray | |
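The chunk count and sizes reported above follow directly from the array shape and the `output_chunks` argument. The short sketch below reproduces that arithmetic with the standard library only, assuming the depth dimension is kept as a single chunk, as the reported chunk shape suggests.

```python
import math

shape = (35, 720, 1440)    # depth, latitude, longitude
chunks = (35, 144, 288)    # depth kept whole, output_chunks = (144, 288)
itemsize = 4               # bytes per float32 value

# Number of chunks along each dimension, multiplied together
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

# Sizes in MiB (1 MiB = 2**20 bytes)
chunk_mib = math.prod(chunks) * itemsize / 2**20
total_mib = math.prod(shape) * itemsize / 2**20
print(n_chunks, round(chunk_mib, 2), round(total_mib, 2))
```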


## Loading salinity data from World Ocean Atlas (WOA)
We will interpolate depth levels in WOA to match GFDL outputs before applying the regridder calculated above.

In [37]:
salt_woa = xr.open_zarr(os.path.join(base_woa,
                                     'woa23_clim_mean_sal_1981-2010.zarr/')).s_an
salt_woa = salt_woa.interp({'depth': gfdl_temp.depth.values})
salt_woa_reg = reg(salt_woa, output_chunks = (144, 288))
salt_woa_reg.name = 'salinity'
salt_woa_reg

| | Array | Chunk |
|---|---|---|
| Bytes | 138.43 MiB | 5.54 MiB |
| Shape | (35, 720, 1440) | (35, 144, 288) |
| Dask graph | 25 chunks in 19 graph layers | |
| Data type | float32 numpy.ndarray | |


## Saving regridded files

In [41]:
temp_woa_reg.to_zarr(os.path.join(base_woa, 
                                  'regridded_woa_clim_mean_temp_1981-2010.zarr'),
                     consolidated = True, mode = 'w')

salt_woa_reg.to_zarr(os.path.join(base_woa, 
                                  'regridded_woa_clim_mean_sal_1981-2010.zarr'),
                     consolidated = True, mode = 'w')

This may cause some slowdown.
Consider scattering data ahead of time and using futures.


<xarray.backends.zarr.ZarrStore at 0x14c373311440>

## Regridding monthly climatologies
The regridding will be done with the original monthly datasets from WOA.

### Defining function to regrid and store data

In [33]:
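def reg_woa(file_woa, gfdl_da, varname, regridder, folder_out):
    #Load variable of interest from a monthly WOA netCDF file
    da = xr.open_dataset(file_woa, decode_times = False)[varname]
    #Interpolate WOA depth levels to match GFDL depth levels
    da = da.interp({'depth': gfdl_da.depth.values})
    #Regrid horizontally to match the GFDL grid
    da = regridder(da, output_chunks = (144, 288))
    da.name = varname
    #Save as zarr, reusing the original file name
    f_out = os.path.basename(file_woa).replace('.nc', '.zarr')
    da.to_zarr(os.path.join(folder_out, f_out), consolidated = True, mode = 'w')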

### Calculating regridder using saved weights

In [26]:
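#The saved weights were calculated from `temp_woa`, so it is used as the source grid
#(monthly WOA files are assumed to share this horizontal grid)
reg = xe.Regridder(temp_woa, gfdl_temp, method = 'conservative', weights = fn)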


### Creating folder to save regridded monthly temperature data

In [34]:
temp_out = os.path.join(base_woa, 'regridded_monthly_temp')
os.makedirs(temp_out, exist_ok = True)

### Getting list of monthly temperature files to be regridded

In [40]:
list_temp = sorted(glob(os.path.join(base_woa, 'temperature/*')))
list_temp = [f for f in list_temp if 't00' not in f]
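The filtering step can be tried on a small list without touching the file system. In the sketch below the paths and file names are hypothetical (real WOA23 file names may differ), and `t00` is assumed to mark the annual climatology, which is why it is excluded from the monthly files.

```python
import os

# Hypothetical file names; real WOA23 names may differ
files = ['/data/temperature/woa_t02_04.nc',
         '/data/temperature/woa_t00_04.nc',
         '/data/temperature/woa_t01_04.nc']

# Drop the annual climatology and keep the monthly files in order
monthly = [f for f in sorted(files) if 't00' not in f]

# Output store names, derived in the same way as in `reg_woa`
stores = [os.path.basename(f).replace('.nc', '.zarr') for f in monthly]
print(stores)
```

Note that the `'t00' not in f` test is applied to the full path, so a parent folder containing `t00` in its name would exclude every file.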

### Applying regridder

In [35]:
for f in list_temp:
    reg_woa(f, gfdl_temp, 't_an', reg, temp_out)

### Creating folder to save regridded monthly salinity data

In [61]:
salt_out = os.path.join(base_woa, 'regridded_monthly_sal')
os.makedirs(salt_out, exist_ok = True)

### Getting list of monthly salinity files to be regridded

In [64]:
list_salt = sorted(glob(os.path.join(base_woa, 'salinity/*')))
list_salt = [f for f in list_salt if 's00' not in f]

### Applying regridder

In [65]:
for f in list_salt:
    reg_woa(f, gfdl_temp, 's_an', reg, salt_out)

## Merging monthly temperature and salinity data into a single file per variable

In [75]:
sal_month = xr.open_mfdataset(sorted(glob(os.path.join(salt_out, '*')))).s_an
sal_month.name = 'salinity'
sal_month['time']  = pd.date_range(start = '1981-01-01', periods = 12,
                                   freq = 'MS').strftime('%B')
sal_month = sal_month.rename({'time': 'month'})
sal_month

| | Array | Chunk |
|---|---|---|
| Bytes | 3.24 GiB | 1.24 MiB |
| Shape | (12, 35, 720, 1440) | (1, 5, 90, 360) |
| Dask graph | 2688 chunks in 25 graph layers | |
| Data type | float64 numpy.ndarray | |
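The month labels assigned to the `time` coordinate can be inspected on their own. This small sketch repeats the `pd.date_range` call used above and assumes the default (English) locale for the `%B` format code.

```python
import pandas as pd

# Twelve month-start dates converted to full month names
months = pd.date_range(start='1981-01-01', periods=12,
                       freq='MS').strftime('%B')
print(list(months[:3]))
```

Because the labels are strings, sorting the `month` coordinate would order it alphabetically instead of chronologically, so the original order should be kept.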


In [77]:
temp_month = xr.open_mfdataset(sorted(glob(os.path.join(temp_out, '*')))).t_an
temp_month.name = 'temperature'
temp_month['time']  = pd.date_range(start = '1981-01-01', periods = 12,
                                    freq = 'MS').strftime('%B')
temp_month = temp_month.rename({'time': 'month'})
temp_month



| | Array | Chunk |
|---|---|---|
| Bytes | 3.24 GiB | 1.24 MiB |
| Shape | (12, 35, 720, 1440) | (1, 5, 90, 360) |
| Dask graph | 2688 chunks in 25 graph layers | |
| Data type | float64 numpy.ndarray | |


### Saving temperature and salinity files

In [76]:
sal_month.to_zarr(os.path.join(base_woa, 
                               'regridded_woa_month_clim_mean_sal_1981-2010.zarr'), 
                  consolidated = True, mode = 'w')

temp_month.to_zarr(os.path.join(base_woa, 
                                'regridded_woa_month_clim_mean_temp_1981-2010.zarr'), 
                  consolidated = True, mode = 'w')

<xarray.backends.zarr.ZarrStore at 0x14b9cfbb9640>