# Extracting WOA23 data using FishMIP regional model boundaries
**Author:** Tormey Reimer  
**Date:** 2024-08-27  
**Edited by:** Denisse Fierro Arcos  
**Last edited on:** 2024-09-04  
  
This script takes the original WOA files in `netcdf` format, downloaded by the [01R_download_WOA_data.R](01R_download_WOA_data.R) script, and converts them to cloud-optimised `zarr` files.  

## Setting working directory
Remember to change the working directory below to the location of the scripts on your own local machine. Update the `your_path` variable below before continuing with the next chunk.

In [2]:
your_path = ''

In [3]:
import os
os.chdir(os.path.join(your_path, 'processing_WOA_data/scripts'))
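
If `your_path` is left empty or points to the wrong place, `os.chdir` fails with a fairly terse error. The sketch below shows one way to check the folder first and report a clearer message. The `scripts_dir` helper is hypothetical (it is not part of this repository), and the temporary folder only stands in for `your_path`.

```python
import os
import tempfile

def scripts_dir(base_path, subdir='processing_WOA_data/scripts'):
    """Return the scripts folder under base_path, or raise a clear error."""
    target = os.path.join(base_path, subdir)
    if not os.path.isdir(target):
        raise FileNotFoundError(
            f'Scripts folder not found: {target}. Check `your_path`.')
    return target

# Demonstration with a temporary folder standing in for `your_path`
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, 'processing_WOA_data', 'scripts'))
found = scripts_dir(demo_root)
print(found)
```

In the notebook this could be used as `os.chdir(scripts_dir(your_path))`.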

## Loading libraries
We will load published Python libraries as well as our custom-made `useful_functions` library.

In [4]:
from dask.distributed import Client
from glob import glob
import useful_functions as uf

## Starting a cluster
This will allow us to automatically parallelise tasks on large datasets.

In [5]:
client = Client(threads_per_worker = 1)

## Defining basic variables

In [6]:
# Defining path to WOA_data directory
WOA_path = '/g/data/vf71/WOA_data/global/'

# Get a list of all temperature files in the WOA directory
WOA_temp = sorted(glob(os.path.join(WOA_path, '*/*_t*.nc')))
# Get a list of all salinity files in the WOA directory
WOA_sal = sorted(glob(os.path.join(WOA_path, '*/*_s*.nc')))
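
To illustrate what the patterns above match, here is a minimal sketch that builds a throwaway directory and applies the same `glob` patterns to it. The subfolder and file names are assumptions for illustration, loosely modelled on the WOA23 naming scheme. They are not taken from the actual `WOA_data` directory, and `list_woa_files` is a hypothetical helper.

```python
import os
import tempfile
from glob import glob

def list_woa_files(base_dir, var_letter):
    """Return sorted paths of netCDF files one folder below base_dir whose
    name contains '_<var_letter>' (e.g. '_t' for temperature)."""
    pattern = os.path.join(base_dir, '*', f'*_{var_letter}*.nc')
    return sorted(glob(pattern))

# Hypothetical layout, for illustration only
demo_dir = tempfile.mkdtemp()
for folder, name in [('temperature', 'woa23_decav_t00_04.nc'),
                     ('temperature', 'woa23_decav_t01_04.nc'),
                     ('salinity', 'woa23_decav_s00_04.nc')]:
    os.makedirs(os.path.join(demo_dir, folder), exist_ok=True)
    # Create an empty placeholder file
    open(os.path.join(demo_dir, folder, name), 'w').close()

demo_temp = [os.path.basename(f) for f in list_woa_files(demo_dir, 't')]
demo_sal = [os.path.basename(f) for f in list_woa_files(demo_dir, 's')]
print(demo_temp)
print(demo_sal)
```

The first `*` matches exactly one folder level, so files sitting directly in the base directory, or nested more deeply, are not picked up.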

## Processing monthly WOA data
These files contain the month as two digits (`01` to `12`) in the filename. For more information, refer to the [WOA23 product documentation](https://www.ncei.noaa.gov/data/oceans/woa/WOA23/DOCUMENTATION/WOA23_Product_Documentation.pdf).

In [None]:
# Selecting monthly data for temperature
WOA_temp_month = [f for f in WOA_temp if '00' not in f]
temp_out = os.path.join(WOA_path, 'woa23_month_clim_mean_temp_1981-2010.zarr')
uf.netcdf_to_zarr(WOA_temp_month, 't_an', temp_out)

# Selecting monthly data for salinity
WOA_sal_month = [f for f in WOA_sal if '00' not in f]
sal_out = os.path.join(WOA_path, 'woa23_month_clim_mean_sal_1981-2010.zarr')
uf.netcdf_to_zarr(WOA_sal_month, 's_an', sal_out)
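
The substring test above works as long as `00` appears nowhere else in the file path. A stricter alternative, sketched below, reads the two-digit time code directly from the file name. It assumes file names of the form `woa23_<decade>_<variable><code>_<grid>.nc`, with `00` for the annual climatology and `01` to `12` for months. Both `woa_time_code` and `monthly_files` are hypothetical helpers and are not part of `useful_functions`.

```python
import os
import re

# Assumed layout: woa23_<decade>_<variable letter><2-digit code>_<grid>.nc
TIME_CODE = re.compile(r'_[a-zA-Z](\d{2})_[0-9a-z]+\.nc$')

def woa_time_code(path):
    """Return the two-digit time code in a WOA file name as an int,
    or None if the name does not follow the assumed layout."""
    match = TIME_CODE.search(os.path.basename(path))
    return int(match.group(1)) if match else None

def monthly_files(paths):
    """Keep only files whose time code is a calendar month (1 to 12)."""
    return [p for p in paths if woa_time_code(p) in range(1, 13)]

# Hypothetical paths: the '00' in the folder name would confuse
# a plain substring test on the full path
example = ['/data/run_2000/woa23_decav_t00_04.nc',
           '/data/run_2000/woa23_decav_t01_04.nc',
           '/data/run_2000/woa23_decav_t12_04.nc']
print(monthly_files(example))
```

Because only the base name is inspected, digits in the directory path do not affect the result.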

## Processing climatological mean over entire period
These files can be identified because they include `00` in their filename.

In [None]:
# Climatological temperature mean
# A new variable name keeps the full file list intact if cells are re-run
WOA_temp_clim = [f for f in WOA_temp if '00' in f]
temp_out = os.path.join(WOA_path, 'woa23_clim_mean_temp_1981-2010.zarr')
uf.netcdf_to_zarr(WOA_temp_clim, 't_an', temp_out)
# Number of observations (temperature)
obs_temp_out = os.path.join(WOA_path, 'woa23_number_obs_temp_1981-2010.zarr')
uf.netcdf_to_zarr(WOA_temp_clim, 't_dd', obs_temp_out)

# Climatological salinity mean
WOA_sal_clim = [f for f in WOA_sal if '00' in f]
sal_out = os.path.join(WOA_path, 'woa23_clim_mean_sal_1981-2010.zarr')
uf.netcdf_to_zarr(WOA_sal_clim, 's_an', sal_out)
# Number of observations (salinity)
obs_sal_out = os.path.join(WOA_path, 'woa23_number_obs_salt_1981-2010.zarr')
uf.netcdf_to_zarr(WOA_sal_clim, 's_dd', obs_sal_out)