# Calculating mean climatology for GFDL data (3D fields)
**Author:** Denisse Fierro Arcos  
**Date:** 2024-09-09  
  
This notebook calculates mean climatological conditions within the boundaries of FishMIP regional models using GFDL-MOM6-COBALT2 model outputs. Only variables that include multiple depth bins are processed here. The resulting climatologies are shown as maps in the Shiny app.

## Loading libraries

In [24]:
import xarray as xr
import pandas as pd
import os
from glob import glob
from dask.distributed import Client

## Starting cluster

In [26]:
client = Client(threads_per_worker = 1)
client

Connection method: Cluster object, Cluster type: distributed.LocalCluster
Dashboard: /proxy/8787/status
Workers: 14, Total threads: 14, Total memory: 63.00 GiB
Status: running, Using processes: True
Each worker: 1 thread, 4.50 GiB memory


## Defining basic variables

In [98]:
#Location of zarr files
base_dir = '/g/data/vf71/fishmip_inputs/ISIMIP3a/regional_inputs/obsclim/025deg'

#Get list of zarr files
zarr_list = glob(os.path.join(base_dir, 'download_data/*zarr'))

#Folder where mean climatologies with all data will be saved
base_out_maps = os.path.join(base_dir, 'maps_data')
os.makedirs(base_out_maps, exist_ok = True)

#Folder where mean climatologies for comparison will be saved
base_out_comp = os.path.join(base_out_maps, 'comp_clim')
os.makedirs(base_out_comp, exist_ok = True)
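
For orientation, the function defined in the next section names each output after its input store: it swaps `monthly` for a climatology label, `zarr` for `parquet`, and, when a custom period is requested, the years in the name. The sketch below mirrors those renaming steps. The helper `output_name`, its `year_swap` argument, and the file name are all made up for illustration; the real store names are not shown in this notebook.

```python
import os


def output_name(zarr_path, monthly=False, year_swap=None):
    """Hypothetical helper mirroring the renaming steps used in calc_clim."""
    label = 'mthly_clim_mean' if monthly else 'climatological_mean'
    # Swap the temporal resolution label and the file extension
    name = os.path.basename(zarr_path).replace('monthly', label)
    name = name.replace('zarr', 'parquet')
    # Optionally swap years in the name, e.g. {1961: 1981}
    for old, new in (year_swap or {}).items():
        name = name.replace(str(old), str(new))
    return name


# Made-up store name, assumed to contain 'monthly' and the years covered
demo_store = '/some/folder/demo-var_monthly_1961_2010.zarr'

full_period = output_name(demo_store)
ref_period = output_name(demo_store, monthly=True, year_swap={1961: 1981})

print(full_period)  # demo-var_climatological_mean_1961_2010.parquet
print(ref_period)   # demo-var_mthly_clim_mean_1981_2010.parquet
```

Because the renaming relies on plain string replacement, it assumes that `monthly`, `zarr` and the year strings each appear only where expected in the store name.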

## Defining function to calculate climatologies

In [65]:
def calc_clim(file_path, path_out, monthly = False, **kwargs):
    '''
    Open zarr files and calculate climatologies.
    
    Inputs:
    file_path (character): Full file path where data is stored
    path_out (character): Full path to the folder where climatologies should
    be stored
    monthly (boolean): Default is False. If set to True, monthly climatology is
    calculated
    min_year (integer): Optional. First year to be included in climatology
    max_year (integer): Optional. Last year to be included in climatology
    '''

    #Get base file path
    if monthly:
        base_file = os.path.basename(file_path).replace('monthly', 
                                                        'mthly_clim_mean')
    else:
        base_file = os.path.basename(file_path).replace('monthly', 
                                                        'climatological_mean')
    base_file = base_file.replace('zarr', 'parquet')
        
    #Load file
    ds = xr.open_zarr(file_path)
    #Get name of variable
    [var] = list(ds.data_vars)
    ds = ds[var]

    #Save attributes
    ds_attrs = pd.DataFrame([ds.attrs])
    
    #Get years included in dataset
    years = pd.unique(ds.time.dt.year.data)

    #Check start year is later or equal to first year in data
    if 'min_year' in kwargs.keys():
        min_year = kwargs.get('min_year')
        if min_year < min(years):
            print('"min_year" must be later or equal to the first year '+
                   'included in the data. Calculating mean values from ' +
                   str(min(years)))
            min_year = str(min(years))
        else:
            print('Calculating mean values from ' + str(min_year))
            min_year = str(min_year)
            base_file = base_file.replace(str(min(years)), min_year)
    else:
        min_year = str(min(years))
    #Check end year is earlier or equal to last year in data
    if 'max_year' in kwargs.keys():
        max_year = kwargs.get('max_year')
        if max_year > max(years):
            print('"max_year" must be earlier or equal to the last year '+
                   'included in the data. Calculating mean values up to ' +
                   str(max(years)))
            max_year = str(max(years))
        else:
            print('Calculating mean values from ' + str(max_year))
            max_year = str(max_year)
            base_file = base_file.replace(str(max(years)), max_year)
    else:
        max_year = str(max(years))

    #Filter data 
    ds = ds.sel(time = slice(min_year, max_year))

    #Calculate climatology
    if monthly:
        ds_clim = ds.groupby('time.month').mean('time')
        ind_wider = ['lat', 'lon', 'depth_bin_m', 'month', 'vals']
    else:
        ds_clim = ds.mean('time')
        ind_wider = ['lat', 'lon', 'depth_bin_m', 'vals']

    #Turn extracted data into data frame and remove rows with NA values
    df = ds_clim.to_series().to_frame().reset_index().dropna()
    #Changing column name to standardise across variables
    df = df.rename(columns = {ds.name: 'vals'}).reset_index(drop = True)
    #Reorganise data
    df = df[ind_wider]
    #Include original dataset attributes
    df = pd.concat([df, ds_attrs], axis = 1)
    #Saving data frame
    df.to_parquet(os.path.join(path_out, base_file))
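
To make the two kinds of climatology concrete, here is a minimal sketch on synthetic data. The variable name `demo_var`, the coordinate values and the `units` attribute are invented for illustration; only the dimension names and the sequence of operations follow `calc_clim`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly data: 24 months x 2 depth bins x 1 lat x 1 lon.
# Values equal the time step index (0..23) in every depth bin.
time = pd.date_range('2000-01-01', periods=24, freq='MS')
vals = np.arange(24, dtype=float).reshape(24, 1, 1, 1) * np.ones((24, 2, 1, 1))
da = xr.DataArray(
    vals,
    dims=['time', 'depth_bin_m', 'lat', 'lon'],
    coords={'time': time, 'depth_bin_m': [5.0, 15.0],
            'lat': [-60.0], 'lon': [100.0]},
    name='demo_var',
    attrs={'units': 'demo units'},
)

# Mean over the whole period: one value per grid cell and depth bin
clim_all = da.mean('time')

# Monthly climatology: one value per calendar month, grid cell and depth bin
clim_month = da.groupby('time.month').mean('time')

# Long-format data frame with a standardised value column
df = clim_month.to_series().to_frame().reset_index().dropna()
df = df.rename(columns={da.name: 'vals'}).reset_index(drop=True)
df = df[['lat', 'lon', 'depth_bin_m', 'month', 'vals']]

# Attributes form a one-row data frame, so after a column-wise concat
# they are filled in the first row only; all other rows hold NaN
attrs = pd.DataFrame([da.attrs])
out = pd.concat([df, attrs], axis=1)
```

One consequence of this layout is that anything reading the saved file should take the attributes from the first row, and can drop the attribute columns before working with the values.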

In [100]:
for f in zarr_list:
    calc_clim(f, base_out_maps)
    calc_clim(f, base_out_comp, monthly = True, min_year = 1981, max_year = 2010)

Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating mean values from 2010
Calculating mean values from 1981
Calculating me