# *\*Reference Only\**: Compute seasonal and annual means for 43 years of data from 1979 to 2022, and save them locally as NetCDF files

#### Information: The code in this section is for reference only because the computations take a long time to run, at ~6 minutes per year (over 4 hours for the full 43 years of data). This code has already been run and the NetCDF files have been saved, so those files can be read in and used as pre-computed datasets in later parts of this notebook.

#### Data source: CFSR https://climatedataguide.ucar.edu/climate-data/climate-forecast-system-reanalysis-cfsr
- Local copies of CFSR datasets were used in this notebook
- Parent directory of local datasets: /cfsr/data/

#### Referenced notebooks: 
- UAlbany ATM622 computing-seasonal.ipynb

#### Notes (may remove for final version): 
- As in the notebook UAlbany ATM622 observed-circulation.ipynb, we used the pre-computed 30-year seasonal and monthly climatologies (computed in computing-seasonal.ipynb, which we did not re-run because it takes a while and its output was saved to disk after being run once)
    - But note: these are climatologies by month and season, so if we need data by year, we cannot use this dataset since year has been averaged out
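The distinction in the note above can be illustrated with a small sketch on synthetic data. The array `da`, its values and the two-year span are made up for illustration and are not CFSR data. Grouping by season across all years leaves only a `season` dimension, while computing the seasonal mean separately for each year keeps the year information.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series covering two years (illustrative values only, not CFSR data)
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
da = xr.DataArray(
    np.arange(time.size, dtype="float64"),
    coords={"time": time},
    dims="time",
    name="t",
)

# Climatology-style mean: every year is pooled, so the result has no year axis
clim = da.groupby("time.season").mean()

# Per-year seasonal means: compute each year separately, then stack along 'year'
per_year = xr.concat(
    [da.sel(time=str(yr)).groupby("time.season").mean() for yr in (2000, 2001)],
    dim=pd.Index([2000, 2001], name="year"),
)

print(clim.dims)      # only the season dimension remains
print(per_year.dims)  # year and season are both available
```

This is why the per-year files produced in this section are needed whenever an analysis has to distinguish individual years.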
 

In [1]:
import xarray as xr
import datetime #remove for final version
import os

In [4]:
ds = xr.open_mfdataset('/nfs/roselab_rit/data/cfsr_climatology/t.seas_clim.0p5.nc', chunks={'time':30*4, 'lev': 4}, parallel=True)


In [5]:
ds

| | Array | Chunk |
|---|---|---|
| Bytes | 126.91 MiB | 15.86 MiB |
| Shape | (4, 32, 361, 720) | (4, 4, 361, 720) |
| Count | 2 Graph Layers | 8 Chunks |
| Type | float32 | numpy.ndarray |
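The sizes in the output above follow directly from the array shape and dtype. As a quick check, here is a sketch of the arithmetic, assuming 4 bytes per float32 element; the helper name `size_mib` is hypothetical.

```python
from math import prod

BYTES_PER_FLOAT32 = 4

def size_mib(shape):
    """Size in MiB of a float32 array with the given shape (hypothetical helper)."""
    return prod(shape) * BYTES_PER_FLOAT32 / 2**20

full_shape = (4, 32, 361, 720)   # whole array, as reported in the output above
chunk_shape = (4, 4, 361, 720)   # one dask chunk (4 levels at a time)

print(f"array: {size_mib(full_shape):.2f} MiB")
print(f"chunk: {size_mib(chunk_shape):.2f} MiB")
print(f"chunks: {prod(full_shape) // prod(chunk_shape)}")
```

Splitting the 32 levels into blocks of 4 is what gives the 8 chunks reported by dask.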


#### Define functions used when looping over years for a given group mean (e.g., seasonal, annual) and variable (e.g., temperature, zonal wind)

In [2]:
def make_dir(path):
    """ 
    Input directory path as string
    Creates the directory if it doesn't already exist
    """ 
    if not os.path.exists(path):
        os.makedirs(path)
        
def open_ds(yr):
    """ Open one year of CFSR data lazily using dask; reads the global `var` for the variable name """
    ds = xr.open_mfdataset(f'/network/daes/cfsr/data/{yr}/{var}.{yr}.0p5.anl.nc', chunks={'time':30*4, 'lev': 4}, parallel=True)
    return ds

def compute_save_means(ds_for_mean, yr):
    """ 
    Input: a grouped dataset (grouped by season, year, etc.) and the year being processed
    Lazily computes the mean over longitude and time for each group
    The calculation is only executed when the result is saved to save_path
    Relies on the globals save_dir, group_desc and var
    """
    ds_mean = ds_for_mean.mean(dim=('lon','time'), skipna=True)
    save_path = f'{save_dir}/{group_desc}_{var}_{yr}.nc'
    ds_mean.to_netcdf(save_path)
    print(save_path) #comment out for final version
    print(f"finished {yr} at {datetime.datetime.now()}") #comment out for final version
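Because each year takes several minutes to process, it can help to check which yearly files already exist before rerunning a loop. A minimal sketch, assuming the same file naming pattern as `compute_save_means` above; the helper `missing_years` is hypothetical and not part of the original notebook, and a temporary directory stands in for `save_dir`.

```python
import os
import tempfile

def missing_years(save_dir, group_desc, var, years):
    """Return the years whose output file is not yet present (hypothetical helper)."""
    return [
        yr for yr in years
        if not os.path.exists(os.path.join(save_dir, f"{group_desc}_{var}_{yr}.nc"))
    ]

# Demonstration in a temporary directory standing in for save_dir
with tempfile.TemporaryDirectory() as tmp_dir:
    # Pretend 1979 and 1980 were already computed by creating empty placeholder files
    for yr in (1979, 1980):
        open(os.path.join(tmp_dir, f"seasonal_t_{yr}.nc"), "w").close()

    todo = missing_years(tmp_dir, "seasonal", "t", range(1979, 1983))
    print(todo)
```

The returned list could then be used in place of `years` in the loops below, so that an interrupted run resumes where it stopped.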
    

#### *For seasonal mean temperatures*

In [None]:
# Define variables 
var = 't' 

# Describes how data should be grouped, used in file and directory names
group_desc = 'seasonal' 

# Directory where the averaged NetCDF files will be saved
save_dir = f'/home11/grad/2021/cs436778/general-circulation/project/data/{group_desc}'

# Years of CFSR data to include; each will be looped over
years = range(1979, 1980)


In [None]:
# execute functions

make_dir(save_dir)

for year in years:

    ds = open_ds(year)

    # group the dataset by season (group by year or month instead for annual or monthly means)
    ds_grouped = ds.groupby(ds.time.dt.season)

    compute_save_means(ds_grouped, year)
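One consequence of grouping a single calendar year by `time.dt.season` is worth keeping in mind: the DJF group combines January and February with December of the same calendar year, rather than one continuous winter. A small sketch on synthetic data (not CFSR) makes this visible by listing the months that fall into each season.

```python
import pandas as pd
import xarray as xr

# One synthetic calendar year of daily timestamps (illustrative only)
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
months = xr.DataArray(time.month.values, coords={"time": time}, dims="time")

# For each season label, collect the distinct calendar months it contains
months_by_season = {
    str(season): sorted(set(group.values.tolist()))
    for season, group in months.groupby(months.time.dt.season)
}

for season, month_list in months_by_season.items():
    print(season, month_list)
```

If a continuous December to February winter is needed, the December data would have to be paired with the following year's January and February when the yearly files are combined.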

In [None]:
ds.close()  # close the last dataset opened in the loop above

#### *For annual mean temperatures*

In [None]:
# Define variables 
var = 't' 

# Describes how data should be grouped, used in file and directory names
group_desc = 'annual' 

# Directory where the averaged NetCDF files will be saved
save_dir = f'/home11/grad/2021/cs436778/general-circulation/project/data/{group_desc}'

# Years of CFSR data to include; each will be looped over
years = range(1980, 2023)


In [None]:
# execute functions

make_dir(save_dir)

for year in years:

    ds = open_ds(year)

    # group the dataset by year for annual means
    ds_grouped = ds.groupby(ds.time.dt.year)

    compute_save_means(ds_grouped, year)
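Since each yearly file holds a single year, the annual results need to be combined along a `year` dimension before they can be analysed together. A hedged sketch with small synthetic datasets standing in for the saved files follows; the dimension sizes, coordinate values and the helper `fake_year` are made up for illustration. With files on disk, `xr.open_mfdataset` with `combine='nested'` and `concat_dim='year'` could presumably play the same role.

```python
import numpy as np
import pandas as pd
import xarray as xr

def fake_year(yr):
    """Build a tiny synthetic dataset for one year (hypothetical stand-in for open_ds)."""
    time = pd.date_range(f"{yr}-01-01", f"{yr}-12-31", freq="D")
    data = np.full((time.size, 2, 3, 4), float(yr))  # dims: time, lev, lat, lon
    return xr.Dataset(
        {"t": (("time", "lev", "lat", "lon"), data)},
        coords={
            "time": time,
            "lev": [500, 850],
            "lat": [-45, 0, 45],
            "lon": [0, 90, 180, 270],
        },
    )

# Same reduction as in the notebook: group by year, then average over lon and time
yearly = [
    fake_year(yr).groupby("time.year").mean(dim=("lon", "time"))
    for yr in (1980, 1981, 1982)
]

# Each result has a length-1 'year' dimension, so they concatenate along it
combined = xr.concat(yearly, dim="year")
print(combined.t.dims, combined.sizes["year"])
```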

#### *For seasonal mean meridional wind (v)*

In [20]:
# Define variables 
var = 'v' 

# Describes how data should be grouped, used in file and directory names
group_desc = 'seasonal' 

# Directory where the averaged NetCDF files will be saved
save_dir = f'/home11/grad/2021/cs436778/general-circulation/project/data/{group_desc}'

# Years of CFSR data to include; each will be looped over
years = range(2000, 2023)

In [21]:
years

range(2000, 2023)

In [None]:
# execute functions

make_dir(save_dir)

for year in years:

    ds = open_ds(year)

    # group the dataset by season (group by year or month instead for annual or monthly means)
    ds_grouped = ds.groupby(ds.time.dt.season)

    compute_save_means(ds_grouped, year)

/home11/grad/2021/cs436778/general-circulation/project/data/seasonal/seasonal_v_2000.nc
finished 2000 at 2022-11-27 20:52:52.986452
/home11/grad/2021/cs436778/general-circulation/project/data/seasonal/seasonal_v_2001.nc
finished 2001 at 2022-11-27 21:06:15.461307
/home11/grad/2021/cs436778/general-circulation/project/data/seasonal/seasonal_v_2002.nc
finished 2002 at 2022-11-27 21:20:06.008925
/home11/grad/2021/cs436778/general-circulation/project/data/seasonal/seasonal_v_2003.nc
finished 2003 at 2022-11-27 21:34:21.777639
/home11/grad/2021/cs436778/general-circulation/project/data/seasonal/seasonal_v_2004.nc
finished 2004 at 2022-11-27 21:48:26.396577
/home11/grad/2021/cs436778/general-circulation/project/data/seasonal/seasonal_v_2005.nc
finished 2005 at 2022-11-27 22:02:16.182361
/home11/grad/2021/cs436778/general-circulation/project/data/seasonal/seasonal_v_2006.nc
finished 2006 at 2022-11-27 22:16:17.281359
/home11/grad/2021/cs436778/general-circulation/project/data/seasonal/seasona