# Pangeo-Enabled ESM Pattern Scaling (PEEPS)
# libraries and helper functions

For simplicity, many of the calculations repeated throughout linear pattern
scaling are implemented as functions in `helpers.py`, a script in this
repository that is sourced with `%run` below:

In [1]:
# Load all of the libraries
from matplotlib import pyplot as plt
import pandas as pd


%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.rcParams['figure.figsize'] = 12, 6

# show every column when displaying DataFrames
pd.set_option('display.max_columns', None)

# source the helper functions
%run helpers.py

# Intro
We pull CMIP6 data (stored as Zarr on Google Cloud Storage) from Pangeo
(https://gallery.pangeo.io/repos/pangeo-data/pangeo-tutorial-gallery/).
These data include:
- monthly `tas` (near-surface air temperature), from which the global mean temperature `tgav` is calculated
- the variables to pattern scale, listed in `var_list` below
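As a rough illustration of the `tgav` step, the sketch below computes an area-weighted global mean from a monthly `tas` array and averages it to annual values. It assumes a regular latitude-longitude grid weighted by the cosine of latitude and a record made of whole years. The function `annual_tgav` is hypothetical; `helpers.py` may compute `tgav` differently (for example, with model cell areas or with xarray).

```python
import numpy as np


def annual_tgav(tas, lat):
    """Annual, area-weighted global mean of a monthly field.

    Hypothetical helper (not from helpers.py); assumes tas has shape
    (n_months, n_lat, n_lon) on a regular lat-lon grid and covers whole
    years, i.e. n_months is a multiple of 12.
    """
    tas = np.asarray(tas, dtype=float)
    # cos(latitude) approximates each latitude band's share of the sphere's area
    weights = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))
    # average over longitude first, then take the weighted mean over latitude
    zonal_mean = tas.mean(axis=2)                                   # (n_months, n_lat)
    monthly_global = (zonal_mean * weights).sum(axis=1) / weights.sum()
    # group consecutive months into years and average them
    # (a plain mean that ignores the differing lengths of months)
    return monthly_global.reshape(-1, 12).mean(axis=1)


# toy example: 2 years of monthly data on a 3 x 4 grid
lat = np.array([-60.0, 0.0, 60.0])
tas = np.full((24, 3, 4), 288.0)
tas[12:] += 1.0  # the second year is uniformly 1 K warmer
print(annual_tgav(tas, lat))
```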

# Data


In [2]:
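# fetch_pangeo_table() (from helpers.py) returns a table of the CMIP6 data
# available on Pangeo; save it next to this notebook and read it back in
fetch_pangeo_table().to_csv('pangeo_table.csv', index=False)
pangeo_data = pd.read_csv('pangeo_table.csv')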


esm_list = pangeo_data.model.unique().copy()
print(esm_list)

# Specify the scenarios and variables of interest for scaling
exp_list = ['historical', 'ssp370',
            'ssp585', 'ssp126', 'ssp245',
            'ssp119', 'ssp460', 'ssp434']

var_list = ['tas', 'pr', 'hurs']

['HadGEM3-GC31-MM' 'GFDL-ESM4' 'GFDL-CM4' 'IPSL-CM6A-LR' 'CNRM-CM6-1'
 'GISS-E2-1-G' 'GISS-E2-1-H' 'BCC-CSM2-MR' 'BCC-ESM1' 'CNRM-ESM2-1'
 'MIROC6' 'AWI-CM-1-1-MR' 'EC-Earth3-LR' 'MRI-ESM2-0' 'CESM2-WACCM'
 'CESM2' 'SAM0-UNICON' 'GISS-E2-1-G-CC' 'UKESM1-0-LL' 'EC-Earth3'
 'CanESM5' 'CanESM5-CanOE' 'EC-Earth3-Veg' 'HadGEM3-GC31-LL'
 'MPI-ESM-1-2-HAM' 'NESM3' 'CAMS-CSM1-0' 'MPI-ESM1-2-LR' 'MPI-ESM1-2-HR'
 'MCM-UA-1-0' 'NorESM2-LM' 'FGOALS-g3' 'FGOALS-f3-L' 'MIROC-ES2L'
 'FIO-ESM-2-0' 'NorCPM1' 'NorESM1-F' 'CNRM-CM6-1-HR' 'ACCESS-CM2'
 'NorESM2-MM' 'ACCESS-ESM1-5' 'IITM-ESM' 'CESM2-FV2' 'CESM2-WACCM-FV2'
 'GISS-E2-2-G' 'GISS-E2-2-H' 'TaiESM1' 'AWI-ESM-1-1-LR' 'CIESM'
 'CMCC-CM2-SR5' 'EC-Earth3-AerChem' 'IPSL-CM5A2-INCA' 'CMCC-CM2-HR4'
 'EC-Earth3-Veg-LR' 'CAS-ESM2-0' 'EC-Earth3-CC' 'CMCC-ESM2' 'MIROC-ES2H'
 'ICON-ESM-LR' 'IPSL-CM6A-LR-INCA']
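Not every ESM provides every scenario and variable, so many iterations of the loops below cannot produce a pattern. One way to see this ahead of time is to cross the three lists and keep only the combinations present in the data table. The sketch below works on plain `(model, experiment, variable)` tuples; the function `available_combinations` is hypothetical, and the `experiment` and `variable` column names in the comment are assumptions, since only the `model` column of `pangeo_data` is used above. Presence in the table does not guarantee that `do_ps` succeeds.

```python
from itertools import product


def available_combinations(records, esms, experiments, variables):
    """Return the (esm, experiment, variable) triples that appear in records.

    Hypothetical helper. records is an iterable of (model, experiment, variable)
    tuples, which could be built from the Pangeo table with something like
        zip(pangeo_data.model, pangeo_data.experiment, pangeo_data.variable)
    where the experiment and variable column names are assumed.
    """
    present = set(records)
    # keep the esm -> experiment -> variable order used by the loops below
    return [combo for combo in product(esms, experiments, variables)
            if combo in present]


# toy records standing in for rows of the Pangeo table
records = [
    ('CanESM5', 'historical', 'tas'),
    ('CanESM5', 'historical', 'pr'),
    ('CanESM5', 'ssp585', 'tas'),
    ('CanESM5', 'historical', 'tas'),  # repeats (e.g. several ensemble members) collapse
]
print(available_combinations(records, ['CanESM5'], ['historical', 'ssp585'],
                             ['tas', 'pr', 'hurs']))
# [('CanESM5', 'historical', 'tas'), ('CanESM5', 'historical', 'pr'), ('CanESM5', 'ssp585', 'tas')]
```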


# Pattern scaling for each ESM, scenario, and variable

## prep ensemble average data for training
For linear pattern scaling, we regress each grid cell's value on `tgav`, fitting
the regression to the ensemble mean for each ESM, variable, and scenario. This
calculation is implemented in `do_ps()` in `helpers.py`.
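The regression can be written as `field(t, lat, lon) = intercept(lat, lon) + slope(lat, lon) * tgav(t) + residual(t, lat, lon)`, fitted independently at every grid cell. The sketch below solves all cells at once with NumPy least squares. It is a minimal illustration, not `do_ps()` itself: the function `fit_pattern` is hypothetical and assumes the field is already annual and ensemble-averaged, with shape `(n_time, n_lat, n_lon)`.

```python
import numpy as np


def fit_pattern(field, tgav, fit_intercept=True):
    """Least-squares fit of field = intercept + slope * tgav at every grid cell.

    Hypothetical sketch; field is an ensemble-mean array of shape
    (n_time, n_lat, n_lon) and tgav is a 1-D array of length n_time.
    Returns slope and intercept maps of shape (n_lat, n_lon) and residuals
    with the same shape as field.
    """
    field = np.asarray(field, dtype=float)
    tgav = np.asarray(tgav, dtype=float)
    n_time = field.shape[0]
    space = field.shape[1:]
    # flatten space so a single lstsq call fits every grid cell
    y = field.reshape(n_time, -1)
    if fit_intercept:
        X = np.column_stack([np.ones(n_time), tgav])
    else:
        X = tgav[:, None]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = (y - X @ coef).reshape(field.shape)
    if fit_intercept:
        intercept, slope = coef[0].reshape(space), coef[1].reshape(space)
    else:
        intercept, slope = np.zeros(space), coef[0].reshape(space)
    return slope, intercept, resid


# toy check: a field that is exactly 2 + 1.5 * tgav everywhere
tgav = np.linspace(0.0, 3.0, 10)
field = 2.0 + 1.5 * tgav[:, None, None] * np.ones((10, 2, 3))
slope, intercept, resid = fit_pattern(field, tgav)
print(slope.round(3), intercept.round(3))
```

Stacking every cell into the columns of `y` lets one design matrix serve the whole grid, which mirrors the `fit_intercept` switch passed to `do_ps()` in the loops below.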



## loop for annual patterns

In [None]:
import os

OUTPUT_DIR = 'outputs/'

# create the output folders this cell writes to, if they do not exist yet
for subdir in ('patterns', 'residuals'):
    os.makedirs(os.path.join(OUTPUT_DIR, subdir), exist_ok=True)

for esm in esm_list:
    for scn in exp_list:
        for variable in var_list:

            # annual patterns
            stem = esm + '_' + scn + '_' + variable + '_annual_pattern'
            savename = stem + '.nc'
            print(savename)
            try:
                ensemble_ds = do_ps(esm_name=esm,
                                    var_name=variable,
                                    exp_name=scn,
                                    monthly_or_annual='annual',
                                    fit_intercept=True,
                                    save_resids=True,
                                    tgav_DIR=(OUTPUT_DIR + 'tgav'))

                # do_ps may return None; otherwise element 0 is the pattern
                # and element 1 holds the residuals (save_resids=True)
                if ensemble_ds is not None:
                    ensemble_ds[0].to_netcdf(OUTPUT_DIR + 'patterns/' + savename)
                    ensemble_ds[1].to_netcdf(OUTPUT_DIR + 'residuals/' + stem + '_resids.nc')

                del ensemble_ds

            except Exception:
                # When the requested data are not in pangeo_table.csv, do_ps is
                # expected to print a specific message rather than raise, so
                # reaching this handler signals an unexpected problem.
                print('issue creating ' + savename)
                print('Specific print statement above if data does not exist in pangeo_data.csv.')
                print('And this exception would not have triggered.')
                print('Bigger issue to figure out.')

HadGEM3-GC31-MM_historical_tas_annual_pattern.nc
gs://cmip6/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/historical/r1i1p1f3/Amon/tas/gn/v20191207/
gs://cmip6/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/historical/r2i1p1f3/Amon/tas/gn/v20191218/
gs://cmip6/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/historical/r3i1p1f3/Amon/tas/gn/v20200601/
gs://cmip6/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/historical/r4i1p1f3/Amon/tas/gn/v20200601/
tgav calculation complete, doing scaling with call to reshape_and_scale.
HadGEM3-GC31-MM_historical_pr_annual_pattern.nc
gs://cmip6/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/historical/r1i1p1f3/Amon/pr/gn/v20191207/
issue creating HadGEM3-GC31-MM_historical_pr_annual_pattern.nc
Specific print statement above if data does not exist in pangeo_data.csv.
And this exception would not have triggered.
Bigger issue to figure out.
HadGEM3-GC31-MM_historical_hurs_annual_pattern.nc
gs://cmip6/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/historical/r1i1p1f3/Amon/hurs/gn/v20191207/
issue creating HadGEM3-GC31-MM_historical_hurs_annual_
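Once slope and intercept maps exist, an annual pattern can emulate the variable for any global-mean temperature trajectory by evaluating `intercept + slope * tgav` at every grid cell. The sketch below does this with NumPy broadcasting on in-memory arrays. The function `emulate_annual` is hypothetical, and it does not read the NetCDF files written above, because their internal variable names depend on `do_ps()`.

```python
import numpy as np


def emulate_annual(slope, intercept, tgav):
    """Evaluate a linear pattern for a series of annual global means.

    Hypothetical sketch; slope and intercept are (n_lat, n_lon) maps and
    tgav is a 1-D array of annual global-mean temperatures.
    Returns an array of shape (n_time, n_lat, n_lon).
    """
    slope = np.asarray(slope, dtype=float)
    intercept = np.asarray(intercept, dtype=float)
    tgav = np.asarray(tgav, dtype=float)
    # (n_time, 1, 1) broadcasts against (n_lat, n_lon) to (n_time, n_lat, n_lon)
    return intercept + slope * tgav[:, None, None]


slope = np.array([[1.0, 2.0], [0.5, 1.5]])
intercept = np.zeros((2, 2))
out = emulate_annual(slope, intercept, [0.0, 1.0, 2.0])
print(out.shape)  # (3, 2, 2)
```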

## loop for monthly patterns

In [None]:
import os

months = pd.DataFrame(data={
    'month': list(range(1, 13)),
    'month_name': ['jan', 'feb', 'mar', 'apr',
                   'may', 'jun', 'jul', 'aug',
                   'sep', 'oct', 'nov', 'dec']
})

OUTPUT_DIR = 'outputs/'

# create the output folders this cell writes to, if they do not exist yet
for subdir in ('patterns', 'residuals'):
    os.makedirs(os.path.join(OUTPUT_DIR, subdir), exist_ok=True)

for esm in esm_list:
    for scn in exp_list:
        for variable in var_list:

            savename = esm + '_' + scn + '_' + variable + '_monthly_patterns'
            print(savename)
            try:
                ensemble_ds = do_ps(esm_name=esm,
                                    var_name=variable,
                                    exp_name=scn,
                                    monthly_or_annual='monthly',
                                    fit_intercept=True,
                                    save_resids=True)

                # do_ps may return None; otherwise element i holds the
                # (pattern, residuals) pair for calendar month i + 1
                if ensemble_ds is not None:
                    for ind, month_nm in enumerate(months['month_name']):
                        ensemble_ds[ind][0].to_netcdf(OUTPUT_DIR + 'patterns/' + savename +
                                                      '_' + month_nm + '.nc')
                        ensemble_ds[ind][1].to_netcdf(OUTPUT_DIR + 'residuals/' + savename +
                                                      '_' + month_nm + '_resids.nc')

                del ensemble_ds

            except Exception:
                # When the requested data are not in pangeo_table.csv, do_ps is
                # expected to print a specific message rather than raise, so
                # reaching this handler signals an unexpected problem.
                print('issue creating ' + savename)
                print('Specific print statement above if data does not exist in pangeo_data.csv.')
                print('And this exception would not have triggered.')
                print('Bigger issue to figure out.')
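The monthly loop expects `do_ps()` to return twelve `(pattern, residuals)` pairs, one per calendar month. A minimal sketch of that idea, for a single grid cell, splits a monthly series into calendar months and fits each month separately. It assumes each month's values are regressed on the annual `tgav` of the same year; that choice of predictor is an assumption, and `do_ps()` may use a different one. The function `fit_monthly_patterns` is hypothetical.

```python
import numpy as np


def fit_monthly_patterns(monthly_values, tgav_annual):
    """Fit one (slope, intercept) pair per calendar month for one grid cell.

    Hypothetical sketch; monthly_values is a 1-D array of length
    12 * n_years starting in January, and tgav_annual has length n_years.
    Returns a list of 12 (slope, intercept) tuples, January first.
    """
    values = np.asarray(monthly_values, dtype=float).reshape(-1, 12)  # (n_years, 12)
    tgav_annual = np.asarray(tgav_annual, dtype=float)
    fits = []
    for month_idx in range(12):
        # np.polyfit with deg=1 returns the coefficients as [slope, intercept]
        slope, intercept = np.polyfit(tgav_annual, values[:, month_idx], deg=1)
        fits.append((slope, intercept))
    return fits


# toy data: month m (0 = Jan) responds with slope m + 1 and intercept 10 * m
tgav_annual = np.array([0.0, 0.5, 1.0, 1.5])
months_idx = np.arange(12)
monthly = (10.0 * months_idx + (months_idx + 1) * tgav_annual[:, None]).ravel()
fits = fit_monthly_patterns(monthly, tgav_annual)
```

Fitting each calendar month on its own lets the pattern capture seasonal differences in the local response, at the cost of twelve times as many fits and output files.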