# CUPiD

The CESM Unified Postprocessing and Diagnostics (CUPiD) package is a new python-based system for running diagnostics across all CESM components with a common user and developer interface.
This notebook is a chance to try out CUPiD and run it on your own model simulation.  Note that the underlying python code is very similar to the routines shown in the component-specific diagnostics, which is why we recommend trying those notebooks first.  Additional info, including the source code, can be found on [Github here](https://github.com/NCAR/CUPiD).

**BEFORE BEGINNING THIS EXERCISE** -  Check that your kernel (upper right corner, above) is `Bash`. This should be the default kernel, but if it is not, click on that button and select `Bash`.

CUPiD is currently a command line tool.  This means that instead of running python code directly, this notebook will run unix commands CUPiD provides in order to generate the relevant diagnostics.  To start, we need to clone CUPiD from Github:

In [1]:
#Delete old CUPiD directory if one exists:
if [ -d "CUPiD" ]; then
  rm -rf CUPiD
fi

#Clone CUPiD source code from Github repo:
git clone --recurse-submodules https://github.com/NCAR/CUPiD.git
cd CUPiD #Need to enter CUPiD directory for remaining commands

Cloning into 'CUPiD'...
remote: Enumerating objects: 1137, done.        
remote: Counting objects: 100% (429/429), done.        
remote: Compressing objects: 100% (206/206), done.        
remote: Total 1137 (delta 263), reused 282 (delta 214), pack-reused 708        
Receiving objects: 100% (1137/1137), 3.32 MiB | 9.98 MiB/s, done.
Resolving deltas: 100% (656/656), done.
Submodule 'manage_externals' (https://github.com/ESMCI/manage_externals.git) registered for path 'manage_externals'
Cloning into '/glade/u/home/nusbaume/CESM_tutorials/2024/CESM-Tutorial/notebooks/diagnostics/CUPiD/manage_externals'...
remote: Enumerating objects: 1868, done.        
remote: Counting objects: 100% (178/178), done.        
remote: Compressing objects: 100% (78/78), done.        
remote: Total 1868 (delta 103), reused 116 (delta 97), pack-reused 1690        
Receiving objects: 100% (1868/1868), 857.16 KiB | 4.46 MiB/s, done.
Resolving deltas: 100% (1129/1129), done.
Submodule path 'manage_externals': che

We'll also need to grab some external libraries, which is done the same way as CESM via `checkout_externals`:

In [2]:
./manage_externals/checkout_externals

Processing externals description file : Externals.cfg (/glade/u/home/nusbaume/CESM_tutorials/2024/CESM-Tutorial/notebooks/diagnostics/CUPiD)
Checking local status of required & optional components: adf, mom6-tools, 
Checking out externals: adf, mom6-tools, 



This downloads two additional diagnostics packages that CUPiD will use.  One is the [AMWG Diagnostics Framework (ADF)](https://github.com/NCAR/ADF), which is a command-line tool that can be used to generate CAM diagnostics, and [mom6-tools](https://github.com/NCAR/mom6-tools.git), which is a python package that can be used to analyze MOM6, which is the ocean model that will be used in CESM3 (but for this tutorial we'll ignore).

Next we need to setup the proper python environment using conda/mamba, and activate the `cupid-dev` environment:

In [16]:
#Load conda to your environment:
module load conda

#Install 'cupid-dev' environment if it doesn't already exist:
if ! { conda env list | grep 'cupid-dev'; } >/dev/null 2>&1; then
  mamba env create -f environments/dev-environment.yml
fi

#Install 'cupid-analysis' environment if ti doesn't already exist:
if ! { conda env list | grep 'cupid-analysis'; } >/dev/null 2>&1; then
  mamba env create -f environments/cupid-analysis.yml
fi

#Activate CUPiD conda environemnt:
conda activate cupid-dev
#NOTE: You may see a red ": 1" message below, but it can be ignored.

#Check that cupid-run can be accessed appropriately:
which cupid-run
if ! [ $? -ne 0 ]; then
  #If not then use pip to install:
  pip install -e .
fi

(cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) /glade/work/nusbaume/conda-envs/cupid-dev/bin/cupid-run
(cupid-dev) Obtaining file:///glade/u/home/nusbaume/CESM_tutorials/2024/CESM-Tutorial/notebooks/diagnostics/CUPiD/examples/cesm_tutorial
[31mERROR: file:///glade/u/home/nusbaume/CESM_tutorials/2024/CESM-Tutorial/notebooks/diagnostics/CUPiD/examples/cesm_tutorial does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.[0m[31m
[0m(cupid-dev) 

: 1

CUPiD is controlled via a config YAML file.  Here we create a new directory and write the relevant config file for our tutorial simulation.  Please note that if your tutorial simulations didn't finish then you can use the provided simulations instead:   

In [24]:
#cd examples         #Go to the examples directory
#mkdir cesm_tutorial #Make a new directory to hold our config file
#cd cesm_tutorial    #Go to newly made CESM tutorial example directory
cat << EOF > config.yml
################## SETUP ##################

#NOTE:  CUPiD ocean diagnostics are currently only designed for upcoming MOM6
#       ocean model, so for this tutorial we will only do example atmosphere,
#       land, and sea ice diagnostics.

################
# Data Sources #
################
data_sources:
    # sname is any string used as a nickname for this configuration. It will be
    ### used as the name of the folder your computed notebooks are put in
    sname: cesm_tutorial_quick_run

    # run_dir is the path to the folder you want
    ### all the files associated with this configuration
    ### to be created in
    run_dir: .

    # nb_path_root is the path to the folder that cupid will
    ### look for your template notebooks in. It doesn't have to
    ### be inside run_dir, or be specific to this project, as
    ### long as the notebooks are there
    nb_path_root: ../nblibrary

######################
# Computation Config #
######################

computation_config:

    # default_kernel_name is the name of the environment that
    ### the notebooks in this configuration will be run in by default.
    ### It must already be installed on your machine. You can also
    ### specify a different environment than the default for any
    ### notebook in NOTEBOOK CONFIG

    default_kernel_name: cupid-analysis

############# NOTEBOOK CONFIG #############

############################
# Notebooks and Parameters #
############################

# All parameters under global_params get passed to all the notebooks

global_params:
  #CESM_output_dir: /glade/derecho/scratch/USERNAME/archive  #<-Replace "USERNAME" with your own username
  #Uncomment code here if you need a complete CESM tutorial simulation:
  CESM_output_dir: /glade/campaign/cesm/tutorial/tutorial_2023_archive
  lc_kwargs:
    threads_per_worker: 1

timeseries:
  # This section of the config file controls the time series generator, which
  # takes standard CESM history (time-slice) files and converts them into single
  # variable time series files.
 
  num_procs: 8
  ts_done: [False]
  overwrite_ts: [False]
  #ts_output_dir: /glade/derecho/scratch/USERNAME/archive #<-Replace "USERNAME" with your own username
  ts_output_dir: /glade/derecho/scratch/nusbaume/archive
  case_name: 'b.day2.1'

  #Variables can either be provided as a list (e.g. ['X', 'Y', 'Z']) or,
  #if you want to convert everything on the file, by using the ['process_all']
  #keyword.  For the example below we'll only convert a single variable
  #from each component.

  atm:
    vars: ['TREFHT']
    derive_vars: []
    hist_str: 'h0'
    start_years: [1]
    end_years: [3]
    level: 'lev'

  lnd:
    vars: ['ALTMAX']
    derive_vars: []
    hist_str: 'h0'
    start_years: [1]
    end_years: [3]
    level: 'lev'

  ocn:
    vars: [] # Not doing ocean analyses
    derive_vars: []
    hist_str: 'h'
    start_years: [1]
    end_years: [3]
    level: 'lev'

  ice:
    vars: ['hi']
    derive_vars: []
    hist_str: 'h'
    start_years: [1]
    end_years: [3]
    level: 'lev'

  glc:
    vars: ['usurf']
    derive_vars: []
    hist_str: 'initial_hist'
    start_years: [1]
    end_years: [3]
    level: 'lev'

compute_notebooks:

  # This is where all the notebooks you want run and their
  # parameters are specified. Several examples of different
  # types of notebooks are provided.

  # This section controls the actual diagnostics calculations.
  # In practice one would usually compare the model simulation
  # to a baseline run, or possibly to observations.  However,
  # given that in this tutorial we only have one "long" run,
  # we'll go ahead and compare it to itself.

    infrastructure:
      index:
        parameter_groups:
          none: {}

  #  atm:
  #    adf_quick_run:
  #      parameter_groups:
  #        none:
  #          adf_path: ../../../externals/ADF
  #          config_path: /glade/campaign/cesm/tutorial/cupid_nbs/
  #          config_fil_str: "config_f.cam6_3_119.FLTHIST_ne30.r328_gamma0.33_soae.001.yaml"

    lnd:
      land_comparison:
        parameter_groups:
          none:
            cases:
              - b.day2.1
              - b.day2.1
            type:
              - b.day2.1
              - b.day2.1

    ice:
      seaice:
        parameter_groups:
          none:
            cases:
              - b.day2.1
              - b.day2.1
            begyr1: 1
            endyr1: 3
            begyr2: 1
            endyr2: 3
            nyears: 3

########### JUPYTER BOOK CONFIG ###########

##################################
# Jupyter Book Table of Contents #
##################################
book_toc:

  # See https://jupyterbook.org/en/stable/structure/configure.html for
  # complete documentation of Jupyter book construction options

  format: jb-book

  # All filenames are notebook filename without the .ipynb, similar to above

  root: infrastructure/index # root is the notebook that will be the homepage for the book
  parts:

    # Parts group notebooks into different sections in the Jupyter book
    # table of contents, so you can organize different parts of your project.

#    - caption: Atmosphere

      # Each chapter is the name of one of the notebooks that you executed
      # in compute_notebooks above, also without .ipynb
#      chapters:
#        - file: atm/adf_quick_run

    - caption: Land
      chapters:
        - file: lnd/land_comparison

    - caption: Sea Ice
      chapters:
        - file: ice/seaice

#####################################
# Keys for Jupyter Book _config.yml #
#####################################
book_config_keys:

  title: CESM Tutorial - CUPiD  # Title of your jupyter book

  # Other keys can be added here, see https://jupyterbook.org/en/stable/customize/config.html
  ### for many more options   
EOF

(cupid-dev) (cupid-dev) (cupid-dev) (cupid-dev) 

: 1

Now we are ready to run CUPiD!

## Generating time series and running diagnostics

One of CUPiD's functions is to help convert CESM history files into single-variable time series files, which are required for various different diagnostic systems, as well for submitting to CMIP.
Here we can create some time series files from the tutorial simulation using the config file we just created above and the `-ts` flag.  After time-series generation is done it will run the example notebooks specified in the config file to create 

In [25]:
which cupid-run
echo $?

/glade/work/nusbaume/conda-envs/cupid-dev/bin/cupid-run
(cupid-dev) 0
(cupid-dev) 

: 1

In [26]:
module load nco #Currently the NetCDF Operators are needed by CUPiD in order to run the time series generator
cupid-run -ts

(cupid-dev) 
  Generating atm time series files...
	 Processing time series for case 'b.day2.1' :
	 - time series for TREFHT
  ... atm time series file generation has finished successfully.

  Generating ocn time series files...
	 Processing time series for case 'b.day2.1' :
  ... ocn time series file generation has finished successfully.

  Generating lnd time series files...
	 Processing time series for case 'b.day2.1' :
	 - time series for ALTMAX
  ... lnd time series file generation has finished successfully.

  Generating ice time series files...
	 Processing time series for case 'b.day2.1' :
	 - time series for hi
  ... ice time series file generation has finished successfully.

  Generating glc time series files...
	 Processing time series for case 'b.day2.1' :
	 - time series for usurf
  ... glc time series file generation has finished successfully.
[33m--- NotebookRunner: index -> File('computed_notebook...ucture/index.ipynb') ----[0m
[33m- /glade/u/home/nusbaume/CESM_tutor