# Extremes Metrics

This notebook shows users how to run the PMP Extremes metrics. 

This notebook should be run in an environment with Python, JupyterLab, the PCMDI Metrics Package (PMP), and CDAT installed. It assumes you have already downloaded the sample data as demonstrated in the download notebook.

The following cell reads in the choices you made during the download data step:

In [1]:
from user_choices import demo_data_directory, demo_output_directory

## Environment Preparation

This driver requires climextRemes, which is not part of the standard PMP installation. To install climextRemes, uncomment and run the following cell:

In [2]:
"""
import sys
!conda install --yes --prefix {sys.prefix} -c cascade climextremes
"""

'\nimport sys\n!conda install --yes --prefix {sys.prefix} -c cascade climextremes\n'

## Basic Use

The PMP Extremes driver is controlled via a parameter file. The parameter file for this demo is shown here:

In [3]:
with open("basic_extremes_param.py") as f:
    print(f.read())

case_id = "extremes_ex1"
vars = ['pr']
test_data_set = ['GISS-E2-H']
realization = ['r6i1p1']
test_data_path = '$INPUT_DIR$/CMIP5_demo_timeseries/historical/atmos/day/pr/'
filename_template = '%(variable)_day_%(model)_historical_%(realization)_20000101-20051231.nc'
sftlf_filename_template = 'demo_data/CMIP5_demo_data/cmip5.historical.%(model).sftlf.nc'

metrics_output_path = "demo_output"

dec_mode="JFD"
annual_strict = True
drop_incomplete_djf = True
nc_out=False
regrid=False
plots=False
generate_sftlf = False


To run the extremes driver, use the following command in the terminal. This will generate a metrics file based on the models, observations, and other criteria in `basic_extremes_param.py`:
```
pmp_extremes_driver.py -p basic_extremes_param.py
```  
In the next cell, bash cell magic is used to run this command as a subprocess:

In [4]:
%%bash
pmp_extremes_driver.py  -p basic_extremes_param.py

Traceback (most recent call last):
  File "/home/ordonez4/miniconda3/envs/pcmdi_metrics_dev/bin/pmp_extremes_driver.py", line 16, in <module>
    from pcmdi_metrics.extremes.lib import (
ModuleNotFoundError: No module named 'pcmdi_metrics.extremes.lib'


CalledProcessError: Command 'b'pmp_extremes_driver.py  -p basic_extremes_param.py\n'' returned non-zero exit status 1.

Running the extremes driver produces an output JSON file in the demo output directory. The metrics are stored in the "RESULTS" object of the JSON. Since only one model was provided as input, the only metrics generated are the mean and standard deviation.

In [None]:
import json
import os
output_path = os.path.join(demo_output_directory,"")
with open(output_path) as f:
    metric = json.load(f)["RESULTS"]
print(json.dumps(metric, indent=2))

## Customizing parameters in the extremes driver

It is possible to override the parameter file from the command line. Use `pmp_extremes_driver.py --help` to see all the flag options.  
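
As a conceptual illustration only (not the driver's actual implementation), the override behaviour can be pictured as a dictionary merge in which command-line values take precedence over parameter-file values. The dictionary names below are hypothetical; the keys and values come from the demo parameter file and the examples in this notebook.

```python
# Values as they appear in basic_extremes_param.py (subset).
file_params = {"case_id": "extremes_ex1", "nc_out": False, "plots": False}

# Values as they might be given on the command line,
# e.g. --case_id extremes_ex2 --nc_out
cli_overrides = {"case_id": "extremes_ex2", "nc_out": True}

# Later entries win, so command-line values replace parameter-file values.
effective = {**file_params, **cli_overrides}
print(effective)
```

Settings that are not given on the command line (here `plots`) keep their parameter-file values.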

### Reference data

A reference data set (e.g. observations or a control run) can be provided to generate additional metrics. Each test data set will be compared to this reference. 

These are the parameters that control the reference data settings:  

--reference_data_path:      The file path for the reference data set.  
--reference_data_set:       A short name for the reference data set.  
--reference_sftlf_template: The file path for the reference land/sea mask (optional if --generate_sftlf = True)  

An example of using reference data is shown next:

In [None]:
%%bash
pmp_extremes_driver.py -p basic_extremes_param.py \
--case_id extremes_ex2 \
--reference_data_path demo_data/obs4MIPs_PCMDI_daily/NASA-JPL/GPCP-1-3/day/pr/gn/latest/pr_day_GPCP-1-3_PCMDI_gn_19961002-20170101.nc \
--reference_data_set GPCP-1-3 \
--reference_sftlf_template demo_data/misc_demo_data/fx/sftlf.GPCP-IP.1x1.nc

In this case, the results JSON contains more statistics.

In [None]:
import json
import os
output_path = os.path.join(demo_output_directory,"extremes_ex2")
with open(output_path) as f:
    metric = json.load(f)["RESULTS"]
print(json.dumps(metric, indent=2))

### Saving additional output

Along with the JSON file of metrics, this driver can also produce a set of diagnostic plots and save the block extrema data as netCDF files.   

To save the netCDF files, use the flag "--nc_out" on the command line or nc_out = True in the parameter file.  
To generate plots, use the flag "--plots" on the command line or plots = True in the parameter file.

The diagnostic plots always display a world map. There is no option to customize the plots.

The next cell demonstrates these flags.

In [None]:
%%bash
pmp_extremes_driver.py  -p basic_extremes_param.py --case_id "extremes_ex3" --nc_out --plots

The plots and netCDF files can be found in the output directory.
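
One way to see what was written is to walk the output directory and list files by extension. This is a minimal sketch: the helper `list_outputs` is hypothetical, and the `.png` extension for plots is an assumption, since this notebook does not state the image format or the directory layout.

```python
from pathlib import Path


def list_outputs(output_dir, suffixes=(".nc", ".png", ".json")):
    """Return sorted relative paths of files under output_dir with the given suffixes."""
    root = Path(output_dir)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*")
        if path.is_file() and path.suffix in suffixes
    )


# "demo_output" is the metrics_output_path from the demo parameter file;
# substitute demo_output_directory if your output was written elsewhere.
for relative_path in list_outputs("demo_output"):
    print(relative_path)
```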

In [None]:
from IPython.display import Image
from IPython.core.display import HTML 

Image(filename = demo_output_directory + "")

### Land/Sea Masking

By default, the extremes driver calculates block extrema over land areas (grid cells where land fraction is over 50%), excluding Antarctica. This requires a land/sea mask.
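
The masking rule can be sketched in a few lines of plain Python. This is an illustration, not the driver's code: the function name `land_mask` is hypothetical, the land fraction is assumed to be in percent, and the latitude cutoff of 60°S used to exclude Antarctica is an assumption, since the exact criterion is not stated here.

```python
def land_mask(sftlf, lats, land_threshold=50.0, antarctica_lat=-60.0):
    """Return a 2D list of booleans, True where a cell counts as non-Antarctic land.

    sftlf: land fraction in percent, indexed [lat][lon].
    lats: latitude in degrees north for each row of sftlf.
    antarctica_lat: assumed cutoff; rows at or south of it are excluded.
    """
    return [
        [fraction > land_threshold and lat > antarctica_lat for fraction in row]
        for row, lat in zip(sftlf, lats)
    ]


lats = [-75.0, 10.0, 45.0]
sftlf = [
    [100.0, 100.0],  # Antarctic row: land, but excluded by latitude
    [0.0, 80.0],     # ocean cell, land cell
    [50.0, 51.0],    # exactly 50% is not "over 50%"
]
mask = land_mask(sftlf, lats)
print(mask)
```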

Users can provide a land/sea mask file template using the flag "--sftlf_filename_template". The placeholders %(model) and %(realization) can be used in place of the model name and realization when there are multiple land/sea mask files. In the demo parameter file, this looks like:
```
sftlf_filename_template = 'demo_data/CMIP5_demo_data/cmip5.historical.%(model).sftlf.nc'
```
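
To make the placeholder substitution concrete, here is a small sketch that expands templates by plain string replacement. The helper `fill_template` is hypothetical and only mimics the idea; the driver's own substitution logic may differ. The model and realization values are taken from the demo parameter file.

```python
def fill_template(template, **values):
    """Replace each %(name) placeholder in template with the matching value."""
    for name, value in values.items():
        template = template.replace(f"%({name})", value)
    return template


sftlf_template = "demo_data/CMIP5_demo_data/cmip5.historical.%(model).sftlf.nc"
sftlf_path = fill_template(sftlf_template, model="GISS-E2-H", realization="r6i1p1")
print(sftlf_path)

# The same idea applies to filename_template from the parameter file.
filename_template = "%(variable)_day_%(model)_historical_%(realization)_20000101-20051231.nc"
filename = fill_template(
    filename_template, variable="pr", model="GISS-E2-H", realization="r6i1p1"
)
print(filename)
```

Placeholders that do not occur in a template (here `%(realization)` in the land/sea mask template) are simply ignored.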
Alternatively, the land/sea mask can be estimated on-the-fly using the setting "--generate_sftlf":

In [None]:
%%bash
pmp_extremes_driver.py  -p basic_extremes_param.py --case_id "extremes_ex4" --generate_sftlf

### Regional metrics

Users can define a custom region over which to calculate the extremes metrics. There are two ways to do this.

#### Coordinate method

The first method is to provide coordinate pairs that define a contiguous region. This region does not have to be rectangular, but it cannot have holes. The following example provides lat/lon pairs that roughly outline the state of California, USA. The region name flag "--region_name" is optional in this case.



In [None]:
%%bash
pmp_extremes_driver.py -p basic_extremes_param.py \
--case_id "extremes_ex5" \
--region_name "California" \
--coords [[42.53,-125.73],[42.53,-119.59],[39.15,-119.59],[35.02,-113.89],[32.43,-113.89],[32.34,-117.47],[34.03,-121.26],[40.28,-125.49]]
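
Before running the driver, it can help to check that the coordinate list encloses the locations you expect. The sketch below parses the same coordinate string and applies a standard ray-casting point-in-polygon test. It assumes each pair is ordered [lat, lon], as described above, and treats the coordinates as planar; the function `point_in_region` is hypothetical and is not how the driver itself builds its region mask.

```python
import json

coords = json.loads(
    "[[42.53,-125.73],[42.53,-119.59],[39.15,-119.59],[35.02,-113.89],"
    "[32.43,-113.89],[32.34,-117.47],[34.03,-121.26],[40.28,-125.49]]"
)


def point_in_region(lat, lon, polygon):
    """Ray-casting test: True if (lat, lon) lies inside the closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Does this edge straddle the latitude of the test point?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that latitude.
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < lon_cross:
                inside = not inside
    return inside


print(point_in_region(37.0, -120.0, coords))  # a point in central California
print(point_in_region(37.0, -100.0, coords))  # a point well to the east
```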

#### Shapefile method
The second method is to provide a shapefile containing the region of interest. The region of interest must be completely defined by a single, uniquely identifiable feature in the shapefile. For example, if the region of interest is the fifty states of the USA, there must be a single feature in the shapefile that contains all the land areas of all fifty states. The region name flag is required in this case.

--shp_path is the path of the shapefile containing your region  
--region_name is the name of your region, which can be found under the shapefile attribute given by "--attribute"

This example shows how to get metrics for a region called "CANADA" under the "COUNTRY" attribute in a shapefile called "my_shapefile.shp":  
```
pmp_extremes_driver.py -p basic_extremes_param.py --shp_path my_shapefile.shp --attribute "COUNTRY" --region_name "CANADA"
```

### Other options

TODO: Add links
The full suite of options is described in the README file for the extremes metrics.