# Quick plots - Compare experiments

This tutorial compares results from a control (out of the box) simulation and a model experiment. Specifically, it looks at the namelist-change example in which we shut off the representation of plant hydraulic stress (PHS).

In this tutorial you will find steps and instructions to:

1. Load datasets with xarray
2. Quickly look at results from h0 and h1 files

***

# 1. Load Datasets

## 1.1 Load Python Libraries

In [None]:
import os
import time
import datetime

import numpy as np
import pandas as pd
import xarray as xr

from glob import glob
from os.path import join

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from neon_utils import fix_time_h0
from neon_utils import fix_time_h1

In [None]:
print('xarray '+xr.__version__) ##-- was working with 2023.1.0

## 1.2 Point to history files 

### 1.2.1 Where are my simulation results?
After your simulations finish, history files are all saved in your `~/scratch/NEON_cases/archive/` directory.

We can list the available cases using bash magic: `%%bash` turns the Python cell below into a bash cell, and `!` runs a single shell command.

In [None]:
%%bash
ls ~/scratch/NEON_cases/archive/

<div class="alert alert-block alert-info">
<b>Note:</b> You can accomplish the same thing with the following:

> `!ls ~/scratch/NEON_cases/archive/`
    
</div>

---

### 1.2.2 Point to the data folder with history files 
**We'll set the following:**
- site to look at;
- path to our archive directory;
- directories with input data (where history files are found).

Defining these in one place makes the script easier to modify for different sites.

In [None]:
neon_site = 'WOOD'  # NEON site we're going to look at
archive = '~/scratch/NEON_cases/archive'  # Path to archive directory
# Case name suffixes: the control ('transient') and the experiment ('BTRAN.transient')
ex = ['transient', 'BTRAN.transient']

# Expand the `~` shortcut used above and resolve it to a full path
archive = os.path.realpath(os.path.expanduser(archive))

# Create a path to the data folder for each case
pattern = archive+'/'+neon_site+'.{ex}/lnd/hist'
data_folders = [pattern.format(ex=e) for e in ex]
data_folders

**Are the paths to the input data in `data_folders` correct?** *HINT:* You can check in the terminal window or using bash magic.
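
If you would rather check from Python, the sketch below is one possible way to do it. The helper name `check_folders` and the example paths are made up for illustration; in this notebook you would pass `data_folders` instead.

```python
import os

def check_folders(folders):
    """Return a dict mapping each folder path to True if it exists as a directory."""
    return {folder: os.path.isdir(folder) for folder in folders}

# Hypothetical example paths; in the notebook you would pass `data_folders` instead.
example_folders = [os.getcwd(), "/this/path/should/not/exist"]
for folder, exists in check_folders(example_folders).items():
    print(("found   " if exists else "MISSING ") + folder)
```

A `MISSING` entry usually means a typo in the site or case name, or a simulation that has not finished archiving yet.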

---

### 1.2.3 Create some functions we'll use when opening the data
1. The `preprocess` functions limit the number of variables we're reading in. This is an xarray feature that helps save time (and memory resources).
2. The `fix_time_h0` and `fix_time_h1` functions correct annoying features related to how CTSM history files handle time. They are provided in `neon_utils.py`.

*Don't worry too much about the details of these functions right now.*


In [None]:
# -- read all variables from the netcdf files
def preprocess_all(ds):
    # select the first (index 0) land grid cell
    ds_new = ds.isel(lndgrid=0)
    return ds_new

# -- read only these variables from the monthly, h0, history files
def preprocess_h0(ds):
    variables = ['H2OSOI', 'TSOI']
    ds_new = ds[variables].isel(lndgrid=0)
    return ds_new

# -- read only these variables from the 30 minute, h1, history files
def preprocess_h1(ds):
    variables = ['FCEV', 'FCTR', 'FGEV', 'FSH', 'GPP', 'FSA', 'FIRA', 'AR', 'HR', 'ELAI']
    ds_new = ds[variables].isel(lndgrid=0)
    return ds_new


Now we have created the functions needed to manipulate our datasets.

---

### 1.2.4 List all the files we're going to open
Monthly history output (**'h0' files**) and 30-minute output (**'h1' files**) are written out for NEON cases.

To open all of these files we need to know their names. We can get them if we:
- Create an empty list `[]` of simulation files that is
- `.extend`ed with a
- `sorted` list of files generated with the
- `glob` function in Python of the
- `*h0*` files in our `data_folders`

You'll notice that **all of this gets combined in a single line of code** that runs through a 
- `for` loop over defined simulation years (written as a list of strings)
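
To make the steps above easier to follow, here is a small sketch that spells them out one at a time. It runs against a throwaway folder of empty files whose names are invented for illustration and are not real CTSM output.

```python
import os
import tempfile
from glob import glob
from os.path import join

# Build a throwaway folder with made-up file names (hypothetical, for illustration only).
demo_folder = tempfile.mkdtemp()
for name in ["case.h0.2018-02.nc", "case.h0.2018-01.nc",
             "case.h1.2018-01-01.nc", "case.h0.2019-01.nc"]:
    open(join(demo_folder, name), "w").close()

years = ["2018"]
demo_h0 = []                                  # 1. start with an empty list
for year in years:                            # 2. loop over the requested years
    pattern = join(demo_folder, "*h0." + year + "*.nc")
    matches = glob(pattern)                   # 3. find matching files (in no particular order)
    demo_h0.extend(sorted(matches))           # 4. sort them, then add them to the list

print([os.path.basename(f) for f in demo_h0])
# → ['case.h0.2018-01.nc', 'case.h0.2018-02.nc']
```

Only the 2018 `h0` files are kept: the `h1` file and the 2019 file do not match the pattern. Sorting matters because `glob` does not guarantee any order.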

<div class="alert alert-block alert-info">
<b>Note:</b> If you're new to Python, this is dense but efficient. I actually borrowed a bunch of this code from a colleague, Negin Sobhani, who's good at Python! Sharing code is really helpful.
</div>



In [None]:
# This list gives you control over the years of data to read in
# We're just going to look at one year of data
years = ["2018"]  

# Create empty lists of file names to extend
# We'll create separate lists for the control and experiment cases.
control_h0,  control_h1 = [], []
experiment_h0, experiment_h1 = [], []
for year in years:
    control_h0.extend(sorted(glob(join(data_folders[0],"*h0."+year+"*.nc"))))
    control_h1.extend(sorted(glob(join(data_folders[0],"*h1."+year+"*.nc"))))
    experiment_h0.extend(sorted(glob(join(data_folders[1],"*h0."+year+"*.nc"))))
    experiment_h1.extend(sorted(glob(join(data_folders[1],"*h1."+year+"*.nc"))))


print("All simulation files for all years: [", len(control_h0), "files]")
print(control_h0[-1])



How many files are you going to have to read in? What is the last day of the simulation you'll be looking at?

---

### 1.2.5 Read in the data
`xr.open_mfdataset` will open all of these data files and concatenate them into a single **xarray dataset**.

There are a lot of files here! Be patient, it should be done in < 1 minute. This can be done more quickly with dask, but we're not going to mess with it right now.

We are also going to use our `preprocess` and `fix_time` functions in this step.

In [None]:
start = time.time()
print ('---------------------------')
print ("Reading in h0 data for "+neon_site)
ds_control = xr.open_mfdataset(control_h0, decode_times=True, combine='by_coords',
                               preprocess=preprocess_h0)
ds_control = fix_time_h0 (ds_control)
ds_experiment = xr.open_mfdataset(experiment_h0, decode_times=True, combine='by_coords',
                                  preprocess=preprocess_h0)
ds_experiment = fix_time_h0 (ds_experiment)

# Combine datasets 
ds_h0 = xr.combine_nested([ds_control, ds_experiment], 'sim').assign_coords({'sim': ["PHS","BTRAN"]})
ds_h0.sim
end = time.time()
print("Reading all simulation files took:", end-start, "s.")


In [None]:
start = time.time()
print ('---------------------------')
print ("Reading in h1 data for "+neon_site)
ds_control = xr.open_mfdataset(control_h1, decode_times=True, combine='by_coords',preprocess=preprocess_h1)
ds_control = fix_time_h1 (ds_control)
ds_experiment = xr.open_mfdataset(experiment_h1, decode_times=True, combine='by_coords',preprocess=preprocess_h1)
ds_experiment = fix_time_h1 (ds_experiment)

# Combine datasets 
ds_h1 = xr.combine_nested([ds_control, ds_experiment], 'sim').assign_coords({'sim': ["PHS","BTRAN"]})
ds_h1.sim
end = time.time()
print("Reading all simulation files took:", end-start, "s.")


#### Take a quick look at the dataset.
- What are your coordinate variables?
- How long is the time dimension?
- What variables do we have to look at?
- What are the long names of some of these variables? (HINT: try `ds_h1.GPP`)
- What other metadata are associated with this dataset?

In [None]:
ds_h1

---

# 2. Manipulating data and making plots

We're just going to have a quick look at differences in h0 and h1 files.

---

## 2.1 Quick look at soil moisture with depth

In [None]:
ds_h0.H2OSOI.isel(time=7).plot(y='levsoi',marker='o',ylim=(0,2),hue='sim')
plt.gca().invert_yaxis() 
plt.suptitle(neon_site) ;

## 2.2 Quick look at daily fluxes
Here we'll:
- sum components of evapotranspiration fluxes and
- calculate and plot daily averages
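
As a rough illustration of what a daily average does, the sketch below groups a synthetic 30-minute series by calendar day in plain Python, then converts a per-second rate (for example gC/m2/s) to a daily total. The helper `daily_means` and all of the numbers are made up for illustration. xarray's `resample(time='D').mean()` does the grouping for you on real datasets.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SECONDS_PER_DAY = 3600 * 24

def daily_means(times, values):
    """Group values by calendar day and return {date: mean of that day's values}."""
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[t.date()].append(v)
    return {day: sum(vals) / len(vals) for day, vals in sorted(groups.items())}

# Synthetic 30-minute series covering two days (made-up numbers, not model output).
start = datetime(2018, 1, 1)
times = [start + timedelta(minutes=30 * i) for i in range(96)]
flux_per_second = [1.0e-5] * 48 + [3.0e-5] * 48

for day, mean_rate in daily_means(times, flux_per_second).items():
    # A mean rate per second multiplied by seconds per day gives a daily total.
    print(day, round(mean_rate * SECONDS_PER_DAY, 3))
```

Each day contributes 48 half-hourly values, so the daily mean is the average of those 48 numbers.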

In [None]:
ds_h1['ET'] = ds_h1.FCEV + ds_h1.FCTR + ds_h1.FGEV
ds_h1['ET'].resample(time='D').mean().plot(hue='sim') ;

In [None]:
# Similarly for GPP, converting gC/m2/s to a daily flux
secperday = 3600 * 24
((ds_h1['GPP'].resample(time='D').mean())*secperday).plot(hue='sim') 
plt.ylabel('GPP (gC/m2/day)')
plt.suptitle(neon_site) ;

---

<div class="alert alert-block alert-success">
<b>Congratulations:</b> 
    
You can look at history files and quickly compare results between simulations.
    
There's lots more you can look at, what do you want to explore?
</div>

