# Quick plots - Comparison
## A simple way to quickly compare 30-minute history files (*h1*) from control and experimental cases

This tutorial is an introduction to [xarray](https://docs.xarray.dev/en/stable/user-guide/terminology.html) and [matplotlib](https://matplotlib.org/stable/index.html). Much more information is available in the documentation for these libraries.

This tutorial can be run either on data from cases that you ran earlier or on pre-staged data.

In this tutorial you will find steps and instructions to:

1. Load python libraries needed for plotting
2. Locate h1 history files for control and experimental cases
3. Load datasets with xarray
4. Plot the data
------

# 1. Load Datasets

## 1.1 Load Python Libraries
We always start by loading in the libraries we're going to use for the script. There are more libraries being loaded here than we'll likely use, but this list is a good one to get started for most of your plotting needs.


In [None]:
import os
import time
import datetime

import numpy as np
import pandas as pd
import xarray as xr

from glob import glob
from os.path import join

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from neon_utils import fix_time_h1

In [None]:
# It's helpful to document the version of some tools that are quickly changing
print('xarray '+xr.__version__) # was working with 2023.5.0

## 1.2 Point to history files 

### 1.2.1 Where are my simulation results?
After your simulations finish, history files are all saved in your `~/scratch/NEON_cases/archive/` directory.

We can list the cases we have available using bash magic: `%%bash` turns the whole Python cell below into a bash cell, while `!` runs a single shell command.

In [None]:
%%bash
ls ~/scratch/NEON_cases/archive/

<div class="alert alert-block alert-info">
<b>Note</b> you can accomplish the same thing with the following.

> `!ls ~/scratch/NEON_cases/archive/`
    
</div>

<div class="alert alert-block alert-info">
<b>Note</b> if you prefer to look at example data instead of your own data, you can read in data located at `/scratch/data/NEONv2/hist`. We'll go over this in the next section.

</div>

---

### 1.2.2 Point to the data folder with history files 
**We'll set the following:**
- site to look at
- path to our archive directory
- directory with input data (where history files are found)


Defining these once, up front, makes the script easier to modify for different sites.
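
One way to take this a step further is to wrap the path building in a small function. The sketch below is only an illustration: `hist_dir` is a hypothetical helper (not part of `neon_utils`), and it assumes the `<site>.transient` and `<site>.<experiment>.transient` case naming used in this tutorial.

```python
from os.path import join

def hist_dir(archive, site, experiment=None):
    """Build the land history directory for a site (hypothetical helper).

    Assumes case folders are named '<site>.transient' for the control and
    '<site>.<experiment>.transient' for an experiment.
    """
    if experiment is None:
        case = site + ".transient"
    else:
        case = site + "." + experiment + ".transient"
    return join(archive, case, "lnd", "hist")

# Example archive path, for illustration only
print(hist_dir("/scratch/NEON_cases/archive", "HARV"))
print(hist_dir("/scratch/NEON_cases/archive", "HARV", "foliarCN-30"))
```

Switching sites or experiments then only requires changing the arguments.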


In [None]:
neon_site = 'HARV'  # NEON site we're going to look at
experiment = 'foliarCN-30'
archive = '~/scratch/NEON_cases/archive' # Path to archive directory

# If you prefer to look at example data, you can uncomment the following line
# archive = '~/../../scratch/data/NEONv2/hist'

# This expands the '~' shortcut we used above and resolves the full path
archive = os.path.realpath(os.path.expanduser(archive)) 

# Create a path to the data folder directories
control_dir = archive+'/'+neon_site+'.transient/lnd/hist'
experiment_dir = archive+'/'+neon_site+'.'+experiment+'.transient/lnd/hist'

control_dir

**Is this path for input data, `control_dir`, correct?** 

*HINT:* You can check in the terminal window or use bash magic (`%%bash`) and then list the contents of `control_dir` with `ls` in the same cell.
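
If you would rather stay in Python, the same check can be sketched with the `os` module. `describe_dir` is a hypothetical helper written for this example, and the path passed to it below is only a placeholder; in the notebook you would pass `control_dir` instead.

```python
import os

def describe_dir(path, max_items=5):
    """Return (exists, first few entries) for a directory path."""
    path = os.path.realpath(os.path.expanduser(path))
    if not os.path.isdir(path):
        return False, []
    return True, sorted(os.listdir(path))[:max_items]

# Placeholder path for illustration; use control_dir in the notebook
exists, entries = describe_dir("~/scratch/NEON_cases/archive/HARV.transient/lnd/hist")
print("Directory found:", exists)
for name in entries:
    print("  ", name)
```

If the directory is not found, double check the site name and the archive path set above.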

---

### 1.2.3 Create some functions we'll use when opening the data
1. `preprocess_all` and `preprocess_some` are applied to each file as it is opened, through the `preprocess` argument of `xr.open_mfdataset`. `preprocess_some` limits the number of variables we're reading in, which helps save time (and memory resources).
2. `fix_time_h1` corrects annoying features related to how CTSM history files handle time and is provided in `neon_utils.py`.

*Don't worry too much about the details of these functions right now*.

In [None]:
# Read all variables from the netcdf files
# Selecting index 0 along lndgrid drops that dimension, which we don't need here
def preprocess_all(ds):
    ds_new = ds.isel(lndgrid=0)
    return ds_new

# Read some of the variables from the netcdf files.
# This will make things faster, but requires you to list the variables you want to look at
def preprocess_some(ds):
    variables = ['GPP']
    ds_new = ds[variables].isel(lndgrid=0)
    return ds_new

Now we have created the functions needed to manipulate our datasets.
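
To see what such a function does without opening any files, here is a minimal sketch on a synthetic dataset. The values are made up, and the `variables` keyword is an addition for this example that is not in the functions above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one history file (made-up values, one land grid cell)
ds = xr.Dataset(
    {
        "GPP": (("time", "lndgrid"), np.ones((4, 1))),
        "FSH": (("time", "lndgrid"), np.zeros((4, 1))),
    },
    coords={"time": np.arange(4)},
)

def preprocess_some(ds, variables=("GPP",)):
    """Keep only the listed variables and drop the lndgrid dimension."""
    return ds[list(variables)].isel(lndgrid=0)

ds_small = preprocess_some(ds)
print(list(ds_small.data_vars))  # variables that remain
print(dict(ds_small.sizes))      # dimensions that remain
```

Only the requested variable is kept, and `lndgrid` no longer appears as a dimension.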

---

### 1.2.4 List all the files we're going to open
The 30-minute history output (**'h1' files**) is written out for NEON cases.

To open all of these files we're going to need to know their names.  This can be done if we:
- Create an empty list `[]` of simulation files that is
- `.extend`ed with a 
- `sorted` list of files generated with the 
- `glob` function in python of the 
- `*h1*` files in our `control_dir` and `experiment_dir`

You'll notice that **all of this gets combined in a single line of code** that runs through a 
- `for` loop over defined simulation years (written as a list of strings)

<div class="alert alert-block alert-info">
<b>Note</b> If you're new to python, this line is dense, but efficient. I actually borrowed a bunch of this code from a colleague, Negin Sobhani, who's good at python! Sharing code is really helpful.
</div>
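
The `glob` pattern can be tried out on throwaway files before pointing it at real data. In this sketch the file names are made up for illustration, and the naming pattern is an assumption about how h1 files are named.

```python
import os
import tempfile
from glob import glob
from os.path import join

# Build a temporary directory with empty, made-up history file names
tmp_dir = tempfile.mkdtemp()
names = [
    "HARV.transient.clm2.h1.2020-12-31-00000.nc",
    "HARV.transient.clm2.h1.2021-01-02-00000.nc",
    "HARV.transient.clm2.h1.2021-01-01-00000.nc",
    "HARV.transient.clm2.h0.2021-01.nc",
]
for name in names:
    open(join(tmp_dir, name), "w").close()

# Same pattern as in the tutorial: h1 files for each requested year, sorted by name
files = []
for year in ["2021"]:
    files.extend(sorted(glob(join(tmp_dir, "*h1." + year + "*.nc"))))

print([os.path.basename(f) for f in files])
```

Only the two 2021 `h1` files match, and `sorted` puts them in date order because the date is part of the file name.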



In [None]:
# Create an empty list of all the file names to extend
control_files = []
experiment_files = []

# If you want to choose a few particular years, you can use this loop.  
# You can modify the list of years you're looking at.
years = ["2021"]
for year in years:
    control_files.extend(sorted(glob(join(control_dir,"*h1."+year+"*.nc"))))
    experiment_files.extend(sorted(glob(join(experiment_dir,"*h1."+year+"*.nc"))))



How many files are you going to have to read in? What is the last day of the simulation you'll be looking at?

In [None]:
print("Total number of simulation files: ", len(control_files), "files")
print("Last simulation file:", control_files[-1])

---

### 1.2.5 Read in the data
`xr.open_mfdataset` will open all of these data files and concatenate them into a single **xarray dataset**.

We are also going to use the `preprocess_all` and `fix_time_h1` functions in this step.

In [None]:
start = time.time()
print ("Reading in data for "+neon_site)

# Reading *all* of the variables here. preprocess_some would be faster,
# but its variable list must include everything used below.
# Start with the control case
ds_control = xr.open_mfdataset(control_files, decode_times=True, combine='by_coords',
                            preprocess=preprocess_all)
ds_control = fix_time_h1(ds_control)

# repeat for the experimental case
print ("Reading in data for "+neon_site+"."+experiment)
ds_experiment = xr.open_mfdataset(experiment_files, decode_times=True, combine='by_coords',
                            preprocess=preprocess_all)
ds_experiment = fix_time_h1(ds_experiment)

end = time.time()
print("Reading all simulation files took:", end-start, "s.")


Note, merging the datasets above could be helpful...
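
One possible way to merge the two cases is to concatenate them along a new dimension. This is a sketch under assumptions, not what the tutorial does: the dimension name `case` and its labels are choices made for this example, and the data are synthetic.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two small synthetic datasets standing in for the control and experiment cases
time = pd.date_range("2021-01-01", periods=4, freq="D")
ds_a = xr.Dataset({"GPP": ("time", np.array([1.0, 2.0, 3.0, 4.0]))}, coords={"time": time})
ds_b = xr.Dataset({"GPP": ("time", np.array([2.0, 3.0, 4.0, 5.0]))}, coords={"time": time})

# Stack the two cases along a new, labeled 'case' dimension
cases = pd.Index(["control", "experiment"], name="case")
ds_both = xr.concat([ds_a, ds_b], dim=cases)

# Differences between cases are then a simple selection and subtraction
diff = ds_both.GPP.sel(case="experiment") - ds_both.GPP.sel(case="control")
print(diff.values)
```

With real data, `ds_control` and `ds_experiment` would take the place of the two synthetic datasets, provided they share the same time coordinate.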

### Add evapotranspiration (ET) and print the dataset you're working with

In [None]:
# sum up components of ET fluxes
ds_control['ET'] = ds_control.FCEV + ds_control.FCTR + ds_control.FGEV
ds_control['ET'].attrs['long_name'] = "Evapotranspiration flux"
ds_control['ET'].attrs['units'] = ds_control.FCEV.attrs['units']

ds_experiment['ET'] = ds_experiment.FCEV + ds_experiment.FCTR + ds_experiment.FGEV
ds_experiment['ET'].attrs['long_name'] = "Evapotranspiration flux"
ds_experiment['ET'].attrs['units'] = ds_experiment.FCEV.attrs['units']
ds_experiment
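
Because the same three lines are repeated for each case, they could also be wrapped in a function. The sketch below uses a hypothetical helper, `add_et`, and a synthetic dataset with a made-up units string.

```python
import numpy as np
import xarray as xr

def add_et(ds):
    """Return a copy of ds with ET = FCEV + FCTR + FGEV, reusing the units of FCEV."""
    et = (ds.FCEV + ds.FCTR + ds.FGEV).assign_attrs(
        long_name="Evapotranspiration flux",
        units=ds.FCEV.attrs["units"],
    )
    return ds.assign(ET=et)

def fake_flux(value):
    """Made-up flux variable with three time steps, for illustration only."""
    return ("time", np.full(3, value), {"units": "W/m^2"})

ds_demo = xr.Dataset({"FCEV": fake_flux(1.0), "FCTR": fake_flux(2.0), "FGEV": fake_flux(3.0)})
ds_demo = add_et(ds_demo)
print(ds_demo.ET.attrs)
```

In the notebook this would be called once for `ds_control` and once for `ds_experiment`.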

#### Take a quick look at the dataset.
- What are your coordinate variables?
- How long is the time dimension?
- What variables do we have to look at?
- What are the long names of some of these variables? (HINT: try `ds_experiment.TSOI`)
- What other metadata are associated with this dataset?
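
Each of these questions can also be answered in code. Here is a minimal sketch on a synthetic dataset; the variable, attributes, and values are made up, and with real data you would use `ds_control` or `ds_experiment` instead of `ds`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic dataset with the kind of metadata found in history files
time = pd.date_range("2021-01-01", periods=3, freq="30min")
ds = xr.Dataset(
    {"GPP": ("time", np.zeros(3), {"long_name": "gross primary production", "units": "gC/m^2/s"})},
    coords={"time": time},
    attrs={"title": "synthetic example"},
)

print(list(ds.coords))            # coordinate variables
print(ds.sizes["time"])           # length of the time dimension
print(list(ds.data_vars))         # variables available to look at
print(ds.GPP.attrs["long_name"])  # long name of one variable
print(ds.attrs)                   # metadata attached to the whole dataset
```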

---

### Let's start plotting!

### Time series of daily mean GPP, ELAI, FSH, and ET
**Note** this isn't the most efficient way of calculating daily means for a full dataset, but it works.
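
To see what `resample(time='D').mean()` does on its own, here is a small sketch with two days of made-up 30-minute values.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two days of synthetic 30-minute data: 48 time steps per day
time = pd.date_range("2021-01-01", periods=96, freq="30min")
values = np.concatenate([np.full(48, 1.0), np.full(48, 3.0)])
gpp = xr.DataArray(values, coords={"time": time}, dims="time", name="GPP")

# Group the time steps by calendar day and average within each day
daily = gpp.resample(time="D").mean()
print(daily.values)
```

The 96 half-hourly values collapse to one mean value per day.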

In [None]:
var = ['GPP','ELAI','FSH','ET']

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=[12,8], sharex=True)

# Loop over the variables, drawing each one in its own panel
for v, ax in enumerate(axes.flat):
    # Daily means of the 30-minute output for each case
    ds_control[var[v]].resample(time='D').mean().plot(ax=ax, label='control')
    ds_experiment[var[v]].resample(time='D').mean().plot(ax=ax, label=experiment)
    if v == 0:
        ax.legend()
        ax.set_title(neon_site)
    if v < 2:
        # Top row: drop the x-axis label, since the x-axis is shared
        ax.set_xlabel(None)

# Show plot
plt.show()

---
**You also may want to look at:** 
- ecosystem C pools (h0 files), 
- the sum of annual fluxes, 
- more years of data, 
- NEON observations, or 
- additional variables


<div class="alert alert-block alert-success">
<b>Congratulations:</b> 
You've compared a control and an experimental case!
    
</div>