# Quick plots - CTSM-FATESsp
## Quickly look at various output from FATESsp

This tutorial is an introduction to [xarray](https://docs.xarray.dev/en/stable/user-guide/terminology.html) and [matplotlib](https://matplotlib.org/stable/index.html). Much more information is available in the documentation for these libraries.

This tutorial can be run either on data from cases that you ran earlier or on pre-staged data.

In this tutorial you will find steps and instructions to:

1. Load python libraries
2. Locate history files
3. Read in history files
4. Make plots for variables including soil moisture and GPP

------

## 1. Load Datasets

### 1.1 Load Python Libraries
We always start by loading in the libraries we're going to use for the script.  There are more libraries being loaded here than we'll likely use, but this list is a good one to get started for most of your plotting needs.


In [None]:
import os
import time
import datetime

import numpy as np
import pandas as pd
import xarray as xr

from glob import glob
from os.path import join

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from neon_utils import fix_time_h1

In [None]:
# It's helpful to document the version of some tools that are quickly changing
print('xarray '+xr.__version__) # was working with 2023.5.0

## 1.2 Point to history files 

### 1.2.1 Where are my simulation results?
After your simulations finish, history files are all saved in your `~/scratch/NEON_cases/archive/` directory.

We can print the cases we have to look at using bash magic: either `%%bash`, which turns the Python cell below into a bash cell, or `!`, which runs a single shell command.

In [None]:
%%bash
ls ~/scratch/NEON_cases/archive/

<div class="alert alert-block alert-info">
<b>Note</b> you can accomplish the same thing with the following.

> `!ls ~/scratch/NEON_cases/archive/`
    
</div>


<div class="alert alert-block alert-info">
<b>Note</b> if you prefer to look at example data instead of your own data, you can read in data located at `/scratch/data/NEONv2/hist`. We'll go over this in the next section.

</div>

---

### 1.2.2 Point to the directory with history files 
**We'll set the following:**
- site to look at
- path to our archive directory
- directory with input data (where history files are found)  

Setting these as variables makes the script easier to modify for different sites.

In [None]:
neon_site = 'STEI'  # NEON site we're going to look at

# If you would like to look at your own data, set the path to your archive directory
archive = '~/scratch/NEON_cases/archive' # Path to archive directory

# If you would like to look at example data, use this archive directory instead.
# Comment out the next line to look at your own data.
archive = '/scratch/wwieder/NEON_cases/archive'

# This expands the shortcut we used above
archive = os.path.realpath(os.path.expanduser(archive)) 

# Identify path to the data folder
data_folder = archive+'/'+neon_site+'_FATESsp_test/lnd/hist'
data_folder

**Is the path for input data, `data_folder`, correct?** 

*HINT:* You can check in the terminal window or using bash magic.
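
The same check can also be done from Python. The sketch below is one possible approach, not part of the original tutorial; `describe_folder` is a hypothetical helper name that uses only the standard library.

```python
import os

def describe_folder(path):
    """Expand `~`, resolve the path, and report whether it is an existing directory."""
    full_path = os.path.realpath(os.path.expanduser(path))
    exists = os.path.isdir(full_path)
    # Count the entries only if the directory is really there
    n_entries = len(os.listdir(full_path)) if exists else 0
    return full_path, exists, n_entries

# Demonstrated on the current working directory so the example runs anywhere.
# In the notebook you would call describe_folder(data_folder) instead.
print(describe_folder(os.getcwd()))
```

If the second value is `False`, check the site name and the case name used to build the path.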

---

### 1.2.3 Create some functions we'll use when opening the data
1. `preprocess_some` and `preprocess_all` control which variables we read in. Limiting the variables through the `preprocess` argument of xarray's `open_mfdataset` helps save time (and memory resources).
2. `fix_time_h1` corrects annoying features related to how CTSM history files handle time and is provided as part of `neon_utils.py`.

*Don't worry too much about the details of these functions right now.*


In [None]:
# Read only these variables from the netCDF files
def preprocess_some(ds):
    variables = ['FCEV', 'FCTR', 'FGEV', 'FSH', 'GPP', 'FSA', 'FIRA', 'AR', 'HR', 'ELAI']
    ds_new = ds[variables].isel(lndgrid=0)
    return ds_new

# Read all of the variables from the netCDF files
def preprocess_all(ds):
    ds_new = ds.isel(lndgrid=0)
    return ds_new

Now we have created the functions needed to manipulate our datasets.
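
Note that `preprocess_some` raises a `KeyError` if any of the listed names is absent from a file. The sketch below shows one tolerant variant on a small synthetic dataset. The function name `preprocess_available`, the default variable list, and the synthetic data are assumptions made for illustration; they are not part of `neon_utils` or the original notebook.

```python
import numpy as np
import xarray as xr

def preprocess_available(ds, wanted=("FATES_GPP", "H2OSOI", "FSH")):
    """Keep only the requested variables that exist, then drop the lndgrid dimension."""
    present = [name for name in wanted if name in ds.data_vars]
    return ds[present].isel(lndgrid=0)

# Synthetic stand-in for one history file: 4 time steps, 1 land grid cell
demo = xr.Dataset(
    {
        "FATES_GPP": (("time", "lndgrid"), np.zeros((4, 1))),
        "FSH": (("time", "lndgrid"), np.ones((4, 1))),
        "TBOT": (("time", "lndgrid"), np.full((4, 1), 280.0)),
    }
)

subset = preprocess_available(demo)
print(sorted(subset.data_vars))  # H2OSOI is skipped because it is absent
print(dict(subset.sizes))        # lndgrid is gone, only time remains
```

Silently skipping missing names is convenient for exploration, but it can hide typos in the variable list, so printing the variables that were kept is a useful habit.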

---

### 1.2.4 List all the files we're going to open
The 30-minute, high-frequency history output (**'h1' files**) is written out every day for NEON cases. 

To open all of these files we need to know their names. We can get them if we:
- create an empty list `[]` of simulation files that is
- `.extend`ed with a
- `sorted` list of files generated with the
- `glob` function in Python, matching the
- `*h1*` files in our `data_folder`

In [None]:
# This list gives you control over the years of data to read in
# We're just going to look at one year of data
years = ["2019"]  

# Create an empty list of all the file names to extend
sim_files = []
for year in years:
    sim_files.extend(sorted(glob(join(data_folder,"*h1."+year+"*.nc"))))

How many files are you going to have to read in?  What is the last day of the simulation you'll be looking at?

In [None]:
print("Total number of simulation files: ", len(sim_files), "files")
print("Last simulation file:", sim_files[-1])
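
Because one file is written per day, the file names can also tell you whether any days are missing. The sketch below assumes the names end in `.h1.YYYY-MM-DD-SSSSS.nc`; that pattern and the helper `missing_days` are assumptions for illustration, so compare them against your own `sim_files` before relying on the result.

```python
import re
from datetime import date, timedelta

def missing_days(file_names, year):
    """Return the days of `year` that have no matching h1 file name."""
    pattern = re.compile(r"\.h1\.(\d{4})-(\d{2})-(\d{2})-\d{5}\.nc$")
    found = set()
    for name in file_names:
        match = pattern.search(name)
        if match:
            found.add(date(*map(int, match.groups())))
    # Walk through every day of the year and collect the ones without a file
    day = date(year, 1, 1)
    missing = []
    while day.year == year:
        if day not in found:
            missing.append(day)
        day += timedelta(days=1)
    return missing

# Made-up file names, for illustration only
example = ["STEI.clm2.h1.2019-01-01-00000.nc", "STEI.clm2.h1.2019-01-03-00000.nc"]
gaps = missing_days(example, 2019)
print(len(gaps), "days missing; first gap:", gaps[0])
```

In the notebook, `missing_days(sim_files, 2019)` would return an empty list if every day of 2019 has a file.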

---

### 1.2.5 Read in the data
`xr.open_mfdataset` will open all of these data files and concatenate them into a single **xarray dataset**.

We are also going to use our `preprocess_all` and `fix_time_h1` functions in this step.

In [None]:
start = time.time()
print ("Reading in data for "+neon_site)

# Reading all of the variables here; could use preprocess_some instead to read only a subset.
ds_ctsm = xr.open_mfdataset(sim_files, decode_times=True, combine='by_coords',
                            preprocess=preprocess_all)
ds_ctsm = fix_time_h1(ds_ctsm)

end = time.time()
print("Reading all simulation files took:", end-start, "s.")


### Print the dataset you're working with, `ds_ctsm`

In [None]:
ds_ctsm

#### Take a quick look at the dataset.
- What are your coordinate variables?
- How long is the time dimension?
- What variables do we have to look at?
- What are the long names of some of these variables? (HINT: try `ds_ctsm.FATES_GPP`)
- What other metadata are associated with this dataset? 
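
These questions can also be answered programmatically. The sketch below builds a small synthetic dataset so it runs on its own; the attribute values are invented for the example and do not reflect the real `FATES_GPP` metadata. The same calls should work on `ds_ctsm`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic two-day, half-hourly dataset (96 time steps); all values are made up
time = pd.date_range("2019-01-01", periods=96, freq="30min")
demo = xr.Dataset(
    {
        "FATES_GPP": (
            "time",
            np.zeros(96),
            {"long_name": "gross primary production", "units": "kg m-2 s-1"},
        )
    },
    coords={"time": time},
    attrs={"title": "synthetic example"},
)

print(list(demo.coords))    # coordinate variables
print(demo.sizes["time"])   # length of the time dimension
for name, var in demo.data_vars.items():
    # .get() avoids an error when a variable lacks an attribute
    print(name, "|", var.attrs.get("long_name", "n/a"), "|", var.attrs.get("units", "n/a"))
print(demo.attrs)           # global (file-level) metadata
```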

---
Let's start plotting!
#### What do GPP fluxes look like in this FATES_SP run?

In [None]:
ds_ctsm.FATES_GPP.plot(); 

Let's repeat this, but look at daily mean fluxes.

In [None]:
# Convert from kgC/m2/s to daily flux (gC/m2/d)
spd = 24 * 60 * 60
((ds_ctsm.FATES_GPP.resample(time='D').mean())*spd*1e3).plot()
plt.ylabel('GPP (gC/m2/d)');
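
The conversion factor in the cell above is the number of seconds per day times the number of grams per kilogram. A minimal standalone sketch of that arithmetic follows; the helper name `kg_per_s_to_g_per_day` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86400 s in one day
GRAMS_PER_KG = 1e3

def kg_per_s_to_g_per_day(flux_kg_m2_s):
    """Convert a carbon flux from kgC/m2/s to gC/m2/d."""
    return flux_kg_m2_s * SECONDS_PER_DAY * GRAMS_PER_KG

# A flux of 1e-7 kgC/m2/s is roughly 8.64 gC/m2/d
print(kg_per_s_to_g_per_day(1e-7))
```

Because the function only multiplies, it can also be applied to an xarray `DataArray` of daily means.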

#### The `FATES_GPP` variable actually includes several PFTs on the surface dataset.  
This is different from how single point *"Big Leaf"* CLM simulations are done by default, which only have a single PFT on the surface data.

We can look at each of the PFTs from our FATES-SP run below.

In [None]:
temp = ((ds_ctsm.FATES_GPP_PF.resample(time='D').mean())*spd*1e3)
temp.plot(hue='fates_levpft') 
plt.ylabel('FATES GPP (gC/m2/d)') ;

You can find out which PFTs these correspond to with the following commands:

```
cat ~/scratch/NEON_cases/STEI_FATESsp_test/run/lnd_in | grep fates_paramfile
ncdump -v fates_pftname <path to the fates_paramfile from above>
```

For the STEI site, it looks like FATES PFTs 2, 6, and 11 are present.  
You can also look up the PFT names on the FATES GitHub.  
These correspond to:
- needleleaf_evergreen_extratrop_tree
- broadleaf_colddecid_extratrop_tree
- cool_c3_grass


In [None]:
temp.sel(fates_levpft=[2,6,11]).plot(hue='fates_levpft') 
plt.ylabel('FATES GPP (gC/m2/d)');

This raises the question of why GPP is so much lower for certain PFTs.  
Units are per m2, so it's not that the grass makes up a smaller fraction of the total grid area.
- Per unit leaf area, is the photosynthetic capacity of grasses lower?
  - A number of parameters control photosynthetic capacity, but vcmax is one important place to start.
  - HINT: to get started, try
     > cat ~/scratch/NEON_cases/STEI_FATESsp_test/run/lnd_in | grep fates_paramfile
     
     > ncdump -v fates_leaf_vcmax25top `<path to the fates_paramfile from above>`

- Does this have to do with canopy scaling?  That is, does the grass PFT just have a lower LAI?
  - You can look at this by printing the LAI for each corresponding PFT on the CLM surface dataset.
  - HINT: to get started, try
    > cat ~/scratch/NEON_cases/STEI_FATESsp_test/run/lnd_in | grep fsurdat
   
    > ncdump -v MONTHLY_LAI,PCT_NAT_PFT `<path to the surface dataset from above>`
   
Note: the CLM PFT indexes are different from what FATES uses.

Moreover, LAI on the surface dataset is somewhat hard to interpret, because it is stored as monthly values dimensioned [time x PFT].  

<div class="alert alert-block alert-warning">
<b>CHALLENGE</b> 
    
Can you write a few lines of code to open the surface dataset and plot the monthly PFT values for the PFTs represented in your FATES-SP case?
    
</div>

It's also helpful to look at the simulated energy budget.
- Do net radiation, sensible heat flux, and latent heat flux seem OK?
- How do we compare fluxes from multiple PFTs to flux tower measurements that integrate fluxes across their entire footprint?
- We have information on the gridcell weighted mean fluxes on our h1 files, but can you write out these fluxes at a PFT level?
- Would it be helpful to have results from a *Big Leaf* CLM simulation to compare to?  

There's a lot to start looking into here! As you can see, this quickly gets complicated to investigate.  
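
As a starting point for the energy budget, the sketch below combines the flux variables listed in `preprocess_some`. It assumes that net radiation can be computed as `FSA - FIRA` (absorbed solar minus net longwave loss) and latent heat flux as `FCEV + FCTR + FGEV`; check the `long_name` and sign convention of each variable in your own history files before relying on this. The function name and the input numbers are made up for illustration.

```python
def energy_budget(fsa, fira, fsh, fcev, fctr, fgev):
    """Return net radiation, latent heat flux, and the budget residual (all in W/m2)."""
    net_radiation = fsa - fira         # absorbed solar minus net longwave loss (assumed sign)
    latent_heat = fcev + fctr + fgev   # canopy evaporation + transpiration + ground evaporation
    residual = net_radiation - fsh - latent_heat
    return net_radiation, latent_heat, residual

# Made-up midday values in W/m2
rnet, lh, res = energy_budget(fsa=500.0, fira=80.0, fsh=150.0,
                              fcev=20.0, fctr=180.0, fgev=40.0)
print(rnet, lh, res)
```

The function uses only arithmetic, so it should also accept xarray `DataArray` inputs such as `ds_ctsm.FSA`. The residual lumps together ground heat flux and any other terms not listed here.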

#### Make a contour plot of soil moisture over time with depth on the y axis

In [None]:
ds_ctsm.H2OSOI.plot(robust=True, y='levsoi')
plt.gca().invert_yaxis();
plt.title(neon_site);

### This example plots: 
- vertical profiles of soil moisture 
- for one time step
- over the top 4m of soil 
- with depth on the y axis, and reversed so deeper soil levels are at the bottom

In [None]:
# Time index 181*48 is 181 days after the first time step (48 half-hourly steps per day)
ds_ctsm.H2OSOI.isel(time=(181*48)).plot(y='levsoi',marker='o',ylim=(0,4))
plt.gca().invert_yaxis() 
plt.suptitle(neon_site);

It seems odd that surface soil layers are wetter than deeper ones.
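
The time index `181*48` used in the profile plot is easier to interpret as a date. The sketch below converts between an index and a timestamp, assuming 30-minute output whose first time step is 2019-01-01 00:00. That starting time is an assumption; compare it with `ds_ctsm.time[0]` in your own data.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=30)      # 30-minute history output
FIRST_STEP = datetime(2019, 1, 1) # assumed first time step

def index_to_time(index):
    """Timestamp of the time step at position `index`."""
    return FIRST_STEP + index * STEP

def time_to_index(when):
    """Position of the time step at timestamp `when`."""
    return (when - FIRST_STEP) // STEP

print(index_to_time(181 * 48))              # 181 days after the first step
print(time_to_index(datetime(2019, 7, 1)))
```

With xarray you can avoid the index arithmetic altogether by selecting on the time coordinate with `.sel(time=...)`.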

### This example plots a time series of soil moisture 
- for a single soil level

In [None]:
ds_ctsm.H2OSOI.isel(levsoi=4).plot();
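
Selecting `levsoi=4` by position does not say how deep that layer is. Since the `levsoi` coordinate holds depths in meters (as used for the y axis above), a layer can also be chosen by depth. The sketch below shows both approaches on a synthetic profile; the depths and moisture values are invented for illustration.

```python
import numpy as np
import xarray as xr

# Made-up layer depths (m) and soil moisture values for a single time step
depths = np.array([0.01, 0.04, 0.09, 0.16, 0.26, 0.40])
h2osoi = xr.DataArray(
    np.linspace(0.40, 0.20, depths.size),
    coords={"levsoi": depths},
    dims="levsoi",
    name="H2OSOI",
)

by_index = h2osoi.isel(levsoi=4)                      # fifth layer, by position
by_depth = h2osoi.sel(levsoi=0.25, method="nearest")  # layer closest to 0.25 m

print(float(by_index.levsoi), float(by_depth.levsoi))
```

Both selections return the same layer here; selecting by depth keeps working if the number of soil layers changes.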

<div class="alert alert-block alert-success">
<b>Congratulations:</b> 
    
You've quickly looked at some of the 30-minute output from a CTSM-FATESsp run.

What other sites or variables would you like to look at?

Give it a shot, you can make lots of plots quickly with all this data!
    
</div>
