# Quick plots - CTSM - Monthly 
## Simple way to quickly look at monthly history files (*h0*).  

This tutorial is an introduction to [xarray](https://docs.xarray.dev/en/stable/user-guide/terminology.html) and [matplotlib](https://matplotlib.org/stable/index.html). There's lots more information in the documentation for these libraries.  Note that some users prefer the seaborn library to matplotlib; we don't have examples using seaborn at this point.

In this tutorial you will find steps and instructions to:

1. Load datasets with xarray and quickly look at the data

------

**This tutorial uses a Jupyter Notebook.** 
For more information on Jupyter notebooks please see the information in the 1a_GitStarted tutorial, 1c_NEON_Simulation_Visualization tutorial, or visit the [Jupyter Notebook Quick Start Guide](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html). 

***

# 1. Load Datasets

## 1.1 Load Python Libraries
We always start by loading in the libraries we're going to use for the script.  There are more libraries being loaded here than we'll likely use, but this list is a good one to get started for most of your plotting needs.


In [2]:
import os
import time
import datetime

import numpy as np
import pandas as pd
import xarray as xr

from glob import glob
from os.path import join

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

import calendar

import tqdm
import cftime
from neon_utils import download_eval_files

In [3]:
print('xarray '+xr.__version__) ##-- was working with 2023.1.0

xarray 2023.1.0


## 1.2 Point to history files 

### 1.2.1 Where are my simulation results?
After your simulations finish, history files are all saved in your `~/scratch/NEON_cases/archive/` directory.

We can print the cases we have to look at using bash magic: `%%bash` turns the Python cell block below into a bash cell, and `!` runs a single shell command.

In [4]:
%%bash
ls ~/scratch/NEON_cases/archive/

CPER.transient
KONZ.transient
TREE.transient
WOOD.ad
WOOD.BTRAN.ad
WOOD.EXP_test1.ad
WOOD.EXP_test1.postad
WOOD.postad
WOOD.transient


<div class="alert alert-block alert-info">
<b>Note</b> you can accomplish the same thing with the following.

> `!ls ~/scratch/NEON_cases/archive/`
    
</div>

---

### 1.2.2 Point to the data folder with history files 
**We'll set the following:**
- site to look at;
- path to our archive directory;
- directory with input data (where history files are found).

Defining these as variables makes the script easier to modify for different sites.

In [5]:
neon_site = 'WOOD'  #NEON site we're going to look at
archive = '~/scratch/NEON_cases/archive' #Path to archive directory

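# This expands the `~` shortcut used above and resolves the full path
archive = os.path.realpath(os.path.expanduser(archive)) 

# Create a path to the data folder
# NOTE: the second assignment overrides the first, so this notebook reads the
# BTRAN.ad case; comment it out to read the transient case instead
data_folder = archive+'/'+neon_site+'.transient/lnd/hist'
data_folder = archive+'/'+neon_site+'.BTRAN.ad/lnd/hist'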
data_folder

'/glade/scratch/wwieder/NEON_cases/archive/WOOD.BTRAN.ad/lnd/hist'

**Is the path for input data, `data_folder`, correct?** 

*HINT:* You can check in the terminal window or using bash magic.
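
The same check can also be done from Python. The sketch below is one possible approach using only the standard library; the helper name `describe_folder` is hypothetical and not part of the tutorial.

```python
import os

def describe_folder(path):
    """Return (exists, number of entries) for a directory path."""
    # Expand `~` and resolve symbolic links, as done for `archive` above
    path = os.path.realpath(os.path.expanduser(path))
    if not os.path.isdir(path):
        return False, 0
    return True, len(os.listdir(path))

# Example with a directory that always exists: the current working directory
exists, n_entries = describe_folder(os.getcwd())
print("exists:", exists, "| entries:", n_entries)
```

In the notebook, calling `describe_folder(data_folder)` should report that the folder exists and contains files if the path is right.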

---

### 1.2.3 Create some functions we'll use when opening the data
1. `preprocess` will limit the number of variables we're reading in. This is an xarray feature that helps save time (and memory resources).
2. `fix_time` corrects annoying features related to how CTSM history files handle time.

*Don't worry too much about the details of these functions right now.*


In [None]:
# -- read all variables from the netcdf files
#    this just selects the single grid cell, which drops the unused lndgrid dimension
def preprocess_all(ds):
    ds_new = ds.isel(lndgrid=0)
    return ds_new

# -- read some of the variables from the netcdf files, 
#    this will make things faster, but requires you to list the variables you want to look at
def preprocess_some(ds):
    variables = ['H2OSOI', 'TSOI']
    ds_new = ds[variables].isel(lndgrid=0)
    return ds_new
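
To see what a preprocess step like this does, it can be tried on a small synthetic dataset. This is only a sketch: the dataset below is made up (including the extra variable `GPP` and the array shapes), and `preprocess_demo` is a hypothetical stand-in that mirrors the subsetting done above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one history file: 1 time step, 1 grid cell
ds = xr.Dataset(
    {
        "H2OSOI": (("time", "levsoi", "lndgrid"), np.full((1, 3, 1), 0.3)),
        "TSOI": (("time", "levgrnd", "lndgrid"), np.full((1, 4, 1), 280.0)),
        "GPP": (("time", "lndgrid"), np.zeros((1, 1))),
    }
)

def preprocess_demo(ds):
    # Keep two variables and select the single grid cell,
    # which removes the lndgrid dimension
    return ds[["H2OSOI", "TSOI"]].isel(lndgrid=0)

small = preprocess_demo(ds)
print(list(small.data_vars))
print(dict(small.sizes))
```

The printed variable list and dimension sizes show which variables survive and that `lndgrid` is gone.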

In [None]:
# -- fix timestamp on CTSM files
def fix_time_h0(ds):
    nsteps = len(ds.time)
    yr0 = ds['time.year'][0].values
    month0 = ds['time.month'][0].values - 1 
    day0 = ds['time.day'][0].values 
    # guard: if the first timestamp is in January, roll back to December of the previous year
    if month0 == 0:
        yr0, month0 = yr0 - 1, 12

    date = cftime.datetime(yr0,month0,day0).isoformat() 
    ds['time'] = xr.cftime_range(date, periods=nsteps, freq='M')
    ds['time']= ds['time'].dt.strftime("%Y-%m").astype("datetime64[ns]")
    return ds

Now we have created the functions needed to manipulate our datasets
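
Judging from `fix_time_h0`, which moves the first time stamp back by one month, the time stamps in monthly files appear to mark the end of each averaging period. The sketch below illustrates that idea in simplified form with pandas rather than cftime; the time stamps are made up for illustration.

```python
import pandas as pd

# Hypothetical time stamps as they might appear in monthly h0 files,
# assuming each one marks the END of the month it describes
stamps = pd.DatetimeIndex(["2018-02-01", "2018-03-01", "2018-04-01"])

# Shift each stamp back one month so the label matches the averaged month
labels = stamps - pd.DateOffset(months=1)
print(labels.strftime("%Y-%m").tolist())
```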

---

### 1.2.4 List all the files we're going to open
The monthly history output (**'h0' files**) is written out for NEON cases. 

To open all of these files we need to know their names.  We can get them if we:
- create an empty list `[]` of simulation files that is
- `.extend`ed with a 
- `sorted` list of files generated with the 
- `glob` function in Python, which matches the 
- `*h0*` files in our `data_folder`.

You'll notice that **all of this gets combined in a single line of code** that runs through a 
- `for` loop over defined simulation years (written as a list of strings)

<div class="alert alert-block alert-info">
<b>Note</b> If you're new to Python, this code is dense but efficient.  I actually borrowed a bunch of this code from a colleague, Negin Sobhani, who's good at Python! Sharing code is really helpful. 
</div>
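
The pattern described above can be tried on a throwaway folder first. This is a minimal sketch: the folder is temporary and the file names are invented for illustration, so they only loosely resemble real history file names.

```python
import os
import tempfile
from glob import glob
from os.path import join

# Build a temporary folder with made-up, empty files (hypothetical names)
demo_folder = tempfile.mkdtemp()
for name in ["SITE.clm2.h0.2018-02.nc", "SITE.clm2.h0.2018-01.nc",
             "SITE.clm2.h0.2019-01.nc", "SITE.clm2.h1.2018-01.nc"]:
    open(join(demo_folder, name), "w").close()

years = ["2018"]
demo_files = []
for year in years:
    # glob returns matches in arbitrary order, so sort before extending the list
    demo_files.extend(sorted(glob(join(demo_folder, "*h0." + year + "*.nc"))))

print([os.path.basename(f) for f in demo_files])
```

Only the `h0` files from 2018 are kept, in sorted order; the `h1` file and the 2019 file do not match the pattern.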



In [6]:
# This list gives you control over the years of data to read in.
# The year filter below is commented out, so `years` is currently unused
# and every h0 file in the folder is read.
years = ["2018"]  

# Create an empty list of all the file names to extend
sim_files = []
#for year in years:
#    sim_files.extend(sorted(glob(join(data_folder,"*h0."+year+"*.nc"))))

sim_files.extend(sorted(glob(join(data_folder,"*h0.*.nc"))))

print("All simulation files for all years: [", len(sim_files), "files]")
print(sim_files[-1])

All simulation files for all years: [ 11 files]
/glade/scratch/wwieder/NEON_cases/archive/WOOD.BTRAN.ad/lnd/hist/WOOD.BTRAN.ad.clm2.h0.0218-01-01-00000.nc


How many files are you going to have to read in?  What is the last day of the simulation you'll be looking at?

---

### 1.2.5 Read in the data
`.open_mfdataset` will open all of these data files and concatenate them into a single **xarray dataset**.

We are also going to use our `preprocess` and `fix_time` functions in this step.

In [None]:
start = time.time()
print('---------------------------')
print("Reading in data for "+neon_site)

# Just reading some of the data here, could use preprocess_all instead.
ds_ctsm = xr.open_mfdataset(sim_files, decode_times=True, combine='by_coords',
                            preprocess=preprocess_some)
ds_ctsm = fix_time_h0(ds_ctsm)

end = time.time()
print("Reading all simulation files took:", end-start, "s.")


### Print the dataset you're working with, `ds_ctsm`

In [None]:
ds_ctsm

#### Take a quick look at the dataset.
- What are your coordinate variables?
- How long is the time dimension?
- What variables do we have to look at?
- What are the long names of some of these variables? (HINT try `ds_ctsm.TSOI`)
- What other metadata are associated with this dataset? 
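
Each of these questions maps onto an attribute of an xarray dataset. The sketch below shows where to look, using a small synthetic dataset; its values, depths, and metadata are made up, so the real `ds_ctsm` will differ.

```python
import numpy as np
import xarray as xr

# Synthetic dataset with invented values and metadata
demo = xr.Dataset(
    {"TSOI": (("time", "levgrnd"), np.full((2, 3), 280.0),
              {"long_name": "soil temperature", "units": "K"})},
    coords={"time": [0, 1], "levgrnd": [0.01, 0.04, 0.09]},
    attrs={"title": "synthetic example"},
)

print(list(demo.coords))             # coordinate variables
print(demo.sizes["time"])            # length of the time dimension
print(list(demo.data_vars))          # data variables
print(demo.TSOI.attrs["long_name"])  # long name of one variable
print(demo.attrs)                    # dataset-level metadata
```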

---
Let's start plotting!

### Make a contour plot of soil moisture over time with depth on the y axis

In [None]:
ds_ctsm.H2OSOI.plot(robust=True, y='levsoi')
plt.gca().invert_yaxis() ;
plt.title(neon_site) ;

### This example plots: 
- vertical profiles of soil moisture 
- for one month
- over the top 2m of soil 
- with depth on the y axis, and reversed so deeper soil levels are at the bottom

In [None]:
ds_ctsm.H2OSOI.isel(time=5).plot(y='levsoi',marker='o',ylim=(0,2))
plt.gca().invert_yaxis() 
plt.suptitle(neon_site) ;

It seems odd that surface soil layers are wetter than deeper ones.

### This example plots a time series of soil moisture 
- for a single soil level

In [None]:
ds_ctsm.H2OSOI.isel(levsoi=4).plot(marker='o') ;
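
Selecting a level by position with `isel` works, but it can be easier to read when the level is chosen by depth. The sketch below compares the two on synthetic data; it assumes `levsoi` carries depth values in metres, and the depths used here are made up rather than taken from CTSM.

```python
import numpy as np
import xarray as xr

# Synthetic soil moisture with invented layer depths in metres
depths = [0.01, 0.04, 0.09, 0.16, 0.26]
h2osoi = xr.DataArray(
    np.arange(15, dtype=float).reshape(3, 5),
    dims=("time", "levsoi"),
    coords={"time": [0, 1, 2], "levsoi": depths},
    name="H2OSOI",
)

# Select by position (fifth level) ...
by_index = h2osoi.isel(levsoi=4)
# ... or by the depth value closest to 0.25 m
by_depth = h2osoi.sel(levsoi=0.25, method="nearest")

print(float(by_depth.levsoi))
```

With these depths both selections return the same level, and the `sel` version states the intended depth in the code.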


<div class="alert alert-block alert-success">
<b>Congratulations:</b> 
    
You've quickly looked at some of the monthly output from CLM.

What other sites or variables would you like to look at?

Give it a shot, you can make lots of plots quickly with all this data!
    
</div>
