# Tutorial 2c - Code Modifications Tutorial (Visualization)

This tutorial is an introduction to analyzing results from your code modifications for NEON cases. It uses results from the case you ran in the 0b and 2c tutorials, but you don't have to wait for those runs to complete before starting this tutorial. We've prestaged model results from this simulation in a shared directory, so you can start analyzing simulation results before your simulations finish running.

You can also check the [NEON visualization](https://ncar.github.io/ncar-neon-books/notebooks/NEON_Visualization_Tutorial.html) tutorial for more advanced visualization features.

## In this tutorial

The tutorial has several objectives: 
1. Increase familiarity with `Xarray` and `pandas`.
2. Increase knowledge of Python packages and their utilities.
3. Compare results from original code with the modified code for a NEON tower.


***
**This tutorial uses a Jupyter Notebook.** A Jupyter Notebook is an interactive computing environment that enables the creation and sharing of documents that contain discrete cells of text or documentation and executable code, including plots. It allows users to access, run, and edit the code in a browser. To interact with this notebook:

- Execute or "run" cells of executable code (cells denoted with '[ ]:') using the play button in the menu at the top (see below image)

- The results of running code, such as plots, will appear below that cell

- Each step must be followed in order, otherwise you may encounter errors

![run cell](https://problemsolvingwithpython.com/02-Jupyter-Notebooks/images/run_cell.png)

For more information on Jupyter notebooks please see the [Jupyter Notebook Quick Start Guide](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html). 


<div class="alert alert-block alert-info">
<b>NOTE:</b> In Day 2c, executable code blocks used a Bash shell, or had to be executed on the command-line.  In this tutorial, we will be using Python code, and you should directly execute the contents of code blocks by running individual cells in this Jupyter notebook, similar to the Day 0b <i>Run NEON</i> tutorial.
</div>

***

## 1. Load our Python packages

Here we import the Python packages and libraries used in this tutorial:

In [1]:
#Import Libraries
%matplotlib inline

import os
import sys
import time
import datetime

import numpy as np
import pandas as pd
import xarray as xr

from glob import glob
from os.path import join, expanduser

import matplotlib
import matplotlib.pyplot as plt

from scipy import stats

In [2]:
neon_site = "KONZ"
year = "2018"

## 2. Load and explore CTSM data:

### 2.1 Load original CTSM results:
Here, we want to read and analyze the results from the original (unmodified) CTSM code.
First, let's list all our CTSM files:

In [None]:
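#-- expanduser replaces "~" with the home directory; glob does not expand "~" on its own
sim_path = expanduser("~/archive_original/"+neon_site+".transient/lnd/hist/")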
sim_files = sorted(glob(join(sim_path,neon_site+".transient.clm2.h1."+year+"*.nc")))

print("All Simulation files: [", len(sim_files), "files]")
print(*sim_files,sep='\n')
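
A detail that is easy to miss when listing files: `glob` treats `~` as a literal directory name, so a pattern that starts with `~` usually matches nothing. The sketch below shows the expansion step on its own. The `*.nc` pattern is only an illustration and the resulting list may be empty on your system.

```python
from glob import glob
from os.path import expanduser, join

raw_path = "~/archive_original/"      # "~" is shorthand for the home directory
full_path = expanduser(raw_path)      # expand it before handing the path to glob

# Build the search pattern from the expanded path and collect matches.
pattern = join(full_path, "*.nc")
matches = sorted(glob(pattern))

print("Expanded path:", full_path)
print("Number of matching files:", len(matches))
```

If the count printed here is zero for a directory you expect to contain files, check the path before moving on, because `open_mfdataset` cannot open an empty file list.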

Next, let's read the CTSM history files into memory. For this purpose, we use the `open_mfdataset` function, which opens multiple netCDF files at once.

In [None]:
start = time.time()
ds_ctsm_orig = xr.open_mfdataset(sim_files, decode_times=True, combine='by_coords',parallel=True)
end = time.time()
print("Reading original simulation files took:", end-start, "s.")
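
To see what `combine='by_coords'` does without reading any files, the sketch below builds two small in-memory datasets that cover consecutive time ranges and merges them with `xr.combine_by_coords`, the function `open_mfdataset` uses for this option. The dates and GPP values are made up for illustration and are not CTSM output.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two synthetic "monthly files", each holding a few daily values.
t_jan = pd.date_range("2018-01-01", periods=3, freq="D")
t_feb = pd.date_range("2018-02-01", periods=3, freq="D")

ds_jan = xr.Dataset({"GPP": ("time", np.array([0.1, 0.2, 0.3]))}, coords={"time": t_jan})
ds_feb = xr.Dataset({"GPP": ("time", np.array([0.4, 0.5, 0.6]))}, coords={"time": t_feb})

# The pieces are passed out of order on purpose:
# the time coordinate, not the list order, decides how they are joined.
ds_combined = xr.combine_by_coords([ds_feb, ds_jan])

print(ds_combined.sizes["time"])
print(ds_combined.time.values[0])
```

Because the ordering comes from the coordinate values, the monthly history files do not need to be opened in any particular order, although sorting the file list keeps the printed listing readable.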

This step looks at the dataset that was just created from the simulation data. This step is not required, but will allow you to explore the python dataset and become familiar with the data.

Run the cell below to find more information about the data:

In [4]:
ds_ctsm_orig

In the output, you can click on Dimensions, Coordinates, Data Variables, and Attributes to expand and see the details and metadata associated with this dataset.

If you click on Data Variables, you will see a list of all the available variables. You can click on the ‘note’ icon at the right end of the line for each variable to see a description of the variable (the long_name) and its units, as well as other information.

**Questions to consider**

1. What variables are available in the dataset?

2. What is the long_name and unit of the variable FSH?

3. Can you find the dimensions of this variable?
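
The same information can also be read programmatically instead of by clicking through the dataset display. The sketch below uses a small stand-in dataset so that it runs on its own. The attribute values, the `lndgrid` dimension name, and the name `ds_demo` are placeholders for illustration, not the real metadata of FSH.

```python
import numpy as np
import xarray as xr

# A stand-in variable with placeholder metadata.
fsh = xr.DataArray(
    np.zeros((4, 1)),
    dims=("time", "lndgrid"),
    name="FSH",
    attrs={"long_name": "example long name", "units": "example units"},
)
ds_demo = xr.Dataset({"FSH": fsh})

print(list(ds_demo.data_vars))             # names of all data variables
print(ds_demo["FSH"].attrs["long_name"])   # description of the variable
print(ds_demo["FSH"].attrs["units"])       # units of the variable
print(ds_demo["FSH"].dims)                 # dimension names
print(ds_demo["FSH"].shape)                # length along each dimension
```

Replacing `ds_demo` with the dataset loaded earlier in this notebook should answer the three questions above directly.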


<div class="alert alert-block alert-info">

<b>💡 Tip: </b>  Xarray has built-in plotting functions. For quick inspection of a variable, we can use .plot() to see it. Xarray plotting functionality is a thin wrapper around the popular `matplotlib` library. For more advanced plots, we use `matplotlib` directly.

</div>

Let's quickly inspect GPP from the original simulation.

<div class="alert alert-block alert-info">

<b>INFO:</b>  Gross Primary Production (GPP) is the total amount of CO2 that is fixed by plants through photosynthesis.

</div>

The code below will make a basic plot of the Gross Primary Production (GPP) variable:

In [None]:
ds_ctsm_orig.GPP.plot()

### 2.2 Load modified CTSM results:

Now, we load the results from the modified code:

In [None]:
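#-- expanduser replaces "~" with the home directory; glob does not expand "~" on its own
sim_path_mod = expanduser("~/archive/"+neon_site+".transient/lnd/hist/")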
sim_files_mod = sorted(glob(join(sim_path_mod,neon_site+".transient.clm2.h1."+year+"*.nc")))

start = time.time()
ds_ctsm_mod = xr.open_mfdataset(sim_files_mod, decode_times=True, combine='by_coords',parallel=True)
end = time.time()
print("Reading modified simulation files took:", end-start, "s.")

Now, let's inspect GPP from the modified simulation:

In [None]:
ds_ctsm_mod.GPP.plot()

**Question**: Can you notice how the two simulations differ?
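
Two separate plots can make small differences hard to spot. One option is to subtract one simulation from the other and summarize the result. The sketch below does this with placeholder values standing in for GPP from the two runs. The names `gpp_orig` and `gpp_mod` are illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2018-06-01", periods=4, freq="D")

# Placeholder values, not model output.
gpp_orig = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims="time", coords={"time": time}, name="GPP")
gpp_mod = xr.DataArray([1.5, 2.5, 3.5, 4.0], dims="time", coords={"time": time}, name="GPP")

# xarray aligns both arrays on the shared time coordinate before subtracting.
gpp_diff = gpp_mod - gpp_orig

mean_diff = float(gpp_diff.mean())
max_abs_diff = float(np.abs(gpp_diff).max())
print("Mean difference:", mean_diff)
print("Largest absolute difference:", max_abs_diff)
```

With the tutorial datasets, the same pattern would be `ds_ctsm_mod.GPP - ds_ctsm_orig.GPP`, and calling `.plot()` on the result shows when during the year the two runs diverge.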


______________________________________________________________

## 3. Load evaluation (NEON) data
Next, let's download the evaluation files from the NEON server so we can compare them with the simulations:

In [None]:
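#-- expand "~" so that glob can find the downloaded files later
eval_dir = expanduser("~/evaluation_files/")
#-- NOTE: download_eval_files is a tutorial helper that is not defined or imported
#-- in this notebook; make it available before running this cell
download_eval_files(neon_site, eval_dir)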

Now, let's read these downloaded evaluation files from NEON:

In [None]:
eval_path = os.path.join(eval_dir,neon_site)
eval_files = sorted(glob(join(eval_path,neon_site+"_eval_"+year+"*.nc")))

start = time.time()
ds_eval = xr.open_mfdataset(eval_files, decode_times=True, combine='by_coords')
end = time.time()
print("Reading all observation files took:", end-start, "s.")

Let's inspect the evaluation files from NEON: 

In [None]:
ds_eval.GPP.plot()

**Question**: Can you tell which of the simulations is closest to the NEON evaluation data?
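
Judging closeness by eye can be difficult, so a simple summary statistic helps. The sketch below computes the mean bias and the root-mean-square error (RMSE) of a simulated series against observations, skipping time steps where either value is missing. The helper `bias_and_rmse` and all numbers are illustrative assumptions, not part of the tutorial code.

```python
import numpy as np

def bias_and_rmse(sim, obs):
    """Return (mean bias, RMSE) of sim relative to obs, ignoring NaN pairs."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = ~np.isnan(sim) & ~np.isnan(obs)   # keep only complete pairs
    err = sim[valid] - obs[valid]
    return err.mean(), np.sqrt((err ** 2).mean())

# Placeholder data: observations with one gap and two candidate simulations.
obs = np.array([1.0, 2.0, np.nan, 4.0])
sim_a = np.array([1.0, 3.0, 3.0, 5.0])
sim_b = np.array([1.5, 2.0, 3.0, 4.0])

bias_a, rmse_a = bias_and_rmse(sim_a, obs)
bias_b, rmse_b = bias_and_rmse(sim_b, obs)
print("Simulation A: bias", bias_a, "RMSE", rmse_a)
print("Simulation B: bias", bias_b, "RMSE", rmse_b)
```

A lower RMSE means the simulation tracks the observations more closely on average. The bias shows whether it tends to sit above or below them.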

___________________________

Let's create a fancier plot that shows all three time series on top of each other:

In [None]:
#Convert NEON data to a Pandas Dataframe for easier handling:
eval_vars = ['GPP','NEE']

df_all = pd.DataFrame({'time':ds_eval.time})

for var in eval_vars:
    field = np.ravel ( ds_eval[var])     
    df_all[var]=field
    

#Convert CTSM data to a Pandas Dataframe for easier handling:
ctsm_vars = ['GPP','NEE']
df_ctsm = pd.DataFrame({'time':ds_ctsm_orig.time})

for var in ctsm_vars:
    sim_var_name = "sim_"+var
    field = np.ravel ( ds_ctsm_orig[var])     
    df_ctsm[sim_var_name]=field
    
    sim_var_name = "sim_"+var+"_mod"
    field = np.ravel ( ds_ctsm_mod[var]) 
    df_ctsm[sim_var_name] = field


#-- add simulation data to df_all and adjust for offset time dimension:
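for var in ctsm_vars:
    #-- handle the original and the modified simulation columns the same way
    for sim_var_name in ["sim_"+var, "sim_"+var+"_mod"]:
        #-- shift simulation data by one time step
        df_all[sim_var_name] = df_ctsm[sim_var_name].shift(-1).values
        #-- convert NEE from a per-second to a per-day rate
        if var == 'NEE':
            df_all[sim_var_name] = df_all[sim_var_name]*60*60*24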
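
The two adjustments in the last loop, the one-step shift and the multiplication by `60*60*24`, are easier to follow on a tiny example. The sketch below applies both to a three-row DataFrame with placeholder values. The column names are illustrative.

```python
import pandas as pd

# Placeholder flux values expressed per second.
df_demo = pd.DataFrame({"sim_NEE": [1.0e-5, 2.0e-5, 3.0e-5]})

# shift(-1) moves every value one row earlier; the last row becomes NaN.
shifted = df_demo["sim_NEE"].shift(-1)

# 60*60*24 = 86400 seconds per day turns a per-second rate into a per-day rate.
seconds_per_day = 60 * 60 * 24
df_demo["sim_NEE_per_day"] = shifted * seconds_per_day

print(df_demo)
```

Note that the shift leaves a missing value in the final row, so the last time step has no simulated value to compare against.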
