# CLIM 614 -- Homework #0
This homework assignment consists of three parts, each meant to give you familiarity and experience with basic concepts needed for this class:

* __A. Scale analysis__
    * How the magnitudes of various terms contributing to the time series of a climate state determine which are important.
* __B. Prognostic equations__
    * Visualizing the differences between state variables, fluxes and parameters.
* __C. Basic budgets__
    * Assessing closure of the terms in a budget and how to use the residual of a budget to estimate missing terms.

In [None]:
# Import useful software packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.integrate import odeint

rng = np.random.default_rng()

# Set some universal parameters
seconds_per_day = 86400
days_per_year = 365

## A. Scale analysis
In this exercise, you will construct idealized representations of a temperature time series to understand how the magnitudes of the various terms determine which variations are important (and which are not).


In [None]:
#########################################################
#########################################################
### Parameters for this exercise:
timestep = 3600          # in seconds
total_years = 1          # Number of years to simulate 

diurnal_magnitude = 1   # Try 1 and 10, at least
annual_magnitude = 10    # Try 1 and 10, at least
random_magnitude = 1    # Try 1 and 10, at least
#########################################################
#########################################################

# Construct a synthetic time series with multiple components.
total_steps = int(total_years * days_per_year * seconds_per_day / timestep)
time_series = np.full(total_steps,np.nan) # Set up an empty time series to fill below

for t in range(total_steps):  # Loop through each time step
    #if t % (seconds_per_day // timestep) == 0: print(".",end="")  # Uncomment this line to print one dot per simulated day
    diurnal = diurnal_magnitude * np.sin(t*timestep/seconds_per_day*np.pi*2)               # Add a diurnal cycle
    annual = annual_magnitude * np.sin(t*timestep/(days_per_year*seconds_per_day)*np.pi*2) # Add a seasonal cycle
    noise = random_magnitude * rng.standard_normal()                                   # Add Gaussian noise
    time_series[t] = diurnal + annual + noise                     # Time series is a combination of the three 
    
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# Plot the results
fig = plt.figure(figsize = (21, 6))
spec = fig.add_gridspec(ncols=2, nrows=1, width_ratios=[1,4])
plt.subplots_adjust(wspace=0.07)
ax1 = fig.add_subplot(spec[0,0])
ax2 = fig.add_subplot(spec[0,1])

# First panel is zoomed in on a 240-timestep period (10 days if timestep is hourly)
ax1.plot(time_series[:240])
ax1.plot(pd.Series(time_series).rolling(24,center=True).mean()[:240])  # 24-step centered running mean (1 day if hourly)
ax1.set_xlabel("Time Step [hours]")

# Second panel is the entire time series
ax2.plot(time_series)
ax2.plot(pd.Series(time_series).rolling(24,center=True).mean())
ax2.set_xlabel("Time Step [hours]")
ax2.legend(["Time series","Running Mean"],loc=2)

### Time to tinker

Make a copy of the cell above where you can experiment: 
1. Try changing the magnitudes for diurnal, annual and random (8 possible combinations if you stick to 1 and 10, but feel free to try other values as well).
2. Try setting `total_years` to 1/12, 1, and 10. Here you are changing the length of the record shown in the right-hand panel; the left-hand panel always shows the first 240 time steps.
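With `total_years = 10` and an hourly timestep, the loop in the cell above runs 87,600 times. A minimal vectorized sketch of the same construction is shown below, assuming the same parameter names as that cell; `make_time_series` is a hypothetical helper, not part of the original notebook. It builds the whole series at once with NumPy array arithmetic and should be statistically equivalent to the loop.

```python
import numpy as np

seconds_per_day = 86400
days_per_year = 365

def make_time_series(diurnal_magnitude, annual_magnitude, random_magnitude,
                     timestep=3600, total_years=1, rng=None):
    """Hypothetical helper: diurnal + annual sinusoids plus Gaussian noise, vectorized."""
    rng = np.random.default_rng() if rng is None else rng
    total_steps = int(total_years * days_per_year * seconds_per_day / timestep)
    seconds = np.arange(total_steps) * timestep          # elapsed time of each step [s]
    diurnal = diurnal_magnitude * np.sin(2 * np.pi * seconds / seconds_per_day)
    annual = annual_magnitude * np.sin(2 * np.pi * seconds / (days_per_year * seconds_per_day))
    noise = random_magnitude * rng.standard_normal(total_steps)
    return diurnal + annual + noise

series = make_time_series(1, 10, 1, total_years=10)
print(series.shape)  # (87600,)
```

Replacing the explicit loop with whole-array operations avoids one Python-level iteration per time step, which matters most for the 10-year runs.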

### Answer some questions
In the cell below, record your observations. Be sure to comment on:
1. How do the relative magnitudes of `diurnal`, `annual` and `random` affect the appearance of the time series?
2. When the `random` magnitude is large, how does it affect the running mean? The ability to perceive the other cycles?
3. Consider `random` as *noise* and the other two cycles as *signal*. What appears to be the threshold of *signal:noise* below which *signal* becomes hard to detect?
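The running mean in both panels uses a centered 24-step window, which spans exactly one diurnal period when the timestep is hourly. A small check, assuming hourly steps and a pure diurnal sinusoid with no noise, illustrates why such a window suppresses the diurnal cycle: the average of a sinusoid over one full period is zero.

```python
import numpy as np
import pandas as pd

hours = np.arange(240)                                   # 10 days of hourly steps
diurnal = 10 * np.sin(2 * np.pi * hours / 24)            # pure diurnal cycle, amplitude 10
running = pd.Series(diurnal).rolling(24, center=True).mean()

# .max() skips the NaNs at the edges, where the 24-step window is incomplete
print(running.abs().max())   # tiny (floating-point round-off), effectively zero
```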

--------------

## B. Prognostic equations
In this exercise, we will use the classic three-variable Lorenz system as the
basis to explore how states, fluxes and parameters work when parameterizing aspects
of the climate system, including the land-atmosphere part of the system:

$$\frac{dx}{dt} = \sigma (y - x)$$
$$\frac{dy}{dt} = x (\rho - z) - y$$
$$\frac{dz}{dt} = xy - \beta z$$

In this system we have three state variables: $x$, $y$, and $z$; and three parameters: $\sigma$, $\rho$, and $\beta$.

The first equation has the form of a linear conduction equation: the difference $(y - x)$ is *like a gradient*, and the parameter $\sigma$ sets how strongly that gradient drives the rate of change of $x$, ${dx}/{dt}$. The term $\sigma (y - x)$ therefore behaves like a flux.

The other two equations are nonlinear (i.e., they include products of state variables), and are thus more complicated.
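Each equation above is *prognostic*: it gives the tendency of a state variable, and stepping the state forward in time means adding tendency × timestep to the current state. A minimal sketch using a forward-Euler step is shown below; this is a simpler (and less accurate) scheme than the `odeint` integrator used in the notebook, and `lorenz_tendency` and `euler_step` are hypothetical helper names.

```python
import numpy as np

def lorenz_tendency(state, sigma=10.0, rho=28.0, beta=8/3):
    """Tendencies dx/dt, dy/dt, dz/dt of the Lorenz system (hypothetical helper)."""
    x, y, z = state
    return np.array([sigma * (y - x),        # flux-like term driven by the "gradient" (y - x)
                     x * (rho - z) - y,
                     x * y - beta * z])

def euler_step(state, dt, **params):
    """Hypothetical helper: new state = old state + tendency * dt (forward Euler)."""
    return state + dt * lorenz_tendency(state, **params)

state = np.array([2.0, 3.0, 1.0])      # same initial condition as the notebook's default
new_state = euler_step(state, 0.01)
print(new_state)                       # approximately x=2.1, y=3.51, z=1.0333
```

The parameters $\sigma$, $\rho$ and $\beta$ stay fixed while the states change; only the states are updated at each step.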

We plot three views of the Lorenz system's evolution over time, one for each pairing of variables ([$x,y$], [$x,z$] and [$y,z$]). Each dot marks the state at one time step, projected onto the two variables on that panel's axes.

In [None]:
# The Lorenz system model. 
# The classical values are: σ = 10, ρ = 28, β = 8/3
#########################################################
#########################################################
### Parameters for this exercise:
sigma = 10  
rho = 28
beta = 8/3
#########################################################
### Initial conditions:
state_0 = [2.0, 3.0, 1.0]    # [x0,y0,z0]
#state_0 = [8.48525, 8.48525, 27.0]    # [x0,y0,z0]
#########################################################
#########################################################

def lorenz(state, t): # Function to calculate the Lorenz model
    x, y, z = state  # Unpack the state vector (3 states)
    return sigma * (y - x), x * (rho - z) - y, x * y - beta * z  # The 3 derivatives

t = np.arange(0.0, 40.0, 0.01)           # Time steps
states = odeint(lorenz, state_0, t)      # Integrate the system of ordinary differential equations

#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# Plot the results, colored by time from the initial state (red to green to blue and purple)
# Three views are given - one for each pairing of dimensions ([x,y], [x,z] and [y,z])
fig, (ax1,ax2,ax3) = plt.subplots(ncols=3,figsize=(21, 6),gridspec_kw={'width_ratios': [5,5,6]})
im1 = ax1.scatter(states[:,0],states[:,1],c=t,cmap='gist_rainbow',s=1)
im2 = ax2.scatter(states[:,0],states[:,2],c=t,cmap='gist_rainbow',s=1)
im3 = ax3.scatter(states[:,1],states[:,2],c=t,cmap='gist_rainbow',s=1)
# Mark the starting point
ax1.plot(states[0,0],states[0,1],'kv')
ax2.plot(states[0,0],states[0,2],'kv')
ax3.plot(states[0,1],states[0,2],'kv')

ax1.plot([-15,15],[-15,15],c='grey',ls='-.') # Diagonal line on the first panel, showing x=y
fig.colorbar(im3,ax=ax3)                    # Color bar to the right of the 3rd panel
ax1.set_xlabel("x",fontsize=14) ; ax1.set_ylabel("y",rotation='horizontal',fontsize=14)
ax2.set_xlabel("x",fontsize=14) ; ax2.set_ylabel("z",rotation='horizontal',fontsize=14)
ax3.set_xlabel("y",fontsize=14) ; ax3.set_ylabel("z",rotation='horizontal',fontsize=14)
fig.suptitle(f"Lorenz system with σ={sigma:.4g}, ρ={rho:.4g}, β={beta:.4g}",fontsize=16) ;

### Time to tinker

Make a copy of the cell above where you can experiment:
1. Try changing the initial conditions and see what happens (compare to above).
2. Try changing the parameters, one at a time, and see what happens (compare to above).

### Answer some questions
In the cell below, write your answers to these questions.
1. Does changing the initial state affect the result? In what way? And what about the plots seems not to change, regardless of how you change the initial state?
2. The "butterfly" pattern that the Lorenz equations make consists of dots *orbiting* around two points. These two points lie at the same value of $z$. What is that value, and why?
3. Think of ${dx}/{dt}$ as a velocity, indicating the distance between successive points in the $x$ direction. What is the effect in the first equation of the term $\sigma (y - x)$ being positive versus negative, and when does each happen? A diagonal line along $x=y$ is plotted in the first panel as a visual guide.


----------------

## C. Basic Budgets

In this exercise, we will use observations from a FLUXNET site, where near-surface meteorology, surface radiation, 
heat and moisture fluxes, soil temperature and water content are measured and recorded every 30 minutes.

**Instruments are not perfect** - they have both random and systematic errors, limited precision, 
calibration difficulties, and problems with drift over time. 
Furthermore, some quantities such as surface latent and sensible heat fluxes are not measured directly,
but are inferred from other measurements based on theoretical relationships and assumptions.
As a result, measurements of the components of a budget that we know should balance exactly, such as the energy or water budget, 
do not always sum to zero.

**C.1**: In the plots below, the terms are abbreviated:
* `SW↓ ` = downward shortwave (solar) radiation at the surface
* `LW↓ ` = downward longwave (thermal) radiation at the surface
* `SW↑ ` = upward (reflected) shortwave radiation from the surface
* `LW↑ ` = upward (emitted) longwave radiation from the surface
* `LHF↑` = latent heat flux (via evapotranspiration) from the surface
* `SHF↑` = sensible heat flux from the surface 
* `GHF↓` = heat flux into the ground from the surface
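Written out with these abbreviations, the residual that the cells below compute as `energy_imbalance` is the net radiation minus the turbulent and ground heat fluxes:

$$\text{Imbalance} = \underbrace{(\mathrm{SW}_{\downarrow} - \mathrm{SW}_{\uparrow}) + (\mathrm{LW}_{\downarrow} - \mathrm{LW}_{\uparrow})}_{\text{net radiation}} - \mathrm{LHF}_{\uparrow} - \mathrm{SHF}_{\uparrow} - \mathrm{GHF}_{\downarrow}$$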

In [None]:
# Open and prepare the data set
df = pd.read_csv("FLX_DE-Lnf_FLUXNET2015_subset_HH_2011.csv")  # Location: Leinefelde, Germany
df['TIMESTAMP_START'] = pd.to_datetime(df['TIMESTAMP_START'],format="%Y%m%d%H%M") # Timestamp in datetime format
df = df.set_index(['TIMESTAMP_START']) # Set the time as the index for the dataframe
df = df.replace(-9999,np.nan)          # Set cells with the missing value -9999 to NaN instead (np.NaN was removed in NumPy 2.0)

In [None]:
# This cell generates a 1-day plot of measurements from instruments recording terms of the surface energy balance
one_day = df.loc['2011-07-01']  # Choose a single day to display

# The energy imbalance is calculated as the residual of all the surface energy budget terms:
energy_imbalance = (one_day['SW_IN_F_MDS'] + one_day['LW_IN_F_MDS'] 
                    - one_day['SW_OUT'] - one_day['LW_OUT']
                    - one_day['LE_F_MDS'] - one_day['H_F_MDS'] - one_day['G_F_MDS'])

bias_imbalance = energy_imbalance.mean()   # Mean of the imbalance
rms_imbalance = ((energy_imbalance**2).mean())**0.5   # Root mean square error (assuming 0 imbalance is correct)

fig = plt.figure(figsize = (15, 6))
plt.plot(one_day[['SW_IN_F_MDS','LW_IN_F_MDS','SW_OUT','LW_OUT','LE_F_MDS','H_F_MDS','G_F_MDS']])
plt.bar(energy_imbalance.index,energy_imbalance,width=0.004,edgecolor='k',facecolor='grey')
plt.legend(["SW↓","LW↓","SW↑","LW↑","LHF↑","SHF↑","GHF↓","Imbalance"])
plt.xticks(rotation=30)
plt.ylabel("$W/m^2$",fontsize=14)
plt.title(f"Uncorrected fluxes -- Imbalance bias: {bias_imbalance:.0f} $W/m^2$, RMSE: {rms_imbalance:.0f} $W/m^2$",fontsize=16) ;

The cell above plots the data as measured. The cell below plots "corrected" values of latent and sensible heat flux. The corrections were applied as part of the data processing to reduce biases in the energy budget, under the assumption that sensible and latent heat fluxes are biased by the same fraction, so that their ratio `SHF↑/LHF↑`, known as the *Bowen ratio*, is measured correctly. See Section 3 of the [FLUXNET2015 data processing documentation](https://fluxnet.org/data/fluxnet2015-dataset/data-processing/) for a detailed description of the correction procedure.
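The idea behind a Bowen-ratio-preserving correction can be sketched in a few lines. This is a simplified illustration, not the FLUXNET2015 procedure: it assumes a single correction factor for the whole period, computed as the available energy (net radiation minus ground heat flux) divided by the measured turbulent fluxes, and it uses made-up flux values. `bowen_ratio_correction` is a hypothetical helper. Because `SHF↑` and `LHF↑` are multiplied by the same factor, their ratio (the Bowen ratio) is unchanged while the budget closes.

```python
import numpy as np

def bowen_ratio_correction(net_radiation, ghf, lhf, shf):
    """Hypothetical helper: scale LHF and SHF by one factor so that, summed over the
    period, LHF + SHF = net radiation - GHF, while keeping SHF/LHF fixed."""
    factor = (np.sum(net_radiation) - np.sum(ghf)) / (np.sum(lhf) + np.sum(shf))
    return factor * lhf, factor * shf, factor

# Made-up half-hourly values in W/m^2, for illustration only
net_radiation = np.array([400.0, 500.0, 300.0])
ghf = np.array([40.0, 50.0, 30.0])
lhf = np.array([200.0, 250.0, 150.0])
shf = np.array([100.0, 125.0, 75.0])

lhf_corr, shf_corr, factor = bowen_ratio_correction(net_radiation, ghf, lhf, shf)
print(factor)                # 1.2
print(shf_corr / lhf_corr)   # Bowen ratio unchanged: 0.5 at every step
```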

In [None]:
# Repeat for same day with the "energy balance corrected" surface heat flux terms
energy_imbalance = (one_day['SW_IN_F_MDS'] + one_day['LW_IN_F_MDS'] 
                    - one_day['SW_OUT'] - one_day['LW_OUT']
                    - one_day['LE_CORR'] - one_day['H_CORR'] - one_day['G_F_MDS'])

bias_imbalance = energy_imbalance.mean()   # Mean of the imbalance
rms_imbalance = ((energy_imbalance**2).mean())**0.5   # Root mean square error (assuming 0 imbalance is correct)

fig = plt.figure(figsize = (15, 6))
plt.plot(one_day[['SW_IN_F_MDS','LW_IN_F_MDS','SW_OUT','LW_OUT','LE_CORR','H_CORR','G_F_MDS']])
plt.bar(energy_imbalance.index,energy_imbalance,width=0.004,edgecolor='k',facecolor='grey')
plt.legend(["SW↓","LW↓","SW↑","LW↑","LHF corr","SHF corr","GHF↓","Imbalance"])
plt.xticks(rotation=30)
plt.ylabel("$W/m^2$",fontsize=14)
plt.title(f"Bowen ratio corrected fluxes -- Imbalance bias: {bias_imbalance:.0f} $W/m^2$, RMSE: {rms_imbalance:.0f} $W/m^2$",fontsize=16) ;

### Answer some questions
In the cell below, write your answers to these questions.
1. Which term has the highest daily mean magnitude? The greatest peak value? Does this agree with your intuition?
2. Based on the equation for `energy_imbalance` in the code above, what's the meaning of a <u>positive</u> vs <u>negative</u> value of *imbalance*?
3. Does the "correction" improve the bias? What about the root-mean-square-error (RMSE)? Try putting in some other dates (the setting for the variable `one_day`) and comparing to see if the results are consistent; report on what you find.
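Bias, RMSE and standard deviation measure different things. If $e_i$ is the imbalance at each time step, the bias is its mean, and its root-mean-square error about zero satisfies $\mathrm{RMSE}^2 = \mathrm{bias}^2 + \sigma^2$, where $\sigma$ is the standard deviation computed with the population ($N$, not $N-1$) normalization. Note that pandas' `Series.std()` uses the $N-1$ normalization by default. A quick numerical check with made-up imbalance values:

```python
import numpy as np
import pandas as pd

imbalance = pd.Series([30.0, -10.0, 50.0, 10.0])   # made-up values in W/m^2

bias = imbalance.mean()                            # 20.0
rmse = np.sqrt((imbalance ** 2).mean())            # RMS about zero: 30.0
std_pop = imbalance.std(ddof=0)                    # spread about the mean

print(bias, rmse, std_pop)
print(np.isclose(rmse ** 2, bias ** 2 + std_pop ** 2))   # True
```

A correction that mainly removes a constant offset will shrink the bias term, but leaves the spread $\sigma$ (and hence part of the RMSE) untouched.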

***
**C.2**: Below, the energy budget analysis is extended to a full month.
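With a month of half-hourly data, it can help to collapse each day to a single number when looking for day-to-day behavior. A minimal sketch using pandas `resample` is shown below on a synthetic half-hourly series with invented values; with the real data, the same call could be applied to `energy_imbalance` or to any of the flux columns.

```python
import numpy as np
import pandas as pd

# Synthetic half-hourly series for 3 days (invented values, W/m^2)
index = pd.date_range("2011-06-01", periods=3 * 48, freq="30min")
values = np.repeat([10.0, 20.0, 30.0], 48)       # constant within each day
imbalance = pd.Series(values, index=index)

daily_mean = imbalance.resample("D").mean()      # one value per calendar day
print(daily_mean)                                # 10.0, 20.0, 30.0 for the three days
```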

In [None]:
# This cell generates a 1-month plot of measurements from instruments recording terms of the surface energy balance
one_month = df.loc['2011-06']  # Choose a month to display
energy_imbalance = (one_month['SW_IN_F_MDS'] + one_month['LW_IN_F_MDS'] 
                    - one_month['SW_OUT'] - one_month['LW_OUT']
                    - one_month['LE_F_MDS'] - one_month['H_F_MDS'] - one_month['G_F_MDS'])

bias_imbalance = energy_imbalance.mean()   # Mean of the imbalance
rms_imbalance = ((energy_imbalance**2).mean())**0.5   # Root mean square error (assuming 0 imbalance is correct)

fig = plt.figure(figsize = (15, 6))
plt.plot(one_month[['SW_IN_F_MDS','LW_IN_F_MDS','SW_OUT','LW_OUT','LE_F_MDS','H_F_MDS','G_F_MDS']])
#plt.plot(energy_imbalance,"k:")
plt.bar(energy_imbalance.index,energy_imbalance,width=0.004,edgecolor='k',facecolor='grey')
plt.legend(["SW↓","LW↓","SW↑","LW↑","LHF↑","SHF↑","GHF↓","Imbalance"])
plt.xticks(rotation=30)
plt.ylabel("$W/m^2$",fontsize=14)
plt.title(f"Uncorrected fluxes -- Imbalance bias: {bias_imbalance:.0f} $W/m^2$, RMSE: {rms_imbalance:.0f} $W/m^2$",fontsize=16) ;

In [None]:
energy_imbalance = (one_month['SW_IN_F_MDS'] + one_month['LW_IN_F_MDS'] 
                    - one_month['SW_OUT'] - one_month['LW_OUT']
                    - one_month['LE_CORR'] - one_month['H_CORR'] - one_month['G_F_MDS'])

bias_imbalance = energy_imbalance.mean()   # Mean of the imbalance
rms_imbalance = ((energy_imbalance**2).mean())**0.5   # Root mean square error (assuming 0 imbalance is correct)

fig = plt.figure(figsize = (15, 6))
plt.plot(one_month[['SW_IN_F_MDS','LW_IN_F_MDS','SW_OUT','LW_OUT','LE_CORR','H_CORR','G_F_MDS']])
#plt.plot(energy_imbalance,"k:")
plt.bar(energy_imbalance.index,energy_imbalance,width=0.004,edgecolor='k',facecolor='grey')
plt.legend(["SW↓","LW↓","SW↑","LW↑","LHF corr","SHF corr","GHF↓","Imbalance"])
plt.xticks(rotation=30)
plt.ylabel("$W/m^2$")

plt.title(f"Bowen ratio corrected fluxes -- Imbalance bias: {bias_imbalance:.0f} $W/m^2$, RMSE: {rms_imbalance:.0f} $W/m^2$",fontsize=16) ;

### Answer some questions
In the cell below, write your answers to these questions.
1. Now the day-to-day variability in the surface energy budget can be seen. First, study only the "uncorrected" flux panel: there are some systematic relationships between variables ("on days when variable A does this, variable B usually does that"). Comment on what you see.
2. Now do the same for the imbalance - do you see any systematic behaviors, either within the diurnal cycle, or across different kinds of days? Explain what you see.
3. With a full month of data, you now have a fairly stable estimate of the bias and RMSE of the energy balance with uncorrected versus corrected heat fluxes. Now would you say the "correction" improves the bias? What about the root-mean-square error (RMSE)? Comment on what the correction appears to address well, and what it does not. How might this affect your use of the data?

***
**C.3**: Accumulated errors are an issue for any budget calculation. When terms are supposed to be in balance, even a very small error, if systematic, may grow to generate a significant bias.

Below, we calculate and display the cumulative sum of the imbalances over the one-month period.
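The factor `1800/1e6` in the cell below converts half-hourly imbalances into accumulated energy: each value in W/m² (J s⁻¹ m⁻²) is multiplied by the 1800 s in a half hour to give J/m², and dividing by 10⁶ expresses the result in MJ/m². A worked example, assuming a hypothetical constant imbalance of 10 W/m² over a 30-day month:

```python
import numpy as np

seconds_per_halfhour = 1800
steps = 30 * 48                                   # half-hours in a 30-day month
imbalance = np.full(steps, 10.0)                  # hypothetical constant 10 W/m^2 imbalance

accumulated = np.cumsum(imbalance) * seconds_per_halfhour / 1e6   # running total in MJ/m^2
print(accumulated[-1])                            # 25.92 MJ/m^2 by the end of the month
```

A bias that looks small at any single half hour still adds up steadily when it always has the same sign.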

In [None]:
uncorr_imbalance   = (one_month['SW_IN_F_MDS'] + one_month['LW_IN_F_MDS'] 
                    - one_month['SW_OUT'] - one_month['LW_OUT']
                    - one_month['LE_F_MDS'] - one_month['H_F_MDS'] - one_month['G_F_MDS'])
bowen_imbalance    = (one_month['SW_IN_F_MDS'] + one_month['LW_IN_F_MDS'] 
                    - one_month['SW_OUT'] - one_month['LW_OUT']
                    - one_month['LE_CORR'] - one_month['H_CORR'] - one_month['G_F_MDS'])

fig = plt.figure(figsize = (15, 6))
plt.step(uncorr_imbalance.index,uncorr_imbalance.cumsum()*1800/1e6)
plt.step(bowen_imbalance.index,bowen_imbalance.cumsum()*1800/1e6)
plt.legend(["Uncorrected","Bowen ratio corrected"])
plt.axhline(y=0,ls=":",c="grey")
plt.xticks(rotation=30)
plt.ylabel("$10^6 \\ J/m^2$",fontsize=14)
plt.title(f"Accumulated Imbalance Biases -- Uncorrected: {uncorr_imbalance.sum()*1800/1e6:.0f}$×10^6\\ J/m^2$, Corrected: {bowen_imbalance.sum()*1800/1e6:.0f}$×10^6\\ J/m^2$",fontsize=16);

### Answer some questions
In the cell below, write your answers to these questions.
1. If you were using the uncorrected flux data to drive a model that predicts land temperature, what would happen to the predicted temperature error over the course of the month? (Clue: your correct answer to question 2 in C.1 will help you answer this.)
2. Even though the Bowen ratio correction results in a very small bias by the end of the month, there is a *wander* into a large accumulated negative bias around the middle of June. Why might this happen? (Clue: see the description of "ECB_CF Method 1" in the [FLUXNET2015 data processing documentation](https://fluxnet.org/data/fluxnet2015-dataset/data-processing/).)