<img src="../ancillarydata/logos/LundUniversity_C2line_RGB.png" width="150" align="left"/>
<br>
<img src="../ancillarydata/logos/Icos_Logo_CMYK_Regular_SMpng.png" width="327" align="right"/>
<br>


<br>

# <font color=#B98F57>Exercises taken from a PhD course titled</font>
### <font color=#000083>From CO$_2$ in situ measurements to carbon balance maps as a tool to support national carbon accounting</font>

<br>

<meta content="text/html; charset=UTF-8">

<style>td{padding: 3px;}
</style>

<table style="width: 100%">
    <colgroup>
        <col span="1" style="width: 17%;">
        <col span="1" style="width: 83%;">
    </colgroup>
    <tr>
        <th><font size="2">Organized by:</font></th>
        <td>
            <div style="float:left;">
                <font size="2">
                    The PhD course was held by the dept. of Physical Geography and Ecosystem Science at Lund University and<br>supported by ICOS Carbon Portal, ICOS Sweden and Lund University ClimBEco Graduate Research School.
                </font>
            </div>
        </td>
    </tr>
    <tr>
        <th><font size="2">Course dates:</font></th>
        <td><div style="float:left;"><font size="2">March 9th 2020 - March 13th 2020<br>September 2nd 2024 - September 6th 2024</font></div></td>
    </tr>
    <tr>
        <th><font size="2">Location:</font></th>
        <td><div style="float:left;"><font size="2">Lund, Sweden</font></div></td>
    </tr>
    <tr>
        <th><font size="2">Exercise developed by:</font></th>
        <td><div style="float:left;"><font size="2">Karolina Pantazatou & Claudio D' Onofrio<br>Updated 2024-06-11 by Ida Storm</font></div></td>
    </tr>
    <tr>
        <th><font size="2">Data references:</font></th>
        <td><div style="float:left;"><font size="2">For more information regarding the datasets used in this exercise please click on the <a href="#data_ref">link</a>.</font></div></td>
    </tr>
</table>
</font>
<br>
<br>




<a id="intro"></a>
<br>
<br>


# <font color=#800000> Exercise 1</font>  
## _Intro to Jupyter Notebooks using ICOS/FLUXNET data_
<br> 

In this exercise, you will learn how to read data from CSV files into Pandas DataFrames (i.e., two-dimensional labeled tables) using Python. You will also learn how to process the data in a DataFrame and create static plots using Matplotlib. Pandas and Matplotlib are both Python libraries, and their functions let us read the data and generate the plots. You will then use these tools to statistically compare measurements from different stations in northern Europe. Follow the [link](https://www.kirschbaum.id.au/definitions.pdf) for a quick overview of how NPP, GPP and respiration are defined and related.

The exercise includes the following tasks:

- [Read data from csv into a Pandas DataFrame](#import_data)


- [Create plot with Matplotlib](#create_plot)
    
    - [Plot single variable](#plot_single_var)
    
    - [Plot two variables](#plot_two_var)


- [Format plot parameters](#plot_param)


- [Compare variables from different stations](#compare_var)


- [Calculate statistics](#calc_stat)



<a id="import_data"></a>

### <font color='#B22222'>Task 1</font> - Import data from csv 
Read the FLUXNET data for a specific ICOS station and save it to a Pandas DataFrame. A reference to the data at the Carbon Portal can be found at the [bottom of this page](#data_ref). For these exercises, data for a selection of stations have been placed in a "hidden" folder on the same server where this notebook runs. Because the folder is hidden, you cannot see the files in the folder structure, but you can still read them using the code below. The files contain many columns because many types of data are collected at the ecosystem sites; we will only use a few of them in these examples.<br> There are two ways to read the FLUXNET data: <br>
1. Read data from a FLUXNET csv file into a Pandas DataFrame, as in the example below. Here, we assign the DataFrames with daily and half-hourly data for Hyltemossa (Htm) to the variables dd_df and hh_df, respectively.

```python
import pandas as pd

dd_df = pd.read_csv("/data/project/climbeco/data/fluxnet/obs_dd_warm_winter/FLX_SE-Htm_FLUXNET2015_FULLSET_DD_2015_2020.csv",
                    header = 0,
                    sep = ",",
                    # Dates are originally in the format '20150101'.
                    # After this line, they are recognized as dates and appear like this: '2015-01-01'
                    parse_dates = ["TIMESTAMP"])

# use .head() to see the first five rows of data.
# display() is needed here because this is not the last line of the cell.
display(dd_df.head())

hh_df = pd.read_csv("/data/project/climbeco/data/fluxnet/obs_hh_warm_winter/FLX_SE-Htm_FLUXNET2015_FULLSET_HH_2015_2020.csv",
                    header = 0,
                    sep = ",",
                    # Dates are originally in the format '201501010000'.
                    # After this line, they are recognized as dates and appear like this: '2015-01-01 00:00:00'
                    parse_dates = ["TIMESTAMP_START", "TIMESTAMP_END"])

# use .head() to see the first five rows of data.
hh_df.head()
```


2. Use the pre-prepared functions (*dd* for daily values, *hh* for half-hourly values):
```python
#pre-prepared functions are defined in tools.ipynb (loaded with %run ./tools.ipynb)
#read_fluxnet_dd(path, station_code)

#example 
dd_df = read_fluxnet_dd(path_fluxnet_dd, 'Htm')

#see the first five rows. Use display() (or print()) unless it is the last line of the cell.
display(dd_df.head())

#pre-prepared function
#read_fluxnet_hh(path, station_code)

#example
hh_df = read_fluxnet_hh(path_fluxnet_hh, 'Htm')
```
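The `parse_dates` option in option 1 lets Pandas detect the date format on its own. If you want to see exactly how the `TIMESTAMP` strings become dates, or if automatic detection ever fails, the conversion can be done explicitly with `pd.to_datetime` and a format string. The sketch below uses a small made-up CSV snippet (read via `io.StringIO`) instead of the real file, so the values are illustrative only.

```python
import io
import pandas as pd

# A tiny, made-up extract in the FLUXNET daily layout (illustrative values only)
csv_text = """TIMESTAMP,TA_F,GPP_DT_VUT_MEAN
20150101,1.5,0.2
20150102,2.0,0.3
20150103,-0.5,0.1
"""

# Read the timestamps as plain strings so that nothing is guessed automatically
demo_df = pd.read_csv(io.StringIO(csv_text), dtype={"TIMESTAMP": str})

# Daily files use YYYYMMDD
demo_df["TIMESTAMP"] = pd.to_datetime(demo_df["TIMESTAMP"], format="%Y%m%d")

# Half-hourly TIMESTAMP_START / TIMESTAMP_END values use YYYYMMDDHHMM
hh_stamps = pd.to_datetime(pd.Series(["201501010000", "201501010030"]), format="%Y%m%d%H%M")

print(demo_df.dtypes)
print(demo_df.head())
print(hh_stamps)
```

Writing the format out costs one extra line but makes the expected layout of the raw timestamps visible to anyone reading the code.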


#### <font color='#8b0000'> Information to read the files </font>

**Paths** <br>
The path to the FLUXNET files (daily values) is: <font color='royalblue'> _"/data/project/climbeco/data/fluxnet/obs_dd_warm_winter/"_ </font>
<br>
The path to the FLUXNET files (half-hourly values) is: <font color='royalblue'> _"/data/project/climbeco/data/fluxnet/obs_hh_warm_winter/"_ </font>


**Data filename format** <br>
The filename format of a FLUXNET file (daily values) is: _"FLX_countryCode-stationCode_FLUXNET2015_FULLSET_DD_2015_2020.csv"_
<br>
(e.g. <font color='darkorange'> _FLX_<b>SE</b>-<b>Htm</b>__FLUXNET2015_FULLSET_DD_2015_2020.csv_</font>)
<br>
The filename format of a FLUXNET file (half-hourly values) is: _"FLX_countryCode-stationCode_FLUXNET2015_FULLSET_HH_2015_2020.csv"_
<br>
(e.g. <font color='darkorange'> _FLX_<b>SE</b>-<b>Htm</b>__FLUXNET2015_FULLSET_HH_2015_2020.csv_</font>)
<br>

To import data for a specific station, you need to know its three-character station code (e.g. "Htm" for Hyltemossa). A list of the available stations can be found [here](../ancillarydata/docs/climbeco_course_station_list.png).
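Because the filenames follow a fixed pattern, a file can be located from the station code alone, even without knowing the country code. The sketch below shows one way a helper like `read_fluxnet_dd` *could* work; it is **not** the implementation in tools.ipynb, and the names `find_fluxnet_file` and `read_fluxnet_daily_sketch` are hypothetical. It assumes the pattern `FLX_<countryCode>-<stationCode>_FLUXNET2015_FULLSET_<DD|HH>_<years>.csv` described above and uses a `glob` wildcard for the country code.

```python
import glob
import os
import pandas as pd

def find_fluxnet_file(path, station_code, resolution="DD"):
    """Return the path of the FLUXNET file for one station (hypothetical helper).

    The country code is matched with a wildcard, so only the
    three-character station code (e.g. 'Htm') is needed.
    """
    pattern = os.path.join(path, f"FLX_*-{station_code}_FLUXNET2015_FULLSET_{resolution}_*.csv")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No {resolution} file found for station '{station_code}' in {path}")
    return matches[0]

def read_fluxnet_daily_sketch(path, station_code):
    """Read daily FLUXNET values for one station into a DataFrame (hypothetical)."""
    filename = find_fluxnet_file(path, station_code, resolution="DD")
    df = pd.read_csv(filename, header=0, sep=",", dtype={"TIMESTAMP": str})
    # Daily timestamps are stored as YYYYMMDD
    df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], format="%Y%m%d")
    return df
```

With the path variable defined in the next cells, a call would look like `read_fluxnet_daily_sketch(path_fluxnet_dd, "Htm")`.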

In [None]:
%matplotlib inline

#Import modules (files containing Python definitions and statements):
import pandas as pd
from datetime import datetime
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sns

#from tools import read_fluxnet_dd, read_fluxnet_hh
# Import Python definitions and statements located in the notebook "tools.ipynb" (see folder structure)
%run ./tools.ipynb

In [None]:
#Define path to data (located in hidden folders on the same server as this notebook):
path_fluxnet_hh = "/data/project/climbeco/data/fluxnet/obs_hh_warm_winter/" #Path to directory storing half-hourly values
path_fluxnet_dd = "/data/project/climbeco/data/fluxnet/obs_dd_warm_winter/" #Path to directory storing daily values

In [None]:
#Write your own code here:


<br>
<br>
<div style="text-align: right"> 
    <a href="#intro">Back to top</a>
</div>

<a id="create_plot"></a>

<br>

### <font color='#B22222'>Task 2</font> - Create a plot using Matplotlib
Matplotlib is a visualization library for plotting in Python. This task has two parts, in which you will learn how to plot one or more variables stored as columns in a Pandas DataFrame.

Before you move on to creating the plots, take a look at the columns of your Pandas DataFrame. It can be hard to guess the content of a column from its name alone. A detailed description of the FLUXNET variables and their units can be found [here](https://fluxnet.fluxdata.org/data/fluxnet2015-dataset/fullset-data-product/). 
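With hundreds of columns, it helps to list or search the column names instead of scrolling. A minimal sketch, using a made-up empty DataFrame that only carries a few FLUXNET-style column names; with the real data you would use `dd_df` or `hh_df` instead of `demo_df`.

```python
import pandas as pd

# Hypothetical mini DataFrame with a few FLUXNET-style column names (no data)
demo_df = pd.DataFrame(columns=["TIMESTAMP", "TA_F", "TA_F_QC", "SW_IN_F",
                                "GPP_DT_VUT_MEAN", "GPP_NT_VUT_MEAN", "RECO_DT_VUT_MEAN"])

# All column names as a plain Python list
all_columns = list(demo_df.columns)

# Columns whose name starts with a given prefix, e.g. every GPP estimate
gpp_columns = [col for col in demo_df.columns if col.startswith("GPP")]

# Columns that contain a substring anywhere in the name, e.g. "VUT"
vut_columns = demo_df.filter(like="VUT").columns.tolist()

print(len(all_columns), "columns in total")
print(gpp_columns)
print(vut_columns)
```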

Note that some variables are only available in the dataset with daily values, whilst others might only be available in the dataset with half-hourly values. Also note that a variable available in both datasets (i.e. half-hourly & daily) might have a different unit in each. 
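To see how the two datasets relate, you can aggregate half-hourly values to daily values yourself with `resample`. The sketch below uses synthetic half-hourly air temperature (two days, 48 values per day); in the exercise you would use `hh_df` with its `TIMESTAMP_START` column. Averaging suits a variable like temperature; for fluxes, check the units on the FLUXNET variable page first, since the daily product may use a different unit.

```python
import numpy as np
import pandas as pd

# Synthetic half-hourly air temperature: 2.0 on day one, 4.0 on day two
timestamps = pd.date_range("2015-01-01 00:00", periods=96, freq="30min")
demo_hh = pd.DataFrame({"TIMESTAMP_START": timestamps,
                        "TA_F": np.r_[np.full(48, 2.0), np.full(48, 4.0)]})

# Index by time, then average all half-hours that fall within each calendar day
daily_ta = demo_hh.set_index("TIMESTAMP_START")["TA_F"].resample("D").mean()
print(daily_ta)
```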

<a id="plot_single_var"></a>


#### <font color='#B22222'>Task 2.1</font> - Create a plot using Matplotlib (1 variable)
In this part, you will plot a time series of one variable. The code in the example below plots air temperature (daily values) over time (2015-2020).

<br>

Here's the syntax to create a simple plot in Python:

```python
%matplotlib inline

#Import modules:
from matplotlib import pyplot as plt

#Create a plot (i.e. "figure") object and set the size of your plot:
fig = plt.figure(figsize=(10, 6))

#Plot Air Temperature (daily values):
plt.plot(dd_df.TIMESTAMP, dd_df.TA_F)

#Show plot:
plt.show()

```

<br>

`%matplotlib inline` is a command that tells your Jupyter notebook to display Matplotlib plots as static images. The command only needs to be run once, in the first code cell of the notebook. **dd_df** is a variable representing a Pandas DataFrame with FLUXNET daily values. **TIMESTAMP** and **TA_F** are the FLUXNET variable names for time and air temperature, respectively.  
If you execute the code above, it should generate the following output. 

<img src="../ancillarydata/images/exercise1/plot1_ex.png" width = 60%>

<br>

Try to create plots for other variables in your Pandas DataFrame.
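Pandas can also draw the same kind of line plot directly from a DataFrame; it calls Matplotlib under the hood and returns a Matplotlib `Axes` object that you can keep formatting. A sketch with a synthetic yearly temperature curve standing in for `dd_df` (illustrative values only):

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for dd_df: cold in January, warm in July
demo_dd = pd.DataFrame({
    "TIMESTAMP": pd.date_range("2015-01-01", periods=365, freq="D"),
    "TA_F": 8 + 10 * np.sin(np.linspace(-np.pi / 2, 3 * np.pi / 2, 365)),
})

# One call draws the line; the returned Axes can be formatted further
ax = demo_dd.plot(x="TIMESTAMP", y="TA_F", figsize=(10, 6), legend=False)
ax.set_ylabel("TA_F")
```

With the real data, `dd_df.plot(x="TIMESTAMP", y="TA_F")` produces essentially the same figure as the `plt.plot` example.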

<br>
<br>

In [None]:
#Write your own code here:


<br>
<br>
<div style="text-align: right"> 
    <a href="#intro">Back to top</a>
</div>
<br>
<br>

<a id="plot_two_var"></a>

#### <font color='#B22222'>Task 2.2</font> - Create a plot using Matplotlib (two variables)
Here's the syntax to create a simple plot of two variables in Python:

```python

#Import modules:
from matplotlib import pyplot as plt

#Create a plot (i.e. "figure") object and set the size of your plot:
fig = plt.figure(figsize=(10, 6))

#Plot values for Air Temperature (daily values):
plt.plot(dd_df.TIMESTAMP, dd_df.TA_F)

#Plot values for GPP (daytime partitioning method, daily values):
plt.plot(dd_df.TIMESTAMP, dd_df.GPP_DT_VUT_MEAN)

#Show plot:
plt.show()

```

<br>

**dd_df** is a variable representing a Pandas DataFrame with daily values for FLUXNET variables measured at the Hyltemossa ICOS station in Sweden. **TIMESTAMP**, **TA_F** and **GPP_DT_VUT_MEAN** are the FLUXNET variable names for time, air temperature and GPP (estimated with the daytime partitioning method), respectively.  
If you execute the code above, it should generate the following output. 

<img src="../ancillarydata/images/exercise1/plot2_ex.png" width = 60%>

<br>

Try to produce a plot for another combination of variables.
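If you want to try many combinations, it can be convenient to keep the column names in a list and plot them in a loop. A sketch with a synthetic stand-in for `dd_df` (illustrative values; the column names follow the FLUXNET convention):

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Synthetic stand-in for dd_df
demo_dd = pd.DataFrame({
    "TIMESTAMP": pd.date_range("2015-01-01", periods=365, freq="D"),
    "TA_F": np.linspace(0, 15, 365),
    "GPP_DT_VUT_MEAN": np.linspace(0, 8, 365),
    "RECO_DT_VUT_MEAN": np.linspace(0, 5, 365),
})

# Change this list to plot any combination of columns
variables = ["GPP_DT_VUT_MEAN", "RECO_DT_VUT_MEAN"]

fig, ax = plt.subplots(figsize=(10, 6))
for var in variables:
    # One line per variable, labelled with its column name
    ax.plot(demo_dd["TIMESTAMP"], demo_dd[var], label=var)
ax.legend()
```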

In [None]:
#Write your own code here:


<br>
<br>
<div style="text-align: right"> 
    <a href="#intro">Back to top</a>
</div>
<br>
<br>

<a id="plot_param"></a>

<br>

### <font color='#B22222'>Task 3</font> - Format plot parameters
A plot is not complete without a title, axis labels and a legend. The next example shows how to add these to your plot. You will also see how to change the style and/or color of a line, and how to add a secondary y-axis to the plot. More information on how to style your plot can be found [here](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.plot.html).

```python

#Add grid (use Seaborn's whitegrid style)
sns.set(style="whitegrid")

#Create a plot (i.e. "figure") object and set the size of your plot:
fig, ax = plt.subplots(figsize=(16, 6))

#Add plot title:
plt.title('AIR TEMPERATURE & GPP - FLUXNET TIMESERIES (HTM, SE)')

#Plot values for daily Air Temperature:
ax.plot(dd_df.TIMESTAMP, dd_df.TA_F,
        linestyle = '-.', linewidth = 0.6, color = 'darkorange',
        label = 'Air Temperature')

#Plot values for GPP (daytime partitioning method, daily values):
ax.plot(dd_df.TIMESTAMP, dd_df.GPP_DT_VUT_MEAN,
        linestyle = '-', linewidth = 1.7, color = '#6699CC',
        label = 'GPP')

#Add secondary y-axis:
secaxy = ax.secondary_yaxis('right')

#Add x-axis label:
plt.xlabel('Time')

#Add y-axis label:
plt.ylabel('Air Temperature (\N{DEGREE SIGN}C)')

#Add secondary y-axis label:
secaxy.set_ylabel('GPP (gC / m2 / d)')

#Add legend:
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    
#Show plot:
plt.show()
```

<br>

If you run the code above, you should get the following output:
<br>

<img src='../ancillarydata/images/exercise1/plot2ndyaxis_ex.png'  width = 60%>

<br>

Try to add and format the plot parameters of the plots you created in task 2.
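Calling `secondary_yaxis('right')` without conversion functions, as in the example above, adds an axis that mirrors the left one, so both variables still share one numeric scale. When two variables have very different ranges, a twin axis with its own scale may be easier to read. A sketch using Matplotlib's `Axes.twinx()` on synthetic data standing in for `dd_df` (illustrative values only):

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Synthetic stand-in for dd_df
phase = np.linspace(-np.pi / 2, 3 * np.pi / 2, 365)
demo_dd = pd.DataFrame({
    "TIMESTAMP": pd.date_range("2015-01-01", periods=365, freq="D"),
    "TA_F": 8 + 10 * np.sin(phase),
    "GPP_DT_VUT_MEAN": np.clip(6 * np.sin(phase), 0, None),
})

fig, ax_left = plt.subplots(figsize=(16, 6))

# Left y-axis: air temperature
ax_left.plot(demo_dd.TIMESTAMP, demo_dd.TA_F, color="darkorange", label="Air Temperature")
ax_left.set_ylabel("Air Temperature (\N{DEGREE SIGN}C)")
ax_left.set_xlabel("Time")

# twinx() creates a second Axes that shares the x-axis but has its own y-scale
ax_right = ax_left.twinx()
ax_right.plot(demo_dd.TIMESTAMP, demo_dd.GPP_DT_VUT_MEAN, color="#6699CC", label="GPP")
ax_right.set_ylabel("GPP (gC / m2 / d)")

# Collect the lines from both Axes so that a single legend describes both
lines = list(ax_left.get_lines()) + list(ax_right.get_lines())
ax_left.legend(lines, [line.get_label() for line in lines], loc="upper left")
```

The trade-off is that two independent scales can visually exaggerate or hide relationships, so label both axes clearly.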

<br>
<br>

In [None]:
#Write your own code here:


<br>
<br>
<div style="text-align: right"> 
    <a href="#intro">Back to top</a>
</div>
<br>
<br>

<a id="compare_var"></a>

<br>

### <font color='#B22222'>Task 4</font> - Compare the same variable from two stations
Here you will create a Matplotlib plot showing the same variable for two different stations. Before you create the plot, you need to read the values from another station into a Pandas DataFrame (see Task 1). Then you plot the values of the same variable for both stations.

Here's an example of the code:

```python

#Read data from Hyltemossa station:
htm_df = pd.read_csv("/data/project/climbeco/data/fluxnet/obs_dd_warm_winter/FLX_SE-Htm_FLUXNET2015_FULLSET_DD_2015_2020.csv",
                     header = 0,
                     sep = ",",
                     parse_dates = ["TIMESTAMP"])

#Read data from Gebesee station:
geb_df = pd.read_csv("/data/project/climbeco/data/fluxnet/obs_dd_warm_winter/FLX_DE-Geb_FLUXNET2015_FULLSET_DD_2015_2020.csv",
                     header = 0,
                     sep = ",",
                     parse_dates = ["TIMESTAMP"])



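#Add grid (Seaborn's whitegrid style; the Matplotlib style name 'seaborn-whitegrid' no longer exists in recent Matplotlib versions):
sns.set(style="whitegrid")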

#Create a plot (i.e. "figure") object and set the size of your plot:
fig = plt.figure(figsize=(16, 6))

#Add plot title:
plt.title('GPP - FLUXNET TIMESERIES (daily values)')

#Plot values for Hyltemossa:
plt.plot(htm_df.TIMESTAMP, htm_df.GPP_DT_VUT_MEAN,
         linestyle = '-.', linewidth = 0.5, color = 'green',
         label = 'Hyltemossa')

#Plot values for Gebesee:
plt.plot(geb_df.TIMESTAMP, geb_df.GPP_DT_VUT_MEAN,
         linestyle = '-', linewidth = 1.7, color = 'darkgreen',
         label = 'Gebesee')

#Add x-axis label:
plt.xlabel('Time')

#Add y-axis label:
plt.ylabel('GPP (gC / m2 / d)')

#Add legend:
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    
#Show plot:
plt.show()

```

<br>

The code from the example above creates a plot of GPP daily values for the stations Hyltemossa, Sweden, and Gebesee, Germany. The output should look like this:

<img src='../ancillarydata/images/exercise1/plot2stations.png' width="60%">

<br>

Now, try to create plots for other stations. For instance, it might be interesting to compare a variable at stations with the same vegetation type. It might also be worth comparing the same variable at stations with different vegetation types.

Use the _inspect_fluxnet_files_ notebook to view a map of the stations. Hover over the markers to get a pop-up window with additional info about the corresponding station. Alternatively, click [here](../ancillarydata/docs/climbeco_course_station_list.png) to view the station info table.
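Two stations do not always cover exactly the same dates, so before comparing them statistically it can help to keep only the days present in both datasets. A sketch using `pd.merge` with an inner join on `TIMESTAMP`; the two DataFrames are synthetic stand-ins for `htm_df` and `geb_df`, and the suffixes `_htm`/`_geb` are just illustrative choices.

```python
import numpy as np
import pandas as pd

# Synthetic daily GPP for two stations; the second covers a shifted period
htm_demo = pd.DataFrame({"TIMESTAMP": pd.date_range("2015-01-01", periods=10, freq="D"),
                         "GPP_DT_VUT_MEAN": np.arange(10, dtype=float)})
geb_demo = pd.DataFrame({"TIMESTAMP": pd.date_range("2015-01-04", periods=10, freq="D"),
                         "GPP_DT_VUT_MEAN": np.arange(10, dtype=float) * 2})

# Keep only the dates present in both DataFrames (inner join on TIMESTAMP)
both = pd.merge(htm_demo, geb_demo, on="TIMESTAMP", how="inner", suffixes=("_htm", "_geb"))

# Daily difference and Pearson correlation between the two stations
both["GPP_diff"] = both["GPP_DT_VUT_MEAN_htm"] - both["GPP_DT_VUT_MEAN_geb"]
correlation = both["GPP_DT_VUT_MEAN_htm"].corr(both["GPP_DT_VUT_MEAN_geb"])
print(len(both), "common days, Pearson r =", round(correlation, 3))
```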

<br>
<br>

In [None]:
#Write your own code here:


<br>
<br>
<div style="text-align: right"> 
    <a href="#intro">Back to top</a>
</div>
<br>
<br>

<a id="calc_stat"></a>

<br>

### <font color='#B22222'>Task 5</font> - Calculate and plot statistics

<font color='darkred'>**Example 1 (line-plot)**</font>
<br>
Here's an example of calculating the mean GPP per month for Hyltemossa station:

```python

#Read data from Hyltemossa station:
htm_df = pd.read_csv("/data/project/climbeco/data/fluxnet/obs_dd_warm_winter/FLX_SE-Htm_FLUXNET2015_FULLSET_DD_2015_2020.csv",
                     header = 0,
                     sep = ",",
                     parse_dates = ["TIMESTAMP"])

#Set the "TIMESTAMP" column as index:
htm_df_ind = htm_df.set_index('TIMESTAMP')

#Calculate the mean GPP value for every month in the dataset
#(in pandas >= 2.2 the month-end alias 'M' is deprecated in favour of 'ME'):
monthly_mean_gpp = htm_df_ind.GPP_DT_VUT_MEAN.resample('M').mean().dropna()

#Create line-plot:
monthly_mean_gpp.plot(kind='line', figsize=(10,6))
```

<br>

The code above will generate the following output:
<br>
<img src="../ancillarydata/images/exercise1/lineplot.png" width=40%>
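`resample('M')` returns one value per month *per year*. To get an average seasonal cycle instead, where all Januaries are averaged together and so on, you can group by calendar month. A sketch on two years of synthetic daily data in which every day in month *m* has the value *m* (illustrative only); with the real data you would use `htm_df_ind` instead of `demo`.

```python
import numpy as np
import pandas as pd

# Synthetic daily GPP for 2015-2016, indexed by date
days = pd.date_range("2015-01-01", "2016-12-31", freq="D")
demo = pd.DataFrame({"GPP_DT_VUT_MEAN": np.asarray(days.month, dtype=float)}, index=days)

# Group by calendar month (1-12) across all years: 12 values instead of 24
seasonal_cycle = demo["GPP_DT_VUT_MEAN"].groupby(demo.index.month).mean()

ax = seasonal_cycle.plot(kind="bar", figsize=(10, 6))
ax.set_xlabel("Month")
ax.set_ylabel("Mean GPP (gC m$^{-2}$ d$^{-1}$)")
```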


<font color='darkred'>**Example 2 (barplot)**</font>
<br>
Here's an example of calculating the annual means of daily GPP and RECO for Hyltemossa station:

```python


#Read data from Hyltemossa station:
htm_df = pd.read_csv("/data/project/climbeco/data/fluxnet/obs_dd_warm_winter/FLX_SE-Htm_FLUXNET2015_FULLSET_DD_2015_2020.csv",
                     header = 0,
                     sep = ",",
                     parse_dates = ["TIMESTAMP"])

#Set the "TIMESTAMP" column as index:
htm_df_ind = htm_df.set_index('TIMESTAMP')

# Calculate annual means for GPP and RECO
# The '_MEAN' suffix indicates that these columns contain the average of several
# GPP and RECO estimates (computed with different USTAR thresholds).
# (In pandas >= 2.2 the year-end alias 'Y' is deprecated in favour of 'YE'.)
annual_mean_GPP = htm_df_ind.GPP_DT_VUT_MEAN.resample('Y').mean().dropna()
annual_mean_RECO = htm_df_ind.RECO_DT_VUT_MEAN.resample('Y').mean().dropna()

# Define the width of the bars
bar_width = 0.4
# Create an array of indices for the x-axis based on the number of years
indices = np.arange(len(annual_mean_GPP))

# Plot the GPP bars in green
plt.bar(indices, annual_mean_GPP, width=bar_width, color='green', label='GPP')

# Plot the RECO bars in red, shifted by the width of one bar to place them next to the GPP bars
plt.bar(indices + bar_width, annual_mean_RECO, width=bar_width, color='red', label='RECO')

# Customize the plot
plt.xlabel('Year')
plt.ylabel('gC m$^{-2}$ d$^{-1}$')
plt.title('Annual Mean of GPP and RECO')
plt.xticks(indices + bar_width / 2, annual_mean_GPP.index.year, rotation=45)  # Set x-axis labels to the years
plt.legend()
plt.grid(True)

# Display the plot
plt.tight_layout()  # Adjust layout to prevent clipping of tick-labels
plt.show()

```

<br>

The code above will generate the following output:
<br>
<img src="../ancillarydata/images/exercise1/barplot.png" width=40%>
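Example 2 averages the daily values, so each bar shows a typical daily flux for that year (gC m$^{-2}$ d$^{-1}$). If you want annual totals (gC m$^{-2}$ yr$^{-1}$) instead, the daily values can be summed per year. A sketch on synthetic data with a constant 2 gC m$^{-2}$ d$^{-1}$ (illustrative only); reporting the number of days next to the sum shows whether a year is complete, since sums are only comparable between complete years.

```python
import numpy as np
import pandas as pd

# Synthetic daily GPP for 2015 and 2016 (2016 is a leap year)
days = pd.date_range("2015-01-01", "2016-12-31", freq="D")
gpp = pd.Series(2.0, index=days, name="GPP_DT_VUT_MEAN")

# Sum of daily values per calendar year, plus the number of days used
annual = gpp.groupby(gpp.index.year).agg(["sum", "count"])
print(annual)
```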


If you have time, use the _drought 2018_ notebook (htm_eco_drought_2018.ipynb) with ICOS data from Hyltemossa station to see how to filter a Pandas DataFrame by time and calculate statistics. Note that those examples use an outdated collection of ecosystem data, which can be found <a href="https://www.icos-cp.eu/data-products/YVR0-4898" target="_blank">here</a>. Apply the same code to the FLUXNET data to calculate statistics for variables and stations of your choice. 
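Once `TIMESTAMP` is the index of a FLUXNET DataFrame, filtering by time takes only a line or two. A sketch with synthetic daily values for 2015-2020 (lower values in 2018 purely for illustration) showing two approaches: selecting a date range with strings, and selecting the same months in every year with a boolean mask.

```python
import numpy as np
import pandas as pd

# Synthetic daily GPP for 2015-2020, indexed by date (illustrative values only)
days = pd.date_range("2015-01-01", "2020-12-31", freq="D")
demo = pd.DataFrame({"GPP_DT_VUT_MEAN": np.where(days.year == 2018, 3.0, 5.0)}, index=days)

# 1) A single period: date strings slice a DatetimeIndex (both ends inclusive)
summer_2018 = demo.loc["2018-06-01":"2018-08-31"]

# 2) The same months in every year: boolean mask on the index
is_summer = demo.index.month.isin([6, 7, 8])
summer_mean_per_year = (demo.loc[is_summer, "GPP_DT_VUT_MEAN"]
                        .groupby(demo.index[is_summer].year)
                        .mean())
print(summer_mean_per_year)
```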


In [None]:
#Write your own code here:


<br>
<br>
<div style="text-align: right"> 
    <a href="#intro">Back to top</a>
</div>
<br>
<br>
<a id='data_ref'></a>


# Data references
    
   
Warm Winter 2020 Team, & ICOS Ecosystem Thematic Centre. (2022). Warm Winter 2020 ecosystem eddy covariance flux product for 73 stations in FLUXNET-Archive format—release 2022-1 (Version 1.0). ICOS Carbon Portal. https://doi.org/10.18160/2G60-ZHAK

<br>
<br>
<div style="text-align: right"> 
    <a href="#intro">Back to top</a>
</div>
<br>
<br>