<img src='https://www.icos-cp.eu/sites/default/files/2017-11/ICOS_CP_logo.png' width=400 align=right>

# ICOS Carbon Portal Python Library
## Example: STILT timeseries

In this example we load different STILT timeseries data, make some plots and compare the STILT data with observed data.

## Documentation
Full documentation for the library is on the [project page](https://icos-carbon-portal.github.io/pylib/). Installation instructions and the wheel are on [pypi.org](https://pypi.org/project/icoscp/), and the source code is available on [GitHub](https://github.com/ICOS-Carbon-Portal/pylib).

In [None]:
# Import STILT tools:
from icoscp.stilt import stiltstation

# Import matplotlib and pandas
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# In this notebook we use matplotlib widgets,
# which make the plots interactive.
%matplotlib widget


In [None]:
# We can set common properties for plots using a function
def my_figure_properties(fig):
    # Hide the figure header
    fig.canvas.header_visible = False
    # Show the toolbar, placed above the figure
    fig.canvas.toolbar_visible = True
    fig.canvas.toolbar_position = 'top'
    # Allow the figure to be resized
    fig.canvas.resizable = True
    # If true then scrolling while the mouse is over the 
    # canvas will not move the entire notebook
    fig.canvas.capture_scroll = True
    fig.tight_layout()
    return fig

def my_figure():
    fig = plt.figure()
    my_figure_properties(fig)
    return fig


### Create a STILT station object

In [None]:
st = stiltstation.get(id='kit100')
print(st)

### Retrieve the default time series data
Now that we have our station, we can download its time series for a specific time period using the `get_ts` method of the station object. Here we look at CO$_2$ values; other options are described in the [documentation](https://icos-carbon-portal.github.io/pylib/modules/#get_tsstart_date-end_date-hours-columns).

In [None]:
yearstart = '2018-01-01'  # Dates are given in an ISO 8601 format;
yearend   = '2018-12-31'  # for other formats see the documentation.

# When retrieving the station data the default is to load the columns
# ["isodate","co2.stilt","co2.bio","co2.fuel","co2.cement","co2.background"]
data = st.get_ts(yearstart, yearend)
data.head()

### Plot STILT time series

In [None]:
fig_with_properties = my_figure()
axis_to_plot = fig_with_properties.gca()  # Note: gca = get current axes
data.plot(y = ['co2.stilt', 'co2.background'], 
          title = st.id, ylabel = 'ppm', 
          figsize = (8,4), ax = axis_to_plot)
plt.show()

### Extract time series with `columns = 'co2'`
Again, see the [documentation](https://icos-carbon-portal.github.io/pylib/modules/#get_tsstart_date-end_date-hours-columns) for the columns that can be returned. <a id='set_dates'></a>

In [None]:
# These date constraints will be used in the rest of this example notebook
start = '2018-01-01'
end   = '2018-01-31'

In [None]:
stiltdata = st.get_ts(start, end, columns='co2')
stiltdata.head()

In [None]:
# This dataset has 18 columns given by
stiltdata.columns

### A comment on relations between the columns 
The columns are related by
- `co2.stilt = co2.bio + co2.fuel + co2.cement + co2.background`

The natural biospheric flux `co2.bio` is split into photosynthetic uptake and release by respiration:
- `co2.bio = co2.bio.gee + co2.bio.resp`

The anthropogenic emissions from fuel burning are split up by fuel type:
- `co2.fuel = co2.fuel.coal + co2.fuel.oil + co2.fuel.gas + co2.fuel.bio + co2.fuel.waste`

The anthropogenic emissions can also be grouped by source category:
- `co2.fuel + co2.cement = co2.energy + co2.transport + co2.industry + co2.residential + co2.other_categories`
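
The relations above can be checked numerically. The following is a minimal sketch on invented values; the helper `max_residual` and the toy numbers are assumptions made for illustration and are not part of the `icoscp` library.

```python
import pandas as pd


def max_residual(df, total, parts):
    """Return the largest absolute difference between a total column
    and the row-wise sum of its component columns."""
    return (df[total] - df[parts].sum(axis=1)).abs().max()


# Invented values in ppm, chosen so that the relations hold exactly.
toy = pd.DataFrame({
    'co2.bio.gee':    [-3.0, -1.5],
    'co2.bio.resp':   [2.0, 2.5],
    'co2.bio':        [-1.0, 1.0],
    'co2.fuel':       [4.0, 6.0],
    'co2.cement':     [0.5, 0.25],
    'co2.background': [405.0, 406.0],
    'co2.stilt':      [408.5, 413.25],
})

bio_residual = max_residual(toy, 'co2.bio', ['co2.bio.gee', 'co2.bio.resp'])
stilt_residual = max_residual(
    toy, 'co2.stilt',
    ['co2.bio', 'co2.fuel', 'co2.cement', 'co2.background'],
)
print(bio_residual, stilt_residual)
```

Applied to real STILT output, the same helper should return values close to zero if the relations hold, up to rounding in the data.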


In [None]:
df_bio = stiltdata[['co2.bio','co2.bio.gee', 'co2.bio.resp']]
df_fuel = stiltdata[['co2.fuel','co2.fuel.coal','co2.fuel.oil','co2.fuel.gas','co2.fuel.bio','co2.fuel.waste']]
df_source = stiltdata[['co2.fuel','co2.cement','co2.energy','co2.transport','co2.industry','co2.residential','co2.other_categories']]


### Pie charts
In the next example we sum each column and visualize the data in pie charts.<br> 
**Note:** Here we take the *absolute value* of the summed biospheric components, since the columns contain both positive and negative values. In terms of the absolute values, the biospheric flux relation becomes `co2.bio = - co2.bio.gee + co2.bio.resp`; the proportions, however, are the same.

In [None]:
biocomponent = abs(df_bio.agg('sum'))
fuelcomponent = df_fuel.agg('sum')
sourcecomponent = df_source.agg('sum')

# Display the summed (absolute) biospheric components
biocomponent
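
To see why the absolute value is taken, consider a small invented example: the summed uptake is negative, and matplotlib's `pie` does not accept negative wedge sizes. The numbers below are made up for illustration and only cover the two sub-components.

```python
import pandas as pd

# Invented biospheric components in ppm: uptake is negative, respiration positive.
toy_bio = pd.DataFrame({
    'co2.bio.gee':  [-3.0, -1.0],
    'co2.bio.resp': [2.0, 1.0],
})

sums = toy_bio.sum()                      # gee sums to -4.0, resp to 3.0
shares = sums.abs() / sums.abs().sum()    # fractions a pie chart would show
print(sums)
print(shares)
```

The shares are computed from magnitudes only, so the pie chart shows the relative size of each component and not its sign.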

In [None]:
fig, axes = plt.subplots(1,3,figsize=(8,2))
fig = my_figure_properties(fig)

components = [biocomponent, fuelcomponent, sourcecomponent]
titles = ['STILT bio components', 'STILT fuel components', 'STILT source components']
for ax, current_data, current_title in zip(axes, components, titles):
    ax.pie(current_data, labels=current_data.index, textprops={'fontsize': 8})
    ax.set_xlabel(current_title)
fig.subplots_adjust(wspace = 0.5)
fig.suptitle('STILT CO$_2$ components', fontsize=14)


plt.show()

### Example of the widget

In [None]:
# When using the widgets, we can change behaviour afterwards
fig.canvas.toolbar_position = 'bottom'
fig.suptitle('CO$_2$ pie charts', fontsize=14)


### Plotting the CO${}_2$ components

In [None]:
fig = my_figure()
axis_to_plot = fig.gca()
pd.concat([df_bio,df_fuel,df_source.drop(['co2.fuel'],axis=1)],axis=1).plot(ax = axis_to_plot)
axis_to_plot.legend(bbox_to_anchor=(0.88,1.04), loc='upper left', fontsize=8)
plt.show()

### Aggregate by day
As an example, we now aggregate the data by day and create a stacked bar graph.

In [None]:
day = stiltdata.iloc[:,2:17].resample('D').sum()
day
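
What `resample('D').sum()` does can be sketched on invented data. The example assumes 3-hourly time steps (eight per day), which is an assumption about the STILT output, and uses a constant value of 1.0 so the result is easy to read.

```python
import pandas as pd

# Two days of invented 3-hourly values, all equal to 1.0.
index = pd.date_range('2018-01-01', periods=16, freq='180min')
toy = pd.DataFrame({'co2.fuel': 1.0}, index=index)

daily_sum = toy.resample('D').sum()    # adds up the eight values of each day
daily_mean = toy.resample('D').mean()  # averages them instead
print(daily_sum)
print(daily_mean)
```

The daily sum scales with the number of time steps per day, so a daily mean may be easier to interpret as a concentration; which aggregate is appropriate depends on the purpose of the plot.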

In [None]:
# Plot the bar graph
fig = my_figure()
axis_to_plot = day.plot.bar(stacked=True, ax=fig.gca())

# Adjust the legend and the tick labels. pandas draws the bars at
# integer positions, so we take the labels directly from the index.
axis_to_plot.legend(loc='best', fontsize=8)
axis_to_plot.set_xticklabels(day.index.strftime('%m-%d'))
axis_to_plot.set_ylabel('ppm (daily sum)')
axis_to_plot.set_title(r'$\mathbf{Note:}$ These values are not independent')
# Display
plt.show()

## Load observation and compare to model result
Next, we will compare the data from the STILT model with observed/measured data. 

In [None]:
from icoscp.cpb.dobj import Dobj

In [None]:
kit100 = Dobj('https://hdl.handle.net/11676/LJ4uetvEho7-k9K9TUnLHfFh')

In [None]:
kit100.citation

### Create a mask to get the same timeframe as the STILT data
Here, the `start` and `end` dates are those set in this <a href="#set_dates">notebook cell</a>. The last timestamp of our STILT data is `2018-01-31 21:00:00`, so to avoid discrepancies when comparing the observed data with the STILT data, we first filter the observations to the same period.

In [None]:
enddate = pd.to_datetime(end) + pd.DateOffset(hours=21)
mask = (kit100.data['TIMESTAMP'] >= start) & (kit100.data['TIMESTAMP'] <= enddate)

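# Filter and set the index in one step, so that we work on a new
# dataframe and do not modify a slice of the original in place.
obsdata = kit100.data[mask].set_index('TIMESTAMP')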
obsdata['co2']
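
The masking step can be tried on a small invented dataframe. The sketch below mirrors the column names used above, but the values are made up; it also shows why the 21-hour offset is added, since a bare date string is read as midnight at the start of that day.

```python
import pandas as pd

start = '2018-01-01'
end = '2018-01-31'

# Invented hourly observations around the end of the period.
obs = pd.DataFrame({
    'TIMESTAMP': pd.date_range('2018-01-31 18:00', periods=8, freq='60min'),
    'co2': [410.0 + i for i in range(8)],
})

enddate = pd.to_datetime(end) + pd.DateOffset(hours=21)
mask = (obs['TIMESTAMP'] >= start) & (obs['TIMESTAMP'] <= enddate)
selected = obs[mask].set_index('TIMESTAMP')

# Without the offset, `end` means 2018-01-31 00:00, so none of these rows qualify.
n_without_offset = int((obs['TIMESTAMP'] <= end).sum())
print(selected)
print(n_without_offset)
```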

### Resample STILT data
Since the observations are hourly aggregates, we resample the STILT output to hourly values so that model and observations can be compared directly.

In [None]:
stilthourly = stiltdata.resample('1H').mean().interpolate('linear')
stilthourly['co2.stilt']
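
The effect of resampling followed by linear interpolation can be seen on a short invented series. The sketch assumes 3-hourly input, which is an assumption about the STILT time step; the values are made up.

```python
import pandas as pd

# Three invented 3-hourly values in ppm.
index = pd.date_range('2018-01-01', periods=3, freq='180min')
coarse = pd.Series([400.0, 403.0, 409.0], index=index, name='co2.stilt')

# Upsampling to hourly creates empty (NaN) slots between the original
# time steps; linear interpolation then fills them in.
hourly = coarse.resample('60min').mean().interpolate('linear')
print(hourly)
```

The interpolated hours are not additional model output; they are values estimated between the original time steps so that the index matches the hourly observations.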

### Data harmonization and plot
Next, we merge the data frames. Their lengths may differ: the observations can contain fewer data points because of interruptions in the measurements or values removed by QA/QC. After the merge, missing values are represented as NaN.

In [None]:
harmonized = stilthourly.join(obsdata)
fig_for_plot = my_figure()
harmonized.plot(y = ['co2.stilt', 'co2'], grid=True, linewidth=0.5, ax =fig_for_plot.gca())
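
How the join treats missing observations can be sketched with two small invented dataframes. `DataFrame.join` is a left join on the index by default, so every model time step is kept and gaps in the observations become NaN.

```python
import pandas as pd

hours = pd.date_range('2018-01-01', periods=4, freq='60min')
model = pd.DataFrame({'co2.stilt': [410.0, 411.0, 412.0, 413.0]}, index=hours)

# In this invented example the observation at 02:00 is missing.
obs = pd.DataFrame({'co2': [409.0, 411.5, 414.0]}, index=hours[[0, 1, 3]])

joined = model.join(obs)                       # keeps all four model rows
n_missing = int(joined['co2'].isna().sum())    # rows without an observation
complete = joined.dropna(subset=['co2'])       # only rows with both values
print(joined)
```

Any difference computed from the joined frame is also NaN where an observation is missing; dropping those rows first, as in `complete`, is one way to restrict a comparison to time steps where both values exist.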

### Plot difference

In [None]:
harmonized['diff'] = harmonized['co2.stilt']-harmonized['co2']

In [None]:
fig_dif = my_figure()
ax_dif = fig_dif.gca()
ax_dif.plot(harmonized['diff'])
ax_dif.grid(color='0.9')
ax_dif.set_ylabel('diff (ppm)')

# adjust the xticks
ax_dif.xaxis.set_major_formatter(mdates.DateFormatter('%d'))