# Using the Fal mooring in situ data to calculate air-sea CO$_2$ gas fluxes #

## Introduction
This notebook explains how we can use the Fal mooring data to calculate air-sea CO$_2$ gas fluxes.

### Load Relevant Modules
To begin with, the required Python packages are loaded.

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
from netCDF4 import Dataset

### Loading the mooring data
Now we need to load the mooring data, which is provided as a tab-separated values file (.tsv). We can then view the first 5 rows of the dataset using the .head(5) command, or the last 5 rows using .tail(5).

In [None]:
# Load data file
region_data = pd.read_csv('Fal_mooring_flux.tsv', sep='\t', index_col=0)
# Show small proportion of the data
region_data.head(5)
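To see what `sep='\t'` and `index_col=0` do, here is a minimal sketch on a tiny, made-up tab-separated sample (the columns and values are hypothetical, not taken from `Fal_mooring_flux.tsv`). With `index_col=0` the first column becomes the row index, so, assuming that column holds row numbers starting at 0, later cells can refer to the first row by label with `.loc[0, ...]`.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same layout: tab separated, first column = row label
sample_tsv = (
    "\tYear\tMonth\tDay\tsalinity\n"
    "0\t2018\t9\t14\t34.9\n"
    "1\t2018\t9\t14\t35.0\n"
    "2\t2018\t9\t14\t35.1\n"
)

# sep='\t' splits on tabs; index_col=0 turns the first column into the row index
sample = pd.read_csv(io.StringIO(sample_tsv), sep='\t', index_col=0)

print(sample.head(2))  # first two rows
print(sample.tail(1))  # last row
```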

### Preparing to Plot the Recorded Data
We want to plot a 'time series' of the data that was recorded. One way to show this is to plot 'Days since [first recording]' along the x-axis and the data along the y-axis. The cell below finds the number of days since the first measurement (technically it finds the number of seconds since the first recording and divides this by 86,400) and creates a new column in the Dataframe to show these values.

Note: if your own dataset doesn't have 'Year', 'Month', 'Day', 'Hour', 'Minute' and 'Second' columns, the cell below won't work and you will need to add them to your dataset. This can be done in Excel, but it is better to do it in Python if possible, because Excel may change the date formatting on its own; see the example datasets for the required format.

In [None]:
# Initialise the new Dataframe column with NaN (a numeric placeholder that is overwritten below)
region_data['Days_since'] = np.nan

# Produce a datetime object for the first recording
# - the zeros in the line below select the row labelled 0, i.e. the first row (the index starts at zero)
start_date = dt.datetime(region_data.loc[0,'Year'],region_data.loc[0,'Month'],region_data.loc[0,'Day'],
                            region_data.loc[0,'Hour'],region_data.loc[0,'Minute'],region_data.loc[0,'Second'])

# Loop over all rows in the Dataframe, i.e. from 0 to the length of the Dataframe
for i in range(0,len(region_data)):
    # Get the datetime object for the currently indexed recording (indexed by i)
    future_date = dt.datetime(region_data.loc[i,'Year'],region_data.loc[i,'Month'],region_data.loc[i,'Day'],
                              region_data.loc[i,'Hour'],region_data.loc[i,'Minute'],region_data.loc[i,'Second'])
    
    # Find the difference between the current datetime and the initial datetime
    day_diff = future_date - start_date
    
    # Fill the Dataframe column with the time difference in seconds (found using .total_seconds())
    # divided by 86400 seconds per day, giving the (fractional) number of days that have passed
    region_data.loc[i,'Days_since'] = day_diff.total_seconds()/(60*60*24)
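The note above suggests building the date columns in Python rather than Excel. Below is a minimal sketch of that, together with a vectorised alternative to the loop. It assumes a hypothetical 'Datetime' column of day-first strings such as 14/09/2018 01:00:00 (check your own file's format before reusing it), and the example rows are made up. `days_since_start` is a hypothetical helper, not part of any library.

```python
import pandas as pd

# Made-up example rows with a single day-first date-time column
df = pd.DataFrame({'Datetime': ['14/09/2018 00:00:00', '14/09/2018 06:00:00', '15/09/2018 13:30:15']})

# Parse with an explicit format so day and month cannot be swapped
parsed = pd.to_datetime(df['Datetime'], format='%d/%m/%Y %H:%M:%S')

# Build the separate columns that the loop above expects
df['Year'] = parsed.dt.year
df['Month'] = parsed.dt.month
df['Day'] = parsed.dt.day
df['Hour'] = parsed.dt.hour
df['Minute'] = parsed.dt.minute
df['Second'] = parsed.dt.second

def days_since_start(frame):
    """Fractional days since the first row, computed on whole columns (hypothetical helper)."""
    parts = frame[['Year', 'Month', 'Day', 'Hour', 'Minute', 'Second']].rename(columns=str.lower)
    # pd.to_datetime assembles one timestamp per row from year/month/day/hour/minute/second columns
    stamps = pd.to_datetime(parts)
    # Subtract the first timestamp and convert the differences to days
    return (stamps - stamps.iloc[0]).dt.total_seconds() / 86400

df['Days_since'] = days_since_start(df)
print(df[['Datetime', 'Days_since']])
```

Assuming the column names match, `region_data['Days_since'] = days_since_start(region_data)` would do the same job as the loop in a single step.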

We can filter the Dataframe to show just the 'Datetime' and 'Days_since' columns. Showing the head gives a quick indication of whether the 'Days_since' calculation worked, although a more thorough check is advisable, depending on the size of the Dataframe.

In [None]:
# Filter data to 'Datetime' and 'Days_since' columns and show first 5 rows.
region_data[['Datetime', 'Days_since']].head(5)
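A more thorough check than looking at the head is to test properties the column should have: it starts at zero, never decreases and, for an hourly record, steps by about 1/24 of a day. Below is a sketch on a made-up column. The hourly spacing is an assumption about your data, so adjust `expected_step` to match it. `check_days_since` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def check_days_since(days, expected_step=1 / 24, tol=1e-6):
    """Return simple sanity checks on a 'Days_since' column (hypothetical helper)."""
    days = pd.to_numeric(days)  # the column may have object dtype if it was filled row by row
    steps = np.diff(days.to_numpy())
    return {
        'starts_at_zero': bool(np.isclose(days.iloc[0], 0.0)),
        'never_decreases': bool((steps >= 0).all()),
        'regular_steps': bool(np.allclose(steps, expected_step, atol=tol)),
    }

# Made-up column: two days of hourly recordings
hourly = pd.Series([i / 24 for i in range(48)])
print(check_days_since(hourly))
```

A False for 'regular_steps' on real data is not necessarily an error: it flags gaps or irregular sampling that are worth looking at.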

### Plotting the Time Series  

The Matplotlib .subplots() function is ideal for laying out several plots in one figure, and I have used Seaborn to do the actual plotting. The two packages work well together: Seaborn is built on top of Matplotlib and also integrates easily with Pandas Dataframes.

Producing nice-looking plots with clear axis labels, colors etc. can be fiddly, but the documentation (and StackOverflow!) is full of hints and tips.

In [None]:
import matplotlib.dates as mdates
import warnings

# Set up a figure with 6 axes (3 rows x 2 columns). sharex=True means all axes share the same x-axis
fig, ax = plt.subplots(3, 2, sharex=True)
# Set figure width and height (in inches)
fig.set_size_inches(15, 15)
# Suppress all warnings
warnings.filterwarnings("ignore")

### PLOTTING THE DATA ### (- *s indicate a plot keyword below)
# These Seaborn commands state that we want a *lineplot*, where the *data* is coming
# from our region_data Dataframe, and we choose the *x* & *y* columns that we want, as well
# as the axis (*ax*) we want to plot on, indexed as ax[row, column] with ax[0,0] at the top left
sns.lineplot(data=region_data, x='Date', y='salinity', color='turquoise', ax=ax[0,0])
sns.lineplot(data=region_data, x='Date', y='sstskin_k', color='red', ax=ax[0,1])
sns.lineplot(data=region_data, x='Date', y='windu10', color='green', ax=ax[1,0])
# hue=...isna().cumsum() starts a new line segment after each missing value, so gaps are not bridged;
# the palette needs one colour per segment (number of missing values + 1)
pco2_gaps = int(region_data["pco2sw_corr_split"].isna().sum())
sns.lineplot(data=region_data, x='Date', y='pco2sw_corr_split', color='orange', ax=ax[1,1], hue=region_data["pco2sw_corr_split"].isna().cumsum(), palette=["blue"]*(pco2_gaps + 1), legend=False, markers=True)
sns.lineplot(data=region_data, x='Date', y='pressure_met', color='purple', ax=ax[2,0])
sns.lineplot(data=region_data, x='Date', y='pco2_air_noaa_2018', color='black', ax=ax[2,1])

# Format the x-axis of the two bottom axes
for bottom_ax in ax[2, :]:
    # Use WeekdayLocator and DateFormatter to show only weekly (Monday) dates on the x-axis
    bottom_ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
    bottom_ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))

    # Set x-axis tick labels
    tick_locs = bottom_ax.get_xticks() # Get the current tick locations
    bottom_ax.set_xticks(tick_locs) # Set the same tick locations
    bottom_ax.set_xticklabels(region_data['Date'][::len(region_data['Date'])//len(tick_locs)][:-1], rotation='vertical')

    # Set x axis label
    bottom_ax.set_xlabel('Date', fontsize=20)

# Set y label for each axis
ax[0,0].set_ylabel('Salinity', fontsize=20)
ax[0,1].set_ylabel('SST (K)', fontsize=20)
ax[1,0].set_ylabel('Windspeed (m/s)', fontsize=20)
ax[1,1].set_ylabel('Est. pCO2 (µatm)', fontsize=20)
ax[2,0].set_ylabel('Atm. pressure (mbar)', fontsize=20)
ax[2,1].set_ylabel('Atm. pCO2 (ppm)', fontsize=20)

# Show y-axis tick labels with two decimal places on every axis
# - you can comment these out with # to see the effect when removed
for each_ax in ax.flat:
    each_ax.yaxis.set_major_formatter('{x:.2f}')

# Set a tight layout to remove extra space around the plots
fig.tight_layout()
# Leave a small margin at the top of the figure
fig.subplots_adjust(top=0.95)

# Show figure!
plt.show()
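The hue=...isna().cumsum() argument used for the pCO2 panel is a trick to stop Seaborn drawing a straight line across periods of missing data: each NaN starts a new hue group, so each unbroken run of data is drawn as its own line. Plain Matplotlib `plot` already leaves a gap wherever a value is NaN, so a simpler alternative, sketched here with made-up values rather than the mooring data, is:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up record with two gaps (NaN values)
days = np.arange(10, dtype=float)
pco2 = np.array([400, 405, 410, np.nan, np.nan, 420, 418, np.nan, 415, 412], dtype=float)

fig, ax = plt.subplots()
# Matplotlib breaks the line at each NaN, so gaps stay visible without any hue tricks
line, = ax.plot(days, pco2, color='orange')
ax.set_xlabel('Days since start')
ax.set_ylabel('pCO2 (µatm)')

# Count the unbroken segments that will be drawn (runs of non-NaN values)
valid = ~np.isnan(pco2)
n_segments = int(np.sum(valid[1:] & ~valid[:-1]) + valid[0])
print(n_segments)
```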

### Using FluxEngine
To calculate the air-sea gas fluxes we are going to use a bulk formulation of the calculation, implemented in the FluxEngine Python toolbox. So let's check which version of FluxEngine we have installed. It should be at least version 4.0.

In [None]:
# We primarily use FluxEngine from the command line, but here we can import it just to check the version
import fluxengine as fe
# Text-to-NetCDF conversion tool, used in the next cell
import fluxengine.tools.lib_text2ncdf as nc
print(fe.__version__)
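For orientation, a bulk formulation estimates the flux as $F = k\,K_0\,(p\mathrm{CO}_{2,\mathrm{sw}} - p\mathrm{CO}_{2,\mathrm{air}})$. Here $k$ is the gas transfer velocity (usually a function of wind speed), $K_0$ is the solubility of CO$_2$ in seawater, and the bracket is the sea-air difference in CO$_2$ partial pressure. The sketch below implements this textbook form using the Wanninkhof (2014) quadratic wind-speed parametrisation and the Weiss (1974) solubility. It is an illustration only: FluxEngine offers several flux calculation options, and the one used here is whatever `Fal_mooring.conf` selects, so the numbers will not match FluxEngine's output exactly. The sketch also assumes both pCO2 values are already partial pressures in µatm. The atmospheric column in the data file is in ppm (a mole fraction), so it would need converting first. The constant seawater density is an assumption.

```python
import math

def schmidt_co2(t_c):
    """Schmidt number of CO2 in seawater at t_c degrees C (Wanninkhof 2014 polynomial)."""
    return 2116.8 - 136.25 * t_c + 4.7353 * t_c**2 - 0.092307 * t_c**3 + 0.0007555 * t_c**4

def k0_weiss(t_k, salinity):
    """CO2 solubility in mol kg-1 atm-1 (Weiss 1974), temperature in K."""
    t100 = t_k / 100.0
    ln_k0 = (-60.2409 + 93.4517 / t100 + 23.3585 * math.log(t100)
             + salinity * (0.023517 - 0.023656 * t100 + 0.0047036 * t100**2))
    return math.exp(ln_k0)

def bulk_co2_flux(u10, t_k, salinity, pco2_sw, pco2_air, rho_sw=1025.0):
    """Sea-to-air CO2 flux in g C m-2 day-1 (positive = out of the ocean).

    u10 in m s-1, t_k in K, pCO2 values in µatm; rho_sw (kg m-3) is an assumed constant density.
    """
    t_c = t_k - 273.15
    # Gas transfer velocity: quadratic in wind speed, scaled to the in situ Schmidt number
    k_cm_per_hr = 0.251 * u10**2 * (schmidt_co2(t_c) / 660.0) ** -0.5
    k_m_per_day = k_cm_per_hr * 24.0 / 100.0
    # Solubility converted from per kg to per m3 of seawater
    k0_mol_per_m3_atm = k0_weiss(t_k, salinity) * rho_sw
    delta_pco2_atm = (pco2_sw - pco2_air) * 1e-6  # µatm -> atm
    flux_mol = k_m_per_day * k0_mol_per_m3_atm * delta_pco2_atm  # mol m-2 day-1
    return flux_mol * 12.011  # grams of carbon per mole

print(bulk_co2_flux(u10=7.0, t_k=288.15, salinity=35.0, pco2_sw=450.0, pco2_air=405.0))
```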

Now we need to convert our .tsv data file into NetCDF, the file format that FluxEngine uses. NetCDF is a standard file format used by many scientific and engineering communities. It allows the data to be compressed, and the file can contain metadata describing how the data were collected, created and processed.

In [None]:
# Converting to netCDF
nc.convert_text_to_netcdf(['Fal_mooring_flux.tsv'],
                          startTime='2018-09-14 00:00:00',
                          endTime='2018-11-30 08:00:00',
                          ncOutPath='Fal_mooring.nc',
                          temporalResolution='0 01:00',
                          # Columns from the .tsv file to write to the NetCDF file
                          colNames=['salinity', 'sstskin_c', 'sstskin_k', 'windu10', 'windu10_moment2',
                                    'pco2sw_corr_split', 'pressure_met', 'pco2_air_noaa_2018', 'pco2_sst'],
                          # Names of the latitude and longitude columns
                          latProd='Lat', lonProd='Lon',
                          dateIndex=3, parseUnits=False,
                          temporalChunking=1857,
                          # Presumably the latitude/longitude bounds of the output grid (50-51 N, 6-5 W)
                          limits=[50, 51, -6, -5],
                          dateFormatDayFirst=True)
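The startTime, endTime and temporalResolution arguments describe an hourly time axis, and counting its steps is a quick consistency check. The sketch below assumes that temporalChunking=1857 is meant to equal the number of hourly time steps. This is our reading, not documented here, although it agrees with the time dimension size of 1857 mentioned when the FluxEngine output is read back in.

```python
from datetime import datetime, timedelta

start = datetime(2018, 9, 14, 0, 0)
end = datetime(2018, 11, 30, 8, 0)
step = timedelta(hours=1)  # our reading of temporalResolution='0 01:00' (0 days, 1 hour)

# Number of hourly time steps from start to end, counting both end points
n_steps = (end - start) // step + 1
print(n_steps)  # 1857
```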

We now have our NetCDF (version 3) file, so we can use FluxEngine to calculate the air-sea gas fluxes.

Note: we have had to suppress the output of the code cell below because of a recent problem in Jupyter Notebook. While the cell is running, [*] is shown to its left; when it has finished, this changes to a number in brackets.

This cell can take up to around 15 minutes to run.

In [None]:
%%capture cap --no-stderr
# The line above suppresses the console output, which currently causes problems in Jupyter Notebook; we save it to a file in the next cell
# Run FluxEngine: first import the FluxEngine run tool, then run the "Fal_mooring.conf" config file

from fluxengine.core import fe_setup_tools as fe_run
fe_run.run_fluxengine("Fal_mooring.conf", "2018-09-14 00:00", "2018-11-30 08:00", singleRun = False, processLayersOff=True)

In [None]:
# Save the suppressed console output to a file
with open('fluxengine_log.txt', 'w') as file:
    file.write(cap.stdout)
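%%capture is an IPython cell magic, so it only works inside Jupyter. If you run the same steps as a plain Python script, the standard library's `contextlib.redirect_stdout` can capture printed output in a similar way. Below is a minimal sketch in which `noisy_step` is a hypothetical stand-in for the FluxEngine call. Note that this only captures output written through Python's `sys.stdout`, not output that compiled libraries write directly to the terminal.

```python
import contextlib
import io

def noisy_step():
    """Hypothetical stand-in for a function that prints progress messages."""
    for i in range(3):
        print(f"processing time step {i}")

buffer = io.StringIO()
# Everything printed inside the with-block goes to the buffer instead of the console
with contextlib.redirect_stdout(buffer):
    noisy_step()

log_text = buffer.getvalue()
with open('example_log.txt', 'w') as file:
    file.write(log_text)
```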

When the [*] to the left of the FluxEngine cell changes to a number, FluxEngine has finished and we have the air-sea gas fluxes calculated from the Fal mooring data. The output is currently in a NetCDF file, which you can view using Panoply, a free data viewer developed by NASA. Alternatively, we can extract the data from the NetCDF file and add it to our original .tsv data as additional columns, saving the result as a new file (flux_final.tsv).

In [None]:
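# Appending FluxEngine results

# Reload the original data and drop the first row (label 0). The table must end up with the same
# number of rows as the FluxEngine output time series, otherwise the column assignment below fails
region_data = pd.read_csv('Fal_mooring_flux.tsv', sep='\t', index_col=0)
region_data = region_data.drop([0])

flux_vars = ['OF', 'OK3', 'OSFC', 'OIC1']  # FluxEngine output variables to extract
with Dataset('FalEstuary_output.nc', 'r') as c:
    for v in flux_vars:
        # Load the data (netCDF4 returns a masked array with fill values masked) and remove the
        # length-1 dimensions (lon=1, lat=1, time=1857 -> np.squeeze keeps only the time dimension)
        a = np.squeeze(c[v][:])
        # Replace masked (fill value) entries with NaN
        a = np.ma.filled(a.astype(float), np.nan)
        # Append back to the table, with the units in the column name (as is done in append2insitu)
        region_data[v + ' [' + c[v].units + ']'] = a
region_data.to_csv('flux_final.tsv', sep='\t')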
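The extraction loop relies on two NumPy steps. First, `np.squeeze` drops the length-1 latitude and longitude dimensions so each variable becomes a 1-D time series. Second, fill values are replaced with NaN. Below is a self-contained sketch with a made-up array. The `(time, lat, lon)` shape and the fill value of -999.0 are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

fill_value = -999.0  # assumed fill value for this illustration
raw = np.array([0.1, 0.2, fill_value, 0.15, 0.3]).reshape(5, 1, 1)  # (time, lat, lon)

series = np.squeeze(raw)               # shape (5, 1, 1) -> (5,)
series[series == fill_value] = np.nan  # mark missing values as NaN

table = pd.DataFrame({'Date': pd.date_range('2018-09-14', periods=5, freq=pd.Timedelta(hours=1))})
units = 'g C m-2 day-1'
table['OF [' + units + ']'] = series   # column name carries the units
print(table)
```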

We can easily view and plot the air-sea CO$_2$ gas flux results using some simple Python plotting routines.

In [None]:
# Load merged data
merged_data = pd.read_csv('flux_final.tsv', sep='\t',index_col=0).reset_index(drop=True)

In [None]:
# View top of merged data
merged_data.head()

We need to add our 'Days_since' column to this new merged Dataframe:

In [None]:
# Initialise the new Dataframe column with NaN (a numeric placeholder that is overwritten below)
merged_data['Days_since'] = np.nan

# Produce a datetime object for the first recording
# - the zeros in the line below select the row labelled 0, i.e. the first row (the index starts at zero)
start_date = dt.datetime(merged_data.loc[0,'Year'],merged_data.loc[0,'Month'],merged_data.loc[0,'Day'],
                            merged_data.loc[0,'Hour'],merged_data.loc[0,'Minute'],merged_data.loc[0,'Second'])

# Loop over all rows in the Dataframe, i.e. from 0 to the length of the Dataframe
for i in range(0,len(merged_data)):
    # Get the datetime object for the currently indexed recording (indexed by i)
    future_date = dt.datetime(merged_data.loc[i,'Year'],merged_data.loc[i,'Month'],merged_data.loc[i,'Day'],
                              merged_data.loc[i,'Hour'],merged_data.loc[i,'Minute'],merged_data.loc[i,'Second'])
    
    # Find the difference between the current datetime and the initial datetime
    day_diff = future_date - start_date
    
    # Fill the Dataframe column with the time difference in seconds (found using .total_seconds())
    # divided by 86400 seconds per day, giving the (fractional) number of days that have passed
    merged_data.loc[i,'Days_since'] = day_diff.total_seconds()/(60*60*24)

In [None]:
# Show section of 'Days_since' column for visual check
merged_data[['Datetime', 'Days_since']].head(5)

In [None]:
# Print every column of a single row (position 570) as a spot check of the merged values
print(merged_data.iloc[570])

### Plot the Flux

In [None]:
# Set up a figure with a single axis
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

# Plot data on the ax object; hue=...isna().cumsum() starts a new line segment after each
# missing value, and the palette needs one colour per segment (number of missing values + 1)
flux_gaps = int(merged_data["OF [g C m-2 day-1]"].isna().sum())
sns.lineplot(data=merged_data, x='Date', y='OF [g C m-2 day-1]', hue=merged_data["OF [g C m-2 day-1]"].isna().cumsum(), palette=["blue"]*(flux_gaps + 1), legend=False, markers=True, ax=ax)

# Set plot features
ax.set_ylabel('Flux [g C m-2 day-1]', fontsize=20)
ax.tick_params(labelsize=15)

# Use WeekdayLocator and DateFormatter to show only weekly dates on x-axis
date_fmt = mdates.DateFormatter('%d-%m-%Y')
week_locator = mdates.WeekdayLocator(byweekday=mdates.MO)
ax.xaxis.set_major_locator(week_locator)
ax.xaxis.set_major_formatter(date_fmt)

# Set x-axis tick labels
tick_locs = ax.get_xticks() # Get the current tick locations
ax.set_xticks(tick_locs) # Set the same tick locations
ax.set_xticklabels(merged_data['Date'][::len(merged_data['Date'])//len(tick_locs)][:-1], rotation='vertical')

# Set x axis label
ax.set_xlabel('Date', fontsize=20)

# Show figure!
plt.show()
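To summarise the plot as a single number, the daily flux can be integrated over the deployment with the trapezium rule, using 'Days_since' as the time axis. This gives the net carbon exchanged per square metre over the record. Below is a sketch on made-up values; `integrate_flux` is a hypothetical helper. Intervals touching a missing (NaN) flux are simply skipped, which underestimates the magnitude of the total when there are gaps.

```python
import numpy as np

def integrate_flux(days, flux):
    """Trapezium-rule integral of flux (g C m-2 day-1) over days, giving g C m-2; NaN intervals are skipped."""
    days = np.asarray(days, dtype=float)    # also converts an object-dtype 'Days_since' column
    flux = np.asarray(flux, dtype=float)
    interval_means = 0.5 * (flux[1:] + flux[:-1])  # mean flux over each interval
    widths = np.diff(days)                          # interval lengths in days
    return float(np.nansum(interval_means * widths))  # NaN intervals contribute nothing

example_days = np.linspace(0, 2, 49)  # two days of hourly steps
example_flux = np.full(49, 0.5)       # constant 0.5 g C m-2 day-1
total = integrate_flux(example_days, example_flux)
print(total)  # about 1.0 g C m-2 over two days
```

With the merged table this would be `integrate_flux(merged_data['Days_since'], merged_data['OF [g C m-2 day-1]'])`. The sign of the result follows FluxEngine's sign convention for OF.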

In [None]:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import warnings

# Set up a figure with 7 axes stacked vertically, all sharing the same x-axis
fig, ax = plt.subplots(7, 1, sharex=True, figsize=(15, 25))
# Suppress all warnings
warnings.filterwarnings("ignore")

# Plot the measured variables in the first 6 axes
sns.lineplot(data=region_data, x='Date', y='salinity', color='turquoise', ax=ax[0])
sns.lineplot(data=region_data, x='Date', y='sstskin_k', color='red', ax=ax[1])
sns.lineplot(data=region_data, x='Date', y='windu10', color='green', ax=ax[2])
# hue=...isna().cumsum() starts a new line segment after each missing value;
# the palette needs one colour per segment (number of missing values + 1)
pco2_gaps = int(region_data["pco2sw_corr_split"].isna().sum())
sns.lineplot(data=region_data, x='Date', y='pco2sw_corr_split', color='orange', ax=ax[3], hue=region_data["pco2sw_corr_split"].isna().cumsum(), palette=["blue"]*(pco2_gaps + 1), legend=False, markers=True)
sns.lineplot(data=region_data, x='Date', y='pressure_met', color='purple', ax=ax[4])
sns.lineplot(data=region_data, x='Date', y='pco2_air_noaa_2018', color='black', ax=ax[5])

# Plot the air-sea gas flux in the bottom axis (drawing into ax[6] rather than creating a new,
# overlapping subplot keeps it aligned with, and sharing the x-axis of, the other panels)
flux_col = 'OF [g C m-2 day-1]'
flux_gaps = int(merged_data[flux_col].isna().sum())
sns.lineplot(data=merged_data, x='Date', y=flux_col, hue=merged_data[flux_col].isna().cumsum(), palette=["blue"]*(flux_gaps + 1), legend=False, markers=True, ax=ax[6])

# Set y labels for each subplot
for i, ylabel in enumerate(['Salinity', 'SST (K)', 'Windspeed (m/s)', 'Est. pCO2 (µatm)', 'Atm. pressure (mbar)', 'Atm. pCO2 (ppm)', 'Flux [g C m-2 day-1]']):
    ax[i].set_ylabel(ylabel, fontsize=15)

# Show y-axis tick labels with two decimal places for the last two axes
for i in range(5, 7):
    ax[i].yaxis.set_major_formatter('{x:.2f}')

# The axes share the x-axis, so the date ticks only need setting on the bottom axis
# Use WeekdayLocator and DateFormatter to show only weekly dates on x-axis
ax[6].xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
ax[6].xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))

# Set x-axis tick labels
tick_locs = ax[6].get_xticks()  # Get the current tick locations
ax[6].set_xticks(tick_locs)  # Set the same tick locations
ax[6].set_xticklabels(merged_data['Date'][::len(merged_data['Date'])//len(tick_locs)][:-1], rotation='vertical')

# Set x axis label
ax[6].set_xlabel('Date', fontsize=15)

# Set a tight layout to remove extra space around the plots
fig.tight_layout()
# Leave a small margin at the top of the figure
fig.subplots_adjust(top=0.95)

# Show the combined figure!
plt.show()
