# Using the Fal river mooring data to study the physical oceanography and alkalinity #

## Introduction
This notebook explains how to visualise the mooring data and then use them to estimate alkalinity.

### Loading some Python software packages

To begin with, we need to load some Python packages.

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
from mpl_toolkits.basemap import Basemap
from netCDF4 import Dataset
# Optional: install the basemap-data-hires package to use the 'h' or 'f' map resolutions later

### Loading the mooring in situ data
Next we need to load the in situ data. This is straightforward with the .read_csv() function from the Python Pandas library, which can load comma separated values (csv) or tab separated values (tsv) files. For a csv file the 'sep' keyword (the delimiter) should be set to a comma (sep=','), while for a tsv file it should be set to sep='\t' (which selects tab as the delimiter). Additionally, the 'index_col' keyword is set to 0 to indicate that the first column of the file simply indexes/counts the rows (i.e. it is just a row count rather than actual data). You can try removing this with the example data to see what happens.

Here we will use the 'alkalinity_buoy_hourly.csv' data file, or you can use your own data file. The nrows=200 keyword loads only the first 200 rows of the file. We then show the first 5 rows of the data using the .head() function; you can see the last 5 rows by changing this to .tail().
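The effect of `sep`, `index_col`, `.head()` and `.tail()` can be seen on a tiny table. The sketch below parses the same two made-up rows once as comma separated and once as tab separated text, held in memory with `io.StringIO` as a stand-in for a file; the column names and values are invented for illustration.

```python
import io
import pandas as pd

# Two made-up records, with an unnamed first column that just counts the rows
csv_text = ",salinity,temperature_c\n0,33.1,12.4\n1,33.5,12.6\n"
tsv_text = csv_text.replace(",", "\t")

# Comma separated: sep=','; index_col=0 turns the counting column into the index
csv_df = pd.read_csv(io.StringIO(csv_text), sep=",", index_col=0)
# Tab separated: sep='\t' selects tab as the delimiter
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=0)

# Without index_col the counting column is kept as an ordinary column ('Unnamed: 0')
no_index_df = pd.read_csv(io.StringIO(csv_text), sep=",")

print(csv_df.head(1))             # first row
print(csv_df.tail(1))             # last row
print(list(no_index_df.columns))
```

Both delimiters give the same Dataframe; only the `sep` value has to match the file.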

In [None]:
# Load data file
region_data = pd.read_csv('alkalinity_buoy_hourly.csv', sep=',', index_col=0, nrows=200)
# Show small proportion of the data
region_data.head(5)

### Preparing to plot the mooring data
We want to plot a time series of the recorded data. Here we first plot just a subset of the data (the first 200 records loaded above). The cell below finds the number of hours since the first measurement and stores these values in a new column of the Dataframe.

In [None]:
# Initialise the new Dataframe column and fill it with a placeholder (NaN) value
region_data['Hours_since'] = np.nan

# Produce a datetime object for the first recording 
# - the zeros in the line below select the first row (the index starts at zero)
start_date = dt.datetime(region_data.loc[0,'Year'],region_data.loc[0,'Month'],region_data.loc[0,'Day'],
                            region_data.loc[0,'Hour'],region_data.loc[0,'Minute'],region_data.loc[0,'Second'])

# Loop over all rows in the Dataframe - i.e. from 0 to the length of the Dataframe minus 1
for i in range(0,len(region_data)):
    # Get the datetime object for the currently indexed recording - indexed by i
    future_date = dt.datetime(region_data.loc[i,'Year'],region_data.loc[i,'Month'],region_data.loc[i,'Day'],
                              region_data.loc[i,'Hour'],region_data.loc[i,'Minute'],region_data.loc[i,'Second'])
    
    # Find the difference between the current datetime and the initial datetime
    time_diff = future_date - start_date
    
    # Fill the Dataframe column with the time difference in seconds (found using .total_seconds())
    # divided by 3600 (the number of seconds in an hour), giving the hours that have passed
    region_data.loc[i,'Hours_since'] = time_diff.total_seconds()/(60*60)
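The loop above works, but pandas can also build every timestamp at once. A minimal sketch, assuming the `Year`, `Month`, `Day`, `Hour`, `Minute` and `Second` columns hold whole numbers as in the mooring file; here they are filled with three made-up records so the example runs on its own, and `hours_since_start` is a hypothetical helper name.

```python
import pandas as pd

def hours_since_start(df):
    """Return the hours elapsed since the first row, built from date/time columns."""
    # pd.to_datetime can assemble datetimes from a Dataframe whose columns are
    # named year, month, day, hour, minute, second, so rename them to lower case
    parts = df[['Year', 'Month', 'Day', 'Hour', 'Minute', 'Second']].rename(columns=str.lower)
    timestamps = pd.to_datetime(parts)
    # Subtract the first timestamp and convert the time differences to hours
    return (timestamps - timestamps.iloc[0]).dt.total_seconds() / 3600

# Three made-up records standing in for the mooring data
example = pd.DataFrame({'Year': [2021] * 3, 'Month': [6] * 3, 'Day': [1] * 3,
                        'Hour': [10, 11, 12], 'Minute': [0, 0, 30], 'Second': [0] * 3})
example['Hours_since'] = hours_since_start(example)
print(example['Hours_since'].tolist())  # [0.0, 1.0, 2.5]
```

The vectorised version avoids a Python-level loop, which matters once the full file (rather than 200 rows) is loaded.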

To help view our data, we can filter the Dataframe to show just the 'Datetime' and 'Hours_since' columns. This allows us to check easily that the previous commands have worked.

In [None]:
# Filter data to 'Datetime' and 'Hours_since' columns and show first 5 rows.
region_data[['Datetime', 'Hours_since']].head(5)

### Plotting the mooring time series

The Matplotlib .subplots() function is ideal for creating a figure containing multiple panels (axes), and we can then use the Seaborn package to create the actual time series plots. These two packages work well together because Seaborn is built on top of Matplotlib, and Seaborn also integrates easily with Pandas Dataframes.

Producing nice-looking plots with professional axis labels, colours, etc. can be time consuming, but it is worth it for any written work, so refer to the Matplotlib and Seaborn documentation and other online resources (e.g. StackOverflow examples or tutorials) for hints and tips.
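To see how `plt.subplots()` returns an array of axes that can then be drawn on one at a time, here is a minimal sketch with two stacked panels sharing an x axis. It uses made-up data and plain Matplotlib `ax.plot()` instead of Seaborn to keep it self-contained; the `Agg` backend only lets it run without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a screen
import matplotlib.pyplot as plt
import numpy as np

hours = np.arange(0, 48, 0.5)                  # made-up time axis in hours
fake_salinity = 33 + np.sin(hours / 4)         # invented signal
fake_temperature = 12 + 0.5 * np.cos(hours / 8)

# 2 rows x 1 column: ax is a 1-D array, ax[0] on top and ax[1] underneath
fig, ax = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
ax[0].plot(hours, fake_salinity, color='turquoise')
ax[1].plot(hours, fake_temperature, color='red')
ax[0].set_ylabel('Salinity')
ax[1].set_ylabel('Temperature (celsius)')
# The x axis is shared, so labelling the bottom panel is enough
ax[1].set_xlabel('Hours since start')
fig.tight_layout()
```

With Seaborn, each `ax.plot(...)` call would be replaced by `sns.lineplot(..., ax=ax[i])`, as in the cell below.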

In [None]:
# Set up a figure with 2 axes stacked vertically. sharex=True means both axes share the same x axis (can help with clarity)
fig,ax = plt.subplots(2,1, sharex=True)
# Set figure height and width
fig.set_figheight(15)
fig.set_figwidth(15)

### PLOTTING THE DATA ### (- *s indicate a plot keyword below)
# These Seaborn commands state that we want a *lineplot*, where the *data* is coming 
# from our region_data Dataframe, and we choose the *x* & *y* columns that we want, as well
# as the axis (*ax*) we want to plot on (indexed by 0 at the top and 1 at the bottom)
sns.lineplot(data=region_data, x='Hours_since', y='salinity', color='turquoise', ax=ax[0])
sns.lineplot(data=region_data, x='Hours_since', y='temperature_c', color='red', ax=ax[1])

# Set the x axis label on the bottom axis (the x axis is shared by both plots)
ax[1].set_xlabel(f'Hours since {region_data.loc[0,"Datetime"]}', fontdict={'size':15})

# Set y label for each axis
ax[0].set_ylabel('Salinity', fontsize = 15) 
ax[1].set_ylabel('Temperature (celsius)', fontsize = 15)

# Show the y axis tick labels with two decimal places
# - you can comment these out with # to see the effect when removed
ax[0].yaxis.set_major_formatter('{x:.2f}')
ax[1].yaxis.set_major_formatter('{x:.2f}')

# Set a tight layout to remove extra space around the plots
fig.tight_layout()
# Leave a small margin at the top of the figure (e.g. for a title added with fig.suptitle())
fig.subplots_adjust(top=0.95)

# Show figure!
plt.show()
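It is worth checking what a tick-format string actually does. Matplotlib's `set_major_formatter()` accepts a plain string and formats each tick with it via `str.format`, passing the tick value as `x`. The pure-Python sketch below shows that a spec such as `'{x:9<5.2f}'` treats `9` as a *fill* character, so any label shorter than 5 characters is padded with 9s and reads as a different number, whereas `'{x:.2f}'` simply gives two decimal places.

```python
# In '{x:9<5.2f}' the format spec reads: fill='9', align='<' (left),
# width=5, precision=2, type='f' (fixed point)
tick_values = [32.1, 8.5]
padded = ['{x:9<5.2f}'.format(x=v) for v in tick_values]
plain = ['{x:.2f}'.format(x=v) for v in tick_values]
print(padded)  # ['32.10', '8.509'] - the shorter label gets a spurious trailing 9
print(plain)   # ['32.10', '8.50']
```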

### Using the salinity data to estimate the alkalinity

Previous work has provided us with a linear relationship between salinity and alkalinity for an example riverine environment, so we can estimate the alkalinity (in µmol/kg) using:

$\text{Alkalinity} = 889 + 37.9 \times \text{Salinity}$
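As a quick check of the arithmetic, the relationship can be wrapped in a small function. `estimate_alkalinity` is a hypothetical helper name; the coefficients 889 and 37.9 are the ones quoted above and only apply to the environment they were derived for.

```python
import pandas as pd

def estimate_alkalinity(salinity):
    """Estimate alkalinity (µmol/kg) from salinity using the linear relationship above."""
    return 889 + 37.9 * salinity

single = estimate_alkalinity(35)   # 889 + 37.9 * 35 = 2215.5
# The same function works element-wise on a pandas Series, e.g. region_data['salinity']
series = estimate_alkalinity(pd.Series([30.0, 33.0]))
print(single, series.tolist())
```

At zero salinity the estimate is just the intercept, 889 µmol/kg.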

What can you say about the alkalinity and how it varies with time? What is the nature of the cyclic pattern in the data?

In [None]:
# Create a new column in the Dataframe for alkalinity
region_data['alkalinity'] = 889 + 37.9*(region_data['salinity']) 

In [None]:
# Plot alkalinity - uses the same commands as above, but as only one plot is required we don't need to index the axes
fig, ax = plt.subplots(1,1, figsize=(15,5))
sns.lineplot(data=region_data, x='Hours_since', y='alkalinity', color="green", ax=ax)
ax.set_xlabel(f'Hours since {region_data.loc[0,"Datetime"]}', fontdict={'size':15})
ax.set_ylabel('Alkalinity (µmol/kg)', fontsize = 15) 
ax.yaxis.set_major_formatter('{x:.2f}')
plt.show()

### Displaying the mooring location on a simple map

Now we want to display the location of the mooring on a simple map of the Fal estuary. There are multiple Python packages that could do this, but here we just want a simple approach, so we will use Basemap (a Matplotlib toolkit, imported from mpl_toolkits.basemap and installed separately). Note: if you are familiar with GIS and producing Shapefiles then you could attempt to use GeoPandas.

First we need to determine the minimum and maximum longitude and latitude for our data to give us an idea of the geographical region to plot.

In [None]:
# Get min and max of longitude and latitude
region_data.describe().loc[['min','max'],['Lon','Lat']]
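`.describe()` computes many statistics and `.loc` then keeps only the min and max rows; `DataFrame.agg()` can compute just the requested statistics directly. A sketch on a few invented positions standing in for the `Lon` and `Lat` columns:

```python
import pandas as pd

# Invented positions standing in for the 'Lon' and 'Lat' columns
positions = pd.DataFrame({'Lon': [-5.05, -5.05, -5.04], 'Lat': [50.17, 50.17, 50.18]})

# describe() then .loc: compute all summary statistics, keep the min and max rows
via_describe = positions.describe().loc[['min', 'max'], ['Lon', 'Lat']]
# agg(): compute only the minimum and maximum
via_agg = positions[['Lon', 'Lat']].agg(['min', 'max'])
print(via_agg)
```

Both give the same table; for a stationary mooring the min and max rows would be identical.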

This has just identified the location of the mooring, as it was stationary (and luckily did not go wandering off!). The code in the cells below has been written to allow easy plotting of the mooring data, but it can also be used to create your own plot for other datasets. First run the next cell, which sets up the region definitions needed by the Basemap plotting code (i.e. nothing visual will happen when you run this cell). We have included a second region definition (called Agulhas) to illustrate how you can use this code for your own plots.

In [None]:
def get_coords(location):
    if location == 'Alkalinity Buoy':
        lon_min = -5.2
        lon_max = -4.9
        lat_min = 50.1
        lat_max = 50.25
        return lon_min, lon_max, lat_min, lat_max
    
    elif location == 'Agulhas':
        lon_min = 19.7
        lon_max = 20.2
        lat_min = -35.0
        lat_max = -34.7
        return lon_min, lon_max, lat_min, lat_max
    
    else:
        # input() returns text, so convert each value to a float before comparing them
        lon_min = float(input('Enter minimum longitude (most Westerly): '))
        lon_max = float(input('Enter maximum longitude (most Easterly): '))
        lat_min = float(input('Enter minimum latitude (most Southerly): '))
        lat_max = float(input('Enter maximum latitude (most Northerly): '))
        print('\n\n')
        if (lon_min >= lon_max) or (lat_min >= lat_max):
            print('Check if min/max were entered in the correct order (is a min greater than a max?)')
            return np.nan, np.nan, np.nan, np.nan
        
        return lon_min, lon_max, lat_min, lat_max
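If you add many regions, a dictionary can be easier to extend than a chain of `if`/`elif` blocks. A minimal sketch: `REGIONS` and `get_region_bounds` are hypothetical names, the two entries copy the bounds used in `get_coords()`, and an unknown name raises an error instead of prompting for input.

```python
# Hypothetical lookup table: name -> (lon_min, lon_max, lat_min, lat_max)
REGIONS = {
    'Alkalinity Buoy': (-5.2, -4.9, 50.1, 50.25),
    'Agulhas': (19.7, 20.2, -35.0, -34.7),
}

def get_region_bounds(location):
    """Return (lon_min, lon_max, lat_min, lat_max) for a named region."""
    if location not in REGIONS:
        raise KeyError(f'Unknown region {location!r}; add it to REGIONS first')
    lon_min, lon_max, lat_min, lat_max = REGIONS[location]
    # Same sanity check as get_coords(): each minimum must be below its maximum
    if lon_min >= lon_max or lat_min >= lat_max:
        raise ValueError(f'Bounds for {location!r} are in the wrong order')
    return lon_min, lon_max, lat_min, lat_max

print(get_region_bounds('Agulhas'))
```

Adding a new region is then a single new line in `REGIONS`.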

In the cell below we choose a region, get its longitude and latitude bounds using the get_coords() function defined above, and print them to the screen.

In [None]:
### Change the string to 'Alkalinity Buoy', 'Agulhas' or your own region name (any other name will ask you to enter the bounds)
region_name = 'Alkalinity Buoy'
# Look up (or enter) the bounds for the chosen region
lon_min, lon_max, lat_min, lat_max = get_coords(region_name)

# Note: if you're having problems with the input fields you can uncomment the line below 
# and just enter the values instead, but also comment out the get_coords() line above to avoid confusion.

# lon_min, lon_max, lat_min, lat_max = __ , __ , __ , __

# Print out current values
print('Current values:')
print(f'Longitude -> \t min:{lon_min} \t max:{lon_max}')
print(f'Latitude -> \t min:{lat_min} \t max:{lat_max}')

Now that we have the longitude and latitude bounds we can create the map. The code below initialises the figure (1), defines our map (2), fills in the land masses (3), adds the gridlines (4), and then plots the location of the dataset (5).

The current resolution is set to 'i' (for intermediate), which produces a relatively coarse coastline. You could install the basemap-data-hires package and then use 'f' (for full) to get a higher resolution map.

In [None]:
# 1) Initialise the figure and set the figure size
fig = plt.gcf()
fig.set_size_inches(20,10, forward=True)

# 2) Define the map 
# Here we have a cylindrical equidistant projection bound by our chosen latitude and longitudes,
# and a chosen resolution ('i' = intermediate)
m = Basemap(projection='cyl',
            llcrnrlat=lat_min,urcrnrlat=lat_max,
            llcrnrlon=lon_min,urcrnrlon=lon_max,
            resolution='i')

# 3) Fill land masses with green colour
m.fillcontinents(color='green')

# 4) Draw map gridlines - the 'split_lat' and 'split_lon' have been set to show a 0.05x0.05 degree 
# grid, which reflects that given in the ESA CCI data (covered next)
split_lon = round((lon_max - lon_min)/0.05) + 1
lons = np.linspace(lon_min,lon_max,split_lon)
m.drawmeridians(lons,labels=[1,0,0,1])

split_lat = round((lat_max - lat_min)/0.05) + 1
lats = np.linspace(lat_min,lat_max,split_lat)
m.drawparallels(lats,labels=[1,0,0,1])


# 5) Convert the mooring longitudes/latitudes to map coordinates and plot them as red dots
track_lon, track_lat = m(np.asarray(region_data['Lon']),np.asarray(region_data['Lat']))
plt.scatter(track_lon,track_lat, s=10, marker='o', color='Red') 


plt.show()
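The gridline positions above come from `np.linspace`, with the number of lines chosen so that they are about 0.05° apart. A standalone sketch of that arithmetic (`gridline_positions` is a hypothetical helper) shows that the spacing is exactly 0.05° only when the range is a multiple of 0.05; otherwise `round()` picks the nearest whole number of intervals and the spacing stretches or shrinks slightly.

```python
import numpy as np

def gridline_positions(v_min, v_max, step=0.05):
    """Evenly spaced gridline positions from v_min to v_max, roughly `step` apart."""
    n_lines = round((v_max - v_min) / step) + 1
    return np.linspace(v_min, v_max, n_lines)

lons = gridline_positions(-5.2, -4.9)   # 0.30 degree range -> 7 lines
lats = gridline_positions(50.1, 50.25)  # 0.15 degree range -> 4 lines
odd = gridline_positions(0.0, 0.12)     # 0.12 / 0.05 = 2.4 rounds to 2 -> 3 lines, 0.06 apart
print(lons, lats, odd)
```

The same function could replace the `split_lon`/`lons` and `split_lat`/`lats` lines in the map cell.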