### `Analysis of 1D Timeseries SST Data`

<div class = "alert alert-info" role = "alert" 
     style = "font-size: 1.2em; padding: 15px; margin: 0px 0; text-align: center">
    
    In this tutorial, we will be assessing if Ocean Temperatures have been changing 
    over time in our area of interest.

</div>

<div class="alert alert-info" role="alert"
     style="color:#000; font-size:1.1em; background-color:white; padding:10px; margin:1px; text-align:left;">


### Daily [Sea Surface temperature (SST)](https://data.marine.copernicus.eu/product/SST_GLO_SST_L4_REP_OBSERVATIONS_010_011/description) product available from the Copernicus Marine Service, [CMEMS](https://marine.copernicus.eu).

### ▶ For data downloading, YOU need to follow these steps:
- `Register` on CMEMS so you can download their freely available datasets. You should get an email validation link within minutes.
- Once your account is validated, log in, and type `OSTIA` in the `FREE TEXT SEARCH` box (top left-hand-side), press enter to load results.
- Hover over the map thumbnail then click on the blue `VIEW PRODUCT` banner -- this should open the `MyOcean Pro` global map.
- Navigate to Fuerteventura, the second largest of Spain’s Canary Islands, where an endemic species of limpet is facing extinction despite concerted conservation efforts. 
- Click on any part of the map near Fuerteventura - the pop-up graph from your selected point in space should show SST over time. 
- Click on `...` on `Graph Options` and export the SST data in **`CSV`** (Comma Separated Values) format.
- Move your downloaded CSV file to the `data` folder nested in the folder you are running your Jupyter Notebooks from.

</div>

<div class="alert alert-info" role="alert" 
     style="font-size: 1.em; text-align: left; border-radius: 8px;">

     Assessing SST... 
     How would we instruct a computer to do this for us? 
     
     When you're still learning to code, it helps to first write out the steps you would need to analyse your data. 
     Code - whether it is Python or R - is a type of instructive language that a computer "understands". In other words, 
     coding is the process of giving your computer clear instructions so that it can do your work for you.
     

</div>

<div class="alert alert-info" role="alert"
     style="color:#000; font-size:1.1em; background-color:white; padding:10px; margin:1px; text-align:left;">

    
### ▶ For data processing, Python needs to follow these steps:
1. Import the libraries needed for (i) data analysis (`numpy`, `pandas`) and (ii) plotting (`matplotlib`).
1. Load the CMEMS SST data into a neat table that is easy to work with, say a `dataframe` called `df_sst`.
1. Let me LOOK at the information inside the dataframe I just made:
    - was this import successful?
    - are my successfully imported data in an ideal format?
1. If no, help me to fix my problems. For example, convert `UTC` to a Date format that is nicer to work with.
1. Let me see variations of SST through time with a `line plot` (Fig 1).
1. Help me do some analysis, like finding the `min`, `max`, and `mean` SST.
1. Add a red horizontal line of `mean` plus stars to show positions of the `min` and `max` SST (Fig 2).
1. Calculate `slope` and `intercept` (linear regression of SST over time); add this to my plot (Fig 3).
1. Check if the change in SST we see here is statistically significant using a `Mann-Kendall` test.
1. Show me features and distributions of SST over the full time series with a table and a `boxplot`.

</div>

### Step 1 ▶ ALWAYS first import the `libraries` you'll need -- basic setup

In [None]:
# Data Handling and Manipulation
import os
import numpy as np
import pandas as pd

# Date Handling and Manipulation
import matplotlib.dates as mdates
from matplotlib.dates import num2date

# Data Visualisation and Plots
import matplotlib.pyplot as plt

# Statistical Testing
import pymannkendall as mk
from scipy.stats import linregress

# Suppress Warnings (not errors)
import warnings
warnings.filterwarnings('ignore')

### Step 2 ▶ Import the data into a dataframe using `pandas` `read_csv` function

In [None]:
# Step 2a: - Give directions to your data (set the path) 
#          - Give your long filename a short name, file

path = "data/"
file = "P3_TestData_MET-Global-SST-L4.csv"


In [None]:
# Step 2b: Import your sst data into pandas dataframe, df_sst
# comment = '#' tells pandas to skip anything after a hashtag, such as metadata lines
df_sst = pd.read_csv(os.path.join(path, file), comment = '#')
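
To see what `comment = '#'` does, here is a minimal sketch using a made-up, in-memory file instead of the real download. The contents of `raw` and the name `demo_df` are assumptions for illustration only; the two column names are the ones used later in this notebook.

```python
import io
import pandas as pd

# Hypothetical miniature of a CSV export: metadata lines start with '#'
raw = io.StringIO(
    "# product: example SST export (made-up values)\n"
    "# units: degrees Celsius\n"
    "time,analysed_sst\n"
    "2020-01-01T12:00:00.000Z,19.8\n"
    "2020-01-02T12:00:00.000Z,19.7\n"
)

# Lines starting with '#' are skipped, so the header is read from line 3
demo_df = pd.read_csv(raw, comment="#")

print(demo_df.columns.tolist())
print(len(demo_df), "data rows")
```

Without `comment = '#'`, pandas would try to read the first metadata line as the header row.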


### Step 3 ▶ Use `info` and `print` to look inside dataframe, and check success

In [None]:
# Info Dump!
df_sst.info()

In [None]:
# Print first n lines using .head(n)
# or use .tail(n) to print last few!
print(df_sst.tail(3))

<div class="alert alert-info" role="alert" 
     style="font-size: 1.0em; text-align: center; border-radius: 8px;">
    
    Dates are provided as 'time' in UTC (Coordinated Universal Time), which can be tricky to work with. 
    To simplify, we can convert time to dates using the to_datetime function in the pandas library. 

</div>

### Step 4 ▶ Basic Data Analysis including use of `pandas` `datetime` function to fix Dates

<div class="alert alert-info" role="alert"
     style="color:#000; font-size:1em; background-color:white; padding:10px; margin:1px; text-align:left;">

#### ⇑⇓ Quick Reminder ⇑⇓
    By default, imported dates / times will be data type 'object'. Here, 'time' was in UTC (Coordinated Universal Time), 
    which is tricky to work with. 
    
    We will now (1) apply pandas.to_datetime function to successfully convert the 'object' time to 'datetime64[ns]' and 
    (2) save output in a new column 'Dates' in dataframe, df_sst. 
    
    This is going to be easier to work with, particularly for the upcoming plotting of our data over the time-series.
    
    
</div>    
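
Before converting the real column, it may help to try the conversion on two made-up timestamps. This is only a sketch: `demo_time` and `demo_dates` are hypothetical names, and the strings are assumed to follow the same ISO style as the 'time' column.

```python
import pandas as pd

# Made-up timestamps in the same ISO style as the 'time' column
demo_time = pd.Series(["2020-01-01T12:00:00.000Z", "2020-01-02T12:00:00.000Z"])

# Parse as UTC, then drop the timezone label to get plain datetimes
demo_dates = pd.to_datetime(demo_time, format="%Y-%m-%dT%H:%M:%S.%fZ",
                            utc=True).dt.tz_convert(None)

print(demo_time.dtype)                # text, before conversion
print(demo_dates.dtype)               # datetime, after conversion
print(demo_dates.dt.year.tolist())    # date parts are now easy to extract
```

Once a column is a datetime type, the `.dt` accessor gives direct access to parts such as the year or month.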

In [None]:
# Convert object 'time' to datetime 'Dates'
#- Add new 'Dates' column to df_sst and fill it with converted UTC 'time' values 
#- floor('min') rounds down to the minute; tz_convert(None) drops the UTC timezone label
df_sst['Dates'] = pd.to_datetime(df_sst['time'], format = '%Y-%m-%dT%H:%M:%S.%fZ', 
                                 utc = True).dt.floor('min').dt.tz_convert(None)
print(df_sst['time'].dtype )
print(df_sst['Dates'].dtype)
print() # --- prints an empty line ---

print(df_sst.head(3))

In [None]:
# OPTIONAL commands for neatening dataframe:

# 1: Drop the 'time' column  
# - we won't need it as we have 'Dates'
df_sst = df_sst.drop(columns = ['time'])

# 2: Reorder columns so 'Dates' comes first
df_sst = df_sst[['Dates','analysed_sst']]

# 3. Rename 'analysed_sst' header for simplicity
df_sst = df_sst.rename(columns = {'analysed_sst': 'SST'})

# Display with print
print(df_sst.head(5))

<div class="alert alert-info" role="alert"
     style="color:#000; font-size:1em; background-color:white; padding:10px; margin:1px; text-align:left;">

#### ⇑⇓ Quick Reminder ⇑⇓

    To make a line of code 'active' simply remove the hashtag then run the cell.
    
    Thus, if you want to save your nicely formatted dataframe, df_sst, remove the hashtag from the cell below and 
    run the code to create a new .csv (comma-separated) file on your computer. 
    ---
` Tip: Check the directory where you are running this notebook from to find it.`
    
</div>


In [None]:
#df_sst.to_csv('My_SST_datafile.csv', index = False)

### Step 4 (continued) ▶ Basic Data Analysis including use of `nunique` function to count Years

In [None]:
# Use 'nunique()' to count number of years
num_yrs = df_sst['Dates'].dt.year.nunique()

# Length of dataframe = number of days
num_day = len(df_sst)

# Print number of years, days in timeseries 
print("My Timeseries has: ")
print(' -',num_yrs, "years") 
print(' -',num_day, "days ")
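
The row count equals the number of days only when the series has one row per day and no gaps. A minimal sketch of how this could be checked, using made-up dates with one day deliberately missing (`demo`, `span_days`, `expected`, and `missing` are hypothetical names):

```python
import pandas as pd

# Made-up daily dates with one day (3 Jan) missing
demo = pd.DataFrame({"Dates": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"])})

# Days spanned from first to last record, inclusive
span_days = (demo["Dates"].max() - demo["Dates"].min()).days + 1

# Dates expected in a complete daily series but absent from the data
expected = pd.date_range(demo["Dates"].min(), demo["Dates"].max(), freq="D")
missing = expected.difference(pd.DatetimeIndex(demo["Dates"]))

print("Days spanned :", span_days)
print("Rows present :", len(demo))
print("Missing dates:", list(missing.strftime("%Y-%m-%d")))
```

If the span and the row count agree and no dates are missing, treating the number of rows as the number of days is safe.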

### Step 5 ▶ `Line plot` of SST over the full time-series

In [None]:
# Figure 1:
fig1, ax = plt.subplots(figsize = (15, 4))

# Plot the data using 's' square markers in blue and a dotted line ':'
# Add a label so that the legend has something to show
plt.plot(df_sst['Dates'], df_sst['SST'], linestyle = ':', marker = 's', 
         markersize = 2, color = 'royalblue',
         label = 'CMEMS SST °C')

# Format x,y axes
# Add bold labels
ax.set_xlabel('Time', fontweight = 'bold')
ax.set_ylabel('Sea Surface Temperature (°C)', weight = 'bold')

# Format figure
# Add gridlines
ax.grid(True, color = 'silver', linestyle = ':', linewidth = 0.5)

# Add title with f-string
ax.set_title(f"Sea Surface Temperatures over {num_yrs} years", 
             fontsize = 12)     # Change the size of the font

# Show legends
ax.legend()

# Show the plot
plt.show()

### Step 6 ▶ Basic Data Analysis including finding of `min`, `max`, and `mean` SST

In [None]:
# We can calculate data features with pandas, pd

# MEAN SST
mean_sst = df_sst['SST'].mean()
# round to n=2 decimal places
mean_sst = round(mean_sst, 2)

# Print Mean
print('SST °C')
print('mean =', mean_sst)

# MIN, MAX
# Find position (idx) of min and max sst
idx_min, idx_max = df_sst['SST'].idxmin(), df_sst['SST'].idxmax()

# Rows with date, min and max SST values
sst_min, sst_max = df_sst.loc[[idx_min]],  df_sst.loc[[idx_max]]

# Actual min SST value, rounded to 2dps
min_sst = round(df_sst['SST'].min(), 2)
# Actual max SST value, rounded to 2dps
max_sst = round(df_sst['SST'].max(), 2)

# Print Min
print('min  =', min_sst)
# Print Max
print('max  =', max_sst)
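
The difference between `min()` and `idxmin()` is easiest to see on a tiny made-up table. In this sketch, `demo`, `idx`, and `row` are hypothetical names and the values are invented.

```python
import pandas as pd

# Made-up three-day table
demo = pd.DataFrame({
    "Dates": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "SST": [19.8, 19.5, 20.1],
})

# min() returns the smallest value; idxmin() returns the row label where it occurs
idx = demo["SST"].idxmin()

# Double brackets keep the result as a one-row dataframe (date and value together)
row = demo.loc[[idx]]

print("Lowest SST :", demo["SST"].min())
print("Found at   :", idx)
print(row)
```

Keeping the whole row is what lets the min and max be plotted at their dates.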

### Step 7 ▶ `Line plot` of SST over time ['Dates'] with `Mean`, `Min`, and `Max` shown

In [None]:
# Figure 2:
fig2, ax = plt.subplots(figsize = (15, 4))

# Plot the data using 's' square markers in blue with dotted line ':'
# This time, also add a label to your SST 
plt.plot(df_sst['Dates'], df_sst['SST'], linestyle = ':', marker = 's', 
         markersize = 2, color = 'royalblue',
         label = 'CMEMS SST °C')

# Plot the mean as a red dashed line '--' of 1.5 width, add a label
plt.plot([df_sst['Dates'].min(), df_sst['Dates'].max()], [mean_sst, mean_sst],
         linestyle = '--', color = 'red', linewidth = 1.5, 
         label = f'Mean: {mean_sst}°C')

# Plot a star marker to highlight min and max SST at their Dates
plt.plot(sst_min['Dates'], sst_min['SST'], marker = '*', markersize = 10, 
         color = 'navy' )
plt.plot(sst_max['Dates'], sst_max['SST'], marker = '*', markersize = 10, 
         color = 'orangered')

# Add axis labels using bold font, size 11
ax.set_xlabel('Time', fontsize = 11, weight = 'bold')
ax.set_ylabel('Sea Surface Temperature (°C)', fontsize = 11, weight = 'bold')

# Grid formatting
ax.grid(True, color = 'silver', linestyle = ':', linewidth = 0.5)

# Display
ax.legend()
plt.show()

### Step 8 ▶ `Line plot` of SST over time ['Dates'] with addition of `Linear Regression`

<div class="alert alert-info" role="alert" 
     style="font-size: 1.1em; text-align: center; border-radius: 8px;">

    Linear Regression: Ordinary Least Squares (OLS)
</div>
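
OLS picks the slope and intercept that minimise the sum of squared vertical distances between the line and the data. For a single predictor this has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and intercept = ȳ − slope · x̄. The sketch below checks that formula against `linregress` on made-up numbers; `x_demo`, `y_demo`, `m_demo`, and `c_demo` are hypothetical names.

```python
import numpy as np
from scipy.stats import linregress

# Made-up example data: x = day number, y = temperature in °C
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = np.array([20.0, 20.2, 20.1, 20.4, 20.3])

# Closed-form OLS estimates
x_mean, y_mean = x_demo.mean(), y_demo.mean()
m_demo = np.sum((x_demo - x_mean) * (y_demo - y_mean)) / np.sum((x_demo - x_mean) ** 2)
c_demo = y_mean - m_demo * x_mean

# Same fit using scipy
fit = linregress(x_demo, y_demo)

print(f"By hand    : slope = {m_demo:.4f}, intercept = {c_demo:.4f}")
print(f"linregress : slope = {fit.slope:.4f}, intercept = {fit.intercept:.4f}")
```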

In [None]:
# Make Dates Numeric for Regression
x = mdates.date2num(df_sst['Dates'])
y = df_sst['SST']

# Linear Regression:
# slope m, intercept c
m, c, *_ = linregress(x, y)

# Change over timeseries
change = m * num_day

print(f"Slope,m : {m :.5f}°C per day")
print(f"Change  : {change:.2f}°C over time-series ({num_day} days)")
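
A slope in °C per day is a very small number, so trends are often easier to read per decade. A minimal sketch of the conversion, using a made-up slope rather than a result from the data (`slope_per_day` and `slope_per_decade` are hypothetical names, and 365.25 days per year is an assumed average):

```python
# Hypothetical slope, for illustration only (not a result from the data)
slope_per_day = 0.0001             # °C per day

# Average year length, allowing for leap years
days_per_year = 365.25

# Scale from per-day to per-decade
slope_per_decade = slope_per_day * days_per_year * 10

print(f"{slope_per_decade:.3f} °C per decade")
```

With the real fit, `slope_per_day` would be replaced by the slope `m` from the regression.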

In [None]:
# prepare fit line for plotting (next cell)
x_fit = np.linspace(x.min(), x.max(), 100)
y_fit = c + m * x_fit

In [None]:
# Figure 3:
fig3, ax = plt.subplots(figsize = (15, 4))

# Plot the data using 's' square markers in blue with dotted line ':'
plt.plot(df_sst['Dates'], df_sst['SST'], linestyle = ':', marker = 's', 
         markersize = 2, color = 'royalblue',
         # add a label
         label = 'CMEMS SST °C')

# Plot the regression fit line in red, 1.5 width
ax.plot(num2date(x_fit), y_fit, color = 'red', linewidth = 1.5, 
        # add a label
        label = f"y = {m :.5f}x + {c :.2f}")

# Add axis labels using bold font, size 11
ax.set_xlabel('Time', fontsize = 11, weight = 'bold')
ax.set_ylabel('Sea Surface Temperature (°C)', fontsize = 11, weight = 'bold')

# Grid formatting
ax.grid(True, color = 'silver', linestyle = ':', linewidth = 0.5)

# Display
ax.legend()
plt.show()

### Step 9 ▶ Test if the change in SST over time (`trend`) is statistically `significant`

<div class="alert alert-info" role="alert" 
     style="font-size: 1.1em; text-align: center; border-radius: 8px;">

    Statistical Testing: Mann-Kendall Test
</div>

#### The `Mann-Kendall` Test is used to determine whether or not a significant `trend` exists in timeseries data. It is a `non-parametric` test, meaning there is no underlying assumption made about `normality`.

    `OUTPUT-`
    - 0. trend: Direction of the trend: increasing, decreasing, or no trend.
    - 1. h: True if a significant trend is present. False if not.
    - 2. p: The p-value of the test.
    - 3. z: The normalized test statistic.
    - 4. Tau: Kendall's Tau.
    - 5. s: Mann-Kendall's score
    - 6. var_s: Variance of S
    - 7. slope: Theil-Sen estimator/slope
    - 8. intercept: Intercept of Kendall-Theil Robust Line
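
The score `s` is simple to compute by hand: compare every value with every later value, add 1 when the later value is higher, subtract 1 when it is lower. `Tau` is that score divided by the number of pairs. The sketch below is a plain illustration of that counting, not the package's own code; `mann_kendall_s` is a hypothetical helper and the data are made up.

```python
import numpy as np

def mann_kendall_s(values):
    """Return the Mann-Kendall score S and Kendall's Tau for a 1D series."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    s = 0.0
    # Compare each value with all values that come after it
    for i in range(n - 1):
        s += np.sign(values[i + 1:] - values[i]).sum()
    n_pairs = n * (n - 1) / 2
    return int(s), s / n_pairs

# Made-up series with a mostly upward tendency
s_demo, tau_demo = mann_kendall_s([20.0, 20.2, 20.1, 20.4, 20.3])
print("S =", s_demo, "| Tau =", tau_demo)
```

A large positive `s` means most later values are higher than earlier ones; the p-value then tells us whether that could plausibly be chance.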

#### If you are worried about autocorrelation in your data, use a modified Mann Kendall test instead.
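
One rough way to judge whether autocorrelation is a concern is the lag-1 autocorrelation: how strongly each value resembles the one before it. The sketch below uses made-up series, not the SST data; `noise` and `walk` are hypothetical names. The `pymannkendall` package documents modified tests such as `hamed_rao_modification_test` and `yue_wang_modification_test`; check its documentation for the current list and options.

```python
import numpy as np
import pandas as pd

# Fixed seed so the made-up data are reproducible
rng = np.random.default_rng(0)

# White noise: each value is independent of the previous one
noise = pd.Series(rng.normal(size=500))

# Random walk: each value builds on the previous one (strongly autocorrelated)
walk = noise.cumsum()

# Lag-1 autocorrelation: close to 0 = little memory, close to 1 = strong memory
r1_noise = noise.autocorr(lag=1)
r1_walk = walk.autocorr(lag=1)

print(f"White noise lag-1 autocorrelation: {r1_noise:.2f}")
print(f"Random walk lag-1 autocorrelation: {r1_walk:.2f}")
```

Daily SST usually changes little from one day to the next, so a value near 1 for `df_sst['SST'].autocorr(lag = 1)` would not be surprising.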

In [None]:
# Perform Mann Kendall Test
trend_mk = mk.original_test(df_sst['SST'])

# All MK Output
print(trend_mk)

In [None]:
# Results in a friendly format
print("Friendly format M_K")
print("• Trend:", trend_mk[0])
print("• P-val:", "%.6f" % trend_mk[2])
print("• Slope:", "%.6f" % trend_mk[7])

### Step 10 ▶ `Tabulate` key features and show distribution of SST values with a `boxplot`

<div class="alert alert-info" role="alert" 
     style="font-size: 1.em; text-align: center; border-radius: 8px;">
    
    Tabulate SST Parameters over the Time series
    
</div>

In [None]:
params = {"Min °C": [min_sst],                         # already calculated min SST
          "Max °C": [max_sst],                         # already calculated max SST
          "Median": [round(df_sst['SST'].median(),2)], # find median, round to 2 dps
          "Mean  ": [mean_sst],                        # already calculated mean SST
          "Sig. Trend": [trend_mk[0]]}                 # from Mann-Kendall stat test

header = ["SST"]

# General table formatting as dataframe
tabledf = pd.DataFrame(params, 
                       index = header)
# Show Summary Table
print("Table 1: Key SST Features over Time-Series.")
display(tabledf)

<div class="alert alert-info" role="alert" 
     style="font-size: 1.em; text-align: center; border-radius: 8px;">
    
    Boxplot SST Distribution over the Time series
    
</div>
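
A boxplot is a picture of a few summary numbers: the box runs from the first quartile (Q1) to the third quartile (Q3), the line inside is the median, and points beyond 1.5 × IQR from the box are drawn as outliers (matplotlib's default setting). A minimal sketch with made-up values; `sst_demo` and the other names are hypothetical.

```python
import numpy as np

# Made-up temperatures with one unusually high value
sst_demo = np.array([18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 30.0])

# Quartiles: edges of the box and the median line
q1, median, q3 = np.percentile(sst_demo, [25, 50, 75])

# Interquartile range: height of the box
iqr = q3 - q1

# Limits beyond which points count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sst_demo[(sst_demo < lower_fence) | (sst_demo > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("Outliers:", outliers)
```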

In [None]:
# Figure 4
fig4, ax = plt.subplots(figsize = (4, 4))

# Boxplot:
# - data and label
plt.boxplot(df_sst['SST'], labels = ['SST'],       # required: sst data and label name
           widths = 0.4, patch_artist = True,      # optional: box width, colour patch
           boxprops = dict(facecolor = 'silver'),  # optional: colour to fill box
           medianprops = dict(color  = 'red'))     # optional: colour for median line

# Add gridlines
ax.grid(True, linestyle = ':', 
        linewidth = 0.5, color = 'grey')

# Add a y-label
ax.set_ylabel('Distribution (°C)')

# Add figure title
ax.set_title(f"SST over {num_yrs}-year timeseries")

# Show the plot
plt.show()

In [None]:
# Save your figure by uncommenting the next line of code
#fig4.savefig('SST_Boxplot.png', dpi = 300, bbox_inches = 'tight')