<img src='./img/LogoWekeo_Copernicus_RGB_0.png' alt='' align='center' width='30%'></img>

# Using Copernicus data to investigate Marine Heatwaves

    Version: 3.0
    Date:    14/07/2020
    Author:  Hayley Evers-King (EUMETSAT) and Ben Loveday (InnoFlair, Plymouth Marine Laboratory)
    Credit:  This code was developed for EUMETSAT under contracts for the European Commission Copernicus 
             programme.
    License: This code is offered as open source and free-to-use in the public domain, 
             with no warranty, under the MIT license associated with this code repository.

**What is this notebook for?**

This notebook downloads EUMETSAT Sentinel-3 SLSTR data for composite plotting, as well as CMEMS sea surface temperature (SST) time series from both the reprocessed and near-real-time (NRT) data streams. It applies some simple plotting techniques and basic analysis to investigate both historical marine heatwaves and the current potential for one to occur.

**What specific tools does this notebook use?**

Beyond general Python modules, this notebook uses the WEkEO HDA client (`hda`) to access the harmonised data access API, local helper modules for image handling and SST plotting (`image_tools` and `SST_plotting_tools`, found in the `tools` folder), and an openly available library for marine heatwave analysis based on the work of <a href="https://github.com/ecjoliver/marineHeatWaves"> Hobday et al.</a>.

**What are marine heatwaves and how can Copernicus data be used to observe them?**

Like heatwaves on land, marine heatwaves are extended periods of higher than normal temperatures. They have occurred in different areas around the world and can be caused by different oceanographic driving forces. You can find out all about them <a href="http://www.marineheatwaves.org/all-about-mhws.html">here</a>. Marine heatwaves can be devastating for marine life, particularly organisms that are vulnerable to thermal stress, such as coral reefs.


Because marine heatwaves have varied drivers and are defined relative to regional historical conditions, we need to be able to measure the range of ocean temperatures occurring all over the world at any given time, as well as a long-term baseline of what ‘normal’ temperatures look like.

Sea surface temperature measurements from satellites can support the identification of marine heatwaves, both through the daily measurements they make and through their contributions to long-term data records. The Sentinel-3 satellites are the Copernicus programme's contribution to climate-scale monitoring of sea surface temperatures, with the Sea and Land Surface Temperature Radiometer (SLSTR) on Sentinel-3A now able to act as a reference sensor when combining multiple sea surface temperature data sets, such as those available from the <a href="http://marine.copernicus.eu">Copernicus Marine Service</a>.

In this notebook we will work through an example of how you can access near-real-time data to view current SST in a region that is often affected by marine heatwaves. We will then look at this area in a long-term context using a reprocessed time series, to see how the current situation compares to historical marine heatwave episodes.

<div class="alert alert-block alert-warning">
    <b>Get WEkEO User credentials</b>
<hr>
If you want to download the data to use this notebook, you will need WEkEO User credentials. If you do not have these, you can register <a href="https://www.wekeo.eu/web/guest/user-registration" target="_blank">here</a>.
</div>



***

Let's get started! Python functionality is organised into libraries, packages and modules, each containing methods for specific tasks. The box below imports everything we need to complete the tasks in this notebook, including data access, manipulation, analysis and plotting.

In [None]:
# standard tools
import os, sys, json
from zipfile import ZipFile
import shutil
import datetime
import numpy as np
from IPython.display import HTML
import xarray as xr
import matplotlib.pyplot as plt
from matplotlib import gridspec
import glob
import warnings
warnings.filterwarnings("ignore")

# specific tools (found in the tools/ folder of the current working directory)
sys.path.append(os.path.join(os.getcwd(),'tools'))
sys.path.append(os.path.join(os.getcwd(),'tools','mhw_master'))

import image_tools as img
import SST_plotting_tools as sstp
import marineHeatWaves as mhw

#### Install the WEkEO HDA client

The WEkEO HDA client is a Python-based library. It provides support for both Python 2.7.x and Python 3.

To install the WEkEO HDA client with the pip package manager, run the command shown below (on Unix/Linux).

In [None]:
%pip install hda

Please verify that the following requirements are installed before moving on to the next step:
   - Python 3
   - requests
   - tqdm

#### Load WEkEO HDA client

The hda package provides a Python 3 client that can be used to search for and download products through the WEkEO Harmonised Data Access (HDA) API.
HDA is a RESTful interface that allows users to search for and download WEkEO datasets.
Documentation about its usage can be found at https://www.wekeo.eu/.

In [None]:
from hda import Client

To run this notebook, we will download some data through WEkEO's **'harmonised-data-access'** API, which provides access to a huge number of datasets. This allows us to query the full data catalogue and download data quickly and directly into JupyterLab. You can search for what data is available <a href="https://wekeo.eu/data?view=catalogue">here</a>.

In order to use the HDA client, we need to provide authentication credentials. First, make sure that the file "$HOME/.hdarc" exists and contains the URL of the API endpoint, plus your WEkEO username and password.

For example, to check whether the file .hdarc exists in your $HOME directory, open a terminal and list the hidden files there with `ls -a $HOME`.


If it does not exist, create "$HOME/.hdarc" (in your Unix/Linux environment) and fill it in with the API endpoint URL and the credentials of your WEkEO account.
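If you prefer to check the credentials file from Python, the sketch below looks for the three entries mentioned above. It assumes the `key: value` layout with `url`, `user` and `password` keys described in the hda client documentation; the helper name `check_hdarc` is hypothetical, and it only checks that the entries are present, not that the credentials are valid.

```python
import os

def check_hdarc(path=None):
    """Return the expected .hdarc keys that are missing or empty.

    Assumes a 'key: value' layout with url, user and password keys.
    If the file does not exist, all three keys are reported as missing.
    """
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".hdarc")
    required = ["url", "user", "password"]
    if not os.path.exists(path):
        return required
    found = set()
    with open(path) as f:
        for line in f:
            # split on the first colon only, so a URL value (https://...) stays intact
            key, sep, value = line.partition(":")
            if sep and value.strip():
                found.add(key.strip())
    return [key for key in required if key not in found]

missing = check_hdarc()
if missing:
    print("~/.hdarc is missing or incomplete; missing keys: " + ", ".join(missing))
else:
    print("~/.hdarc contains url, user and password entries")
```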

In [None]:
# where the data should be downloaded to:
download_dir_path = os.path.join(os.getcwd(),'products')
# where we can find our data query form:
JSON_query_dir = os.path.join(os.getcwd(),'JSON_templates','mhw')
# print verbose HDA client output?
verbose = False

# make the output directory if required
os.makedirs(download_dir_path, exist_ok=True)

Now we are ready to get our data. 

In [None]:
# dataset ID for SLSTR Level-2 sea surface temperature (WST) products
dataset_id = "EO:EUM:DAT:SENTINEL-3:SL_2_WST___"
download_data = True #Set this to False if you've already downloaded the data!

We use our dataset ID to load the correct JSON query file from ./JSON_templates/mhw/.

You can edit this query to get different data, but be careful not to request too much: the download could take a long time and you might run out of space in JupyterLab. The box below finds the correct query file.

In [None]:
# find query file
JSON_query_file = os.path.join(JSON_query_dir,dataset_id.replace(':','_')+".json")
if not os.path.exists(JSON_query_file):
    print('Query file ' + JSON_query_file + ' does not exist')
else:
    print('Found JSON query file for '+dataset_id)

Now that we have a query, we need to submit it to WEkEO to get our data. The box below uses the HDA client directly to download the data.

This is quite a complex process, so much of the functionality has been hidden 'behind the scenes'. If you want more information, you can check out the <a href="./wekeo-hda">Harmonised Data Access API </a> tutorials. The code below will report some information as it runs, including the search results, before downloading the matching products.

In [None]:
if download_data:

    # load the query
    with open(JSON_query_file, 'r') as f:
        query = json.load(f)

    # search for the products and download them into download_dir_path,
    # which is where the unzip and plotting steps below look for them
    print('Downloading data...')
    c = Client(debug=True)

    matches = c.search(query)
    print(matches)
    matches.download(download_dir_path)

Sentinel data is usually distributed as a zip file containing the data in SAFE format. To use it, we must unzip the file; the code below handles this.

In [None]:
if download_data:
    # unzip every downloaded archive in the download directory
    for filename in os.listdir(download_dir_path):
        if os.path.splitext(filename)[-1] == '.zip':
            zip_path = os.path.join(download_dir_path, filename)
            print('Unzipping ' + filename)
            try:
                with ZipFile(zip_path, 'r') as zipObj:
                    # extract all the contents of the zip file into the download directory
                    zipObj.extractall(download_dir_path)

                # clear up the zip file
                os.remove(zip_path)
            except Exception as err:
                print('Failed to unzip ' + filename + ': ' + str(err))

***


### Plotting SLSTR data

We plot our SLSTR scenes using a function that manages data ingestion, flagging and bias correction, and adds some map embellishments (e.g. dotted lines along the scene edges, so we can tell where the boundaries are). The function loops over every scene it is given, so we only need to call it once, further down.

We start by finding all the necessary files with glob.glob. The resulting list of files is then passed to the plotting function, `sstp.make_SLSTR_composite_plot`.

In [None]:
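# verbosity
verbose = False

# figure options
figure_font_size = 20
plot_extents = [-148, -120, 32, 47]  # map extent: west, east, south, north (degrees)
vmin = 8   # colour scale minimum (degrees C)
vmax = 20  # colour scale maximum (degrees C)
grid_factor = 3  # sub-sampling factor used to reduce plot resolution

# get the SLSTR SST files from June 2021
SLSTR_files = glob.glob(os.path.join(download_dir_path,'S3*WST*202106*','*.nc'))
print('Found ' + str(len(SLSTR_files)) + ' SLSTR files')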

Now we pass our list of files to the plotting routine. It returns the figure, axis and colour bar handles, so that we can still make changes once the main plot is done (e.g. adding annotations). Finally, we save the figure in the same directory as this notebook.

In [None]:
# make the plot: this is wrapped in a function as it loops over all of the scenes
fig, axis, colbar = sstp.make_SLSTR_composite_plot(SLSTR_files, plot_extents=plot_extents,
                                                   fsz=figure_font_size, vmin=vmin, vmax=vmax,
                                                   grid_factor=grid_factor)

# save the figure
plt.savefig('SLSTR_All_SST_California_20210617.png', bbox_inches='tight')

You can now find the image in the folder where this notebook is stored in your JupyterLab instance. The image is a composite of three scenes from the SLSTR sensors aboard the Sentinel-3A and Sentinel-3B satellites, which captured warm temperatures in the eastern Pacific in June 2021. This highlights the increased coverage in both time and space that can be achieved by using multiple sensors. However, to understand whether these temperatures could indicate the beginning of a marine heatwave, we need some longer-term context...

***

### Looking at time series data of sea surface temperature

To place the images above in context, we need to look at a time series. For this we can access data from the Copernicus services, where multiple data sources (different satellites, etc.) are combined to produce data sets that cover longer time periods. In this case we will look at OSTIA sea surface temperature from both the NRT and reprocessed data streams, so that we have data right up to the period of the SLSTR images we looked at previously.

Reprocessed and NRT data streams are produced separately, and we must be careful when we interpret them together. Over time, we learn more about how satellite instruments perform, algorithms improve, and data are reprocessed to ensure good quality and consistency between sources. NRT processing does not have all of the information that is available in hindsight, particularly about instrument characterisation, but these data are vitally important for understanding events as they happen.

For example, you would not use combined NRT and reprocessed data sets for long-term trend analysis, and you should treat NRT measurements as more uncertain than reprocessed ones. So, whilst we can use the NRT data now to get an indication of whether this event looks significant, we would eventually want to use a longer, reprocessed time series to establish how unusual the event is in a climate context.
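One way to respect this distinction is to keep track of which stream each value comes from, and to estimate trends from the reprocessed part only. The sketch below assumes, as in the download loop further down, that years up to 2019 come from the reprocessed product and later years from NRT; the helper names and the `last_rep_year` cutoff are hypothetical, and the straight-line fit with `np.polyfit` is just one simple way to express a trend.

```python
import numpy as np

def split_by_stream(dates, values, last_rep_year=2019):
    """Split a combined series into reprocessed (REP) and NRT parts by year.

    last_rep_year=2019 is an assumption matching the years requested from each product.
    """
    dates = np.asarray(dates, dtype=object)
    values = np.asarray(values, dtype=float)
    is_rep = np.array([d.year <= last_rep_year for d in dates], dtype=bool)
    return (dates[is_rep], values[is_rep]), (dates[~is_rep], values[~is_rep])

def trend_per_decade(dates, values):
    """Least-squares linear trend of a series, in units per decade (NaNs ignored)."""
    values = np.asarray(values, dtype=float)
    days = np.array([d.toordinal() for d in dates], dtype=float)
    days = days - days[0]  # count days from the first sample to keep the fit well conditioned
    good = np.isfinite(values)
    slope_per_day = np.polyfit(days[good], values[good], 1)[0]
    return slope_per_day * 3652.5  # average days per decade (10 x 365.25)

# e.g. with the series built later in this notebook:
# (rep_dates, rep_sst), (nrt_dates, nrt_sst) = split_by_stream(Dates_time_series, SST_time_series)
# print(trend_per_decade(rep_dates, rep_sst))
```

The NRT part is still available for event-scale comparison; it is simply kept out of the trend estimate.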

As before, we will construct and submit a query to the WEkEO Harmonised Data Access API to get this data. We have supplied JSON query templates for the data sets of interest, but you could edit these to look at your own time frames, regions of interest, etc. The code below requests one year at a time: reprocessed (REP) data for 2008 to 2019 and NRT data for 2020 to 2021.
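Each query template contains date placeholders that are filled in before the query is sent. As a sketch, that step can be pulled out into a small helper that also fails loudly if a placeholder is left unfilled. The helper name `fill_query_template` is hypothetical, and the two-key example template is made up for illustration; it is not the structure of the real WEkEO query files.

```python
import json

def fill_query_template(template_text, start_date, end_date):
    """Fill the %DATE_START% / %DATE_END% placeholders and parse the result as JSON."""
    filled = template_text.replace("%DATE_START%", start_date)
    filled = filled.replace("%DATE_END%", end_date)
    if "%DATE_" in filled:
        # any remaining placeholder suggests the template does not match this helper
        raise ValueError("unfilled date placeholder left in query template")
    return json.loads(filled)

# hypothetical two-key template, only to show the substitution
example = '{"start": "%DATE_START%", "end": "%DATE_END%"}'
print(fill_query_template(example, "2008-01-01T00:00:00.000Z", "2008-12-31T00:00:00.000Z"))
```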

In [None]:
# makes a date array for the REP product
start_dates = []
end_dates = []
dataset_ids = []
for ii in range(2008,2019+1):
    start_dates.append(str(ii)+"-01-01T00:00:00.000Z")
    end_dates.append(str(ii)+"-12-31T00:00:00.000Z")
    dataset_ids.append("EO:MO:DAT:SST_GLO_SST_L4_REP_OBSERVATIONS_010_011")

# add the NRT product
for ii in range(2020,2021+1):
    start_dates.append(str(ii)+"-01-01T00:00:00.000Z")
    end_dates.append(str(ii)+"-12-31T00:00:00.000Z")
    dataset_ids.append("EO:MO:DAT:SST_GLO_SST_L4_NRT_OBSERVATIONS_010_001")

# start loop over dates
if download_data:

    for start_date, end_date, dataset_id in zip(start_dates, end_dates, dataset_ids):

        # find query file, skipping this period if it is missing
        JSON_query_file = os.path.join(JSON_query_dir,dataset_id.replace(':','_')+".json")
        if not os.path.exists(JSON_query_file):
            print('Query file ' + JSON_query_file + ' does not exist')
            continue
        print('Found JSON query file for '+dataset_id)

        print('Running for: ' + start_date + ' to ' + end_date)

        # load the query template and fill in the date placeholders
        with open(JSON_query_file, 'r') as f:
            query = f.read()
        query = query.replace("%DATE_START%",start_date)
        query = query.replace("%DATE_END%",end_date)
        query = json.loads(query)

        # search for the products and download them into download_dir_path,
        # which is where the processing steps below look for them
        print('Downloading data...')
        c = Client(debug=True)

        matches = c.search(query)
        print(matches)
        matches.download(download_dir_path)

***

### Plotting time series data to investigate occurrence and potential for marine heatwaves 

First, we set up some parameters, including the period of each year that we want to average over when plotting.
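The OSTIA files store time as seconds since 1981-01-01, the reference date set in the box below, and the processing loop converts each value with `datetime.timedelta`. As a sketch, the same conversion can be vectorised with numpy's `datetime64` type; the helper name `seconds_since_to_datetimes` is hypothetical, and like the loop it truncates the raw values to whole seconds.

```python
import datetime
import numpy as np

def seconds_since_to_datetimes(seconds, reference="1981-01-01"):
    """Convert 'seconds since reference' values into a list of datetime.datetime objects."""
    # truncate to whole seconds, as int(time) does in the processing loop
    secs = np.asarray(seconds).astype("int64")
    stamps = np.datetime64(reference, "s") + secs.astype("timedelta64[s]")
    # datetime64[s] arrays convert to datetime.datetime objects via tolist()
    return stamps.tolist()

# compare with the per-element approach used in the processing loop
ref = datetime.datetime(1981, 1, 1)
raw = [0.0, 86400.0, 1.2e9]
loop_result = [ref + datetime.timedelta(seconds=int(t)) for t in raw]
print(seconds_since_to_datetimes(raw) == loop_result)
```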

In [None]:
# The data product time is measured in seconds since the following reference date
Date_ref = datetime.datetime(1981,1,1)

# Select the times we want to use for our spatial anomaly plots [month, day]
month_day_start = [1,1]
month_day_end = [12,31]

# Plotting font size
fsz = 30

Then we find our datasets and gather them into a sorted list.

In [None]:
SST_files = sorted(glob.glob(os.path.join(download_dir_path,'METOFFICE-GLO-SST-L4-*')))

The next box performs the bulk of the processing work for our plots. We start by loading coordinates that we can use to subset the data (if required) and make our plots. We then loop through the SST products, read in the data, convert from Kelvin to Celsius, and average it in time and space.
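The Kelvin-to-Celsius step and the spatial averaging can be sketched on their own. The helper `area_mean_celsius` below is hypothetical: it reuses the notebook's heuristic (a mean above 100 is taken to be Kelvin) and takes a single NaN-aware mean over both spatial axes, which matches the notebook's nested `np.nanmean` calls when there are no missing values. The optional cos(latitude) weighting is a common refinement for regular latitude/longitude grids and is not what the notebook itself does.

```python
import numpy as np

def area_mean_celsius(sst, lat=None):
    """Area-average an SST cube of shape (time, lat, lon), returning degC per time step.

    If lat is given, grid cells are weighted by cos(latitude).
    """
    sst = np.asarray(sst, dtype=float)
    if np.nanmean(sst) > 100:
        # heuristic from the notebook: values this large must be in Kelvin
        sst = sst - 273.15
    if lat is None:
        # plain mean over the lat and lon axes, ignoring NaNs (land, ice, cloud)
        return np.nanmean(sst, axis=(1, 2))
    weights = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))[None, :, None]
    valid = np.isfinite(sst)
    num = np.where(valid, sst * weights, 0.0).sum(axis=(1, 2))
    den = np.where(valid, weights, 0.0).sum(axis=(1, 2))
    return num / den
```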

In [None]:
#load the co-ordinate variables we need for subsetting/plotting
ds = xr.open_dataset(SST_files[-1])
lat = ds.lat.data
lon = ds.lon.data
ds.close()

# initialise lists for the output time series variables used in the MHW analysis
all_times = []
all_SST = []

# initialise arrays for output SST fields
iter_SST = np.ones([len(SST_files), len(lat), len(lon)])*np.nan

# now we get the area-averaged data
count = -1
for SST_file in SST_files:
    count = count + 1

    # xarray does not decode times consistently between the REP and NRT files, so we read the raw values
    ds = xr.open_dataset(SST_file, decode_times=False)
    this_SST = ds.analysed_sst.data
    times = ds.time.data
    ds.close()

    my_times = []
    for time in times:
        my_times.append(Date_ref + datetime.timedelta(seconds=int(time)))
    times = np.asarray(my_times)
    
    t0 = datetime.datetime(times[0].year, month_day_start[0], month_day_start[1])
    t1 = datetime.datetime(times[0].year, month_day_end[0], month_day_end[1])
    tt = np.where((times >= t0) & (times <= t1))[0]

    if np.nanmean(this_SST) > 100:
        this_SST = this_SST - 273.15

    time_subset_SST = np.nanmean(np.nanmean(this_SST[tt,:,:],axis=1),axis=1)

    iter_SST[count,:,:] = np.squeeze(np.nanmean(time_subset_SST, axis=0))

    all_times.append(my_times)
    all_SST.append(np.nanmean(np.nanmean(this_SST, axis=1), axis=1))


A little bit of formatting is necessary for the plots we want to make...
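The marine heatwave toolkit's `mhw.detect` takes a vector of ordinal day numbers (as produced by `toordinal()`) and a matching temperature vector, and its documentation notes that it assumes a continuous daily series. The hypothetical helper below flattens the per-file lists in the same way as the box that follows, and additionally warns if consecutive samples are not exactly one day apart (for example, overlapping or missing days where the REP and NRT years meet).

```python
import numpy as np

def to_mhw_inputs(times_per_file, sst_per_file):
    """Flatten per-file lists into ordinal-day and temperature arrays for mhw.detect.

    Prints a warning if consecutive samples are not exactly one day apart.
    """
    times = [t for file_times in times_per_file for t in file_times]
    sst = np.concatenate([np.asarray(s, dtype=float) for s in sst_per_file])
    ordinals = np.array([t.toordinal() for t in times])
    steps = np.diff(ordinals)
    if np.any(steps != 1):
        first_bad = [times[i] for i in np.where(steps != 1)[0][:5]]
        print("Warning: gaps or repeated days after", first_bad)
    return ordinals, sst

# e.g. t, temp = to_mhw_inputs(all_times, all_SST)
#      mhws, clim = mhw.detect(t, temp)
```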

In [None]:
# flatten the SST list
SST_time_series = [item for sublist in all_SST for item in sublist]
SST_time_series = np.asarray(SST_time_series)

# format the dates for the MHW toolkit (ordinal days, as a numpy array)
Dates_time_series = [item for sublist in all_times for item in sublist]
Dates_time_series_formatted = np.asarray([tt.toordinal() for tt in Dates_time_series])

# make climatology of time subset region
clim_SST = np.nanmean(iter_SST, axis=0)

# calculate heat waves
mhws, clim = mhw.detect(np.asarray(Dates_time_series_formatted), SST_time_series)

# make a matrix of stripes (the 20 rows only give each stripe some height)
stripe_array = np.ones([20,len(SST_files)])*np.nan

# now make the anomaly
for ii in range(len(SST_files)):
    stripe_array[:, ii] = np.nanmean(np.squeeze(iter_SST[ii,:,:]) - clim_SST)

The first plot we make shows ‘stripes’ of the average sea surface temperature anomaly for our region of interest. ‘Climate stripes’ have recently been used by 'citizen scientists' all over the world to show long-term trends in regional temperatures. The plot below shows a stripes-style graphic derived from the SST time series we have extracted. High anomalies are apparent in 2014 and 2015, during the previous marine heatwave, and in 2019 and 2020, when further marine heatwaves were reported.
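The value behind each stripe is essentially a yearly mean minus the mean of all the yearly means. A minimal sketch of that calculation from a daily series is shown below; the helper name `yearly_anomalies` is hypothetical, and it averages whole calendar years, whereas the notebook averages each file between `month_day_start` and `month_day_end` (the whole year by default).

```python
import numpy as np

def yearly_anomalies(dates, sst):
    """Annual-mean SST minus the mean of all annual means, one value per year."""
    years = np.array([d.year for d in dates])
    sst = np.asarray(sst, dtype=float)
    unique_years = np.unique(years)
    annual_means = np.array([np.nanmean(sst[years == y]) for y in unique_years])
    return unique_years, annual_means - np.nanmean(annual_means)

# e.g. years, anomalies = yearly_anomalies(Dates_time_series, SST_time_series)
```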


In [None]:
fig = plt.figure(figsize=(30,10), dpi=300)
# symmetric colour limits, so that zero anomaly sits in the middle of the colour map
anom_max = np.nanmax(np.abs(stripe_array))

# one file per year, starting in 2008 (matching the download loop)
date_ticks = []
for ii in range(len(SST_files)):
    date_ticks.append(str(2008+ii))

plt.pcolormesh(stripe_array, vmin=-anom_max, vmax=anom_max, cmap=plt.cm.RdBu_r)
plt.xticks(np.arange(len(SST_files))+0.5, date_ticks, rotation=90, fontsize=fsz/1.25)
# show every column, including the final year
plt.xlim([0, len(SST_files)])
plt.yticks([], [], fontsize=fsz/1.25)
cbar = plt.colorbar()
cbar.set_label(r'SST anomaly [$^{\circ}$C]', fontsize=fsz/1.25)
plt.savefig('Stripes.png')

In [None]:
# plot MHWs
ev = np.argmax(mhws['intensity_max']) # Find largest event

fig = plt.figure(figsize=(35,15), dpi = 300)
plt.rc('font', size=fsz)

# Find indices for all n-MHWs before and after event of interest and shade accordingly
# (compare against a numpy array of ordinal dates so that np.where works element-wise)
t_ord = np.asarray(Dates_time_series_formatted)
n_before = 10
n_after = 10
first_ev = max(ev - n_before, 0)
last_ev = min(ev + n_after, len(mhws['time_start']))
for ev0 in range(first_ev, last_ev):
    t1 = np.where(t_ord == mhws['time_start'][ev0])[0][0]
    t2 = np.where(t_ord == mhws['time_end'][ev0])[0][0]
    p1 = plt.fill_between(Dates_time_series[t1:t2+1], SST_time_series[t1:t2+1],
                          clim['thresh'][t1:t2+1], color=(1,0.85,0.85))

# Find indices for the most recent MHW and shade it in red
t1 = np.where(t_ord == mhws['time_start'][-1])[0][0]
t2 = np.where(t_ord == mhws['time_end'][-1])[0][0]
p2 = plt.fill_between(Dates_time_series[t1:t2+1], SST_time_series[t1:t2+1],
                      clim['thresh'][t1:t2+1], color='r')

# Plot SST, seasonal cycle, threshold, shade MHWs with main event in red
p3, = plt.plot(Dates_time_series, SST_time_series, 'k-', linewidth=2)
p4, = plt.plot(Dates_time_series, clim['thresh'], 'b--', linewidth=2)
p5, = plt.plot(Dates_time_series, clim['seas'], 'b-', linewidth=2)

xmin = datetime.datetime(2013,1,1)
xmax = datetime.datetime(2021,12,31)
plt.xlim(xmin,xmax)

plt.ylim(clim['seas'].min() - 0.3, clim['seas'].max() + mhws['intensity_max'][ev] + 0.2)
plt.ylabel(r'SST [$^\circ$C]')
plt.annotate('Plotting script credit:\nhttps://github.com/ecjoliver/marineHeatWaves',\
             (0.005, 0.935), xycoords='axes fraction', color='0.5', fontsize=fsz/1.25)

leg = plt.legend([p3, p5, p4, p2, p1],\
                 ['OSTIA SST','SST Seas. clim.','SST Seas. thresh.','Recent heatwave','Past heatwave'],\
                 bbox_to_anchor=(0.5, -0.3), ncol=5, loc="lower center")

leg.get_frame().set_linewidth(0.0)
plt.savefig('MHW.png')

This is just an example of the sorts of analyses that can be developed with SST data for marine heatwaves. The diversity of data available through the Copernicus programme allows this phenomenon to be investigated at both the event and climate scales. To extend this analysis, you could use a longer time series of solely reprocessed data to determine climate-related trends. You could also routinely compare NRT data to a climatology for any region of interest, with the caveat that NRT data carry greater uncertainty.
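As a rough illustration of such a comparison, the core rule in the Hobday et al. definition is that temperatures stay above a seasonally varying (90th-percentile) threshold for at least five consecutive days. The hypothetical helper below applies only that run-length rule to a temperature series and a threshold series, such as `SST_time_series` and `clim['thresh']` from this notebook. It is a simplified sketch: unlike `mhw.detect`, it does not compute the climatology or threshold itself and does not join events separated by short gaps.

```python
import numpy as np

def exceedance_events(temp, thresh, min_duration=5):
    """Return (start, end) index pairs of runs where temp > thresh for >= min_duration steps."""
    above = np.asarray(temp, dtype=float) > np.asarray(thresh, dtype=float)
    # pad with False so that every run has a rising and a falling edge
    edges = np.diff(np.concatenate(([False], above, [False])).astype(int))
    starts = np.where(edges == 1)[0]
    ends = np.where(edges == -1)[0] - 1
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s + 1 >= min_duration]

# e.g. events = exceedance_events(SST_time_series, clim['thresh'])
```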


<img src='./img/all_partners_wekeo.png' alt='' align='center' width='75%'></img>

<p style="text-align:left;">This project is licensed under the <a href="./LICENSE">MIT License</a> <span style="float:right;"><a href="https://github.com/wekeo/wekeo-jupyter-lab">View on GitHub</a> | <a href="https://www.wekeo.eu/">WEkEO Website</a> | <a href=mailto:support@wekeo.eu>Contact</a></span></p>