## DestinE Data Streaming

This service provides compressed climate and ERA5 data through a high-quality, memory-efficient streaming solution. The [SSIM](https://en.wikipedia.org/wiki/Structural_similarity_index_measure) and the mean relative error serve as quality measures.

<div style='white-space: nowrap', align='center'>

<div style='display:inline-block', align='center'>ERA5 2 metre dewpoint temperature (01-01-1940 09:00)<br>
<img src="images/2d9_og_.jpeg" width="450px"><br><img src="images/2d9_cp_.jpeg" width="450px"><br>Mean SSIM: 0.996<br>Compression rate 1:13<br>Mean relative error 0.1 %</div>

<div style='display:inline-block', align='center'>ERA5 10 metre U wind component (01-01-1940 09:00)<br>
<img src="images/10u9_og_.jpeg" width="450px"><br><img src="images/10u9_cp_.jpeg" width="450px"><br>Mean SSIM: 0.995<br>Compression rate 1:27<br>Mean relative error 0.3 %</div>

</div>
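
The two quality measures can be illustrated with a minimal NumPy sketch. It is not the service's own evaluation code: how the figures above were computed is not stated here, and `global_ssim` below is a simplified single-window variant of SSIM, whereas a "mean SSIM" is commonly averaged over local windows. The function names and the synthetic test field are assumptions made for illustration.

```python
import numpy as np


def mean_relative_error(original, reconstructed):
    """Mean of |reconstructed - original| / |original| over all pixels.

    Assumes the original contains no zeros (true for temperatures in kelvin,
    not necessarily for wind components).
    """
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    return float(np.mean(np.abs(reconstructed - original) / np.abs(original)))


def global_ssim(x, y, data_range):
    """SSIM evaluated once over the whole image (single window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Standard stabilising constants with K1 = 0.01 and K2 = 0.03
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


# Synthetic stand-in for an original field and its decompressed counterpart
rng = np.random.default_rng(0)
original = 280.0 + 10.0 * rng.random((64, 64))  # kelvin
degraded = original + rng.normal(0.0, 0.2, original.shape)
data_range = original.max() - original.min()

mre = mean_relative_error(original, degraded)
ssim = global_ssim(original, degraded, data_range)
print(f"Mean relative error: {100 * mre:.3f} %, global SSIM: {ssim:.4f}")
```

An SSIM of 1 means the two images are identical, and a mean relative error of 0 % means no pixel value changed.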


## Prerequisites
### DestinE Platform Credentials

You need to have an account on the [Destination Earth Platform](https://auth.destine.eu/realms/desp/account).

#### ⚠️ Warning: Authorized Access Only
Use of this notebook and access to the data are reserved for authorized user groups.

## Access the Data
With a DESP account you can access the data streams used in this tutorial.

In [None]:
%%capture cap
%run ./auth.py

In [None]:
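# Split the captured output of auth.py into lines
output = cap.stdout.split('\n')
# refresh_token = output[1]
# The access token is read from the third line of the output
token = output[2]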
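
Reading the token by a fixed line index fails with an unhelpful `IndexError` if the authentication step prints fewer lines than expected. The sketch below is a more defensive variant. The helper `token_from_output` is hypothetical, and the assumption that the token sits on the third line is taken only from the `output[2]` indexing above, not from `auth.py` itself.

```python
def token_from_output(stdout: str, line_index: int = 2) -> str:
    """Return the line at line_index of the captured output, or raise a clear error.

    line_index=2 mirrors the notebook's `output[2]`; the actual layout of the
    auth.py output is an assumption.
    """
    lines = stdout.split("\n")
    if len(lines) <= line_index or not lines[line_index].strip():
        raise ValueError(
            f"No token found on line {line_index + 1} of the captured output; "
            "check that authentication succeeded."
        )
    return lines[line_index].strip()


# Example with a made-up three-line output
example_stdout = "Authenticating...\nREFRESH-TOKEN-PLACEHOLDER\nACCESS-TOKEN-PLACEHOLDER\n"
print(token_from_output(example_stdout))
```

In the notebook, this would be called as `token = token_from_output(cap.stdout)`.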

# Imports and general definitions
We start by importing the necessary packages and defining the date format used for the stream parameters.

Note: The API token, which includes the user group, must already be set at this point. This happens in the authentication step above.

In [None]:
from dtelib2 import DTEStreamer
import numpy as np
from datetime import datetime
import plotly.graph_objects as go
import plotly.io as pio
import plotly.express as px
from IPython.display import clear_output

FORMAT = '%Y-%m-%dT%H:%M'

# Parameters for stream access

The following parameters select the data to access from the service.

*category_name*: ERA5, with data from 1940 to 2023<br>
*short_name*: 2t is the 2 metre temperature<br>
*start_date*: 1 August 1940, 12:00, as the start date<br>
*end_date*: 31 August 2023, 12:00, as the end date<br>

In [None]:
category_name = "era5"
short_name = "2t"
start_date = "1940-08-01T12:00"
end_date = "2023-08-31T12:00"

start_date = datetime.strptime(start_date, FORMAT)
end_date = datetime.strptime(end_date, FORMAT)

# Initializing the stream

With the DTEStreamer class we can easily access the data stream through the API and load individual frames.

First, we create a DTEStreamer object with the parameters defined in the previous step and the access token. On initialization, the object calls the API to get the metadata and the location of the stream. (You can take a look at the API yourself in the Swagger documentation [here](https://dev.destinestreamer.geoville.com/api/streaming/metadata).) The ffmpeg package is then used to seek to the first image corresponding to *start_date*.

Note: Because a very large amount of metadata is necessary to cover 83 years, this step takes a few seconds.

We also declare the lists used for data processing.

In [None]:
streamer = DTEStreamer(category_name=category_name,
                       short_name=short_name,
                       start_date=start_date,
                       end_date=end_date,
                       token=token)

august = list()
augusts = list()
years = list()

# Working with the data

The load_next_image() method is then used in a for loop; each iteration loads the next image with ffmpeg together with its time stamp. Images that fall outside August or do not represent data at noon are skipped.

Then, the image is sliced into a rectangle that contains the data for Germany:

![germany_08_08_1940T12_00.png](./images/germany_08_08_1940T12_00.png)

The bounds are hardcoded without georeferencing to speed up the for loop.
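
For reference, the sketch below shows how such pixel bounds could be derived from a latitude/longitude box. It assumes a regular 0.25° grid with row 0 at 90°N and column 0 at 0°E. That layout is consistent with the hardcoded bounds used in this notebook but is not stated here, so treat it as an assumption. The helper `bbox_to_slices` and the box coordinates are hypothetical and were chosen to reproduce the hardcoded indices.

```python
def bbox_to_slices(lat_north, lat_south, lon_west, lon_east, resolution=0.25):
    """Convert a lat/lon box (degrees) to row/column slices on a regular grid.

    Assumes row 0 is 90 degrees north, column 0 is 0 degrees east, and both
    axes advance in steps of `resolution` degrees.
    """
    row_start = int(round((90.0 - lat_north) / resolution))
    row_stop = int(round((90.0 - lat_south) / resolution))
    col_start = int(round(lon_west / resolution))
    col_stop = int(round(lon_east / resolution))
    return slice(row_start, row_stop), slice(col_start, col_stop)


# Approximate bounding box around Germany
rows, cols = bbox_to_slices(lat_north=55.0, lat_south=47.25, lon_west=6.0, lon_east=15.0)
print(rows, cols)
```

With these slices, `image[rows, cols]` selects the same rectangle as `image[140:171, 24:60]`. Computing the slices once before the loop keeps the per-frame cost identical to the hardcoded version.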

After that, the average temperature over Germany is converted to degrees Celsius and stored in a list. The stored data is then plotted again, so the plot is updated for every noon frame in every August from 1940 to 2023.

On the last day of each August, the monthly average and the year are appended to their lists. To speed up the loop and avoid loading every image until the following August, the seek_to_date() method is called. It seeks to the image corresponding to the given time stamp, which is August 1 of the following year. That image is loaded when load_next_image() is called at the top of the for loop.

In [None]:
for image, time_stamp in streamer.load_next_image():
    if time_stamp.month != 8 or time_stamp.hour != 12:
        continue
            
    # isolate German data
    image = image[140:171, 24:60]
    
    # save the average temperature in degree Celsius 
    august.append(np.average(image)-273.15)

    # Clear previous plot and plot again with new data point
    clear_output(wait=True)
    x_data = np.array([*years, time_stamp.year])
    y_data = np.array([*augusts, sum(august)/len(august)])

    fig = go.Figure(data=go.Scatter(x=x_data, y=y_data))
    pio.show(fig)

    if time_stamp.day == 31:
        augusts.append(sum(august)/len(august))
        august.clear()
        years.append(time_stamp.year)

        streamer.seek_to_date(datetime.strptime(f"{time_stamp.year+1}-08-01T12:00", FORMAT))
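
The aggregation logic of this loop can be tried out without a stream or a plot. The sketch below is a simplified stand-in, not the notebook's method: `august_noon_means` and `synthetic_frames` are hypothetical helpers, and the generator only imitates the `(image, time_stamp)` pairs that `streamer.load_next_image()` yields. Because there is no seeking here, frames outside August are simply filtered out.

```python
from datetime import datetime, timedelta

import numpy as np


def august_noon_means(frames):
    """Return (years, means): the mean noon temperature in deg C for each August."""
    years, means = [], []
    current = []  # noon averages of the August currently being processed
    for image, time_stamp in frames:
        if time_stamp.month != 8 or time_stamp.hour != 12:
            continue
        current.append(float(np.mean(image)) - 273.15)  # kelvin to deg C
        if time_stamp.day == 31:
            years.append(time_stamp.year)
            means.append(sum(current) / len(current))
            current.clear()
    return years, means


def synthetic_frames(first_year, last_year):
    """Yield constant 2x2 images every 12 hours from 30 July to 1 September.

    Each year's images hold 290 K plus one kelvin per year since first_year.
    """
    for year in range(first_year, last_year + 1):
        t = datetime(year, 7, 30, 0)
        while t <= datetime(year, 9, 1, 12):
            yield np.full((2, 2), 290.0 + (year - first_year)), t
            t += timedelta(hours=12)


years, means = august_noon_means(synthetic_frames(1940, 1942))
for year, mean in zip(years, means):
    print(year, round(mean, 2))
```

Separating the aggregation from plotting and seeking makes it straightforward to check the filtering and the end-of-month bookkeeping on known input before running it against the real stream.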
