## Script for automated Download of Sentinel-3 LST Data


### Create Environment and select Kernel

Before you run the script, you need to create the environment defined in the file sentinel_env.yml. To do so, use Anaconda Prompt and follow these steps:
- Open Anaconda Prompt
- Write: "conda env create -n sentinel_env -f 'insert path to environment file ending with sentinel_env.yml'"
- Press Enter
- In the next line, write: "activate sentinel_env" (or "conda activate sentinel_env" on newer conda versions)
- Press Enter
- If the next line starts with (sentinel_env), you have been successful
- To create the kernel, write: "python -m ipykernel install --user --name sentinel_env --display-name sentinel_env_kernel"
- If you are having trouble, you can find help here: https://docs.conda.io/projects/conda/en/4.6.0/user-guide/troubleshooting.html
- ...and useful commands here: https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf
- In Jupyter Notebook, click Kernel in the menu bar, select Change kernel, and choose sentinel_env_kernel
- If it says sentinel_env_kernel in the upper right corner, you are ready to run the script!

Please note that the environment works on Windows only.
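
Once the kernel is selected, a quick check can confirm that the packages imported by the script are visible to it. The following is a minimal sketch and not part of the original workflow: the helper name `missing_packages` is hypothetical, and the package list simply mirrors the import cell of this notebook rather than the contents of sentinel_env.yml.

```python
import importlib.util

# Third-party packages imported by the script; adjust if sentinel_env.yml lists others
REQUIRED = ["requests", "pandas", "geopandas", "shapely", "sentinelsat"]

def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If packages are reported as missing, the notebook is most likely running on a different kernel than sentinel_env_kernel.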

In [1]:
# Import necessary libraries
import os
import re
import json
import requests
import pandas as pd
import geopandas as gpd

from shapely import wkt, Polygon
from sentinelsat import read_geojson
from datetime import date, datetime, timedelta
from requests.exceptions import ChunkedEncodingError

### Defining important Variables

The user needs to specify the following variables:
- username: The username for the Copernicus Data Space Ecosystem login.
- password: The password for the Copernicus Data Space Ecosystem login.
- kenya_aoi: The corner coordinates of the bounding box (Polygon - Extent Area) of the AOI, as a WKT string.
- collection_product: The name of the Sentinel collection product (for LST: "SL_2_LST", for SYN: "SY_2_SYN").
- start_date: The start date of the desired acquisition period.
- end_date: The end date of the desired acquisition period.
- start_time: The start time for the LST daytime acquisitions.
- end_time: The end time for the LST daytime acquisitions.
- output_dir: The path to the directory in which the downloaded Sentinel zip files are stored.
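
Before sending a query, it can help to check that the period and the daily time window are consistent. The following is a minimal sketch under the assumption that the dates are `datetime.date` objects and the times are "HH:MM:SS" strings, as in this notebook; the function name `validate_search_window` is hypothetical.

```python
from datetime import date, datetime

def validate_search_window(start_date, end_date, start_time, end_time):
    """Raise ValueError if the period or the daily time window is inconsistent."""
    if start_date >= end_date:
        raise ValueError("start_date must be earlier than end_date")
    t0 = datetime.strptime(start_time, "%H:%M:%S").time()
    t1 = datetime.strptime(end_time, "%H:%M:%S").time()
    if t0 >= t1:
        raise ValueError("start_time must be earlier than end_time")
    return True

# Example values matching the run documented in this notebook
validate_search_window(date(2020, 6, 1), date(2020, 7, 1), "06:00:00", "15:00:00")
```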

In [2]:
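# Copernicus Data Space credentials
# Never store real credentials in a shared notebook. Read them from environment
# variables (the names CDSE_USERNAME and CDSE_PASSWORD are only a suggestion)
# or enter them interactively.
from getpass import getpass

username = os.environ.get("CDSE_USERNAME") or input("Copernicus Data Space username: ")
password = os.environ.get("CDSE_PASSWORD") or getpass("Copernicus Data Space password: ")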

In [3]:
# kenya_aoi = "POLYGON ((33.9095878601073650 -4.7204170227049644, 41.8875236511230540 -4.7204170227049644, 41.8875236511230540 4.6338191032410290, 33.9095878601073650 4.6338191032410290, 33.9095878601073650 -4.7204170227049644))"

# Prompt the user for the name of the AOI vector file
aoi_filename_input = input("Please enter the name of the AOI vector file located in the 'AOI_Files' folder: ").strip()

# Define the folder path
folder_path = r"./AOI_Files/"

# Find the matching file in the folder
matching_files = [f for f in os.listdir(folder_path) if aoi_filename_input in f]

if matching_files:
    # Use the first matched file as the AOI filepath
    aoi_filepath = os.path.join(folder_path, matching_files[0])
    
    # Load it as a GeoDataFrame
    aoi_gdf = gpd.read_file(aoi_filepath)
    
    # Check if the GeoDataFrame is not empty and has a geometry column
    if not aoi_gdf.empty and 'geometry' in aoi_gdf.columns:
        # Calculate the total bounds of all geometries in the GeoDataFrame
        minx, miny, maxx, maxy = aoi_gdf.total_bounds
        
        # Create a Polygon from the total bounds
        bounds_polygon = Polygon([(minx, miny), (minx, maxy), (maxx, maxy), (maxx, miny), (minx, miny)])
        
        print("Bounds Polygon:", bounds_polygon)  # Print the Polygon object
        
        # Convert the bounds Polygon to WKT format
        kenya_aoi = wkt.dumps(bounds_polygon)
        print("WKT of Bounds Polygon:", kenya_aoi)  # Print the WKT string
    else:
        print("The GeoDataFrame is empty or missing a geometry column.")
else:
    print("No matching files found in the 'AOI_Files' folder.")

Bounds Polygon: POLYGON ((33.909587860107365 -4.720417022704964, 33.909587860107365 4.633819103241024, 41.88752365112305 4.633819103241024, 41.88752365112305 -4.720417022704964, 33.909587860107365 -4.720417022704964))
WKT of Bounds Polygon: POLYGON ((33.9095878601073650 -4.7204170227049644, 33.9095878601073650 4.6338191032410236, 41.8875236511230469 4.6338191032410236, 41.8875236511230469 -4.7204170227049644, 33.9095878601073650 -4.7204170227049644))


In [4]:
# Prompt the user to select a product type
product_input = input("Please enter the product type (LST or SYN): ").strip().upper()

if product_input == "LST":
    collection_product = "SL_2_LST"
    output_dir = r'./Download/S3_LST'
elif product_input == "SYN":
    collection_product = "SY_2_SYN"
    output_dir = r'./Download/S3_SYN'
else:
    print("Invalid product type. Please enter either 'LST' or 'SYN'.")
    exit(1)

# Prompt the user for the start and end dates
start_date_input = input("Please enter the start date (YYYY MM DD): ").strip()
end_date_input = input("Please enter the end date (YYYY MM DD): ").strip()

try:
    # Parse the start date
    start_date_parts = [int(part) for part in start_date_input.split()]
    start_date = date(start_date_parts[0], start_date_parts[1], start_date_parts[2])
    
    # Parse the end date
    end_date_parts = [int(part) for part in end_date_input.split()]
    end_date = date(end_date_parts[0], end_date_parts[1], end_date_parts[2])
except (ValueError, IndexError) as e:  # IndexError: fewer than three date parts entered
    print(f"Invalid date format: {e}")
    exit(1)

# Define the time range
start_time = "06:00:00"
end_time = "15:00:00"

# Define max records / search
maxRecords = 1000 # number of records to search (maximum = 1000)

# Print the results to verify
print(f"Product Type: {collection_product}")
print(f"Start Date: {start_date}")
print(f"End Date: {end_date}")
print(f"Time Range: {start_time} to {end_time}")
print(f"Max Records: {maxRecords}")
print(f"Output Directory: {output_dir}")

Product Type: SL_2_LST
Start Date: 2020-06-01
End Date: 2020-07-01
Time Range: 06:00:00 to 15:00:00
Max Records: 1000
Output Directory: ./Download/S3_LST


### Creating Copernicus Data Space Access Token

In [5]:
# Get access token
def get_access_token(username: str, password: str) -> str:
    data = {
        "client_id": "cdse-public",
        "username": username,
        "password": password,
        "grant_type": "password",
    }
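    try:
        r = requests.post(
            "https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token",
            data=data,
        )
        r.raise_for_status()
    except requests.exceptions.HTTPError as e:
        # The server answered with an error status; include its message
        raise Exception(
            f"Access token creation failed. Response from server was: {r.text}"
        ) from e
    except requests.exceptions.RequestException as e:
        # No usable response was received (e.g. connection error)
        raise Exception(f"Access token creation failed: {e}") from e
    return r.json()["access_token"]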

access_token = get_access_token(username, password)


### Searching the Catalogue with OData

Please note: The query returns at most 1000 products. If the number of products is exactly 1000, the limit has been reached and the actual number of matching products may be larger. In that case, go back to the section "Defining important Variables", shorten the search period by adapting the start and end date, and rerun the query until the number is below 1000. You can download the remaining data in a separate run afterwards.
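
One way to stay below the limit is to split a long period into shorter windows and run one query per window. The following is a minimal sketch of that idea; the function name `split_period` and the default of 10 days per query are assumptions, and a suitable window length depends on the size of the AOI.

```python
from datetime import date, timedelta

def split_period(start_date, end_date, days_per_query=10):
    """Yield consecutive (start, end) date pairs covering [start_date, end_date)."""
    step = timedelta(days=days_per_query)
    window_start = start_date
    while window_start < end_date:
        window_end = min(window_start + step, end_date)
        yield window_start, window_end
        window_start = window_end

for window_start, window_end in split_period(date(2020, 6, 1), date(2020, 7, 1)):
    print(window_start, "to", window_end)
```

Each pair can be used as start_date and end_date for one query. Because the query compares both ContentDate/Start and ContentDate/End against the window, a product that spans midnight at a window boundary may fall outside both neighbouring windows, so it is worth checking the boundaries when completeness matters.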

In [6]:
# Write the query
json = requests.get(
    "https://catalogue.dataspace.copernicus.eu/odata/v1/Products?$filter=contains(Name,"
    f"'{collection_product}')%20and%20OData.CSC.Intersects(area=geography'SRID=4326;"
    f"{kenya_aoi}')%20and%20ContentDate/Start%20gt%20"
    f"{start_date}T00:00:00.000Z%20and%20ContentDate/End%20lt%20"
    f"{end_date}T00:00:00.000Z&$top={maxRecords}"
).json()
len(json['value'])

213

In [7]:
# Convert dictionary to dataframe
products_df = pd.DataFrame.from_dict(json['value'])

# Print the first five products to see if the query worked
products_df.head(5)

Unnamed: 0,@odata.mediaContentType,Id,Name,ContentType,ContentLength,OriginDate,PublicationDate,ModificationDate,Online,EvictionDate,S3Path,Checksum,ContentDate,Footprint,GeoFootprint
0,application/octet-stream,a88cd7b7-2358-59f5-9905-cd59e370f328,S3A_SL_2_LST____20200601T185613_20200601T18581...,application/octet-stream,0,2020-06-02T23:59:09.294Z,2020-06-03T00:05:11.135Z,2020-06-03T00:05:11.135Z,True,,/eodata/Sentinel-3/SLSTR/SL_2_LST/2020/06/01/S...,[],"{'Start': '2020-06-01T18:56:12.610Z', 'End': '...","geography'SRID=4326;POLYGON ((54.2666 1.93894,...","{'type': 'Polygon', 'coordinates': [[[54.2666,..."
1,application/octet-stream,ce761011-8307-5f23-9e8a-49cf805bb361,S3B_SL_2_LST____20200601T195741_20200601T19594...,application/octet-stream,0,2020-06-03T01:27:18.209Z,2020-06-03T01:31:36.381Z,2020-06-03T01:31:36.381Z,True,,/eodata/Sentinel-3/SLSTR/SL_2_LST/2020/06/01/S...,[],"{'Start': '2020-06-01T19:57:41.081Z', 'End': '...","geography'SRID=4326;POLYGON ((38.8378 1.94599,...","{'type': 'Polygon', 'coordinates': [[[38.8378,..."
2,application/octet-stream,927b707a-ac22-5d25-805a-34710b040251,S3B_SL_2_LST____20200601T195940_20200601T20024...,application/octet-stream,0,2020-06-03T02:46:23.765Z,2020-06-03T02:53:29.631Z,2020-06-03T02:53:29.631Z,True,,/eodata/Sentinel-3/SLSTR/SL_2_LST/2020/06/01/S...,[],"{'Start': '2020-06-01T19:59:40.352Z', 'End': '...","geography'SRID=4326;POLYGON ((36.6641 12.4824,...","{'type': 'Polygon', 'coordinates': [[[36.6641,..."
3,application/octet-stream,908b4bed-96ad-5918-ae8b-eea7bc9dbee6,S3B_SL_2_LST____20200602T065636_20200602T06593...,application/octet-stream,0,2020-06-03T12:59:33.383Z,2020-06-03T13:04:21.939Z,2020-06-03T13:04:21.939Z,True,,/eodata/Sentinel-3/SLSTR/SL_2_LST/2020/06/02/S...,[],"{'Start': '2020-06-02T06:56:35.688Z', 'End': '...",geography'SRID=4326;POLYGON ((34.2621 -10.5552...,"{'type': 'Polygon', 'coordinates': [[[34.2621,..."
4,application/octet-stream,38460b9c-913c-5d89-9ee9-30b58915eb9a,S3B_SL_2_LST____20200602T065336_20200602T06563...,application/octet-stream,0,2020-06-03T13:02:33.450Z,2020-06-03T13:02:26.269Z,2020-06-03T13:02:26.269Z,True,,/eodata/Sentinel-3/SLSTR/SL_2_LST/2020/06/02/S...,[],"{'Start': '2020-06-02T06:53:35.688Z', 'End': '...",geography'SRID=4326;POLYGON ((36.7954 -0.06773...,"{'type': 'Polygon', 'coordinates': [[[36.7954,..."


### Filter Dictionary

#### Filter Records by Daytime

The Sentinel-3 SLSTR Level-2 LST product provides a measure of how hot or cold the 'surface' of the Earth would feel to the touch. It is usually recorded during both daytime and nighttime overpasses. We are only interested in the daytime acquisitions, so we specify a time range to filter them. The acquisition times in the catalogue are given in UTC. For Kenya, the range is set to 06:00 to 15:00 UTC, which corresponds to 09:00 to 18:00 local time, since Kenya (East Africa Time) is three hours ahead of UTC.
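
The filter rule can be sketched as a small standalone function: a product counts as a daytime acquisition if its start or its end time of day lies within the window. The helper names `acquisition_time` and `is_daytime` are hypothetical, and the two example records use illustrative end times rather than values taken from the catalogue.

```python
from datetime import datetime, time

def acquisition_time(timestamp):
    """Parse a ContentDate string and return its time of day (seconds precision)."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.time().replace(microsecond=0)

def is_daytime(content_date, window_start=time(6), window_end=time(15)):
    """Return True if the start or the end of the acquisition lies in the window."""
    start = acquisition_time(content_date["Start"])
    end = acquisition_time(content_date["End"])
    return window_start <= start <= window_end or window_start <= end <= window_end

# Illustrative records (the end times are assumptions)
morning = {"Start": "2020-06-02T06:56:35.688Z", "End": "2020-06-02T06:59:35.688Z"}
evening = {"Start": "2020-06-01T18:56:12.610Z", "End": "2020-06-01T18:59:12.610Z"}

print(is_daytime(morning))
print(is_daytime(evening))
```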

In [8]:
# Convert the strings for the start and end time to time objects
start_time = datetime.strptime(start_time, "%H:%M:%S").time()
end_time = datetime.strptime(end_time, "%H:%M:%S").time()

products_daytime = {}
products_list = json.get('value', {})

for product in products_list:
    contentDate = product['ContentDate']
    product_id = product['Id']
    
    # Extract the start and end time of the product as time objects (seconds precision)
    start_datetime = datetime.strptime(contentDate.get('Start'), "%Y-%m-%dT%H:%M:%S.%fZ")
    end_datetime = datetime.strptime(contentDate.get('End'), "%Y-%m-%dT%H:%M:%S.%fZ")
    start_datetime = start_datetime.time().replace(microsecond=0)
    end_datetime = end_datetime.time().replace(microsecond=0)
    
    # Keep products whose start or end time falls within the set range
    if start_time <= start_datetime <= end_time or start_time <= end_datetime <= end_time:
        products_daytime[product_id] = product
  
len(products_daytime)

105

#### Filter for Non-Timecritical Frames

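The instrument data from Sentinel-3 SLSTR can be disseminated in 'stripes', 'frames', or 'tiles'. Since we are only interested in frames, we filter the products for a specific naming pattern according to the Sentinel-3 naming conventions: the instance ID of a frame consists of four numeric fields (four, three, three, and four digits), for example 0179_039_291_3060.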
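
To see what the pattern captures, the relevant part of a product name can be split into named fields. The following is a sketch only: the helper name `parse_frame_fields` is hypothetical, and the field names (duration, cycle, relative orbit, frame, centre, platform, timeliness, baseline) reflect one reading of the Sentinel-3 naming convention that should be checked against the official product naming document.

```python
import re

# Instance ID of a frame followed by the generating centre and the class ID
FRAME_NAME = re.compile(
    r"_(?P<duration>\d{4})_(?P<cycle>\d{3})_(?P<relative_orbit>\d{3})_(?P<frame>\d{4})"
    r"_(?P<centre>[A-Z0-9]{3})_(?P<platform>[A-Z_])_(?P<timeliness>[A-Z_]{2})"
    r"_(?P<baseline>[A-Z0-9_]{3})"
)

def parse_frame_fields(name):
    """Return the frame-related fields of a product name, or None if it is not a frame."""
    match = FRAME_NAME.search(name)
    return match.groupdict() if match else None

name = ("S3B_SL_2_LST____20200602T065636_20200602T065936_20200603T123253"
        "_0179_039_291_3060_LN2_O_NT_004.zip")
fields = parse_frame_fields(name)
print(fields)
```

Under this reading, the `timeliness` field would distinguish non-time-critical products ("NT") from other timeliness classes, which the four-field pattern alone does not check.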

In [9]:
# Filter for a specific structure of the instance ID (part of the name):
# four numeric fields with 4, 3, 3, and 4 digits, e.g. "0179_039_291_3060"
products_nonTimeCritical = {}

pattern = r"\d{4}_\d{3}_\d{3}_\d{4}"

# Iterate over the daytime products directly; this also works if the dictionary is empty
for product_id, product in products_daytime.items():
    name = product['Name']

    match = re.search(pattern, name)

    if match:
        products_nonTimeCritical[product_id] = product

len(products_nonTimeCritical)

105

In [10]:
# Print the number of total products, filtered products by daytime, and non-time-critical frames
print("Total Number of Products: ", len(products_df))
print("Number of Daytime Products: ", len(products_daytime))
print("Number of Non-Timecritical Products: ", len(products_nonTimeCritical))

Total Number of Products:  213
Number of Daytime Products:  105
Number of Non-Timecritical Products:  105


### Download Data

In [15]:
# Make sure the output directory exists, then list the files already in it
os.makedirs(output_dir, exist_ok=True)
existing_files = os.listdir(output_dir)
print(existing_files)

# Iterate over the non-timecritical frames
download_frames = {}

for product_id, product_file in products_nonTimeCritical.items():
    # Extract the title of the product and strip a trailing ".SEN3", if present
    title = product_file['Name']
    base_title = title[:-len(".SEN3")] if title.endswith(".SEN3") else title

    # File names of the downloaded archive and of the unpacked product folder
    zip_title = base_title + ".zip"
    sen_title = base_title + ".SEN3"

    # Skip products whose zip or SEN3 file already exists
    if zip_title not in existing_files and sen_title not in existing_files:
        download_frames[product_id] = product_file

# Print the number of non-time-critical frames that need to be downloaded
print('Non-Timecritical Frames to download:', len(download_frames))

[]
Non-Timecritical Frames to download: 105


In [16]:
# Prompt the user for input to confirm the download
user_input = input("Do you want to download the data? (yes/no): ")

if user_input.strip().lower() == "yes":

    access_token = get_access_token(username, password)
    headers = {"Authorization": f"Bearer {access_token}"}
    session = requests.Session()
    max_attempts = 5

    # Iterate over the Ids and records of the products selected for download
    for product, product_file in download_frames.items():
        title = product_file['Name']
        file_name = title.replace('SEN3', 'zip')
        url = f"https://zipper.dataspace.copernicus.eu/odata/v1/Products({product})/$value"

        for attempt in range(1, max_attempts + 1):
            print(f"\n(Attempt {attempt}/{max_attempts}) Downloading '{file_name}' ...")
            try:
                # Issue a fresh request on every attempt; a streamed response
                # cannot be read again after it has been interrupted
                response = session.get(url, headers=headers, stream=True)

                # Check if the access token is still valid
                if response.status_code == 401:
                    # Token expired, generate a new one and repeat the request
                    access_token = get_access_token(username, password)
                    headers = {"Authorization": f"Bearer {access_token}"}
                    response = session.get(url, headers=headers, stream=True)

                if response.status_code == 200:
                    with open(os.path.join(output_dir, file_name), "wb") as file:
                        for chunk in response.iter_content(chunk_size=8192):
                            file.write(chunk)
                    print(f"Successfully downloaded '{file_name}' to '{output_dir}'.")
                    break  # Stop retrying once the download succeeds
                else:
                    print(f"Error: Received status code {response.status_code} when attempting to download '{file_name}'.")
            except ChunkedEncodingError as e:
                print(f"Warning: Download interrupted due to '{e}'. Retrying...")
        else:
            # The loop ended without a break, so every attempt failed
            print(f"Failed to download '{file_name}' after {max_attempts} attempts.")

elif user_input.strip().lower() == "no":
    print("Download cancelled.")

else:
    print("Invalid input. Download cancelled.")



(Attempt 1/5) Downloading 'S3B_SL_2_LST____20200602T065636_20200602T065936_20200603T123253_0179_039_291_3060_LN2_O_NT_004.zip' ...
Successfully downloaded 'S3B_SL_2_LST____20200602T065636_20200602T065936_20200603T123253_0179_039_291_3060_LN2_O_NT_004.zip' to './Download/S3_LST'.

(Attempt 1/5) Downloading 'S3B_SL_2_LST____20200602T065336_20200602T065636_20200603T123245_0179_039_291_2880_LN2_O_NT_004.zip' ...


KeyboardInterrupt: 