# Tutorial 3.2 Preprocessing the Sentinel-3 OLCI Level 2 Full Resolution Data (View-only)

This notebook is **Step 2** of the *Predicting Chla from Sentinel-3 OLCI at Chesapeake Bay Area Tutorial Series*.

> ⚠️ **Note (Must Read):** Please do not run this notebook during the workshop. Downloading data is **not recommended** during the workshop because OLCI products are very large. You are welcome to try the download at home if you have enough storage. In Tutorial 3.4 you will use **pre-extracted image patches** that are already matched with CBP in-situ data and ready for model training.

### 3.2.1 Introduction

In this section, we download Sentinel-3 OLCI L2 water color data to match the in-situ Chla measurements from CBP.

We use the [WEkEO Sentinel-3 Ocean Color tutorial](https://github.com/wekeo/learn-olci/tree/main/2_OLCI_advanced) as a reference for downloading, exploring, and processing OLCI data.

To access Sentinel-3 data via API, we use the [`eumdac`](https://pypi.org/project/eumdac/) Python client, which requires a valid EUMETSAT account and credentials.

For instructions on setting up the API and retrieving credentials, also refer to this official tutorial:

📎 [OLCI Data Access — WEkEO GitHub Tutorial](https://github.com/wekeo/learn-olci/blob/f4b96389bd69dd5fcf08eaebe07ffd91df92e50a/1_OLCI_introductory/1_1a_OLCI_data_access_Data_Store.ipynb)


This script:
- Sets up the EUMDAC API
- Loads the in-situ sampling dates from `averaged_layer_S.csv`
- Searches for OLCI L2 products within ±2 days of each in-situ sampling date
- Downloads and unzips the products into a local `products/` folder

🛰️ The dataset used: **EO:EUM:DAT:0407 (Sentinel-3 OLCI L2 Water Colour Full Resolution)**
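The ±2 day search can be pictured as trying the sampling date first and then moving outward one day at a time. The following is a minimal sketch of that ordering using only the standard library. The helper names `candidate_dates` and `day_window` are hypothetical and are not part of `eumdac`.

```python
import datetime


def candidate_dates(sample_date, max_offset=2):
    """Yield the sampling date first, then dates at -1, +1, -2, +2 ... days."""
    yield sample_date
    for delta in range(1, max_offset + 1):
        yield sample_date - datetime.timedelta(days=delta)
        yield sample_date + datetime.timedelta(days=delta)


def day_window(day):
    """Return the (start, end) datetimes that cover one calendar day."""
    start = datetime.datetime.combine(day, datetime.time.min)
    end = datetime.datetime.combine(day, datetime.time.max)
    return start, end


# Illustrative sampling date (not taken from the CBP data).
sample = datetime.date(2020, 3, 1)
order = list(candidate_dates(sample))
for day in order:
    print(day, *day_window(day))
```

Trying the closest dates first means that the first non-empty search result is also the one nearest in time to the in-situ measurement.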


### 3.2.2 Setup & Load CBP Dates

In [None]:
import os
import shutil
import zipfile
import datetime
from pathlib import Path
from shapely import geometry
import pandas as pd
import eumdac

In [None]:
# # === Load unique CBP sampling dates ===
# cbp_df = pd.read_csv("CleanedData/averaged_layer_S.csv", parse_dates=["SampleDate"])
# unique_dates = sorted(cbp_df["SampleDate"].dt.date.unique())
# print(f"🗓️  Loaded {len(unique_dates)} unique CBP sample dates.")
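To see why the dates are reduced to unique calendar days, here is a small sketch on an in-memory table. Only the `SampleDate` column name comes from the cell above. The `Station` and `Chla` columns and all values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for averaged_layer_S.csv:
# two stations sampled on the same day, plus one later visit.
csv_text = """Station,SampleDate,Chla
A,2020-06-01,12.3
B,2020-06-01,8.7
A,2020-06-15,15.1
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["SampleDate"])

# Several rows can share a day, but one satellite search per day is enough.
unique_dates = sorted(df["SampleDate"].dt.date.unique())
print(f"{len(df)} rows, {len(unique_dates)} unique dates: {unique_dates}")
```

Collapsing to unique days keeps the number of Data Store searches equal to the number of sampling days rather than the number of samples.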


### 3.2.3 Setup EUMDAC Token

In [None]:
# # === Setup EUMDAC authentication ===
# credentials_file = os.path.join(os.path.expanduser("~"), '.eumdac', 'credentials')
# creds = Path(credentials_file).read_text().strip().split(',')  # strip the trailing newline before splitting
# token = eumdac.AccessToken((creds[0], creds[1]))
# store = eumdac.DataStore(token)

# # === Load Sentinel-3 OLCI L2 water colour product collection ===
# collection = store.get_collection("EO:EUM:DAT:0407")
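The authentication cell above assumes that the credentials file holds the consumer key and secret on one line, separated by a comma. A minimal sketch of parsing that layout defensively is shown below. `parse_credentials` is a hypothetical helper, not an `eumdac` function, and the key and secret strings are placeholders.

```python
def parse_credentials(text):
    """Split 'key,secret' text into a (key, secret) tuple, ignoring stray whitespace."""
    parts = [part.strip() for part in text.strip().split(",")]
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected exactly two comma-separated values: key,secret")
    return parts[0], parts[1]


# Placeholder values; a trailing newline is common when the file is edited by hand.
key, secret = parse_credentials("my-consumer-key, my-consumer-secret\n")
print(key, secret)
```

The resulting pair could then be passed to `eumdac.AccessToken` in the same way as `creds[0]` and `creds[1]` in the cell above.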


### 3.2.4 Define ROI and Output Folder

In [None]:
# # === Define Chesapeake Bay ROI ===
# north, south = 39.49, 36.92
# east, west = -75.95, -76.52
# ROI_WKT = geometry.Polygon([
#     (west, south), (east, south), (east, north), (west, north), (west, south)
# ]).wkt  # convert to a WKT string, as the variable name implies

# # === Create download folder ===
# download_dir = "products"
# os.makedirs(download_dir, exist_ok=True)
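For readers who want to see what the region of interest looks like as text, the sketch below builds the same bounding box as a WKT polygon string without any third-party package. `bbox_to_wkt` is a hypothetical helper written for this illustration. The coordinates are the ones defined in the cell above.

```python
def bbox_to_wkt(west, south, east, north):
    """Return a closed WKT polygon (lon lat order) for a bounding box."""
    if west >= east or south >= north:
        raise ValueError("expected west < east and south < north")
    corners = [
        (west, south), (east, south), (east, north), (west, north), (west, south)
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON (({ring}))"


roi = bbox_to_wkt(west=-76.52, south=36.92, east=-75.95, north=39.49)
print(roi)
```

Note that the ring repeats its first corner at the end, which is what makes the polygon closed.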


### 3.2.5 Downloading Products Within ±2 Days

In [None]:
# # === Check if product already downloaded for a given date ===
# def is_downloaded(date):
#     return any(date.strftime("%Y%m%d") in fname for fname in os.listdir(download_dir))

# # === Search products within ±2 days of the date ===
# def find_products_around(date):
#     for delta in range(0, 3):
#         for offset in ([0] if delta == 0 else [-delta, delta]):  # avoid searching day 0 twice
#             try_date = date + datetime.timedelta(days=offset)
#             dtstart = datetime.datetime(try_date.year, try_date.month, try_date.day, 0, 0)
#             dtend = datetime.datetime(try_date.year, try_date.month, try_date.day, 23, 59)
#             products = collection.search(
#                 geo=ROI_WKT,
#                 dtstart=dtstart,
#                 dtend=dtend,
#                 timeliness="NT",
#                 sat="Sentinel-3A"
#             )
#             dedup = {}
#             for p in products:
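#                 # In "S3A_OL_2_WFR____<start>_<end>_..." the padding underscores produce
#                 # empty fields at indices 4-6, so the sensing start time is index 7.
#                 tag = str(p).split("_")[7]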
#                 if tag not in dedup:
#                     dedup[tag] = p
#             if dedup:
#                 return list(dedup.values()), try_date
#     return [], None

# # === Download Sentinel-3 OLCI products for each date ===
# for date in unique_dates:
#     if is_downloaded(date):
#         print(f"✅ Already downloaded: {date}")
#         continue

#     print(f"\n📅 Looking for products around {date}")
#     products, actual_date = find_products_around(date)

#     if not products:
#         print(f"❌ No valid product found ±2 days around {date}")
#         continue

#     print(f"🎯 Found {len(products)} granules from {actual_date}")
#     for p in products:
#         prod_id = str(p)  # product ID, without relying on the private _id attribute
#         out_path = os.path.join(download_dir, prod_id)
#         os.makedirs(out_path, exist_ok=True)

#         zip_file = os.path.join(out_path, f"{prod_id}.zip")
#         if os.path.exists(zip_file.replace(".zip", "")):
#             print(f"   ⏭️ Already exists: {prod_id}")
#             continue

#         print(f"⬇️ Downloading {prod_id}.zip")
#         with p.open() as src, open(zip_file, "wb") as dst:
#             shutil.copyfileobj(src, dst)

#         print(f"📦 Unzipping {prod_id}")
#         with zipfile.ZipFile(zip_file, 'r') as zf:
#             zf.extractall(out_path)
#         os.remove(zip_file)

#         print(f"✅ Done: {prod_id}")
