# Satellite data: metadata & MinIO downloads

This notebook demonstrates two ways of working with satellite data:

1. **From database via API**  
   - Query satellite metadata (location, sensing_start, sensing_end).  
   - Use the `location` paths to download GeoTIFFs from MinIO.

2. **Directly from MinIO using generated paths**  
   - Build the MinIO prefix from **region + datetime**.  
   - List objects under that prefix, optionally filter by a **quality flag** (e.g. `-NA-`),  
     and download the selected GeoTIFFs.


In [18]:
# 1. Imports and configuration

from datetime import datetime, timedelta, timezone
from pathlib import Path
import os
import sys

import pandas as pd
from dotenv import load_dotenv

# 1.1 Load ../config/.env 
load_dotenv(dotenv_path="../config/.env")

# 1.2 Make project root importable (so we can import config + utils)
PROJECT_ROOT = Path("..").resolve()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

from config.loader import get_settings
from utils.radar_sat_client import SatelliteClient

# 1.3 Load Pydantic settings (includes MinIO and satellite file paths)
settings = get_settings()

API_BASE = os.getenv("HR_LOCAL_DEV_BASE", "http://localhost:8030")
API_TOKEN = os.getenv("HR_LOCAL_DEV_TOKEN", "")

# 1.4 Create SatelliteClient using the dev API base + RBAC token
sat_client = SatelliteClient(
    api_base_url=API_BASE,
    api_token=API_TOKEN,
)

print("=== API / MinIO configuration ===")
print("API base:       ", API_BASE)
print("Auth token set: ", bool(API_TOKEN))
print("MinIO endpoint: ", settings.sat.client.endpoint)
print("MinIO bucket:   ", settings.sat.file_paths.raw.bucket_name)
print("Target prefix:  ", settings.sat.file_paths.raw.target)


=== API / MinIO configuration ===
API base:        http://localhost:8030
Auth token set:  True
MinIO endpoint:  minio-api.bo-i-t.selfhost.eu
MinIO bucket:    heavyrain
Target prefix:   satellite


## 1. Metadata from API (DB → locations)

In this section we:

1. Query the `/satellite` endpoint of the heavyrain data API.  
2. Restrict to a **time window** and **region** (`NRW` or `BOO`).  
3. Convert the result into a pandas DataFrame for inspection.


In [19]:
# 2. Query satellite metadata from the API

now_utc = datetime.now(timezone.utc)

# Example: last 7 days
from_ts = now_utc - timedelta(days=7)
to_ts = now_utc

rows = sat_client.list_satellite_metadata(
    from_ts=from_ts,
    to_ts=to_ts,
    region="NRW",       # use "BOO" for the Bochum region
    limit=100,          # cap on returned rows; a count equal to the limit may mean more rows exist
)

print("Number of metadata rows returned:", len(rows))

# Convert to DataFrame for easier handling
df_meta = pd.DataFrame(
    {
        "location": [r.location for r in rows],
        "sensing_start": [r.sensing_start for r in rows],
        "sensing_end": [r.sensing_end for r in rows],
    }
)

df_meta.head()


Number of metadata rows returned: 100


|   | location | sensing_start | sensing_end |
|---|----------|---------------|-------------|
| 0 | satellite/NRW/2025/December/05/MSG4-SEVI-MSG15... | 2025-12-05 10:54:17 | 2025-12-05 10:59:17 |
| 1 | satellite/NRW/2025/December/05/MSG4-SEVI-MSG15... | 2025-12-05 10:49:17 | 2025-12-05 10:54:17 |
| 2 | satellite/NRW/2025/December/05/MSG4-SEVI-MSG15... | 2025-12-05 10:49:17 | 2025-12-05 10:54:17 |
| 3 | satellite/NRW/2025/December/05/MSG4-SEVI-MSG15... | 2025-12-05 10:44:18 | 2025-12-05 10:49:18 |
| 4 | satellite/NRW/2025/December/05/MSG4-SEVI-MSG15... | 2025-12-05 10:44:18 | 2025-12-05 10:49:18 |


## 2. Download via locations from DB (API → MinIO)

Now we use the `location` column from the metadata:

1. Select a subset of locations (e.g. first 5 rows).  
2. Use `SatelliteClient.download_objects(...)` to fetch the corresponding GeoTIFF
   files from MinIO into a local *downloads* directory.


In [20]:
# 3. Download GeoTIFFs for the selected locations

# Choose which locations to download (example: first 5 rows)
locations_to_download = df_meta["location"].head(5).tolist()

download_dir_cell1 = Path("../notebooks/downloads_cell1_satellite")
download_dir_cell1.mkdir(parents=True, exist_ok=True)

downloaded_files_cell1 = sat_client.download_objects(
    locations=locations_to_download,
    destination_dir=download_dir_cell1,
)

print("Downloaded files (cell #1):")
for p in downloaded_files_cell1:
    print(" -", p)


Downloaded files (cell #1):
 - ..\notebooks\downloads_cell1_satellite\MSG4-SEVI-MSG15-0100-NA-20251205105917.252000000Z-NA.tif
 - ..\notebooks\downloads_cell1_satellite\MSG4-SEVI-MSG15-0100-NA-20251205105417.516000000Z-NA.tif
 - ..\notebooks\downloads_cell1_satellite\MSG4-SEVI-MSG15-0100-NA-20251205105417.516000000Z-NA.tif
 - ..\notebooks\downloads_cell1_satellite\MSG4-SEVI-MSG15-0100-NA-20251205104918.986000000Z-NA.tif
 - ..\notebooks\downloads_cell1_satellite\MSG4-SEVI-MSG15-0100-NA-20251205104918.986000000Z-NA.tif
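
The listing above contains the same filename more than once, which suggests that several metadata rows point at the same object. A minimal sketch, assuming such repeats are unwanted, that removes duplicate `location` values with pandas before downloading. The DataFrame and its paths are hypothetical stand-ins for `df_meta`:

```python
import pandas as pd

# Hypothetical stand-in for df_meta: two rows share the same location.
df_example = pd.DataFrame(
    {
        "location": [
            "satellite/NRW/2025/December/05/a.tif",
            "satellite/NRW/2025/December/05/b.tif",
            "satellite/NRW/2025/December/05/b.tif",
        ]
    }
)

# drop_duplicates keeps the first occurrence and preserves the row order.
unique_locations = df_example["location"].drop_duplicates().tolist()
print(unique_locations)
```

The resulting list could then be passed as `locations=` to `download_objects(...)` in place of the raw column values.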


## 3. Download directly from MinIO via generated path

In this section we **skip the API** and work directly against MinIO:

1. Build a prefix of the form  
   `satellite/<REGION>/<YYYY>/<MonthName>/<dd>/`  
   from a given **region** and **datetime**.
2. List all objects stored under that prefix.
3. Optionally filter by a **quality flag** in the filename.
4. Download the filtered objects into a second downloads folder.
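
For orientation, the prefix layout from step 1 can be sketched in a few lines. This is not the implementation of `build_prefix_for_datetime`. The helper name `build_prefix`, the `root` parameter, and the choice to interpret timestamps in UTC are all assumptions made for illustration:

```python
from datetime import datetime, timezone

# English month names are spelled out so the result does not depend on the locale.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def build_prefix(region: str, ts: datetime, root: str = "satellite") -> str:
    """Return '<root>/<REGION>/<YYYY>/<MonthName>/<dd>/' (hypothetical helper)."""
    if ts.tzinfo is None:
        # Assumption: naive timestamps are meant as UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    ts = ts.astimezone(timezone.utc)
    return f"{root}/{region}/{ts.year:04d}/{MONTH_NAMES[ts.month - 1]}/{ts.day:02d}/"


example_prefix = build_prefix("NRW", datetime(2025, 12, 1, tzinfo=timezone.utc))
print(example_prefix)  # satellite/NRW/2025/December/01/
```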


In [21]:
# 4. Choose region + example date (datetime and timezone were imported in the first cell)
example_region = "NRW"                     # or "BOO"
example_ts = datetime(2025, 12, 1, 0, 0, 0, tzinfo=timezone.utc)

# 4.1 Build the MinIO prefix:
#     satellite/<REGION>/<YYYY>/<MonthName>/<dd>/
prefix = sat_client.build_prefix_for_datetime(
    region=example_region,
    ts=example_ts,
)
print("Generated MinIO prefix:", prefix)

# 4.2 List all objects under that prefix
objects = sat_client.list_objects_for_datetime(
    region=example_region,
    ts=example_ts,
)
print(f"Found {len(objects)} objects under prefix '{prefix}'")

# 4.3 Optional: keep only objects whose filename contains "-<quality_flag>-"
quality_flag = "NA"
objects_filtered = [
    obj for obj in objects
    if f"-{quality_flag}-" in obj.object_name
]

print(f"Objects after quality-flag filter ('{quality_flag}'): {len(objects_filtered)}")
for obj in objects_filtered[:5]:
    print(" -", obj.object_name)


Generated MinIO prefix: satellite/NRW/2025/December/01/
Found 15 objects under prefix 'satellite/NRW/2025/December/01/'
Objects after quality-flag filter ('NA'): 15
 - satellite/NRW/2025/December/01/MSG4-SEVI-MSG15-0100-NA-20251201111913.970000000Z-NA.tif
 - satellite/NRW/2025/December/01/MSG4-SEVI-MSG15-0100-NA-20251201112415.523000000Z-NA.tif
 - satellite/NRW/2025/December/01/MSG4-SEVI-MSG15-0100-NA-20251201112917.077000000Z-NA.tif
 - satellite/NRW/2025/December/01/MSG4-SEVI-MSG15-0100-NA-20251201113418.630000000Z-NA.tif
 - satellite/NRW/2025/December/01/MSG4-SEVI-MSG15-0100-NA-20251201113918.374000000Z-NA.tif
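
Instead of a substring match, the flag and timestamp can also be pulled out of the object name. The sketch below assumes the seven-field, dash-separated layout seen in the sample output above, with the flag in the fifth field and a `YYYYmmddHHMMSS` timestamp at the start of the sixth. That layout is inferred from the listing, not from a specification, and `parse_object_name` is a hypothetical helper:

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def parse_object_name(object_name: str):
    """Return (flag, timestamp) parsed from an object key (hypothetical helper)."""
    # Drop the folder part and the ".tif" suffix, then split the remaining fields.
    parts = PurePosixPath(object_name).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"Unexpected filename layout: {object_name}")
    flag = parts[4]
    # Only the first 14 digits (down to seconds) are parsed; the trailing "Z" marks UTC.
    ts = datetime.strptime(parts[5][:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return flag, ts


flag, ts = parse_object_name(
    "satellite/NRW/2025/December/01/"
    "MSG4-SEVI-MSG15-0100-NA-20251201111913.970000000Z-NA.tif"
)
print(flag, ts.isoformat())
```

In the sample output of the earlier sections, the timestamp in the filename appears to line up with `sensing_end`, but that is an observation from five rows, not a guarantee.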


In [22]:
# 4.4 Download the filtered GeoTIFFs from MinIO

download_dir_cell2 = Path("../notebooks/downloads_cell2_satellite")
download_dir_cell2.mkdir(parents=True, exist_ok=True)

downloaded_files_cell2 = []

for obj in objects_filtered:
    filename = Path(obj.object_name).name
    dest_path = download_dir_cell2 / filename

    # Re-use the MinIO client and bucket held by SatelliteClient.
    # Note: _minio and _bucket are private attributes and may change without notice.
    sat_client._minio.fget_object(
        bucket_name=sat_client._bucket,
        object_name=obj.object_name,
        file_path=str(dest_path),
    )
    downloaded_files_cell2.append(dest_path)

print("Downloaded files (cell #2):")
for p in downloaded_files_cell2[:5]:
    print(" -", p)

print(f"Total files downloaded in cell #2: {len(downloaded_files_cell2)}")


Downloaded files (cell #2):
 - ..\notebooks\downloads_cell2_satellite\MSG4-SEVI-MSG15-0100-NA-20251201111913.970000000Z-NA.tif
 - ..\notebooks\downloads_cell2_satellite\MSG4-SEVI-MSG15-0100-NA-20251201112415.523000000Z-NA.tif
 - ..\notebooks\downloads_cell2_satellite\MSG4-SEVI-MSG15-0100-NA-20251201112917.077000000Z-NA.tif
 - ..\notebooks\downloads_cell2_satellite\MSG4-SEVI-MSG15-0100-NA-20251201113418.630000000Z-NA.tif
 - ..\notebooks\downloads_cell2_satellite\MSG4-SEVI-MSG15-0100-NA-20251201113918.374000000Z-NA.tif
Total files downloaded in cell #2: 15
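
Re-running the download cell fetches every file again. A minimal sketch, assuming that an existing local file with the same name is complete and can be skipped. `download_missing` is a hypothetical helper that accepts any client exposing `fget_object(bucket_name, object_name, file_path)`, the same call used in the cell above. `FakeStore` only exists so that the example runs without a MinIO server:

```python
import tempfile
from pathlib import Path, PurePosixPath


def download_missing(client, bucket, object_names, dest_dir):
    """Download only objects that are not yet present locally (hypothetical helper)."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in object_names:
        target = dest_dir / PurePosixPath(name).name
        if target.exists():
            continue  # assumption: an existing file is complete
        client.fget_object(bucket_name=bucket, object_name=name, file_path=str(target))
        fetched.append(target)
    return fetched


class FakeStore:
    """Stand-in for a MinIO client; writes an empty file instead of downloading."""

    def fget_object(self, bucket_name, object_name, file_path):
        Path(file_path).write_bytes(b"")


with tempfile.TemporaryDirectory() as tmp:
    names = [
        "satellite/NRW/2025/December/01/a.tif",
        "satellite/NRW/2025/December/01/b.tif",
    ]
    first = download_missing(FakeStore(), "heavyrain", names, tmp)
    second = download_missing(FakeStore(), "heavyrain", names, tmp)

print(len(first), len(second))  # 2 0
```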


## 4. Summary

In this notebook we:

1. **Used the heavyrain data API** to read satellite metadata from the database  
   and downloaded GeoTIFFs from MinIO using the `location` paths (sections 1 and 2).

2. **Accessed MinIO directly** by generating the satellite folder structure from  
   `region + datetime`, listed available objects, filtered by a quality flag,  
   and downloaded the resulting GeoTIFFs (section 3).

Together, the two approaches cover both workflows:

- Working “database → MinIO” via the API.  
- Working “MinIO only” by rebuilding the object prefix from region and date, then filtering by quality flag.
