<a href="https://colab.research.google.com/github/BahneTP/spatiotemporal-mining-medsea/blob/main/spatiotemporal_mining_medsea_acquisition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Acquisition

This notebook downloads a Mediterranean Sea subset of the **[Global Ocean Physics Reanalysis](https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_PHY_001_030/download?dataset=cmems_mod_glo_phy_my_0.083deg_P1M-m_202311)** dataset from **Copernicus**.

For further work, see:
- [Exploratory Data Analysis](./eda.ipynb)  
- [Data Mining](./mining.ipynb)


In [3]:
# !pip install copernicusmarine
# !pip install zarr fsspec
# !pip install "xarray>=2024.1.0"

In [4]:
import sys
!{sys.executable} -m pip install "numpy<2.0"



In [5]:
import xarray as xr
import copernicusmarine
import os

parent = os.path.dirname(os.getcwd())
path = os.path.join(parent, "data")
os.makedirs(path, exist_ok=True)

In [4]:
# Monthly means: salinity (so) and potential temperature (thetao) for all depth levels from the surface down to about 1062 m.

output_file = os.path.join(path, "medsea.nc")
ds = copernicusmarine.subset(
    dataset_id="cmems_mod_glo_phy_my_0.083deg_P1M-m",
    variables=["so", "thetao"],
    minimum_longitude=-6.285859234924248,
    maximum_longitude=36.52446704416333,
    minimum_latitude=29.252430574547926,
    maximum_latitude=46.2175134343721,
    start_datetime="1993-01-01T00:00:00",
    end_datetime="2021-06-01T00:00:00",
    minimum_depth=0.49402499198913574,
    maximum_depth=1062.43994140625,
    output_filename=output_file
)

INFO - 2025-06-09T13:03:06Z - Downloading Copernicus Marine data requires a Copernicus Marine username and password, sign up for free at: https://data.marine.copernicus.eu/register


Copernicus Marine username:

  bthielpeters


Copernicus Marine password:

  ········


INFO - 2025-06-09T13:03:19Z - Selected dataset version: "202311"
INFO - 2025-06-09T13:03:19Z - Selected dataset part: "default"
INFO - 2025-06-09T13:03:21Z - Starting download. Please wait...


  0%|          | 0/288 [00:00<?, ?it/s]

INFO - 2025-06-09T13:12:17Z - Successfully downloaded to /home/jovyan/spatiotemporal-mining-medsea/data/medsea_(1).nc
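
The download above prompts interactively for a username and password. A minimal sketch of a pre-check for non-interactive runs follows. It assumes the toolbox reads credentials from the environment variables `COPERNICUSMARINE_SERVICE_USERNAME` and `COPERNICUSMARINE_SERVICE_PASSWORD`; please verify this against the documentation of the installed `copernicusmarine` version. The helper name `missing_credentials` is hypothetical and not part of the toolbox.

```python
import os

# Assumed variable names; check the documentation of the installed toolbox version.
CREDENTIAL_VARS = (
    "COPERNICUSMARINE_SERVICE_USERNAME",
    "COPERNICUSMARINE_SERVICE_PASSWORD",
)


def missing_credentials(env=None):
    """Return the names of credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in CREDENTIAL_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Not set, so the toolbox will likely prompt:", ", ".join(missing))
else:
    print("Credentials found in the environment.")
```

Setting the variables outside the notebook (for example in the shell that starts Jupyter) also keeps the username out of the saved cell output.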


In [4]:
# Daily means: potential temperature (thetao) only, at a single depth level (about 47 m).

output_file = os.path.join(path, "medsea_daily.nc")
ds = copernicusmarine.subset(
    dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m",
    variables=["thetao"],
    minimum_longitude=-6.285859234924248,
    maximum_longitude=36.52446704416333,
    minimum_latitude=29.252430574547926,
    maximum_latitude=46.2175134343721,
    start_datetime="1993-01-01T00:00:00",
    end_datetime="2021-06-01T00:00:00",
    minimum_depth=47.37369155883789,
    maximum_depth=47.37369155883789,
    output_filename=output_file
)

INFO - 2025-06-13T10:35:55Z - Downloading Copernicus Marine data requires a Copernicus Marine username and password, sign up for free at: https://data.marine.copernicus.eu/register


Copernicus Marine username:

  bthielpeters


Copernicus Marine password:

  ········


INFO - 2025-06-13T10:36:03Z - Selected dataset version: "202311"
INFO - 2025-06-13T10:36:03Z - Selected dataset part: "default"
INFO - 2025-06-13T10:36:05Z - Starting download. Please wait...


  0%|          | 0/4290 [00:00<?, ?it/s]

INFO - 2025-06-13T10:50:58Z - Successfully downloaded to /home/jovyan/spatiotemporal-mining-medsea/data/medsea_daily.nc
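
The monthly and daily requests above share the same bounding box and time range. A minimal sketch of collecting those shared arguments in one place follows. `MEDSEA_REGION` and `subset_kwargs` are hypothetical names introduced for illustration; the values are copied from the cells above. The resulting dictionary is meant to be unpacked into `copernicusmarine.subset(**kwargs)`, which the sketch itself does not call.

```python
import os

# Shared bounding box and time range, copied from the requests above.
MEDSEA_REGION = {
    "minimum_longitude": -6.285859234924248,
    "maximum_longitude": 36.52446704416333,
    "minimum_latitude": 29.252430574547926,
    "maximum_latitude": 46.2175134343721,
    "start_datetime": "1993-01-01T00:00:00",
    "end_datetime": "2021-06-01T00:00:00",
}


def subset_kwargs(dataset_id, variables, minimum_depth, maximum_depth, output_filename):
    """Build the keyword arguments for one subset request (hypothetical helper)."""
    return {
        **MEDSEA_REGION,
        "dataset_id": dataset_id,
        "variables": list(variables),
        "minimum_depth": minimum_depth,
        "maximum_depth": maximum_depth,
        "output_filename": output_filename,
    }


daily = subset_kwargs(
    dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m",
    variables=["thetao"],
    minimum_depth=47.37369155883789,
    maximum_depth=47.37369155883789,
    output_filename=os.path.join("data", "medsea_daily.nc"),
)

# Intended use in the notebook: copernicusmarine.subset(**daily)
print(sorted(daily))
```

Keeping the region in a single dictionary means a change to the bounding box or time range only has to be made once.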


In [None]:
import xarray as xr
import copernicusmarine
import os

depths = [
    47.37369155883789,
    318.1274108886719,
    1062.43994140625
]

for depth in depths:
    output_file = os.path.join(path, f"medsea_daily_depth_{int(depth)}.nc")
    ds = copernicusmarine.subset(
        dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m",
        variables=["thetao"],
        minimum_longitude=-6.285859234924248,
        maximum_longitude=36.52446704416333,
        minimum_latitude=29.252430574547926,
        maximum_latitude=46.2175134343721,
        start_datetime="1993-01-01T00:00:00",
        end_datetime="2021-06-01T00:00:00",
        minimum_depth=depth,
        maximum_depth=depth,
        output_filename=output_file
    )


# Load the files and set "depth" manually
datasets = []
for depth in depths:
    file_path = os.path.join(path, f"medsea_daily_depth_{int(depth)}.nc")
    ds = xr.open_dataset(file_path)

    # Set the depth explicitly in case it is not unambiguous in the file
    ds = ds.assign_coords(depth=depth)

    # Optional: keep only the 'thetao' variable
    ds = ds[["thetao"]]

    datasets.append(ds)

# Merge into a single dataset (stacked along depth)
combined = xr.concat(datasets, dim="depth")

# Sort by depth if needed (not strictly necessary)
combined = combined.sortby("depth")

# Save
output_combined = os.path.join(path, "medsea_combined_daily.nc")
combined.to_netcdf(output_combined)
print(f"Combined dataset saved to: {output_combined}")


INFO - 2025-06-17T19:47:49Z - Downloading Copernicus Marine data requires a Copernicus Marine username and password, sign up for free at: https://data.marine.copernicus.eu/register


Copernicus Marine username:

  bthielpeters


Copernicus Marine password:

  ········


INFO - 2025-06-17T19:47:58Z - Selected dataset version: "202311"
INFO - 2025-06-17T19:47:58Z - Selected dataset part: "default"
INFO - 2025-06-17T19:48:00Z - Starting download. Please wait...


  0%|          | 0/4290 [00:00<?, ?it/s]

INFO - 2025-06-17T19:58:03Z - Successfully downloaded to /home/jovyan/spatiotemporal-mining-medsea/data/medsea_daily_depth_47.nc
INFO - 2025-06-17T19:58:03Z - Downloading Copernicus Marine data requires a Copernicus Marine username and password, sign up for free at: https://data.marine.copernicus.eu/register


Copernicus Marine username:

  bthielpeters


Copernicus Marine password:

  ········


INFO - 2025-06-17T20:01:08Z - Selected dataset version: "202311"
INFO - 2025-06-17T20:01:08Z - Selected dataset part: "default"
INFO - 2025-06-17T20:01:09Z - Starting download. Please wait...


  0%|          | 0/4290 [00:00<?, ?it/s]
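
The per-depth loop requests every file again on each run. The log of the monthly download reports `medsea_(1).nc` rather than the requested `medsea.nc`, which suggests that a file with that name already existed and the toolbox chose a new name. If the same happens for the per-depth files, the loading step would open the older file. A minimal sketch of checking for existing files first follows; `depth_filename` and `pending_downloads` are hypothetical helper names, and the sketch only prints what it would download.

```python
import os

depths = [47.37369155883789, 318.1274108886719, 1062.43994140625]


def depth_filename(depth):
    """File name used in the loop above: the depth truncated to a whole number."""
    return f"medsea_daily_depth_{int(depth)}.nc"


def pending_downloads(depths, directory):
    """Return (depth, path) pairs whose target file does not exist yet."""
    pending = []
    for depth in depths:
        target = os.path.join(directory, depth_filename(depth))
        if not os.path.exists(target):
            pending.append((depth, target))
    return pending


# Same data directory convention as the setup cell of this notebook.
data_dir = os.path.join(os.path.dirname(os.getcwd()), "data")
for depth, target in pending_downloads(depths, data_dir):
    # Here the notebook would call copernicusmarine.subset(...) for this depth.
    print(f"would download depth {depth} to {target}")
```

Skipping files that are already present also avoids repeating a download that takes roughly ten minutes per depth level in the logs above.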