# Step 2: Pre-processing - Calculating the Yearly Mean 📅

The goal is to calculate the **yearly mean** of the variables from the **Baltic Sea Wave Hindcast** dataset, which is available only at **hourly resolution**. Each day has a separate file of about **200 MB**. Downloading the entire dataset to calculate the mean would take about **657 GB** of storage: *365 days × 200 MB × (2024 - 2015) years = 657,000 MB = 657 GB*.

Since this isn't feasible, I will use a technique borrowed from survey sampling: instead of downloading the entire dataset, I will download a **random sample** of the daily files and calculate the mean from that sample. To keep it manageable, I will download **10%** of the dataset, which means **328 files** (*365 × 9 × 0.10 = 328.5, rounded down to 328*).
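The two estimates above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the 200 MB file size is the approximate figure quoted above, leap days are ignored, and the variable names are my own.

```python
# Back-of-the-envelope estimate of the full download size and the 10% sample size.
# Assumptions: ~200 MB per daily file, 365 days per year (leap days ignored).
days_per_year = 365
file_size_mb = 200
n_years = 2024 - 2015            # 2015 through 2023 inclusive = 9 years
sample_fraction = 0.10

total_files = days_per_year * n_years
total_gb = total_files * file_size_mb / 1000          # decimal gigabytes
n_sample_files = int(total_files * sample_fraction)   # round down to whole files
sample_gb = n_sample_files * file_size_mb / 1000

print(f"Full dataset: {total_files} files, about {total_gb:.0f} GB")
print(f"10% sample:   {n_sample_files} files, about {sample_gb:.1f} GB")
```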

#### 📚 Required Libraries
To carry out this process, you'll need the following libraries:
- **`pandas`**: For reading, manipulating, and analyzing the data.
- **`random`**: For generating random samples from the data.
- **`copernicusmarine`**: For interacting with the Copernicus Marine API and downloading files.
- **`xarray`** and **`netCDF4`**: For working with NetCDF files, the format used by this dataset.
- **`numpy`**, **`geopandas`**, and **`shapely`**: For building and applying the spatial mask.

### 🛠️ Steps:
1. **Generate a Random List of Dates**: Create a list of random dates to represent 10% of the total dataset.
2. **Download the Files**: Using the list of random dates, download the corresponding files from the API. 
3. **Apply the Spatial Mask**: Use a spatial filter to restrict the data to the region of interest (if necessary).
4. **Calculate the Mean**: After downloading the files, calculate the yearly mean for the variables of interest.

This method cuts the download to about a tenth of the full dataset while still giving an unbiased estimate of the mean, at the cost of some sampling error.
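To get a feeling for how close a 10% sample can come to the true mean, here is a minimal sketch on made-up numbers. The "population" below is synthetic (uniform random values standing in for 3285 daily values), so the printed figures say nothing about the real hindcast. The standard error uses the usual finite population correction for sampling without replacement.

```python
import math
import random
import statistics

# Synthetic stand-in for 3285 daily values (e.g. a daily mean wave height in metres).
# These numbers are made up for illustration; they are not from the hindcast.
rng = random.Random(42)
population = [rng.uniform(0.2, 3.0) for _ in range(3285)]

# Simple random sample without replacement: 328 of the 3285 days.
n = 328
sample = rng.sample(population, n)

sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Standard error of the mean, with the finite population correction
# sqrt(1 - n/N) that applies when sampling without replacement.
N = len(population)
fpc = math.sqrt(1 - n / N)
standard_error = sample_sd / math.sqrt(n) * fpc

print(f"Population mean: {statistics.mean(population):.3f}")
print(f"Sample mean:     {sample_mean:.3f} +/- {standard_error:.3f} (1 standard error)")
```

Keep in mind that each yearly mean rests on only about 36 sampled days, so its standard error will be noticeably larger than that of a mean over all 328 days.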

______

### 🛠️ Creating a Random List of Dates 

Let’s start by generating a random list of **328 dates**. This list will determine which files to download and will be the foundation for calculating the mean based on a representative sample of the data.

In [15]:
import pandas as pd
import random

In [4]:
# Generate the full list of dates from January 1, 2015 to December 31, 2023
start_date = "2015-01-01"
end_date = "2023-12-31"

# Generate the date range
date_range = pd.date_range(start=start_date, end=end_date, freq='D')

# Randomly sample 328 dates from this range
sampled_dates = random.sample(list(date_range), 328)

sampled_dates.sort()
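A simple random sample over the whole period does not guarantee the same number of days in every year. Since the means are calculated per year, one possible variation is to sample each year separately. The sketch below is not what this notebook does; the function name, the seed, and the use of the standard `datetime` module instead of `pandas` are my own choices.

```python
import random
from datetime import date, timedelta

def sample_dates_per_year(first_year, last_year, fraction, seed=None):
    """Draw the same fraction of days from every year (stratified by year)."""
    rng = random.Random(seed)
    sampled = []
    for year in range(first_year, last_year + 1):
        n_days = (date(year + 1, 1, 1) - date(year, 1, 1)).days  # 365 or 366
        days = [date(year, 1, 1) + timedelta(days=k) for k in range(n_days)]
        # int() rounds down, so every year contributes the same number of days
        sampled.extend(rng.sample(days, int(n_days * fraction)))
    return sorted(sampled)

stratified_dates = sample_dates_per_year(2015, 2023, fraction=0.10, seed=0)
print(len(stratified_dates), "dates, first three:", stratified_dates[:3])
```

With this design every year gets 36 days, which keeps the precision of the yearly means comparable from one year to the next.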

We can now save the random dates to a CSV file. Once the file is saved, I comment out the line below so that the sample is not overwritten.

In [14]:
# pd.Series(sampled_dates).to_csv('sampled_dates.csv', index=False)
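Instead of commenting the line out by hand, the "do not overwrite" rule can also be written into the code. The following is a minimal sketch using only the standard library: the function name, the demo file name, and the seed are hypothetical, and the CSV it writes has no header row, unlike the file produced by `pandas` above.

```python
import csv
import random
from datetime import date, timedelta
from pathlib import Path

def load_or_create_sample(csv_path, start, end, n, seed):
    """Return the saved sample if csv_path exists, otherwise draw and save one."""
    csv_path = Path(csv_path)
    if csv_path.exists():
        # Reuse the stored dates so the sample never changes between runs.
        with csv_path.open(newline="") as f:
            return [date.fromisoformat(row[0]) for row in csv.reader(f)]

    all_days = [start + timedelta(days=k) for k in range((end - start).days + 1)]
    sample = sorted(random.Random(seed).sample(all_days, n))
    with csv_path.open("w", newline="") as f:
        csv.writer(f).writerows([d.isoformat()] for d in sample)
    return sample

dates = load_or_create_sample(
    "sampled_dates_demo.csv",   # hypothetical file name, separate from sampled_dates.csv
    start=date(2015, 1, 1),
    end=date(2023, 12, 31),
    n=328,
    seed=2025,                  # arbitrary seed, only there to make the draw repeatable
)
print(len(dates), "dates loaded or created")
```

Running the cell a second time returns the dates stored in the file, whatever seed is passed.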

### 🛠️ Downloading the files

I don't want to download these 328 files by hand. Instead, I can use the API of the Copernicus Marine Data Store, which can either create a list of files or download the files one by one. You can find more information on the [Copernicus Marine API page](https://help.marine.copernicus.eu/en/articles/8286883-copernicus-marine-toolbox-api-get-original-files#h_4cb9923744).

⚠️ The downloaded files are not saved in the GitHub repository because they are too heavy (56 GB). However, the spatially filtered versions are saved.
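Each request selects one daily file through a filter pattern built from the date. The helpers below sketch that step with the standard library only, plus a check that skips dates whose file is already on disk. The helper names and the `data` folder are hypothetical; the folder layout and file naming are the ones used later in this notebook.

```python
from datetime import date
from pathlib import Path

def file_filter(day):
    """Glob pattern matching files whose name contains the date as YYYYMMDD."""
    return f"*{day:%Y%m%d}*"

def expected_local_file(root, day):
    """Local path of one daily file, assuming the <root>/<year>/<month>/ layout
    and the CMEMS_BAL_WAV_MY_<YYYYMMDD>01.nc naming used later in this notebook."""
    return Path(root) / f"{day:%Y}" / f"{day:%m}" / f"CMEMS_BAL_WAV_MY_{day:%Y%m%d}01.nc"

def dates_to_download(root, days):
    """Keep only the dates whose file is not on disk yet."""
    return [d for d in days if not expected_local_file(root, d).exists()]

days = [date(2022, 1, 7), date(2022, 1, 10)]
for d in dates_to_download("data", days):   # "data" is a placeholder folder
    print(d.isoformat(), file_filter(d))
```

If the raw files are deleted after spatial filtering, the check would have to look for the filtered file instead.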

In [3]:
import copernicusmarine as cm

In [16]:
datasetID = 'cmems_mod_bal_wav_my_PT1H-i'
sampled_dates = pd.read_csv('sampled_dates.csv', parse_dates=[0]).iloc[:, 0].tolist()

In [None]:
# Download the daily file of each sampled date, one request per date
for i, date in enumerate(sampled_dates, start=1):
    # Zero-padded date stamp (YYYYMMDD) used to match the remote file name
    stamp = f'{date.year}{date.month:02d}{date.day:02d}'
    cm.get(
        username='your_username',
        password='your_password',
        dataset_id=datasetID,
        filter=f'*{stamp}*'
    )
    print(f'Downloading {i} / {len(sampled_dates)} files for {date.year}-{date.month:02d}-{date.day:02d}')

INFO - 2025-07-29T12:56:45Z - Selected dataset version: "202411"
INFO - 2025-07-29T12:56:45Z - Selected dataset part: "default"
INFO - 2025-07-29T12:56:46Z - Listing files on remote server...
17it [00:09,  1.84it/s]
Downloading files: 100%|██████████| 1/1 [00:09<00:00,  9.63s/it]


Downloading 249 / 328 files for 2022-01-07

...

INFO - 2025-07-29T13:43:41Z - Selected dataset version: "202411"
INFO - 2025-07-29T13:43:41Z - Selected dataset part: "default"
INFO - 2025-07-29T13:43:42Z - Listing files on remote server...
17it [00:10,  1.61it/s]
Downloading files: 100%|██████████| 1/1 [00:11<00:00, 11.22s/it]

Downloading 328 / 328 files for 2023-12-28





### 🛠️ Spatial filtering

I filtered the files using the same method as in the `02_Spatial_filtering` notebook. After filtering, I delete each downloaded file to save disk space on my computer.
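For readers who have not seen that notebook, the masking idea can be illustrated with plain NumPy. This is a toy sketch, not the notebook's method: the coordinates and values are synthetic, and a simple rectangle stands in for the real polygon test. It mimics roughly what `ds.where(mask_da, drop=True)` does in `xarray`: values outside the region become NaN, and rows and columns without any selected cell are dropped.

```python
import numpy as np

# Toy 1-D coordinate axes and a (time, lat, lon) field of synthetic values.
lat = np.array([54.0, 55.0, 56.0, 57.0])
lon = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
data = np.arange(2 * lat.size * lon.size, dtype=float).reshape(2, lat.size, lon.size)

# Boolean mask on the (lat, lon) grid. A rectangle stands in for the real
# polygon test; the region bounds are arbitrary.
lon_grid, lat_grid = np.meshgrid(lon, lat)
mask = (lon_grid >= 11.0) & (lon_grid <= 12.0) & (lat_grid >= 55.0) & (lat_grid <= 56.0)

# Keep values inside the region and set everything else to NaN
# (the 2-D mask broadcasts over the time axis).
masked = np.where(mask, data, np.nan)

# Crop to the rows and columns that contain at least one selected cell.
rows = mask.any(axis=1)
cols = mask.any(axis=0)
cropped = masked[:, rows][:, :, cols]

print("full grid:", data.shape, "cropped:", cropped.shape)
```

Because the mask depends only on the grid and the geometry, it could be computed once and reused for every file, provided all files share the same grid.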

In [17]:
import xarray as xr
import numpy as np
from shapely.geometry import shape
import geopandas as gpd
import os
import time
import gc

In [18]:
livinglab_geometry = gpd.read_file('livinglab_west.json')
# Raw string, so the backslashes in the Windows path are not read as escape sequences
path = r'C:/Users\PC\Documents\Ecole/2A\Stage_2A\RISE\Example\Angiosperms-model\BALTICSEA_MULTIYEAR_WAV_003_015\cmems_mod_bal_wav_my_PT1H-i_202411'
input_file = f'{path}/2015/01/CMEMS_BAL_WAV_MY_2015010201.nc'
ds = xr.open_dataset(input_file)
ds

In [None]:
for i, date in enumerate(sampled_dates, start=1):
    # Optionally skip some dates, e.g. to resume after an interruption:
    # if date < pd.Timestamp('2017-01-29'):
    #     continue
    year = date.year
    month = f'{date.month:02d}'  # zero-padded, e.g. '01'
    day = f'{date.day:02d}'
    input_file_name = f'CMEMS_BAL_WAV_MY_{year}{month}{day}01.nc'
    output_file_name = f'filtered_{input_file_name}'
    input_file = f'{path}/{year}/{month}/{input_file_name}'
    output_file = f'{path}/{year}/{month}/{output_file_name}'

    ds = xr.open_dataset(input_file)
    # Test every (lon, lat) grid point against the living lab geometry
    # (the reshape below relies on the GeoDataFrame holding a single geometry)
    lat = ds['lat'].values
    lon = ds['lon'].values
    lon_grid, lat_grid = np.meshgrid(lon, lat)
    coords = np.column_stack((lon_grid.ravel(), lat_grid.ravel()))
    mask = np.array([livinglab_geometry.contains(shape({'type': 'Point', 'coordinates': (x, y)})) for x, y in coords])
    mask = mask.reshape(lon_grid.shape)
    mask_da = xr.DataArray(mask, coords=[ds['lat'], ds['lon']], dims=["lat", "lon"])
    # Keep the grid cells inside the geometry and drop the rows/columns left empty
    filtered_data = ds.where(mask_da, drop=True)

    try:
        filtered_data.to_netcdf(output_file)
    finally:
        # Release the file handles so the input file can be deleted afterwards
        ds.close()
        filtered_data.close()
        del ds
        del filtered_data
        gc.collect()
        time.sleep(1)

    # Delete the raw file once the filtered copy exists, retrying if it is still locked
    for attempt in range(5):
        try:
            if os.path.exists(output_file):
                os.remove(input_file)
            break
        except PermissionError:
            time.sleep(1)
    print(f'File {i} / {len(sampled_dates)} processed: {output_file_name}')

File 70 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017020801.nc
File 71 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017022001.nc
File 72 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017032801.nc
File 73 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017040101.nc
File 74 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017041401.nc
File 75 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017041701.nc
File 76 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017042301.nc
File 77 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017043001.nc
File 78 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017051001.nc
File 79 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017051101.nc
File 80 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017051301.nc
File 81 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017051401.nc
File 82 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017052601.nc
File 83 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017060601.nc
File 84 / 328 processed: filtered_CMEMS_BAL_WAV_MY_2017062601.nc
File 85 / 328 processed: 

### 🛠️ Calculating the yearly mean

I will merge the datasets per year and calculate the mean.
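The "merge per year" step comes down to grouping the filtered files by the year of their date. Here is a small sketch of that grouping using only the standard library; the function name, the demo dates, and the `data` root folder are placeholders, while the file naming follows the filtered files produced above.

```python
from collections import defaultdict
from datetime import date

def group_files_by_year(root, days):
    """Map each year to the list of filtered files for the sampled days in it."""
    files = defaultdict(list)
    for d in days:
        name = f"filtered_CMEMS_BAL_WAV_MY_{d:%Y%m%d}01.nc"
        files[d.year].append(f"{root}/{d:%Y}/{d:%m}/{name}")
    return dict(files)

demo_days = [date(2015, 1, 2), date(2015, 3, 9), date(2016, 7, 30)]
by_year = group_files_by_year("data", demo_days)   # "data" is a placeholder root
for year, paths in by_year.items():
    print(year, len(paths), "files")
```

Printing the number of files per year is also a quick way to see how many sampled days each yearly mean is based on.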

In [None]:
# Filter the sampled_dates list for the specified year
year_to_filter = 2015
filtered_dates = [date for date in sampled_dates if date.year == year_to_filter]
files = [f'{path}/{date.year}/{date.month:02d}/filtered_CMEMS_BAL_WAV_MY_{date.year:04d}{date.month:02d}{date.day:02d}01.nc' for date in filtered_dates]
# Open the datasets and concatenate them along the time dimension
datasets = [xr.open_dataset(file) for file in files]
# Concatenate the datasets along the 'time' dimension
combined_dataset = xr.concat(datasets, dim='time')
# Calculate the mean for each variable across time for the year
yearly_mean = combined_dataset.mean(dim='time')
yearly_mean

In [21]:
for year_to_filter in range(2015,2024):
    filtered_dates = [date for date in sampled_dates if date.year == year_to_filter]
    files = [f'{path}/{date.year}/{date.month:02d}/filtered_CMEMS_BAL_WAV_MY_{date.year:04d}{date.month:02d}{date.day:02d}01.nc' for date in filtered_dates]
    # Open the datasets and concatenate them along the time dimension
    datasets = [xr.open_dataset(file) for file in files]
    # Concatenate the datasets along the 'time' dimension
    combined_dataset = xr.concat(datasets, dim='time')
    # Calculate the mean for each variable across time for the year
    yearly_mean = combined_dataset.mean(dim='time')
    # Save the yearly mean to a new NetCDF file
    yearly_mean.to_netcdf(f'{path}/{year_to_filter}/yearly_mean_CMEMS_BAL_WAV_MY_{year_to_filter}.nc')
    print(f'Yearly mean for {year_to_filter} saved.')

Yearly mean for 2015 saved.
Yearly mean for 2016 saved.
Yearly mean for 2017 saved.
Yearly mean for 2018 saved.
Yearly mean for 2019 saved.
Yearly mean for 2020 saved.
Yearly mean for 2021 saved.
Yearly mean for 2022 saved.
Yearly mean for 2023 saved.
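A note on what this mean represents: concatenating the daily files along time and averaging gives every hourly step the same weight. As long as each file holds the same number of time steps (I assume 24 hourly steps per daily file), this is identical to averaging the daily means. The NumPy sketch below checks this on two synthetic "days"; the array shapes and values are made up.

```python
import numpy as np

# Two synthetic "daily files", each shaped (time, lat, lon) = (24, 2, 3).
rng = np.random.default_rng(0)
day_a = rng.uniform(0.0, 2.0, size=(24, 2, 3))
day_b = rng.uniform(0.0, 2.0, size=(24, 2, 3))

# Concatenate along the time axis, then average over it.
combined = np.concatenate([day_a, day_b], axis=0)      # shape (48, 2, 3)
pooled_mean = combined.mean(axis=0)                    # shape (2, 3)

# Same result when averaging the two daily means, since both days have 24 steps.
mean_of_daily_means = (day_a.mean(axis=0) + day_b.mean(axis=0)) / 2

print(pooled_mean.shape, np.allclose(pooled_mean, mean_of_daily_means))
```

If some files had fewer time steps, the two approaches would differ and the daily means would need to be weighted by their number of steps.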
