Python wrapper to fetch data from Copernicus servers via subset

Table of contents:

1. Introduction
2. Installation
3. Logging in to Copernicus
4. Running
5. Resuming the script at a specific point

1. Introduction

This is a wrapper that helps you download data from Copernicus servers via a configuration file, thereby removing the need to write any code. It has been developed in Python 3.9 and calls the recently released Copernicus Marine Toolbox's Python API - as of December 2023.

In a nutshell, this script looks for a csv file whose rows contain latitude, longitude, depth and specific dates. It also reads a configuration file - setup.toml - to find out which products and variables you are after. It then queries the Copernicus service for those details and generates a csv output with exactly the same dimensions as your input, merging the fetched information into the input file.

The main advantages over the toolbox's console client are as follows:

  1. The input data is processed by unique dates: rows sharing the same date are processed together by calculating the bounding box that encloses all the coordinates given for that day - see the sketch after this list. This reduces the number of calls and avoids the generation of massive files, which would happen if the whole set were requested at once.
  2. Only one single csv file is generated.
  3. Although the data is fetched by area and date, the script will find the point in the downloaded dataset closest to the coordinates given in each row of the input file, meaning that the final csv file has the same number of rows as the input file.
  4. All individual downloaded files are kept intact - .nc format - so that they can be post-processed in whichever way you consider appropriate, should you need to do so in the future.
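
The per-date grouping in point 1 can be pictured with the short, purely illustrative sketch below; it reuses the column names and input file location described in section 4.2 and is not the wrapper's actual code:

   import pandas as pd

   # Hypothetical illustration: group the input rows by calendar date and
   # compute the bounding box that encloses all coordinates for that date.
   rows = pd.read_csv("data/api_parameters.csv")
   rows["time"] = pd.to_datetime(rows["time"], format="%d/%m/%Y %H:%M")

   for date, day_rows in rows.groupby(rows["time"].dt.date):
       bbox = {
           "min_lon": day_rows["lon"].min(),
           "max_lon": day_rows["lon"].max(),
           "min_lat": day_rows["lat"].min(),
           "max_lat": day_rows["lat"].max(),
       }
       print(date, bbox)  # one subset request per unique date covers this box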

This wrapper uses solely the subset functionality of the Copernicus Marine Toolbox's Python API.
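
For reference, a single subset call through the toolbox's Python API looks roughly like the sketch below; the coordinate and date values are placeholders, and keyword names may vary between toolbox releases, so check the official documentation:

   import copernicusmarine

   # Illustrative only: one subset request for a given bounding box and day.
   copernicusmarine.subset(
       dataset_id="cmems_mod_glo_phy_my_0.083deg_P1M-m",
       variables=["thetao", "zos"],
       minimum_longitude=-10.0,
       maximum_longitude=-8.0,
       minimum_latitude=42.0,
       maximum_latitude=44.0,
       start_datetime="2012-01-01T00:00:00",
       end_datetime="2012-01-01T23:59:59",
       output_filename="result.nc",
       output_directory="data/cmems_mod_glo_phy_my_0.083deg_P1M-m/nc",
   )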

2. Installation

First, install Python >=3.9 and <3.12, as required by the Copernicus Marine Toolbox, together with pip - please do ensure you install pip too. To do so, download the Python version of your choice from https://www.python.org/downloads/ and then follow the instructions on https://docs.python.org/3/using/index.html, where details for Windows, Mac and *nix users are provided in the appropriate sections.

Second, download the source code from GitHub, either by downloading the zip directly from the web at https://github.com/d2gex/copernicus-subset-wrapper.git as shown in the figure below ...

... or just by git-cloning it to your preferred location, ensuring that the destination folder is empty:

   cd <<your_source_folder>>
   git clone https://github.com/d2gex/copernicus-subset-wrapper.git .

Third, install the project dependencies. If you do not want to install them system-wide - and it is highly recommended that you do not - you can create a virtual environment as described in Python Virtual Environments and Packages. A quick set of commands is shown below:

   python3 -m venv <<your_virtualenv_folder>>
   source /path/to/your_virtualenv_folder/bin/activate # (Linux-way)
   \path\to\your_virtualenv_folder\Scripts\activate # (Windows-way)

In either case, you can then install the project requirements as follows:

   pip install -r /path/to/your_source_folder/requirements.txt

The file requirements.txt contains all libraries that are necessary for this wrapper to run.

3. Logging in to Copernicus

If you have not yet registered with Copernicus, you first need to do so here. Then you need to run the login function of the Copernicus API once so that your credentials are generated. Subsequent calls to the wrapper will know where your credentials are stored and pick them up as needed. To log in, run the following on the console:

  copernicusmarine login

You will be asked for the username and password you used earlier in the registration process. Upon providing them, a message will be displayed saying that the credentials have been generated, along with the location where they are stored. You are now ready to use the wrapper without having to worry about credentials again.
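
The same step can also be performed from Python through the toolbox's login function; a minimal sketch, with the username and password as obvious placeholders:

   import copernicusmarine

   # One-off credential generation, equivalent to running `copernicusmarine login` on the console.
   copernicusmarine.login(username="your_username", password="your_password")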

4. Running

4.1 Configure your setup.toml file

The setup.toml file is the configuration file used by the wrapper and contains information about the products and variables you want to download. It is placed within <<your_source_folder>> and its options are explained within the file itself and should be self-explanatory.

setup.toml

input_filename = "api_parameters.csv" # name of the file holding the input parameters
output_filename = "result.nc" # suffix added to the name of each individual file fetched per input row
dataset_id = "cmems_mod_glo_phy_my_0.083deg_P1M-m" # data set identifier
variables = ["thetao", "zos"] # variables wanting to be fetched
years = [2012, 2020] #  date interval of interest. One single year can be defined as [2012]
# distance method used to calculate the nearest point.
# See alternatives on https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html
distance = "euclidean"
# [days, hours, minutes, seconds] time added to each start_time - per row - in days, hours, minutes and seconds
time_offset = [0, 23, 59, 59]
start_mode = 0  # 0 start afresh, 1 resume from given years interval and 2 read only from disk
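
To illustrate how time_offset is meant to be read, its four numbers are the days, hours, minutes and seconds added to each row's start time to obtain the end of the requested window. A minimal sketch, assuming the third-party tomli package as the TOML parser (the wrapper's own loader may differ):

   from datetime import datetime, timedelta

   import tomli  # assumption: any TOML parser would do

   with open("setup.toml", "rb") as fh:
       cfg = tomli.load(fh)

   days, hours, minutes, seconds = cfg["time_offset"]
   start_time = datetime(2012, 1, 1)  # e.g. a row's parsed start time
   end_time = start_time + timedelta(days=days, hours=hours,
                                     minutes=minutes, seconds=seconds)
   print(end_time)  # 2012-01-01 23:59:59 with the default [0, 23, 59, 59]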

4.2 Ensure input file is in the correct format

The wrapper will read a csv file from the data folder in '<<your_source_folder>>', whose name is given by the input_filename variable in your configuration file. An example is shown below:

<<your_api_parameters_filename>>.csv

In a nutshell, the columns lat, lon, time and depth must be named exactly as such, and time must be in %d/%m/%Y %H:%M format. The coordinate system is WGS 84 (EPSG:4326). There must also be a column in the spreadsheet identifying each row uniquely, although its name is up to you; in the example above it is called ID_Gil.
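
A quick way to check that your input file follows this format before running the wrapper - an illustrative snippet, not part of the wrapper itself:

   import pandas as pd

   df = pd.read_csv("data/api_parameters.csv")
   # Required columns; the unique-identifier column may have any name.
   assert {"lat", "lon", "time", "depth"} <= set(df.columns)
   # Raises an error if any date does not match the expected format.
   df["time"] = pd.to_datetime(df["time"], format="%d/%m/%Y %H:%M")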

4.3 Run the wrapper

   cd your_source_folder
   python -m src.main

4.4 Look for the results

After the data has been downloaded, look for the resulting csv file in '<<your_source_folder>>/data/<<dataset_identifier>>/csv/<<dataset_identifier>>.csv'. The wrapper will also place each downloaded *.nc file in '<<your_source_folder>>/data/<<dataset_identifier>>/nc/'.
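
Because the original .nc files are kept intact (see point 4 in the introduction), they can be re-opened later, for instance with xarray. A minimal sketch, assuming the dataset exposes latitude and longitude coordinates under those names and contains the thetao variable from the example configuration; the file name is hypothetical:

   import xarray as xr

   # Use any of the *.nc files placed in the nc/ folder by the wrapper.
   ds = xr.open_dataset("data/cmems_mod_glo_phy_my_0.083deg_P1M-m/nc/result.nc")
   # Pick the grid point closest to a coordinate of interest.
   point = ds.sel(latitude=42.5, longitude=-9.1, method="nearest")
   print(point["thetao"].values)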

5. Resuming the script at a specific point

Given that fetching data from Copernicus servers falls within the Big Data domain, dealing with large datasets does not come without trouble. The natural unreliability of the internet connection you may be using, plus the time and storage costs of constantly downloading data, may make the script break at some point. In such a case it is possible to resume at a desired point by both reducing the original yearly interval you were after and setting start_mode = 2 in the setup.toml file. Be aware that all files associated with the first year of the new, reduced interval will be deleted entirely and re-downloaded.
