# Motu client snippet to fetch data from Copernicus servers

This snippet fetches full areas of data stored on Copernicus servers, each delimited by two points **(x<sub>min</sub>, y<sub>min</sub>)** and **(x<sub>max</sub>, y<sub>max</sub>)**, where x<sub>min</sub> and x<sub>max</sub> represent the minimum and maximum longitude. Likewise, y<sub>min</sub> and y<sub>max</sub> denote the minimum and maximum latitude. Other parameters are also required: the date range, the depth range and the variables to be fetched. A full list of all parameters can be found here: https://github.com/clstoulouse/motu-client-python

## Installation 

First download the source code from https://github.com/d2gex/afar_coper/archive/refs/heads/master.zip, unzip the downloaded file and `cd` into it. Then install the Python libraries required to run this snippet. Please note that the following instructions install these libraries system-wide, so they will be available to every program with access to your Python path. Open a console (on Windows, type CMD in the search box) and run the following command:

```bash
pip install -r requirements.txt
```

If you have not yet installed Jupyter, you need to install it as well:

```bash
pip install jupyterlab
```

If you have used Conda to install your Python libraries, stick to it rather than using pip. Lastly, open this notebook.

## Python configuration section

The script below relies on the following folder structure:

* `setup.toml`: configuration file that holds information about the service to be consumed on Copernicus servers
* `data`: main root folder where the inputs provided to the Copernicus services and the retrieved data will be stored.
    * `output/nc`: folder where fetched **nc** files from Copernicus will be stored
    * `output/csv`: folder where converted nc files into **csv** will be stored
* `motu_calls.log`: log file where Motu records everything it does to fetch the requested data

The configuration cell also sets up logging. You can ignore that part if you are not familiar with Python logging, as it does not affect how the script is run.

## setup.toml file
This file contains information about the Copernicus service, such as the product and service you are about to query. You need to modify these settings if you plan to use another product or service, or if you want to fetch different variables.

```toml
base_url = "https://my.cmems-du.eu/motu-web/Motu" # Copernicus' base url. It hardly ever changes
input_filename = "api_parameters.csv" # The name of the file in 'data' root folder where the input parameters such as coordinates, date and depth range are provided to Copernicus
output_filename = "result.nc" # Base name for each file downloaded from Copernicus. The ID from the input_filename is added as a prefix (See next section)
service_id = "GLOBAL_MULTIYEAR_PHY_001_030-TDS" # Name of the service from Copernicus you are interested in
product_id = "cmems_mod_glo_phy_my_0.083_P1D-m" # Name of the product from Copernicus you have an interest in
variables = ["thetao", "zos"] # Name of the variables of the service you want to fetch
```
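As a quick sanity check before running the notebook, the settings can be parsed and checked for the keys the script reads. The sketch below is a minimal illustration only: `REQUIRED_KEYS` and `check_settings` are hypothetical names that are not part of this snippet, and the example parses an inline string rather than the real `setup.toml` file.

```python
try:
    import tomllib  # part of the standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # the package this snippet already depends on

# Keys that the notebook reads from setup.toml
REQUIRED_KEYS = ("base_url", "input_filename", "output_filename",
                 "service_id", "product_id", "variables")


def check_settings(settings: dict) -> list:
    """Return the names of the required keys that are missing (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in settings]


example = '''
base_url = "https://my.cmems-du.eu/motu-web/Motu"
input_filename = "api_parameters.csv"
output_filename = "result.nc"
service_id = "GLOBAL_MULTIYEAR_PHY_001_030-TDS"
product_id = "cmems_mod_glo_phy_my_0.083_P1D-m"
variables = ["thetao", "zos"]
'''

settings = tomllib.loads(example)
missing = check_settings(settings)
print(missing or "all required keys present")
```

In the notebook itself, the same check could be applied to the dictionary returned by `tomli.load(fp)`.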

## input_file format

Below is a snapshot of the expected input file. There are 9 columns: an ID plus the coordinates, depth range and date range of the area you want to explore. Dates use the `dd/mm/yyyy hh:mm:ss` format. Several rows can share the same area and differ in depth or date range; any combination that produces a different dataset is accepted. The script does not check for duplicate rows. The general idea is to have one row per area, unless you want to look at different depths for the same area. If the date range for an area is so large that the resulting file would be too big to produce, you can also split it into more manageable chunks, one row each.

| ID  | longitude_min  | longitude_max  | latitude_min  | latitude_max  | depth_min  | depth_max  | date_min  | date_max  |
|---|---|---|---|---|---|---|---|---|
|  1 |  -8.9875 | 5.98694444  | 35.875  | 42.99055556  | 0.494024992  | 0.494024992  | 31/12/2020 00:00:00  | 01/01/2021 00:00:00  |
|  2 |  -7.9875 | 4.98694444  | 36.875  | 43.99055556  | 0.494024992  | 0.494024992  | 01/01/2019 00:00:00  | 02/01/2019 00:00:00  |
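When a date range is too large for a single request, one option is to split it into several rows before writing the input file. The following is a minimal sketch of that idea and not part of the snippet itself: `split_date_range` and the 30-day chunk size are hypothetical choices, and the dates use the same `%d/%m/%Y %H:%M:%S` format as the table above.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%d/%m/%Y %H:%M:%S"  # date format used in the input file


def split_date_range(date_min: str, date_max: str, max_days: int = 30):
    """Split [date_min, date_max] into consecutive chunks of at most max_days days."""
    start = datetime.strptime(date_min, DATE_FORMAT)
    end = datetime.strptime(date_max, DATE_FORMAT)
    if start > end:
        raise ValueError("date_min must not be later than date_max")
    step = timedelta(days=max_days)
    chunks = []
    while start < end:
        stop = min(start + step, end)
        chunks.append((start.strftime(DATE_FORMAT), stop.strftime(DATE_FORMAT)))
        start = stop
    # A zero-length range (date_min == date_max) is kept as a single chunk
    return chunks or [(date_min, date_max)]


for chunk in split_date_range("01/01/2019 00:00:00", "15/03/2019 00:00:00"):
    print(chunk)
```

Each returned pair can become the `date_min` and `date_max` of its own row, with a distinct ID. Consecutive chunks share their boundary timestamp; whether that duplicates a time step depends on the product, so adjust the boundaries if needed.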



## ".env" credentials file

The requests performed against the Copernicus servers require basic authentication credentials, namely a username and password. These need to be provided as follows:

```toml
COPERNICUS_USERNAME=<<your_username>>
COPERNICUS_PASSWORD=<<your_password>>
```

Replace `<<your_username>>` and `<<your_password>>` with your own details.

**DO NOT forget to create this file yourself in the root folder of the project and add your credentials in the format shown above!**
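Because a missing `.env` file only shows up later as a failed authentication, it can help to check the credentials right after they are loaded. This is a minimal sketch under that assumption: `require_credentials` is a hypothetical helper that is not part of this snippet, and it reads from a mapping so that it can be tried without touching the real environment.

```python
import os


def require_credentials(env=os.environ):
    """Return (username, password) or raise if either variable is missing or empty."""
    names = ("COPERNICUS_USERNAME", "COPERNICUS_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing credentials in .env: {', '.join(missing)}")
    return env["COPERNICUS_USERNAME"], env["COPERNICUS_PASSWORD"]


# Example with a plain dict standing in for the environment
user, pwd = require_credentials({"COPERNICUS_USERNAME": "alice",
                                 "COPERNICUS_PASSWORD": "secret"})
print(user)
```

In the notebook, such a check would go after the `load_dotenv` call and be used without arguments, so that it reads the real environment.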

## Python configuration file

Usually this file doesn't require changing.

In [1]:
import logging
import sys
import os
import tomli
import pandas as pd
import xarray as xr
from typing import Dict, Any
from datetime import datetime
from pathlib import Path
from dotenv import load_dotenv
from motu_utils import motu_api

file_path = Path.cwd()
ROOT_PATH = file_path.parents[0]
DATA_PATH = ROOT_PATH / 'data'
OUTPUT_PATH = DATA_PATH / 'output'
CSV_PATH = OUTPUT_PATH / 'csv'
NC_PATH = OUTPUT_PATH / 'nc'

dot_env = load_dotenv(ROOT_PATH / '.env')
with open(ROOT_PATH / 'setup.toml', mode="rb") as fp:
    settings = tomli.load(fp)

INPUT_FILENAME = settings['input_filename']
OUTPUT_FILENAME = settings['output_filename']
COPERNICUS_USERNAME = os.getenv('COPERNICUS_USERNAME')
COPERNICUS_PASSWORD = os.getenv('COPERNICUS_PASSWORD')

# Log to the output and into a file
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
fh = logging.FileHandler(ROOT_PATH / 'motu_calls.log', mode='w')
sh = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter('[%(asctime)s] %(levelname)s [%(filename)s.%(funcName)s:%(lineno)d] %(message)s',
                              datefmt='%a, %d %b %Y %H:%M:%S')
fh.setFormatter(formatter)
sh.setFormatter(formatter)
logger.addHandler(fh)
logger.addHandler(sh)

## User-defined payload motu classes

Usually these two classes don't require changing.

In [2]:
class MotuOptions:
    def __init__(self, attrs: dict):
        super().__setattr__("attrs", attrs)

    def __setattr__(self, k, v):
        self.attrs[k] = v

    def __getattr__(self, k):
        try:
            return self.attrs[k]
        except KeyError:
            return None
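
To illustrate what the class above does: it wraps a dictionary so that its keys can be read and written as attributes, and an unknown key yields `None` instead of raising `AttributeError`. The sketch below repeats the class so that it runs on its own; the option values are made up for the example.

```python
class MotuOptions:
    def __init__(self, attrs: dict):
        # Bypass our own __setattr__, which needs self.attrs to exist already
        super().__setattr__("attrs", attrs)

    def __setattr__(self, k, v):
        self.attrs[k] = v

    def __getattr__(self, k):
        # Only called when the normal attribute lookup fails
        try:
            return self.attrs[k]
        except KeyError:
            return None


options = MotuOptions({"service_id": "example-service"})
options.out_name = "1_result.nc"   # stored in the wrapped dictionary
print(options.service_id)          # example-service
print(options.out_name)            # 1_result.nc
print(options.not_set)             # None
```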

In [3]:
class MotuPayloadGenerator:

    def __init__(self, data: pd.DataFrame, payload: Dict[str, Any], output_filename: str):
        self.data = data
        self.payload = payload
        self.output_filename = output_filename
        self.area_details = []

    def _process_row(self, row):
        tokens = self.output_filename.split(".")
        product_details = {
            'latitude_min': row["latitude_min"],
            'longitude_min': row["longitude_min"],
            'latitude_max': row["latitude_max"],
            'longitude_max': row["longitude_max"],
            'depth_min': row["depth_min"],
            'depth_max': row["depth_max"],
            'out_name': f"{row['ID']}_{tokens[0]}.{tokens[-1]}",
            'date_min': (datetime.strptime(row['date_min'], '%d/%m/%Y %H:%M:%S')).strftime('%Y-%m-%d %H:%M:%S'),
            'date_max': (datetime.strptime(row['date_max'], '%d/%m/%Y %H:%M:%S')).strftime('%Y-%m-%d %H:%M:%S'),
        }
        product_details.update(self.payload)
        return product_details

    def run(self):
        return {row['ID']: self._process_row(row) for _, row in self.data.iterrows()}
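
As a small illustration of the two transformations done in `_process_row`, the sketch below reproduces the output name and the date conversion for one row using plain Python. The `row` dictionary is a made-up stand-in for a DataFrame row.

```python
from datetime import datetime

output_filename = "result.nc"
row = {"ID": 1, "date_min": "31/12/2020 00:00:00"}  # hypothetical input row

# Prefix the configured file name with the row ID: result.nc -> 1_result.nc
tokens = output_filename.split(".")
out_name = f"{row['ID']}_{tokens[0]}.{tokens[-1]}"

# Convert day/month/year from the input file into the year-month-day form sent to Motu
date_min = datetime.strptime(row["date_min"], "%d/%m/%Y %H:%M:%S").strftime("%Y-%m-%d %H:%M:%S")

print(out_name)   # 1_result.nc
print(date_min)   # 2020-12-31 00:00:00
```

Note that only the first and last dot-separated tokens are kept, so a name such as `my.result.nc` would become `1_my.nc`.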


## Nc to Csv converter class

Usually this class doesn't require changing.

In [4]:
class NcToCsv:

    def __init__(self, nc_path: Path, csv_path: Path):
        self.nc_path = nc_path
        self.csv_path = csv_path

    def __call__(self, *args, **kwargs):
        # Convert every .nc file found in nc_path into a csv file with the same stem
        for _path in sorted(Path(self.nc_path).glob('*.nc')):
            with xr.open_dataset(_path) as ds:  # close the file once it has been read
                df = ds.to_dataframe()
            df.to_csv(self.csv_path / f"{_path.stem}.csv")

## Main body where MOTU api calls are requested

This is the part of the script where the calls to the Copernicus server are made through the Motu client. The workflow is simple:
1. Fetch a row from the input CSV file and make a request to Copernicus.
2. The Copernicus server starts preparing the file for download.
3. The Motu client keeps asking whether the file is ready until it is, then downloads it into the `output/nc` folder.
4. Continue with the next row and repeat steps 1-3. Every step is logged in `motu_calls.log`. Depending on how many rows your input file has and how large the areas are, you may have to wait a very long time. You don't need to do anything, as the Motu client takes care of the whole process.
5. Lastly, once all files (one per row in your input file) have been downloaded, the script converts them all to csv and places them in `output/csv`. Each file name starts with the ID of the row being processed, followed by the name given as `output_filename` in your `setup.toml` file.
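
The steps above can be sketched as a plain loop. This is an illustration only: `fetch` and `convert` are hypothetical stand-ins for `motu_api.execute_request` and the nc-to-csv conversion, so the sketch runs without contacting Copernicus. Unlike the cell below, it also records failed rows and carries on, which is an assumption about how you might want to handle errors rather than what the script does.

```python
import logging

logger = logging.getLogger(__name__)


def process_all(payloads: dict, fetch, convert) -> list:
    """Request every payload in turn, then convert the downloads. Returns the failed IDs."""
    failed = []
    for _id, payload in payloads.items():
        logger.info("Processing area for ID = %s", _id)
        try:
            fetch(payload)       # blocks until the file for this row is downloaded
        except Exception:        # keep going so one bad row does not stop the batch
            logger.exception("Request for ID = %s failed", _id)
            failed.append(_id)
    convert()                    # runs once, after all rows have been attempted
    return failed


# Stand-ins that only record what was called
calls = []


def fake_fetch(payload):
    if payload["out_name"].startswith("2_"):
        raise RuntimeError("simulated failure")
    calls.append(payload["out_name"])


failed_ids = process_all(
    {1: {"out_name": "1_result.nc"}, 2: {"out_name": "2_result.nc"}},
    fetch=fake_fetch,
    convert=lambda: calls.append("convert"),
)
print(failed_ids, calls)
```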

In [None]:
logger = logging.getLogger()
if __name__ == "__main__":
    input_data = pd.read_csv(DATA_PATH / INPUT_FILENAME)
    common_payload = {  # Common information shared by every request to Copernicus
        'motu': settings['base_url'],
        "auth_mode": 'cas',
        'out_dir': str(NC_PATH),
        'user': COPERNICUS_USERNAME,
        'pwd': COPERNICUS_PASSWORD,
        'service_id': settings['service_id'],
        'product_id': settings['product_id'],
        'variable': settings['variables']
    }
    # Generate the request-specific details for each row, e.g. different coordinates
    payload_generators = MotuPayloadGenerator(input_data, common_payload, OUTPUT_FILENAME)
    motu_payloads = payload_generators.run()

    # Fetch data from Copernicus in .nc format
    for _id, payload_data in motu_payloads.items():
        logger.info(
            f"------> Processing area  for ID = {_id} delimited by ({payload_data['longitude_min']},{payload_data['latitude_min']}) and "
            f"({payload_data['longitude_max']},{payload_data['latitude_max']})")
        motu_api.execute_request(MotuOptions(payload_data)) # Real call to Copernicus
        logger.info("-------> END")

    # Convert all nc files to csv
    (NcToCsv(NC_PATH, CSV_PATH))()