# Download multiple points via the Copernicus Marine Toolbox

<div class="alert alert-block alert-info">
<b>Note:</b> This notebook shows how to download variables for multiple points from one or several datasets via the <a href="https://help.marine.copernicus.eu/en/collections/4060068-copernicus-marine-toolbox">Copernicus Marine Toolbox</a>, using a CSV file as input.<br/>
It follows the example from <a href="https://help.marine.copernicus.eu/en/articles/7970637-how-to-extract-multiple-points-from-a-csv">this article</a>.
</div>

In this tutorial, we will download the variables `uo` and `vo` from the following Mediterranean product and dataset:  

- Product: [MEDSEA_MULTIYEAR_PHY_006_004](https://data.marine.copernicus.eu/product/MEDSEA_MULTIYEAR_PHY_006_004/description)  
- Dataset: `med-cmcc-cur-rean-d`  

Feel free to change the download parameters to get the variable(s) of interest.  
From the same input CSV file, you can **download the same variable(s) from different datasets** by adding their IDs to `list_datasetID` (and the corresponding names to `output_names`).  

Depending on what you have in the input file, you can download data for:  
- a **single date**: the output is one `.csv` file (requires the column `Date`)
- a **timeseries**: the output is one `.csv` file per point by default, or optionally one `.nc` file per point (requires the columns `Start` and `End`)
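
Which of the two modes is available depends only on the columns present in the input file. A minimal sketch of that check, assuming a hypothetical helper `detect_modes()` that is not part of this notebook or of the Toolbox:

```python
def detect_modes(columns):
    """Return the download modes that a set of CSV column names allows."""
    columns = set(columns)
    modes = []
    # A single date per point needs the `Date` column.
    if "Date" in columns:
        modes.append("single_date")
    # A timeseries per point needs both `Start` and `End`.
    if {"Start", "End"} <= columns:
        modes.append("timeseries")
    return modes


print(detect_modes(["Latitude", "Longitude", "Depth", "Date"]))
print(detect_modes(["Latitude", "Longitude", "Start", "End"]))
```

A file that carries all three columns, like the fake one created later in this notebook, allows both modes.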

## Import libraries & functions

We first import these libraries and define the `sort_dimension()` function, which sorts any inverted (descending) axes:

In [1]:
import os
import pandas as pd
import copernicusmarine

In [2]:
def sort_dimension(dataset, dim_name):
    """
    Sort the dataset along the specified dimension if its coordinate values look inverted (descending).
    """
    # Get the coordinate values for the specified dimension.
    coordinates = dataset[dim_name].values

    # Treat the axis as inverted when the first value is >= every value except the last one.
    if (coordinates[0] >= coordinates[:-1]).all():
        dataset = dataset.sortby(dim_name, ascending=True)
        
    return dataset
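
To see what this kind of sorting does, here is a small sketch on synthetic data. The dataset `toy` and the function `sort_if_descending()` are illustrative assumptions: the function is a simplified variant of `sort_dimension()` that only compares the first and last coordinate values.

```python
import numpy as np
import xarray as xr


def sort_if_descending(dataset, dim_name):
    """Hypothetical variant of sort_dimension(): sort when the axis is descending."""
    coordinates = dataset[dim_name].values
    # A descending axis has its first value larger than its last one.
    if coordinates[0] > coordinates[-1]:
        dataset = dataset.sortby(dim_name, ascending=True)
    return dataset


# Synthetic dataset whose latitude axis runs from north to south.
toy = xr.Dataset(
    {"uo": ("latitude", np.array([0.3, 0.2, 0.1]))},
    coords={"latitude": np.array([42.0, 41.0, 40.0])},
)
toy_sorted = sort_if_descending(toy, "latitude")
print(toy_sorted["latitude"].values)
print(toy_sorted["uo"].values)
```

The data values are reordered together with the coordinate, so each value stays attached to its latitude.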

<div class="alert alert-block alert-info">
<b>Note:</b> We have used version <code>1.0.0</code> of the Copernicus Marine Toolbox and Python <code>3.11.6</code> <b>(don't use Python 3.12 or later!)</b>.<br/>
More info on all packages and working environment used for running this notebook <a href="#Work-environment">at the end</a>.
</div>

## Create fake csv (optional)

This cell creates a CSV file with fake points in the Mediterranean Sea, for testing purposes:

In [3]:
import random

# Number of fake points
n = 10

# Create fake points in Mediterranean Sea
fake_lat = [(40 + 2 * random.random()) for i in range(n)]
fake_lon = [(4 + 4 * random.random()) for i in range(n)]
start_dates = ["2020-01-01"] * n
end_dates = ["2020-12-31"] * n
depth = [(5 * random.random()) for i in range(n)]
date = "2020-01-01"

# Create dataframe
data = {'Latitude': fake_lat, 'Longitude': fake_lon, 
     'Start': start_dates, 'End': end_dates, 
     'Depth': depth, 'Date': date}
dataframe = pd.DataFrame(data=data)

# Save dataframe into csv file
dataframe.to_csv('fake_coords_MED.csv', index=False)
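
The cell above draws new random points on every run. If you need the same test points each time, one option is a seeded generator. This is a sketch only: `make_fake_points()` and the seed value are assumptions, not part of the original notebook.

```python
import random


def make_fake_points(n, seed):
    """Draw n (latitude, longitude, depth) tuples in the same ranges as above."""
    rng = random.Random(seed)  # private generator, leaves the global state alone
    return [
        (40 + 2 * rng.random(), 4 + 4 * rng.random(), 5 * rng.random())
        for _ in range(n)
    ]


points = make_fake_points(n=3, seed=42)
# The same seed always yields the same points.
print(points == make_fake_points(n=3, seed=42))
```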

## Read input csv file

In this notebook, we will use our previously created file with fake points. Feel free to use your own file with real coordinates of course!  
You can download data for a timeseries between the columns `Start` and `End` or for a single date via the column `Date`, which can vary between points.

<div class="alert alert-block alert-warning">
<b>Warning:</b> the column names must be the same as in the dataframe we are using here.<br/>
    You can rename your columns via: <code>df = df.rename(columns={"A": "a", "B": "c"})</code>
</div>
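
Before going further, you may want to check that the expected columns are present. The sketch below assumes the full set of columns used by the cells of this notebook as written; `REQUIRED_COLUMNS`, `missing_columns()` and the input headers `lat` and `lon` are hypothetical names chosen for illustration.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Latitude", "Longitude", "Depth", "Date", "Start", "End"]


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [name for name in required if name not in df.columns]


# Hypothetical input whose headers differ from the ones used in this notebook.
df = pd.DataFrame({"lat": [40.5], "lon": [6.2], "Depth": [1.0], "Date": ["2020-01-01"]})
df = df.rename(columns={"lat": "Latitude", "lon": "Longitude"})
print(missing_columns(df))
```

Here the check reports `Start` and `End` as missing, so this file could only be used for the single-date download.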

In [4]:
# Read the CSV in a pandas dataframe
dataframe_coordinates = pd.read_csv("fake_coords_MED.csv", sep = ',')

# Convert columns into right format
dataframe_coordinates["Date"] = pd.to_datetime(dataframe_coordinates["Date"])
dataframe_coordinates["Start"] = pd.to_datetime(dataframe_coordinates["Start"])
dataframe_coordinates["End"] = pd.to_datetime(dataframe_coordinates["End"])

# Show dataframe
dataframe_coordinates

,Latitude,Longitude,Start,End,Depth,Date
0,40.429303,6.26579,2020-01-01,2020-12-31,1.092234,2020-01-01
1,40.531485,4.39394,2020-01-01,2020-12-31,0.886261,2020-01-01
2,40.082421,7.117325,2020-01-01,2020-12-31,0.925198,2020-01-01
3,41.280656,5.336382,2020-01-01,2020-12-31,0.314112,2020-01-01
4,40.723999,6.596482,2020-01-01,2020-12-31,1.898866,2020-01-01
5,40.307816,6.057475,2020-01-01,2020-12-31,0.756725,2020-01-01
6,41.198638,5.919086,2020-01-01,2020-12-31,3.35768,2020-01-01
7,41.701031,5.749161,2020-01-01,2020-12-31,2.516334,2020-01-01
8,40.467414,6.209489,2020-01-01,2020-12-31,4.077246,2020-01-01
9,40.888534,5.611275,2020-01-01,2020-12-31,1.854793,2020-01-01
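
For timeseries, it can be worth checking that every `Start` comes before its `End`, since an inverted time window would typically select no time steps. A sketch on an in-memory CSV (the two rows are made up for illustration), which also shows `parse_dates` as an alternative to the explicit `pd.to_datetime()` calls:

```python
import io

import pandas as pd

csv_text = """Latitude,Longitude,Start,End,Depth,Date
40.4,6.2,2020-01-01,2020-12-31,1.0,2020-01-01
40.5,4.3,2020-12-31,2020-01-01,0.8,2020-01-01
"""

# parse_dates converts the listed columns while reading the file.
points = pd.read_csv(io.StringIO(csv_text), parse_dates=["Start", "End", "Date"])

# Rows whose time window is inverted.
inverted = points[points["Start"] > points["End"]]
print(inverted.index.tolist())
```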


## Download parameters

Here are the download parameters, which you can modify to suit your needs. You can add more datasets to the list, but remember to add the same number of output names:

In [5]:
# Datasets
list_datasetID = [
    'med-cmcc-cur-rean-d',
]

# Output names
output_names = [
    'current_006_004',
]

# Variables
variables = ['uo','vo']

# Output directory
output_dir = "Dataframes/"
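
The download cells pair the two lists with `zip()`, which silently stops at the shorter list, so a forgotten output name would skip a dataset without any error. A minimal sketch of an explicit check, where `pair_datasets()` is a hypothetical helper:

```python
list_datasetID = ["med-cmcc-cur-rean-d"]
output_names = ["current_006_004"]


def pair_datasets(dataset_ids, names):
    """Pair each dataset ID with its output name, refusing mismatched lengths."""
    if len(dataset_ids) != len(names):
        raise ValueError(
            f"{len(dataset_ids)} dataset ID(s) but {len(names)} output name(s)"
        )
    return list(zip(dataset_ids, names))


pairs = pair_datasets(list_datasetID, output_names)
print(pairs)
```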

## Download temporal points (existing column "Date")

Run only this cell to download data for single dates:

In [6]:
%%time

# Create the output directory if it doesn't exist
if not os.path.exists(output_dir):
   os.makedirs(output_dir)

# Loop for datasets in list_datasetID
for dataset_id, output_name in zip(list_datasetID, output_names):
  
   # Read dataset with CMC
   dataset = copernicusmarine.open_dataset(dataset_id = dataset_id)
  
   # Rename coordinates to latitude/longitude if needed
   for coordinate in dataset.coords:
       if coordinate=='lon':
           dataset = dataset.rename({'lon': 'longitude'})
       if coordinate=='lat':
           dataset = dataset.rename({'lat': 'latitude'})
          
   # Sort axes that were inverted
   dataset = sort_dimension(dataset, 'latitude')
   dataset = sort_dimension(dataset, 'longitude')
  
   # Copy the input dataframe
   dataframe_final = dataframe_coordinates.copy()
  
   # Download data for 3D datasets
   if "depth" in dataset.dims:
       dataframe_final = dataframe_final.assign(**{
           var : [float(dataset[var].sel(time=row[0], depth=row[3], method="nearest")\
                        .sel(latitude=row[1], longitude=row[2], method='nearest'))\
                        for row in zip(dataframe_final['Date'], dataframe_final['Latitude'], dataframe_final['Longitude'], dataframe_final["Depth"])]\
                        for var in variables
       })
  
   # Download data for 2D datasets
   else:
       dataframe_final= dataframe_final.assign(**{
           var : [float(dataset[var].sel(time=row[0], method="nearest")\
                        .sel(latitude=row[1], longitude=row[2], method='nearest'))\
                        for row in zip(dataframe_final['Date'], dataframe_final['Latitude'], dataframe_final['Longitude'])]\
                        for var in variables                     
       })
  
   # Add the corresponding date from the dataset (for checking purposes)
   dataframe_final= dataframe_final.assign(**{
       "Date_dataset" : [ dataset.sel(time=date, method="nearest").time.values for date in dataframe_final['Date'] ]
       })
  
   # Save the dataframe with downloaded variable(s)
   dataframe_final.to_csv(output_dir + output_name + "_temporal_points.csv")

print("Download completed!")

INFO - 2024-01-29T11:20:20Z - Dataset version was not specified, the latest one was selected: "202012"
INFO - 2024-01-29T11:20:20Z - Dataset part was not specified, the first one was selected: "default"
INFO - 2024-01-29T11:20:21Z - Service was not specified, the default one was selected: "arco-geo-series"
Download completed!
CPU times: total: 1.7 s
Wall time: 5.4 s
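
The list comprehensions above call `.sel()` once per point and per variable. xarray also supports point-wise (vectorized) selection when the indexers are `DataArray` objects that share a dimension. The sketch below illustrates this on synthetic data; the dataset `toy` and the dimension name `points` are assumptions for illustration, and this is not how the notebook performs the download.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 3-day, 3x3 grid standing in for a real dataset.
time = pd.date_range("2020-01-01", periods=3, freq="D")
latitude = np.array([40.0, 41.0, 42.0])
longitude = np.array([4.0, 6.0, 8.0])
values = np.arange(27, dtype=float).reshape(3, 3, 3)
toy = xr.Dataset(
    {"uo": (("time", "latitude", "longitude"), values)},
    coords={"time": time, "latitude": latitude, "longitude": longitude},
)

# Two requested points; sharing the "points" dimension makes xarray
# pick one grid cell per point instead of the full outer product.
dates = xr.DataArray(pd.to_datetime(["2020-01-01", "2020-01-03"]), dims="points")
lats = xr.DataArray([40.2, 41.9], dims="points")
lons = xr.DataArray([5.9, 4.1], dims="points")

picked = toy["uo"].sel(time=dates, latitude=lats, longitude=lons, method="nearest")
print(picked.values)
```

The result has a single `points` dimension with one value per requested point, which can then be assigned to a dataframe column.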


## Download timeseries (existing columns "Start" and "End")

Run only this cell to download timeseries:

In [7]:
%%time

# Create the output directory if it doesn't exist
if not os.path.exists(output_dir):
   os.makedirs(output_dir)
  
# Loop for datasets in list_datasetID
for dataset_id, output_name in zip(list_datasetID, output_names):

   # Read dataset with CMC
   dataset = copernicusmarine.open_dataset(dataset_id = dataset_id)
  
   # Rename coordinates to latitude/longitude if needed
   for coordinate in dataset.coords:
       if coordinate=='lon':
           dataset = dataset.rename({'lon': 'longitude'})
       if coordinate=='lat':
           dataset = dataset.rename({'lat': 'latitude'})
          
   # Sort axes that were inverted
   dataset = sort_dimension(dataset, 'latitude')
   dataset = sort_dimension(dataset, 'longitude')
  
   # Download data for 3D datasets
   if "depth" in dataset.dims:
       for row in zip(dataframe_coordinates['Start'], dataframe_coordinates['End'], dataframe_coordinates['Latitude'], dataframe_coordinates['Longitude'], dataframe_coordinates["Depth"], dataframe_coordinates.index):
           # Do the subset
           dataset_point = dataset[variables].sel(time=slice(row[0],row[1])).sel(latitude=row[2], longitude=row[3], depth=row[4], method="nearest")
           # Save in .csv
           dataset_point.to_dataframe().to_csv(output_dir + output_name + f"point_{row[5]}.csv")
           # Save in .nc
           #dataset_point.to_netcdf(output_dir + output_name + f"point_{row[5]}.nc")
  
   # Download data for 2D datasets
   else:
       for row in zip(dataframe_coordinates['Start'], dataframe_coordinates['End'], dataframe_coordinates['Latitude'], dataframe_coordinates['Longitude'], dataframe_coordinates.index):
           # Do the subset
           dataset_point = dataset[variables].sel(time=slice(row[0],row[1])).sel(latitude=row[2], longitude=row[3], method="nearest")
           # Save in .csv
           dataset_point.to_dataframe().to_csv(output_dir + output_name + f"point_{row[4]}.csv")
           # Save in .nc
           #dataset_point.to_netcdf(output_dir + output_name + f"point_{row[4]}.nc")

print("Download completed!")

INFO - 2024-01-29T11:20:25Z - Dataset version was not specified, the latest one was selected: "202012"
INFO - 2024-01-29T11:20:25Z - Dataset part was not specified, the first one was selected: "default"
INFO - 2024-01-29T11:20:26Z - Service was not specified, the default one was selected: "arco-geo-series"
Download completed!
CPU times: total: 20.4 s
Wall time: 2min 21s
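
The cell above builds file names by plain string concatenation, which relies on `output_dir` ending with a slash and places no separator between the output name and `point_`. A sketch of an alternative using `pathlib`; the helper `point_output_path()` and the underscore in the file name are assumptions that differ from the naming used above:

```python
from pathlib import Path


def point_output_path(output_dir, output_name, point_index, extension="csv"):
    """Build the output path for one point (hypothetical naming scheme)."""
    return Path(output_dir) / f"{output_name}_point_{point_index}.{extension}"


path = point_output_path("Dataframes/", "current_006_004", 0)
print(path.as_posix())
```

Both `to_csv()` and `to_netcdf()` accept a path object, so the result can be passed to them directly.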


## Conclusion

<div class="alert alert-block alert-success">
    <b>CONGRATULATIONS!</b><br>

You have downloaded the variable(s) you were looking for! 😃   
    
That's it for this tutorial! Don't hesitate to [contact the Copernicus Support](https://marine.copernicus.eu/contact) if you have any trouble or questions about this notebook.  
We would also be happy to hear whether you managed to follow it all the way through and how we could improve it 😊   
    
You can find all the articles about the Copernicus Marine Toolbox [on this page](https://help.marine.copernicus.eu/en/collections/4060068-copernicus-marine-client), in particular other useful [Use Cases](https://help.marine.copernicus.eu/en/collections/4062677-use-cases) that rely on this tool.
</div>

***

## Work environment

Environment where this notebook was run:

- copernicusmarine    1.0.0
- pandas              2.2.0
- session_info        1.0.0
- xarray              2024.1.1

Last run: 2024-01-29.

In [8]:
# You can use this package to get information on your environment
# You can install it via mamba install session-info
import session_info
session_info.show()
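
If `session_info` is not installed, the standard library can report package versions too. A minimal sketch, assuming a hypothetical helper `package_versions()`; the package names are the ones listed above:

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(package_versions(["copernicusmarine", "pandas", "xarray"]))
```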