# Retrieve a CMEMS dataset - CORA OA


### 0. Set up the environment

#### Requirements

In [3]:
packages = ['pystac_client',
            'copernicusmarine',
            'xarray',
            'requests',
            'aiohttp',
            'cartopy',
            'geopandas']

#### Install packages

In [4]:
for package in packages:
    !pip install {package} > /dev/null 2>&1
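
Reinstalling every package on each run is slow. As a minimal sketch, and assuming each entry in the list is also its import name (true for the list above, though not for every PyPI package), the standard library can report which packages are actually missing. The helper name `find_missing` is hypothetical:

```python
import importlib.util
from typing import Iterable, List


def find_missing(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Illustrative call with standard-library names plus one that should not exist.
missing = find_missing(["json", "os", "surely_not_an_installed_package_xyz"])
print(missing)
```

Only the names returned would then need to go through `pip install`.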

#### Load packages

In [5]:
for package in packages:
    exec(f'import {package}')
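
`exec` works here, but `importlib.import_module` does the same job without building code from strings. A minimal sketch, using standard-library module names as stand-ins so that it runs anywhere; the helper name `import_all` is hypothetical:

```python
import importlib
from types import ModuleType
from typing import Dict, Iterable


def import_all(names: Iterable[str]) -> Dict[str, ModuleType]:
    """Import each module by name and return a name -> module mapping."""
    return {name: importlib.import_module(name) for name in names}


modules = import_all(["json", "math"])
# Expose the modules under their own names, as a plain `import` would.
globals().update(modules)
print(sorted(modules))
```

In the notebook, `import_all(packages)` would take the place of the `exec` loop.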

In [6]:
# Standard library
import ast
import json
import os

# Third-party
import cartopy.crs as ccrs
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
from copernicusmarine.core_functions import custom_open_zarr
from shapely import wkt
from shapely.geometry import Point, Polygon, box, shape

<br>

### 1. Open the STAC catalog
Using pystac_client, you can connect to the STAC catalog.

In [7]:
url = 'https://catalog.dive.edito.eu'
client = pystac_client.Client.open(url)
print(client)

<Client id=root>


<br>

### 2. Load collections
Collections are one of the building blocks of a STAC catalog, and they are a good way to explore the available datasets.

In [8]:
collections = list(client.get_collections())

Let's see how many collections there are:

In [9]:
print(f"number of collections: {len(collections)}")

number of collections: 445


<br>

### 3. Query collections
We will loop over the collections and filter on a variable name as defined in CMEMS, for example mass_concentration_of_chlorophyll_a_in_sea_water. Note that the words are joined with underscores, because spaces are not accepted.

#### 3.1 Filter on variable

In [11]:

variable1 = "sea_water_temperature"
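
The underscore convention described above can be sketched as a small helper that turns a human-readable variable name into the underscore-separated form. The function name `to_variable_id` is hypothetical, and lower-casing is an assumption based on the variable name used in this notebook:

```python
def to_variable_id(label: str) -> str:
    """Lower-case a label and join its words with underscores."""
    return "_".join(label.lower().split())


print(to_variable_id("Sea water temperature"))
print(to_variable_id("Mass concentration of chlorophyll a in sea water"))
```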

The variables available for the CORA OA in-situ product are listed on the product website (https://data.marine.copernicus.eu/product/INSITU_GLO_PHY_TS_OA_MY_013_052/description). This product only provides sea water temperature and sea water salinity.

For STAC search purposes, it is easier to search for sea_water_temperature as the variable.

In [12]:
for collection in collections:
    if variable1 in collection.id:
        print(collection.id)

climate_forecast-sea_water_temperature
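
The substring test above is case-sensitive and prints as it goes. As a minimal sketch, the same filter can be written as a function that returns the matches, which makes it easier to reuse for other variables. It works on plain ID strings; the helper name `match_collection_ids` and the second ID in the sample list are hypothetical:

```python
from typing import Iterable, List


def match_collection_ids(ids: Iterable[str], variable: str) -> List[str]:
    """Return the IDs that contain `variable`, ignoring case."""
    needle = variable.lower()
    return [cid for cid in ids if needle in cid.lower()]


sample_ids = [
    "climate_forecast-sea_water_temperature",  # ID printed above
    "climate_forecast-sea_water_salinity",     # hypothetical ID, for illustration
]
print(match_collection_ids(sample_ids, "sea_water_temperature"))
```

With the real catalog, the IDs would come from `[c.id for c in collections]`.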


#### 3.2 Retrieve products
Get all the products from these collections.

<i>This is where I ran into an issue: the loop below raises a STACError, and I am not sure how to supply a start_datetime and end_datetime for the internal item.Item search.</i>

In [68]:
products = []

# Collect every item from the collections whose ID contains the variable name.
for collection in collections:
    if variable1 in collection.id:
        for item in collection.get_items():
            products.append(item)

STACError: Invalid Item: If datetime is None, a start_datetime and end_datetime must be supplied.
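
The error is raised while pystac builds an `Item` object: the STAC specification requires each item to carry either a `datetime` or both a `start_datetime` and an `end_datetime`, and at least one item in this collection appears to have neither. One possible workaround, sketched below, is to read the items as plain dictionaries with `ItemSearch.items_as_dicts()` from pystac_client, which skips the `Item` construction. This assumes the catalog supports the STAC API item search endpoint, which has not been verified here; the helper names `has_valid_datetime` and `iter_raw_items` are hypothetical:

```python
from typing import Any, Dict, Iterator, Optional


def has_valid_datetime(item: Dict[str, Any]) -> bool:
    """Check the STAC rule: `datetime`, or both `start_datetime` and `end_datetime`."""
    props = item.get("properties") or {}
    if props.get("datetime") is not None:
        return True
    return (
        props.get("start_datetime") is not None
        and props.get("end_datetime") is not None
    )


def iter_raw_items(
    client: Any, collection_id: str, max_items: Optional[int] = None
) -> Iterator[Dict[str, Any]]:
    """Yield the items of one collection as plain dicts.

    `client` is expected to be a pystac_client.Client. Using items_as_dicts()
    avoids building pystac.Item objects, so items with missing dates do not raise.
    """
    search = client.search(collections=[collection_id], max_items=max_items)
    yield from search.items_as_dicts()


# Hand-written sample items (not taken from the catalog), for illustration only.
complete = {"id": "a", "properties": {"datetime": "2020-01-01T00:00:00Z"}}
incomplete = {"id": "b", "properties": {"datetime": None}}
print(has_valid_datetime(complete), has_valid_datetime(incomplete))
```

With the client opened earlier, `products = list(iter_raw_items(client, collection.id))` would collect every item, and `has_valid_datetime` could then show which ones triggered the error. To restrict the search to a time window, `client.search` also accepts a `datetime` argument such as `"2000-01-01/2010-12-31"`; that range is only an example.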

Let's see how many products this is.

In [15]:
print(f"number of products: {len(products)}")

number of products: 148


#### 3.3 Filter the products 

In [75]:
product_id = "INSITU_GLO_PHY_TS_OA_MY_013_052"

In [54]:
for collection in collections:
    if variable1 in collection.id:
        for i, item in enumerate(collection.get_items()):
            for asset_key, asset in item.assets.items():
                if product_id in asset.href:
                    print(i, asset.href)

STACError: Invalid Item: If datetime is None, a start_datetime and end_datetime must be supplied.
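
The same filter can be applied to items held as plain dictionaries (for example, those yielded by a search's `items_as_dicts()` in pystac_client), which sidesteps the error because no `Item` object is built. A minimal sketch, assuming each item follows the usual STAC layout with an `assets` mapping whose entries have an `href`; the helper name `find_product_assets` and the sample item, including its URLs, are hypothetical:

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def find_product_assets(
    items: Iterable[Dict[str, Any]], product_id: str
) -> Iterator[Tuple[str, str, str]]:
    """Yield (item id, asset key, href) for every asset whose href mentions the product."""
    for item in items:
        for key, asset in (item.get("assets") or {}).items():
            href = asset.get("href", "")
            if product_id in href:
                yield item.get("id", ""), key, href


# Made-up item for illustration; the URLs are placeholders, not real endpoints.
sample_items = [
    {
        "id": "item-1",
        "assets": {
            "zarr": {"href": "https://example.org/INSITU_GLO_PHY_TS_OA_MY_013_052/data.zarr"},
            "thumb": {"href": "https://example.org/other/thumbnail.png"},
        },
    }
]
matches = list(find_product_assets(sample_items, "INSITU_GLO_PHY_TS_OA_MY_013_052"))
print(matches)
```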