# Copernicus Marine In-situ data

Copernicus Marine provides access to a wide range of in situ ocean observations through the [In Situ Thematic Centre](https://marine.copernicus.eu/about/producers/insitu-tac) (INS TAC), which consists of a global centre and six regional centres:

* Arctic Ocean
* Baltic Sea
* Black Sea
* Iberia Biscay Ireland Seas
* Mediterranean Sea
* North-west Shelf


> The INS TAC provides integrated products for a core set of parameters (Temperature, Salinity, Current, Sea Level, Waves, Chlorophyll, Oxygen, Nutrients, Carbon), for initialization, forcing, assimilation and validation of ocean numerical models. These products are used for forecasting, analysis and re-analysis of ocean physical and biogeochemical conditions, satellite validation and downstream applications that require NRT data.


## Data access

To get an overview of the available data, visit the [CMEMS In Situ TAC Dashboard](http://www.marineinsitu.eu/dashboard/).

Data can also be downloaded from the web interface.

![](../images/cmems_insitu_dashboard.png)

To download CMEMS data, you first need to create a user account, which gives you a username and password.

You can register a new user [here](https://data.marine.copernicus.eu/register).


In [None]:
import os
from urllib import request
import matplotlib.pyplot as plt
import pandas as pd
import geopandas
import xarray as xr

## Platform overview

In [None]:


cols = "platform_code,date_creation,date_update,wmo_platform_code,data_source,institution,institution_edmo_code,parameters,last_latitude_observation,last_longitude_observation,last_date_observation".split(",")

platforms = pd.read_csv("https://data-marineinsitu.ifremer.fr/glo_multiparameter_nrt/index_platform.txt", names=cols, header=None, comment="#")

In [None]:
platforms.head()

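The observed parameters for each platform are found in the `parameters` column as a single space-separated string, which is not a [tidy](https://vita.had.co.nz/papers/tidy-data.pdf) format.
Let's fix that by splitting the string and giving each parameter its own row.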

In [None]:
platforms['parameters'] = platforms.parameters.str.split()
platforms = platforms.explode("parameters")
platforms.head()
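
To make the two steps above concrete, here is a minimal sketch on a made-up two-row table. The platform codes `A` and `B` and their parameter lists are invented for illustration and do not come from the index file.

```python
import pandas as pd

# Made-up platform table: one row per platform, parameters space-separated.
toy = pd.DataFrame(
    {
        "platform_code": ["A", "B"],
        "parameters": ["TEMP PSAL", "VHM0 VTPK TEMP"],
    }
)

# Step 1: turn each string into a list of parameter codes.
toy["parameters"] = toy["parameters"].str.split()

# Step 2: one row per (platform, parameter) pair; the original index is repeated.
tidy = toy.explode("parameters")

print(tidy)
```

After `explode`, filtering on a single parameter becomes a plain equality test on the `parameters` column.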

There are many different variables ("parameters") available for download.

The 30 parameter abbreviations reported by the largest number of platforms are listed below.

In [None]:
platforms.groupby("parameters").platform_code.count().nlargest(30)

In [None]:
wave_platforms = platforms[platforms.parameters == "VHM0"][["platform_code", "institution", "last_latitude_observation", "last_longitude_observation"]]

Let's try to filter this list to a specific area.

In [None]:
wave_platforms_gdf = geopandas.GeoDataFrame(
    wave_platforms, geometry=geopandas.points_from_xy(wave_platforms.last_longitude_observation, wave_platforms.last_latitude_observation),crs=4326)

wave_platforms_gdf[['platform_code','geometry']].head()

In [None]:
aoi = geopandas.read_file("../tests/data/northsea.geojson")  # GeoJSON coordinates are WGS 84 (EPSG:4326) by specification
aoi.plot()
plt.title("Area of interest")

In [None]:
ns_wave_platforms = wave_platforms_gdf.overlay(aoi, how='intersection')

m = aoi.explore()
ns_wave_platforms.explore(m=m, color='red')
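
When no polygon is at hand, a rough alternative to the polygon intersection is a plain bounding-box filter on the latitude and longitude columns. The sketch below uses pandas only. The three platforms and the box limits are made up for illustration and are not the actual extent of the area of interest used above.

```python
import pandas as pd

# Made-up last known positions in degrees; these are not real platforms.
toy = pd.DataFrame(
    {
        "platform_code": ["P1", "P2", "P3"],
        "last_latitude_observation": [55.3, 43.0, 57.9],
        "last_longitude_observation": [8.2, 5.0, -12.0],
    }
)

# Hypothetical bounding box, chosen only for illustration.
lon_min, lon_max = -4.0, 10.0
lat_min, lat_max = 51.0, 61.0

# Keep rows whose position falls inside the box (bounds inclusive).
inside = toy[
    toy["last_longitude_observation"].between(lon_min, lon_max)
    & toy["last_latitude_observation"].between(lat_min, lat_max)
]

print(inside["platform_code"].tolist())
```

A bounding box is coarser than a polygon and will include points on land or in neighbouring basins, so the polygon overlay remains the more precise choice.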


`Fanoebugt` looks like it could be relevant, so let's try to download its data.

In [None]:
stn = 'Fanoebugt'
base_url = "https://data-marineinsitu.ifremer.fr/glo_multiparameter_nrt/history/MO/"

tac = "NO" # TODO how to get this?

filename = f"{tac}_TS_MO_{stn}.nc"

url = base_url + filename  # base_url ends with "/"; os.path.join would insert "\" on Windows
url
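
The URL construction above can be wrapped in a small function, sketched here under the assumption that every mooring history file follows the same `{tac}_TS_MO_{station}.nc` pattern used in the cell above. The name `history_url` is a hypothetical helper, not part of any Copernicus Marine library.

```python
from urllib.parse import urljoin

BASE_URL = "https://data-marineinsitu.ifremer.fr/glo_multiparameter_nrt/history/MO/"


def history_url(tac: str, station: str, base_url: str = BASE_URL) -> str:
    """Build the URL of a mooring history file (hypothetical helper)."""
    # Assumed naming pattern, copied from the notebook cell above.
    filename = f"{tac}_TS_MO_{station}.nc"
    # urljoin always uses forward slashes; base_url must end with "/".
    return urljoin(base_url, filename)


print(history_url("NO", "Fanoebugt"))
```

Using `urljoin` rather than `os.path.join` keeps the separator a forward slash on every operating system.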


In [None]:
response = request.urlretrieve(url, filename)
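
Re-running the cell above downloads the file again each time. A minimal sketch of a guard that skips the download when the file is already on disk is shown below. `download_if_missing` is a hypothetical helper name, and the sketch assumes the server needs no authentication for this URL.

```python
from pathlib import Path
from urllib import request


def download_if_missing(url: str, path: str) -> Path:
    """Download url to path unless the file already exists (hypothetical helper)."""
    target = Path(path)
    if not target.exists():
        # Same call as in the notebook cell above.
        request.urlretrieve(url, str(target))
    return target
```

It would be called as `download_if_missing(url, filename)`. Note that an existing file is never refreshed, so a stale local copy has to be deleted by hand to pick up new observations.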

In [None]:
ds = xr.open_dataset(filename)
ds

The dataset contains many different variables.

In [None]:
ds.data_vars

Each variable has a standard name that follows the [CF convention](https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html).

In [None]:
ds.VHM0.standard_name

It also has a long, more readable name.

In [None]:
ds.VHM0.long_name

Each variable also has an associated quality control (QC) flag.

In [None]:
ds.VHM0_QC.long_name

In [None]:
ds.VHM0.isel(DEPTH=0).plot()

In [None]:
ds.VHM0.isel(DEPTH=0).sel(TIME=slice("2020-10-15","2020-12-15")).plot.line('+-')

In [None]:
ds.VHM0_QC.isel(DEPTH=0).sel(TIME=slice("2020-10-15","2020-12-15")).plot.line('+-')

In [None]:
ds.isel(DEPTH=0).sel(TIME=slice("2020-11-01","2020-11-09"))[['VHM0','VHM0_QC']].to_dataframe()
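
With values and flags side by side, the flags can be used to mask suspect observations. The sketch below works on a made-up table with the same two column names. It assumes that a flag value of 1 means "good data", which should be checked against the attributes of `ds.VHM0_QC` (for example `flag_values` and `flag_meanings`, if present) before relying on it.

```python
import pandas as pd

# Made-up values and flags; only the column names match the table above.
toy = pd.DataFrame(
    {"VHM0": [1.2, 9.9, 1.4, 1.5], "VHM0_QC": [1.0, 4.0, 1.0, 9.0]},
    index=pd.date_range("2020-11-01", periods=4, freq="D"),
)

GOOD = 1  # assumption: flag value 1 marks good data

# Keep the time axis intact and replace values with NaN where the flag is not GOOD.
toy["VHM0_good"] = toy["VHM0"].where(toy["VHM0_QC"] == GOOD)

print(toy)
```

Masking with `where` rather than dropping rows keeps gaps visible in later plots. The same pattern is available in xarray as `DataArray.where`.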