# Discrete Sampling Geometries (DSG) netCDF in the CF Conventions

The examples in the previous chapters all concern netCDF files with gridded data. In this chapter, we look at discrete sampling geometry (DSG) data and at how such data should be laid out in a netCDF file that complies with the CF Conventions.

## DSG Types

A DSG netCDF file stores a collection of **"features"**, where each feature is of one of the following types:

* **Point(s)**: Data are located at different, unconnected locations. *Examples: earthquake data, lightning data*.

* **Time Series**: Data are located at named locations, called stations, each with a unique identifier. There can be multiple stations, and usually each station has many observations at different times. *Examples: weather station data, fixed buoys*.

* **Profile(s)**: A series of connected observations along a vertical line. Each profile has only one lat, lon coordinate (possibly nominal), so the points along a profile differ only in the z coordinate and possibly in time. There can be multiple profiles in the same file, each with a unique identifier. If many profiles share the same lat, lon location, use the Time Series of Profile(s) type instead. *Examples: atmospheric profiles from satellites, moving profilers*.

* **Time Series of Profile(s)**: Time series of profiles at fixed locations (stations). A file can contain many stations, each with a time series of many profiles. *Examples: profilers, balloon soundings*.

* **Trajectory**: A series of connected observations along a 1D curve in time and space. There can be multiple trajectories in the same file, each with a unique identifier. *Examples: aircraft data, drifting buoys*.

* **Trajectory of Profile(s)**: A collection of profile features that originate along a trajectory, i.e. each trajectory carries profile data (varying in z) at each of its (lat, lon) locations. *Examples: ship soundings*.
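The CF Conventions identify each of these types through the value of a global `featureType` attribute. As a compact reminder, the sketch below maps those values to the names used above, together with how data and coordinates depend on the instance (`i`), profile (`p`) and element (`o`) dimensions, as we recall them from the feature-type table in chapter 9 of CF; please check them against the CF version you target. `describe_feature_type` is a hypothetical helper, not part of any library.

```python
# featureType value -> (name used in this chapter, data and coordinate shapes).
# Index letters: i = feature instance, p = profile, o = element (observation).
# Shapes recalled from the CF feature-type table; verify against your CF version.
FEATURE_TYPES = {
    "point": ("Point(s)", "data(i); x(i) y(i) t(i)"),
    "timeSeries": ("Time Series", "data(i,o); x(i) y(i) t(i,o)"),
    "profile": ("Profile(s)", "data(i,o); x(i) y(i) z(i,o) t(i)"),
    "timeSeriesProfile": ("Time Series of Profile(s)", "data(i,p,o); x(i) y(i) z(i,p,o) t(i,p)"),
    "trajectory": ("Trajectory", "data(i,o); x(i,o) y(i,o) t(i,o)"),
    "trajectoryProfile": ("Trajectory of Profile(s)", "data(i,p,o); x(i,p) y(i,p) z(i,p,o) t(i,p)"),
}


def describe_feature_type(global_attrs):
    """Hypothetical helper: look up the DSG type declared in a dict of global attributes."""
    value = global_attrs.get("featureType")
    if value is None:
        raise KeyError("no 'featureType' global attribute, so the file is not declared as DSG")
    if value not in FEATURE_TYPES:
        raise ValueError(f"unrecognised featureType: {value!r}")
    return FEATURE_TYPES[value]


name, shapes = describe_feature_type({"featureType": "trajectory"})
print(name, "->", shapes)
```

With xarray, the `attrs` dictionary of an opened dataset (e.g. `ds.attrs`) can be passed directly.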



In [9]:
import os
from glob import glob
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt

In [3]:
# Adjust this path to the folder that holds the example data on your machine
os.chdir('/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data')
os.getcwd()

'/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data'

## Point(s)

Example Dataset: [North Pacific High](https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdNph.html)

In [4]:
ds_nph = xr.open_dataset(os.path.join(os.getcwd(), "dsg_point", "NPH_IDS.nc"))
ds_nph.info()

xarray.Dataset {
dimensions:
	time = 565 ;

variables:
	datetime64[ns] time(time) ;
		time:long_name = Centered Time ;
	int16 year(time) ;
		year:long_name = Year ;
	int8 month(time) ;
		month:long_name = Month (1 - 12) ;
	float32 longitude(time) ;
		longitude:long_name = Longitude of the Center of the NPH ;
		longitude:units = degrees_east ;
	float32 latitude(time) ;
		latitude:long_name = Latitude of the Center of the NPH ;
		latitude:units = degrees_north ;
	float32 area(time) ;
		area:long_name = Areal Extent of the 1020 hPa Contour ;
		area:units = km2 ;
	float32 maxSLP(time) ;
		maxSLP:long_name = Maximum Sea Level Pressure ;
		maxSLP:units = hPa ;

// global attributes:
	:id = NPH_IDS ;
}
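The `info()` listing above shows only an `id` global attribute and no `standard_name` attributes, so, as opened by xarray, this file does not declare a `featureType`. Below is a minimal sketch of a check for a few DSG-related attributes. `missing_dsg_metadata` is a hypothetical helper that looks only at `featureType` and at `standard_name` on `latitude`, `longitude` and `time`, so it is far from a full CF checker; the toy dataset mimics the layout of `NPH_IDS.nc` with placeholder values.

```python
import numpy as np
import xarray as xr


def missing_dsg_metadata(ds):
    """Hypothetical helper: list a few CF DSG attributes a dataset lacks (not a full CF check)."""
    problems = []
    if "featureType" not in ds.attrs:
        problems.append("global attribute 'featureType'")
    for name in ("latitude", "longitude", "time"):
        # here each variable name doubles as the expected CF standard_name
        if name in ds.variables and ds[name].attrs.get("standard_name") != name:
            problems.append(f"standard_name on '{name}'")
    return problems


# Toy dataset with the NPH_IDS.nc layout (one record per time); values are placeholders.
ds = xr.Dataset(
    {
        "latitude": ("time", np.array([30.0, 31.0], dtype="float32"), {"units": "degrees_north"}),
        "longitude": ("time", np.array([-140.0, -141.0], dtype="float32"), {"units": "degrees_east"}),
        "maxSLP": ("time", np.array([1024.0, 1026.0], dtype="float32"), {"units": "hPa"}),
    },
    coords={"time": np.array(["2000-01-15", "2000-02-15"], dtype="datetime64[ns]")},
)
before = missing_dsg_metadata(ds)

# One possible fix: declare the feature type and add standard names.
ds.attrs["featureType"] = "point"
for name in ("latitude", "longitude", "time"):
    ds[name].attrs["standard_name"] = name
after = missing_dsg_metadata(ds)
print(before)
print(after)
```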

In [6]:
os.path.join(os.getcwd(), "dsg_point", "NPH_IDS.nc")

'/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_point/NPH_IDS.nc'

### Point data: from CSV text to netCDF

In [10]:
df_haulCatch = pd.read_csv(os.path.join(os.getcwd(), "dsg_point", "CPS_Trawl_LifeHistory_Haulcatch.csv"))
df_haulCatch

| Unnamed: 0 | cruise | ship | haul | collection | start_latitude | start_longitude | stop_latitude | stop_longitude | equilibrium_time | haulback_time | surface_temp | surface_temp_method | ship_spd_through_water | itis_tsn | scientific_name | subsample_count | subsample_weight | remaining_weight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200307 | FR | 1 | 2003 | 42.9816 | -124.8413 | 43.0006 | -124.8930 | 2003-07-08T21:03:00-07 | 2003-07-08T21:34:00-07 | 13.3 | bucket | 3.5 | 82367 | Teuthida | 1.0 | 0.0100 | |
| 1 | 200307 | FR | 1 | 2003 | 42.9816 | -124.8413 | 43.0006 | -124.8930 | 2003-07-08T21:03:00-07 | 2003-07-08T21:34:00-07 | 13.3 | bucket | 3.5 | 82371 | Doryteuthis (Loligo) opalescens | 3.0 | 0.0300 | |
| 2 | 200307 | FR | 1 | 2003 | 42.9816 | -124.8413 | 43.0006 | -124.8930 | 2003-07-08T21:03:00-07 | 2003-07-08T21:34:00-07 | 13.3 | bucket | 3.5 | 159643 | Salpida | | 0.1000 | |
| 3 | 200307 | FR | 1 | 2003 | 42.9816 | -124.8413 | 43.0006 | -124.8930 | 2003-07-08T21:03:00-07 | 2003-07-08T21:34:00-07 | 13.3 | bucket | 3.5 | 161729 | Sardinops sagax | | 0.0100 | |
| 4 | 200307 | FR | 1 | 2003 | 42.9816 | -124.8413 | 43.0006 | -124.8930 | 2003-07-08T21:03:00-07 | 2003-07-08T21:34:00-07 | 13.3 | bucket | 3.5 | 161828 | Engraulis mordax | | 0.0500 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18081 | 202307 | SH | 81 | 4616 | 48.1445 | -125.6153 | 48.1073 | -125.5773 | 2023-11-01T01:03:00-08 | 2023-11-01T01:48:00-08 | 13.6 | TSG | 3.8 | 551209 | Clupea pallasii | 9.0 | 0.3180 | 0.0 |
| 18082 | 202307 | SH | 81 | 4616 | 48.1445 | -125.6153 | 48.1073 | -125.5773 | 2023-11-01T01:03:00-08 | 2023-11-01T01:48:00-08 | 13.6 | TSG | 3.8 | 50623 | Aequorea | 2.0 | 0.1100 | 0.0 |
| 18083 | 202307 | SH | 81 | 4616 | 48.1445 | -125.6153 | 48.1073 | -125.5773 | 2023-11-01T01:03:00-08 | 2023-11-01T01:48:00-08 | 13.6 | TSG | 3.8 | 51671 | Cyanea capillata | 4.0 | 0.3500 | 0.0 |
| 18084 | 202307 | SH | 81 | 4616 | 48.1445 | -125.6153 | 48.1073 | -125.5773 | 2023-11-01T01:03:00-08 | 2023-11-01T01:48:00-08 | 13.6 | TSG | 3.8 | 161977 | Oncorhynchus kisutch | 3.0 | 0.9390 | 0.0 |


In [19]:
len(set(df_haulCatch['cruise']))

34
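A minimal sketch of the text-to-netCDF step, assuming each CSV row becomes one point feature, can go through `xarray.Dataset.from_dataframe`. The rows below are copied from the table above (a subset of columns). Our assumptions: `start_latitude`/`start_longitude` are used as the point position, the `equilibrium_time` values were converted by hand from their local offsets to UTC, and `CF-1.8` is just an example Conventions string.

```python
import pandas as pd
import xarray as xr

# Three rows copied from the table above (subset of columns). equilibrium_time was
# converted by hand to UTC: 21:03 at UTC-07 -> 04:03 next day; 01:03 at UTC-08 -> 09:03.
df = pd.DataFrame({
    "time": pd.to_datetime(["2003-07-09T04:03", "2003-07-09T04:03", "2023-11-01T09:03"]),
    "latitude": [42.9816, 42.9816, 48.1445],         # start_latitude (assumed point position)
    "longitude": [-124.8413, -124.8413, -125.6153],  # start_longitude
    "scientific_name": ["Teuthida", "Sardinops sagax", "Clupea pallasii"],
    "subsample_weight": [0.0100, 0.0100, 0.3180],
})

# Name the row index "obs" so it becomes the single (instance) dimension.
ds_point = xr.Dataset.from_dataframe(df.rename_axis("obs"))
ds_point = ds_point.set_coords(["time", "latitude", "longitude"])

# CF metadata: the feature type plus standard names/units on the space-time coordinates.
ds_point["time"].attrs["standard_name"] = "time"
ds_point["latitude"].attrs.update(standard_name="latitude", units="degrees_north")
ds_point["longitude"].attrs.update(standard_name="longitude", units="degrees_east")
ds_point.attrs.update(featureType="point", Conventions="CF-1.8")

# ds_point.to_netcdf("haulcatch_points.nc")  # needs a netCDF backend such as netCDF4
```

Applied to the full `df_haulCatch`, the same steps would give one `obs` entry per catch record, so the haul position and time repeat for every species in a haul.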

## Time Series

Example Dataset: [Kelp Forest Monitoring Sea Temperature](https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdCinpKfmT.html)

In [7]:
ts_files = glob(os.path.join(os.getcwd(), "dsg_timeSeries", "*.nc"))
ts_files

['/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_timeSeries/KFMTemperature_Anacapa_Cathedral_Cove.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_timeSeries/KFMTemperature_Anacapa_Black_Sea_Bass_Reef.nc']

In [12]:
ds_ts1 = xr.open_dataset(ts_files[0])

print(ds_ts1["ID"].data)
ds_ts1

b'Anacapa (Cathedral Cove)'


In [13]:
ds_ts2 = xr.open_dataset(ts_files[1])

print(ds_ts2['ID'].data)
ds_ts2

b'Anacapa (Black Sea Bass Reef)'
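Each of the two Kelp Forest files holds a single station, identified by its `ID`. If both stations were to go into one timeSeries file and had different numbers of records, CF's contiguous ragged array representation would be one option: all observations are concatenated along an `obs` dimension, and a count variable on the `station` dimension, marked with a `sample_dimension` attribute, says how many belong to each station. The sketch below assumes that layout; the helper names, dates and temperatures are hypothetical placeholders (not values from the files), and the `latitude`/`longitude(station)` coordinates are left out for brevity.

```python
import numpy as np
import xarray as xr


def stations_to_contiguous_ragged(stations):
    """Hypothetical helper. stations: list of (station_id, times, values), one per station."""
    row_size = np.array([len(times) for _, times, _ in stations], dtype="int32")
    time = np.concatenate([np.asarray(t, dtype="datetime64[ns]") for _, t, _ in stations])
    temp = np.concatenate([np.asarray(v, dtype="float64") for _, _, v in stations])
    return xr.Dataset(
        data_vars={
            # cf_role marks the variable holding one identifier per station
            "station_id": ("station", np.array([s for s, _, _ in stations], dtype=object),
                           {"cf_role": "timeseries_id"}),
            # number of obs per station; the obs are stored one station after another
            "row_size": ("station", row_size, {"sample_dimension": "obs"}),
            "temperature": ("obs", temp),
        },
        coords={"time": ("obs", time, {"standard_name": "time"})},
        attrs={"featureType": "timeSeries"},
    )


def station_values(ds, k):
    """Return the temperatures of the k-th station using cumulative row sizes as offsets."""
    offsets = np.concatenate([[0], np.cumsum(ds["row_size"].values)])
    return ds["temperature"].values[offsets[k]:offsets[k + 1]]


# Placeholder dates and temperatures (NOT taken from the KFM files).
ds_kfm = stations_to_contiguous_ragged([
    ("Anacapa (Cathedral Cove)", ["2000-01-01", "2000-01-02", "2000-01-03"], [14.0, 14.5, 15.0]),
    ("Anacapa (Black Sea Bass Reef)", ["2000-01-01", "2000-01-02"], [13.0, 13.5]),
])
print(station_values(ds_kfm, 1))
```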


## Profile(s)

Example Dataset: [CTD profile ANTARES station (NetCDF files) (lat/long : 42.485/6.06)](https://erddap.osupytheas.fr/erddap/tabledap/CTD_Antares_NC_6fad_a064_dc26.html)

In [14]:
pf_files = glob(os.path.join(os.getcwd(), "dsg_profile", "*.nc"))
pf_files

['/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_profile/20180917_ctd0268_moose11.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_profile/20170411_ctd0268_moose11.nc']

In [15]:
ds_pf1 = xr.open_dataset(pf_files[0])
ds_pf1

In [16]:
ds_pf2 = xr.open_dataset(pf_files[1])
ds_pf2

In [19]:
print(ds_pf1.time.data)
print(ds_pf1.latitude.data)
print(ds_pf1.longitude.data)
print(ds_pf1.stationname.data)

print(ds_pf2.time.data)
print(ds_pf2.latitude.data)
print(ds_pf2.longitude.data)
print(ds_pf2.stationname.data)

2018-09-17T08:34:00.000000000
42.81
6.11
b'ANTARES'
2017-04-11T08:42:00.000000000
42.8
6.14
b'ANTARES'
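The two ANTARES casts come from the same station but at slightly different positions (42.81/6.11 and 42.8/6.14) and times, so in a single profile file each profile would keep its own latitude, longitude and time, i.e. coordinates on the profile dimension. When casts have different numbers of levels, one layout CF allows is the incomplete multidimensional array representation: a `(profile, level)` array padded with missing values. The helper below is a hypothetical sketch of that padding step with made-up depths; the number of levels in the real files is not shown here.

```python
import numpy as np


def pad_profiles(profiles, fill_value=np.nan):
    """Hypothetical helper: stack 1-D profiles of unequal length into a (profile, level) array.

    Unused trailing cells get fill_value, as in CF's incomplete multidimensional layout.
    """
    n_levels = max(len(p) for p in profiles)
    out = np.full((len(profiles), n_levels), fill_value, dtype="float64")
    for i, p in enumerate(profiles):
        out[i, : len(p)] = p
    return out


# Made-up depths (m) for two casts with 4 and 3 levels.
depth = pad_profiles([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0]])
print(depth.shape)  # (2, 4)
```

The padding wastes space when profile lengths differ a lot; the ragged layouts used in the following sections avoid that.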


## Time Series of Profile(s)

Example Dataset: [Newport Lab CTD Casts, 1997-2008](https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdNewportCtd.html)

In [20]:
file_tsProfile = glob(os.path.join(os.getcwd(), "dsg_tsProfile", "*.nc"))
file_tsProfile

['/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_tsProfile/061207NH01.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_tsProfile/061207NH03.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_tsProfile/061298NH01.nc']

In [23]:
ds_ctd1 = xr.open_dataset(file_tsProfile[2])
ds_ctd1

In [24]:
ds_ctd2 = xr.open_dataset(file_tsProfile[0])
ds_ctd2

In [25]:
ds_ctd3 = xr.open_dataset(file_tsProfile[1])
ds_ctd3

In [31]:
print(ds_ctd1.id, ds_ctd1.time.data[0], ds_ctd1.station.data[0], ds_ctd1.latitude.data[0], ds_ctd1.longitude.data[0])
print(ds_ctd2.id, ds_ctd2.time.data[0], ds_ctd2.station.data[0], ds_ctd2.latitude.data[0], ds_ctd2.longitude.data[0])
print(ds_ctd3.id, ds_ctd3.time.data[0], ds_ctd3.station.data[0], ds_ctd3.latitude.data[0], ds_ctd3.longitude.data[0])

061298NH01 2007-06-13T04:32:00.000000000 b'NH25' 44.65169887200318 -124.64999685107614
061207NH01 2000-06-12T18:17:00.000000000 b'NH45' 44.65169887200318 -125.1166968392863
061207NH03 2007-06-13T00:26:00.000000000 b'NH01' 44.65169887200318 -124.09999686497031


In [41]:
# Stations differ in their number of profiles and in their vertical levels;
# [-5:-1] prints four of the deepest levels (it leaves out the very last one)
print(ds_ctd1.depth_or_pressure.data[-5:-1])
print(ds_ctd2.depth_or_pressure.data[-5:-1])
print(ds_ctd3.depth_or_pressure.data[-5:-1])

[261.99999338 262.99999336 263.99999333 264.99999331]
[513.99998702 514.99998699 515.99998696 516.99998694]
[14.99999962 15.9999996  16.99999957 17.99999955]
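Since the casts end at very different depths, a ragged layout is a natural fit when several casts from several stations go into one timeSeriesProfile file. Roughly following the ragged-array layout CF describes for time series of profiles, every profile gets a `station_index` (pointing into the station dimension, with an `instance_dimension` attribute) and a `row_size` (its number of levels, with a `sample_dimension` attribute), and all levels are concatenated along `obs`. The helpers and values below are hypothetical.

```python
import numpy as np


def ragged_ts_profiles(profiles):
    """Hypothetical helper. profiles: list of (station_index, levels) in storage order.

    Returns station_index(profile), row_size(profile) and the concatenated levels z(obs).
    """
    station_index = np.array([s for s, _ in profiles], dtype="int32")  # instance_dimension = "station"
    row_size = np.array([len(z) for _, z in profiles], dtype="int32")  # sample_dimension = "obs"
    z = np.concatenate([np.asarray(z, dtype="float64") for _, z in profiles])
    return station_index, row_size, z


def profiles_of_station(station, station_index, row_size, z):
    """Return the profiles (as arrays) that belong to one station, in storage order."""
    offsets = np.concatenate([[0], np.cumsum(row_size)])
    return [z[offsets[p]:offsets[p + 1]] for p in np.flatnonzero(station_index == station)]


# Made-up levels: station 0 has two casts, station 1 has one.
station_index, row_size, z = ragged_ts_profiles([
    (0, [1.0, 2.0, 3.0]),
    (1, [1.0, 2.0]),
    (0, [1.0, 2.0, 3.0, 4.0]),
])
print(row_size)  # [3 2 4]
print([p.tolist() for p in profiles_of_station(0, station_index, row_size, z)])
```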


## Trajectory(-ies)

Example Datasets:
1. Only one trajectory in the file: [SWFSC Protected Resources Division CTD Data](https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdPrdCtd.html)
2. Multiple trajectories, each with a different number of points: [SWFSC FED Mid Water Trawl Juvenile Rockfish Survey](https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdFedRockfishStation.html)

In [54]:
file_trj = glob(os.path.join(os.getcwd(), "dsg_trajectory", "*.nc"))
file_trj

['/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_trajectory/rockfish_header_2015.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_trajectory/rockfish_header_1987.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_trajectory/PRD_ctd.nc']

In [55]:
ds_trj1 = xr.open_dataset(file_trj[-1])
ds_trj1

In [56]:
print(ds_trj1.datetime.data[0:3])
print(ds_trj1.timestamp.data[0:3])
print(ds_trj1.lat.data[0:3])
print(ds_trj1.lon.data[0:3])

print(ds_trj1.geopoint.data[0:3])
print(ds_trj1.ship_station.data[0:3])

[b'1998-09-01 10:39:00' b'1998-09-02 10:38:00' b'1998-09-03 10:46:00']
[b'19980901103900' b'19980902103800' b'19980903104600']
[12.56  11.447  9.688]
[-88.157 -86.663 -85.62 ]
[b'0101000020E6100000355EBA490C0A56C01F85EB51B81E2940'
 b'0101000020E6100000DF4F8D976EAA55C0BE9F1A2FDDE42640'
 b'0101000020E610000048E17A14AE6755C0C74B378941602340']
[b'1-044' b'1-046' b'1-048']
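The `datetime` and `timestamp` variables hold the time as byte strings, and `geopoint` holds hex strings. Judging from the byte pattern, `geopoint` looks like a PostGIS-style extended WKB (EWKB) point with SRID 4326; that is our inference, not something stated in the file. A small sketch with Python's `struct` module decodes it back to (lon, lat), which for the rows printed above agrees with `lon` and `lat`. `decode_ewkb_point` is a hypothetical helper that only handles 2-D points.

```python
import struct


def decode_ewkb_point(value):
    """Hypothetical helper: decode a hex-encoded (E)WKB 2-D point into (x, y, srid)."""
    if isinstance(value, bytes):  # xarray returns these values as byte strings
        value = value.decode("ascii")
    raw = bytes.fromhex(value)
    order = "<" if raw[0] == 1 else ">"  # first byte: 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack(order + "I", raw[1:5])
    offset, srid = 5, None
    if geom_type & 0x20000000:  # EWKB flag: a 4-byte SRID follows the type
        (srid,) = struct.unpack(order + "I", raw[offset:offset + 4])
        offset += 4
    if (geom_type & 0xC0000000) or (geom_type & 0x0FFFFFFF) != 1:
        raise ValueError("only 2-D points are handled in this sketch")
    x, y = struct.unpack(order + "dd", raw[offset:offset + 16])
    return x, y, srid


x, y, srid = decode_ewkb_point(b"0101000020E6100000355EBA490C0A56C01F85EB51B81E2940")
print(round(x, 3), round(y, 3), srid)  # -88.157 12.56 4326
```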


In [57]:
ds_trj2 = xr.open_dataset(file_trj[0])
ds_trj2

In [58]:
ds_trj3 = xr.open_dataset(file_trj[1])
ds_trj3
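The rockfish example has several trajectories with different numbers of points. CF offers two ragged layouts for this case: a contiguous one (the observations of each trajectory stored together, with a `row_size` count per trajectory) and an indexed one (observations in any order, each carrying a `trajectory_index` into the trajectory dimension, marked with an `instance_dimension` attribute). Which one the rockfish files use is not shown in this chapter; the sketch below, with a hypothetical helper name and made-up values, converts the indexed form to the contiguous form with NumPy.

```python
import numpy as np


def indexed_to_contiguous(values, trajectory_index, n_trajectories):
    """Hypothetical helper: reorder indexed-ragged obs so each trajectory is contiguous.

    Returns the reordered values and the per-trajectory counts (CF row_size).
    """
    trajectory_index = np.asarray(trajectory_index)
    # a stable sort keeps the original (time) order within each trajectory
    order = np.argsort(trajectory_index, kind="stable")
    row_size = np.bincount(trajectory_index, minlength=n_trajectories)
    return np.asarray(values)[order], row_size


# Made-up obs from two trajectories recorded interleaved in time.
values, row_size = indexed_to_contiguous([10, 20, 11, 21, 12], [0, 1, 0, 1, 0], 2)
print(values, row_size)  # [10 11 12 20 21] [3 2]
```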

## Trajectory(-ies) of Profile(s)

Example Dataset: [GLOBEC NEP Rosette Bottle Data (2002)](https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdGlobecBottle.html)

In [59]:
ds_trjPf = xr.open_dataset(os.path.join(os.getcwd(), "dsg_trjProfile", "Globec_bottle_data_2002.nc"))
ds_trjPf

In [74]:
print(ds_trjPf.date.data)
print(ds_trjPf.ship.data)
print(ds_trjPf.cruise_id.data)
print(ds_trjPf.lat.data)
print(ds_trjPf.lon.data)
print(ds_trjPf.datetime.data)
print(ds_trjPf.pressure.data)

[b'2002-05-29 00:00:00' b'2002-05-29 00:00:00' b'2002-05-29 00:00:00' ...
 b'2002-08-19 00:00:00' b'2002-08-19 00:00:00' b'2002-08-19 00:00:00']
[b'Wecoma' b'Wecoma' b'Wecoma' ... b'New_Horizon' b'New_Horizon'
 b'New_Horizon']
[b'w0205' b'w0205' b'w0205' ... b'nh0207' b'nh0207' b'nh0207']
[44.65 44.65 44.65 ... 44.65 44.65 44.65]
[-124.1 -124.1 -124.1 ... -124.1 -124.1 -124.1]
[b'2002-05-29 20:21:00' b'2002-05-29 20:21:00' b'2002-05-29 20:21:00' ...
 b'2002-08-19 13:18:00' b'2002-08-19 13:18:00' b'2002-08-19 13:18:00']
[22.606 15.754 10.235 ... 20.    10.     5.   ]
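In the flat listing above, every bottle sample repeats its ship, cruise, position and cast time. Assuming a cruise (`cruise_id`) is the trajectory and one cast (`cruise_id`, `datetime`) is a profile, which is our reading rather than something stated in the file, the per-profile counts and trajectory indices needed for a ragged trajectoryProfile layout can be derived with pandas. Only the first and last three printed rows are used (decoded from bytes to `str`); the elided middle is left out.

```python
import pandas as pd

# First three and last three rows printed above.
df = pd.DataFrame({
    "cruise_id": ["w0205"] * 3 + ["nh0207"] * 3,
    "datetime": ["2002-05-29 20:21:00"] * 3 + ["2002-08-19 13:18:00"] * 3,
    "pressure": [22.606, 15.754, 10.235, 20.0, 10.0, 5.0],
})

# One profile per (cruise, cast time), kept in order of first appearance.
profiles = df.groupby(["cruise_id", "datetime"], sort=False).size()
row_size = profiles.to_numpy()  # number of obs in each profile (CF row_size)
# trajectory_index(profile): which cruise each profile belongs to
trajectory_index, trajectory_ids = pd.factorize(profiles.index.get_level_values("cruise_id"))
print(row_size, trajectory_index, list(trajectory_ids))
```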
