## Profile(s)

**profile**: A series of connected observations along a vertical line. Each profile has a single lat, lon coordinate pair (possibly nominal), so the points along a profile differ only in their z coordinate and possibly their time coordinate. A file can hold multiple profiles, each with a unique identifier. If many profiles share the same lat, lon location, use the Time Series Profile type instead. *Examples: atmospheric profiles from satellites, moving profilers*.

* [Only one profile in the file](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_single_profile)
* [All profiles have the same vertical coordinates](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_orthogonal_multidimensional_array_representation_of_profiles)
* [Each profile has the same number of vertical coordinates but the coordinate values may be different](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_incomplete_multidimensional_array_representation_of_profiles)
* Each profile has a different number of vertical coordinates and we want to keep the file size as small as possible:
    * [we already have all the data and want to optimize reading all the data for one profile](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_contiguous_ragged_array_representation_of_profiles)
    * [we want to write the data as it arrives, in any order](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_indexed_ragged_array_representation_of_profiles)
        * *double time(profile)* is possible; *double time(obs)* is also possible when time varies from one observation to the next.
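
The decision list above can be sketched as a small function. This is only an illustration under stated assumptions: `choose_profile_representation` is a hypothetical helper, not part of xarray or any CF tooling, and it looks only at the depth arrays, one per profile.

```python
import numpy as np


def choose_profile_representation(depths):
    """Suggest a CF representation for a list of 1-D depth arrays (one per profile).

    Hypothetical helper that follows the decision list in this section.
    """
    arrays = [np.asarray(d, dtype=float) for d in depths]
    if len(arrays) == 1:
        return "single profile"
    if len({a.size for a in arrays}) > 1:
        # Different numbers of levels: a ragged array avoids padding.
        return "ragged array (contiguous or indexed)"
    if all(np.array_equal(a, arrays[0]) for a in arrays[1:]):
        # Same levels everywhere: one shared depth(z) coordinate is enough.
        return "orthogonal multidimensional array"
    # Same number of levels, but the values differ between profiles.
    return "incomplete multidimensional array"


print(choose_profile_representation([[0.1, 0.2, 0.3]]))
print(choose_profile_representation([[0.1, 0.2], [0.1, 0.2]]))
print(choose_profile_representation([[0.1, 0.2], [0.5, 1.0]]))
print(choose_profile_representation([[0.1, 0.2], [0.1, 0.2, 0.3]]))
```

Choosing between the contiguous and the indexed ragged form still depends on whether the data is complete before writing, which the depth arrays alone cannot tell.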


Example Dataset: [CLAPPP : New Caledonian lagoons: CTD Profiles](https://erddap.osupytheas.fr/erddap/files/CLAPP_CTD_OutNetCDF_6c83/Clappp1/)

In [1]:
import os
from glob import glob
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt

In [2]:
os.chdir('/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data')
os.getcwd()

'/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data'

In [43]:
pf_files = glob(os.path.join(os.getcwd(), "dsg_profile", "*.nc"))
pf_files

['/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_profile/prony-1.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_profile/teremba-1.nc']

### H.3.3. Single Profile

In [53]:
# Load the profile dataset.
# It is a standard single profile netCDF file.
ds_prony = xr.open_dataset(pf_files[0], decode_times=False)
ds_prony.info()

xarray.Dataset {
dimensions:
	depth = 11 ;

variables:
	int32 time() ;
		time:long_name = date de prelevement ;
		time:standard_name = time ;
		time:units = minutes since 1970-01-01 00:00:00 UTC ;
		time:origin = 01-JAN-1970 00:00:00 ;
		time:calendar = standard ;
	|S18 stationname() ;
		stationname:standard_name = platform_name ;
		stationname:long_name = station name ;
		stationname:cf_role = profile_id ;
	float32 latitude() ;
		latitude:units = degrees_north ;
		latitude:standard_name = latitude ;
		latitude:axis = Y ;
		latitude:coverage_content_type = coordinate ;
	float32 longitude() ;
		longitude:units = degrees_east ;
		longitude:standard_name = longitude ;
		longitude:axis = X ;
		longitude:coverage_content_type = coordinate ;
	float32 depth(depth) ;
		depth:axis = Z ;
		depth:positive = up ;
		depth:standard_name = depth ;
		depth:long_name = Profondeur ;
		depth:units = m ;
	float32 temperature(depth) ;
		temperature:standard_name = sea_water_temperature ;
		temperature:unit

### H.3.1. Orthogonal multidimensional array representation of profiles

Multiple profiles with the same number of vertical levels and identical vertical coordinate values.

In [54]:
# Load another profile.
ds_teremba = xr.open_dataset(pf_files[1], decode_times=False)

In [55]:
# Vertical levels of two datasets
print(ds_prony.depth.data)
print(ds_teremba.depth.data[:20])

[-0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1.  -1.1]
[-0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1.  -1.1 -1.2 -1.3 -1.4
 -1.5 -1.6 -1.7 -1.8 -1.9 -2. ]


In [57]:
# Select the same depths for the station "teremba" to match those for the station "prony"
ds_teremba = ds_teremba.sel(depth=ds_prony.depth.data)
ds_teremba
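
Label-based selection with `.sel(depth=...)` matches coordinate values exactly, which works here because both files store the same `float32` depth values. When depths come from different sources, a tolerance-based match may be safer. The sketch below uses plain NumPy; `match_levels` and its `atol` default are hypothetical choices for illustration, and the depth arrays are made-up stand-ins for the two stations.

```python
import numpy as np


def match_levels(target, available, atol=1e-4):
    """Return, for each value in `target`, the index of its match in `available`.

    Raises ValueError when a target level has no match within `atol`.
    Hypothetical helper for illustration only.
    """
    target = np.asarray(target, dtype=float)
    available = np.asarray(available, dtype=float)
    # Pairwise absolute differences, shape (len(target), len(available)).
    diff = np.abs(target[:, None] - available[None, :])
    nearest = diff.argmin(axis=1)
    ok = diff[np.arange(target.size), nearest] <= atol
    if not ok.all():
        raise ValueError(f"no match within {atol} for levels {target[~ok]}")
    return nearest


# Stand-in depth arrays with different precisions.
prony_depth = np.array([-0.1, -0.2, -0.3], dtype=np.float32)
teremba_depth = np.array([-0.1, -0.2, -0.3, -0.4, -0.5], dtype=np.float64)

idx = match_levels(prony_depth, teremba_depth)
print(idx)
```

The resulting integer indices could then be passed to `ds_teremba.isel(depth=idx)`. xarray's `.sel` also accepts `method="nearest"` together with a `tolerance` argument, which serves a similar purpose.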

In [112]:
# Vertical coordinate values for both profiles
depth = ds_teremba.depth.data
print("Vertical levels for both profiles are: ", depth)

# Station names for two profiles
# class <bytes>.decode() ==> string
station_name = [ds_prony.stationname.data.tolist().decode(), ds_teremba.stationname.data.tolist().decode()]
print("Station names of two profiles are: ", station_name)

# Time coordinate values for two profiles
# 0-D array (scalar).tolist() ==> the scalar value itself, not a list containing that value
time = [ds_prony.time.data.tolist(), ds_teremba.time.data.tolist()]
print("Time values for two profiles are: ", time)

# Longitudes for two profiles
lon = [ds_prony.longitude.data.tolist(), ds_teremba.longitude.data.tolist()]
print("Longitudes for two stations are: ", lon)

# Latitudes for two profiles
lat = [ds_prony.latitude.data.tolist(), ds_teremba.latitude.data.tolist()]
print("Latitudes for two profiles are: ", lat)

Vertical levels for both profiles are:  [-0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1.  -1.1]
Station names of two profiles are:  ['./Clappp1//prony-1', './Clappp1//teremba-1']
Time values for two profiles are:  [20439933, 20442692]
Longitudes for two stations are:  [166.8699951171875, 165.72999572753906]
Latitudes for two profiles are:  [-22.399999618530273, -21.850000381469727]
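
Because the files were opened with `decode_times=False`, the time values above are raw counts in the units given by the `time:units` attribute (`minutes since 1970-01-01 00:00:00 UTC`). As a quick sanity check, they can be converted by hand with the standard library. This sketch assumes the standard calendar stated in the attributes; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Raw time values printed above, in minutes since the epoch.
raw_minutes = [20439933, 20442692]
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Add each offset to the epoch to obtain timezone-aware datetimes.
decoded = [epoch + timedelta(minutes=m) for m in raw_minutes]
for t in decoded:
    print(t.isoformat())
```

Both casts fall in November 2008, about two days apart.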


In [94]:
# In this example we have 2 profiles and 11 vertical levels.
# Stack each data variable into an array of shape (profile, z), i.e. (2, 11).
# np.vstack is used because np.row_stack is a deprecated alias of it in recent NumPy.
temperature = np.vstack((ds_prony.temperature.data.tolist(), ds_teremba.temperature.data.tolist()))
conductivity = np.vstack((ds_prony.conductivity.data.tolist(), ds_teremba.conductivity.data.tolist()))
salinity = np.vstack((ds_prony.salinity.data.tolist(), ds_teremba.salinity.data.tolist()))
fluorescence = np.vstack((ds_prony.fluorescence.data.tolist(), ds_teremba.fluorescence.data.tolist()))
irradiance = np.vstack((ds_prony.irradiance.data.tolist(), ds_teremba.irradiance.data.tolist()))
density = np.vstack((ds_prony.density.data.tolist(), ds_teremba.density.data.tolist()))
turbidity = np.vstack((ds_prony.turbidity.data.tolist(), ds_teremba.turbidity.data.tolist()))

When a dataset has many data variables, accessing the arrays one by one gets tedious.
**Looping over the variable names is an alternative.**

In [109]:
# Alternative: collect the variable names once, then loop over them to reshape all arrays.
coord_names = list(ds_prony.coords)[:-1] # drop "depth", which is listed last
data_variable_names = list(ds_prony.keys())[1:] # drop "stationname", which is listed first

print(coord_names)
print(data_variable_names)

['time', 'latitude', 'longitude']
['temperature', 'conductivity', 'salinity', 'fluorescence', 'irradiance', 'density', 'turbidity']


In [110]:
# Create a dictionary containing the coordinates for the new dataset
dict_coords = {}
for i in coord_names:
    dict_coords[i] = [ds_prony[i].data.tolist(), ds_teremba[i].data.tolist()]

dict_coords

{'time': [20439933, 20442692],
 'latitude': [-22.399999618530273, -21.850000381469727],
 'longitude': [166.8699951171875, 165.72999572753906]}

In [113]:
# Create another dictionary containing the data variables for the new dataset
dict_datavar = {}
for i in data_variable_names:
    dict_datavar[i] = np.vstack((ds_prony[i].data.tolist(), ds_teremba[i].data.tolist()))

dict_datavar

{'temperature': array([[24.46899986, 24.31800079, 24.35400009, 24.34300041, 24.27099991,
         24.25600052, 24.24399948, 24.23600006, 24.22999954, 24.23999977,
         24.3239994 ],
        [25.02899933, 24.91600037, 24.84300041, 24.7840004 , 24.76199913,
         24.75099945, 24.70299911, 24.69300079, 24.67700005, 24.66900063,
         24.66500092]]),
 'conductivity': array([[32.59090042, 52.65810013, 52.72539902, 52.71509933, 52.65539932,
         52.64300156, 52.64400101, 52.64849854, 52.71120071, 52.72040176,
         52.76760101],
        [21.35740089, 34.02090073, 25.94919968, 40.42350006, 47.11520004,
         46.70410156, 53.00569916, 53.01900101, 53.10960007, 53.08599854,
         53.08349991]]),
 'salinity': array([[20.58300018, 35.22499847, 35.24599838, 35.2480011 , 35.25899887,
         35.26200104, 35.27199936, 35.28099823, 35.33399963, 35.33300018,
         35.30199814],
        [12.79100037, 21.37000084, 15.89099979, 25.94199944, 30.78700066,
         30.49399948, 35

In [114]:
# Get Attributes
attrs_coords = {}
for i in coord_names:
    attrs_coords[i] = ds_prony[i].attrs

attrs_dataVar = {}
for i in data_variable_names:
    attrs_dataVar[i] = ds_prony[i].attrs

attrs_depth = ds_prony.depth.attrs
attrs_station_name = ds_prony.stationname.attrs

attrs_global = ds_prony.attrs

In [116]:
dict_data_vars = {}
for i in data_variable_names:
    dict_data_vars[i] = (["station","depth"], dict_datavar[i], attrs_dataVar[i])

ds_profile = xr.Dataset(
    coords={
        "depth": (["depth"], np.float32(depth), attrs_depth),
        "station": (["station"], station_name, attrs_station_name),
        "time": (["station"], np.int32(dict_coords['time']), attrs_coords['time']),
        "lat": (["station"], np.float32(dict_coords['latitude']), attrs_coords['latitude']),
        "lon": (["station"], np.float32(dict_coords['longitude']), attrs_coords['longitude'])
    },
    data_vars = dict_data_vars,
)

In [119]:
# Add Global Attributes
ds_profile.attrs = attrs_global

# Add Feature Type 
ds_profile.attrs["featureType"] = "profile"
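
Two pieces of metadata tell CF-aware readers how to interpret a profile file: the global `featureType` attribute and a variable carrying `cf_role = profile_id`. The sketch below checks both on plain dictionaries. `check_profile_metadata` is a hypothetical helper and by no means a full compliance checker; the example attribute dictionaries are made up for illustration.

```python
def check_profile_metadata(global_attrs, variable_attrs):
    """Return a list of problems; an empty list means both checks passed.

    global_attrs: dict of global attributes.
    variable_attrs: dict mapping variable name -> dict of its attributes.
    Hypothetical helper covering only two checks for profile data.
    """
    problems = []
    # The featureType value is compared case-insensitively.
    if str(global_attrs.get("featureType", "")).lower() != "profile":
        problems.append("global attribute featureType is not 'profile'")
    id_vars = [
        name
        for name, attrs in variable_attrs.items()
        if attrs.get("cf_role") == "profile_id"
    ]
    if len(id_vars) != 1:
        problems.append(
            f"expected one variable with cf_role=profile_id, found {len(id_vars)}"
        )
    return problems


# Made-up attribute dictionaries for illustration.
example_global = {"featureType": "profile"}
example_vars = {
    "station": {"cf_role": "profile_id"},
    "temperature": {"standard_name": "sea_water_temperature"},
}
print(check_profile_metadata(example_global, example_vars))
print(check_profile_metadata({}, {"temperature": {}}))
```

For an xarray dataset, the inputs could be built as `ds.attrs` and `{name: ds[name].attrs for name in ds.variables}`.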

In [120]:
ds_profile.info()

xarray.Dataset {
dimensions:
	station = 2 ;
	depth = 11 ;

variables:
	float64 temperature(station, depth) ;
		temperature:standard_name = sea_water_temperature ;
		temperature:units = Celsius ;
		temperature:long_name = Temperature ;
		temperature:source = Seabird CTD ;
		temperature:coverage_content_type = physicalMeasurement ;
	float64 conductivity(station, depth) ;
		conductivity:standard_name = sea_water_electrical_conductivity ;
		conductivity:long_name = conductivity ;
		conductivity:units = S.m^-1 ;
		conductivity:coverage_content_type = physicalMeasurement ;
		conductivity:source = Seabird CTD ;
	float64 salinity(station, depth) ;
		salinity:standard_name = sea_water_salinity ;
		salinity:long_name = salinity ;
		salinity:station = CLAPPP ;
		salinity:units = 1e-3 ;
		salinity:coverage_content_type = physicalMeasurement ;
	float64 fluorescence(station, depth) ;
		fluorescence:standard_name = mass_concentration_of_chlorophyll_a_in_sea_water ;
		fluorescence:long_name = fluoresc

## Time Series of Profile(s)

**time series (station) of profile(s)**: Time series of profiles at fixed locations. A file can contain many stations and many time series at each station. *Examples: profilers, balloon soundings*.
    
* [Only one station in the file](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_time_series_of_profiles_at_a_single_station)
* [Each station has the same number of profiles, and the same number of vertical levels for each profile](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_multidimensional_array_representations_of_time_series_profiles)
* [Each station has a different number of profiles AND/OR the level coordinates for each station may vary](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_ragged_array_representation_of_time_series_profiles)


Example Dataset: [Newport Lab CTD Casts, 1997-2008](https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdNewportCtd.html)

In [None]:
file_tsProfile = glob(os.path.join(os.getcwd(), "dsg_tsProfile", "*.nc"))
file_tsProfile

['/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_tsProfile/061207NH01.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_tsProfile/061207NH03.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_tsProfile/061298NH01.nc']

In [None]:
ds_ctd1 = xr.open_dataset(file_tsProfile[2])
ds_ctd1

In [None]:
ds_ctd2 = xr.open_dataset(file_tsProfile[0])
ds_ctd2

In [None]:
ds_ctd3 = xr.open_dataset(file_tsProfile[1])
ds_ctd3

In [None]:
print(ds_ctd1.id, ds_ctd1.time.data[0], ds_ctd1.station.data[0], ds_ctd1.latitude.data[0], ds_ctd1.longitude.data[0])
print(ds_ctd2.id, ds_ctd2.time.data[0], ds_ctd2.station.data[0], ds_ctd2.latitude.data[0], ds_ctd2.longitude.data[0])
print(ds_ctd3.id, ds_ctd3.time.data[0], ds_ctd3.station.data[0], ds_ctd3.latitude.data[0], ds_ctd3.longitude.data[0])

061298NH01 2007-06-13T04:32:00.000000000 b'NH25' 44.65169887200318 -124.64999685107614
061207NH01 2000-06-12T18:17:00.000000000 b'NH45' 44.65169887200318 -125.1166968392863
061207NH03 2007-06-13T00:26:00.000000000 b'NH01' 44.65169887200318 -124.09999686497031


In [None]:
# Each station has a different number of profiles AND the level coordinates for each station vary
print(ds_ctd1.depth_or_pressure.data[-5:-1])
print(ds_ctd2.depth_or_pressure.data[-5:-1])
print(ds_ctd3.depth_or_pressure.data[-5:-1])

[261.99999338 262.99999336 263.99999333 264.99999331]
[513.99998702 514.99998699 515.99998696 516.99998694]
[14.99999962 15.9999996  16.99999957 17.99999955]
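
Since the casts differ in length, a ragged array representation fits this case. In the CF ragged layout for time series of profiles, each profile records which station it belongs to (`station_index`, with an `instance_dimension` attribute) and how many observations it holds (`row_size`, with a `sample_dimension` attribute), while all observations are concatenated along a single `obs` dimension. The sketch below builds these index arrays with NumPy; the cast data and the `profile_slice` helper are made up for illustration and do not come from the Newport files.

```python
import numpy as np

# Made-up casts: (station name, depth levels), with unequal lengths.
casts = [
    ("NH25", np.array([1.0, 2.0, 3.0])),
    ("NH45", np.array([1.0, 2.0, 3.0, 4.0, 5.0])),
    ("NH01", np.array([1.0, 2.0])),
    ("NH25", np.array([1.0, 2.0, 3.0, 4.0])),
]

# station dimension: one entry per distinct station.
stations = sorted({name for name, _ in casts})
# profile dimension: station membership and observation count per profile.
station_index = np.array([stations.index(name) for name, _ in casts])
row_size = np.array([depth.size for _, depth in casts])
# obs dimension: all observations stored back to back, no padding.
depth_obs = np.concatenate([depth for _, depth in casts])


def profile_slice(i):
    """Return the slice of the obs dimension that belongs to profile i."""
    start = int(row_size[:i].sum())
    return slice(start, start + int(row_size[i]))


print(stations, station_index, row_size)
print(depth_obs[profile_slice(3)])
```

Reading one profile only needs a cumulative sum over `row_size`, which is why the contiguous layout suits data that is complete before it is written.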
