## Time Series

Data are recorded at named locations called *stations*. There can be many stations, and usually each station has multiple observations at different time coordinates. Each station has a unique identifier. *Examples: weather station data, fixed buoys*.
* Global attribute `featureType = timeSeries`.
* The altitude coordinate is optional.
* Special station variables are recognized by standard names as given below. For backwards compatibility, the given aliases are allowed.

    |**standard_name**|**alias**|
    |-----------------|---------|
    |`timeseries_id`|`station_id`|
    |`platform_name`|`station_description`|
    |`surface_altitude`|`station_altitude`|
    |`platform_id`|`station_WMO_id`|
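
As a minimal sketch of these conventions, the following builds a tiny single-station dataset with xarray. All values and the names `ds_single`, `station_A`, and `Example Reef` are hypothetical. Note that CF spells the global attribute `featureType`, and that CF 1.6 and later mark the station identifier with the attribute `cf_role = "timeseries_id"`, which is what the sketch assumes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical time axis and values, for illustration only
time = pd.date_range("2020-01-01", periods=4, freq="D")

ds_single = xr.Dataset(
    data_vars={
        "temperature": (
            ["time"],
            np.array([16.66, 16.5, 16.5, 16.4], dtype="float32"),
            {"standard_name": "sea_water_temperature", "units": "degree_C"},
        ),
    },
    coords={
        "time": ("time", time, {"standard_name": "time", "axis": "T"}),
        # Scalar (zero-dimensional) coordinates describe the single station
        "lat": ([], np.float32(34.0),
                {"standard_name": "latitude", "units": "degrees_north"}),
        "lon": ([], np.float32(-119.38333),
                {"standard_name": "longitude", "units": "degrees_east"}),
        "station_id": ([], "station_A", {"cf_role": "timeseries_id"}),
        "station_name": ([], "Example Reef", {"standard_name": "platform_name"}),
    },
    attrs={"featureType": "timeSeries"},
)

print(ds_single)
```

The altitude (or depth) coordinate is left out here because it is optional.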

There are different ways of structuring time series data:

* [Single Timeseries](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_single_time_series_including_deviations_from_a_nominal_fixed_spatial_location): Only one station in the file

* [Orthogonal multidimensional array representation of time series](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_orthogonal_multidimensional_array_representation_of_time_series): Multiple stations share the same time coordinates, e.g. multiple weather stations measure at the same times, or the measurements are averaged onto common times. This structure is still fine if some stations have missing values at some time stamps; if too many values are missing, consider the next structure instead (a trade-off decision needs to be made).

* [Incomplete multidimensional array representation of time series](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_incomplete_multidimensional_array_representation_of_time_series): Multiple stations, each with its own time coordinates. The series of each existing station is complete, but further stations can be added.

* [Contiguous ragged array representation of time series](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_contiguous_ragged_array_representation_of_time_series): Multiple stations with different numbers of observations. All observations of one station are stored contiguously along a single observation dimension, and a count variable records how many observations belong to each station. This suits datasets that are complete when they are written.

* [Indexed ragged array representation of time series](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_indexed_ragged_array_representation_of_time_series): A fixed set of stations whose measurements are still ongoing, so the dataset is updated as new measurements arrive. Because we cannot foresee which station the next measurement will come from, each observation stores the index of its station.
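
As a rough illustration of how these layouts differ in storage, the sketch below arranges the same toy observations (hypothetical values) in three of them using plain NumPy. It follows the CF definitions of the layouts. The name `row_size` mirrors the count variable in the CF examples, while `station_index` and the other names are our own.

```python
import numpy as np

# Hypothetical toy data: station 0 has 3 observations, station 1 has 2
obs_per_station = [
    np.array([10.0, 11.0, 12.0]),
    np.array([20.0, 21.0]),
]

# Incomplete multidimensional array: one row per station, padded with NaN
n_station = len(obs_per_station)
max_obs = max(len(obs) for obs in obs_per_station)
padded = np.full((n_station, max_obs), np.nan)
for i, obs in enumerate(obs_per_station):
    padded[i, : len(obs)] = obs

# Contiguous ragged array: observations of each station stored back to back,
# plus a count variable with the number of observations per station
ragged_values = np.concatenate(obs_per_station)
row_size = np.array([len(obs) for obs in obs_per_station])

# Recover station 1 from the contiguous layout via cumulative offsets
start = np.concatenate([[0], np.cumsum(row_size)[:-1]])
station1_contiguous = ragged_values[start[1] : start[1] + row_size[1]]

# Indexed ragged array: observations in arrival order, each one tagged
# with the index of the station it belongs to
indexed_values = np.array([10.0, 20.0, 11.0, 21.0, 12.0])
station_index = np.array([0, 1, 0, 1, 0])
station1_indexed = indexed_values[station_index == 1]

print(padded)
print(station1_contiguous, station1_indexed)
```

The padded array wastes one cell here, while both ragged layouts store exactly five values plus a small bookkeeping array.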


Show a single time series example!
(optionally converting it to CF-recommended format too!)

Merging everything is not always the best solution. When things get too complicated, it is better to store the time series separately, just as the source data provider did. Merging everything together may also increase the file size unnecessarily, because the merged arrays can contain many missing values.
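
To make the missing-value cost concrete, here is a small sketch with two hypothetical stations whose time stamps do not overlap. It stacks them along a new `station` dimension with `xr.concat` and an explicit outer join on time. The station labels `A` and `B` and all values are made up.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two hypothetical stations sampled at completely different times
time_a = pd.date_range("2020-01-01", periods=3, freq="D")
time_b = pd.date_range("2020-02-01", periods=2, freq="D")

ds_a = xr.Dataset(
    {"temperature": ("time", np.array([16.6, 16.5, 16.5]))},
    coords={"time": time_a},
)
ds_b = xr.Dataset(
    {"temperature": ("time", np.array([18.2, 18.2]))},
    coords={"time": time_b},
)

# Stack along a new "station" dimension; the outer join builds the union
# of both time axes and fills the gaps with NaN
stations = pd.Index(["A", "B"], name="station")
combined = xr.concat([ds_a, ds_b], dim=stations, join="outer")

n_cells = combined["temperature"].size
n_missing = int(combined["temperature"].isnull().sum())
missing_fraction = n_missing / n_cells

print(dict(combined.sizes))
print(f"{n_missing} of {n_cells} cells are missing")
```

With fully disjoint time axes, half of the combined array consists of fill values, and the share grows with every additional station.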

If you do need to merge data, the example here shows how to do it.

(Example of merged time series!)

In [1]:
import os
from glob import glob
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt

In [2]:
os.chdir('/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data')
os.getcwd()

'/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data'

Example Dataset: [Kelp Forest Monitoring Sea Temperature](https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdCinpKfmT.html)

In [3]:
ts_files = sorted(glob(os.path.join(os.getcwd(), "dsg_timeSeries", "*.nc")))
ts_files

['/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_timeSeries/KFMTemperature_Anacapa_Black_Sea_Bass_Reef.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_timeSeries/KFMTemperature_Anacapa_Cathedral_Cove.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_timeSeries/KFMTemperature_Anacapa_East_Fish_Camp.nc',
 '/Users/icdc/Documents/NFDI/Kemeng/cfbook/src/data/dsg_timeSeries/KFMTemperature_San_Clemente_Eel_Point.nc']

In [98]:
ds_anacapa_bassreef = xr.open_dataset(ts_files[0], decode_times=False)

print(ds_anacapa_bassreef["ID"].data)
ds_anacapa_bassreef

b'Anacapa (Black Sea Bass Reef)'


In [99]:
# Original shape of the "Temperature" data
print(ds_anacapa_bassreef.Temperature.shape)

# Length of the original "Temperature" data array
print(len(ds_anacapa_bassreef.Temperature))

# Reshape it as a one-dimensional array, using numpy.ndarray.reshape
temp = ds_anacapa_bassreef.Temperature.data.reshape([len(ds_anacapa_bassreef.Temperature)])
temp

(16065, 1, 1, 1)
16065


array([16.66, 16.5 , 16.5 , ..., 18.21, 18.21, 18.21], dtype=float32)
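
As an aside, the same flattening can be written without spelling out the length. The sketch below uses a hypothetical stand-in array with the same kind of shape. The dimension order `(TIME, DEPTH, LAT, LON)` is an assumption, since the cell above only shows the shape.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the 4-D temperature variable (5 time steps)
da = xr.DataArray(
    np.arange(5, dtype="float32").reshape(5, 1, 1, 1),
    dims=["TIME", "DEPTH", "LAT", "LON"],
)

# -1 lets NumPy infer the length of the flattened array
flat_reshape = da.data.reshape(-1)

# squeeze() drops every dimension of length 1 and keeps the dimension name
squeezed = da.squeeze()
flat_squeeze = squeezed.data

print(flat_reshape.shape, squeezed.dims)
```

Both variants return the same one-dimensional values; `squeeze()` has the advantage that the result is still a labelled `DataArray`.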

In [100]:
# Get the data values of latitude, longitude and depth
lat = ds_anacapa_bassreef.LAT.data[0]
lon = ds_anacapa_bassreef.LON.data[0]
depth = ds_anacapa_bassreef.DEPTH.data[0]

print(lat, lon, depth)

34.0 -119.38333 17


In [101]:
# Get Variable Attributes
attr_lat = ds_anacapa_bassreef.LAT.attrs
attr_lon = ds_anacapa_bassreef.LON.attrs
attr_depth = ds_anacapa_bassreef.DEPTH.attrs
attr_temp = ds_anacapa_bassreef.Temperature.attrs

attr_lat

{'_CoordinateAxisType': 'Lat',
 'actual_range': array([34., 34.], dtype=float32),
 'axis': 'Y',
 'long_name': 'Latitude',
 'standard_name': 'latitude',
 'units': 'degrees_north'}

In [102]:
# Drop unneeded dimensions, rename dimension and coordinate variable of time
ds_anacapa_bassreef = ds_anacapa_bassreef.drop_dims(["LON", "LAT", "DEPTH"]).rename({"TIME":"time"})
ds_anacapa_bassreef

```{note}
`xarray.Dataset.rename()` will change the name of both the dimension and the coordinate variable, while `xarray.Dataset.rename_dims()` only changes the name of the dimension and leaves the coordinate variable name unchanged.
```
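
The difference is easy to check on a tiny dataset. This is a minimal sketch with hypothetical values; only the names `TIME` and `time` are taken from the example above.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"Temperature": ("TIME", np.array([16.66, 16.5, 16.5], dtype="float32"))},
    coords={"TIME": ("TIME", np.array([0.0, 60.0, 120.0]))},
)

# rename(): dimension and coordinate variable are both renamed
renamed = ds.rename({"TIME": "time"})

# rename_dims(): only the dimension is renamed, the coordinate keeps its name
dims_only = ds.rename_dims({"TIME": "time"})

print(list(renamed.sizes), list(renamed.coords))
print(list(dims_only.sizes), list(dims_only.coords))
```

After `rename_dims()`, the coordinate variable is still called `TIME` but now lies along the dimension `time`.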

In [103]:
# Recompose the Dataset
ds_anacapa_bassreef = ds_anacapa_bassreef.assign_coords(
    lat=([], np.float32(lat), attr_lat),
    lon=([], np.float32(lon), attr_lon),
    depth=([], np.int32(depth), attr_depth),
)

ds_anacapa_bassreef = ds_anacapa_bassreef.assign(temperature = (['time'], np.float32(temp), attr_temp))

ds_anacapa_bassreef

```{note}
`xarray.Dataset.assign_coords` adds new coordinate variables to the dataset, and `xarray.Dataset.assign` adds new data variables. In this example, latitude, longitude, and depth act as auxiliary coordinate variables, so they are added as coordinates.
```
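
A small sketch of where each method puts a variable, using hypothetical values and attributes:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={"time": ("time", np.array([0.0, 60.0, 120.0]))})

# assign_coords(): the new variable becomes a (scalar) coordinate
ds = ds.assign_coords(lat=([], np.float32(34.0), {"units": "degrees_north"}))

# assign(): the new variable becomes a data variable
ds = ds.assign(
    temperature=(
        ["time"],
        np.array([16.66, 16.5, 16.5], dtype="float32"),
        {"units": "degree_C"},  # hypothetical attribute
    )
)

print(list(ds.coords))
print(list(ds.data_vars))
# Scalar coordinates travel along with every data variable
print(list(ds["temperature"].coords))
```

Because `lat` is a coordinate, it stays attached when `temperature` is selected on its own, which would not happen if it had been added with `assign`.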

In [105]:
ds_anacapa_bassreef.info()

xarray.Dataset {
dimensions:
	time = 16065 ;

variables:
	float64 time(time) ;
		time:_CoordinateAxisType = Time ;
		time:actual_range = [1.12982952e+09 1.18765692e+09] ;
		time:axis = T ;
		time:long_name = Time ;
		time:standard_name = time ;
		time:time_origin = 01-JAN-1970 00:00:00 ;
		time:units = seconds since 1970-01-01T00:00:00Z ;
	|S29 ID() ;
		ID:long_name = Station Identifier ;
		ID:units = unitless ;
	float32 lat() ;
		lat:_CoordinateAxisType = Lat ;
		lat:actual_range = [34. 34.] ;
		lat:axis = Y ;
		lat:long_name = Latitude ;
		lat:standard_name = latitude ;
		lat:units = degrees_north ;
	float32 lon() ;
		lon:_CoordinateAxisType = Lon ;
		lon:actual_range = [-119.38333 -119.38333] ;
		lon:axis = X ;
		lon:long_name = Longitude ;
		lon:standard_name = longitude ;
		lon:units = degrees_east ;
	int32 depth() ;
		depth:_CoordinateAxisType = Height ;
		depth:_CoordinateZisPositive = down ;
		depth:actual_range = [17 17] ;
		depth:axis = Z ;
		depth:long_name = Depth ;
		depth

In [49]:
ds_test.info()

xarray.Dataset {
dimensions:
	time = 16065 ;

variables:
	datetime64[ns] time(time) ;
		time:_CoordinateAxisType = Time ;
		time:actual_range = [1.12982952e+09 1.18765692e+09] ;
		time:axis = T ;
		time:long_name = Time ;
		time:standard_name = time ;
		time:time_origin = 01-JAN-1970 00:00:00 ;
	|S29 ID() ;
		ID:long_name = Station Identifier ;
		ID:units = unitless ;
	float32 lat() ;
	float32 lon() ;
	int32 depth() ;
	float32 temperature(time) ;

// global attributes:
	:acknowledgement = NOAA NESDIS COASTWATCH, NOAA SWFSC ERD ;
	:cdm_data_type = Station ;
	:contributor_name = Channel Islands National Park, National Park Service ;
	:contributor_role = Source of data. ;
	:Conventions = COARDS, CF-1.0, Unidata Observation Dataset v1.0 ;
	:creator_email = Roy.Mendelssohn@noaa.gov ;
	:creator_name = NOAA NMFS SWFSC ERD ;
	:creator_url = http://www.pfel.noaa.gov ;
	:date_created = 2008-06-11T21:42:43Z ;
	:date_issued = 2008-06-11T21:42:43Z ;
	:Easternmost_Easting = -119.38333129882812 ;
	:g
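
The two listings above differ in the `time` variable: `float64` with a `units` attribute in the first, `datetime64[ns]` without it in the second. That is the effect of xarray's CF time decoding, which was switched off by `decode_times=False` when the first dataset was opened. A minimal sketch of the decoding step on an in-memory dataset follows. The first time value is the start of `actual_range` above, the second is hypothetical, and the units string is a simplified form of the one in the file.

```python
import numpy as np
import xarray as xr

units = "seconds since 1970-01-01 00:00:00"  # simplified units string

raw = xr.Dataset(
    coords={
        "time": (
            "time",
            np.array([1129829520.0, 1129833120.0]),  # second value is made up
            {"standard_name": "time", "units": units},
        )
    }
)

# decode_cf() interprets the numbers using the units attribute
decoded = xr.decode_cf(raw)

print(raw["time"].dtype, "->", decoded["time"].dtype)
# The units are no longer an attribute; xarray keeps them in .encoding
print("units" in decoded["time"].attrs, decoded["time"].encoding["units"])
```

The units move to `.encoding` so that xarray can write the time axis back in the same form when the dataset is saved.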

In [34]:
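# ds_ts1 is the first file (Black Sea Bass Reef); the cells below rely on it
ds_ts1 = xr.open_dataset(ts_files[0])
ds_ts2 = xr.open_dataset(ts_files[1])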

print(ds_ts2['ID'].data)
ds_ts2

b'Anacapa (Cathedral Cove)'


In [35]:
da_ts1 = ds_ts1.to_array()
#da_ts1[0]

da_ts2 = ds_ts2.to_array()
#da_ts2[0]

In [38]:
ds_ts3 = xr.open_dataset(ts_files[-1])

print(ds_ts3['ID'].data)
ds_ts3

b'San Clemente (Eel Point)'


In [39]:
#ds_merged = xr.combine_by_coords([ds_ts1, ds_ts2])
ds_combined_one_loc = ds_ts1.merge(ds_ts2, compat='override')
ds_combined_one_loc

In [41]:
ds_combined_two_loc = ds_ts1.merge(ds_ts3, compat='override')
ds_combined_two_loc

In [43]:
ds_combined_two_loc.ID.data

array(b'Anacapa (Black Sea Bass Reef)', dtype='|S29')
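
Only the first station's identifier survives because `compat='override'` tells xarray to skip the comparison and take each conflicting variable from the first dataset. The sketch below reproduces this on two tiny hypothetical stations. `make_station` is a helper defined only for this illustration, and the join is written out explicitly as `join="outer"`.

```python
import numpy as np
import xarray as xr


def make_station(station_id, times, temps):
    """Build a tiny single-station dataset (hypothetical helper)."""
    return xr.Dataset(
        {
            "temperature": ("time", np.asarray(temps, dtype="float32")),
            "ID": ([], station_id),
        },
        coords={"time": ("time", np.asarray(times, dtype="float64"))},
    )


ds_a = make_station("A", [0.0, 60.0, 120.0], [16.66, 16.5, 16.5])
ds_b = make_station("B", [300.0, 360.0], [18.21, 18.21])

merged = ds_a.merge(ds_b, compat="override", join="outer")

# The time axis is the union of both stations: 3 + 2 = 5 steps
print(merged.sizes["time"])
# Conflicting variables are taken from the first dataset only
print(merged["ID"].item())
# Only the 3 temperatures of station "A" are present; the rest is NaN
print(int(merged["temperature"].notnull().sum()))
```

In this sketch the second station contributes its time stamps but none of its values, so a merge of this kind should be checked carefully before the result is used.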

In [45]:
print(16065+113457)
print(16065+9385)

129522
25450
