# Uploading Eddy TK3 data

Loading Fendt TK2 output eddy data into metacatalog.
Compliance with the NetCDF standard is attempted by loading additional information into the details of each entry (see *data/TK3_to_NetCDF*).  

#### Note:
The directory data/Fendt/ is not included on GitHub. To get the data, contact Mirko Mälicke, who stores it on his Dropbox (as of Feb. 2023).

In [1]:
UPLOAD = True
#CONNECTION = 'test_iso'
CONNECTION = 'postgresql://postgres:hiwiwork76@localhost:5432/metacatalog'

In [2]:
import pandas as pd
import numpy as np

from metacatalog import api, ext

In [3]:
session = api.connect_database(CONNECTION)
print('Using: %s' % session.bind)

Using: Engine(postgresql://postgres:***@localhost:5432/metacatalog)


In [4]:
# check if the IO extension is activated
try:
    print(ext.extension('io'))
except AttributeError:
    ext.activate_extension('io', 'metacatalog.ext.io', 'IOExtension')
    from metacatalog.ext.io import IOExtension
    ext.extension('io', IOExtension)

AttributeError: module 'metacatalog.ext' has no attribute 'activate_extension'

### Read data

Fendt Eddy data is available for the years 2014 - 2018.  
Each yearly dataset ends at **T_end = '31.12.201X 23:30'** and the following one starts at **T_end = '01.01.201X 00:30'**, so we fill the missing timestamp **T_end = '01.01.201X 00:00'** with NaN values to keep a regular 30-minute time step.  
The same gap has to be filled in year 2018 at **T_end = '01.03.2018 00:00:00'**.
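
The gap filling described above can also be illustrated on a toy series. The following is only a minimal sketch with made-up values, not the approach used in the cell below: it assumes a single value column `u` and uses `reindex` against a complete 30-minute index, so every missing step becomes a NaN row.

```python
import pandas as pd

# Toy half-hourly series with the 00:00 timestamp missing at the year boundary
# (values are made up for illustration only).
toy = pd.DataFrame(
    {
        "T_end": ["31.12.2014 23:00", "31.12.2014 23:30", "01.01.2015 00:30"],
        "u": [1.2, 1.4, 0.9],
    }
)
toy["T_end"] = pd.to_datetime(toy["T_end"], format="%d.%m.%Y %H:%M")
toy = toy.set_index("T_end")

# Build the complete 30-minute index and reindex against it;
# timestamps that are absent in the data are inserted as NaN rows.
full_index = pd.date_range(toy.index.min(), toy.index.max(), freq="30min")
filled = toy.reindex(full_index)
filled.index.name = "T_end"

print(filled)
```

Reindexing fills every gap in one step, whereas the cell below inserts each missing row explicitly, which also keeps the other timestamp columns of the raw files populated.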

In [5]:
# read header file
header = pd.read_csv('data/Fendt/Fendt_TK2_result_header.csv')
df_colnames = list(header.columns)

# read data files, insert rows between yearly datasets, concat and drop the last column (the CSV lines end with a trailing comma)
dat2014 = pd.read_csv('data/Fendt/Fendt_TK2_result_2014.csv', header=None)
append2014 = dat2014.tail(1).copy()
append2014[[0,1,47]] = ['31.12.2014 23:30', '01.01.2015 00:00', '31.12.2014 23:45']
# set all value columns to NaN; a plain append2014[2:46] would slice rows, not columns
append2014.loc[:, 2:46] = np.nan
append2014.loc[:, 48:] = np.nan
dat2014 = pd.concat([dat2014, append2014], axis=0, ignore_index=True)

dat2015 = pd.read_csv('data/Fendt/Fendt_TK2_result_2015.csv', header=None)
append2015 = append2014.copy()
append2015[[0,1,47]] = ['31.12.2015 23:30', '01.01.2016 00:00', '31.12.2015 23:45']
dat2015 = pd.concat([dat2015, append2015], axis=0, ignore_index=True)

dat2016 = pd.read_csv('data/Fendt/Fendt_TK2_result_2016.csv', header=None)
append2016 = append2014.copy()
append2016[[0,1,47]] = ['31.12.2016 23:30', '01.01.2017 00:00', '31.12.2016 23:45']
dat2016 = pd.concat([dat2016, append2016], axis=0, ignore_index=True)

dat2017 = pd.read_csv('data/Fendt/Fendt_TK2_result_2017.csv', header=None)
append2017 = append2014.copy()
append2017[[0,1,47]] = ['31.12.2017 23:30', '01.01.2018 00:00', '31.12.2017 23:45']
dat2017 = pd.concat([dat2017, append2017], axis=0, ignore_index=True)

dat2018 = pd.read_csv('data/Fendt/Fendt_TK2_result_2018.csv', header=None)
insert2018 = append2014.copy()
insert2018[[0,1,47]] = ['28.02.2018 23:30', '01.03.2018 00:00', '28.02.2018 23:45']
insert2018.index = [2832]
dat2018 = pd.concat([dat2018.iloc[:2831], insert2018, dat2018.iloc[2831:]], axis=0).reset_index(drop=True)

dat2018 = dat2018.sort_index().reset_index(drop=True)


dat = pd.concat([dat2014, dat2015, dat2016, dat2017, dat2018], axis=0, ignore_index=True)

dat.drop(dat.columns[len(dat.columns)-1], axis=1, inplace=True)

# use the column names from the header file for the data
dat.columns = df_colnames

dat.head()

FileNotFoundError: [Errno 2] No such file or directory: 'data/Fendt/Fendt_TK2_result_header.csv'

### Create a dummy Person entry who acts as the owner of the Eddy data

In [6]:
author = api.find_person(session, organisation_abbrev='KIT', last_name='Mauder', return_iterator=True).first()

if author is None and UPLOAD:
    author = api.add_person(session, first_name='Matthias', last_name='Mauder',
                            organisation_name='Karlsruhe Institute of Technology (KIT)',
                            affiliation='Institute of Meteorology and Climate Research - Atmospheric Environmental Research (IMK-IFU), Campus Alpin',
                            organisation_abbrev='KIT'
                            #attribution=""
                           )

print(author)

Matthias Mauder <ID=1>


### Specify the location of the Eddy flux tower

source: https://www.icos-infrastruktur.de/en/icos-d/komponenten/oekosysteme/beobachtungsstandorte/fendt-c1/

In [7]:
location = (11.061000, 47.833000)

### Specify the license ID, which is used for each entry

In [8]:
license = api.find_license(session, id=6)[0]
# license = 6 # True value???

print(license.short_title)

CC BY 4.0


### Data column overview

In [9]:
df_TK = pd.read_excel('data/Fendt/Datenübersicht_Fendt_FastData.xlsx', sheet_name=3, usecols=[1,2,3], skiprows=4, header=None)
pd.set_option('display.max_rows', 61)
df_TK

FileNotFoundError: [Errno 2] No such file or directory: 'data/Fendt/Datenübersicht_Fendt_FastData.xlsx'

Find variables and units already available in metacatalog:

In [10]:
for var in api.find_variable(session):
    print(var)

air temperature [C] <ID=1>
soil temperature [C] <ID=2>
water temperature [C] <ID=3>
discharge [m3/s] <ID=4>
air pressure [10^2*Pa] <ID=5>
relative humidity [%] <ID=6>
daily rainfall sum [mm/d] <ID=7>
rainfall intensity [mm/h] <ID=8>
solar irradiance [W/m2] <ID=9>
net radiation [W/m2] <ID=10>
gravimetric water content [kg/kg] <ID=11>
volumetric water content [cm3/cm3] <ID=12>
precision [-] <ID=13>
sap flow [cm^3/cm^2h] <ID=14>
matric potential [MPa] <ID=15>
bulk electrical conductivity [mS/cm] <ID=16>
specific electrical conductivity [mS/cm] <ID=17>
river water level [m] <ID=18>
evapotranspiration [mm/d] <ID=19>
drainage [mm/d] <ID=20>


In [11]:
for unit in api.find_unit(session):
    print(unit)

second <ID=1>
meter <ID=2>
kilogram <ID=3>
ampere <ID=4>
kelvin <ID=5>
mole <ID=6>
candela <ID=7>
radian <ID=8>
degree <ID=9>
hertz <ID=10>
newton <ID=11>
pascal <ID=12>
joule <ID=13>
watt <ID=14>
coulomb <ID=15>
volt <ID=16>
farad <ID=17>
ohm <ID=18>
siemens <ID=19>
lux <ID=20>
relative <ID=21>
mass flux density per hour <ID=22>
hour <ID=23>
megapascal <ID=24>
millisiemens per centimeter <ID=25>
degree Celsius <ID=101>
milimeter <ID=102>
mm per day <ID=103>
hectopascal <ID=104>
mm per hour <ID=105>
mm per second <ID=106>
meter per second <ID=107>
cubicmeter per second <ID=108>
liter per second <ID=109>
degree <ID=110>
percent <ID=112>
cm3/cm3 <ID=113>
kg/kg <ID=114>
watt per sqauaremeter <ID=115>


### tstamp

We use T_end as a timestamp index.  
As we define the temporal scale, T_begin and T_mid can be calculated.
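
As a minimal sketch of that calculation, assuming the 30-minute resolution used throughout this notebook (the names `RESOLUTION`, `t_end`, `t_begin` and `t_mid` are illustrative only):

```python
import pandas as pd

# Temporal resolution of the Fendt data as used in this notebook.
RESOLUTION = pd.Timedelta("30min")

# Two example T_end timestamps in the format of the raw files.
t_end = pd.to_datetime(
    pd.Series(["01.01.2015 00:00", "01.01.2015 00:30"]),
    format="%d.%m.%Y %H:%M",
)

# T_begin is one full step before T_end, T_mid half a step before T_end.
t_begin = t_end - RESOLUTION
t_mid = t_end - RESOLUTION / 2

print(pd.DataFrame({"T_begin": t_begin, "T_mid": t_mid, "T_end": t_end}))
```

This matches the filler rows created above, where T_end = '01.01.2015 00:00' is paired with T_begin = '31.12.2014 23:30' and a mid timestamp of '31.12.2014 23:45'.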

### Standard workflow to create an entry and add data to it:
1. Variable (find / add)
2. Entry (find / add)
3. Details (NetCDF convention: standard name, long name, ... & TK3 explanation)
4. Data: selection & cleaning (values < -9999 -> NaN)
5. Data: upload (create_datasource, datasource.create_scale, import_data)
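
Step 4 is repeated for every variable below, so it could be wrapped in a small helper. The following is only a sketch: the function `select_and_clean` is hypothetical and not part of metacatalog, and the toy frame with its missing-value entry of -9999.9 is made up for illustration.

```python
import pandas as pd


def select_and_clean(raw, columns, names, threshold=-9999):
    """Select columns by position, index by parsed timestamp, mask missing values.

    `names` must start with 'tstamp', which labels the selected timestamp column.
    """
    out = raw.iloc[:, columns].copy()
    out.columns = names
    out["tstamp"] = pd.to_datetime(out["tstamp"], format="%d.%m.%Y %H:%M")
    out = out.set_index("tstamp")
    # values below the threshold are treated as missing and replaced with NaN
    return out.mask(out < threshold)


# Toy input mimicking the layout of the raw files (made-up values).
raw = pd.DataFrame(
    {
        "T_begin": ["01.01.2014 00:00", "01.01.2014 00:30"],
        "T_end": ["01.01.2014 00:30", "01.01.2014 01:00"],
        "u": [1.5, -9999.9],
    }
)

clean = select_and_clean(raw, columns=[1, 2], names=["tstamp", "u"])
print(clean)
```

The sections below keep the steps inline, which makes each cell self-contained at the cost of some repetition.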

## 1) Wind speed

#### 1.1) Variable (find / add)

In [12]:
variable = api.find_variable(session, name='3D-wind', return_iterator=True).first()

if variable is None and UPLOAD:
    variable = api.add_variable(session, name='3D-wind', symbol='uvw', unit=107, column_names=['u','v','w'])
    
print(variable)

3D-wind [m/s] <ID=10001>


#### 1.2) Entry (find / add)

In [13]:
entry = api.find_entry(session, title='Fendt dataset: 3-dimensional windspeed', return_iterator=True).first()

if entry is None and UPLOAD:
    entry = api.add_entry(session,
                          title='Fendt dataset: 3-dimensional windspeed',
                          abstract='3-dimensional windspeed data from the Fendt data set.',
                          location=location, 
                          variable=variable.id, 
                          comment='after double rotation',
                          license=license, 
                          author=author.id, 
                          embargo=True, 
                          is_partial=False)

print(entry)

<ID=1 Fendt dataset: 3-dim [3D-wind] >


#### 1.3) Details (NetCDF convention: standard name, long name, ... & TK3 explanation)

In [14]:
# extract TK3 explanation from df_TK
TK3_explanation = df_TK.iloc[[2,3,4],1].to_list()

if not entry.details and UPLOAD:
    details_dict = [
        {
        'key': 'standard_name',
        'value': ['x_wind', 'y_wind', 'upward_air_velocity'],
        'description': 'standard name according to CF Conventions'
        }, 
        {
        'key': 'long_name', 
        'value': ['longitudinal wind velocity', 'lateral wind velocity', 'vertical wind velocity'],
        'description': 'long name according to CF Conventions'
        },
        {
        'key': 'ancillary_variables', 
        'value': ['sigma2_u', 'sigma2_v', 'sigma2_w'],
        'description': 'ancillary variables in the Fendt dataset'
        },
        {
        'key': 'TK3_explanation', 
        'value': TK3_explanation,
        'description': 'TK3 variable explanation'
        }
    ]
    
    # add details to entry
    api.add_details_to_entries(session, entry, details_dict)

entry.details_dict()

NameError: name 'df_TK' is not defined

#### 1.4) Data: selection & cleaning (values < -9999 -> NaN)

In [15]:
# select data
data = dat.iloc[:, [1,2,3,4]].copy()

# data formatting
data = data.rename(columns={data.columns[0]: 'tstamp'})
data['tstamp'] = pd.to_datetime(data.loc[:,'tstamp'], format='%d.%m.%Y %H:%M')
data.set_index('tstamp', inplace=True)

# replace values < -9999 with NaN
data = data.mask(data < -9999)

data.head(2)

NameError: name 'dat' is not defined

#### 1.5) Data: upload (create_datasource, datasource.create_scale, import_data)

In [16]:
if UPLOAD and not entry.datasource:
    entry.create_datasource(type=1, path='timeseries', datatype='timeseries')
    
    entry.datasource.create_scale(
        resolution='30min', 
        extent=(data.index[0], data.index[-1]), 
        support=1.0, # not sure 
        scale_dimension='temporal'
    )
    
    entry.import_data(data) 

NameError: name 'data' is not defined

In [17]:
edat = entry.get_data()
edat.head(2)

ProgrammingError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) FEHLER:  Spalte »variable_names« von Relation »datasources« existiert nicht
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405)

## 2) Wind direction

#### 2.1) Variable (find / add)

In [18]:
variable = api.find_variable(session, name='wind direction', return_iterator=True).first()

if variable is None and UPLOAD:
    variable = api.add_variable(session, name='wind direction', symbol='dir', unit=9, column_names=['wind_direction'])
    
print(variable)

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) FEHLER:  Spalte »variable_names« von Relation »datasources« existiert nicht
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)

#### 2.2) Entry (find / add)

In [19]:
entry = api.find_entry(session, title='Fendt dataset: wind direction', return_iterator=True).first()

if entry is None and UPLOAD:
    entry = api.add_entry(session,
                          title='Fendt dataset: wind direction',
                          abstract='Wind direction data from the Fendt data set.',
                          location=location, 
                          variable=variable.id,
                          license=license, 
                          author=author.id, 
                          embargo=True, 
                          is_partial=False)

print(entry)

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) FEHLER:  Spalte »variable_names« von Relation »datasources« existiert nicht
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)

#### 2.3) Details (NetCDF convention: standard name, long name, ... & TK3 explanation)

In [20]:
# extract TK3 explanation from df_TK
TK3_explanation = df_TK.iloc[[35],1].to_list()

if not entry.details and UPLOAD:
    details_dict = [
        {
        'key': 'standard_name',
        'value': 'wind_from_direction',
        'description': 'standard name according to CF Conventions'
        }, 
        {
        'key': 'long_name', 
        'value': 'wind direction',
        'description': 'long name according to CF Conventions'
        },
        {
        'key': 'TK3_explanation', 
        'value': TK3_explanation,
        'description': 'TK3 variable explanation'
        }
    ]
    
    # add details to entry
    api.add_details_to_entries(session, entry, details_dict)

entry.details_dict()

NameError: name 'df_TK' is not defined

#### 2.4) Data: selection & cleaning (values < -9999 -> NaN)

In [21]:
# select data
data = dat.iloc[:, [1,35]].copy()

# data formatting
data = data.rename(columns={data.columns[0]: 'tstamp'})
data['tstamp'] = pd.to_datetime(data.loc[:,'tstamp'], format='%d.%m.%Y %H:%M')
data.set_index('tstamp', inplace=True)

# replace values < -9999 with NaN
data = data.mask(data < -9999)

data.head(2)

NameError: name 'dat' is not defined

In [22]:
data = dat.iloc[:, [1,35]].copy()

# data formatting
data = data.rename(columns={data.columns[0]: 'tstamp'})
data['tstamp'] = pd.to_datetime(data.loc[:,'tstamp'], format='%d.%m.%Y %H:%M')
data.set_index('tstamp', inplace=True)
data

NameError: name 'dat' is not defined

#### 2.5) Data: upload (create_datasource, datasource.create_scale, import_data)

In [23]:
if UPLOAD and not entry.datasource:
    entry.create_datasource(type=1, path='timeseries', datatype='timeseries')

    entry.datasource.create_scale(
        resolution='30min', 
        extent=(data.index[0], data.index[-1]), 
        support=1.0, # not sure 
        scale_dimension='temporal'
    )
    
    entry.import_data(data)

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) FEHLER:  Spalte »variable_names« von Relation »datasources« existiert nicht
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)

In [24]:
edat = entry.get_data()
edat.head(2)

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) FEHLER:  Spalte »variable_names« von Relation »datasources« existiert nicht
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)

## 3) Temperature

#### 3.1) Variable (find / add)

In [25]:
variable = api.find_variable(session, name='air temperature', return_iterator=True).first()

print(variable)

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) FEHLER:  Spalte »variable_names« von Relation »datasources« existiert nicht
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)

#### 3.2) Entry (find / add)

In [26]:
entry = api.find_entry(session, title='Fendt dataset: air temperature', return_iterator=True).first()

if entry is None and UPLOAD:
    entry = api.add_entry(session,
                          title='Fendt dataset: air temperature',
                          abstract='Air temperature data from the Fendt data set.',
                          location=location, 
                          variable=variable.id,
                          license=license, 
                          author=author.id, 
                          embargo=True, 
                          is_partial=False)

print(entry)

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) FEHLER:  Spalte »variable_names« von Relation »datasources« existiert nicht
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)

#### 3.3) Details (NetCDF convention: standard name, long name, ... & TK3 explanation)

In [27]:
# extract TK3 explanation from df_TK
TK3_explanation = df_TK.iloc[[5,6,9],1].to_list()

if not entry.details and UPLOAD:
    details_dict = [
        {
        'key': 'standard_name',
        'value': ['', '', 'air_temperature'],
        'description': 'standard name according to CF Conventions'
        }, 
        {
        'key': 'long_name', 
        'value': ['sonic temperature', '', 'reference air temperature'],
        'description': 'long name according to CF Conventions'
        },
        {
        'key': 'ancillary_variables', 
        'value': ['sigma2_Tsonic', '', ''],
        'description': 'ancillary variables in the Fendt dataset'
        },
        {
        'key': 'TK3_explanation', 
        'value': TK3_explanation,
        'description': 'TK3 variable explanation'
        }
    ]
    
    # add details to entry
    api.add_details_to_entries(session, entry, details_dict)

entry.details_dict()

NameError: name 'df_TK' is not defined

#### 3.4) Data: selection & cleaning (values < -9999 -> NaN)

In [28]:
# select data
data = dat.iloc[:, [1,5,6,9]].copy()

# data formatting
data.columns = ['tstamp', 'sonic_temperature', 'fast_response_temperature_probe', 'reference_temperature']
data['tstamp'] = pd.to_datetime(data.loc[:,'tstamp'], format='%d.%m.%Y %H:%M')
data.set_index('tstamp', inplace=True)

# replace values < -9999 with NaN
data = data.mask(data < -9999)

data.head(2)

NameError: name 'dat' is not defined

#### 3.5) Data: upload (create_datasource, datasource.create_scale, import_data)

In [29]:
if UPLOAD and not entry.datasource:
    entry.create_datasource(type=1, path='timeseries', datatype='timeseries')

    entry.datasource.create_scale(
        resolution='30min', 
        extent=(data.index[0], data.index[-1]), 
        support=1.0, # not sure 
        scale_dimension='temporal'
    )
    
    entry.import_data(data, force_data_names=True) # use column names in data instead of variable.column_names 

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) FEHLER:  Spalte »variable_names« von Relation »datasources« existiert nicht
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)

In [30]:
edat = entry.get_data()
edat.head(2)

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) FEHLER:  Spalte »variable_names« von Relation »datasources« existiert nicht
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)

## 4) Humidity

#### 4.1) Variable (find / add)

In [31]:
variable = api.find_variable(session, name='absolute humidity', return_iterator=True).first()

if variable is None and UPLOAD:
    unit = api.add_unit(session, name='g/cm3', symbol='g/cm3')
    variable = api.add_variable(session, name='absolute humidity', symbol='a', unit=unit.id, column_names=['a'])
    
print(variable)

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) FEHLER:  Spalte »variable_names« von Relation »datasources« existiert nicht
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)

#### 4.2) Entry (find / add)

In [32]:
entry = api.find_entry(session, title='Fendt dataset: absolute humidity', return_iterator=True).first()

if entry is None and UPLOAD:
    entry = api.add_entry(session,
                          title='Fendt dataset: absolute humidity',
                          abstract='Absolute humidity data from the Fendt data set.',
                          location=location, 
                          variable=variable.id,
                          license=license, 
                          author=author.id, 
                          embargo=True, 
                          is_partial=False)

print(entry)

#### 4.3) Details (NetCDF convention: standard name, long name, ... & TK3 explanation)

In [33]:
# extract TK3 explanation from df_TK
TK3_explanation = df_TK.iloc[[7,10],1].to_list()

if not entry.details and UPLOAD:
    details_dict = [
        {
        'key': 'standard_name',
        'value': ['mass_concentration_of_water_vapor_in_air', 'mass_concentration_of_water_vapor_in_air'],
        'description': 'standard name according to CF Conventions'
        }, 
        {
        'key': 'long_name', 
        'value': ['absolute humidity from fast-response sensor', 'reference absolute humidity'],
        'description': 'long name according to CF Conventions'
        },
        {
        'key': 'ancillary_variables', 
        'value': ['sigma2_a', ''],
        'description': 'ancillary variables in the Fendt dataset'
        },
        {
        'key': 'TK3_explanation', 
        'value': TK3_explanation,
        'description': 'TK3 variable explanation'
        }
    ]

    # add details to entry
    api.add_details_to_entries(session, entry, details_dict)

entry.details_dict()

NameError: name 'df_TK' is not defined

#### 4.4) Data: selection & cleaning (values < -9999)

In [34]:
# select data
data = dat.iloc[:, [1,7,10]].copy()

# data formatting
data.columns = ['tstamp', 'absolute_humidity', 'reference_absolute_humidity']
data['tstamp'] = pd.to_datetime(data.loc[:,'tstamp'], format='%d.%m.%Y %H:%M')
data.set_index('tstamp', inplace=True)

# replace values < -9999 with NaN
data = data.mask(data < -9999)

data.head(2)

NameError: name 'dat' is not defined
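
The cleaning step relies on `DataFrame.mask`, which sets every cell to NaN where the condition holds. A small sketch on synthetic data shows the effect; the column values and the sentinel `-9999.9` are assumptions made for illustration and are not taken from the Fendt files.

```python
import pandas as pd

# synthetic stand-in for two data columns; -9999.9 is an assumed sentinel value
demo = pd.DataFrame(
    {
        'absolute_humidity': [7.1, -9999.9, 6.8],
        'reference_absolute_humidity': [7.0, 6.9, -9999.9],
    },
    index=pd.to_datetime(
        ['01.01.2014 00:30', '01.01.2014 01:00', '01.01.2014 01:30'],
        format='%d.%m.%Y %H:%M',
    ),
)

# mask() replaces every cell where the condition is True with NaN
cleaned = demo.mask(demo < -9999)

# number of masked cells per column
print(cleaned.isna().sum().to_dict())
```

Because the comparison is strict, a value of exactly `-9999` would not be masked by `demo < -9999`.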

#### 4.5) Data: upload (create_datasource, datasource.create_scale, import_data)

In [35]:
if UPLOAD and not entry.datasource:
    entry.create_datasource(type=1, path='timeseries', datatype='timeseries')

    entry.datasource.create_scale(
        resolution='30min', 
        extent=(data.index[0], data.index[-1]), 
        support=1.0, # not sure 
        scale_dimension='temporal'
    )
    
    entry.import_data(data, force_data_names=True) # use column names in data instead of variable.column_names 

In [36]:
edat = entry.get_data()
edat.head(2)

## 5) Carbon dioxide

#### 5.1) Variable (find / add)

In [37]:
variable = api.find_variable(session, name='CO2 concentration', return_iterator=True).first()

if variable is None and UPLOAD:
    unit = api.add_unit(session, name='mmol/m3', symbol='mmol/m3')
    variable = api.add_variable(session, name='CO2 concentration', symbol='CO2', unit=unit.id, column_names=['co2_concentration'])
    
print(variable)

#### 5.2) Entry (find / add)

In [38]:
entry = api.find_entry(session, title='Fendt dataset: carbon dioxide', return_iterator=True).first()

if entry is None and UPLOAD:
    entry = api.add_entry(session,
                          title='Fendt dataset: carbon dioxide',
                          abstract='Carbon dioxide concentration data from the Fendt data set.',
                          location=location, 
                          variable=variable.id,
                          license=license, 
                          author=author.id, 
                          embargo=True, 
                          is_partial=False)

print(entry)

#### 5.3) Details (NetCDF convention: standard name, long name, ... & TK3 explanation)

In [39]:
# extract TK3 explanation from df_TK
TK3_explanation = df_TK.iloc[[8],1].to_list()

if not entry.details and UPLOAD:
    details_dict = [
        {
        'key': 'standard_name',
        'value': ['mole_concentration_of_carbon_dioxide_in_air'],
        'description': 'standard name according to CF Conventions'
        }, 
        {
        'key': 'long_name', 
        'value': ['CO2 molar density from fast-response sensor'],
        'description': 'long name according to CF Conventions'
        },
        {
        'key': 'ancillary_variables', 
        'value': ['sigma2_co2'],
        'description': 'ancillary variables in the Fendt dataset'
        },
        {
        'key': 'TK3_explanation', 
        'value': TK3_explanation,
        'description': 'TK3 variable explanation'
        }
    ]
    
    # add details to entry
    api.add_details_to_entries(session, entry, details_dict)

entry.details_dict()

NameError: name 'df_TK' is not defined

#### 5.4) Data: selection & cleaning (values < -9999)

In [40]:
# select data
data = dat.iloc[:, [1,8]].copy()

# data formatting
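# rename the first column without mutating the underlying Index array in place
data = data.rename(columns={data.columns[0]: 'tstamp'})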
data['tstamp'] = pd.to_datetime(data.loc[:,'tstamp'], format='%d.%m.%Y %H:%M')
data.set_index('tstamp', inplace=True)

# replace values < -9999 with NaN
data = data.mask(data < -9999)

data.head(2)

NameError: name 'dat' is not defined

#### 5.5) Data: upload (create_datasource, datasource.create_scale, import_data)

In [41]:
if UPLOAD and not entry.datasource:
    entry.create_datasource(type=1, path='timeseries', datatype='timeseries')

    entry.datasource.create_scale(
        resolution='30min', 
        extent=(data.index[0], data.index[-1]), 
        support=1.0, # not sure 
        scale_dimension='temporal'
    )
    
    entry.import_data(data)

In [42]:
edat = entry.get_data()
edat.head(2)

## 6) Air pressure

#### 6.1) Variable (find / add)

In [43]:
variable = api.find_variable(session, name='air pressure', return_iterator=True).first()

print(variable)

#### 6.2) Entry (find / add)

In [44]:
entry = api.find_entry(session, title='Fendt dataset: air pressure', return_iterator=True).first()

if entry is None and UPLOAD:
    entry = api.add_entry(session,
                          title='Fendt dataset: air pressure',
                          abstract='Air pressure data from the Fendt data set.',
                          location=location, 
                          variable=variable.id,
                          license=license, 
                          author=author.id, 
                          embargo=True, 
                          is_partial=False)

print(entry)

#### 6.3) Details (NetCDF convention: standard name, long name, ... & TK3 explanation)

In [45]:
# extract TK3 explanation from df_TK
TK3_explanation = df_TK.iloc[[11],1].to_list()

if not entry.details and UPLOAD:
    details_dict = [
        {
        'key': 'standard_name',
        'value': ['air_pressure'],
        'description': 'standard name according to CF Conventions'
        }, 
        {
        'key': 'long_name', 
        'value': ['reference air pressure'],
        'description': 'long name according to CF Conventions'
        },
        {
        'key': 'TK3_explanation', 
        'value': TK3_explanation,
        'description': 'TK3 variable explanation'
        }
    ]
    
    # add details to entry
    api.add_details_to_entries(session, entry, details_dict)

entry.details_dict()

NameError: name 'df_TK' is not defined

#### 6.4) Data: selection & cleaning (values < -9999)

In [46]:
# select data
data = dat.iloc[:, [1,11]].copy()

# data formatting
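# rename the first column without mutating the underlying Index array in place
data = data.rename(columns={data.columns[0]: 'tstamp'})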
data['tstamp'] = pd.to_datetime(data.loc[:,'tstamp'], format='%d.%m.%Y %H:%M')
data.set_index('tstamp', inplace=True)

# replace values < -9999 with NaN
data = data.mask(data < -9999)

data.head(2)

NameError: name 'dat' is not defined
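
Each upload declares a temporal scale with `resolution='30min'`, so it can be worth checking that the time index really is regular before importing. The sketch below does this on a synthetic index with one missing time step; the time stamps are made up and the variable names are illustrative only.

```python
import pandas as pd

# synthetic 30-minute index with one missing time step at 01:00 (assumption)
index = pd.to_datetime([
    '2014-01-01 00:00', '2014-01-01 00:30',
    '2014-01-01 01:30', '2014-01-01 02:00',
])

# differences between consecutive time stamps
steps = pd.Series(index).diff().dropna()

# time stamps that follow a step different from the expected resolution
expected = pd.Timedelta('30min')
gaps = index[1:][(steps != expected).to_numpy()]

print(gaps)
```

An empty result would indicate that the index is strictly regular at the expected resolution.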

#### 6.5) Data: upload (create_datasource, datasource.create_scale, import_data)

In [47]:
if UPLOAD and not entry.datasource:
    entry.create_datasource(type=1, path='timeseries', datatype='timeseries')

    entry.datasource.create_scale(
        resolution='30min', 
        extent=(data.index[0], data.index[-1]), 
        support=1.0, # not sure 
        scale_dimension='temporal'
    )
    
    entry.import_data(data)

In [48]:
edat = entry.get_data()
edat.head(2)

## 7) Eddy Dataset

Create a dummy variable with a dummy unit for the remaining, eddy-specific data.  
The unit of each column is given in its column name.

#### 7.1) Variable (find / add)

In [49]:
variable = api.find_variable(session, name='Eddy Covariance', return_iterator=True).first()

if variable is None and UPLOAD:
    unit = api.find_unit(session, name='dimensionless', return_iterator=True).first()
    if unit is None:
        unit = api.add_unit(session, name='dimensionless', symbol='-')
    variable = api.add_variable(session, name='Eddy Covariance', symbol='E', unit=unit.id, column_names=[''])
    
print(variable)

#### 7.2) Entry (find / add)

In [50]:
entry = api.find_entry(session, title='Fendt dataset: Eddy covariance data', return_iterator=True).first()

if entry is None and UPLOAD:
    entry = api.add_entry(session,
                          title='Fendt dataset: Eddy covariance data',
                          abstract='Eddy data and ancillary variables from the Fendt data set.',
                          location=location, 
                          variable=variable.id,
                          license=license.id, 
                          author=author.id, 
                          embargo=True, 
                          is_partial=True)

print(entry)

#### 7.3) Details (NetCDF convention: standard name, long name, ...)

In [51]:
# extract TK3 explanation from df_TK
TK3_explanation = df_TK.iloc[[*list(range(12, 35)), *list(range(36, 47)), *list(range(48, 61))],1].to_list()
print(TK3_explanation)

if not entry.details and UPLOAD:
    details_dict = [
        {
        'key': 'standard_name',
        'value': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'upward_sensible_heat_flux_in_air', '', 'upward_latent_heat_flux_in_air', '', '', 'quality_flag', 'quality_flag', 'quality_flag', 'quality_flag', 'quality_flag', '', '', '', '', '', '', '', '', '', '', '', '', ''],
        'description': 'standard name according to CF Conventions'
        }, 
        {
        'key': 'long_name', 
        'value': ['variance of longitudinal wind velocity', 'variance of lateral wind velocity', 'variance of vertical wind velocity', 'variance of sonic temperature', '', 'variance of absolute humidity from fast-response sensor', 'variance of CO2 molar density from fast-response sensor', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'number of values from sonic anemometer', 'friction velocity', 'sensible heat flux', '', 'latent heat flux', '', 'Monin Obukhov stability parameter', 'friction velocity quality flag', 'sensible heat flux quality flag', '', 'latent heat flux quality flag', 'net ecosystem exchange quality flag', '', 'net ecosystem exchange of CO2', 'flux footprint contribution from target landuse 1', 'flux footprint contribution from target landuse 2', 'flux footprint maximum distance', 'relative random error of friction velocity', 'relative random error of sensible heat flux', 'relative random error of latent heat flux', 'relative random error of net ecosystem exchange', '', '', '', ''],
        'description': 'long name according to CF Conventions'
        },
        {
        'key': 'ancillary_variables',
        'value': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ['u_star_qc_flag', 'u_star_rel_random_error'], ['H_qc_flag', 'H_rel_random_error'], '', ['LE_qc_flag', 'LE_rel_random_error'], '', '', '', '', '', '', '', '', ['NEE_qc_flag', 'NEE_rel_random_error'], '', '', '', '', '', '', '', '', '', '', ''],
        'description': 'ancillary variables in the Fendt dataset'
        },
        {
        'key': 'TK3_explanation', 
        'value': TK3_explanation,
        'description': 'TK3 variable explanation'
        }
    ]
    
    # add details to entry
    api.add_details_to_entries(session, entry, details_dict)

entry.details_dict()

NameError: name 'df_TK' is not defined
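
The value lists in `details_dict` are positional. Assuming that each list is meant to hold one item per data column, a list that is one item too long or too short would silently shift the metadata. The following sketch flags such lists; `check_detail_lengths` is a hypothetical helper and the demo values are made up.

```python
def check_detail_lengths(details, n_columns):
    """Hypothetical helper: return the keys whose value list does not
    hold exactly one item per data column, together with their length."""
    return {
        d['key']: len(d['value'])
        for d in details
        if len(d['value']) != n_columns
    }


# shortened illustration with three data columns (values are made up)
demo_details = [
    {'key': 'standard_name', 'value': ['', 'quality_flag', ''], 'description': 'demo'},
    {'key': 'long_name', 'value': ['friction velocity', ''], 'description': 'demo'},
]

# only 'long_name' is reported, because it has two items instead of three
mismatches = check_detail_lengths(demo_details, n_columns=3)
print(mismatches)
```

In the notebook, such a check could be called with the number of selected data columns before the details are added to the entry.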

#### Note:
Units are only documented in the (unedited) column names. It is still open whether this is sufficient.

#### 7.4) Data: selection & cleaning (values < -9999)

In [52]:
# select data
data = dat.iloc[:, [1, *list(range(12, 35)), *list(range(36, 47)), *list(range(48, 61))]].copy()

# data formatting
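# rename the first column without mutating the underlying Index array in place
data = data.rename(columns={data.columns[0]: 'tstamp'})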
data['tstamp'] = pd.to_datetime(data.loc[:,'tstamp'], format='%d.%m.%Y %H:%M')
data.set_index('tstamp', inplace=True)

# replace values < -9999 with NaN
data = data.mask(data < -9999)

pd.set_option('display.max_columns', None)
data.head(2)

NameError: name 'dat' is not defined
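
The positional column selection used in this section combines one single column with three ranges. As a sketch, the same list of positions can be written more compactly with numpy's `np.r_` index helper; this is only an alternative notation and not what the notebook uses.

```python
import numpy as np

# verbose form: column 1 plus three ranges with exclusive upper bounds
verbose = [1, *list(range(12, 35)), *list(range(36, 47)), *list(range(48, 61))]

# equivalent, more compact form with numpy's index helper
compact = np.r_[1, 12:35, 36:47, 48:61]

# number of selected positions, including the time stamp column
print(len(compact))
```

Both forms leave out positions 35 and 47, and both include the time stamp column at position 1, which later becomes the index.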

#### 7.5) Data: upload (create_datasource, datasource.create_scale, import_data)

In [53]:
if UPLOAD and not entry.datasource:
    entry.create_datasource(type=1, path='timeseries', datatype='timeseries')

    entry.datasource.create_scale(
        resolution='30min', 
        extent=(data.index[0], data.index[-1]), 
        support=1.0, # not sure 
        scale_dimension='temporal'
    )
    
    entry.import_data(data, force_data_names=True)

In [54]:
edat = entry.get_data()
edat.head(2)

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) ERROR:  column "variable_names" of relation "datasources" does not exist
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)
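
This cell fails only because the session is still in the failed state left by the earlier flush error; as the message states, `Session.rollback()` has to be issued before the session can be used again. One way to keep a failing cell from blocking later ones is to wrap database work in a small helper that rolls back on error. The sketch below is an illustration under assumptions: `run_or_rollback` and `FakeSession` are hypothetical names, and `FakeSession` merely stands in for the SQLAlchemy session returned by `api.connect_database`.

```python
class FakeSession:
    """Hypothetical stand-in that only records rollback calls."""

    def __init__(self):
        self.rollbacks = 0

    def rollback(self):
        self.rollbacks += 1


def run_or_rollback(session, action):
    """Run `action()`; on any exception roll the session back and re-raise."""
    try:
        return action()
    except Exception:
        session.rollback()
        raise


def failing_upload():
    # Simulates the flush error raised during the upload.
    raise RuntimeError("simulated flush error")


fake_session = FakeSession()
try:
    run_or_rollback(fake_session, failing_upload)
except RuntimeError as exc:
    print(f"upload failed: {exc}; rollbacks issued: {fake_session.rollbacks}")

# After the rollback, a later call on the same session goes through.
print(run_or_rollback(fake_session, lambda: "ok"))
```

With the real session, the upload cell could pass a function that performs `entry.import_data(...)` as `action`, so the original error is still raised but the session stays usable.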

## Create the Eddy EntryGroup

In [55]:
entries = api.find_entry(session, title='Fendt dataset:*', include_partial=True)
group = api.find_group(session, title='Fendt dataset: Eddy covariance data')

if UPLOAD and not group:
    group = api.add_group(session, 'Composite', entry_ids=[e.id for e in entries],
                          title='Fendt dataset: Eddy covariance data',
                          description='The Fendt dataset contains eddy covariance data. The eddy data entry is partial; the other entries can be exported as stand-alone datasets of different environmental variables.'
                         )

print(group)

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) ERROR:  column "variable_names" of relation "datasources" does not exist
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)
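
The group cell follows a find-then-create pattern: it looks the group up by title and only creates it when nothing was found, so re-running the notebook should not produce duplicate groups. The sketch below shows the same pattern on a plain dictionary; `find_or_create`, the `groups` store and the entry IDs are hypothetical and do not use the metacatalog API.

```python
def find_or_create(store, title, factory):
    """Return (item, created): reuse the stored item or build it once."""
    if title in store:
        return store[title], False
    store[title] = factory()
    return store[title], True


groups = {}  # hypothetical stand-in for the groups stored in the database
title = "Fendt dataset: Eddy covariance data"

# First run: nothing is stored yet, so the group is created.
first, created_first = find_or_create(
    groups, title, lambda: {"title": title, "entry_ids": [1, 2, 3]}
)

# Second run: the existing group is returned and the factory is not called.
second, created_second = find_or_create(
    groups, title, lambda: {"title": title, "entry_ids": []}
)

print(created_first, created_second, second is first)
```

The lookup key matters here: searching by the exact title used at creation keeps the check and the insert in agreement.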

In [56]:
result = api.find_entry(session, title='*3-dimensional windspeed*', as_result=True)[0]
result.get_data()

PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UndefinedColumn) ERROR:  column "variable_names" of relation "datasources" does not exist
LINE 1: ...type_id, datatype_id, encoding, path, data_names, variable_n...
                                                             ^

[SQL: INSERT INTO datasources (type_id, datatype_id, encoding, path, data_names, variable_names, args, temporal_scale_id, spatial_scale_id, creation, "lastUpdate") VALUES (%(type_id)s, %(datatype_id)s, %(encoding)s, %(path)s, %(data_names)s::VARCHAR(128)[], %(variable_names)s::VARCHAR(128)[], %(args)s, %(temporal_scale_id)s, %(spatial_scale_id)s, %(creation)s, %(lastUpdate)s) RETURNING datasources.id]
[parameters: {'type_id': 1, 'datatype_id': 14, 'encoding': 'utf-8', 'path': 'timeseries', 'data_names': None, 'variable_names': None, 'args': '{}', 'temporal_scale_id': None, 'spatial_scale_id': None, 'creation': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298), 'lastUpdate': datetime.datetime(2024, 2, 12, 15, 43, 16, 307298)}]
(Background on this error at: https://sqlalche.me/e/14/f405) (Background on this error at: https://sqlalche.me/e/14/7s2a)

Cool!