In [1]:
import numpy as np
import pandas as pd
import h5py
from pprint import pprint

from pycmlh5.pycmlh5.metadata_def_parser import read_and_check_metadata

For now I use the abbreviation **CML** for commercial microwave link. Hence the preliminary name of the file format is **`cmlh5`**, since it is just a definition of the structure of an HDF5 file. I have chosen the abbreviation CML over, e.g., MWL because it is easier to pronounce and sets itself apart a bit more from all other MW-related stuff. I would be happy to get feedback, not only on the file format but also on the naming.

# Example structure of a cmlh5 file 

```

/                               RootGroup
/cml_1                          Group for first CML
/cml_1/channel_1                Group for first channel
/cml_1/channel_1/rx             Array of RSL values in dBm
/cml_1/channel_1/tx             Array of TSL values in dBm
/cml_1/channel_1/time           Array of timestamps in POSIX time

/cml_1/channel_2                Group for second channel
/cml_1/channel_2/rx
/cml_1/channel_2/tx
/cml_1/channel_2/time

/cml_2                          Group for second CML
/cml_2/channel_1
/cml_2/channel_1/rx
/cml_2/channel_1/tx
/cml_2/channel_1/time

```
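As an illustration of the layout above, the following is a minimal sketch of how such a hierarchy could be written with `h5py`. It is not part of the format definition: the file name, the sample values, and the choice of dtypes are made up for this example, and the file is kept in memory so nothing is written to disk.

```python
import h5py
import numpy as np

# In-memory HDF5 file; "example_cml.h5" is only a placeholder name.
with h5py.File("example_cml.h5", "w", driver="core", backing_store=False) as f:
    # Root-level metadata is stored as HDF5 attributes.
    f.attrs["file_format"] = "CMLh5"
    f.attrs["file_format_version"] = "0.1"

    cml = f.create_group("cml_1")
    channel = cml.create_group("channel_1")

    # Three made-up samples at one-minute spacing (POSIX seconds, dBm).
    channel.create_dataset(
        "time", data=np.array([1420070400, 1420070460, 1420070520], dtype="int64")
    )
    channel.create_dataset("rx", data=np.array([-45.2, -45.6, -47.1], dtype="float32"))
    channel.create_dataset("tx", data=np.array([10.0, 10.0, 10.0], dtype="float32"))

    # Collect the path of every group and dataset below the root group.
    paths = []
    f.visit(paths.append)

    # POSIX seconds can be turned into NumPy datetimes directly.
    timestamps = channel["time"][:].astype("datetime64[s]")

print(paths)
print(timestamps)
```

Storing the timestamps as plain integers keeps the file independent of any particular datetime library; the conversion happens only when the data is read.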


# Overview of the metadata definition for each level
The metadata definitions are stored in CSV files so that they can easily serve as the basis for parsers in other languages.

## Metadata at the root level

In [2]:
pd.read_csv('definitions/metadata_def_root_level.csv', delimiter=',', index_col=0)

| Metadata name | Units | Type | Mandatory | Description |
|---|---|---|---|---|
| file_format | - | string | True | This must always be set to ‘CMLh5’ |
| file_format_version | - | string | True | examples: ‘0.1’, ‘1.2’, ... |
| author_name | - | string | False | |
| author_email | - | string | False | |
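Since the definitions are plain CSV, a parser needs little more than a CSV reader. The sketch below uses only the Python standard library and an inline copy of the root-level rows shown above. The single header row (`Metadata name`, `Units`, `Type`, `Mandatory`, `Description`) is an assumption; the header layout of the actual definition file may differ. `parse_definitions` is a hypothetical helper, not a function of `pycmlh5`.

```python
import csv
import io

# Inline stand-in for the root-level definition file (assumed header layout).
ROOT_LEVEL_CSV = """Metadata name,Units,Type,Mandatory,Description
file_format,-,string,True,This must always be set to 'CMLh5'
file_format_version,-,string,True,"examples: '0.1', '1.2', ..."
author_name,-,string,False,
author_email,-,string,False,
"""


def parse_definitions(text):
    """Map each metadata name to its units, type, and mandatory flag."""
    definitions = {}
    for row in csv.DictReader(io.StringIO(text)):
        definitions[row["Metadata name"]] = {
            "units": row["Units"],
            "type": row["Type"],
            # The CSV stores booleans as the strings 'True' and 'False'.
            "mandatory": row["Mandatory"].strip().lower() == "true",
            "description": row["Description"],
        }
    return definitions


root_definitions = parse_definitions(ROOT_LEVEL_CSV)
mandatory_names = sorted(
    name for name, rule in root_definitions.items() if rule["mandatory"]
)
print(mandatory_names)
```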


## Metadata at the CML level

In [3]:
pd.read_csv('definitions/metadata_def_cml_level.csv', delimiter=',', index_col=0)

| Metadata name | Units | Type | Mandatory | Description |
|---|---|---|---|---|
| site_a_latitude | Decimal degrees | float16 | True | |
| site_a_longitude | Decimal degrees | float16 | True | |
| site_a_altitude | Meter | float16 | False | |
| site_a_antenna_above_ground | Meter | float16 | False | |
| site_a_id | - | string | False | |
| site_b_latitude | Decimal degrees | float16 | True | |
| site_b_longitude | Decimal degrees | float16 | True | |
| site_b_altitude | Meter | float16 | False | |
| site_b_antenna_above_ground | Meter | float16 | False | |
| site_b_id | - | string | False | |


## Metadata at the channel level

In [4]:
pd.read_csv('definitions/metadata_def_channel_level.csv', delimiter=',', index_col=0)

| Metadata name | Units/Values | Type | Mandatory | Description |
|---|---|---|---|---|
| frequency | GHz | float | True | |
| polarization | [‘V’, ‘H’, ‘v’, ‘h’] | string | True | |
| tx_site | [‘site_a’, ‘site_b’] | string | False | |
| rx_site | - | string | False | |
| channel_id | - | string | True | |
| atpc | [‘on’, ‘off’] | string | True | |
| tx_quantization | dBm | float | False | |
| tx_quantization_type | [‘rounded’, ‘truncated’] | string | False | |
| rx_quantization | dBm | float | False | |
| rx_quantization_type | [‘rounded’, ‘truncated’] | string | False | |
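Some channel-level entries list a closed set of allowed values rather than a unit. One way a checker might use that information is sketched below. The rule dictionary is a hand-transcribed excerpt of the table above, and `check_channel_metadata` is a hypothetical helper, not the `pycmlh5` implementation. The wording of the "not one of" message is also made up for this example.

```python
# Hand-transcribed excerpt of the channel-level definitions (illustrative only).
CHANNEL_RULES = {
    "frequency": {"mandatory": True, "allowed": None},
    "polarization": {"mandatory": True, "allowed": {"V", "H", "v", "h"}},
    "channel_id": {"mandatory": True, "allowed": None},
    "atpc": {"mandatory": True, "allowed": {"on", "off"}},
    "tx_site": {"mandatory": False, "allowed": {"site_a", "site_b"}},
}


def check_channel_metadata(path, metadata, rules=CHANNEL_RULES):
    """Return a list of error strings for one channel's metadata dict."""
    errors = []
    for name, rule in rules.items():
        if name not in metadata:
            # Optional entries may simply be absent.
            if rule["mandatory"]:
                errors.append(f"{path}: Mandatory metadata `{name}` is missing")
            continue
        allowed = rule["allowed"]
        if allowed is not None and metadata[name] not in allowed:
            errors.append(
                f"{path}: Metadata `{name}` is `{metadata[name]}` "
                f"which is not one of {sorted(allowed)}"
            )
    return errors


channel_errors = check_channel_metadata(
    "/cml_1/channel_1",
    {"frequency": 38.5, "polarization": "X", "channel_id": "14850_10500"},
)
for message in channel_errors:
    print(message)
```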


## Metadata at the array level

In [5]:
pd.read_csv('definitions/metadata_def_array_level.csv', delimiter=',', index_col=0)

| metadata name | units/values | type | Mandatory | description |
|---|---|---|---|---|
| quantity | e.g. ‘power’ | string | True | |
| unit | e.g. ‘dBm’ | string | True | |
| side? | [‘transmitter’, ‘receiver’] | string | False | |
| sampling? | [‘min’, ‘max’, ‘instant’, ‘mean’] | string | False | |
| array_id | - | string | False | |


# Check cmlh5 files
Right now, this only checks the metadata at the root, CML, and channel levels. Further checks, e.g. for the array level or the timestamp format, will follow.

## Check a valid cmlh5 file
This one was provided by Martin.

In [6]:
fn = 'example_data/cml_martin2.h5'

cml_metadata_list, error_list = read_and_check_metadata(fn)

In [7]:
pprint(error_list)

[u"/cml_1: Metadata `site_a_latitude` is `50.1388` with type `<type 'numpy.float32'>` which should be float16",
 u"/cml_1: Metadata `site_b_longitude` is `14.5089` with type `<type 'numpy.float32'>` which should be float16",
 u"/cml_1: Metadata `length` is `0.899` with type `<type 'numpy.float32'>` which should be float16",
 u"/cml_1: Metadata `site_b_latitude` is `50.1308` with type `<type 'numpy.float32'>` which should be float16",
 u"/cml_1: Metadata `site_a_longitude` is `14.5098` with type `<type 'numpy.float32'>` which should be float16",
 u"/cml_1/channel_1: Metadata `rx_quantization` is `NA` with type `<type 'numpy.string_'>`, but it should be some kind of float",
 u"/cml_1/channel_1: Metadata `frequency` is `38.5` with type `<type 'numpy.float32'>` which should be float",
 u"/cml_1/channel_1: Metadata `tx_quantization` is `1.0` with type `<type 'numpy.float32'>` which should be float",
 u'/cml_1/channel_1: Metadata type_str `nan` for `additional_info` is not supported']


In [8]:
pprint(cml_metadata_list)

[{u'cml_1': {u'channel_1': {'metadata': {'additional_info': 'This is a virtual channel of a virtual link generated for testing the CMLh5 data format',
                                         'atpc': 'on',
                                         'channel_id': '14850_10500',
                                         'frequency': 38.5,
                                         'polarization': 'V',
                                         'rx_quantization': 'NA',
                                         'rx_site': 'site_b',
                                         'tx_quantization': 1.0,
                                         'tx_site': 'site_a'}},
             'metadata': {'cml_id': '14050_10500',
                          'length': 0.89899999,
                          'site_a_latitude': 50.13884,
                          'site_a_longitude': 14.50976,
                          'site_b_latitude': 50.130772,
                          'site_b_longitude': 14.5089}}}]


## Check a file which has some missing metadata and wrong metadata types


The metadata will be parsed, even though there are errors. This behavior could of course be changed, e.g. so that the check simply aborts at the first error.
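The difference between collecting all errors and aborting at the first one can be sketched as follows. This is not how `read_and_check_metadata` is implemented; the function `check_mandatory`, the exception `MetadataError`, and the flag `abort_on_first_error` are hypothetical names used only to illustrate the two behaviors.

```python
class MetadataError(ValueError):
    """Raised when a check is asked to stop at the first problem."""


def iter_missing(required, metadata_by_path):
    """Lazily yield one message per missing mandatory entry."""
    for path, metadata in metadata_by_path.items():
        for name in required:
            if name not in metadata:
                yield f"{path}: Mandatory metadata `{name}` is missing"


def check_mandatory(required, metadata_by_path, abort_on_first_error=False):
    """Collect all errors, or raise on the first one if requested."""
    errors = []
    for message in iter_missing(required, metadata_by_path):
        if abort_on_first_error:
            raise MetadataError(message)
        errors.append(message)
    return errors


# Made-up metadata for two CML groups.
example = {
    "/cml_0": {"site_a_latitude": 47.93},
    "/cml_1": {"cml_id": "some_id"},
}
required = ("cml_id", "site_a_latitude")

all_errors = check_mandatory(required, example)
print(all_errors)

try:
    check_mandatory(required, example, abort_on_first_error=True)
except MetadataError as exc:
    first_error = str(exc)
    print("aborted:", first_error)
```

Collecting everything is convenient when fixing a file by hand, since one run shows all problems. Aborting early is the safer default for automated pipelines that should never process a non-conforming file.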

In [9]:
fn = 'example_data/invalid.h5'

cml_metadata_list, error_list = read_and_check_metadata(fn, strict_type_check=True)

In [10]:
pprint(error_list[0:20])

[u"/cml_0: Metadata `site_a_latitude` is `47.93` with type `<type 'numpy.float64'>` which should be float16",
 u"/cml_0: Metadata `site_b_longitude` is `11.29` with type `<type 'numpy.float64'>` which should be float16",
 u"/cml_0: Metadata `length` is `7.63541519629` with type `<type 'numpy.float64'>` which should be float16",
 u"/cml_0: Metadata `site_b_latitude` is `47.99` with type `<type 'numpy.float64'>` which should be float16",
 u'/cml_0: Mandatory metadata `cml_id` is missing',
 u"/cml_0: Metadata `site_a_longitude` is `11.34` with type `<type 'numpy.float64'>` which should be float16",
 u'/cml_0/channel_1: Mandatory metadata `channel_id` is missing',
 u'/cml_0/channel_1: Mandatory metadata `polarization` is missing',
 u"/cml_0/channel_1: Metadata `frequency` is `18.085` with type `<type 'numpy.float64'>` which should be float",
 u'/cml_0/channel_1: Mandatory metadata `atpc` is missing',
 u'/cml_0/channel_2: Mandatory metadata `channel_id` is missing',
 u'/cml_0/channel_2: Man

In [11]:
pprint(cml_metadata_list[0:2])

[{u'cml_0': {u'channel_1': {'metadata': {'frequency': 18.085000000000001,
                                         'sampling_type': 'instantaneous',
                                         'temporal_resolution': '1min'}},
             u'channel_2': {'metadata': {'frequency': 19.094999999999999,
                                         'sampling_type': 'instantaneous',
                                         'temporal_resolution': '1min'}},
             'metadata': {'length': 7.6354151962942574,
                          'site_a_latitude': 47.93,
                          'site_a_longitude': 11.34,
                          'site_b_latitude': 47.990000000000002,
                          'site_b_longitude': 11.289999999999999,
                          'system_manufacturer': 'Ericsson',
                          'system_model': 'MINI LINK Traffic Node'}}},
 {u'cml_1': {u'channel_1': {'metadata': {'frequency': 25.920999999999999,
                                         'sampling_type'

## Let's be less strict with the type checking
For example, we may not care whether something is a float64 instead of a float32.
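What a strict versus a lenient type check could look like is sketched below with NumPy. This is an assumption about the general idea, not the actual logic behind `strict_type_check`: here, "strict" means the value must have exactly the NumPy type named in the definition, while "lenient" accepts any floating-point type. `float_type_error` is a hypothetical helper.

```python
import numpy as np


def float_type_error(name, value, expected="float16", strict=True):
    """Return an error string, or None if the value passes the check."""
    if strict:
        # Exact match against the NumPy scalar type named in the definition.
        ok = type(value) is np.dtype(expected).type
    else:
        # Any Python or NumPy floating-point type is accepted.
        ok = isinstance(value, (float, np.floating))
    if ok:
        return None
    return (
        f"Metadata `{name}` is `{value}` with type `{type(value).__name__}` "
        f"which should be {expected}"
    )


latitude = np.float64(47.93)
strict_result = float_type_error("site_a_latitude", latitude, strict=True)
lenient_result = float_type_error("site_a_latitude", latitude, strict=False)
string_result = float_type_error("rx_quantization", "NA", strict=False)

print(strict_result)
print(lenient_result)
print(string_result)
```

Even in the lenient mode, a string such as `NA` in a float field is still reported, so only the distinction between float widths is relaxed.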

In [12]:
fn = 'example_data/invalid.h5'

cml_metadata_list, error_list = read_and_check_metadata(fn, strict_type_check=False)

In [13]:
pprint(error_list[0:20])

[u'/cml_0: Mandatory metadata `cml_id` is missing',
 u'/cml_0/channel_1: Mandatory metadata `channel_id` is missing',
 u'/cml_0/channel_1: Mandatory metadata `polarization` is missing',
 u'/cml_0/channel_1: Mandatory metadata `atpc` is missing',
 u'/cml_0/channel_2: Mandatory metadata `channel_id` is missing',
 u'/cml_0/channel_2: Mandatory metadata `polarization` is missing',
 u'/cml_0/channel_2: Mandatory metadata `atpc` is missing',
 u'/cml_1: Mandatory metadata `cml_id` is missing',
 u'/cml_1/channel_1: Mandatory metadata `channel_id` is missing',
 u'/cml_1/channel_1: Mandatory metadata `polarization` is missing',
 u'/cml_1/channel_1: Mandatory metadata `atpc` is missing',
 u'/cml_1/channel_2: Mandatory metadata `channel_id` is missing',
 u'/cml_1/channel_2: Mandatory metadata `polarization` is missing',
 u'/cml_1/channel_2: Mandatory metadata `atpc` is missing',
 u'/cml_2: Mandatory metadata `cml_id` is missing',
 u'/cml_2/channel_1: Mandatory metadata `channel_id` is missing',
 u