In [1]:
# This is a temporary solution, until fixes for https://github.com/G-Node/python-odml/issues/289 are released
!pip install git+https://github.com/g-node/python-odml.git@master --upgrade --quiet

# odMLtables scenarios

This tutorial implements the scenarios described in *Sprenger et al. (in prep.) odMLtables: A user-friendly approach for managing metadata of neurophysiological experiments*. The scenarios present a simple but realistic use case of odML and odMLtables in an experimental lab and are a good way to get started with odML and odMLtables. You are highly encouraged to modify this Jupyter notebook and use it as the basis for your own metadata workflow. For a detailed description of the individual scenarios, see *Sprenger et al. (in prep.)*.

To execute the steps of the tutorial, press *Ctrl + Enter* in the cell you want to execute.

### Scenario 1: How to generate a metadata template without programming

This scenario describes how a template structure for daily data collection can be set up. The example used here is the measurement of basic attributes of a mouse. The measures collected on a single day can be listed in a table as shown below, where *'YYYY-MM-DD'* specifies the measurement date.

| Date       | Measure             | Value    | Unit | Type   |
|------------|---------------------|----------|------|--------|
| YYYY-MM-DD | Weight              |          | g    | float  |
|            | Water Intake        |          | ml   | float  |
|            | Breathing Frequency |          | bpm  | float  |
|            | Measured by         | John Doe |      | string |
|            | Comment             |          |      | string |

This table can be generated using any spreadsheet software; odMLtables supports the *.xls* and *.csv* formats. An implementation in Microsoft Excel or LibreOffice Calc can use color coding to aid visual inspection and might look like this:

![Score sheet template in a spreadsheet](attachment:Screenshot%20from%202018-11-05%2013-47-44.png)

For simplicity, we generate a *.csv* file with the same content using Python here.

In [2]:
# string representation of the score sheet in csv format
score_sheet = \
"""Date,Measure,Value,Unit,Type
YYYY-MM-DD,Weight,,g,float
,Water Intake,,ml,float
,Breathing Frequency,,bpm,float
,Measured by,John Doe,,string
,Comment,,,string
"""
# write the string representation to disk
with open('score_sheet.csv', 'w+') as f:
    f.write(score_sheet)
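
In this layout an empty *Date* cell means "same date as the row above", so all five measures belong to one measurement day. A minimal stdlib sketch of that reading (an assumption about how the rows are grouped, not odMLtables' own parser; `group_score_sheet` is a hypothetical helper) forward-fills the *Date* column and collects the rows per date, casting values according to the *Type* column:

```python
import csv
import io

# casts for the two data types used in the score sheet (assumed subset)
CASTS = {"float": float, "string": str}

def group_score_sheet(csv_text):
    """Group score-sheet rows by date, forward-filling empty Date cells.

    Returns {date: {measure: (value_or_None, unit, type)}}.
    """
    sections = {}
    current_date = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        # an empty Date cell continues the section started by the previous row
        current_date = row["Date"] or current_date
        raw = row["Value"].strip()  # also drops stray leading spaces
        value = CASTS[row["Type"]](raw) if raw else None
        sections.setdefault(current_date, {})[row["Measure"]] = (value, row["Unit"], row["Type"])
    return sections

example = """Date,Measure,Value,Unit,Type
2000-01-01,Weight,20.3,g,float
,Water Intake,5.21,ml,float
,Measured by,John Doe,,string
"""
print(group_score_sheet(example))
```

Empty *Value* cells, as in the template, are kept as `None`, so the structure of the template survives even before any measurement is entered.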

This [metadata template](score_sheet.csv) in *.csv* format can be converted to an odML file using odMLtables:

In [3]:
import odmltables as odt

def score_sheet_to_odml(csv_file):
    """Convert a score sheet from csv to odML format."""
    
    # initialize an OdmlTable object for handling metadata
    table = odt.OdmlTable()
    # specify experiment specific headers used in the score sheet csv files (Date, Measure, Unit and Type)
    table.change_header(Path=1, PropertyName=2, Value=3, DataUnit=4, odmlDatatype=5)
    table.change_header_titles(Path='Date',PropertyName='Measure', DataUnit='Unit', odmlDatatype='Type')

    # load from csv format and save in odML format
    table.load_from_csv_table(csv_file)
    table.write2odml(csv_file[:-4] + '.odml')
    
# convert the score sheet to odml format
score_sheet_to_odml('score_sheet.csv')

The resulting [odML file](score_sheet.odml) can be visualized in the browser using the `odml.xsl` style sheet. When working locally on your computer, you can generate this visualization by opening the odML file in your browser, with the style sheet located in the same folder.

In [4]:
# This is utility code for displaying the odML file as html representation here.
# You can also just open the odML file in your browser having the style sheet in the same location as your odML file and
# will get the same result
from IPython.display import display, HTML
import lxml.etree as ET

def display_odML_as_html(odML_file, xsl_file='odml.xsl'):
    # generate html representation from odML file and style sheet
    dom = ET.parse(odML_file)
    xslt = ET.parse(xsl_file)
    transform = ET.XSLT(xslt)
    newdom = transform(dom)
    
    # display html
    display(HTML(ET.tostring(newdom, pretty_print=True).decode()))
    
display_odML_as_html('score_sheet.odml')

### Scenario 2: Collecting daily observations in a common odML structure

The template structure defined in `scenario 1` can now be copied for each measurement day and filled. The filled files will then be converted to odML and incorporated in a single odML file containing the complete metadata collected for an animal.

Here again we generate a filled metadata sheet in csv format using Python. In a real case this step would be performed using any spreadsheet software.

In [5]:
# string representation of the score sheet in csv format
score_sheet1 = \
"""Date,Measure,Value,Unit,Type
2000-01-01,Weight,20.3,g,float
,Water Intake,5.21,ml,float
,Breathing Frequency,323,bpm,float
,Measured by,John Doe,,string
,Comment,Blood sample taken,,string
"""
# write the string representation to disk
with open('score_sheet_day1.csv', 'w+') as f:
    f.write(score_sheet1)
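
Typing the values into a copy of the template works for a single day; when many days follow the same template, a small helper can copy the template and fill in the day's values. The sketch below is hypothetical (`fill_score_sheet` is not part of odMLtables) and only uses the standard library:

```python
import csv
import io

TEMPLATE = """Date,Measure,Value,Unit,Type
YYYY-MM-DD,Weight,,g,float
,Water Intake,,ml,float
,Breathing Frequency,,bpm,float
,Measured by,John Doe,,string
,Comment,,,string
"""

def fill_score_sheet(template, date, values):
    """Return a copy of the template csv with the date and measured values filled in.

    `values` maps measure names to values; measures missing from it keep
    whatever the template holds (e.g. the default experimenter).
    """
    rows = list(csv.reader(io.StringIO(template)))
    header, body = rows[0], rows[1:]
    for row in body:
        if row[0] == "YYYY-MM-DD":
            row[0] = date  # only the first data row carries the date
        if row[1] in values:
            row[2] = str(values[row[1]])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([header] + body)
    return out.getvalue()

filled = fill_score_sheet(TEMPLATE, "2000-01-01",
                          {"Weight": 20.3, "Water Intake": 5.21,
                           "Breathing Frequency": 323, "Comment": "Blood sample taken"})
print(filled)
```

Because the template is only read, never modified, the same `TEMPLATE` string can be reused for every measurement day.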

Since this is the only [set of measurements](score_sheet_day1.csv) available so far, the next step is to convert it into the odML format, as we did with the score sheet template in `scenario 1`. Because it already constitutes the [complete set of measurements](score_sheet_complete.odml) at this point, we rename the resulting file accordingly.

In [6]:
# convert the score sheet to odml format
score_sheet_to_odml('score_sheet_day1.csv')

# rename file, because this is the complete score sheet for now
import os
os.rename('score_sheet_day1.odml', 'score_sheet_complete.odml')

In the next step we acquire a [second set of measurements](score_sheet_day2.csv), recorded on day 2, and directly convert the generated `csv` file into the [odML format](score_sheet_day2.odml).

In [7]:
# string representation of the score sheet in csv format
score_sheet2 = \
"""Date,Measure,Value,Unit,Type
2000-01-02,Weight,23.5,g,float
,Water Intake,6.89,ml,float
,Breathing Frequency,309,bpm,float
,Measured by,John Doe,,string
,Comment,small scratch at the right ear,,string
"""
# write the string representation to disk
with open('score_sheet_day2.csv', 'w+') as f:
    f.write(score_sheet2)
# convert the score sheet to odml format
score_sheet_to_odml('score_sheet_day2.csv')

Now we have two odML files for two consecutive recording days. To merge them into a single odML structure, we use the `merge` functionality provided by odMLtables. We expect the two odML documents not to overlap, so we use the `strict` merge mode, which raises an error for conflicting entries in the two odML files.

The measurement data of subsequent recording days are added to the complete set stored in `score_sheet_complete.odml`.
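
The difference between the `strict` and `append` merge modes can be illustrated on plain nested dictionaries (`{section: {property: [values]}}`). This is a simplified sketch of the semantics described in this tutorial, not odMLtables' implementation; the helper `merge_metadata` is hypothetical, and skipping values that are already present in `append` mode is an assumption:

```python
def merge_metadata(doc1, doc2, mode="strict"):
    """Merge doc2 into a copy of doc1; both map section -> property -> list of values.

    mode='strict': raise ValueError if a property exists in both with different values.
    mode='append': extend the values of doc1 by those of doc2 (skipping duplicates).
    """
    if mode not in ("strict", "append"):
        raise ValueError("unknown merge mode: %r" % mode)
    # copy doc1 so that the inputs are never modified
    merged = {sec: {prop: list(vals) for prop, vals in props.items()}
              for sec, props in doc1.items()}
    for sec, props in doc2.items():
        target = merged.setdefault(sec, {})
        for prop, vals in props.items():
            if prop not in target:
                target[prop] = list(vals)
            elif target[prop] != list(vals):
                if mode == "strict":
                    raise ValueError("conflicting values for %s/%s" % (sec, prop))
                for v in vals:
                    if v not in target[prop]:
                        target[prop].append(v)
    return merged

day1 = {"2000-01-01": {"Weight": [20.3], "Comment": ["Blood sample taken"]}}
day2 = {"2000-01-02": {"Weight": [23.5]}}
complete = merge_metadata(day1, day2)  # no overlap, so strict mode succeeds
edited = {"2000-01-01": {"Comment": ["Blood sample shows no abnormalities"]}}
complete = merge_metadata(complete, edited, mode="append")
print(complete["2000-01-01"]["Comment"])
# ['Blood sample taken', 'Blood sample shows no abnormalities']
```

Merging `edited` into `day1` in strict mode would raise a `ValueError`, which is exactly the safeguard wanted when daily sheets are not supposed to overlap.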

In [8]:
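def merge_score_sheets(file1, file2, mode='strict'):
    """Merge one score sheet (file2) into another score sheet (file1) in odML representation.

    mode='strict' raises an error for conflicting entries; mode='append' extends
    the values of file1 by the values of file2.
    """
    # load first odML file
    table1 = odt.OdmlTable(file1)
    # merge file2 into table1 using the requested merge mode
    table1.merge(odt.OdmlTable(file2), mode=mode)
    # overwrite file1 with the merged score sheets
    table1.write2odml(file1)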
    
# merge the daily score sheet into the complete metadata collection
merge_score_sheets('score_sheet_complete.odml',
                   'score_sheet_day2.odml')

### Scenario 3: How to filter a subset of an odML to edit it later on

For larger experiments, the generated odML structure will grow in complexity. For easier visualization and modification or update of the data, we generate an odML file containing only a subset of the complete score sheet using the odMLtables `filter` function.

In [9]:
def extract_subset(odML_file):
    """Extract comments for the first day in this millenial."""
    table = odt.OdmlTable(odML_file)
    # extract specific property based on property name and section name
    table.filter(SectionName='2000-01-01', PropertyName='Comment')
    # generate separate file containing only subset of the information
    table.write2odml('score_sheet_filtered.odml')
    
# extract a subset of the information to a different file
extract_subset('score_sheet_complete.odml')
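
The call above combines two criteria, a section name and a property name, and keeps only the entries matching both. A minimal sketch of this kind of AND-filtering on the nested-dict representation used for illustration (the helper `filter_metadata` is hypothetical; odMLtables' own `filter` offers further options that are not modelled here):

```python
def filter_metadata(doc, section=None, prop=None):
    """Return the subset of doc (section -> property -> values) matching all given criteria.

    A criterion left as None matches everything; sections without any
    remaining properties are dropped.
    """
    subset = {}
    for sec_name, props in doc.items():
        if section is not None and sec_name != section:
            continue
        kept = {name: list(vals) for name, vals in props.items()
                if prop is None or name == prop}
        if kept:
            subset[sec_name] = kept
    return subset

complete = {
    "2000-01-01": {"Weight": [20.3], "Comment": ["Blood sample taken"]},
    "2000-01-02": {"Weight": [23.5], "Comment": ["small scratch at the right ear"]},
}
print(filter_metadata(complete, section="2000-01-01", prop="Comment"))
# {'2000-01-01': {'Comment': ['Blood sample taken']}}
```

Keeping the section structure in the subset (rather than returning a bare value) is what allows the edited subset to be merged back into the complete collection later on.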

The next step could be to convert the [filtered odML file](score_sheet_filtered.odml) into a csv file, update the necessary entries, and convert it back into the odML format to finally merge the changes back into the complete score sheet. For demonstration purposes, we modify the filtered odML file directly here and merge it into the [complete score sheet](score_sheet_complete.odml).

In [10]:
# this code mimics a manual modification of an existing odML file, e.g. via the csv representation generated with odMLtables
import odml
odmlfile = odml.fileio.load('score_sheet_filtered.odml', show_warnings=False)
odmlfile.sections['2000-01-01'].properties['Comment'].value = ['Blood sample shows no abnormalities']
odml.fileio.save(odmlfile, 'score_sheet_filtered.odml')

### Scenario 4: Merging the edited subset back into the original structure

For merging the changes back into the [complete score sheet](score_sheet_complete.odml) we proceed as in `scenario 2`. In this case, however, the entries of the two odML files overlap and we want to extend the values in the first document by the entries of the second one, so we merge using the `append` mode instead of the `strict` mode.

In [11]:
# merge the modified filtered odML into the complete metadata collection,
# appending the new values instead of raising an error for the overlapping entries
table = odt.OdmlTable('score_sheet_complete.odml')
table.merge(odt.OdmlTable('score_sheet_filtered.odml'), mode='append')
table.write2odml('score_sheet_complete.odml')

### Scenario 5: Create a tabular representation of the final merged odML for better viewing using the color options

For visualization of the metadata we convert the odML file to the [tabular representation](score_sheet_complete.xls) in the `.xls` format. This has the advantage of color support within the tabular structure. All color options can be customized using odMLtables.

In [12]:
def visualize_as_xls(odML_file):
    """ Generate an xls version of an odML file for visualization purposes """
    table = odt.OdmlXlsTable(odML_file)
    # optional: change the color options in the output table
    table.first_marked_style.fontcolor = 'red'
    table.second_marked_style.fontcolor = 'red'
    # write to xls format
    table.write2file('.'.join(odML_file.split('.')[:-1]) + '.xls')
    
# visualize the complete metadata collection in the xls format
visualize_as_xls('score_sheet_complete.odml')
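
`visualize_as_xls` (and `generate_overview` in the next scenario) derives the output file name by dropping the extension with `'.'.join(odML_file.split('.')[:-1])`. For the paths used in this tutorial, `os.path.splitext` from the standard library gives the same result and is a common alternative; a quick check with two illustrative helper functions:

```python
import os.path

def strip_extension_split(path):
    # the idiom used in the tutorial code
    return '.'.join(path.split('.')[:-1])

def strip_extension_splitext(path):
    # standard-library alternative
    return os.path.splitext(path)[0]

for path in ['score_sheet_complete.odml', './complete_workflow/score_sheet_complete.odml']:
    assert strip_extension_split(path) == strip_extension_splitext(path)
    print(strip_extension_splitext(path) + '.xls')
```

The two only differ for a file name without any extension: the split idiom then returns an empty string, while `os.path.splitext` returns the name unchanged.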

### Scenario 6: Comparing entries across the odML for data screening and lab book tables

Many odML files contain metadata structures that repeat within the file. Here, all metadata sections for the daily measurements have the same structure. For visualization and for documentation in lab books, an overview across these related structures is useful; it can be generated using the odMLtables compare functionality (`CompareSectionXlsTable`).

In [13]:
import odml

def generate_overview(odML_file, sections='all'):
    """ Compare entries with same structure across an odML file """
    if sections == 'all':
        # compare between all available sections
        sections = [s.name for s in odml.fileio.load(odML_file, show_warnings=False).sections]
    table = odt.compare_section_xls_table.CompareSectionXlsTable()
    table.load_from_file(odML_file)
    # specify all sections to be compared
    table.choose_sections(*sections)
    # save the comparison table to a separate xls file
    table.write2file('.'.join(odML_file.split('.')[:-1]) + '_overview.xls')
    
# compare all properties across the complete metadata collection
generate_overview('score_sheet_complete.odml')

This generates an `xls` [overview table](score_sheet_complete_overview.xls) comparing, for all selected sections, the first value of each property.
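
Conceptually, the overview is a table with one row per property and one column per section, where each cell holds the first value of that property in that section. A minimal stdlib sketch under that assumption (writing csv instead of xls, with a hypothetical `overview_rows` helper) makes the layout explicit; a property holding several values, such as a comment extended by an `append` merge, contributes only its first value:

```python
import csv
import io

def overview_rows(doc, sections=None):
    """Build rows comparing properties across sections (section -> property -> values).

    Each cell holds the first value; a property missing in a section gives an empty cell.
    """
    sections = list(doc) if sections is None else list(sections)
    # keep properties in first-seen order across the chosen sections
    props = []
    for sec in sections:
        for name in doc[sec]:
            if name not in props:
                props.append(name)
    rows = [["Property"] + sections]
    for name in props:
        row = [name]
        for sec in sections:
            values = doc[sec].get(name, [])
            row.append(values[0] if values else "")
        rows.append(row)
    return rows

complete = {
    "2000-01-01": {"Weight": [20.3],
                   "Comment": ["Blood sample taken", "Blood sample shows no abnormalities"]},
    "2000-01-02": {"Weight": [23.5], "Comment": ["small scratch at the right ear"]},
}
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(overview_rows(complete))
print(out.getvalue())
```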

### Scenario 7: Automatized processing of metadata collections

The workflow presented in `scenarios 1 to 6` can, to some extent, be automated using odMLtables. This simplifies the generation of a comprehensive metadata collection for the experimenter and makes the workflow more robust against human error.

Here we start from a collection of daily csv sheets and generate a complete metadata collection as well as overview sheets from these.

In the first step, we generate a number of score sheets containing dummy data to demonstrate the metadata workflow built on top of these files.

In [14]:
# generate a number of score sheets for demonstration of workflow
import os
import numpy.random as random

def generate_dummy_data(folder):
    """ Generate 20 daily score sheets with random values entered"""
    # make sure folder exists
    os.makedirs(folder, exist_ok=True)
        
    # generate score sheets and save them into the folder
    for i in range(20):
        score_sheet = \
"""Date,Measure,Value,Unit,Type
2000-01-{0:02d},Weight,{1:.1f},g,float
,Water Intake,{2:.2f},ml,float
,Breathing Frequency,{3:.1f},bpm,float
,Measured by,John Doe,,string
,Comment,-,,string
""".format(i+1, random.uniform(low=19, high=25), random.uniform(low=5, high=7), random.uniform(low=300, high=400))
        # name files after the measurement day (day1 ... day20), matching the date in the sheet
        with open(os.path.join(folder, 'score_sheet_day{}.csv'.format(i + 1)), 'w+') as f:
            f.write(score_sheet)
        
# generate multiple daily score sheets for demonstration purposes
generate_dummy_data('./complete_workflow')
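
The dummy values above change on every run because the random generator is not seeded. If reproducible demo files are preferred, a seeded generator can be used. A stdlib sketch, assuming `random.Random` as a swapped-in replacement for `numpy.random` and using hypothetical `dummy_score_sheet`/`dummy_score_sheets` helpers:

```python
import random

def dummy_score_sheet(day, rng):
    """Return the csv text of one dummy score sheet dated 2000-01-<day>."""
    return (
        "Date,Measure,Value,Unit,Type\n"
        "2000-01-{0:02d},Weight,{1:.1f},g,float\n"
        ",Water Intake,{2:.2f},ml,float\n"
        ",Breathing Frequency,{3:.1f},bpm,float\n"
        ",Measured by,John Doe,,string\n"
        ",Comment,-,,string\n"
    ).format(day, rng.uniform(19, 25), rng.uniform(5, 7), rng.uniform(300, 400))

def dummy_score_sheets(n_days, seed):
    """Return n_days sheets (days 1..n_days) drawn from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private, seeded generator: same seed -> same values
    return [dummy_score_sheet(day, rng) for day in range(1, n_days + 1)]

sheets = dummy_score_sheets(20, seed=0)
assert sheets == dummy_score_sheets(20, seed=0)  # identical on every call
print(sheets[0])
```

Using a private `random.Random` instance instead of the module-level generator keeps the demo data independent of any other code that draws random numbers in the same session.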

In the second step we define the complete workflow for metadata collection, merge, storage and visualizations in a single function and run this on the dummy data generated before. An example of one of the dummy data sets is available [here](./complete_workflow/score_sheet_day1.csv).

In [15]:
# metadata workflow based on previously generated collection of csv files
import glob
def process_all_metadata(folder):
    """ Find daily score sheets, merge them into complete metadata collection and generate visualizations. """
    # extract all metadata files present in this folder
    source_files = sorted(glob.glob(folder + '/score_sheet_day*.csv'))
    if not source_files:
        return None
    
    # convert first source file to add other files to
    score_sheet_to_odml(source_files[0])
    os.rename(source_files[0][:-4] + '.odml', folder + '/score_sheet_complete.odml')
    
    # convert all other source files
    for source_file in source_files[1:]:
        score_sheet_to_odml(source_file)
        merge_score_sheets(folder + '/score_sheet_complete.odml',
                           source_file[:-4] + '.odml')
        
    # create visualization and comparison tables
    visualize_as_xls(folder + '/score_sheet_complete.odml')
    generate_overview(folder + '/score_sheet_complete.odml')
    
# run complete metadata workflow from score sheet detection to visualization generation
process_all_metadata('./complete_workflow')

# copy style sheet for visualization in browser
import shutil
shutil.copy('odml.xsl', './complete_workflow/odml.xsl')


In addition to the dummy score sheets, this generates in the subfolder `complete_workflow` a [complete metadata collection in a single odML file](./complete_workflow/score_sheet_complete.odml), as well as two `xls` files: one for [visualization](./complete_workflow/score_sheet_complete.xls) of the odML structure and an [overview](./complete_workflow/score_sheet_complete_overview.xls) across all common properties within the complete metadata collection.
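
Note that `sorted(glob.glob(...))` in `process_all_metadata` orders the file names lexicographically, so `score_sheet_day10.csv` is processed before `score_sheet_day2.csv`, which likely also determines the order in which the daily sections are appended to the merged file. If processing in day order matters, a natural sort key can be passed to `sorted`. A minimal sketch, assuming the `score_sheet_day<N>.csv` naming used above (`day_number` is a hypothetical helper):

```python
import re

def day_number(path):
    """Sort key: the integer N in '...score_sheet_day<N>.csv' (assumed naming)."""
    match = re.search(r"score_sheet_day(\d+)\.csv$", path)
    if match is None:
        raise ValueError("unexpected file name: %s" % path)
    return int(match.group(1))

files = ["./complete_workflow/score_sheet_day10.csv",
         "./complete_workflow/score_sheet_day2.csv",
         "./complete_workflow/score_sheet_day1.csv"]
print(sorted(files))                  # lexicographic: day1, day10, day2
print(sorted(files, key=day_number))  # natural: day1, day2, day10
```

In the workflow function this would amount to `sorted(glob.glob(folder + '/score_sheet_day*.csv'), key=day_number)`; raising on unexpected names makes a stray file in the folder fail loudly instead of being merged silently.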