# CellSens VSI Metadata Reading Tutorial

## In this tutorial we go through how to use Python to read metadata from .vsi files produced by CellSens. This is useful because parameters such as the date, image scale, z-stack spacing, and many others are stored automatically in the metadata and can be used for analysis. Some of these values can be read easily with the Bioformats package; others we have to extract manually by navigating the companion ".oex" file that is created with every CellSens VSI.

### Important functions to import

In [1]:
import javabridge # Bridge to Java, the language bioformats is primarily written in
import xml.etree.ElementTree as ET # Used for manually navigating .oex files
import bioformats # Used for automatically reading metadata through bioformats

## First we'll write the functions that manually look through the .oex file for some important metadata not accessible through bioformats. An .oex file is basically an XML file, so these functions use an XML package.

### Function to look in a directory (node) of the XML and find the subdirectory you're querying. Used for the manual metadata extraction

In [2]:
def query_branches(loop,tag,attrib):
    # search through the subdirectories (direct children) of this directory
    for l in loop:
        # return the first subdir whose tag and 'name' attribute both match;
        # .get avoids a KeyError on children that have no 'name' attribute
        if l.tag==tag and l.get('name')==attrib:
            return l
    # sentinel value the caller checks for when nothing matches
    return 'not_found'
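
To make the lookup concrete, here is a minimal sketch of the same idea run against a tiny in-memory XML string. The XML layout is a simplified assumption for illustration only (real .oex files are much larger and may be structured differently), and `find_child` is a hypothetical stand-in for `query_branches` that returns `None` instead of the string `'not_found'`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the contents of an .oex file
SAMPLE = """
<net>
  <node name="Loop">
    <attribute name="cycle time"><value val="3.2"/></attribute>
  </node>
  <node name="Other"/>
</net>
"""

def find_child(parent, tag, name):
    """Return the first direct child with this tag and 'name' attribute, else None."""
    for child in parent:
        if child.tag == tag and child.get('name') == name:
            return child
    return None

net = ET.fromstring(SAMPLE)
loop = find_child(net, 'node', 'Loop')        # present in the sample
missing = find_child(net, 'node', 'Z-Stack')  # absent, so this is None
print(loop.get('name'), missing)
```

Only direct children are inspected, so a nested entry has to be reached one level at a time.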

### Next we will use the query_branches function inside this function, which iterates through the nodes of an XML file to extract the metadata we want

In [3]:
import os
# paths through the xml file to final metadata value
default_locations=[['Loop','cycle time'],['Loop','Z-Stack','step width']]
default_tag=[['node','attribute'],['node','node','attribute']]
def extract_meta_manual(file_path,locations=default_locations,tag=default_tag,metadata=None):
    # create a fresh dictionary per call; a dict() default would be shared between calls
    if metadata is None:
        metadata={}
    # swap only the file extension, leaving any other 'vsi' in the path untouched
    file_path=os.path.splitext(file_path)[0]+'.oex'
    tree=ET.parse(file_path)
    root=tree.getroot()
    # get into net
    for subroot in root:
        if subroot.tag=='net':
            root=subroot
            break
    # iterate through the different values we want to get
    for path,path_tag in zip(locations,tag):
        loop_root=root
        # navigate through the directed paths of the xml file
        for name,node_tag in zip(path,path_tag):
            loop_root=query_branches(loop_root,node_tag,name)
            # stop following this path as soon as one step is missing
            if loop_root=='not_found':
                break
        # at the end of the path, store the metadata value under the final name
        if loop_root!='not_found':
            for l in loop_root:
                metadata[path[-1]]=l.get('val')
    return metadata
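
The pairing of `default_locations` with `default_tag` can be hard to picture, so here is a minimal, self-contained sketch of the path walk on a synthetic XML string. The nesting shown is an assumption chosen to match the two default paths, not a dump of a real .oex file, and `walk` is a hypothetical helper written only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout matching the two default paths used in this tutorial
SAMPLE_OEX = """
<root>
  <net>
    <node name="Loop">
      <attribute name="cycle time"><value val="3.2"/></attribute>
      <node name="Z-Stack">
        <attribute name="step width"><value val="3.95"/></attribute>
      </node>
    </node>
  </net>
</root>
"""

def walk(start, names, tags):
    """Follow (tag, name) steps from start; return the last child's 'val' or None."""
    current = start
    for name, tag in zip(names, tags):
        current = next(
            (c for c in current if c.tag == tag and c.get('name') == name), None
        )
        if current is None:
            return None  # one step of the path is missing
    value = None
    for child in current:
        value = child.get('val')
    return value

net = ET.fromstring(SAMPLE_OEX).find('net')
locations = [['Loop', 'cycle time'], ['Loop', 'Z-Stack', 'step width']]
tags = [['node', 'attribute'], ['node', 'node', 'attribute']]
found = {names[-1]: walk(net, names, t) for names, t in zip(locations, tags)}
print(found)
```

Each entry in `locations` is a list of `name` attributes to follow, and the entry at the same index in `tags` gives the element tag expected at each step. The values come back as strings, exactly as they are stored in the XML.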

### Now we write a function that uses bioformats to get the other metadata we want

In [4]:
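# ---- Function that gets the attainable information using bioformats
def extract_meta_bioformats(filepath, metadata=None):
    # create a fresh dictionary per call; a dict() default would be shared between calls
    if metadata is None:
        metadata = {}
    # read the OME-XML metadata string and wrap it for attribute-style access
    omexmlstr = bioformats.get_omexml_metadata(filepath)
    o = bioformats.OMEXML(omexmlstr)
    x = o.image().Pixels
    metadata['size_Z'] = x.SizeZ  # number of z-planes
    metadata['size_T'] = x.SizeT  # number of time points
    metadata['scale'] = x.PhysicalSizeX  # physical pixel size along x
    return metadata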
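
For readers curious where `SizeZ`, `SizeT`, and `PhysicalSizeX` live, the sketch below reads the same three attributes from a small hand-written OME-XML string using only the standard library. The sample document and its values are made up for illustration, and the schema namespace shown is an assumption: the string returned by bioformats for a real file may declare a different schema version and will contain far more detail.

```python
import xml.etree.ElementTree as ET

# Hand-written OME-XML fragment; values are illustrative only
SAMPLE_OME = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0">
    <Pixels ID="Pixels:0" DimensionOrder="XYCZT" Type="uint16"
            SizeX="512" SizeY="512" SizeC="1" SizeZ="7" SizeT="220"
            PhysicalSizeX="0.1625" PhysicalSizeY="0.1625"/>
  </Image>
</OME>"""

# Assumed namespace; adjust it to whatever the real metadata string declares
NS = {'ome': 'http://www.openmicroscopy.org/Schemas/OME/2016-06'}

pixels = ET.fromstring(SAMPLE_OME).find('ome:Image/ome:Pixels', NS)
summary = {
    'size_Z': int(pixels.get('SizeZ')),
    'size_T': int(pixels.get('SizeT')),
    'scale': float(pixels.get('PhysicalSizeX')),
}
print(summary)
```

The bioformats wrapper used in the tutorial does this parsing for you; the sketch is only meant to show what is being read.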

### A few auxiliary functions used to make string and filename handling easier...

In [5]:
def split(word):
    # break a string into a list of its individual characters
    return list(word)

In [6]:
def increment_numbers(string,incrementor):
    split_string=split(string)
    # positions of every digit character in the string
    location=[i for i in range(len(split_string)) if split_string[i].isdigit()]
    # only the last digit is changed; note that incrementor is subtracted from it
    loc=location[-1]
    update=int(split_string[loc])-incrementor
    split_string[loc]=str(update)
    new_string = "".join(split_string)
    return new_string
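
Because `increment_numbers` edits a single digit character, it can misbehave when the change crosses a tens boundary (for example going from `10` to `09`). The following is a hedged sketch of an alternative, not the tutorial's method: `shift_last_number` is a hypothetical helper that shifts the whole last run of digits and keeps its zero padding. Note that it adds `delta`, so a negative value steps backwards.

```python
import re

def shift_last_number(name, delta):
    """Add delta to the last run of digits in name, preserving zero padding."""
    matches = list(re.finditer(r'\d+', name))
    if not matches:
        return name  # nothing to change
    last = matches[-1]
    digits = last.group()
    new_value = int(digits) + delta
    if new_value < 0:
        raise ValueError('shifted number would be negative')
    new_digits = str(new_value).zfill(len(digits))
    return name[:last.start()] + new_digits + name[last.end():]

print(shift_last_number('Assay 1_03.vsi', -1))
print(shift_last_number('Assay 1_10.vsi', -1))
```

The first call gives `Assay 1_02.vsi` and the second gives `Assay 1_09.vsi`.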

### Now we tie all of these together in the function "extract_metadata". This function takes the file path of a .vsi file as input and returns a dictionary of metadata

In [7]:
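# ---- Main function that extracts the relevant metadata from a vsi file
def extract_metadata(filepath,cycle_vm=True,increment_num=None):
    # the Java VM must be running before bioformats is called
    if cycle_vm:
        javabridge.start_vm(class_path=bioformats.JARS)
    try:
        biof=extract_meta_bioformats(filepath)
        # optionally adjust the number in the file name before looking for the .oex file
        if increment_num is not None:
            filepath=increment_numbers(filepath,increment_num)
        metadata=extract_meta_manual(filepath,metadata=biof)
    finally:
        # shut the VM down even if extraction fails; javabridge generally cannot
        # restart a killed VM in the same process, so pass cycle_vm=False and
        # manage the VM yourself when reading several files
        if cycle_vm:
            javabridge.kill_vm()
    return metadata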

### Example of the code in action

In [8]:
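import os
# build the path to a .vsi file in the current working directory
path=os.path.join(os.getcwd(),'Assay 1_03.vsi')
# use forward slashes on Windows as well
path=path.replace('\\','/')
meta=extract_metadata(path)
meta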

{'size_Z': 7,
 'size_T': 220,
 'scale': 0.16250000000000003,
 'cycle time': '3.2074099999999998722',
 'step width': '3.9500000000000001776'}
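
Note that the two values read from the .oex file come back as strings, while the bioformats values are numbers. A minimal sketch of converting them and deriving two simple quantities is shown below. The interpretation is an assumption: `cycle time` is treated as the interval between consecutive time points and `step width` as the spacing between consecutive z-planes, and no units are assumed.

```python
# Values copied from the example output above; the last two are stored as strings
meta = {'size_Z': 7,
        'size_T': 220,
        'scale': 0.16250000000000003,
        'cycle time': '3.2074099999999998722',
        'step width': '3.9500000000000001776'}

# Convert the manually extracted strings to floats before doing arithmetic
cycle_time = float(meta['cycle time'])
step_width = float(meta['step width'])

# Assumed interpretation: distance between the first and last z-plane
stack_depth = (meta['size_Z'] - 1) * step_width
# Assumed interpretation: elapsed time between the first and last time point
duration = (meta['size_T'] - 1) * cycle_time

print(stack_depth, duration)
```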