<a id="top"></a>
# Fix python parser
Development notes for refactoring the code in MaymoicsVC/src/python/ <br>
where it concerns writing the *WorkFlow.FilledIn.json* <br>
from the *WorkFlow.Template.json* file as output by Cromwell Java Womtool.
<br>[MayomicsVC github master](https://github.com/ncsa/MayomicsVC) <br>
****
## This is the *edit-view-test-document* notebook for two modules:
### womtool_template_fill_in.py
[called module: womtool_template_fill_in](#womtool_template_fill_in_module) <br>

    * parse_args                  parses command line input into python
    * args_dict_to_filledin_json  takes the dictionary of command line inputs CALLS:
        * get_json_file_dict         open the womtool-generated template file as a python dict
        * assemble_config_dict       combines the list of config files into a python dict
        * configure_json_dict        fills the template dict with the config files dict
        * write_filled_in_json_dict  write the dictionary to json file format
    * Note that this modular approach facilitates test automation
        * Import to test module and call with python types
        * multiple filled-in json files may be generated for different test types
### config_parser_II.py
[(python main) module config_parser_II](#config_parser_module) <br>
#### same signature replacement for config_parser.py
    * calls womtool_template_fill_in.py functions and returns a code to the os
    * python main - command-line interface
****
##### Broad Institute Links:
[WOMTOOL docs](https://cromwell.readthedocs.io/en/stable/WOMtool/) <br>
[Cromwell docs](https://cromwell.readthedocs.io/en/stable/) <br>
[wdl specification](https://software.broadinstitute.org/wdl/documentation/spec) <br>
[BI WDL](https://github.com/broadinstitute/wdl) <br>
[wdl/parsers/python](https://github.com/broadinstitute/wdl/tree/master/parsers/python) <br>
[Widdler python exec & monitor cromwell](https://github.com/broadinstitute/widdler) <br>

<a id="page_links"></a> <br>
****
### This Notebook Page-Links
****
#### data generation and code development cells
[test data generation cells](#generate_test_data) <br>
****
[called module: womtool_template_fill_in](#womtool_template_fill_in_module) <br>
[(python main) module config_parser_II](#config_parser_module) <br>
****
[(python main) module import and run](#module_import_run) <br>
****
#### individual function operation tests
[read json file](#read_json_file) <br>
[read config file](#read_config_file) <br>
[put config in json](#put_config_in_json) <br>
[write filled in json](#write_filled_in_json) <br>
****
#### module documentation output
[documentation - help(module) output](#view_documentation) <br>

<a id="generate_test_data"></a>
****
### Test Data Generation Cells
    * json file has one variable not found in config
    * config file has variable not found in json
[Top](#top) <br>
[Page Links](#page_links) <br>

In [1]:
%%writefile jjalltheway.json
{
    "wf0.task0.var0": "Boolean",
    "wf0.task0.var1": "Int",
    "wf0.task0.var2": "String",
    "wf0.task0.var3": "File",
    "wf0.task0.var4": "Array[File]",
    "wf0.task0.var5": "Array[Array[File]]",
    "wf0.var6": "Array[Array[Array[File]]]",
    "wf0.var7": "Array[Array[Array[File]]]",
    "number10": "String"
}

Overwriting jjalltheway.json


In [2]:
%%writefile conf.txt
var01="true"
var1="true"
var2=""
var3="file1A"
var4=["file2A","file1B"]
var5=[["file3A","file2B"],["file1C","file1D"]]
var6=[[["fileA","fileB"],["fileC","fileD"]],[["fileE","fileF"],["fileG","fileH"]]]
number9="quacks like a duck"
number10=""'who would do this'""


Overwriting conf.txt


In [3]:
%%writefile conf_II.txt
var0="true"
var9="file1A"
var10=["file2A","file1B"]
var11=[["file3A","file2B"],["file1C","file1D"]]
var7=[[["fileA","fileB"],["fileC","fileD"]],[["fileE","fileF"],["fileG","fileH"]]]
number98="quacks like a duck"
number100=""'who would do this'""

Overwriting conf_II.txt


<a id="womtool_template_fill_in_module"></a> <br>
## Module   *womtool_template_fill_in.py* 
```python
# get the template.json dictionary
json_template_dict = get_json_file_dict(args_dict['jsonTemplate'])

# assemble config files list into a single config dictionary
config_dict = assemble_config_dict(config_files_list=args_dict["i"])

# get the Filled In dict
filled_in_dict, json_missing_dict, config_used_dict = configure_json_dict(json_template_dict, config_dict)

# write the output file
rc = write_filled_in_json_dict(json_dict, template_dict, full_filename=args_dict['o'])
```

#### Code Development Cell
[Top](#top) <br>
[Page Links](#page_links) <br>

In [4]:
%%writefile womtool_template_fill_in.py
""" 
NCSA Industry Genomics Group
lanier4@illinois.edu

Starting with a WOMTOOL created workflow.template.json file 
and a list of properly formatted configuration files,
assembled and write a workflow.FilledIn.json file.

The previous version of the python command-line interface is unchanged.
"""
import os
import argparse
import sys
import json
from collections import OrderedDict, defaultdict, Counter


def read_json_raw(json_full_filename):
    """ Usage: list_of_lines = read_json_raw(json_full_filename) 
    
    Args:
        json_full_filename: full path name to the json file
        
    Returns:
        list_of_lines:      list of strings terminated by '\n' newline character
    """
    lines = []
    try:
        with open(json_full_filename, 'r') as fh:
            lines = fh.readlines()
    except:
        print('%s\nFailed to open and read with python std library'%(json_full_filename))
        pass
    
    return lines


def get_json_file_dict(json_full_filename):
    """ Usage: json_dict = get_json_file_dict(data_fullfilename)
    
    Args:
        data_fullfilename: json or yaml format full path filename
                                quoted strings or not (if consistant) 
                                -but no lines start with tab characters
    Returns:
        json_dict:               python dictionary of name - value parameters.
    """
    lines = read_json_raw(json_full_filename)
    S = ''
    for line in lines:
        S += line.strip()
    
    return json.loads(S)


def get_config_file_dict(configfile_fullpath):
    """ Usage:   config_file_dict = get_config_file_dict(configfile_fullpath)
    Ignore comments, only include lines with "=" sign, empty strings as empty strings.
    
    Args:
        configfile_fullpath:    full path to formatted plain text file 
                                
    Returns:
        config_file_dict:       python Ordered dictionary of key-value pairs 
                                (suitable for json file insertion)
    """
    pairs_list = []
    
    with open(configfile_fullpath, 'r') as fh:
        lines = fh.readlines()
        
    for line in lines:
        l = line.strip().split("=")
        if len(l) > 0 and len(l[0]) > 0:
            lefty = l[0].strip()
            if len(lefty) == 0:
                continue

            if not lefty[0] == "#" and len(l) > 1 and len(l[1]) > 0:
                righty = l[1].strip()
                if len(righty) == 0:
                    righty = ' '
                        
                if len(lefty) > 0:
                    pairs_list.append((lefty, righty))

    if len(pairs_list) > 0:
        config_file_dict = OrderedDict(pairs_list)
    else:
        config_file_dict = {}

    return config_file_dict


def assemble_config_dict(config_files_list):
    """ Usage: config_dict = assemble_config_dict(config_files_list)
    Prints Warning: BUT may retrun multiple duplicate keys with different values.
    
    Args:
        config_files_list:      python list of configurateion.txt files
        
    Returns:
        config_dict:            python ordered dictionary of all config files
    """
    config_dict_list = []
    for file_name in config_files_list:
        if os.path.isfile(file_name):
            conf_dict_next = get_config_file_dict(file_name)
            for k, v in conf_dict_next.items():
                config_dict_list.append((k, v))
        else:
            print('\n\t', file_name, '\n\tNot found - Ignoring this file\n')

    total_number_of_tuples = len(config_dict_list)
    if total_number_of_tuples > 0:
        config_dict = OrderedDict(config_dict_list)
        number_of_duplicates = total_number_of_tuples - len(config_dict)
        if number_of_duplicates > 0:
            print('\nDuplicates Found: %i\n'%(number_of_duplicates))
    else:
        config_dict = {}
    
    return config_dict


def get_json_keys_config_dict(json_dict):
    """ Usage:    keys_dict = get_json_keys_config_dict(json_dict)
    Config file dictionary for x.template.json keys
    
    Args:
        json_dict:      json template file as python dict
        
    Returns:
        keys_dict:      key is righmost member of json key
                        value is list of json keys in the input dict
    """
    keys_dict = defaultdict(list)
    for k, v in json_dict.items():
        k_list = k.split('.')
        keys_dict[k_list[-1]].append(k)
    
    return keys_dict


def configure_json_dict(json_dict, config_dict):
    """ Merge config-files dictionary into the json template dictionary. Also return reference dicts.
    Usage: 
    configured_dict, json_missing_dict, config_used_dict = configure_json_dict(json_dict, config_dict)
    
    Args:
        json_dict:          python dict from json template file
        config_dict:        python dict from config.txt file intended to fill in the json template
        
    Returns:
        configured_dict:    The FilledIn dictionary suitable for writing.
        json_missing_dict:  The missing json templte key-value pairs dictionary to inform user.
        config_used_dict:   The actual config lines used - for future integration testing usage.
    """
    configured_dict = defaultdict()
    json_missing_dict = defaultdict()
    config_used_dict = defaultdict()
    json_keys_counter = Counter(json_dict.keys())
    
    keys_d = get_json_keys_config_dict(json_dict)
    
    #                              put the config dict values in the json file full-key
    for k, v in config_dict.items():
        if k in keys_d:
            config_used_dict[k] = v

            for var_name in keys_d[k]:
                if len(v) > 2 and v[0:2] == '""':
                    v_fixed = '"' + '\\' + '"'
                    v_fixed += v[2:-2]
                    v_fixed += '"' + '\\'  + '"'
                    configured_dict[var_name] = v_fixed
                else:
                    configured_dict[var_name] = v.strip()
                    
                json_keys_counter[var_name] += 1
                
    for k, v in json_keys_counter.items():
        if v < 2:
            json_missing_dict[k] = json_dict[k]
                
    return configured_dict, json_missing_dict, config_used_dict


def write_filled_in_json_dict(json_dict, template_dict, full_filename):
    """ Usage:
    outfile = write_filled_in_json_dict(json_dict, json_template_dict, filename_prefix, output_dir)
    
    Args:
        json_dict:              template.json as dict and filled in with config.txt
        template_dict:          WOMTOOL template.json as python dict.
        filename_prefix:        (='test') string to prepend to .FilledIn.json
        output_dir:             (=None) if DNE current directory is used
        
    Returns:
        rc:                     return code is zero if no error detected with write
    """
    #                                   assemble filled-in json file as string
    out_string = '{\n'
    for json_key, json_value in json_dict.items():
        out_string += '    "' + json_key + '":'
        #                               magic string value first
        if json_value[0:4] == '"\\"\'':
            out_string += ' \"\\\"' + json_value[3:] + '\"\n'
        else:
            out_string += ' ' + json_value + ',\n'
            
    out_string = out_string[:-2] + '\n}\n'
    
    #                                   open file handle an write the string
    rc = -1
    try:
        with open(full_filename, 'w') as fh:
            fh.writelines(out_string)
            
        if os.path.isfile(full_filename):
            rc = 0
    except:
        pass
        
    return rc


def args_dict_to_filledin_json(args_dict, output_dir=None):
    """ Usage: return_code = args_dict_to_filledin_json(args_dict, output_dir) 
    Assemble all the above fucntions in the context of the input command line args.
    
    Args:
        args_dict:          Command line arguments converted to a python dict
        output_dir:         Where to write the output
        
    Returns:
        rc:                 return code = 0 if write operation succeeded, else rc = -1
    """
    if output_dir is None or os.path.isdir(output_dir) == False:
        output_dir = os.getcwd()
        
    # get the template.json dictionary
    json_template_dict = get_json_file_dict(args_dict['jsonTemplate'])
    
    # assemble config files list into a single config dictionary
    config_dict = assemble_config_dict(config_files_list=args_dict["i"])
    
    # get the Filled In dict
    filled_in_dict, json_missing_dict, config_used_dict = configure_json_dict(json_template_dict, config_dict)
    
    # write the output file
    rc = write_filled_in_json_dict(json_dict, template_dict, full_filename=args_dict['o'])
    
    return rc


def parse_args(args):
    """ This function (parse_args) is copy-adapted from existing repo to preserve input signature 
    By default, argparse treats all arguments that begin with '-' or '--' as optional in the help menu 
    (preferring to have required arguments be positional).

    To get around this, we must define a required group to contain the required arguments
    This will cause the help menu to be displayed correctly
    """
    parser = argparse.ArgumentParser()

    required_group = parser.add_argument_group('required arguments')
    
    required_group.add_argument("-i", action='append', required=True, metavar='',
                                help="The input configuration files (Multiple entries of this flag are allowed)")
    
    required_group.add_argument("--jsonTemplate", required=True, metavar='',
                                help='The json template file that is filled in with data from the input files')
    
    required_group.add_argument("-o", required=True, metavar='',
                                help='The location for the output file')
    
    # Truly optional argument
    parser.add_argument('--jobID', type=str, metavar='', help='The job ID', default=None, required=False)
    
    # Debug mode is on when the flag is present and is false by default
    parser.add_argument("-d", action="store_true", help="Turns on debug mode", default=False, required=False)
    
    return parser.parse_args(args)

Overwriting womtool_template_fill_in.py


<a id="config_parser_module"></a> <br>
## config_parser_II.py - main function
#### Code Development Cell 
[Top](#top) <br>
[Page Links](#page_links) <br>

In [5]:
%%writefile config_parser_II.py
"""
NCSA Industry Genomics Group
lanier4@illinois.edu

main function for receiving command-line args & calling  womtool_template_fill_in.py

requred args: '-i', '-o', '--jsonTemplate'

return code is either 0 (success) or -1 (fail)
"""
import sys
import json

path_to_womtool_template_fill_in_module = '.'
sys.path.insert(1, path_to_womtool_template_fill_in_module)
from womtool_template_fill_in import parse_args, args_dict_to_filledin_json

def main(args):
    """ Usage:
    python3 config_parser_II.py --jsonTemplate a.template.json -o a.FilledIn.json -i c1.txt -i c2.txt
    
    Args (command line args):
        --jsonTemplate: womtool generated json template file
        -o              output file name (usually like - Workflowname.FilledIn.json)
        -i              one or more config_whatever.txt files each preceeded by "-i"
        
    Returns:
        rc:             0=success, -1=fail
    """
    parsed_args = parse_args(args)
    args_dict = json.dumps(vars(parsed_args), indent=4)
    args_dict = json.loads(args_dict)
    
    rc = args_dict_to_filledin_json(args_dict)
    
    return rc
        
if __name__ == '__main__':
    rc = main(sys.argv[1:])

Overwriting config_parser_II.py


<a id="module_import_run"></a> <br>
## Python Main - Module Import and Run Cell
#### modules may be placed anywhere that your kernel has permission on the system
[Top](#top) <br>
[Page Links](#page_links) <br>

In [6]:
#                              Import womtool_template_fill_in.py 
import os
import sys

path_to_development_code_modules = '.'
sys.path.insert(1, path_to_development_code_modules)
from womtool_template_fill_in import *

#                              Construct the call String
ConfigsBeingUsed = ' -i conf.txt -i conf_II.txt'
S = 'python3 config_parser_II.py'
S = S + ConfigsBeingUsed
S = S + ' --jsonTemplate jjalltheway.json -o workflow.FilledIn.json'

#                              Call Main
print('Return code =', os.system(S))

Return code = 256


<a id="read_json_file"></a> <br>
### Python function that reads WOMTOOL generated json template.
[json module in python3 standard library](https://docs.python.org/3/library/json.html) <br>
```python
json_dict = get_json_file_dict(data_fullfilename)
```
[Top](#top) <br>
[Page Links](#page_links) <br>

In [7]:
#                                 demonstrate get_run_file_dict on json with quotes:
TestTask_dir = os.getcwd()
TestTask_jason_file = 'jjalltheway.json'
json_fullfilename = os.path.join(TestTask_dir, TestTask_jason_file)

if os.path.isfile(json_fullfilename):
    json_template_dict = get_json_file_dict(json_fullfilename)
    print('{0} variables found\n'.format(len(json_template_dict)))
    if len(json_template_dict) > 0:
        for k, v in json_template_dict.items():
            print('%30s: %s'%(k,v))
else:
    print(json_fullfilename, '\nNot Found')


9 variables found

                wf0.task0.var0: Boolean
                wf0.task0.var1: Int
                wf0.task0.var2: String
                wf0.task0.var3: File
                wf0.task0.var4: Array[File]
                wf0.task0.var5: Array[Array[File]]
                      wf0.var6: Array[Array[Array[File]]]
                      wf0.var7: Array[Array[Array[File]]]
                      number10: String


<a id="read_config_file"></a> <br>
## python function that reads (special.txt, yaml, json) config files.
```python
config_file_dict = get_config_file_dict(configfile_fullpath)
```
[Top](#top) <br>
[Page Links](#page_links) <br>

In [8]:
# demonstrate get_run_file_dict on json with quotes:
TestTask_dir = os.getcwd()
TestTask_jason_file = 'conf.txt'
json_fullfilename = os.path.join(TestTask_dir, TestTask_jason_file)

if os.path.isfile(json_fullfilename):
    CONFIG_txt_dict = get_config_file_dict(json_fullfilename)
    print('{0} variables found\n'.format(len(CONFIG_txt_dict)))
    if len(CONFIG_txt_dict) > 0:
        for k, v in CONFIG_txt_dict.items():
            print('%30s: %s'%(k,v))
else:
    print(json_fullfilename, '\nNot Found')


9 variables found

                         var01: "true"
                          var1: "true"
                          var2: ""
                          var3: "file1A"
                          var4: ["file2A","file1B"]
                          var5: [["file3A","file2B"],["file1C","file1D"]]
                          var6: [[["fileA","fileB"],["fileC","fileD"]],[["fileE","fileF"],["fileG","fileH"]]]
                       number9: "quacks like a duck"
                      number10: ""'who would do this'""


<a id="put_config_in_json"></a> <br>
## python function puts the config dict o dicts into the variables dict.
```python
configured_dict, data_not_dict = configure_json_dict(json_dict, config_dict)
```
[Top](#top) <br>
[Page Links](#page_links) <br>

In [9]:
configured_dict, json_missing_dict, config_used_dict = configure_json_dict(json_dict=json_template_dict,
                                                                           config_dict=CONFIG_txt_dict)

print('\nConfigured dictionary\n')
for k, v in configured_dict.items():
    print('%30s: %s'%(k,v))
    
print('\n\nMissing data\n')
for k, v in json_missing_dict.items():
    print('%30s: %s'%(k,v))
    
print('\n\nInserted data\n')
for k, v in config_used_dict.items():
    print('%30s: %s'%(k,v))


Configured dictionary

                wf0.task0.var1: "true"
                wf0.task0.var2: ""
                wf0.task0.var3: "file1A"
                wf0.task0.var4: ["file2A","file1B"]
                wf0.task0.var5: [["file3A","file2B"],["file1C","file1D"]]
                      wf0.var6: [[["fileA","fileB"],["fileC","fileD"]],[["fileE","fileF"],["fileG","fileH"]]]
                      number10: "\"'who would do this'"\"


Missing data

                wf0.task0.var0: Boolean
                      wf0.var7: Array[Array[Array[File]]]


Inserted data

                          var1: "true"
                          var2: ""
                          var3: "file1A"
                          var4: ["file2A","file1B"]
                          var5: [["file3A","file2B"],["file1C","file1D"]]
                          var6: [[["fileA","fileB"],["fileC","fileD"]],[["fileE","fileF"],["fileG","fileH"]]]
                      number10: ""'who would do this'""


<a id="write_filled_in_json"></a> <br>
## python function of outputs above -- checks and writes the workflow.FilledIn.json file

```python
full_filename = write_filled_in_json_dict(json_dict, 
                                          json_template_dict, 
                                          filename_prefix,
                                          output_dir) 
```
[Top](#top) <br>
[Page Links](#page_links) <br>

In [10]:
full_filename = 'test.FilledIn.json'
rc = write_filled_in_json_dict(configured_dict, json_template_dict, full_filename)

print('Reading: ',full_filename)

with open(full_filename, 'r') as fh:
    lines = fh.readlines()
    
for line in lines:
    if '{' in line or '}' in line:
        print(line.strip())
    else:
        print('    ',line.strip())

Reading:  test.FilledIn.json
{
     "wf0.task0.var1": "true",
     "wf0.task0.var2": "",
     "wf0.task0.var3": "file1A",
     "wf0.task0.var4": ["file2A","file1B"],
     "wf0.task0.var5": [["file3A","file2B"],["file1C","file1D"]],
     "wf0.var6": [[["fileA","fileB"],["fileC","fileD"]],[["fileE","fileF"],["fileG","fileH"]]],
     "number10": "\"'who would do this'"\"
}


<a id="view_documentation"></a> <br>
## view the help output for the module
[Top](#top) <br>
[Page Links](#page_links) <br>

In [11]:
path_to_module = '.'
import sys
sys.path.insert(1, path_to_module)

import json_template_configuration

help(json_template_configuration)

Help on module json_template_configuration:

NAME
    json_template_configuration

DESCRIPTION
    Starting with a WOMTOOL created workflow.template.json file 
    a list of properly formatted configureation files
    is assembled and written to a workflow.FilledIn.json file 
    
    The previous repository version of the python command-line interface is preserved

FUNCTIONS
    args_dict_to_filledin_json(args_dict, output_dir=None)
        Usage: return_code = args_dict_to_filledin_json(args_dict, output_dir) 
        Wrapper function to assemble all the above fucntions in the context of the input command line args.
        
        Args:
            args_dict:          Command line arguments converted to a python dict
            output_dir:         Where to write the output
            
        Returns:
            rc:                 return code = 0 if write operation succeeded, else rc = -1
    
    assemble_config_dict(config_files_list)
        Usage: config_dict = assemble_con

In [1]:
from womtool_template_fill_in import configure_json_dict
help(configure_json_dict)

Help on function configure_json_dict in module womtool_template_fill_in:

configure_json_dict(json_dict, config_dict)
    Merge config-files dictionary into the json template dictionary. Also return reference dicts.
    Usage: 
    configured_dict, json_missing_dict, config_used_dict = configure_json_dict(json_dict, config_dict)
    
    Args:
        json_dict:          python dict from json template file
        config_dict:        python dict from config.txt file intended to fill in the json template
        
    Returns:
        configured_dict:    The FilledIn dictionary suitable for writing.
        json_missing_dict:  The missing json templte key-value pairs dictionary to inform user.
        config_used_dict:   The actual config lines used - for future integration testing usage.

