# Table of Contents
* [1. Burrito use example](#1.-Burrito-use-example)
	* [1.1 example of the same filling](#1.1-example-of-the-same-filling)
* [2. dumpling use example](#2.-dumpling-use-example)
	* [2.1 example of filling](#2.1-example-of-filling)
	* [2.2 This is how the Dumpling object looks like](#2.2-This-is-how-the-Dumpling-object-looks-like)
	* [2.3 You can augment the fillings with more parameters that are not previously wrapped](#2.3-You-can-augment-the-fillings-with-more-parameters-that-are-not-previously-wrapped)
	* [2.4 You only need two types of command line](#2.4-You-only-need-two-types-of-command-line)
	* [2.5 the parameters are ordered](#2.5-the-parameters-are-ordered)


**To run this notebook, you need to install Infernal:** http://infernal.janelia.org/

In [1]:
from dumpling import check_choice, Dumpling, OptionParam, ArgmntParam, Parameters

# 1. Burrito use example

## 1.1 example of the same filling

This is the typical usage of Burrito to make a filling. Developers provide util functions (`scan_file`, `scan_seq` here) for end users to call.

In [2]:
with open('cmscan2.py', 'w') as o:
    o.write('''
from burrito.parameters import FlagParameter, ValuedParameter
from burrito.util import CommandLineApplication, ResultPath
from tempfile import NamedTemporaryFile
from skbio import write

class CMScan(CommandLineApplication):
    """cmscan application controller.

    INFERNAL 1.1.1 (July 2014)"""
    _command = "cmscan"
    _suppress_stderr = False

    _parameters = { 
        # save parseable table of hits to file
        '--tblout': ValuedParameter('--', Name='tblout', Delimiter=' ', IsPath=True),
        # number of parallel CPU workers to use for multithreads
        '--cpu': ValuedParameter('--', Name='cpu', Delimiter=' ')}
    _synonyms = {'cpu': '--cpu'}
    def _accept_exit_status(self, exit_status):
        return exit_status == 0

    def _get_result_paths(self,data):
        result = {}
        for i in ['--tblout']:
            o = self.Parameters[i]
            if o.isOn():
                out_fp = self._absolute(o.Value)
                result[i] = ResultPath(Path=out_fp, IsWritten=True)
        return result
        
def scan_file(query, db, cpu=1, params=None):
    if params is None:
        params = {}
    params['--cpu'] = cpu
    app = CMScan(InputHandler='_input_as_paths', params=params)
    return app([db, query])

def scan_seq(seq, db, cpu=1, params=None):
    if params is None:
        params = {}
    params['--cpu'] = cpu
    app = CMScan(InputHandler='_input_as_paths', params=params)
    with NamedTemporaryFile(mode='w+') as i:
        write(seq, into=i.name, format='fasta')
        return app([db, i.name])
''')

In [3]:
from cmscan2 import scan_seq
from skbio import read, DNA

seq = DNA('ATGCATGC', metadata={'id': 'seq1'})
# tblout has to be passed in a dict; it can't be passed as a keyword argument without changing the API of scan_seq
res = scan_seq(seq, db='Rfam.cm', params={'--tblout': '/tmp/a.tblout'})

In [4]:
print(res['StdOut'].read())

# cmscan :: search sequence(s) against a CM database
# INFERNAL 1.1.1 (July 2014)
# Copyright (C) 2014 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:                   /var/folders/4f/5zc7djg954dc71r037j88jwh0000gn/T/tmpxge1ovj_
# target CM database:                    Rfam.cm
# tabular output of hits:                /tmp/a.tblout
# number of worker threads:              1 [--cpu]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       seq1  [L=8]
Hit scores:
 rank     E-value  score  bias  modelname  start    end   mdl trunc   gc  description
 ----   --------- ------ -----  --------- ------ ------   --- ----- ----  -----------

   [No hits detected that satisfy reporting thresholds]


Hit alignments:

   [No hits detected that satisfy reporting thresholds]


Internal CM pipeline statistics summary:
-------

# 2. dumpling use example

## 2.1 example of filling

Below is an example of using dumpling to wrap the Infernal `cmscan` command. The parameter definitions are cleaner and much more readable:

In [5]:
# this is a filling module: cmscan.py
with open('cmscan.py', 'w') as o:
    o.write('''
from tempfile import NamedTemporaryFile
from skbio import write
from dumpling import check_choice, Dumpling, OptionParam, ArgmntParam, Parameters


_params = [
        OptionParam('--tblout', name='out', help='save parseable table of hits to file'),
        OptionParam('--cpu', value=1, help='number of parallel CPU workers to use for multithreads'),
        ArgmntParam(name='db', help='HMM/CM database file'),
        ArgmntParam(name='query', help='input sequence to scan')]

def scan_file(query, db, cpu=1, **kwargs):
    cmscan = Dumpling('cmscan', params=Parameters(*_params))
    cmscan.update(query=query, db=db, cpu=cpu, **kwargs)
    return cmscan()

def scan_seq(seq, db, cpu=1, **kwargs):
    cmscan = Dumpling('cmscan', params=Parameters(*_params))
    with NamedTemporaryFile(mode='w+') as i:
        write(seq, into=i.name, format='fasta')
        cmscan.update(query=i.name, db=db, cpu=cpu, **kwargs)
        return cmscan()
''')

In [6]:
# this is the real code using the filling
from cmscan import scan_seq
from skbio import read, DNA

seq = DNA('ATGCATGC', metadata={'id': 'seq1'})
# the `out` keyword is passed to '--tblout' because that parameter's name is set to 'out'.
res = scan_seq(seq, db='Rfam.cm', out='/tmp/a.tblout')

In [7]:
res.returncode

0

In [8]:
print(res.stderr)




In [9]:
print(res.stdout)

# cmscan :: search sequence(s) against a CM database
# INFERNAL 1.1.1 (July 2014)
# Copyright (C) 2014 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file:                   /var/folders/4f/5zc7djg954dc71r037j88jwh0000gn/T/tmptn3a92sz
# target CM database:                    Rfam.cm
# tabular output of hits:                /tmp/a.tblout
# number of worker threads:              1 [--cpu]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       seq1  [L=8]
Hit scores:
 rank     E-value  score  bias  modelname  start    end   mdl trunc   gc  description
 ----   --------- ------ -----  --------- ------ ------   --- ----- ----  -----------

   [No hits detected that satisfy reporting thresholds]


Hit alignments:

   [No hits detected that satisfy reporting thresholds]


Internal CM pipeline statistics summary:
-------

## 2.2 This is how the Dumpling object looks like

In [10]:
from cmscan import _params
cmscan = Dumpling('cmscan', params=Parameters(*_params))

In [11]:
cmscan

Dumpling
--------
CMD: cmscan
CMD version: ''
CMD URL: ''
CMD Parameter:
OptionParam(flag='--tblout', alter=None, name='out', value=None, action=<lambda>, help='save parseable table of hits to file', delimiter=' ')
OptionParam(flag='--cpu', alter=None, name='cpu', value=1, action=<lambda>, help='number of parallel CPU workers to use for multithreads', delimiter=' ')
ArgmntParam(name='db', value=None, action=<lambda>, help='HMM/CM database file')
ArgmntParam(name='query', value=None, action=<lambda>, help='input sequence to scan')

In [12]:
# update with keyword argument or dict:

cmscan.update(cpu=2, out='/tmp/foo.txt', db='/Users/zech/Dropbox/git/micronota/micronota/bfillings/tests/data/infernal/test.cm', query='/Users/zech/Dropbox/git/micronota/micronota/bfillings/tests/data/infernal/NC_018498.fna')

cmscan.update(**{'cpu':3})

This is the command that will actually run:

In [13]:
cmscan.command

['cmscan',
 '--tblout',
 '/tmp/foo.txt',
 '--cpu',
 '3',
 '/Users/zech/Dropbox/git/micronota/micronota/bfillings/tests/data/infernal/test.cm',
 '/Users/zech/Dropbox/git/micronota/micronota/bfillings/tests/data/infernal/NC_018498.fna']
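A command in this list form is the shape that Python's standard `subprocess.run` accepts, and the `returncode`, `stdout`, and `stderr` attributes read from `res` earlier have the same names as those on `subprocess.CompletedProcess`. Whether dumpling uses `subprocess.run` internally is an assumption, not something this notebook confirms. The following is a minimal sketch of running such a list directly, with a harmless stand-in command in place of `cmscan`:

```python
import subprocess
import sys

# Stand-in command so the example runs anywhere; a real call would pass
# a list such as ['cmscan', '--tblout', '/tmp/foo.txt', ...] instead.
command = [sys.executable, '-c', 'print("hello from a child process")']

# capture_output collects stdout and stderr; text=True decodes them to str.
result = subprocess.run(command, capture_output=True, text=True)

print(result.returncode)       # exit status of the child process
print(result.stdout, end='')   # whatever the child wrote to stdout
print(repr(result.stderr))     # whatever the child wrote to stderr
```

Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.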

## 2.3 You can augment the fillings with more parameters that are not previously wrapped

In [14]:
from cmscan import _params
more_params = [
    OptionParam('--rfam', help='Set all filter thresholds as if the search space were more than 20 Gb.'),
    OptionParam('--noali', value=True,
                help='Omit the alignment section from the main output. This can greatly reduce the output volume.')]
cmscan = Dumpling('cmscan', params=Parameters(*more_params + _params), version='1.1.1', url='www.infernal.com')

In [15]:
cmscan

Dumpling
--------
CMD: cmscan
CMD version: '1.1.1'
CMD URL: 'www.infernal.com'
CMD Parameter:
OptionParam(flag='--rfam', alter=None, name='rfam', value=None, action=<lambda>, help='Set all filter thresholds as if the search space were more than 20 Gb.', delimiter=' ')
OptionParam(flag='--noali', alter=None, name='noali', value=True, action=<lambda>, help='Omit the alignment section from the main output. This can greatly reduce the output volume.', delimiter=' ')
OptionParam(flag='--tblout', alter=None, name='out', value=None, action=<lambda>, help='save parseable table of hits to file', delimiter=' ')
OptionParam(flag='--cpu', alter=None, name='cpu', value=1, action=<lambda>, help='number of parallel CPU workers to use for multithreads', delimiter=' ')
ArgmntParam(name='db', value=None, action=<lambda>, help='HMM/CM database file')
ArgmntParam(name='query', value=None, action=<lambda>, help='input sequence to scan')
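In the listing above, each option's `name` matches its flag without the leading dashes (`'--rfam'` gives `'rfam'`) unless a name was set explicitly (`'--tblout'` gives `'out'`). The sketch below illustrates how such a default and a keyword-based update could work. `Option` and `update` here are hypothetical stand-ins written for this example; they are not dumpling's classes, and the defaulting rule is inferred from the printed output only.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Option:
    """Hypothetical stand-in for an option parameter (not dumpling's class)."""
    flag: str
    name: Optional[str] = None
    value: Any = None

    def __post_init__(self):
        # Assumed rule: default the name to the flag without leading dashes.
        if self.name is None:
            self.name = self.flag.lstrip('-')


def update(options, **kwargs):
    """Set option values by name; reject names that match no option."""
    by_name = {opt.name: opt for opt in options}
    for name, value in kwargs.items():
        if name not in by_name:
            raise KeyError('unknown parameter: %s' % name)
        by_name[name].value = value


options = [Option('--rfam'), Option('--tblout', name='out'), Option('--cpu', value=1)]
update(options, out='/tmp/a.tblout', cpu=2)
print([(opt.flag, opt.name, opt.value) for opt in options])
```

Looking parameters up by name is what lets a wrapper accept `out='/tmp/a.tblout'` as an ordinary keyword argument instead of a dict keyed by flag.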

## 2.4 You only need two types of command line

Dumpling needs only two parameter types: `OptionParam` for options and `ArgmntParam` for positional arguments. `OptionParam` covers what `burrito.FlagParameter`, `burrito.ValuedParameter`, and `burrito.MixedParameter` do, since on the Linux command line they are all essentially option parameters. This is cleaner and less confusing. Example:

In [16]:
params = [
    # check_choice checks that the given value is one of the legal values
    OptionParam('-f', action=check_choice((True, False)), help='this is equivalent to burrito.FlagParameter'),
    OptionParam('-d', action=check_choice((True, False, 1, 0)), help='this is equivalent to burrito.MixedParameter'),
    OptionParam('--input', help='this is equivalent to burrito.ValuedParameter'),
]
foo = Dumpling(cmd='foo', params=Parameters(*params))

In [17]:
foo.update(d=0)
foo.command

['foo', '-d', '0']

In [18]:
foo.update(d=False)
foo.command

['foo']

In [19]:
foo.update(d=True)
foo.command

['foo', '-d']

In [20]:
foo.update(d=True, f=True)
foo.command

['foo', '-f', '-d']

In [21]:
foo.update(d=3)

ValueError: Illegal value: 3
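The error above comes from the validator attached through `action`. As a rough illustration of the idea, a validator factory can be written as a closure over the legal values. The name `make_choice_checker` is hypothetical, and this is a sketch of the general technique rather than dumpling's actual `check_choice` implementation.

```python
def make_choice_checker(choices):
    """Return a validator that accepts only values found in `choices`."""
    def check(value):
        if value not in choices:
            raise ValueError('Illegal value: %r' % (value,))
        return value
    return check


check_flag = make_choice_checker((True, False))
print(check_flag(True))   # accepted and returned unchanged

try:
    check_flag(3)
except ValueError as err:
    print(err)
```

One caveat of a plain membership test: in Python `1 == True` and `0 == False`, so this sketch would also accept `1` and `0` for the `(True, False)` choice list. A stricter validator would compare types as well as values.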

## 2.5 the parameters are ordered

Some command line tools require one parameter to appear before another. I don't think burrito can do this, but dumpling can: the parameters appear in the command in the order you define them.

In [22]:
p1 = [
    OptionParam('-f', value=True, action=check_choice((True, False)), help='this is equivalent to burrito.FlagParameter'),
    OptionParam('-d', value=True, action=check_choice((True, False, 1, 0)), help='this is equivalent to burrito.MixedParameter')]
foo1 = Dumpling('foo', params=Parameters(*p1))
foo1.command

['foo', '-f', '-d']

In [23]:
# reverse the order of 2 parameters
p2 = [
    OptionParam('-d', value=True, action=check_choice((True, False, 1, 0)), help='this is equivalent to burrito.MixedParameter'),
    OptionParam('-f', value=True, action=check_choice((True, False)), help='this is equivalent to burrito.FlagParameter')]
foo2 = Dumpling('foo', params=Parameters(*p2))
foo2.command

['foo', '-d', '-f']
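The behaviour shown in sections 2.4 and 2.5 can be summarised as: `True` emits the bare flag, `False` or an unset value emits nothing, any other value emits the flag followed by the value, and everything keeps its definition order. The sketch below reproduces those rules with a hypothetical `build_command` helper. It is inferred from the outputs in this notebook and is not dumpling's own code.

```python
def build_command(cmd, options, arguments=()):
    """Build an argv list, keeping options in the order they were given.

    `options` is a sequence of (flag, value) pairs; `arguments` are
    positional values appended after all options.
    """
    argv = [cmd]
    for flag, value in options:
        # Identity checks keep 0 distinct from False (0 == False in Python).
        if value is None or value is False:
            continue                      # unset or switched off: omit
        argv.append(flag)
        if value is not True:
            argv.append(str(value))       # valued option, e.g. '-d 0'
    argv.extend(str(arg) for arg in arguments)
    return argv


print(build_command('foo', [('-f', True), ('-d', True)]))
print(build_command('foo', [('-d', True), ('-f', True)]))
print(build_command('foo', [('-f', False), ('-d', 0)]))
```

Storing the options in a list rather than a plain mapping keyed by flag makes the ordering explicit, which is what a tool with position-sensitive options needs.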