Skip to content

Latest commit

 

History

History
224 lines (144 loc) · 11 KB

cli.rst

File metadata and controls

224 lines (144 loc) · 11 KB

Command-line interface

commonnexus comes with a shell command of the same name: commonnexus. commonnexus is a multi-command CLI, or a "git-like multi-tool command", i.e. actual functionality to manipulate NEXUS files is implemented as sub-commands. To get an overview and a list of sub-commands, run

.. command-output:: commonnexus -h


Most commands can read input from stdin and print results to stdout. Thus, these commands can easily be chained together with other shell commands:

.. command-output:: echo "#nexus begin trees; tree 1 = ((a,b)c); end;" | commonnexus normalise - | grep TREE | grep -v TREES
    :shell:

Warning

While :class:`commonnexus.Nexus <commonnexus.nexus.Nexus>` can read multiple blocks with the same name just fine, most of the commands listed below assume just one block per block type in their input (i.e. only act on the first occurrence of each block type).

In the following we describe the available sub-commands.

commonnexus normalise

Arguably the most important sub-command is normalise, because it removes quite a few complexities of the NEXUS format (e.g. different TRIANGLE options for DISTANCES, or EQUATE mappings for CHARACTERS), and thus makes downstream NEXUS reading a lot more reliable.

.. command-output:: commonnexus normalise -h

For examples of of running commonnexus normalise refer to the documentation of the underlying function :func:`commonnexus.tools.normalise.normalise`.

Normalising CHARACTERS

.. command-output:: commonnexus normalise '#nexus begin d[c]ata; dimensions nchar=3; format missing=x nolabels; matrix x01 100 010; end;'


Normalising DISTANCES

.. command-output:: commonnexus normalise '#nexus begin distances; dimensions ntax=3; format missing=x nodiagonal; matrix t1 t2 x t3 1.0 2.1; end;'


Normalising TREES

.. command-output:: commonnexus normalise '#nexus begin trees; translate a t1, b t2, c t3; tree 1 = ((a,b)c); end;'


commonnexus combine

Combining data from multiple NEXUS files into a single one can be useful to have data and resulting trees from a phylogenetic analysis in a single file or to aggregate character data for the same set of taxa.

.. command-output:: commonnexus combine -h

Combining CHARACTERS blocks

.. command-output:: cat characters.nex | commonnexus combine - characters.nex
    :shell:


commonnexus split

The Mesquite software can write multiple TAXA, CHARACTERS and TREES blocks - linked together via TITLE and LINK commands to a single NEXUS file. Most other tools can't handle such "multi-taxa" files, though.

Running commonnexus split will split such files into one NEXUS file per CARACTERS or TREES block, bundled with the appropriate TAXA block.

.. command-output:: commonnexus split -h


commonnexus characters

The characters sub-command provides functionality to manipulate the characters matrix in a NEXUS file.

.. command-output:: commonnexus characters -h


"Binarise" the matrix

Some tools (e.g. BEAST) offer special analysis options for binary data. To convert multistate character data to you can run characters --binarise:

.. command-output:: commonnexus characters --binarise "#NEXUS BEGIN DATA; DIMENSIONS nchar=1; MATRIX t1 a t2 b t3 c t4 d t5 e; END;"


"Multistatise" the matrix

Sometimes characters which are "naturally multistate" are coded as binary data (for the above reason). E.g. cognate-coded wordlist data are often binarised for analysis with BEAST, i.e. each cognate set is considered a separate character as opposed to grouping cognate sets for the same meaning into a multistate character. Binary data is somewhat harder to inspect "manually", though. E.g. figuring out whether languages may have words coded as cognate in two different cognate sets for the same meaning is difficult looking at data such as https://github.com/phlorest/birchall_et_al2016/blob/main/raw/Chapacuran_Swadesh207-2019-labelled.nex.

Running characters --multitatise on such data can make this easier. The --multistatise option expects a Python lambda function as argument, which converts a character label into a group key. E.g. the character labels

1 100_laugh_A,
2 100_laugh_B,
3 100_laugh_C,

could be merged into a multistate character passing lambda c: '_'.join(c.split('_')[:-1]).

curl https://raw.githubusercontent.com/phlorest/birchall_et_al2016/main/raw/Chapacuran_Swadesh207-2019-labelled.nex |\\
commonnexus characters --multistatise "lambda c: '_'.join(c.split('_')[:-1])" -

will output a MATRIX with rows like

Cojubim  AAAAAAA??AB(AB)AECABAAAAACAABBECAAAA?A?(AB)ACAA?AA?AEACAA??CBA??AADACBB?C?(AB)...

where polymorphisms (e.g. (AB)) mean a language has a word coded as cognate with two different cognate sets for the same meaning.

Describing character set sizes

The output of the most commands is also suitable for piping to other commands. E.g. termgraph can be used to display character set sizes:

.. command-output:: commonnexus characters characters.nex --describe binary-setsize | termgraph
    :shell:


commonnexus trees

The trees sub-command provides functionality to manipulate the TREES block in a NEXUS file.

.. command-output:: commonnexus trees -h



commonnexus taxa

The taxa sub-command provides functionality to manipulate the set of taxa in a NEXUS file.

.. command-output:: commonnexus taxa -h


Removing taxa

While removing a taxon from a NEXUS file can be as simple as deleting one line in the CHARACTERS MATRIX command, it typically isn't because the taxon may also appears in TREES TRANSLATE, etc. taxa --drop will remove relevant taxon references from TAXA, TREES, CHARACTERS, DATA, DISTANCES and NOTES blocks.

.. command-output:: commonnexus taxa --drop t1 "#NEXUS BEGIN DATA; DIMENSIONS nchar=1; MATRIX t1 a t2 b t3 c t4 d t5 e; END;"

If you want to drop constant/invariant characters which might have arisen due to removing a taxon, you could pipe the result of taxa --drop into characters --drop constant.

Describing taxa

Describing the data for a taxon in a NEXUS file is particularly useful for files with a CHARACTERS MATRIX of DATATYPE=STANDARD and labeled states - such as the files from Morphobank.

Running

commonnexus taxa ../tests/fixtures/regression/mbank_X962_11-22-2013_1534.nex --describe 1

will output a markdown formatted table of characters looking like

Character State Notes
Vomer, shape of tooth patch Trapezoidal to ovate  
Orbitosphenoid Present  
Pterotic, enclosure of lateral line canal absent or incomplete  
Frontals, midline suture joined along entire midline  
Frontoparietal crests absent  
Frontoparietal crests, sensory pore on dorsal margin ?  
Supraoccipital crest, shape long and low  
Supraoccipital crest, horizontal shelf projecting laterally at mid-height present  
Supraoccipital crest, shape of dorsal margin blade-like  
Sphenotic, horizontal shelf absent  
Mesethmoid, anterolaterally facing projection absent  
Lateral ethmoid-lacrimal articulation, orientation entirely or primarily in the horizontal plane Waldman, 1986
...