# `molli` Command Line 

## About this tutorial
This file illustrates a few fundamental principles of the new `molli` package. Because the old and new styles of `molli` differ substantially, this introductory tutorial should be useful for both experienced users and newcomers.

## Basic structure of the `molli` package

### Subpackages

In [16]:
# This is meant to be as iconic as `import numpy as np` :)
import molli as ml

## Command line

`molli` features a number of standalone scripts for standard procedures, such as parsing a `.cdxml` file or compiling a collection.

In [17]:
# This is a shell command
!molli --HELP

usage: molli [-C <file.yml>] [-L <file.log>] [-v] [-H] [-V]
             {list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}

MOLLI package is an API that intends to create a concise and easy-to-use
syntax that encompasses the needs of cheminformatics (especially so, but not
limited to the workflows developed and used in the Denmark laboratory.

positional arguments:
  {list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}
                        This is main command that invokes a specific
                        standalone routine in MOLLI. To get full explanation
                        of available commands, run `molli list`

options:
  -C <file.yml>, --CONFIG <file.yml>
                        Sets the file from which molli configuration will be
                        read from
  -L <file.log>, --LOG <file.log>
                        Sets the file that will contain the output of molli
                        routines.
  

In [18]:
# This is a shell command
!molli list

molli align
molli combine
molli compile
molli gbca
molli grid
molli info
molli ls
molli parse
molli recollect
molli show
molli stats
molli test

## Align

`align` aligns a molecule library or conformer library to a "query" MOL2 file, which can be a minimal substructure that exists within the library. Note: this requires the `rmsd` and `pandas` packages, which are currently not dependencies of `molli`. They can be installed with `pip install rmsd` and `pip install pandas`, or with `conda install rmsd` and `conda install pandas`.
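
Since `rmsd` and `pandas` are optional, it can help to confirm they are importable before calling `molli align`. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this illustration and is not part of `molli`.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# `rmsd` and `pandas` are the optional packages that `molli align` needs.
missing = missing_packages(["rmsd", "pandas"])
if missing:
    print("Install before running `molli align`:", ", ".join(missing))
else:
    print("All optional dependencies for `molli align` are available.")
```

The check only looks up the package without importing it, so it is cheap to run at the top of a notebook.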

In [19]:
!molli align -h

usage: molli align [-h] -i INPUT -q query_mol.mol2 [--rmsd {rmsd,scipy}]
                   [-o <aligned>] [-s STATS]

Read a conformer library and align it across given query

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        ConformerLibrary/MoleculeLibrary file to align
  -q query_mol.mol2, --query query_mol.mol2
                        Mol2 file with the reference query structure
  --rmsd {rmsd,scipy}   Method of rmsd calculation. Available are the default
                        and scipy
  -o <aligned>, --output <aligned>
                        Output file path and name w/o extension
  -s STATS, --stats STATS
                        True/False flag to save alignment statistics in the
                        separate file. Defaults to False.


## Combine

`combine` allows combinatorial expansion of a library. One can view the `cores` library as a set of base structures with designated attachment points. The substituents can be appended at different attachment points, and the way they are distributed depends on the chosen mode.
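
The mode names accepted by `-m` (`same`, `permutns`, `combns`, `combns_repl`) resemble the functions in Python's `itertools`. The sketch below shows how the number of products could differ between modes. The mapping from mode name to `itertools` function is an assumption based on the names alone and is not taken from the `molli` source; the substituent names are made up.

```python
from itertools import combinations, combinations_with_replacement, permutations


def enumerate_assignments(substituents, n_points, mode):
    """List substituent tuples for one core with `n_points` attachment points.

    Assumption: the mode names map onto itertools as written below.
    This is an illustration, not molli's implementation.
    """
    if mode == "same":
        # The same substituent is placed at every attachment point.
        return [(s,) * n_points for s in substituents]
    if mode == "permutns":
        return list(permutations(substituents, n_points))
    if mode == "combns":
        return list(combinations(substituents, n_points))
    if mode == "combns_repl":
        return list(combinations_with_replacement(substituents, n_points))
    raise ValueError(f"unknown mode: {mode}")


subs = ["Me", "Et", "Ph"]  # hypothetical substituent names
for mode in ("same", "permutns", "combns", "combns_repl"):
    print(mode, len(enumerate_assignments(subs, 2, mode)))
```

Under these assumptions, three substituents on a core with two attachment points give 3, 6, 3, and 6 assignments respectively, which is a quick way to estimate the size of the combined library before running the command.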

In [20]:
!molli combine -h

usage: molli combine [-h] -s <substituents.mlib>
                     [-m {same,permutns,combns,combns_repl}]
                     [-a ATTACHMENT_POINTS] [-n 1] [-b 1] -o <combined.mlib>
                     [-sep SEPARATOR] [--hadd]
                     [--obopt [ff maxiter tol disp ...]] [--overwrite]
                     cores

Combines two lists of molecules together

positional arguments:
  cores                 Base library file to combine wth substituents

options:
  -h, --help            show this help message and exit
  -s <substituents.mlib>, --substituents <substituents.mlib>
                        Substituents to add at each attachment of a core file
  -m {same,permutns,combns,combns_repl}, --mode {same,permutns,combns,combns_repl}
                        Method for combining substituents
  -a ATTACHMENT_POINTS, --attachment_points ATTACHMENT_POINTS
                        Label used to find attachment points
  -n 1, --nprocs 1      Number of processes to be used in parall

## Compile

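`compile` collects matching source files, given explicitly or as a glob pattern, into a single `molli` collection. Both molecule libraries and conformer libraries are supported.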
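
`compile` takes a list of source files or a glob pattern, and its `--stem` option renames each conformer ensemble to match the file stem. Before compiling, it can be useful to preview which files a pattern would match and what their stems are. This is a minimal sketch with `pathlib`; the helper `preview_sources` and the `*.mol2` layout are hypothetical, and the matching done by `molli` itself may differ.

```python
from pathlib import Path


def preview_sources(directory, pattern):
    """Return sorted (file name, stem) pairs for files matching `pattern`."""
    return sorted((p.name, p.stem) for p in Path(directory).glob(pattern))


# Hypothetical layout: MOL2 files sitting in the current directory.
for name, stem in preview_sources(".", "*.mol2"):
    print(f"{name} -> {stem}")
```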

In [21]:
!molli compile -h

usage: molli compile [-h] -o LIB_FILE [-t {molecule,ensemble}]
                     [-p {openbabel,obabel,molli}] [--stem] [-s] [-v]
                     [--overwrite]
                     [<file_or_glob> ...]

Compile matching files into a molli collection. Both conformer libraries and
molecule libraries are supported.

positional arguments:
  <file_or_glob>        List of source files or a glob pattern.

options:
  -h, --help            show this help message and exit
  -o LIB_FILE, --output LIB_FILE
                        New style collection to be made
  -t {molecule,ensemble}, --type {molecule,ensemble}
                        Type of object to be imported
  -p {openbabel,obabel,molli}, --parser {openbabel,obabel,molli}
                        Parser to be used to import the molecule object
  --stem                Renames the conformer ensemble to match the file stem
  -s, --split           This is only compatible with the choice of type
                        `molecule`. In thi

## GBCA

`gbca` allows calculation of some of the grid-based descriptors. A more in-depth description of the command and its applications can be found in the cookbook.

In [22]:
!molli gbca -h

usage: molli gbca [-h] [-w] [-n 128] [-b 128] [-g <grid.hdf5>]
                  [-o <lib_aso.hdf5>] [--dtype DTYPE] [--overwrite]
                  {aso,aeif} CLIB_FILE

This module can be used for standalone computation of descriptors

positional arguments:
  {aso,aeif}            This selects the specific descriptor to compute.
  CLIB_FILE             Conformer library to perform the calculation on

options:
  -h, --help            show this help message and exit
  -w, --weighted        Apply the weights specified in the conformer files
  -n 128, --nprocs 128  Selects number of processors for python
                        multiprocessing application. If the program is
                        launched via MPI backend, this parameter is ignored.
  -b 128, --batchsize 128
                        Number of conformer ensembles to be processed in one
                        batch.
  -g <grid.hdf5>, --grid <grid.hdf5>
                        File that contains the information about the
  

## Grid

`grid` calculates a rectangular grid for an existing molecule or conformer library, with a variety of parameters. This is expanded upon in the cookbook.
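
To make the padding and spacing parameters concrete, here is a small pure-Python sketch of a rectangular grid over a padded bounding box. The defaults mirror the `-p 0.0` and `-s 1.0` values shown in the help text; everything else (the function name, the made-up coordinates, and whether both box edges are included) is an assumption and not `molli`'s implementation.

```python
import math


def rectangular_grid(coords, padding=0.0, spacing=1.0):
    """Return grid points covering the padded bounding box of `coords`.

    `coords` is a list of (x, y, z) tuples in angstroms.
    """
    axes = []
    for axis in range(3):
        lo = min(c[axis] for c in coords) - padding
        hi = max(c[axis] for c in coords) + padding
        # Small tolerance so that floating-point noise does not drop the last point.
        n = math.floor((hi - lo) / spacing + 1e-9) + 1
        axes.append([lo + i * spacing for i in range(n)])
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]


atoms = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]  # made-up coordinates
points = rectangular_grid(atoms, padding=1.0, spacing=1.0)
print(len(points))
```

For these two atoms the padded box spans 3 x 4 x 2 angstroms, so a 1.0 angstrom spacing yields 4 x 5 x 3 = 60 points. Halving the spacing roughly multiplies the point count by eight, which is worth keeping in mind when choosing `-s`.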

In [23]:
!molli grid -h

usage: molli grid [-h] [-o <fpath>] [-n NPROCS] [-p 0.0] [-s 1.0]
                  [-b BATCHSIZE] [--prune [<max_dist>:<eps>]]
                  [--nearest [NEAREST]] [--overwrite] [--dtype DTYPE]
                  library

Read a molli library and calculate a grid

positional arguments:
  library               Conformer library file to perform the calculations on

options:
  -h, --help            show this help message and exit
  -o <fpath>, --output <fpath>
                        Destination for calculation results
  -n NPROCS, --nprocs NPROCS
                        Specifies the number of jobs for constructing a grid
  -p 0.0, --padding 0.0
                        The bounding box will be padded by this many angstroms
                        prior to grid construction
  -s 1.0, --spacing 1.0
                        Intervals at which the grid points will be placed
  -b BATCHSIZE, --batchsize BATCHSIZE
                        Number of molecules to be treated simulateneously
  --p

## List Names

`ls` allows access to a list of names in an existing conformer library or molecule library.

In [24]:
!molli ls -h

usage: molli ls [-h] [-t {mlib,clib,cdxml}] [-a [ATTRIB ...]] input

Read a molli library and list its contents.

positional arguments:
  input                 Collection to inspect. If type is not specified, it
                        will be deduced from file extensions or directory
                        properties.

options:
  -h, --help            show this help message and exit
  -t {mlib,clib,cdxml}, --type {mlib,clib,cdxml}
                        Collection type
  -a [ATTRIB ...], --attrib [ATTRIB ...]
                        Attributes to report. At least one must be specified.
                        Attributes are accessed via `getattr` function.
                        Possible options: `n_atoms`, `n_bonds`,
                        `n_attachment_points`, `n_conformers`
                        `molecular_weight`, `formula`. If none specified, only
                        the indexes will be returned.
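
The help text notes that attributes are accessed via `getattr`. The sketch below shows that lookup pattern on stand-in objects. `FakeMolecule` and `report` are hypothetical names for this illustration; real entries would come from a `molli` library, and the atom and bond counts are simply those of ethane and water.

```python
from dataclasses import dataclass


@dataclass
class FakeMolecule:
    """Stand-in object; real entries come from a molli library."""

    name: str
    n_atoms: int
    n_bonds: int


def report(items, attribs):
    """Build one row per item: its name followed by each requested attribute."""
    return [(m.name, *(getattr(m, a) for a in attribs)) for m in items]


library = [FakeMolecule("ethane", 8, 7), FakeMolecule("water", 3, 2)]
for row in report(library, ["n_atoms", "n_bonds"]):
    print(*row)
```

With an empty attribute list each row contains only the name, which matches the documented behavior of returning only the indexes when no attribute is specified.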


## Parse

`parse` reads a CDXML file directly into a molecule library. By default it does not perceive implicit hydrogens, but these can be added with the `--hadd` option.

In [25]:
!molli parse -h

usage: molli parse [-h] [-f {cdxml}] [-o <fpath>] [--hadd] [--overwrite] file

This package parses chemical files, such as .cdxml, and creates a collection
of molecules in .mlib format.

positional arguments:
  file                  File to be parsed.

options:
  -h, --help            show this help message and exit
  -f {cdxml}, --format {cdxml}
                        Override the source file format. Defaults to the file
                        extension. Supported types: 'cdxml'
  -o <fpath>, --output <fpath>
                        Destination for .MLIB output
  --hadd                Add implicit hydrogen atoms wherever possible. By
                        default this only affects elements in groups 13-17.
  --overwrite           Overwrite the target files if they exist (default is
                        false)


## Recollect

`recollect` reads molecule library files, conformer library files, zip files, Molli 0.2 (legacy) zip files, and directories of molecules or conformer ensembles, and converts between these formats.

If files in formats other than `MOL2` or `XYZ` need to be read, `molli` can parse them through its `openbabel` interface. Note: `openbabel` is not a dependency of `molli` and can be installed via `conda install openbabel`.

### Example 1: Conformer Library to SDF Directory

`molli recollect -it clib -i example.clib -l obabel -o example_sdf_dir -ot dir -oext sdf`

This reads the `ConformerLibrary` file, uses `openbabel` for parsing, and creates a directory `example_sdf_dir` containing multi-SDF files built from the `ConformerEnsemble` objects in the `ConformerLibrary`.

### Example 2: Zip File to Molecule Library

`molli recollect -it zip -i example_mol2s.zip -iext mol2 -l molli -o example.mlib -ot mlib`

This reads an existing zip file, using `molli` to parse its contents as `MOL2`, and writes the result to the molecule library file `example.mlib`.

In [26]:
!molli recollect -h

usage: molli recollect [-h] [-i <PATH>] [-it {mlib,clib,dir,zip}]
                       [-iext INPUT_EXT] [-iconv {molecule,ensemble}]
                       [-o <PATH>] [-ot {mlib,clib,dir,zip}]
                       [-oext OUTPUT_EXT] [-l {molli,obabel,openbabel}]
                       [-cm 0 1] [-v] [-s] [--overwrite]

Read old style molli collection and convert it to the new file format.

options:
  -h, --help            show this help message and exit
  -i <PATH>, --input <PATH>
                        This is the input path
  -it {mlib,clib,dir,zip}, --input_type {mlib,clib,dir,zip}
                        This is the input type, including <mlib>, <.clib>,
                        <.zip>, <.xml>, <.ukv>, or directory (<dir>)
  -iext INPUT_EXT, --input_ext INPUT_EXT
                        This option is required if reading from a <zip> or
                        directory to indicate the File Type being searched for
                        (<mol2>, <xyz>, etc.)
  -iconv {molecu

## Show

`show` visualizes a molecule, or a molecule within a molecule library, with pyvista directly from the command line.

In [27]:
!molli show -h

usage: molli show [-h] [-p PROGRAM] [-o OUTPUT] [-ot OTYPE]
                  [--bgcolor BGCOLOR] [--port PORT] [--parser PARSER]
                  [--no_confs]
                  library_or_mol [key]

Show a molecule in a GUI of choice

positional arguments:
  library_or_mol        This can be a molecule file or a Load all these
                        molecules from this library
  key                   Molecule to be shown. Only applies if the
                        `library_or_mol` argument is a molli collection.

options:
  -h, --help            show this help message and exit
  -p PROGRAM, --program PROGRAM
                        Run this command to get to gui. Special cases:
                        `pyvista`, `3dmol.js`, `http-3dmol.js`. Others are
                        interpreted as command path.
  -o OUTPUT, --output OUTPUT
                        If any temporary visualization files are producted,
                        they will be written in this destination. User is th

## Stats

`stats` calculates various statistics over a molecule or conformer library using an expression that is evaluated with the local variable `m`, which corresponds to each object in the library. For example, to get statistics on the number of conformers in a conformer library, run

`molli stats "m.n_conformers" example.clib -t clib`

This returns the number of ensembles in the library along with the mean, standard deviation, minimum, IQR1, median, IQR3, and maximum.
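
The idea of evaluating an expression against a local variable `m` and then summarizing the results can be sketched with the standard library alone. This is not `molli`'s code: the stand-in objects, the `summarize` helper, the use of the sample standard deviation, and the inclusive quartile method are all assumptions made for illustration.

```python
import statistics
from types import SimpleNamespace


def summarize(objects, expression):
    """Evaluate `expression` with local `m` for each object and summarize."""
    values = [eval(expression, {}, {"m": obj}) for obj in objects]
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Stand-in ensembles with made-up conformer counts.
ensembles = [SimpleNamespace(n_conformers=n) for n in (1, 2, 3, 4, 5)]
print(summarize(ensembles, "m.n_conformers"))
```

Because the expression is arbitrary Python, anything reachable from `m` can be summarized this way; for the same reason, expressions should only come from trusted input.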

In [28]:
!molli stats -h 

usage: molli stats [-h] [-t {mlib,clib}] [-o OUTPUT] expression input

Calculate statistics on the collection

positional arguments:
  expression            What to count. Expression is evaluated with the local
                        variable `m` that corresponds to the object.
  input                 Collection to inspect. If type is not specified, it
                        will be deduced from file extensions or directory
                        properties.

options:
  -h, --help            show this help message and exit
  -t {mlib,clib}, --type {mlib,clib}
                        Collection type
  -o OUTPUT, --output OUTPUT
                        Output the results as a space-separated file


## Test

`test` runs all the unit tests available in `molli`. This will skip tests associated with `openbabel` and `rdkit` if they are not installed.
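
Skipping tests when an optional package is absent is a standard `unittest` pattern. The sketch below shows one way it can be done; the test case is hypothetical and is not part of `molli`'s test suite, and `molli` may implement its skips differently.

```python
import importlib.util
import unittest

# True only if the optional package can be found in this environment.
HAS_OPENBABEL = importlib.util.find_spec("openbabel") is not None


class OptionalDependencyTC(unittest.TestCase):
    """Hypothetical test case; not part of molli's test suite."""

    @unittest.skipUnless(HAS_OPENBABEL, "openbabel is not installed")
    def test_needs_openbabel(self):
        import openbabel  # imported only when the package is present

        self.assertIsNotNone(openbabel)

    def test_always_runs(self):
        self.assertEqual(1 + 1, 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(OptionalDependencyTC)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

A skipped test is reported as skipped rather than failed, so the run as a whole still counts as successful on machines without the optional package.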

In [29]:
!molli test -h

usage: molli test [-h] [-v] [-q] [--locals] [-f] [-c] [-b]
                  [-k TESTNAMEPATTERNS]
                  [tests ...]

positional arguments:
  tests                a list of any number of test modules, classes and test
                       methods.

options:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  molli test                           - run default set of tests
  molli test MyTestSuite               - run suite 'MyTestSuite'
  molli test MyTestCase.testSomething  - run MyTestCase.testSomething
  molli test MyTestCase                - run all 'test*' test methods
              