# Advanced uses of Pear
Here we introduce some more elaborate ways of using pear to perform in-depth analyses and produce effective representations of the embedded distance matrices.

## pear.toml
Pear automatically looks for a `pear.toml` file in the working directory; alternatively, a `.toml` file can be specified using the `--config` flag. A `.toml` file is simply a convenient way of specifying many parameters at once. Keeping them in an auxiliary file removes clutter from the pear command line and promotes a more standardized way of performing a series of analyses.
<br>We will guide you through the use of this additional tool.

In [2]:
!cat pear.toml

# This is an instance of a config TOML for pear_ebi

# [trees] and [dir] sections allow one to specify the files
# containing the set of trees in newick format.
# They can be used at the same time, the file
# selected are going to be compared all together.

[trees] # file entries specify the path to a single file
#file1 = "Pear-EBI/examples_tree_sets/beast_trees/beast_run1.trees" # filen = "path/to/file"
#file2 = "Pear-EBI/examples_tree_sets/beast_trees/beast_long.trees"

#[dir] # specify the path to a directory
#dir1 = "Pear-EBI/examples_tree_sets/beast_trees/" # "path/to/directory"
#pattern = '*run2*' # pattern of files to be analyzed

#[collection]
# output_file = None # name of output file where the distance matrix is written
# distance_matrix = None # file with distance matrix if this has been precomputed
# metadata = None # file with dataframe containing metadata compatible with the collection

#[highlight] # allows one to highlight specifi

Here you can see the `pear.toml` example stored in the same directory as this notebook. It is a good template for your future analyses!
<br>Let's go through all its parts:<ul>
    <li> `[trees]` contains single file specifications. Each line associated with this key should direct pear to a file containing trees in Newick format. The nomenclature is "file$n$=filename", where $n$ is just the index of the file, whereas filename is the path to the file itself.
    <li> `[dir]` contains directory and pattern specifications. Each directory should contain only tree-containing files, and should be indicated with a `dir$n$` entry (such as `dir1` in the example above). In addition, a `pattern` can be given to narrow the search for files.
    <li> `[collection]` stores details related to the `tree_set` or `set_collection`:<ul>
        <li> `output_file` specifies an alternative name and path for the distance matrix file;
        <li> `distance_matrix` indicates the path of a precomputed distance matrix;
        <li> `metadata` indicates a `.csv` file containing metadata compatible with the collection, meaning that the number of rows in the file should equal the number of trees in the collection. The information stored in the metadata can be of any type (discrete or continuous) and can subsequently be used in the representation of your data, either in place of the 3$^{rd}$ dimension of the 3D embedding or to color the points (trees). </ul>
    <li> `[highlight]` specifies trees in the set/collection that will be highlighted in the final plots. For a given set (whether or not it is part of a collection), you provide a list of indexes indicating which trees to highlight. The list is keyed by "file$n$" if the file was indexed as such in the `[trees]` section, or by the name of the file without its extension (`filename.trees` is just `filename`) if the file was selected through `[dir]`.
    <li> `[distance]` specifies the `method` used to compute the distance matrix. It can be chosen among `hashrf_RF`, `hashrf_wRF`, `smart_RF`, `tqdist_quartet`, `tqdist_triplet`.
    <li> `[embedding]` specifies the `method` used to compute the embedding of the distance matrix, the `dimensions` of the embedding, and whether to display the `quality` or not. Methods are `pca`(pcoa), `tsne`, `isomap`, `lle`.
    <li> `[plot]` defines some aspects of the plots produced by pear:<ul>
        <li>`name_plot` specifies the name of the plot produced;
        <li>`plot_meta` indicates which feature to use to color the points in the graph. The default value is `SET-ID`, which simply colors by `tree_set`. A `STEP` meta-variable is also available; it holds the index of a tree within its `tree_set` and can be used to color trees when the ordering is important. Other meta-variables can be supplied through the `metadata` entry of `[collection]`.
        <li>`select` indicates whether the graph should have a set of interactive buttons to display/hide specific `tree_set`s or not. 
        <li>`same_scale` indicates whether the same colorscale should be applied to every `tree_set` or not.
        <li>`show` specifies whether the plot should be shown or not.
</ul>
</ul>
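
The row-count requirement on the `metadata` file can be checked before running pear. The sketch below is only an illustration under stated assumptions: the column names (`run`, `sampling_step`), the file name, and the set sizes are hypothetical, and it assumes the `.csv` holds one header line followed by one row per tree, in the same order as the trees in the collection.

```python
import csv
import tempfile
from pathlib import Path


def write_metadata(path, rows, fieldnames):
    """Write one metadata row per tree, preceded by a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def count_metadata_rows(path):
    """Count data rows (header excluded) in a metadata CSV."""
    with open(path, newline="") as handle:
        return sum(1 for _ in csv.DictReader(handle))


# Hypothetical collection: two tree sets of 1001 trees each.
trees_per_set = {"run1": 1001, "run2": 1001}

# One row per tree: a discrete variable (run) and an ordered one (sampling_step).
rows = [
    {"run": name, "sampling_step": step}
    for name, n_trees in trees_per_set.items()
    for step in range(n_trees)
]

metadata_path = Path(tempfile.mkdtemp()) / "metadata_sketch.csv"
write_metadata(metadata_path, rows, fieldnames=["run", "sampling_step"])

# The number of data rows must match the number of trees in the collection.
n_rows = count_metadata_rows(metadata_path)
n_trees_total = sum(trees_per_set.values())
print(f"{n_rows} metadata rows for {n_trees_total} trees")
```

If the two counts differ, the metadata file is not compatible with the collection and should be fixed before it is passed to pear.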
<b>Note that</b> all these arguments are optional, and many of them can also be specified through pear's command-line flags. If an argument is given both in the `.toml` file and as a flag, the value given on the command line overrides the one in the file. Relatedly, the `--meta` flag allows one to specify a metadata file on the command line, replicating the behaviour of the `metadata` argument in `[collection]`.  
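
The precedence between the `.toml` file and the command line can be pictured as a dictionary merge in which command-line values win. This is only a sketch of the rule, not pear's actual implementation: the function name `merge_settings`, the flattened option names, and the file names are hypothetical.

```python
def merge_settings(toml_values, cli_values):
    """Return settings where explicitly given CLI values override TOML values.

    A CLI value of None is treated as "flag not given" (an assumption of this sketch).
    """
    merged = dict(toml_values)
    for key, value in cli_values.items():
        if value is not None:
            merged[key] = value
    return merged


# Hypothetical values as they might come from a config file ...
toml_values = {"method": "hashrf_RF", "metadata": "meta_from_toml.csv"}
# ... and from the command line, where only the metadata flag was given.
cli_values = {"method": None, "metadata": "meta_from_cli.csv"}

settings = merge_settings(toml_values, cli_values)
print(settings)
```

Here the distance method comes from the file, while the metadata path comes from the command line.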

****

## Examples
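
The runs in this section call `pear_ebi` from notebook cells with `!`. As a sketch, the same command line can also be assembled and launched from Python. `build_pear_command` and `run_pear` are hypothetical helpers, not part of pear; they only use the `--config` and `--meta` flags mentioned in this notebook, and the run is skipped when `pear_ebi` is not found on the PATH.

```python
import shutil
import subprocess


def build_pear_command(config=None, meta=None, inputs=()):
    """Assemble a pear_ebi command line as a list of arguments."""
    command = ["pear_ebi"]
    if config is not None:
        command += ["--config", str(config)]
    if meta is not None:
        command += ["--meta", str(meta)]
    command += [str(path) for path in inputs]
    return command


def run_pear(command):
    """Launch the command if pear_ebi is installed; return None otherwise."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=False)


command = build_pear_command(config="example_1.toml")
print(" ".join(command))
```

Calling `run_pear(command)` would then start the analysis, provided `pear_ebi` is installed in the active environment.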

In [None]:
!pear_ebi --config example_1.toml

PEAR v0.1.68
Looking into directory ../beast_trees/ - pattern: *run2*
Your input:
─────────────────────────────
 Tree set collection containing 3003 trees;
 File: Set_collection_da362999-8cb2-4d53-be47-6685abc25dea;
 Distance matrix: not computed.
─────────────────────────────
beast_run1; Containing 1001 trees.
beast_long; Containing 1001 trees.
beast_run2; Containing 1001 trees.

Calculating distances...
hashrf_RF | Done!
Embedding distances...

In [4]:
!pear_ebi --help

usage: PEAR [-h] [-o output] [--interactive] [-d distance_matrix]
            [--meta metadata] [-m METHOD] [--pca PCA] [--tsne TSNE] [--plot]
            [--config CONFIG] [--quality] [-dir DIR] [--pattern PATTERN]
            [input ...]

PEAR-EBI v0.1.68 | Phylogeny Embedding and Approximate Representation
Calculates Robison-Foulds distances between large set of trees

positional arguments:
  input                 input file : tree set in newic format

optional arguments:
  -h, --help            show this help message and exit
  -o output             output file : storage of distance matrix
  --interactive, -i     run the program in interactive mode
  -d distance_matrix, --dM distance_matrix
                        distance matrix : file of the distance matrix
  --meta metadata       metadata : csv file with metadata for each tree
  -m METHOD, --method METHOD
                        calculates tree distances using specified method
                        (hashrf_