# iPhylo CLI Demo

Welcome. This is a simple demo explaining each of the modules and arguments’ functionality.

In this demo, you can see the acceptable arguments and the example output. You can also refer to the example input files and commands under the directory [example](example).




## Set up
Please use the Python 3 environment. The develope and test environment is **Python 3.8**.
Please keep your network available for the first load, as well as enough space on your disk (about 2G), as the initial run needs to fetch the online database resources.


## Packages environment
git clone https://github.com/ARise-fox/iPhylo-CLI.git to clone this project to your local directory.
Use `pip install -r requirements.txt` to preload the required packages.

## Overall

The iPhylo CLI includes several modules:

1. **phylotree**: the phylo tree module
2. **chemtree**: the chemical tree module based on the local ChemOnt database
3. **chemonline**: the chemical tree module based on the online ClassyFire API
4. **csv2tree**: the CSV to tree module
5. **NPtree**: the chemical tree module based on the local NPClassifier database
6. **NPonline**: the chemical tree online module based on the online NPClassifier API

Users are able to use any of the modules by running the command `python iphylo.py <module> <args>`. In the following illustration, we will split the command into elements in the 'command' variable.

## phylotree module
This module is used to generate a phylogenetic tree of organisms, run the following command for help:
`python iphylo.py phylotree -h`

Usage:
`iphylo.py phylotree [-h] (-i ITEMS | -f FILE | --subtree SUBTREE) [-o PREFIX] [-fn FNAME] [-bl] [-interrupt (--p | --c | --o | --f | --g | --s)]`

The following is the parameter documentation:
- `-h`, `--help`
    Show help message and exit.

- `-i ITEMS`, `--input ITEMS`
    Input, separate the entries with commas in English format, mixed input supported.
    Example: `-input "Homo sapiens,Mus musculus,9031,7227,562"`

- `-f FILE`, `--file FILE`
    Input file path, needs to be a .txt file, separate by new line.
    Example: `-file species_taxid_for_tree.txt`

- `--subtree SUBTREE`
    Draw subtree of a certain taxon.
    Example: `-subtree Mammalia`. You can also use `-input xx|subtree` instead.

- `-o PREFIX`, `--prefix PREFIX`
    Output file directory, default `iphylo_files/` in the current project.

- `-fn FNAME`, `--fname FNAME`
    Output file name.

- `-bl`, `--branch_length`
    Boolean, choose whether you need branch length for the tree. Default is False.

- `-interrupt`
    Interrupt the tree at a specified taxonomic level. You need to input the level parameter to make it work.
    Example: `-file "species_for_tree_short.txt" -interrupt --g`

- `--p`
    Phylum level.

- `--c`
    Class level.

- `--o`
    Order level.

- `--f`
    Family level.

- `--g`
    Genus level.

- `--s`
    Species level.



Here is a simple demo showcase the phylotree submodule. Entries are entered in the command line with the `-i` parameter.

In [1]:
import subprocess

In [23]:
command = [
    'python',
    'iphylo.py',
    'phylotree',
    '-i',
    "Homo sapiens,Mus musculus,Gallus gallus,Drosophila melanogaster,Escherichia coli"
]

# Instead, you can run the following command in Terminal. The two results are equivalent.
# python iphylo.py phylotree -i "Homo sapiens,Mus musculus,Gallus gallus,Drosophila melanogaster,Escherichia coli"

result = subprocess.run(command, capture_output=True, text=True)
print(result.stdout)


database connect success!
Searching for 5 items… 
|████████████████████████████████████████| 5/5 [100%] in 0.0s (648.40/s) 
Searching for 0 items for subtree… 
|████████████████████████████████████████| 0 in 0.0s (0.00/s) 
Generate Tree Success!
Tree and ASCII tree is saved to: /Users/liyueer/PycharmProjects/iphylo_cmd_release/iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt
5 leaves in tree
0 names are replaced!
Running time: 0.3869498330000001 Seconds



The outputs include tree in Newick, Nexus, PhyloXML,and structure representation in ASCII and PDF.

In [8]:
# show output file content
newick_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt'

with open(newick_path, 'r') as file:
    file_content = file.read()

print(file_content)


((((((((s__Drosophila_melanogaster)g__Drosophila)f__Drosophilidae)o__Diptera)c__Insecta)p__Arthropoda,(((((s__Mus_musculus)g__Mus)f__Muridae)o__Rodentia,(((s__Homo_sapiens)g__Homo)f__Hominidae)o__Primates)c__Mammalia,((((s__Gallus_gallus)g__Gallus)f__Phasianidae)o__Galliformes)c__Aves)p__Chordata)k__Metazoa)d__Eukaryota,(((((((s__Escherichia_coli)g__Escherichia)f__Enterobacteriaceae)o__Enterobacterales)c__Gammaproteobacteria)p__Proteobacteria)k__Bacteria)d__Bacteria);



In [9]:
# show output ASCII
ASCII_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree_ascii_tree.txt'

with open(ASCII_path, 'r') as file:
    file_content = file.read()

print(file_content)


              _____ ______ _____ _____ _____ ______ s__Drosophila_melanogaster
             |
  _____ _____|             _____ _____ _____ ______ s__Mus_musculus
 |           |      ______|
 |           |_____|      |_____ _____ _____ ______ s__Homo_sapiens
_|                 |
 |                 |______ _____ _____ _____ ______ s__Gallus_gallus
 |
 |_____ _____ _____ ______ _____ _____ _____ ______ s__Escherichia_coli




Additionally, the program will generate a table `[file name]_items.csv` that includes species names, IDs, and classification information.

In [24]:
import pandas as pd
file_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree_items.csv'
df = pd.read_csv(file_path)
df

Unnamed: 0,taxid,input_term,domain,kingdom,phylum,class,order,family,genus,species
0,9606,Homo_sapiens,d__Eukaryota,k__Metazoa,p__Chordata,c__Mammalia,o__Primates,f__Hominidae,g__Homo,s__Homo_sapiens
1,9031,Gallus_gallus,d__Eukaryota,k__Metazoa,p__Chordata,c__Aves,o__Galliformes,f__Phasianidae,g__Gallus,s__Gallus_gallus
2,10090,Mus_musculus,d__Eukaryota,k__Metazoa,p__Chordata,c__Mammalia,o__Rodentia,f__Muridae,g__Mus,s__Mus_musculus
3,562,Escherichia_coli,d__Bacteria,k__Bacteria,p__Proteobacteria,c__Gammaproteobacteria,o__Enterobacterales,f__Enterobacteriaceae,g__Escherichia,s__Escherichia_coli
4,7227,Drosophila_melanogaster,d__Eukaryota,k__Metazoa,p__Arthropoda,c__Insecta,o__Diptera,f__Drosophilidae,g__Drosophila,s__Drosophila_melanogaster


Also, you can define the input in a file, using `-f FILE`, `--file FILE`. Take the input file `example/taxids_50.txt` as an example.
Set the output directory and file name using `-o PREFIX`, `--prefix PREFIX` and `-fn FNAME`, `--fname FNAME`

In [11]:
command = [
    'python',
    'iphylo.py',
    'phylotree',
    '-f',
    'example/taxids_50.txt',
    '-o',
    'iphylo_files',
    '-fn',
    'iPHYLO_Tree_2'
]
result = subprocess.run(command, capture_output=True, text=True)
print(result.stdout)

database connect success!
Searching for 50 items… 
|████████████████████████████████████████| 50/50 [100%] in 0.0s (684931.51/s) 
Searching for 0 items for subtree… 
|████████████████████████████████████████| 0 in 0.0s (0.00/s) 
Generate Tree Success!
Tree and ASCII tree is saved to: iphylo_files/iPHYLO_Tree_2/iPHYLO_Tree_2.txt
50 leaves in tree
0 names are replaced!
Running time: 0.36514125 Seconds



Here is an example use case that specifies generating a tree with branch lengths using the `-bl` parameter and filtering at the genus level using the `-interrupt`, `--g` parameter.

In [16]:
command = [
    'python',
    'iphylo.py',
    'phylotree',
    '-i',
    'Homo sapiens,Mus musculus,Gallus gallus,Drosophila melanogaster,Escherichia coli',
    '-bl',
    '-interrupt',
    '--g'
]

result = subprocess.run(command, capture_output=True, text=True)


In the result file, all branches are set length to 1 and all leave nodes are at genus level.

In [17]:
# show output
file_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt'
ASCII_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree_ascii_tree.txt'


with open(file_path, 'r') as file:
    file_content = file.read()
print(file_content)

with open(ASCII_path, 'r') as file:
    ASCII_content = file.read()
print(ASCII_content)

(((((((g__Escherichia:1.00000)f__Enterobacteriaceae:1.00000)o__Enterobacterales:1.00000)c__Gammaproteobacteria:1.00000)p__Proteobacteria:1.00000)k__Bacteria:1.00000)d__Bacteria:1.00000,((((((g__Homo:1.00000)f__Hominidae:1.00000)o__Primates:1.00000,((g__Mus:1.00000)f__Muridae:1.00000)o__Rodentia:1.00000)c__Mammalia:1.00000,(((g__Gallus:1.00000)f__Phasianidae:1.00000)o__Galliformes:1.00000)c__Aves:1.00000)p__Chordata:1.00000,((((g__Drosophila:1.00000)f__Drosophilidae:1.00000)o__Diptera:1.00000)c__Insecta:1.00000)p__Arthropoda:1.00000)k__Metazoa:1.00000)d__Eukaryota:1.00000):1.00000;

         _______ _______ _______ ______ _______ _______ _______ g__Escherichia
        |
        |                               _______ _______ _______ g__Homo
________|                        ______|
        |                _______|      |_______ _______ _______ g__Mus
        |               |       |
        |_______ _______|       |______ _______ _______ _______ g__Gallus
                        |
    

You can quickly obtain a full-clade tree for any taxonomic level using the `|subtree` operator. For example, to get the tree with all descendants within the specified common ancestor clade "Primates", use `-i Primates|subtree` or `--subtree Primates`.

In [14]:
command = [
    'python',
    'iphylo.py',
    'phylotree',
    '--subtree',
    'Primates'
]

result = subprocess.run(command, capture_output=True, text=True)
file_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt'
with open(file_path, 'r') as file:
    file_content = file.read()
print(file_content)

((((((((s__Cercopithecus_neglectus,(Cercopithecus_mona_campbelli,Cercopithecus_mona_mona)s__Cercopithecus_mona,(Cercopithecus_mitis_opisthostictus,Cercopithecus_mitis_stuhlmanni,Cercopithecus_mitis_mitis,Cercopithecus_mitis_heymansi,Cercopithecus_mitis_boutourlinii)s__Cercopithecus_mitis,s__Cercopithecus_diana,(Cercopithecus_ascanius_whitesidei,Cercopithecus_ascanius_schmidti,Cercopithecus_ascanius_katangae,Cercopithecus_ascanius_ascanius)s__Cercopithecus_ascanius,(Cercopithecus_pogonias_schwarzianus,Cercopithecus_pogonias_pogonias,Cercopithecus_pogonias_nigripes,Cercopithecus_pogonias_grayi)s__Cercopithecus_pogonias,(Cercopithecus_erythrogaster_erythrogaster,Cercopithecus_erythrogaster_pococki)s__Cercopithecus_erythrogaster,(Cercopithecus_campbelli_lowei)s__Cercopithecus_campbelli,(Cercopithecus_nictitans_nictitans,Cercopithecus_nictitans_martini)s__Cercopithecus_nictitans,(Cercopithecus_hamlyni_hamlyni)s__Cercopithecus_hamlyni,(Cercopithecus_cephus_ngottoensis,Cercopithecus_cephus_ce

## chemtree module
In the chemtree module, you can generate a chemical taxonomic tree using the ChemOnt classification system based on the local database. This database includes 801,308 functional compounds compiled from the MONA, GNPS, and NIST databases.
To view parameter help, please run the command  `python iphylo.py chemtree -h`.

Usage:
`iphylo.py chemtree [-h] (-i ITEMS | -f FILE) [-o PREFIX] [-fn FNAME] [-bl] [-interrupt (--Super | --Class | --Sub)] `

In the chemical module, we recommend uploading compound information from a file. The accepted chemical identifiers for file uploads are InChIKey, InChI, and isomeric SMILES. When using the `-i` command to input data from the keyboard, only InChIKey is accepted.

The followings are few demos:


In [26]:
# Generate a chemical tree from files
command = [
    'python',
    'iphylo.py',
    'chemtree',
    '-f',
    'example/inchikeys.txt'
]

result = subprocess.run(command, capture_output=True, text=True)
file_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt'
with open(file_path, 'r') as file:
    file_content = file.read()
print(file_content)

(((((498799,514138,918571,1110037)Alcohols_and_polyols)Organooxygen_compounds)Organic_oxygen_compounds,(((498798,506424,933921,1124895)Triradylcglycerols,(1201720,445446)Glycosylglycerols,(1063982,1194005)Diradylglycerols)Glycerolipids,((254057,514064)Glycosphingolipids,(246370)Phosphosphingolipids,(636000)Ceramides)Sphingolipids,((605797)Glycerophosphoserines,(544862,720469)Glycerophosphocholines,(323154,414746)Glycerophosphoglycerophosphoglycerols,(956983)Glycerophosphoethanolamines)Glycerophospholipids,((430159,537138,773676,712717,712712)Fatty_acid_esters)Fatty_Acyls,(1063981)Saccharolipids)Lipids_and_lipid_like_molecules,(((972354,1056314)Quaternary_ammonium_salts)Organonitrogen_compounds)Organic_nitrogen_compounds)Organic_compounds);



To avoid errors in Newick format parsing caused by symbols in the names of compounds in the chemical tree, use numerical identifiers for each input compound.

The correspondence table between the IDs and compound classification information is available in the `items.csv` file.

In [30]:
import pandas as pd
file_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree_items.csv'
df = pd.read_csv(file_path)
df

Unnamed: 0,id,name,inchi,inchikey,inchikey_Planar,SMILES,kingdom,superclass,class,subclass,parent_level_1,parent_level_2
0,602499,N-Acetylserotonin - 40.0 eV,,MVAWJSIDNICKHF-UHFFFAOYSA-N,,CC(=O)NCCc1c[nH]c2c1cc(cc2)O,Organic_compounds,Organoheterocyclic_compounds,Indoles_and_derivatives,Hydroxyindoles,,
1,4232,"Acetamide, N-[2-(6-hydroxy-5-methoxy-1H-indol-...",,OMYMRCXOJJZYKE-UHFFFAOYSA-N,,,Organic_compounds,Organoheterocyclic_compounds,Indoles_and_derivatives,Hydroxyindoles,,
2,719369,"1,2,3-.eta.-4,5,6-.eta.-Pentalenebis(cobalt di...",,VTXICXWVUFRZHJ-UHFFFAOYSA-N,,,Organic_compounds,Hydrocarbons,Unsaturated_hydrocarbons,Olefins,Cyclic_olefins,Pentalenes
3,4233,MLS000860057-01!6-Hydroxymelatonin,,OMYMRCXOJJZYKE-UHFFFAOYSA-N,,COc1cc2c(CCNC(=O)C)c[nH]c2cc1O,Organic_compounds,Organoheterocyclic_compounds,Indoles_and_derivatives,Hydroxyindoles,,
4,350601,MLS001074886-01!BIO,,JBFAMTBDRIBCKO-UHFFFAOYSA-N,,ON=C1C(=Nc2ccccc12)c3c(O)[nH]c4cc(Br)ccc34,Organic_compounds,Organoheterocyclic_compounds,Indoles_and_derivatives,Hydroxyindoles,,
5,444168,"Pentan-1-one, 2-hydroxy-1-(2-hydroxy-3-indolyl...",,UGGBBXIKZRXTJW-VHEBQXMUSA-N,,,Organic_compounds,Organoheterocyclic_compounds,Indoles_and_derivatives,Hydroxyindoles,,
6,299543,"Butan-1-one, 1-(5-bromo-2-hydroxy-3-indolylazo...",,NCJVDEKFYQSEFS-ISLYRVAYSA-N,,,Organic_compounds,Organoheterocyclic_compounds,Indoles_and_derivatives,Hydroxyindoles,,
7,376087,"Pentan-1-one, 1-(5-bromo-2-hydroxy-3-indolylaz...",,GUGHLEKVUFXOKN-FMQUCBEESA-N,,,Organic_compounds,Organoheterocyclic_compounds,Indoles_and_derivatives,Hydroxyindoles,,
8,612254,"1H-Indol-5-ol, 4-phenyl-3-(2-phenylethyl)-",,GFIXGUSGVTTZKF-UHFFFAOYSA-N,,,Organic_compounds,Organoheterocyclic_compounds,Indoles_and_derivatives,Hydroxyindoles,,
9,612771,6-Hydroxy-3-[2-methyl-2-[2-hydroxy-3-(2-cyanop...,,IUKUBLQOWNHWPE-UHFFFAOYSA-N,,,Organic_compounds,Organoheterocyclic_compounds,Indoles_and_derivatives,Hydroxyindoles,,


The chemical taxonomic tree, as structured by the ChemOnt classification system, organizes compounds hierarchically from the highest to the lowest levels: Kingdom, Superclass, Class, Subclass, Parent Level 1, and Parent Level 2.

In this use case, we interrupt the result tree at Class level.

In [27]:
## Interrupt the tree at Class level, using '-interrupt' and '--Class'
## Other parameters: --Super for Superclass, --sub for Subclass
command = [
    'python',
    'iphylo.py',
    'chemtree',
    '-f',
    'example/inchikeys.txt',
    '-interrupt',
    '--Class'
]
result = subprocess.run(command, capture_output=True, text=True)
file_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt'
with open(file_path, 'r') as file:
    file_content = file.read()
print(file_content)

((((918571,1110037,514138,498799)Organooxygen_compounds)Organic_oxygen_compounds,((933921,1194005,1124895,1063982,445446,498798,506424,1201720)Glycerolipids,(956983,605797,414746,544862,720469,323154)Glycerophospholipids,(636000,254057,246370,514064)Sphingolipids,(773676,430159,537138,712717,712712)Fatty_Acyls,(1063981)Saccharolipids)Lipids_and_lipid_like_molecules,((972354,1056314)Organonitrogen_compounds)Organic_nitrogen_compounds)Organic_compounds);



In the chemical module, you can also generate a subtree.

Unlike biological phylogenetic trees, where the input is a species name, here the input is a chemical classification name.

 For example, you can generate subtrees for the chemical classifications "Pentalenes" and "Hydroxyindoles".

In [31]:
## Build a subtree
command = [
    'python',
    'iphylo.py',
    'chemtree',
    '-i',
    'Pentalenes|subtree,Hydroxyindoles|subtree'
]
subprocess.run(command, text=True)
file_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt'
with open(file_path, 'r') as file:
    file_content = file.read()
print("\n", file_content)

database connect success!
|████████████████████████████████████████| 2/2 [100%] in 0.0s (714.71/s) 
Generate Tree Success!
Tree and ASCII tree is saved to: /Users/liyueer/PycharmProjects/iphylo_cmd_release/iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt
39 leaves in tree
Running time: 0.403202375 Seconds

 (((((1053055,602494,20858,1107958,1053040,236399,840558,694765,85354,749161,405859,405219,487770,336856,619853,375500,49098,594888,594882,16959,1137088,756671,657597,672184,260534,786741,1068596,1009710,612771,612254,376087,299543,444168,350601,4233,4232,602499)Hydroxyindoles)Indoles_and_derivatives)Organoheterocyclic_compounds,(((1109859,719369)Olefins)Unsaturated_hydrocarbons)Hydrocarbons)Organic_compounds);



# chemonline module
This module is an extension of the chemtree module, which allows you to query compounds categorized outside of the local database, and requires a network connection for its use.

In this module, classifications also use the ChemOnt classification system, which is implemented by calling the ClassyFire API. Currently, this module supports queries for over 70 million chemicals.

Usage:
`iphylo.py chemonline [-h] (-i ITEMS | -f FILE) [-o PREFIX] [-fn FNAME] [-bl] [-interrupt (--Super | --Class | --Sub)] [-x THREADS]`

The input chemicals MUST be in InChIKey format. This requirement must be especially noted.

The parameter list is similar to the parameters mentioned in the phylotree module, with the addition of a parameter for specifying multithreading.
- `-x THREADS, --threads THREADS`
   Threads number, default 3.


In [38]:
# python iphylo.py chemonline -f 'example/inchikeys_for_online.txt' -x 12
# threads num: 12
command = [
    'python',
    'iphylo.py',
    'chemonline',
    '-f',
    'example/inchikeys_for_online.txt',
    '-x',
    '12'
]
subprocess.run(command, text=True)
file_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt'
with open(file_path, 'r') as file:
    file_content = file.read()
print("\n", file_content)

on 0: Search round 1,  13 items to process
|████████████████████████████████████████| 13/13 [100%] in 1:06.6 (0.20/s) 
on 0: Search round 2,  0 items to process
|████████████████████████████████████████| 0 in 2.0s (0.00/s) 
on 0: Search round 3,  0 items to process
|████████████████████████████████████████| 0 in 2.0s (0.00/s) 
Generate Tree Success!
Tree and ASCII tree is saved to: /Users/liyueer/PycharmProjects/iphylo_cmd_release/iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt
13 leaves in tree
------These 0 inchikeys are not found in classyfire------
[]

 (((((13,7)Glycosphingolipids)Sphingolipids,(((9)Phosphatidylethanolamines)Glycerophosphoethanolamines,((8)Phosphatidylcholines)Glycerophosphocholines,((6)Phosphatidylglycerols)Glycerophosphoglycerols,(3)Lysobisphosphatidic_acids)Glycerophospholipids,(((5)Triacylglycerols)Triradylcglycerols)Glycerolipids,((1)Triterpenoids)Prenol_lipids)Lipids_and_lipid_like_molecules,(((((12)Aminoglycosides)Aminosaccharides)Carbohydrates_and_carbohydrate_co

## NPtree module

The NPtree module allows you to build a chemical taxonomic tree using the InChI, InChIKeys, or isomeric SMILES of compounds. The classification system NPClassifier is used, but performed locally.

NPClassifier system categorizes compounds into Pathway, Superclass, and Class. Thus, you are allowed to interrupt the tree at "pathway" or "superclass" level.

usage: `iphylo.py NPtree [-h] [-h] (-i ITEMS | -f FILE) [-o PREFIX] [-fn FNAME] [-bl] [-interrupt (--pathway | --superclass)] `


In [36]:
# python iphylo.py NPtree -i "OZVHLPSQDONQQJ-BRFBDGPJSA-N,PMKLGQLZGZOFRC-IKPAITLHSA-N,QUZVJYJCAHCGDH-NQLNTKRDSA-N,RBDQPRHHJDVFEQ-XYCAVOTASA-N" -interrupt --superclass
# input InChIKeys using "-i", interrupt at superclass level
command = [
    'python',
    'iphylo.py',
    'NPtree',
    '-i',
    'OZVHLPSQDONQQJ-BRFBDGPJSA-N,PMKLGQLZGZOFRC-IKPAITLHSA-N,QUZVJYJCAHCGDH-NQLNTKRDSA-N,RBDQPRHHJDVFEQ-XYCAVOTASA-N',
    '-interrupt',
    '--superclass'
]
subprocess.run(command, text=True)
file_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt'
with open(file_path, 'r') as file:
    file_content = file.read()
print("\n", file_content)

database connect success!
|████████████████████████████████████████| 4/4 [100%] in 0.0s (3592.54/s) 
Generate Tree Success!
Tree and ASCII tree is saved to: /Users/liyueer/PycharmProjects/iphylo_cmd_release/iphylo_files/iPHYLO_NP_Tree/iPHYLO_NP_Tree.txt
4 leaves in tree
Running time: 0.344253833 Seconds

 ((((((13)Phosphatidylethanolamines)Glycerophosphoethanolamines,((11)Phosphatidylglycerols)Glycerophosphoglycerols,(7)Lysobisphosphatidic_acids,((6)Phosphatidylcholines)Glycerophosphocholines)Glycerophospholipids,((8,5)Glycosphingolipids)Sphingolipids,((4)Triterpenoids)Prenol_lipids,(((3)Triacylglycerols)Triradylcglycerols)Glycerolipids)Lipids_and_lipid_like_molecules,((((12,9)Secondary_alcohols,((2)Cyclitols_and_derivatives)Cyclic_alcohols_and_derivatives)Alcohols_and_polyols,(((10)Aminoglycosides)Aminosaccharides)Carbohydrates_and_carbohydrate_conjugates)Organooxygen_compounds)Organic_oxygen_compounds,((1)Pyranodioxins)Organoheterocyclic_compounds)Organic_compounds);



## NPonline Module

The NPonline module is designed to build a chemical taxonomic tree using the isomeric SMILES of compounds.

This module only accepts input in the form of a file, and the compounds must be specified as isomeric SMILES.

It leverages the NPClassifier’s API, which allows for the retrieval of more than 1.3 million natural products, ensuring comprehensive classification and accurate results.

Usage:
`iphylo.py NPonline [-h] <-f FILE> [-o PREFIX] [-fn FNAME] [-bl] [-interrupt (--pathway | --superclass)] [-x THREADS]`


In [2]:
# python iphylo.py NPonline -f "./example/smiles.txt" -x 10
command = [
    'python',
    'iphylo.py',
    'NPonline',
    '-f',
    './example/smiles.txt',
    '-x',
    '10'
]
subprocess.run(command, text=True)
file_path = 'iphylo_files/iPHYLO_Tree/iPHYLO_Tree.txt'
with open(file_path, 'r') as file:
    file_content = file.read()
print("\n", file_content)

on 0: Search round 1,  20 items to process
on 20: Exiting Main Thread
|████████████████████████████████████████| 20/20 [100%] in 2.2s (9.08/s) 
on 0: Search round 2,  0 items to process
on 0: Exiting Main Thread
|████████████████████████████████████████| 0 in 0.0s (0.00/s) 
on 0: Search round 3,  0 items to process
on 0: Exiting Main Thread
|████████████████████████████████████████| 0 in 0.0s (0.00/s) 
Generate Tree Success!
Tree and ASCII tree is saved to: /Users/liyueer/PycharmProjects/iphylo_cmd_release/iphylo_files/NP_Tree/NP_Tree.txt
20 leaves in tree

 (((((13,7)Glycosphingolipids)Sphingolipids,(((9)Phosphatidylethanolamines)Glycerophosphoethanolamines,((8)Phosphatidylcholines)Glycerophosphocholines,((6)Phosphatidylglycerols)Glycerophosphoglycerols,(3)Lysobisphosphatidic_acids)Glycerophospholipids,(((5)Triacylglycerols)Triradylcglycerols)Glycerolipids,((1)Triterpenoids)Prenol_lipids)Lipids_and_lipid_like_molecules,(((((12)Aminoglycosides)Aminosaccharides)Carbohydrates_and_carbohy

## CSV2Tree module

You can use this module to build a tree for any data with a hierarchy.

The data should be presented in a csv table, with each row representing a categorized piece of information and each column representing a categorization level.

usage: `iphylo.py csv2tree [-h] <-f FILE> [-o PREFIX] [-fn OUT_NAME] [-fg] [-xh] [-bl]`

Explanation:

- `-fg`, `--fill-gap`
    fill the classification gap (some missing classificaltion level) in each row
- `-xh`, `--header`
    remove input file's header row

Here is an example of statistical analysis methods:

In [3]:
import pandas as pd
file_path = 'example/statistical_methods.csv'
df = pd.read_csv(file_path)
df

Unnamed: 0,Statistical Analysis,Hypothesis Tests,Parametric Tests,One Sample,One sample t test,Unnamed: 5
0,Statistical Analysis,Hypothesis Tests,Parametric Tests,One Sample,z test,
1,Statistical Analysis,Hypothesis Tests,Parametric Tests,Two Samples,Independent Samples,Two-group t test
2,Statistical Analysis,Hypothesis Tests,Parametric Tests,Two Samples,Independent Samples,Z test
3,Statistical Analysis,Hypothesis Tests,Parametric Tests,Two Samples,Paired Samples,Paired t test
4,Statistical Analysis,Hypothesis Tests,Nonparametric Tests,One Sample,Chi-square,
5,Statistical Analysis,Hypothesis Tests,Nonparametric Tests,One Sample,Kolmogorov-Smirnov,
6,Statistical Analysis,Hypothesis Tests,Nonparametric Tests,One Sample,Runs,
7,Statistical Analysis,Hypothesis Tests,Nonparametric Tests,One Sample,Binomial,
8,Statistical Analysis,Hypothesis Tests,Nonparametric Tests,Two Samples,Independent Samples,Chi-square
9,Statistical Analysis,Hypothesis Tests,Nonparametric Tests,Two Samples,Independent Samples,Mann-Whitney


In [7]:
# python iphylo.py csv2tree -f 'example/statistical_methods.csv' -fn statistical_methods

import subprocess

command = [
    'python',
    'iphylo.py',
    'csv2tree',
    '-f',
    'example/statistical_methods.csv',
    '-fn',
    'statistical_methods'
]
subprocess.run(command, text=True)
file_path = 'iphylo_files/statistical_methods.txt'
with open(file_path, 'r') as file:
    file_content = file.read()
print("\n", file_content)


There are same child nodes with different parent nodes:

Child Node: One_Sample
Parent Nodes: Parametric_Tests, Nonparametric_Tests

Child Node: Two_Samples
Parent Nodes: Parametric_Tests, Nonparametric_Tests

Child Node: Multiple_Samples
Parent Nodes: Parametric_Tests, Nonparametric_Tests

Child Node: Independent_Samples
Parent Nodes: Multiple_Samples, Two_Samples

Child Node: Paired_Samples
Parent Nodes: Multiple_Samples, Two_Samples

Child Node: Chi-square
Parent Nodes: Independent_Samples, Paired_Samples

generate tree success
The tree is saved to: /Users/liyueer/PycharmProjects/iphylo_cmd_release/iphylo_files/statistical_methods.txt

 (((((DCCA,CCA)Unimodal_model,(RDA)Linear_model)Direct_Gradient_Analysis,((DCA,CA,PCA)Eigenanalysis-based_approaches,(Polar_ordination,PCoA,NMDS)Distance-based_approaches)Indirect_Gradient_Analysis)Ordination_Methods,((((Repeated_Measures_ANOVA)Paired_Samples,(One-way_ANOVA)Independent_Samples)Multiple_Samples,((Paired_t_test)Paired_Samples,(Z_test,T

This is a case using `-fg`, `--fill_gap` to fill the column gaps in the data frame, and remove input file’s header row

The NaN gap in Col3, Row4 will be replaced by "Two_Samples+", according to it's child's name "Two Samples"

In [9]:
import pandas as pd
file_path = 'example/statistical_methods_2.csv'
df = pd.read_csv(file_path)
df

Unnamed: 0,col_1,col_2,col_3,col_4,col_5,col_6
0,Statistical Analysis,Hypothesis Tests,Parametric Tests,One Sample,One sample t test,
1,Statistical Analysis,Hypothesis Tests,Parametric Tests,One Sample,z test,
2,Statistical Analysis,Hypothesis Tests,Parametric Tests,Two Samples,Independent Samples,Two-group t test
3,Statistical Analysis,Hypothesis Tests,Parametric Tests,Two Samples,Independent Samples,Z test
4,Statistical Analysis,Hypothesis Tests,,Two Samples,Paired Samples,Paired t test
5,Statistical Analysis,Hypothesis Tests,Nonparametric Tests,One Sample,Chi-square,
6,Statistical Analysis,Hypothesis Tests,Nonparametric Tests,One Sample,Kolmogorov-Smirnov,
7,Statistical Analysis,Hypothesis Tests,Nonparametric Tests,One Sample,Runs,
8,Statistical Analysis,Hypothesis Tests,Nonparametric Tests,One Sample,Binomial,
9,Statistical Analysis,Hypothesis Tests,Nonparametric Tests,Two Samples,Independent Samples,Chi-square


In [11]:
# python iphylo.py csv2tree -f 'example/statistical_methods_2.csv' -fg -xh

import subprocess

command = [
    'python',
    'iphylo.py',
    'csv2tree',
    '-f',
    'example/statistical_methods_2.csv',
    '-fg',
    '-xh'
]
subprocess.run(command, text=True)
file_path = 'iphylo_files/iPHYLO_Tree.txt'
with open(file_path, 'r') as file:
    file_content = file.read()
print("\n", file_content)


There are same child nodes with different parent nodes:

Child Node: One_Sample
Parent Nodes: Nonparametric_Tests, Parametric_Tests

Child Node: Two_Samples
Parent Nodes: nan, Nonparametric_Tests, Parametric_Tests

Child Node: Multiple_Samples
Parent Nodes: Nonparametric_Tests, Parametric_Tests

Child Node: Chi-square
Parent Nodes: Independent_Samples, Paired_Samples

generate tree success
The tree is saved to: /Users/liyueer/PycharmProjects/iphylo_cmd_release/iphylo_files/iPHYLO_Tree.txt

 (((((DCCA,CCA)Unimodal_model,(RDA)Linear_model)Direct_Gradient_Analysis,((DCA,CA,PCA)Eigenanalysis-based_approaches,(Polar_ordination,PCoA,NMDS)Distance-based_approaches)Indirect_Gradient_Analysis)Ordination_Methods,((((Repeated_Measures_ANOVA)Repeated_Measures_ANOVA+,(One-way_ANOVA)One-way_ANOVA+)Multiple_Samples,((Z_test,Two-group_t_test)Independent_Samples)Two_Samples,(z_test)One_Sample)Parametric_Tests,(((Cochran_s_Q_test)Cochran_s_Q_test+,(Kendall_s_W_test)Kendall_s_W_test+,(Friedman_test)Frie