# Merging of Metabolic Network Graphs: ANI Calculations

The purpose of this notebook is to identify criteria for classifying GFMs (without 16S sequence data) as members of a particular tribe, based on shared average nucleotide identity (ANI) and genome coverage. The metabolic models of these genomes will then be merged to give a model that is more representative of a particular tribe. Our reverse ecology analysis will then be performed on these merged models.

## Merging Genomes Based on ANI
Pairwise ANI values were computed by Sarah using the method of Goris et al. (2007). This notebook examines pairwise ANI and coverage values between all members within a tribe to identify minimum ANI and coverage criteria for adding a new genome to the tribe. Then I identify genomes which are candidates for inclusion within a tribe (based on phylogeny) and determine whether or not they can be included within that tribe (based on ANI and coverage).

#### References
1. Konstantinidis, K. T., & Tiedje, J. M. (2005). Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences, 102(7), 2567–2572.
2. Goris, J., Konstantinidis, K. T., Klappenbach, J. A., Coenye, T., Vandamme, P., & Tiedje, J. M. (2007). DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. International Journal of Systematic and Evolutionary Microbiology, 57(1), 81–91.

The first chunk of code imports the Python packages necessary for this analysis.

In [1]:
# Import special features for iPython
import sys
sys.path.append('../Python')
import matplotlib
%matplotlib inline

# Import Python modules 
# These custom-written modules should have been included with the package
# distribution. 
import pairwiseANIFunctions as ANI

# Define local folder structure for data input and processing.
externalDataDir = 'ExternalData'

### Using SAGs to Compute Cutoffs

I want to identify ANI and coverage cutoffs, such that any two genomes belonging to the same tribe have ANI and coverage greater than this value. The function below computes the minimum and maximum pairwise ANI (or coverage) for all genomes belonging to the same tribe.
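
As a rough illustration of what such a function might compute, here is a minimal sketch in plain Python. It is not the implementation in `pairwiseANIFunctions`; the nested-dictionary layout and the names `pairwise_ani`, `tribe_of` and `same_tribe_range` are assumptions made for this example. The values are copied from this notebook's output for acI-A1 and Iluma-A2, and the single value reported for Iluma-A2 is assumed to be a self-comparison.

```python
from itertools import product

# Pairwise ANI stored as pairwise_ani[query][reference] (assumed layout).
# Values are copied from this notebook's output.
pairwise_ani = {
    "AAA027M14": {"AAA027M14": 100.00, "AAA278O22": 79.56},
    "AAA278O22": {"AAA027M14": 79.71, "AAA278O22": 100.00},
    "AAA027E14": {"AAA027E14": 99.97},
}

# Tribe assignment for each genome.
tribe_of = {
    "AAA027M14": "acI-A1",
    "AAA278O22": "acI-A1",
    "AAA027E14": "Iluma-A2",
}


def same_tribe_range(pairwise, tribe_of):
    """Return {tribe: (max, min)} over all ordered pairs within each tribe.

    Self-comparisons are included, which is why a tribe with a single
    genome still reports a value.
    """
    members = {}
    for genome, tribe in tribe_of.items():
        members.setdefault(tribe, []).append(genome)

    result = {}
    for tribe, genomes in members.items():
        values = [pairwise[q][r] for q, r in product(genomes, repeat=2)]
        result[tribe] = (max(values), min(values))
    return result


ranges = same_tribe_range(pairwise_ani, tribe_of)
for tribe, (largest, smallest) in sorted(ranges.items()):
    print(tribe, largest, smallest)
```

Including the diagonal explains why the Max column is close to 100 for every tribe; only the Min column is informative for choosing a cutoff.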

In [2]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomySAGs.csv')
ANI.sameTribePairwiseANI(externalDataDir, pairwiseANI, taxonClass, tribes, 'ANI_out')

| Tribe | Samples | Num Samples | Max | Min |
|---|---|---|---|---|
| Iluma-A2 | [AAA027E14] | 1 | 99.97 | 99.97 |
| Iluma-B1 | [AAA027L17] | 1 | 99.54 | 99.54 |
| Iluma-B2 | [AAA028K15] | 1 | 99.53 | 99.53 |
| Luna1-A2 | [AAA028P02, Rhodoluna] | 2 | 100.0 | 76.68 |
| acI-A1 | [AAA027M14, AAA278O22] | 2 | 100.0 | 79.56 |
| acI-A5 | [AAA028G02, AAA044O16] | 2 | 99.95 | 86.69 |
| acI-A6 | [AAA028E20, AAA028I14] | 2 | 100.0 | 81.17 |
| acI-A7 | [AAA023J06, AAA024D14, AAA041L13, AAA044N04] | 4 | 100.0 | 87.13 |
| acI-B1 | [AAA023D18, AAA027J17, AAA027L06, AAA028A23, A... | 6 | 100.0 | 79.06 |
| acI-B2 | [bin_7_acI-B2] | 1 | 98.13 | 98.13 |


In [3]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'COV_out', 'taxonomySAGs.csv')
ANI.sameTribePairwiseANI(externalDataDir, pairwiseANI, taxonClass, tribes, 'COV_out')

| Tribe | Samples | Num Samples | Max | Min |
|---|---|---|---|---|
| Iluma-A2 | [AAA027E14] | 1 | 100.01 | 100.01 |
| Iluma-B1 | [AAA027L17] | 1 | 101.72 | 101.72 |
| Iluma-B2 | [AAA028K15] | 1 | 102.41 | 102.41 |
| Luna1-A2 | [AAA028P02, Rhodoluna] | 2 | 100.38 | 5.4 |
| acI-A1 | [AAA027M14, AAA278O22] | 2 | 100.66 | 22.69 |
| acI-A5 | [AAA028G02, AAA044O16] | 2 | 101.07 | 52.23 |
| acI-A6 | [AAA028E20, AAA028I14] | 2 | 105.56 | 16.34 |
| acI-A7 | [AAA023J06, AAA024D14, AAA041L13, AAA044N04] | 4 | 105.08 | 35.85 |
| acI-B1 | [AAA023D18, AAA027J17, AAA027L06, AAA028A23, A... | 6 | 109.56 | 18.96 |
| acI-B2 | [bin_7_acI-B2] | 1 | 158.31 | 158.31 |


These results suggest cutoffs of 76.68% ANI and 5.4% coverage for merging an additional genome into a tribe. **This coverage cutoff is incredibly low.** Let's step through the workflow and see whether any cases arise where it would influence classification.
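
The two cutoffs are simply the smallest entries in the Min columns of the tables above. The sketch below recomputes them from the tabulated values; the dictionary names and the helper `loosest` are illustrative and not part of `pairwiseANIFunctions`.

```python
# Per-tribe minimum ANI and coverage, copied from the Min columns above.
min_ani = {
    "Iluma-A2": 99.97, "Iluma-B1": 99.54, "Iluma-B2": 99.53,
    "Luna1-A2": 76.68, "acI-A1": 79.56, "acI-A5": 86.69,
    "acI-A6": 81.17, "acI-A7": 87.13, "acI-B1": 79.06, "acI-B2": 98.13,
}
min_cov = {
    "Iluma-A2": 100.01, "Iluma-B1": 101.72, "Iluma-B2": 102.41,
    "Luna1-A2": 5.4, "acI-A1": 22.69, "acI-A5": 52.23,
    "acI-A6": 16.34, "acI-A7": 35.85, "acI-B1": 18.96, "acI-B2": 158.31,
}


def loosest(per_tribe_min):
    """Return the tribe with the smallest within-tribe value, and that value."""
    tribe = min(per_tribe_min, key=per_tribe_min.get)
    return tribe, per_tribe_min[tribe]


ani_tribe, ani_cutoff = loosest(min_ani)
cov_tribe, cov_cutoff = loosest(min_cov)
print("ANI cutoff:", ani_cutoff, "set by", ani_tribe)
print("Coverage cutoff:", cov_cutoff, "set by", cov_tribe)
```

Both cutoffs are set by the same tribe, Luna1-A2 (AAA028P02 and Rhodoluna), so the thresholds depend heavily on that single pair.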

So, let's take a look at our phylogenetic tree and identify opportunities to merge genomes. 

A new genome which gets added to the tribe must:
* have ANI and coverage above the cutoff
* maintain the tribe as a monophyletic group

![Phylogenetic tree](imageFiles/2015-08-19-Tree-16S&Taxon&Clustering.png)

First we will define a function to compute the minimum pairwise ANI within a particular tribe. The function will also compute the minimum pairwise ANI with the new genome added. If this value is lower than the cutoff defined above, then the additional genome is too divergent to be added to the tribe.

### Cases Involving a Tribe with Multiple SAGs
One candidate for addition to a tribe is MEint2297 to tribe acI-A1. Let's check it out:

In [4]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.addGenomeToTribe(pairwiseANI, taxonClass, tribes, 'acI-A1', 'MEint2297')

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'COV_out', 'taxonomy.csv')
ANI.addGenomeToTribe(pairwiseANI, taxonClass, tribes, 'acI-A1', 'MEint2297')

           AAA027M14  AAA278O22  MEint2297
AAA027M14     100.00      79.56      75.47
AAA278O22      79.71     100.00      75.62
MEint2297      75.66      75.52      99.44
79.56

The minimum within the tribe acI-A1 is: 79.56

When genome MEint2297 is added, the new minimum becomes: 75.47
           AAA027M14  AAA278O22  MEint2297
AAA027M14     100.66      22.69      12.34
AAA278O22      22.83     100.56       7.71
MEint2297      11.50       7.80     113.07
22.69

The minimum within the tribe acI-A1 is: 22.69

When genome MEint2297 is added, the new minimum becomes: 7.71


With MEint2297 added, the minimum pairwise ANI (75.47%) falls below the 76.68% cutoff, so this GFM will not be added to the acI-A1 tribe. **However, this case is borderline and we could consider adding it back in.**
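
The decision can be reproduced by hand from the two matrices printed above. The following is a minimal sketch, not the code behind `ANI.addGenomeToTribe`; the helper `min_pairwise` is hypothetical, and it skips the diagonal, which does not change the minimum here.

```python
ANI_CUTOFF = 76.68  # percent, from the SAG-only analysis
COV_CUTOFF = 5.4    # percent, from the SAG-only analysis

genomes = ["AAA027M14", "AAA278O22", "MEint2297"]

# Rows are queries and columns are references, as in the printed output.
ani = [
    [100.00, 79.56, 75.47],
    [79.71, 100.00, 75.62],
    [75.66, 75.52, 99.44],
]
cov = [
    [100.66, 22.69, 12.34],
    [22.83, 100.56, 7.71],
    [11.50, 7.80, 113.07],
]


def min_pairwise(matrix, names, subset):
    """Smallest off-diagonal value among the genomes listed in subset."""
    idx = [names.index(name) for name in subset]
    return min(matrix[i][j] for i in idx for j in idx if i != j)


tribe = ["AAA027M14", "AAA278O22"]
ani_before = min_pairwise(ani, genomes, tribe)
ani_after = min_pairwise(ani, genomes, genomes)
cov_before = min_pairwise(cov, genomes, tribe)
cov_after = min_pairwise(cov, genomes, genomes)

print("ANI:", ani_before, "->", ani_after, "passes:", ani_after >= ANI_CUTOFF)
print("Coverage:", cov_before, "->", cov_after, "passes:", cov_after >= COV_CUTOFF)
```

In this case the coverage criterion is satisfied (7.71% is above 5.4%), so the rejection rests on ANI alone.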

### Cases Involving a Tribe with a Single SAG

For some tribes we only have a single SAG, such as AAA044-D11 for tribe acI-B4. This SAG forms a monophyletic group with the GFM MEint4252. Can we assign them to the same tribe?

The function below takes two inputs:

* existingGenomes - one or more genomes defined as belonging to the same tribe
* newGenome - the genome being considered for addition to a cluster containing existingGenomes

It computes the smallest pairwise ANI among all samples belonging to the same tribe (e.g., the 'cutoff' described above). The function will also compute the minimum pairwise ANI with the new genome added. If this value is lower than the first, then the additional genome is too divergent to be added to the tribe.

Let's check out AAA044-D11 and MEint4252:

In [5]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir, pairwiseANI, taxonClass, tribes, 'ANI_out', ['AAA044D11'], ['MEint4252'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir, pairwiseANI, taxonClass, tribes, 'COV_out', ['AAA044D11'], ['MEint4252'])


When genome ['MEint4252'] is added, the min among all samples: 84.36

When genome ['MEint4252'] is added, the min among all samples: 44.5


So we can consider MEint4252 to belong to the same tribe as AAA044-D11. The phylogenetic tree suggests that TBepi.4208 is also a candidate for this tribe:

In [6]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomySAGs.csv')
ANI.compareSamples(externalDataDir, pairwiseANI, taxonClass, tribes, 'ANI_out', ['AAA044D11', 'MEint4252'], ['TBepi4208'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'COV_out', 'taxonomySAGs.csv')
ANI.compareSamples(externalDataDir, pairwiseANI, taxonClass, tribes, 'COV_out', ['AAA044D11', 'MEint4252'], ['TBepi4208'])


When genome ['TBepi4208'] is added, the min among all samples: 76.32

When genome ['TBepi4208'] is added, the min among all samples: 9.26


Based on ANI, that genome should not get added to the tribe. However, the case is again borderline.

There are also examples within the acIV lineage. The SAG AAA027L17 (acIV-B1) forms a monophyletic group with the GFM MEint1719:

In [7]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir, pairwiseANI, taxonClass, tribes, 'ANI_out', ['AAA027L17'], ['MEint1719'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir, pairwiseANI, taxonClass, tribes, 'COV_out', ['AAA027L17'], ['MEint1719'])


When genome ['MEint1719'] is added, the min among all samples: 85.45

When genome ['MEint1719'] is added, the min among all samples: 51.33


So those two genomes belong to the same tribe.

The SAG AAA028K15 (acIV-B2) forms a monophyletic group with the GFM MEint2729:

In [8]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir, pairwiseANI, taxonClass, tribes, 'ANI_out', ['AAA028K15'], ['MEint2729'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir, pairwiseANI, taxonClass, tribes, 'COV_out', ['AAA028K15'], ['MEint2729'])


When genome ['MEint2729'] is added, the min among all samples: 85.11

When genome ['MEint2729'] is added, the min among all samples: 47.35


So those two genomes belong to the same tribe. This tribe could be expanded through the addition of MEint1091:

In [9]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir, pairwiseANI, taxonClass, tribes, 'ANI_out', ['AAA028K15', 'MEint2729'], ['MEint1091'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir, pairwiseANI, taxonClass, tribes, 'COV_out', ['AAA028K15', 'MEint2729'], ['MEint1091'])


When genome ['MEint1091'] is added, the min among all samples: 76.85

When genome ['MEint1091'] is added, the min among all samples: 11.99


So this genome can also be added to the tribe, though again the case is borderline.

### Constructing Tribes de novo

The above approach could also be used to cluster GFMs into tribes in the absence of SAGs. This would enable us to declare multiple GFMs as belonging to the same tribe, but without knowing which specific tribe it might be (e.g., within the acIV and acV lineages, which suffer from a shortage of SAGs).

For example, the acIV lineage contains a monophyletic group of MEint11576 and BIN_15. Let's compare these two genomes:

In [10]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out',['MEint11576'], ['BIN_15'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir,'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out',['MEint11576'], ['BIN_15'])


When genome ['BIN_15'] is added, the min among all samples: 71.44

When genome ['BIN_15'] is added, the min among all samples: 0.78


So those genomes belong to separate tribes.

Likewise, the acV lineage contains a monophyletic group of TBepi2973 and TBhypo3180. Let's compare these two genomes:

In [11]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out',['TBepi2973'], ['TBhypo3180'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir,'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out',['TBepi2973'], ['TBhypo3180'])


When genome ['TBhypo3180'] is added, the min among all samples: 99.77

When genome ['TBhypo3180'] is added, the min among all samples: 92.35


So those two genomes belong to the same tribe. The sample TBhypo9906 is also monophyletic with the two samples above. Should we add it?

In [12]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out', ['TBepi2973','TBhypo3180'], ['TBhypo9906'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out', ['TBepi2973','TBhypo3180'], ['TBhypo9906'])


When genome ['TBhypo9906'] is added, the min among all samples: 71.19

When genome ['TBhypo9906'] is added, the min among all samples: 1.41


So that genome belongs to a separate tribe. (Remember, the cutoff is 76.68% ANI).

The two GFMs TBepi149 and TBhypo3219 also form a monophyletic group:

In [13]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out',['TBepi149'], ['TBhypo3219'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir,'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out',['TBepi149'], ['TBhypo3219'])


When genome ['TBhypo3219'] is added, the min among all samples: 99.2

When genome ['TBhypo3219'] is added, the min among all samples: 85.46


So those two genomes belong to the same tribe.

The two GFMs TBepi4163 and TBhypo3765 also form a monophyletic group:

In [14]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out',['TBepi4163'], ['TBhypo3765'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir,'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out',['TBepi4163'], ['TBhypo3765'])


When genome ['TBhypo3765'] is added, the min among all samples: 99.85

When genome ['TBhypo3765'] is added, the min among all samples: 71.64


So those two genomes belong to the same tribe.

We have three genomes from the acI-C lineage: BIN_10, MEint885 and MEint3864. BIN_10 and MEint885 form a monophyletic group. Do they belong to the same tribe?

In [15]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out',['BIN_10'], ['MEint885'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir,'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out',['BIN_10'], ['MEint885'])


When genome ['MEint885'] is added, the min among all samples: 88.43

When genome ['MEint885'] is added, the min among all samples: 62.24


So BIN_10 and MEint885 belong to the same tribe. Does MEint3864 belong to this tribe also?

In [16]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out',['BIN_10', 'MEint885'], ['MEint3864'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir,'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out',['BIN_10', 'MEint885'], ['MEint3864'])


When genome ['MEint3864'] is added, the min among all samples: 74.07

When genome ['MEint3864'] is added, the min among all samples: 6.83


No, it does not, though again just barely.

Within the acI-B lineage, we have a number of TB samples but no SAGs. Monophyletic groups include:
* TBhypo3838 and TBepi2057
* TBhypo680 and TBepi2754
* TBhypo3463 and TBepi3207

In [17]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out',['TBhypo3838'], ['TBepi2057'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir,'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out',['TBhypo3838'], ['TBepi2057'])


When genome ['TBepi2057'] is added, the min among all samples: 99.04

When genome ['TBepi2057'] is added, the min among all samples: 79.07


In [18]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out',['TBhypo680'], ['TBepi2754'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir,'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out',['TBhypo680'], ['TBepi2754'])


When genome ['TBepi2754'] is added, the min among all samples: 99.48

When genome ['TBepi2754'] is added, the min among all samples: 88.84


In [19]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out',['TBhypo3463'], ['TBepi3207'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir,'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out',['TBhypo3463'], ['TBepi3207'])


When genome ['TBepi3207'] is added, the min among all samples: 99.53

When genome ['TBepi3207'] is added, the min among all samples: 81.82


So each monophyletic pair contains two members of a tribe.

The two genomes TBhypo680 and TBepi2754 also form a monophyletic group with bin_7_acI-B2. Could they all belong to the same tribe?

In [20]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out', ['TBhypo680', 'TBepi2754'], ['bin_7_acI-B2'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out', ['TBhypo680', 'TBepi2754'], ['bin_7_acI-B2'])


When genome ['bin_7_acI-B2'] is added, the min among all samples: 77.78

When genome ['bin_7_acI-B2'] is added, the min among all samples: 23.92


Yes. These three samples are from the tribe acI-B2! The five samples TBepi2754, TBhypo680, bin_7_acI-B2, TBepi3207, TBhypo3838 also form a monophyletic group. Could all five belong to the same tribe?

In [21]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out', ['TBhypo680', 'TBepi2754', 'bin_7_acI-B2', 'TBepi3207'], ['TBhypo3838'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out', ['TBhypo680', 'TBepi2754', 'bin_7_acI-B2', 'TBepi3207'], ['TBhypo3838'])


When genome ['TBhypo3838'] is added, the min among all samples: 72.99

When genome ['TBhypo3838'] is added, the min among all samples: 7.45


So these genomes belong to separate tribes.

Finally, Ghai et al. proposed a new acMicro lineage. Let's look at two genomes from it, acMicro-1 and acMicro-4:

In [22]:
[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'ANI_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'ANI_out', ['acMicro-1'], ['acMicro-4'])

[pairwiseANI, taxonClass, tribes] = ANI.importANIandTaxonomy(externalDataDir, 'COV_out', 'taxonomy.csv')
ANI.compareSamples(externalDataDir,pairwiseANI, taxonClass, tribes, 'COV_out', ['acMicro-1'], ['acMicro-4'])


When genome ['acMicro-4'] is added, the min among all samples: 74.79

When genome ['acMicro-4'] is added, the min among all samples: 2.39


So those two genomes belong to separate tribes.

### Conclusions

This analysis allowed us to classify the following GFMs as belonging to a tribe:

* acI-B2: bin_7_acI-B2 and (TBepi2754 and TBhypo680)
* acI-B4: AAA044-D11 and MEint4252
* Other acI-B tribes
    * TBepi2057 and TBhypo3838
    * TBepi3207 and TBhypo3463
* Unknown acI-C tribe: BIN_10 and MEint885
* acIV-B1: AAA027L17 and MEint1719
* acIV-B2: (AAA028K15 and MEint2729) and MEint1091
* Unknown acV tribes
    * TBepi2973 and TBhypo3180
    * TBepi149 and TBhypo3219
    * TBepi4163 and TBhypo3765
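
As a cross-check, the minima printed by each cell above can be compared against the two cutoffs in a single pass. This is only a sketch: the list `reported` was transcribed by hand from the cell outputs, and the helper `classify` is illustrative rather than part of `pairwiseANIFunctions`.

```python
ANI_CUTOFF = 76.68
COV_CUTOFF = 5.4

# (candidate merge, min ANI, min coverage), transcribed from the cell outputs.
reported = [
    ("acI-A1 + MEint2297", 75.47, 7.71),
    ("AAA044D11 + MEint4252", 84.36, 44.5),
    ("AAA044D11, MEint4252 + TBepi4208", 76.32, 9.26),
    ("AAA027L17 + MEint1719", 85.45, 51.33),
    ("AAA028K15 + MEint2729", 85.11, 47.35),
    ("AAA028K15, MEint2729 + MEint1091", 76.85, 11.99),
    ("MEint11576 + BIN_15", 71.44, 0.78),
    ("TBepi2973 + TBhypo3180", 99.77, 92.35),
    ("TBepi2973, TBhypo3180 + TBhypo9906", 71.19, 1.41),
    ("TBepi149 + TBhypo3219", 99.2, 85.46),
    ("TBepi4163 + TBhypo3765", 99.85, 71.64),
    ("BIN_10 + MEint885", 88.43, 62.24),
    ("BIN_10, MEint885 + MEint3864", 74.07, 6.83),
    ("TBhypo3838 + TBepi2057", 99.04, 79.07),
    ("TBhypo680 + TBepi2754", 99.48, 88.84),
    ("TBhypo3463 + TBepi3207", 99.53, 81.82),
    ("TBhypo680, TBepi2754 + bin_7_acI-B2", 77.78, 23.92),
    ("acI-B2 group, TBepi3207 + TBhypo3838", 72.99, 7.45),
    ("acMicro-1 + acMicro-4", 74.79, 2.39),
]


def classify(min_ani, min_cov):
    """Return (ANI passes, coverage passes) for one candidate merge."""
    return min_ani >= ANI_CUTOFF, min_cov >= COV_CUTOFF


accepted = [name for name, a, c in reported if all(classify(a, c))]
rejected = [name for name, a, c in reported if not all(classify(a, c))]
# Cases where ANI passes but coverage alone blocks the merge.
coverage_decides = [name for name, a, c in reported if classify(a, c) == (True, False)]

print(len(accepted), "accepted;", len(rejected), "rejected")
print("Merges blocked by coverage alone:", coverage_decides)
```

With these values, every rejected merge fails on ANI, and no merge is blocked by coverage alone, so the very low coverage cutoff does not change any classification in this notebook.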
    
However, the following classifications are "borderline," or sensitive to the final ANI and coverage cutoffs. 

* acI-B2: bin_7_acI-B2 and (TBepi2754 and TBhypo680)
* acIV-B2: (AAA028K15 and MEint2729) and MEint1091

Generally, these classifications arise when building a larger monophyletic group from a smaller one. For this reason, I have chosen only to consider merges taken at the narrowest phylogenetic group within a progression. Thus, the final merges are:

* acI-B4: AAA044-D11 and MEint4252
* Other acI-B tribes
    * TBepi2754 and TBhypo680
    * TBepi2057 and TBhypo3838
    * TBepi3207 and TBhypo3463
* Unknown acI-C tribe: BIN_10 and MEint885
* acIV-B1: AAA027L17 and MEint1719
* acIV-B2: AAA028K15 and MEint2729
* Unknown acV tribes
    * TBepi2973 and TBhypo3180
    * TBepi149 and TBhypo3219
    * TBepi4163 and TBhypo3765
    
The phylogenetic tree shown below highlights these GFMs (in black). Merges which are rejected are shown in gray.

![Phylogenetic tree](imageFiles/2015-08-19-Tree-16S&Taxon&Clustering&Merging.png)

For reference, additional genomes which could be merged include:

* MEint1091
* TBhypo9906
* bin_7_acI-B2
* TBepi4208
* MEint3864
* MEint2297
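
To show how sensitive these six genomes are to the choice of threshold, the sketch below ranks them by the minimum ANI reported when each was tested, and computes how far the ANI cutoff would need to drop to admit it. The values are transcribed from the cell outputs above; the variable names are illustrative.

```python
ANI_CUTOFF = 76.68

# Minimum pairwise ANI reported when each genome was added to its candidate tribe.
min_ani_when_added = {
    "MEint1091": 76.85,
    "TBhypo9906": 71.19,
    "bin_7_acI-B2": 77.78,
    "TBepi4208": 76.32,
    "MEint3864": 74.07,
    "MEint2297": 75.47,
}

# Rank from most to least similar to the candidate tribe.
ranked = sorted(min_ani_when_added.items(), key=lambda item: item[1], reverse=True)

for genome, value in ranked:
    shortfall = round(ANI_CUTOFF - value, 2)
    if shortfall <= 0:
        status = "passes the ANI cutoff (set aside by the narrowest-group rule)"
    else:
        status = f"needs the cutoff lowered by {shortfall} points"
    print(f"{genome}: {value} -> {status}")
```

Only bin_7_acI-B2 and MEint1091 already exceed the cutoff; the other four would require lowering it, by less than half a point for TBepi4208 and by more than five points for TBhypo9906.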

There are still some shortcomings with this approach. For example, the acI-B clade has four tribes, A through D. If we assume tribes are monophyletic (which we have been assuming), our analysis indicates seven tribes within the acI-B clade: the four circled pairs plus the three singletons. Our analysis similarly predicts too many tribes within the acIV and acV lineages.


To avoid this problem, for genome merging at the tribe level, only samples classified into a named tribe will be used. The phylogenetic tree shown below includes *only* those samples.

![Phylogenetic tree](imageFiles/2015-08-21-FinalTree-TribesOnly.png)
