# Database comparisons
This notebook evaluates how classification accuracy varies between reference databases. Selected mock community sequences are taxonomically classified using two or more reference databases, e.g., Greengenes 13_8 [trimmed to 250 nt](./generate-tax-assignments.ipynb) and the same database [trimmed to 150 nt](./generate-tax-assignments-trimmed-dbs.ipynb). [This notebook](./generate-tax-assignments-trimmed-dbs.ipynb) can also be modified to generate taxonomic classifications with any number of reference databases or database versions. Limit the analysis to a few mock communities and method/parameter combinations; the goal here is to compare the databases, not the methods.

Prepare the environment
-----------------------

First we'll import various functions that we'll need for generating the report. 

In [1]:
%matplotlib inline

from os import environ
from os.path import join, exists, expandvars
import pandas as pd

from tax_credit.eval_framework import (get_expected_tables_lookup, 
                                       find_and_process_result_tables,
                                       compute_mock_results,
                                       compute_mantel,
                                       generate_pr_scatter_plots,
                                       boxplot_from_data_frame,
                                       heatmap_from_data_frame,
                                       method_by_dataset_a1,
                                       method_by_dataset_a2)

Configure local environment-specific values
-------------------------------------------

**This is the only cell that you will need to edit to generate reports locally.** After editing this cell, you can run all cells in this notebook to generate your analysis report. Some of the analyses may take a few minutes to run, and analyses at more specific taxonomic levels (e.g., genus or species) will be slower than analyses at more general taxonomic levels (e.g., phylum, class).

**This cell will not run until you fill in a taxonomic level (``2`` through ``7``).**
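
To make the level numbers concrete, here is a minimal sketch of what "analyzing at level N" could mean for a semicolon-delimited lineage string: keep only the first N ranks. The helper `collapse_taxonomy` and the example lineages are hypothetical and are not part of `tax_credit`; the framework's own collapsing logic may differ.

```python
def collapse_taxonomy(lineage, level, sep=";"):
    """Keep only the first `level` ranks of a delimited lineage string.

    Hypothetical helper for illustration; not a tax_credit function.
    """
    ranks = [rank.strip() for rank in lineage.split(sep) if rank.strip()]
    return sep.join(ranks[:level])


# Made-up Greengenes-style lineages that differ only at the species rank.
lineage_a = ("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
             "f__Lactobacillaceae; g__Lactobacillus; s__brevis")
lineage_b = ("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
             "f__Lactobacillaceae; g__Lactobacillus; s__reuteri")

phylum = collapse_taxonomy(lineage_a, 2)   # level 2: phylum
genus_a = collapse_taxonomy(lineage_a, 6)  # level 6: genus
genus_b = collapse_taxonomy(lineage_b, 6)

# At genus level the two lineages collapse to the same taxon;
# at species level (7) they remain distinct.
same_genus = genus_a == genus_b
same_species = collapse_taxonomy(lineage_a, 7) == collapse_taxonomy(lineage_b, 7)
```

Because deeper levels keep more ranks, they generally yield more distinct taxa to compare, which is consistent with the note above that genus- and species-level analyses run more slowly.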

In [20]:
## project_dir should be the directory where you've downloaded (or cloned) the 
## short-read-tax-assignment repository. 
project_dir = expandvars("$HOME/Desktop/projects/short-read-tax-assignment")
precomputed_results_dir = join(project_dir, "data/precomputed-results/")
expected_results_dir = join(precomputed_results_dir, "mock-community")

## results_dirs should contain the directory or directories where
## results can be found. By default, this is just the precomputed 
## results included with the project. If other results should be included, 
## absolute paths to those directories should be added to this list.
results_dirs = [precomputed_results_dir]

## Taxonomic level at which analyses should be performed. Edit this to
## the desired taxonomic level. 
# 2: phylum, 3: class, 4: order, 5: family, 6: genus, 7: species
taxonomic_level = 6

## Minimum number of times an OTU must be observed for it to be included in analyses. Edit this
## to analyze the effect of the minimum count on taxonomic results.
min_count = 1

In [21]:
# Define the subdirectories where the query mock community data should be, and confirm that they exist.
mock_results_dirs = [join(results_dir,"mock-community") for results_dir in results_dirs]

for mock_results_dir in mock_results_dirs:
    assert exists(mock_results_dir), "Mock community result directory doesn't exist: %s" % mock_results_dir


Find mock community pre-computed tables, expected tables, and "query" tables
----------------------------------------------------------------------------

Next we'll use the paths defined above to find all of the tables that will be compared. These include the *pre-computed result* tables (i.e., the ones that the new methods will be compared to), the *expected result* tables (i.e., the tables containing the known composition of the mock microbial communities), and the *query result* tables (i.e., the tables generated with the new method(s) that we want to compare to the *pre-computed result* tables).
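
As a rough illustration of how result tables might be identified, the sketch below splits a table path into its components. It assumes a `dataset/reference/method/parameters/table.biom` layout, which is inferred only from result paths of the form `mock-3/gg_13_8_otus/rdp/0.0/table.biom` seen in this project. The function `parse_result_path` is hypothetical and does not describe how `find_and_process_result_tables` is implemented.

```python
from pathlib import PurePosixPath


def parse_result_path(table_path, root):
    """Split a result table path into labeled components.

    Assumes (hypothetically) the layout
    <root>/<dataset>/<reference>/<method>/<parameters>/<file name>.
    """
    relative = PurePosixPath(table_path).relative_to(root)
    dataset, reference, method, parameters, filename = relative.parts
    return {
        "dataset": dataset,
        "reference": reference,
        "method": method,
        "parameters": parameters,
        "filename": filename,
    }


# Illustrative relative paths; adjust to your own project_dir.
root = "data/precomputed-results/mock-community"
table_path = root + "/mock-3/gg_13_8_otus/rdp/0.0/table.biom"

info = parse_result_path(table_path, root)
```

For a database comparison, the `reference` component is the one of interest: results that share a dataset, method, and parameter setting but differ in reference database are the pairs being compared.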

In [22]:
results = []
for mock_results_dir in mock_results_dirs:
    results += find_and_process_result_tables(mock_results_dir)

In [23]:
expected_tables = get_expected_tables_lookup(expected_results_dir, level=taxonomic_level)

In [24]:
# Uncomment for test runs (looks at a small subset of the data)

# from random import shuffle
# shuffle(results)
# results = results[:10]

Evaluation 1: Compute and summarize precision, recall, and F-measure for mock communities
----------------------------------------------------------------------------------------

In this evaluation, we compute and summarize precision, recall, and F-measure of each result (pre-computed and query) based on the known composition of the mock communities. We then summarize the results in two ways: first with boxplots, and second with a table of the top methods based on their F-measures. 

This is a qualitative evaluation, effectively telling us about the ability of the different methods to report the taxa that are present in each sample. These metrics are not concerned with the abundance of the different taxa.
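
The following sketch shows the standard presence/absence definitions of these three metrics, computed from sets of taxon names. It is an illustration under the usual textbook definitions, not necessarily the exact computation inside `compute_mock_results`; the function name and the example taxa are hypothetical.

```python
def precision_recall_f(observed, expected):
    """Compute precision, recall, and F-measure from taxon presence/absence.

    Abundances are ignored: a taxon either appears in a set or it does not.
    """
    observed, expected = set(observed), set(expected)
    true_pos = len(observed & expected)    # reported and truly present
    false_pos = len(observed - expected)   # reported but not in the mock community
    false_neg = len(expected - observed)   # present but not reported

    precision = true_pos / (true_pos + false_pos) if observed else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up genus-level example: 4 expected taxa, 3 observed, 2 in common.
expected = {"g__Bacillus", "g__Escherichia", "g__Listeria", "g__Staphylococcus"}
observed = {"g__Bacillus", "g__Escherichia", "g__Clostridium"}

precision, recall, f_measure = precision_recall_f(observed, expected)
```

In this example two of the three reported genera are correct (precision 2/3) and two of the four expected genera are recovered (recall 1/2), so the F-measure, their harmonic mean, is 4/7.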

In [25]:
mock_results = compute_mock_results(results, expected_tables, taxonomy_level=taxonomic_level, min_count=min_count)

ValueError: Actual table is empty after filtering at: /Users/nbokulich/Desktop/projects/short-read-tax-assignment/data/precomputed-results/mock-community/mock-3/gg_13_8_otus/rdp/0.0/table.biom

In [None]:
boxplot_from_data_frame(mock_results, group_by="Method", metric="Precision")

In [None]:
boxplot_from_data_frame(mock_results, group_by="Method", metric="Recall")

In [None]:
boxplot_from_data_frame(mock_results, group_by="Method", metric="F-measure")

In [None]:
boxplot_from_data_frame(mock_results, group_by="Dataset", metric="Precision")

In [None]:
boxplot_from_data_frame(mock_results, group_by="Dataset", metric="Recall")

In [None]:
boxplot_from_data_frame(mock_results, group_by="Dataset", metric="F-measure")

In [None]:
heatmap_from_data_frame(mock_results, "Precision")

In [None]:
heatmap_from_data_frame(mock_results, "Recall")

In [None]:
heatmap_from_data_frame(mock_results, "F-measure")

In [None]:
method_by_dataset_a1(mock_results, 'B1')

In [None]:
method_by_dataset_a1(mock_results, 'B2')

In [None]:
method_by_dataset_a1(mock_results, 'B3')

In [None]:
method_by_dataset_a1(mock_results, 'B4')

In [None]:
method_by_dataset_a1(mock_results, 'B5')

In [None]:
method_by_dataset_a1(mock_results, 'B6')

In [None]:
method_by_dataset_a1(mock_results, 'B7')

In [None]:
method_by_dataset_a1(mock_results, 'B8')

In [None]:
method_by_dataset_a1(mock_results, 'F1')

In [None]:
method_by_dataset_a1(mock_results, 'F2')

Evaluation 2: Compute and summarize correlations between observed and known mock community structure
----------------------------------------------------------------------------------------------------

In this evaluation, we compute and summarize the correlation between each result (pre-computed and query) and the known composition of the mock communities. We then summarize the results in two ways: first with a series of boxplots of correlation coefficients by method; and second with a table of the top methods based on their Pearson correlation coefficient. 

This is a quantitative evaluation, which tells us about the ability of the different methods to report the taxa that are present in each sample and accurately assess their abundance. Because many factors can affect the observed abundance of taxa beyond the accuracy of the taxonomic assigner (e.g., primer bias), the correlation coefficients are frequently low, but we expect that their relative values are informative in understanding which taxonomic assigners are more correct than others.
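
To illustrate what the two coefficients measure, the sketch below computes Pearson and Spearman correlations between an expected and an observed relative-abundance vector using only the standard library. The abundance values are made up, and the helper functions are hypothetical; the framework itself may rely on a statistics library and may align taxa differently.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: linear association between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up relative abundances for five taxa, in the same taxon order.
# The last taxon is absent from the mock community but was observed.
expected = [0.40, 0.30, 0.20, 0.10, 0.00]
observed = [0.50, 0.20, 0.15, 0.05, 0.10]

r_pearson = pearson_r(expected, observed)
r_spearman = spearman_r(expected, observed)
```

Pearson r is sensitive to how far the observed abundances deviate from the expected values, while Spearman r depends only on whether the taxa are ranked in the same order, so the two can disagree when a few abundant taxa are badly over- or under-estimated.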

In [None]:
boxplot_from_data_frame(mock_results, group_by="Method", metric="Pearson r")

In [None]:
boxplot_from_data_frame(mock_results, group_by="Method", metric="Spearman r")

In [None]:
heatmap_from_data_frame(mock_results, "Pearson r", vmin=-1, vmax=1)

In [None]:
heatmap_from_data_frame(mock_results, "Spearman r", vmin=-1, vmax=1)

In [None]:
method_by_dataset_a2(mock_results, 'B1')

In [None]:
method_by_dataset_a2(mock_results, 'B2')

In [None]:
method_by_dataset_a2(mock_results, 'B3')

In [None]:
method_by_dataset_a2(mock_results, 'B4')

In [None]:
method_by_dataset_a2(mock_results, 'B5')

In [None]:
method_by_dataset_a2(mock_results, 'B6')

In [None]:
method_by_dataset_a2(mock_results, 'B7')

In [None]:
method_by_dataset_a2(mock_results, 'B8')

In [None]:
method_by_dataset_a2(mock_results, 'F1')

In [None]:
method_by_dataset_a2(mock_results, 'F2')