# autoMLST Wrapper
Summary of [autoMLST Wrapper](https://github.com/KatSteinke/automlst-simplified-wrapper) results from project: `[{{ project().name }}]`

## Description
This report provides an overview of the results from [autoMLST Wrapper](https://github.com/KatSteinke/automlst-simplified-wrapper), a modified version of [autoMLST](https://bitbucket.org/ziemertlab/automlst) designed to be easier to use. Through a simple wrapper script, this fork removes the additional organism selection steps, streamlining the workflow for users.

In [None]:
import pandas as pd
from pathlib import Path
from IPython.display import display, Markdown, HTML

import warnings
# Silence all warnings to keep the report output clean
warnings.filterwarnings('ignore')

from itables import to_html_datatable as DT
import itables.options as opt
opt.css = """
.itables table td { font-style: italic; font-size: .8em;}
.itables table th { font-style: oblique; font-size: .8em; }
"""
opt.classes = ["display", "compact"]
opt.lengthMenu = [5, 10, 20, 50, 100, 200, 500]

report_dir = Path("../")

%load_ext rpy2.ipython

In [None]:
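# Read the Newick tree written by autoMLST Wrapper (a single-line tree string)
with open(report_dir / "automlst_wrapper/final.newick", "r") as f:
    newick = f.read().strip()

# Extract tip labels: drop "(", split on ",", and keep the text before each ":" (branch length)
tip_labels = [token.split(":")[0] for token in newick.replace("(", "").split(",")]

# Map each tip label to the genome ID sharing its accession prefix (the part before the version dot)
df = pd.read_csv(report_dir / "automlst_wrapper/df_genomes_tree.csv")
label_to_genome = {}
unmatched = list(tip_labels)
for genome_id in df.genome_id:
    accession = genome_id.split(".")[0]
    # Iterate over a copy so matched labels can be removed safely
    for label in list(unmatched):
        if label.startswith(accession):
            label_to_genome[label] = genome_id
            unmatched.remove(label)

# Replace the tip labels in the tree string with the genome IDs
for label, genome_id in label_to_genome.items():
    newick = newick.replace(label, genome_id)

outfile = Path("assets/data/final_corrected.newick")
outfile.parent.mkdir(parents=True, exist_ok=True)
with open(outfile, "w") as f:
    f.write(newick + "\n")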
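The relabeling step can also be written as a standalone function, which makes it easier to test on a toy tree. This is a minimal sketch, not the pipeline's code: `relabel_tips` is a hypothetical helper, and the accession IDs below are invented. Like the cell above, it matches tip labels to genome IDs by accession prefix (the part before the version dot); unlike plain `str.replace`, it substitutes only whole labels found between Newick delimiters, so one label cannot be rewritten inside a longer one.

```python
import re

# Tip labels follow "(" or "," and run until the next ":", ",", ")" or ";"
TIP_PATTERN = r"(?<=[(,])([^():,;]+)"


def relabel_tips(newick: str, genome_ids: list[str]) -> str:
    """Replace Newick tip labels with the genome ID sharing their accession prefix.

    Hypothetical helper for illustration only.
    """
    tips = re.findall(TIP_PATTERN, newick)

    mapping = {}
    for tip in tips:
        for genome_id in genome_ids:
            # Compare on the accession without its version suffix, e.g. "GCF_000001"
            if tip.startswith(genome_id.split(".")[0]):
                mapping[tip] = genome_id
                break

    # Substitute whole labels only; unmatched tips are left unchanged
    def substitute(match: re.Match) -> str:
        return mapping.get(match.group(1), match.group(1))

    return re.sub(TIP_PATTERN, substitute, newick)


toy = "((GCF_000001_1:0.1,GCF_000002_1:0.2):0.05,GCF_000003_1:0.3);"
ids = ["GCF_000001.1", "GCF_000002.1", "GCF_000003.1"]
print(relabel_tips(toy, ids))
# ((GCF_000001.1:0.1,GCF_000002.1:0.2):0.05,GCF_000003.1:0.3);
```

Prefix matching can still pair a short accession with a longer one that begins with the same characters, so the mapping is worth inspecting when accessions differ only in their trailing digits.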

## Visualization
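The tree below shows the phylogenetic relationships among the genomes analyzed by autoMLST Wrapper. It is midpoint-rooted and drawn as a cladogram, so branch lengths are not to scale. Tips are colored by genus from the GTDB metadata, and each genus is highlighted at the most recent common ancestor of its members. This view helps to assess the genetic diversity and evolutionary history of these genomes.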
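The plot attaches metadata to the tree by matching tip labels against the `genome_id` column of `df_gtdb_meta.csv`, so a tip without a matching row is drawn without genus information. A quick check before plotting can list such tips. This is a sketch assuming the tree is a single Newick string; `missing_metadata` is a hypothetical helper and the IDs are invented.

```python
import re


def missing_metadata(newick: str, genome_ids: set[str]) -> list[str]:
    """Return tree tip labels that have no matching genome ID in the metadata.

    Hypothetical helper for illustration only.
    """
    # Tip labels follow "(" or "," and end at the next Newick delimiter
    tips = re.findall(r"(?<=[(,])([^():,;]+)", newick)
    return [tip for tip in tips if tip not in genome_ids]


tree = "((GCF_000001.1:0.1,GCF_000002.1:0.2):0.05,GCA_000009.1:0.3);"
print(missing_metadata(tree, {"GCF_000001.1", "GCF_000002.1"}))
# ['GCA_000009.1']
```

In the notebook, it could be called as `missing_metadata(Path("assets/data/final_corrected.newick").read_text(), set(pd.read_csv("../tables/df_gtdb_meta.csv").genome_id))`.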

In [None]:
%%capture
%%R
suppressPackageStartupMessages({
  library("treeio")
  library("ggtree")
  library("tidyverse")
  library("ggstar")
  library("ggnewscale")
  library("ggtreeExtra")
  library("phangorn")
  library("RColorBrewer")
})

In [None]:
%%R  -w 800 -h 800
# Load the relabeled tree and the GTDB metadata for the genomes
tree <- read.tree("assets/data/final_corrected.newick")
data <- read.csv("../tables/df_gtdb_meta.csv")
# %<+% joins on the first column, so make genome_id the first column
data <- dplyr::relocate(data, genome_id)

# Shorten the Organism column, e.g. "s__Streptomyces coelicolor" -> "S. coelicolor"
data$Organism_short <- sub("^s__([A-Za-z])\\S*\\s", "\\1. ", data$Organism) # Abbreviate the genus name
data$Organism_short <- sub("^s__", "", data$Organism_short)  # Remove any remaining 's__' prefix

# Midpoint root, then ladderize for a tidier layout
tree <- phangorn::midpoint(tree)
tree <- ladderize(reorder(tree))

# Get the unique genera
genera <- unique(data$Genus)

# Initialize the plot as a cladogram (branch lengths ignored)
p <- ggtree(tree,
            # layout = "fan", open.angle = 5,  # uncomment for a circular layout
            size = 1, branch.length = 'none')

# One color per genus; interpolating Set1 supports any number of genera
colors <- colorRampPalette(brewer.pal(9, "Set1"))(length(genera))

# Create a mapping from genera to colors
genus_to_color <- setNames(colors, genera)

# Highlight the clade of each genus
for (genus in genera) {
  # Get the tips that belong to this genus and are present in the tree
  genus_tips <- intersect(data$genome_id[data$Genus == genus], tree$tip.label)
  if (length(genus_tips) == 0) next

  # Use the MRCA for several tips, or the tip node itself for a single-genome genus
  node <- if (length(genus_tips) > 1) getMRCA(tree, genus_tips) else match(genus_tips, tree$tip.label)

  p <- p + geom_hilight(node = node, fill = genus_to_color[[genus]], alpha = .6,
                        type = "gradient", gradient.direction = 'rt')
}

# Create a new column that combines the genome_id and Organism fields
data$new_label <- paste(data$genome_id, data$Organism_short, sep = " - ")

# Attach metadata to the tips; offset is a layer parameter, not an aesthetic
p <- p %<+% data + geom_tippoint(aes(color = Genus), size = 3, show.legend = TRUE) +
                   geom_tiplab(aes(label = new_label), offset = 0.5) + hexpand(.4)

# Set the color scale manually
p <- p + scale_color_manual(values = genus_to_color)

# Move the legend to the bottom
p <- p + theme(legend.position = 'bottom')

# Display the plot
p

[Download Tree](assets/data/final_corrected.newick){:target="_blank" .md-button}
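Tip labels combine the genome ID with an abbreviated GTDB species name: the genus initial is kept, the rest of the genus token (including GTDB suffixes such as `_A`) is replaced by a period, and the species epithet is kept. The same rule can be expressed in Python for reuse outside R. This is a sketch of that rule, and `shorten_species` and `tip_label` are hypothetical names.

```python
import re


def shorten_species(organism: str) -> str:
    """Abbreviate a GTDB species name, e.g. 's__Streptomyces coelicolor' -> 'S. coelicolor'."""
    # Keep the genus initial and replace the rest of the genus token (up to the first space) with "."
    short = re.sub(r"^s__([A-Za-z])\S*\s", r"\1. ", organism)
    # Names without an epithet (e.g. an empty "s__") only lose the rank prefix
    return re.sub(r"^s__", "", short)


def tip_label(genome_id: str, organism: str) -> str:
    """Combine a genome ID and its abbreviated species name, as in the tree tip labels."""
    return f"{genome_id} - {shorten_species(organism)}"


print(shorten_species("s__Streptomyces_A sp000123"))
# S. sp000123
print(tip_label("GCF_000001.1", "s__Streptomyces coelicolor"))
# GCF_000001.1 - S. coelicolor
```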

In [None]:
outfile = Path("assets/tables/automlst_tree_table.csv")
outfile.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(outfile, index=False)

# Show the genome IDs with selected taxonomy columns
columns = ["genome_id", "genus_original", "species_original", "strain",
           "phylum", "class", "order", "family", "genus", "species"]
display(HTML(DT(df.loc[:, columns], scrollX=True)))

[Download Table](assets/tables/automlst_tree_table.csv){:target="_blank" .md-button}

## Interactive Visualization with iTOL
For an interactive visualization, download the tree file and the metadata table above and load them into [iTOL (Interactive Tree Of Life)](https://itol.embl.de/), a web-based tool for displaying, manipulating, and annotating phylogenetic trees. The tree can be uploaded directly; metadata such as genus can be added as iTOL annotation datasets.
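One way to bring the genus information into iTOL is a colored-strip annotation dataset. The sketch below writes such a file from table rows, assuming iTOL's `DATASET_COLORSTRIP` template with tab separators (check the current template on the iTOL help pages before relying on it); `itol_colorstrip`, the palette, and the dataset label are hypothetical choices, not part of autoMLST Wrapper.

```python
def itol_colorstrip(rows: list[dict[str, str]], id_col: str, group_col: str,
                    palette: list[str], label: str = "Genus") -> str:
    """Build a tab-separated iTOL DATASET_COLORSTRIP annotation file as a string.

    Hypothetical helper; field names follow the iTOL colorstrip template.
    """
    groups = sorted({row[group_col] for row in rows})
    # Assign colors in sorted group order, cycling through the palette if needed
    color_of = {g: palette[i % len(palette)] for i, g in enumerate(groups)}

    lines = [
        "DATASET_COLORSTRIP",
        "SEPARATOR TAB",
        f"DATASET_LABEL\t{label}",
        "COLOR\t#ff0000",  # overall dataset color, used by iTOL in the legend
        "DATA",
    ]
    # One line per tip: node ID, strip color, and the group name as the strip label
    for row in rows:
        group = row[group_col]
        lines.append(f"{row[id_col]}\t{color_of[group]}\t{group}")
    return "\n".join(lines) + "\n"


rows = [
    {"genome_id": "GCF_000001.1", "genus": "Streptomyces"},
    {"genome_id": "GCF_000002.1", "genus": "Kitasatospora"},
]
print(itol_colorstrip(rows, "genome_id", "genus", ["#e41a1c", "#377eb8"]))
```

In the notebook, `df[["genome_id", "genus"]].astype(str).to_dict("records")` would provide the rows, since the tree tips carry the same genome IDs; the resulting text can be saved to a file and dropped onto the tree in iTOL.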

## References
<font size="2">

- **G Yu**, DK Smith, H Zhu, Y Guan, TTY Lam<sup>\*</sup>. ggtree: an
    R package for visualization and annotation of phylogenetic trees
    with their covariates and other associated data. ***Methods in
    Ecology and Evolution***. 2017, 8(1):28-36. doi:
    [10.1111/2041-210X.12628](https://doi.org/10.1111/2041-210X.12628)

{% for i in project().rule_used['automlst-wrapper']['references'] %}
- *{{ i }}*
{% endfor %}
</font>