# autoMLST Wrapper
Summary of [autoMLST Wrapper](https://github.com/KatSteinke/automlst-simplified-wrapper) results from project: `[{{ project().name }}]`

## Description
This report provides an overview of the results from [autoMLST Wrapper](https://github.com/KatSteinke/automlst-simplified-wrapper), a fork of [autoMLST](https://bitbucket.org/ziemertlab/automlst) modified for simpler use. A wrapper script removes the need for the additional organism selection steps, so the analysis runs without manual intervention.

In [None]:
import pandas as pd
from pathlib import Path
from IPython.display import display, Markdown, HTML

import warnings
warnings.filterwarnings('ignore')
warnings.filterwarnings('ignore', message='.*method overwritten by.*')

from itables import to_html_datatable as DT
import itables.options as opt
opt.css = """
.itables table td { font-style: italic; font-size: .8em;}
.itables table th { font-style: oblique; font-size: .8em; }
"""
opt.classes = ["display", "compact"]
opt.lengthMenu = [5, 10, 20, 50, 100, 200, 500]

report_dir = Path("../")

%load_ext rpy2.ipython

In [None]:
report_dir = Path("../")

with open(report_dir / "automlst_wrapper/final.newick", "r") as f:
    data = f.readlines()

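# Tip labels are the text before the first ":" in each comma-separated chunk
value_to_replace = [i.split(":")[0] for i in data[0].replace("(", "").split(",")]

# Map each tip label to the full genome ID that shares its accession prefix
new_dict = {}
df = pd.read_csv(report_dir / "automlst_wrapper/df_genomes_tree.csv")
genome_ids = list(df.genome_id)
for g in genome_ids:
    prefix = g.split(".")[0]
    # Iterate over a copy so that removing a matched label does not skip the next one
    for v in list(value_to_replace):
        if v.startswith(prefix):
            new_dict[v] = g
            value_to_replace.remove(v)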

data = data[0]
for k in new_dict.keys():
    data = data.replace(k, new_dict[k])

outfile = Path("assets/data/final_corrected.newick")
outfile.parent.mkdir(parents=True, exist_ok=True)
with open(outfile, "w") as f:
    f.write(data)
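
The relabeling step above matches each tip label in the Newick string against the accession prefix of a genome ID. A minimal, self-contained sketch of the same idea is shown here; the helper name `restore_tip_labels` and the toy accessions are hypothetical, and the regular expression assumes that tip labels directly follow `(` or `,` and contain no Newick punctuation.

```python
import re


def restore_tip_labels(newick: str, genome_ids: list[str]) -> str:
    """Replace truncated tip labels with the genome ID sharing their accession prefix."""
    # Map the accession without its version suffix (text before ".") to the full ID
    prefix_to_id = {g.split(".")[0]: g for g in genome_ids}

    def _swap(match: re.Match) -> str:
        label = match.group(2)
        for prefix, genome_id in prefix_to_id.items():
            if label.startswith(prefix):
                return match.group(1) + genome_id
        return match.group(0)  # leave unmatched labels untouched

    # A tip label follows "(" or "," and runs up to the next Newick delimiter
    return re.sub(r"([(,])([^(),:;]+)", _swap, newick)


# Toy example with hypothetical accessions
toy_tree = "((GCF_000001_1:0.1,GCF_000002_1:0.2):0.05,GCF_000003_1:0.3);"
toy_ids = ["GCF_000001.1", "GCF_000002.1", "GCF_000003.1"]
print(restore_tip_labels(toy_tree, toy_ids))
```

Restricting the substitution to tip positions avoids touching branch lengths or internal node labels, which a plain string replacement cannot guarantee.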

## Visualization
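The tree below shows the phylogenetic relationships among the genomes in this project. It is midpoint-rooted, clades are highlighted by genus, tips are labeled with the genome ID and a shortened species name, and internal nodes are annotated with their ID and branch length. Together these help in assessing the genetic diversity and evolutionary history of the genomes.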

In [None]:
%%capture
%%R
suppressPackageStartupMessages({
  library("treeio")
  library("ggtree")
  library("tidyverse")
  library("ggstar")
  library("ggnewscale")
  library("ggtreeExtra")
  library("phangorn")
  library("RColorBrewer")
})

In [None]:
%%R  -w 800 -h 800
tree <- read.tree("assets/data/final_corrected.newick")
#data <- read.csv("../automlst_wrapper/df_genomes_tree.csv")
data <- read.csv("../tables/df_gtdb_meta.csv")

# Shorten the Organism column
data$Organism_short <- sub("^s__([A-Za-z])[a-z]*.*\\s", "\\1. ", data$Organism) # Shorten the genus name
data$Organism_short <- sub("^s__", "", data$Organism_short)  # Remove 's__'

# midpoint root
tree <- phangorn::midpoint(tree)
tree <- ladderize(reorder(tree))

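# Label the internal nodes so they can be referenced in the iTOL annotation files
# (node.label holds one entry per internal node, not per tip)
tree$node.label <- paste0("N", seq_len(Nnode(tree)))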

# Write the tree with internal node IDs to a new Newick file
if (!dir.exists("assets/iTOL_annotation")) {
  dir.create("assets/iTOL_annotation", recursive = TRUE)
}

write.tree(tree, file = "assets/iTOL_annotation/automlst_tree_with_ids.newick")

# Get the unique genera
genera <- unique(data$Genus)

# Initialize the plot
p <- ggtree(tree, 
            #layout="fan", 
            size=1, open.angle=5, branch.length='none')

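# Create a vector of colors, one per genus
# brewer.pal() supports 3 to 9 colors for "Set1": clamp small sets, interpolate large ones
n_genera <- length(genera)
if (n_genera <= 9) {
  colors <- brewer.pal(max(3, n_genera), "Set1")[seq_len(n_genera)]
} else {
  colors <- colorRampPalette(brewer.pal(9, "Set1"))(n_genera)
}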

# Create a mapping from genera to colors
genus_to_color <- setNames(colors, genera)

# Initialize new columns for color annotation
data$tree_color <- NA
data$tree_color_label <- NA
data$tree_color_MRCA <- NA

# Add a clade label for each genus
for (genus in genera) {
  # Get the tips that belong to this genus
  genus_tips <- data$genome_id[data$Genus == genus]
  
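  # Find the MRCA of these tips; getMRCA() returns NULL for fewer than two tips
  mrca_node <- getMRCA(tree, genus_tips)
  if (is.null(mrca_node)) {
    # Single-genome genus: use the tip itself as the clade
    mrca_node <- match(genus_tips, tree$tip.label)
    mrca_label <- genus_tips
  } else {
    # Internal nodes are numbered after the tips, so subtract Ntip to index node.label
    mrca_label <- tree$node.label[mrca_node - Ntip(tree)]
  }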

  # Add the color, label, and MRCA to the new columns
  data$tree_color[data$Genus == genus] <- genus_to_color[genus]
  data$tree_color_label[data$Genus == genus] <- genus
  data$tree_color_MRCA[data$Genus == genus] <- mrca_label

  # Highlight this clade
  p <- p + geom_hilight(node = mrca_node, fill = genus_to_color[genus], alpha=.6,
                        type = "gradient", gradient.direction = 'rt')
}

# Create a new column that combines the genome_id and Organism fields
data$tree_label <- paste(data$genome_id, data$Organism_short, sep=" - ")

# Write the data to a new CSV file
write.table(data, file = "assets/iTOL_annotation/tree_annotation.csv", sep = ",", row.names = FALSE)

# Attach the metadata to the tree, then add genus-colored tip points and tip labels
p <- p %<+% data + geom_tippoint(aes(color=Genus), size=3, show.legend = TRUE) + 
                   geom_tiplab(aes(label=tree_label), offset = 0.5) + hexpand(.4)

# Set the color scale manually
p <- p + scale_color_manual(values = genus_to_color)

# Move the legend to the bottom
p <- p + theme(legend.position = 'bottom')

# Annotate each internal node with its ID and branch length (rounded to two decimals)
p <- p + geom_text(aes(label=ifelse(isTip, "", paste0(label, " (", format(round(branch.length, 2), nsmall = 2), ")"))), vjust=-0.5, hjust=1.1, size=2.8)

# Add a scale bar
p <- p + geom_treescale(x=0, y=0, offset=0.1)

# Display the plot
p
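
The two `sub()` calls in the R cell above turn a GTDB-style species string into a short tip label. To make the behavior of that pattern easier to check, the sketch below mirrors it in Python; the function name `shorten_organism` and the sample strings are illustrative assumptions, not part of the pipeline.

```python
import re


def shorten_organism(organism: str) -> str:
    """Mirror the R label shortening: 's__Genus species' becomes 'G. species'."""
    # Collapse everything up to the last whitespace into "<genus initial>. "
    shortened = re.sub(r"^s__([A-Za-z])[a-z]*.*\s", r"\1. ", organism, count=1)
    # Strip the rank prefix when no whitespace was present to trigger the first rule
    return re.sub(r"^s__", "", shortened, count=1)


for name in ["s__Streptomyces coelicolor", "s__Streptomyces_A albus", "s__"]:
    print(repr(name), "->", repr(shorten_organism(name)))
```

Because `.*` is greedy, everything before the last whitespace is dropped, so only the final word of the species string survives next to the genus initial.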

In [None]:
outfile = Path("assets/tables/automlst_tree_table.csv")
outfile.parent.mkdir(parents=True, exist_ok=True)
outfile.write_text(df.to_csv(index=False))

display(HTML(DT(df.loc[:, ["genome_id", "genus_original", "species_original", "strain", "phylum", "class", "order", "family", "genus", "species"]], scrollX=True)))

[Download Table](assets/tables/automlst_tree_table.csv){:target="_blank" .md-button}

## Interactive Visualization with iTOL
For an enhanced, interactive visualization experience, users are encouraged to download the tree file and the corresponding annotation table. These files can be uploaded to [iTOL (Interactive Tree Of Life)](https://itol.embl.de/), a web-based tool for the display, manipulation, and annotation of phylogenetic trees. Please check the [iTOL help page](https://itol.embl.de/help.cgi) for the upload guide and annotation format.



In [None]:
df_annotation = pd.read_csv("assets/iTOL_annotation/tree_annotation.csv")

# create label annotation file for iTOL
outfile_label = Path("assets/iTOL_annotation/iTOL_tree_label.txt")
outfile_label.parent.mkdir(parents=True, exist_ok=True)

# Write the header to the file
with open(outfile_label, 'w') as f:
    f.write("LABELS\n")
    f.write("SEPARATOR TAB\n")
    f.write("DATA\n")

# Append the data rows below the header
df_annotation[['genome_id', 'tree_label']].to_csv(outfile_label, sep='\t', header=False, index=False, mode='a')

In [None]:
# create tree color annotation file for iTOL
outfile_color = Path("assets/iTOL_annotation/iTOL_tree_color.txt")

with open(outfile_color, 'w') as f:
    f.write("TREE_COLORS\n")
    f.write("SEPARATOR TAB\n")
    f.write("DATA\n")

df_annotation["tree_color_type"] = "range"
color_columns = ["tree_color_MRCA", "tree_color_type", "tree_color", "tree_color_label"]
df_color = df_annotation[color_columns].drop_duplicates()
#df_color['tree_color_MRCA'] = 'I' + df_color['tree_color_MRCA'].astype(str)

# Append the data rows below the header
df_color[color_columns].to_csv(outfile_color, sep='\t', header=False, index=False, mode='a')
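
Both annotation files written above share the same layout: a dataset keyword, a `SEPARATOR TAB` line, a `DATA` line, and then tab-separated rows. Before uploading, it can help to confirm that layout. The following is a minimal sketch under that assumption; `check_itol_annotation` is a hypothetical helper and does not validate colors or node IDs.

```python
def check_itol_annotation(text: str, dataset_type: str, n_fields: int) -> list[str]:
    """Return a list of layout problems found in a tab-separated annotation file."""
    lines = text.splitlines()
    problems = []

    # The first three lines must be the dataset keyword, separator, and DATA marker
    expected_header = [dataset_type, "SEPARATOR TAB", "DATA"]
    if lines[:3] != expected_header:
        problems.append(f"header should be {expected_header}, found {lines[:3]}")

    # Every non-empty data row must have the expected number of tab-separated fields
    for number, line in enumerate(lines[3:], start=4):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != n_fields:
            problems.append(f"line {number}: expected {n_fields} fields, found {len(fields)}")
    return problems


# Toy example with a hypothetical genome ID
sample = "LABELS\nSEPARATOR TAB\nDATA\nGCF_000001.1\tGCF_000001.1 - S. albus\n"
print(check_itol_annotation(sample, "LABELS", 2))
```

In this notebook the check could be applied as `check_itol_annotation(outfile_label.read_text(), "LABELS", 2)` and `check_itol_annotation(outfile_color.read_text(), "TREE_COLORS", 4)`; an empty list means the layout looks as expected.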

In [None]:
button = f'<a href="../assets/iTOL_annotation/automlst_tree_with_ids.newick" download class="md-button">Download iTOL Tree</a> <a href="../{outfile_label}" download class="md-button">Download iTOL Label</a> <a href="../{outfile_color}" download class="md-button">Download iTOL Color</a>'
display(Markdown(button))

## References
<font size="2">

- Letunic I, Bork P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. ***Nucleic Acids Research***. 2021. doi: [10.1093/nar/gkab301](https://doi.org/10.1093/nar/gkab301)
- Yu G, Smith DK, Zhu H, Guan Y, Lam TTY. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. ***Methods in Ecology and Evolution***. 2017, 8(1):28-36. doi: [10.1111/2041-210X.12628](https://doi.org/10.1111/2041-210X.12628)

{% for i in project().rule_used['automlst-wrapper']['references'] %}
- *{{ i }}*
{% endfor %}
</font>