<p align="right">
  <img src="../logo.png" alt="FoldTree2 Logo" align="right" style="height:350px"/>
</p>

### How Foldtree2 Works

Foldtree2 constructs phylogenetic trees from protein structures by leveraging structural similarity rather than sequence similarity. The workflow involves the following key steps:

1. **Input Preparation**  
    Users can provide input as AlphaFold DB cluster identifiers, UniProt identifiers, or custom PDB files. The tool fetches or processes the relevant protein structures accordingly.

2. **Structural Comparison**  
    Foldtree2 uses structural alignment tools (such as Foldseek) to compute pairwise distances between protein structures. These distances reflect the structural similarity between proteins.

3. **Distance Matrix Construction**  
    The computed pairwise distances are assembled into a distance matrix, which serves as the basis for tree construction.

4. **Tree Inference**  
    Using maximum likelihood methods, Foldtree2 infers the phylogenetic tree that best explains the observed structural distances, that is, the tree topology that maximizes the likelihood of the observed data.

5. **Visualization and Analysis**  
    The resulting tree can be visualized and compared using built-in plotting tools or exported for further analysis. Foldtree2 supports interactive exploration and comparison of different tree topologies and distance metrics.

By focusing on structural features, Foldtree2 enables the study of evolutionary relationships that may not be apparent from sequence data alone.
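
As an illustration of step 3, the sketch below assembles pairwise distances into a symmetric matrix. It is a minimal example under stated assumptions, not Foldtree2's actual implementation: the function name `build_distance_matrix`, the identifiers `P1` to `P3`, and the distance values are all hypothetical.

```python
def build_distance_matrix(names, pairwise):
    """Assemble a symmetric distance matrix (list of lists).

    names: ordered list of structure identifiers (hypothetical input).
    pairwise: dict mapping (name_a, name_b) -> distance, one entry per pair.
    """
    index = {name: i for i, name in enumerate(names)}
    size = len(names)
    # The diagonal stays at 0.0: each structure is at distance zero from itself.
    matrix = [[0.0] * size for _ in range(size)]
    for (a, b), dist in pairwise.items():
        i, j = index[a], index[b]
        matrix[i][j] = dist
        matrix[j][i] = dist  # mirror the value so the matrix is symmetric
    return matrix


# Made-up distances for three hypothetical structures.
names = ["P1", "P2", "P3"]
pairwise = {("P1", "P2"): 0.2, ("P1", "P3"): 0.7, ("P2", "P3"): 0.5}
matrix = build_distance_matrix(names, pairwise)
for name, row in zip(names, matrix):
    print(name, row)
```

Each row and column corresponds to one structure, in the order given by `names`, which is the layout that distance-based tree builders generally expect.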

In [None]:
#@markdown ### Input (custom PDBs upload, identifier list, cluster ids)
from google.colab import files
import os
import re
import hashlib
import random
import zipfile

input_type = "afdb_cluster" #@param ["afdb_cluster", "identifier", "custom"]
#
#@markdown - afdb_cluster = identifier of an AFDB cluster,
#@markdown - identifier = file listing UniProt identifiers (e.g. A0A074YNE0), one per line,
#@markdown - custom = zip file with PDBs

cluster_id = "A0A074YNE0" #@param {type:"string"}
jobname = 'test' #@param {type:"string"}

def add_hash(x,y):
  return x+"_"+hashlib.sha1(y.encode()).hexdigest()[:5]

from sys import version_info
python_version = f"{version_info.major}.{version_info.minor}"


basejobname = "".join(jobname.split())
basejobname = re.sub(r'\W+', '', basejobname)
jobname = add_hash(basejobname, cluster_id)

# check if directory with jobname exists (True means the name is still free)
def check(folder):
  return not os.path.exists(folder)
if not check(jobname):
  n = 0
  while not check(f"{jobname}_{n}"): n += 1
  jobname = f"{jobname}_{n}"

# make directory to save results
os.makedirs(jobname, exist_ok=True)

if input_type == "custom":
  input_file = os.path.join(jobname,f"{jobname}.zip")
  if not os.path.isfile(input_file):
    zipfiles = files.upload()
    zipfile_name = list(zipfiles.keys())[0]
    os.rename(zipfile_name, input_file)
    # Unzipping the file
    with zipfile.ZipFile(input_file, 'r') as zip_ref:
      zip_ref.extractall(jobname)
    os.remove(input_file)

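    # Custom uploads carry no identifiers, so write an empty identifiers file
    input_file = os.path.join(jobname, "identifiers.txt")
    with open(input_file, "w") as f:
      f.write("")
    # Collect the extracted PDB files in the structs directory
    os.makedirs(os.path.join(jobname, "structs"), exist_ok=True)
    for file in os.listdir(jobname):
      if file.endswith(".pdb"):
        os.rename(os.path.join(jobname,file), os.path.join(jobname,"structs",file))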



elif input_type == "afdb_cluster":
  import requests
  # Define the endpoint and parameters
  base_url = "https://cluster.foldseek.com/api/cluster/"
  params = {
      "format": "accessions",
      "groupBy": "",
      "groupDesc": "",
      "itemsPerPage": 10,
      "multiSort": "false",
      "mustSort": "false",
      "page": 1,
      "sortBy": "",
      "sortDesc": "false"
  }

  # Make the request
  response = requests.get(f"{base_url}{cluster_id}/members", params=params)

  # Ensure the request was successful
  response.raise_for_status()

  # Save the response content to a file
  with open(f"{jobname}/identifiers.txt", "w") as file:
      file.write(response.text)
elif input_type == "identifier":
  input_file = os.path.join(jobname,f"identifiers.txt")
  if not os.path.isfile(input_file):
    identifierfiles = files.upload()
    identifierfilename = list(identifierfiles.keys())[0]
    os.rename(identifierfilename, input_file)


In [None]:
#@title Install dependencies
%%bash -s $python_version
PYTHON_VERSION=$1
# Check if the foldtree2 directory exists and remove it
if [ -d "foldtree2" ]; then
  rm -r foldtree2
fi
# Clone the repository
git clone -q https://github.com/DessimozLab/foldtree2

wget -qnc "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh -bfp /usr/local > /dev/null 2>&1

mamba config --set auto_update_conda false

# Minimal install for ft2treebuilder usage
mamba install -q -c conda-forge -c pytorch -c nvidia -c bioconda -c nodefaults \
  python="${PYTHON_VERSION}" \
  pip \
  pytorch \
  torch-geometric \
  numpy \
  pandas \
  tqdm \
  h5py \
  biopython > /dev/null 2>&1

# Install the package

cd foldtree2
pip install -q -e .


In [None]:
#@title Download structures
import sys
from foldtree2.src.AFDB_tools import grab_struct

# if the input type is afdb_cluster or identifier, download the structures
if input_type in ["afdb_cluster", "identifier"]:
    sys.path.insert(0, os.path.abspath("foldtree2/src"))
    identifiers_file = os.path.join(jobname, "identifiers.txt")
    structs_dir = os.path.join(jobname, "structs")
    os.makedirs(structs_dir, exist_ok=True)
    with open(identifiers_file) as f:
        identifiers = [line.strip() for line in f if line.strip()]
    for identifier in identifiers:
        try:
            grab_struct(identifier, structs_dir)
        except Exception as e:
            print(f"Error downloading {identifier}: {e}")

In [None]:
#@title Move structures if needed
%%bash -s $jobname $input_type
JOBNAME=$1
INPUT_TYPE=$2
IDENTIFIERS_FILE="${JOBNAME}/identifiers.txt"
STRUCTS_DIR="${JOBNAME}/structs"
OUTPUT_DIR="${JOBNAME}/foldtree2_out"
mkdir -p "$OUTPUT_DIR"

# If custom, move PDBs to structs dir (already handled above, but ensure)
if [[ $INPUT_TYPE = "custom" ]]; then
  mkdir -p "$STRUCTS_DIR"
  mv "${JOBNAME}/"*.pdb "${JOBNAME}/"*.cif "$STRUCTS_DIR" 2>/dev/null || true
fi


In [None]:
#@title Download model
%%bash -s $jobname
JOBNAME=$1
MODEL_URL="https://example.com/path/to/your/model"
MODEL_DIR="${JOBNAME}/model"
mkdir -p "$MODEL_DIR"  
wget -qnc "$MODEL_URL" -P "$MODEL_DIR"
# Decompress if needed
if [[ "$MODEL_URL" == *.tar.gz ]]; then
  tar -xzf "$MODEL_DIR/$(basename "$MODEL_URL")" -C "$MODEL_DIR"
elif [[ "$MODEL_URL" == *.zip ]]; then
  unzip -o "$MODEL_DIR/$(basename "$MODEL_URL")" -d "$MODEL_DIR"
fi

In [None]:
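#@title Build tree
%%bash -s $jobname
# Shell variables do not persist between %%bash cells, so redefine them here
JOBNAME=$1
STRUCTS_DIR="${JOBNAME}/structs"
OUTPUT_DIR="${JOBNAME}/foldtree2_out"
MODEL_URL="https://example.com/path/to/your/model"
MODEL_DIR="${JOBNAME}/model"
python3 ft2treebuilder.py --structs "$STRUCTS_DIR" --output "$OUTPUT_DIR" --model "$MODEL_DIR/$(basename "$MODEL_URL")"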

In [None]:
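#@title Root the tree
import os
import subprocess

# Same output directory as the one created in the earlier cells
OUTPUT_DIR = os.path.join(jobname, "foldtree2_out")

def madroot(treefile):
    """Root a tree with MAD; the rooted tree is written to <treefile>.rooted."""
    mad_cmd = ["./madroot/mad", treefile]
    print(" ".join(mad_cmd))
    subprocess.run(mad_cmd, check=True)
    return treefile + ".rooted"

# Use the first .tree file found in the output directory
treefiles = sorted(f for f in os.listdir(OUTPUT_DIR) if f.endswith(".tree"))
if treefiles:
    treefile = os.path.join(OUTPUT_DIR, treefiles[0])
    rooted_treefile = madroot(treefile)
    print(f"Rooted tree file: {rooted_treefile}")
else:
    print("No tree file found in output directory.")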

In [None]:
#@title Plot Foldtree output {run: "auto"}

tree = "foldtree2_rooted" #@param ["foldtree2_rooted"]
import sys
if f"/usr/local/lib/python{python_version}/site-packages/" not in sys.path:
    sys.path.insert(0, f"/usr/local/lib/python{python_version}/site-packages/")

import os
os.environ['QT_QPA_PLATFORM']='offscreen'
from ete3 import Tree, TreeStyle, TextFace, CircleFace

filelookup = {
    "foldtree2_rooted": "foldtree_struct_tree.PP.nwk.rooted.final",
}

t = Tree(f"{jobname}/{filelookup[tree]}", format = 0)
# Define a tree style
ts = TreeStyle()
ts.mode = "c"  # This sets the tree layout to circular
ts.show_leaf_name = True
ts.show_branch_length = True
ts.show_branch_support = True

for n in t.traverse():
    support_face = CircleFace(radius=10, color="Thistle", style="circle")
    n.add_face(support_face, column=0, position="branch-right")
    n.img_style["vt_line_width"] = 50
    n.img_style["hz_line_width"] = 50


for leaf in t.iter_leaves():
    leaf.img_style["vt_line_type"] = 1  # for vertical lines
    leaf.img_style["hz_line_type"] = 1  # for horizontal lines
    leaf.add_face(TextFace(leaf.name, fsize=512), column=0, position="branch-right")

# Visualize the tree
t.render(jobname + "/tree.svg", w=1000, h=1000, units="px", tree_style=ts, dpi=300)

import base64
from IPython.display import display, HTML

# Read the SVG as bytes so non-ASCII characters in leaf names cannot break the encoding
with open(jobname + '/tree.svg', 'rb') as f:
  display(HTML('<img style="width:100%; background:white; height:100%;max-width: 80vw;margin:1em;" src="data:image/svg+xml;base64,' + base64.b64encode(f.read()).decode('ascii') + '" />'))


## Tree visualisation and comparison
The tree visualisation below is powered by [Phylo.io](https://beta.phylo.io/viewer/).
You can select structural distance metrics to compare tree topologies.
Comparison with sequence-based trees is coming soon.

### Usage
Use the dropdown menus to select the rooted or unrooted tree, the distance metric and the tree to display.
The best results in the manuscript were obtained with the Foldseek score.

To return to a single tree view, select no tree in the second dropdown menu.
The color of each branch represents the maximum Jaccard similarity between that subtree's leaf set and the closest matching subtree's leaf set in the tree on the opposite side of the visualisation.
The darker the color of the branch leading up to a node, the more similar the sets of leaves are.
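
The branch score described above can be sketched in a few lines of Python. This is an illustration of the Jaccard computation only, not Phylo.io's implementation; the helper names `jaccard` and `best_match_score` and the example leaf sets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two leaf sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical here (an assumption).
    return len(a & b) / len(union) if union else 1.0


def best_match_score(leafset, other_leafsets):
    """Highest Jaccard similarity between one subtree's leaf set and
    any subtree leaf set taken from the other tree."""
    return max((jaccard(leafset, other) for other in other_leafsets), default=0.0)


# Leaf set of one subtree, and the subtree leaf sets of the opposite tree.
subtree = {"A", "B", "C"}
other_tree_subtrees = [{"A", "B"}, {"A", "B", "C", "D"}, {"D", "E"}]
score = best_match_score(subtree, other_tree_subtrees)
print(score)
```

A score of 1.0 means the opposite tree contains a subtree with exactly the same leaves, while a score near 0 means no subtree there shares many leaves with this one.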

In [None]:
#@title Phyloio visualization
!cp -r /content/fold_tree/docs/dist_server/* /usr/local/share/jupyter/nbextensions/google.colab
!cp -r /content/{jobname} /usr/local/share/jupyter/nbextensions/google.colab
import csv
filelookup = {
    "foldseek_rooted": "foldtree_struct_tree.PP.nwk.rooted.final",
    "foldseek": "foldtree_struct_tree.PP.nwk",
    "lddt_rooted": "lddt_struct_tree.PP.nwk.rooted.final",
    "lddt": "lddt_struct_tree.PP.nwk",
    "alntmscore_rooted" : "alntmscore_struct_tree.PP.nwk.rooted.final",
    "alntmscore" :  "alntmscore_struct_tree.PP.nwk"
}

id_mapper = {}

with open(jobname +  '/' + 'finalset.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter=','):
      id_mapper[row[3]] = row[2]

with open('fold_tree/docs/dist_server/compare_tree.html', 'r') as f:
      html_string = f.read()
      html_string = html_string.replace( u'\u200b', '' )

      for key, value in filelookup.items():

        with open(jobname + '/' + value, 'r') as f:
          output = f.read()

          for name, name_species in id_mapper.items():
            output = output.replace( name, name_species )

          html_string = html_string.replace( key + '_123456789', output )

from IPython.display import HTML
HTML(html_string)

In [None]:
#@title Package and download results
!zip -FSr $jobname".result.zip" $jobname
files.download(f"{jobname}.result.zip")