<a href="https://colab.research.google.com/github/lauraluebbert/delphy/blob/main/tutorials/delphy_workflow_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div style="text-align: center;">
    <h1 style="font-size: 2.5em; color: #2C3E50;">Running <a href="https://github.com/broadinstitute/delphy" target="_blank">Delphy</a> is as simple as 1, 2, 3 – Ebolavirus Example</h1>
    <h2 style="font-size: 1.5em; color: #34495E;">Phylogenetic Tree Generation with Delphy on Ebolavirus Sequences</h2>
</div>

This notebook demonstrates how to generate a phylogenetic tree using [Delphy](https://github.com/broadinstitute/delphy) on viral sequences obtained from the [NCBI Virus Database](https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/). Optionally, you can upload your own sequences to be included in the analysis.

We will utilize the following tools:
- [**gget**](https://github.com/pachterlab/gget/) to download sequences from the [NCBI Virus Database](https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/)
- [**MAFFT**](https://mafft.cbrc.jp/alignment/server/index.html) for creating a Multiple-Sequence Alignment (MSA)
- [**Delphy**](https://github.com/broadinstitute/delphy) to generate the phylogenetic tree

In this example notebook, the NCBI Virus filters and options below are already filled out to download and analyze all **complete *Zaire ebolavirus* sequences from *Homo sapiens* hosts collected between January 01, 2014 and December 31, 2014**.

Simply **click `Runtime` -> `Run all`** at the top of this notebook.

If you encounter any problems or questions while using this notebook, please [report them here](https://github.com/broadinstitute/delphy/issues).

Total runtime: ~2 min
___
___

# 1. Apply filters to download sequences from [NCBI Virus](https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/)

In [1]:
#@title NCBI Virus filtering options:
#@markdown Set any filter to `None` to disable it.

def arg_str_to_bool(arg):
    # Convert form strings to Python values; anything else is returned unchanged
    if arg == "True":
        return True
    elif arg == "False":
        return False
    elif arg == "None" or arg == "":
        return None
    else:
        return arg

#@markdown ## **Virus**

virus = 'Zaire ebolavirus'  #@param {type:"string"}
#@markdown  - Examples: 'Mammarenavirus lassaense' or 'coronaviridae' or 'NC_045512.2' or '142786' (Norovirus taxid).
virus = arg_str_to_bool(virus)

accession = False   #@param {type:"boolean"}
#@markdown  - Check this box if `virus` argument above is an NCBI accession (starts with 'NC'), e.g. 'NC_045512.2'.

#@markdown ## **Host**

host = 'homo sapiens'  #@param {type:"string"}
#@markdown  - Example: 'homo sapiens' (alternative: use the `host_taxid` filter below). Input `None` to disable filtering by host.
host = arg_str_to_bool(host)

host_taxid = None  #@param {type:"raw"}
#@markdown  - NCBI Taxonomy ID of host (e.g., 9443 for primates).
host_taxid = arg_str_to_bool(host_taxid)

#@markdown ## **Sequence completeness**

annotated = "None"   #@param ["True", "False", "None"]
#@markdown  - Set to `True` to only return sequences marked as 'annotated'; set to `False` to only return sequences NOT marked as 'annotated'.
annotated = arg_str_to_bool(annotated)

nuc_completeness = "complete"  #@param ["None", "complete", "partial"]
#@markdown  - Choose between 'partial' or 'complete' nucleotide completeness.
nuc_completeness = arg_str_to_bool(nuc_completeness)

min_seq_length = None  #@param {type:"raw"}
#@markdown  - Minimum sequence length, e.g. 6252.
min_seq_length = arg_str_to_bool(min_seq_length)

max_seq_length = None  #@param {type:"raw"}
#@markdown  - Maximum sequence length, e.g. 7815.
max_seq_length = arg_str_to_bool(max_seq_length)

has_proteins = None  #@param {type:"raw"}
#@markdown  - Require sequences to contain specific proteins (e.g. 'GPC' **- include the quotation marks**) or a list of proteins (e.g. ['GPC', 'L']). Also accepts names of genes or segments.
has_proteins = arg_str_to_bool(has_proteins)

proteins_complete = False    #@param {type:"boolean"}
#@markdown  - Check this box if the proteins/genes/segments in `has_proteins` should be marked as 'complete'.
proteins_complete = arg_str_to_bool(proteins_complete)

max_ambiguous_chars = None  #@param {type:"raw"}
#@markdown  - Maximum number of 'N' characters allowed in each sequence, e.g. 10.
max_ambiguous_chars = arg_str_to_bool(max_ambiguous_chars)

#@markdown ## **Gene/peptide/protein counts**

min_gene_count = None  #@param {type:"raw"}
#@markdown  - Minimum gene count, e.g. 1.
min_gene_count = arg_str_to_bool(min_gene_count)

max_gene_count = None  #@param {type:"raw"}
#@markdown  - Maximum gene count, e.g. 40.
max_gene_count = arg_str_to_bool(max_gene_count)

min_mature_peptide_count = None  #@param {type:"raw"}
#@markdown  - Minimum peptide count, e.g. 2.
min_mature_peptide_count = arg_str_to_bool(min_mature_peptide_count)

max_mature_peptide_count = None  #@param {type:"raw"}
#@markdown  - Maximum peptide count, e.g. 15.
max_mature_peptide_count = arg_str_to_bool(max_mature_peptide_count)

min_protein_count = None  #@param {type:"raw"}
#@markdown  - Minimum protein count, e.g. 2.
min_protein_count = arg_str_to_bool(min_protein_count)

max_protein_count = None  #@param {type:"raw"}
#@markdown  - Maximum protein count, e.g. 10.
max_protein_count = arg_str_to_bool(max_protein_count)

#@markdown ## **Geographic location**

geographic_location = None  #@param {type:"string"}
#@markdown  - Geographic location of sample collection, e.g. 'South Africa' or 'Germany'.
geographic_location = arg_str_to_bool(geographic_location)

geographic_region = None  #@param {type:"string"}
#@markdown  - Geographic region of sample collection, e.g. 'Africa' or 'Europe'.
geographic_region = arg_str_to_bool(geographic_region)

#@markdown ## **Dates**
#@markdown All dates should be supplied in YYYY-MM-DD format.
min_collection_date = "2014-01-01"  #@param {type:"string"}
#@markdown  - Minimum collection date, e.g. '2000-01-01'.
min_collection_date = arg_str_to_bool(min_collection_date)

max_collection_date = "2014-12-31"  #@param {type:"string"}
#@markdown  - Maximum collection date, e.g. '2014-12-04'.
max_collection_date = arg_str_to_bool(max_collection_date)

min_release_date = "None"  #@param {type:"string"}
#@markdown - Minimum release date of the sequences, e.g. '2000-01-01'.
#@markdown - **Specifying a minimum release date is recommended** where possible, as it speeds up the data download from NCBI Virus and reduces the disk space it requires.
min_release_date = arg_str_to_bool(min_release_date)

max_release_date = "None"  #@param {type:"string"}
#@markdown  - Maximum release date of the sequences, e.g. '2014-12-04'.
max_release_date = arg_str_to_bool(max_release_date)

#@markdown ## **Source**

submitter_country = None  #@param {type:"string"}
#@markdown  - Country that submitted the sequence, e.g. 'South Africa' or 'Germany'.
submitter_country = arg_str_to_bool(submitter_country)

lab_passaged = "None"   #@param ["True", "False", "None"]
#@markdown  - Set to `True` to only return sequences that have been passaged in a laboratory setting; set to `False` to only return sequences that have NOT been passaged in a laboratory setting.
lab_passaged = arg_str_to_bool(lab_passaged)

source_database = None  #@param {type:"string"}
#@markdown  - Source database of the sequence, e.g. 'GenBank' or 'RefSeq'.
source_database = arg_str_to_bool(source_database)
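
Every form field above passes through `arg_str_to_bool`, which turns the strings produced by the Colab form widgets into Python values. A minimal sketch of that mapping, reusing the helper as defined in the cell above (the sample inputs are illustrative only and not taken from a real run):

```python
def arg_str_to_bool(arg):
    """Map Colab form strings to Python values; other inputs pass through unchanged."""
    if arg == "True":
        return True
    elif arg == "False":
        return False
    elif arg == "None" or arg == "":
        return None
    else:
        return arg

# Illustrative inputs: dropdown strings, an empty text field, a raw integer, and a raw None
examples = ["True", "False", "None", "", "complete", 9443, None]
converted = [arg_str_to_bool(value) for value in examples]
print(converted)
```

Note that only the exact strings `"True"`, `"False"`, `"None"`, and `""` are converted; values entered through `raw` fields (such as an integer taxonomy ID) are passed on as they are.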

# 2. Optional: Upload a FASTA file with your own sequences to add to the analysis
  **1) Click on the folder icon on the left.  
  2) Upload your file(s) to the Google Colab server by dragging them in (or use right-click -> Upload).  
  3) Specify the name of your file(s) here:**

In [2]:
#@title FASTA file containing additional sequences

fasta_file = None  #@param {type:"string"}
#@markdown  - Example: 'my_fasta_file.fa' or 'my_fasta_file.fasta'.


In [3]:
#@title Metadata

#@markdown **Option 1: The metadata is the same for all sequences in your FASTA file**
metadata = {'Collection Date': 'YYYY-MM-DD', 'Geo Location': 'South Korea'}  #@param {type:"raw"}
#@markdown - The 'Collection Date' field is required. Optional: you can add as many additional columns as you wish, e.g. 'Geo Location': 'South Korea'.
#@markdown - NOTE: Use NCBI column names where applicable (see https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus for example column names)

#@markdown **Option 2: Input a CSV file with metadata for each sequence**
metadata_csv = None  #@param {type:"string"}
#@markdown  - Example: 'my_metadata.csv'. This file must include at least an 'Accession' and 'Collection Date' column.
#@markdown  - NOTE: Make sure the IDs in the "Accession" column match the IDs of the sequences in the provided FASTA file
#@markdown  - NOTE: Use NCBI column names where applicable (see https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus for example column names)

# Convert empty strings to None
fasta_file = arg_str_to_bool(fasta_file)
metadata_csv = arg_str_to_bool(metadata_csv)
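
With Option 1, the same metadata values are applied to every sequence in the uploaded FASTA file. Below is a minimal sketch of that broadcast using only the standard library; the main cell uses Biopython and pandas instead, and the FASTA text, sample names, date, and `build_metadata_rows` helper here are hypothetical:

```python
import csv
import io

def build_metadata_rows(fasta_text, shared_metadata):
    """Return one metadata row per FASTA record, keyed by the first token of its header."""
    rows = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                continue  # Skip headers without an identifier
            rows.append({"Accession": tokens[0], **shared_metadata})
    return rows

# Hypothetical two-record FASTA file and shared metadata
fasta_text = ">sampleA some description\nACGT\n>sampleB\nACGA\n"
shared = {"Collection Date": "2014-06-15", "Geo Location": "South Korea"}
rows = build_metadata_rows(fasta_text, shared)

# Write the rows as CSV text, with 'Accession' as the first column
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Accession", *shared], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Each record receives an identical copy of the shared values, which is why Option 2 (a per-sequence CSV) is the better choice when collection dates differ between samples.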

# 3. Optional: Specify Delphy arguments

In [4]:
#@title Delphy options:
mutation_rate = 0.01  #@param {type:"raw"}
#@markdown  - Virus mutation rate (mutations per site per year), e.g. 0.01. If set to `None`, the mutation rate will be estimated from the generated tree.

delphy_steps = None  #@param {type:"raw"}
#@markdown  - Number of steps to run in the Delphy algorithm (default: 500,000 * number of sequences).

delphy_samples = 200 #@param {type:"integer"}
#@markdown  - Number of log entries and tree samples to record (one of each is written every `delphy_steps/delphy_samples` steps).

delphy_release = 0.9996  #@param {type:"number"}
#@markdown  - Delphy version to use (see https://github.com/broadinstitute/delphy/releases).

threads = 2  #@param {type:"integer"}
#@markdown  - Number of threads to use for the MSA (MAFFT) and phylogenetic tree (Delphy) generation.
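
When `delphy_steps` is left as `None`, the main cell derives it from the number of sequences and then spaces the log entries and tree samples evenly across the run. Here is a sketch of that arithmetic, assuming all 234 sequences reported in the example run's output reach Delphy (sequences without a usable collection date are skipped, so the real count may be lower). `sampling_schedule` is a hypothetical helper, and it uses floor division with a lower bound of 1 rather than the cell's `int(a / b)`:

```python
def sampling_schedule(num_seqs, delphy_samples, delphy_steps=None, steps_per_seq=500_000):
    """Return (total steps, interval in steps between consecutive log/tree samples)."""
    if delphy_steps is None:
        # Default: 500,000 steps per sequence
        delphy_steps = steps_per_seq * num_seqs
    # Guard against an interval of 0 when there are fewer steps than samples
    every = max(1, int(delphy_steps) // int(delphy_samples))
    return delphy_steps, every

steps, every = sampling_schedule(num_seqs=234, delphy_samples=200)
print(steps, every)  # 117000000 585000
```

With the defaults in this notebook, Delphy would therefore run 117 million steps and write a log entry and a tree every 585,000 steps.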

# All done! Select `Runtime` at the top of this notebook, then click `Run all` and lean back...
A completion message will be displayed below once the notebook has been successfully executed.  
**💡 Tip: Click on the folder icon on the left to view/download the files that are being generated.**
  
<br>

____
____
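
Delphy expects each FASTA header in the form `accession|YYYY-MM-DD`, so the cell below normalizes every collection date and fills in a missing month or day with `01`. Here is a condensed sketch of that normalization; `normalize_date` is a simplified, hypothetical variant rather than the exact function used below, and the accession is a placeholder:

```python
import re
from datetime import datetime

# Year, optionally followed by a month, optionally followed by a day
DATE_RE = re.compile(
    r"(?P<year>\d{4})(?:[-/.](?P<month>\d{1,2})(?:[-/.](?P<day>\d{1,2}))?)?"
)

def normalize_date(date_string, default_month="01", default_day="01"):
    """Return the date as YYYY-MM-DD, or None if it cannot be parsed or is invalid."""
    match = DATE_RE.search(str(date_string))
    if not match:
        return None
    year = match.group("year")
    month = (match.group("month") or default_month).zfill(2)
    day = (match.group("day") or default_day).zfill(2)
    formatted = f"{year}-{month}-{day}"
    try:
        # Reject impossible dates such as February 30
        datetime.strptime(formatted, "%Y-%m-%d")
    except ValueError:
        return None
    return formatted

# Placeholder accession, for illustration only
header = f"SAMPLE_1|{normalize_date('2014/6')}"
print(header)  # SAMPLE_1|2014-06-01
```

Defaulting to the first day of the month or year keeps partially dated sequences in the analysis, at the cost of some imprecision in their tip dates.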

In [5]:
%%time

#@title # Generating the phylogenetic tree...
import subprocess
from IPython.display import display, HTML

def log_message(text):
    display(HTML(f"<h2 style='color: green;'>{text}</h2>"))
def log_message_error(text):
    display(HTML(f"<h2 style='color: red;'>{text}</h2>"))


log_message("1/5 Installing software...")

# Install gget dependencies
try:
    result = subprocess.run(['pip', 'install', '-q', 'mysql-connector-python', 'biopython'], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    # print(result.stdout)
except subprocess.CalledProcessError as e:
    log_message_error("An error occurred while installing mysql-connector-python and biopython:")
    print(e.stderr)

# Install gget
try:
    # After the release, this will just be: pip install gget (dependence on biopython will be removed)
    result = subprocess.run(['pip', 'install', '-q', 'git+https://github.com/pachterlab/gget.git@delphy_dev'], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    # print(result.stdout)
except subprocess.CalledProcessError as e:
    log_message_error("An error occurred while installing gget:")
    print(e.stderr)


import gget
from Bio import SeqIO
import pandas as pd
import re
from datetime import datetime

log_message("1/5 Software installation complete.")


# Downloading virus genomes from NCBI Virus
log_message("2/5 Downloading data from NCBI Virus... This might take a few minutes depending on the internet connection and how busy the NCBI server is.")
gget.ncbi_virus(
    virus = virus,
    accession = accession,
    host = host,
    min_seq_length = min_seq_length,
    max_seq_length = max_seq_length,
    min_gene_count = min_gene_count,
    max_gene_count = max_gene_count,
    nuc_completeness = nuc_completeness,
    has_proteins = has_proteins,
    proteins_complete = proteins_complete,
    host_taxid = host_taxid,
    lab_passaged = lab_passaged,
    geographic_region = geographic_region,
    geographic_location = geographic_location,
    submitter_country = submitter_country,
    min_collection_date = min_collection_date,
    max_collection_date = max_collection_date,
    annotated = annotated,
    source_database = source_database,
    min_release_date = min_release_date,
    max_release_date = max_release_date,
    min_mature_peptide_count = min_mature_peptide_count,
    max_mature_peptide_count = max_mature_peptide_count,
    min_protein_count = min_protein_count,
    max_protein_count = max_protein_count,
    max_ambiguous_chars = max_ambiguous_chars
)

ncbi_fasta_file = f"{'_'.join(str(virus).split(' '))}_sequences.fasta"
ncbi_metadata = f"{'_'.join(str(virus).split(' '))}_metadata.csv"

log_message(f"2/5 Data download from NCBI Virus complete. Sequences were saved in the {ncbi_fasta_file} file and the associated metadata in {ncbi_metadata} (also available as .jsonl).")


# Merging sequencing and metadata files if additional file(s) were provided
if fasta_file:
  log_message("Adding user-provided fasta file and metadata to the data from NCBI Virus...")

  # Combine sequence files
  combined_fasta_file = f"{'_'.join(str(virus).split(' '))}_sequences_combined.fasta"
  !cat $ncbi_fasta_file $fasta_file > $combined_fasta_file
  input_fasta_file = combined_fasta_file

  # Combine metadata
  combined_metadata_file = f"{'_'.join(str(virus).split(' '))}_metadata_combined.csv"
  ncbi_metadata_df = pd.read_csv(ncbi_metadata)
  if metadata_csv:
    # Combine provided metadata and NCBI metadata csv files
    user_metadata_df = pd.read_csv(metadata_csv)
    comb_meta_df = pd.concat([ncbi_metadata_df, user_metadata_df])
    comb_meta_df.to_csv(combined_metadata_file, index=False)
    metadata_file = combined_metadata_file

  else:
    # Extract sequence accessions from the provided FASTA file
    headers = [record.id.split(" ")[0] for record in SeqIO.parse(fasta_file, "fasta")]

    # Create a metadata dataframe with the accessions from the FASTA file and the provided metadata
    user_metadata_df = pd.DataFrame(headers, columns=["Accession"])
    for key, value in metadata.items():
      user_metadata_df[key] = value

    # Combine with NCBI metadata
    comb_meta_df = pd.concat([ncbi_metadata_df, user_metadata_df])
    comb_meta_df.to_csv(combined_metadata_file, index=False)
    metadata_file = combined_metadata_file

  log_message(f"Merging user-provided and NCBI Virus data complete. The combined sequence and metadata files were saved as {combined_fasta_file} and {combined_metadata_file}.")

else:
  input_fasta_file = ncbi_fasta_file
  metadata_file = ncbi_metadata


# Create MSA
log_message("3/5 Multiple Sequence Alignment (MSA): Aligning the sequences to each other so they are all in the same frame...")

aligned_fasta_file = f"{'_'.join(str(virus).split(' '))}_aligned.afa"

# # Option 1: Using the MUSCLE algorithm (this works well for a few hundred sequences, but is too slow when dealing with a few thousand sequences)
# gget.muscle(input_fasta_file, super5=True, out=aligned_fasta_file)

# Option 2: Using mafft
# TO-DO: Wrap the following code into gget module and replace with command `gget.mafft(input_fasta_file, out=aligned_fasta_file)`

# Install MAFFT
try:
    # Download MAFFT .deb package
    subprocess.run(
        ['wget', 'https://mafft.cbrc.jp/alignment/software/mafft_7.526-1_amd64.deb'],
        check=True
    )

    # Install the .deb package using dpkg
    subprocess.run(
        ['dpkg', '-i', 'mafft_7.526-1_amd64.deb'],
        check=True
    )

    # Verify the MAFFT installation
    result = subprocess.run(
        ['mafft', '--version'],
        check=True,
        capture_output=True,
        text=True
    )
    print("MAFFT version:", result.stderr.strip())

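except subprocess.CalledProcessError as e:
    # e.stderr is only populated for calls that capture output (here: `mafft --version`)
    print("An error occurred while installing MAFFT:", e.stderr or e)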

# Aligning sequences to each other using mafft
# TO-DO: Add --parttree argument when aligning >10,000 sequences
try:
    with open(aligned_fasta_file, 'w') as output:
        result = subprocess.run(['mafft', '--auto', '--thread', str(threads), input_fasta_file], stdout=output, stderr=subprocess.PIPE, text=True, check=True)
    log_message(f"3/5 MSA complete and saved in the {aligned_fasta_file} file.")
except subprocess.CalledProcessError as e:
    log_message_error("An error occurred while generating the MSA using MAFFT:")
    print(e.stderr)


# TO-DO: Wrap the following code into a gget module and replace with command `gget.delphy(aligned_fasta_file, metadata_file)`
# Adjust the headers in the aligned fasta file to match header format required by Delphy (accession|YYYY-MM-DD):
log_message("4/5 Reformatting sequence files to match Delphy format...")

# Reformat collection date
default_day = '01'
default_month = '01'
def extract_and_format_date(date_string):
    # Define regular expressions for various date formats
    year_only = re.compile(r'(?P<year>\d{4})')
    year_month = re.compile(r'(?P<year>\d{4})[-/.](?P<month>\d{1,2})')
    full_date = re.compile(r'(?P<year>\d{4})[-/.](?P<month>\d{1,2})[-/.](?P<day>\d{1,2})')

    # Try to match the full date first
    match = full_date.search(date_string)
    if match:
        year = match.group('year')
        month = match.group('month').zfill(2)
        day = match.group('day').zfill(2)
    else:
        # Try to match year and month
        match = year_month.search(date_string)
        if match:
            year = match.group('year')
            month = match.group('month').zfill(2)
            day = default_day
        else:
            # Try to match only the year
            match = year_only.search(date_string)
            if match:
                year = match.group('year')
                month = default_month
                day = default_day
            else:
                # If no match, return None
                return None

    # Format the extracted date into YYYY-MM-DD
    formatted_date = f"{year}-{month}-{day}"

    try:
        # Validate date by trying to convert it to a datetime object
        datetime.strptime(formatted_date, '%Y-%m-%d')
    except ValueError:
        return None  # Return None if the date is invalid

    return formatted_date

def update_fasta_headers(fasta_file, csv_file, output_fasta):
    # Read the CSV file into a pandas DataFrame
    df = pd.read_csv(csv_file)

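    # Create a dictionary from the DataFrame to map accession to date.
    # Accept both 'accession' and 'Accession' (the column name requested for user-provided metadata).
    accessions = df['accession'] if 'accession' in df.columns else df['Accession']
    if 'accession' in df.columns and 'Accession' in df.columns:
        accessions = df['accession'].fillna(df['Accession'])
    accession_to_date = pd.Series(df['Collection Date'].values, index=accessions).to_dict()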

    # Open the input fasta file and output fasta file
    with open(fasta_file) as fasta_input, open(output_fasta, 'w') as fasta_output:
        # Iterate through each sequence record in the fasta file
        for record in SeqIO.parse(fasta_input, 'fasta'):
            accession = record.id

            # Check if the accession is in the pandas dictionary and has a non-NaN date
            if accession in accession_to_date and pd.notna(accession_to_date[accession]):
                date = accession_to_date[accession]

                # Format the date (cast to str in case pandas parsed the column as numeric)
                formatted_date = extract_and_format_date(str(date))

                if formatted_date is None:
                    # Skip the entry if the date could not be parsed
                    print(f"Skipping accession {accession} due to unrecognized date format: '{date}'")
                    continue

                # Update the seq header
                record.id = f"{accession}|{formatted_date}"
                record.description = ''  # Remove the original description to avoid duplication
            else:
                # Skip the entry if date is NaN or accession not found
                print(f"Skipping accession {accession} due to missing or NaN date.")
                continue

            # Write the updated record to the output fasta file
            SeqIO.write(record, fasta_output, 'fasta')

aligned_fasta_file_clean = f"{'_'.join(str(virus).split(' '))}_aligned_headers_adjusted.afa"
update_fasta_headers(aligned_fasta_file, metadata_file, aligned_fasta_file_clean)

log_message("4/5 Reformatting complete.")


# Run Delphy
log_message("5/5 Running Delphy...")

# Download Delphy binary
try:
    result = subprocess.run(['wget', f'https://github.com/broadinstitute/delphy/releases/download/{delphy_release}/delphy-ubuntu-x86_64'], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # print(result.stdout)
except subprocess.CalledProcessError as e:
    log_message_error(f"An error occurred while downloading Delphy: ")
    print(e.stderr)

# Give permissions
try:
    result = subprocess.run(['chmod', 'u+x', './delphy-ubuntu-x86_64'], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # print(result.stdout)
except subprocess.CalledProcessError as e:
    log_message_error(f"An error occurred while changing permissions for Delphy: ")
    print(e.stderr)

beast_log_out = f"{'_'.join(str(virus).split(' '))}_delphy_beast_log.txt"
delphy_beast_tree_out = f"{'_'.join(str(virus).split(' '))}_delphy_beast_tree.nwk"
dphy_out = f"{'_'.join(str(virus).split(' '))}_delphy_out.dphy"

# Define Delphy arguments
def count_fasta_sequences(fasta_file):
    with open(fasta_file, 'r') as f:
        return sum(1 for line in f if line.startswith('>'))

if delphy_steps is None:
  # Get number of sequences
  num_seqs = count_fasta_sequences(aligned_fasta_file_clean)
  delphy_steps = 500000 * num_seqs

# Log and sample trees every `delphy_steps / delphy_samples` steps (but at least every step)
log_every = max(1, int(delphy_steps) // int(delphy_samples))
tree_every = log_every

# Define the command
cmd = ['./delphy-ubuntu-x86_64']
if mutation_rate:
    # Fix the mutation rate at the user-provided value instead of estimating it
    cmd += [
        '--v0-fix-mutation-rate',
        '--v0-init-mutation-rate', str(mutation_rate),
    ]
cmd += [
    '--v0-log-every', str(log_every),
    '--v0-tree-every', str(tree_every),
    '--v0-threads', str(threads),
    '--v0-steps', str(delphy_steps),
    '--v0-in-fasta', aligned_fasta_file_clean,
    '--v0-out-log-file', beast_log_out,
    '--v0-out-trees-file', delphy_beast_tree_out,
    '--v0-out-delphy-file', dphy_out
]

try:
    result = subprocess.run(cmd, check=True, text=True, capture_output=True)
    # print(result.stdout)

    display(HTML("""
    <h1>All done! 🎉</h1>
    <h2>To download the files we generated in this notebook to your local computer, click on the folder icon on the left and download files by right clicking a file of interest and selecting 'Download'.</h2>
    <h2>To further visualize your Delphy output, upload the <code>.dphy</code> (and <code>metadata.csv</code>) file(s) to <a href='https://delphy.fathom.info/' target='_blank'>https://delphy.fathom.info/</a></h2>
    """))

except subprocess.CalledProcessError as e:
    log_message_error(f"An error occurred while running Delphy: ")
    print(e.stderr)

New version of client (17.1.0) available at https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/datasets.
INFO:gget.utils:234 sequences passed the provided filters.


MAFFT version: v7.526 (2024/Apr/26)


# Seed: 4088835061
Reading fasta file Zaire_ebolavirus_aligned_headers_adjusted.afa
Building rough initial tree...
Attaching tip 2 to Candidate_region{branch=1, mut_idx=10, -2196.21<t<=-2191, min_muts=4, W_over_Wmax=0}
Attaching tip 3 to Candidate_region{branch=0, mut_idx=7, -2234.07<t<=-2191, min_muts=2, W_over_Wmax=0}
Attaching tip 4 to Candidate_region{branch=3, mut_idx=2, -2075.71<t<=-2047, min_muts=0, W_over_Wmax=0}
Attaching tip 5 to Candidate_region{branch=234, mut_idx=0, -1.79769e+308<t<=-2322, min_muts=3, W_over_Wmax=0}
Attaching tip 6 to Candidate_region{branch=238, mut_idx=0, -1.79769e+308<t<=-2323, min_muts=1, W_over_Wmax=0}
Attaching tip 7 to Candidate_region{branch=6, mut_idx=1, -2318.31<t<=-2046, min_muts=2, W_over_Wmax=0}
Attaching tip 8 to Candidate_region{branch=3, mut_idx=0, -2068.2<t<=-2047, min_muts=0, W_over_Wmax=0}
Attaching tip 9 to Candidate_region{branch=8, mut_idx=0, -2062.04<t<=-2045, min_muts=0, W_over_Wmax=0}
Attaching tip 10 to Candidate_region{branch=6, 