<a target="_blank" href="https://colab.research.google.com/github/amlalejini/alife-2024-phylo-tutorial/blob/main/notebooks/phylotrackpy-sync-gen-example.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Phylotrackpy with synchronous generations example

This notebook contains a simple evolving system with synchronous generations, which is typical of most evolutionary computing systems. 
That is, each generation, every candidate solution in the population is evaluated, and individuals are then selected to reproduce. 
This process repeats for a desired number of generations. 

In this example, we implement the one-max problem where individuals are binary strings, and a solution has a genome of all 1s. 
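
As a quick illustration of the fitness function this implies, here is a minimal sketch. The name `one_max_fitness` is chosen for illustration only; the notebook's own version lives in `Organism.EvalFitness`.

```python
def one_max_fitness(genome):
    """Count the 1s (True values) in a binary genome."""
    return sum(genome)

genome = [True, False, True, True, False]
print(one_max_fitness(genome))  # 3

# A genome of all 1s reaches the maximum possible fitness (its length).
solved = [True] * 5
print(one_max_fitness(solved) == len(solved))  # True
```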

The code in this notebook does not incorporate phylogeny tracking (see [this notebook](https://github.com/amlalejini/alife-2024-phylo-tutorial/blob/main/notebooks/phylotrackpy-sync-gen-example-completed.ipynb) for 
a version of this code that _does_ already incorporate phylogeny tracking).
Think of this notebook as an opportunity to play around with how you could integrate
phylogeny tracking with `phylotrackpy` into an existing system.
Throughout the code, we have left comments where phylogeny tracking code should be added.
Below, we link to some existing examples and the `phylotrackpy` documentation 
for your reference as you get tracking working. 
If you're working on this during the tutorial at ALife 2024, feel free to ask for our help! 
We're happy to walk you through anything!  

## Helpful reference material

- `phylotrackpy` documentation: <https://phylotrackpy.readthedocs.io/en/latest/introduction.html>
- `phylotrackpy` GitHub repository: <https://github.com/emilydolson/phylotrackpy>
- Example code from an ALife 2023 tutorial: <https://github.com/emilydolson/alife-phylogeny-tutorial/blob/main/perfect_tracking_final.ipynb> 
  - Scroll down to the `phylotrackpy` heading!
- A completed version of this notebook with phylogeny tracking implemented already: <https://github.com/amlalejini/alife-2024-phylo-tutorial/blob/main/notebooks/phylotrackpy-sync-gen-example-completed.ipynb>

## Setup

First, install required Python packages for this example (e.g., phylotrackpy).

### Local setup

If you are running this locally, we recommend making a Python virtual environment to keep your local Python installation clean: 

```
python -m venv venv
source venv/bin/activate
python -m pip install -r requirements.txt
```

### Google Colab setup

If you're running this on Google Colab, you can run the following Python cell to install the requisite packages!

In [None]:
!python -m pip install phylotrackpy
!python -m pip install polars       # Used by example for data tracking

In [1]:
# Imports
from phylotrackpy import systematics
import random
import polars as pl

# Seed random number generator
random.seed(8)

## System implementation

Here, the `Organism` class (below) defines candidate solutions with binary genomes, an evaluated fitness value, and a taxon ID. 
The taxon ID member variable keeps track of the organism's taxon in the phylogeny, which is helpful for `phylotrackpy`'s phylogeny tracker.  
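
To see why it helps for each organism to remember its taxon ID, here is a toy sketch in plain Python. It is not the `phylotrackpy` API: `ToyTracker`, `add`, and `lineage` are hypothetical names used only to illustrate the idea that a tracker needs the parent's ID whenever an offspring is registered.

```python
import itertools

class ToyTracker:
    """Toy stand-in for a phylogeny tracker (NOT the phylotrackpy API)."""

    def __init__(self):
        self._next_id = itertools.count()
        self.parent_of = {}    # taxon id -> parent taxon id (None for roots)
        self.origin_time = {}  # taxon id -> time at which it was added

    def add(self, parent_id=None, time=0):
        """Register a new taxon and return its id."""
        taxon_id = next(self._next_id)
        self.parent_of[taxon_id] = parent_id
        self.origin_time[taxon_id] = time
        return taxon_id

    def lineage(self, taxon_id):
        """Walk parent pointers from a taxon back to its root."""
        path = []
        while taxon_id is not None:
            path.append(taxon_id)
            taxon_id = self.parent_of[taxon_id]
        return path

tracker = ToyTracker()
root = tracker.add(time=0)
child = tracker.add(parent_id=root, time=1)       # needs the parent's id
grandchild = tracker.add(parent_id=child, time=2)
print(tracker.lineage(grandchild))  # [2, 1, 0]
```

Storing the returned ID on the organism means the parent's ID is on hand when that organism reproduces. The toy version skips bookkeeping such as removing dead organisms.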

In [7]:
class Organism:
    def __init__(
        self,
        num_genes:int = 10,
        randomize_genome:bool = False
    ):
        # Genomes are vectors of binary values
        self.genome = [bool(random.randint(0, 1)) if randomize_genome else False for _ in range(num_genes)]
        # Evaluate initial fitness
        self.EvalFitness()
        # Organisms keep track of their taxon id (useful for phylogeny tracking)
        self.taxon_id = None

    @classmethod
    def FromGenome(cls, genome:list):
        # Create new organism from a given genome.
        org = cls(
            num_genes = 0,
            randomize_genome = False
        )
        org.SetGenome(genome)
        org.EvalFitness()
        return org

    def GetGenome(self):
        return self.genome

    def SetGenome(self, genome:list):
        self.genome = [gene for gene in genome]
        # Evaluate fitness after updating genome
        self.EvalFitness()

    def GetFitness(self):
        return self.fitness

    def EvalFitness(self):
        self.fitness = sum(self.genome)
        return self.fitness

    def GetTaxonID(self):
        return self.taxon_id

    def SetTaxonID(self, id):
        self.taxon_id = id

    def Mutate(self, per_site_mut_rate:float=0.01):
        num_muts = 0
        for gene_i in range(len(self.genome)):
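            # random.random() is in [0, 1), so `<` mutates with probability exactly
            # per_site_mut_rate (and never mutates when the rate is 0).
            if (random.random() < per_site_mut_rate):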
                self.genome[gene_i] = not self.genome[gene_i]
                num_muts += 1
        # Update fitness evaluation after mutations
        self.EvalFitness()
        return num_muts


def TournamentSelect(num_parents:int, population:list, tourn_size:int = 4):
    parent_ids = [None for _ in range(num_parents)]
    candidate_ids = [i for i in range(len(population))]
    for i in range(num_parents):
        tourn_participants = random.sample(candidate_ids, tourn_size)
        parent_ids[i] = max([(population[idx].GetFitness(), idx) for idx in tourn_participants])[1]
    return parent_ids


Here, we implement a basic evolutionary computing loop solving the one-max problem to demonstrate how `phylotrackpy` can be integrated into a traditional evolutionary computing context.

**We've marked locations where phylogeny tracking code needs to be added with "TUTORIAL TASK" comments!**

In [None]:
pop_size = 500
generations = 1000
per_site_mut_rate = 0.05
genome_length = 100
tournament_size = 8
print_resolution = 100
summary_output_resolution = 10

assert(pop_size > 0)
assert(generations > 0)
assert(per_site_mut_rate >= 0 and per_site_mut_rate <= 1.0)
assert(genome_length > 0)
assert(print_resolution > 0)
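assert(summary_output_resolution > 0)
# random.sample raises a ValueError if the tournament is larger than the population
assert(tournament_size > 0 and tournament_size <= pop_size)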

'''
TUTORIAL TASK: Initialization
- (1) Create a new systematics object here to track the population's phylogeny
- (2) Add snapshot functions that capture basic information about a taxon (ask for clarification if you're not sure what this means)
- (3) Initialize the systematics object's update to 0
'''

# Create a list to hold phylo metrics over time
data = []

# Create initial population
population = [Organism(num_genes=genome_length, randomize_genome=False) for _ in range(pop_size)]

'''
TUTORIAL TASK: Add initial population to the phylogeny
- After adding the initial individuals to the phylogeny, store each taxon id on the individual
'''

for gen in range(0, generations+1):
    if (gen % print_resolution) == 0:
        print(f"---Update {gen}---")
        print(f"  Max fitness={max([indiv.GetFitness() for indiv in population])}")

    '''
    TUTORIAL TASK: Update the systematics object's update to current generation
    '''

    # Individuals are evaluated immediately after reproduction, so no need to evaluate
    # the full population here. In other systems, population evaluation would likely happen here.

    # Select parents to reproduce.
    parent_ids = TournamentSelect(
        num_parents = pop_size,
        population = population,
        tourn_size = tournament_size
    )

    # Create next generation's population by copying parents, mutating offspring,
    # and adding offspring to the phylogeny.
    next_pop = []
    for parent_id in parent_ids:
        next_pop.append(Organism.FromGenome(population[parent_id].GetGenome()))
        next_pop[-1].Mutate(per_site_mut_rate)
        '''
        TUTORIAL TASK: Add new offspring to phylogeny
        - (1) Add offspring to phylogeny
        - (2) Update offspring's taxon id
        '''

    # Record data for this update
    if (gen % summary_output_resolution) == 0:
        '''
        TUTORIAL TASK: Add some phylogeny metrics to the data being saved
        '''
        data.append({
            "generation": gen,
            "max_fit": max([indiv.GetFitness() for indiv in population])
        })

    '''
    TUTORIAL TASK: Remove current population individuals from the phylogeny before we replace them with next_pop
    '''
    population = next_pop

'''
TUTORIAL TASK: Output a final phylogeny snapshot
'''
summary_df = pl.DataFrame(data)
summary_df.write_csv("summary.csv")

# Visualization

Phylogeny visualization is an area of active research, particularly as it applies to the more complex phylogenies that we often get in ALife research. Here are some suggestions.

## ALife Phylogeny Visualizer

https://emilydolson.github.io/lineage_viz_tool/phylogeny_visualizations/phylogeny.html

This tool is built for visualizing artificial life phylogenies. It is written in JavaScript. It expects data in [ALife standard phylogeny data format](https://alife-data-standards.github.io/alife-data-standards/phylogeny). You can upload a file using the file selector, and it should immediately visualize your phylogeny. From there, there are a few settings that you can adjust:

- Scale exponent: Under some evolutionary regimes, most of the interesting phylogenetic structure happens very near the end of evolutionary time. In other cases, it happens early on. To accommodate these differences, the visualizer uses an exponentially-scaled time axis. To adjust the exponent used (and therefore the temporal part of the phylogeny being emphasized), slide the slider.

- Show intermediate taxa: In biology, usually only the tips (i.e. leaf nodes) of the phylogeny are known. In ALife, we usually know details about every node of the phylogeny. We can take advantage of this information. By toggling on this checkbox, the visualizer will display rectangles alongside the phylogeny depicting the portion of evolutionary time that each taxon was alive for. Sometimes, the lineage that led to an extant taxon will contain many taxa that coexisted with each other. In these cases, rectangles are drawn next to each other.

- You can also adjust the colors of the lines and circles in the phylogeny.
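
For intuition about the scale exponent setting, the sketch below shows one plausible way an exponent can shift emphasis along a time axis. This is an assumption for illustration: the power transform and the name `scale_time` are not taken from the visualizer's source, and the tool's actual formula may differ.

```python
def scale_time(t, t_max, exponent):
    """Map a time t in [0, t_max] to an axis position in [0, 1].

    Illustrative power transform (an assumption, not the visualizer's code).
    """
    return (t / t_max) ** exponent

times = [0, 250, 500, 750, 1000]
for exponent in (0.5, 1.0, 2.0):
    positions = [round(scale_time(t, 1000, exponent), 3) for t in times]
    print(f"exponent={exponent}: {positions}")
```

Under this transform, an exponent above 1 gives later times more of the axis, an exponent below 1 gives early times more, and an exponent of 1 leaves the axis linear.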

## IcyTree

https://icytree.org/

This is a visualizer built for bioinformaticians. Consequently, it lacks the ALife-specific features. However, it is more mature and contains lots of useful features. To use it, you can use the `alifedata-phyloinformatics-convert` Python package to convert an ALife Standard Phylogeny file to a bioinformatics standard such as Newick. Note that Newick format does not always play nicely with very large phylogenies, so you may need to sub-sample yours to get a good visualization.
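
For intuition about how the two formats relate, here is a minimal plain-Python sketch that turns a tiny ALife-standard-style table into a Newick string. It is not how the conversion package works internally. The sample data is made up, `parse_ancestor` and `to_newick` are hypothetical helpers, and the sketch assumes asexual lineages (one ancestor per row) with a single root.

```python
import csv
import io

# Each row has an `id` and an `ancestor_list`; "[none]" marks the root.
alife_csv = """id,ancestor_list
0,[none]
1,[0]
2,[0]
3,[1]
"""

def parse_ancestor(field):
    """Return the single parent id, or None for a root."""
    inner = field.strip().strip("[]").strip()
    if inner == "" or inner.lower() == "none":
        return None
    return int(inner)

def to_newick(alife_text):
    children = {}  # taxon id -> list of child taxon ids
    root = None
    for row in csv.DictReader(io.StringIO(alife_text)):
        taxon = int(row["id"])
        parent = parse_ancestor(row["ancestor_list"])
        children.setdefault(taxon, [])
        if parent is None:
            root = taxon
        else:
            children.setdefault(parent, []).append(taxon)

    def subtree(node):
        # Leaves are written as their label; internal nodes wrap their children.
        kids = children[node]
        if not kids:
            return str(node)
        return "(" + ",".join(subtree(k) for k in kids) + ")" + str(node)

    return subtree(root) + ";"

print(to_newick(alife_csv))  # ((3)1,2)0;
```

Because this sketch recurses once per level of the tree, a very deep phylogeny would hit Python's recursion limit.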

The following code will convert an ALife standard phylogeny to Newick:

In [None]:
import alifedata_phyloinformatics_convert as apc

# `phylo_tracker` is assumed to be the systematics object created in the
# tutorial tasks above.
# Optional: if your data isn't already in a phylotrackpy systematics
# object, load it in
# (only do this with an empty systematics object)
#
# phylo_tracker.load_from_file("phylo_snapshot_final.csv")


converter = apc.RosettaTree(phylo_tracker)
newick_string = converter.to_newick()  # returns newick string
with open("phylogeny.nwk", "w") as f:
    f.write(newick_string)

You can then upload this newick file to IcyTree!