# <img src="https://github.com/JuliaLang/julia-logo-graphics/raw/master/images/julia-logo-color.png" height="40" /> _ALIFE Phylogeny Tutorial Notebook_

[Link to original julia notebook template](https://github.com/ageron/julia_notebooks/blob/main/Julia_Colab_Notebook_Template.ipynb)

This notebook contains a simple evolving system with synchronous generations, which is typical of most evolutionary computing systems. That is, in each generation, a population of candidate solutions is evaluated and selected to reproduce. This process repeats for a desired number of generations.

In this example, we implement phylogeny tracking for a continuous, 1-dimensional version of [the numbers game](http://www.demo.cs.brandeis.edu/pr/number_games/).

Throughout the code, we have left comments where phylogeny tracking code should be added. If you're working on this during the tutorial at ALife 2024, feel free to ask for our help! We're happy to walk you through anything!



# Install Julia
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia and other packages (if needed, update `JULIA_VERSION` and the other parameters). This takes a couple of minutes.
3. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the next section.

_Notes_:
* If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2 and 3.
* After installation, if you want to change the Julia version or activate/deactivate the GPU, you will need to reset the Runtime: _Runtime_ > _Factory reset runtime_ and repeat steps 2 and 3.

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.10.4" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia"
JULIA_PACKAGES_IF_GPU="CUDA" # or CuArrays for older Julia versions
JULIA_NUM_THREADS=8
#---------------------------------------------------#

if [ -z `which julia` ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  nvidia-smi -L &> /dev/null && export GPU=1 || export GPU=0
  if [ $GPU -eq 1 ]; then
    JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi

## Checking the Installation
The `versioninfo()` function should print your Julia version and some other info about the system. If you get an error, refresh/reload this page.

In [None]:
versioninfo()

If the above cell gives you an error, refresh this page.

# Install Jevo.jl and Dependencies

Run the following cell to install the `phylo-tutorial` branch of [Jevo.jl](https://www.github.com/jarbus/Jevo.jl) and its dependencies. This branch has all of the neuro-evolution code stripped out to install faster.

[Jevo.jl](https://www.github.com/jarbus/Jevo.jl) is currently in alpha; development docs can be found [here](https://jarbus.github.io/Jevo.jl/dev). Jevo is only required for the final section of this notebook.

This cell should take ~10 minutes on Google Colab. In the meantime, you can move on to the coding challenges below!

In [None]:
]add StatsBase https://github.com/jarbus/XPlot.jl.git https://github.com/jarbus/PhylogeneticTrees.jl.git https://github.com/jarbus/Jevo.jl.git#phylo-tutorial StableRNGs Logging

# The world's simplest Co-Evolutionary Algorithm

We will implement a trivial, one-dimensional version of the Numbers Game from scratch, and then demonstrate a completed example using the Jevo.jl framework. You can implement phylogeny tracking using [PhylogeneticTrees.jl](https://github.com/jarbus/PhylogeneticTrees.jl/blob/master/src/tree.jl). This package has no documentation yet, so we provide the relevant parts of its API where needed; it is straightforward to use.

In [None]:
using StatsBase

mutable struct Organism
  id::Int
  genome::Float64  # Genome is a single float, which co-evolves with other floats to rise in value
  fitness::Float64
end
Organism(id::Int, genome::Float64) = Organism(id, genome, 0.0)  # new organisms start with zero fitness


# Mutation adds uniform noise drawn from [0, mr), so genomes can only increase
function mutate_genome(genome::Float64; mr::Float64)
  genome + (rand() * mr)
end

function evaluate_fitness!(pop_a::Vector{Organism}, pop_b::Vector{Organism})
  for org_a in pop_a, org_b in pop_b
    org_a.fitness += org_a.genome > org_b.genome
    org_b.fitness += org_a.genome < org_b.genome
  end
end

function fitness_proportional_selection(n_parents::Int, pop::Vector{Organism})
  weights = [org.fitness for org in pop]
  [sample(pop, Weights(weights)) for _ in 1:n_parents]
end
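
The scoring rule in `evaluate_fitness!` can be checked by hand on a tiny example. The sketch below uses plain vectors of made-up genome values instead of `Organism` objects, and computes the selection probabilities directly rather than calling `sample`, so it runs without StatsBase. The names `genomes_a`, `genomes_b`, and `probs_a` are illustrative only.

```julia
# Genomes of two tiny populations (made-up values for illustration).
genomes_a = [0.1, 0.5, 0.9]
genomes_b = [0.3, 0.7]

# Each organism scores one point per opponent with a strictly smaller genome,
# mirroring the comparisons in evaluate_fitness!.
fitness_a = [count(b -> a > b, genomes_b) for a in genomes_a]
fitness_b = [count(a -> b > a, genomes_a) for b in genomes_b]

println(fitness_a)  # [0, 1, 2]
println(fitness_b)  # [1, 2]

# Fitness-proportional selection draws each parent with probability
# fitness / total fitness, so a zero-fitness organism is never selected.
probs_a = fitness_a ./ sum(fitness_a)
println(probs_a)
```

Because selection is proportional to fitness, the organism with genome `0.1` has no chance of reproducing here, while the one with genome `0.9` is picked about two thirds of the time.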

In [None]:
using PhylogeneticTrees
# Create two co-evolving populations
# Yes, this code is *horribly* inefficient
# We write it this way for clarity.
n_pop = 100
n_gens = 100
mr = 0.1

org_id = 1
pop_a, pop_b = Organism[], Organism[]
for _ in 1:n_pop
  push!(pop_a, Organism(org_id, rand()))
  push!(pop_b, Organism(org_id + 1, rand()))
  org_id += 2
end

"""
TUTORIAL TASK: Initialization

Create a new phylogenetic tree for each population. The relevant constructor is:

  function PhylogeneticTree(genesis_pop_ids::Vector{Int})
"""


for gen in 1:n_gens
  evaluate_fitness!(pop_a, pop_b)
  parents_a = fitness_proportional_selection(n_pop, pop_a)
  parents_b = fitness_proportional_selection(n_pop, pop_b)
  next_pop_a, next_pop_b = Organism[], Organism[]

  # Create next generation
  for parent_a in parents_a
    new_child_a = Organism(org_id, mutate_genome(parent_a.genome, mr=mr))
    """
    TUTORIAL TASK: Update phylogeny for population a. Relevant function:

      function add_child!(tree::PhylogeneticTree, parent_id::Int, child_id::Int)
    """
    push!(next_pop_a, new_child_a)
    org_id += 1
  end

  for parent_b in parents_b
    new_child_b = Organism(org_id, mutate_genome(parent_b.genome, mr=mr))
    """
    TUTORIAL TASK: Update phylogeny for population b. Relevant function:

      function add_child!(tree::PhylogeneticTree, parent_id::Int, child_id::Int)
    """
    push!(next_pop_b, new_child_b)
    org_id += 1
  end
  pop_a = next_pop_a
  pop_b = next_pop_b
  @assert length(pop_a) == n_pop == length(pop_b)


  """
  TUTORIAL TASK: Prune phylogeny of dead branches, saving memory. Relevant function:

    function purge_unreachable_nodes!(tree::PhylogeneticTree, ids::Set{Int})

  # Arguments:
    `ids` refers to a set of existing individual ids in a population to keep the lineages for. Any nodes
    in the tree which are not ancestors of any individual in this set will be deleted.
  """


  """
  OTHER (VERY OPTIONAL) TUTORIAL TASK: Compute pairwise distances between nodes in tree. Relevant function:

    function compute_pairwise_distances!(tree::PhylogeneticTree, ids::Set{Int};)

    To view information about the returned types, see the function docstring: (MRCA = Most Recent Common Ancestor)

    https://github.com/jarbus/PhylogeneticTrees.jl/blob/4179f72050e4c4a3bb99da71a1d727f6ddd6074d/src/tree.jl#L92
  """

  if gen % 10 == 0
    pop_a_mean = round(mean(o.genome for o in pop_a), digits=2)
    pop_b_mean = round(mean(o.genome for o in pop_b), digits=2)
    println("gen $gen: average genome in pop a = $pop_a_mean")
    println("gen $gen: average genome in pop b = $pop_b_mean")
  end
end
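
The tracking tasks in the cell above boil down to three operations: register the genesis ids, record each parent-child link, and drop lineages with no living descendants. The sketch below illustrates them with a hypothetical `ToyTree` stand-in built on a plain `Dict`. It is not the PhylogeneticTrees.jl implementation: `toy_add_child!` and `toy_purge!` are made-up names that only mirror the signatures quoted in the task comments, and the sketch assumes that purging keeps the listed ids themselves along with their ancestors.

```julia
# Hypothetical stand-in for a phylogeny: maps each id to its parent id
# (`nothing` marks a genesis organism). NOT the PhylogeneticTrees.jl type.
struct ToyTree
    parent::Dict{Int,Union{Int,Nothing}}
end
ToyTree(genesis_ids::Vector{Int}) =
    ToyTree(Dict{Int,Union{Int,Nothing}}(id => nothing for id in genesis_ids))

# Record that `child_id` descends from `parent_id`.
function toy_add_child!(tree::ToyTree, parent_id::Int, child_id::Int)
    haskey(tree.parent, parent_id) || error("unknown parent $parent_id")
    tree.parent[child_id] = parent_id
    return tree
end

# Delete every node that is neither in `ids` nor an ancestor of an id in `ids`.
function toy_purge!(tree::ToyTree, ids::Set{Int})
    keep = Set{Int}()
    for id in ids
        node = id
        # Walk up the lineage until we hit a root or an already-kept node.
        while node !== nothing && !(node in keep)
            push!(keep, node)
            node = tree.parent[node]
        end
    end
    filter!(kv -> first(kv) in keep, tree.parent)
    return tree
end

tree = ToyTree([1, 2])          # two genesis organisms
toy_add_child!(tree, 1, 3)      # 1 -> 3
toy_add_child!(tree, 1, 4)      # 1 -> 4
toy_add_child!(tree, 2, 5)      # 2 -> 5
toy_add_child!(tree, 3, 6)      # 3 -> 6

# Only organism 6 is alive: its lineage 6 <- 3 <- 1 survives, the rest is purged.
toy_purge!(tree, Set([6]))
println(sort(collect(keys(tree.parent))))  # [1, 3, 6]
```

In the tutorial loop, the set passed to the purge step would be the ids of the current population, so the tree only ever holds lineages that are still represented.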

In [None]:
function write_phylogeny(filename::String, tree::PhylogeneticTree)
  open(filename, "w") do io
    println(io, "id,ancestor_list")
    for id in keys(tree.tree) |> collect |> sort
      node = tree.tree[id]
      # genesis organisms have no parent, so their ancestor list is written as [none]
      if isnothing(node.parent)
        println(io, "$id,[none]")
      else
        println(io, "$id,[$(node.parent.id)]")
      end
    end
  end
end

"""
TUTORIAL TASK: Write phylogeny of each population to disk

  We implement the above function to write a phylogeny to disk. Use it to log the phylogenies for your populations!

  Here, your phylogeny files should be called `pop_a_custom.csv` and `pop_b_custom.csv`
"""

# Using Jevo

Jevo implements phylogeny tracking with PhylogeneticTrees via the [Operator](https://jarbus.net/Jevo.jl/dev/api/#Jevo.Operator) interface. In the cell below, we provide a completed example of using phylogeny tracking in a simple Jevo pipeline.

In [None]:
println("Compiling Jevo...")
using Jevo
using Logging
using StableRNGs

STATS_FILE = "statistics.h5"
isfile(STATS_FILE) && rm(STATS_FILE)

global_logger(JevoLogger())
rng = StableRNG(1)

k = 10
n_dims = 2
n_inds = 100
n_species = 2
n_gens = 100

counters = default_counters()
ng_gc = ng_genotype_creator = Creator(VectorGenotype, (n=n_dims,rng=rng))
ng_developer = Creator(VectorPhenotype)

# Create Composite population with two subpopulations, p1 and p2
comp_pop_creator = Creator(CompositePopulation, ("species", [("p$i", n_inds, ng_gc, ng_developer) for i in 1:n_species], counters))
env_creator = Creator(CompareOnOne)

# Build the state from its creators and its operator pipeline
state = State("numbers_game", rng,
    # State accepts a list of creators
    [comp_pop_creator, env_creator],
    # and a sequence of operators to apply to the state
    [InitializeAllPopulations(),
    InitializePhylogeny(),
    AllVsAllMatchMaker(),
    Performer(),
    ScalarFitnessEvaluator(),
    TruncationSelector(k),
    CloneUniformReproducer(n_inds),
    Mutator(),
    UpdatePhylogeny(),
    TrackPhylogeny(),  # Writes phylogeny updates to disk
    PurgePhylogeny(),  # removes individuals with no surviving descendants
    ClearInteractionsAndRecords(),
    Reporter(GenotypeSum, console=true)], counters=counters)

println("running")
run!(state, n_gens)


# Viewing the Phylogeny

Jevo outputs phylogenies compliant with the [ALIFE Data Standards](https://github.com/alife-data-standards/alife-data-standards) phylogeny format. Each population outputs a separate phylogeny file. For each of our co-evolutionary setups above, we output two phylogenies. For the custom setup, we output files `pop_a_custom.csv` and `pop_b_custom.csv`. For the Jevo setup, we output files `p1-phylo.csv` and `p2-phylo.csv`.
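
For reference, here is a minimal sketch of what such a file contains, using the same `id,ancestor_list` layout that `write_phylogeny` produces above. The `parents` dictionary is a made-up three-organism lineage, and the output goes to an in-memory buffer rather than to disk.

```julia
# Made-up lineage: organism 1 is a genesis organism, 3 descends from 1, 6 from 3.
parents = Dict{Int,Union{Int,Nothing}}(1 => nothing, 3 => 1, 6 => 3)

io = IOBuffer()
println(io, "id,ancestor_list")
for id in sort(collect(keys(parents)))
    p = parents[id]
    # Roots are marked with [none]; every other row lists its single parent.
    println(io, isnothing(p) ? "$id,[none]" : "$id,[$p]")
end
csv = String(take!(io))
print(csv)
# id,ancestor_list
# 1,[none]
# 3,[1]
# 6,[3]
```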

Here, we need to convert from the [ALIFE Data Standards format](https://alife-data-standards.github.io/alife-data-standards/) to the Newick file format, which is compatible with [IcyTree](https://icytree.org/).
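
To make the target format concrete: Newick writes a tree as nested parentheses, with each node's children listed in parentheses before the node's own label, and a semicolon at the end. The sketch below renders a small made-up phylogeny this way. `toy_newick` is a hypothetical helper written for illustration only; the actual conversion in this notebook is done by the `alifedata-phyloinformatics-convert` tool in the cells that follow.

```julia
# Hypothetical helper: render the subtree rooted at `id` in Newick notation.
function toy_newick(children::Dict{Int,Vector{Int}}, id::Int)
    kids = get(children, id, Int[])
    isempty(kids) && return string(id)   # a leaf is just its label
    return "(" * join((toy_newick(children, k) for k in kids), ",") * ")" * string(id)
end

# Parent pointers for a small made-up phylogeny: 1 -> {3, 4}, 3 -> 6
parents = Dict(3 => 1, 4 => 1, 6 => 3)

# Invert the parent pointers into a parent => children map.
children = Dict{Int,Vector{Int}}()
for child in sort(collect(keys(parents)))
    push!(get!(children, parents[child], Int[]), child)
end

newick = toy_newick(children, 1) * ";"
println(newick)  # ((6)3,4)1;
```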

In [None]:
;pip install alifedata-phyloinformatics-convert joinem

In [None]:
# Don't worry about the details of these cells. We need to write complex bash commands to a script like this
# because this notebook uses a Julia runtime, which doesn't like special symbols.
open("convert.sh", "w") do io
println(io, """
#!/bin/bash

echo pop_a_custom.csv | python3 -m joinem pop_a_custom_aliased.csv --with-column "pl.col('id').alias('taxon_label')"
alifedata-phyloinformatics-convert fromalifedata --input-file pop_a_custom_aliased.csv --output-schema newick --output-file pop_a_custom_aliased.newick

echo pop_b_custom.csv | python3 -m joinem pop_b_custom_aliased.csv --with-column "pl.col('id').alias('taxon_label')"
alifedata-phyloinformatics-convert fromalifedata --input-file pop_b_custom_aliased.csv --output-schema newick --output-file pop_b_custom_aliased.newick

echo p1-phylo.csv | python3 -m joinem p1-phylo-aliased.csv --with-column "pl.col('id').alias('taxon_label')"
alifedata-phyloinformatics-convert fromalifedata --input-file p1-phylo-aliased.csv --output-schema newick --output-file p1-phylo-aliased.newick

echo p2-phylo.csv | python3 -m joinem p2-phylo-aliased.csv --with-column "pl.col('id').alias('taxon_label')"
alifedata-phyloinformatics-convert fromalifedata --input-file p2-phylo-aliased.csv --output-schema newick --output-file p2-phylo-aliased.newick

rm pop_a_custom_aliased.csv pop_b_custom_aliased.csv p1-phylo-aliased.csv p2-phylo-aliased.csv
""")
end

In [None]:
;bash convert.sh

Convert your phylogenies to `.newick` files and upload them to [IcyTree](https://icytree.org) to view them!