# Polycube robustness

In [1]:
import robustness
import altair as alt
import pandas as pd
import pickle

Let's start by calculating $N_g$, the set of genotype neighbours one point mutation away from a given genotype $g$. The size of this neighbourhood depends on the allowed number of colors, the input size and the dimensionality, as we shall see later.

As an initial example, these are the 2D mutations of the genotype for a 2x2 square, if we allow only one color and one cube type:

In [2]:
', '.join(robustness.enumerateMutations('040087000000', maxColor=1, maxCubes=1, dim=2))

'840087000000, 000087000000, 048487000000, 040487000000, 040003000000, 040007000000, 040087840000, 040087040000, 040087008400, 040087000400, 040087000084, 040087000004'
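As a quick sanity check on the output above, the following sketch counts how each mutant differs from the original genotype. Reading the genotype as six two-character hex fields is an assumption made for illustration, and the helpers `fields` and `changed_fields` are hypothetical names, not part of the `robustness` module.

```python
from collections import Counter

original = '040087000000'
# Output of the enumerateMutations call above, copied verbatim.
mutants = (
    '840087000000, 000087000000, 048487000000, 040487000000, '
    '040003000000, 040007000000, 040087840000, 040087040000, '
    '040087008400, 040087000400, 040087000084, 040087000004'
).split(', ')

def fields(genotype, width=2):
    """Split a hex genotype into fixed-width fields (assumed: 2 hex chars each)."""
    return [genotype[i:i + width] for i in range(0, len(genotype), width)]

def changed_fields(a, b):
    """Indices of the fields in which genotypes a and b differ."""
    return [i for i, (x, y) in enumerate(zip(fields(a), fields(b))) if x != y]

changes = [changed_fields(original, m) for m in mutants]

# Count how many mutants touch each field position.
per_field = Counter(c[0] for c in changes)

print(len(mutants))                      # size of the neighbourhood
print(all(len(c) == 1 for c in changes)) # every mutant changes one field only
print(dict(per_field))                   # mutants per field position
```

Under this reading, each of the 12 neighbours alters exactly one field, with two alternatives per field, which is consistent with a point-mutation neighbourhood.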

## Genotype robustness

We define the genotype robustness $\rho_g$ of a genotype $g$ as:
$$ \rho_g = \frac{1}{\left | N_g \right |}\sum_{n \in N_g} \begin{cases}
 1 & \text{ if } p(n)=p(g) \\ 
 0 & \text{ otherwise } 
\end{cases} $$
where $N_g$ is the set of 1-mutant neighbours of $g$ and $p(g)$ is the phenotype assembled from genotype $g$.
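To make the definition concrete, here is a minimal sketch of the same formula on a toy model. The toy model is an assumption for illustration only and is unrelated to polycubes: genotypes are bit strings, neighbours are single bit flips, and the phenotype is simply whether the string contains a `1`. The function names are hypothetical.

```python
from typing import Callable, Hashable, Iterable

def genotype_robustness(
    g: str,
    neighbours: Callable[[str], Iterable[str]],
    phenotype: Callable[[str], Hashable],
) -> float:
    """Fraction of the 1-mutant neighbours of g that share g's phenotype."""
    ns = list(neighbours(g))
    if not ns:
        raise ValueError("genotype has no neighbours")
    target = phenotype(g)
    return sum(phenotype(n) == target for n in ns) / len(ns)

# Toy model (hypothetical, not the polycube encoding).
def toy_neighbours(g: str) -> list:
    """All bit strings that differ from g by flipping exactly one bit."""
    flip = {'0': '1', '1': '0'}
    return [g[:i] + flip[c] + g[i + 1:] for i, c in enumerate(g)]

def toy_phenotype(g: str) -> bool:
    """Phenotype: does the genotype contain at least one '1'?"""
    return '1' in g

rho_110 = genotype_robustness('110', toy_neighbours, toy_phenotype)
rho_100 = genotype_robustness('100', toy_neighbours, toy_phenotype)
print(rho_110, rho_100)
```

For `'110'` every single flip still leaves a `1`, so all neighbours are neutral. For `'100'` one of the three flips gives `'000'`, which changes the phenotype.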

Let's calculate the robustness for the same genotype as before, this time in the default 3D, with a maximum of 3 colours and 2 cube types:

In [3]:
robustness.calcGenotypeRobustness('040087000000', 3, 2)

0.7592592592592593

If we allow more colors and cube types, the robustness will be higher, since there is more room for neutral mutations:

In [4]:
genotypeRobustnessData = []
for maxCol in range(1,10):
    for maxCubes in range(1,10):
        genotypeRobustnessData.append({
            'maxCol': maxCol,
            'maxCubes': maxCubes,
            'robustness': robustness.calcGenotypeRobustness('040087000000', maxCol, maxCubes)
        })
alt.Chart(pd.DataFrame(data=genotypeRobustnessData)).mark_rect().encode(
    alt.X('maxCol:O', title="Allowed colors"),
    alt.Y('maxCubes:O', title="Allowed cube types"),
    alt.Color('robustness', scale=alt.Scale(domain=(0,1))),
)

## Phenotype robustness

With the genotype robustness defined, we can then define the phenotype robustness $\rho_p$ as the average genotype robustness over all genotypes encoding the given phenotype:
$$ \rho_p = \frac{1}{\left | P \right |} \sum_{g \in P}\rho_g $$
where $P$ is the set of all genotypes with phenotype $p$.
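The averaging step can be sketched on the same kind of toy model. Again, this is an illustrative assumption and not the polycube pipeline: genotypes are all bit strings of length 3, neighbours are single bit flips, and the phenotype is whether the string contains a `1`. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

def toy_neighbours(g):
    """All bit strings that differ from g by flipping exactly one bit."""
    flip = {'0': '1', '1': '0'}
    return [g[:i] + flip[c] + g[i + 1:] for i, c in enumerate(g)]

def toy_phenotype(g):
    """Phenotype: does the genotype contain at least one '1'?"""
    return '1' in g

def genotype_robustness(g):
    """Fraction of 1-mutant neighbours of g sharing g's phenotype."""
    ns = toy_neighbours(g)
    return sum(toy_phenotype(n) == toy_phenotype(g) for n in ns) / len(ns)

def phenotype_robustness(genotypes):
    """Group genotypes by phenotype and average their genotype robustness."""
    by_phenotype = defaultdict(list)
    for g in genotypes:
        by_phenotype[toy_phenotype(g)].append(genotype_robustness(g))
    return {p: mean(rs) for p, rs in by_phenotype.items()}

all_genotypes = [''.join(bits) for bits in product('01', repeat=3)]
rho_p = phenotype_robustness(all_genotypes)
print(rho_p)
```

In this toy model the rare phenotype (only `'000'`) has robustness 0, while the common phenotype, encoded by the other seven genotypes, averages to 6/7.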

For a large dataset this will take a while, so it's better to calculate it separately by running `python robustness.py` and saving the pickled data. In this case, we load a random sample of 100 phenotypes:

In [5]:
# Slow alternative: recompute instead of loading the pickled result.
# phenotypeRobustnessData = robustness.calcPhenotypeRobustness(path='../cpp/out/3d', sampleSize=100)
with open('../cpp/out/3d/robustness_100.p', 'rb') as f:
    phenotypeRobustnessData = pickle.load(f)

In [6]:
alt.Chart(phenotypeRobustnessData).transform_calculate(
        url='https://akodiat.github.io/polycubes?hexRule=' + alt.datum.rule
    ).mark_circle(size=60).encode(
        alt.X('frequency', scale=alt.Scale(type='log'), title="Frequency"),
        alt.Y('robustness', title="Phenotype Robustness"),
        href='url:N',
        tooltip=['rule']
    ).interactive()