# `natsel_zhang` -- a branch-site test

This is the branch-site hypothesis test presented in [Zhang et al.](https://www.ncbi.nlm.nih.gov/pubmed/16107592). It evaluates the hypothesis that a subset of sites has undergone positive natural selection on a pre-specified set of lineages.

In this model class, every site on the background edges evolves either under purifying selection (0 < omega<sub>0</sub> < 1) or neutrally (omega<sub>1</sub> = 1). On the pre-specified foreground edges, however, some proportion of those sites switch to adaptive evolution (omega<sub>2</sub> > 1). For the current example, we'll define the Chimpanzee and Human branches as foreground and everything else as background. The following table defines the parameter scopes.

| Site Class |    Proportion |          Background Edges |          Foreground Edges |
|------------|---------------|---------------------------|---------------------------|
|          0 | p<sub>0</sub> | 0 < omega<sub>0</sub> < 1 | 0 < omega<sub>0</sub> < 1 |
|          1 | p<sub>1</sub> |     omega<sub>1</sub> = 1 |     omega<sub>1</sub> = 1 |
|         2a | p<sub>2</sub> | 0 < omega<sub>0</sub> < 1 |     omega<sub>2</sub> > 1 |
|         2b | p<sub>3</sub> |     omega<sub>1</sub> = 1 |     omega<sub>2</sub> > 1 |

**NOTE:** Our implementation is not as parametrically succinct as that of Zhang et al.: we estimate one additional bin probability.
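
For comparison, Zhang et al. do not estimate the proportions of classes 2a and 2b separately. They split the remaining mass, 1 - p<sub>0</sub> - p<sub>1</sub>, between 2a and 2b in the ratio p<sub>0</sub> : p<sub>1</sub>, which leaves two free proportions. Estimating all four bin probabilities, constrained only to sum to one, gives three, hence the one extra parameter. The following is a minimal plain-Python sketch of that arithmetic. The function and constant names are illustrative and are not part of cogent3, and the inputs are the fitted p<sub>0</sub> and p<sub>1</sub> from the `result.alt.lf` output below, used purely as example numbers.

```python
def zhang_class2_proportions(p0: float, p1: float) -> tuple[float, float]:
    """Proportions of site classes 2a and 2b under the Zhang et al. constraint.

    The mass left over after classes 0 and 1, (1 - p0 - p1), is split
    between 2a and 2b in the ratio p0 : p1.
    """
    if p0 < 0 or p1 < 0 or p0 + p1 <= 0 or p0 + p1 > 1:
        raise ValueError("need p0, p1 >= 0 and 0 < p0 + p1 <= 1")
    remainder = 1.0 - p0 - p1
    p2a = remainder * p0 / (p0 + p1)
    p2b = remainder * p1 / (p0 + p1)
    return p2a, p2b


# Number of free proportion parameters in each parameterisation (illustrative)
ZHANG_FREE = 2     # p0 and p1; p2a and p2b are derived from them
FOUR_BIN_FREE = 3  # p0..p3 all estimated, constrained only to sum to 1

# Example numbers: the fitted p0 and p1 from the alternative model
p2a, p2b = zhang_class2_proportions(0.0532, 0.2655)
print(f"p2a={p2a:.4f} p2b={p2b:.4f}")
print("extra free parameters:", FOUR_BIN_FREE - ZHANG_FREE)
```

Under the Zhang et al. constraint these inputs would force p<sub>2a</sub> and p<sub>2b</sub> into the same ratio as p<sub>0</sub> and p<sub>1</sub>; the four-bin parameterisation is free to depart from that ratio.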

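```python
from cogent3.app import io, evo

# load the codon-aligned BRCA1 sequences as DNA
loader = io.load_aligned(format="fasta", moltype="dna")
aln = loader("../data/primate_brca1.fasta")

# branch-site test with the GNC codon model; the edge defined by the
# Human and Chimpanzee tips is the foreground
zhang_test = evo.natsel_zhang(
    "GNC",
    tree="../data/primate_brca1.tree",
    optimise_motif_probs=False,
    tip1="Human",
    tip2="Chimpanzee",
)

result = zhang_test(aln)
result
```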

| LR     | df | pvalue |
|--------|----|--------|
| 4.9647 | 3  | 0.1744 |

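| hypothesis | key        | lnL        | nfp | DLC  | unique_Q |
|------------|------------|------------|-----|------|----------|
| null       | 'GNC-null' | -6708.3119 | 24  | True |          |
| alt        | 'GNC-alt'  | -6705.8296 | 27  | True |          |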
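
The LR statistic is twice the difference in log-likelihood between the alternative and null fits, and df is the difference in their number of free parameters (nfp). The reported p-value is consistent with referring LR to a chi-square distribution with that df. The sketch below reproduces these numbers from the rounded lnL and nfp values in the table using only the Python standard library. `chi2_sf` and `likelihood_ratio_test` are hypothetical helpers written for illustration; `scipy.stats.chi2.sf` could be used in place of `chi2_sf`.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # odd df: the df = 1 tail, plus one correction term per extra 2 df
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(null_lnL, null_nfp, alt_lnL, alt_nfp):
    """Return (LR, df, p-value) for a pair of nested models."""
    lr = 2.0 * (alt_lnL - null_lnL)
    df = alt_nfp - null_nfp
    return lr, df, chi2_sf(lr, df)


lr, df, p = likelihood_ratio_test(-6708.3119, 24, -6705.8296, 27)
print(f"LR={lr:.4f} df={df} p={p:.4f}")
```

Because it starts from lnL values rounded to four decimals, the sketch gives an LR of about 4.9646 rather than the 4.9647 reported by cogent3; the df and p-value agree with the table.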


```python
result.alt.lf
```

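| A>C    | A>G    | A>T    | C>A    | C>G    | C>T    | G>A    | G>C    | G>T    | T>A    | T>C    |
|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 0.8554 | 3.5343 | 0.9744 | 1.6586 | 2.1937 | 6.2585 | 8.0104 | 1.2418 | 0.7942 | 1.2667 | 2.9645 |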

| bin | bprobs |
|-----|--------|
| 0   | 0.0532 |
| 1   | 0.2655 |
| 2a  | 0.0403 |
| 2b  | 0.641  |

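| edge       | parent | length |
|------------|--------|--------|
| Galago     | root   | 0.5419 |
| HowlerMon  | root   | 0.1359 |
| Rhesus     | edge.3 | 0.0648 |
| Orangutan  | edge.2 | 0.0235 |
| Gorilla    | edge.1 | 0.0075 |
| Human      | edge.0 | 0.0182 |
| Chimpanzee | edge.0 | 0.0085 |
| edge.0     | edge.1 | 0.0    |
| edge.1     | edge.2 | 0.0099 |
| edge.2     | edge.3 | 0.0365 |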

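| edge      | bin | omega |
|-----------|-----|-------|
| Galago    | 0   | 0.0   |
| Galago    | 1   | 1.0   |
| Galago    | 2a  | 0.0   |
| Galago    | 2b  | 1.0   |
| HowlerMon | 0   | 0.0   |
| HowlerMon | 1   | 1.0   |
| HowlerMon | 2a  | 0.0   |
| HowlerMon | 2b  | 1.0   |
| Rhesus    | 0   | 0.0   |
| Rhesus    | 1   | 1.0   |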

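| AAA    | AAC    | AAG    | AAT    | ACA    | ACC    | ACG    | ACT    | AGA    | AGC    |
|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 0.0556 | 0.0235 | 0.0344 | 0.0556 | 0.0228 | 0.0046 | 0.0008 | 0.0289 | 0.0231 | 0.0286 |

| AGG   | AGT    | ATA    | ATC   | ATG    | ATT    | CAA    | CAC    | CAG    | CAT    |
|-------|--------|--------|-------|--------|--------|--------|--------|--------|--------|
| 0.014 | 0.0381 | 0.0186 | 0.007 | 0.0128 | 0.0192 | 0.0196 | 0.0052 | 0.0238 | 0.0221 |

| CCA    | CCC    | CCG    | CCT    | CGA    | CGC    | CGG    | CGT    | CTA    | CTC    |
|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 0.0195 | 0.0062 | 0.0006 | 0.0263 | 0.0011 | 0.0009 | 0.0023 | 0.0032 | 0.0137 | 0.0078 |

| CTG    | CTT    | GAA    | GAC    | GAG    | GAT    | GCA    | GCC    | GCG    | GCT    |
|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 0.0125 | 0.0105 | 0.0755 | 0.0105 | 0.0303 | 0.0315 | 0.0158 | 0.0096 | 0.0014 | 0.0137 |

| GGA    | GGC   | GGG    | GGT    | GTA    | GTC   | GTG    | GTT    | TAC    | TAT    |
|--------|-------|--------|--------|--------|-------|--------|--------|--------|--------|
| 0.0161 | 0.009 | 0.0067 | 0.0133 | 0.0148 | 0.007 | 0.0069 | 0.0213 | 0.0023 | 0.0101 |

| TCA    | TCC    | TCG    | TCT    | TGC    | TGG   | TGT    | TTA    | TTC    | TTG    |
|--------|--------|--------|--------|--------|-------|--------|--------|--------|--------|
| 0.0221 | 0.0082 | 0.0015 | 0.0251 | 0.0018 | 0.004 | 0.0201 | 0.0212 | 0.0078 | 0.0108 |

| TTT    |
|--------|
| 0.0187 |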


## Getting the posterior probabilities of site-class membership

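```python
# posterior probability of each site class (rows) at each alignment position (columns)
bprobs = result.alt.lf.get_bin_probs()
bprobs[:, :20]  # show only the first 20 positions
```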

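| bin | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      | 9      |
|-----|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 0   | 0.0759 | 0.0427 | 0.0    | 0.067  | 0.0586 | 0.08   | 0.043  | 0.0608 | 0.0519 | 0.0411 |
| 1   | 0.2546 | 0.27   | 0.2929 | 0.2588 | 0.2628 | 0.2527 | 0.2699 | 0.2617 | 0.2658 | 0.2706 |
| 2a  | 0.0568 | 0.0329 | 0.0    | 0.0504 | 0.0444 | 0.0597 | 0.0331 | 0.046  | 0.0396 | 0.0317 |
| 2b  | 0.6127 | 0.6543 | 0.7071 | 0.6237 | 0.6343 | 0.6076 | 0.654  | 0.6315 | 0.6428 | 0.6566 |

| bin | 10     | 11     | 12     | 13     | 14     | 15     | 16     | 17     | 18     | 19     |
|-----|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 0   | 0.0392 | 0.08   | 0.048  | 0.0    | 0.0797 | 0.2618 | 0.0411 | 0.0355 | 0.0586 | 0.062  |
| 1   | 0.2716 | 0.2527 | 0.2676 | 0.2926 | 0.2528 | 0.1564 | 0.2706 | 0.2733 | 0.2628 | 0.2611 |
| 2a  | 0.0303 | 0.0597 | 0.0367 | 0.0    | 0.0595 | 0.2023 | 0.0317 | 0.0275 | 0.0444 | 0.0468 |
| 2b  | 0.6589 | 0.6076 | 0.6477 | 0.7074 | 0.608  | 0.3794 | 0.6566 | 0.6636 | 0.6343 | 0.6301 |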
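
Each column sums, up to rounding, to one across the four site classes. In the Zhang et al. model, classes 2a and 2b are the ones assigned omega<sub>2</sub> > 1 on the foreground edges, so adding their posteriors gives, for each position, the probability that it belongs to a class under positive selection on the Human and Chimpanzee branches. Below is a minimal sketch of that summary. It assumes the posteriors have already been copied into plain Python lists (here, the first five positions from the table above) rather than using any particular cogent3 accessor, and the 0.95 cut-off is a hypothetical choice, not a cogent3 default.

```python
# Posterior bin probabilities for positions 0-4, copied from the table above
# (keys are site classes, list index is the alignment position).
posteriors = {
    "0": [0.0759, 0.0427, 0.0, 0.067, 0.0586],
    "1": [0.2546, 0.27, 0.2929, 0.2588, 0.2628],
    "2a": [0.0568, 0.0329, 0.0, 0.0504, 0.0444],
    "2b": [0.6127, 0.6543, 0.7071, 0.6237, 0.6343],
}

# Classes 2a and 2b carry omega2 > 1 on the foreground edges, so their summed
# posterior is the probability a position is under foreground positive selection.
foreground_positive = [
    p2a + p2b for p2a, p2b in zip(posteriors["2a"], posteriors["2b"])
]

# Rank positions from most to least likely to be under foreground selection
ranked = sorted(
    range(len(foreground_positive)),
    key=lambda i: foreground_positive[i],
    reverse=True,
)

THRESHOLD = 0.95  # hypothetical cut-off, chosen for illustration only
candidates = [i for i in ranked if foreground_positive[i] >= THRESHOLD]

for i in ranked:
    print(i, round(foreground_positive[i], 4))
print("candidates:", candidates)
```

For these five positions the summed posterior stays below 0.71, so no position passes the illustrative cut-off.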


## Getting all the statistics in tabular form

```python
tab = evo.tabulate_stats()
stats = tab(result.alt)
stats
```

5x tabular_result('global params': Table, 'bin params': Table, 'edge params': Table, 'edge bin params': Table)

```python
stats["edge bin params"][:10]  # truncating the table
```

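| edge      | bin | omega |
|-----------|-----|-------|
| Galago    | 0   | 0.0   |
| Galago    | 1   | 1.0   |
| Galago    | 2a  | 0.0   |
| Galago    | 2b  | 1.0   |
| HowlerMon | 0   | 0.0   |
| HowlerMon | 1   | 1.0   |
| HowlerMon | 2a  | 0.0   |
| HowlerMon | 2b  | 1.0   |
| Rhesus    | 0   | 0.0   |
| Rhesus    | 1   | 1.0   |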
