# Todo

## Figure 1 *make fig*

Properties of the statistical measures.

- [X] The TOE LRT statistic is not $\chi^2$ distributed
- [X] $\nabla$ is sensitive to alignment length; $\delta_{\nabla}$ corrects for that
- [X] $\delta_{\nabla}$ relates to non-stationarity; it is proportional to JSD
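Because the TOE LRT statistic is not $\chi^2$ distributed, its p-value cannot be read from a $\chi^2$ table and has to come from an empirical null distribution. The following is a minimal sketch of an empirical (parametric bootstrap) p-value. It assumes the null LR statistics have already been simulated. The helper name `empirical_p_value`, the $(k+1)/(n+1)$ convention and the example numbers are illustrative assumptions, not necessarily what `mdeq` does.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_stats: Sequence[float]) -> float:
    """Proportion of null LR statistics at least as large as the observed one.

    Hypothetical helper. Uses the (k + 1) / (n + 1) form so the estimate
    is never exactly zero for a finite number of null replicates.
    """
    if not null_stats:
        raise ValueError("need at least one null statistic")
    # count null replicates that are as or more extreme than the observed LR
    k = sum(1 for s in null_stats if s >= observed)
    return (k + 1) / (len(null_stats) + 1)


# hypothetical LR statistics from data simulated under the null model
null_lr = [0.3, 1.2, 2.5, 0.8, 4.1, 0.1, 3.3, 1.9, 0.6]
print(empirical_p_value(3.0, null_lr))
```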

In [11]:
from pathlib import Path
from mdeq_analysis.plot import quantiles, nabla, tabulate

OUTPUT_ROOT = Path("~/repos/MutationDiseqMS").expanduser()
FIG_DIR = OUTPUT_ROOT / "figs"
TABLE_DIR = OUTPUT_ROOT / "tables"
FIG_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

In [12]:
paths = list(Path("../results/micro/toe/fg-GSN-toe/").glob("*hi_hi*.tsv"))
width, height = 400, 400
fig = quantiles.get_one_plot(paths, width, height)
fig.update_layout(showlegend=True)
fig.show()

fig.write_image(FIG_DIR / "micro-toe-quantiles.pdf")
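One generic way to see that a statistic is not $\chi^2$ distributed is to take statistics simulated under the null, compute their nominal $\chi^2$ p-values, and compare the sorted p-values to uniform quantiles. If the $\chi^2$ approximation held, the pairs would fall on the diagonal. Below is a pure-Python sketch of that comparison. It is an assumption about the kind of check a quantile plot supports, not the implementation behind `quantiles.get_one_plot`. `uniform_qq` and `max_deviation` are hypothetical helpers.

```python
def uniform_qq(p_values):
    """Pair expected uniform quantiles with sorted observed p-values.

    Expected quantiles use the (i - 0.5) / n plotting position.
    """
    n = len(p_values)
    observed = sorted(p_values)
    expected = [(i - 0.5) / n for i in range(1, n + 1)]
    return list(zip(expected, observed))


def max_deviation(pairs):
    """Largest absolute gap between expected and observed quantiles."""
    return max(abs(e - o) for e, o in pairs)


# hypothetical nominal p-values; a large gap signals a poor null approximation
pairs = uniform_qq([0.02, 0.04, 0.05, 0.11, 0.30])
print(max_deviation(pairs))
```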

Get the Hi-Hi results for $\delta_{\nabla}$.

In [13]:
paths = list(Path("../results/micro/convergence/fg-GSN-toe/").glob("*hi_hi*.sqlitedb"))

width, height = 400, 400
fig = nabla.fig_nabla_vs_delta_nabla(paths, width=width, height=height)

fig.show()
fig.write_image(FIG_DIR / "micro-nabla-delta_nabla.pdf")

Plot JSD versus $\delta_{\nabla}$.

In [14]:
rootdir = Path("/Users/gavin/repos/Honours2021/Kath/MutationDiseqAnalysis")
nabla_path = (
    rootdir / "results/micro/convergence/toe-filtered-selected-convergence.sqlitedb"
)
align_path = rootdir / "data/micro/filtered-selected.sqlitedb"
width, height = 400, 400
fig = nabla.fig_comparing_jsd_delta_nabla(align_path, nabla_path, width, height)
fig.show()
fig.write_image(FIG_DIR / "micro-jsd-delta_nabla.pdf")
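For reference, here is a minimal sketch of the Jensen-Shannon divergence (JSD) between two discrete distributions, such as the A, C, G, T frequencies of two sequences. It uses base-2 logarithms, so values lie in $[0, 1]$. Which distributions `nabla.fig_comparing_jsd_delta_nabla` compares, and which log base it uses, are not stated here. Treat this as an assumption-laden illustration of the standard definition.

```python
import math
from typing import Sequence


def _kl_bits(p: Sequence[float], q: Sequence[float]) -> float:
    # Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (base 2) between two probability vectors."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)


# hypothetical nucleotide frequencies (A, C, G, T) for two sequences
print(jsd([0.3, 0.2, 0.2, 0.3], [0.25, 0.25, 0.25, 0.25]))
```

JSD is symmetric and bounded, which makes it a convenient reference scale when checking whether $\delta_{\nabla}$ tracks compositional divergence.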

## Figure 2 *make fig*

Evidence for systematically elevated mutation disequilibrium in *Drosophila melanogaster* compared to *Drosophila simulans*

- [X] Smile plots (Dmel, Dsim)
- [ ] $\delta_{\nabla}$ genomic plots, or histograms (autosome, X)

In [15]:
drosophila = quantiles.subplot_smiles(ape=False)
drosophila.show()
drosophila.write_image(FIG_DIR / "drosophila-smile-plots.pdf")

In [16]:
dros_hist = nabla.histogram_nabla_diff(ape=False, nbins=50)
dros_hist.update_layout(width=700, height=400)
dros_hist.show()
dros_hist.write_image(FIG_DIR / "drosophila-hist.pdf")

## Figure 3 *make fig*

The majority of sampled genomic segments show mutation disequilibrium

- [X] Smile plots (intron, CDS)
- [ ] $\delta_{\nabla}$ histograms comparing intron and CDS

In [17]:
ape = quantiles.subplot_smiles(ape=True)
ape.show()
ape.write_image(FIG_DIR / "ape-smile-plots.pdf")

In [18]:
ape_hist = nabla.histogram_nabla_diff(ape=True, nbins=40)
ape_hist.update_layout(width=700, height=400)
ape_hist.show()
ape_hist.write_image(FIG_DIR / "ape-hist.pdf")

## Table 1

PAR regions show greater mutation disequilibrium

- [X] Produce stats

In [19]:
# reuse the project root defined above rather than repeating the absolute path
result_dir = rootdir / "results/fxy"
data_dir = rootdir / "data/fxy"
table = tabulate.fxy_table(data_dir, result_dir)

In [20]:
table.title = r"The magnitude of mutation disequilibrium is higher in the region of the \emph{Fxy} gene within the PAR."
table.legend = r"TOE $\hat{p}$-values and $\hat\delta_{\nabla}$ for the first six introns, counted from the 5' end, of \emph{M. musculus}. Column titles are the \emph{M. musculus} intron ranks. Data are from alignments of \emph{M. musculus}, \emph{M. spretus}, and \emph{R. norvegicus}. \emph{M. musculus} was treated as the foreground edge in model fitting. $\hat\sigma_\nabla$ is the estimated standard deviation of $\nabla$ from the null distribution. The length of each \emph{M. musculus} intron is also shown."

latex = table.to_latex(label="tab:fxy")
outfile = TABLE_DIR / "fxy.tex"
outfile.write_text(latex)
