# Todo

## Figure 1 *make fig*

Properties of the statistical measures.

- [X] TOE LRT statistic is not $\chi^2$ distributed
- [X] $\nabla$ is sensitive to alignment length; $\delta_{\nabla}$ corrects for that
- [X] $\delta_{\nabla}$ relates to non-stationarity; it is proportional to $JSD$
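
For reference when checking the last item, here is a minimal sketch of the Jensen-Shannon divergence (JSD) between two nucleotide frequency vectors. It uses only the standard definition of JSD; the function names and the example frequencies are illustrative assumptions, not part of `mdeq_analysis`, and the sketch does not reproduce how the analysis itself computes JSD across sequences.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def jsd(p, q):
    """Jensen-Shannon divergence (bits) between two equal-length distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of states")
    # entropy of the equal-weight mixture minus the mean of the entropies
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2


# hypothetical nucleotide frequencies (order: T, C, A, G), for illustration only
seq1_freqs = [0.25, 0.25, 0.25, 0.25]
seq2_freqs = [0.40, 0.10, 0.40, 0.10]
divergence = jsd(seq1_freqs, seq2_freqs)
```

With base-2 logarithms the two-distribution JSD lies between 0 (identical composition) and 1 (no shared states), which makes it a convenient scale against which to plot $\delta_{\nabla}$.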

In [1]:
from pathlib import Path
from mdeq_analysis.plot import quantiles, nabla, tabulate, mixed

OUTPUT_ROOT = Path("~/repos/MutationDiseqMS").expanduser()
FIG_DIR = OUTPUT_ROOT / "figs"
TABLE_DIR = OUTPUT_ROOT / "tables"
FIG_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

In [None]:
# NOTE: rootdir is machine-specific; pval_paths and conv_paths are resolved
# relative to the notebook's working directory.
rootdir = Path("/Users/gavin/repos/Honours2021/Kath/MutationDiseqAnalysis")
pval_paths = list(Path("../results/micro/toe/fg-GSN-toe/").glob("*hi_hi*.tsv"))
nabla_path = (
    rootdir / "results/micro/convergence/toe-filtered-selected-convergence.sqlitedb"
)
align_path = rootdir / "data/micro/filtered-selected.sqlitedb"
conv_paths = list(
    Path("../results/micro/convergence/fg-GSN-toe/").glob("*hi_hi*.sqlitedb")
)
fig = mixed.make_mixed_properties(
    pval_paths=pval_paths,
    align_path=align_path,
    nabla_path=nabla_path,
    conv_paths=conv_paths,
)
fig.show()
fig.write_image(FIG_DIR / "properties.pdf")

## Figure 2 *make fig*

Evidence for systematically elevated mutation disequilibrium in *Drosophila melanogaster* compared to *Drosophila simulans*

- [X] Smile plots (Dmel, Dsim)
- [ ] $\delta_{\nabla}$ genomic plots, or histograms (autosome, X)

In [None]:
drosophila = mixed.mixed_smiled_hist(ape=False)
drosophila.show()
drosophila.write_image(FIG_DIR / "drosophila-smile-hist-plots.pdf")

## Figure 3 *make fig*

Majority of sampled genomic segments show mutation disequilibrium

- [X] Smile plots (intron, cds)
- [ ] $\delta_{\nabla}$ histogram between intron/cds

In [None]:
ape = mixed.mixed_smiled_hist(ape=True)
ape.show()
ape.write_image(FIG_DIR / "ape-smile-hist-plots.pdf")

## Table 1

PAR regions show greater mutation disequilibrium

- [X] Produce stats

In [None]:
# machine-specific project root, shared by the fxy results and data directories
fxy_root = Path("/Users/gavin/repos/Honours2021/Kath/MutationDiseqAnalysis")
result_dir = fxy_root / "results/fxy"
data_dir = fxy_root / "data/fxy"
table = tabulate.fxy_table(data_dir, result_dir)

In [None]:
table.title = r"The magnitude of mutation disequilibrium is higher in the region of the \emph{Fxy} gene within the PAR."
table.legend = r"TOE $\hat{p}$-value and $\hat\delta_{\nabla}$ for the six 5'-most introns of \emph{M. musculus}. Each column title is the \emph{M. musculus} intron rank. Data are from alignments of \emph{M. musculus}, \emph{M. spretus}, and \emph{R. norvegicus}. \emph{M. musculus} was treated as the foreground edge in model fitting. $\hat\sigma_\nabla$ is the estimated standard deviation of $\nabla$ from the null distribution. The length of each \emph{M. musculus} intron is presented."

latex = table.to_latex(label="tab:fxy")
outfile = TABLE_DIR / "fxy.tex"
outfile.write_text(latex)