The demographic history provided as input to Prepare Decoding is a piecewise-constant history of past effective population sizes, with the format:
TimeStart PopulationSize
where TimeStart is the first generation in which the population has size PopulationSize.
Note that population sizes are haploid, and that the demographic model is usually built assuming a specific mutation rate, which is passed as an argument to the ASMCprepareDecoding program.
The first line should contain generation 0.
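The format above can be read with a few lines of Python. This is a minimal sketch, not ASMC's own parser: the names `read_demography` and `size_at` are hypothetical, and the code assumes the file has no header row, whitespace-separated columns, and strictly increasing TimeStart values. Because TimeStart is the first generation of each epoch, the size in effect at generation t belongs to the largest TimeStart that does not exceed t.

```python
from bisect import bisect_right


def read_demography(lines):
    """Parse 'TimeStart PopulationSize' rows into two parallel lists.

    `lines` can be any iterable of strings, such as an open file.
    Hypothetical helper: assumes no header row and increasing TimeStart.
    """
    times, sizes = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        start, size = float(fields[0]), float(fields[1])
        if times and start <= times[-1]:
            raise ValueError(f"TimeStart must increase: {start} after {times[-1]}")
        times.append(start)
        sizes.append(size)
    if not times or times[0] != 0.0:
        raise ValueError("the first line must contain generation 0")
    return times, sizes


def size_at(times, sizes, generation):
    """Haploid size in effect at `generation`; the last epoch never ends."""
    return sizes[bisect_right(times, generation) - 1]


times, sizes = read_demography(["0 20000", "100 5000", "1000 40000"])
print(size_at(times, sizes, 50))    # 20000.0
print(size_at(times, sizes, 100))   # 5000.0 (TimeStart is inclusive)
print(size_at(times, sizes, 1e9))   # 40000.0 (last epoch extends to infinity)
```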
You can obtain such a model using, e.g., PSMC, MSMC, or SMC++.
If your model is not piecewise constant, you will need to approximate it by a piecewise-constant history.
The last provided interval is assumed to last until time=Infinity (and is usually remote enough to have negligible effects on the results).
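One way to approximate a smooth history is sketched below under stated assumptions: choose epoch boundaries, then give each finite epoch the harmonic mean of N(t), i.e. the constant size whose pairwise coalescence rate 1/N equals the average of 1/N(t) over the epoch. This averaging rule is one reasonable choice for illustration, not a procedure prescribed by ASMC. `piecewise_constant` is a hypothetical helper, and the exponential-growth model in the example is made up. The last epoch is open-ended, matching the convention above, so it simply takes the size at its start.

```python
import math


def piecewise_constant(size_fn, boundaries, samples_per_epoch=100):
    """Approximate a continuous haploid size history N(t) epoch by epoch.

    boundaries: increasing epoch start times in generations, beginning at 0.
    Each finite epoch [start, end) gets the harmonic mean of N(t), estimated
    with the midpoint rule; the final, open-ended epoch uses N(start).
    Returns a list of (TimeStart, PopulationSize) tuples.
    """
    rows = []
    for i, start in enumerate(boundaries):
        if i + 1 < len(boundaries):
            end = boundaries[i + 1]
            step = (end - start) / samples_per_epoch
            # Average coalescence rate 1/N(t) over the epoch.
            mean_rate = sum(
                1.0 / size_fn(start + (k + 0.5) * step)
                for k in range(samples_per_epoch)
            ) / samples_per_epoch
            rows.append((start, 1.0 / mean_rate))
        else:
            rows.append((start, size_fn(start)))
    return rows


def growth_model(t):
    """Made-up example: exponential growth over the last 500 generations."""
    return 1e6 * math.exp(-0.01 * min(t, 500.0))


rows = piecewise_constant(growth_model, [0, 50, 100, 200, 500])
for start, size in rows:
    print(f"{start:g} {size:.1f}")  # one 'TimeStart PopulationSize' line per epoch
```

Finer epochs track the continuous model more closely at the cost of a longer demographic file.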
The demographic models used with ASMC can be found here; they were inferred using SMC++ in the following paper:
Spence, J.P. and Song, Y.S. Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Science Advances, Vol. 5, No. 10, eaaw9206 (2019).
They correspond to these population sizes, rescaled to assume a mutation rate of 1.65e-8.
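Rescaling a history to a different mutation rate can be sketched as follows, assuming the usual coalescent scaling in which the data constrain products of population size and mutation rate. For a fixed scaled history, both the population sizes and the times in generations are inversely proportional to the assumed mutation rate, so both columns are multiplied by mu_old / mu_new. This illustrates that general relationship; it is not a description of exactly how the ASMC models were produced. `rescale_demography` and the example history (including its original rate of 1.25e-8) are hypothetical; only 1.65e-8 comes from the text above.

```python
def rescale_demography(rows, mu_old, mu_new):
    """Rescale (TimeStart, PopulationSize) rows inferred under mu_old to mu_new.

    Under coalescent scaling, sizes and generation times both vary as 1/mu,
    so both are multiplied by the same factor mu_old / mu_new.
    Generation 0 is unchanged, so the first line still starts at 0.
    """
    factor = mu_old / mu_new
    return [(start * factor, size * factor) for start, size in rows]


# Hypothetical history inferred assuming mu = 1.25e-8, rescaled to 1.65e-8.
original = [(0.0, 20000.0), (100.0, 5000.0), (1000.0, 40000.0)]
for start, size in rescale_demography(original, 1.25e-8, 1.65e-8):
    print(f"{start:g} {size:.1f}")
```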
The list of discrete time intervals provided as input to Prepare Decoding contains a single number per line. Each number is the start of a time interval, measured in (continuous) generations, and the first number is generation 0.0.
For instance, the file 30-100-2000_CEU.disc contains the interval boundaries:
0.0
30.0
60.0
90.0
... <lines omitted>
79855.6
96263.0
124311.7
The intervals defined in this file are {0.0-30.0, 30.0-60.0, ..., 96263.0-124311.7, 124311.7-Infinity}.
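A sketch of reading such a file and locating the interval that contains a given time is shown below. The helper names are hypothetical; the code assumes one boundary per line, a first boundary of 0.0, and strictly increasing values, and it closes the list with an open-ended final interval, as described above.

```python
import math
from bisect import bisect_right


def read_discretization(lines):
    """Turn one-boundary-per-line input into (start, end) interval pairs."""
    starts = [float(line) for line in lines if line.strip()]
    if not starts or starts[0] != 0.0:
        raise ValueError("the first boundary must be generation 0.0")
    if any(b <= a for a, b in zip(starts, starts[1:])):
        raise ValueError("boundaries must be strictly increasing")
    # Each boundary starts an interval; the last one extends to infinity.
    return list(zip(starts, starts[1:] + [math.inf]))


def interval_index(intervals, generation):
    """Index of the interval [start, end) that contains `generation`."""
    starts = [start for start, _ in intervals]
    return bisect_right(starts, generation) - 1


intervals = read_discretization(["0.0", "30.0", "60.0", "90.0"])
print(intervals)                        # [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0), (90.0, inf)]
print(interval_index(intervals, 45.0))  # 1
print(interval_index(intervals, 1e6))   # 3
```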
A file containing SNP frequency data in PLINK format. These frequencies should reflect the allele frequency spectrum of the data you plan to analyze with ASMC. The file contains a header row and one data row per variant, in the following format:
CHR SNP A1 A2 MAF NCHROBS
1 rs3131972 A G 0.1684 302964
1 rs12184325 T C 0.03716 304088
where:
- CHR: chromosome code
- SNP: variant identifier
- A1: allele 1 (usually minor)
- A2: allele 2 (usually major)
- MAF: allele 1 frequency
- NCHROBS: number of allele observations
Prepare Decoding expects a file with a header row and minor allele frequencies in column 5.
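Reading the minor allele frequencies from such a file could look like the sketch below. It skips the header row, splits on runs of whitespace (PLINK pads its columns with spaces), and takes column 5. `read_mafs` is a hypothetical name, and the sample rows are the ones shown above.

```python
def read_mafs(lines):
    """Return the values in column 5 (MAF) of a PLINK-style frequency file."""
    rows = iter(lines)
    header = next(rows).split()  # first line is the header row
    if len(header) < 5:
        raise ValueError(f"expected at least 5 columns, got header {header}")
    mafs = []
    for line in rows:
        fields = line.split()
        if fields:  # skip blank lines
            mafs.append(float(fields[4]))
    return mafs


example = [
    " CHR         SNP   A1   A2          MAF  NCHROBS",
    "   1   rs3131972    A    G       0.1684   302964",
    "   1  rs12184325    T    C      0.03716   304088",
]
print(read_mafs(example))  # [0.1684, 0.03716]
```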
A file containing CSFS information. If a precomputed CSFS file is available, it can be passed to Prepare Decoding; otherwise, the CSFS is calculated at runtime. Once calculated, the CSFS can be saved to a file for reuse. There is no need to understand the content of this file.
The following files can be output by Prepare Decoding. See the relevant sections of the API documentation for how to save these files:
The *.decodingQuantities.gz file is generated by Prepare Decoding and is an input to ASMC. It is used to perform efficient inference of pairwise coalescence times. There is no need to understand the content of this file.
The *.intervalsInfo file is generated by Prepare Decoding and is an input to ASMC. It contains some useful information about the time discretization and the demographic model, with one line per discrete time interval used in the analysis.
Each line has the format:
IntervalStart ExpectedCoalescenceTime IntervalEnd
The values IntervalStart and IntervalEnd are the start and end of each discrete time interval, and ExpectedCoalescenceTime is the expected coalescence time for a pair of individuals inferred to coalesce within that interval; it depends on the demographic model.
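The sketch below parses these rows and, to illustrate how ExpectedCoalescenceTime depends on the demographic model, computes the expected coalescence time within a single interval when the haploid population size N is constant across it. It assumes a pairwise coalescence rate of 1/N per generation, so the waiting time past IntervalStart is exponential and the conditional mean follows from truncating that exponential at IntervalEnd. This is a simplified illustration, not the computation ASMC performs for general piecewise-constant histories. The function names are hypothetical, the sample rows are toy values rather than real output, and parsing with `float` also accepts an `inf`/`Infinity` entry in case the last interval is written that way.

```python
import math


def read_intervals_info(lines):
    """Parse 'IntervalStart ExpectedCoalescenceTime IntervalEnd' rows."""
    rows = []
    for line in lines:
        fields = line.split()
        if fields:
            start, expected, end = (float(x) for x in fields[:3])
            rows.append((start, expected, end))
    return rows


def expected_time_constant_size(start, end, n_haploid):
    """E[T | start <= T < end] when a pair coalesces at rate 1/n_haploid.

    With rate 1/N, T - start is exponential with mean N; truncated at width
    d = end - start, its mean is N - d / (exp(d / N) - 1).
    """
    if math.isinf(end):
        return start + n_haploid  # memoryless: full exponential mean
    width = end - start
    return start + n_haploid - width / math.expm1(width / n_haploid)


print(read_intervals_info(["0.0 14.9 30.0", "30.0 44.8 60.0"]))  # toy rows
print(expected_time_constant_size(0.0, 30.0, 1e4))       # a little under 15
print(expected_time_constant_size(60.0, math.inf, 1e4))  # 10060.0
```

Because the exponential density decreases with time, the conditional mean always falls below the interval midpoint, and it approaches the midpoint as N grows relative to the interval width.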