Prepare Decoding file formats

Input file formats
Output file formats

Input file formats

Demographic history (*.demo)

The demographic history provided as input to Prepare Decoding represents a piecewise constant history of past effective population sizes, in the following format:

TimeStart   PopulationSize

Here, TimeStart is the first generation at which the population has size PopulationSize. Note that the population size is haploid, and that a demographic model is usually built assuming a specific mutation rate, which is passed as an argument to the ASMCprepareDecoding program. The first line should contain generation 0. You can obtain this model using, e.g., PSMC, MSMC, or SMC++. If your model is not piecewise constant, you will need to approximate it as piecewise constant. The last provided interval is assumed to last until time=Infinity (and is usually remote enough to have negligible effects on the results).
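As an illustration of the format described above, the following is a minimal Python sketch (not part of ASMC) that parses *.demo content and looks up the population size at a given time. It assumes the two columns are whitespace-delimited and that TimeStart values are strictly increasing. The function names `parse_demo` and `size_at` and the example values are hypothetical.

```python
import bisect


def parse_demo(text):
    """Parse *.demo content into parallel lists of start times and sizes."""
    starts, sizes = [], []
    for line in text.splitlines():
        fields = line.split()  # assumption: whitespace-delimited columns
        if not fields:
            continue
        starts.append(float(fields[0]))
        sizes.append(float(fields[1]))
    if not starts or starts[0] != 0.0:
        raise ValueError("the first line should contain generation 0")
    # Assumption: start times are listed in strictly increasing order.
    if any(b <= a for a, b in zip(starts, starts[1:])):
        raise ValueError("TimeStart values must be strictly increasing")
    return starts, sizes


def size_at(starts, sizes, t):
    """Return the size at time t; the last interval extends to infinity."""
    return sizes[bisect.bisect_right(starts, t) - 1]


# Hypothetical three-interval history, for illustration only.
example = "0.0\t20000\n100.0\t15000\n2000.0\t30000\n"
starts, sizes = parse_demo(example)
print(size_at(starts, sizes, 50.0))   # falls in the first interval
print(size_at(starts, sizes, 100.0))  # TimeStart is inclusive
print(size_at(starts, sizes, 1e9))    # last interval lasts until infinity
```

A lookup of this kind can be handy for checking that a piecewise constant approximation of a smoother model behaves as intended before running Prepare Decoding.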

The demographic models used with ASMC can be found here and were inferred using SMC++ in the following paper:

Spence, J.P. and Song, Y.S. Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Science Advances, Vol. 5, No. 10, eaaw9206 (2019), [doi].

They correspond to these population sizes, but rescaled to assume a mutation rate of 1.65e-8.

Time discretization (*.disc)

The list of discrete time intervals provided as input to Prepare Decoding contains a single number per line, representing time measured in (continuous) generations and starting at generation 0.0. For instance, the file 30-100-2000_CEU.disc contains the following times:

0.0
30.0
60.0
90.0
... <lines omitted>
79855.6
96263.0
124311.7

The intervals defined in this file are: {0.0-30.0, 30.0-60.0, ..., 96263.0-124311.7, 124311.7-Infinity}.
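The mapping from the listed times to intervals can be sketched in Python as follows. This is an illustrative helper rather than ASMC code. The names `read_disc` and `intervals` are hypothetical, and the example uses only the first four times shown above.

```python
import math


def read_disc(text):
    """Read one time per line from *.disc content, skipping blank lines."""
    times = [float(line) for line in text.splitlines() if line.strip()]
    if not times or times[0] != 0.0:
        raise ValueError("the list should start at generation 0.0")
    return times


def intervals(times):
    """Pair each time with the next one; the last interval is open-ended."""
    ends = times[1:] + [math.inf]
    return list(zip(times, ends))


example = "0.0\n30.0\n60.0\n90.0\n"
print(intervals(read_disc(example)))
# [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0), (90.0, inf)]
```

Note that a file with N lines defines N intervals, because the final time opens an interval that extends to infinity.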

Frequencies (*.frq)

A file containing SNP frequency data in PLINK format. These frequencies should reflect the allele frequency spectrum of the data you plan to analyze with ASMC. The file contains a header row followed by one data row per variant, in the following form:

 CHR           SNP   A1   A2          MAF    NCHROBS
   1     rs3131972    A    G       0.1684     302964
   1    rs12184325    T    C      0.03716     304088

where:

CHR Chromosome code
SNP Variant identifier
A1 Allele 1 (usually minor)
A2 Allele 2 (usually major)
MAF Allele 1 frequency
NCHROBS Number of allele observations

Prepare Decoding expects a file with a header row and minor allele frequencies in column 5.
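To make the expected layout concrete, here is a minimal Python sketch that skips the header row and reads the fifth whitespace-delimited column as the minor allele frequency. It is an illustration only, not how Prepare Decoding itself reads the file. The name `read_maf` is hypothetical, and the sketch assumes every MAF entry is numeric (it does not handle missing values).

```python
def read_maf(text):
    """Return (header fields, list of MAF values) from *.frq content."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()  # first non-blank line is the header row
    mafs = []
    for row in lines[1:]:
        fields = row.split()
        mafs.append(float(fields[4]))  # column 5 holds the MAF
    return header, mafs


# The two example rows shown above.
example = """ CHR           SNP   A1   A2          MAF    NCHROBS
   1     rs3131972    A    G       0.1684     302964
   1    rs12184325    T    C      0.03716     304088
"""
header, mafs = read_maf(example)
print(header[4])  # MAF
print(mafs)       # [0.1684, 0.03716]
```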

CSFS (*.csfs)

A file containing CSFS information. If pre-calculated, a CSFS file can be passed to Prepare Decoding, but if not present CSFS will be calculated at runtime. Once calculated, you can save the CSFS file for re-use. There is no need to understand the content of this file.

Output file formats

The following files can be output by Prepare Decoding. See the relevant sections of the API documentation for how to save these files:

Decoding quantities object
Other methods

Decoding quantities (*.decodingQuantities.gz)

The *.decodingQuantities.gz file is generated by Prepare Decoding and is an input into ASMC. It is used to perform efficient inference of pairwise coalescence times. There is no need to understand the content of this file.

Time discretization intervals (*.intervalsInfo)

The *.intervalsInfo file is generated by Prepare Decoding and is an input into ASMC. It contains some useful information about the time discretization and the demographic model, with one line per discrete time interval used in the analysis. Each line has the format:

IntervalStart   ExpectedCoalescenceTime IntervalEnd

The values IntervalStart and IntervalEnd represent the start/end of each discrete time interval, and ExpectedCoalescenceTime is the expected coalescence time for a pair of individuals who have been inferred to coalesce within this time interval, and depends on the demographic model.
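The three-column layout above can be read with a short Python sketch such as the following. It assumes whitespace-delimited columns, and the names `read_intervals_info` and `is_consistent` as well as the example values are hypothetical. How the end of the last (unbounded) interval is written in the file is not specified here, so the example uses finite rows only.

```python
def read_intervals_info(text):
    """Parse rows of (IntervalStart, ExpectedCoalescenceTime, IntervalEnd)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()  # assumption: whitespace-delimited columns
        if not fields:
            continue
        start, expected, end = (float(x) for x in fields)
        rows.append((start, expected, end))
    return rows


def is_consistent(rows):
    """Check that each expected time lies within its own interval."""
    return all(start <= expected <= end for start, expected, end in rows)


# Hypothetical values, for illustration only.
example = "0.0\t14.2\t30.0\n30.0\t44.6\t60.0\n60.0\t74.7\t90.0\n"
rows = read_intervals_info(example)
print(len(rows))            # one row per discrete time interval
print(is_consistent(rows))  # True
```

The expected coalescence times in the second column are the natural point estimates to associate with each interval when summarizing decoded output.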

CSFS (output) (*.csfs)

See above.

Time discretization (output) (*.disc)

See above.

Demographic history (output) (*.demo)

See above.
