ASMC includes Python bindings which can be installed using pip:
pip install asmc-asmc
Before reading further you may wish to read the ASMC docs. In particular, these sections are directly relevant:
- Summary (TL;DR)
- Input/output file formats
- Tools, scripts, and analyses
- Precomputed decoding quantities
See the notebooks
directory for examples.
There are two Jupyter notebooks:
- a minimal working example, where sensible defaults for parameters are chosen automatically
- a more detailed example that demonstrates how to customise parameters
The core Python API for ASMC consists of the following classes:
ASMC
DecodingParams
DecodePairsReturnStruct
The main ASMC
object can be constructed minimally with an input file root and a decoding quantities file.
Optional parameters are the output file root and the decoding mode.
The full signature (with defaults indicated) is as follows:
asmc = ASMC(
in_dir=input_files_root, # path to the
dq_file=dq_file, # path to the decoding quantities file
out_dir="", # location of output files (default is the input file root)
decoding_mode="array" # one of "array" or "sequence"
)
This creates an ASMC object with sensible defaults.
To fine-tune parameters you can instead create the ASMC object with an instance of DecodingParams
:
# These are the arguments (with defaults indicated) for constructing a decoding paramters object
params = DecodingParams(
in_file_root=input_files_root,
dq_file=dq_file,
map_file="", # Optional override for map|map.gz file, if not in in_file_root
out_file_root="",
jobs=1, # Number of jobs being done in total
job_ind=1, # Job index (0, ..., jobs)
decoding_mode_string="array", # One of {"squence", "array"}
decoding_sequence=False,
using_CSFS=True, # Whether to use CSFS
compress=False, # Compress emission to binary (no CSFS)
use_ancestral=False, # Assume ancestral alleles are coded as 1 in input (will assume 1 = minor otherwise)
skip_CSFS_distance=0.0, # Genetic distance between two CSFS emissions
no_batches=False, # Decode with no vectorization (do not use without good reason)
do_posterior_sums=False,
do_per_pair_posterior_mean=False,
expected_coal_times_file="",
within_only=False,
do_major_minor_posterior_sums=False, #
do_per_pair_MAP=False
)
asmc = ASMC(params)
You can specify the outputs that will be calculated with the following methods (with defaults indicated):
# Per pair posterior mean, MAP and full posteriors, as well as the sum of posteriors can be stored in matrices
asmc.set_store_per_pair_posterior_mean(True) # <-- true by default; others false by default
asmc.set_store_per_pair_map(False)
asmc.set_store_per_pair_posterior(False)
asmc.set_store_sum_of_posterior(False)
# Per pair posterior mean and MAP can be written to file. This is typically slow.
asmc.set_write_per_pair_posterior_mean(False)
asmc.set_write_per_pair_map(False)
Finally, the ASMC method decode_pairs
will run the analysis.
There are three different signatures available:
a = [1, 2, 3]
b = [4, 5, 6]
a_str = [f"1_{x}_1" for x in range(1,149)]
b_str = [f"1_{x}_2" for x in range(1,149)]
asmc.decode_pairs(a, b) # two lists of haplotype indices
asmc.decode_pairs(a_str, b_str) # two lists of haplotype IDs, with _1 and _2 indicating the haplotype
asmc.decode_pairs() # <-- decode all pairs in the dataset
The results can then be accessed either by copy or reference:
return_vals = asmc.get_copy_of_results()
return_vals_ref = asmc.get_ref_of_results()
Getting the values by reference is safe if you are only planning to call decode_pairs
once, or if you are performing calculations that do not require the results to persist after the first call to decode_pairs
.
If you call decode_pairs
multiple times, the results will be overwritten, so you should ensure you get results by copy.
The return structure will contain results based on the options selected on the ASMC object before calling decode_pairs
.
# The index information for the pairs decoded
return_vals.per_pair_indices
# The `per_pair_posteriors` option gives the largest amount of information: a list of 2D numpy arrays
# The list has length numPairs, and each 2D array has size (numStates x numSites)
return_vals.per_pair_posteriors
# The sum of posteriors is a single 2D numpy array of size (numStates x numSites)
return_vals.sum_of_posteriors
# Turning on the per_pair_posteriors flag gives you the the following:
# A 2D numpy array with posterior means, of size (numPairs x numSites)
return_vals.per_pair_posterior_means
# Two 1D numpy arrays with the column-wise min and argmin of this array:
return_vals.min_posterior_means
return_vals.argmin_posterior_means
# Turning on the per_pair_MAPs flag gives you the the following:
# A 2D numpy array with posterior MAPs, of size (numPairs x numSites)
return_vals.per_pair_MAPs
# Two 1D numpy arrays with the column-wise min and argmin of this array:
return_vals.min_MAPs
return_vals.argmin_MAPs
Finally, the ASMC object can also return the list of expected coalescent times from the decoding quantities file: asmc.get_expected_times()