ASMC Python API

Examples using the Python bindings
API
- ASMC
- DecodePairsReturnStruct

ASMC includes Python bindings which can be installed using pip:

pip install asmc-asmc

Before reading further you may wish to read the ASMC docs. In particular, these sections are directly relevant:

Summary (TL;DR)
Input/output file formats
Tools, scripts, and analyses
Precomputed decoding quantities

Examples using the Python bindings

See the notebooks directory for examples. There are two Jupyter notebooks:

a minimal working example, where sensible defaults for parameters are chosen automatically
a more detailed example that demonstrates how to customise parameters

API

The core Python API for ASMC consists of the following classes:

ASMC
DecodingParams
DecodePairsReturnStruct

ASMC

The main ASMC object can be constructed minimally with an input file root and a decoding quantities file. Optional parameters are the output file root and the decoding mode. The full signature (with defaults indicated) is as follows:

asmc = ASMC(
    in_dir=input_files_root,          # path to the 
    dq_file=dq_file,                  # path to the decoding quantities file
    out_dir="",                       # location of output files (default is the input file root)
    decoding_mode="array"             # one of "array" or "sequence"
)

This creates an ASMC object with sensible defaults. To fine-tune parameters you can instead create the ASMC object with an instance of DecodingParams:

# These are the arguments (with defaults indicated) for constructing a decoding paramters object
params = DecodingParams(
    in_file_root=input_files_root,
    dq_file=dq_file,
    map_file="",                          # Optional override for map|map.gz file, if not in in_file_root
    out_file_root="",
    jobs=1,                               # Number of jobs being done in total
    job_ind=1,                            # Job index (0, ..., jobs)
    decoding_mode_string="array",         # One of {"squence", "array"}
    decoding_sequence=False,
    using_CSFS=True,                      # Whether to use CSFS
    compress=False,                       # Compress emission to binary (no CSFS)
    use_ancestral=False,                  # Assume ancestral alleles are coded as 1 in input (will assume 1 = minor otherwise)
    skip_CSFS_distance=0.0,               # Genetic distance between two CSFS emissions
    no_batches=False,                     # Decode with no vectorization (do not use without good reason)
    do_posterior_sums=False,
    do_per_pair_posterior_mean=False,
    expected_coal_times_file="",
    within_only=False,
    do_major_minor_posterior_sums=False,  # 
    do_per_pair_MAP=False    
)

asmc = ASMC(params)

You can specify the outputs that will be calculated with the following methods (with defaults indicated):

# Per pair posterior mean, MAP and full posteriors, as well as the sum of posteriors can be stored in matrices
asmc.set_store_per_pair_posterior_mean(True)  # <-- true by default; others false by default
asmc.set_store_per_pair_map(False)
asmc.set_store_per_pair_posterior(False)
asmc.set_store_sum_of_posterior(False)

# Per pair posterior mean and MAP can be written to file. This is typically slow.
asmc.set_write_per_pair_posterior_mean(False)
asmc.set_write_per_pair_map(False)

Finally, the ASMC method decode_pairs will run the analysis. There are three different signatures available:

a = [1, 2, 3]
b = [4, 5, 6]

a_str = [f"1_{x}_1" for x in range(1,149)]
b_str = [f"1_{x}_2" for x in range(1,149)]

asmc.decode_pairs(a, b)           # two lists of haplotype indices
asmc.decode_pairs(a_str, b_str)   # two lists of haplotype IDs, with _1 and _2 indicating the haplotype
asmc.decode_pairs()               # <-- decode all pairs in the dataset

The results can then be accessed either by copy or reference:

return_vals = asmc.get_copy_of_results()
return_vals_ref = asmc.get_ref_of_results()

Getting the values by reference is safe if you are only planning to call decode_pairs once, or if you are performing calculations that do not require the results to persist after the first call to decode_pairs. If you call decode_pairs multiple times, the results will be overwritten, so you should ensure you get results by copy.

DecodePairsReturnStruct

The return structure will contain results based on the options selected on the ASMC object before calling decode_pairs.

# The index information for the pairs decoded
return_vals.per_pair_indices

# The `per_pair_posteriors` option gives the largest amount of information: a list of 2D numpy arrays
# The list has length numPairs, and each 2D array has size (numStates x numSites)
return_vals.per_pair_posteriors

# The sum of posteriors is a single 2D numpy array of size (numStates x numSites)
return_vals.sum_of_posteriors

# Turning on the per_pair_posteriors flag gives you the the following:
# A 2D numpy array with posterior means, of size (numPairs x numSites)
return_vals.per_pair_posterior_means
# Two 1D numpy arrays with the column-wise min and argmin of this array:
return_vals.min_posterior_means
return_vals.argmin_posterior_means

# Turning on the per_pair_MAPs flag gives you the the following:
# A 2D numpy array with posterior MAPs, of size (numPairs x numSites)
return_vals.per_pair_MAPs
# Two 1D numpy arrays with the column-wise min and argmin of this array:
return_vals.min_MAPs
return_vals.argmin_MAPs

Finally, the ASMC object can also return the list of expected coalescent times from the decoding quantities file: asmc.get_expected_times()

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

asmc_python.md

asmc_python.md

ASMC Python API

Examples using the Python bindings

API

ASMC

DecodePairsReturnStruct

Files

asmc_python.md

Latest commit

History

asmc_python.md

File metadata and controls

ASMC Python API

Examples using the Python bindings

API

ASMC

DecodePairsReturnStruct