# Simulate Data

Now that we have built our models and processed our empirical data, we can simulate data under each model.

We will simulate data with the same number of individuals and SNPs as our downprojected empirical data.

In [6]:
from delimitpy import simulate_data
from delimitpy import parse_input
import os
import pickle
import numpy as np

## Read the configuration file and provide values from "Reading input data and building the SFS"

We will read our configuration dictionary into memory, along with our labels and parameterized models.

We will provide our downsampling dictionary and the number of sites to use. We determined these values in the previous section of the tutorial.

In [7]:
# read the configuration file
config_parser = parse_input.ModelConfigParser("../../examples/test1/config.txt")
config_values = config_parser.parse_config()

# read the labels and parameterized models from the output directory
labels = np.load(os.path.join(config_values["output directory"], 'labels.npy'), allow_pickle=True)
with open(os.path.join(config_values["output directory"], 'parameterized_models.pickle'), 'rb') as f:
    parameterized_models = pickle.load(f)

downsampling = {"A": 8, "B": 6, "C": 6} # based on building SFS results
max_sites = 332 # based on building SFS results
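
To give some intuition for what downprojecting to a smaller sample size means, here is a minimal sketch of hypergeometric projection for a single site. It is a generic illustration, not delimitpy's implementation, which may differ; the function name `project_site` and the example values are hypothetical.

```python
from math import comb

def project_site(k, n, m):
    """Probability of seeing j = 0..m derived alleles when m of n sampled
    chromosomes are drawn without replacement and k of the n carry the
    derived allele (hypergeometric sampling)."""
    return [comb(k, j) * comb(n - k, m - j) / comb(n, m) for j in range(m + 1)]

# A site with 3 derived alleles among 10 chromosomes, projected down to 4.
probs = project_site(k=3, n=10, m=4)
```

Summing these probabilities over all sites gives the expected SFS at the smaller sample size, which is why sites with some missing data can still contribute after projection.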

## Simulate the data

We use the DataSimulator class to simulate data. Its simulate_ancestry and simulate_mutations methods first simulate ancestries and then add mutations to them, both using msprime.

In [8]:
data_simulator = simulate_data.DataSimulator(
    parameterized_models,
    labels,
    config=config_values,
    cores=1,
    downsampling=downsampling,
    max_sites=max_sites,
)

simulated_ancestries = data_simulator.simulate_ancestry() # simulate ancestry in msprime
simulated_mutations = data_simulator.simulate_mutations(simulated_ancestries) # simulate mutations in msprime

INFO:delimitpy.simulate_data:Ancestry simulation execution time: 2.0542240142822266 seconds.
INFO:delimitpy.simulate_data:Mutation simulation execution time: 0.5718109607696533 seconds.


## Generate SFS from simulated data

Now, we convert our mutations to numpy arrays and generate the multidimensional SFS (mSFS) and the two-dimensional SFS (2D SFS).

In [4]:
arrays = data_simulator.mutations_to_numpy(simulated_mutations) # convert to numpy array
mSFS = data_simulator.mutations_to_sfs(arrays) # generate mSFS
jSFS = data_simulator.mutations_to_2d_sfs(arrays) # generate 2D SFS

INFO:delimitpy.simulate_data:Median simulated data has 1267 SNPs, and your input has 1038 SNPs.If these numbers are very different, you may want to change some priors.
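
To make the idea of a 2D SFS concrete, the following sketch counts derived alleles per site in two populations and tallies the sites into a table. It uses plain Python lists and a hypothetical helper `joint_sfs`; it is only meant to illustrate the data structure, and delimitpy's mutations_to_2d_sfs may organize its output differently.

```python
def joint_sfs(genotypes, pop_a, pop_b):
    """2D SFS: entry [i][j] is the number of sites with i derived alleles
    in population A and j derived alleles in population B."""
    sfs = [[0] * (len(pop_b) + 1) for _ in range(len(pop_a) + 1)]
    for site in genotypes:
        i = sum(site[k] for k in pop_a)  # derived count in A
        j = sum(site[k] for k in pop_b)  # derived count in B
        sfs[i][j] += 1
    return sfs

# Toy data: one row per site, one column per chromosome (0 = ancestral, 1 = derived).
genotypes = [
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0],
]
pop_a = [0, 1, 2, 3]  # column indices of chromosomes from population A
pop_b = [4, 5]        # column indices of chromosomes from population B

sfs = joint_sfs(genotypes, pop_a, pop_b)
```

Here two sites are singletons private to population A, so `sfs[1][0]` is 2. A multidimensional SFS extends the same counting to all populations at once.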


## Save the simulated datasets

Save the simulated datasets so that we can use them in the next part of the tutorial.

In [5]:
np.save(os.path.join(config_values["output directory"], 'mSFS.npy'), np.array(mSFS), allow_pickle=True)
with open(os.path.join(config_values["output directory"], 'jSFS.pickle'), 'wb') as f:
    pickle.dump(jSFS, f)
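
When these files are needed again, they can be read back the same way the labels and parameterized models were loaded earlier. The sketch below shows the pickle round trip on a toy object in a temporary directory; `toy_jsfs` is a hypothetical stand-in, and nothing is assumed about the real structure of jSFS.

```python
import os
import pickle
import tempfile

# Toy stand-in for the jSFS object; its real structure is not assumed here.
toy_jsfs = {"A_B": [[0, 2], [1, 0]]}

# A temporary directory stands in for the configured output directory.
with tempfile.TemporaryDirectory() as output_dir:
    path = os.path.join(output_dir, "jSFS.pickle")
    with open(path, "wb") as f:
        pickle.dump(toy_jsfs, f)
    with open(path, "rb") as f:
        reloaded = pickle.load(f)
```

The mSFS array saved with np.save can be reloaded analogously with np.load and allow_pickle=True, as was done for labels.npy.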