Using Python to create lists. The cell below uses a list comprehension to build one `(number of offspring, 0)` tuple for each family size from 1 to 6.

In [283]:
demographic_summary = [(i+1,0) for i in range(6)]
demographic_summary

[(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0)]

Building a dictionary in Python from the list of tuples with `dict()`

In [284]:
demographic_summary = dict(demographic_summary)
demographic_summary

{1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0}
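The two steps above (list comprehension, then `dict()`) can also be collapsed into a single dict comprehension. This is only a sketch of an equivalent form; the names `max_offspring` and `demographic_summary_alt` are hypothetical and not part of the original notebook.

```python
# Build the same zero-initialised mapping in one step.
max_offspring = 6  # assumed upper bound, matching range(6) in the cell above
demographic_summary_alt = {size: 0 for size in range(1, max_offspring + 1)}
print(demographic_summary_alt)
```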

Demographic information on the number of children per household in the US as of 2019, extracted from https://www.census.gov/data/tables/2019/demo/families/cps-2019.html, table C3 (Living arrangements of children under 18 years and marital status of parents; values taken from the "Presence of siblings" section). Each count is divided by 73,524,000, the sum of the six counts. TODO: ask Gao where he took these values from, since I don't see values for more than 5 offspring in the table.

In [285]:
demographic_summary[1] = 14788000/73524000
demographic_summary[2] = 28464000/73524000
demographic_summary[3] = 18288000/73524000
demographic_summary[4] = 7636000/73524000
demographic_summary[5] = 2697000/73524000
demographic_summary[6] = 1651000/73524000

In [286]:
demographic_summary

{1: 0.20113160328600185,
 2: 0.3871388934225559,
 3: 0.24873510690386813,
 4: 0.10385724389315054,
 5: 0.036681899787824386,
 6: 0.022455252706599205}
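A quick sanity check, sketched here with the same six counts, confirms that they add up to the denominator used above and that the resulting proportions sum to 1. The names `counts`, `total` and `proportions` are hypothetical helpers introduced for this check.

```python
import math

# Raw counts per number of offspring, copied from the cell above.
counts = {1: 14788000, 2: 28464000, 3: 18288000,
          4: 7636000, 5: 2697000, 6: 1651000}

total = sum(counts.values())
proportions = {size: count / total for size, count in counts.items()}

print(total)                                          # denominator used above
print(math.isclose(sum(proportions.values()), 1.0))   # proportions form a distribution
```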

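Now we renormalize the data so that we only ever sample families with at least 2 offspring: the probabilities for 2 to 6 offspring are divided by `1 - demographic_summary[1]`, and the probability of a single-offspring family is then set to 0.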

In [287]:
for k in demographic_summary:
    if k == 1 :
        continue
    else:
        demographic_summary[k] /= (1 - demographic_summary[1])
demographic_summary[1]= 0

In [288]:
list(demographic_summary.values())

[0,
 0.4846090983383275,
 0.3113593026423318,
 0.1300054481067829,
 0.0459173249795696,
 0.028108825932988288]
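The renormalization above amounts to conditioning on "more than one offspring". A minimal sketch of the same idea as a reusable function follows; `drop_and_renormalize` and the toy distribution `example` are hypothetical and only illustrate the arithmetic.

```python
import math

def drop_and_renormalize(probs, excluded):
    """Return a copy of probs with `excluded` set to 0 and the rest rescaled to sum to 1."""
    kept_mass = 1 - probs[excluded]
    return {k: (0.0 if k == excluded else p / kept_mass) for k, p in probs.items()}

# Toy distribution, not the census data.
example = {1: 0.2, 2: 0.4, 3: 0.4}
conditional = drop_and_renormalize(example, excluded=1)
print(conditional)
print(math.isclose(sum(conditional.values()), 1.0))
```

Unlike the in-place loop, this version leaves the input dictionary untouched, so the original proportions remain available for later checks.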

Now we draw 100 pedigrees from this multinomial distribution. Change **n** to the number of pedigrees you would like to simulate (100 in this case).

In [289]:
import numpy as np
n = 100
data = np.random.multinomial(n, list(demographic_summary.values()))
data

array([ 0, 49, 29, 16,  2,  4])

In [290]:
data = dict([(k, x) for k, x in zip(demographic_summary.keys(), data)])
data

{1: 0, 2: 49, 3: 29, 4: 16, 5: 2, 6: 4}
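Because the draw is random, re-running the cell gives different counts each time. If reproducibility matters, one option is a seeded NumPy `Generator`, sketched below. The seed value and the names `probabilities`, `n_pedigrees` and `families_by_size` are assumptions made for this example; the probabilities are copied from the output above.

```python
import numpy as np

# Conditional probabilities for 1..6 offspring, copied from the output above.
probabilities = [0, 0.4846090983383275, 0.3113593026423318,
                 0.1300054481067829, 0.0459173249795696, 0.028108825932988288]

rng = np.random.default_rng(2019)  # the seed value is an arbitrary assumption
n_pedigrees = 100
counts = rng.multinomial(n_pedigrees, probabilities)

# Map number of offspring -> number of families, as plain Python ints.
families_by_size = {size: int(c) for size, c in enumerate(counts, start=1)}
print(families_by_size)
```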

Now comes the pedigree generation step, using the code below. The generated ped file is written through a file handle opened with `open()`.

In [311]:
ped_file = '/Users/dmc2245/Documents/Cornejo_Diana/family-association/seqsimla/input/simped100.txt'
proband_file = '/Users/dmc2245/Documents/Cornejo_Diana/family-association/seqsimla/input/proband100.txt'
offspring_file = '/Users/dmc2245/Documents/Cornejo_Diana/family-association/seqsimla/input/offspring100.txt'

In [312]:
pedigree = open (ped_file,'w')
num_fam = 0
fam_id = sid = fid = mid = sex = phen = ''
for fam_type in data:
    if fam_type == 1:
    # single off-spring family
        continue
    for i in range(data[fam_type]):
        num_fam += 1
        fam_id = f'FAM{num_fam}'
        fid = f'F{num_fam}'
        mid = f'M{num_fam}'
        # for founders
        print(f"{fam_id} {fid} 0 0 1 0", end = "\n", file=pedigree)
        print(f"{fam_id} {mid} 0 0 2 0", end = "\n", file=pedigree)
        for j in range(fam_type) :
            sid = f"O{j+1}"
            n,p = 1, 0.5 
            sex = np.random.binomial(n, p)
            sex = f"{sex+1}"
            print(f"{fam_id} {sid} {fid} {mid} {sex} 0", end = "\n", file=pedigree)
pedigree.close()
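A minimal sketch of the same pedigree-writing logic as a function that takes any writable handle. The function name `write_pedigree`, the use of the standard-library `random` module instead of `np.random.binomial`, and the small input `{2: 1, 3: 1}` are all assumptions made for illustration. Writing to an in-memory buffer makes it easy to inspect the result before touching the file system.

```python
import io
import random

def write_pedigree(family_counts, handle, rng):
    """Write one line per individual: family, individual, father, mother, sex, phenotype."""
    num_fam = 0
    for n_offspring, n_families in family_counts.items():
        for _ in range(n_families):
            num_fam += 1
            fam_id, fid, mid = f"FAM{num_fam}", f"F{num_fam}", f"M{num_fam}"
            # Founders: father (sex 1) and mother (sex 2), phenotype 0.
            handle.write(f"{fam_id} {fid} 0 0 1 0\n")
            handle.write(f"{fam_id} {mid} 0 0 2 0\n")
            for j in range(n_offspring):
                sex = rng.randint(1, 2)  # 1 or 2 with equal probability
                handle.write(f"{fam_id} O{j+1} {fid} {mid} {sex} 0\n")
    return num_fam

# Hypothetical toy input: one family with 2 offspring, one with 3.
buffer = io.StringIO()
n_written = write_pedigree({2: 1, 3: 1}, buffer, random.Random(0))
lines = buffer.getvalue().splitlines()
print(n_written, len(lines))
```

With a real path, wrapping the call in `with open(ped_file, 'w') as handle:` closes the file automatically, even if an error occurs partway through.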

In [314]:
probands = open(proband_file, 'w')
offspring = open(offspring_file, 'w')
num_fam = 0
fam_id = sid = fid = mid = sex = phen = ''
for fam_type in data:
    if fam_type == 1:
    # single off-spring family
        continue
    for i in range(data[fam_type]):
        num_fam += 1
        fam_id = f'FAM{num_fam}'
        fid = f'F{num_fam}'
        mid = f'M{num_fam}'
        #create proband file with unaffected parents and affected children
        #print(f"{fam_id} {fid} 0 0 1 1", end = "\n", file=probands)
        #print(f"{fam_id} {mid} 0 0 2 1", end = "\n", file=probands)
        for j in range(fam_type):
            sid = f"O{j+1}"
            n,p = 1, 0.5
            sex = np.random.binomial(n, p)
            sex = f"{sex+1}"
            print (f"{fam_id} {sid} {fid} {mid} {sex} 2", end = "\n", file=offspring)
probands.close()
offspring.close()

In [307]:
print(proband_file)

/Users/dmc2245/Documents/Cornejo_Diana/family-association/seqsimla/input/proband100.txt


In [304]:
print(ped_file)

/Users/dmc2245/Documents/Cornejo_Diana/family-association/seqsimla/input/simped100.txt
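As a sanity check on the written files, the expected number of lines can be computed directly from the sampled counts: every family contributes two founder lines plus one line per offspring in the ped file, and one line per offspring in the offspring file. The sketch below uses the `data` dictionary shown earlier; the variable names `n_families`, `n_offspring` and `expected_ped_lines` are hypothetical.

```python
# Sampled family counts, copied from the output of the multinomial draw above.
data = {1: 0, 2: 49, 3: 29, 4: 16, 5: 2, 6: 4}

n_families = sum(count for size, count in data.items() if size > 1)
n_offspring = sum(size * count for size, count in data.items() if size > 1)

expected_ped_lines = 2 * n_families + n_offspring  # founders + offspring
expected_offspring_lines = n_offspring

print(n_families, expected_ped_lines, expected_offspring_lines)
```

Comparing these numbers with the line counts of the generated files is a cheap way to catch a truncated or unclosed file.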
