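Use a Python list comprehension to create a list of (number of offspring, value) pairs, with every value initialized to 0. For background, see the Python documentation on list comprehensions.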

In [37]:
demographic_summary = [(i+1,0) for i in range(6)]
demographic_summary

[(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0)]

Convert the list of pairs into a Python dictionary with `dict()`.

In [38]:
demographic_summary = dict(demographic_summary)
demographic_summary

{1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0}
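The two steps above (a list of pairs, then `dict()`) can also be collapsed into a single dict comprehension. This is only a minimal sketch of the equivalence; the names `pairs`, `summary_from_pairs` and `summary_direct` are illustrative and do not appear elsewhere in the notebook.

```python
# Two-step version: list of (family size, 0) pairs, then dict().
pairs = [(i + 1, 0) for i in range(6)]
summary_from_pairs = dict(pairs)

# One-step version: a dict comprehension over the same keys.
summary_direct = {size: 0 for size in range(1, 7)}

# Both approaches give the same mapping.
print(summary_direct == summary_from_pairs)
```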

Demographic information on the number of children per household in the US as of 2019, extracted from https://www.census.gov/data/tables/2019/demo/families/cps-2019.html, table C3 (Living arrangements of children under 18 years and marital status of parents; values taken from the "Presence of siblings" section). TODO: ask Gao where he took these numbers from. I don't see values for more than 5 offspring in the table.

In [39]:
demographic_summary[1] = 14788000/73524000
demographic_summary[2] = 28464000/73524000
demographic_summary[3] = 18288000/73524000
demographic_summary[4] = 7636000/73524000
demographic_summary[5] = 2697000/73524000
demographic_summary[6] = 1651000/73524000

In [40]:
demographic_summary

{1: 0.20113160328600185,
 2: 0.3871388934225559,
 3: 0.24873510690386813,
 4: 0.10385724389315054,
 5: 0.036681899787824386,
 6: 0.022455252706599205}
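As a sanity check, the same proportions can be computed in one pass from the raw counts. The sketch below assumes the counts and the total are exactly the numerators and denominator used in the cell above; the names `counts`, `total` and `proportions` are illustrative.

```python
# Counts copied from the assignments above, keyed by number of offspring.
counts = {
    1: 14788000,
    2: 28464000,
    3: 18288000,
    4: 7636000,
    5: 2697000,
    6: 1651000,
}
total = 73524000

# Divide every count by the total to get proportions.
proportions = {size: count / total for size, count in counts.items()}

# The counts should add up to the total, and the proportions to 1
# (up to floating-point rounding).
print(sum(counts.values()) == total)
print(sum(proportions.values()))
```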

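Next, renormalize the distribution so that we only ever sample families with at least 2 offspring: divide the probability of every family size >= 2 by (1 - P(1)), so that the remaining probabilities sum to 1, and set the probability of a single-offspring family to 0.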

In [41]:
p_single = demographic_summary[1]
for k in demographic_summary:
    if k != 1:
        demographic_summary[k] /= (1 - p_single)
demographic_summary[1] = 0

In [42]:
list(demographic_summary.values())

[0,
 0.4846090983383275,
 0.3113593026423318,
 0.1300054481067829,
 0.0459173249795696,
 0.028108825932988288]
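The renormalization above amounts to conditioning on "at least 2 offspring". A minimal sketch that rebuilds the conditional probabilities from the raw counts and checks that they sum to 1 follows; the names `counts`, `p` and `conditional` are illustrative.

```python
# Counts copied from the earlier cell, keyed by number of offspring.
counts = {1: 14788000, 2: 28464000, 3: 18288000,
          4: 7636000, 5: 2697000, 6: 1651000}
total = sum(counts.values())
p = {k: c / total for k, c in counts.items()}

# P(k | k >= 2) = P(k) / (1 - P(1)) for k >= 2, and 0 for k == 1.
conditional = {k: (0.0 if k == 1 else v / (1 - p[1])) for k, v in p.items()}

print(conditional)
print(sum(conditional.values()))  # close to 1, up to rounding
```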

Now we draw the family sizes for 1000 pedigrees from this multinomial distribution. To simulate a different number of pedigrees, change **n**.

In [43]:
import numpy as np
n = 1000
data = np.random.multinomial(n, list(demographic_summary.values()))
data

array([  0, 497, 296, 134,  45,  28])

In [44]:
data = dict(zip(demographic_summary.keys(), data))
data

{1: 0, 2: 497, 3: 296, 4: 134, 5: 45, 6: 28}
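`np.random.multinomial` uses NumPy's global random state, so the counts change on every run. If reproducible draws are wanted, one option is a seeded `Generator`. The sketch below is an assumption about how that could look, not the notebook's original code: the seed value and the names `rng`, `pvals`, `n_pedigrees` and `families_by_size` are illustrative, and the probabilities are copied from the output above.

```python
import numpy as np

# Renormalized probabilities for family sizes 1..6, copied from above.
pvals = [0, 0.4846090983383275, 0.3113593026423318,
         0.1300054481067829, 0.0459173249795696, 0.028108825932988288]

n_pedigrees = 1000
rng = np.random.default_rng(seed=2019)  # arbitrary seed, chosen for illustration

# One multinomial draw: how many of the n_pedigrees families have each size.
draws = rng.multinomial(n_pedigrees, pvals)

# Map family size -> number of families, using plain Python ints.
families_by_size = dict(zip(range(1, 7), draws.tolist()))
print(families_by_size)
```

Because the probability of a single-offspring family is 0, the count for key 1 is always 0, and the counts always add up to `n_pedigrees`.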

Next, generate the pedigrees with the code below. The `open()` function is used to write the generated PED file.

In [51]:
filename = 'simped1000.ped'
num_fam = 0
# 'with' closes the file even if an error occurs
with open(filename, 'w') as f:
    for fam_type in data:
        if fam_type == 1:
            # single-offspring families are not simulated
            continue
        for i in range(data[fam_type]):
            num_fam += 1
            fam_id = f'FAM{num_fam}'
            fid = f'F{num_fam}'
            mid = f'M{num_fam}'
            # founders: father (sex 1) and mother (sex 2), parents unknown (0);
            # write them to the file as well as printing them
            for founder, founder_sex in ((fid, 1), (mid, 2)):
                line = f"{fam_id}\t{founder}\t0\t0\t{founder_sex}\t0"
                print(line)
                f.write(line + "\n")
            for j in range(fam_type):
                sid = f"O{j+1}"
                # draw sex as 1 or 2 with equal probability (without overwriting n)
                sex = np.random.binomial(1, 0.5) + 1
                line = f"{fam_id}\t{sid}\t{fid}\t{mid}\t{sex}\t0"
                print(line)
                f.write(line + "\n")

FAM1	F1	0	0	1	0
FAM1	M1	0	0	2	0
FAM1	O1	F1	M1	2	0
FAM1	O2	F1	M1	1	0
FAM2	F2	0	0	1	0
FAM2	M2	0	0	2	0
FAM2	O1	F2	M2	2	0
FAM2	O2	F2	M2	1	0
FAM3	F3	0	0	1	0
FAM3	M3	0	0	2	0
FAM3	O1	F3	M3	2	0
FAM3	O2	F3	M3	1	0
FAM4	F4	0	0	1	0
FAM4	M4	0	0	2	0
FAM4	O1	F4	M4	2	0
FAM4	O2	F4	M4	2	0
FAM5	F5	0	0	1	0
FAM5	M5	0	0	2	0
FAM5	O1	F5	M5	2	0
FAM5	O2	F5	M5	1	0
FAM6	F6	0	0	1	0
FAM6	M6	0	0	2	0
FAM6	O1	F6	M6	2	0
FAM6	O2	F6	M6	1	0
FAM7	F7	0	0	1	0
FAM7	M7	0	0	2	0
FAM7	O1	F7	M7	1	0
FAM7	O2	F7	M7	1	0
FAM8	F8	0	0	1	0
FAM8	M8	0	0	2	0
FAM8	O1	F8	M8	1	0
FAM8	O2	F8	M8	1	0
FAM9	F9	0	0	1	0
FAM9	M9	0	0	2	0
FAM9	O1	F9	M9	2	0
FAM9	O2	F9	M9	2	0
FAM10	F10	0	0	1	0
FAM10	M10	0	0	2	0
FAM10	O1	F10	M10	1	0
FAM10	O2	F10	M10	2	0
FAM11	F11	0	0	1	0
FAM11	M11	0	0	2	0
FAM11	O1	F11	M11	2	0
FAM11	O2	F11	M11	1	0
FAM12	F12	0	0	1	0
FAM12	M12	0	0	2	0
FAM12	O1	F12	M12	1	0
FAM12	O2	F12	M12	1	0
FAM13	F13	0	0	1	0
FAM13	M13	0	0	2	0
FAM13	O1	F13	M13	1	0
FAM13	O2	F13	M13	2	0
FAM14	F14	0	0	1	0
FAM14	M14	0	0	2	0
FAM14	O1	F14	M14	1	0
FAM14	O2	F14	M14	2	
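The generation loop can also be wrapped in a function that returns the rows, which makes it easier to test on a small input before writing a large file. This is a sketch under stated assumptions rather than the notebook's original code: the helper name `build_pedigree_rows`, the seed and the small example counts are all hypothetical, and a seeded `Generator` is swapped in for `np.random.binomial`.

```python
import numpy as np

def build_pedigree_rows(families_by_size, rng):
    """Return rows of (family, individual, father, mother, sex, phenotype)."""
    rows = []
    num_fam = 0
    for n_offspring, n_families in families_by_size.items():
        for _ in range(n_families):
            num_fam += 1
            fam_id, fid, mid = f"FAM{num_fam}", f"F{num_fam}", f"M{num_fam}"
            # Founders: parents unknown (0); father is sex 1, mother is sex 2.
            rows.append((fam_id, fid, "0", "0", "1", "0"))
            rows.append((fam_id, mid, "0", "0", "2", "0"))
            for j in range(n_offspring):
                sex = int(rng.integers(1, 3))  # 1 or 2 with equal probability
                rows.append((fam_id, f"O{j + 1}", fid, mid, str(sex), "0"))
    return rows

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
# Hypothetical small input: two families with 2 offspring, one with 3.
rows = build_pedigree_rows({2: 2, 3: 1}, rng)
for row in rows:
    print("\t".join(row))
```

Keeping the rows in memory also means they can be written out with a single `f.write("\n".join(...))` call, or reused for other files without drawing the random values again.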

In [52]:
filename = 'proband.txt'
num_fam = 0
# 'with' closes the file even if an error occurs
with open(filename, 'w') as f:
    for fam_type in data:
        if fam_type == 1:
            # single-offspring families are not simulated
            continue
        for i in range(data[fam_type]):
            num_fam += 1
            fam_id = f'FAM{num_fam}'
            fid = f'F{num_fam}'
            mid = f'M{num_fam}'
            for j in range(fam_type):
                sid = f"O{j+1}"
                # sex is drawn again here, independently of the values in simped1000.ped
                sex = np.random.binomial(1, 0.5) + 1
                # offspring only, space-separated, with the last column set to 2
                line = f"{fam_id} {sid} {fid} {mid} {sex} 2"
                print(line)
                f.write(line + "\n")

FAM1 O1 F1 M1 2 2
FAM1 O2 F1 M1 2 2
FAM2 O1 F2 M2 2 2
FAM2 O2 F2 M2 1 2
FAM3 O1 F3 M3 2 2
FAM3 O2 F3 M3 1 2
FAM4 O1 F4 M4 1 2
FAM4 O2 F4 M4 1 2
FAM5 O1 F5 M5 2 2
FAM5 O2 F5 M5 1 2
FAM6 O1 F6 M6 2 2
FAM6 O2 F6 M6 2 2
FAM7 O1 F7 M7 1 2
FAM7 O2 F7 M7 1 2
FAM8 O1 F8 M8 1 2
FAM8 O2 F8 M8 1 2
FAM9 O1 F9 M9 1 2
FAM9 O2 F9 M9 2 2
FAM10 O1 F10 M10 1 2
FAM10 O2 F10 M10 2 2
FAM11 O1 F11 M11 1 2
FAM11 O2 F11 M11 1 2
FAM12 O1 F12 M12 1 2
FAM12 O2 F12 M12 2 2
FAM13 O1 F13 M13 2 2
FAM13 O2 F13 M13 1 2
FAM14 O1 F14 M14 2 2
FAM14 O2 F14 M14 1 2
FAM15 O1 F15 M15 2 2
FAM15 O2 F15 M15 2 2
FAM16 O1 F16 M16 1 2
FAM16 O2 F16 M16 2 2
FAM17 O1 F17 M17 2 2
FAM17 O2 F17 M17 1 2
FAM18 O1 F18 M18 1 2
FAM18 O2 F18 M18 1 2
FAM19 O1 F19 M19 2 2
FAM19 O2 F19 M19 1 2
FAM20 O1 F20 M20 2 2
FAM20 O2 F20 M20 2 2
FAM21 O1 F21 M21 2 2
FAM21 O2 F21 M21 1 2
FAM22 O1 F22 M22 2 2
FAM22 O2 F22 M22 1 2
FAM23 O1 F23 M23 2 2
FAM23 O2 F23 M23 2 2
FAM24 O1 F24 M24 2 2
FAM24 O2 F24 M24 2 2
FAM25 O1 F25 M25 1 2
FAM25 O2 F25 M25 2 2
FAM2