Simulation framework for family-based association vs case-control using GLMM

**Input data**
1. The PED file containing the family information
2. The SFS file containing the variant information for the simulated gene (columns: gene, chromosome, position, ref, alt, MAF, function score)
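
As an illustration of the first input, the sketch below reads a 6-column PED file in linkage format (family ID, individual ID, father ID, mother ID, sex, phenotype). The `PedRecord` type, the `parse_ped` helper, and the three sample lines are hypothetical and only meant to show the layout; they are not part of RarePedSim.

```python
from typing import Iterable, List, NamedTuple


class PedRecord(NamedTuple):
    """One individual from a 6-column linkage-format PED file."""
    family_id: str
    individual_id: str
    father_id: str   # '0' means the father is not in the pedigree (founder)
    mother_id: str   # '0' means the mother is not in the pedigree (founder)
    sex: str         # linkage convention: 1 = male, 2 = female
    phenotype: str   # linkage convention: 1 = unaffected, 2 = affected, 0 = unknown


def parse_ped(lines: Iterable[str]) -> List[PedRecord]:
    """Parse whitespace-delimited PED lines, skipping blanks and '#' comments."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f'expected at least 6 columns, got {len(fields)}: {line!r}')
        records.append(PedRecord(*fields[:6]))
    return records


# Hypothetical trio: two founders and one affected child
sample = [
    '1 1 0 0 1 1',
    '1 2 0 0 2 1',
    '1 3 1 2 1 2',
]
records = parse_ped(sample)
founders = [r for r in records if r.father_id == '0' and r.mother_id == '0']
print(len(records), 'individuals,', len(founders), 'founders')
```

For a real file, the same helper could be called as `parse_ped(open('simped1000.ped'))`.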


Global Parameter Setting

In [None]:
[global]
# Disease model scenario: complex qualitative trait
parameter: name = 'Prop100'
# Proportion of functional variants that contribute to the disease
parameter: proportion = 'None'
# Odds ratio
parameter: OR = 2.0
# Model: LOGIT for qualitative traits, LNR for quantitative traits
parameter: model = 'LOGIT'
# Path to the ped file (6-column PED in linkage format)
parameter: ped_file = path('simped1000.ped')
# Path to list of genes
parameter: gene_list = path('gene.txt')
# Output directory for VCF file
parameter: out_dir = path('output')

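# Paths to the per-gene SFS files, expected in the same directory as the gene list (blank lines are skipped)
genes = paths([f'{gene_list:d}/{x.strip()}.sfs' for x in open(gene_list).readlines() if x.strip()])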

RarePedSim needs a configuration file describing the disease model; it uses this file to simulate genotypes and phenotypes.
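
To give a feel for what the odds-ratio setting means under a logit model, here is a small worked sketch. It assumes that `baseline_effect` is the penetrance of a non-carrier and that each causal variant multiplies the odds of disease by the same odds ratio; both are assumptions made for illustration, and the RarePedSim documentation should be checked for the exact model it applies. The `penetrance` function is hypothetical.

```python
def penetrance(baseline: float, odds_ratio: float, n_causal: int) -> float:
    """Disease probability under a multiplicative-odds (logit) model.

    Assumption: every causal variant carried multiplies the baseline odds
    by `odds_ratio`.
    """
    baseline_odds = baseline / (1.0 - baseline)
    odds = baseline_odds * odds_ratio ** n_causal
    return odds / (1.0 + odds)


# Values taken from the notebook settings: baseline_effect=0.01, OR=2.0
baseline_effect = 0.01
OR = 2.0
for k in range(3):
    print(f'{k} causal variant(s): penetrance = {penetrance(baseline_effect, OR, k):.4f}')
```

With these values, carrying one causal variant roughly doubles the disease probability, because odds and probabilities are nearly equal when the baseline is small.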

In [None]:
[make_config: provides = f'{out_dir}/{name}.conf']
# The conf file holds the simulation specification (Mendelian or Complex trait; see the RarePedSim documentation for details)
output: f'{out_dir}/{name}.conf'
report: expand=True, output=_output
    trait_type=Complex
    [model]
    model={model}
    [quality control]
    def_rare=0.1
    rare_only=False
    def_neutral=(-1E-5, 1E-5)
    def_protective=(-1, -1E-5)
    [phenotype parameters]
    baseline_effect=0.01
    moi=AAR
    proportion_causal={proportion}
    [LOGIT model]
    OR_rare_detrimental=None
    OR_rare_protective=None
    ORmax_rare_detrimental=None
    ORmin_rare_protective=None
    OR_common_detrimental={OR}
    OR_common_protective=None
    [LNR model]
    meanshift_rare_detrimental=0.0
    meanshift_rare_protective=None
    meanshiftmax_rare_detrimental=None
    meanshiftmax_rare_protective=None
    meanshift_common_detrimental=None
    meanshift_common_protective=None
    [genotyping artifact]
    missing_low_maf=None
    missing_sites=None
    missing_calls=None
    error_calls=None
    [other]
    max_vars=2
    ascertainment_qualitative=(2,0,1)
    ascertainment_quantitative=((0,~),(0,~))

Generate genotypes for the given families, one gene at a time, based on the configuration file.

In [None]:
[simulate_1 (rarepedsim)]
depends: f'{out_dir}/{name}.conf'
input: for_each = 'genes'
output: f'{out_dir}/{_genes:bn}.vcf.gz'
bash: container = 'statisticalgenetics/rvnpl', expand = '${ }'
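    # Remove stale outputs and recreate the per-gene working directory
    rm -rf ${_output:nn} ${_output} ${_output}.tbi && mkdir -p ${_output:nn}
    # Simulate one replicate for this gene, move its VCF next to the final output, and clean up
    rarepedsim generate -s ${_genes:a} -c ${out_dir}/${name}.conf -p ${ped_file:a} --num_genes 1 --num_reps 1 -o ${_output:nn} --vcf -b -1 --debug \
    && mv ${_output:nn}/${_genes:bn}/rep1.vcf ${_output:n} && rm -rf ${_output:nn}
    # Compress the VCF and build its tabix index
    bgzip ${_output:n} && tabix -p vcf ${_output}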
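
The path modifiers used in the step above (`:bn`, `:n`, `:nn`) can be hard to read at a glance. The sketch below mimics them with `pathlib`, assuming `:bn` is the base name without its extension, `:n` drops one extension, and `:nn` drops two. The gene file `genes/GENE1.sfs` and the `strip_ext` helper are hypothetical.

```python
from pathlib import Path


def strip_ext(p: Path, times: int = 1) -> Path:
    """Drop the last `times` extensions from a path."""
    for _ in range(times):
        p = p.with_suffix('')
    return p


out_dir = Path('output')
gene_sfs = Path('genes/GENE1.sfs')           # hypothetical input, plays the role of _genes

gene_name = gene_sfs.stem                    # like ${_genes:bn}
output = out_dir / f'{gene_name}.vcf.gz'     # like ${_output}
work_dir = strip_ext(output, 2)              # like ${_output:nn}
plain_vcf = strip_ext(output, 1)             # like ${_output:n}
rep_vcf = work_dir / gene_name / 'rep1.vcf'  # file that the step moves to plain_vcf

print('working directory:', work_dir)
print('replicate VCF    :', rep_vcf)
print('uncompressed VCF :', plain_vcf)
print('final output     :', output)
```

Under these assumptions, the working directory and the final output share the same stem, which is why the step removes the directory before compressing the VCF.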