Genetics

Alex V. Kotlar edited this page Sep 15, 2023 · 2 revisions

Liability Threshold Model For Multiple Diseases

Generative Process for Simulating Allele Counts in a Bivariate Liability Threshold Model

Step 1: Initialize Parameters

  • Sample Size (N): Total number of individuals in the population.
  • Number of Genes (G): Total number of genes considered.
  • Thresholds (θ1, θ2): Liability thresholds for Diseases 1 and 2, each determined by that disease's prevalence.
  • Gene Effects (a): Effect size of each gene on each disease; specify these up front or pre-allocate storage and fill them in later.
  • Heritability (h2_1, h2_2): Proportion of liability variance explained by genetics for each disease.
  • Genetic Correlation (ρ): Correlation between the genetic components of liability for Diseases 1 and 2.
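
The link between prevalence and threshold can be made concrete. Assuming liability is standardized to mean 0 and variance 1 (the list above does not state the scale), a prevalence K corresponds to the threshold θ = Φ⁻¹(1 − K), where Φ is the standard normal CDF. The following is a minimal sketch using only the Python standard library; the function name `liability_threshold` and the prevalence values are illustrative assumptions.

```python
from statistics import NormalDist

def liability_threshold(prevalence: float) -> float:
    """Threshold on a standard-normal liability scale such that
    P(liability >= threshold) equals the given prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(1.0 - prevalence)

# Illustrative prevalences (assumed values, not taken from the text above)
K1, K2 = 0.01, 0.05
theta1 = liability_threshold(K1)
theta2 = liability_threshold(K2)
print(f"theta1 = {theta1:.3f}, theta2 = {theta2:.3f}")
```

Rarer diseases get higher thresholds: under this assumption a 1% prevalence gives a threshold near 2.33, and a 5% prevalence gives one near 1.64.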

Step 2: Simulate Genotypes

For each gene i = 1, …, G:

  • Draw the number of carriers of that gene's rare allele from a Poisson distribution.
  • Assign the rare allele to that many individuals, chosen at random from the population.
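
Step 2 could be sketched as follows, using only the standard library. Several details are assumptions made for illustration: the Poisson mean (`expected_carriers`) is the same for every gene, each carrier holds a single copy of the rare allele, and the helper names are hypothetical. The Poisson sampler uses Knuth's multiplication method, which is adequate only for small means.

```python
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method.
    Suitable for small lam only; exp(-lam) underflows for very large lam."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_genotypes(n_individuals, n_genes, expected_carriers, rng):
    """Return an n_individuals x n_genes nested list of 0/1 carrier indicators."""
    genotypes = [[0] * n_genes for _ in range(n_individuals)]
    for gene in range(n_genes):
        # Number of carriers for this gene, capped at the population size
        n_carriers = min(sample_poisson(expected_carriers, rng), n_individuals)
        # Pick that many distinct individuals uniformly at random
        for person in rng.sample(range(n_individuals), n_carriers):
            genotypes[person][gene] = 1
    return genotypes

rng = random.Random(0)  # fixed seed so the sketch is reproducible
genotypes = simulate_genotypes(n_individuals=1000, n_genes=5,
                               expected_carriers=10.0, rng=rng)
carriers_per_gene = [sum(row[g] for row in genotypes) for g in range(5)]
print(carriers_per_gene)
```

Sampling the carrier count first and then placing carriers keeps the work proportional to the number of rare alleles rather than to N × G, which matters when alleles are rare.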

Step 3: Model Genetic and Environmental Contributions

For each individual j = 1, …, N:

  • Calculate the genetic liability for Diseases 1 and 2 as the effect-weighted sum of that individual's genotypes over all genes i.
  • Genetic Liability for Disease 1 (L1[j]) = Σ_i (Genotype[j, i] × Effect_Size_1[i])
  • Genetic Liability for Disease 2 (L2[j]) = Σ_i (Genotype[j, i] × Effect_Size_2[i])
  • Draw the environmental liability for Diseases 1 and 2, typically from a Normal distribution with mean 0.
  • Calculate the total liability for each disease.
  • Total Liability for Disease 1 = Genetic Liability for Disease 1 + Environmental Liability for Disease 1
  • Total Liability for Disease 2 = Genetic Liability for Disease 2 + Environmental Liability for Disease 2
  • Account for genetic correlation (ρ) between the diseases, for example by making each gene's effect sizes on the two diseases correlated.
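
One way to realize Step 3 is sketched below. It is not necessarily the construction intended above and rests on stated assumptions: each gene's two effect sizes are drawn as a standard-normal pair with correlation ρ, the genetic liability is standardized within the sample and rescaled to variance h2, and the environmental liability is Normal with variance 1 − h2, so total liability has variance close to 1. The function names and toy genotype frequency are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev

def correlated_effects(n_genes, rho, rng):
    """Draw per-gene effect-size pairs whose correlation is rho in expectation."""
    effects_1, effects_2 = [], []
    for _ in range(n_genes):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        effects_1.append(z1)
        # Mix the shared draw z1 with an independent draw z2
        effects_2.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return effects_1, effects_2

def total_liability(genotypes, effects, h2, rng):
    """Genetic liability rescaled to variance h2, plus Normal(0, 1 - h2) noise."""
    raw = [sum(g * a for g, a in zip(row, effects)) for row in genotypes]
    mean, sd = fmean(raw), pstdev(raw)
    if sd == 0.0:
        raise ValueError("genetic liability has no variance in this sample")
    env_sd = math.sqrt(1.0 - h2)
    return [math.sqrt(h2) * (x - mean) / sd + rng.gauss(0.0, env_sd) for x in raw]

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n_individuals, n_genes = 2000, 50
# Toy genotypes: each individual carries each allele with probability 0.05 (assumed)
genotypes = [[1 if rng.random() < 0.05 else 0 for _ in range(n_genes)]
             for _ in range(n_individuals)]
effects_1, effects_2 = correlated_effects(n_genes, rho=0.8, rng=rng)
liability_1 = total_liability(genotypes, effects_1, h2=0.5, rng=rng)
liability_2 = total_liability(genotypes, effects_2, h2=0.5, rng=rng)
print(f"sd of total liability: {pstdev(liability_1):.2f}, {pstdev(liability_2):.2f}")
```

Rescaling the genetic component keeps total liability on roughly a unit-variance scale, which is what lets a threshold derived from a standard normal reproduce the intended prevalence.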

Step 4: Apply Threshold Model

For each individual j = 1, …, N:

  • Determine the disease status by comparing each total liability with its pre-defined threshold.
  • If Total Liability for Disease 1 >= θ1, then the individual has Disease 1
  • If Total Liability for Disease 2 >= θ2, then the individual has Disease 2

Step 5: Count Alleles for Each Disease Category

Count the number of rare alleles in individuals with:

  • Neither disease
  • Only Disease 1
  • Only Disease 2
  • Both diseases
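
As a small worked example of the threshold rule in Step 4 and the tally in Step 5, the sketch below classifies six hypothetical individuals and sums the rare-allele counts of a single gene within each category. The toy genotypes, liabilities, thresholds, and function names are all made up for illustration.

```python
def categorize(liability_1, liability_2, theta1, theta2):
    """Map a pair of total liabilities to one of the four disease categories."""
    has_1 = liability_1 >= theta1
    has_2 = liability_2 >= theta2
    if has_1 and has_2:
        return "both"
    if has_1:
        return "only_1"
    if has_2:
        return "only_2"
    return "neither"

def count_alleles(genotypes, liabilities_1, liabilities_2, theta1, theta2):
    """Sum rare-allele counts for one gene within each disease category."""
    counts = {"neither": 0, "only_1": 0, "only_2": 0, "both": 0}
    for g, l1, l2 in zip(genotypes, liabilities_1, liabilities_2):
        counts[categorize(l1, l2, theta1, theta2)] += g
    return counts

# Toy data for six individuals (hypothetical values)
genotypes = [0, 1, 2, 1, 0, 1]                    # rare-allele count per person
liabilities_1 = [0.2, 1.5, 0.1, 2.0, 1.2, -0.3]   # total liability, Disease 1
liabilities_2 = [0.1, 0.3, 1.8, 1.1, 0.0, 0.4]    # total liability, Disease 2

counts = count_alleles(genotypes, liabilities_1, liabilities_2,
                       theta1=1.0, theta2=1.0)
print(counts)  # {'neither': 1, 'only_1': 1, 'only_2': 2, 'both': 1}
```

The four counts always sum to the total number of rare alleles in the sample (5 here), which is a convenient sanity check on the tally.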

Following this generative process yields a simulated population in which genetic and environmental liability jointly determine two diseases, with the prevalences, heritabilities, and genetic correlation specified in Step 1.

Simplified Python Code

import torch

# Initialize Parameters
N = 1000  # Number of individuals
G = 1     # Number of genes (for simplicity)
theta1, theta2 = 1.0, 1.0  # Thresholds for Disease 1 and 2
h2_1, h2_2 = 0.5, 0.5  # Heritability for Disease 1 and 2
rho = 0.8  # Genetic correlation between Diseases 1 and 2

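# Simulate Genotypes
# Simplification: genotypes are drawn uniformly from 0, 1, 2
# (homozygous common, heterozygous, homozygous rare) instead of using
# the Poisson carrier counts described in Step 2.
genotypes = torch.randint(0, 3, (N, G)).float()

# Model Genetic and Environmental Contributions
# Simplification: every gene has effect size 1 and the genetic component
# is not rescaled to variance h2.
genetic_liability_1 = genotypes.sum(dim=1)
# Note: scaling by rho shrinks Disease 2's genetic liability but leaves it
# perfectly correlated with Disease 1's; it is only a stand-in for a
# genetic correlation of rho.
genetic_liability_2 = rho * genetic_liability_1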

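# Add environmental effects with variance 1 - h2.
# torch.sqrt expects a tensor, so take the square root of the Python floats with ** 0.5.
environmental_liability_1 = torch.normal(mean=0.0, std=(1 - h2_1) ** 0.5, size=(N,))
environmental_liability_2 = torch.normal(mean=0.0, std=(1 - h2_2) ** 0.5, size=(N,))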

total_liability_1 = genetic_liability_1 + environmental_liability_1
total_liability_2 = genetic_liability_2 + environmental_liability_2

# Apply Threshold Model to Determine Disease Status
disease_status_1 = (total_liability_1 >= theta1).float()
disease_status_2 = (total_liability_2 >= theta2).float()

# Count Alleles for Each Disease Category
# .item() converts each 0-dim tensor to a plain Python number for readable printing
counts = {
    'neither': genotypes[(disease_status_1 == 0) & (disease_status_2 == 0)].sum().item(),
    'only_1': genotypes[(disease_status_1 == 1) & (disease_status_2 == 0)].sum().item(),
    'only_2': genotypes[(disease_status_1 == 0) & (disease_status_2 == 1)].sum().item(),
    'both': genotypes[(disease_status_1 == 1) & (disease_status_2 == 1)].sum().item()
}

print("Allele counts for each disease category:", counts)