Genetics
Alex V. Kotlar edited this page Sep 15, 2023 · 2 revisions
- Sample Size (N): Total number of individuals in the population.
- Number of Genes (G): Total number of genes considered.
- Thresholds (θ1, θ2): Liability thresholds for Diseases 1 and 2, set from their prevalences so that the fraction of individuals with liability at or above the threshold equals the prevalence.
- Gene Effects (a): Effect sizes for each gene on each disease, either pre-allocated or specified directly.
- Heritability (h2_1, h2_2): Proportion of liability variance explained by genetics for each disease.
- Genetic Correlation (ρ): Correlation between the genetic liabilities for Diseases 1 and 2.
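As a minimal sketch of how a threshold could be derived from a prevalence: if total liability is assumed to follow a standard Normal distribution (mean 0, variance 1), the threshold is the Normal quantile at 1 minus the prevalence. The prevalences `K1` and `K2` and the helper `liability_threshold` are illustrative assumptions, not values or names from this page.

```python
from statistics import NormalDist


def liability_threshold(prevalence: float) -> float:
    """Return theta such that P(liability >= theta) == prevalence for N(0, 1) liability."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - prevalence)


# Hypothetical prevalences, chosen only for illustration
K1, K2 = 0.01, 0.05
theta1 = liability_threshold(K1)
theta2 = liability_threshold(K2)
print(f"theta1 = {theta1:.3f}, theta2 = {theta2:.3f}")  # theta1 = 2.326, theta2 = 1.645
```

A rarer disease gets a higher threshold, so fewer individuals exceed it.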
For each gene i in 1..G:
- Draw the number of carriers of that gene's rare allele from a Poisson distribution.
- Assign these alleles to randomly chosen individuals in the population.
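The gene loop above could be sketched as follows, using only the Python standard library. This assumes each carrier holds exactly one copy of the rare allele (genotype 0 or 1) and that the Poisson mean is `n_individuals * carrier_rate`. The names `carrier_rate`, `sample_poisson`, and `simulate_genotypes` are hypothetical, and the Poisson sampler is Knuth's simple method, which is only suitable for small means.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_genotypes(n_individuals, n_genes, carrier_rate, rng):
    """Return genotypes[j][i] = 1 if individual j carries the rare allele of gene i, else 0."""
    genotypes = [[0] * n_genes for _ in range(n_individuals)]
    expected_carriers = n_individuals * carrier_rate
    for i in range(n_genes):
        # Number of carriers for this gene, capped at the population size
        n_carriers = min(sample_poisson(expected_carriers, rng), n_individuals)
        # Assign the rare allele to distinct, randomly chosen individuals
        for j in rng.sample(range(n_individuals), n_carriers):
            genotypes[j][i] = 1
    return genotypes


rng = random.Random(0)
genotypes = simulate_genotypes(n_individuals=1000, n_genes=20, carrier_rate=0.005, rng=rng)
carriers_per_gene = [sum(row[i] for row in genotypes) for i in range(20)]
print(carriers_per_gene)
```

With these illustrative settings each gene has about five carriers on average, which keeps the alleles rare.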
For each individual j in 1..N:
- Calculate the genetic liability for Diseases 1 and 2.
- Genetic Liability for Disease 1 (L1[j]) = Σ_i (Genotype[j][i] × Effect_Size_1[i])
- Genetic Liability for Disease 2 (L2[j]) = Σ_i (Genotype[j][i] × Effect_Size_2[i])
- Draw the environmental liability for Diseases 1 and 2, typically from a Normal distribution with mean 0 and variance 1 − h2 for that disease (this assumes the genetic liability is scaled to variance h2, so total liability has unit variance).
- Calculate the total liability for each disease.
- Total Liability for Disease 1 = Genetic Liability for Disease 1 + Environmental Liability for Disease 1
- Total Liability for Disease 2 = Genetic Liability for Disease 2 + Environmental Liability for Disease 2
- Account for genetic correlation (ρ) between the diseases, for example by making the two effect sizes of each gene correlated with coefficient ρ before the genetic liabilities are summed.
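One possible way to carry out the liability steps above is sketched below in plain Python. It assumes the two effect sizes of each gene are drawn from a bivariate standard Normal with correlation ρ, and that each genetic score is rescaled across the sample to variance h2 before environmental noise with variance 1 − h2 is added. These are modeling assumptions for illustration. The helper names and the placeholder genotypes are hypothetical, and the genetic correlation realized in a finite sample will only approximate ρ.

```python
import math
import random
from statistics import fmean, pstdev


def correlated_effects(n_genes, rho, rng):
    """Draw per-gene effect pairs (a1, a2) from a bivariate standard Normal with correlation rho."""
    a1, a2 = [], []
    for _ in range(n_genes):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a1.append(z1)
        a2.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return a1, a2


def scaled_genetic_liability(genotypes, effects, h2):
    """Sum genotype x effect per individual, then rescale to mean 0 and variance h2."""
    raw = [sum(g * a for g, a in zip(row, effects)) for row in genotypes]
    mean, sd = fmean(raw), pstdev(raw)
    if sd == 0:  # no genetic variation in the sample, nothing to scale
        return [0.0] * len(raw)
    return [math.sqrt(h2) * (x - mean) / sd for x in raw]


def total_liability(genetic, h2, rng):
    """Add environmental noise drawn from N(0, 1 - h2) to each genetic liability."""
    env_sd = math.sqrt(1 - h2)
    return [g + rng.gauss(0, env_sd) for g in genetic]


rng = random.Random(1)
N, G = 2000, 50
h2_1, h2_2, rho = 0.5, 0.5, 0.8
# Placeholder genotypes: each individual carries each rare allele with probability 0.02
genotypes = [[1 if rng.random() < 0.02 else 0 for _ in range(G)] for _ in range(N)]

a1, a2 = correlated_effects(G, rho, rng)
g1 = scaled_genetic_liability(genotypes, a1, h2_1)
g2 = scaled_genetic_liability(genotypes, a2, h2_2)
L1 = total_liability(g1, h2_1, rng)
L2 = total_liability(g2, h2_2, rng)
print(f"var(g1) = {pstdev(g1) ** 2:.3f}, var(L1) = {pstdev(L1) ** 2:.3f}")
```

Rescaling the genetic score is what ties the simulated liabilities to the stated heritabilities. Without it, the genetic variance depends on the allele frequencies and the effect sizes drawn.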
For each individual j in 1..N:
- Determine the disease status by comparing each total liability with its pre-defined threshold.
- If Total Liability for Disease 1 >= θ1, then the individual has Disease 1.
- If Total Liability for Disease 2 >= θ2, then the individual has Disease 2.
Count the number of rare alleles in individuals with:
- Neither disease
- Only Disease 1
- Only Disease 2
- Both diseases
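The thresholding and counting steps can be illustrated on a tiny hand-made example. The function name `count_rare_alleles` and the numbers below are made up for illustration. Each individual is placed in one of the four categories and contributes the sum of their genotype row to that category's count.

```python
def count_rare_alleles(genotypes, liability_1, liability_2, theta1, theta2):
    """Sum rare-allele counts over individuals, grouped by disease status."""
    counts = {"neither": 0, "only_1": 0, "only_2": 0, "both": 0}
    for row, l1, l2 in zip(genotypes, liability_1, liability_2):
        has_1 = l1 >= theta1  # threshold model for Disease 1
        has_2 = l2 >= theta2  # threshold model for Disease 2
        if has_1 and has_2:
            key = "both"
        elif has_1:
            key = "only_1"
        elif has_2:
            key = "only_2"
        else:
            key = "neither"
        counts[key] += sum(row)
    return counts


# Four individuals, two genes (illustrative values)
genotypes = [[1, 0], [0, 0], [1, 1], [0, 1]]
liability_1 = [1.5, 0.2, 2.0, -0.3]
liability_2 = [0.1, 0.4, 1.8, 1.2]
counts = count_rare_alleles(genotypes, liability_1, liability_2, theta1=1.0, theta2=1.0)
print(counts)  # {'neither': 0, 'only_1': 1, 'only_2': 1, 'both': 2}
```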
By following this generative process, you can simulate a population with realistic genetic and environmental contributions to two diseases, taking into account the specified prevalence, heritability, and correlation between them.
import torch
# Initialize Parameters
N = 1000 # Number of individuals
G = 1 # Number of genes (for simplicity)
theta1, theta2 = 1.0, 1.0 # Thresholds for Disease 1 and 2
h2_1, h2_2 = 0.5, 0.5 # Heritability for Disease 1 and 2
rho = 0.8 # Genetic correlation between Diseases 1 and 2
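# Simulate Genotypes
# Rare-allele count per individual and gene: 0, 1, or 2 (homozygous common, heterozygous, homozygous rare)
# Counts are drawn uniformly here for simplicity, so the allele is not actually rare in this toy example
genotypes = torch.randint(0, 3, (N, G)).float()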
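# Model Genetic and Environmental Contributions
# Every gene has effect size 1, so the genetic liability is the total rare-allele count
genetic_liability_1 = genotypes.sum(dim=1)
# Simplification: Disease 2 reuses the Disease 1 score scaled by rho,
# so the two genetic liabilities are perfectly correlated rather than correlated at rho
genetic_liability_2 = rho * genetic_liability_1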
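# Add environmental effects (Normal noise with mean 0 and variance 1 - h2)
# h2_1 and h2_2 are Python floats, so the standard deviation is computed with ** 0.5 instead of torch.sqrt
environmental_liability_1 = torch.normal(mean=0.0, std=(1 - h2_1) ** 0.5, size=(N,))
environmental_liability_2 = torch.normal(mean=0.0, std=(1 - h2_2) ** 0.5, size=(N,))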
total_liability_1 = genetic_liability_1 + environmental_liability_1
total_liability_2 = genetic_liability_2 + environmental_liability_2
# Apply Threshold Model to Determine Disease Status
disease_status_1 = (total_liability_1 >= theta1).float()
disease_status_2 = (total_liability_2 >= theta2).float()
# Count Alleles for Each Disease Category
# Each boolean mask selects the rows (individuals) in one category; int(...item()) turns the tensor sum into a plain integer
counts = {
    'neither': int(torch.sum(genotypes[(disease_status_1 == 0) & (disease_status_2 == 0)]).item()),
    'only_1': int(torch.sum(genotypes[(disease_status_1 == 1) & (disease_status_2 == 0)]).item()),
    'only_2': int(torch.sum(genotypes[(disease_status_1 == 0) & (disease_status_2 == 1)]).item()),
    'both': int(torch.sum(genotypes[(disease_status_1 == 1) & (disease_status_2 == 1)]).item()),
}
print("Allele counts for each disease category:", counts)