# Comprehensive Statistical Genetics Topics Table

| Topic | Popular Methods | Keywords | Key Notes |
|-------|----------------|----------|-----------|
| Linkage Analysis | • Parametric LOD Score Analysis<br>• Non-parametric Affected Sib-Pair Analysis<br>• Variance Components Linkage Analysis | Recombination Fraction, Genetic Distance, Multipoint Analysis, Identity-by-Descent (IBD), Haplotype Reconstruction, Phase Determination, Penetrance, Disease Models, Genetic Heterogeneity | Maps disease genes by tracking co-inheritance of genetic markers with disease phenotypes through family pedigrees. Parametric methods require specifying an inheritance model (mode of inheritance, penetrance, allele frequencies), while non-parametric approaches are model-free but typically less powerful when the true model is known. Statistical evidence is quantified by LOD scores; a LOD above 3 (odds of 1000:1 in favor of linkage) is the conventional significance threshold. |
| Genome-Wide Association Studies (GWAS) | • Single-SNP Association Testing<br>• Linear Mixed Models (BOLT-LMM, GCTA-MLMA)<br>• Set-Based Association Tests (SKAT, MAGMA)<br>• Multi-phenotype Association Methods | Minor Allele Frequency, Hardy-Weinberg Equilibrium, Genotype Imputation, Principal Component Analysis, Genomic Control, False Discovery Rate, Quantile-Quantile Plots, Manhattan Plots, Lead SNPs, Independent Signals, Conditional Analysis, Meta-analysis | Tests statistical associations between genetic variants (typically SNPs) and traits/diseases across large populations, controlling for population structure and multiple testing. Modern GWAS encompasses various approaches beyond simple SNP-trait associations, including gene-based tests and multi-trait approaches. |
| Heritability Estimation | • Twin/Family Studies (ACE Models)<br>• Genome-Wide Complex Trait Analysis (GCTA-GREML)<br>• Linkage Disequilibrium Score Regression (LDSC)<br>• Haseman-Elston Regression<br>• Restricted Maximum Likelihood (REML) | Narrow-sense vs. Broad-sense Heritability, Additive Genetic Variance, Dominance Variance, Epistasis, Shared Environment, SNP-based Heritability, Missing Heritability, Assortative Mating, Genetic Correlation | Quantifies the proportion of phenotypic variance attributable to genetic factors. Estimates inform genetic architecture of traits, guide power calculations, and reveal shared genetic basis across traits through genetic correlations. |
| Quantitative Trait Locus (QTL) Mapping | • Interval Mapping<br>• Composite Interval Mapping<br>• Multiple QTL Mapping (MQM)<br>• Bayesian Interval Mapping | Expression QTLs (eQTLs), Protein QTLs (pQTLs), Splicing QTLs (sQTLs), Methylation QTLs (meQTLs), Local vs. Distant QTLs, Cis vs. Trans Effects, Pleiotropy, QTL Hotspots, Genetic Regulatory Networks | Identifies genomic loci influencing quantitative trait variation, particularly molecular traits like gene expression or protein levels. Modern QTL studies integrate multiple molecular layers to reconstruct genetic regulatory networks and identify molecular mediators of disease associations. |
| Population Genetics | • F-statistics (FST, FIS, FIT)<br>• Site Frequency Spectrum Analysis<br>• Coalescent Modeling<br>• Selection Scans (iHS, XP-EHH, Fay & Wu's H)<br>• Demographic Inference (∂a∂i, MSMC) | Genetic Drift, Gene Flow, Natural Selection, Mutation Rate, Effective Population Size, Selective Sweeps, Balancing Selection, Background Selection, Population Bottlenecks, Founder Effects, Isolation-by-Distance | Analyzes the distribution and changes in genetic variation within and between populations, inferring evolutionary forces and demographic history. Modern approaches integrate machine learning and large genomic datasets to reconstruct complex population histories and identify selection signatures. |
| Genetic Risk Prediction | • Polygenic Risk Scores (PRS)<br>• Bayesian Approaches (LDpred, PRS-CS)<br>• Machine Learning Methods (Random Forests, SVMs)<br>• Deep Learning Approaches | Clumping and Thresholding, Cross-validation, AUC-ROC, Calibration, Net Reclassification Index, Variable Thresholding, Ancestry-specific PRS, Trans-ancestry Prediction, Risk Stratification | Develops statistical tools to predict disease risk or trait values from genetic data, primarily using the combined effects of many variants. Methods differ in how they account for linkage disequilibrium, effect size distribution, and population differences in genetic architecture. |
| Identity-By-Descent (IBD) Analysis | • Hidden Markov Models (BEAGLE, RELATE)<br>• Haplotype-matching Methods (GERMLINE)<br>• IBD Segment Detection (IBDseq, KING, PLINK)<br>• IBD Mapping (HomozygosityMapper, GIBDLD) | Runs of Homozygosity (ROH), Autozygosity, Haplotype Sharing, Segment Length Distribution, Detection Thresholds, Identity-by-State, Pedigree Reconstruction, Consanguinity, Demographic Inference | Detects genome segments shared between individuals that were inherited from a common ancestor without intervening recombination. IBD segments are informative about relatedness and population history, and can be used to map recessive disease variants through homozygosity mapping. |
| Fine-mapping | • Bayesian Statistical Approaches (CAVIAR, FINEMAP, PAINTOR)<br>• Functional Annotation Integration (CAVIARBF, fGWAS)<br>• Trans-ethnic Fine-mapping<br>• Statistical Colocalization (COLOC, eCAVIAR) | Credible Sets, Posterior Inclusion Probability, Functional Annotation Enrichment, Linkage Disequilibrium, Conditional Analysis, Molecular QTL Colocalization, Causal Variants | Refines association signals to identify likely causal variants, integrating statistical evidence with functional genomic annotations. Fine-mapping is critical for translating GWAS hits into biological mechanisms by pinpointing specific causal variants among many correlated SNPs. |
| Genetic Imputation | • Reference Panel-based Methods (IMPUTE, Minimac)<br>• Phasing Algorithms (SHAPEIT, Eagle)<br>• Hidden Markov Model Approaches (BEAGLE)<br>• Low-coverage Sequencing Imputation (STITCH) | Haplotype Phasing, Reference Panels, Imputation Quality Score (INFO, R²), Rare Variant Imputation, Population-specific Reference Panels, Custom Reference Creation, Post-imputation QC | Statistically infers untyped genetic variants based on haplotype patterns from reference populations, increasing genomic coverage of genotyping studies. With large, diverse reference panels, modern imputation methods can accurately infer variants down to roughly 0.1% frequency in well-represented ancestries. |
| Admixture Analysis | • Global Ancestry Estimation (ADMIXTURE, STRUCTURE)<br>• Local Ancestry Inference (RFMix, LAMP-LD)<br>• Ancestry-specific PCA<br>• Admixture Dating (ALDER, MALDER) | Ancestry Proportions, Population Substructure, Ancestry Informative Markers (AIMs), Deconvolution, Admixture Mapping, Genetic Ancestry, Chromosomal Ancestry Segments, Ancestry-specific LD | Estimates the proportion of an individual's genome derived from different ancestral populations and identifies chromosome segments of specific ancestral origin. Applications include controlling for population structure in association studies, studying migration history, and mapping ancestry-specific disease risk loci. |
| Mendelian Randomization | • Two-Sample MR<br>• Multivariable MR<br>• Pleiotropy-robust Methods (MR-Egger, Weighted Median)<br>• MR-PRESSO (MR Pleiotropy RESidual Sum and Outlier) | Instrumental Variables, Genetic Instruments, Horizontal Pleiotropy, Vertical Pleiotropy, Weak Instrument Bias, F-statistic, MR-Base, Two-stage Least Squares, Causal Inference | Uses genetic variants as instrumental variables to assess causal relationships between exposures and outcomes, reducing bias from confounding and reverse causation when the instrumental-variable assumptions hold. MR exploits the random assortment of alleles during meiosis as a natural randomization process, analogous to randomized controlled trials. |
| RNA-Seq Analysis | • Differential Expression Analysis (DESeq2, edgeR, limma-voom)<br>• Expression Quantitative Trait Loci (eQTL) Mapping<br>• Alternative Splicing Analysis (rMATS, MAJIQ)<br>• RNA Co-expression Network Analysis (WGCNA) | Read Counts, Normalization Methods, Batch Effects, Dispersions, FDR Control, Transcript Assembly, Isoform Quantification, Splice Junction Analysis, RNA Editing | Analyzes gene expression patterns through next-generation sequencing of RNA, identifying differentially expressed genes, alternatively spliced transcripts, and expression networks. Integration with genetic variation data reveals regulatory genetic architecture. |
| Methylation Analysis | • Epigenome-Wide Association Studies (EWAS)<br>• Methylation QTL Analysis<br>• Bisulfite Sequencing Analysis<br>• Differentially Methylated Region (DMR) Detection | CpG Islands, Beta Values, M-values, Methylation Arrays, Cell-type Heterogeneity Correction, Age Acceleration, Epigenetic Clocks, Imprinting, X-inactivation | Studies patterns of DNA methylation across the genome, identifying epigenetic marks associated with diseases, environmental exposures, or genetic variation. Methylation serves as a potential biomarker and mediator between genetics, environment, and disease. |
| Gene-Environment Interaction | • Regression Models with Interaction Terms<br>• Case-only Designs<br>• Two-step Testing Approaches<br>• Bayesian Mixture Models | Effect Modification, Statistical Power, Stratified Analysis, Environmental Exposure Assessment, Exposure Timing, Scale-dependent Interactions, GxE Heritability | Investigates how environmental factors modify genetic effects on traits or diseases, helping explain variable penetrance. Methodological challenges include limited statistical power, exposure measurement error, and appropriate statistical modeling. |
| Pathway and Network Analysis | • Gene Set Enrichment Analysis (GSEA)<br>• Over-representation Analysis (ORA)<br>• Network Medicine Approaches<br>• Protein-Protein Interaction Analysis | Biological Pathways, Gene Ontology, Canonical Pathways, Network Modules, Disease Networks, Functional Enrichment, Network Topology, Key Driver Analysis | Examines whether groups of functionally related genes show collective association with traits, providing biological context for genetic findings. Advanced methods integrate protein interaction networks, regulatory information, and tissue-specific data. |
| Genetic Architecture Modeling | • Polygenic Models (Infinitesimal Model)<br>• Omnigenic Model Testing<br>• Genetic Architecture Parameter Estimation<br>• Bayesian Non-parametric Models | Polygenicity, Effect Size Distribution, Core Genes, Peripheral Genes, Pleiotropy, Genetic Correlation, Cross-trait Architecture, Selection Pressure | Characterizes the distribution of genetic effect sizes, number of causal variants, and relationships between genetic architecture and evolutionary forces. Models range from highly polygenic to more sparse architectures with larger effect sizes. |
| Rare Variant Association Analysis | • Burden Tests (CAST, WSS)<br>• Variance-component Tests (SKAT)<br>• Combined Tests (SKAT-O)<br>• Bayesian Multiple Regression (BMR) | Collapsing Methods, Sliding Window, Functional Weighting, De Novo Mutations, Transmission Disequilibrium, Trio Sequencing, Variant Effect Predictor (VEP), Variant Aggregation | Tests for association between rare genetic variants (MAF<1%) and phenotypes by aggregating variants within functional units like genes or regulatory elements. Burden tests assume the aggregated variants act in the same direction, whereas variance-component tests such as SKAT retain power when protective and risk variants are mixed; methods also differ in their variant weighting schemes. |
| Copy Number Variation (CNV) Analysis | • Hidden Markov Models for CNV Calling<br>• Read-depth Analysis<br>• CNV-GWAS Methods<br>• CNV Burden Analysis | Structural Variants, Deletion, Duplication, Insertion, Copy Number Polymorphisms, Array-CGH, SNP-array CNV Detection, Segmental Duplications | Identifies and analyzes genomic regions with non-diploid copy number, which can have large phenotypic effects. CNVs contribute significantly to genomic diversity and disease risk, particularly for neurodevelopmental disorders. |
| Haplotype Analysis | • Haplotype Phasing Algorithms (SHAPEIT, Eagle)<br>• Haplotype Association Testing<br>• Ancestral Haplotype Reconstruction<br>• Haplotype Blocks and Tagging | Phase Determination, LD Blocks, Tag SNPs, Haplotype Diversity, Extended Haplotype Homozygosity, Recombination Hotspots, Diplotype Analysis | Analyzes combinations of alleles inherited together on the same chromosome, capturing multi-SNP effects and evolutionary history. Haplotype-based analyses can increase power for detecting associations and identifying selection signatures. |
| Pharmacogenomics | • Genome-wide PGx Association Studies<br>• Drug Response QTL Mapping<br>• Gene-Drug Interaction Testing<br>• Polygenic Drug Response Prediction | Adverse Drug Reactions, Drug Metabolism, Drug Transporters, Clinical Implementation, PharmGKB, Drug Dosing Algorithms, Drug Efficacy, CPIC Guidelines | Studies genetic influences on drug response, metabolism, and adverse effects, aiming to optimize drug selection and dosing based on genetic profile. Clinical implementation involves translating research findings into actionable prescribing guidelines. |
| Single-cell Genomics Analysis | • Single-cell RNA-seq Analysis (Seurat, Monocle)<br>• Single-cell eQTL Analysis<br>• Cell-type Deconvolution<br>• Trajectory Inference | Cell Clusters, Marker Genes, Pseudotime, Cell Lineage, Droplet-based Methods, UMIs, Cell-type Specific Expression, Spatial Transcriptomics | Analyzes genetic and genomic variation at single-cell resolution, revealing cell-type specific effects and developmental trajectories. This resolves cellular heterogeneity masked in bulk tissue analyses and identifies rare cell populations. |
| Pleiotropy Analysis | • Multi-trait GWAS Methods (MTAG, CPASSOC)<br>• Genetic Correlation Analysis<br>• Mendelian Randomization Network Analysis<br>• Multivariate Decomposition Methods | Horizontal Pleiotropy, Vertical Pleiotropy, Cross-Phenotype Association, Shared Genetic Architecture, Biobank-scale Phenome Analysis, PheWAS, Cross-trait Colocalization | Studies genetic variants affecting multiple traits, revealing shared biological pathways and genetic architecture. Methods distinguish between horizontal (direct) pleiotropy, where one variant independently affects multiple traits, and vertical (mediated) pleiotropy, where a variant affects one trait that in turn influences another. |
| Epistasis and Gene-Gene Interaction | • Regression-based Interaction Tests<br>• Machine Learning Methods (Random Forests, MDR)<br>• Exhaustive Search Methods (BOOST, BiForce)<br>• Network-based Approaches | Statistical Epistasis, Biological Epistasis, Pairwise Interactions, Higher-order Interactions, Dimensionality Reduction, Functional Epistasis, Genetic Background Effects | Investigates non-additive genetic effects where the influence of one genetic variant depends on others. Epistasis may contribute to missing heritability and can reveal biological pathway interactions, though detection is challenging both computationally (the number of variant combinations is very large) and statistically (power is low after multiple-testing correction). |
| Evolutionary Genomics | • dN/dS Ratio Analysis<br>• Phylogenetic Analysis<br>• Ancestral Sequence Reconstruction<br>• Molecular Clock Analysis | Natural Selection, Positive Selection, Purifying Selection, Molecular Evolution, Adaptive Evolution, Species Divergence, Conserved Elements, Accelerated Regions | Analyzes genomic data across species to infer evolutionary history, selection pressures, and functional importance of genomic regions. Comparative genomics provides context for understanding human genetic variation and identifying functionally important regions. |
| Chromatin Accessibility and Structure Analysis | • ATAC-seq Analysis<br>• Hi-C/Chromosome Conformation Analysis<br>• DNase-seq Footprinting<br>• ChromHMM State Analysis | Open Chromatin, Chromatin States, Topologically Associating Domains (TADs), Enhancer-Promoter Interactions, Chromatin QTLs, 3D Genome Organization, Regulatory Elements | Investigates three-dimensional genome organization and chromatin accessibility, mapping regulatory elements and their interactions. Integration with genetic variation data reveals functional mechanisms underlying GWAS associations through effects on gene regulation. |
| Mitochondrial Genetics | • Mitochondrial Haplogroup Analysis<br>• Heteroplasmy Detection and Quantification<br>• Mitochondrial-Nuclear Interaction Analysis<br>• Maternal Lineage Tracing | mtDNA, Heteroplasmy, Maternal Inheritance, Mitochondrial Diseases, Homoplasmy, Haplogroups, Threshold Effects, Mutation Load | Analyzes genetic variation in the mitochondrial genome and its interaction with nuclear genes, relevant for maternal lineage tracing, population history, and mitochondrial diseases. Special methods account for heteroplasmy (mixed mitochondrial populations) and maternal inheritance. |
| Sex Chromosome Analysis | • X-inactivation Analysis<br>• Sex-stratified Association Testing<br>• Y-chromosome Haplogroup Analysis<br>• Dosage Compensation Analysis | X-inactivation, Pseudo-autosomal Regions, Y-chromosome Haplogroups, Dosage Compensation, Sex-biased Expression, Sex-specific Effects, Turner/Klinefelter Syndromes | Applies specialized methods for analyzing genetic variation on sex chromosomes, accounting for their unique inheritance patterns, gene dosage, and regulatory mechanisms. Sex chromosomes require specific analytical approaches distinct from autosomes. |
| Genetic Clustering and Dimensionality Reduction | • Principal Component Analysis (PCA)<br>• t-SNE and UMAP for Genetic Data<br>• ADMIXTURE/STRUCTURE Analysis<br>• K-means Clustering of Genetic Data | Population Structure, Dimensionality Reduction, Visualization, Genetic Similarity, Outlier Detection, Ancestry Components, Sample Relatedness, Batch Effects | Applies statistical techniques to visualize and identify patterns in high-dimensional genetic data, particularly useful for detecting population structure, sample relatedness, and outliers in large-scale genetic studies. |
| Parent-of-Origin Effects | • Imprinting Analysis<br>• Parent-of-Origin Association Testing<br>• Methylation-based Imprinting Detection<br>• Trio-based Analysis | Genomic Imprinting, Maternal Effects, Paternal Effects, Uniparental Disomy, Transmission Asymmetry, Polar Overdominance, Epigenetic Regulation | Investigates genetic effects that depend on whether variants are inherited from the mother or father, primarily due to genomic imprinting (epigenetic silencing of one parental allele). These effects are important in development and certain disorders. |
| Somatic Mutation Analysis | • Tumor-Normal Comparison<br>• Variant Allele Frequency Analysis<br>• Cancer Driver Gene Identification<br>• Mutational Signature Analysis | Clonal Evolution, Tumor Heterogeneity, Driver vs. Passenger Mutations, Mosaicism, Cancer Genomics, Mutational Processes, Somatic Mutation Rate | Identifies and characterizes genetic changes that occur after conception in specific tissues, particularly important in cancer genomics. Methods distinguish between germline and somatic variants and identify patterns of mutation that reflect underlying mutational processes. |
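
The Linkage Analysis row cites a LOD score above 3 as the conventional significance threshold. As a minimal sketch of where that number comes from, the Python below computes a two-point LOD score under a simplifying assumption that the table does not state: all meioses are phase-known and fully informative, so the likelihood is simply binomial in the number of recombinants. The function names `lod_score` and `max_lod` are illustrative, not taken from any linkage package, and real pedigree analysis (unknown phase, incomplete penetrance, multipoint data) requires dedicated software.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score at recombination fraction theta.

    Assumes phase-known, fully informative meioses, so that
    L(theta) = theta^k * (1 - theta)^(n - k) and
    LOD(theta) = log10(L(theta) / L(0.5)).
    """
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses and meioses > 0")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")

    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Complete linkage is impossible once a recombinant is observed.
        if recombinants > 0:
            return float("-inf")
        log_linked = 0.0
    else:
        log_linked = (recombinants * math.log10(theta)
                      + non_recombinants * math.log10(1.0 - theta))

    # Null hypothesis of no linkage: theta = 0.5 for every meiosis.
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants: int, meioses: int) -> tuple[float, float]:
    """Return (theta_hat, LOD at theta_hat), with theta_hat capped at 0.5."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


if __name__ == "__main__":
    for k in (0, 1, 2):
        theta_hat, lod = max_lod(k, 10)
        print(f"{k} recombinants in 10 meioses: theta_hat={theta_hat:.2f}, LOD={lod:.2f}")
```

Under these assumptions, ten informative meioses with no recombinants give a maximum LOD of 10 × log10(2) ≈ 3.01, which just clears the threshold; a single recombinant among the ten drops the maximum LOD to about 1.6.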