## **Notation Table**  

| Symbol           | Meaning | Example Meaning |
|------------------|---------|----------------|
|$\mathbf{X}_{raw}$ | Raw genotype matrix (without centering or standardization)| |
| $ \mathbf{X} $           | Genotype matrix after normalization | |
| $\mathbf{X}_{i,\cdot} $          | Genotype for individual $ i $ | Genotype (e.g., number of risk alleles) of individual $ i $, when considering one variant |
| $ \mathbf{X}_{i,j} $          | Genotype of individual $ i $ at variant $j$ | Genotype of individual $ i $ at variant $j$, when considering multiple variants |
| $\mathbf{X}_{j}$ or $\mathbf{X}_{\cdot,j}$ | Genotype at variant $j$ for all individuals | Genotype at variant $j$ for all individuals |
| $i$|Index for individual|$i=1,2,...,N$|
| $j$|Index for genetic variant|$j=1,2,...,M$|
|$k$| Index for study |$k=1,2,...,K$|
| $ N $            | Number of samples | Number of individuals in the dataset |
| $M$ | Number of variants | Number of variants in the genotype data|
|$A$|Major allele|Major allele|
|$a$|Minor allele|Minor allele|
| $f_j$|Frequency for genetic variant $j$|0.1|
| $l_j$| LD score of genetic variant $j$ |$l_j$|
| $\Phi_{ij}$| The kinship coefficient between two individuals, $i$ and $j$| |
|$\mathbf{G}$|Genetic relatedness matrix (GRM)| $N$ by $N$ matrix|
| $ Y $           | Random variable (trait) | Trait (e.g., height $Y_1$ and weight $Y_2$ when bi-variate) |
| $ y_i $          | Trait value for individual $ i $ | Height of individual $ i $ |
|$\beta_0$| Baseline level in the regression model (intercept)|170 cm for height|
|$\beta$| Marginal effect size of a variant in a GWAS||
| $ \sigma^2 $   | Variance of a random variable | |
| $ \mathbf{Z} $           | Covariate matrix after normalization | |
|$\mathbf{u}$| Vector of the **fixed** effects of covariates| |
|$Q$ |Cochran’s Q statistic| |
|$I^2$|$I^2$ heterogeneity statistic||

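To make the genotype notation concrete, here is a minimal sketch of how $\mathbf{X}$ and $\mathbf{G}$ might be obtained from $\mathbf{X}_{raw}$. It rests on assumptions the table does not state: genotypes are coded 0/1/2, "normalization" means centering and scaling each variant to unit variance, and the GRM is computed as $\mathbf{X}\mathbf{X}^T / M$ (one common convention). The function names `standardize_genotypes` and `grm` are illustrative only.

```python
import numpy as np


def standardize_genotypes(X_raw):
    """Center and scale each variant (column) of an N x M genotype matrix."""
    X_raw = np.asarray(X_raw, dtype=float)
    mean = X_raw.mean(axis=0)  # per-variant mean
    sd = X_raw.std(axis=0)     # per-variant standard deviation (population form)
    if np.any(sd == 0):
        # A monomorphic variant has no variation and cannot be scaled.
        raise ValueError("monomorphic variant: cannot standardize a constant column")
    return (X_raw - mean) / sd


def grm(X):
    """N x N relatedness matrix from a standardized N x M matrix (assumed X X^T / M)."""
    _, M = X.shape
    return X @ X.T / M


# Toy data: N = 4 individuals (rows), M = 3 variants (columns), 0/1/2 coding.
X_raw = np.array([[0, 1, 2],
                  [1, 1, 0],
                  [2, 0, 1],
                  [1, 2, 1]])

X = standardize_genotypes(X_raw)
G = grm(X)
f = X_raw.mean(axis=0) / 2  # frequency of the counted allele, f_j
```

With this scaling every column of `X` has mean 0 and variance 1, so the diagonal of `G` averages to 1 across individuals.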

| Symbol           | Meaning | Example Meaning |
|------------------|---------|----------------|
| $ M_1 $          | Model 1 (no genetic effect) | Model assuming no association between genotype and trait |
| $ M_2 $          | Model 2 (genetic effect) | Model assuming an association between genotype and trait |
| $ D $            | Observed data $\{(x_1,y_1), (x_2, y_2), \dots, (x_n, y_n)\}$ | Collected genotype-trait pairs |
| $ \mu $         | Mean of the normal distribution (unknown) | Mean of the normal distribution in the population  (e.g., $\mu_1$ for height and $\mu_2$ for weight when bi-variate)  |
| $ \theta $         | Mean of the normal distribution (unknown) | Causal effect of genetic variant on the trait|
| $ \theta_0 $     | Prior mean  | Prior mean of $\theta$ before observing data $D$ |
| $ \theta_1 $     | Posterior mean  |  Posterior mean of $\theta$ after observing data $D$ |
| $ \sigma_0^2 $   | Prior variance (known) | Prior uncertainty in $\mu$ |
| $ \sigma_1^2 $   | Posterior variance | Updated uncertainty in $\mu$ after observing data |
| $ \tau_0 $       | Prior precision ($\frac{1}{\sigma_0^2}$)  | Inverse of prior variance |
| $ \tau_1 $       | Posterior precision ($\frac{1}{\sigma_1^2}$) | Inverse of posterior variance |
| $ \Sigma $ | Variance-covariance matrix | Variance-covariance matrix for height $Y_1$ and weight $Y_2$ |
| $ \Omega $ | Precision matrix | Precision matrix for height $Y_1$ and weight $Y_2$ |
| $ n $            | Number of samples | Number of individuals in the dataset |
| $ L(M) $         | Likelihood for model $ M $ | Probability of observing the data under model $M$ |
| $ l(M) $         | Log-likelihood for model $ M $ | Log-transformed likelihood for numerical stability |

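The prior and posterior symbols above fit together through the standard conjugate normal update. The sketch below assumes a model the table does not spell out: observations $y_i \sim N(\theta, \sigma^2)$ with known $\sigma^2$, and a prior $\theta \sim N(\theta_0, \sigma_0^2)$. Under those assumptions, $\tau_1 = \tau_0 + n/\sigma^2$ and $\theta_1 = (\tau_0 \theta_0 + \sum_i y_i / \sigma^2) / \tau_1$. The function name `normal_posterior` is hypothetical.

```python
import numpy as np


def normal_posterior(y, sigma2, theta0, sigma0_2):
    """Posterior mean, variance, and precision for a normal mean with known variance.

    Assumes y_i ~ N(theta, sigma2) i.i.d. and the prior theta ~ N(theta0, sigma0_2).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    tau0 = 1.0 / sigma0_2                # prior precision
    tau1 = tau0 + n / sigma2             # posterior precision
    theta1 = (tau0 * theta0 + y.sum() / sigma2) / tau1  # precision-weighted mean
    sigma1_2 = 1.0 / tau1                # posterior variance
    return theta1, sigma1_2, tau1


# Toy example: three observations, unit data variance, N(0, 1) prior.
theta1, sigma1_2, tau1 = normal_posterior([1.0, 2.0, 3.0], sigma2=1.0,
                                          theta0=0.0, sigma0_2=1.0)
```

The posterior mean is a precision-weighted average of the prior mean and the data, so it moves toward the sample mean as $n$ grows.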

| Concept | Description | Importance | Rui's notes|
|---------|-------------|------------|-------------------|
| Kinship | Statistical measure of relatedness between individuals based on shared genetic material | Fundamental for understanding genetic relationships and forms the basis for many statistical methods in genetics | remove because it's more popgen |
| Identity by Descent (IBD) | Probability that alleles in two individuals are identical because they were inherited from a common ancestor | Essential for relatedness inference and forms the basis for many genetic analyses | |
| Haplotypes | Combination of alleles at different loci that are inherited together | Critical for understanding inheritance patterns and LD structure | ?|
| Effective Population Size (Ne) | The size of an idealized population that would experience the same amount of genetic drift as the observed population | Crucial for understanding how genetic variation changes over time in populations | |
| Allele/Genotype Coding | Numerical representation of genetic variants (additive, dominant, recessive models) | Basic requirement for any genetic analysis involving regression models | I think we should add it |
| Assortative Mating | Non-random mating based on phenotypic similarity | Affects genetic structure and can bias heritability estimates | popgen?|
| Ascertainment Bias | Systematic error due to non-random sampling | Critical for understanding limitations in genetic studies | popgen?|
| Genetic Distance | Measures of genetic dissimilarity between individuals or populations | Basic concept for population genetic analyses | covered in GRM|
| Threshold Models | Statistical models linking continuous genetic liability to categorical phenotypes | Essential for analyzing binary and categorical traits | we should probably create a `phenotype` notebook|
| Cross-Validation | Method to assess how genetic models generalize to independent data | Fundamental for evaluating predictive models | too pure stat?|
| Genomic Control / Inflation Factor | Include this with QQ plots and how to interpret deflation/inflation in test statistics | | too pure stat? |
| Genetic Heterogeneity | Different genetic variants producing similar phenotypes | Essential for understanding why multiple genes can cause similar diseases or traits | ?|
| Genetic Load | Total quantity of deleterious genetic material in a population or individual | Fundamental for understanding the accumulation of harmful mutations | ?|
| Genotype-Environment Correlation | Association between genotypes and the environments individuals experience | Critical for distinguishing genetic effects from environmental factors | ?|
| Genotype Imputation | Statistical inference of unmeasured genotypes based on known haplotype patterns | Fundamental for increasing statistical power in genetic studies | ?|
| Genetic Anticipation | Phenomenon where genetic disorders manifest at earlier ages in successive generations | Important concept for understanding intergenerational disease patterns | popgen?|
| Genomic Inflation Factor | Measure of systematic bias in test statistics from a genetic association study | Essential quality control metric in genetic association studies | |
| Regression to the Mean | The tendency for extreme phenotypic values to produce offspring with less extreme values | Fundamental concept for prediction and breeding applications | ?|
| Parentage Analysis | Statistical methods to assign parents based on genetic data | Fundamental for pedigree verification and relationship testing | |
| Genetic Background Effects | Influence of overall genetic makeup on the phenotypic expression of specific variants | Important for understanding variant effect heterogeneity across populations | ?|
| Synthetic Association | Phenomenon where common variants tag effects of multiple rare causal variants | Critical concept for interpreting GWAS findings | ?|
| Nested Association Mapping | Design combining advantages of linkage analysis and association mapping | Powerful approach for dissecting complex trait genetics, especially in plants | ?|