# Summary

This notebook introduces two basic concepts in statistical genetics:

- the kinship matrix
- the genomic relationship matrix (GRM)

# Intuition

Here we will put a cartoon

# Notations

## Kinship Matrix

The **kinship matrix** estimates the genetic similarity between individuals. Given a centered genotype matrix $\mathbf{X}$ of size $ N \times J $ (where $ N $ is the number of individuals and $ J $ is the number of variants), it is computed as:

$$
\mathbf{K} = \frac{1}{J} \mathbf{X} \mathbf{X}^T
$$

Where:
- $ \mathbf{X} $ is the centered genotype matrix (mean genotype for each variant is subtracted).
- $ J $ is the number of variants.
- $ \mathbf{K} $ is an $ N \times N $ symmetric matrix, where each entry $ K_{ik} $ represents the genetic relatedness between individual $ i $ and individual $ k $ (the index $ j $ is reserved for variants).
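
Written entry by entry, the matrix product above is an average over variants of products of centered genotypes. As a sketch, assume $ g_{ij} $ denotes the raw genotype of individual $ i $ at variant $ j $ and $ \bar{g}_j $ the mean genotype of variant $ j $ (both symbols are introduced here for illustration only). Then for individuals $ i $ and $ k $:

$$
K_{ik} = \frac{1}{J} \sum_{j=1}^{J} (g_{ij} - \bar{g}_j)(g_{kj} - \bar{g}_j), \qquad \bar{g}_j = \frac{1}{N} \sum_{i=1}^{N} g_{ij}
$$

A diagonal entry $ K_{ii} $ is therefore the average squared deviation of individual $ i $ from the variant means, and an off-diagonal entry is positive when two individuals tend to deviate from the means in the same direction.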


## Genomic Relationship Matrix 

The **Genomic Relationship Matrix (GRM)** is a standardized version of the kinship matrix that accounts for allele frequencies. One common formulation is:

$$
\mathbf{G} = \frac{1}{J} \mathbf{X} \mathbf{D}^{-1} \mathbf{X}^T
$$

Where:
- $ \mathbf{D} $ is a diagonal matrix containing the expected genotype variance of each variant under Hardy-Weinberg equilibrium:
  $$
  \mathbf{D}_{jj} = 2 f_j (1 - f_j)
  $$
  where $ f_j $ is the allele frequency at variant $ j $.
- $ \mathbf{G} $ is an $ N \times N $ matrix capturing the pairwise genetic relationships.
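
The same formula can be written entry by entry. This is a sketch under two assumptions that are not stated in the text: $ g_{ij} $ is the raw genotype (0, 1, or 2 copies of the counted allele) of individual $ i $ at variant $ j $, and $ f_j $ is estimated from the sample, so that the column mean equals $ 2 f_j $. For individuals $ i $ and $ k $:

$$
G_{ik} = \frac{1}{J} \sum_{j=1}^{J} \frac{(g_{ij} - 2 f_j)(g_{kj} - 2 f_j)}{2 f_j (1 - f_j)}
$$

Each variant is divided by its own variance term, so a monomorphic variant ($ f_j = 0 $ or $ f_j = 1 $) would give a zero denominator and has to be removed before the matrix is computed.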

If $ \mathbf{X} $ is already **standardized (i.e., each column is mean-centered and divided by its standard deviation $ \sqrt{2 f_j (1 - f_j)} $)**, then the GRM simplifies to:

$$
\mathbf{G} = \frac{1}{J} \mathbf{X} \mathbf{X}^T
$$
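
The equivalence between the two forms can be checked numerically. The block below is a minimal sketch on a small made-up genotype matrix (the values and the names `geno`, `G1`, `G2` are illustrative assumptions, not part of the notebook's example): it computes the GRM once with an explicit $ \mathbf{D}^{-1} $ and once from standardized columns.

```r
# Toy genotypes (hypothetical values): 4 individuals x 3 variants, coded 0/1/2
geno <- matrix(c(0, 1, 2,
                 1, 1, 0,
                 2, 0, 1,
                 0, 2, 1),
               nrow = 4, byrow = TRUE)
J <- ncol(geno)

f <- colMeans(geno) / 2                      # sample allele frequencies f_j
d <- 2 * f * (1 - f)                         # diagonal entries of D

Xc <- sweep(geno, 2, colMeans(geno), "-")    # centered genotypes

# Route 1: centered X with an explicit D^{-1}
G1 <- Xc %*% diag(1 / d) %*% t(Xc) / J

# Route 2: standardize each column by sqrt(d_j), then X X^T / J
Xs <- sweep(Xc, 2, sqrt(d), "/")
G2 <- Xs %*% t(Xs) / J

# The two routes agree up to floating-point error
all.equal(G1, G2)
```

Dividing each column by $ \sqrt{d_j} $ before the product contributes a factor $ 1 / d_j $ for variant $ j $, which is exactly what $ \mathbf{D}^{-1} $ does in the first route.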



## Key Differences
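- **Kinship Matrix ($ \mathbf{K} $)**: Measures relatedness from centered genotypes only. It does not rescale by allele frequency, so variants with larger genotype variance (common variants) carry more weight.
- **GRM ($ \mathbf{G} $)**: Divides each variant by its expected variance $ 2 f_j (1 - f_j) $, putting rare and common variants on the same scale. This makes it widely used in GWAS and mixed models.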

These matrices are crucial for capturing genetic structure and controlling for population stratification in association studies.



# Example

In [6]:
rm(list=ls())
# Genotype matrix for 5 individuals and 3 variants
# Rows correspond to individuals, columns to variants
N <- 5
J <- 3
genotypes <- matrix(c(0, 1, 1, 2, 2, 0, 1, 1, 1, 0, 2, 1, 0, 0, 2), 
                    nrow = N, ncol = J, byrow = TRUE)
# genotypes <- matrix(sample(0:2, N*J, replace = TRUE), 
#                    nrow = N, ncol = J)
# Adding row and column names
rownames(genotypes) <- paste("Individual", 1:N)
colnames(genotypes) <- paste("Variant", 1:J)
genotypes

| | Variant 1 | Variant 2 | Variant 3 |
|---|---|---|---|
| Individual 1 | 0 | 1 | 1 |
| Individual 2 | 2 | 2 | 0 |
| Individual 3 | 1 | 1 | 1 |
| Individual 4 | 0 | 2 | 1 |
| Individual 5 | 0 | 0 | 2 |


In [7]:
# Center the genotype matrix: subtract each variant's mean from its column
p <- colMeans(genotypes) / 2  # Sample allele frequency per variant (assuming bi-allelic, 0/1/2 coding)
X <- sweep(genotypes, 2, colMeans(genotypes), "-")  # Centering


In [8]:
# Compute the kinship matrix
kinship_matrix <- (X %*% t(X)) / J
# Print results
cat("Kinship Matrix:\n")
print(kinship_matrix)

Kinship Matrix:
             Individual 1 Individual 2  Individual 3 Individual 4  Individual 5
Individual 1   0.13333333  -0.33333333 -6.666667e-02   0.06666667  2.000000e-01
Individual 2  -0.33333333   1.20000000  1.333333e-01  -0.06666667 -9.333333e-01
Individual 3  -0.06666667   0.13333333  6.666667e-02  -0.13333333 -1.776357e-17
Individual 4   0.06666667  -0.06666667 -1.333333e-01   0.33333333 -2.000000e-01
Individual 5   0.20000000  -0.93333333 -1.776357e-17  -0.20000000  9.333333e-01
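
Two properties of the printed matrix follow directly from the centering step and can be checked in code. The block below is a small sketch that rebuilds the same genotypes so it runs on its own (the name `K` is used here for brevity): the matrix is symmetric, and because every column of the centered matrix sums to zero, every row of the kinship matrix sums to zero as well.

```r
# Same genotypes as in the example: 5 individuals x 3 variants
genotypes <- matrix(c(0, 1, 1,
                      2, 2, 0,
                      1, 1, 1,
                      0, 2, 1,
                      0, 0, 2),
                    nrow = 5, byrow = TRUE)
J <- ncol(genotypes)

X <- sweep(genotypes, 2, colMeans(genotypes), "-")  # column-centered genotypes
K <- X %*% t(X) / J

isSymmetric(K)          # K equals its transpose
max(abs(rowSums(K)))    # zero up to floating-point error
diag(K)                 # self-relatedness terms on the diagonal
```

The entries printed as `-1.776357e-17` for Individuals 3 and 5 are zeros up to rounding error: their centered genotype vectors are exactly orthogonal.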


In [9]:
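# Compute the GRM using allele frequencies.
# Note: this divides X X^T by one global denominator, sum_j 2 p_j (1 - p_j),
# rather than scaling each variant separately as in G = (1/J) X D^{-1} X^T,
# so the two versions generally give different values.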
denominator <- sum(2 * p * (1 - p))
grm <- (X %*% t(X)) / denominator
cat("\nGenetic Relationship Matrix (GRM):\n")
print(grm)


Genetic Relationship Matrix (GRM):
             Individual 1 Individual 2  Individual 3 Individual 4  Individual 5
Individual 1    0.2857143   -0.7142857 -1.428571e-01    0.1428571  4.285714e-01
Individual 2   -0.7142857    2.5714286  2.857143e-01   -0.1428571 -2.000000e+00
Individual 3   -0.1428571    0.2857143  1.428571e-01   -0.2857143 -3.806479e-17
Individual 4    0.1428571   -0.1428571 -2.857143e-01    0.7142857 -4.285714e-01
Individual 5    0.4285714   -2.0000000 -3.806479e-17   -0.4285714  2.000000e+00
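
For comparison, the per-variant formula $ \mathbf{G} = \frac{1}{J} \mathbf{X} \mathbf{D}^{-1} \mathbf{X}^T $ from the Notations section can be evaluated on the same data. The block below is a sketch, not a replacement for the cell above; the names `grm_per_variant` and `grm_global` are illustrative, and the genotypes are rebuilt so the block runs on its own.

```r
# Same genotypes as in the example: 5 individuals x 3 variants
genotypes <- matrix(c(0, 1, 1,
                      2, 2, 0,
                      1, 1, 1,
                      0, 2, 1,
                      0, 0, 2),
                    nrow = 5, byrow = TRUE)
J <- ncol(genotypes)

p <- colMeans(genotypes) / 2            # sample allele frequencies
d <- 2 * p * (1 - p)                    # per-variant variance terms (diagonal of D)
X <- sweep(genotypes, 2, 2 * p, "-")    # centering by 2p equals centering by the column mean

# Per-variant scaling: each variant is divided by its own variance term
grm_per_variant <- X %*% diag(1 / d) %*% t(X) / J

# Global scaling, as in the cell above: one shared denominator
grm_global <- X %*% t(X) / sum(d)

round(grm_per_variant, 3)
round(grm_global, 3)
```

The two matrices coincide only when all variants share the same allele frequency. Otherwise the per-variant version gives more weight to variants with low variance than the global version does.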


# TODO

- [ ] double check details of formulas and wikipedia to see if this is correct