# Summary

This notebook introduces two basic concepts in statistical genetics:

- Linkage disequilibrium (LD)
- LD score

# Intuition

![Figure 1_2](https://github.com/gaow/statgen-prerequisites/raw/main/cartoons/1_2.svg)



# Notations

## Linkage Disequilibrium (LD)

Linkage disequilibrium (LD) refers to the non-random association of alleles at two or more loci. If two variants (or loci) in a population are in LD, their allele combinations occur **more or less frequently than expected** from their individual allele frequencies. In short, LD describes the **sharing of certain combinations of alleles** across variants.
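To make "more or less frequently than expected" concrete, the sketch below uses a small set of eight phased haplotypes (made-up data for illustration). It compares the observed frequency of the A-B haplotype, $p_{AB}$, with the frequency expected if the two loci were independent, $p_A p_B$. Their difference is the classical LD coefficient $D = p_{AB} - p_A p_B$, and $r^2 = D^2 / \left(p_A(1-p_A)\,p_B(1-p_B)\right)$ is the squared correlation between the two allele indicators.

```r
# Hypothetical phased haplotypes (made-up data): one row per haplotype.
# Column 1 is 1 if the haplotype carries allele A at locus 1, else 0;
# column 2 is 1 if it carries allele B at locus 2, else 0.
haps <- matrix(c(1, 1,
                 1, 1,
                 1, 1,
                 0, 0,
                 0, 0,
                 0, 1,
                 1, 0,
                 0, 0),
               ncol = 2, byrow = TRUE)

p_A  <- mean(haps[, 1])                        # frequency of allele A
p_B  <- mean(haps[, 2])                        # frequency of allele B
p_AB <- mean(haps[, 1] == 1 & haps[, 2] == 1)  # observed A-B haplotype frequency

D  <- p_AB - p_A * p_B                         # deviation from independence
r2 <- D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B)) # squared correlation of the indicators

# Here p_A = p_B = 0.5, p_AB = 0.375, so D = 0.125 and r2 = 0.25
c(expected_AB = p_A * p_B, observed_AB = p_AB, D = D, r2 = r2)
```

Since $D > 0$ here, A and B appear together more often than independence predicts; the same $r^2$ is returned by `cor(haps[, 1], haps[, 2])^2`.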

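Given the standardized genotype matrix $\mathbf{X}$, the LD matrix can be computed as:
$$
\mathbf{R} = \frac{\mathbf{X}^\top \mathbf{X}}{N}
$$

where:

- $\mathbf{X}$ is the $N \times M$ genotype matrix (rows are individuals, columns are variants), with each column centered to mean 0 and scaled to variance 1.
- $N$ is the number of individuals and $M$ is the number of variants.

For a centered $\mathbf{X}$, $\mathbf{X}^\top \mathbf{X} / N$ is the covariance matrix of the variants. Because each column is also scaled to unit variance, this covariance matrix is the same as the correlation matrix, so $R_{jk} = r_{jk}$, the correlation between variants $j$ and $k$. This identity assumes the variance is computed with denominator $N$; R's `scale()` uses $N - 1$, in which case $\mathbf{X}^\top \mathbf{X} / (N - 1)$ is the correlation matrix.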
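A quick numerical check of the formula, as a minimal sketch: it standardizes the toy genotype matrix from the Example section with the population variance (denominator $N$), forms $\mathbf{X}^\top \mathbf{X} / N$, and compares the result with R's built-in `cor()`. The helper name `standardize_pop` is hypothetical. It also shows that `scale()` output needs $N - 1$ instead of $N$.

```r
# Toy genotype matrix (5 individuals x 3 variants), as in the Example section
X_raw <- matrix(c(0, 1, 1, 2, 2, 0, 1, 1, 1, 0, 2, 1, 0, 0, 2),
                nrow = 5, ncol = 3, byrow = TRUE)
N <- nrow(X_raw)

# Hypothetical helper: center each column and scale it to variance 1,
# using the population variance (denominator N) so that X^T X / N is exact
standardize_pop <- function(G) {
  centered <- sweep(G, 2, colMeans(G))           # subtract column means
  pop_sd <- sqrt(colMeans(centered^2))           # population SD per column
  sweep(centered, 2, pop_sd, "/")                # divide by it
}

X <- standardize_pop(X_raw)
R_manual <- crossprod(X) / N                     # t(X) %*% X / N

# scale() divides by the sample SD (denominator N - 1), so use N - 1 here
X_s <- scale(X_raw)
R_scale <- crossprod(X_s) / (N - 1)

all.equal(R_manual, cor(X_raw))                               # TRUE
all.equal(R_scale, cor(X_raw), check.attributes = FALSE)      # TRUE
```

Either route gives the same LD matrix once the denominator matches the variance convention used for standardization.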

## LD Score

The **LD score** measures the extent to which a given variant is in linkage disequilibrium (LD) with other variants. It summarizes how much genetic variation (in terms of LD) a variant shares with the other variants in a region of interest. The LD score of variant $j$ is the sum of the squared correlation coefficients $r^2$ between that variant and all other variants. In practice the sum is restricted to variants within a genomic window around $j$, because LD decays with distance, and $M$ below is the number of variants considered. Mathematically, the LD score of variant $j$ is:

$$
l_j = \sum_{k=1,\, k \neq j}^{M} r_{jk}^2, \qquad r_{jk} = \operatorname{cor}(\mathbf{X}_j, \mathbf{X}_k)
$$
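The definition translates directly into code. The sketch below assumes an LD matrix `R` is already available and, to illustrate restricting the sum to a window, also takes variant positions `pos` and a window half-width `window` in the same units. The function name `ld_score_window`, the positions, and the correlation values are all hypothetical (the values follow the two-block pattern of the figure at the end of this notebook).

```r
# Hypothetical 5-variant LD (correlation) matrix with two LD blocks:
# variants 1-2 and variants 3-5
R <- matrix(c(1.00, 0.90, 0.20, 0.20, 0.15,
              0.90, 1.00, 0.10, 0.10, 0.10,
              0.20, 0.10, 1.00, 0.80, 0.70,
              0.20, 0.10, 0.80, 1.00, 0.80,
              0.15, 0.10, 0.70, 0.80, 1.00),
            nrow = 5, byrow = TRUE)
pos <- c(50, 100, 250, 350, 450)   # hypothetical positions (e.g. kb)

# LD score of each variant: sum of r^2 with the OTHER variants (k != j)
# whose distance from variant j is at most `window`
ld_score_window <- function(R, pos, window = Inf) {
  vapply(seq_len(nrow(R)), function(j) {
    keep <- abs(pos - pos[j]) <= window   # variants inside the window
    keep[j] <- FALSE                      # exclude the variant itself
    sum(R[j, keep]^2)
  }, numeric(1))
}

ld_score_window(R, pos)                # whole region; variant 3: 0.04 + 0.01 + 0.64 + 0.49 = 1.18
ld_score_window(R, pos, window = 100)  # only neighbors within 100 units
```

Shrinking the window drops distant variants from each sum, so scores can only decrease; at `window = 100`, variant 3 keeps only variant 4 and its score falls to 0.64.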

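Because the columns of $\mathbf{X}$ are standardized, the sample correlation between variants $j$ and $k$ is $\hat{r}_{jk} = \mathbf{X}_j^\top \mathbf{X}_k / N$, with $\hat{r}_{jj} = 1$. The sum of squared sample correlations with the other variants is therefore

$$\widetilde{l}_{j} = \sum_{k \neq j} \hat{r}_{jk}^2 = \frac{1}{N^2}\mathbf{X}^\top_j\mathbf{X}\mathbf{X}^\top\mathbf{X}_j - 1,$$

where subtracting 1 removes the $k = j$ term. However, $\widetilde{l}_{j}$ is biased upward, because sampling noise inflates every squared correlation. Under the simplifying assumption that the genotypes are Gaussian and standardized with their population mean and variance, $E[\hat{r}_{jk}^2] = r_{jk}^2 + (1 + r_{jk}^2)/N$. Summing over the $M - 1$ other variants gives $E[\widetilde{l}_{j}] = \left((N + 1)\, l_j + M - 1\right)/N$, and solving for $l_j$ gives the bias-corrected estimate:

$$l_j = \frac{\widetilde{l}_{j} N - (M - 1)}{N + 1}$$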
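The plug-in estimate and its bias correction can be sketched in matrix form as below. This is a minimal sketch assuming genotypes are standardized with the population mean and standard deviation (denominator $N$), so that $\mathbf{X}^\top\mathbf{X}/N$ is exactly the sample correlation matrix; the helper name `ld_score_matrix` is hypothetical. Following the definition of $l_j$, the $k = j$ term ($\hat r_{jj}^2 = 1$) is removed, so $M - 1$ other variants enter the correction.

```r
# Hypothetical helper: LD scores (k != j) from a genotype matrix G
# (rows = individuals, columns = variants), raw and bias-corrected
ld_score_matrix <- function(G) {
  N <- nrow(G)
  M <- ncol(G)
  # Standardize each column with its population mean and SD (denominator N)
  Gc <- sweep(G, 2, colMeans(G))
  X  <- sweep(Gc, 2, sqrt(colMeans(Gc^2)), "/")
  XtX <- crossprod(X)                        # M x M matrix X^T X
  # X_j^T X X^T X_j is the squared norm of column j of X^T X
  l_tilde <- colSums(XtX^2) / N^2 - 1        # drop the k = j term (equal to 1)
  l_corr  <- (N * l_tilde - (M - 1)) / (N + 1)
  list(raw = l_tilde, corrected = l_corr)
}

# Toy genotype matrix from the Example section (5 individuals x 3 variants)
X_raw <- matrix(c(0, 1, 1, 2, 2, 0, 1, 1, 1, 0, 2, 1, 0, 0, 2),
                nrow = 5, ncol = 3, byrow = TRUE)
scores <- ld_score_matrix(X_raw)
scores$raw        # same as colSums(cor(X_raw)^2) - 1
scores$corrected
```

The raw scores agree with summing the off-diagonal entries of `cor(X_raw)^2`, so the matrix formula and the correlation-based computation in the Example section estimate the same quantity.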


This score reflects how strongly a variant is correlated with the other variants around it, and so summarizes the local LD structure at that variant.



# Example

```r
rm(list = ls())
# Genotype matrix for 5 individuals and 3 variants (allele counts 0, 1, 2)
# Rows correspond to individuals, columns to variants
N <- 5
M <- 3
X_raw <- matrix(c(0, 1, 1, 2, 2, 0, 1, 1, 1, 0, 2, 1, 0, 0, 2),
                nrow = N, ncol = M, byrow = TRUE)
# Add row and column names
rownames(X_raw) <- paste("Individual", 1:N)
colnames(X_raw) <- paste("Variant", 1:M)
X_raw
```

|  | Variant 1 | Variant 2 | Variant 3 |
|---|---|---|---|
| Individual 1 | 0 | 1 | 1 |
| Individual 2 | 2 | 2 | 0 |
| Individual 3 | 1 | 1 | 1 |
| Individual 4 | 0 | 2 | 1 |
| Individual 5 | 0 | 0 | 2 |


```r
# Standardize each variant (column) to mean 0 and standard deviation 1
X <- scale(X_raw, center = TRUE, scale = TRUE)
X
```

|  | Variant 1 | Variant 2 | Variant 3 |
|---|---|---|---|
| Individual 1 | -0.6708204 | -0.2390457 | 0.0 |
| Individual 2 | 1.5652476 | 0.9561829 | -1.414214 |
| Individual 3 | 0.4472136 | -0.2390457 | 0.0 |
| Individual 4 | -0.6708204 | 0.9561829 | 0.0 |
| Individual 5 | -0.6708204 | -1.4342743 | 1.414214 |


## Calculate LD matrix

```r
# LD matrix: pairwise correlations between variants
LD <- cor(X)
LD
```

|  | Variant 1 | Variant 2 | Variant 3 |
|---|---|---|---|
| Variant 1 | 1.0 | 0.4677072 | -0.7905694 |
| Variant 2 | 0.4677072 | 1.0 | -0.8451543 |
| Variant 3 | -0.7905694 | -0.8451543 | 1.0 |


## Calculate LD scores

```r
# Square the correlations element-wise to get the r^2 matrix
LD_squared <- LD^2

# Raw LD score of each variant: sum of r^2 over the other variants
# (subtract the diagonal term r_jj^2 = 1)
ld_scores_raw <- colSums(LD_squared) - diag(LD_squared)

# Bias correction: M - 1 other variants enter each sum, since the variant itself is excluded
ld_scores_corrected <- (ld_scores_raw * N - (M - 1)) / (N + 1)

# Print the corrected LD scores
ld_scores_corrected
```

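Variant 3 is strongly correlated with both other variants ($r = -0.79$ and $r = -0.85$), so it has the highest LD score. Only $r^2$ enters the score, so the negative sign of these correlations does not matter.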

# Supplementary


### Ref
> LDSC: slides 110-117 from Xin He
>
> LDSC: slides 82-84 from GW
>
> Slides 48-50 from GW


### Purpose of LD Score

- **Genetic Association Studies**: LD scores are useful in genetic association studies to account for the correlations between variants when performing polygenic risk score (PRS) analysis or genome-wide association studies (GWAS).
- **Controlling for Confounding**: Because an association signal is shared among variants in LD, variants with high LD scores tend to have larger GWAS test statistics even when they are not causal. LD scores let researchers separate this LD-driven inflation, which grows with LD score, from inflation caused by confounding such as population stratification, which LD score regression models as independent of LD score.
- **Estimating Heritability**: LD score regression (LDSC) estimates the SNP heritability of complex traits by regressing GWAS test statistics on LD scores: under a polygenic model, variants that tag more of the surrounding variation (higher LD score) have larger expected test statistics, and the slope of this relationship is proportional to heritability.
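For reference, the relationship that LD score regression relies on (Bulik-Sullivan et al., 2015) can be written as below. Here $N$ is the GWAS sample size, $M$ the number of variants, $h^2_g$ the SNP heritability, $a$ a term for confounding such as population stratification, and $\ell_j$ the LD score in the LDSC convention, which also counts the variant itself ($r_{jj}^2 = 1$) and so equals $l_j + 1$ in this notebook's notation:

$$
E\left[\chi^2_j \mid \ell_j\right] = \frac{N h^2_g}{M}\,\ell_j + N a + 1
$$

Regressing the GWAS $\chi^2_j$ statistics on $\ell_j$ estimates $h^2_g$ from the slope, while the intercept captures the confounding term.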

### Interpretation

- **High LD Score**: A variant with a high LD score is correlated with many nearby variants, strongly correlated with a few, or both, so it tags a substantial amount of the surrounding genetic variation.
- **Low LD Score**: A variant with a low LD score is not strongly correlated with other variants, so few other variants tag it; this is typical of regions with low LD.
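Because the LD score adds up $r^2$ values, quite different LD patterns can produce the same score. As a hypothetical worked example, a variant with $r = 0.9$ to a single neighbor (variant A) and a variant with $r = 0.3$ to nine neighbors (variant B) have identical LD scores:

$$
l_A = 0.9^2 = 0.81, \qquad l_B = 9 \times 0.3^2 = 9 \times 0.09 = 0.81
$$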

In summary, the LD score quantifies how much of the surrounding genetic variation a variant tags through its correlations with nearby variants, and it is used in genetic studies to account for the LD structure of the genome.


<svg viewBox="0 0 800 600" xmlns="http://www.w3.org/2000/svg">
  <!-- Background -->
  <rect x="0" y="0" width="800" height="600" fill="#f8f9fa" />
  
  <!-- Title -->
  <text x="400" y="30" font-family="Arial" font-size="22" font-weight="bold" text-anchor="middle">Linkage Disequilibrium (LD) and LD Score</text>
  
  <!-- LD Concept Section -->
  <rect x="40" y="50" width="720" height="200" fill="#e3f2fd" stroke="#2196f3" stroke-width="2" rx="5" />
  <text x="400" y="75" font-family="Arial" font-size="18" font-weight="bold" text-anchor="middle">Linkage Disequilibrium (LD)</text>
  
  <!-- LD Definition -->
  <text x="60" y="100" font-family="Arial" font-size="14">Non-random association of alleles at different loci</text>
  
  <!-- Genome visualization -->
  <g transform="translate(150, 125)">
    <!-- Chromosome representation -->
    <rect x="0" y="0" width="500" height="30" fill="#c5cae9" stroke="#3f51b5" stroke-width="1" rx="5" />
    
    <!-- SNP positions -->
    <line x1="50" y1="35" x2="50" y2="55" stroke="#1976d2" stroke-width="1.5" />
    <line x1="100" y1="35" x2="100" y2="55" stroke="#1976d2" stroke-width="1.5" />
    <line x1="250" y1="35" x2="250" y2="55" stroke="#e91e63" stroke-width="1.5" />
    <line x1="350" y1="35" x2="350" y2="55" stroke="#e91e63" stroke-width="1.5" />
    <line x1="450" y1="35" x2="450" y2="55" stroke="#e91e63" stroke-width="1.5" />
    
    <!-- SNP labels -->
    <text x="50" y="20" font-family="Arial" font-size="14" text-anchor="middle">SNP₁</text>
    <text x="100" y="20" font-family="Arial" font-size="14" text-anchor="middle">SNP₂</text>
    <text x="250" y="20" font-family="Arial" font-size="14" text-anchor="middle">SNP₃</text>
    <text x="350" y="20" font-family="Arial" font-size="14" text-anchor="middle">SNP₄</text>
    <text x="450" y="20" font-family="Arial" font-size="14" text-anchor="middle">SNP₅</text>
    
    <!-- LD blocks visualization -->
    <rect x="25" y="60" width="100" height="25" fill="#1976d2" fill-opacity="0.3" stroke="#1976d2" stroke-width="2" rx="5" />
    <rect x="225" y="60" width="250" height="25" fill="#e91e63" fill-opacity="0.3" stroke="#e91e63" stroke-width="2" rx="5" />
    
    <!-- Block labels -->
    <text x="75" y="80" font-family="Arial" font-size="14" text-anchor="middle" fill="#1976d2">LD Block 1</text>
    <text x="350" y="80" font-family="Arial" font-size="14" text-anchor="middle" fill="#e91e63">LD Block 2</text>
    
    <!-- Description -->
    <text x="250" y="115" font-family="Arial" font-size="14" text-anchor="middle">SNPs within the same LD block are inherited together more often than expected by chance</text>
  </g>
  
  <!-- LD Matrix Section -->
  <rect x="40" y="260" width="360" height="320" fill="#e8eaf6" stroke="#3f51b5" stroke-width="2" rx="5" />
  <text x="220" y="285" font-family="Arial" font-size="16" font-weight="bold" text-anchor="middle">LD Matrix (R = X^T X / N)</text>
  
  <!-- LD Matrix Visual with clear block structure -->
  <g transform="translate(100, 310)">
    <rect x="0" y="0" width="240" height="240" fill="white" stroke="#3f51b5" stroke-width="1" />
    
    <!-- Matrix Headers -->
    <text x="24" y="-10" font-family="Arial" font-size="12" text-anchor="middle">SNP₁</text>
    <text x="72" y="-10" font-family="Arial" font-size="12" text-anchor="middle">SNP₂</text>
    <text x="120" y="-10" font-family="Arial" font-size="12" text-anchor="middle">SNP₃</text>
    <text x="168" y="-10" font-family="Arial" font-size="12" text-anchor="middle">SNP₄</text>
    <text x="216" y="-10" font-family="Arial" font-size="12" text-anchor="middle">SNP₅</text>
    
    <text x="-10" y="24" font-family="Arial" font-size="12" text-anchor="end">SNP₁</text>
    <text x="-10" y="72" font-family="Arial" font-size="12" text-anchor="end">SNP₂</text>
    <text x="-10" y="120" font-family="Arial" font-size="12" text-anchor="end">SNP₃</text>
    <text x="-10" y="168" font-family="Arial" font-size="12" text-anchor="end">SNP₄</text>
    <text x="-10" y="216" font-family="Arial" font-size="12" text-anchor="end">SNP₅</text>
    
    <!-- Grid Lines -->
    <line x1="0" y1="48" x2="240" y2="48" stroke="#999" stroke-width="1" />
    <line x1="0" y1="96" x2="240" y2="96" stroke="#999" stroke-width="1.5" stroke-dasharray="5,2" />
    <line x1="0" y1="144" x2="240" y2="144" stroke="#999" stroke-width="1" />
    <line x1="0" y1="192" x2="240" y2="192" stroke="#999" stroke-width="1" />
    
    <line x1="48" y1="0" x2="48" y2="240" stroke="#999" stroke-width="1" />
    <line x1="96" y1="0" x2="96" y2="240" stroke="#999" stroke-width="1.5" stroke-dasharray="5,2" />
    <line x1="144" y1="0" x2="144" y2="240" stroke="#999" stroke-width="1" />
    <line x1="192" y1="0" x2="192" y2="240" stroke="#999" stroke-width="1" />
    
    <!-- Matrix Values with Color Intensity showing clear LD block structure -->
    <!-- Block 1: SNP 1-2 strong LD -->
    <rect x="0" y="0" width="48" height="48" fill="#3f51b5" fill-opacity="1.0" />
    <rect x="48" y="0" width="48" height="48" fill="#3f51b5" fill-opacity="0.8" />
    <rect x="0" y="48" width="48" height="48" fill="#3f51b5" fill-opacity="0.8" />
    <rect x="48" y="48" width="48" height="48" fill="#3f51b5" fill-opacity="1.0" />
    
    <!-- Block 2: SNP 3-5 strong LD -->
    <rect x="96" y="96" width="48" height="48" fill="#e91e63" fill-opacity="1.0" />
    <rect x="144" y="96" width="48" height="48" fill="#e91e63" fill-opacity="0.8" />
    <rect x="192" y="96" width="48" height="48" fill="#e91e63" fill-opacity="0.7" />
    <rect x="96" y="144" width="48" height="48" fill="#e91e63" fill-opacity="0.8" />
    <rect x="144" y="144" width="48" height="48" fill="#e91e63" fill-opacity="1.0" />
    <rect x="192" y="144" width="48" height="48" fill="#e91e63" fill-opacity="0.8" />
    <rect x="96" y="192" width="48" height="48" fill="#e91e63" fill-opacity="0.7" />
    <rect x="144" y="192" width="48" height="48" fill="#e91e63" fill-opacity="0.8" />
    <rect x="192" y="192" width="48" height="48" fill="#e91e63" fill-opacity="1.0" />
    
    <!-- Low correlation between blocks -->
    <rect x="0" y="96" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <rect x="0" y="144" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <rect x="0" y="192" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <rect x="48" y="96" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <rect x="48" y="144" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <rect x="48" y="192" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <rect x="96" y="0" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <rect x="144" y="0" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <rect x="192" y="0" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <rect x="96" y="48" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <rect x="144" y="48" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <rect x="192" y="48" width="48" height="48" fill="#bdbdbd" fill-opacity="0.6" />
    <!-- Text values - just for key cells -->
    <text x="24" y="24" font-family="Arial" font-size="12" text-anchor="middle" fill="white">1.0</text>
    <text x="72" y="24" font-family="Arial" font-size="12" text-anchor="middle" fill="white">0.9</text>
    <text x="24" y="72" font-family="Arial" font-size="12" text-anchor="middle" fill="white">0.9</text>
    <text x="72" y="72" font-family="Arial" font-size="12" text-anchor="middle" fill="white">1.0</text>
    
    <text x="120" y="120" font-family="Arial" font-size="12" text-anchor="middle" fill="white">1.0</text>
    <text x="168" y="120" font-family="Arial" font-size="12" text-anchor="middle" fill="white">0.8</text>
    <text x="216" y="120" font-family="Arial" font-size="12" text-anchor="middle" fill="white">0.7</text>
    <text x="120" y="168" font-family="Arial" font-size="12" text-anchor="middle" fill="white">0.8</text>
    <text x="168" y="168" font-family="Arial" font-size="12" text-anchor="middle" fill="white">1.0</text>
    <text x="216" y="168" font-family="Arial" font-size="12" text-anchor="middle" fill="white">0.8</text>
    <text x="120" y="216" font-family="Arial" font-size="12" text-anchor="middle" fill="white">0.7</text>
    <text x="168" y="216" font-family="Arial" font-size="12" text-anchor="middle" fill="white">0.8</text>
    <text x="216" y="216" font-family="Arial" font-size="12" text-anchor="middle" fill="white">1.0</text>
    
    <!-- Low correlation text -->
    <text x="24" y="120" font-family="Arial" font-size="12" text-anchor="middle">0.2</text>
    <text x="24" y="168" font-family="Arial" font-size="12" text-anchor="middle">0.2</text>
    <text x="72" y="120" font-family="Arial" font-size="12" text-anchor="middle">0.1</text>
    <text x="24" y="216" font-family="Arial" font-size="12" text-anchor="middle">0.15</text>
    <text x="72" y="216" font-family="Arial" font-size="12" text-anchor="middle">0.1</text>
    <text x="72" y="168" font-family="Arial" font-size="12" text-anchor="middle">0.1</text>
    <text x="120" y="24" font-family="Arial" font-size="12" text-anchor="middle">0.2</text>
    <text x="168" y="72" font-family="Arial" font-size="12" text-anchor="middle">0.1</text>
    <text x="168" y="24" font-family="Arial" font-size="12" text-anchor="middle">0.2</text>
    <text x="216" y="24" font-family="Arial" font-size="12" text-anchor="middle">0.15</text>
    <text x="216" y="72" font-family="Arial" font-size="12" text-anchor="middle">0.1</text>
    <text x="120" y="72" font-family="Arial" font-size="12" text-anchor="middle">0.1</text>
    
    <!-- Block outlines -->
    <rect x="0" y="0" width="96" height="96" fill="none" stroke="#1976d2" stroke-width="2" />
    <rect x="96" y="96" width="144" height="144" fill="none" stroke="#e91e63" stroke-width="2" />
  </g>
  
  <!-- LD Score Section -->
  <rect x="420" y="260" width="360" height="320" fill="#fce4ec" stroke="#e91e63" stroke-width="2" rx="5" />
  <text x="600" y="285" font-family="Arial" font-size="16" font-weight="bold" text-anchor="middle">LD Score</text>
  
  <!-- LD Score Definition -->
  <text x="440" y="310" font-family="Arial" font-size="13">• Sum of squared correlation coefficients (r²) between</text>
  <text x="450" y="330" font-family="Arial" font-size="13">  variant j and all other variants</text>
  <text x="440" y="350" font-family="Arial" font-size="13">• Formula: l_j = ∑(cor²(X_j, X_k)) where k≠j</text>
  
  <!-- LD Score Bar Chart for SNP3 -->
  <g transform="translate(450, 365)">
    <text x="150" y="0" font-family="Arial" font-size="14" font-weight="bold" text-anchor="middle">r² with SNP₃</text>
    
    <!-- Axes -->
  <line x1="0" y1="20" x2="0" y2="180" stroke="#333" stroke-width="2" />
  <line x1="0" y1="180" x2="300" y2="180" stroke="#333" stroke-width="2" />
  
  <!-- Y-axis labels -->
  <text x="-5" y="30" font-family="Arial" font-size="12" text-anchor="end">1</text>
  <text x="-5" y="66.25" font-family="Arial" font-size="12" text-anchor="end">0.75</text>
  <text x="-5" y="102.5" font-family="Arial" font-size="12" text-anchor="end">0.5</text>
  <text x="-5" y="138.75" font-family="Arial" font-size="12" text-anchor="end">0.25</text>
  <text x="-5" y="175" font-family="Arial" font-size="12" text-anchor="end">0.0</text>
  
  <!-- X-axis labels -->
  <text x="40" y="195" font-family="Arial" font-size="12" text-anchor="middle">r²₁₃</text>
  <text x="100" y="195" font-family="Arial" font-size="12" text-anchor="middle">r²₂₃</text>
  <text x="160" y="195" font-family="Arial" font-size="12" text-anchor="middle">r²₃₃</text>
  <text x="220" y="195" font-family="Arial" font-size="12" text-anchor="middle">r²₄₃</text>
  <text x="280" y="195" font-family="Arial" font-size="12" text-anchor="middle">r²₅₃</text>

  <!-- Bars -->
  <rect x="25" y="172.5" width="30" height="5.8" fill="#bdbdbd" />
  <rect x="85" y="177.5" width="30" height="1.45" fill="#bdbdbd" />
  <rect x="145" y="35" width="30" height="145" fill="#e91e63" fill-opacity="1" />
  <rect x="205" y="86" width="30" height="92.5" fill="#e91e63" fill-opacity="0.8" />
  <rect x="265" y="110" width="30" height="71" fill="#e91e63" fill-opacity="0.7" />

  <!-- Value labels -->
  <text x="40" y="170" font-family="Arial" font-size="11" text-anchor="middle">0.04</text>
  <text x="100" y="174" font-family="Arial" font-size="11" text-anchor="middle">0.01</text>
  <text x="160" y="30" font-family="Arial" font-size="11" text-anchor="middle">1</text>
  <text x="220" y="85" font-family="Arial" font-size="11" text-anchor="middle">0.64</text>
  <text x="280" y="105" font-family="Arial" font-size="11" text-anchor="middle">0.49</text>

    <!-- Sum equation -->
    <text x="150" y="210" font-family="Arial" font-size="13" text-anchor="middle">LD Score = 0.04 + 0.01 + 0.64 + 0.49 = 1.18</text>
    
    <!-- Highlight SNPs from same block -->
    <rect x="145" y="35" width="30" height="145" fill="none" stroke="#e91e63" stroke-width="2" />
    <rect x="205" y="86" width="30" height="92.5" fill="none" stroke="#e91e63" stroke-width="2" />
    <rect x="265" y="109" width="30" height="69" fill="none" stroke="#e91e63" stroke-width="2" />
    
    <!-- Lines to show sum -->
    <!-- <line x1="40" y1="160" x2="280" y2="160" stroke="#999" stroke-width="1" stroke-dasharray="3,3" /> -->
    <!-- <line x1="100" y1="170" x2="280" y2="170" stroke="#999" stroke-width="1" stroke-dasharray="3,3" /> -->
    <!-- <line x1="160" y1="100" x2="280" y2="100" stroke="#999" stroke-width="1" stroke-dasharray="3,3" /> -->
    <!-- <line x1="220" y1="120" x2="280" y2="120" stroke="#999" stroke-width="1" stroke-dasharray="3,3" /> -->
  </g>
  
  <!-- Definitions -->
  <defs>
    <marker id="arrowhead" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#999" />
    </marker>
  </defs>
</svg>