## Likelihood Ratio Test (LRT) and Bayes Factor (BF)

# Intuition


- **Likelihood Ratio Test (LRT)**: A frequentist method that compares the fit of two models (null and alternative) by evaluating the ratio of their likelihoods to determine if the alternative model provides a significantly better fit to the data.

- **Bayes Factor (BF)**: A Bayesian method that compares the marginal likelihoods (or evidence) of two models to quantify how much more likely the data are under one model compared to the other.


**FIGURE PLACEHOLDER:** ![Likelihood Intuition Cartoon](image_placeholder)

# Notations

## Likelihood Ratio Test (LRT)

The Likelihood Ratio Test (LRT) is a statistical test used to compare the fit of two models: a **null model** and an **alternative model**. The null model usually represents a simpler hypothesis (e.g., no effect), and the alternative model represents a more complex hypothesis (e.g., an effect exists). 

The basic idea is to test whether the data provides enough evidence to reject the null hypothesis in favor of the alternative hypothesis. Note that the Likelihood Ratio Test (LRT) is primarily a frequentist approach to hypothesis testing, not Bayesian.


The Likelihood Ratio Test statistic is defined as:

$$
\Lambda = -2 \log \left( \frac{L(\text{null})}{L(\text{alternative})} \right)
$$

Where:
- $L(\text{null})$ is the likelihood of the data under the null model,
- $L(\text{alternative})$ is the likelihood of the data under the alternative model.

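Under the null hypothesis, and when the null model is nested within the alternative model, $\Lambda$ approximately follows a **chi-squared distribution** with degrees of freedom equal to the difference in the number of free parameters between the two models. A value of $\Lambda$ that is large relative to this distribution is evidence against the null model.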
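As a minimal sketch of the formula above, the statistic and its asymptotic p-value can be computed from two maximized log-likelihoods. The helper name `lrt_from_loglik` and the log-likelihood values are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: LRT statistic and chi-squared p-value
# from the maximized log-likelihoods of two nested models.
lrt_from_loglik <- function(loglik_null, loglik_alt, df) {
  # Lambda = -2 * log(L(null) / L(alternative))
  stat <- -2 * (loglik_null - loglik_alt)
  # Upper-tail probability of a chi-squared distribution with df degrees of freedom
  p_value <- pchisq(stat, df = df, lower.tail = FALSE)
  list(statistic = stat, df = df, p_value = p_value)
}

# Illustrative values only (not taken from the SNP example)
res <- lrt_from_loglik(loglik_null = -105, loglik_alt = -100, df = 1)
print(res)
```

Working with log-likelihoods rather than raw likelihoods avoids numerical underflow when the likelihoods themselves are very small.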

## Bayes Factor (BF)
In the Bayesian framework, we approach model comparison slightly differently than the frequentist approach. Instead of using the likelihood ratio to compute a p-value for hypothesis testing, Bayesian methods typically use the **Bayes Factor** (BF) to compare the models.

The **Bayes Factor** compares the marginal likelihoods (or evidence) of the models:

$$
\text{BF} = \frac{P(\text{data} | \text{model 1})}{P(\text{data} | \text{model 2})}
$$

Where:
- $P(\text{data} | \text{model 1})$ and $P(\text{data} | \text{model 2})$ are the marginal likelihoods (or evidence) of the models, that is, the likelihood of each model averaged over the prior on its parameters. The frequentist LRT uses the maximized likelihoods of the two models in their place.

In this way, the Bayes Factor provides a measure of the relative evidence for one model over another, rather than using a p-value to decide whether to reject a model.
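
For reference, the marginal likelihood of a model can be written as an integral over its parameters. The symbol $\theta$ for the parameters is introduced here and is not used elsewhere in these notes:

$$
P(\text{data} | \text{model}) = \int P(\text{data} | \theta, \text{model}) \, P(\theta | \text{model}) \, d\theta
$$

By contrast, the LRT plugs in the single best-fitting value of $\theta$ for each model, so the two quantities can differ, especially when the prior is diffuse.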



# Example

Suppose a study examines the allele counts of two single nucleotide polymorphisms (SNPs), SNP1 and SNP2, under the assumption that one of these SNPs is causal for the disease, in a cohort of 4000 individuals split into 2000 cases and 2000 controls.


|          | **Cases, A1** | **Cases, A2** | **Controls, A1** | **Controls, A2** |
|----------|---------------|---------------|------------------|------------------|
| **SNP1** | 1200          | 800           | 1000             | 1000             |
| **SNP2** | 1191          | 809           | 1000             | 1000             |


<!-- Consider the following table showing the number of individuals who either carry at least one copy of the **risk allele (T)** or **no risk allele (A)**, and whether they are **cases of the disease** or **controls of the disease**:

|                     | **Cases** | **Controls** |
|---------------------|---------------|--------------|
| **Carry at least one T allele (AT or TT)**  | 70            | 30           |
| **Do not carry T allele (AA)**     | 45            | 55           |
 -->

In [164]:
rm(list=ls())

In [165]:
# Data for SNP1 and SNP2
# Format: [Cases with A1, Cases with A2, Controls with A1, Controls with A2]

# SNP1
snp1_data <- matrix(c(1200, 800, 1000, 1000), nrow = 2, byrow = TRUE)
colnames(snp1_data) <- c("A1", "A2")
rownames(snp1_data) <- c("Cases", "Controls")

# SNP2
snp2_data <- matrix(c(1191, 809, 1000, 1000), nrow = 2, byrow = TRUE)
colnames(snp2_data) <- c("A1", "A2")
rownames(snp2_data) <- c("Cases", "Controls")

# Display the tables
print("Contingency Table for SNP1:")
print(snp1_data)

print("Contingency Table for SNP2:")
print(snp2_data)

[1] "Contingency Table for SNP1:"
           A1   A2
Cases    1200  800
Controls 1000 1000
[1] "Contingency Table for SNP2:"
           A1   A2
Cases    1191  809
Controls 1000 1000


## LRT

The LRT compares the likelihood of the null model (the allele has no effect on disease) with that of the alternative model (the allele affects the disease probability).

- Null Model: Assumes the probability of disease is the same for carriers of A1 and carriers of A2.
- Alternative Model: Assumes the disease probabilities differ between those carrying the risk allele and those not carrying it.
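
For a 2x2 table, the LRT statistic for these two models can also be computed directly from observed and expected counts. The sketch below does this for the SNP1 counts. It assumes no cell is zero, and the variable names `obs`, `expected`, and `g2` are chosen here for illustration.

```r
# Observed counts for SNP1 (rows: disease status, columns: allele)
obs <- matrix(c(1200, 800, 1000, 1000), nrow = 2, byrow = TRUE,
              dimnames = list(c("Cases", "Controls"), c("A1", "A2")))

# Expected counts under the null model (independence of status and allele)
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# LRT statistic: 2 * sum(observed * log(observed / expected))
g2 <- 2 * sum(obs * log(obs / expected))

# Compare to a chi-squared distribution with (2 - 1) * (2 - 1) = 1 degree of freedom
p_value <- pchisq(g2, df = 1, lower.tail = FALSE)

cat("LRT statistic:", g2, "\n")
cat("p-value:", p_value, "\n")
```

Half of this statistic is the log of the likelihood ratio of the alternative to the null model.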

In [166]:
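# SNP1 contingency table
# Perform the Chi-Square test (Pearson's test, not the LRT; for 2x2 tables
# chisq.test applies Yates' continuity correction by default)
chi_square_test <- chisq.test(snp1_data)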

# Print the p-value
print(paste("Chi-Square p-value for SNP1: ", chi_square_test$p.value))


[1] "Chi-Square p-value for SNP1:  2.53831565566666e-10"


In [167]:
# SNP2 contingency table
# Perform the Chi-Square test
chi_square_test <- chisq.test(snp2_data)

# Print the p-value
print(paste("Chi-Square p-value for SNP2: ", chi_square_test$p.value))


[1] "Chi-Square p-value for SNP2:  1.58064319586183e-09"


In [168]:
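# Returns likelihood ratio of H1 (association) vs H0 (independence)
library(MASS)
get_2x2_lr = function(tbl){
    # Label the 2x2 counts: rows are disease status, columns are alleles
    tbl = as.table(matrix(tbl, 2, 2, dimnames = list(status = c('case', 'control'), genotype = c('minor_allele', 'major_allele'))))
    # Fit the independence (null) model; test$lrt is the LRT statistic
    test = MASS::loglm(~status + genotype, data = tbl)
    # LRT statistic = 2 * log(LR), so LR = exp(lrt / 2)
    return(exp(test$lrt / 2))
}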

# Calculate likelihood ratios for SNP1 and SNP2
lr1 = get_2x2_lr(snp1_data)
lr2 = get_2x2_lr(snp2_data)

# Calculate the relative likelihood ratios
lr1_relative = lr1 / (lr1 + lr2)
lr2_relative = lr2 / (lr1 + lr2)

# Print results
cat("Likelihood Ratio for SNP1: ", lr1, "\n")
cat("Likelihood Ratio for SNP2: ", lr2, "\n")

cat("Relative Likelihood for SNP1: ", lr1_relative, "\n")
cat("Relative Likelihood for SNP2: ", lr2_relative, "\n")

Likelihood Ratio for SNP1:  615262594 
Likelihood Ratio for SNP2:  101725043 
Relative Likelihood for SNP1:  0.8581216 
Relative Likelihood for SNP2:  0.1418784 


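`get_2x2_lr()` returns the likelihood ratio of the alternative model to the null model, $\exp(\Lambda / 2)$, where $\Lambda$ is the LRT statistic reported by `loglm` as `test$lrt`. A larger value means that the alternative model (with an effect of the allele) fits the data better than the null model (no effect of the allele). The relative likelihood rescales the two ratios so that they sum to 1: under the assumption that exactly one of the two SNPs is causal, SNP1 receives about 0.86 and SNP2 about 0.14.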
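
Likelihood ratios such as `exp(test$lrt / 2)` grow very quickly and can overflow for larger samples. A minimal sketch of the same normalization carried out on the log scale is shown below. The function name `relative_from_log` is hypothetical, and its input is a vector of log likelihood ratios (for example `test$lrt / 2` for each SNP).

```r
# Hypothetical helper: turn log likelihood ratios into relative likelihoods
# that sum to 1, without exponentiating large numbers directly.
relative_from_log <- function(log_lr) {
  # Subtracting the maximum leaves the ratios unchanged but keeps exp() finite
  w <- exp(log_lr - max(log_lr))
  w / sum(w)
}

# Small check: ratios 3 and 1 give relative likelihoods 0.75 and 0.25
print(relative_from_log(c(log(3), log(1))))

# Log ratios this large would overflow if exponentiated directly
print(relative_from_log(c(1000, 999)))
```

The result is identical to `lr / sum(lr)` whenever the direct computation does not overflow.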

## BF

In [169]:
# returns Bayes factor of H_1 vs H_0
# tbl: counts as a vector in row-major order (case A1, case A2, control A1, control A2)
get_2x2_bf = function(tbl, prior_a = 1) {
    n = matrix(tbl, 2, 2, byrow=TRUE)
    dimnames(n) = list(status=c('case','control'), 
                        genotype=c('minor','major'))
    res = BayesFactor::contingencyTableBF(n, sampleType="indepMulti",
          fixedMargin="rows", priorConcentration=prior_a)
    # The package stores the Bayes factor on the log scale; exponentiate it
    bf = exp(res@bayesFactor$bf)
    return(list(bf = bf, log10bf = log(bf)/log(10)))
}
res1 = get_2x2_bf(c(1200,800,1000,1000))
res2 = get_2x2_bf(c(1191,809,1000,1000))

In [170]:
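# Calculate Bayes factors for SNP1 and SNP2
# Note: get_2x2_bf() reshapes its input with byrow = TRUE, so passing a matrix
# (rather than a row-major vector) transposes the table; the fixed "rows" margin
# is then the allele rather than case/control status, and the results can differ
# slightly from calling contingencyTableBF() on the matrix directly
res1 = get_2x2_bf(snp1_data)
res2 = get_2x2_bf(snp2_data)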

# Relative Bayes factors
relative_bf1 = res1$bf / (res1$bf + res2$bf)
relative_bf2 = res2$bf / (res1$bf + res2$bf)

# Print results
cat("Bayes Factor for SNP1: ", res1$bf, "\n")
cat("Bayes Factor for SNP2: ", res2$bf, "\n")

cat("Relative Bayes Factor for SNP1: ", relative_bf1, "\n")
cat("Relative Bayes Factor for SNP2: ", relative_bf2, "\n")

Bayes Factor for SNP1:  24241380 
Bayes Factor for SNP2:  4009967 
Relative Bayes Factor for SNP1:  0.858061 
Relative Bayes Factor for SNP2:  0.141939 
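
One way to read the relative Bayes factors above, sketched under two assumptions that go beyond the code: exactly one of the two SNPs is causal, and each is equally likely to be causal a priori. Writing $\text{PP}_i$ (a label introduced here) for the posterior probability that SNP $i$ is the causal one:

$$
\text{PP}_i = \frac{\text{BF}_i}{\text{BF}_1 + \text{BF}_2}, \quad i = 1, 2
$$

With the values printed above, this gives roughly 0.86 for SNP1 and 0.14 for SNP2. Unequal prior probabilities would enter as weights on each $\text{BF}_i$ in both the numerator and the denominator.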


# Supplement: manual calculations (untested)

In [None]:
# Function to compute the likelihood ratio (LR) of H1 vs H0 for a 2x2 SNP table
# (rows: cases, controls; columns: A1, A2)
compute_LR <- function(snp_data) {
  # Row totals: number of observations in cases and in controls
  total_cases <- sum(snp_data[1, ])
  total_controls <- sum(snp_data[2, ])
  total_population <- total_cases + total_controls
  
  # Null model (H0): the A1 frequency is the same in cases and controls
  p_null <- sum(snp_data[, 1]) / total_population  # Pooled A1 frequency
  
  # Log-likelihood under H0: one binomial term per row, because the A2 count
  # is fixed once the A1 count and the row total are known
  L_H0 <- dbinom(snp_data[1, 1], size = total_cases, prob = p_null, log = TRUE) +
          dbinom(snp_data[2, 1], size = total_controls, prob = p_null, log = TRUE)
  
  # Alternative model (H1): cases and controls each have their own A1 frequency
  p_cases_A1 <- snp_data[1, 1] / total_cases        # MLE of the A1 frequency in cases
  p_controls_A1 <- snp_data[2, 1] / total_controls  # MLE of the A1 frequency in controls
  
  # Log-likelihood under H1, evaluated at the maximum-likelihood estimates
  L_H1 <- dbinom(snp_data[1, 1], size = total_cases, prob = p_cases_A1, log = TRUE) +
          dbinom(snp_data[2, 1], size = total_controls, prob = p_controls_A1, log = TRUE)
  
  # Likelihood Ratio (LR) of H1 vs H0
  LR <- exp(L_H1 - L_H0)
  return(LR)
}

In [None]:
# Compute LR for SNP1
LR_snp1 <- compute_LR(snp1_data)
cat("Likelihood Ratio for SNP1: ", LR_snp1, "\n")

# Compute LR for SNP2
LR_snp2 <- compute_LR(snp2_data)
cat("Likelihood Ratio for SNP2: ", LR_snp2, "\n")

In [None]:
LR_snp1/(LR_snp1+LR_snp2)

In [None]:
LR_snp2/(LR_snp1+LR_snp2)

In [None]:
# Load required library
library(BayesFactor)

# Define the contingency table
snp1_data <- matrix(c(1200, 800, 1000, 1000), nrow = 2, byrow = TRUE)
colnames(snp1_data) <- c("A1", "A2")
rownames(snp1_data) <- c("Cases", "Controls")

# Calculate Bayes factor for contingency table
# contingencyTableBF compares two models:
# H0: No association between variables
# H1: Association between variables
bayes_factor_result <- contingencyTableBF(snp1_data, 
                                          sampleType = "indepMulti", 
                                          fixedMargin = "rows")

In [None]:
print(bayes_factor_result)


In [None]:
bayes_factor_result2 <- contingencyTableBF(snp2_data, 
                                          sampleType = "indepMulti", 
                                          fixedMargin = "rows")
bayes_factor_result2

In [None]:
23999209/(23999209+3973431)

In [None]:
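# Function to calculate a Bayes factor (H1 vs H0) from marginal likelihoods
# Assumptions (not stated in the original notes): independent binomial sampling
# within cases and within controls, and Beta(prior_a, prior_a) priors on the
# A1 frequencies
calculate_bayes_factor_manual <- function(data, prior_a = 1) {
  # Extract values from the contingency table
  a <- data[1, 1]  # Cases with A1
  b <- data[1, 2]  # Cases with A2
  c <- data[2, 1]  # Controls with A1
  d <- data[2, 2]  # Controls with A2
  
  # Log marginal likelihood under H1: separate A1 frequencies in cases and
  # controls, each integrated over its Beta prior
  log_m1 <- lbeta(a + prior_a, b + prior_a) + lbeta(c + prior_a, d + prior_a) -
            2 * lbeta(prior_a, prior_a)
  
  # Log marginal likelihood under H0: one shared A1 frequency
  log_m0 <- lbeta(a + c + prior_a, b + d + prior_a) - lbeta(prior_a, prior_a)
  
  # Binomial coefficients are common to both models and cancel in the ratio
  BF <- exp(log_m1 - log_m0)
  
  return(BF)
}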

# Data for SNP1 and SNP2
snp1_data <- matrix(c(1200, 800, 1000, 1000), nrow = 2, byrow = TRUE)
colnames(snp1_data) <- c("A1", "A2")
rownames(snp1_data) <- c("Cases", "Controls")

snp2_data <- matrix(c(1191, 809, 1000, 1000), nrow = 2, byrow = TRUE)
colnames(snp2_data) <- c("A1", "A2")
rownames(snp2_data) <- c("Cases", "Controls")

# Calculate Bayes Factor for SNP1
bf_snp1 <- calculate_bayes_factor_manual(snp1_data)
print("Bayes Factor for SNP1:")
print(bf_snp1)

# Calculate Bayes Factor for SNP2
bf_snp2 <- calculate_bayes_factor_manual(snp2_data)
print("Bayes Factor for SNP2:")
print(bf_snp2)


In [None]:
bf_snp1/(bf_snp1+bf_snp2)

In [None]:
# Function to compute a likelihood-based ratio of H1 vs H0 for a given SNP contingency table
# Note: this plugs in maximum-likelihood estimates, so it is a maximized
# likelihood ratio (the same quantity as compute_LR), not a Bayes factor that
# averages the likelihood over a prior
compute_BayesFactor <- function(snp_data) {
  # Row totals: number of observations in cases and in controls
  total_cases <- sum(snp_data[1, ])
  total_controls <- sum(snp_data[2, ])
  total_population <- total_cases + total_controls
  
  # Null model (H0): the A1 frequency is the same in cases and controls
  p_null <- sum(snp_data[, 1]) / total_population  # Pooled A1 frequency
  
  # Log-likelihood under H0: one binomial term per row
  L_H0 <- dbinom(snp_data[1, 1], size = total_cases, prob = p_null, log = TRUE) +
          dbinom(snp_data[2, 1], size = total_controls, prob = p_null, log = TRUE)
  
  # Alternative model (H1): cases and controls each have their own A1 frequency
  p_cases_A1 <- snp_data[1, 1] / total_cases        # MLE of the A1 frequency in cases
  p_controls_A1 <- snp_data[2, 1] / total_controls  # MLE of the A1 frequency in controls
  
  # Log-likelihood under H1, evaluated at the maximum-likelihood estimates
  L_H1 <- dbinom(snp_data[1, 1], size = total_cases, prob = p_cases_A1, log = TRUE) +
          dbinom(snp_data[2, 1], size = total_controls, prob = p_controls_A1, log = TRUE)
  
  # Ratio of maximized likelihoods (H1 vs H0)
  BF <- exp(L_H1 - L_H0)
  return(BF)
}


In [None]:

# SNP1 data
snp1_data <- matrix(c(1200, 800, 1000, 1000), nrow = 2, byrow = TRUE)
colnames(snp1_data) <- c("A1", "A2")
rownames(snp1_data) <- c("Cases", "Controls")

# SNP2 data
snp2_data <- matrix(c(1191, 809, 1000, 1000), nrow = 2, byrow = TRUE)
colnames(snp2_data) <- c("A1", "A2")
rownames(snp2_data) <- c("Cases", "Controls")

# Compute Bayes Factor for SNP1
BF_snp1 <- compute_BayesFactor(snp1_data)
cat("Bayes Factor for SNP1: ", BF_snp1, "\n")

# Compute Bayes Factor for SNP2
BF_snp2 <- compute_BayesFactor(snp2_data)
cat("Bayes Factor for SNP2: ", BF_snp2, "\n")

In [None]:
BF_snp1/(BF_snp1+BF_snp2)

In [None]:
BF_snp2/(BF_snp1+BF_snp2)