# Fine Mapping LD variants in the 1K Genomes Project Phase 3 European Cohort using Human Genetic Atrial Fibrillation Lead Index Variants from GWAS Summary Statistics Reported in Neilsen et al., 2018

**Authors**: Andrew Blair

**Maintainer(s)**: Andrew Blair

**Email**: apblair.lab@gmail.com

## Purpose
The purpose of this study is to fine-map linkage disequilibrium (LD) variants in the European cohort of the 1000 Genomes Project Phase 3, using lead index variants for atrial fibrillation (AF) identified in a genome-wide association study (GWAS) by Nielsen et al., 2018. This analysis aims to identify credible sets of candidate causal variants to further understand the genetic architecture and biology of AF.

## Materials & Methods- Data Analysis

We obtained published AF GWAS summary statistics and index variants for 111 disease-associated loci (Nielsen et al., 2018). To construct credible sets of variants for each locus, we first extracted all variants in LD (r^2 > 0.1 using the EUR (European) subset of 1000 Genomes phase 3) in a large window (±1 Mb) around each index variant. We next calculated approximate Bayes factors (ABF) for each variant using effect size and SE estimates. We then calculated posterior probability of association (PPA) for each variant by dividing its ABF by the sum of ABF for all variants within the locus. For each locus, we then defined 99% credible sets by sorting variants by descending PPA and retaining variants that added up to a cumulative PPA of >0.99. This resulted in an output of 456 candidate causal variants.

## Models


| Models    | Description |
| -------- | ------- |
|  Approximate Bayes Factor | A statistical model that estimates the strength of association between genetic variants and the trait of interest, accounting for effect size and standard error.  |
|  Prior Probability Association | A model that calculates the probability that a variant is causally associated with the trait, based on the ABF and the sum of ABFs for all variants within a locus.    |

## Databases

| Genomic Database Reference    | Description |
| -------- | ------- |
| Linkage Disequilibrium Variants  | Variants that are inherited together more often than would be expected by chance, indicating a non-random association in a population. |
| Reference Genome | The standard DNA sequence to which other sequences are compared; in this case, the European subset of the 1000 Genomes Project Phase 3.
 |
| GWAS Lead Index Variant Set | A collection of top-associated genetic variants from GWAS that serve as proxies for mapping additional linked variants.
 |
| SNP Database | A repository of single nucleotide polymorphisms, including information about their genetic location, alleles, and associated traits.|

## Resources
[LDlink](https://ldlink.nih.gov/?tab=home): A suite of web-based applications designed to easily and efficiently interrogate linkage disequilibrium in population groups

[LDlinkR](https://cran.r-project.org/web/packages/LDlinkR/vignettes/LDlinkR.html): An R Package for Rapidly Calculating Linkage Disequilibrium Statistics in Diverse Populations

[LDexpress](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04531-8): An online tool for integrating population-specific linkage disequilibrium patterns with tissue-specific expression data

## References

1. Nielsen, J. B. et al. Biobank-driven genomic discovery yields new insight into atrial fibrillation biology. Nat. Genet. 50, 1234–1239 (2018).

2. Lin, S.-H., Thakur, R. & Machiela, M. J. LDexpress: an online tool for integrating population-specific linkage disequilibrium patterns with tissue-specific expression data. BMC Bioinform. 22, 608 (2021).

3. 1000 Genomes Project Consortium, A. Auton, L. D. Brooks, R. M. Durbin, E. P. Garrison, H. M. Kang, J. O. Korbel, J. L. Marchini, S. M. Carthy, G. A. McVean, G. R. Abecasis, A global reference for human genetic variation. Nature 526, 68–74 (2015).

# Import Libraries and LDlinkR Token

In [4]:
# Import libraries
library(LDlinkR) 
library(biomaRt)
library(BSgenome.Hsapiens.UCSC.hg38)
library(SNPlocs.Hsapiens.dbSNP155.GRCh38)

In [5]:
ldlink_token <- Sys.getenv("LDLINK_TOKEN")

# Define Recipe Functions

## Convert pvalue to Standard Error

In [6]:
# Function to convert p-values to Z-scores and calculate SEs for a list of p-values and effect sizes
convert_pvalue_to_SE <- function(p_values, effect_sizes) {
  # Ensure p_values and effect_sizes are vectors
  if (!is.vector(p_values) || !is.vector(effect_sizes)) {
    stop("Both p_values and effect_sizes need to be vectors.")
  }
  
  # Check if the length of p_values and effect_sizes matches
  if (length(p_values) != length(effect_sizes)) {
    stop("The length of p_values and effect_sizes must be the same.")
  }
  
  # Convert p-values to Z-scores using vectorized operations
  z_scores <- qnorm(1 - p_values / 2)
  
  # Calculate Standard Errors (SEs) using the Z-scores and effect sizes (vectorized operation)
  ses <- abs(effect_sizes / z_scores)
  
  # Return the Standard Errors
  return(ses)
}

## Caclulate Approximate Bayes Factor (ABF)

The ABF quantifies the strength of evidence in favor of association between a variant and the trait, relative to the null hypothesis of no association.

In [7]:

calculate_abf_vectorized <- function(effect_sizes, standard_errors, sigma2_prior) {
  if (length(effect_sizes) != length(standard_errors)) {
    stop("effect_sizes and standard_errors must be vectors of the same length.")
  }
  
  abf_values <- numeric(length = length(effect_sizes))
  
  for (i in 1:length(effect_sizes)) {
    beta <- effect_sizes[i]
    se <- standard_errors[i]
    variance_ratio <- sqrt(se^2 + sigma2_prior) / se
    exponent_term <- exp(-beta^2 / (2 * (se^2 + sigma2_prior)))
    abf <- variance_ratio * exponent_term
    abf_values[i] <- abf
  }
  
  return(abf_values)
}

## Simulation Example for Approximate Bayes Factor 

In [8]:
# Assuming the dataframe credible_sets_df exists
# For demonstration, create a simplified version of the dataframe
credible_sets_df <- data.frame(
  Variant_ID = c('Variant_1', 'Variant_2', 'Variant_3'),
  Effect_Size = c(0.2, 0.15, 0.25),
  Standard_Error = c(0.05, 0.06, 0.04)
)

head(credible_sets_df)

Unnamed: 0_level_0,Variant_ID,Effect_Size,Standard_Error
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>
1,Variant_1,0.2,0.05
2,Variant_2,0.15,0.06
3,Variant_3,0.25,0.04


In [9]:
# Instantiate ABF model parameters
sigma2_prior_example <- 0.04 # Assumed prior variance
effect_sizes <- credible_sets_df$Effect_Size
standard_errors <- credible_sets_df$Standard_Error

In [10]:
# Calculate the ABF for each variant in the dataframe
abf_values <- calculate_abf_vectorized(effect_sizes, standard_errors, sigma2_prior_example)

# Add the ABF values to the dataframe
credible_sets_df$ABF <- abf_values

print(credible_sets_df)

  Variant_ID Effect_Size Standard_Error      ABF
1  Variant_1        0.20           0.05 2.575435
2  Variant_2        0.15           0.06 2.688636
3  Variant_3        0.25           0.04 2.405713


# Instantiate Reference Genome HG38 and dbSNP155 

In [14]:
# Integrate reference genome and alternate reference with SNPs
reference_genome <- BSgenome.Hsapiens.UCSC.hg38
alt_genome <- injectSNPs(reference_genome, "SNPlocs.Hsapiens.dbSNP155.GRCh38")

In [12]:
# Human SNP locations and alleles extracted from dbSNP Build 155 and placed on the hg38 assembly
snps <- SNPlocs.Hsapiens.dbSNP155.GRCh38

# Import AFib GWAS Lead Index Variants

In [15]:
# Import GWAS lead index variants reported by Nielsen et al., 2018
gwas_AF <- read.table('41588_2018_171_MOESM3_ESM.tsv', sep='\t', header=TRUE)

In [16]:
head(gwas_AF)

Unnamed: 0_level_0,rsID,Position..hg19.,Risk.reference.allele,RAF,OR..95..CI.,P.value,P.value.heterogeneity,Novelty.of.locus,Annotation,No..additional.independent.risk.variants,No..prioritized.genes,Prioritized.genes
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<dbl>,<chr>,<dbl>,<dbl>,<chr>,<chr>,<int>,<int>,<chr>
1,rs284277,chr1:10790797,C/A,0.383,1.04 [1.03-1.06],1.245e-09,0.7272,novel,intronic/CASZ1,0,1,CASZ1
2,rs7529220,chr1:22282619,C/T,0.847,1.06 [1.04-1.08],1.983e-10,0.517,novel,intergenic/.,0,1,HSPG2
3,rs2885697,chr1:41544279,G/T,0.352,1.04 [1.03-1.06],2.884e-10,0.9084,novel,intronic/SCMH1,0,1,SCMH1
4,rs11590635,chr1:49309764,A/G,0.024,1.16 [1.10-1.21],4.123e-09,0.04013,novel,intronic/AGBL4,0,1,AGBL4
5,rs146518726,chr1:51535039,A/G,0.033,1.17 [1.13-1.22],8.27e-15,0.09511,known,intergenic/.,0,1,MIR6500
6,rs1545300,chr1:112464004,C/T,0.691,1.06 [1.04-1.07],1.481e-14,0.166,known,intronic/KCND3,0,1,KCND3


In [17]:
# # Import GWAS lead index variants reported by Nielsen et al., 2018
# gwas_AF <- read.table('GWAS-SummaryStatistics-AtrialFibrillation.tsv', sep='\t', header=TRUE)

# rsID to Reference Genome HG38 Coordinates

In [19]:
# Extract SNP information for a set of rsIDs
AF_snps <- snpsById(snps, head(gwas_AF$rsID, n=10))
# Translate IUPAC ambiguity codes used to reprsent the alleles into nucleotides
map_alleles_nucleotides <- IUPAC_CODE_MAP[mcols(AF_snps)$alleles_as_ambig]

# Map GWAS rsID to hg37 genomic coordinates
ensembl <- useEnsembl("snp",dataset = "hsapiens_snp",GRCh = "38")
rsID_chr_mapping <- getBM(attributes=c("refsnp_id",
                                       "chr_name",
                                       "chrom_start",
                                       "chrom_end"),
                          filters ="snp_filter",
                          values =gwas_AF$rsID,
                          mart = ensembl, uniqueRows=TRUE)

In [20]:
# head(AF_snps)

# Calculate Variance of the AFib Lead Variant's Effect Size Estimate

In [21]:
parse_odds_ratio_ci <- function(odds_ratio_ci) {
  # Initialize vectors to store the extracted values
  odds_ratios <- numeric(length = length(odds_ratio_ci))
  lower_bounds <- numeric(length = length(odds_ratio_ci))
  upper_bounds <- numeric(length = length(odds_ratio_ci))
  
  # Iterate over the list and extract the odds ratio and CI bounds using regular expressions
  for (i in 1:length(odds_ratio_ci)) {
    # Extract the full match including odds ratio and CI bounds
    matches <- regmatches(odds_ratio_ci[i], gregexpr("\\d+\\.\\d+", odds_ratio_ci[i]))
    
    # Convert matches to numeric and assign to vectors
    nums <- as.numeric(matches[[1]])
    odds_ratios[i] <- nums[1]
    lower_bounds[i] <- nums[2]
    upper_bounds[i] <- nums[3]
  }
  
  # Create a dataframe to hold the results with specified column names
  results_df <- data.frame(`odds-ratio` = odds_ratios,
                           `lower-bound` = lower_bounds,
                           `upper-bound` = upper_bounds)
  
  return(results_df)
}

summary_stats_df <- parse_odds_ratio_ci(gwas_AF$OR..95..CI.)

In [22]:
z <- qnorm(0.975)

# Calculate SE and sigma^2 for each row
summary_stats_df$SE <- with(summary_stats_df, (log(upper.bound) - log(lower.bound)) / (2 * z))
summary_stats_df$sigma2 <- summary_stats_df$SE^2

summary_stats_df$rsID <- gwas_AF$rsID

In [23]:
head(summary_stats_df, n=10)

Unnamed: 0_level_0,odds.ratio,lower.bound,upper.bound,SE,sigma2,rsID
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
1,1.04,1.03,1.06,0.007324141,5.364304e-05,rs284277
2,1.06,1.04,1.08,0.009627812,9.269476e-05,rs7529220
3,1.04,1.03,1.06,0.007324141,5.364304e-05,rs2885697
4,1.16,1.1,1.21,0.024314268,0.0005911836,rs11590635
5,1.17,1.13,1.22,0.019549652,0.0003821889,rs146518726
6,1.06,1.04,1.07,0.007254709,5.26308e-05,rs1545300
7,1.05,1.04,1.06,0.004859323,2.361302e-05,rs4073778
8,1.12,1.09,1.16,0.015878432,0.0002521246,rs79187193
9,1.14,1.13,1.16,0.006684401,4.468122e-05,rs11264280
10,1.22,1.19,1.26,0.014581496,0.00021262,rs72700114


# LDexpress

In [25]:
# limit querying 11 snps at a time
# TODO: Iterate over variants
credible_variant_sets <- LDexpress(snps = head(gwas_AF$rsID, n=10), 
                                   pop = "EUR", 
                                   tissue = "ALL",
                                   r2d = "r2",
                                   r2d_threshold = 0.1, 
                                   p_threshold = 0.1,
                                   win_size = 1000000,
                                   genome_build = "grch38",
                                   token =ldlink_token,
                                   file = FALSE)


LDlink server is working...




# Calculate LD Credible Set's Standard Error Using P-Value and Effect Size

In [26]:
credible_variant_sets$Effect_Size <- as.numeric(credible_variant_sets$Effect_Size)
credible_variant_sets$P_value <- as.numeric(credible_variant_sets$P_value)

credible_variant_sets_SE <- convert_pvalue_to_SE(credible_variant_sets$P_value, 
                                                 credible_variant_sets$Effect_Size)

credible_variant_sets$Standard_Error <- credible_variant_sets_SE

In [27]:
subset_credible_variant_set <- credible_variant_sets[credible_variant_sets$Query == 'rs7529220',]

In [33]:
dim(subset_credible_variant_set)

In [29]:
head(subset_credible_variant_set)

Unnamed: 0_level_0,Query,RS_ID,Position_grch38,R2,D',Gene_Symbol,Gencode_ID,Tissue,Non_effect_Allele_Freq,Effect_Allele_Freq,Effect_Size,P_value,Standard_Error
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
6,rs7529220,rs113029877,chr1:21856457,0.128229538121059,0.59869829939114,ECE1,ENSG00000117298.14,Heart - Atrial Appendage,A=0.945,G=0.055,-0.326773,8.0924e-05,0.08290265
7,rs7529220,rs2290497,chr1:21872462,0.132183444920385,0.522604321553988,C1QC,ENSG00000159189.11,Esophagus - Mucosa,G=0.927,A=0.073,-0.179315,5.91353e-05,0.04464754
8,rs7529220,rs2290497,chr1:21872462,0.132183444920385,0.522604321553988,C1QA,ENSG00000173372.16,Esophagus - Mucosa,G=0.927,A=0.073,-0.1945,3.49603e-05,0.04699785
9,rs7529220,rs59420797,chr1:21873328,0.120191178039954,0.348131656337747,HSPG2,ENSG00000142798.17,Esophagus - Muscularis,G=0.138,A=0.862,-0.129096,1.82241e-05,0.03012315
10,rs7529220,rs59420797,chr1:21873328,0.120191178039954,0.348131656337747,HSPG2,ENSG00000142798.17,Skin - Not Sun Exposed (Suprapubic),G=0.138,A=0.862,-0.133889,0.000110671,0.03463315
11,rs7529220,rs59420797,chr1:21873328,0.120191178039954,0.348131656337747,HSPG2,ENSG00000142798.17,Nerve - Tibial,G=0.138,A=0.862,-0.212789,2.79692e-07,0.04142571


In [33]:
unique(subset_credible_variant_set$Gene_Symbol)
print(length(unique(subset_credible_variant_set$Gene_Symbol)))

[1] 19


In [32]:
unique(subset_credible_variant_set$Tissue)
print(length(unique(subset_credible_variant_set$Tissue)))

[1] 20


# Calculate the Approximate Bayes Factor for each LD-associated variant using Effect Size and SE with the prior set as the GWAS AFib sigma2

In [31]:
summary_stats_df[summary_stats_df$rsID == 'rs7529220',]

Unnamed: 0_level_0,odds.ratio,lower.bound,upper.bound,SE,sigma2,rsID
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
2,1.06,1.04,1.08,0.009627812,9.269476e-05,rs7529220


In [34]:
head(subset_credible_variant_set)

Unnamed: 0_level_0,Query,RS_ID,Position_grch38,R2,D',Gene_Symbol,Gencode_ID,Tissue,Non_effect_Allele_Freq,Effect_Allele_Freq,Effect_Size,P_value,Standard_Error
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
6,rs7529220,rs113029877,chr1:21856457,0.128229538121059,0.59869829939114,ECE1,ENSG00000117298.14,Heart - Atrial Appendage,A=0.945,G=0.055,-0.326773,8.0924e-05,0.08290265
7,rs7529220,rs2290497,chr1:21872462,0.132183444920385,0.522604321553988,C1QC,ENSG00000159189.11,Esophagus - Mucosa,G=0.927,A=0.073,-0.179315,5.91353e-05,0.04464754
8,rs7529220,rs2290497,chr1:21872462,0.132183444920385,0.522604321553988,C1QA,ENSG00000173372.16,Esophagus - Mucosa,G=0.927,A=0.073,-0.1945,3.49603e-05,0.04699785
9,rs7529220,rs59420797,chr1:21873328,0.120191178039954,0.348131656337747,HSPG2,ENSG00000142798.17,Esophagus - Muscularis,G=0.138,A=0.862,-0.129096,1.82241e-05,0.03012315
10,rs7529220,rs59420797,chr1:21873328,0.120191178039954,0.348131656337747,HSPG2,ENSG00000142798.17,Skin - Not Sun Exposed (Suprapubic),G=0.138,A=0.862,-0.133889,0.000110671,0.03463315
11,rs7529220,rs59420797,chr1:21873328,0.120191178039954,0.348131656337747,HSPG2,ENSG00000142798.17,Nerve - Tibial,G=0.138,A=0.862,-0.212789,2.79692e-07,0.04142571


In [35]:
# TODO: For each query rsID compute approximate bayes factor
abf_values <- calculate_abf_vectorized(subset_credible_variant_set$Effect_Size,
                         subset_credible_variant_set$Standard_Error, 
                         summary_stats_df[summary_stats_df$rsID == 'rs7529220',]$sigma2)

In [36]:
sum(abf_values[!is.nan(abf_values) & !is.infinite(abf_values)])

In [37]:
subset_credible_variant_set$ABF <- abf_values

In [38]:
dim(subset_credible_variant_set)

In [39]:
subset_credible_variant_set <- subset_credible_variant_set[!is.nan(subset_credible_variant_set$ABF)
                                                           & !is.infinite(subset_credible_variant_set$ABF), ]

In [40]:
# removed 61 variants that had low PPA estimates
dim(subset_credible_variant_set)

# Calculate Prior Probability Association Estimates (PPA) by Dividing the ABF by the Sum of ABF for all Variants within the Locus

In [41]:
ppa <- subset_credible_variant_set$ABF / sum(subset_credible_variant_set$ABF)

In [42]:
subset_credible_variant_set$PPA <- ppa

# Define 99% Credible Sets by Sorting Variants by Descending PPA and Retaining Variants that Add Up to a Cumulative PPA

In [43]:
# Identify the point at which the cumulative sum reaches or exceeds 0.99 and filter to keep only the rows that contribute to reaching this cumulative sum
subset_credible_variant_set_sorted <- subset_credible_variant_set[order(subset_credible_variant_set$PPA, decreasing = TRUE), ]
subset_credible_variant_set_sorted$cumsum_value <- cumsum(subset_credible_variant_set_sorted$PPA)

In [44]:
sum(subset_credible_variant_set_sorted$PPA)

In [45]:
subset_credible_variant_set_sorted_filtered <- subset_credible_variant_set_sorted[subset_credible_variant_set_sorted$cumsum_value <= 0.99,]

In [46]:
sum(subset_credible_variant_set_sorted_filtered$PPA)

In [47]:
dim(subset_credible_variant_set_sorted_filtered)

In [48]:
unique(subset_credible_variant_set_sorted_filtered$Gene_Symbol)
print(length(unique(subset_credible_variant_set_sorted_filtered$Gene_Symbol)))

[1] 19


In [49]:
unique(subset_credible_variant_set_sorted_filtered$Tissue)
print(length(unique(subset_credible_variant_set_sorted_filtered$Tissue)))

[1] 20


# Evaluate Credible Set ABF and PPA to Hocker et al., 2021 ABF and PPA 

In [52]:
# Import Hocker et al., 2021 fine mapped Atrial Fibrillation credible set 
credible_AF_set <- read.table('99credset.AtrialFibrillation.tsv', sep='\t', header=TRUE)

In [53]:
head(credible_AF_set)

Unnamed: 0_level_0,rsid,hg19_chr,hg19_start,hg19_end,hg19_index_var,hg38_chr,hg38_start,hg38_end,ABF,PPA
Unnamed: 0_level_1,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<int>,<int>,<dbl>,<dbl>
1,rs166913,chr1,10789633,10789634,1:10790797:C:A,chr1,10729576,10729577,13.03517,0.03117538
2,rs284279,chr1,10789696,10789697,1:10790797:C:A,chr1,10729639,10729640,14.45415,0.12884431
3,rs284278,chr1,10790535,10790536,1:10790797:C:A,chr1,10730478,10730479,15.14228,0.25640068
4,rs284277,chr1,10790796,10790797,1:10790797:C:A,chr1,10730739,10730740,15.31275,0.3040545
5,rs17035646,chr1,10796546,10796547,1:10790797:C:A,chr1,10736489,10736490,13.61232,0.05552197
6,rs880315,chr1,10796865,10796866,1:10790797:C:A,chr1,10736808,10736809,14.03069,0.08436426


In [54]:
sum(credible_AF_set$PPA)

In [55]:
sum(credible_AF_set$ABF)

In [56]:
credible_AF_set[credible_AF_set$rsid == 'rs7529220',]

Unnamed: 0_level_0,rsid,hg19_chr,hg19_start,hg19_end,hg19_index_var,hg38_chr,hg38_start,hg38_end,ABF,PPA
Unnamed: 0_level_1,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<int>,<int>,<dbl>,<dbl>
33,rs7529220,chr1,22282618,22282619,1:22282619:T:C,chr1,21956125,21956126,17.01188,0.57169


In [57]:
credible_AF_set_comparison <- credible_AF_set[credible_AF_set$rsid %in% subset_credible_variant_set_sorted_filtered$RS_ID,]

In [58]:
dim(credible_AF_set_comparison)

In [59]:
credible_AF_set_comparison

Unnamed: 0_level_0,rsid,hg19_chr,hg19_start,hg19_end,hg19_index_var,hg38_chr,hg38_start,hg38_end,ABF,PPA
Unnamed: 0_level_1,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<int>,<int>,<dbl>,<dbl>
13,rs12039740,chr1,22245849,22245850,1:22282619:T:C,chr1,21919356,21919357,10.11671,0.0005789334
14,rs10799719,chr1,22248880,22248881,1:22282619:T:C,chr1,21922387,21922388,10.22956,0.0006480942
17,rs78570036,chr1,22262503,22262504,1:22282619:T:C,chr1,21936010,21936011,10.20779,0.000634134
18,rs79899643,chr1,22265350,22265351,1:22282619:T:C,chr1,21938857,21938858,10.11671,0.0005789334
19,rs111887321,chr1,22271578,22271579,1:22282619:T:C,chr1,21945085,21945086,10.66977,0.0010065078
20,rs10917069,chr1,22278425,22278426,1:22282619:T:C,chr1,21951932,21951933,13.14769,0.0119940046
21,rs10917070,chr1,22278789,22278790,1:22282619:T:C,chr1,21952296,21952297,12.98131,0.0101556359
24,rs6426729,chr1,22280381,22280382,1:22282619:T:C,chr1,21953888,21953889,13.20334,0.0126804065
27,rs10917072,chr1,22281663,22281664,1:22282619:T:C,chr1,21955170,21955171,13.93553,0.0263705073
29,rs12140171,chr1,22281717,22281718,1:22282619:T:C,chr1,21955224,21955225,13.99252,0.0279171022


In [60]:
dim(subset_credible_variant_set_sorted_filtered)

In [61]:
rownames(subset_credible_variant_set_sorted_filtered) <- 1:nrow(subset_credible_variant_set_sorted_filtered)

In [62]:
head(subset_credible_variant_set_sorted_filtered)

Unnamed: 0_level_0,Query,RS_ID,Position_grch38,R2,D',Gene_Symbol,Gencode_ID,Tissue,Non_effect_Allele_Freq,Effect_Allele_Freq,Effect_Size,P_value,Standard_Error,ABF,PPA,cumsum_value
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,rs7529220,rs7539859,chr1:21969208,0.184614285168061,0.799468438538206,USP48,ENSG00000090686.15,Skin - Sun Exposed (Lower leg),C=0.359,T=0.641,-0.0873424,0.000193269,0.02343092,0.00283391,0.02027152,0.02027152
2,rs7529220,rs60471357,chr1:21973972,0.18367570484697,0.799157054125998,USP48,ENSG00000090686.15,Skin - Sun Exposed (Lower leg),T=0.36,A=0.64,-0.0873098,0.000190755,0.02340146,0.002810825,0.02010639,0.04037791
3,rs7529220,rs6671775,chr1:21975220,0.18367570484697,0.799157054125998,USP48,ENSG00000090686.15,Skin - Sun Exposed (Lower leg),C=0.36,T=0.64,-0.0873098,0.000190755,0.02340146,0.002810825,0.02010639,0.06048429
4,rs7529220,rs4233282,chr1:21973163,0.18367570484697,0.799157054125998,USP48,ENSG00000090686.15,Skin - Sun Exposed (Lower leg),T=0.36,G=0.64,-0.0877835,0.000176215,0.02340344,0.002636816,0.01886166,0.07934596
5,rs7529220,rs112193378,chr1:21985814,0.182303441859004,0.789307760141093,USP48,ENSG00000090686.15,Skin - Sun Exposed (Lower leg),C=0.356,G=0.644,-0.0871091,0.000155751,0.02303428,0.002462175,0.01761242,0.09695838
6,rs7529220,rs35389875,chr1:21986124,0.182303441859004,0.789307760141093,USP48,ENSG00000090686.15,Skin - Sun Exposed (Lower leg),T=0.356,G=0.644,-0.0870394,0.000150209,0.02296116,0.002406765,0.01721607,0.11417445


In [63]:
subset_credible_variant_set_sorted_filtered_comparison <- subset_credible_variant_set_sorted_filtered[subset_credible_variant_set_sorted_filtered$RS_ID %in% credible_AF_set_comparison$rsid, ]

In [64]:
subset_credible_variant_set_sorted_filtered_comparison

Unnamed: 0_level_0,Query,RS_ID,Position_grch38,R2,D',Gene_Symbol,Gencode_ID,Tissue,Non_effect_Allele_Freq,Effect_Allele_Freq,Effect_Size,P_value,Standard_Error,ABF,PPA,cumsum_value
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
8,rs7529220,rs10917075,chr1:21957178,0.285130573814177,1.0,WNT4,ENSG00000162552.14,Thyroid,C=0.362,T=0.638,0.127382,0.000159972,0.03374308,0.001430172,0.0102303011,0.1372674
16,rs7529220,rs10917076,chr1:21957242,0.276700509826136,1.0,USP48,ENSG00000090686.15,Skin - Sun Exposed (Lower leg),C=0.369,T=0.631,-0.09477,6.56862e-05,0.02374359,0.001153782,0.0082532333,0.2082024
31,rs7529220,rs10917076,chr1:21957242,0.276700509826136,1.0,WNT4,ENSG00000162552.14,Thyroid,C=0.369,T=0.631,0.132287,9.28774e-05,0.03384612,0.000887207,0.0063463664,0.3114929
40,rs7529220,rs10917076,chr1:21957242,0.276700509826136,1.0,RP11-26H16.4,ENSG00000283234.1,Lung,C=0.369,T=0.631,0.152937,9.46354e-05,0.03917494,0.0007793646,0.0055749484,0.3646633
55,rs7529220,rs10917075,chr1:21957178,0.285130573814177,1.0,HSPG2,ENSG00000142798.17,Adipose - Visceral (Omentum),C=0.362,T=0.638,-0.134667,6.69428e-05,0.03377739,0.0006678666,0.0047773817,0.4421456
76,rs7529220,rs10917076,chr1:21957242,0.276700509826136,1.0,LINC00339,ENSG00000218510.6,Esophagus - Muscularis,C=0.369,T=0.631,0.240939,9.14869e-05,0.06158773,0.0005769543,0.0041270678,0.5384604
79,rs7529220,rs7539092,chr1:21959568,0.845457863449521,0.990555586849171,USP48,ENSG00000090686.15,Esophagus - Mucosa,A=0.122,G=0.878,-0.159331,6.36617e-05,0.0398446,0.0005389958,0.0038555429,0.5503125
87,rs7529220,rs10799719,chr1:21922388,0.282689079872285,1.0,HSPG2,ENSG00000142798.17,Adipose - Visceral (Omentum),G=0.364,A=0.636,-0.137586,4.41637e-05,0.03368455,0.0004654655,0.0033295664,0.5782665
100,rs7529220,rs10917075,chr1:21957178,0.285130573814177,1.0,LINC00339,ENSG00000218510.6,Pancreas,C=0.362,T=0.638,0.241598,6.16943e-05,0.0603055,0.000404477,0.0028933039,0.6189963
147,rs7529220,rs10917069,chr1:21951933,0.893726637412644,1.0,CELA3B,ENSG00000219073.7,Stomach,C=0.126,T=0.874,-0.298132,4.44989e-05,0.0730217,0.0002792111,0.001997252,0.7367906


In [65]:
length(unique(subset_credible_variant_set_sorted_filtered_comparison$RS_ID))