# Analysis 10: Summarize Strain HDRs

Filter the HDRs to those found in GWA mapping panel strains, then summarize which strains carry each region.

In [None]:
library(tidyverse)


── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# Inputs

-   `raw_hvr_file`: path to the BED file of HDRs for the species (the code calls these HVRs)
-   `pheno_path`: path to the `.rda` file holding the phenotype data frame `pheno.df`, which is restored with `load(pheno_path)`

In [None]:
raw_hvr_file <- "data/raw/HDR/20240408_c_elegans_divergent_regions.bed"
pheno_path <- "data/processed/phenotypes/pheno.df.rda"

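Because `load()` silently restores whatever objects an `.rda` file contains, it can help to confirm that the file exists and really holds `pheno.df` before relying on it. The following is a minimal sketch in base R, not part of the original analysis; `load_object()` is a hypothetical helper, and the toy data frame and temporary file only stand in for the real `pheno_path`.

```r
# Hypothetical helper (not from the original notebook): load one named object
# from an .rda file into a private environment and return it.
load_object <- function(path, name) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names it restored
  if (!name %in% loaded) {
    stop("Object '", name, "' not found in ", path)
  }
  get(name, envir = env)
}

# Toy usage: a temporary .rda file stands in for the real phenotype file.
# Strain labels and values are made up for illustration.
pheno.df <- data.frame(
  strain = c("strain_A", "strain_B"),
  value = c(1.2, 3.4),
  stringsAsFactors = FALSE
)
tmp <- tempfile(fileext = ".rda")
save(pheno.df, file = tmp)
rm(pheno.df)

toy_pheno <- load_object(tmp, "pheno.df")
```

With the real inputs, the equivalent call would be `load_object(pheno_path, "pheno.df")`, assuming the file stores the data frame under that name.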

# Main

In [None]:

# Load the hdr data
raw_hvr_df <- data.table::fread(
  raw_hvr_file,
  col.names = c("chrom", "start", "stop", "strain")
)

# Load phenotype data; load() restores the `pheno.df` object into the workspace
load(pheno_path)

gwas_strains <- pheno.df %>%
  pull(strain) %>%
  unique()

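# Keep HDR calls from GWAS panel strains, then collapse to one row per region
hvr_df <- raw_hvr_df %>%
  dplyr::filter(strain %in% gwas_strains) %>%
  # Build a region identifier such as "chrom_start_stop", keeping the source columns
  tidyr::unite("hd_region", chrom, start, stop, sep = "_", remove = FALSE) %>%
  dplyr::group_by(chrom, start, stop, hd_region) %>%
  dplyr::summarize(
    strains = paste(strain, collapse = ","),  # comma-separated carriers of the region
    n_strains = n_distinct(strain)            # number of distinct carrier strains
  )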


`summarise()` has grouped output by 'chrom', 'start', 'stop'. You can override
using the `.groups` argument.
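
To make the filter, unite, and summarize steps concrete, here is a small sketch on made-up data. The table `toy_hvr`, the panel `toy_panel`, and all values are hypothetical and are not taken from the real BED file. The `summarise()` message above only reports that the result stays grouped by `chrom`, `start`, and `stop`; the sketch passes `.groups = "drop"` to return an ungrouped table, which is a choice made for this example rather than something the analysis itself does.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for raw_hvr_df: one row per (region, strain) call.
# All values are invented for illustration.
toy_hvr <- data.frame(
  chrom  = c("I", "I", "I", "II"),
  start  = c(100L, 100L, 100L, 500L),
  stop   = c(200L, 200L, 200L, 900L),
  strain = c("strain_A", "strain_B", "strain_C", "strain_A"),
  stringsAsFactors = FALSE
)

# Toy stand-in for gwas_strains: strain_C is not in the mapping panel
toy_panel <- c("strain_A", "strain_B")

toy_summary <- toy_hvr %>%
  dplyr::filter(strain %in% toy_panel) %>%
  tidyr::unite("hd_region", chrom, start, stop, sep = "_", remove = FALSE) %>%
  dplyr::group_by(chrom, start, stop, hd_region) %>%
  dplyr::summarize(
    strains = paste(strain, collapse = ","),
    n_strains = dplyr::n_distinct(strain),
    .groups = "drop"  # assumption for this sketch: return an ungrouped table
  )

print(toy_summary)
```

In this toy case the region on chromosome I keeps two panel strains and drops `strain_C`, while the region on chromosome II keeps one, so `toy_summary` has one row per region.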