# Analysis 08: GO Enrichment for Intervals

In [None]:
library(clusterProfiler)
library(tidyverse) # provides %>%, dplyr, and stringr used in the cells below




Registered S3 methods overwritten by 'treeio':
  method              from    
  MRCA.phylo          tidytree
  MRCA.treedata       tidytree
  Nnode.treedata      tidytree
  Ntip.treedata       tidytree
  ancestor.phylo      tidytree
  ancestor.treedata   tidytree
  child.phylo         tidytree
  child.treedata      tidytree
  full_join.phylo     tidytree
  full_join.treedata  tidytree
  groupClade.phylo    tidytree
  groupClade.treedata tidytree
  groupOTU.phylo      tidytree
  groupOTU.treedata   tidytree
  is.rooted.treedata  tidytree
  nodeid.phylo        tidytree
  nodeid.treedata     tidytree
  nodelab.phylo       tidytree
  nodelab.treedata    tidytree
  offspring.phylo     tidytree
  offspring.treedata  tidytree
  parent.phylo        tidytree
  parent.treedata     tidytree
  root.treedata       tidytree
  rootnode.phylo      tidytree
  sibling.phylo       tidytree

clusterProfiler v4.6.2  For help: https://yulab-smu.top/biomedical-knowledge-mining-book/

If you use clusterProfiler in published research, please cite:
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141


Attaching package: 'clusterProfiler'

The following object is masked from 'package:stats':

    filter

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()   masks clusterProfiler::filter(), stats::filter()
✖ dplyr::lag()      masks stats::lag()
✖ purrr::simplify() masks clusterProfiler::simplify()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading required package: AnnotationDbi

Loading required package: stats4

Loading required package: BiocGenerics


Attaching package: 'BiocGenerics'


The following objects are masked from 'package:lubridate':

    intersect, setdiff, union


The following objects are masked from 'package:dplyr':

    combine, intersect, setdiff, union


The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs


The following objects are masked from 'package:base':

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is
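
The conflicts report above shows that `dplyr::filter()` masks `clusterProfiler::filter()` and `stats::filter()`, and that `purrr::simplify()` masks `clusterProfiler::simplify()`, so an unqualified call resolves to whichever definition comes first on the search path. A minimal base R sketch of the same effect, and of how a qualified `pkg::fun()` call sidesteps it, might look like this. The global `filter` defined here is a hypothetical stand-in for a masking package function and is not part of the analysis.

```r
# Hypothetical stand-in: a global `filter` hides stats::filter(),
# much as dplyr::filter() does once the tidyverse is attached.
filter <- function(x, ...) stop("stand-in filter() called")

x <- c(1, 2, 3, 4, 5)

# The unqualified call reaches the stand-in, not stats::filter().
masked <- tryCatch(filter(x), error = function(e) conditionMessage(e))

# The qualified call bypasses the mask: a centered 3-point moving average.
smoothed <- as.numeric(stats::filter(x, rep(1 / 3, 3)))

# Remove the stand-in so it does not linger in the session.
rm(filter)
```

This is why the cells below call `dplyr::filter()`, `dplyr::select()`, and similar functions with an explicit namespace.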

In [None]:
#### inputs ####
ns_dir <- "data/processed/20231116_Analysis_NemaScan"
tox_file <- "data/processed/tox_data/tox_metadata.csv"

#### outputs ####
out_dir <- "data/processed/interval_genes/enrichments"

## create the output directory if it doesn't exist
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}


In [None]:
#### Main ####
# load the toxicant metadata
con_metadata <- data.table::fread(tox_file)

# pull the trait ID, nice drug label, and class annotations
con_key <- con_metadata %>%
  dplyr::select(
    trait,
    nice_drug_label2,
    big_class,
    moa_class
  )

# load the interval data
inbred <- data.table::fread(
  glue::glue("{ns_dir}/INBRED/Mapping/Processed/QTL_peaks_inbred.tsv"),
  data.table = FALSE
) %>%
  # derive the drug name by stripping the length_ prefix from the trait
  dplyr::mutate(
    drug = stringr::str_replace(
      trait,
      pattern = "^length_",
      replacement = ""
    )
  ) %>%
  # drop traits whose name contains CV_
  dplyr::filter(
    !stringr::str_detect(drug, "CV_")
  ) %>%
  dplyr::select(
    trait,
    drug,
    CHROM,
    marker,
    startPOS,
    endPOS
  ) %>%
  # add the nice drug label and class annotations
  dplyr::left_join(con_key, by = "trait")

unique_traits <- unique(inbred$trait)

# Initialize list to store results
all_genes_dfs <- list()

# Process each trait
for (trait_id in unique_traits) {
  # Get the nice drug label for this trait
  nice_label <- inbred %>%
    dplyr::filter(trait == trait_id) %>%
    dplyr::pull(nice_drug_label2) %>%
    unique()

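  # Get genes for each interval; get_genes_each_interval(), gff, and cEGO
  # are not defined in this section and must already exist in the session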
  genes_list <- get_genes_each_interval(
    qtl = inbred,
    trait_id = trait_id,
    gff_df = gff,
    cEGO = cEGO
  )

  # Convert to dataframe using nice drug label
  genes_df <- genes_list_to_df(genes_list, nice_label)

  # Store in results list
  all_genes_dfs[[trait_id]] <- genes_df
}


'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 9.33% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 2.07% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 16.34% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 11.49% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 10.74% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 17.57% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 8.37% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 8.75% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 24.08% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 7.55% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 8.54% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 7.46% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 3.28% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 5.88% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 18.29% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 6.58% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 1.73% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 11.22% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 4.92% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 4.78% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 5.23% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 0.63% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 9.66% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 15.79% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 18.02% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 11.58% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 4.96% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 18.28% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 15.56% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 6.26% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 6.13% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 23.71% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 16.37% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 8.3% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 3.47% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 7.65% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 16.67% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 11.46% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 7.26% of input gene IDs are fail to map...

'select()' returned 1:1 mapping between keys and columns

"ENSEMBL", : 20.15% of input gene IDs are fail to map...

not a multiple of vector length (arg 3)
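
The per-trait loop in the Main cell depends on `get_genes_each_interval()`, which is not shown in this section. As a rough illustration of the kind of lookup such a helper could perform, the base R sketch below returns the genes that overlap each QTL interval. Everything in it is an assumption: the toy data, the gene-table column names (`gene_id`, `chrom`, `start`, `end`), the function `genes_in_interval()`, and the choice to count any overlap rather than full containment.

```r
# Toy QTL intervals, using the column names from the QTL peaks table.
qtl <- data.frame(
  CHROM = c("V", "X"),
  startPOS = c(1000, 100),
  endPOS = c(5000, 1500),
  stringsAsFactors = FALSE
)

# Toy gene annotations (hypothetical columns and IDs).
genes <- data.frame(
  gene_id = c("geneA", "geneB", "geneC", "geneD"),
  chrom = c("V", "V", "X", "X"),
  start = c(500, 4800, 2000, 1400),
  end = c(900, 6000, 3000, 1600),
  stringsAsFactors = FALSE
)

# Return IDs of genes on `chrom` that overlap [start_pos, end_pos].
genes_in_interval <- function(genes, chrom, start_pos, end_pos) {
  hit <- genes$chrom == chrom &
    genes$start <= end_pos &
    genes$end >= start_pos
  genes$gene_id[hit]
}

# One character vector of gene IDs per interval.
hits <- Map(
  function(chrom, s, e) genes_in_interval(genes, chrom, s, e),
  qtl$CHROM, qtl$startPOS, qtl$endPOS
)
```

With real annotations, the resulting gene IDs would then go through ID conversion and enrichment, which is where the mapping messages above originate.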