geneset

Gene sets and functions for working with them.

data sets

This package contains the following data sets:

  • smoking: Blood transcriptome gene signatures associated with cigarette smoking, from the Huan et al. 2016 meta-analysis (http://dx.doi.org/10.1093/hmg/ddw288).
  • stress: Gene expression levels in blood reported as signatures of stress in five different studies. See the in-package documentation (?stress) for details.

examples

installation

devtools::install_github("3inar/geneset")

load .gmt files

If you're working with gene sets from MSigDB, you most likely have files in the Gene Matrix Transposed (.gmt) format. The load_gmt() function reads a .gmt file into a gset object:

library(geneset)
geneset <- load_gmt("tests/testthat/testgmt.gmt")  # dummy .gmt file for testing
geneset
## $names
## [1] "set1"         "set2"         "name w space"
## 
## $descriptions
## [1] "description 1" "description 2" "description3" 
## 
## $genesets
## $genesets[[1]]
## [1] "a" "b" "c"
## 
## $genesets[[2]]
## [1] "d" "e" "f"
## 
## $genesets[[3]]
## [1] "a" "b" "c" "d" "e" "f" "g"
## 
## 
## attr(,"class")
## [1] "gset"
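
For readers unfamiliar with the format, each line of a .gmt file is tab-delimited: a set name, a description, then the gene symbols. The following is a minimal base-R sketch of reading such a file. The helper read_gmt_sketch() is hypothetical and is not the package's load_gmt() implementation; it only illustrates how the three components shown above could arise from the file.

```r
# Write a tiny tab-delimited .gmt file: name, description, then gene symbols.
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c("set1\tdescription 1\ta\tb\tc",
             "set2\tdescription 2\td\te\tf"), gmt_path)

# Hypothetical reader for illustration only (NOT the package's load_gmt()).
read_gmt_sketch <- function(path) {
  # One character vector of fields per line of the file.
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  list(
    names        = vapply(fields, function(x) x[1], character(1)),
    descriptions = vapply(fields, function(x) x[2], character(1)),
    # Everything after the first two fields is a gene symbol.
    genesets     = lapply(fields, function(x) x[-(1:2)])
  )
}

parsed <- read_gmt_sketch(gmt_path)
parsed$names
parsed$genesets[[2]]
```

The sketch returns a plain list; the real gset object additionally carries the class attribute "gset", as the printed output above shows.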

subset gset objects

You can subset gsets like you would a vector. There is also a length() function for them that returns the number of sets in the gset:

geneset[2]
## $names
## [1] "set2"
## 
## $descriptions
## [1] "description 2"
## 
## $genesets
## $genesets[[1]]
## [1] "d" "e" "f"
## 
## 
## attr(,"class")
## [1] "gset"
geneset[c(TRUE, FALSE, TRUE)]
## $names
## [1] "set1"         "name w space"
## 
## $descriptions
## [1] "description 1" "description3" 
## 
## $genesets
## $genesets[[1]]
## [1] "a" "b" "c"
## 
## $genesets[[2]]
## [1] "a" "b" "c" "d" "e" "f" "g"
## 
## 
## attr(,"class")
## [1] "gset"
length(geneset)
## [1] 3
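
Judging from the printed output, subsetting applies the same index to the names, descriptions, and gene sets in parallel. A rough base-R sketch of that behaviour on a plain list follows. The helper subset_sketch() is hypothetical and the package's own method may differ in detail.

```r
# Plain-list stand-in for a gset; the real object also has class "gset".
gs <- list(
  names        = c("set1", "set2", "name w space"),
  descriptions = c("description 1", "description 2", "description3"),
  genesets     = list(c("a", "b", "c"),
                      c("d", "e", "f"),
                      c("a", "b", "c", "d", "e", "f", "g"))
)

# Hypothetical helper: apply one index to every component so they stay aligned.
subset_sketch <- function(gs, i) {
  list(names        = gs$names[i],
       descriptions = gs$descriptions[i],
       genesets     = gs$genesets[i])
}

picked <- subset_sketch(gs, c(TRUE, FALSE, TRUE))
picked$names
length(picked$genesets)  # number of sets kept
```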

remove genes that aren't in your data

Inevitably some gene sets will contain symbols that for one reason or another aren't present in the data set you're investigating. These can be removed by gsintersect():

mygenes <- c("a", "b", "d", "e", "f")
geneset <- gsintersect(geneset, mygenes); geneset
## $names
## [1] "set1"         "set2"         "name w space"
## 
## $descriptions
## [1] "description 1" "description 2" "description3" 
## 
## $genesets
## $genesets[[1]]
## [1] "a" "b"
## 
## $genesets[[2]]
## [1] "d" "e" "f"
## 
## $genesets[[3]]
## [1] "a" "b" "d" "e" "f"
## 
## 
## attr(,"class")
## [1] "gset"
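
Conceptually, this is a per-set intersection with the symbols in your data. A minimal base-R sketch of the same idea on a plain list of gene sets is shown here, using only base intersect(). It is an illustration, not necessarily how gsintersect() is implemented.

```r
# The three example gene sets and the symbols present in the data.
genesets <- list(c("a", "b", "c"),
                 c("d", "e", "f"),
                 c("a", "b", "c", "d", "e", "f", "g"))
mygenes <- c("a", "b", "d", "e", "f")

# Keep, in each set, only the symbols that also occur in mygenes.
trimmed <- lapply(genesets, intersect, mygenes)
trimmed
```

Note that sets are trimmed but never dropped here, so a set can end up very small or even empty.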

filter out gene sets that are too small or large

Perhaps a two-gene set is too small to be taken seriously. gsfilter() removes gene sets whose cardinality falls outside the provided limits:

geneset <- gsfilter(geneset, min=3); geneset
## $names
## [1] "set2"         "name w space"
## 
## $descriptions
## [1] "description 2" "description3" 
## 
## $genesets
## $genesets[[1]]
## [1] "d" "e" "f"
## 
## $genesets[[2]]
## [1] "a" "b" "d" "e" "f"
## 
## 
## attr(,"class")
## [1] "gset"
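
The filtering step amounts to comparing each set's size against the limits. The base-R sketch below illustrates this on a plain list. The helper filter_sketch() and its parameter names min_size and max_size are hypothetical; only the min argument of gsfilter() appears in the example above.

```r
# Gene sets as they look after the intersection step.
genesets <- list(c("a", "b"),
                 c("d", "e", "f"),
                 c("a", "b", "d", "e", "f"))

# Hypothetical helper: keep sets whose size lies within [min_size, max_size].
filter_sketch <- function(genesets, min_size = 1, max_size = Inf) {
  sizes <- lengths(genesets)  # number of genes in each set
  genesets[sizes >= min_size & sizes <= max_size]
}

kept <- filter_sketch(genesets, min_size = 3)
length(kept)
```

With a lower limit of 3, the two-gene set is dropped and the other two remain, matching the gsfilter() output above.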
