Bioconductor/sweave2rmd: Conversion of howtogenefilter.Rnw to Rmd #11

Merged
merged 31 commits into from
Mar 28, 2023
Commits
cf4895e
Update DESCRIPTION with BiocStyle, knitr, VignetteBuilder
Khadeeejah Mar 9, 2023
cfcb04c
Convert file to Rmd
Khadeeejah Mar 10, 2023
60aef6c
Remove howtogenefilter.Rnw
Khadeeejah Mar 10, 2023
2a41702
Update DESCRIPTION
Khadeeejah Mar 13, 2023
9bf84cf
fixed indentation andspelling error
Khadeeejah Mar 13, 2023
9f00995
fixed-bugs
Khadeeejah Mar 13, 2023
c6d9cb5
fixed-bugs
Khadeeejah Mar 13, 2023
8145ce2
fixed-bugs
Khadeeejah Mar 13, 2023
2589ac9
Merge branch 'file-Rmd' of github.com:Khadeeejah/genefilter into file…
Khadeeejah Mar 13, 2023
edea81e
Update howtogenefilter.Rmd
Khadeeejah Mar 13, 2023
ed43e00
fixed-bugs
Khadeeejah Mar 13, 2023
fceb346
Merge branch 'file-Rmd' of github.com:Khadeeejah/genefilter into file…
Khadeeejah Mar 13, 2023
f8568a8
resolved-all-fix
Khadeeejah Mar 14, 2023
9d93609
resolved-all-fix
Khadeeejah Mar 14, 2023
1d0af95
resolved-all-fix
Khadeeejah Mar 14, 2023
c03f572
fixed-changes
Khadeeejah Mar 19, 2023
11e4a10
merged-conflicts
Khadeeejah Mar 19, 2023
bdf4ef1
merged-conflicts
Khadeeejah Mar 20, 2023
83f6d74
fixed-changes
Khadeeejah Mar 21, 2023
970e8fd
fixed-changes
Khadeeejah Mar 22, 2023
e640330
fixed-changes
Khadeeejah Mar 22, 2023
3c3e3bc
fixed-changes
Khadeeejah Mar 22, 2023
233a73e
fixed-changes
Khadeeejah Mar 22, 2023
adf2ac4
Convert file to Rmd
Khadeeejah Mar 23, 2023
6f0d36d
Remove file.Rnw
Khadeeejah Mar 23, 2023
bfe8e97
Convert file to Rmd
Khadeeejah Mar 23, 2023
3bacc40
Remove file.Rnw
Khadeeejah Mar 23, 2023
7e768f9
Convert file to Rmd
Khadeeejah Mar 23, 2023
6debbd7
Convert file to Rmd
Khadeeejah Mar 25, 2023
383f6ca
Convert file to Rmd
Khadeeejah Mar 25, 2023
6f325ac
Convert file to Rmd
Khadeeejah Mar 27, 2023
2 changes: 2 additions & 0 deletions DESCRIPTION
@@ -8,6 +8,8 @@ Authors@R: c(
person("Florian", "Hahne", role = "aut"),
person("Emmanuel", "Taiwo", role = "ctb",
comment = "'howtogenefinder' vignette translation from Sweave to RMarkdown / HTML."),
person("Khadijah", "Amusat", role = "ctb",
comment = "Converted genefilter vignette from Sweave to RMarkdown / HTML."),
person("Bioconductor Package Maintainer", role = "cre",
email = "maintainer@bioconductor.org"))
Description: Some basic functions for filtering genes.
162 changes: 162 additions & 0 deletions vignettes/howtogenefilter.Rmd
@@ -0,0 +1,162 @@
---
title: "Using the genefilter function to filter genes from a microarray dataset"
author:
- name: "Khadijah Amusat"
  affiliation: "Vignette translation from Sweave to R Markdown / HTML"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
  BiocStyle::html_document:
    number_sections: true
    toc: true
    toc_depth: 4
package: genefilter
vignette: >
  %\VignetteIndexEntry{Using the genefilter function to filter genes from a microarray}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

# Introduction

The `r Biocpkg("genefilter")` package can be used to filter (select) genes from
a microarray dataset according to a variety of different filtering mechanisms.
Here, we will consider the `sample.ExpressionSet` example dataset from the
`r Biocpkg("Biobase")` package. This experiment has 26 samples, 500 genes and
3 covariates. The covariates are named `sex`, `type` and `score`. The first two
have two levels and the last one is continuous.

```{r closeg, message = FALSE}
library("Biobase")
library("genefilter")
data(sample.ExpressionSet)
varLabels(sample.ExpressionSet)
table(sample.ExpressionSet$sex)
table(sample.ExpressionSet$type)
```

One dichotomy that can be of interest for subsequent analyses is whether the
filter is *specific* or *non-specific*. Here, specific means that we are
filtering with reference to sample metadata, such as `type`. For example,
selecting genes that are differentially expressed in the two groups defined by
`type` is a specific filter. If, on the other hand, we want to select genes
that are expressed in more than 5 samples, that is an example of a non-specific
filter.
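
To make the distinction concrete, here is a minimal base R sketch on simulated
data. It is an illustration only: the matrix `x`, the factor `group`, and the
thresholds (110, 3 samples, p below 0.1) are assumptions chosen for the example
and are not part of the vignette's dataset.

```r
# Simulated expression matrix: 6 genes (rows) by 8 samples (columns).
set.seed(1)
x <- matrix(rnorm(48, mean = 100, sd = 20), nrow = 6,
            dimnames = list(paste0("gene", 1:6), paste0("s", 1:8)))

# Hypothetical sample metadata: two groups of four samples each.
group <- factor(rep(c("A", "B"), each = 4))

# Make gene1 clearly different between the two groups.
x["gene1", group == "B"] <- x["gene1", group == "B"] + 200

# Non-specific filter: looks at the expression values only.
# Keep genes with a value above 110 in at least 3 samples.
nonspecific <- rowSums(x > 110) >= 3

# Specific filter: uses the sample metadata (`group`).
# Keep genes whose two-sample t-test p-value is below 0.1.
pvals <- apply(x, 1, function(g) {
  t.test(g[group == "A"], g[group == "B"])$p.value
})
specific <- pvals < 0.1

# One row per gene, one column per filter.
cbind(nonspecific, specific)
```

The non-specific filter would give the same answer if the sample labels were
shuffled, whereas the specific filter depends on them.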

First, let us see how to perform a non-specific filter. Suppose we want to
select genes that have an expression measure above 200 in at least 5 samples. To
do that we use the function `kOverA`.

There are three steps that must be performed.

1. Create function(s) implementing the filtering criteria.

2. Assemble it (them) into a (combined) filtering function.

3. Apply the filtering function to the expression matrix.

```{r message=FALSE}
f1 <- kOverA(5, 200)
ffun <- filterfun(f1)
wh1 <- genefilter(exprs(sample.ExpressionSet), ffun)
sum(wh1)
```

Here `f1` is a function that implements our "expression measure above 200 in at
least 5 samples" criterion, `ffun` is the filtering function (which in this case
consists of only one criterion), and we apply it with the function `genefilter`.
There were `r sum(wh1)` genes that satisfied the criterion and passed the
filter.

As an example of a specific filter, let us select genes that are differentially
expressed in the groups defined by `type`.

```{r}
f2 <- ttest(sample.ExpressionSet$type, p=0.1)
wh2 <- genefilter(exprs(sample.ExpressionSet), filterfun(f2))
sum(wh2)
```

Here, `ttest` is a function from the `r Biocpkg("genefilter")` package which
provides a suitable wrapper around `t.test` from package `r Rpackage("stats")`.
Now we see that there are `r sum(wh2)` genes that satisfy the selection
criterion.

Suppose that we want to combine the two filters. We want those genes for which
at least 5 samples have an expression measure over 200 *and* which are also
differentially expressed between the groups defined by `type`.

```{r gene-indexing}
ffun_combined <- filterfun(f1, f2)
wh3 <- genefilter(exprs(sample.ExpressionSet), ffun_combined)
sum(wh3)
```

Now we see that there are only `r sum(wh3)` genes that satisfy both conditions.
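
The three steps used in this section (create criteria, assemble them, apply
them to each row) can be sketched in base R with closures. The helpers
`k_over_a`, `t_test_below`, `combine_filters` and `apply_filter` are
hypothetical names written for this illustration; they are not the
implementations in `genefilter`, and the data are simulated.

```r
# Hypothetical helpers for illustration only; not the genefilter code.

# Step 1a: criterion factory, TRUE if at least k values of a gene exceed A.
k_over_a <- function(k, A) {
  function(g) sum(g > A, na.rm = TRUE) >= k
}

# Step 1b: criterion factory, TRUE if the two-group t-test p-value is below p.
t_test_below <- function(groups, p) {
  groups <- factor(groups)
  function(g) {
    parts <- split(g, groups)
    t.test(parts[[1]], parts[[2]])$p.value < p
  }
}

# Step 2: combine criteria; a gene passes only if every criterion is TRUE.
combine_filters <- function(...) {
  criteria <- list(...)
  function(g) {
    for (crit in criteria) {
      if (!isTRUE(crit(g))) return(FALSE)  # stop at the first failure
    }
    TRUE
  }
}

# Step 3: apply the combined filter to every row (gene) of a matrix.
apply_filter <- function(x, f) apply(x, 1, f)

# Simulated data: 6 genes by 10 samples, values around 150.
set.seed(2)
x <- matrix(rnorm(60, mean = 150, sd = 10), nrow = 6,
            dimnames = list(paste0("gene", 1:6), NULL))
groups <- rep(c("ctl", "case"), each = 5)

# Make gene1 both highly expressed and differential in the "case" group.
x["gene1", groups == "case"] <- x["gene1", groups == "case"] + 200

f_both <- combine_filters(k_over_a(5, 200), t_test_below(groups, 0.1))
keep <- apply_filter(x, f_both)
sum(keep)
```

Because the combined function returns `FALSE` at the first failing criterion,
placing the cheapest criterion first avoids running the t-test on genes that
would be discarded anyway.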

## Selecting genes that appear useful for prediction

The function `knnCV` defined below performs $k$-nearest neighbour
classification using leave-one-out cross-validation. At the same time it
aggregates the genes that were selected. The function returns the predicted
classifications. As a side effect, the number of times that each gene was
selected (provided it was at least once) is recorded and stored in the
environment of the aggregator `Agg`. These counts can subsequently be retrieved
and used for other purposes.
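
The side effect described above can be mimicked in base R with a plain
environment that maps gene names to counts. This is only a sketch of the idea:
`counts` and `record_selection` are hypothetical names, and the real aggregator
class may store its data differently.

```r
# Hypothetical stand-in for an aggregator: an environment of name -> count.
counts <- new.env()

# Add one to the count of every name in `selected`.
# The function is called for its side effect on `env`.
record_selection <- function(selected, env) {
  for (nm in selected) {
    old <- if (exists(nm, envir = env, inherits = FALSE)) {
      get(nm, envir = env)
    } else {
      0
    }
    assign(nm, old + 1, envir = env)
  }
  invisible(NULL)
}

# Three rounds of selection, as might happen in three cross-validation folds.
record_selection(c("geneA", "geneB"), counts)
record_selection(c("geneB", "geneC"), counts)
record_selection("geneB", counts)

# Retrieve the counts as a named vector, most frequently selected first.
tally <- unlist(mget(ls(counts), envir = counts))
sort(tally, decreasing = TRUE)
```

Environments are modified in place, so the caller sees the updated counts
without the function having to return them.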

```{r knnCV}
knnCV <- function(x, selectfun, cov, Agg, pselect = 0.01, scale = FALSE) {
    nc <- ncol(x)
    outvals <- rep(NA, nc)
    for (i in seq_len(nc)) {
        ## hold out sample i; select genes using the remaining samples only
        v1 <- x[, i]
        expr <- x[, -i]
        glist <- selectfun(expr, cov[-i], p = pselect)
        expr <- expr[glist, ]
        if (scale) {
            expr <- scale(expr)
            v1 <- as.vector(scale(v1[glist]))
        } else {
            v1 <- v1[glist]
        }
        out <- paste("iter ", i, " num genes= ", sum(glist), sep = "")
        print(out)
        ## record the selected genes in the aggregator (side effect)
        Aggregate(row.names(expr), Agg)
        ## with a single selected gene, expr is a vector rather than a matrix
        if (length(v1) == 1)
            outvals[i] <- knn(expr, v1, cov[-i], k = 5)
        else
            outvals[i] <- knn(t(expr), v1, cov[-i], k = 5)
    }
    return(outvals)
}
```

```{r aggregate1}
gfun <- function(expr, cov, p = 0.05) {
    f2 <- ttest(cov, p = p)
    ffun <- filterfun(f2)
    ## return the logical vector of selected genes
    genefilter(expr, ffun)
}
```

Next we show how to use this function on the dataset `geneData`.

```{r aggregate2, results="hide"}
library("class")
##scale the genes
##genescale is a slightly more flexible "scale"
##work on a subset -- for speed only
geneData <- genescale(exprs(sample.ExpressionSet)[1:75,], 1)
Agg <- new("aggregator")
testcase <- knnCV(geneData, gfun, sample.ExpressionSet$type,
                  Agg, pselect = 0.05)
```

```{r aggregate3}
sort(sapply(aggenv(Agg), c), decreasing=TRUE)
```

The environment of the aggregator `Agg`, accessed with `aggenv(Agg)`, contains,
for each gene, the number of times it was selected in the cross-validation.
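
The key point of `knnCV` is that gene selection is repeated inside each
leave-one-out iteration, so the held-out sample never influences which genes
are chosen. The base R sketch below shows that pattern on simulated data. It is
a simplified illustration under stated assumptions: `select_genes` is a
hypothetical selector, a 1-nearest-neighbour rule replaces `knn` with `k = 5`,
and a named integer vector replaces the aggregator.

```r
# Simulated data: 20 genes by 12 samples, two classes of six samples.
set.seed(3)
n_genes <- 20
n_samples <- 12
x <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
labels <- factor(rep(c("A", "B"), each = n_samples / 2))

# Make the first two genes informative about the class.
x[1:2, labels == "B"] <- x[1:2, labels == "B"] + 6

# Hypothetical selector: genes with a two-sample t-test p-value below p.
select_genes <- function(expr, lab, p = 0.05) {
  pv <- apply(expr, 1, function(g) {
    t.test(g[lab == "A"], g[lab == "B"])$p.value
  })
  pv < p
}

predicted <- character(n_samples)
times_selected <- setNames(integer(n_genes), rownames(x))

for (i in seq_len(n_samples)) {
  # Training data excludes sample i.
  train <- x[, -i, drop = FALSE]
  train_lab <- labels[-i]

  # Select genes from the training samples only.
  keep <- select_genes(train, train_lab)
  times_selected[keep] <- times_selected[keep] + 1L

  # 1-nearest neighbour: squared Euclidean distance on the selected genes.
  d <- colSums((train[keep, , drop = FALSE] - x[keep, i])^2)
  predicted[i] <- as.character(train_lab[which.min(d)])
}

# Proportion of held-out samples classified correctly.
mean(predicted == labels)

# Genes ordered by how often they were selected.
head(sort(times_selected, decreasing = TRUE))
```

Selecting genes once on the full dataset and then cross-validating only the
classifier would let the held-out sample leak into the selection step, which
tends to make the estimated accuracy too optimistic.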

# Session Information

The versions of R and of the packages loaded when generating this vignette
were:

```{r echo=FALSE}
sessionInfo()
```
207 changes: 0 additions & 207 deletions vignettes/howtogenefilter.Rnw

This file was deleted.