Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add new function and fix bugs #190

Merged
merged 15 commits into from
Jun 3, 2022
Merged

Conversation

MingLi-929
Copy link
Contributor

@MingLi-929 MingLi-929 commented May 14, 2022

update 2022.6.2

(1) combine gr and txdb parameter: users can pass self-made gr to txdb.
(2) combine plotPeakProf3() , plotPeakProf2() and plotPeakProf3() : Users can use plotPeakProf() to call plotPeakProf2() and plotPeakProf3().
The usage lay below.
------------------------------------------------------------------------------->
This pull request do following things:
(1) fix several bugs
(2) perfect the annotation of functions
(3) add test files for getTagMatrix() and plotTagMatrix()
(4) getBioRegion() can support UTR regions(3'UTR + 5'UTR), according to #189
(5) add new function makeBioRegionFromGranges() to support make windows from self-made granges object, according to #189.

  1. getBioRegion() can support UTR regions(3'UTR + 5'UTR)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(ChIPseeker)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
threeUTR <- getBioRegion(TxDb = txdb,
                         by = "3UTR",
                         type = "body")

length(threeUTR)

fiveUTR <- getBioRegion(TxDb = txdb,
                        by = "5UTR",
                        type = "body")

length(fiveUTR)

UTR <- getBioRegion(TxDb = txdb,
                    by = "UTR",
                    type = "body")

> length(UTR) == length(threeUTR) + length(fiveUTR)
[1] TRUE

Now, getBioRegion() support UTR regions

  1. add new function makeBioRegionFromGranges()
    According to Feature requests about plotPeakProf2 #189, users will need to investigate insulator or enhancer regions, which can not be attained through txdb object. We add new function makeBioRegionFromGranges() to support users investigate insulator or enhancer regions by using self-made granges object.

These self-made granges object served the same role as txdb object. For example, users can prepare a granges object containing enhancer regions. And makeBioRegionFromGranges() can use these self-made granges object to produce the windows("start_site", "end_site" and "body")

# we consider transcript region as enhancer region
# and make self-made granges object
# they can be the same in the form of granges object
enhancer <- transcripts(txdb)[1:5000,]

## we test three kinds of region, start_site, end_site and body
enhancer_body <- makeBioRegionFromGranges(gr = enhancer,
                                          by = "enhancer",
                                          type = "body")

enhancer_start <- makeBioRegionFromGranges(gr = enhancer,
                                           by = "enhancer",
                                           type = "start_site",
                                           upstream = 1000,
                                           downstream = 1000)

enhancer_end <- makeBioRegionFromGranges(gr = enhancer,
                                         by = "enhancer",
                                         type = "end_site",
                                         upstream = 1000,
                                         downstream = 1000)

Also, we modify the getTagMatrix() to call makeBioRegionFromGranges().

# previous getTagMatrix() can only use txdb object to produce function
# now it support the self-made granges object through gr parameter
peak <- getSampleFiles()[[4]]
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# make the window by makeBioRegionFromGranges()
enhancer <- transcripts(txdb)[1:5000,]
  
enhancer_body <- makeBioRegionFromGranges(gr = enhancer,
                                            by = "enhancer",
                                            type = "body")

# input the window made from self-made granges object
mt1 <- getTagMatrix(peak = peak,
                        windows = enhancer_body,
                        weightCol = "V5",
                        nbin = 800)

# without the window parameter
# input the self-made granges object directly
mt2 <- getTagMatrix(peak = peak,
                        gr = enhancer,
                        weightCol = "V5",
                        by = "enhancer",
                        type = "body",
                        upstream = rel(0.2),
                        downstream = rel(0.2),
                        nbin = 800)

# input self-made granges object to make the start_site windows
mt3 <- getTagMatrix(peak = peak,
                      weightCol = "V5",
                      gr = enhancer,
                      by = "gene",
                      type = "start_site",
                      upstream = 1000,
                      downstream = 1000)

plotPeakProf2() is also adjusted to support self-made granges object

plotPeakProf2(peak = peak,
                        by = "gene",
                        type = "body",
                        gr = enhancer)

plotPeakProf2(peak = peak,
                        by = "gene",
                        type = "body",
                        gr = enhancer,
                        upstream = rel(0.2),
                        downstream = rel(0.2),
                        nbin = 800)

@yuanlizhanshi
Copy link

Sorry to bother you.
Could it be possible to accept two gr object inputs in plotPeakProf2, I want to compare the distribution of H3K4me1 in the enhancer region and non-enhancer region.

@MingLi-929
Copy link
Contributor Author

According to issue#192, plotPeakProf3() is created to received two or more windows. It can not only received self-made granges object, but also supported to use windows from txdb at the same time. In a word, Users can use self-made granges object to compare with the windows from txdb.

PS: this function do not support to compare region in different types. It should all be
one of "start_site", "end_site" and "body".

the usage lay below:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(ChIPseeker)
library(GenomicRanges)

peak <- getSampleFiles()[[4]]
peak_list <- getSampleFiles()[4:5]
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## accept two granges object
## compare the peak in enhancer regions and non-enhancer

## self-made enhancer region in the form of granges object
enhancer <- transcripts(txdb)[1:5000,]

## self-made non-enhancer region in the form of granges object
non_enhancer <- unlist(fiveUTRsByTranscript(txdb))[1:5000]
  
gr <- list(enhancer,non_enhancer)

plotPeakProf3(peak = peak,
              conf = 0.95,
              gr = gr,
              by = c("enhancer","non-enhancer"),
              windows_name = c("enhancer","non-enhancer"),
              weightCol = "V5",
              type = "start_site",
              upstream = 1000,
              downstream = 1000)

1

## users can remove the ribbon plot by omit the conf parameter
plotPeakProf3(peak = peak,
              gr = gr,
              by = c("enhancer","non-enhancer"),
              windows_name = c("enhancer","non-enhancer"),
              weightCol = "V5",
              type = "start_site",
              upstream = 1000,
              downstream = 1000)

2

## you can also input a list of peaks
plotPeakProf3(peak = peak_list,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","non-enhancer"),
              windows_name = c("enhancer","non-enhancer"),
              weightCol = "V5",
              type = "start_site",
              upstream = 1000,
              downstream = 1000)

3

## if users want to compare the regions obtained 
## from txbd and self-made granges object
## uses can input gr and txdb in the same time
plotPeakProf3(peak = peak,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","gene"),
              windows_name = c("enhancer","gene"),
              weightCol = "V5",
              type = "start_site",
              upstream = 1000,
              downstream = 1000,
              TxDb = txdb)

4

## check the body region
plotPeakProf3(peak = peak,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","gene"),
              windows_name = c("enhancer","gene"),
              weightCol = "V5",
              type = "body",
              upstream = 1000,
              downstream = 1000,
              TxDb = txdb,
              nbin = 800)

5

plotPeakProf3(peak = peak,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","gene"),
              windows_name = c("enhancer","gene"),
              weightCol = "V5",
              type = "body",
              TxDb = txdb,
              nbin = 800)

6

plotPeakProf3(peak = peak,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","gene"),
              windows_name = c("enhancer","gene"),
              weightCol = "V5",
              type = "body",
              upstream = rel(0.2),
              downstream = rel(0.2),
              TxDb = txdb,
              nbin = 800)

7

plotPeakProf3(peak = peak_list,
              gr = gr,
              conf = 0.95,
              by = c("enhancer","gene"),
              windows_name = c("enhancer","gene"),
              weightCol = "V5",
              type = "body",
              upstream = rel(0.2),
              downstream = rel(0.2),
              TxDb = txdb,
              nbin = 800)

8

@yuanlizhanshi
Copy link

Thank you very much.

Wish you all the best in your scientific research.

@GuangchuangYu
Copy link
Member

the features are good, but I don't like fun1, fun2, fun3, etc.

We need to design a better API.

For the gr, passing the gr to the txdb parameter is OK, as we did this in annotatePeak.

@MingLi-929
Copy link
Contributor Author

update usage lay below:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(ChIPseeker)
library(GenomicRanges)

peak <- getSampleFiles()[[4]]
peak_list <- getSampleFiles()[4:5]
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## accept two granges object
## compare the peak in enhancer regions and non-enhancer

## self-made enhancer region in the form of granges object
enhancer <- transcripts(txdb)[1:5000,]

## self-made non-enhancer region in the form of granges object
non_enhancer <- unlist(fiveUTRsByTranscript(txdb))[1:5000]

plotPeakProf(peak = peak,
             conf = 0.95,
             by = c("enhancer","non-enhancer"),
             windows_name = c("enhancer","non-enhancer"),
             weightCol = "V5",
             type = "start_site",
             upstream = 1000,
             downstream = 1000,
             TxDb = list(enhancer,non_enhancer))


## users can remove the ribbon plot by omit the conf parameter
plotPeakProf(peak = peak,
             TxDb = list(enhancer,non_enhancer),
             by = c("enhancer","non-enhancer"),
             windows_name = c("enhancer","non-enhancer"),
             weightCol = "V5",
             type = "start_site",
             upstream = 1000,
             downstream = 1000)


## you can also input a list of peaks
plotPeakProf(peak = peak_list,
             TxDb = list(enhancer,non_enhancer),
             conf = 0.95,
             by = c("enhancer","non-enhancer"),
             windows_name = c("enhancer","non-enhancer"),
             weightCol = "V5",
             type = "start_site",
             upstream = 1000,
             downstream = 1000)


## if users want to compare the regions obtained 
## from txbd and self-made granges object
## uses can input gr and txdb in the same time
plotPeakProf(peak = peak,
             TxDb = list(enhancer,txdb),
             conf = 0.95,
             by = c("enhancer","gene"),
             windows_name = c("enhancer","gene"),
             weightCol = "V5",
             type = "start_site",
             upstream = 1000,
             downstream = 1000)


## check the body region
plotPeakProf(peak = peak,
             TxDb = list(enhancer,txdb),
             conf = 0.95,
             by = c("enhancer","gene"),
             windows_name = c("enhancer","gene"),
             weightCol = "V5",
             type = "body",
             upstream = 1000,
             downstream = 1000,
             nbin = 800)

plotPeakProf(peak = peak,
             TxDb = list(enhancer,txdb),
             conf = 0.95,
             by = c("enhancer","gene"),
             windows_name = c("enhancer","gene"),
             weightCol = "V5",
             type = "body",
             nbin = 800)


plotPeakProf(peak = peak,
             TxDb = list(enhancer,txdb),
             conf = 0.95,
             by = c("enhancer","gene"),
             windows_name = c("enhancer","gene"),
             weightCol = "V5",
             type = "body",
             upstream = rel(0.2),
             downstream = rel(0.2),
             nbin = 800)


plotPeakProf(peak = peak_list,
             TxDb = list(enhancer,txdb),
             conf = 0.95,
             by = c("enhancer","gene"),
             windows_name = c("enhancer","gene"),
             weightCol = "V5",
             type = "body",
             upstream = rel(0.2),
             downstream = rel(0.2),
             nbin = 800)


plotPeakProf(peak = peak,
             upstream = 1000,
             downstream = 1000,
             by = "gene",
             type = "start_site",
             TxDb = txdb,
             nbin = 800)

# test a list of peak files
plotPeakProf(peak = peak_list,
             upstream = 1000,
             downstream = 1000,
             by = "gene",
             type = "start_site",
             TxDb = txdb,
             nbin = 800)
# test body region
# without extension
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = txdb,
             nbin = 800)

# extend with rel object
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = txdb,
             upstream = rel(0.2),
             downstream = rel(0.2),
             nbin = 800)

# extend with actual number
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = txdb,
             upstream = 1000,
             downstream = 1000,
             nbin = 800)

  

plotPeakProf(peak = peak,
             upstream = 1000,
             downstream = 1000,
             by = "gene",
             type = "start_site",
             TxDb = enhancer,
             nbin = 800)

# test a list of peak files
plotPeakProf(peak = peak_list,
             upstream = 1000,
             downstream = 1000,
             by = "gene",
             type = "start_site",
             TxDb = enhancer,
             nbin = 800)
  
# test body region
# without extension
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = enhancer,
             nbin = 800)
  
# extend with rel object
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = enhancer,
             upstream = rel(0.2),
             downstream = rel(0.2),
             nbin = 800)
  
# extend with actual number
plotPeakProf(peak = peak_list,
             by = "gene",
             type = "body",
             TxDb = enhancer,
             upstream = 1000,
             downstream = 1000,
             nbin = 800)

@GuangchuangYu GuangchuangYu merged commit 0e7ba84 into YuLab-SMU:master Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants