Looking for Precompiled CMAP and L1000 data for v.2.2.0 #73

Open
Swilson6 opened this issue Nov 17, 2020 · 13 comments

Comments

@Swilson6

I started using this tool specifically for its perturbation analysis features. The tutorial uses CMAPsmall, but when I call availablePSets() the CMAP and L1000 datasets aren't listed.

[Screenshot: Screen Shot 2020-11-17 at 10 58 31 AM]

As an alternative, I tried downloading the CMAP and L1000 .Rdata files from your website (https://www.pmgenomics.ca/bhklab/datasets). However, when I run the drugPerturbationSig command I get the error:

Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘metadata’ for signature ‘"ExpressionSet"’

From the CoreGx vignette (https://bioconductor.org/packages/release/bioc/vignettes/CoreGx/inst/doc/coreGx.html) I understand that the CMAP.Rdata file I downloaded is no longer compatible with the current methods, so I am looking for either:

  1. CMAP and L1000 .Rdata files that work with PharmacoGx v2.2.0
  2. Help understanding how to convert the datasets downloaded from the website (https://www.pmgenomics.ca/bhklab/datasets) so that they work with the current release.

@ChristopherEeles
Contributor

Hey @Swilson6,

In a recent release we added a new parameter to availablePSets(), canonical. The parameter defaults to TRUE, in which case only the most correct and up-to-date PSets are shown. Since our versions of CMAP and L1000 need to be updated, they aren't included in this list.

To see ALL available PSets, please use:
availablePSets(canonical=FALSE)
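
As a minimal sketch of how the full table could be narrowed down to perturbation datasets: the toy data frame below is a hypothetical stand-in for the output of availablePSets(canonical=FALSE). It assumes a `Dataset Name` column and a `type` column, which are the two columns downloadPertSig relies on; the rows and the helper name perturbationPSets are illustrative only.

```r
# Hypothetical stand-in for availablePSets(canonical=FALSE); the rows are
# illustrative and do not reflect the real table.
pSetTable <- data.frame(
    `Dataset Name` = c("CMAP_2016", "ExampleSens_2020", "ExampleBoth_2019"),
    type = c("perturbation", "sensitivity", "both"),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# Hypothetical helper: names of the datasets that carry perturbation data.
perturbationPSets <- function(tbl) {
    keep <- tbl[["type"]] %in% c("perturbation", "both")
    tbl[["Dataset Name"]][keep]
}

perturbationPSets(pSetTable)
```

If the real table keeps these column names, the same helper should work on it directly: perturbationPSets(availablePSets(canonical=FALSE)).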

Also, all of our PSets are generated using ORCESTRA, so you can alternatively download PSets there, or request that custom PSets be created.

Best,
Chris

@Swilson6
Author

Hello Chris,

Thanks for the reply! The availablePSets(canonical=FALSE) command listed the older PSets, including CMAP_2016, but it doesn't list the L1000 dataset. L1000 isn't listed on the ORCESTRA website either. Based on my searches over the last couple of days, I believe L1000 was previously available for analysis. I understand that it needs to be updated, but would it be possible to make the archived L1000 available again?

Best,
Swilson6

@ChristopherEeles
Contributor

Hi @Swilson6,

I will speak with my colleagues about getting you a copy of L1000, as well as figure out why it is no longer available.

I'll be in touch shortly.

Best,
Chris

@Swilson6
Author

Hello @ChristopherEeles ,

I was able to download CMAP_2016 using the command:

CMAP.2016<-downloadPSet("CMAP_2016")

However, I am unable to download the drug signatures with the command:

Cmap.pertub <- downloadPertSig("CMAP_2016")

It reports: "Unknown Dataset. Please use the availablePSet function for the table of available PharamcoSets."

Is there an additional parameter that I am missing?

Best,
Swilson6

@ChristopherEeles
Contributor

Hello @Swilson6,

Thanks for catching that! Looks like we forgot to update downloadPertSig to reflect the changes in availablePSets. I have pushed a fix to master. I will also be updating our Bioconductor release, but that will take a few days to get included.

Alternatively, here is the updated code if you don't want to do a reinstall.

#' Download Drug Perturbation Signatures
#' 
#' This function allows you to download an array of drug perturbation
#' signatures, as would be computed by the \code{drugPerturbationSig} function,
#' for the available perturbation \code{PharmacoSets}. This function allows the
#' user to skip these very lengthy calculation steps for the datasets available,
#' and start their analysis from the already computed signatures
#' 
#' @examples
#' if (interactive()){
#' downloadPertSig("CMAP")
#' }
#'  
#' @param name A \code{character} string, the name of the PharmacoSet for which
#'   to download signatures. The name should match the names returned in the
#'   `Dataset Name` column of `availablePSets(canonical=FALSE)`.
#' @param saveDir A \code{character} string with the folder path where the
#'   PharmacoSet should be saved. Defaults to \code{"./PSets/Sigs/"}. Will
#'   create directory if it does not exist.
#' @param myfn \code{character} string, the file name to save the dataset under
#' @param verbose \code{bool} Should status messages be printed during download.
#'   Defaults to TRUE.
#'
#' @return An array type object containing the signatures
#'
#' @export
#' @import downloader
downloadPertSig <- function(name, saveDir=file.path(".", "PSets", "Sigs"),
    myfn=NULL, verbose=TRUE) {


    pSetTable <- availablePSets(canonical=FALSE)

    whichx <- match(name, pSetTable[,1])
    if (is.na(whichx)){
        stop('Unknown Dataset. Please use the availablePSet function for the table of available PharamcoSets.')
    }
    if (!pSetTable[whichx,"type"] %in% c("perturbation", "both")){
        stop('Signatures are available only for perturbation type datasets')
    }

    if(!file.exists(saveDir)) {
        dir.create(saveDir, recursive=TRUE)
    }

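    # The remote file name is determined by the dataset name; keep a
    # user-supplied local file name instead of silently overwriting it.
    remotefn <- paste0(name, "_signatures.RData")
    if (is.null(myfn)) {
        myfn <- remotefn
    }

    downloader::download(file.path(
        "https://www.pmgenomics.ca/bhklab/sites/default/files/downloads", remotefn),
        destfile=file.path(saveDir, myfn), quiet=!verbose, mode='wb')
    # load() returns the name of the restored object; get() fetches the object
    sig <- load(file.path(saveDir, myfn))
    return(get(sig))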
}
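
The last two lines of the function rely on the fact that load() returns the name of the restored object rather than the object itself. A minimal sketch of that pattern, using a toy array and a temporary file as hypothetical stand-ins for the downloaded signatures (no network access or PharmacoGx needed):

```r
# Save a toy object to a temporary .RData file (stand-in for a downloaded file).
toySignatures <- array(0, dim = c(2, 2, 1))
rdataPath <- file.path(tempdir(), "toy_signatures.RData")
save(toySignatures, file = rdataPath)
rm(toySignatures)

# load() restores the object into the given environment and returns its
# name as a character vector, not the object itself.
env <- new.env()
objName <- load(rdataPath, envir = env)

# get() turns that name back into the object.
restored <- get(objName, envir = env)
dim(restored)
```

Note that get() expects a single name, so this pattern assumes the .RData file contains exactly one object.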

Also, RE: L1000. @iamsinht will contact you directly regarding the data.

Best,
Christopher Eeles

@Swilson6
Author

Hello @ChristopherEeles

Thanks for letting me know. I will keep an eye out for the updated release, as I'm not confident enough to alter the code myself. I will try the drugPerturbationSig() command in the meantime.

Best,
Swilson6

@ChristopherEeles
Contributor

Hi @Swilson6,

You can just copy and paste the code into your console and then use the updated function. You could also install the updated version of PharmacoGx using devtools::install_github('bhklab/PharmacoGx'). I am just ensuring the changes didn't break anything before pushing to Bioconductor. I expect they will be available by Saturday (look for version 2.2.1).

Best,
Chris

@Swilson6
Author

Hello @ChristopherEeles

I downloaded the updated version 2.2.1 and I am unable to download the Perturbation signature using the command:

Cmap.pertub <- downloadPertSig("CMAP_2016")

I also tried copying and pasting the code into my console, and that didn't work either. Are there additional parameters I'm not seeing?

Best
Swilson6

@ChristopherEeles
Contributor

Hi @Swilson6,

I will look into this and get back to you shortly. Sorry for the inconvenience.

Best,
Chris

@ChristopherEeles
Contributor

I am debugging the function. In the meantime you can download the file at this URL: https://www.pmgenomics.ca/bhklab/sites/default/files/downloads/CMAP_signatures.RData

@ChristopherEeles
Contributor

Hi @Swilson6,

I fixed the function and have pushed to both the release and development branches of Bioconductor. The updates should be available within 2 days; look for version 2.2.3.
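
A minimal sketch for checking whether an installed version is at least 2.2.3. The helper name hasFix is hypothetical; package_version() is used so that versions are compared component by component rather than as plain strings.

```r
# Hypothetical helper: TRUE if `installed` is at least `required`.
hasFix <- function(installed, required = "2.2.3") {
    package_version(installed) >= package_version(required)
}

hasFix("2.2.1")  # FALSE
hasFix("2.2.3")  # TRUE

# With the package installed, the same check on the real version:
if (requireNamespace("PharmacoGx", quietly = TRUE)) {
    hasFix(as.character(utils::packageVersion("PharmacoGx")))
}
```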

Best,
Chris

@Swilson6
Author

Hello @ChristopherEeles,

I'll keep an eye out for it.

In the meantime I downloaded the file you posted earlier and tried running it with HDAC_genes as in the tutorial, but I got this error:

Error in CoreGx::connectivityScore(x, y, method, nperm, nthread, gwc.method, :
Row names of x and y are either missing or have no intersection

The row names of drug.perturbation@.Data in the file you shared have the format "ENSG00000069535_at", while those of the drugPerturbationSig output for CMAPsmall have the format "ENSG00000069535". Is this the reason for the error?

Best,
Stephen

@ChristopherEeles
Contributor

Hi @Swilson6,

Yes, the source of the error is definitely the row name format. Our perturbation signatures have not been updated for quite a while, and some changes in PharmacoGx have occurred since then. We are currently looking into updating the signatures, but it may take some time.

For now you can fix this issue with something like this: rownames(object) <- gsub('_at', '', rownames(object)).
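
A minimal sketch of that fix on toy data. The plain array sigs and the named vector query below are hypothetical stand-ins for the downloaded signatures and a query gene signature; the second gene ID and the drug and statistic names are made up for illustration. This variant uses sub() with the pattern anchored by `$`, so the suffix is only removed at the end of each ID.

```r
# Toy stand-in for a perturbation signature array: genes x drugs x statistics.
sigs <- array(
    rnorm(2 * 2 * 2),
    dim = c(2, 2, 2),
    dimnames = list(
        c("ENSG00000069535_at", "ENSG00000100030_at"),
        c("drugA", "drugB"),
        c("estimate", "pvalue")
    )
)
# Toy query signature keyed by plain Ensembl gene IDs.
query <- c(ENSG00000069535 = 1, ENSG00000100030 = -1)

# Before the fix the row names share nothing with the query names, which is
# the situation the connectivityScore error message describes.
length(intersect(rownames(sigs), names(query)))  # 0

# Strip the "_at" suffix from the first dimension's names.
rownames(sigs) <- sub("_at$", "", rownames(sigs))

length(intersect(rownames(sigs), names(query)))  # 2
```

Checking the intersection before calling the connectivity scoring function is a cheap way to confirm that the identifiers line up.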

Best,
Chris
