Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with intersectPSets #96

Open
bhaibeka opened this issue Jul 28, 2021 · 10 comments
Open

Problem with intersectPSets #96

bhaibeka opened this issue Jul 28, 2021 · 10 comments
Labels

Comments

@bhaibeka
Copy link
Member

PharmacoGx version 2.4.0

CCLE vs GDSC1 (similar to Haibe-Kains Nature 2013)

ccle <- PharmacoGx::downloadPSet("CCLE_2015")
gdsc1 <- PharmacoGx::downloadPSet("GDSC_2020(v1-8.2)")

ccle.gdsc1 <- PharmacoGx::intersectPSet(pSets=list("CCLE"=ccle, "GDSC1"=gdsc1))

returns

Intersecting large PSets may take a long time ...
Error in which.min(abs(as.numeric(doses[[i]]) - min.dose)):max(which(abs(as.numeric(doses[[i]]) - :
argument of length 0
In addition: Warning messages:
1: In min(as.numeric(doses[[i]]), na.rm = TRUE) :
no non-missing arguments to min; returning Inf
2: In max(as.numeric(doses[[i]]), na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
3: In min(abs(as.numeric(doses[[i]]) - max.dose), na.rm = TRUE) :
no non-missing arguments to min; returning Inf
4: In max(which(abs(as.numeric(doses[[i]]) - max.dose) == min(abs(as.numeric(doses[[i]]) - :
no non-missing arguments to max; returning -Inf

@bhaibeka bhaibeka added the bug label Jul 28, 2021
@bhaibeka
Copy link
Member Author

looks like the intersection on concentrations causes the error

@ChristopherEeles
Copy link
Contributor

I am currently working on updating this method. Expect a fix shortly.

Best,
Chris

@p-smirnov
Copy link
Contributor

@bhaibeka I agree that it is a bug, in that it should fail more gracefully than this. The method fundamentally does not work on datasets with replicates, since we need 1-1 mapping of experiments across studies.

The only way I can think of salvaging is to first "collapse replicates", and then allow intersection. What are your thoughts on this?

@bhaibeka
Copy link
Member Author

I would collapse the replicate and add a clear warning to inform the users

@ChristopherEeles
Copy link
Contributor

ChristopherEeles commented Jul 28, 2021

So with the introduction of the LongTable as the class which stores sensitivity profiles, I hadn't intended to subset on dose (concentration) at all in intersect-methods. They would instead subset on features, samples and compounds (at least at the PSet level). The user could then extract the LongTable and select whichever dose they like with the very straightforward subset methods.

I can think of lots of situations where the dose is non even comparable. For example, if two PSets use different dose units. If the magnitude of those units differs (i.e., micromolar vs molar). Do we have standards in place around dose formatting?

Do you think preserving the intersection on dose is important despite these limitations? Or is it fine to have that as a secondary operation done by the user on the LongTable.

From a design perspective, I think the problem with a lot of our methods is that they try to do everything. Programming best practices indicate a function should do one operation and users are responsible for composing them into a pipeline. This is even easier now in R 4.1 with the base R pipe |>.

@ChristopherEeles
Copy link
Contributor

So what I imagine would look like:
intersect(PSet1, PSet) |> subsetLongTable(.(dose1 == dose2, , by=.(drug1id, drug2id)), )

@p-smirnov
Copy link
Contributor

Im not exactly sure I understand what this subsetLongTable function is doing in this case.

I am 100% for making the collapsing of replicates within a PSet and the intersection of PSets into two (or more) different steps, because I think their is a lot of choices on how to collapse replicates that are possible

@ChristopherEeles
Copy link
Contributor

ChristopherEeles commented Jul 28, 2021

In this case, subsetLongTable is a wrapper function around the subset method for a LongTable that works on a PSet. So you can subset a LongTable inside a PSet without extracting it.

If you mean the groupby, I'm selecting instances where the doses are equal for each drug combination.

@ChristopherEeles
Copy link
Contributor

ChristopherEeles commented Jul 28, 2021

setMethod('subsetLongTable', signature(x='PSet'), function(x, subset, select) { 
    LT <- subset(sensitivitySlot(x), substitute(subset), substitute(select))
    sensitivitySlot(x) <- LT
    harmonize(x) # drop drugs from the rest of the PSet if they are lost in the subset
    return(x)
} 

Or something like that.

@ChristopherEeles
Copy link
Contributor

ChristopherEeles commented Jul 28, 2021

As for collapsing replicates. There is already a method like that for MutliAssayExperiment. For LongTable we would do it through an assay aggregate and let the backpressure take care of the rest (i.e., when you subset an assay, if you lose a drug or sample, it is automatically dropped from the LongTable row- or colData.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants