impute gates that failed the QA check #10

mikejiang · 2013-06-27T00:10:02Z

A new API will take a list of failed samples as input and return a list of reference samples:

refSamples <- .nearestSamples(gs, "MTG_gate", failedSamples)

Once the reference samples are selected, it should be fairly straightforward to impute the gates with the existing APIs (getGate and setGate):

#extract reference gates
refGates <- sapply(refSamples,function(i)getGate(gs[[i]],"MTG_gate"))

#impute the gates
setGate(gs[failedSamples],"MTG_gate",refGates)


raphg · 2013-06-27T05:27:59Z

Really nice Mike. I can't wait to see some real tests. Perhaps on the
Newell data.


mikejiang · 2013-06-28T16:56:39Z

By using flowCore::expressionFilter

expression1 <- paste0("`",params,"`>0")
ef <- char2ExpressionFilter(expression1)
> ef
expression filter '`<Blue F 525/50-A>`>0' evaluating the expression:
`<Blue F 525/50-A>`>0

I am able to gate out the outlier events (below zero)

tData <- Subset(tData,ef)

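And here is the new EMD distance, which seems more reasonable:
[image: rplot001] https://f.cloud.github.com/assets/1385649/723264/ac65f00c-e013-11e2-9883-0cc2e00a4411.png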

raphg · 2013-06-28T17:16:36Z

Yes, it does look more reasonable. I am surprised that outliers make such a
big difference, because the density is so low.
I just did a quick test comparing emd and ks.test, and the latter seems
more robust and is also faster.
Can you try it without filtering outliers to see what it gives?
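
One way to see why a handful of low-density outliers can move an earth mover's distance so much while barely affecting K-S is the base-R sketch below. It is only an illustration under stated assumptions: `emd1d` is a hypothetical helper (not the package's emd implementation) that uses the fact that, for two equal-size one-dimensional samples, the earth mover's distance is the mean absolute difference between the sorted values. The data are simulated.

```r
set.seed(7)
n <- 1000
x <- rnorm(n)
y <- rnorm(n)

# Replace 1% of the events in y with extreme negative values (distinct, so no ties)
y_out <- y
y_out[1:10] <- -500 - seq_len(10)

# Hypothetical helper: 1-D earth mover's distance for equal-size samples
emd1d <- function(a, b) {
  stopifnot(length(a) == length(b))
  mean(abs(sort(a) - sort(b)))
}

# Two-sample Kolmogorov-Smirnov statistic (largest gap between empirical CDFs)
ks_stat <- function(a, b) unname(ks.test(a, b)$statistic)

emd_clean <- emd1d(x, y)
emd_dirty <- emd1d(x, y_out)
ks_clean  <- ks_stat(x, y)
ks_dirty  <- ks_stat(x, y_out)

print(c(emd_clean = emd_clean, emd_dirty = emd_dirty))
print(c(ks_clean = ks_clean, ks_dirty = ks_dirty))
```

The EMD weights each outlier by how far it has to be moved, so ten events at around -500 dominate the result. The K-S statistic only depends on the fraction of events involved, so moving 1% of the events can change it by at most 0.01.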


mikejiang · 2013-06-28T17:44:46Z

Earth mover's distance (method = "em") with the outlier filter:

> system.time(refSamples <- .nearestSamples(gs, "MTG_gate", failedSamples, method = "em"))
   user  system elapsed 
 30.002   0.000  29.466

And ks.test without the outlier filter is indeed faster and more robust:

> system.time(refSamples <- .nearestSamples(gs, "MTG_gate", failedSamples, method = "ks.test"))
   user  system elapsed 
  4.096   0.000   4.313

[image: rplot001] https://f.cloud.github.com/assets/1385649/723492/8458c868-e019-11e2-9739-732f6923196a.png

raphg · 2013-06-28T17:45:45Z

Great.


gfinak · 2013-06-28T17:49:27Z

Nice!


ramhiser · 2013-06-28T22:35:18Z

@mikejiang This is nice.

While I think the K-S approach deserves some credit here, it will give some peculiar results in other cases. Consider, for example, this contrived example: https://gist.github.com/ramey/5888640

In this example, we are trying to determine if x1 is nearer to x2 or x3 using the K-S criterion. Notice that there is no overlap in the figure of the empirical CDFs, but in the density plot, there is some overlap. Because the K-S statistic simply finds the largest difference in empirical CDFs, samples x2 and x3 are equidistant from x1 because there is no overlap of the samples. Even if there were, it would be relatively straightforward to contrive an example where we would choose x3 as often as x2.
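
The saturation effect described here can be reproduced with a few lines of base R. This is a separate, simplified sketch with simulated data (it is not the code from the gist, and the samples here do not overlap at all): once two samples are fully separated, the K-S statistic is 1 no matter how far apart they are.

```r
set.seed(42)
x1 <- runif(200, 0, 1)    # sample of interest
x2 <- runif(200, 2, 3)    # close to x1, but with no overlap
x3 <- runif(200, 50, 51)  # far from x1, also with no overlap

# Two-sample K-S statistic: largest gap between the two empirical CDFs
ks_stat <- function(a, b) unname(ks.test(a, b)$statistic)

d12 <- ks_stat(x1, x2)
d13 <- ks_stat(x1, x3)

# Both statistics hit the maximum possible value, so x2 and x3 look
# equally distant from x1 under this criterion
print(c(d12 = d12, d13 = d13))
```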

With this in mind, I am in favor of computing some divergence (e.g., Kullback-Leibler) between the estimated densities. I will concoct something soon.

raphg · 2013-06-28T22:39:48Z

In this case, 1 is the largest distance possible, and it makes sense given
that there is no overlap between any of the distributions.
So it's doing the right thing, that is all three distributions are as far
as they can be.

This being said, I am in favor of doing more comparison. John, thanks for
looking at it.


mikejiang · 2015-10-09T22:42:16Z

Added an option to specify which samples passed QA and can serve as references.
By default, all samples other than the failed ones are used, but sometimes it is helpful to narrow this down to a few really good samples.

mikejiang · 2017-04-07T19:26:26Z

@gfinak, 2d gate imputation is added through emd2d as you suggested. It is very slow though, which is one of the reasons @raphg asked me to switch to ks.test for 1d gates (see the previous discussion in this thread).
I've added parallelization through mc.cores, and hopefully it can finish the job in a reasonable time for you.

Here is the example for matching one bad sample against 8 good samples

> system.time(res <- nearestSamples(gs, node = "CD4", failed = "1349_3_Tcell_A06.fcs", gridsize = c(70, 70), mc.cores = 8))
Finding reference sample for: 1349_3_Tcell_A06.fcs
    user   system  elapsed 
1123.817    1.582  160.287

We can fiddle with gridsize for bkde2D to find the optimal default setting (a trade-off between speed and accuracy).
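
As a rough illustration of the mc.cores-style parallelization mentioned above, the sketch below spreads the per-reference distance computations over `parallel::mclapply`. It is a minimal sketch with simulated 1d data and a K-S statistic standing in for the real distance; the variable names are made up for the example and this is not the package's implementation. Forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

set.seed(3)
failed <- rnorm(300, mean = 1)              # events from the failed sample
passed <- list(s1 = rnorm(300, mean = 0),   # candidate reference samples
               s2 = rnorm(300, mean = 1),
               s3 = rnorm(300, mean = 4))

# Stand-in distance: two-sample K-S statistic
ks_stat <- function(a, b) unname(ks.test(a, b)$statistic)

# mclapply relies on forking, which Windows does not support
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One distance per reference sample, computed in parallel
d <- unlist(mclapply(passed, function(p) ks_stat(failed, p), mc.cores = cores))

# The nearest reference is the one with the smallest distance
best <- names(d)[which.min(d)]
print(d)
print(best)
```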

mikejiang · 2017-04-10T20:55:17Z

> pData(gs)[, 3:4]
                      Sample Replicate
12828_1_Tcell_A01.fcs  12828         1
12828_2_Tcell_A02.fcs  12828         2
12828_3_Tcell_A03.fcs  12828         3
1349_1_Tcell_A04.fcs    1349         1
1349_2_Tcell_A05.fcs    1349         2
1349_3_Tcell_A06.fcs    1349         3
1369_1_Tcell_A07.fcs    1369         1
1369_2_Tcell_A08.fcs    1369         2
1369_3_Tcell_A09.fcs    1369         3

Earth mover's distance on 2d data is pretty expensive: it takes more than 2 minutes even with parallel computing on 6 cores.

> system.time(res <- nearestSamples(gs, node = node
+                                   , failed = failed
+                                   , passed = passed
+                                   , gridsize = c(50, 50)
+                                    , method = "em"
+                                   , mc.cores = 6
+                                   )
+             )
   user  system elapsed 
657.522  10.993 156.177
> res
  12828_1_Tcell_A01.fcs    1349_1_Tcell_A04.fcs    1369_1_Tcell_A07.fcs 
"12828_2_Tcell_A02.fcs"  "1349_3_Tcell_A06.fcs"  "1369_2_Tcell_A08.fcs"

As @raphg suggested, we could use ks.test on each dimension and sum the test statistics. Here is the result: it finishes the job within seconds and matches the correct replicates!

> system.time(res <- nearestSamples(gs, node = node
+                                   , failed = failed
+                                   , passed = passed
+                                   , method = "ks.test"
+                                   , mc.cores = 6
+                                   )
+             )
   user  system elapsed 
  2.391   1.529   1.179 
> res
  12828_1_Tcell_A01.fcs    1349_1_Tcell_A04.fcs    1369_1_Tcell_A07.fcs 
"12828_3_Tcell_A03.fcs"  "1349_2_Tcell_A05.fcs"  "1369_2_Tcell_A08.fcs"
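
The per-dimension idea (run ks.test on each channel and sum the statistics) can be sketched in base R as follows. This is only an illustration under stated assumptions: `ks_distance` and `nearest_reference` are hypothetical helper names, the event matrices are simulated, and the actual nearestSamples code may differ.

```r
# Hypothetical helper: sum of per-channel two-sample K-S statistics
ks_distance <- function(a, b) {
  stopifnot(ncol(a) == ncol(b))
  sum(vapply(seq_len(ncol(a)), function(j) {
    unname(ks.test(a[, j], b[, j])$statistic)
  }, numeric(1)))
}

# Hypothetical helper: for each failed sample, return the name of the
# passed sample with the smallest summed K-S distance
nearest_reference <- function(failed, passed) {
  vapply(failed, function(f) {
    d <- vapply(passed, function(p) ks_distance(f, p), numeric(1))
    names(passed)[which.min(d)]
  }, character(1))
}

# Simulated two-channel event matrices
set.seed(1)
sim <- function(mu, n = 500) cbind(rnorm(n, mu[1]), rnorm(n, mu[2]))
passed <- list(refA = sim(c(0, 0)), refB = sim(c(3, 3)))
failed <- list(bad1 = sim(c(0.1, 0)), bad2 = sim(c(3, 2.9)))

res <- nearest_reference(failed, passed)
print(res)
```

As in the output above, the result is a character vector named by failed sample, with the chosen reference sample as the value.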

mikejiang pushed a commit that referenced this issue Jun 27, 2013

add .nearestSamples #10

9e3585c

mikejiang pushed a commit that referenced this issue Jun 28, 2013

exclude marginal events before den-distance calcaulation. #10

de23fcc

mikejiang pushed a commit that referenced this issue Jul 26, 2013

add ks.test as default distance method. #10

05d3969

mikejiang closed this as completed Aug 13, 2013

mikejiang reopened this Oct 9, 2015

mikejiang pushed a commit that referenced this issue Oct 9, 2015

#10

365b07f

mikejiang pushed a commit that referenced this issue Apr 7, 2017

support imputing 2d gate. #10

e0fd41a

mikejiang pushed a commit that referenced this issue Apr 7, 2017

strip names from 'failed' vector #10

677bdf9

mikejiang pushed a commit that referenced this issue Apr 7, 2017

import mclapply #10

74db5c9

mikejiang pushed a commit that referenced this issue Apr 10, 2017

ks.test for 2d gates #10

35744d0
