inefficient keyword<- method #44

mikejiang · 2019-01-31T20:02:47Z

User anders.tondell wrote Question: ncdfFlowSet with dim=3:

I am currently working with an ncdfFlowSet built from a large number of FCS files (>1500).
I need to set several keywords to NA, but changing keywords on this large ncdfFlowSet is extremely time-consuming (~30 sec. per sample). However, on subsets of the ncdfFlowSet object with 100 samples (FCS files), the time used per sample is 1/10 as long.

keyword(ncfs) <- list(`EXPERIMENT NAME` = NA, `$SRC` = NA, `$FIL` = NA, `$FILNAME` = NA)

Is there any way to make this code more efficient? The documentation on read.ncdfFlowSet is a bit scarce. What are the pros and cons of 'dim=3'?

Thanks in advance for any suggestions,

Anders T

mikejiang · 2019-01-31T20:10:44Z

dim = 3 was the legacy data storage format (thus not recommended) and is not relevant to this issue.
To address the speed problem, we will need to overload the default setReplaceMethod("keyword", signature=c("flowSet", "list")) with a dedicated method for ncdfFlowSet, to avoid the unnecessary disk IO incurred by accessing events data.
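
As a rough illustration of why a dedicated method helps, here is a minimal S4 sketch in base R. It is not the ncdfFlow implementation: the classes MemSet and DiskSet, the generic kw<-, and the helper load_events are all hypothetical stand-ins, and the "disk read" is only a counter. It shows a subclass-specific replacement method that updates metadata without touching event data.

```r
library(methods)

# Counter standing in for expensive disk reads of event data (assumption).
io <- new.env()
io$reads <- 0L

# Toy classes: MemSet is the generic container, DiskSet the disk-backed subclass.
setClass("MemSet", representation(keywords = "list", files = "character"))
setClass("DiskSet", contains = "MemSet")

# Hypothetical helper that simulates loading one sample's events from disk.
load_events <- function(file) {
  io$reads <- io$reads + 1L
  matrix(0, nrow = 1, ncol = 1)
}

setGeneric("kw<-", function(object, value) standardGeneric("kw<-"))

# Default method: materialises every sample before updating its keywords.
setReplaceMethod("kw", signature("MemSet", "list"), function(object, value) {
  for (f in object@files) {
    load_events(f)
    object@keywords[[f]] <- modifyList(object@keywords[[f]], value)
  }
  object
})

# Dedicated method: updates only the keyword lists, no event access.
setReplaceMethod("kw", signature("DiskSet", "list"), function(object, value) {
  object@keywords <- lapply(object@keywords, modifyList, val = value)
  object
})

files <- c("a.fcs", "b.fcs", "c.fcs")
kws <- setNames(lapply(files, function(f) list(`$FIL` = f, `$TOT` = "100")), files)

mem <- new("MemSet", keywords = kws, files = files)
disk <- new("DiskSet", keywords = kws, files = files)

kw(mem) <- list(`$FIL` = NA)
reads_default <- io$reads    # simulated reads triggered by the default method

io$reads <- 0L
kw(disk) <- list(`$FIL` = NA)
reads_dedicated <- io$reads  # simulated reads triggered by the dedicated method
```

Because S4 dispatch picks the most specific signature, the DiskSet method takes precedence over the inherited one without any change to calling code.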

SamGG · 2019-01-31T21:13:20Z

Hi,
If you split the set of 1500 FCS files into 15 sets, wouldn't that be efficient enough?
My two cents...
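
The splitting idea can be sketched as plain index chunking in base R. The sample names and chunk size below are placeholders, and the commented loop at the end is an untested assumption about how the chunks might be applied to a real ncdfFlowSet.

```r
# Hypothetical stand-in: 1500 sample names instead of a real ncdfFlowSet.
sample_names <- sprintf("sample_%04d.fcs", 1:1500)
chunk_size <- 100

# Assign each sample to a chunk id, then split the names by that id.
chunk_id <- ceiling(seq_along(sample_names) / chunk_size)
chunks <- split(sample_names, chunk_id)

n_chunks <- length(chunks)

# With a real ncdfFlowSet `ncfs`, each chunk could then be handled in turn, e.g.
#   for (nms in chunks) { sub <- ncfs[nms]; keyword(sub) <- list(`$FIL` = NA) }
# Untested assumption: this thread does not confirm that changes made on a
# subset are visible in `ncfs` itself.
```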

mikejiang · 2019-01-31T22:15:34Z

I've overloaded both keyword and keyword<-. Here are the timings before the patch:

fs <- GvHD[pData(GvHD)$Patient %in% 6:7][1:4]
suppressMessages(ncfs <- ncdfFlowSet(fs))

Unit: microseconds
                  expr      min       lq      mean   median       uq      max neval
   keyword(fs, "$TOT")  242.661  251.696  288.7938  260.757  281.541  511.322    10
 keyword(ncfs, "$TOT") 2641.320 2668.504 2735.5251 2685.657 2739.053 3150.761    10

                                    expr       min       lq       mean    median        uq
   keyword(fs) <- list(`$FILENAME` = NA)   436.772   448.92   538.1075   584.194   591.762
 keyword(ncfs) <- list(`$FILENAME` = NA) 35974.567 36208.98 37112.2368 36615.207 37122.648

And here are the results with the latest patch 1dd94c1:

Unit: microseconds
                  expr     min      lq      mean   median      uq       max neval
   keyword(fs, "$TOT") 241.601 249.573 7774.8146 252.9335 271.164 75441.778    10
 keyword(ncfs, "$TOT") 162.750 167.529  297.6278 180.9890 192.738  1350.745    10

                                    expr     min      lq     mean   median      uq      max
   keyword(fs) <- list(`$FILENAME` = NA) 423.215 431.601 545.9565 437.4825 452.387 1506.077
 keyword(ncfs) <- list(`$FILENAME` = NA) 116.812 122.090 748.6854 129.7095 151.257 5588.899

mikejiang pushed a commit that referenced this issue Jan 31, 2019

#44

1dd94c1

mikejiang closed this as completed Jan 31, 2019

mikejiang mentioned this issue Jan 31, 2019

Complete cytoframe/cytoset accessor/mutator overloads RGLab/flowWorkspace#275

Closed

mikejiang pushed a commit that referenced this issue Feb 22, 2019

import keyword from flowCore. #44

bee1f4e

mikejiang pushed a commit that referenced this issue Jun 3, 2019

#44

5b1e0c6
