inefficient keyword<- method #44

Closed
mikejiang opened this issue Jan 31, 2019 · 3 comments

Comments

@mikejiang
Member

User anders.tondell wrote Question: ncdfFlowSet with dim=3:

I am currently working with an ncdfFlowSet built from a large number of FCS files (>1500).
I need to set several keywords to NA, but changing keywords on this large ncdfFlowSet is extremely time-consuming (~30 s per sample). However, on subsets of the ncdfFlowSet object with 100 samples (FCS files), the time used per sample is 1/10 as long.

keyword(ncfs) <- list(`EXPERIMENT NAME` = NA, `$SRC` = NA, `$FIL` = NA, `$FILNAME` = NA)

Is there any way to make this code more efficient? The documentation on read.ncdfFlowSet is a bit scarce. What are the pros and cons of 'dim=3'?

Thanks in advance for any suggestions,

Anders T

@mikejiang
Member Author

dim = 3 was the legacy data storage format (thus not recommended) and is not relevant to this issue.
To address the speed problem, we need to overload the default setReplaceMethod("keyword", signature=c("flowSet", "list")) with a dedicated method for ncdfFlowSet that avoids the unnecessary disk IO incurred by accessing the events data.
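To illustrate why a dedicated method helps, here is a minimal sketch in base R with S4. Everything in it is hypothetical: `ToyDiskSet`, `newToyDiskSet`, `setKeywordViaFrames` and the `toyKeyword<-` generic are toy stand-ins, not the real flowCore/ncdfFlow classes or methods, and the `store` environment only simulates on-disk event data by counting reads. The slow path materialises each sample's full frame just to edit keywords; the dedicated replacement method edits only the in-memory keyword metadata.

```r
library(methods)

# Hypothetical toy class for illustration only; NOT the real flowCore or
# ncdfFlow classes. `meta` holds per-sample keywords in memory, while the
# `store` environment stands in for on-disk event data and counts reads.
setClass("ToyDiskSet", slots = c(meta = "list", store = "environment"))

newToyDiskSet <- function(n, events_per_sample = 1000L) {
  ids <- sprintf("s%d", seq_len(n))
  store <- new.env()
  store$events <- setNames(lapply(ids, function(id)
    matrix(rnorm(events_per_sample * 2), ncol = 2)), ids)
  store$reads <- 0L
  meta <- setNames(lapply(ids, function(id)
    list(`$FIL` = paste0(id, ".fcs"), `$TOT` = events_per_sample)), ids)
  new("ToyDiskSet", meta = meta, store = store)
}

# Slow path: mimics a generic method that loads every sample's full frame
# (events + keywords) just to edit the keywords.
setKeywordViaFrames <- function(object, value) {
  store <- object@store
  for (id in names(object@meta)) {
    store$reads <- store$reads + 1L  # simulated disk read of the events
    frame <- list(exprs = store$events[[id]], description = object@meta[[id]])
    frame$description[names(value)] <- value
    object@meta[[id]] <- frame$description
  }
  object
}

# Fast path: a dedicated replacement method that edits only the in-memory
# keyword metadata and never touches the event store.
setGeneric("toyKeyword<-", function(object, value) standardGeneric("toyKeyword<-"))
setReplaceMethod("toyKeyword", signature(object = "ToyDiskSet", value = "list"),
  function(object, value) {
    object@meta <- lapply(object@meta, function(kw) {
      kw[names(value)] <- value
      kw
    })
    object
  })

set.seed(1)
ds <- newToyDiskSet(5)
slow <- setKeywordViaFrames(ds, list(`$FIL` = NA))
reads_after_slow <- ds@store$reads  # one simulated read per sample
fast <- ds
toyKeyword(fast) <- list(`$FIL` = NA)
reads_after_fast <- ds@store$reads  # unchanged: the fast path reads nothing
same_result <- identical(slow@meta, fast@meta)
```

Both paths produce the same keywords; only the slow one pays for reading events, which in a real disk-backed set is where the time goes.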

@SamGG

SamGG commented Jan 31, 2019

Hi,
If you split the 1500 FCS files into 15 sets of 100, wouldn't it be efficient enough?
My two cents...
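A rough estimate of what this suggestion could buy, using only the figures reported in the question: ~30 s per sample on the full set, and about 1/10 of that per sample on 100-sample subsets. This assumes the 1/10 factor holds uniformly across all chunks and ignores any cost of subsetting or recombining; how edited subsets would be merged back is not covered here.

```r
# Back-of-envelope estimate from the figures in the original question.
n_samples <- 1500L
sec_per_sample_full <- 30
sec_per_sample_subset <- sec_per_sample_full / 10  # reported ~1/10 on subsets

# Partition sample indices into consecutive chunks of 100.
chunk_size <- 100L
chunks <- split(seq_len(n_samples), ceiling(seq_len(n_samples) / chunk_size))

hours_full <- n_samples * sec_per_sample_full / 3600
hours_chunked <- n_samples * sec_per_sample_subset / 3600

cat(sprintf("%d chunks of %d samples\n", length(chunks), chunk_size))
cat(sprintf("full set: %.2f h, chunked: %.2f h\n", hours_full, hours_chunked))
```

Under those assumptions the work drops from about 12.5 hours to about 1.25 hours, which is why splitting is a reasonable stopgap before a dedicated method exists.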

mikejiang pushed a commit that referenced this issue Jan 31, 2019
@mikejiang
Member Author

mikejiang commented Jan 31, 2019

I've overloaded both keyword and keyword<-. Here were the timings before the patch:

library(flowCore); library(ncdfFlow)
data(GvHD)
fs <- GvHD[pData(GvHD)$Patient %in% 6:7][1:4]
suppressMessages(ncfs <- ncdfFlowSet(fs))
Unit: microseconds
                  expr      min       lq      mean   median       uq      max neval
   keyword(fs, "$TOT")  242.661  251.696  288.7938  260.757  281.541  511.322    10
 keyword(ncfs, "$TOT") 2641.320 2668.504 2735.5251 2685.657 2739.053 3150.761    10


                                   expr       min       lq       mean    median        uq
   keyword(fs) <- list(`$FILENAME` = NA)   436.772   448.92   538.1075   584.194   591.762
 keyword(ncfs) <- list(`$FILENAME` = NA) 35974.567 36208.98 37112.2368 36615.207 37122.648

And here are the results with the latest patch 1dd94c1:

Unit: microseconds
                  expr     min      lq      mean   median      uq       max neval
   keyword(fs, "$TOT") 241.601 249.573 7774.8146 252.9335 271.164 75441.778    10
 keyword(ncfs, "$TOT") 162.750 167.529  297.6278 180.9890 192.738  1350.745    10

                                          expr     min      lq     mean   median      uq      max
   keyword(fs) <- list(`$FILENAME` = NA) 423.215 431.601 545.9565 437.4825 452.387 1506.077
 keyword(ncfs) <- list(`$FILENAME` = NA) 116.812 122.090 748.6854 129.7095 151.257 5588.899
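To summarise the two runs, one can compare medians rather than means: the patched means are pulled up by single slow outliers (for example, a max of 5588.899 µs against a median of 129.7095 µs for `keyword(ncfs) <-`). A small sketch using only the median values copied from the tables above:

```r
# Median timings (microseconds) for the ncdfFlowSet calls, copied from the
# before/after tables: "get" = keyword(ncfs, "$TOT"),
# "set" = keyword(ncfs) <- list(`$FILENAME` = NA).
before <- c(get = 2685.657, set = 36615.207)
after  <- c(get = 180.9890, set = 129.7095)

# Fold speedup of the patched methods, rounded to one decimal place.
speedup <- round(before / after, 1)
print(speedup)
```

By these medians the getter is roughly 15 times faster and the replacement method roughly 280 times faster on this 4-sample set.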
