Error in GENIE run #3

Closed
asmagen opened this issue Oct 14, 2017 · 10 comments

Comments

@asmagen

asmagen commented Oct 14, 2017

Hello,

I get the following when I run GENIE:
Error in weightMatrix[regulatorNames, ] <- weightMatrix.reg :
number of items to replace is not a multiple of replacement length

Any idea what it might be?

Thanks,
Assaf

@FloWuenne

Hi Assaf,

I am having the same problem when running my data. Did you figure out what your problem was?

Thanks,

Florian

@s-aibar
Member

s-aibar commented Nov 21, 2017

Hello,

That code is at the end of the parallel computation, so as a temporary solution you might want to use nCores=1.

In order to reproduce the error (and provide more useful help...) I would need more info... What type of system are you using? (Windows/Linux/Mac? Some of the parallel functions are not available on Windows...) Can you provide a minimal example (or part of the data that is producing the error?) and the output of sessionInfo() to try to reproduce the error?

@FloWuenne

Hi there,

I am running GENIE3 on a Linux cluster using a torque scheduler system. I am running the code on 1 node with 12 cores, therefore, using parallel computation at 12 cores.

My data is a matrix with normalized expression values from Drop-seq. I tried running a subset of the data for computation speed and also tried the full matrix, and both gave the same error. The current matrix I am trying to run is 225 cells x 7566 genes.

Here is my code snippet, mainly adopted from the "Running SCENIC" tutorial. I load my expression matrix from a precomputed seurat object and then filter out genes:

Thanks for your help in advance!

### Define expression matrix
exp_matrix <- as.matrix(expression_seurat_hqc@data)

org <- "mm9"

if(org=="hg19")
{
  library(RcisTarget.hg19.motifDatabases.20k)
  
  ### Get genes in databases:
  data(hg19_500bpUpstream_motifRanking) # or 10kbp, they should have the same genes
  genesInDatabase <- hg19_500bpUpstream_motifRanking@rankings$rn
  
  ### Get TFS in databases:
  data(hg19_direct_motifAnnotation)
  allTFs <- hg19_direct_motifAnnotation$allTFs
}

if(org=="mm9")
{
  library(RcisTarget.mm9.motifDatabases.20k)
  
  ### Get genes in databases:
  data(mm9_500bpUpstream_motifRanking) # or 10kbp, they should have the same genes
  genesInDatabase <- mm9_500bpUpstream_motifRanking@rankings$rn
  
  ### Get TFS in databases:
  data(mm9_direct_motifAnnotation)
  allTFs <- mm9_direct_motifAnnotation$allTFs
}

### Gene filter / selection
nCellsPerGene <- apply(exp_matrix, 1, function(x) sum(x>0))
nCountsPerGene <- apply(exp_matrix, 1, sum)

gene_info <- data.frame("CellperGene" = nCellsPerGene,
                        "CountsPerGene" = nCountsPerGene)

### Filter genes
gene_info_filtered <- subset(gene_info,log10(CountsPerGene) > 1)
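# NOTE: exp_pData is not defined in this snippet; it is presumably the per-cell metadata (one row per cell)
gene_info_filtered <- subset(gene_info_filtered,CellperGene > nrow(exp_pData)*0.01)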

### Filter out genes that are not in the Rcis database
genesLeft_minCells_inDatabases <- rownames(gene_info_filtered)[which(rownames(gene_info_filtered) %in% genesInDatabase)]
length(genesLeft_minCells_inDatabases)

### Subset expression matrix for genes to use
exp_matrix_filtered <- exp_matrix[genesLeft_minCells_inDatabases,]

### Potential regulators
inputTFs <- allTFs[allTFs %in% rownames(exp_matrix_filtered)]
save(inputTFs, file="./int/1.2_inputTFs.RData")

### Run GENIE3
weightMatrix <- GENIE3(exp_matrix_filtered,regulators=inputTFs, nCores=12)


sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux release 6.3 (Carbon)

Matrix products: default
BLAS: /home/apps/Logiciels/R/3.4.0-gcc/lib64/R/lib/libRblas.so
LAPACK: /home/apps/Logiciels/R/3.4.0-gcc/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] forcats_0.2.0       stringr_1.2.0       dplyr_0.7.4        
 [4] purrr_0.2.4         readr_1.1.1         tidyr_0.7.2        
 [7] tibble_1.3.4        tidyverse_1.2.1     doParallel_1.0.11  
[10] iterators_1.0.8     foreach_1.4.3       GENIE3_1.0.0       
[13] SCENIC_0.1.7        Seurat_2.1.0        bigmemory_4.5.31   
[16] bigmemory.sri_0.1.3 Biobase_2.38.0      BiocGenerics_0.24.0
[19] Matrix_1.2-9        cowplot_0.8.0       ggplot2_2.2.1      

loaded via a namespace (and not attached):
  [1] readxl_1.0.0         backports_1.1.0      Hmisc_4.0-3         
  [4] VGAM_1.0-4           NMF_0.20.6           sn_1.5-0            
  [7] plyr_1.8.4           igraph_1.1.2         lazyeval_0.2.0      
 [10] splines_3.4.0        gridBase_0.4-7       digest_0.6.12       
 [13] htmltools_0.3.6      lars_1.2             gdata_2.18.0        
 [16] magrittr_1.5         checkmate_1.8.3      cluster_2.0.6       
 [19] mixtools_1.1.0       ROCR_1.0-7           modelr_0.1.1        
 [22] R.utils_2.6.0        colorspace_1.3-2     rvest_0.3.2         
 [25] haven_1.1.0          crayon_1.3.4         jsonlite_1.5        
 [28] lme4_1.1-13          bindr_0.1            survival_2.41-3     
 [31] ape_4.1              glue_1.1.1           registry_0.3        
 [34] gtable_0.2.0         MatrixModels_0.4-1   car_2.1-5           
 [37] kernlab_0.9-25       prabclus_2.2-6       DEoptimR_1.0-8      
 [40] SparseM_1.77         scales_0.4.1         mvtnorm_1.0-6       
 [43] rngtools_1.2.4       Rcpp_0.12.13         dtw_1.18-1          
 [46] xtable_1.8-2         htmlTable_1.9        tclust_1.2-7        
 [49] foreign_0.8-67       proxy_0.4-17         mclust_5.3          
 [52] SDMTools_1.1-221     Formula_1.2-2        stats4_3.4.0        
 [55] tsne_0.1-3           htmlwidgets_0.9      httr_1.3.1          
 [58] FNN_1.1              gplots_3.0.1         RColorBrewer_1.1-2  
 [61] fpc_2.1-10           acepack_1.4.1        modeltools_0.2-21   
 [64] ica_1.0-1            pkgconfig_2.0.1      R.methodsS3_1.7.1   
 [67] flexmix_2.3-14       nnet_7.3-12          caret_6.0-76        
 [70] rlang_0.1.4          reshape2_1.4.2       cellranger_1.1.0    
 [73] munsell_0.4.3        tools_3.4.0          cli_1.0.0           
 [76] ranger_0.8.0         broom_0.4.3          ModelMetrics_1.1.0  
 [79] knitr_1.16           robustbase_0.92-7    caTools_1.17.1      
 [82] bindrcpp_0.2         pbapply_1.3-3        nlme_3.1-131        
 [85] quantreg_5.33        R.oo_1.21.0          xml2_1.1.1          
 [88] rstudioapi_0.7       compiler_3.4.0       pbkrtest_0.4-7      
 [91] ggjoy_0.3.0          stringi_1.1.5        lattice_0.20-35     
 [94] trimcluster_0.1-2    psych_1.7.8          nloptr_1.0.4        
 [97] diffusionMap_1.1-0   data.table_1.10.4-3  bitops_1.0-6        
[100] irlba_2.2.1          AUCell_0.99.5        R6_2.2.2            
[103] latticeExtra_0.6-28  KernSmooth_2.23-15   gridExtra_2.2.1     
[106] RcisTarget_0.99.0    codetools_0.2-15     MASS_7.3-47         
[109] gtools_3.5.0         assertthat_0.2.0     pkgmaker_0.22       
[112] mnormt_1.5-5         diptest_0.75-7       mgcv_1.8-17         
[115] hms_0.3              grid_3.4.0           rpart_4.1-11        
[118] class_7.3-14         minqa_1.2.4          segmented_0.5-2.1   
[121] Rtsne_0.13           numDeriv_2016.8-1    scatterplot3d_0.3-40
[124] lubridate_1.7.1      base64enc_0.1-3

@s-aibar
Member

s-aibar commented Nov 21, 2017

Hi again,

Thanks for the info!

I have run GENIE3 using your code and some Drop-seq data with similar characteristics, but the only way I have managed to reproduce the error is by artificially changing the dimensions of weightMatrix and weightMatrix.reg inside the function.
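For readers unfamiliar with this message, here is a minimal base-R sketch of how a dimension mismatch triggers it. The object names are borrowed from the error message, but the sizes and values are made up for illustration; this is not GENIE3's actual code.

```r
# Toy stand-ins for the objects named in the error message; all sizes are made up.
geneNames <- paste0("Gene", 1:5)
regulatorNames <- geneNames[1:2]

# Full weight matrix, initialised to zero (5 x 5), with gene names on both dimensions.
weightMatrix <- matrix(0, nrow = 5, ncol = 5,
                       dimnames = list(geneNames, geneNames))

# Regulator results, deliberately one column short (2 x 4 instead of 2 x 5).
weightMatrix.reg <- matrix(1, nrow = 2, ncol = 4)

# 10 cells to fill but only 8 replacement values: R refuses the assignment.
msg <- tryCatch({
  weightMatrix[regulatorNames, ] <- weightMatrix.reg
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

The assignment only succeeds when the number of selected cells is a multiple of the number of replacement values, so any lost row or column in the replacement matrix is enough to trigger the failure.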

So, just to make sure... can you confirm that the size of the expression matrix and the row names just before entering GENIE3 are what you expect? (the matrix should contain the gene names as rownames() ...)

dim(exp_matrix_filtered)
exp_matrix_filtered[1:5,1:4]

I have added some extra checks for the next version (in case a similar error appears in the future...), but if you would like to help find out exactly what is causing your error, you can re-run GENIE3 with the same settings after running options(error = recover). This will trigger the debugger, and then you can explore the values of the variables that caused the error, which probably has something to do with inconsistencies within these values:

length(targetNames)
length(regulatorNames)
head(targetNames)
head(regulatorNames)
dim(weightMatrix.reg)
dim(weightMatrix)
weightMatrix.reg[1:5,1:4]
weightMatrix[1:5,1:4]

@FloWuenne

Thank you for the quick feedback. I was also troubleshooting and I found that running with only 1 core (nCores=1) seems to work just fine, which suggests to me that there might be some issue with the parallelization when running on a remote node rather than a local machine...

Could there be an issue with the remote node missing a dependency or similar? On our cluster we submit jobs via qsub to the torque scheduler, which then launches a remote node that runs the job. All the required R packages are loaded, but maybe I am missing a Linux package that is not generally loaded on our worker nodes but is present on the login node?

I will let you know whether I can get it to work with multiple cores but so far I did not have any luck...

@s-aibar
Member

s-aibar commented Nov 22, 2017

Have you checked if the basic example in GENIE3 works? (adding multiple cores, of course)
If it also crashes, then at least we know that it is something in the setup/parallelization, not depending on the data itself...

(We often run GENIE3 on a cluster with qsub as well, and we have not come across this error so far...)

## Generate fake expression matrix
exprMatrix <- matrix(sample(1:10, 100, replace=TRUE), nrow=20)
rownames(exprMatrix) <- paste("Gene", 1:20, sep="")
colnames(exprMatrix) <- paste("Sample", 1:5, sep="")

## Run GENIE3
set.seed(123) # For reproducibility of results
weightMatrix <- GENIE3(exprMatrix, regulators=paste("Gene", 1:5, sep=""), nCores=4)

@FloWuenne

Thanks for the running example, I should have tried a simple small snippet like this, my bad.
The code you sent works, so I guess it definitely has to do with my matrix. I am using normalized values; the fact that they are not integers could not be the issue, right?

I will go over my data again and see what might cause the problem...

@FloWuenne

So all smaller examples I have run so far have worked now even with multiple cores on the cluster.
I am currently running the full dataset with high-quality cells using GENIE3 and will let you know whether this works and we can close the ticket.

Quick optimization question for other people as well. I had actually not considered this before but does using normalized data slow down the GENIE3 run as well since we are using double values instead of integers and therefore have to load a lot more data into the function?
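The storage side of this question can at least be quantified in base R. The sketch below compares an integer and a double matrix of the size mentioned earlier in the thread (7566 genes x 225 cells). It only measures memory footprint; it does not measure GENIE3's run time, and whether GENIE3 converts its input internally is not stated in this thread.

```r
# Matrix dimensions taken from the thread: 7566 genes x 225 cells.
nGenes <- 7566
nCells <- 225

# Same shape, different storage types (the values themselves are irrelevant here).
countsInt <- matrix(0L, nrow = nGenes, ncol = nCells)  # integer: 4 bytes per value
normDbl   <- matrix(0,  nrow = nGenes, ncol = nCells)  # double:  8 bytes per value

sizeInt <- as.numeric(object.size(countsInt))
sizeDbl <- as.numeric(object.size(normDbl))

cat(sprintf("integer matrix: %.1f MB\n", sizeInt / 1e6))
cat(sprintf("double matrix:  %.1f MB\n", sizeDbl / 1e6))
cat(sprintf("ratio:          %.2f\n", sizeDbl / sizeInt))
```

At this size both versions are in the range of a few megabytes, so the difference in loaded data is small in absolute terms.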

@FloWuenne

Working fine now! I have to say I can't really pinpoint what the problem was. I guess when people run into this error the best way is to make sure that the input is a matrix and that inputTFs are in rownames of that matrix.

I think your examples will help many people running into this issue.
Thank you very much @s-aibar , you can close the ticket for me! 👍
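The checks suggested above (input is a matrix, regulators are among its row names) can be sketched as a small helper to run before calling GENIE3. `checkGenie3Input` is a hypothetical name, not a GENIE3 function, and the duplicate-name and NA checks are extra guesses that the thread does not mention.

```r
# Hypothetical helper (not part of GENIE3): returns a character vector of
# problems found in the inputs; an empty vector means nothing was flagged.
checkGenie3Input <- function(exprMatrix, regulators) {
  problems <- character(0)

  # The thread's first suggestion: the input should be a (numeric) matrix.
  if (!is.matrix(exprMatrix) || !is.numeric(exprMatrix)) {
    problems <- c(problems, "exprMatrix is not a numeric matrix")
  }

  geneNames <- rownames(exprMatrix)
  if (is.null(geneNames)) {
    problems <- c(problems, "exprMatrix has no rownames (gene names)")
  } else {
    # Assumption: duplicated gene names would make name-based indexing ambiguous.
    if (anyDuplicated(geneNames) > 0) {
      problems <- c(problems, "duplicated gene names in rownames")
    }
    # The thread's second suggestion: every regulator must be a row name.
    missingRegs <- setdiff(regulators, geneNames)
    if (length(missingRegs) > 0) {
      problems <- c(problems,
                    paste(length(missingRegs), "regulator(s) not found in rownames"))
    }
  }

  # Assumption: missing values are worth flagging before a long run.
  if (anyNA(exprMatrix)) {
    problems <- c(problems, "exprMatrix contains NA values")
  }
  problems
}

# Usage with the toy matrix from the basic example earlier in the thread.
exprMatrix <- matrix(sample(1:10, 100, replace = TRUE), nrow = 20)
rownames(exprMatrix) <- paste0("Gene", 1:20)

okResult  <- checkGenie3Input(exprMatrix, regulators = paste0("Gene", 1:5))
badResult <- checkGenie3Input(exprMatrix, regulators = c("Gene1", "GeneX"))
print(okResult)
print(badResult)
```

Returning the problems as a vector rather than stopping at the first one makes it easier to log everything at once in a batch job.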

@s-aibar closed this as completed Dec 5, 2017

@liuyifang

Hi, the problem may be that some parallel jobs die due to lack of memory. Perhaps moving to a cluster with more memory would help.
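Nothing in this thread confirms that hypothesis, and how GENIE3's parallel backend reports a dead worker is not shown here. Purely as an illustration, the base-R sketch below assumes a lost worker shows up as a NULL result and shows how that would silently shrink the combined matrix, which could then produce the dimension mismatch from the original error. All names and values are made up.

```r
# Expected targets, one result per target.
targetNames <- paste0("Gene", 1:4)

# Simulated per-target results (two regulator weights each).
# ASSUMPTION: the NULL stands in for a worker that died before returning.
perTarget <- list(
  Gene1 = c(0.1, 0.2),
  Gene2 = NULL,
  Gene3 = c(0.3, 0.4),
  Gene4 = c(0.5, 0.6)
)

# Identify targets whose result is missing before combining anything.
failedTargets <- names(perTarget)[vapply(perTarget, is.null, logical(1))]

# cbind() drops NULL arguments without complaint, so a column disappears.
combined <- do.call(cbind, perTarget)

cat("expected columns:", length(targetNames), "\n")
cat("actual columns:  ", ncol(combined), "\n")
cat("missing targets: ", paste(failedTargets, collapse = ", "), "\n")
```

Comparing the expected targets with the columns actually returned, as done here, is one cheap way to tell a lost result apart from a problem in the input data.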
