Error: scheduled core X did not deliver a result #160

Open · ArthurDondi opened this issue Jan 30, 2024 · 3 comments

@ArthurDondi commented Jan 30, 2024
Hi! I am running numbat via Singularity with 4 cores (128 Gb each) on a <500-cell dataset. It seems to get stuck at `Retesting CNVs..` for hours, then throws an error:

```r
out = run_numbat(
    count_mat_dgC, # gene x cell integer UMI count matrix
    ref_hca,       # reference expression profile, a gene x cell type normalized expression level matrix
    df_allele,     # allele dataframe generated by the pileup_and_phase script
    genome = "hg38",
    t = 1e-5,
    ncores = 4,
    ncores_nni = 4,
    plot = TRUE,
    out_dir = '/mnt/test'
)
```

```
Filtering out 22 cells with 0 coverage
Numbat version: 1.2.3
Running under parameters:
t = 1e-05
alpha = 1e-04
gamma = 20
min_cells = 50
init_k = 3
max_cost = 142.5
n_cut = 0
max_iter = 2
max_nni = 100
min_depth = 0
use_loh = auto
multi_allelic = TRUE
min_LLR = 5
min_overlap = 0.45
max_entropy = 0.5
skip_nj = FALSE
diploid_chroms = 
ncores = 4
ncores_nni = 4
common_diploid = TRUE
tau = 0.3
check_convergence = FALSE
plot = TRUE
genome = hg38
Input metrics:
475 cells
Mem used: 1.25Gb
Approximating initial clusters using smoothed expression ..
Mem used: 1.25Gb
number of genes left: 10941
running hclust...
Iteration 1
Mem used: 1.78Gb
Running HMMs on 5 cell groups..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Error in vctrs::vec_locate_matches(needles = needles, haystack = haystack,  : 
  Match procedure results in an allocation larger than 2^31-1 elements. Attempted allocation size was 39716884476.
ℹ In file match.c at line 2644.
ℹ Install the winch package to get additional debugging info the next time you get this error.
ℹ This is an internal error that was detected in the vctrs package.
  Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.

Error in `recycle_columns()`:
! Tibble columns must have compatible sizes.
• Size 248571: Column `1`.
• Size 74786391: Column `2`.
ℹ Only values of size one are recycled.
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: In mclapply(bulks %>% split(.$sample), mc.cores = ncores, function(bulk) { :
  scheduled core 2 did not deliver a result, all values of the job will be affected
2: In mclapply(bulks %>% split(.$sample), mc.cores = ncores, function(bulk) { :
  scheduled core 1 encountered error in user code, all values of the job will be affected
```

The `recycle_columns()` failure looks like a problem with cores not communicating; I don't know what causes the `vctrs::vec_locate_matches()` allocation error.
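For context (my own illustration, not Numbat code): an allocation size that large is the signature of a many-to-many join blow-up, where duplicated join keys multiply the number of matches. A minimal sketch that hits the same vctrs limit:

```r
library(dplyr)

# Two tables whose join key is entirely duplicated: left_join() must pair
# every duplicate with every duplicate, so the match table would need
# 5e4 * 5e4 = 2.5e9 entries, exceeding vctrs' 2^31 - 1 element limit.
a <- data.frame(key = rep(1L, 5e4), x = seq_len(5e4))
b <- data.frame(key = rep(1L, 5e4), y = seq_len(5e4))

# Uncomment to reproduce "Match procedure results in an allocation larger
# than 2^31-1 elements" (left commented so the script runs harmlessly):
# left_join(a, b, by = "key")
```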

Any ideas on how to solve this?

And how can I improve speed in general? It's a small dataset, so I'd expect it to run on a single core in the worst case, but is it supposed to take >10 hours?

@ArthurDondi (Author) commented Jan 30, 2024
I tried running it with one core and it still crashes, after being stuck for a few hours on `Retesting CNVs..`:

```
INFO [2024-01-30 17:45:03] Filtering out 22 cells with 0 coverage
Filtering out 22 cells with 0 coverage
Numbat version: 1.2.3
Running under parameters:
t = 1e-05
alpha = 1e-04
gamma = 20
min_cells = 50
init_k = 3
max_cost = 142.5
n_cut = 0
max_iter = 2
max_nni = 100
min_depth = 0
use_loh = auto
multi_allelic = TRUE
min_LLR = 5
min_overlap = 0.45
max_entropy = 0.5
skip_nj = FALSE
diploid_chroms = 
ncores = 1
ncores_nni = 1
common_diploid = TRUE
tau = 0.3
check_convergence = FALSE
plot = TRUE
genome = hg38
Input metrics:
475 cells
Mem used: 1.25Gb
Approximating initial clusters using smoothed expression ..
Mem used: 1.25Gb
number of genes left: 11006
running hclust...
Iteration 1
Mem used: 1.78Gb
Running HMMs on 5 cell groups..
Retesting CNVs..
Error in `vctrs::vec_locate_matches()`:
! Match procedure results in an allocation larger than 2^31-1 elements. Attempted allocation size was 43404245895.
ℹ In file 'match.c' at line 2644.
ℹ This is an internal error that was detected in the vctrs package.
  Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.
Backtrace:
     ▆
  1. ├─numbat::run_numbat(...)
  2. │ └─bulk_subtrees %>% ...
  3. ├─numbat:::run_group_hmms(...)
  4. │ └─parallel::mclapply(...)
  5. │   └─base::lapply(X = X, FUN = FUN, ...)
  6. │     └─numbat (local) FUN(X[[i]], ...)
  7. │       └─bulk %>% ...
  8. ├─numbat::analyze_bulk(...)
  9. │ └─... %>% ungroup()
 10. ├─dplyr::ungroup(.)
 11. ├─dplyr::mutate(., phi_mle_roll = zoo::na.locf(phi_mle_roll, na.rm = FALSE))
 12. ├─dplyr::group_by(., CHROM)
 13. ├─dplyr::left_join(...)
 14. ├─dplyr:::left_join.data.frame(...)
 15. │ └─dplyr:::join_mutate(...)
 16. │   └─dplyr:::join_rows(...)
 17. │     └─dplyr:::dplyr_locate_matches(...)
 18. │       ├─base::withCallingHandlers(...)
 19. │       └─vctrs::vec_locate_matches(...)
 20. └─rlang:::stop_internal_c_lib(...)
 21.   └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)
Warning message:
There were 18 warnings in `summarise()`.
The first warning was:
ℹ In argument: `approx_theta_post(...)`.
ℹ In group 30: `CHROM = 1`, `seg = 1jj`, `seg_start = 71440768`, `seg_end =
  71581715`, `cnv_state = "del_2"`.
Caused by warning in `cppdbbinom()`:
! NaNs produced
ℹ Run `dplyr::last_dplyr_warnings()` to see the 17 remaining warnings. 
Execution halted
```

The command I ran:

```r
library(data.table)
library(numbat)

filename<-"/mydata/bam/featurecount/B486_Tum.counts.formated.txt"
temp <- read.csv(filename, row.names=1)
sc_counts<- as.matrix(temp)
count_mat_dgC <- as(sc_counts, "dgCMatrix") 

filename<-"/mydata/bam/featurecount/B486_Om.counts.formated.txt"
temp <- read.csv(filename, row.names=1)
sc_counts<- as.matrix(temp)
refcount_mat_dgC <- as(sc_counts, "dgCMatrix")

cell_annot <- read.csv('/mnt/ctypes/B486_Om.txt', sep='\t')
cell_annot <- as.data.frame(cell_annot)

ref_internal <- numbat::aggregate_counts(refcount_mat_dgC , cell_annot)

df_allele_file<-'/mnt/run_01/B486_Tum_allele_counts.tsv.gz'
df_allele <-fread(df_allele_file)

# run
out = run_numbat(
    count_mat_dgC, # gene x cell integer UMI count matrix
    ref_internal,  # reference expression profile, a gene x cell type normalized expression level matrix
    df_allele,     # allele dataframe generated by the pileup_and_phase script
    genome = "hg38",
    t = 1e-5,
    ncores = 1,
    ncores_nni = 1,
    plot = TRUE,
    out_dir = '/mnt/test2'
)
```

From `head()`, `count_mat_dgC`, `ref_internal`, and `df_allele` all look OK.
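(A sketch of the kind of quick input checks one could run before `run_numbat()`; the `df_allele` column names below are my assumption of the pileup_and_phase output, not something verified in this thread:)

```r
# Hedged input checks before run_numbat(); column names for df_allele
# (cell, snp_id, CHROM, POS, AD, DP, GT, gene) are assumed, not verified.
stopifnot(inherits(count_mat_dgC, "dgCMatrix"))

# The counts and the reference should share most gene names.
shared_genes <- intersect(rownames(count_mat_dgC), rownames(ref_internal))
message(length(shared_genes), " genes shared between counts and reference")

# Extreme allele depths could explain the oversized join attempted above.
print(head(df_allele))
summary(df_allele$DP)
```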

@teng-gao (Collaborator) commented Feb 1, 2024

Thanks for reporting this. This shouldn't be happening unless you have extremely high coverage in those cells. If you share the input with me via email, I can take a look.

@teng-gao (Collaborator) commented Feb 23, 2024

I took a look. Please upgrade your numbat version to 1.3.2 or use the more recent Docker image; I get a more informative error message that way:

```
numbat version: 1.4.0
scistreer version: 1.2.0
hahmmr version: 1.0.0
Running under parameters:
t = 1e-05
alpha = 1e-04
gamma = 20
min_cells = 50
init_k = 3
max_cost = 142.5
n_cut = 0
max_iter = 2
max_nni = 100
min_depth = 0
use_loh = auto
segs_loh = None
call_clonal_loh = FALSE
segs_consensus_fix = None
multi_allelic = TRUE
min_LLR = 5
min_overlap = 0.45
max_entropy = 0.5
skip_nj = FALSE
diploid_chroms = None
ncores = 30
ncores_nni = 30
common_diploid = TRUE
tau = 0.3
check_convergence = FALSE
plot = TRUE
genome = hg38
Input metrics:
475 cells

Mem used: 1.35Gb

Approximating initial clusters using smoothed expression ..

Mem used: 1.35Gb

number of genes left: 10941

running hclust...

Iteration 1

Mem used: 1.88Gb

High SNP contamination detected (41%). Please make sure that cells from only one individual are included in genotyping step.

Expression noise level (MSE): high (1.8). Consider using a custom expression reference profile.

Running HMMs on 5 cell groups..

Warning message in mclapply(bulks %>% split(.$sample), mc.cores = ncores, function(bulk) {:
“scheduled cores 5, 4, 2, 1 encountered errors in user code, all values of the jobs will be affected”
Error in find_common_diploid(bulks, gamma = gamma, alpha = alpha, ncores = ncores): Error in smooth_segs(., min_genes = min_genes) : 
  No segments containing more than 10 genes for CHROM 18,21.

Traceback:

1. run_numbat(count_mat_dgC, ref_hca, df_allele, genome = "hg38", 
 .     t = 1e-05, ncores = 30, plot = TRUE, out_dir = "/home/tenggao/numbat_issues/160/results")
2. bulk_subtrees %>% run_group_hmms(t = t, gamma = gamma, alpha = alpha, 
 .     nu = nu, min_genes = min_genes, common_diploid = common_diploid, 
 .     diploid_chroms = diploid_chroms, ncores = ncores, verbose = verbose)   # at line 301-311 of file /home/tenggao/numbat/R/main.R
3. run_group_hmms(., t = t, gamma = gamma, alpha = alpha, nu = nu, 
 .     min_genes = min_genes, common_diploid = common_diploid, diploid_chroms = diploid_chroms, 
 .     ncores = ncores, verbose = verbose)
4. find_common_diploid(bulks, gamma = gamma, alpha = alpha, ncores = ncores)   # at line 814 of file /home/tenggao/numbat/R/main.R
5. stop(results[bad][[1]])   # at line 1151 of file /home/tenggao/numbat/R/utils.R
```

Notably:

> High SNP contamination detected (41%). Please make sure that cells from only one individual are included in genotyping step.
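For anyone landing here, a minimal sketch of the suggested upgrade (the GitHub repo path is an assumption, not stated in this thread):

```r
# Upgrade numbat to >= 1.3.2 as suggested above, then rerun run_numbat().
install.packages("numbat")                        # CRAN release
# or the development version (repo path assumed):
# remotes::install_github("kharchenkolab/numbat")
packageVersion("numbat")                          # expect >= 1.3.2
```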
