
Joint Calling Problem #403

Closed
rosehuang24 opened this issue Sep 22, 2020 · 2 comments

Comments

@rosehuang24

Dear GRIDSS team,

When I run GRIDSS on germline data from 60 individuals (diploid, genome size about 1.1 Gb, average depth 10X), the program reports the following error:
Caused by: java.lang.RuntimeException: Fatal error: GRIDSS assembly does not have the expected number of input categories (found 40, expected 60). GRIDSS performs joint assembly and does not support per-input assembly. Make sure the same input and labels are specified in the same order for the assembly and variant calling steps.

I have checked the input labels, the inputs and the assembly arguments several times to make sure there are 60 of each, and when I ran the program separately on each population (8, 12 and 40 individuals), it completed successfully.

Your help and time are greatly appreciated!
Rose

The code is
$gridss --threads 8 -j $gridssjar --reference $ref_genome --output output_60indv \
  --repeatmaskerbed $repeatmasker \
  --assembly SRS589245 --assembly SRS589246 --assembly SRS589247 --assembly SRS589248 --assembly SRS589249 --assembly SRS589250 --assembly SRS589251 --assembly SRS589252 --assembly Clean_01 --assembly Clean_02 --assembly Clean_03 --assembly Clean_04 --assembly Clean_05 --assembly Clean_06 --assembly Clean_07 --assembly Clean_08 --assembly Clean_09 --assembly Clean_ns --assembly SRS420686 --assembly SRS524489 --assembly GGS_174 --assembly GGS_175 --assembly GGS_176 --assembly GGS_2887 --assembly GGS_2888 --assembly GGS_2890 --assembly GGS_2891 --assembly GGS_2892 --assembly GGS_2893 --assembly GGS_2895 --assembly GGS_3001 --assembly GGS_3002 --assembly GGS_3003 --assembly GGS_3004 --assembly GGS_3005 --assembly GGS_3006 --assembly GGS_3007 --assembly GGS_3008 --assembly GGS_3009 --assembly GGS_3010 --assembly GGS_3011 --assembly GGS_3012 --assembly GGS_3016 --assembly GGS_3017 --assembly GGS_3028 --assembly GGS_3038 --assembly GGS_3040 --assembly GGS_3041 --assembly GGS_3042 --assembly GGS_3043 --assembly GGS_3044 --assembly GGS_3045 --assembly GGS_3046 --assembly GGS_3047 --assembly GGS_3050 --assembly GGS_3051 --assembly GGS_3052 --assembly GGS_3061 --assembly GGS_3069 --assembly GGS_3072 \
  --labels SRS589245,SRS589246,SRS589247,SRS589248,SRS589249,SRS589250,SRS589251,SRS589252,Clean_01,Clean_02,Clean_03,Clean_04,Clean_05,Clean_06,Clean_07,Clean_08,Clean_09,Clean_ns,SRS420686,SRS524489,GGS_174,GGS_175,GGS_176,GGS_2887,GGS_2888,GGS_2890,GGS_2891,GGS_2892,GGS_2893,GGS_2895,GGS_3001,GGS_3002,GGS_3003,GGS_3004,GGS_3005,GGS_3006,GGS_3007,GGS_3008,GGS_3009,GGS_3010,GGS_3011,GGS_3012,GGS_3016,GGS_3017,GGS_3028,GGS_3038,GGS_3040,GGS_3041,GGS_3042,GGS_3043,GGS_3044,GGS_3045,GGS_3046,GGS_3047,GGS_3050,GGS_3051,GGS_3052,GGS_3061,GGS_3069,GGS_3072 \
  $midfileDIR/dedup_SRS589245.bam $midfileDIR/dedup_SRS589246.bam $midfileDIR/dedup_SRS589247.bam $midfileDIR/dedup_SRS589248.bam $midfileDIR/dedup_SRS589249.bam $midfileDIR/dedup_SRS589250.bam $midfileDIR/dedup_SRS589251.bam $midfileDIR/dedup_SRS589252.bam $midfileDIR/final_Clean_01.bam $midfileDIR/final_Clean_02.bam $midfileDIR/final_Clean_03.bam $midfileDIR/final_Clean_04.bam $midfileDIR/final_Clean_05.bam $midfileDIR/final_Clean_06.bam $midfileDIR/final_Clean_07.bam $midfileDIR/final_Clean_08.bam $midfileDIR/final_Clean_09.bam $midfileDIR/final_Clean_ns.bam $midfileDIR/dedup_SRS420686.bam $midfileDIR/dedup_SRS524489.bam $midfileDIR/GGS_174_dedup.bam $midfileDIR/GGS_175_dedup.bam $midfileDIR/GGS_176_dedup.bam $midfileDIR/GGS_2887_dedup.bam $midfileDIR/GGS_2888_dedup.bam $midfileDIR/GGS_2890_dedup.bam $midfileDIR/GGS_2891_dedup.bam $midfileDIR/GGS_2892_dedup.bam $midfileDIR/GGS_2893_dedup.bam $midfileDIR/GGS_2895_dedup.bam $midfileDIR/GGS_3001_dedup.bam $midfileDIR/GGS_3002_dedup.bam $midfileDIR/GGS_3003_dedup.bam $midfileDIR/GGS_3004_dedup.bam $midfileDIR/GGS_3005_dedup.bam $midfileDIR/GGS_3006_dedup.bam $midfileDIR/GGS_3007_dedup.bam $midfileDIR/GGS_3008_dedup.bam $midfileDIR/GGS_3009_dedup.bam $midfileDIR/GGS_3010_dedup.bam $midfileDIR/GGS_3011_dedup.bam $midfileDIR/GGS_3012_dedup.bam $midfileDIR/GGS_3016_dedup.bam $midfileDIR/GGS_3017_dedup.bam $midfileDIR/GGS_3028_dedup.bam $midfileDIR/GGS_3038_dedup.bam $midfileDIR/GGS_3040_dedup.bam $midfileDIR/GGS_3041_dedup.bam $midfileDIR/GGS_3042_dedup.bam $midfileDIR/GGS_3043_dedup.bam $midfileDIR/GGS_3044_dedup.bam $midfileDIR/GGS_3045_dedup.bam $midfileDIR/GGS_3046_dedup.bam $midfileDIR/GGS_3047_dedup.bam $midfileDIR/GGS_3050_dedup.bam $midfileDIR/GGS_3051_dedup.bam $midfileDIR/GGS_3052_dedup.bam $midfileDIR/GGS_3061_dedup.bam $midfileDIR/GGS_3069_dedup.bam $midfileDIR/GGS_3072_dedup.bam

@d-cameron
Member

d-cameron commented Sep 22, 2020

GRIDSS (currently) requires joint assembly of all samples. Only one --assembly argument is recognised, so that command line is equivalent to specifying only --assembly GGS_3072.
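In other words, all 60 samples should share a single --assembly output file, with --labels and the input BAMs listed in the same order. One way to guarantee that ordering is to generate both from a single manifest. The sketch below is a hypothetical helper, not part of GRIDSS: it assumes a tab-separated manifest with one `label<TAB>bam path` line per sample, takes the assembly and output file names as parameters, reuses only the options and variables already present in the command above, and prints the command as a dry run instead of executing it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a GRIDSS command with ONE joint --assembly file,
# building --labels and the input BAM list from the same manifest so their
# order can never diverge. Assumes manifest lines are "label<TAB>bam_path" and
# that $gridss, $gridssjar, $ref_genome and $repeatmasker are already set.
build_gridss_cmd() {
  local manifest=$1 assembly=$2 output=$3
  local -a labels=() bams=()
  local label bam
  # "|| [[ -n ... ]]" also keeps a final line that has no trailing newline.
  while IFS=$'\t' read -r label bam || [[ -n ${label:-} ]]; do
    [[ -n ${label:-} ]] || continue        # skip blank lines
    labels+=("$label")
    bams+=("$bam")
  done < "$manifest"

  if (( ${#labels[@]} == 0 )); then
    echo "error: no samples in $manifest" >&2
    return 1
  fi

  local joined
  joined=$(IFS=,; printf '%s' "${labels[*]}")   # label1,label2,...

  # Dry run: arguments are printed space-separated and are not shell-quoted.
  printf '%s ' "$gridss" --threads 8 -j "$gridssjar" --reference "$ref_genome" \
    --output "$output" --repeatmaskerbed "$repeatmasker" \
    --assembly "$assembly" --labels "$joined" "${bams[@]}"
  printf '\n'
}
```

For this cohort the manifest would have 60 lines, for example `SRS589245<TAB>$midfileDIR/dedup_SRS589245.bam`. Reusing the same manifest for every GRIDSS step keeps labels and inputs in the same order, which is what the error message asks for.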

Fatal error: GRIDSS assembly does not have the expected number of input categories (found 40, expected 60)

It looks very much like GRIDSS is reading the assembly file containing the 40 individuals, recognising that it doesn't match the 60 samples provided, and immediately terminating (which is preferable to incorrectly allocating assembly support to the wrong samples).
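In the same fail-fast spirit, both mistakes in this thread (a repeated --assembly option, and a label count that differs from the number of inputs) can be caught before a long run starts. The sketch below is a hypothetical pre-flight helper, not part of GRIDSS. It makes a simplifying assumption: every argument starting with "-" is an option that takes exactly one value. That holds for the options used in this thread but not necessarily for every GRIDSS option.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-flight check: fail if --assembly is given more than once (or
# not at all), or if the number of --labels entries differs from the number of
# input files. Assumption: every "-..." argument takes exactly one value;
# anything else is treated as an input BAM.
check_gridss_args() {
  local n_assembly=0 status=0
  local -a labels=() inputs=()
  while (( $# > 0 )); do
    case $1 in
      -*)
        if (( $# < 2 )); then
          echo "error: option $1 has no value" >&2
          return 1
        fi
        case $1 in
          --assembly) n_assembly=$(( n_assembly + 1 )) ;;
          --labels)   IFS=, read -r -a labels <<< "$2" ;;   # split on commas
        esac
        shift 2
        ;;
      *)
        inputs+=("$1")
        shift
        ;;
    esac
  done

  if (( n_assembly != 1 )); then
    echo "error: expected exactly one --assembly, found $n_assembly" >&2
    status=1
  fi
  if (( ${#labels[@]} != ${#inputs[@]} )); then
    echo "error: ${#labels[@]} labels but ${#inputs[@]} input files" >&2
    status=1
  fi
  return "$status"
}
```

Called with the arguments that follow `$gridss` in the question, this helper would report 60 --assembly options; the 60 labels and 60 BAMs themselves match.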

For very large cohorts, the total sequencing depth will be far too high for the assembly to be reliable. In such scenarios, you'll need to do assembly in batches. #354 contains details of how you can trick GRIDSS into doing batched assembly. Unfortunately, you're going to have to regenerate your existing assembly.bam files, since they've already been generated with the incorrect number of samples.
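The batching trick itself is described in #354 and is not reproduced here. As a sketch of the preparatory bookkeeping only: the aggregate depth of a cohort is roughly the number of samples times the mean per-sample depth (60 × 10x = 600x here), and a sample manifest can be split, in order, into fixed-size batches. The manifest format (one sample per line), the batch size and the output file prefix are hypothetical choices, not GRIDSS conventions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Aggregate depth = number of samples x mean per-sample depth (integer arithmetic).
aggregate_depth() {
  local n_samples=$1 depth=$2
  echo $(( n_samples * depth ))
}

# Split a manifest (one sample per line) into batches of at most batch_size
# samples, written as <prefix>00.tsv, <prefix>01.tsv, ... in manifest order.
# Blank lines are skipped. File naming is a hypothetical choice.
split_manifest() {
  local manifest=$1 batch_size=$2 prefix=$3
  if (( batch_size <= 0 )); then
    echo "error: batch size must be positive" >&2
    return 1
  fi
  awk -v n="$batch_size" -v p="$prefix" '
    NF > 0 {
      f = sprintf("%s%02d.tsv", p, int(i / n))   # batch index from sample count
      print > f
      i++
    }' "$manifest"
}
```

For the cohort in this issue, `aggregate_depth 60 10` gives 600, which the follow-up comment below considers manageable without batching.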

Proper support for batched assembly is already on the backlog as issue #397.

@d-cameron
Member

That said, GRIDSS should be able to do joint assembly on 600x worth of samples (60 samples at 10x); we've run it on ~1000x aggregate coverage for some tumour xenograft evolution analysis. Just make sure to check the log file and the .bed files in the assembly.bam.gridss.working directory to see whether GRIDSS aborted assembly in any regions of importance. There are a few assembly complexity filters that you're more likely to hit at high coverage, so you might need to tweak them. You'll also need to either reduce the thread count or allocate more memory, as you might run out of memory with the default settings. If you have a cluster, you can use the --jobindex and --jobnodes parameters to split GRIDSS assembly across multiple jobs.
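Two of these suggestions can be scripted. The sketch below (a) prints one command per cluster job using --jobindex and --jobnodes, assuming zero-based job indices and that the remaining GRIDSS arguments are passed through unchanged, and (b) counts the records in every .bed file under a working directory such as assembly.bam.gridss.working. Both functions are hypothetical helpers: they assume no particular .bed file names, do not submit anything to a scheduler, and do not cover how the per-job outputs are gathered afterwards (check the GRIDSS documentation for that step).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print one GRIDSS command per cluster job. Assumptions: job indices are
# zero-based and "$@" holds the remaining (already checked) GRIDSS arguments.
print_assembly_jobs() {
  local jobnodes=$1; shift
  if (( jobnodes <= 0 )); then
    echo "error: jobnodes must be positive" >&2
    return 1
  fi
  local i
  for (( i = 0; i < jobnodes; i++ )); do
    printf '%s ' "$gridss" --jobindex "$i" --jobnodes "$jobnodes" "$@"
    printf '\n'
  done
}

# For every .bed file under a working directory, print "<records>\t<path>",
# where records are lines that are not "#", "track" or "browser" headers.
summarise_working_beds() {
  local workdir=$1 bed found=0
  while IFS= read -r -d '' bed; do
    found=1
    # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps going.
    printf '%s\t%s\n' "$(grep -cvE '^(#|track|browser)' "$bed" || true)" "$bed"
  done < <(find "$workdir" -name '*.bed' -print0 | sort -z)
  (( found )) || echo "no .bed files under $workdir" >&2
}
```

A non-zero record count does not by itself mean an important region was skipped; the listed intervals still need to be compared against the regions you care about.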
