Scylla taking a very long time to process WES data / getting stuck #72

Open
fmazzarotto opened this issue Jul 8, 2022 · 1 comment

@fmazzarotto

Dear Tamsen,
I am using the Pisces suite to process tumor-only WES data. However, I am facing an issue with Scylla: it often appears to get stuck in a region of chromosome 3 that seems to be particularly complex. This happens in approximately 10-20% of the samples that I process.
The attached screenshot shows a sample that I have been processing for days now. Scylla appears to have been stuck since yesterday morning (26 hours ago) trying to resolve a 192-variant MNV on chr3. Another sample is in the same situation: stuck in the same region of chr3 on a 175-variant MNV for the past 2 days.
Right now I am using .NET v5.0.408 and Pisces v5.3.0.0; I am not sure whether updating .NET would help.
I just wanted to check:

  • do you think there is any obvious solution to this issue, or are you aware of anyone else facing a similar problem?
  • would removing Scylla from the pipeline be an acceptable workaround in your opinion, considering that I intersect results obtained with Pisces with those of a couple other callers?

Thanks very much for your assistance.

[Screenshot from 2022-07-08 10-51-28]

@tamsen
Contributor

tamsen commented Aug 23, 2022

Hi,

Thanks for your interest, and for switching to the latest version.

I think the combinatorics of your 192-variant MNV are probably just causing a slow clustering problem. That's a bummer that it's hitting so many of your samples. How confident are you in all those variants? If it's just noise, I would skip calling in that region, or pre-filter the variants before feeding them to Scylla, so you only spend time clustering true variants.
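As a rough illustration of the pre-filtering idea, here is a minimal Python sketch. It assumes the variants have already been parsed out of the Pisces VCF into simple records; the `Variant` class, its `qual` and `vaf` fields, the default thresholds, and the example skip region are all hypothetical placeholders, not Pisces or Scylla settings. The kept variants would then be written back out and passed to Scylla.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int      # 1-based position
    qual: float   # hypothetical per-variant quality score
    vaf: float    # variant allele frequency


def prefilter(variants, min_qual=20.0, min_vaf=0.05, skip_regions=()):
    """Drop low-confidence variants and any variant inside a skipped region.

    Thresholds are hypothetical defaults; tune them for your own data.
    skip_regions holds (chrom, start, end) tuples, 1-based and inclusive.
    """
    kept = []
    for v in variants:
        # Likely noise: too low in quality or allele frequency.
        if v.qual < min_qual or v.vaf < min_vaf:
            continue
        # Region we decided not to call in at all.
        if any(v.chrom == c and s <= v.pos <= e for c, s, e in skip_regions):
            continue
        kept.append(v)
    return kept
```

For example, with `skip_regions=[("chr3", 4000, 6000)]`, a chr3 variant at position 5000 is dropped regardless of its quality, so it never reaches the clustering step.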

Another idea is to try fiddling with the clustering settings themselves (see https://github.com/tamsen/Pisces/wiki/Scylla-5.2.10-Design-Document), or to run Scylla with no arguments to see the list of exposed parameters.

Off the top of my head, I'd suggest changing the "dist" parameter from 50 to, say, 10 or 5. Then variants have to be within 10 base pairs of each other to cluster. You'd get several small clusters instead of one big one, which makes the combinatorial computation much smaller. You could also constrain the cluster size with some small code changes.
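To see why a smaller "dist" helps, here is a simplified Python sketch. It is not Scylla's actual clustering code: it groups sorted variant positions purely by the gap to the previous variant, and uses 2^n per cluster (each variant present or absent) as a crude stand-in for the combinatorial cost; Scylla's real cost model may differ. The `max_size` cap is a hypothetical version of the "constrain the cluster size" change, and all function names here are illustrative.

```python
def cluster_by_distance(positions, dist, max_size=None):
    """Greedily group variant positions into clusters.

    A variant joins the current cluster if it lies within `dist` bp of the
    previous variant; otherwise it starts a new cluster. If `max_size` is
    set (hypothetical cap), a full cluster is closed even when the next
    variant is close enough to join it.
    """
    clusters = []
    for p in sorted(positions):
        if (clusters
                and p - clusters[-1][-1] <= dist
                and (max_size is None or len(clusters[-1]) < max_size)):
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters


def worst_case_combinations(clusters):
    # Crude proxy: each variant in a cluster can be present or absent,
    # so one cluster of n variants has 2**n combinations to consider.
    return sum(2 ** len(c) for c in clusters)
```

With positions `[100, 108, 130, 160, 170, 175, 215]`, `dist=50` yields one cluster of 7 variants (128 combinations), while `dist=10` yields `[[100, 108], [130], [160, 170, 175], [215]]` (4 + 2 + 8 + 2 = 16). Because the cost grows exponentially with cluster size, splitting one 192-variant cluster into small pieces can turn an effectively unbounded run into a quick one.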

(and yes, if you just remove Scylla, you will still get the same variant calls, just not organized into MNVs. So if you are OK with that result, just go ahead and remove Scylla)

best
Tamsen
