Number of Redundant Candidates in assemble-complexSVs #41

Closed
alobo4 opened this issue Dec 6, 2022 · 4 comments

Comments

@alobo4

alobo4 commented Dec 6, 2022

Hello,

I am running assemble-complexSVs on HCC1954 Hi-C data (GSM3258551, SRR7475914), the same cell line used in the NeoLoopFinder paper. The command is taking an extremely long time (>4 days), even with 256MB of memory requested, as the numbers of redundant candidates are 84996, 141704, and 114354 at the 5kb, 10kb, and 25kb resolutions, respectively. You mentioned in your tutorial that assemble-complexSVs should take only about 6 minutes to complete. I understand that was a test example, but I am curious whether my numbers are expected for a bigger sample or whether something went wrong in previous steps. I inferred my SVs using EagleC with the NeoLoopFinder output. Here is the command I am running and what the log file shows:

    assemble-complexSVs -O HCC1954 -B HCC1954.CNN_SVs.NeoLoopFinder.txt --balance-type CNV --protocol insitu --nproc 6 \
        -H HCC1954-MboI-R1-filtered.mcool::resolutions/25000 \
           HCC1954-MboI-R1-filtered.mcool::resolutions/10000 \
           HCC1954-MboI-R1-filtered.mcool::resolutions/5000

root INFO @ 12/02/22 12:13:04:
# ARGUMENT LIST:
# Output Prefix = HCC1954
# Break Points = HCC1954.CNN_SVs.NeoLoopFinder.txt
# Minimum fragment size = 500000bp
# Cooler URI = ['HCC1954-MboI-R1-filtered.mcool::resolutions/25000', 'HCC1954-MboI-R1-filtered.mcool::resolutions/10000', 'HCC1954-MboI-R1-filtered.mcool::resolutions/5000']
# Extended Genomic Span = 5000000bp
# Balance Type = CNV
# Experimental protocol = insitu
# Number of Processes = 6
# Log file name = assembleSVs.log
root INFO @ 12/02/22 12:13:24: Current resolution: 25000
root INFO @ 12/02/22 12:13:24: Calculate the global average contact frequencies at each genomic distance ...
root INFO @ 12/02/22 12:14:04: Done
root INFO @ 12/02/22 12:14:04: Filtering SVs by checking distance decay of chromatin contacts across SV breakpoints ...
root INFO @ 12/02/22 12:17:52: 296 SVs left
root INFO @ 12/02/22 12:17:52: Building SV connecting graph ...
root INFO @ 12/02/22 12:17:52: Discovering and re-ordering complex SVs ...
neoloop.assembly INFO @ 12/02/22 12:20:02: Filtering 114354 redundant candidates ...
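
For anyone comparing their own runs against the numbers in this thread, here is a minimal sketch for pulling the redundant-candidate count per resolution out of the log. It relies only on the two message formats visible in the log above ("Current resolution: N" and "Filtering N redundant candidates"). The function name `candidates_by_resolution` is made up for illustration, and it assumes each resolution block in the log follows the same pattern as the 25000 block shown here; if a resolution reports more than one count, the last one wins.

```python
import re

# Message formats copied from the log excerpt in this issue.
RESOLUTION_RE = re.compile(r"Current resolution: (\d+)")
CANDIDATE_RE = re.compile(r"Filtering (\d+) redundant candidates")


def candidates_by_resolution(log_text):
    """Return {resolution: candidate_count} parsed from assembleSVs log text.

    Hypothetical helper, not part of NeoLoopFinder.
    """
    counts = {}
    current = None  # resolution of the block currently being read
    for line in log_text.splitlines():
        match = RESOLUTION_RE.search(line)
        if match:
            current = int(match.group(1))
            continue
        match = CANDIDATE_RE.search(line)
        if match and current is not None:
            counts[current] = int(match.group(1))
    return counts


sample = """\
root INFO @ 12/02/22 12:13:24: Current resolution: 25000
root INFO @ 12/02/22 12:17:52: 296 SVs left
neoloop.assembly INFO @ 12/02/22 12:20:02: Filtering 114354 redundant candidates ...
"""
print(candidates_by_resolution(sample))  # {25000: 114354}
```

To run it on a real log, read the file named in the argument list (assembleSVs.log) and pass its contents to the function.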

@XiaoTaoWang
Owner

Hi, thanks for reporting this. I recently updated NeoLoopFinder so that it can deal with smaller SVs than what we analyzed in the original paper, but didn't notice the running time complexity issue. I will take a look at this and get back to you later this week or next week.

Best,
Xiaotao

@alobo4
Author

alobo4 commented Dec 12, 2022 via email

@XiaoTaoWang
Copy link
Owner

Hi, can you upgrade your NeoLoopFinder to the latest version (v0.4.3) by running pip install -U neoloop and try again? In my test, the job finished within 1 hr with this version.
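
Before re-running, it may be worth confirming that the upgrade actually took effect in the environment where assemble-complexSVs runs. The following is a rough sketch using only the Python standard library. The helper names `installed_version` and `is_at_least` are made up for illustration, and the comparison assumes plain dotted numeric version strings such as 0.4.3 (it does not handle suffixes like "rc1").

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_at_least(installed, required):
    """Compare dotted numeric versions, e.g. '0.4.3' >= '0.4.3'.

    Assumption: both strings contain only integers separated by dots.
    """
    def as_tuple(text):
        return tuple(int(part) for part in text.split("."))

    return as_tuple(installed) >= as_tuple(required)


found = installed_version("neoloop")  # package name from the pip command above
if found is None:
    print("neoloop is not installed in this environment")
elif is_at_least(found, "0.4.3"):
    print(f"neoloop {found} is new enough")
else:
    print(f"neoloop {found} is older than 0.4.3; run: pip install -U neoloop")
```

Running this with the same interpreter that launches assemble-complexSVs avoids checking one environment while the job runs in another.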

@alobo4
Author

alobo4 commented Dec 20, 2022 via email
