Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiple threads? #12

Closed
davidaray opened this issue Mar 2, 2021 · 7 comments
Closed

multiple threads? #12

davidaray opened this issue Mar 2, 2021 · 7 comments

Comments

@davidaray
Copy link

Trying this out for a project. Seems to be running fine on two mammal genomes but it's taking quite a while to run.

Can this package be run on multiple processors to speed things up?

Thanks.

@SAMtoBAM
Copy link
Owner

SAMtoBAM commented Mar 3, 2021

Hi there,

The run time is mainly down to the nucmer alignment process unfortunately
This is an issue with trying to maintain both MUMmer4 and MUMmer3 compatibility as multi-threading is only available with MUMmer4
I quickly generated a version for only MUMmer4 with a thread number option. I tested it on my side and the threading option appears to be working. Could you give it a whirl? "mumandco_v2.4.2_MUMmer4_multithreads.sh"
The only modification is to add an option -t/--threads to the end of the command;
eg:
bash mumandco_v2.4.2_MUMmer4_multithreads.sh -r ./yeast.tidy.fa -q ./yeast_tidy_DEL100.fa -g 12500000 -o DEL100_test -t 10

@davidaray
Copy link
Author

davidaray commented Mar 3, 2021

Thank you for the help. I made an attempt on the same mammal genomes I originally tried. By the way, those ran to completion. took about 30 hours.

Here is the output from the new run. The command was the same except for the addition of the -t option.
$MUMCOPATH/mumandco_v2.4.2_MUMmer4_multithreads.sh -r assemblies/rAff.fa -q assemblies/rSed.fa -g 2500000000 -o rAffvrSed -t 36

Nucmer alignment of genomes, filtering and converting to coordinates

My what a large genome you have, this may take some time
Unknown option: threads

USAGE: nucmer [options]

Try '/home/daray/conda/bin/nucmer -h' for more information.
ERROR: Could not parse delta file, rAffvrSed_ref.delta
error no: 400
ERROR: Could not parse delta file, rAffvrSed_ref.delta_filter
error no: 402
ERROR: Could not parse delta file, rAffvrSed_ref.delta_filter
error no: 402
Unknown option: threads

USAGE: nucmer [options]

Try '/home/daray/conda/bin/nucmer -h' for more information.
ERROR: Could not parse delta file, rAffvrSed_query.delta
error no: 400
ERROR: Could not parse delta file, rAffvrSed_query.delta_filter
error no: 402
ERROR: Could not parse delta file, rAffvrSed_query.delta_filter
error no: 402

Could this be a mummer version issue? I notice that this is for MUMmer4 and the version that ran successfully was v3.23.

In anticipation of this problem, I installed MUMmer4 using conda, 'conda install mummer4' .

$ conda list

packages in environment at /home/daray/conda/envs/mumco4:

Name Version Build Channel

_libgcc_mutex 0.1 main
libgcc-ng 9.1.0 hdf63c60_0
libstdcxx-ng 9.1.0 hdf63c60_0
mummer4 4.0.0rc1 pl526he1b5a44_0 bioconda
perl 5.26.2 h14c3975_0

It looks to me like mummer4 is installed properly.

Any ideas?

Thank you again.

--EDIT--

I just took a look at nucmer and, indeed, there is no -t option when looking at the help file.

Not sure if this makes a difference but....
$ nucmer -V
nucmer
NUCmer (NUCleotide MUMmer) version 3.1

@SAMtoBAM
Copy link
Owner

SAMtoBAM commented Mar 4, 2021

Well its nice to know it did finish within a 'reasonable' amount of time!

It seems based on your version output that it is still running MUMmer3

In the MUM&Co script the way it finds the nucmer aligner is through 'which nucmer' (and the other MUMmer tools respectively on lines 9-11)
NUCMER=$(which nucmer) DELTAFILTER=$(which delta-filter) SHOWCOORDS=$(which show-coords)
Therefore it goes after the nucmer command in your path
You can edit this line directly to direct it to a newer installation of MUMmer4, eg:
NUCMER=$(/home/user/mummer4/nucmer)

Or alternatively, build and install MUMmer4 as from the github (with 'sudo make install' at the end)
Perhaps you have already tried that and it still didn't change the path?

@davidaray
Copy link
Author

That seems to be doing the trick. It's running. I'll see if it finishes successfully. Regardless, thanks for being so responsive.

@davidaray
Copy link
Author

Yep. That works and is a vast improvement. I ran a test using the same two genomes.

V1 used 36 processors. V2 used only one.

V1 finished in ~20 minutes. V2 is still running after 35 minutes. I expect it will take 20-30 hours based on the earlier run.

So, you might want to put this in the final version. It works.

Thanks for the help.

@SAMtoBAM
Copy link
Owner

SAMtoBAM commented Mar 4, 2021

Thanks for giving me the results. It makes a hell of a difference

Looks like i'll be adding this option permanently soon and mentioning that MUMmer4 will then be a requirement
cheers

@SAMtoBAM SAMtoBAM closed this as completed Mar 4, 2021
@davidaray
Copy link
Author

Reopening this thread if that's ok.

I've come back to this after a while and have found there's a new version based on what we discussed above.

Unfortunately, this new version isn't working for me. I'm getting some very strange errors that I have no idea how to deal with.

These new errors seem to stem from this issue that comes up four times during a run of the test data.

g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory
g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory
g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory
g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory

I've done some troubleshooting and I think I've narrowed it down to these lines:

$SHOWCOORDS -T -r -c -l -d -g ""$prefix"_ref".delta_filter > ""$prefix"_ref".delta_filter.coordsg
$SHOWCOORDS -T -r -c -l -d ""$prefix"_ref".delta_filter > ""$prefix"_ref".delta_filter.coords

$NUCMER --threads ${threads} --maxmatch --nosimplify -p ""$prefix"_query" $query_assembly $reference_assembly
$DELTAFILTER -m ""$prefix"_query".delta > ""$prefix"_query".delta_filter
$SHOWCOORDS -T -r -c -l -d -g ""$prefix"_query".delta_filter > ""$prefix"_query".delta_filter.coordsg
$SHOWCOORDS -T -r -c -l -d ""$prefix"_query".delta_filter > ""$prefix"_query".delta_filter.coords

There are four $SHOWCOORDS commands.

Any idea what might be going on here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants