GeMoMa restart error #7

Closed
sjfleck opened this issue Apr 16, 2021 · 7 comments
Labels
GeMoMa Everything what concerns GeMoMa

@sjfleck

sjfleck commented Apr 16, 2021

Hello,
I'm currently using GeMoMa to transfer gene models from a draft assembly to a pseudoassembly created on the CoGe online platform. I have 40 draft assemblies, and for each one we created a "pseudoassembly" by mapping it onto a chromosome-level reference assembly that we built for a very close relative of all 40 species. I was told that GeMoMa would be an easy and quick option for "transferring" each set of gene models from draft assembly to pseudoassembly.

I installed GeMoMa-1.6.4 and ran this command:
java -Xmx512g -jar /path/to/GeMoMa-1.6.4.jar CLI GeMoMaPipeline threads=32 outdir=$OUTDIR GeMoMa.Score=ReAlign AnnotationFinalizer.r=NO o=true t=$TARGET a=$REFASS g=$REFGEN

Our cluster doesn't allow jobs to run for more than 72 hours. I hoped that the job would finish before then, but it timed out. Luckily there is the restart option, but when I ran this command:
java -Xmx187g -jar /path/to/GeMoMa-1.6.4.jar CLI GeMoMaPipeline threads=32 restart=true outdir=$OUTDIR GeMoMa.Score=ReAlign AnnotationFinalizer.r=NO o=true t=$TARGET a=$REFASS g=$REFGEN

I got this error message:
Unknown parameters: {restart=[true]}

I'm just now realizing that restart is only available in version 1.7+. This was a big mistake on my end, but I still want to ask my question to see whether I'm running this job efficiently. Is this the command that you would use to do what I'm hoping to do? Thank you for your time.
-Steve
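For the 40 pseudoassemblies, the command above could be generated once per target instead of being edited by hand. The following is a hypothetical Python sketch: the function name, file names and per-target output layout are assumptions, not part of GeMoMa. It only reuses the parameters from the commands in this post and passes $REFASS and $REFGEN through unchanged as `a=` and `g=`. As discussed in this thread, `restart=true` is only accepted by GeMoMa 1.7 and later.

```python
import os
import shlex
from pathlib import Path


def gemoma_command(jar, target, refass, refgen, outdir,
                   threads=32, heap="187g", restart=False):
    """Build the GeMoMaPipeline argument list used in this thread for one target.

    Hypothetical helper: refass/refgen are passed through unchanged as a= and g=,
    mirroring $REFASS and $REFGEN in the original command. The thread used
    heap sizes of 512g and 187g; pick one that fits your nodes.
    """
    cmd = ["java", f"-Xmx{heap}", "-jar", str(jar), "CLI", "GeMoMaPipeline",
           f"threads={threads}"]
    # restart=true is only understood by GeMoMa 1.7+; 1.6.4 rejects it with
    # "Unknown parameters: {restart=[true]}".
    if restart:
        cmd.append("restart=true")
    cmd += [f"outdir={outdir}", "GeMoMa.Score=ReAlign", "AnnotationFinalizer.r=NO",
            "o=true", f"t={target}", f"a={refass}", f"g={refgen}"]
    return cmd


if __name__ == "__main__":
    # Reference paths come from the same variable names the thread uses;
    # the fallbacks are placeholders.
    refass = os.environ.get("REFASS", "REFASS_PATH")
    refgen = os.environ.get("REFGEN", "REFGEN_PATH")
    for i in range(1, 41):
        # Hypothetical naming scheme for the 40 pseudoassemblies.
        target = Path(f"pseudoassembly_{i:02d}.fasta")
        cmd = gemoma_command("GeMoMa-1.7.1.jar", target, refass, refgen,
                             outdir=Path("gemoma_out") / target.stem,
                             restart=False)  # set True when resubmitting (1.7+)
        # One shell-quoted command line per target, ready for a batch script.
        print(shlex.join(cmd))
```

Giving each printed line its own batch job means that the assemblies do not have to share a single 72-hour allocation.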

@JensKeilwagen
Contributor

Hi Steve,

Thanks a lot for your interest in GeMoMa. Your command line looks okay.

Since GeMoMa 1.7, we provide the restart option, but we also switched the default search algorithm from tblastn to mmseqs. mmseqs is much faster than tblastn (especially if you're using a recent tblastn version, which has a bug that leads to huge runtimes). Hence, I would check whether version 1.7 gives results within the 72 hours.
If GeMoMaPipeline does not finish within your time limit, you can run individual modules of GeMoMa on parts of the data. This is what GeMoMaPipeline does internally. However, doing it manually allows you to use as many threads as you like, because each part can be run on its own machine, while with GeMoMaPipeline you're limited by the number of CPUs on one server.
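If you split the work manually, one generic step is dividing a FASTA file into parts of similar total size so that each part can be submitted as its own job. The thread does not say which GeMoMa input should be split or which module parameters to use, so this sketch only covers the splitting itself, and all function and file names are hypothetical. It uses greedy longest-first assignment, where each record goes to the chunk that currently has the fewest residues. This is a common load-balancing heuristic and not necessarily what GeMoMaPipeline does internally. If a module needs related records (for example, all parts of one transcript) to stay in the same chunk, those records would have to be grouped first. This sketch does not do that.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records, header, seq = [], None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_balanced(records, n):
    """Greedy longest-first: give each record to the chunk with the fewest residues."""
    chunks = [[] for _ in range(n)]
    loads = [0] * n
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        i = loads.index(min(loads))
        chunks[i].append(rec)
        loads[i] += len(rec[1])
    return chunks


def to_fasta(records, width=60):
    """Serialise records back to FASTA text with a fixed line width."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


def split_file(path, n, prefix="part"):
    """Write n size-balanced files named prefix_0.fasta ... prefix_{n-1}.fasta."""
    with open(path) as fh:
        records = read_fasta(fh)
    for k, chunk in enumerate(split_balanced(records, n)):
        with open(f"{prefix}_{k}.fasta", "w") as out:
            out.write(to_fasta(chunk))
```

Calling `split_file("input.fasta", 8)` (hypothetical file name) would produce eight part files that could each be processed on a separate node.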

I could imagine that using several reference organisms instead of one for each target genome might improve the result.
In addition, RNA-seq data, whether from your own lab, from collaborators, or publicly available (e.g. from NCBI), might further improve the results.

Please keep me updated.

@JensKeilwagen JensKeilwagen added the GeMoMa Everything what concerns GeMoMa label Apr 19, 2021
@sjfleck
Author

sjfleck commented Apr 19, 2021

Thank you for getting back to me on this. I installed GeMoMa-1.7.1 and it seems to work (though it did cause a node failure, so I need to resubmit the job).

For some reason, even though I have a working mmseqs and used the -m flag to provide the path to the executable directory, GeMoMa isn't recognizing it. I get this error:

Searching for the new GeMoMa updates ...
You are using the latest GeMoMa version.

run new GeMoMaPipeline job
Running the GeMoMaPipeline with 16 threads

search algorithm:
/projects/academic/vaalbert/modulefiles/MMseqs2/build/bin/mmseqs not available!

My GeMoMa works with tblastn, but I would rather use mmseqs if it's faster. Do you have any experience with this issue? Thanks!
-Steve Fleck

@JensKeilwagen
Contributor

If you're running GeMoMa on a cluster, it could be that mmseqs is not available on the compute node although it is available on the head node. Have you checked with a simple script that mmseqs is available at your path on the compute node?
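A "simple script" of the kind suggested above might look like the following Python sketch. It first checks whether the given name or path resolves to an executable file on the node where the script runs, and then it launches the file. Later in the thread, a build that ran on the head node failed on the compute nodes, which is why the sketch executes the binary instead of only checking that the file exists. The script assumes that `mmseqs version` is a quick, harmless subcommand that exits with status 0. Any other quick subcommand can be passed through `args`. The function name `check_tool` is hypothetical.

```python
import shutil
import socket
import subprocess
import sys


def check_tool(tool, args=("version",), timeout=60):
    """Return (ok, message): can `tool` be found AND executed on this host?"""
    host = socket.gethostname()
    # With a directory component, shutil.which only checks that exact path;
    # a bare name is looked up on PATH.
    resolved = shutil.which(tool)
    if resolved is None:
        return False, f"[{host}] {tool}: not found or not executable"
    try:
        proc = subprocess.run([resolved, *args], capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, f"[{host}] {resolved}: no answer within {timeout}s"
    except OSError as exc:
        # e.g. the file exists but the OS refuses to execute it
        return False, f"[{host}] {resolved}: cannot be executed ({exc})"
    if proc.returncode != 0:
        # A negative return code means the process was killed by a signal.
        return False, (f"[{host}] {resolved}: exit code {proc.returncode}: "
                       f"{proc.stderr.strip()[:500]}")
    return True, f"[{host}] {resolved}: OK {proc.stdout.strip()[:200]}"


if __name__ == "__main__":
    # Default is the bare name "mmseqs" (looked up on PATH); pass the full
    # path that GeMoMa was given to test that exact file instead.
    target = sys.argv[1] if len(sys.argv) > 1 else "mmseqs"
    ok, message = check_tool(target)
    print(("OK   " if ok else "FAIL ") + message)
```

To run it on a compute node rather than the head node, submit it as a job through your cluster's scheduler. On SLURM, for example, this could be `srun python check_tool.py /path/to/mmseqs`, though the exact command on your cluster may differ.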

@sjfleck
Author

sjfleck commented Apr 19, 2021

Thanks for the suggestion. Even though mmseqs works on the front end, it does not work when I submit it to the compute nodes. I've never had this happen before, so I'll have to ask them how to fix this issue. Do you have experience with or have you heard about this happening with mmseqs before? Thanks

@JensKeilwagen
Contributor

Yes, I have heard of similar problems before. But you need to clarify this with your admins.
It seems that mmseqs is only installed on the head node but not on the compute nodes.

PS: I saw that you might be interested in tandem duplicated genes:
https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-020-00795-3
The upcoming version 1.8 might improve the accuracy for these cases ;)

@sjfleck
Author

sjfleck commented Apr 19, 2021

Thank you! I appreciate the heads up on the new version.

I ended up fixing my problem just now. I had previously installed mmseqs2-13.45111 using cmake, make, and make install, and that was the installation that gave me issues on the compute nodes. I decided to install the same version using conda, and that version works just fine on the compute nodes. I'm going to use my new mmseqs2 installation with GeMoMa-1.7.1 now and hopefully it'll work fine. Thanks for your help again. I think we can close this issue.

@JensKeilwagen
Contributor

You're welcome.
