
Agora-generic pipeline seems to be “stuck” for several days after adding 1 more genome #29

Closed
maxnest opened this issue Feb 14, 2024 · 4 comments

@maxnest

maxnest commented Feb 14, 2024

Dear colleagues,
Thank you for your useful and important approach! Unfortunately, I have run into an issue with the ‘agora-generic’ pipeline. I previously got good results with this pipeline on my data. To improve them, I added one new genome, and after 6 completed tasks (Status: 49 to do, 1 running, 6 done, 0 failed -- 56 total) the program has appeared “stuck” for several days. Restarting the pipeline and running the analysis on another computer did not help. Three things are worth noting: first, before adding the genome the pipeline completed the analysis within a few minutes; second, the process table shows that Python processes are still running; third, when I analyze the same dataset with the ‘basic’ and ‘plants’ pipelines, all tasks complete successfully within a few minutes. Given that the ‘generic’ pipeline searches for the best parameters for each ancestral node of the phylogenetic tree, is such a long run time expected? Have you seen this before, and what would you recommend?
I would be grateful for any help,
Thank you very much!

@alouis72
Collaborator

Hi maxnest,
Sorry for the late answer.
It seems that agora-generic gets "stuck" when running AGORA on the constrained families "1.0-1.0".

It's probably because, after adding the new species, one ancestor has very few ancestral genes in this category.
Would it be possible for you to send me the number of conserved pairs for each ancestor in "pairwise/pairs-size-1.0-1.0/log" (without the ancestor names if they are private, of course), using this command:

grep "conserved pairs for" pairwise/pairs-size-1.0-1.0/log

Could you run it on the data both before and after adding the new genome?
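To make the before/after comparison easier, a minimal Python sketch like the one below could pair up the two grep outputs. It assumes each matching line has the form "<count> conserved pairs for <ancestor>"; the exact log format is not shown in this thread, so check it first. parse_pair_counts, rank_by_fewest_pairs and report are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed log line shape: "<count> conserved pairs for <ancestor>".
MARKER = "conserved pairs for"


def parse_pair_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Map each ancestor to its number of conserved pairs."""
    counts: Dict[str, int] = {}
    for line in lines:
        if MARKER not in line:
            continue
        before, _, after = line.partition(MARKER)
        ancestor = after.strip()
        number = re.search(r"\d+", before)  # first integer before the marker
        if number and ancestor:
            counts[ancestor] = int(number.group())
    return counts


Row = Tuple[str, Optional[int], Optional[int]]


def rank_by_fewest_pairs(before: Dict[str, int], after: Dict[str, int]) -> List[Row]:
    """(ancestor, pairs before, pairs after), fewest pairs 'after' first.
    Ancestors absent from the 'after' log (None) are listed first of all."""
    rows: List[Row] = [
        (anc, before.get(anc), after.get(anc)) for anc in set(before) | set(after)
    ]
    rows.sort(key=lambda r: (-1 if r[2] is None else r[2], r[0]))
    return rows


def report(path_before: str, path_after: str) -> None:
    """Print a tab-separated table from two saved grep outputs."""
    with open(path_before) as fb, open(path_after) as fa:
        rows = rank_by_fewest_pairs(parse_pair_counts(fb), parse_pair_counts(fa))
    for anc, n_before, n_after in rows:
        print(f"{anc}\t{n_before}\t{n_after}")
```

For example, save the grep output from each run to before.txt and after.txt and call report("before.txt", "after.txt"); ancestors with the fewest pairs after adding the genome come first, since those are the suspected cause.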

You can contact me directly at alouis@bio.ens.psl.eu if you prefer.
Regards,
Alex

@diekei

diekei commented Apr 24, 2024

Hi @alouis72 and colleagues,

Thanks for the great tool! I ran into the same problem with my dataset, except that it did not appear after adding one more genome: I couldn't run the generic pipeline in the first place (the basic pipeline works). Do you know what the problem is, and could you help solve it? Thank you!

@alouis72
Collaborator

Hi diekei,
I think the problem is that, when new genomes are added, the 1.0-1.0 parameter for constrained genes becomes too stringent, leading to graphs too complex to be parsed when the algorithm tries to fill in the first skeleton of ancestral blocks…

One option is to run the generic pipeline only on specific ancestors (with the -target= option) in a loop, and stay on the basic pipeline for the oldest ancestors.
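The looping option could be scripted roughly as below. This is a sketch under assumptions: base_cmd stands for whatever agora-generic.py command line you already use (not shown in this thread), -target= is assumed to take a single ancestor name, and the ancestor names are placeholders. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import Iterable, List, Sequence


def generic_targets(ancestors: Iterable[str], keep_on_basic: Iterable[str]) -> List[str]:
    """Ancestors to run with agora-generic; those in keep_on_basic
    (e.g. the oldest ones) are left to the basic pipeline."""
    skip = set(keep_on_basic)
    return [anc for anc in ancestors if anc not in skip]


def target_commands(base_cmd: Sequence[str], ancestors: Iterable[str]) -> List[List[str]]:
    """One command per ancestor: the usual command line plus -target=<ancestor>."""
    return [list(base_cmd) + [f"-target={anc}"] for anc in ancestors]


def run_all(commands: Iterable[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command; actually run it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing ancestor.
            subprocess.run(cmd, check=True)
```

Printing first (dry_run=True) lets you inspect the generated command lines before committing to several long runs.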
Alternatively, edit the agora-generic.py script and remove the 1.0-1.0 parameter on line 29:

for sizeParams in [(1.0,1.0), (0.9,1.1), (0.77,1.33)]:
becomes:
for sizeParams in [(0.9,1.1), (0.77,1.33)]:
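If you prefer to apply that edit programmatically (for example on several machines), a small Python sketch could do it. It assumes the line is spelled exactly as quoted above; if your copy of agora-generic.py differs (spacing, indentation aside), it refuses to patch rather than guessing. The backup name agora-generic.py.orig and the helper names are hypothetical choices.

```python
from pathlib import Path

# The loop header as quoted in this thread (assumed to match the script verbatim).
OLD = "for sizeParams in [(1.0,1.0), (0.9,1.1), (0.77,1.33)]:"
NEW = "for sizeParams in [(0.9,1.1), (0.77,1.33)]:"


def drop_strict_size_params(source: str) -> str:
    """Return the script text with the (1.0,1.0) entry removed from the loop."""
    if source.count(OLD) != 1:
        raise ValueError("sizeParams loop not found exactly once; edit the script by hand")
    # Only the matched substring changes, so the line's indentation is preserved.
    return source.replace(OLD, NEW)


def patch_script(path: str) -> None:
    """Patch the script in place, keeping the original as <path>.orig."""
    script = Path(path)
    original = script.read_text()
    patched = drop_strict_size_params(original)  # raises before anything is written
    Path(path + ".orig").write_text(original)
    script.write_text(patched)
```

For example, patch_script("agora-generic.py"), with the path adjusted to wherever the script lives in your installation.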

I hope this works,
Regards,
Alex

@diekei

diekei commented Apr 24, 2024

Hi @alouis72,

Many thanks for the super swift response and the two suggestions!
We're mostly interested in reconstructing ALGs for the oldest ancestors, so I don't think I can use the first approach.
I tried running it with the agora-generic.py script modified as suggested, and it worked!

Thank you!!
Best,
Arif
