Error Running agora-basic.py: "assert oldName not in seen" #23

Closed
erin-thei opened this issue Apr 22, 2023 · 5 comments

@erin-thei

Hello,

I am trying to run Agora using my own data (the example worked with no issues). This is the command I tried to run: ~/Agora/src/agora-basic.py species-tree.nwk orthologyGroups/orthologyGroups.%s.list genes/genes.%s.list

(agora) [theillere@Escalante3 Single_Copy_Orthologue_Sequences]$ ~/Agora/src/agora-basic.py species-tree.nwk orthologyGroups/orthologyGroups.%s.list genes/genes.%s.list

| Key | Values |

| speciesTree | species-tree.nwk |
| geneTrees|orthologyGroups | orthologyGroups/orthologyGroups.%s.list |
| genes | genes/genes.%s.list |
| target | |
| extantSpeciesFilter | |
| compress | bz2 |
| workingDir | . |
| nbThreads | 24 |
| forceRerun | False |
| sequential | True |

New task 0 ('ancgenes', 'all')
[]
Command(args=['/home/theillere/Agora/src/ALL.reformatGeneFamilies.py', 'species-tree.nwk', 'orthologyGroups/orthologyGroups.%s.list', '-IN.genesFiles=genes/genes.%s.list', '-OUT.ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-OUT.genesFiles=genes/genes.%s.list.bz2'], out='GeneTreeForest.withAncGenes.nhx.bz2', log='ancGenes/ancGenes.log')

New task 1 ('pairwise', 'ancgenes-all')
[('ancgenes', 'all')]
Command(args=['/home/theillere/Agora/src/buildSynteny.pairwise-conservedPairs.py', 'species-tree.nwk', 'NAME_0', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-genesFiles=genes/genes.%s.list.bz2', '-OUT.pairwise=pairwise/pairs-all/%s.list.bz2'], out=None, log='pairwise/pairs-all/log')

New task 2 ('integr', 'denovo-all')
[('pairwise', 'ancgenes-all')]
Command(args=['/home/theillere/Agora/src/buildSynteny.integr-denovo.py', 'species-tree.nwk', 'NAME_0', '+searchLoops', '-OUT.ancBlocks=ancBlocks/denovo-all/blocks.%s.list.bz2', 'pairwise/pairs-all/%s.list.bz2', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-LOG.ancGraph=ancBlocks/denovo-all/graph.%s.txt.bz2'], out=None, log='ancBlocks/denovo-all/log')

New task 3 ('integr', 'denovo-all.scaffolds')
[('integr', 'denovo-all')]
Command(args=['/home/theillere/Agora/src/buildSynteny.integr-scaffolds.py', 'species-tree.nwk', 'NAME_0', '-OUT.ancBlocks=ancBlocks/denovo-all.scaffolds/blocks.%s.list.bz2', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-IN.ancBlocks=ancBlocks/denovo-all/blocks.%s.list.bz2', '-genesFiles=genes/genes.%s.list.bz2', '-LOG.ancGraph=ancBlocks/denovo-all.scaffolds/graph.%s.txt.bz2'], out=None, log='ancBlocks/denovo-all.scaffolds/log')

New task 4 ('conversion', 'basic-workflow')
[('integr', 'denovo-all.scaffolds')]
Command(args=['/home/theillere/Agora/src/convert.ancGenomes.blocks-to-genes.py', 'species-tree.nwk', 'NAME_0', '+orderBySize', '-IN.ancBlocks=ancBlocks/denovo-all.scaffolds/blocks.%s.list.bz2', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-OUT.ancGenomes=ancGenomes/basic-workflow/ancGenome.%s.list.bz2'], out=None, log='ancGenomes/basic-workflow/log')

Status: 5 to do, 0 running, 0 done, 0 failed -- 5 total
Available tasks: [0]
Control file ancGenes/ancGenes.log.agora missing
Launching task 0 ['/home/theillere/Agora/src/ALL.reformatGeneFamilies.py', 'species-tree.nwk', 'orthologyGroups/orthologyGroups.%s.list', '-IN.genesFiles=genes/genes.%s.list', '-OUT.ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-OUT.genesFiles=genes/genes.%s.list.bz2'] > GeneTreeForest.withAncGenes.nhx.bz2 2> ancGenes/ancGenes.log
Status: 4 to do, 1 running, 0 done, 0 failed -- 5 total
Waiting ...
task 0 report: 0.106603 sec CPU time / 0.107803 sec elapsed = 98.8865% CPU usage, 17.625 MB RAM
task 0 is now finished (status 1)

Inspect ancGenes/ancGenes.log for more information
Status: 4 to do, 0 running, 0 done, 1 failed -- 5 total
Available tasks: []
Workflow stopped because of failures
Workflow report: 0.114315 sec CPU time / 0.115183 sec elapsed = 99.2463% CPU usage, 18.0391 MB RAM
(agora) [theillere@Escalante3 Single_Copy_Orthologue_Sequences]$

Here is the input data that I'm working with: https://www.dropbox.com/scl/fo/en4rlnwvvnspv9sj51d3u/h?dl=0&rlkey=ybt2vi7hi09xfgnp2uuw85oz7

Please let me know if you have any insight as to how I can solve this issue. I'm also attaching the log file.
Thanks!

Agora_Log.txt

@alouis72
Collaborator

Hi @erin-thei ,
The format of the orthogroup files is not correct.
The first line should not be there, and each remaining line should be only a list of genes, with no commas.
I guess you used OrthoFinder to generate these HOGs. You can try the script I wrote on the agora_dev branch in src/import:
https://github.com/DyogenIBENS/Agora/blob/dev/src/import/orthofinder_hogs/convert_hogs_sp.py

I haven't had the opportunity to test it through the whole ancestral reconstruction process, so I would greatly appreciate any feedback you can give me on it.
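
To illustrate the format change described above, here is a minimal sketch in Python. It is not the linked conversion script and does not handle real OrthoFinder HOG tables; it only drops the first line and removes the commas. The function name `clean_orthogroup_lines`, the sample gene names, and the choice of a single space as separator are assumptions made for illustration.

```python
def clean_orthogroup_lines(lines):
    """Drop the first line and remove commas, keeping one list of genes per line."""
    cleaned = []
    for line in lines[1:]:  # skip the first line, which should not be in the file
        # Treat commas as whitespace, then split on any whitespace
        genes = line.replace(",", " ").split()
        if genes:  # ignore empty lines
            cleaned.append(" ".join(genes))  # assumption: genes separated by a space
    return cleaned


# Hypothetical input: a first line to discard, then comma-separated genes
example = [
    "HOG\tgenes",
    "geneA1, geneB1, geneC1",
    "geneA2,geneB2",
]
print("\n".join(clean_orthogroup_lines(example)))
```

For real OrthoFinder output, the linked script is the one to use, since the HOG tables contain more than a plain gene list.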

@erin-thei
Author

Hi @alouis72,

Thanks for your timely response. I will give that a try!

Since I'm new to this workflow, a couple of questions. Given my species tree, I was told to run OrthoFinder on all of the nodes (so I ran 68 iterations of OF). Each of those OF runs produced their own HOGs. Am I supposed to use that script for all of those? I guess I am a bit confused on the ancestral reconstruction process as a whole. Any help would be much appreciated. Thanks!

@erin-thei
Author

Hi again @alouis72 ,

I was able to get past the error I was facing earlier, but I got an error during the buildSynteny.pairwise-conservedPairs.py step saying: No such file or directory: 'ancGenes/all/ancGenes.NAME_0.list.bz2'. Upon inspecting the scripts, I printed phylTree.listAncestr:

['A10', 'A11', 'A12', 'A13', 'A14', 'A15', 'A16', 'A17', 'A18', 'A19', 'A2', 'A20', 'A21', 'A22', 'A23', 'A24', 'A25', 'A26', 'A27', 'A28', 'A29', 'A3', 'A30', 'A31', 'A32', 'A33', 'A34', 'A35', 'A36', 'A37', 'A38', 'A39', 'A4', 'A40', 'A41', 'A42', 'A43', 'A44', 'A45', 'A46', 'A47', 'A48', 'A49', 'A5', 'A50', 'A51', 'A52', 'A53', 'A54', 'A55', 'A56', 'A57', 'A58', 'A59', 'A6', 'A60', 'A61', 'A62', 'A63', 'A64', 'A65', 'A66', 'A67', 'A68', 'A7', 'A8', 'A9', 'NAME_0']

Why is that last ancestor listed when it's not present in my species tree?

@alouis72
Collaborator

Hi Erin,
The root of the species tree has no name, so AGORA names it NAME_0, but there are no orthogroups for it.
Either you name the root and provide orthogroups for it (if you have them), or you add the option "-target=A2" to the agora command line to build ancestor A2 and its descendants.
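
A quick way to spot this kind of mismatch before launching the workflow is to check that every ancestor name has a matching orthogroup file. The sketch below is only an illustration: the function `missing_orthogroup_files` is hypothetical and not part of AGORA, and the example uses a fake set of existing files rather than the real directory. The `%s` file pattern is the one from the command line above.

```python
import os


def missing_orthogroup_files(ancestors, pattern, exists=os.path.exists):
    """Return the ancestor names for which the file `pattern % name` is missing."""
    return [name for name in ancestors if not exists(pattern % name)]


# Toy example with a fake file system: only A2 and A3 have orthogroup files
present = {
    "orthologyGroups/orthologyGroups.A2.list",
    "orthologyGroups/orthologyGroups.A3.list",
}
result = missing_orthogroup_files(
    ["A2", "A3", "NAME_0"],
    "orthologyGroups/orthologyGroups.%s.list",
    exists=present.__contains__,  # replace with the default to check real files
)
print(result)
```

With the default `exists=os.path.exists`, the same call checks the files on disk for the list of names printed from phylTree.listAncestr.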

About your first question: I don't understand how you built your orthogroups. There may be a risk of inconsistency between ancestors.
I know that OrthoFinder2 builds Hierarchical Orthogroups (Phylogenetic_Hierarchical_Orthogroups in the results), which are consistent across the species tree. Maybe you should try that.

@erin-thei
Author

Great, thanks for the information. I was actually able to fix the issue prior to your response, and get it working successfully which is great.

I haven't done a deep dive into the results yet, or how to interpret them, but does Agora report the average number of genes per synteny block? Or is that something that should be done manually?

Thanks so much for your help!
