Issues running EvidenceModeler #55

Open
juanlu16 opened this issue Apr 7, 2022 · 13 comments

Comments

@juanlu16

juanlu16 commented Apr 7, 2022

Hi @brianjohnhaas

I have some questions about EVM and have also run into problems running it. I would be extremely grateful if you could help me with them.

My question is this: I have several prediction files built from transcriptomes with PASA, and I would just like to merge the predictions or alignments from these files. I would prefer not to supply an ab initio gene prediction file, but a gene prediction file is mandatory to run EVM. What could I do? I had thought about passing a file from the PASA results as the prediction file (--gene_predictions), but I don't know whether that is the right thing to do.

I also have a problem. To check that EvidenceModeler works correctly on my computer, I launched a test with several PASA transcript files, but the run stops before finishing with the message "Out of memory!". What could be causing this?

Thank you very much in advance for your help. I am very grateful.

Best regards,

Juan Luis

@brianjohnhaas
Contributor

brianjohnhaas commented Apr 7, 2022 via email

@juanlu16
Author

Hi Brian,

I am really grateful for your reply.

From what I understand from your explanation, the ab initio prediction is strictly necessary, isn't it? I have that prediction, made with Augustus, but I would like to use only the PASA data, since I consider it more reliable. What do you recommend?

Regarding the "Out of memory!" problem, I have implemented what you told me about reducing the partition parameter, lowering it to 10000 from the 100000 shown on the EvidenceModeler website. However, I get the same error when I run the test with predictions of 2000 genes from each of the four species I want to merge, i.e. a total of 8000 genes to merge into approximately 2000 genes. For this test I had 165 GB of RAM, and since all of it was exhausted I suppose EvidenceModeler really does require a lot of RAM. My question is therefore more specific. I am working with the genomes of about 24 plant varieties, which are very large (13-17 Gb each). Each PASA alignment-prediction (gff3 file) contains about 125000 alignment assemblies on average, and there is a prediction for a reference genome with 71000 predicted genes. Could you estimate roughly how much RAM a system needs for EvidenceModeler to run correctly with all this data?

If this process needs a lot of RAM and other resources, we had thought about extracting the predicted genes for each chromosome and running EvidenceModeler chromosome by chromosome. Would this be a good option?
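For readers weighing the chromosome-by-chromosome idea, here is a minimal Python sketch of how a GFF3 file could be split by the sequence ID in column 1. The function name and output layout are hypothetical and not part of EvidenceModeler. The sketch assumes sequence IDs are safe to use as file names, and it holds all records in memory, so a streaming variant may be preferable for very large inputs. Whether per-chromosome runs are advisable is a question for the maintainer.

```python
from collections import defaultdict
from pathlib import Path


def split_gff3_by_seqid(gff3_path, out_dir):
    """Write one GFF3 file per sequence ID (column 1) and return the paths.

    Hypothetical helper: assumes tab-delimited GFF3 and that each
    sequence ID is usable as a file name.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = defaultdict(list)
    with open(gff3_path) as fh:
        for line in fh:
            # Skip blank lines and comments/directives; they carry no seqid.
            if not line.strip() or line.startswith("#"):
                continue
            seqid = line.split("\t", 1)[0]
            records[seqid].append(line)
    written = {}
    for seqid, lines in records.items():
        out_path = out_dir / f"{seqid}.gff3"
        out_path.write_text("".join(lines))
        written[seqid] = out_path
    return written
```

The same split would need to be applied to every evidence file and to the genome FASTA so that each run sees matching sequence IDs.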

Thank you very much in advance for your help.

Best regards,

Juan Luis

@brianjohnhaas
Contributor

brianjohnhaas commented Apr 18, 2022 via email

@juanlu16
Author

Dear Brian:

First, I attach a document containing screenshots and a description of all the files I used as input to run EvidenceModeler, plus the initial run command I used.

Second, EvidenceModeler has now let me run a test with my own data without the earlier memory problems, since I reduced the "segmentSize" parameter to 40000 instead of the 100000 used in the example tutorial. However, the result was not successful: the final gff3 file does not contain any data. I therefore wanted to ask you two specific questions:

  1. Does the --transcript_alignments parameter accept only a single input file, or can it accept several? If it accepts only one, could I merge all the files from the PASA run for each variety into a single file and use that as input for the "--transcript_alignments" parameter?

  2. Why might the gff3 files generated for each reference genome partition, and the final gff3 file, contain no data?

Thank you very much for your help and your patience, Brian.
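As an illustration of the merge described in question 1, here is a minimal Python sketch that concatenates several GFF3 files into one. The function name is hypothetical and not part of EvidenceModeler or PASA. The sketch drops per-file comment and directive lines and writes a single version header. It does not check for duplicate IDs across files, which may matter if assemblies from different varieties share identifiers.

```python
def concatenate_gff3(input_paths, output_path):
    """Concatenate GFF3 files into one, keeping a single version header.

    Hypothetical helper: returns the number of feature lines written.
    Duplicate IDs across the input files are NOT detected or renamed.
    """
    n_records = 0
    with open(output_path, "w") as out:
        out.write("##gff-version 3\n")
        for path in input_paths:
            with open(path) as fh:
                for line in fh:
                    # Drop per-file comments/directives and blank lines.
                    if line.startswith("#") or not line.strip():
                        continue
                    # Guard against a missing newline at the end of a file.
                    out.write(line if line.endswith("\n") else line + "\n")
                    n_records += 1
    return n_records
```

A call such as `concatenate_gff3(["variety1.gff3", "variety2.gff3"], "all_pasa.gff3")` (file names are placeholders) would produce one file that could then be passed to "--transcript_alignments".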

Best regards,

Juan Luis

INPUT FILES.pdf

@brianjohnhaas
Contributor

brianjohnhaas commented Apr 19, 2022 via email

@juanlu16
Author

juanlu16 commented Apr 20, 2022

Hi Brian

I have concatenated the gff3 files coming from PASA and run EVM again. However, the generated gff3 files are still empty.

I am also looking at enhancing the Augustus data with the PASA data, but from what I have seen the prediction would have to be redone, which would be a long and costly process, and I need to merge all the predictions as soon as possible.
I thought the fault might be in the input data. To check that the input gff3 files are OK, I used the script "gff3_gene_prediction_file_validator.pl". When I ran it with the Augustus prediction as input, I got the following error:

(base) support@srvapp02:/BIOINFOR/EvidenceModeler/test2$ /home/support/sw_installed/EVidenceModeler-1.1.1/EvmUtils/gff3_gene_prediction_file_validator.pl ab_inition_prediction.gff3
Fatal Error: cannot parse ID from entry
at /home/support/sw_installed/EVidenceModeler-1.1.1/EvmUtils/gff3_gene_prediction_file_validator.pl line 54, <$fh> line 18.
(base) support@srvapp02:/BIOINFOR/EvidenceModeler/test2$

I think it is because the identifier is missing in this gff3 file. How could I correct the file? How could I add the ID of each gene? Could the lack of these gene identifiers in the Augustus file be the reason the gff3 that EvidenceModeler generates is empty?

If you like, I could send the original files to your email so you can check whether they are OK.

Thank you very much for your attention and quick replies, Brian.
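The validator message above reports that it failed at line 18 of the input file. To see how widespread the problem is, a small Python sketch like the following could list every feature line whose attributes column lacks an `ID=` entry. The function name is hypothetical. Which feature types the EVM validator actually requires an ID for is not stated in this thread, so treat the result as a diagnostic hint rather than a replacement for the validator.

```python
def find_lines_without_id(gff3_path):
    """Return (line_number, feature_type) for feature lines lacking an ID.

    Hypothetical diagnostic: assumes tab-delimited GFF3 with attributes
    in column 9 written as semicolon-separated key=value pairs.
    """
    missing = []
    with open(gff3_path) as fh:
        for lineno, line in enumerate(fh, start=1):
            # Blank lines and comments/directives are not features.
            if not line.strip() or line.startswith("#"):
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 9:
                missing.append((lineno, "<fewer than 9 columns>"))
                continue
            attributes = cols[8].split(";")
            if not any(a.strip().startswith("ID=") for a in attributes):
                missing.append((lineno, cols[2]))
    return missing
```

Printing the first few entries of the returned list shows whether the attributes column follows GFF3 `key=value` conventions at all or uses a different layout.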

Best regards!

Juan Luis

@brianjohnhaas
Contributor

brianjohnhaas commented Apr 20, 2022 via email

@juanlu16
Author

Hi Brian

I have sent the files to your email.

Best,

Juan Luis

@juanlu16
Author

Hi Brian

I hope all goes well for you.

I sent you an email with the files I use as input for EVM, but since I have had no reply I suspect it did not reach you. I also tried to share the files here so that you could access them more easily, but GitHub does not allow me to upload these file formats. I am sending you a new email with the files to see whether you receive them this time and can take a look.

Sorry for the inconvenience, Brian, but EVM is software that interests me a lot.

Thank you very much in advance for your attention and help.

Best regards,

Juan Luis

@brianjohnhaas
Contributor

brianjohnhaas commented May 10, 2022 via email

@brianjohnhaas
Contributor

brianjohnhaas commented May 10, 2022 via email

@brianjohnhaas
Contributor

brianjohnhaas commented May 10, 2022 via email

@brianjohnhaas
Contributor

brianjohnhaas commented Oct 11, 2022 via email

2 participants