Issues running EvidenceModeler #55
Comments
Hi Juan,
responses below:
On Thu, Apr 7, 2022 at 6:27 AM juanlu16 ***@***.***> wrote:
> Hi @brianjohnhaas <https://github.com/brianjohnhaas>
> I have some questions about EVM and also some execution problems. I was
> wondering if you could help me. I would be extremely grateful if you
> could help me resolve them.
> My first question: I have several prediction files with
> transcriptomes from PASA. I would just like to merge the predictions or
> alignments from these files, and I would prefer not to supply an ab initio
> gene prediction file. However, the gene prediction file is mandatory to run
> EVM. What could I do? I had thought about using some file from the PASA
> results as the prediction file (--gene_predictions), but I don't know if
> this is the right thing to do.
EVM primarily works off ab initio gene predictions and uses alignments
(protein and transcripts) to guide selection of exons and introns. You
mentioned you prefer to not use ab initio gene predictions. If you have
PASA transcripts, you could run TransDecoder on them to predict ORFs and
propagate them to genome structures.
> I also have a problem. To test that EvidenceModeler works correctly on my
> computer, I launched a test with several PASA transcript files, but before
> finishing, the execution stops and I get the message "Out of memory!"
> What could be causing this message?
EVM should come with a bunch of example data sets. Try running some of
these to see that it works on your system. If you're getting out of
memory errors, then explore the EVM genome partitioning step and make the
partitions smaller, so that they still contain multiple complete genes but
are not overwhelmed by the amount of data per genome partition.
hope this helps,
~brian
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
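The partitioning advice above can be illustrated with a small sketch. It is not EVM's own partitioning code; it only shows how a contig might be cut into fixed-size windows that overlap, so that a gene falling on a boundary is still complete in one window. The function name `partition_ranges` and its parameters are hypothetical, chosen to mirror the segment size and overlap mentioned in this thread.

```python
def partition_ranges(contig_length, segment_size, overlap_size):
    """Return 1-based inclusive (start, end) windows covering a contig.

    Illustrative only: this is an assumption about how overlapping
    partitions can be laid out, not a copy of EVM's implementation.
    """
    if segment_size <= overlap_size:
        raise ValueError("segment_size must be larger than overlap_size")
    if contig_length <= 0:
        return []

    ranges = []
    start = 1
    step = segment_size - overlap_size  # distance between window starts
    while True:
        end = min(start + segment_size - 1, contig_length)
        ranges.append((start, end))
        if end >= contig_length:
            break
        start += step
    return ranges


if __name__ == "__main__":
    # A 2.5 Mb contig in 1 Mb windows with 100 kb overlap.
    for window in partition_ranges(2_500_000, 1_000_000, 100_000):
        print(window)
```

Smaller windows reduce the amount of evidence held in memory per partition, but the overlap should stay larger than the longest gene expected in the genome.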
|
Hi Brian Jonnas,
I am really grateful for your reply.
From what I understand of your explanation, the ab initio prediction is strictly necessary, isn't it? I have that prediction done with Augustus, but I would like to use only PASA data, since I consider the data coming from PASA to be more reliable. What do you recommend?
Regarding the "Out of memory!" problem, I followed your advice and reduced the partition parameter to 10000 instead of the 100000 given on the EvidenceModeler website. However, I get the same error when I run this test with predictions of 2000 genes from each of the four species I want to merge, i.e. a total of 8000 genes to merge into approximately 2000 genes. For this test I had 165 GB of RAM, so if all of it was saturated, I suppose EvidenceModeler really does require a lot of RAM.
So my question is more specific. I am working with the genomes of about 24 really big plant varieties (between 13 and 17 Gb). Each PASA alignment-prediction set (gff3 files) contains about 125000 alignment assemblies on average, and there is a prediction for a reference genome with 71000 predicted genes. Do you know approximately how much RAM a system needs for EvidenceModeler to work correctly with all this data?
In case this process needs a lot of RAM and other resources, we had thought about extracting the predicted genes for each chromosome and running EvidenceModeler chromosome by chromosome. Would this be a good option?
Thank you very much in advance for your help.
Best regards,
Juan Luis |
I'm not sure what the issue is with EVM but it's likely related to some
aspect of the data. If you have a small example that you could share with
me that demonstrates the problem, I could look into it further.
If you have Augustus predictions and transcript data that you want to
integrate, you could just upload the augustus predictions as annotations
into PASA and have PASA add UTRs and alt splice variants. If you run PASA
with the --TRANSDECODER option, it'll add novel genes and revise the
annotations more aggressively based on what it infers as full-length
transcripts.
best,
~b
|
Dear Brian Jonnas,
On the one hand, I attach a document containing screenshots and a description of all the files I used as input to run EvidenceModeler, plus the initial run command I used.
On the other hand, EvidenceModeler now lets me run a test with my own data without the earlier memory problems, since I reduced the "segmentSize" parameter to 40000 instead of the 100000 used in the example tutorial. However, the run was not successful, since the final gff3 file does not contain any data. Therefore I wanted to ask you two specific questions:
1. Does the --transcript_alignments parameter accept only a single input file, or can it accept multiple? If it accepts only one, could I merge all the files coming from the PASA run for each variety into a single final file, and use that file as input for the "--transcript_alignments" parameter?
2. Why might the gff3 files generated for each reference genome partition, and the final gff3 file, not contain any data?
Thank you very much for your help and your patience, Brian Jonnas.
Best regards,
Juan Luis
INPUT FILES.pdf <https://github.com/EVidenceModeler/EVidenceModeler/files/8513622/INPUT.FILES.pdf> |
Hi,
EVM generally expects at least three sets of ab initio predictions and then
the alignment data. Each parameter takes only a single gff3 file, so
you must concatenate the data for the corresponding data type.
Since you're just using Augustus with transcript data, try just using PASA
to update your Augustus predictions.
Not sure who Jonnas is. ;-)
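Concatenating several gff3 files of the same data type into one, as suggested above, could look roughly like the sketch below. The helper name `concatenate_gff3` and the file names in the usage note are hypothetical. The only point it makes is that each input must end with a newline before the next one is appended, so the last record of one file does not fuse with the first record of the next.

```python
def concatenate_gff3(input_paths, output_path):
    """Append several GFF3 files into one and return the feature-line count.

    Lines are copied verbatim (comments included). A newline is added
    when a file's last line lacks one. This is a plain concatenation,
    not an EVM utility.
    """
    feature_lines = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as handle:
                for line in handle:
                    if not line.endswith("\n"):
                        line += "\n"  # protect the record boundary between files
                    out.write(line)
                    # Count only real feature lines, not blanks or comments.
                    if line.strip() and not line.startswith("#"):
                        feature_lines += 1
    return feature_lines
```

For example, `concatenate_gff3(["variety1.pasa.gff3", "variety2.pasa.gff3"], "all_transcripts.gff3")` would produce one file to pass to --transcript_alignments. The source column of every input still has to match an entry in the weights file.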
|
Hi Brian,
I have concatenated the gff3 files coming from PASA and run it again, but the generated gff3 files are still empty.
On the other hand, I am looking at enhancing the Augustus data with the PASA data, but from what I have seen the prediction would have to be redone, which would be a long and costly process, and I need to merge all the predictions as soon as possible because I am in a hurry.
I had thought that perhaps the fault might be in the input data. To check that the input gff3 files are OK, I used the script "gff3_gene_prediction_file_validator.pl". When I ran that script using the Augustus prediction as input, I got the following error:
(base) support@srvapp02:/BIOINFOR/EvidenceModeler/test2$ /home/support/sw_installed/EVidenceModeler-1.1.1/EvmUtils/gff3_gene_prediction_file_validator.pl ab_inition_prediction.gff3
Fatal Error: cannot parse ID from entry at /home/support/sw_installed/EVidenceModeler-1.1.1/EvmUtils/gff3_gene_prediction_file_validator.pl line 54, <$fh> line 18.
I think it is because the identifier is missing in this gff3 file. How could I correct this file? How could I add the ID of each gene? Could the lack of these gene identifiers in the Augustus file be the reason the gff3 generated by EvidenceModeler is empty?
If you want, I could send the original files to your email so you can check whether they are OK.
Thank you very much for your attention and speed, Brian.
Best regards!
Juan Luis |
Sure, send me what you can and I'll take a look. We have a converter that
should make Augustus gff3 look more compatible with PASA or EVM in case that
helps. The converter is in the evm_utils/ area of the EVM software.
best,
~b
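Before running a converter, a quick scan can show how many lines of a prediction file would trigger the "cannot parse ID" error reported above. The sketch below is not what gff3_gene_prediction_file_validator.pl does internally; it assumes only that every feature line is expected to carry an `ID=` attribute in its ninth column, which is a guess based on the error message. The function name `lines_missing_id` is hypothetical.

```python
def lines_missing_id(gff3_lines):
    """Return (line_number, reason) for feature lines without an ID= attribute.

    Assumption: each feature line should have nine tab-separated columns
    and an ID=... entry among the semicolon-separated attributes.
    """
    problems = []
    for number, line in enumerate(gff3_lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments are not features
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append((number, "expected 9 tab-separated columns"))
            continue
        attributes = [item.strip() for item in columns[8].split(";")]
        if not any(item.startswith("ID=") for item in attributes):
            problems.append((number, "no ID= attribute"))
    return problems


if __name__ == "__main__":
    example = [
        "chr1\tAugustus\tgene\t1\t900\t.\t+\t.\tID=g1",
        # GTF-style attributes, as Augustus can emit, have no ID= key.
        'chr1\tAugustus\tCDS\t1\t300\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    ]
    print(lines_missing_id(example))
```

If most lines are reported, the file is probably in Augustus's native GTF-like layout rather than gff3, and converting it is a better fix than editing IDs by hand.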
|
Hi Brian I have sent you the files to your email. Best, Juan Luis |
Hi Brian,
I hope all is well with you.
I sent you an email with the files I use as input for EVM, but given the lack of response I don't think it reached you. I also tried to share the files here so that you could access them more easily, but it does not allow me to upload these file formats. I am sending you a new email with the files to see if you get them this time and can take a look at them.
Sorry for the inconvenience, Brian, but EVM is software that interests me a lot.
Thank you very much in advance for your attention and help.
Best regards,
Juan Luis |
Hi Juan,
I don't think I received anything from you. Please try again: bhaas
at broadinstitute dot org
many thanks,
~b
|
actually did receive them. Came through my personal email
more soon
|
Hi Juan,
Lots of problems with these inputs:
- the genome has CHR1 as the accession, whereas the annotation files have
chr1. These need to match exactly.
- the weights file evidence source field doesn't match exactly with the 2nd
column of the gff3 input files.
- the Augustus format needs to be converted to a gff3 file that EVM can
work with. This seemed to do the conversion:
EVidenceModeler/EvmUtils/misc/augustus_GTF_to_EVM_GFF3.pl ab_inition_prediction.gff3 > gene_predictions.gff3
- but even after I partitioned the data into 1M chunks with 100k overlap,
the gene predictions for one of the partitions I examined didn't have
splice sites according to the expected positions in the genome - so not
sure if there's an overall problem with the genome file you sent me not
matching up with the Augustus predictions (given the CHR1 vs. chr1 issues,
I suspect this might be the case).
Then, when it comes to the inputs, there's only the one ab initio type
(Augustus), and EVM won't do much other than rearrange some intron/exon
combinations and maybe merge some predictions based on the transcript
data. EVM is really targeted towards combining multiple ab initio types
along with the alignments.
I'm going to continue to encourage you to use PASA for integrating the
transcript data with the Augustus predictions, as it would be your best bet
towards achieving your goal here given your inputs.
Hope this helps,
~b
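The weights-file mismatch listed above can be checked mechanically. The sketch below assumes a whitespace-delimited weights file with three fields per line (evidence class, source, weight) and compares its second field with the second column of the gff3 inputs. That layout is an assumption stated here for illustration, and the names `weight_sources`, `gff3_sources` and `compare_sources` are hypothetical.

```python
def weight_sources(weights_lines):
    """Collect the source names (second field) from weights-file lines."""
    sources = set()
    for line in weights_lines:
        fields = line.split()
        if len(fields) >= 3 and not line.startswith("#"):
            sources.add(fields[1])
    return sources


def gff3_sources(gff3_lines):
    """Collect the distinct values of column 2 from GFF3 feature lines."""
    sources = set()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) > 1:
            sources.add(columns[1])
    return sources


def compare_sources(weights_lines, gff3_lines):
    """Return (gff3 sources without a weight, weighted sources never seen)."""
    weighted = weight_sources(weights_lines)
    observed = gff3_sources(gff3_lines)
    return sorted(observed - weighted), sorted(weighted - observed)


if __name__ == "__main__":
    weights = ["ABINITIO_PREDICTION\taugustus\t1"]
    gff3 = ["chr1\tAugustus\tgene\t1\t900\t.\t+\t.\tID=g1"]
    # The comparison is case-sensitive, so "Augustus" and "augustus" differ.
    print(compare_sources(weights, gff3))
```

Both returned lists should be empty before running EVM; a non-empty first list means some evidence has no weight assigned to it.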
|
Thanks, Juan!
From looking at the files, I expect:
EVidenceModeler/EvmUtils/misc/augustus_GTF_to_EVM_GFF3.pl
should convert your Augustus ab initio predictions to a compatible gff3
format.
Note, the example fasta file you sent has 'CHR1' as the accession instead
of 'chr1', which is a problem. You'll need to have the chromosome or contig
identifiers match up exactly. Also, when I tried to take the resulting
gff3 file and match it up to this genome sequence, the coding regions
didn't translate as I expected them to, so maybe there's a disconnect
there.
hope this helps.
best,
~b
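The requirement that chromosome or contig identifiers match exactly can be verified before running anything. The sketch below takes the first word after `>` in each FASTA header as the sequence identifier and compares it with column 1 of the gff3 file, pointing out identifiers that differ only in case (such as CHR1 and chr1). The function names are hypothetical and this is a simple diagnostic, not part of EVM.

```python
def fasta_ids(fasta_lines):
    """Return the identifiers (first word after '>') of all FASTA records."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff3_seqids(gff3_lines):
    """Return the distinct values of column 1 from GFF3 feature lines."""
    seqids = set()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        seqids.add(line.split("\t", 1)[0])
    return seqids


def unmatched_seqids(fasta_lines, gff3_lines):
    """Map each GFF3 seqid absent from the genome to a case-insensitive
    match in the genome, or None when there is no candidate at all."""
    genome = fasta_ids(fasta_lines)
    by_lower = {name.lower(): name for name in genome}
    report = {}
    for seqid in sorted(gff3_seqids(gff3_lines) - genome):
        report[seqid] = by_lower.get(seqid.lower())
    return report


if __name__ == "__main__":
    genome = [">CHR1 example contig", "ACGTACGT"]
    annotation = ["chr1\tAugustus\tgene\t1\t8\t.\t+\t.\tID=g1"]
    print(unmatched_seqids(genome, annotation))
```

An empty result means every annotated sequence name exists in the genome file. It does not prove the coordinates belong to the same assembly, so coding regions that fail to translate still need a separate check.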
|
Hi @brianjohnhaas
I have some questions about EVM and also some execution problems. I was wondering if you could help me. I would be extremely grateful if you could help me resolve them.
My first question: I have several prediction files with transcriptomes from PASA. I would just like to merge the predictions or alignments from these files, and I would prefer not to supply an ab initio gene prediction file. However, the gene prediction file is mandatory to run EVM. What could I do? I had thought about using some file from the PASA results as the prediction file (--gene_predictions), but I don't know if this is the right thing to do.
I also have a problem. To test that EvidenceModeler works correctly on my computer, I launched a test with several PASA transcript files, but before finishing, the execution stops and I get the message "Out of memory!" What could be causing this message?
Thanks so much in advance for your help. I'm very grateful to you.
Best regards,
Juan Luis