The EVM results were not consistent with PB data #46

Open
Huangyizhong opened this issue Sep 11, 2021 · 34 comments

@Huangyizhong

Huangyizhong commented Sep 11, 2021

Hi there,
I have used EVM to combine several sources of evidence for the annotation of a genome. I checked some of the EVM results in IGV and, as shown in the picture, I am confused by them.
Why are the EVM results not consistent with the PB (PacBio) data or the TransDecoder result? Note: I included the TransDecoder results in the EVM inputs.
[image: https://user-images.githubusercontent.com/31943359/132944144-878260c9-8dbe-498a-b01c-b34800a89acd.png]
Thanks so much!
Yizhong Huang

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 11, 2021 via email

hi,
If you send me the input data for this example and your command, I'll be able to give some insights.
best,
~b
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas
http://broad.mit.edu/~bhaas

@Huangyizhong
Author

OK, thanks so much! I will prepare the data as soon as possible!
best
Yizhong Huang

@Huangyizhong
Author

Hi there,
Thanks for your kind help! I have attached the files for the region that I posted before.
The commands are as follows:

${EVM}/EvmUtils/partition_EVM_inputs.pl \
    --genome ${genome} \
    --gene_predictions gene_predictions.gff3 \
    --transcript_alignments transcript_alignments.gff3 \
    --protein_alignments protein_alignments.gff3 \
    --segmentSize 500000 --overlapSize 10000 \
    --partition_listing partitions_list.out

# Parameter note: --weights | -w   weights for evidence types file
${EVM}/EvmUtils/write_EVM_commands.pl --genome ${genome} \
    --weights /home/goldenpigs/1.Huangyizhong/12.BMX/14.Hifi-data/7.merge/weights.txt \
    --gene_predictions gene_predictions.gff3 \
    --transcript_alignments transcript_alignments.gff3 \
    --protein_alignments protein_alignments.gff3 \
    --output_file_name evm.out --partitions partitions_list.out > commands.list

${EVM}/EvmUtils/execute_EVM_commands.pl commands.list

${EVM}/EvmUtils/recombine_EVM_partial_outputs.pl --partitions partitions_list.out --output_file_name evm.out

${EVM}/EvmUtils/convert_EVM_outputs_to_GFF3.pl --partitions partitions_list.out --output evm.out --genome ${genome}

find . -regex ".*evm.out.gff3" -exec cat {} \; > EVM.all.gff3

Thanks again for your help!
Sincerely
Yizhong Huang

test.zip
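
A common reason for evidence being ignored in a run like the one above is a source name in an evidence file that has no matching entry in the weights file. The following is a minimal sketch of such a check, not part of EVM itself: the function name `missing_weights` is hypothetical, and it assumes whitespace-delimited weights lines (class, type, weight) and nine-column GFF3 evidence files whose source (column 2) contains no spaces.

```shell
#!/usr/bin/env bash
# missing_weights WEIGHTS EVIDENCE.gff3 [EVIDENCE2.gff3 ...]
# Hypothetical helper: print each GFF3 source (column 2) that has
# no entry in the weights file (type is column 2 there as well).
missing_weights() {
    local weights="$1"
    shift
    awk '
        # Weights file: remember every evidence type that has a weight.
        FILENAME == ARGV[1] {
            if (NF >= 3 && $1 !~ /^#/) known[$2] = 1
            next
        }
        # Evidence files: skip comments, blank lines and short lines.
        /^#/ || NF < 9 { next }
        # Report each unweighted source once, in order of first appearance.
        !($2 in known) && !($2 in reported) {
            reported[$2] = 1
            print $2
        }
    ' "$weights" "$@"
}
```

For the inputs in this thread it could be called as `missing_weights weights.txt gene_predictions.gff3 transcript_alignments.gff3 protein_alignments.gff3`; empty output would mean every source has a weight.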

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 12, 2021 via email

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 12, 2021 via email

@Huangyizhong
Author

Huangyizhong commented Sep 12, 2021 via email

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 12, 2021 via email

@Huangyizhong
Author

Huangyizhong commented Sep 12, 2021 via email

@Huangyizhong
Author

The partition listing for this segment is:
chr1_MotherHap /work/6.EVM/chr1_MotherHap Y /work/6.EVM/chr1_MotherHap/chr1_MotherHap_92000001-93000000

The command is:
/home/EVidenceModeler/EvmUtils/.././evidence_modeler.pl -G genome_softmasked.fa -g gene_predictions.gff3 -w /work/6.EVM/weights.txt -e transcript_alignments.gff3 -p protein_alignments.gff3 --exec_dir /work/6.EVM/chr1_MotherHap/chr1_MotherHap_92000001-93000000 > /work/6.EVM/chr1_MotherHap/chr1_MotherHap_92000001-93000000/evm.out 2> /work/6.EVM/chr1_MotherHap/chr1_MotherHap_92000001-93000000/evm.out.log
Are all of these files right?

Thanks
Best
Yizhong Huang

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 12, 2021 via email

do you find all the inputs at: /work/6.EVM/chr1_MotherHap/chr1_MotherHap_92000001-93000000 ?

@Huangyizhong
Author

Yes, all the inputs are at /work/6.EVM/chr1_MotherHap/chr1_MotherHap_92000001-93000000. They are attached below.
region_92000001-93000000.zip

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 12, 2021 via email

@Huangyizhong
Author

Thanks so much for your kind advice! I have changed the EVM process as follows:
First, I added the TransDecoder file to gene_predictions.gff3.
Second, I added the PacBio data to transcript_alignments.gff3 using taco_gtf_to_alignment_gff3.pl.
As for the protein results from GeMoMa, the file I used was the same kind of file as the one in example_data_files. I used GeMoMa_gff_to_gff3.pl to convert it into protein_alignments.gff3, but the final file has the same format as the gene predictions. I also ran GeMoMa.example.gff through the GeMoMa script, and the final result was no different. How should I deal with this? The new EVM run is in progress.

Thanks again!
Yizhong Huang

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 13, 2021 via email

I see. In this case, you might add the GeMoMa predictions into the gene predictions input as well, and assign it as 'OTHER_EVIDENCE' type in the weights file (like transdecoder).

@Huangyizhong
Author

So I should just discard protein_alignments.gff3 if I have no other protein evidence, and then add the GeMoMa predictions to the gene predictions input file. Is that right?

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 13, 2021 via email

that's right

@Huangyizhong
Author

Thanks so much! I hope this solves my problems. Thanks again for your kind help!
Sincerely
Yizhong Huang
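
The merge agreed on above (GeMoMa predictions placed alongside the TransDecoder models in the gene predictions input) can be sketched as follows. This is only an illustration under assumptions, not an EVM utility: the function name `merge_predictions` and the file names in the usage line are hypothetical, and the inputs are assumed to be tab-delimited GFF3 files already in the gene-prediction format. It concatenates the inputs under a single header and then prints a feature count per source (column 2), so that the names that need entries in the weights file are visible. The weights posted later in this thread list both transdecoder and GeMoMa under the OTHER_PREDICTION class.

```shell
#!/usr/bin/env bash
# merge_predictions OUT IN1.gff3 [IN2.gff3 ...]
# Hypothetical helper: concatenate prediction GFF3 files into OUT
# (OUT must not be one of the inputs), then tally features per source.
merge_predictions() {
    local out="$1"
    shift
    {
        echo "##gff-version 3"
        local f
        for f in "$@"; do
            # Drop per-file headers and comment lines. awk re-emits every
            # record with a newline, so files lacking a final newline are safe.
            awk '!/^#/' "$f"
        done
    } > "$out"
    # Count features per source so a missing or misspelled source is easy to spot.
    awk -F'\t' 'NF >= 9 { n[$2]++ } END { for (s in n) print s "\t" n[s] }' "$out" | sort
}
```

A call such as `merge_predictions gene_predictions.gff3 transdecoder.gff3 gemoma.gff3` would write the merged file and list each source with its feature count.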

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 13, 2021 via email

@Huangyizhong
Author

Hi there!
With your help, the EVM results have become more accurate than before. Thanks so much! When I check the final EVM results in IGV, there are still some mistakes, as shown in the following picture. The final models are still missing some exons compared with the PB isoforms and the TransDecoder results. I need your help, and thanks again! In this EVM run, I supplied only transcript_alignments.gff3 and gene_predictions.gff3, and all the TransDecoder and GeMoMa results were combined into the gene_predictions.gff3 file. The weights file was:
TRANSCRIPT pacbio 10
OTHER_PREDICTION transdecoder 8
OTHER_PREDICTION GeMoMa 5
[image: https://user-images.githubusercontent.com/31943359/133223891-6c7000a0-77a2-4732-960a-fe8e044f89f3.png]
chr15_MotherHap_12000001-13000000.zip (https://github.com/EVidenceModeler/EVidenceModeler/files/7160680/chr15_MotherHap_12000001-13000000.zip)

best~
Yizhong Huang

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 14, 2021 via email

One issue is that EVM doesn't model UTRs and will only model coding exons, but you'll have UTRs in some of your inputs. If you want to add UTRs, you could run PASA afterwards using the EVM data as input annotations and the pacbio and other transcripts as inputs / sources for the UTRs. Another issue is that EVM requires complete ORFs unless the genes fall at the ends of the contigs, in which case they can be 5' or 3' partials.
hope this helps,
b

@Huangyizhong
Author

Yes, I know that EVM only models coding exons. As shown in the TransDecoder file, the gene marked with the red arrow is a complete ORF, and its left terminal feature is an exon that may not be a UTR. So I am confused about why the EVM result does not include the left exon.
Thanks so much!
Yizhong Huang
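
One way to test whether an exon missing from the EVM model is coding or UTR-only is to compare exon and CDS coordinates in the input GFF3. The following is a minimal sketch and not part of EVM: the function name `utr_only_exons` is hypothetical, and it assumes a tab-delimited GFF3 in which exon and CDS rows each carry a single `Parent=` attribute naming their transcript. It prints every exon that overlaps no CDS of the same transcript. Since EVM models coding exons only, such exons would not be expected in its output.

```shell
#!/usr/bin/env bash
# utr_only_exons FILE.gff3
# Hypothetical helper: print "seqid<TAB>start<TAB>end<TAB>transcript" for
# each exon that overlaps no CDS feature of the same parent transcript.
utr_only_exons() {
    awk -F'\t' '
        # Skip comments, blank lines and malformed rows.
        /^#/ || NF < 9 { next }
        $3 == "exon" || $3 == "CDS" {
            # Extract the transcript ID from the Parent attribute.
            if (!match($9, /Parent=[^;]+/)) next
            parent = substr($9, RSTART + 7, RLENGTH - 7)
            if ($3 == "exon") {
                n = ++nexon
                ex_chr[n] = $1; ex_parent[n] = parent
                ex_start[n] = $4 + 0; ex_end[n] = $5 + 0
            } else {
                m = ++ncds[parent]
                cds_start[parent, m] = $4 + 0; cds_end[parent, m] = $5 + 0
            }
        }
        END {
            for (i = 1; i <= nexon; i++) {
                p = ex_parent[i]
                coding = 0
                # Two intervals overlap when each starts before the other ends.
                for (j = 1; j <= ncds[p]; j++)
                    if (cds_start[p, j] <= ex_end[i] && cds_end[p, j] >= ex_start[i]) {
                        coding = 1
                        break
                    }
                if (!coding)
                    printf "%s\t%d\t%d\t%s\n", ex_chr[i], ex_start[i], ex_end[i], p
            }
        }
    ' "$1"
}
```

If the left exon in question is absent from this listing, it contains coding sequence, and the difference from the EVM model would need another explanation.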

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 14, 2021 via email

you can play around with the weights file and see how it impacts things. If there's something serious, I can dig into it, but I've got a lot of other work on my plate right now.

@Huangyizhong
Author

OK, thanks, and sorry to disturb you!
Best
Yizhong Huang

@Huangyizhong
Author

Hi there,
Sorry to disturb you again. Today I modified the weights file and the results have improved a lot. I also checked the final EVM models in IGV alongside the PB data, the TransDecoder data and the protein data. As shown in the picture, all the data support a single gene in this region, while the EVM result has two genes. How can this be explained and solved? The weights are:
TRANSCRIPT pacbio 8
OTHER_PREDICTION transdecoder 10
OTHER_PREDICTION GeMoMa 5
[image]

Thanks
Yizhong Huang

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 15, 2021 via email

@Huangyizhong
Author

Yes, I have added the genoma data into IGV, and the results also support that this region has only one gene. Thanks in advance for your kind help! The partitioned data for this region are attached.
chr11_MotherHap_72000001-73000000.zip
Best
Yizhong Huang

@brianjohnhaas
Contributor

When I load up your data here, I'm not seeing the same view. Here's what I'm seeing:

Screen Shot 2021-09-15 at 12 08 48 PM

[bed_format.zip](https://github.com/EVidenceModeler/EVidenceModeler/files/7171695/bed_format.zip)

@Huangyizhong
Author

Sorry, it is my fault. The region that I posted is so large that it cannot be seen clearly. I have checked the input file; the region in question is on the left of the screenshot that you posted, as shown below:
[image]
Thanks
Yizhong Huang

@Huangyizhong
Author

Hi there,
Another problem: how should I deal with EVM results that match the protein results but not the PB and TransDecoder results? See the following picture.
Thanks
[image]

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 16, 2021 via email

@Huangyizhong
Author

Thanks. For the split gene, whose splice site gives introns that EVM doesn't work with, is manual curation the only thing I can do?
As for the second question, we can see clearly that the EVM result is the same as the GeMoMa result, not the PB or TransDecoder results. How should I deal with it?

Thanks so much!
Yizhong Huang

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 16, 2021 via email

@Huangyizhong
Author

Got it. When I ran EVM, I ranked PB first in the weights file, then TransDecoder, with GeMoMa third. In my opinion, when the exon structures are the same in all three data sets, the final EVM result should follow the PB structure, so I am confused by the result shown in the last picture. Maybe I am wrong!
I agree with you that the de novo prediction results should be added to the EVM inputs. I have compared the results with and without BRAKER (the BRAKER output includes the AUGUSTUS result), and I find that the EVM results are more credible without the BRAKER result. Maybe I should add the de novo results and run it again!
As for PASA, do you mean that I should first run the PASA alignment step with the transcript data and then run the update step?
Thanks so much!
Yizhong Huang

@brianjohnhaas
Contributor

brianjohnhaas commented Sep 18, 2021 via email
