Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting sensitivity and specificity scores at the mRNA level #190

Closed
sanyalab opened this issue Oct 25, 2021 · 18 comments · Fixed by #191
Closed

Getting sensitivity and specificity scores at the mRNA level #190

sanyalab opened this issue Oct 25, 2021 · 18 comments · Fixed by #191

Comments

@sanyalab
Copy link

Hi Jacques,

I would like to get the sensitivity and specificity values using agat_sp_sensitivity_specificity.pl at the mRNA level of the reference prediction. My command is as follows

agat_sp_sensitivity_specificity.pl --gff1 Reference_Prediction.gff --gff2 transcript_alignment.gff -o Output.txt

However, I notice that this script is taking a very long time. I waited for 24 hours and then killed the process. So I was wondering if the tool was at all designed to do this. Or was it designed to compare just the final annotations.

If it was designed to compare two final annotations, I am wondering if it can be repurposed to do a reference to transcript alignment comparison and generate the Sn and Sp values.

Thanks
Abhijit

@Juke34
Copy link
Collaborator

Juke34 commented Oct 25, 2021

This should work smoothly. I guess one of this file is problematic.
Check them idependenlty using the gxf2gxf script to see if they can be parsed properly.

@sanyalab
Copy link
Author

Oh. Both files get parsed properly. No problems found. The comparison is taking a long time. The transcript alignment file is 27MB and the Reference annotation file is 196MB

@Juke34
Copy link
Collaborator

Juke34 commented Oct 25, 2021

Then, there is probably a bug lying around. Would it be possible to send me your files?
Which version of AGAT are you using?
Could you re-run with -v and tell me at what point it stops?

@sanyalab
Copy link
Author

sanyalab commented Oct 26, 2021

Hi Jacques,

I am using AGAT v 0.6.2. Attached is the reference annotation and transcript alignment files.

I'll rerun with the verbose flag and let you know.

Thanks
Abhijit

@sanyalab
Copy link
Author

sanyalab commented Oct 26, 2021

Hi Jacques,

I reran with the verbose flag. The code appears to get stuck at the following location in the reference annotation

Chr18 TESTRUN gene 577934 582612 . - . ID=dpgm18g704740.843;Name=dpgm18g704740.843

That reference gene has three mRNAs each with identical start and end coords for the mRNA but small differences in the exon and CDS coords. It corresponds to the following location in the query transcript annotation

Chr18 TESTRUN gene 578301 582902 . - . ID=XM_015716755.2.path1;Name=XM_015716755.2

The above gene alignment has a single mRNA.

The tail end of the verbose stdout continuously produces the attached text. The file is over 215G in size, so not attaching the same.

@Juke34
Copy link
Collaborator

Juke34 commented Oct 27, 2021

It is definitly a bug, it goes into an infinite loop. We need to fix that.

@sanyalab
Copy link
Author

sanyalab commented Oct 27, 2021

Oh I see. Thank you for looking into it. Will this be corrected in the next AGAT release?

-Abhijit

@Juke34 Juke34 mentioned this issue Nov 2, 2021
@sanyalab
Copy link
Author

sanyalab commented Nov 9, 2021

Hi Jacques,

Can I replace the sensitivity_specificity code in my copy of AGAT from the master branch and use the same?

Thanks
Abhijit

@Juke34
Copy link
Collaborator

Juke34 commented Nov 9, 2021

Yes but you will have to call the script with the path to be sure to use that one:
/path/to/the/script/agat_sp_sensitivity_specificity.pl

Otherwise follow those instruction: https://agat.readthedocs.io/en/latest/troubleshooting.html#how-to-use-a-version-of-agat-from-a-specific-branch

@sanyalab
Copy link
Author

sanyalab commented Nov 9, 2021

Hi Jacques,

I tried the code with the same sample files I sent you. It worked!!! Thanks. I get an output as below.

Results:

----------------------------------------------------------------
| Feature type | Sensitivity | Specificity |
----------------------------------------------------------------
| gene | 0.06 | 0.50 |
| mrna | 0.06 | 0.50 |
| cds | 0.04 | 0.73 |
| exon | 0.04 | 0.69 |
| five_prime_utr | 0.00 | 0.05 |
| three_prime_utr | 0.00 | 0.08 |
----------------------------------------------------------------

I was hoping to get the sensitivity and specificity values on a per mRNA basis. Something like below

mRNA_ID Sensitivity Specificity

Can this be achieved by modifying this code, or piecing together a few other AGAT tools?

Thanks

@Juke34
Copy link
Collaborator

Juke34 commented Nov 9, 2021

Per mRNA basis is something difficult because you need to be sure to compare the same gene in the different annotation, while genes can overlap a lot, and name are rarely synchornised.
A way to go would be to filter in both annotation the mRNA with longest CDS (
agat_sp_keep_longest_isoform.pl ) and then run the Se/Sp.

@sanyalab
Copy link
Author

Ok... Thank you Jacques for the quick turnaround on this. I have another small request. I'll put it up as a new issue.

Best Regards
Abhijit

@sanyalab
Copy link
Author

Hi Jacques,

I got a chance to try the new "agat_sp_sensitivity_specificity.pl" after the update. I am attaching my test files. The Sensitivity for the exon and cds are above one.

usage: agat_sp_sensitivity_specificity.pl --gff1 reference_test.gff3 --gff2 query_test_modi.gff3
Results:

----------------------------------------------------------------
| Feature type | Sensitivity | Specificity |
----------------------------------------------------------------
| gene | 0.05 | 0.50 |
| mrna | 0.05 | 0.50 |
| cds | 1.33 | 0.31 |
| exon | 1.02 | 0.21 |
| five_prime_utr | 0 | 0 |

| three_prime_utr | 0 | 0 |
----------------------------------------------------------------
Bye Bye.
query_test_modi_gff3.txt
reference_test_gff3.txt

@Juke34
Copy link
Collaborator

Juke34 commented Nov 22, 2021

Thank you for pointing it, indeed there was a bug.

@Juke34 Juke34 closed this as completed in cb579e0 Nov 22, 2021
@sanyalab
Copy link
Author

Hi Jacques,

Sorry to bother you again. But the issue seems to be occurring again in a few cases.

Attached are 3 reference and 3 query files and their respective outputs. I cannot pin down the cause since in the ref3.gff3 and query3.gff3 case, both the start and end of the query lie within the bounds of the reference gene coords.

But in the other two examples, just one of the query predictions has its end outside the gene bounds of the reference.

Thanks
TESTFILES.zip

@Juke34
Copy link
Collaborator

Juke34 commented Nov 29, 2021

No worries,
You're right it is still buggy. I will push a fix soon.

@sanyalab
Copy link
Author

sanyalab commented Dec 6, 2021

This is working very well. No issues so far in several thousand runs. Thank you Jacques

Best Wishes
Abhijit

@Juke34
Copy link
Collaborator

Juke34 commented Dec 6, 2021

Good to hear. Thank you for your feedbacks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants