Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subregion_fraction approach infinity due to divison by zero #4

Closed
skchronicles opened this issue Jul 14, 2023 · 4 comments
Closed

subregion_fraction approach infinity due to divison by zero #4

skchronicles opened this issue Jul 14, 2023 · 4 comments

Comments

@skchronicles
Copy link

skchronicles commented Jul 14, 2023

Hello there,

I hope you are having a great day! Thank you again for creating and maintaining this awesome pipeline. It has been extremely useful for a project I have involving A-I editing.

I just wanted to point out an issue I noticed while running FLARE. I will submit a PR shortly, but here is a brief description of the problem below.

I first noticed the issue while viewing some of the output files of SALIOR/FLARE within IGV. In the image(s) below, I have zoomed into a random region. In this figure, the top/bottom coverage tracks are colored by allele frequency.

For this specific project, we were interested in A-I editing sites, so we are looking for A>G editing on the forward strand. These sites are represented as green and yellow coverage bars at any given location.

image

As you can see here, SAILOR is doing an awesome job calling the A>G sites, and FLARE is also doing an amazing job calling A-I editing regions for the WT samples. But for my KO samples, FLARE did not call this region. After taking a closer look, I noticed something strange in my KO samples.

I tried to grep for this gene within the KO samples' FLARE results, and there were no hits (as expected by the IGV screenshot), so I started looking back further into FLARE's previous output files, and I noticed this had gene inf background_rates.

# KO_S29.bam_all_regions_with_fdr.tsv
=========================================================================
1 chrom                                                             chr12
2 start                                                          98514788
3 end                                                            98514818
4 edit_fraction                                      0.003861003861003861
5 strand                                                                -
6 target_bases                                                        259
7 edited_bases                                                          1
8 num_edited_reads                                                      1
9 total_reads_in_region                                                83
10 fraction_reads_edited                             0.012048192771084338
11 mean_depth                                          59.193548387096776
12 num_substrate_bases                                                  8
13 subregion_coverage                                               36035
14 subregion_conversions                                                3
15 region_coverage                                                    0.0
16 region_conversions                                                 0.0
17 gene_coverage                                                      0.0
18 gene_conversions                                                   0.0
19 subregion_fraction                               5.590339892665474e-05
20 background_rate                                                    inf
21 poisson_p                                                          1.0
22 p_adj                                                              1.0
23 region_id                            TMPO-AS1:exon_0:98514788-98514818
24 overall_region_id                                      TMPO-AS1:exon_0

So, then I looked into the step that created this output file, and I found where the error was occurring.

In this script on line 45, there is an edge-case where division by 0 can occur:
image

For my dataset, it happened right here:

# KO_S29.bam_all_regions_with_fdr.tsv
======================================================================
1 chrom                                                           chrX
2 start                                                       85003907
3 end                                                         85003927
4 edit_fraction                                   0.045454545454545456
5 strand                                                             +
6 target_bases                                                      22
7 edited_bases                                                       1
8 num_edited_reads                                                   1
9 total_reads_in_region                                            103
10 fraction_reads_edited                          0.009708737864077669
11 mean_depth                                        93.85714285714286
12 num_substrate_bases                                               5
13 subregion_coverage                                               22
14 subregion_conversions                                             2
15 region_coverage                                                 0.0
16 region_conversions                                              0.0
17 gene_coverage                                                   0.0
18 gene_conversions                                                0.0
19 subregion_fraction                                              inf *
20 background_rate                                                 inf
21 poisson_p                                                       1.0
22 p_adj                                                           1.0
23 region_id                            APOOL:exon_0:85003907-85003927
24 overall_region_id                                      APOOL:exon_0

For this particular region, the math works out to: (2-1)/(22-22). This causes more issues later on in the script here when finding the mean (resulting in inf values) and then the background_rate calculation (resulting in inf values again).

I will submit a PR to fix the issue now. Please feel free to review it whenever you have some free time. Thank you again for your time, and thank you again for creating this awesome pipeline! Please let me know if you have any questions, and have a wonderful weekend!

Best Regards,
@skchronicles

@skchronicles
Copy link
Author

I just submitted a PR to fix the issue here. Thank you again for your time. I appreciate it!

@skchronicles
Copy link
Author

Here is a new screenshot of the KO FLARE A>G results at the TMPO locus:
image

It is calling enriched A>G regions correctly now (addressing the inf problem fixed everything, see PR for fix).

@acurry-hyde
Copy link

Thank you so much for posting this fix!

@ekofman
Copy link
Collaborator

ekofman commented Mar 8, 2024

Yes thanks for taking the time to look into this and making the PR @skchronicles , looks good and I will merge it in.

@ekofman ekofman closed this as completed Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants