About probability_modified threshold #114
Replies: 1 comment 1 reply
Hi @Yang990-sys, the mod_ratio will definitely be correlated with probability_modified. The training data of m6Anet comprises modification labels at the site level, so we need some way to pool the per-read probability outputs of m6Anet into a single value per site.
probability_modified essentially serves this purpose. We reason that the m6ACE labels in our training data will pick up sites with at least one modified read, and that the more modified reads a site has, the more likely it is to be picked up by m6ACE. So a high probability_modified indicates that the site is likely to be picked up as modified if we were to run m6ACE on the dataset.
The mod_ratio, on the other hand, is not trained directly on m6ACE labels. probability_modified is by no means quantitative, but since we output a modification probability for each read, we can set a threshold that lets us call each read modified or not. Using this threshold, we calculate the proportion of modified reads per site as mod_ratio, which is also more interpretable than probability_modified. A high mod_ratio means that many reads have a read-level modification probability above the threshold, which by design also translates to a high probability_modified.
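To make the relationship concrete, here is a minimal sketch of the two quantities. It is not m6Anet's actual code: the function names are hypothetical, the pooling uses a simple "at least one modified read" (noisy-OR) formula that assumes reads are independent, and the threshold of 0.5 is a placeholder rather than the value m6Anet uses.

```python
import math


def pooled_site_probability(read_probs):
    """Probability that at least one read is modified.

    Illustrative noisy-OR pooling under an independence assumption;
    m6Anet's real pooling layer may differ.
    """
    return 1.0 - math.prod(1.0 - p for p in read_probs)


def mod_ratio(read_probs, threshold):
    """Fraction of reads whose read-level probability is above `threshold`."""
    if not read_probs:
        raise ValueError("need at least one read")
    return sum(p > threshold for p in read_probs) / len(read_probs)


# Made-up read-level probabilities for two sites (not real data).
site_a = [0.9, 0.8, 0.05, 0.7]
site_b = [0.05, 0.02, 0.6, 0.01]
THRESHOLD = 0.5  # hypothetical read-level cutoff

for name, probs in (("site_a", site_a), ("site_b", site_b)):
    print(name, mod_ratio(probs, THRESHOLD), pooled_site_probability(probs))
```

In this toy example the site with more reads above the threshold also gets the higher pooled probability, which is the correlation described above. The pooled value still is not a stoichiometry estimate: a single confident read is enough to push it high, whereas mod_ratio counts how many reads pass the cutoff.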