question about prediction on sequences score #85

ymkng · 2019-06-13T19:13:09Z

Hi,
I have been reading the documentation, but I'm still not sure what the output scores from a trained model's predictions mean. I noticed that the scores all fall between 0 and 1. Is each score the probability that a TF will bind to an input region, and if so, what is this probability based on?

Another question: if I set "center_bin_to_predict" to 200 when training the model and my "feature_thresholds" is 0.5, do my input TF binding regions have to be at least 200bp long for Selene to classify them as "binding regions"?

thanks!

Michelle

kathyxchen · 2019-06-13T21:50:26Z

Hi Michelle,

Yes, you can consider the scores to be 'probabilities'; however, this is a rather loose definition, because these values are really just the outputs of the Sigmoid layer (which constrains the values to between 0 and 1 and allows us to determine whether a particular chromatin factor is likely to bind at a region).
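To make the point about the Sigmoid layer concrete, here is a minimal sketch in plain Python of how a sigmoid squashes raw model outputs into the range 0 to 1. The input values are made up for illustration and are not outputs of any real Selene model.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function; the result lies in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite the formula so exp() never overflows.
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical raw (pre-sigmoid) outputs for three features at one sequence.
raw_outputs = [-3.0, 0.0, 2.5]
scores = [sigmoid(z) for z in raw_outputs]
print(scores)
```

A larger raw output always maps to a larger score, but nothing in the sigmoid itself guarantees the score is a calibrated probability, which is why the definition is a loose one.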

If the threshold is 0.5 and the center bin is 200bp, the TF binding region needs to cover at least 100bp (0.5 × 200bp) of the center bin to be classified as a binding region. You can adjust this threshold or the center bin size based on the size of the peaks in your track files.
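The overlap rule described here can be sketched as follows. This is an illustration only, not Selene's actual implementation: the function name `covers_center_bin` is hypothetical, coordinates are assumed to be half-open (start inclusive, end exclusive), and "at least" is assumed to mean `>=`.

```python
def covers_center_bin(bin_start: int, bin_end: int,
                      peak_start: int, peak_end: int,
                      threshold: float) -> bool:
    """Return True if the peak covers at least `threshold` of the center bin.

    Hypothetical helper for illustration; coordinates are half-open.
    """
    bin_size = bin_end - bin_start
    # Number of base pairs shared by the bin and the peak (0 if disjoint).
    overlap = max(0, min(bin_end, peak_end) - max(bin_start, peak_start))
    return overlap / bin_size >= threshold


# A 200bp center bin spanning positions 400-600.
print(covers_center_bin(400, 600, 350, 500, 0.5))  # peak overlaps 100bp of the bin
print(covers_center_bin(400, 600, 450, 540, 0.5))  # 90bp peak fully inside the bin
```

Under these assumptions, a peak does not need to be 200bp long, but a peak shorter than 100bp can never reach a 0.5 threshold on a 200bp bin, even when it lies entirely inside the bin.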

Thanks!
Kathy

ymkng · 2019-06-14T15:54:50Z

Thanks so much! So if my TF binding regions vary in size between 90 and 300 base pairs, what would you suggest setting the threshold or center_bin_to_predict to?

evancofer · 2019-06-14T17:22:40Z

I would think 0.45, or somewhere between 0.40 and 0.50. I don't have a good sense of how it will influence performance on your specific data if you set it too small/large, so it might be worth tuning it on some validation data. What do you think @kathyxchen ?

kathyxchen · 2019-06-14T19:08:21Z

Agree with @evancofer that it could be worth tuning on validation data. Otherwise, you could make the decision by looking at the distribution of the TF binding region sizes and how the regions are distributed in the genome. If you have multiple TFs in your dataset, also check whether there are ones at 90bp that you'd be excluding entirely by keeping the threshold this large, etc.
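One rough way to look at the size distribution is to count, for each candidate threshold, how many binding regions are too short to ever be labeled positive. The sketch below assumes the same "overlap of at least threshold × bin size" rule discussed earlier in the thread; the helper name `fraction_excluded` is hypothetical and the peak sizes are made-up placeholders to be replaced with sizes from your own track files.

```python
def fraction_excluded(peak_sizes, center_bin_size, threshold):
    """Fraction of peaks too short to ever meet the overlap threshold.

    A peak can overlap the center bin by at most min(size, bin size),
    so it is excluded when that best case is still below the minimum.
    """
    min_overlap = threshold * center_bin_size
    too_small = [s for s in peak_sizes
                 if min(s, center_bin_size) < min_overlap]
    return len(too_small) / len(peak_sizes)


# Made-up peak sizes in bp, for illustration only.
peak_sizes = [90, 120, 150, 200, 300]

for threshold in (0.40, 0.45, 0.50):
    excluded = fraction_excluded(peak_sizes, 200, threshold)
    print(f"threshold={threshold:.2f} excluded={excluded:.2f}")
```

This only flags peaks that can never qualify; it says nothing about model performance, so tuning on validation data is still the more direct check.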

evancofer · 2019-06-17T23:28:55Z

@ymkng @kathyxchen Can this be closed?

ymkng · 2019-06-18T16:31:57Z

thanks!

ymkng closed this as completed Jun 18, 2019
