question about prediction on sequences score #85
Comments
Hi Michelle, Yes, you can treat the scores as 'probabilities', but only loosely: they are the outputs of the Sigmoid layer, which constrains each value to between 0 and 1 and lets us judge whether a particular chromatin factor is likely to bind at a region. With center_bin_to_predict set to 200 and a threshold of 0.5, a TF binding region needs to cover at least 100bp of the center bin to be classified as a binding region. You can adjust this threshold or the center bin size based on the size of the peaks in your track files. Thanks!
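To make the two ideas above concrete, here is a minimal sketch in plain Python. It is not Selene's actual code: the function names are hypothetical, the coordinates are made up, and it assumes the rule is "overlap with the center bin, as a fraction of the bin size, must be at least the threshold".

```python
import math


def sigmoid(x):
    """Squash a raw model output into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def overlap_fraction(bin_start, bin_end, region_start, region_end):
    """Fraction of the half-open bin [bin_start, bin_end) covered by the region."""
    overlap_bp = max(0, min(bin_end, region_end) - max(bin_start, region_start))
    return overlap_bp / (bin_end - bin_start)


def is_binding_region(bin_start, bin_end, region_start, region_end, threshold=0.5):
    """Assumed labeling rule: positive when coverage of the center bin >= threshold."""
    return overlap_fraction(bin_start, bin_end, region_start, region_end) >= threshold


# Hypothetical 200bp center bin spanning positions 400-600.
print(is_binding_region(400, 600, 450, 550))  # 100bp of overlap
print(is_binding_region(400, 600, 500, 590))  # 90bp of overlap
```

Under these assumptions, a 90bp peak can never be labeled positive with a 200bp center bin and a 0.5 threshold, even when it sits entirely inside the bin.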
thanks so much! ...so if my TF binding regions vary in size, between 90-300 base pairs, what would you suggest the threshold or center_bin_to_predict be set to?
I would think
Agree with @evancofer that it could be worth tuning on validation data. Otherwise, you could make the decision by looking at the distribution of the TF binding region sizes, how the regions are distributed in the genome, and, if you have multiple TFs in your dataset, whether there are ones at 90bp that you'd be excluding entirely by keeping the threshold this large, etc.
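One rough way to check the last point is to count how many regions are too short to ever reach the required overlap. The sketch below is not part of Selene: the function name is hypothetical, the lengths are made-up illustrative values, and it assumes a region is labeled positive only when it covers at least threshold × center bin size base pairs.

```python
def fraction_excluded(region_lengths, center_bin_size=200, threshold=0.5):
    """Share of regions too short to reach the required overlap,
    even when they lie entirely inside the center bin."""
    required_bp = threshold * center_bin_size
    too_short = [length for length in region_lengths if length < required_bp]
    return len(too_short) / len(region_lengths)


# Made-up peak lengths in bp; in practice, compute end - start for each
# row of your own track file.
example_lengths = [90, 95, 120, 200, 300]

print(fraction_excluded(example_lengths, center_bin_size=200, threshold=0.5))
print(fraction_excluded(example_lengths, center_bin_size=100, threshold=0.5))
```

Running this per TF, rather than over all regions pooled together, would show whether any single factor is dropped entirely by a given setting.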
@ymkng @kathyxchen Can this be closed?
thanks!
Hi,
I have been reading the documentation and I'm still not sure what the output scores from a trained model's predictions mean. I noticed that the scores are all between 0 and 1. Is each score the probability that a TF will bind to an input region, and what is this probability based on?
Another question: if I set my "center_bin_to_predict" to 200 when training the model, and my "feature_thresholds" is 0.5, do my input TF binding regions have to be at least 200bp long for Selene to classify them as a "binding region"?
thanks!
Michelle