Feature request: Add a Poisson sampling option. In other words, add a weight option to sampling to allow unequal probabilities of each row being sampled. #7506
BorgeJorge started this conversation in Ideas
Poisson sampling is a generalization of Bernoulli sampling in which each row can have a different probability of being sampled. If the sampling procedure could accept a numeric variable as a sampling weight, the probability of any given row being sampled could be made proportional to its weight: for a target (expected) sample size n, a row would be sampled with probability n times its weight divided by the total sum of weights, capped at 1 (with zero-weighted rows having zero probability of being sampled). This seems like a straightforward extension of Bernoulli sampling, in which every row has the same probability of being sampled, n divided by the total number of rows N; the Poisson sampling formula reduces to exactly that when all rows have a sampling weight of 1.
This would enable us to pull samples with specific distributions of characteristics, which would greatly benefit some kinds of exploratory analysis.
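To make the request concrete, here is a minimal sketch of the behavior in plain Python. It is not tied to any existing API: the function name `poisson_sample` and the parameters `weights` and `expected_size` are hypothetical, chosen only for illustration. It assumes the inclusion probability of each row is `expected_size * weight / sum(weights)`, capped at 1, and that each row is kept or dropped independently.

```python
import random


def poisson_sample(rows, weights, expected_size, seed=None):
    """Hypothetical sketch of weighted (Poisson) sampling.

    Each row is included independently with probability
    min(1, expected_size * weight / sum(weights)).
    """
    if len(rows) != len(weights):
        raise ValueError("rows and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    rng = random.Random(seed)
    sample = []
    for row, weight in zip(rows, weights):
        # Inclusion probability proportional to the row's weight.
        # Capping at 1 means the realized expected sample size can be
        # smaller than expected_size when some weights are very large.
        p = min(1.0, expected_size * weight / total)
        # rng.random() is in [0, 1), so p == 0 never selects the row
        # and p == 1 always selects it.
        if rng.random() < p:
            sample.append(row)
    return sample
```

With all weights equal to 1 this reduces to Bernoulli sampling with probability `expected_size / len(rows)`. Note that the sample size is random, as it is for Bernoulli sampling; only its expected value is controlled.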