Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sample with frac argument on postrgres backend. #1452

Merged
merged 12 commits into from
Mar 19, 2016

Conversation

kwmsmith
Copy link
Member

Subsumes #1425 and #1424.

Closes #1423.

@kwmsmith
Copy link
Member Author

ping @sandhujasmine

@sandhujasmine
Copy link

Looks good - it contains tests for various sample() configurations.

And sample(frac= is working correctly; however, saw small inconsistencies between using sample() for a pandas backend vs DB backend.

  1. When using sample(n=176) if the result set has 175 rows, for the DB backend we get all rows but pandas backend raises an error. The pandas sample works a little bit differently.
  2. Also, in an example, found that using frac=0.3 for DB returns 53 rows while pandas (read from CSV file) returns 52 rows; basically one is rounding down and the other is rounding (ceil) up.

Would be good to have consistent behavior if looking at same data either from files or loaded up in DB. This is more when testing the same symbolic (unbound) expression against different resources.

Maybe we should work towards merging this as is, then add failing tests to reproduce above cases, then work on solutions?

kwmsmith added a commit that referenced this pull request Mar 19, 2016
Fix sample with `frac` argument on postrgres backend.
@kwmsmith kwmsmith merged commit 3c63208 into blaze:master Mar 19, 2016
@kwmsmith kwmsmith deleted the fix-sample-frac-postgres branch March 19, 2016 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants