Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAMPLE function not sampling randomly #3730

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

SAMPLE function not sampling randomly #3730

monetdb-team opened this issue Nov 30, 2020 · 0 comments

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2015-05-28 09:58:52 +0200
From: Anthony Damico <>
To: SQL devs <>
Version: 11.19.11 (Oct2014-SP3)
CC: ajdamico, @hannesmuehleisen, @njnes

Last updated: 2015-08-28 13:41:54 +0200

Comment 20895

Date: 2015-05-28 09:58:52 +0200
From: Anthony Damico <>

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Build Identifier:

if you have a column with only two rows, the SAMPLE function does not pick each approximately 50% of the time. something is biasing the SAMPLE draws.

Reproducible: Always

Steps to Reproduce:

here's the sql command.

SELECT * FROM ( SELECT 1 AS col UNION ALL SELECT 2 AS col ) AS temp SAMPLE 1;

but you have to run it a thousand times, however you prefer. here's the R code to do that with MonetDB.R--

here's a reproducible example using R code to repeat the sampling 1000 times. in both SAMPLE examples below, the database pulls the 2 less than 200 times out of 1000. shouldn't it be close to 500 out of 1000? this seems not random (misleading to users?) sorry if i'm misunderstanding something.. thank you!!

start in an empty directory somewhere
setwd( "C:/My Directory/MonetDB" )

      START OF SETUP - no editing required

library(MonetDB.R)

batfile <-
monetdb.server.setup(
database.directory = paste0( getwd() , "/MonetDB" ) ,
monetdb.program.path =
ifelse(
.Platform$OS.type == "windows" ,
"C:/Program Files/MonetDB/MonetDB5" ,
""
) ,
dbname = "test" ,
dbport = 50000
)

pid <- monetdb.server.start( batfile )

db <- dbConnect( MonetDB.R() , "monetdb://localhost:50000/test" , wait = TRUE )

END OF SETUP

dbGetQuery( db , "SELECT 1 AS col UNION ALL SELECT 2 AS col" )

out <- NULL
for ( i in 1:1000 ){
out <- c( out , dbGetQuery( db , "SELECT * FROM ( SELECT 1 AS col UNION ALL SELECT 2 AS col ) AS temp SAMPLE 0.5" ) )
}

not random
table( unlist( out ) )
1 2
880 120

out <- NULL
for ( i in 1:1000 ){
out <- c( out , dbGetQuery( db , "SELECT * FROM ( SELECT 1 AS col UNION ALL SELECT 2 AS col ) AS temp SAMPLE 1" ) )
}

ALSO not random
table( unlist( out ) )
1 2
856 144

Actual Results:

in both of the above examples of R output, more than 4 out of 5 draws were 1s instead of 2s.

Expected Results:

half 1s half 2s

Comment 20900

Date: 2015-06-02 13:50:32 +0200
From: MonetDB Mercurial Repository <>

Changeset 91c472eabc48 made by Niels Nes niels@cwi.nl in the MonetDB repo, refers to this bug.

For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=91c472eabc48

Changeset description:

fixed bug #3730, ie always pass the schema ('sys') for operators

Comment 20903

Date: 2015-06-03 14:20:44 +0200
From: @hannesmuehleisen

Looking into this, in particular the behaviour of rand() on our beloved Windows.

Comment 20905

Date: 2015-06-03 15:26:08 +0200
From: @hannesmuehleisen

Could not reproduce on Windows 7 Professional (64bits) with MonetDB Oct2014-SP3, both 32 and 64bits. Also could not find any flaw with random number generator on Windows.

Comment 21114

Date: 2015-08-10 20:41:20 +0200
From: Anthony Damico <>

sorry, this has not been fixed on windows using the Jul2015 release

table( unlist( out ) )

1 2
832 168

table( unlist( out ) )

1 2
830 170

MonetDB 5 server v11.21.1 "Jul2015"
Serving database 'test', using 8 threads
Compiled for x86_64-pc-winnt/64bit with 64bit OIDs dynamically linked
Found 15.914 GiB available main-memory.
Copyright (c) 1993-July 2008 CWI.
Copyright (c) August 2008-2015 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Listening for connection requests on mapi:monetdb://127.0.0.1:50000/
Start processing logs sql/sql_logs version 52200
Finished processing logs sql/sql_logs
MonetDB/SQL module loaded

Comment 21116

Date: 2015-08-11 14:56:41 +0200
From: MonetDB Mercurial Repository <>

Changeset 29e2ba371d58 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.

For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=29e2ba371d58

Changeset description:

On Windows, rand() is per thread, so call srand on thread start.
This should fix bug #3730.

Comment 21118

Date: 2015-08-11 15:22:27 +0200
From: MonetDB Mercurial Repository <>

Changeset 5c206e0510fe made by Hannes Muehleisen hannes@cwi.nl in the MonetDB repo, refers to this bug.

For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=5c206e0510fe

Changeset description:

Attempt at a test case for Bug #3730

Comment 21162

Date: 2015-08-23 19:45:49 +0200
From: Anthony Damico <>

just tested this- looks like it's been solved on the latest testing version. thanks thanks!!

MonetDB 5 server v11.21.3 "Jul2015"
Serving database 'test', using 8 threads
Compiled for x86_64-pc-winnt/64bit with 64bit OIDs dynamically linked
Found 15.914 GiB available main-memory.
Copyright (c) 1993-July 2008 CWI.
Copyright (c) August 2008-2015 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Listening for connection requests on mapi:monetdb://127.0.0.1:50000/
Start processing logs sql/sql_logs version 52200
Finished processing logs sql/sql_logs
MonetDB/SQL module loaded

Comment 21193

Date: 2015-08-28 13:41:54 +0200
From: @sjoerdmullender

Jul2015 has been released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant