Remove chunk collection from reservoir sampler #10038
Merged: Mytherin merged 14 commits into duckdb:main from Tmonster:remove_chunk_collection_from_reservoir_sampler on Jan 8, 2024
Commits on Dec 18, 2023
- 07e0a60: Squashed commit of the following:
  - 13fb9e2, Tmonster <tom@ebergen.com>, Mon Dec 18 11:37:06 2023 -0800: PR cleanup #2
  - 066f3cc, Tmonster <tom@ebergen.com>, Mon Dec 18 11:21:07 2023 -0800: fix dereference nullptr
  - 094db53, Tmonster <tom@ebergen.com>, Mon Dec 18 10:43:15 2023 -0800: PR cleanup
  - c9a1ecd (Merge: 2893c0c 6258996), Tmonster <tom@ebergen.com>, Mon Dec 18 10:22:20 2023 -0800: Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors
  - 2893c0c, Tmonster <tom@ebergen.com>, Thu Dec 14 13:10:25 2023 +0100: make format fix. Get compiler ready
  - 80b5f13 (Merge: e30b726 c29eb0c), Tmonster <tom@ebergen.com>, Thu Dec 14 12:34:18 2023 +0100: Merge branch 'main' into reservoir_sampler_Vectors
  - e30b726, Tmonster <tom@ebergen.com>, Thu Dec 14 12:33:03 2023 +0100: remove all parallelism. will do it in the next iteration
  - e8e088d, Tmonster <tom@ebergen.com>, Thu Dec 14 11:52:27 2023 +0100: still failing a test. Merging samples collected in parallel is difficult, and probably doesnt provide much benefit. Going to leave it for later
  - 96bfa1c (Merge: 45fa9a5 3237244), Tom Ebergen <tom@ebergen.com>, Wed Dec 13 17:02:31 2023 +0100: Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors
  - 45fa9a5, Tom Ebergen <tom@ebergen.com>, Wed Dec 13 14:36:50 2023 +0100: make format-fix
  - 049327b, Tom Ebergen <tom@ebergen.com>, Wed Dec 13 14:31:22 2023 +0100: try to fix this parallel issue
  - a5b290d (Merge: 21d4120 8849f97), Tom Ebergen <tom@ebergen.com>, Tue Dec 12 11:18:52 2023 +0100: Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors
  - 21d4120 (Merge: 795c454 e117c34), Tom Ebergen <tom@ebergen.com>, Mon Dec 11 12:43:43 2023 +0100: Merge branch 'main' into reservoir_sampler_Vectors
  - c29eb0c (Merge: 6bf31e1 25906f3), Tmonster <tom@ebergen.com>, Thu Dec 7 15:23:06 2023 +0100: Merge remote-tracking branch 'upstream/main'
  - 6bf31e1, Elliana May <me@mause.me>, Mon Dec 4 22:21:30 2023 +0800: fix warning
  - a521081, Elliana May <me@mause.me>, Mon Dec 4 21:58:50 2023 +0800: add test for streaming extracted statements
  - 5ee902a, Elliana May <me@mause.me>, Mon Dec 4 21:15:30 2023 +0800: add some tests of duckdb_execute_prepared_streaming
  - 58b6664, Elliana May <me@mause.me>, Mon Dec 4 21:02:48 2023 +0800: chore(docs): update docs for duckdb_execute_prepared_streaming
  - a8e49b1, Hannes Mühleisen <hannes@duckdblabs.com>, Tue Dec 5 11:31:21 2023 +0100: add test case, apparently from snowflake
  - a7ee1dd, Hannes Mühleisen <hannes@duckdblabs.com>, Tue Dec 5 11:25:51 2023 +0100: enable implicit fallthrough warning for /src and fixed a few instances
  - c6bf4c6, Hannes Mühleisen <hannes@duckdblabs.com>, Tue Dec 5 11:02:54 2023 +0100: supporting more physical types of parquet time columns with time zone info
  - baf670f, Jacob <535707+jkub@users.noreply.github.com>, Mon Dec 4 09:05:56 2023 -0800: make BufferPool members protected
  - 878e7d2, Yves <yves@motherduck.com>, Mon Dec 4 12:00:49 2023 -0500: Mark BufferPool getters const
  - a7ddb87, Gabor Szarnyas <gabor@duckdblabs.com>, Mon Dec 4 16:22:44 2023 +0100: Capitalize URL in httpfs extension flags
  - 795c454, Tom Ebergen <tom@ebergen.com>, Wed Dec 6 13:23:29 2023 +0100: removing reservoir type checks
  - 6e0e431, Tom Ebergen <tom@ebergen.com>, Wed Dec 6 11:25:50 2023 +0100: make format fix
  - 236825b, Tom Ebergen <tom@ebergen.com>, Wed Dec 6 10:23:57 2023 +0100: remove unused code
  - 34902e9, Tmonster <tom@ebergen.com>, Tue Dec 5 21:20:09 2023 +0100: should pass make format fix
  - 42d3fb8, Tmonster <tom@ebergen.com>, Tue Dec 5 18:04:36 2023 +0100: percentage is still global, but rows is local
  - d378cc7, Tmonster <tom@ebergen.com>, Tue Dec 5 15:37:40 2023 +0100: some debugging statements
  - 4ad877c, Tmonster <tom@ebergen.com>, Tue Dec 5 14:16:25 2023 +0100: some changes. Have a lot of bugs solved. but still not great
  - ad79d30, Tom Ebergen <tom@ebergen.com>, Mon Dec 4 17:41:37 2023 +0100: have figured out why percentage wasnt working. but it requires a big rework
  - 04d4c0d, Tom Ebergen <tom@ebergen.com>, Mon Dec 4 14:10:26 2023 +0100: reservoir sample works. but for large cardinalities and high percentages no
  - ddcea54, Tom Ebergen <tom@ebergen.com>, Mon Dec 4 12:33:16 2023 +0100: remove std::couts
  - 4e12d15, Tom Ebergen <tom@ebergen.com>, Wed Nov 29 17:47:16 2023 +0100: ok, have the proper output for reservoir sampling. need to understand when to add local sample or global sample
  - 43e72a4, Tom Ebergen <tom@ebergen.com>, Wed Nov 29 15:23:55 2023 +0100: compiles. Now I want to figure out where I left off last time
  - 450655c (Merge: 3639e4c 3f96a90), Tom Ebergen <tom@ebergen.com>, Wed Nov 29 15:00:08 2023 +0100: Merge branch 'main' into reservoir_sampler_Vectors
  - 3639e4c (Merge: c10b3a4 5bc0773), Tom Ebergen <tom@ebergen.com>, Wed Nov 29 14:56:39 2023 +0100: Merge branch 'main' into reservoir_sampler_Vectors
  - c10b3a4, Tom Ebergen <tom@ebergen.com>, Mon Jan 23 10:04:20 2023 +0100: this should work now for sampling a set amount of rows. Still need to work on percentage sampling
  - 7147e2a, Tom Ebergen <tom@ebergen.com>, Wed Jan 18 16:38:02 2023 +0100: it is starting to work, but need to look into why it is still slow
  - 2255424, Tom Ebergen <tom@ebergen.com>, Wed Jan 18 11:21:15 2023 +0100: working for normal blocking sample, but not for percentage
  - 8a01b32, Tom Ebergen <tom@ebergen.com>, Mon Jan 16 17:02:34 2023 +0100: intermediate commit, will fix other spots later
  - 3676737, Tom Ebergen <tom@ebergen.com>, Mon Jan 16 09:03:52 2023 +0100: intermediate work, will be fixing later
  - 904d220, Tom Ebergen <tom@ebergen.com>, Fri Jan 13 13:50:43 2023 +0100: collecting samples in parallel now, now I need to figure out how to combine them in a proper uniform and weighted manner
  - 01c4b89, Tom Ebergen <tom@ebergen.com>, Tue Jan 10 15:22:28 2023 +0100: minor code cleanup
  - b5c6d61, Tom Ebergen <tom@ebergen.com>, Tue Jan 10 11:51:32 2023 +0100: get rid of 4 spaces
  - 1dc807f (Merge: 605520f 7e1a307), Tom Ebergen <tom@ebergen.com>, Tue Jan 10 11:50:14 2023 +0100: Merge branch 'reservoir_sampler_Vectors' of github.com:Tmonster/duckdb into reservoir_sampler_Vectors
  - 7e1a307, Tmonster <tom@ebergen.com>, Wed Jan 4 11:39:30 2023 -0800: make format-fix
  - 750c1e3, Tmonster <tom@ebergen.com>, Wed Jan 4 11:37:27 2023 -0800: small syntax updates
  - fa2ac9c, Tmonster <tom@ebergen.com>, Wed Jan 4 11:36:33 2023 -0800: small syntax updates
  - cd232c6, Tmonster <tom@ebergen.com>, Tue Dec 27 14:54:08 2022 -0800: Revert "mostly adding debugging statements for help. Still trying to figure out how to know if parallelizing is sequential or not" This reverts commit 0f08574.
  - 0f08574, Tmonster <tom@ebergen.com>, Wed Dec 21 15:08:52 2022 -0800: mostly adding debugging statements for help. Still trying to figure out how to know if parallelizing is sequential or not
  - f4f5834, Tmonster <tom@ebergen.com>, Wed Dec 21 13:43:00 2022 -0800: remove iostream
  - bee57ae, Tmonster <tom@ebergen.com>, Wed Dec 21 12:37:02 2022 -0800: make format fix
  - ce950a1, Tmonster <tom@ebergen.com>, Wed Dec 21 09:10:23 2022 -0800: ok added test over reservoir threshold
  - 29fb39a, Tmonster <tom@ebergen.com>, Wed Dec 21 09:01:21 2022 -0800: ok it's all in a datachunk, now I can try and parallelize it
  - 8f05514, Tmonster <tom@ebergen.com>, Tue Dec 20 17:03:25 2022 +0100: remove pragma threads
  - 98f9897, Tmonster <tom@ebergen.com>, Tue Dec 20 17:02:58 2022 +0100: no more memory errors
  - da234b7, Tmonster <tom@ebergen.com>, Tue Dec 20 12:24:37 2022 +0100: no more errors when running count(*) on samples greater than the basic vector size
  - 3fc7214, Tmonster <tom@ebergen.com>, Tue Dec 20 10:04:13 2022 +0100: fix error
  - e302946, Tmonster <tom@ebergen.com>, Tue Dec 20 10:03:30 2022 +0100: still errors
  - 7ea6405, Tom Ebergen <tom@ebergen.com>, Mon Dec 19 21:15:38 2022 +0100: its getting better but still getting memory errors
  - 8f3c597, Tom Ebergen <tom@ebergen.com>, Fri Dec 16 16:48:12 2022 +0100: add some functionality, but mostly making reservoir sampler use datachunk chunkcollection
  - 605520f, Tom Ebergen <tom@ebergen.com>, Mon Dec 19 21:15:38 2022 +0100: its getting better but still getting memory errors
  - 97a491e, Tom Ebergen <tom@ebergen.com>, Fri Dec 16 16:48:12 2022 +0100: add some functionality, but mostly making reservoir sampler use datachunk chunkcollection
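The squashed commit messages note that merging samples collected in parallel is difficult and that partial samples have to be combined "in a proper uniform and weighted manner"; the branch eventually removed parallelism (commit e30b726). As a minimal sketch only, assuming a fixed-size uniform sample maintained with Algorithm R over rows arriving in chunks, two independent reservoirs can be merged by drawing how many output rows come from each side from a hypergeometric distribution weighted by the number of rows each side has seen. This is not DuckDB's implementation; `SketchReservoir` and `merge_reservoirs` are hypothetical names used for illustration.

```python
import random
from typing import List, Optional, Sequence


class SketchReservoir:
    """Fixed-size uniform row sample maintained with Algorithm R.

    Hypothetical illustration only; not DuckDB's ReservoirSample class.
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.rows: List[object] = []
        self.seen = 0  # number of rows offered to the sample so far
        self.rng = random.Random(seed)

    def add_chunk(self, chunk: Sequence[object]) -> None:
        """Feed one batch ("chunk") of rows into the reservoir."""
        for row in chunk:
            self.seen += 1
            if len(self.rows) < self.capacity:
                self.rows.append(row)  # reservoir not full yet: keep every row
            else:
                # Replace a random slot with probability capacity / seen.
                j = self.rng.randrange(self.seen)
                if j < self.capacity:
                    self.rows[j] = row


def merge_reservoirs(a: SketchReservoir, b: SketchReservoir,
                     seed: Optional[int] = None) -> SketchReservoir:
    """Combine two independent reservoirs into one uniform sample of a.seen + b.seen rows."""
    capacity = min(a.capacity, b.capacity)
    out = SketchReservoir(capacity, seed)
    total = a.seen + b.seen
    m = min(capacity, total)
    # The number of merged rows taken from `a` is hypergeometric: simulate m draws
    # without replacement from an urn holding a.seen "a" rows and b.seen "b" rows.
    # This is the weighting by how many rows each side has seen.
    left, right, from_a = a.seen, b.seen, 0
    for _ in range(m):
        if out.rng.randrange(left + right) < left:
            from_a += 1
            left -= 1
        else:
            right -= 1
    # A uniform subset of a uniform sample is itself uniform over that side's rows.
    out.rows = out.rng.sample(a.rows, from_a) + out.rng.sample(b.rows, m - from_a)
    out.seen = total
    return out
```

For example, merging a 5-row sample of 100 rows with a 5-row sample of 200 rows should take on average 5 × 100 / 300 ≈ 1.67 rows from the first side. Simply pooling the two 5-row samples and sub-sampling uniformly would take 2.5 on average, over-representing the smaller input; that is why the per-side row counts (`seen`) must be carried along with each partial sample.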
Commits on Jan 3, 2024
- 31c6fcd
- 4f0ee4a
- f701939
- d1cd788
- 34b9da0
- 24a36d6: Merge remote-tracking branch 'upstream/main' into remove_chunk_collection_from_reservoir_sampler
- 3f381b9
Commits on Jan 4, 2024
- 8db51d9
- 08bdfbb
- 938e5c1
Commits on Jan 5, 2024
- 2ad4e3f: Merge remote-tracking branch 'upstream/main' into remove_chunk_collection_from_reservoir_sampler
- 5aa8163
Commits on Jan 8, 2024
- 3699cf3