Remove chunk collection from reservoir sampler #10038

Merged

Commits on Dec 18, 2023

  1. Squashed commit of the following:

    commit 13fb9e2
    Author: Tmonster <tom@ebergen.com>
    Date:   Mon Dec 18 11:37:06 2023 -0800
    
        PR cleanup #2
    
    commit 066f3cc
    Author: Tmonster <tom@ebergen.com>
    Date:   Mon Dec 18 11:21:07 2023 -0800
    
        fix dereference nullptr
    
    commit 094db53
    Author: Tmonster <tom@ebergen.com>
    Date:   Mon Dec 18 10:43:15 2023 -0800
    
        PR cleanup
    
    commit c9a1ecd
    Merge: 2893c0c 6258996
    Author: Tmonster <tom@ebergen.com>
    Date:   Mon Dec 18 10:22:20 2023 -0800
    
        Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors
    
    commit 2893c0c
    Author: Tmonster <tom@ebergen.com>
    Date:   Thu Dec 14 13:10:25 2023 +0100
    
        make format fix. Get compiler ready
    
    commit 80b5f13
    Merge: e30b726 c29eb0c
    Author: Tmonster <tom@ebergen.com>
    Date:   Thu Dec 14 12:34:18 2023 +0100
    
        Merge branch 'main' into reservoir_sampler_Vectors
    
    commit e30b726
    Author: Tmonster <tom@ebergen.com>
    Date:   Thu Dec 14 12:33:03 2023 +0100
    
        remove all parallelism. will do it in the next iteration
    
    commit e8e088d
    Author: Tmonster <tom@ebergen.com>
    Date:   Thu Dec 14 11:52:27 2023 +0100
    
        still failing a test. Merging samples collected in parallel is difficult, and probably doesnt provide much benefit. Going to leave it for later
    
    commit 96bfa1c
    Merge: 45fa9a5 3237244
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Dec 13 17:02:31 2023 +0100
    
        Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors
    
    commit 45fa9a5
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Dec 13 14:36:50 2023 +0100
    
        make format-fix
    
    commit 049327b
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Dec 13 14:31:22 2023 +0100
    
        try to fix this parallel issue
    
    commit a5b290d
    Merge: 21d4120 8849f97
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Tue Dec 12 11:18:52 2023 +0100
    
        Merge remote-tracking branch 'upstream/main' into reservoir_sampler_Vectors
    
    commit 21d4120
    Merge: 795c454 e117c34
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Mon Dec 11 12:43:43 2023 +0100
    
        Merge branch 'main' into reservoir_sampler_Vectors
    
    commit c29eb0c
    Merge: 6bf31e1 25906f3
    Author: Tmonster <tom@ebergen.com>
    Date:   Thu Dec 7 15:23:06 2023 +0100
    
        Merge remote-tracking branch 'upstream/main'
    
    commit 6bf31e1
    Author: Elliana May <me@mause.me>
    Date:   Mon Dec 4 22:21:30 2023 +0800
    
        fix warning
    
    commit a521081
    Author: Elliana May <me@mause.me>
    Date:   Mon Dec 4 21:58:50 2023 +0800
    
        add test for streaming extracted statements
    
    commit 5ee902a
    Author: Elliana May <me@mause.me>
    Date:   Mon Dec 4 21:15:30 2023 +0800
    
        add some tests of duckdb_execute_prepared_streaming
    
    commit 58b6664
    Author: Elliana May <me@mause.me>
    Date:   Mon Dec 4 21:02:48 2023 +0800
    
        chore(docs): update docs for duckdb_execute_prepared_streaming
    
    commit a8e49b1
    Author: Hannes Mühleisen <hannes@duckdblabs.com>
    Date:   Tue Dec 5 11:31:21 2023 +0100
    
        add test case, apparently from snowflake
    
    commit a7ee1dd
    Author: Hannes Mühleisen <hannes@duckdblabs.com>
    Date:   Tue Dec 5 11:25:51 2023 +0100
    
        enable implicit fallthrough warning for /src and fixed a few instances
    
    commit c6bf4c6
    Author: Hannes Mühleisen <hannes@duckdblabs.com>
    Date:   Tue Dec 5 11:02:54 2023 +0100
    
        supporting more physical types of parquet time columns with time zone info
    
    commit baf670f
    Author: Jacob <535707+jkub@users.noreply.github.com>
    Date:   Mon Dec 4 09:05:56 2023 -0800
    
        make BufferPool members protected
    
    commit 878e7d2
    Author: Yves <yves@motherduck.com>
    Date:   Mon Dec 4 12:00:49 2023 -0500
    
        Mark BufferPool getters const
    
    commit a7ddb87
    Author: Gabor Szarnyas <gabor@duckdblabs.com>
    Date:   Mon Dec 4 16:22:44 2023 +0100
    
        Capitalize URL in httpfs extension flags
    
    commit 795c454
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Dec 6 13:23:29 2023 +0100
    
        removing reservoir type checks
    
    commit 6e0e431
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Dec 6 11:25:50 2023 +0100
    
        make format fix
    
    commit 236825b
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Dec 6 10:23:57 2023 +0100
    
        remove unused code
    
    commit 34902e9
    Author: Tmonster <tom@ebergen.com>
    Date:   Tue Dec 5 21:20:09 2023 +0100
    
        should pass make format fix
    
    commit 42d3fb8
    Author: Tmonster <tom@ebergen.com>
    Date:   Tue Dec 5 18:04:36 2023 +0100
    
        percentage is still global, but rows is local
    
    commit d378cc7
    Author: Tmonster <tom@ebergen.com>
    Date:   Tue Dec 5 15:37:40 2023 +0100
    
        some debugging statements
    
    commit 4ad877c
    Author: Tmonster <tom@ebergen.com>
    Date:   Tue Dec 5 14:16:25 2023 +0100
    
        some changes. Have a lot of bugs solved. but still not great
    
    commit ad79d30
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Mon Dec 4 17:41:37 2023 +0100
    
        have figured out why percentage wasnt working. but it requires a big rework
    
    commit 04d4c0d
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Mon Dec 4 14:10:26 2023 +0100
    
        reservoir sample works. but for large cardinalities and high percentages no
    
    commit ddcea54
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Mon Dec 4 12:33:16 2023 +0100
    
        remove std::couts
    
    commit 4e12d15
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Nov 29 17:47:16 2023 +0100
    
        ok, have the proper output for reservoir sampling. need to understand when to add local sample or global sample
    
    commit 43e72a4
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Nov 29 15:23:55 2023 +0100
    
        compiles. Now I want to figure out where I left off last time
    
    commit 450655c
    Merge: 3639e4c 3f96a90
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Nov 29 15:00:08 2023 +0100
    
        Merge branch 'main' into reservoir_sampler_Vectors
    
    commit 3639e4c
    Merge: c10b3a4 5bc0773
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Nov 29 14:56:39 2023 +0100
    
        Merge branch 'main' into reservoir_sampler_Vectors
    
    commit c10b3a4
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Mon Jan 23 10:04:20 2023 +0100
    
        this should work now for sampling a set amount of rows. Still need to work on percentage sampling
    
    commit 7147e2a
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Jan 18 16:38:02 2023 +0100
    
        it is starting to work, but need to look into why it is still slow
    
    commit 2255424
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Wed Jan 18 11:21:15 2023 +0100
    
        working for normal blocking sample, but not for percentage
    
    commit 8a01b32
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Mon Jan 16 17:02:34 2023 +0100
    
        intermediate commit, will fix other spots later
    
    commit 3676737
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Mon Jan 16 09:03:52 2023 +0100
    
        intermediate work, will be fixing later
    
    commit 904d220
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Fri Jan 13 13:50:43 2023 +0100
    
        collecting samples in parallel now, now I need to figure out how to combine them in a proper uniform and weighted manner
    
    commit 01c4b89
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Tue Jan 10 15:22:28 2023 +0100
    
        minor code cleanup
    
    commit b5c6d61
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Tue Jan 10 11:51:32 2023 +0100
    
        get rid of 4 spaces
    
    commit 1dc807f
    Merge: 605520f 7e1a307
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Tue Jan 10 11:50:14 2023 +0100
    
        Merge branch 'reservoir_sampler_Vectors' of github.com:Tmonster/duckdb into reservoir_sampler_Vectors
    
    commit 7e1a307
    Author: Tmonster <tom@ebergen.com>
    Date:   Wed Jan 4 11:39:30 2023 -0800
    
        make format-fix
    
    commit 750c1e3
    Author: Tmonster <tom@ebergen.com>
    Date:   Wed Jan 4 11:37:27 2023 -0800
    
        small syntax updates
    
    commit fa2ac9c
    Author: Tmonster <tom@ebergen.com>
    Date:   Wed Jan 4 11:36:33 2023 -0800
    
        small syntax updates
    
    commit cd232c6
    Author: Tmonster <tom@ebergen.com>
    Date:   Tue Dec 27 14:54:08 2022 -0800
    
        Revert "mostly adding debugging statements for help. Still trying to figure out how to know if parallelizing is sequential or not"
    
        This reverts commit 0f08574.
    
    commit 0f08574
    Author: Tmonster <tom@ebergen.com>
    Date:   Wed Dec 21 15:08:52 2022 -0800
    
        mostly adding debugging statements for help. Still trying to figure out how to know if parallelizing is sequential or not
    
    commit f4f5834
    Author: Tmonster <tom@ebergen.com>
    Date:   Wed Dec 21 13:43:00 2022 -0800
    
        remove iostream
    
    commit bee57ae
    Author: Tmonster <tom@ebergen.com>
    Date:   Wed Dec 21 12:37:02 2022 -0800
    
        make format fix
    
    commit ce950a1
    Author: Tmonster <tom@ebergen.com>
    Date:   Wed Dec 21 09:10:23 2022 -0800
    
        ok added test over reservoir threshold
    
    commit 29fb39a
    Author: Tmonster <tom@ebergen.com>
    Date:   Wed Dec 21 09:01:21 2022 -0800
    
        ok it's all in a datachunk, now I can try and parallelize it
    
    commit 8f05514
    Author: Tmonster <tom@ebergen.com>
    Date:   Tue Dec 20 17:03:25 2022 +0100
    
        remove pragma threads
    
    commit 98f9897
    Author: Tmonster <tom@ebergen.com>
    Date:   Tue Dec 20 17:02:58 2022 +0100
    
        no more memory errors
    
    commit da234b7
    Author: Tmonster <tom@ebergen.com>
    Date:   Tue Dec 20 12:24:37 2022 +0100
    
        no more errors when running count(*) on samples greater than the basic vector size
    
    commit 3fc7214
    Author: Tmonster <tom@ebergen.com>
    Date:   Tue Dec 20 10:04:13 2022 +0100
    
        fix error
    
    commit e302946
    Author: Tmonster <tom@ebergen.com>
    Date:   Tue Dec 20 10:03:30 2022 +0100
    
        still errors
    
    commit 7ea6405
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Mon Dec 19 21:15:38 2022 +0100
    
        its getting better but still getting memory errors
    
    commit 8f3c597
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Fri Dec 16 16:48:12 2022 +0100
    
        add some functionality, but mostly making reservoir sampler use datachunk chunkcollection
    
    commit 605520f
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Mon Dec 19 21:15:38 2022 +0100
    
        its getting better but still getting memory errors
    
    commit 97a491e
    Author: Tom Ebergen <tom@ebergen.com>
    Date:   Fri Dec 16 16:48:12 2022 +0100
    
        add some functionality, but mostly making reservoir sampler use datachunk chunkcollection
    Tmonster committed Dec 18, 2023
    07e0a60

Commits on Jan 3, 2024

  1. 31c6fcd
  2. 4f0ee4a
  3. f701939
  4. d1cd788
  5. 34b9da0
  6. Merge remote-tracking branch 'upstream/main' into remove_chunk_collection_from_reservoir_sampler

    Tmonster committed Jan 3, 2024
    24a36d6
  7. 3f381b9

Commits on Jan 4, 2024

  1. 8db51d9
  2. remove other CI

    Tmonster committed Jan 4, 2024
    08bdfbb
  3. Revert "remove other CI"

    This reverts commit 08bdfbb.
    Tmonster committed Jan 4, 2024
    938e5c1

Commits on Jan 5, 2024

  1. Merge remote-tracking branch 'upstream/main' into remove_chunk_collection_from_reservoir_sampler

    Tmonster committed Jan 5, 2024
    2ad4e3f
  2. 5aa8163

Commits on Jan 8, 2024

  1. Minor fixes

    Mytherin committed Jan 8, 2024
    3699cf3