HIVE-23031 rewrite distinct #988
Closed
176 commits
9540999
add datasketches udfs for hll
kgyrtkirk 9b08b9c
add crap
kgyrtkirk 83367d1
add sketches1
kgyrtkirk a3b87d8
updates to q
kgyrtkirk 85182b4
add test/etc
kgyrtkirk 47551f3
s2
kgyrtkirk 48c738f
add 0
kgyrtkirk 4952952
add
kgyrtkirk dedb374
fx
kgyrtkirk 600bf83
s3
kgyrtkirk 50aea64
add to conversion
kgyrtkirk c67f538
add sq way
kgyrtkirk c42114c
u
kgyrtkirk e95afb0
cok
kgyrtkirk 3c5f794
add local clones of UDxFs
kgyrtkirk d9b2246
modify t struct1
kgyrtkirk 5f7a36a
sketchtoestimate v0.1
kgyrtkirk dcde5cf
working sk2est
kgyrtkirk 1056ee5
begin to split?
kgyrtkirk ef27ec1
IRoll
kgyrtkirk 5e15ff4
rollup complete
kgyrtkirk a3f0283
ws change
kgyrtkirk 301d47b
Merge remote-tracking branch 'apache/master' into HIVE-sketches
kgyrtkirk fc13dc0
Merge remote-tracking branch 'remotes/kgyrtkirk/HIVE-sketches' into H…
kgyrtkirk 9d9319b
unpatch some parts
kgyrtkirk 0aac13f
remove local crap
kgyrtkirk d9e5c34
add a real datasketches release
kgyrtkirk b52f961
prefi
kgyrtkirk fc5ee1e
register hll
kgyrtkirk 5ad9c87
add fixme
kgyrtkirk 7239510
renameX
kgyrtkirk d265d19
add a bunch
kgyrtkirk 4e4d7c4
add more/fix/etc
kgyrtkirk dab0947
correct typo
kgyrtkirk b557a55
undo ws
kgyrtkirk 7a50054
add/note/etc
kgyrtkirk aff4bec
UDF/UDAF name clash
kgyrtkirk 3a363ce
hll example
kgyrtkirk 7910a67
pom changes I
kgyrtkirk 94030e3
remove preliminary qtests
kgyrtkirk c17c77e
fixme comment
kgyrtkirk ab34866
add theta
kgyrtkirk 279d355
add theta
kgyrtkirk c81ef36
run tests with minillapolocal
kgyrtkirk c198f0f
Merge remote-tracking branch 'apache/master' into HIVE-22940-sketches…
kgyrtkirk 94f211a
renames
kgyrtkirk 14be625
more changes
kgyrtkirk 25571c2
fix
kgyrtkirk 9be818a
fix name
kgyrtkirk 404a5ed
Merge remote-tracking branch 'apache/master' into HIVE-22940-sketches…
kgyrtkirk f68f110
Merge remote-tracking branch 'apache/master' into HIVE-22940-sketches…
kgyrtkirk b5ef5bb
cleanup/etc
kgyrtkirk 6a808d0
Merge remote-tracking branch 'remotes/kgyrtkirk/HIVE-22940-sketches-f…
kgyrtkirk 797d846
Merge remote-tracking branch 'remotes/kgyrtkirk/HIVE-22940-sketches-f…
kgyrtkirk 379e6c7
add prototype codes
kgyrtkirk 75dfaf5
Revert "add prototype codes"
kgyrtkirk 4d4aff4
Revert "Revert "add prototype codes""
kgyrtkirk e59f913
use metrgable
kgyrtkirk 4198bce
union2
kgyrtkirk 2e97b3a
rollup0
kgyrtkirk cc092a8
there..it works
kgyrtkirk 3622fd4
indent
kgyrtkirk ef5f5f1
remove
kgyrtkirk a771c4f
remove
kgyrtkirk 0f8363c
add initial
kgyrtkirk 09a5614
somewhat better
kgyrtkirk 5769151
fx
kgyrtkirk 8ef6215
Merge remote-tracking branch 'remotes/kgyrtkirk/HIVE-23030-rollup-uni…
kgyrtkirk 328ead9
add
kgyrtkirk 2fb0cd5
cleanup/etc
kgyrtkirk e65f192
HIVE-22998 : Dump partition info if hive.repl.dump.metadata.only.for.…
aasha c8d5191
HIVE-22964: MM table split computation is very slow (Aditya Shah revi…
4746cbb
HIVE-16355 HIVE-22893: addendum - missing ASF headers
kgyrtkirk 0a73fce
HIVE-23008: UDAFExampleMaxMinNUtil.sortedMerge must be able to handle…
kgyrtkirk e1d9663
HIVE-22762: Leap day is incorrectly parsed during cast in Hive (Karen…
belugabehr 54b5bba
HIVE-21778: CBO: "Struct is not null" gets evaluated as `nullable` al…
vineetgarg02 ab4aeb6
HIVE-21939 : protoc:2.5.0 dependence has broken building on aarch64. …
chinnaraolalam 755e990
HIVE-22974: Metastore's table location check should be applied when l…
nrg4878 daae908
HIVE-23015: Fix HIVE_VECTORIZATION_GROUPBY_COMPLEX_TYPES_ENABLED defi…
pvargacl fc73fdf
HIVE-22985: Failed compaction always throws TxnAbortedException (Kare…
1c848b2
HIVE-22976: Oracle and MSSQL upgrade script missing the addition of W…
bmaidics fb30aaa
HIVE-22970: Add a qoption to enable tests to use transactional mode (…
kgyrtkirk fbc8a4c
HIVE-22959 : Extend storage-api to expose FilterContext (Panos G via …
8debe93
HIVE-23027: Fix syntax error in llap package.py (Rajesh Balamohan, re…
rbalamohan 6b9170e
HIVE-23023: MR compaction ignores column schema evolution (Kare Coppa…
a8dcfb8
HIVE-23011: Shared work optimizer should check residual predicates wh…
jcamachor d9e005d
HIVE-22901: Variable substitution can lead to OOM on circular referen…
dvoros 3c37e74
HIVE-22539: HiveServer2 SPNEGO authentication should skip if authoriz…
risdenk bba01c6
HIVE-22841: ThriftHttpServlet#getClientNameFromCookie should handle C…
risdenk be620a6
HIVE-23022 : Arrow deserializer should ensure size of hive vector equ…
7936a94
HIVE-22990 : Build acknowledgement mechanism for repl dump and load. …
aasha 6ff297e
HIVE-23019: Fix TestTxnCommandsForMmTable test case (Peter Varga via …
pvargacl 17ad636
HIVE-22955 PreUpgradeTool can fail because access to CharsetDecoder i…
ghanko ccde408
HIVE-23034 : Arrow serializer should not keep the reference of arrow …
43d2440
HIVE-23002: Optimise LazyBinaryUtils.writeVLong (Rajesh Balamohan, re…
rbalamohan bce3225
HIVE-23035: Scheduled query executor may hang in case TezAMs are laun…
kgyrtkirk 4551045
HIVE-23033: MSSQL metastore schema init script doesn't initialize NOT…
nrg4878 ef301e3
HIVE-23059 In constraint name uniqueness query use the MTable instead…
miklosgergely 325e2ea
HIVE-23063 Use the same PerfLogger all over Compiler (Miklos Gergely,…
miklosgergely 8d27295
Merge remote-tracking branch 'apache/master' into HIVE-22940-sketches…
kgyrtkirk a3630aa
no-transform
kgyrtkirk 7523c8e
aa
kgyrtkirk d0058ad
it does work
kgyrtkirk 55c5362
cleanup
kgyrtkirk 4dbce31
Merge remote-tracking branch 'kgyrtkirk/HIVE-22940-sketches-fns' into…
kgyrtkirk 99145fc
Merge remote-tracking branch 'apache/master' into HIVE-23031-rewrite-…
kgyrtkirk 5c28eff
Merge remote-tracking branch 'kgyrtkirk/HIVE-22940-sketches-fns' into…
kgyrtkirk fb87cac
Merge remote-tracking branch 'apache/master' into HIVE-23030-rollup-u…
kgyrtkirk 25fdd21
SketchFn enum
kgyrtkirk 45c27eb
back to string consts
kgyrtkirk 0b95dc2
register/x
kgyrtkirk 062403b
fix/inline
kgyrtkirk 8e07be3
chanma
kgyrtkirk cd21596
add to testconf
kgyrtkirk 5adb78f
use map
kgyrtkirk d5ae78d
rename/cleanup/etc
kgyrtkirk 7f08d49
fixes
kgyrtkirk cadb274
cleanup
kgyrtkirk c1037fd
cleanup
kgyrtkirk 2deb2fb
cleanup
kgyrtkirk 7064cae
unpatch DSF
kgyrtkirk 7e49d62
Merge remote-tracking branch 'kgyrtkirk/HIVE-23030-rollup-union' into…
kgyrtkirk 38f3b6e
remove crap
kgyrtkirk b2d8bf3
cleanup
kgyrtkirk 25fc17d
reanme -a
kgyrtkirk 0a31af6
reanme
kgyrtkirk 60dcc01
add round
kgyrtkirk 865a7f4
remove redundant test
kgyrtkirk 960d29d
add round
kgyrtkirk 8258555
singleton/etc
kgyrtkirk 27b8600
cleanup
kgyrtkirk 8458667
plugin reg
kgyrtkirk e1fd7f4
cleanup
kgyrtkirk 739db22
cleanuo
kgyrtkirk 708968d
fac epalm
kgyrtkirk 1669827
Merge remote-tracking branch 'kgyrtkirk/HIVE-23030-rollup-union' into…
kgyrtkirk e2641d0
cleanup
kgyrtkirk fee7f7f
address review comments
kgyrtkirk 541b3f9
Merge remote-tracking branch 'apache/master' into HIVE-23030-rollup-u…
kgyrtkirk 8498aca
Merge remote-tracking branch 'apache/master' into HIVE-23031-rewrite-…
kgyrtkirk c8f9fbe
Merge remote-tracking branch 'kgyrtkirk/HIVE-23030-rollup-union' into…
kgyrtkirk 79ae119
fx
kgyrtkirk 9bdd2d7
added explicit drop for rollup
kgyrtkirk b0bd2a8
Merge remote-tracking branch 'kgyrtkirk/HIVE-23030-rollup-union' into…
kgyrtkirk f4bf29e
Merge remote-tracking branch 'apache/master' into HIVE-23031-rewrite-…
kgyrtkirk 81602f4
Merge remote-tracking branch 'apache/master' into HIVE-23031-rewrite-…
kgyrtkirk 391eb98
Merge remote-tracking branch 'apache/master' into HIVE-23031-rewrite-…
kgyrtkirk 3a84bed
remove ws changes
kgyrtkirk eb3abe3
Merge remote-tracking branch 'apache/master' into HIVE-23031-rewrite-…
kgyrtkirk a552736
cleanup
kgyrtkirk 0e64fab
change dsf
kgyrtkirk cf61be4
cleaner fns
kgyrtkirk 51dd712
add estimate to dsf
kgyrtkirk b9b7302
fix fixme/add estimate
kgyrtkirk 2298c86
add options/etc
kgyrtkirk 826d50a
fix test
kgyrtkirk 4706018
add to conf
kgyrtkirk 729cfe6
git statusMerge remote-tracking branch 'apache/master' into HIVE-2303…
kgyrtkirk 306d7c3
remove hiveconf
kgyrtkirk 4ffe943
cleanup
kgyrtkirk a9e9089
rename/etc
kgyrtkirk 35bfd3e
add fixme
kgyrtkirk 676c542
add comment
kgyrtkirk 4474144
one-way to add return type...
kgyrtkirk 011ed58
rename options
kgyrtkirk f64af08
cleanup
kgyrtkirk e2e89d9
correct comment
kgyrtkirk b856e9a
cleanup
kgyrtkirk e6ea2d4
cleanup
kgyrtkirk 83367c3
\updates
kgyrtkirk e4b82a5
update q.out
kgyrtkirk 35ad023
add test
kgyrtkirk 6f707c0
add new test
kgyrtkirk a68917b
remove multitype
kgyrtkirk 4d079d3
removed cpc/theta
kgyrtkirk 5b460d6
dis
kgyrtkirk
Can we limit the algorithm choices to a single one for the time being?
The reason I am asking this is that this will not work with materialized views. Since we are not storing in the SQL view definition the algorithm that we used to generate the column, if the property value changes, this would lead to errors.
Multi-algorithm support needs a little more work. One option would be to store this information in the MV table properties, so we know how to interpret the sketches when HS2 needs to load the MVs (and thus parse them). What do you think?
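To illustrate the table-properties idea, here is a minimal sketch, written in Python rather than Hive's Java. The property key `sketch.algorithm`, the `MaterializedView` class, and the `create_mv` / `can_use_mv` helpers are hypothetical names made up for this illustration, not Hive APIs. It only assumes that the algorithm in effect when the MV is created gets recorded and is compared before a rewrite is matched against the MV.

```python
from dataclasses import dataclass, field

# Hypothetical property key; Hive does not necessarily define such a property.
SKETCH_ALGO_PROP = "sketch.algorithm"


@dataclass
class MaterializedView:
    name: str
    # Table properties persisted alongside the MV definition.
    properties: dict = field(default_factory=dict)


def create_mv(name: str, current_algo: str) -> MaterializedView:
    """Record the algorithm used to build the sketch column at creation time."""
    return MaterializedView(name, {SKETCH_ALGO_PROP: current_algo})


def can_use_mv(mv: MaterializedView, current_algo: str) -> bool:
    """Allow a match only if the MV's sketches were built with the same algorithm.

    An MV without the property is treated as unusable, because its sketch
    type cannot be determined.
    """
    return mv.properties.get(SKETCH_ALGO_PROP) == current_algo


mv = create_mv("mv_distinct_ids", "hll")
print(can_use_mv(mv, "hll"))  # same algorithm: the MV may be matched
print(can_use_mv(mv, "cpc"))  # default changed: skip the MV, compute directly
```

With a check like this, a change of the default algorithm would make older MVs ineligible for matching instead of letting them be read with the wrong sketch type.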
I don't think that would be necessary.
I've added a test (sketches_materialized_view_sketchtype.q) which shows how it works when there is an MV for HLL; if the mode is not HLL, the MV is ignored and the result is computed directly.
I think the real meaning of the MV should not change (I think we agree on this); we have 2 choices here:
I think addressing this is outside of the scope of this change
I understand for a single algorithm it will work. However, consider the following scenario:
- [...] hll. [...] hll. The SQL statement still has count distinct.
- [...] cpc and restart HS2. Thus, when the MV is loaded by HS2, the count distinct is transformed to cpc.
- [...] cpc, matches the MV... but fails at deserialization time because the sketch stored for the MV is hll.
That is why I suggested we could limit the options for algorithms till we have proper support. The risk I see if we do not do that now is that if anyone creates MVs using the different default algorithms, we will not have any way to distinguish between them anymore.
From the two choices that you mention above, I was suggesting the second option, since the main goal of the whole effort is to be able to use these algorithms seamlessly with the MVs. I agree it can be outside of the scope of this change, but let's limit the algorithm choices till then?
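The scenario above can be walked through with a toy model. This is a sketch in Python under loud assumptions: `serialize`, `estimate`, and `SketchFormatError` are hypothetical names, and the tagged-string encoding merely stands in for "the stored bytes depend on the algorithm". It models the failure as described in this comment; the behaviour actually observed in Hive may differ.

```python
class SketchFormatError(Exception):
    """Raised when stored bytes are not in the expected sketch format."""


def serialize(algo: str, distinct_values: set) -> bytes:
    # Toy encoding: an algorithm tag followed by a count. Real sketches are
    # binary structures; this only models "format depends on algorithm".
    return f"{algo}:{len(distinct_values)}".encode()


def estimate(algo: str, blob: bytes) -> int:
    # Read the blob as if it had been written by `algo`.
    tag, _, count = blob.decode().partition(":")
    if tag != algo:
        raise SketchFormatError(f"expected {algo} sketch, found {tag}")
    return int(count)


# 1. The MV is built while the default algorithm is hll.
stored = serialize("hll", {1, 2, 3})

# 2. Reading it back with the same default works.
print(estimate("hll", stored))

# 3. After the default is switched to cpc, the same stored bytes are
#    interpreted as a cpc sketch and the read fails.
try:
    estimate("cpc", stored)
except SketchFormatError as err:
    print("failed:", err)
```

Nothing in the stored MV definition distinguishes step 2 from step 3, which is the risk being raised here.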
I was not thinking about restarting HS2.
Sure... we can limit it to one - but if this incorrect behaviour does exist, then I think it could also be triggered with the main bi mode switch as well:
I've tried it out - I didn't see any exceptions; the MV match for a plain count(distinct id) didn't happen... When I changed the default algo no exceptions happened, but matches were made incorrectly - so there could be dragons...
I've removed cpc/theta for now... we can add them back later.
You are right, failure can still happen when the sketch is stored and the mode changes.
Thanks for making the changes in any case. Let's check in this patch and give priority to the overlay issue, it should not be too difficult to address and will fix all these issues.