-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-1741: [C++] Add DictionaryArray::CanCompareIndices #5342
Conversation
cpp/src/arrow/type.h
Outdated
@@ -1256,6 +1257,10 @@ class ARROW_EXPORT DictionaryType : public FixedWidthType { | |||
|
|||
bool ordered() const { return ordered_; } | |||
|
|||
/// \brief Determine whether dictionary arrays may be compared without unification | |||
/// (smaller dictionaries are prefixes of larger dictionaries) | |||
static Result<bool> ComparableWithoutUnification(std::vector<const Array*> arrays); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be a nitpick but I would prefer to put this function into arrow/array/dictionary_util.h
or arrow/array/array_dict.h
(since splitting up array.h
is going to happen eventually)
I might prefer a two-argument version of this.
namespace dictionary {
bool CanCompareIndices(const DictionaryArray& left, const DictionaryArray& right);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DictionaryArray::CanCompareIndices
would be OK too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll refactor to that
cpp/src/arrow/type.cc
Outdated
return Status::TypeError("array types not all consistent ", *arrays[0]->type(), | ||
" vs ", *a->type()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think returning false is better. This is doing too much in one function IMHO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean; what is doing too much? Checking for type agreement seems necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that returning an error is going too far. Error Status should not be used for routine argument validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following DictionaryType::Unify
which returns Error status under this condition
https://github.com/apache/arrow/blob/2ba0566/cpp/src/arrow/array/builder_dict.cc#L130-L133
I can change this to a DCHECK
cpp/src/arrow/type.cc
Outdated
if (a->type_id() != Type::DICTIONARY) { | ||
return Status::TypeError("input arrays must be dictionary arrays"); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is written as a two-argument function where the inputs are DictionaryArray then these checks aren't needed
@wesm how's this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. As a small nit I'm going to move the DictionaryArray::CanCompareIndices
implementation to array.cc
Move CanCompareIndices implementation
9c4e8d8
to
9600868
Compare
Tests whether arrays' dictionaries are prefixes of other arrays' dictionaries, which allows them to be compared without unification since the indices refer to identical values. Closes apache#5342 from bkietz/1741-Comparison-function-for-D and squashes the following commits: 9600868 <Benjamin Kietzman> iwyu: algorithm 544a7c7 <Benjamin Kietzman> iwyu: memory 08022c9 <Benjamin Kietzman> remove error status validation 0f3cbd7 <Benjamin Kietzman> move dictionary array specific code to array/dict_internal.{h,cc} e85d0ae <Benjamin Kietzman> refactor to DictionaryArray::CanCompareIndices 2f20180 <Benjamin Kietzman> ARROW-1741: Add DictionaryType::ComparableWithoutUnification Authored-by: Benjamin Kietzman <bengilgit@gmail.com> Signed-off-by: Wes McKinney <wesm+git@apache.org>
Tests whether arrays' dictionaries are prefixes of other arrays' dictionaries, which allows them to be compared without unification since the indices refer to identical values.