ARROW-1741: [C++] Add DictionaryArray::CanCompareIndices #5342

Closed
bkietz wants to merge 6 commits from bkietz/1741-Comparison-function-for-D

Member

@bkietz bkietz commented Sep 10, 2019

Tests whether arrays' dictionaries are prefixes of other arrays' dictionaries, which allows them to be compared without unification since the indices refer to identical values.
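
To make the prefix condition concrete, here is a minimal standalone sketch. It is not the code from this PR: dictionaries are modelled as plain `std::vector<std::string>` instead of Arrow arrays, the helper name `IsPrefixCompatible` is hypothetical, and dictionary values are assumed to be distinct.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary: an ordered list of distinct values.
using Dictionary = std::vector<std::string>;

// Returns true when the shorter dictionary is a prefix of the longer one.
// In that case every index valid in the shorter dictionary decodes to the
// same value in both, so the index arrays can be compared directly.
bool IsPrefixCompatible(const Dictionary& left, const Dictionary& right) {
  const std::size_t min_length = std::min(left.size(), right.size());
  return std::equal(left.begin(), left.begin() + min_length, right.begin());
}
```

For example, `{"a", "b"}` and `{"a", "b", "c"}` are compatible in either argument order, while `{"a", "b"}` and `{"b", "a"}` are not, because index 0 decodes to different values.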

@@ -1256,6 +1257,10 @@ class ARROW_EXPORT DictionaryType : public FixedWidthType {

bool ordered() const { return ordered_; }

/// \brief Determine whether dictionary arrays may be compared without unification
/// (smaller dictionaries are prefixes of larger dictionaries)
static Result<bool> ComparableWithoutUnification(std::vector<const Array*> arrays);
Member

It may be a nitpick, but I would prefer to put this function into arrow/array/dictionary_util.h or arrow/array/array_dict.h (since splitting up array.h is going to happen eventually).

I might prefer a two-argument version of this.

namespace dictionary {

bool CanCompareIndices(const DictionaryArray& left, const DictionaryArray& right);

}

Member

DictionaryArray::CanCompareIndices would be OK too.

Member Author

I'll refactor to that

return Status::TypeError("array types not all consistent ", *arrays[0]->type(),
" vs ", *a->type());
}
}
Member

I think returning false is better. This is doing too much in one function IMHO

Member Author

Not sure what you mean; what is doing too much? Checking for type agreement seems necessary

Member

@wesm wesm Sep 10, 2019

I mean that returning an error is going too far. Error Status should not be used for routine argument validation

Member Author

I was following DictionaryType::Unify, which returns an error Status under this condition:
https://github.com/apache/arrow/blob/2ba0566/cpp/src/arrow/array/builder_dict.cc#L130-L133
I can change this to a DCHECK.
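
The two alternatives discussed in this thread can be sketched roughly as follows. This is an illustration only: `SimpleType` and both function names are hypothetical, the prefix check itself is elided, and the standard `assert` macro stands in for a debug-only check such as DCHECK.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for an array's type; not Arrow's DataType.
struct SimpleType {
  std::string name;
  bool operator==(const SimpleType& other) const { return name == other.name; }
};

// Option 1: treat a type mismatch as an ordinary "no" answer.
bool CanCompareReturningFalse(const SimpleType& left, const SimpleType& right) {
  if (!(left == right)) return false;
  // ... the dictionary prefix check would go here ...
  return true;
}

// Option 2: treat matching types as a precondition that is only verified
// in debug builds (assert is compiled out when NDEBUG is defined).
bool CanCompareWithPrecondition(const SimpleType& left, const SimpleType& right) {
  assert(left == right && "callers must pass arrays of the same type");
  // ... the dictionary prefix check would go here ...
  return true;
}
```

In both variants the caller gets a plain `bool` and never has to unwrap an error Status for what is routine argument validation.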

if (a->type_id() != Type::DICTIONARY) {
return Status::TypeError("input arrays must be dictionary arrays");
}
}
Member

If this is written as a two-argument function where the inputs are DictionaryArray then these checks aren't needed
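
A rough sketch of why the typed two-argument form makes the runtime checks unnecessary. The class hierarchy below is a hypothetical miniature, not Arrow's actual classes, and dictionaries are again modelled as vectors of strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature class hierarchy; these are not Arrow's classes.
enum class TypeId { kInt32, kDictionary };

struct Array {
  explicit Array(TypeId id) : type_id(id) {}
  virtual ~Array() = default;
  TypeId type_id;
};

struct DictionaryArray : Array {
  explicit DictionaryArray(std::vector<std::string> dict)
      : Array(TypeId::kDictionary), dictionary(std::move(dict)) {}
  std::vector<std::string> dictionary;
};

// Untyped, N-ary form: every element has to be inspected at run time.
bool AllDictionaries(const std::vector<const Array*>& arrays) {
  for (const Array* a : arrays) {
    if (a->type_id != TypeId::kDictionary) return false;
  }
  return true;
}

// Typed, two-argument form: the signature already guarantees that both
// inputs are dictionary arrays, so the body contains only the prefix check.
bool CanCompareIndices(const DictionaryArray& left, const DictionaryArray& right) {
  const std::size_t min_length =
      std::min(left.dictionary.size(), right.dictionary.size());
  return std::equal(left.dictionary.begin(),
                    left.dictionary.begin() + min_length,
                    right.dictionary.begin());
}
```

Passing a non-dictionary array to `CanCompareIndices` is rejected by the compiler, whereas the vector-of-pointers form can only detect it while looping over its inputs.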

@bkietz bkietz changed the title ARROW-1741: [C++] Add DictionaryType::ComparableWithoutUnification ARROW-1741: [C++] Add DictionaryArray::CanCompareIndices Sep 10, 2019
Member Author

bkietz commented Sep 12, 2019

@wesm how's this?
The Travis failure is https://issues.apache.org/jira/browse/ARROW-6509

Member

@wesm wesm left a comment

+1. As a small nit I'm going to move the DictionaryArray::CanCompareIndices implementation to array.cc

@wesm wesm force-pushed the 1741-Comparison-function-for-D branch from 9c4e8d8 to 9600868 Compare September 14, 2019 22:34
@wesm wesm closed this in 5fa694b Sep 14, 2019
pprudhvi pushed a commit to pprudhvi/arrow that referenced this pull request Sep 16, 2019
Tests whether arrays' dictionaries are prefixes of other arrays' dictionaries, which allows them to be compared without unification since the indices refer to identical values.

Closes apache#5342 from bkietz/1741-Comparison-function-for-D and squashes the following commits:

9600868 <Benjamin Kietzman> iwyu: algorithm
544a7c7 <Benjamin Kietzman> iwyu: memory
08022c9 <Benjamin Kietzman> remove error status validation
0f3cbd7 <Benjamin Kietzman> move dictionary array specific code to array/dict_internal.{h,cc}
e85d0ae <Benjamin Kietzman> refactor to DictionaryArray::CanCompareIndices
2f20180 <Benjamin Kietzman> ARROW-1741:  Add DictionaryType::ComparableWithoutUnification

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
@bkietz bkietz deleted the 1741-Comparison-function-for-D branch February 25, 2021 16:37