Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-14795: [C++] Fix issue on replace with mask for null values #11759

Conversation

AlvinJ15
Copy link
Contributor

@AlvinJ15 AlvinJ15 commented Nov 23, 2021

ARROW-14795: [C++] Fix issue on vector replace with mask for null values, which weren't updated on null bitmaps

@github-actions
Copy link

@bkmgit
Copy link
Contributor

bkmgit commented Nov 23, 2021

Thanks for the pull request. The issue should be ARROW-14795 not ARROW-14748

Copy link
Contributor

@aocsa aocsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good to me. I requested few changes, once is ready lets see what CI shows.

Comment on lines 76 to 77
const uint8_t* GetInBitmap() const {
return in_bitmap;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this method, this is a struct so properties are public by default. I recommend to check this reference: https://www.educative.io/collection/page/10370001/5111386763952128/6533026123087872

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add this because the CopyBitmap is a template with typename for this struct and can't acces to the in_bitmap property.

void SetBitTrue(uint8_t* out_bitmap, int64_t out_offset, int64_t offset) const {
BitUtil::SetBitTo(out_bitmap, out_offset, true);
}
Copy link
Contributor

@aocsa aocsa Nov 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to change method name to SetBitToTrue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just for reference, BitUtil calls this SetBit:

static inline void SetBit(uint8_t* bits, int64_t i) { bits[i / 8] |= kBitmask[i % 8]; }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solved

Comment on lines 112 to 111
void SetBitTrue(uint8_t* out_bitmap, int64_t out_offset, int64_t offset) const {
BitUtil::SetBitTo(out_bitmap, out_offset, true);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use same method name here as suggested before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solved

Comment on lines 99 to 90
const uint8_t* GetInBitmap() const {
return nullptr;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe instead of creating this weird method, you can create a property like this:

  const static uint8_t* in_bitmap = nullptr;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solved

Comment on lines 226 to 250
this->Assert(ReplaceWithMask, this->array("[1, null, 1]"),
this->mask("[false, true, false]"),
this->array("[7]"),
this->array("[1, 7, 1]"));
this->Assert(ReplaceWithMask, this->array("[1, null, 1, 7]"),
this->mask("[false, true, false, true]"),
this->array("[7, 20]"),
this->array("[1, 7, 1, 20]"));
this->Assert(ReplaceWithMask, this->array("[1, 2, 3, 4]"),
this->mask("[false, true, false, true]"),
this->array("[null, null]"),
this->array("[1, null, 3, null]"));
this->Assert(ReplaceWithMask, this->array("[null, 2, 3, 4]"),
this->mask("[true, true, false, true]"),
this->array("[1, null, null]"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to move all these new tests to a new test function to clarify the purpose of this PR.

TYPED_TEST(TestReplaceNumeric, ReplaceWithMaskForNullValuesAndMaskEnabled)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solved

@aocsa
Copy link
Contributor

aocsa commented Nov 23, 2021

Please @lidavidm enable to run the CI on this PR.

@lidavidm
Copy link
Member

It looks like there's some lint failures; there's documentation on how to run the linter locally if that helps: https://arrow.apache.org/docs/developers/cpp/development.html#code-style-linting-and-ci

@AlvinJ15 AlvinJ15 force-pushed the achunga/14795-fix_vector_replace_with_mask branch 6 times, most recently from 5dd5dc9 to b059ba0 Compare November 23, 2021 22:36
@bkmgit
Copy link
Contributor

bkmgit commented Nov 24, 2021

@AlvinJ15 Please change your title to "ARROW-14795: [C++] Fix issue on replace with mask for null values" so that the correct JIRA issue gets picked up.

@AlvinJ15 AlvinJ15 changed the title ARROW-14748: [C++] Fix issue on replace with mask for null values ARROW-14795: [C++] Fix issue on replace with mask for null values Nov 24, 2021
@github-actions
Copy link

https://issues.apache.org/jira/browse/ARROW-14795

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@lidavidm
Copy link
Member

The following test case fails:

this->Assert(ReplaceWithMask, this->array("[1, null, 1]"),
             this->mask("[false, true, false]"), this->scalar("null"),
             this->array("[1, null, 1]"));

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fix can be made simpler. This is the gist of the issue:

diff --git a/cpp/src/arrow/compute/kernels/vector_replace.cc b/cpp/src/arrow/compute/kernels/vector_replace.cc
index 7f204b529..1ed50919d 100644
--- a/cpp/src/arrow/compute/kernels/vector_replace.cc
+++ b/cpp/src/arrow/compute/kernels/vector_replace.cc
@@ -81,7 +81,7 @@ struct CopyArrayBitmap {
 
   void SetBit(uint8_t* out_bitmap, int64_t out_offset, int64_t offset) const {
     BitUtil::SetBitTo(out_bitmap, out_offset,
-                      BitUtil::GetBit(in_bitmap, in_offset + offset));
+                      in_bitmap ? BitUtil::GetBit(in_bitmap, in_offset + offset) : true);
   }
 };
 
@@ -133,10 +133,8 @@ void ReplaceWithArrayMaskImpl(const ArrayData& array, const ArrayData& mask,
              BitUtil::GetBit(mask_bitmap, write_offset + mask.offset + i))) {
           Functor::CopyData(*array.type, out_values, out_offset + write_offset + i,
                             replacements, replacements_offset, /*length=*/1);
-          if (replacements_bitmap) {
-            copy_bitmap.SetBit(out_bitmap, out_offset + write_offset + i,
-                               replacements_offset);
-          }
+          copy_bitmap.SetBit(out_bitmap, out_offset + write_offset + i,
+                             replacements_offset);
           replacements_offset++;
         }
       }

That is, in one case, we should always update the output bitmap.

We can try these additional test cases:

+  this->Assert(ReplaceWithMask, this->array("[1, null, 1]"),
+               this->mask("[true, true, true]"), this->array("[7, 7, 7]"),
+               this->array("[7, 7, 7]"));
+  this->Assert(ReplaceWithMask, this->array("[1, null, 1]"),
+               this->mask("[true, true, true]"), this->array("[null, null, null]"),
+               this->array("[null, null, null]"));
+  this->Assert(ReplaceWithMask, this->array("[1, null, 1]"),
+               this->mask("[false, true, false]"), this->scalar("null"),
+               this->array("[1, null, 1]"));
+  this->Assert(ReplaceWithMask, this->array("[1, null, 1]"),
+               this->mask("[true, true, true]"), this->scalar("null"),
+               this->array("[null, null, null]"));

@AlvinJ15 AlvinJ15 force-pushed the achunga/14795-fix_vector_replace_with_mask branch from b059ba0 to 3ebc2e2 Compare November 24, 2021 18:36
@AlvinJ15
Copy link
Contributor Author

I updated this PR addressing latest feedback comments. I think this PR is ready for merging. Let me know if anything needs to be taken care of from my end. cc @lidavidm

@AlvinJ15 AlvinJ15 force-pushed the achunga/14795-fix_vector_replace_with_mask branch from 3ebc2e2 to 242646c Compare November 24, 2021 20:17
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's actually an unspotted bug on line 169 that I think would replace most changes here:

  const bool replacements_bitmap = replacements.is_array()
                                       ? replacements.array()->MayHaveNulls()
: !replacements.scalar()->is_valid;

should instead read

  const bool replacements_bitmap = replacements.is_array()
                                       ? replacements.array()->MayHaveNulls()
: true;

cpp/src/arrow/compute/kernels/vector_replace.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_replace.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_replace.cc Outdated Show resolved Hide resolved
@@ -96,6 +105,10 @@ struct CopyScalarBitmap {
void SetBit(uint8_t* out_bitmap, int64_t out_offset, int64_t offset) const {
BitUtil::SetBitTo(out_bitmap, out_offset, is_valid);
}

void SetBitToTrue(uint8_t* out_bitmap, int64_t out_offset, int64_t offset) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nor is this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidavidm all comments has been solved

@AlvinJ15 AlvinJ15 force-pushed the achunga/14795-fix_vector_replace_with_mask branch from 242646c to 15d4b11 Compare November 26, 2021 00:37
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. Just a couple small things.

cpp/src/arrow/compute/kernels/vector_replace.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_replace.cc Outdated Show resolved Hide resolved
@AlvinJ15 AlvinJ15 force-pushed the achunga/14795-fix_vector_replace_with_mask branch from 15d4b11 to 04b7a83 Compare November 29, 2021 00:31
@AlvinJ15
Copy link
Contributor Author

@lidavidm all comments solved

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for working through all the updates.

@lidavidm lidavidm closed this in 72f7fbe Nov 29, 2021
@ursabot
Copy link

ursabot commented Nov 29, 2021

Benchmark runs are scheduled for baseline = 236362a and contender = 72f7fbe. 72f7fbe is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] ursa-i9-9960x
[Finished ⬇️0.66% ⬆️0.18%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

kou pushed a commit to kou/arrow that referenced this pull request Dec 1, 2021
ARROW-14795: [C++]  Fix issue on vector replace with mask for null values, which weren't updated on null bitmaps

Closes apache#11759 from AlvinJ15/achunga/14795-fix_vector_replace_with_mask

Authored-by: alvinj15 <Alvin258461@>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants