Skip to content

Conversation

@js8544
Copy link
Collaborator

@js8544 js8544 commented Aug 8, 2023

Rationale for this change

Add rolling_* functions.

What changes are included in this PR?

Added their implementations, tests and python ports.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions
Copy link

github-actions bot commented Aug 8, 2023

⚠️ GitHub issue #37059 has been automatically assigned in GitHub to PR creator.

@felipecrv
Copy link
Contributor

@js8544 ping me when this is ready for the initial reviews.

@js8544
Copy link
Collaborator Author

js8544 commented Aug 16, 2023

@js8544 ping me when this is ready for the initial reviews.

Thanks in advance! I'm quite busy with my day job this week. Will deal with this over the weekend.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current visit_valid must have 1 arg and visit_null must have no args. However, I want to pass VisitRightNull and VisitLeftNull (both having 0 args) as visit_valid, so I need to use InvokeWithRequiredArgs for all the lambdas passed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to order it alphabetically

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Aug 22, 2023
@js8544 js8544 requested a review from felipecrv August 22, 2023 16:31
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe [&] should be replaced by [visit_both_not_null = std::move(visit_both_not_null)] so the r-value parameters are moved into the local lambdas. As they are not shared, they can be moved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arr0_it and arr1_it would be captured by & of course as they are shared among the different lambdas.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think capturing by reference is better here because the lambda won't need to create a variable for the func. See https://cppinsights.io/s/690fcf31 for an example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. Nothing changes in the final assembly anyway. https://godbolt.org/z/Edaqefsc9

Comment on lines 68 to 71
Copy link
Contributor

@felipecrv felipecrv Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call the template parameter ArrowType or DataType you can use it in place of ArgType and OutType removing the need for the aliasing and different names for a type that is the same.

That would also free up ArgType and OutType to refer to the primitive C types and remove the need for ArgValue and OutValue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you think some Windows might need to differentiate here, keep it as it is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would prefer to keep this because Mean has a different out type than arg type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why double instead of ArgValue here? To avoid having to cast sum to double in GetValue? But how does that trade off with the error accumulation due to sums and subs on the double?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid overflow. Errors would only occur with large values, which are also susceptible to overflow errors. I think overflow is the worse problem here. Also, to improve accuracy, we can later switch to Kahan Sum or Pairwise Sum.

Comment on lines 200 to 333
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two can come from Window::... as well I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, and a large amount of template parameter thanks to this. I've updated the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's possible to pass IgnoreNulls as a template parameter and de-duplicate the implementation by adding a few if constexpr (IgnoreNulls) in the code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this but there will be so many if constexprs that it hurts readability. The main problem is one needs to keep window_valid_count and the other needs window_null_count.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I've gave up on some if constexpr-based implementations in the past because of this as well.

@js8544 js8544 requested a review from felipecrv August 29, 2023 12:43
Copy link
Contributor

@felipecrv felipecrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but you will need a commiter's approval to merge it :)

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for great code! I may ask a stupid question that, how do Prod handles value like NaN, 0? are they regarded as "invalid" like null?

@js8544
Copy link
Collaborator Author

js8544 commented Aug 30, 2023

Hi, thanks for great code! I may ask a stupid question that, how do Prod handles value like NaN, 0? are they regarded as "invalid" like null?

There's no special handling of them. So they are dealt with the same way C++ deals with them.

@js8544
Copy link
Collaborator Author

js8544 commented Aug 30, 2023

LGTM, but you will need a commiter's approval to merge it :)

Thanks! @pitrou would you mind taking a look?

@mapleFU
Copy link
Member

mapleFU commented Aug 30, 2023

There's no special handling of them. So they are dealt with the same way C++ deals with them.

TEST(TestRollingProd, ZeroHandling) {
  for (std::string func : {"rolling_prod", "rolling_prod_checked"}) {
    for (auto ty : NumericTypes()) {
      if (ty == uint8() || ty == int8()) {  // out of bounds
        continue;
      }
      RollingOptions options(/*window_length=*/3, 3, true);
      CheckRolling(func, ArrayFromJSON(ty, "[1, 2, 0, 4, 5, 6, 10]"),
                   ArrayFromJSON(ty, "[1, 2, 0, 0, 0, 120, 300]"), &options);
   }
  }
}

Yeah, I mean when handling window with 0, NaN. would "prod" suffer from divide by zero / NaN when 0/NaN is moved from window?

@js8544
Copy link
Collaborator Author

js8544 commented Aug 30, 2023

I see. They do require some special handling.

@js8544
Copy link
Collaborator Author

js8544 commented Aug 30, 2023

I've add special handling of NaN and zero for rolling_prod and rolling_prod_checked, along with a test case for them.

Comment on lines +139 to +182
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid division by zero! Clever.

@mapleFU
Copy link
Member

mapleFU commented Aug 30, 2023

Thanks, a clever fix! Would "sum" or mean also suffer from nan? should we fixing them using the same way?

@js8544
Copy link
Collaborator Author

js8544 commented Aug 31, 2023

Fixed for sum and mean. Also I noticed that the default options for AssertChunkedEquivalent doesn't allow 'nan=nan', so I fixed it too.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with arrow compute, but the code looks clean and good to me! Thanks

@js8544
Copy link
Collaborator Author

js8544 commented Oct 11, 2023

@pitrou This PR is also ready for review. Would you mind having a look? Thanks!

@js8544
Copy link
Collaborator Author

js8544 commented Dec 9, 2023

@felipecrv @mapleFU Hi, since both of you have become committers (congrats btw), would you mind have another look at this PR? You've approved this before but it wasn't merged.

@felipecrv
Copy link
Contributor

@felipecrv @mapleFU Hi, since both of you have become committers (congrats btw), would you mind have another look at this PR? You've approved this before but it wasn't merged.

I will take another look before merging. Soon-ish.

@github-actions
Copy link

Thank you for your contribution. Unfortunately, this pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you do not have repository permissions to reopen the PR, please tag a maintainer.

@github-actions github-actions bot added the Status: stale-warning Issues and PRs flagged as stale which are due to be closed if no indication otherwise label Nov 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting committer review Awaiting committer review Component: C++ Component: Documentation Component: Python Status: stale-warning Issues and PRs flagged as stale which are due to be closed if no indication otherwise

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++][Python] Add rolling_* functions

3 participants