Skip to content

Conversation

@toppyy
Copy link
Contributor

@toppyy toppyy commented May 12, 2024

Related to tidyverse/duckplyr#92

Changes:

  1. rapi_rel_order now accepts a vector of boolean values that describe whether each expression should be sorted in ascending order
  2. rel_order checks that the number of expressions and boolean values match or defaults to TRUE if a null-value is passed
  3. tests for the relation api test the rel_order(), not rapi_rel_order(). The tests break without this change as the default value is handled in rel_order.

I will mark this as a draft due to these notes:

  • This does not work for nested expressions. For example, assume someone writes arrange(someotherfunction(desc(column))) in duckplyr. I don't know if this is an issue as the interpretation of such an expression in terms of sort order is a bit ambiguous(?).
    • note: this problem exists also without changes proposed here?
  • I was unsure what function should to check that length(orders) == length(ascending). It is now rel_order(), rapi_rel_order(). As a result, using rapi_rel_order directly might result indexing out of array bounds. Perhaps the check should be in rapi_rel_order as the vector is indexed there?
  • Related to the previous note. Should we aim to test rel_order() or rapi_rel_order? There are examples of both for other relation expressions

Copy link
Collaborator

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks great! Triggering checks.

OrderType order_type;
size_t i = 0;

for (expr_extptr_t expr : orders) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Antonov548: What's the idiomatic C++ way to iterate over two "parallel" vectors of the same length, orders and ascending ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I know there is no specific function in STL to do this. Usually it's range based for loop with : for single iterable objects and index based loop for multiple vectors.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use zip_view() or similar?

https://codereview.stackexchange.com/q/283638/10691

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ranges require C++20 and higher. Not sure that it's worth of changing to C++20 if it's even possible.

@krlmlr
Copy link
Collaborator

krlmlr commented May 22, 2024

Testing rel_order() is fine, rapi_rel_order() is low-level and will never be exported. The arrange(f(desc(x))) issue is not much of a problem -- duckplyr will only translate desc() if it's the outermost expression and fall back to dplyr otherwise.

@krlmlr krlmlr marked this pull request as ready for review May 22, 2024 06:18
@krlmlr krlmlr changed the title rel_order accepts sort order as an argument feat: New sort argument to rel_order() May 22, 2024
@krlmlr krlmlr force-pushed the rel_order_sort_order branch from c9d80cd to be42107 Compare May 26, 2024 19:03
@krlmlr krlmlr merged commit 268ccc9 into duckdb:main May 27, 2024
@krlmlr
Copy link
Collaborator

krlmlr commented May 27, 2024

Thanks!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 28, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants