Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to do ignore nulls in some window functions? #2549

Closed
samsaara opened this issue Nov 5, 2021 · 5 comments
Closed

How to do ignore nulls in some window functions? #2549

samsaara opened this issue Nov 5, 2021 · 5 comments

Comments

@samsaara
Copy link

samsaara commented Nov 5, 2021

Is it possible at the moment to ignore nulls while computing some aggregations like first / last value etc. in window functions like Redshift does?

At present, I have user_id, order_id columns with order_id occasionally being null. I want to create a new column last_order_id (per user) and is it possible to use something like this? 😃

SELECT
  id,
  user_id,
  order_id,
  LAST_VALUE (order_id IGNORE NULLS) over (
    PARTITION BY user_id
    ORDER BY id
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
  ) AS last_order_id

Thanks for the great work btw 🙏 😻

@samsaara samsaara changed the title How to do aggregations ignoring nulls in some window functions? How to do ignore nulls in some window functions? Nov 5, 2021
@hawkfish
Copy link
Contributor

hawkfish commented Nov 5, 2021

We don't currently have support for the IGNORE NULLS construct, but I think you can get the result you want (and probably a lot faster by avoiding a window operator) with

SELECT id, user_id, order_id AS last_order_id
FROM table 
INNER JOIN (
    SELECT user_id, MAX(id) AS id
    FROM table
    GROUP BY 1
    WHERE order_id IS NOT NULL
    ) filter
WHERE table.id = filter.id 
  AND table.user_id = filter.user_id

The filter table picks out the maximum id for each user_id where the order_id is defined. The join then picks out the corresponding order_id.

@samsaara
Copy link
Author

samsaara commented Nov 6, 2021

Thanks for offering the solution @hawkfish but IIUC that's not quite what I was looking for as max(id) for each user would always go through all the rows in the table... what I need is to go through all the rows only until one row before current row.

For e.g., this is what I have

+------+-----------+------------+
|   id |   user_id |   order_id |
|------+-----------+------------|
|    0 |         1 |        614 |
|    1 |         1 |       null |
|    2 |         1 |       null |
|    3 |         1 |        639 |
|    4 |         1 |       2027 |
+------+-----------+------------+

and this is what I want to get

+------+-----------+------------+-----------------+
|   id |   user_id |   order_id |   last_order_id |
|------+-----------+------------+-----------------|
|    0 |         1 |        614 |            null |
|    1 |         1 |       null |             614 |
|    2 |         1 |       null |             614 |
|    3 |         1 |        639 |             614 |
|    4 |         1 |       2027 |             639 |
+------+-----------+------------+-----------------+

As you can see the max order (per user) changes as we go down the table.

@hawkfish
Copy link
Contributor

hawkfish commented Nov 6, 2021

Ah, sorry I get it now. You want the previous non-NULL order_id even for rows where the order_id is NULL (so you can't filter them out beforehand.) I'll think about how we might implement this.

@hawkfish
Copy link
Contributor

hawkfish commented Nov 6, 2021

The other thing I was not catching was the fact that this was referring to the LAST_VALUE window function, not the LAST aggregate function. It looks like IGNORE NULLS is a common extension for window functions, not general aggregates. So from an implementation point of view, this is something that should be added to window functions.

hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Nov 9, 2021
Add the flag to the grammar.
Add user test.
hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Nov 11, 2021
Add ignore_nulls flag to BoundWindowExpression.
Test unsupported cases.
hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Nov 11, 2021
Replace BitArray with ValidityMask
in preparation for reusing scanning logic.
hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Nov 11, 2021
Implement for the analytic functions that Oracle supports.
hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Nov 11, 2021
The Oracle syntax also has a NOP keyword
alternative to IGNORE...
hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Nov 12, 2021
Rename symbols for Windows R build.
hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Nov 15, 2021
Bump the DB version and add a round trip test for the new flag.
hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Nov 22, 2021
Resolve version conflict.
Mytherin added a commit that referenced this issue Nov 23, 2021
@Mytherin
Copy link
Collaborator

This should be fixed in #2583.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants