ARROW-11431: [Rust][DataFusion] Support the HAVING clause. #9364

Closed
wants to merge 9 commits into from

Contributor

@drusso drusso commented Jan 29, 2021

This commit adds support for the SQL HAVING clause.

For example, the following queries are supported:

  • Filtering on an aggregate:

    SELECT c1, MAX(c2) FROM t GROUP BY c1 HAVING MAX(c2) > 100;
    
  • Filtering on an aliased aggregate:

    SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING m > 100;
    
  • Filtering on an aggregate that does not appear in the SELECT:

    SELECT c1, MAX(c2) FROM t GROUP BY c1 HAVING MIN(c2) > 100;
    
  • Filtering on a complex aggregate:

    SELECT c1, MAX(c2) FROM t GROUP BY c1 HAVING MAX(c2 / 2) + 1 > 100;
    
  • Filtering on a non-aggregate column:

    SELECT c1, MAX(c2) FROM t GROUP BY c1 HAVING c1 > 100;
    
  • Filtering without a GROUP BY:

    SELECT MAX(c1) FROM t HAVING MAX(c1) > 100;
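
As a rough illustration of the semantics these queries rely on (not the planner code in this PR), HAVING can be thought of as a filter that runs on the grouped rows after aggregation. That is why it may reference an aggregate that the SELECT list never projects. The sketch below models the third query with plain Rust collections; `Agg`, `aggregate`, and `max_having_min_gt` are hypothetical names, and rows are assumed to be `(c1, c2)` pairs of `i64`.

```rust
use std::collections::BTreeMap;

/// Per-group aggregate state for column c2 (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Agg {
    max_c2: i64,
    min_c2: i64,
}

/// Group (c1, c2) rows by c1 and compute MAX(c2) and MIN(c2) per group.
fn aggregate(rows: &[(i64, i64)]) -> BTreeMap<i64, Agg> {
    let mut groups: BTreeMap<i64, Agg> = BTreeMap::new();
    for &(c1, c2) in rows {
        groups
            .entry(c1)
            .and_modify(|a| {
                a.max_c2 = a.max_c2.max(c2);
                a.min_c2 = a.min_c2.min(c2);
            })
            .or_insert(Agg { max_c2: c2, min_c2: c2 });
    }
    groups
}

/// Models: SELECT c1, MAX(c2) FROM t GROUP BY c1 HAVING MIN(c2) > threshold
fn max_having_min_gt(rows: &[(i64, i64)], threshold: i64) -> Vec<(i64, i64)> {
    aggregate(rows)
        .into_iter()
        // HAVING: evaluated against the aggregated groups, not the input rows.
        .filter(|(_, a)| a.min_c2 > threshold)
        // SELECT: the projection keeps only c1 and MAX(c2).
        .map(|(c1, a)| (c1, a.max_c2))
        .collect()
}
```

For rows `(1, 150), (1, 200), (2, 50), (2, 300), (3, 101)` and a threshold of 100, group 2 is dropped because its minimum is 50, even though its maximum is 300.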
    

@Dandandan
Contributor

Dandandan commented Jan 29, 2021

This is looking good @drusso! I think it could use some tests with example (tabular) input as well, to make sure the results are as expected. There are some more end-to-end tests in the tests directory.

@drusso
Contributor Author

drusso commented Jan 29, 2021

Thanks!

I will definitely add some more tests. There are also a couple of clippy errors for me to address.

Contributor

@alamb alamb left a comment

I quickly skimmed this PR @drusso and it is looking really cool! Thank you very much! Please ping me when you think it is ready for review and I will give it a close read.

@drusso
Contributor Author

drusso commented Feb 1, 2021

@alamb @Dandandan I've updated the PR, let me know if there's anything outstanding.

Member

@jorgecarleitao jorgecarleitao left a comment

I went through this PR carefully, understood all the changes and have no comments. Great work, @drusso 💯

I think that we should add it to the README as part of the SQL features. :)

@alamb
Contributor

alamb commented Feb 1, 2021

I plan to review this carefully tomorrow

@drusso
Contributor Author

drusso commented Feb 2, 2021

Thanks @jorgecarleitao and @alamb!

I added a note in the README.

Contributor

@alamb alamb left a comment

This PR is beautiful @drusso -- I think it should have one more test (for a query with both HAVING and WHERE) and get rebased against master and it is ready to merge. 🥇
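
For reference, the interaction such a test would exercise: WHERE filters individual rows before grouping, and HAVING filters the groups afterwards, so a WHERE predicate can change which groups survive the HAVING. A minimal sketch in plain Rust, independent of DataFusion's API; `where_then_having` and its parameters are hypothetical, and rows are assumed to be `(c1, c2)` pairs.

```rust
use std::collections::BTreeMap;

/// Models:
///   SELECT c1, MAX(c2) FROM t
///   WHERE c2 < where_lt
///   GROUP BY c1
///   HAVING MAX(c2) > having_gt
fn where_then_having(rows: &[(i64, i64)], where_lt: i64, having_gt: i64) -> Vec<(i64, i64)> {
    let mut max_by_group: BTreeMap<i64, i64> = BTreeMap::new();
    // WHERE: drop rows before any grouping or aggregation happens.
    for &(c1, c2) in rows.iter().filter(|&&(_, c2)| c2 < where_lt) {
        let m = max_by_group.entry(c1).or_insert(c2);
        *m = (*m).max(c2);
    }
    // HAVING: drop whole groups based on the aggregated value.
    max_by_group
        .into_iter()
        .filter(|&(_, m)| m > having_gt)
        .collect()
}
```

With rows `(1, 50), (1, 500), (2, 150), (2, 120), (3, 90)`, `where_lt = 200`, and `having_gt = 100`, only group 2 remains: the row `(1, 500)` is removed by the WHERE before aggregation, so group 1's maximum becomes 50 and fails the HAVING.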

rust/datafusion/src/sql/planner.rs (review thread resolved)

@alamb
Contributor

alamb commented Feb 3, 2021

Thanks @drusso !

@alamb
Contributor

alamb commented Feb 3, 2021

I am just waiting for the CI to finish on this PR and then I plan to merge it!

@alamb
Contributor

alamb commented Feb 3, 2021

Looks like GitHub is operating in a degraded fashion: https://www.githubstatus.com/

@alamb
Contributor

alamb commented Feb 3, 2021

All the rust tests are passing. Merging this in even though Travis is still running.

@alamb alamb closed this in 3d4c2bb Feb 3, 2021
@alamb
Contributor

alamb commented Feb 3, 2021

🎉

@jorgecarleitao
Member

🎉

@Dandandan
Contributor

🚀

nevi-me pushed a commit to nevi-me/arrow that referenced this pull request Feb 13, 2021
This commit adds support for the SQL `HAVING` clause.

For example, the following queries are supported:

* Filtering on an aggregate:

    ```
    SELECT c1, MAX(c2) FROM t GROUP BY c1 HAVING MAX(c2) > 100;
    ```

* Filtering on an aliased aggregate:

    ```
    SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING m > 100;
    ```

* Filtering on an aggregate that does not appear in the SELECT:

    ```
    SELECT c1, MAX(c2) FROM t GROUP BY c1 HAVING MIN(c2) > 100;
    ```

* Filtering on a complex aggregate:

    ```
    SELECT c1, MAX(c2) FROM t GROUP BY c1 HAVING MAX(c2 / 2) + 1 > 100;
    ```

* Filtering on a non-aggregate column:

    ```
    SELECT c1, MAX(c2) FROM t GROUP BY c1 HAVING c1 > 100;
    ```

* Filtering without a `GROUP BY`:

    ```
    SELECT MAX(c1) FROM t HAVING MAX(c1) > 100;
    ```

Closes apache#9364 from drusso/ARROW-11431

Authored-by: Daniel Russo <danrusso@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
4 participants