Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting FILTER and other similar operations #480

Open
yuvi17 opened this issue Oct 16, 2023 · 2 comments
Open

Supporting FILTER and other similar operations #480

yuvi17 opened this issue Oct 16, 2023 · 2 comments

Comments

@yuvi17
Copy link

yuvi17 commented Oct 16, 2023

Athena has support for filter function that can be used to filter values in a array and return the filtered array.
https://docs.aws.amazon.com/athena/latest/ug/filtering-arrays.html

It will be quite helpful if we can have an equivalent function in pyathena's SQLAlchemy dialect.


Use Case

I have a table that contains an array of timestamps and an action performed by an actor.
Something like

[ {"timestamp": "2023-10-10", "action_count": 5}, {"timestamp": "2023-10-11", "action_count": 6}...]

I would want to write a query which can give me something like, count of times this action was performed more than 3 times b/w certain dates.
Or users who performed this action a total of more than 3 times b/w certain dates.

If had to write Athena SQL for this, it would look something like this

Select * from actors
where
count(filter(action, x -> x.timestamp <= '2023-10-10' and x.timestamp >= '2023-10-01' and x.action_count >= 2))

I would like to do something like this for a table with multiple such columns and using and or etc. between those.

@laughingman7743 Is there any way I can do this today?

I hate the idea of writing the query as Python F-strings and filling in the values.

@yuvi17
Copy link
Author

yuvi17 commented Oct 16, 2023

A query built using sqlalchemy + pyathena something like this

query = select().where(
    func.cardinality(func.filter(
        "action",
        lambda x: (
            func.cast(x['timestamp'], Date) <= '2020-09-07',
            func.cast(x['timestamp'], Date) >= '2020-06-01',
            func.cast(x['action_count'], Integer) > 3
        )
    ))
)

results in a UnsupportedCompilationError with exceptions

        operator_ = override_operator or binary.operator
        disp = self._get_operator_dispatch(operator_, "binary", None)
        if disp:
            return disp(binary, operator_, **kw)
        else:
            try:
>               opstring = OPERATORS[operator_]
E               KeyError: <built-in function getitem>

../../../opt/anaconda3/envs/py39/lib/python3.9/site-packages/sqlalchemy/sql/compiler.py:2201: KeyError

The above exception was the direct cause of the following exception:
.....
.....
.....

        elif replace_context is not None:
            # no good solution here, we would like to have the exception
            # have only the context of replace_context.__context__ so that the
            # intermediary exception does not change, but we can't figure
            # that out.
            exception.__cause__ = replace_context

        try:
>           raise exception
E           sqlalchemy.exc.UnsupportedCompilationError: Compiler <pyathena.sqlalchemy_athena.AthenaStatementCompiler object at 0x7f98cba92190> can't render element of type <built-in function getitem> (Background on this error at: https://sqlalche.me/e/14/l7de)

../../../opt/anaconda3/envs/py39/lib/python3.9/site-packages/sqlalchemy/util/compat.py:208: UnsupportedCompilationError

This is after creating an engine with athena and passing the dialect to compile function of sqlalchemy.

@laughingman7743
Copy link
Owner

If there is anything missing from the implementation in PyAthena, pull requests are welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants