Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect SELECT * statement previewing tables with timestamp partitions #13009

Closed
3 tasks done
kekwan opened this issue Feb 8, 2021 · 6 comments
Closed
3 tasks done
Labels
!deprecated-label:bug Deprecated label - Use #bug instead inactive Inactive for >= 30 days sqllab Namespace | Anything related to the SQL Lab

Comments

@kekwan
Copy link
Contributor

kekwan commented Feb 8, 2021

When previewing a table in SQL Lab, a SELECT * statement is generated based on table data and table metadata if it contains partitions. When generating the SELECT * statement on a table with partitions, the table data, 'TIMESTAMP' in my case is not casted in the WHERE block.

Expected results

Previewing tables with data partitions should work. The WHERE block of the generated SELECT * statement should talk into account casting of data types, particularly TIMESTAMP.

Actual results

select_star method calls where_latest_partition to generate WHERE block when partition exists. where_latest_partition is failing to CAST data types needed for WHERE block causing syntax error when it runs the select * statement.

SQL Statement generated by get_table_metadata calling select_star. Columns datetimepartition and etl_ts should be casted to TIMESTAMP, not VARCHAR.

SELECT "event_type" AS "event_type",
       "trace_id" AS "trace_id",
       "tenant" AS "tenant",
       "core_tenant" AS "core_tenant",
       "event_timestamp" AS "event_timestamp",
       "table_id" AS "table_id",
       "dtr_workspace_id" AS "dtr_workspace_id",
       "connectors" AS "connectors",
       "table_type" AS "table_type",
       "row_count" AS "row_count",
       "field_count" AS "field_count",
       "partition_count" AS "partition_count",
       "data_size_bytes" AS "data_size_bytes",
       "datetimepartition" AS "datetimepartition",
       "etl_ts" AS "etl_ts",
       "insert_try_number" AS "insert_try_number"
FROM "events"."data_tables"
WHERE "datetimepartition" = '2021-02-08 16:00:00.000'
  AND "etl_ts" = '2021-02-08 16:30:00.000'
  AND "insert_try_number" = 1
LIMIT 100

Syntax error from Presto due to the incorrect SQL statement
Presto error: Cannot apply operator: timestamp(3) = varchar(23)

Screenshots

image

How to reproduce the bug

  1. Go to SQL Lab
  2. Preview Trino/Presto table with partitions and TIMESTAMP as a partition column.

Environment

  • superset version: 0.37.2
  • python version: python 3.6

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.
@kekwan kekwan added the #bug Bug report label Feb 8, 2021
@junlincc junlincc added the sqllab Namespace | Anything related to the SQL Lab label Feb 8, 2021
@tritonrc
Copy link

Hi, I ran into this same issue yesterday and after digging around the code I landed on a fix. In my case, the fields were coming back marked as Time type. Looking at https://github.com/apache/superset/blob/master/superset/db_engine_specs/presto.py#L576

I insert two lines to cover the Time type at line 579:

if tt == utils.TemporalType.TIME:
            return f"""from_iso8601_timestamp('{dttm.isoformat(timespec="microseconds")}')"""  # pylint: disable=line-too-long

As part of digging into this, I ran across the recently landed PR #12861 which might help here as well.

@kekwan
Copy link
Contributor Author

kekwan commented Feb 10, 2021

Hey @tritonrc. I also fixed the issue but had a slightly different fix for my scenario.

Issue 1 was our Presto TIMESTAMP columns were getting marked as OTHER (not TIME either) due to Trino/Presto's new timestamp precision type. We followed this #11138 (comment), to omit timestamp precision and it correctly identified our TIMESTAMP columns.

Even this didn't fix our issue with previewing tables however. I believe Presto/Trino doesn't do automatic casting in WHERE statements so when the SELECT * is generated for preview tables, it needs to explicitly cast the partition value to
TIMESTAMP. This is a valid query in Presto (notice the explicit casting of the partition value)

SELECT "event_type" AS "event_type",
       "trace_id" AS "trace_id",
       "tenant" AS "tenant",
       "core_tenant" AS "core_tenant",
       "event_timestamp" AS "event_timestamp",
       "table_id" AS "table_id",
       "dtr_workspace_id" AS "dtr_workspace_id",
       "connectors" AS "connectors",
       "table_type" AS "table_type",
       "row_count" AS "row_count",
       "field_count" AS "field_count",
       "partition_count" AS "partition_count",
       "data_size_bytes" AS "data_size_bytes",
       "datetimepartition" AS "datetimepartition",
       "etl_ts" AS "etl_ts",
       "insert_try_number" AS "insert_try_number"
FROM "events"."data_tables"
WHERE "datetimepartition" = TIMESTAMP '2021-02-08 16:00:00.000'
  AND "etl_ts" = TIMESTAMP '2021-02-08 16:30:00.000'
  AND "insert_try_number" = 1
LIMIT 100

In order to do this, I implemented a TypeDecorator class which overrode existing functionality of the SQLA timestamp type and instantiated that class here when the column type is timestamp https://github.com/apache/superset/blob/master/superset/db_engine_specs/presto.py#L917

@tooptoop4
Copy link
Contributor

@kekwan can u share the code for the fix?

@kekwan
Copy link
Contributor Author

kekwan commented Feb 11, 2021

@tooptoop4 Yup. I filed a PR #13082

@zuzana-vej zuzana-vej added !deprecated-label:bug Deprecated label - Use #bug instead and removed #bug Bug report labels Apr 20, 2021
@stale
Copy link

stale bot commented May 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label May 2, 2022
@rusackas
Copy link
Member

Sounds like this was fixed, but never closed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
!deprecated-label:bug Deprecated label - Use #bug instead inactive Inactive for >= 30 days sqllab Namespace | Anything related to the SQL Lab
Projects
None yet
Development

No branches or pull requests

6 participants