Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DuckDB JDBC doesnt seem to work with timestamps in Tableau #7013

Closed
2 tasks done
ryanrozich opened this issue Apr 10, 2023 · 21 comments
Closed
2 tasks done

DuckDB JDBC doesnt seem to work with timestamps in Tableau #7013

ryanrozich opened this issue Apr 10, 2023 · 21 comments
Assignees

Comments

@ryanrozich
Copy link

ryanrozich commented Apr 10, 2023

What happens?

I've connected tableau desktop (Mac) to duckdb via 'Other Database (JDBC)' as described here
https://brocktibert.medium.com/use-tableau-to-connect-to-duckdb-e92daad6e7b4

The connection works, and I can see the schema. However when I preview the data I get this error:
image

In tableau workbook I can work with the varchar fields as dimensions, however whenever I try to use the timestamp fields (truncated to some lower granularity like hour or year) I get the following error
image

Are there any known issues with working with timestamps in tableau using the jdbc connector? I am using the SQL92 Postgres dialect in the JDBC datasource setup wizard in tableau. I have also tried setting it up using the Postgres SQL92 dialect, and got a different error in the worksheet when trying to work with the timestamp field.

My duckdb database reads from parquet files stored locally
I am on

  • MacOS 13.3.1
  • duckdb v0.7.1 b00b93f
  • tableau desktop 2023.1

To Reproduce

duckdb database.duckdb
CREATE TABLE f_cj_events_v2 AS SELECT * FROM read_parquet('/path/to/*.parquet');
CHECKPOINT;

(the database needs to include a timestamp field)

(copy duckdb_jdbc-0.7.1 to ~/Library/Tableau/Drivers) MacOS

(in tableau desktop)
Create datasource -> other JDBC -> jdbc:duckdb:///path_to/database.duckdb
Drag the timestamp field to a column, set the granularity to 'Hour'

OS:

MacOS 13.3.1

DuckDB Version:

0.7.1

DuckDB Client:

Tableau (JDBC)

Full Name:

Ryan Rozich

Affiliation:

Self

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
@Mause
Copy link
Member

Mause commented Apr 10, 2023

Are you able to provide the error you get when you use the postgres dialect? We generally try to provide postgres compatibility

@ryanrozich
Copy link
Author

Absolutely.. I've posted both the error messages as well as the screenshots for SQL92 and Postgres below. Looks like I actually did paste the error from the postgres dialect above

POSTGRES DIALECT

An error occurred while communicating with the Other Databases (JDBC) data source 'f_cj_events_v2 (main)'
Bad Connection: Tableau could not connect to the data source.
Error Code: FAB9A2C5
java.sql.SQLException: Catalog Error: Scalar Function with name trunc does not exist!
Did you mean "tan"?
LINE 2:   CAST(TRUNC(EXTRACT(YEAR FROM "f_cj_events_v2"."view_received_at")) AS INTEGER) AS "yr:view_received_at:ok"
FROM "main"."f_cj_events_v2" "f_cj_events_v2"
GROUP BY 2...
               ^
SELECT SUM(1) AS "cnt:f_cj_events_v2_0A2AC4C294B143D686A2602064539CC4:ok",
  CAST(TRUNC(EXTRACT(YEAR FROM "f_cj_events_v2"."view_received_at")) AS INTEGER) AS "yr:view_received_at:ok"
FROM "main"."f_cj_events_v2" "f_cj_events_v2"
GROUP BY 2

image

SQL92 DIALECT

An error occurred while communicating with the Other Databases (JDBC) data source 'f_cj_events_v2 (main)'
Bad Connection: Tableau could not connect to the data source.
Error Code: FAB9A2C5
java.sql.SQLException: Parser Error: syntax error at or near "TIMESTAMPADD"
LINE 2:   {fn TIMESTAMPADD(SQL_TSI_HOUR,EXTRACT(HOUR FROM "f_cj_events_v2"."view_received_at"),CAST("f_cj_events_v2"."view_received_at" AS DATE))} AS "thr_view_received_at_ok"
FROM "main"."f_cj_events_v2" "f_cj_events_v2"
GROUP BY 2...
              ^
SELECT SUM(1) AS "cnt_f_cj_events_v2_0A2AC4C294B143D686A2602064539CC4_ok",
  {fn TIMESTAMPADD(SQL_TSI_HOUR,EXTRACT(HOUR FROM "f_cj_events_v2"."view_received_at"),CAST("f_cj_events_v2"."view_received_at" AS DATE))} AS "thr_view_received_at_ok"
FROM "main"."f_cj_events_v2" "f_cj_events_v2"
GROUP BY 2

image

@hawkfish
Copy link
Contributor

hawkfish commented Apr 10, 2023

When you set the granularity to year, do you mean you are truncating or extracting the part? Because the generated code looks like it is using Tableau's DATE_PART function rather than DATE_TRUNC (that's what the yr:<field name>:ok pseudo-column name means). In the date pill menu, the first group of date functions create discrete categorical bins; the second group (outlined in red) use continuous truncation.

Date Menu

But either way, Tableau should have no trouble generating the correct function equivalents for Postgres (it's one of the first SQL dialects I implemented when I was there.) What was the error you got when you used the Postgres dialect?

@hawkfish
Copy link
Contributor

hawkfish commented Apr 10, 2023

(Sorry, simultaneous posts)

Ah it looks like we don't have the Postgres TRUNC function. That's annoying but not too hard to add.

@hawkfish
Copy link
Contributor

What's going on here is that Tableau expects DATE_PART to always return an integer but PG will return decimals in some cases (e.g., SECONDS) so it wraps the part extraction in a TRUNC to remove the decimals and then casts it to an INTEGER.

@hawkfish hawkfish self-assigned this Apr 10, 2023
hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Apr 11, 2023
Note that there are integral implementations too,
because the use case in the bug is Tableau wrapping EXTRACT in TRUNC
to force integral values. Since our EXTRACT functions are integral,
this would either fail or add an extra cast.
@ryanrozich
Copy link
Author

Thank you @hawkfish - there was a suggestion on discord to implement a function like this in my local duckdb as a workaround

CREATE MACRO trunc(x) as cast(x as integer)

And this worked when I was extracting the "hour" part in tableau in the pill at the top; however when I try to "truncate" to the hour I get this error:

(Using Postgres dialect for JDBC driver)

image

Error Message

An error occurred while communicating with the Other Databases (JDBC) data source 'f_cj_events_v2 (main)'
Bad Connection: Tableau could not connect to the data source.
Error Code: FAB9A2C5
getTimestamp
SELECT DATE_TRUNC( 'HOUR', CAST("f_cj_events_v2"."view_received_at" AS TIMESTAMP) ) AS "thr:view_received_at:ok"
FROM "main"."f_cj_events_v2" "f_cj_events_v2"
GROUP BY 1

In fact, it seems with this TRUNC macro workaround, all of the "extract" functions work on the timestamp in tableau, but the "truncate" operations do not:

image

@ryanrozich
Copy link
Author

When I run the sql that is listed in the error message below getTimestamp from the duckdb CLI I dont get an error:

SELECT DATE_TRUNC( 'HOUR', CAST("f_cj_events_v2"."view_received_at" AS TIMESTAMP) ) AS "thr:view_received_at:ok"
FROM "main"."f_cj_events_v2" "f_cj_events_v2"
GROUP BY 1

Below is a screenshot running this in my CLI

image

@hawkfish
Copy link
Contributor

hawkfish commented Apr 11, 2023

Hmm the functions all exist (e.g., DATE_TRUNC), but there may be a related issue with reading timestamps in JDBC that IIRC is in process.

@ryanrozich
Copy link
Author

yeah it looks like its the getTimestamp function that appears in that query that is causing the problems..

In the Data Source Explorer in tableau, Im getting a getTimestamp error when I try to preview the data pressing the "update now" button:

image

@hawkfish
Copy link
Contributor

This is looking like a JDBC bug now.

@hawkfish
Copy link
Contributor

Can you try the most recent nightly build?

@hawkfish hawkfish added the JDBC label Apr 12, 2023
hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Apr 12, 2023
* Use ScalarFunction::NopFunction for the integral types
* Test special values
Mytherin added a commit that referenced this issue Apr 13, 2023
@hawkfish
Copy link
Contributor

I've been working on a proper Tableau connector and as part of that work I managed to figure out that those strange error messages were an exception being thrown for calls to getTimestamp(column, Calendar). I have a pending fix for that, but it hasn't been well tested yet.

hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Apr 27, 2023
We already have the data in universal time, so I think all we need to do
is just return is (scaled appropriately). It seems to fix the Tableau problem.
@ryanrozich
Copy link
Author

@hawkfish A proper Tableau connector would be 🔥 if you need any help testing it out I'd be happy to.

Some of the errors and limitations in using the JDBC data source are frustrating, but its pretty awesome to be able to shred through 100s of GBs of parquet files locally in tableau. Beats waiting in query queues for snowflake or redshift serverless :) It would make a pretty awesome demo for duckdb

Feel free to msg me on discord @GladiatorPoeta

hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Apr 27, 2023
hawkfish pushed a commit to hawkfish/duckdb that referenced this issue Apr 27, 2023
@hawkfish
Copy link
Contributor

@hawkfish A proper Tableau connector would be 🔥 if you need any help testing it out I'd be happy to.

Some of the errors and limitations in using the JDBC data source are frustrating, but its pretty awesome to be able to shred through 100s of GBs of parquet files locally in tableau. Beats waiting in query queues for snowflake or redshift serverless :) It would make a pretty awesome demo for duckdb

That's exactly the use case I have in mind. There are two other hive partitioning issues I am looking at:

  1. Missing files in a deep tree should not be an error
  2. Only files actually being scanned should be open (otherwise you can hit too many files open errors).

Mytherin added a commit that referenced this issue Apr 28, 2023
Issue #7013: Implement getTimestamp Calendar
@ryanrozich
Copy link
Author

@hawkfish I saw your comment above on trying a nightly build. I'm happy to try it out, how does one switch to a nightly build? I used brew to install duck previously

(base) ryan@RYANs-MBP ~ % brew info duckdb
==> duckdb: stable 0.7.1 (bottled)
Embeddable SQL OLAP Database Management System
https://www.duckdb.org
/opt/homebrew/Cellar/duckdb/0.7.1 (884 files, 106.5MB) *
  Poured from bottle using the formulae.brew.sh API on 2023-04-01 at 07:57:05
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/duckdb.rb
License: MIT
==> Dependencies
Build: cmake ✘
==> Analytics
install: 16 (30 days), 387 (90 days), 3,441 (365 days)
install-on-request: 16 (30 days), 388 (90 days), 3,443 (365 days)
build-error: 0 (30 days)

(base) ryan@RYANs-MBP ~ % duckdb --version
v0.7.1 b00b93f0b1

Is there a way to switch to a nightly build using brew, or do I need to download and install separately?

@hawkfish
Copy link
Contributor

I don't think you actually need the latest build of the Cl tool because it is embedded in the JDBC driver. The build with the fix is here.

@hawkfish
Copy link
Contributor

Also, if you want the bleeding edge build from master, just go to the installation page and choose macOS/Github master. It will download the binaries that you can directly execute.

@ryanrozich
Copy link
Author

I got the latest nightly JDBC installed and also CLI, and looks like the errors I was seeing with the truncation timestamp aggregation are fixed!

image

quick note if anyone else wants to run from the latest: you will likely need to rebuild your database files from the same nightly build as the JDBC driver.

@hawkfish
Copy link
Contributor

hawkfish commented May 9, 2023

You should post this in show and tell!

@Mause
Copy link
Member

Mause commented May 15, 2023

Sounds like this is fixed, so I'll close this issue

@Mause Mause closed this as completed May 15, 2023
@andreas-eviny
Copy link

Hi, when using the JDBC driver for DuckDB 0.8 i get a weird error when using date_trunc('month') in tableau.

Running the query in a sql client works fine, but the results in tableau shows one month off. It actually returns the last day of the previous month, instead of the first day in the current month. The syntax in the tableau generated query is:

DATE_TRUNC( 'MONTH', CAST("utilization"."ts_from" AS TIMESTAMP) )

(ts_from is a timestamp column)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants