Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java][JDBC] Converting TIMESTAMP_WITH_TIMEZONE #37021

Closed
aiguofer opened this issue Aug 4, 2023 · 6 comments · Fixed by #37085 · May be fixed by #37088
Closed

[Java][JDBC] Converting TIMESTAMP_WITH_TIMEZONE #37021

aiguofer opened this issue Aug 4, 2023 · 6 comments · Fixed by #37085 · May be fixed by #37088
Assignees
Labels
Component: Java Type: usage Issue is a user question
Milestone

Comments

@aiguofer
Copy link
Contributor

aiguofer commented Aug 4, 2023

Describe the usage question you have. Please include as many useful details as possible.

Error description

We have a Java FlightServer for Arrow Flight SQL that connects to various types of datawarehouses through JDBC and converts the results into arrow to send back using sqlToArrowVectorIterator. Everything has been working pretty well, but we just ran into some issues with TIMESTAMP_TZ columns on Snowflake.

We were getting the following exception (redacted):

java.lang.RuntimeException: Error occurred while creating iterator.
	at org.apache.arrow.adapter.jdbc.ArrowVectorIterator.create(ArrowVectorIterator.java:87)
	at org.apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrowVectorIterator(JdbcToArrow.java:101)
...
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.arrow.vector.types.pojo.ArrowType.getTypeID()" because "arrowType" is null
	at org.apache.arrow.adapter.jdbc.JdbcToArrowUtils.getConsumer(JdbcToArrowUtils.java:437)
	at org.apache.arrow.adapter.jdbc.ArrowVectorIterator.initialize(ArrowVectorIterator.java:142)
	at org.apache.arrow.adapter.jdbc.ArrowVectorIterator.createVectorSchemaRoot(ArrowVectorIterator.java:134)

After digging around, I noticed we were getting back a JDBC type TIMESTAMP_WITH_TIMEZONE (2014) which wasn't mapped by the default converter JdbcToArrowUtils.getArrowTypeFromJdbcType. I created my own converter that mapped that type the same way it maps TIMESTAMP to get me unblocked (figured I'd deal with the TZ issue later). Now I don't get the exception, but the TIMESTAMP_WITH_TIMEZONE is excluded from the results. For example here's how I set up the data in Snowflake:

image

And here's what I get when I query through Dbeaver using the AFS JDBC driver through our service:

image

Debugging attempts

My attempts at debugging this have yielded very strange results:

  • When I look in the debugger on our service, the VectorSchemaRoot looks correct:
image
  • If I try to select just the NTZ column it works fine:
image
  • If I try to select just the TZ column it seems to try to make a acceptPutPreparedStatementUpdate call, which we haven't implemented so it fails.

My guess is that this is a bug on the JDBC driver, but as far as that's concerned the 2 vectors are of the same type, so I don't see why it would just drop one column.

Any help would be greatly appreciated!

Component(s)

Java

@aiguofer aiguofer added the Type: usage Issue is a user question label Aug 4, 2023
@aiguofer
Copy link
Contributor Author

aiguofer commented Aug 4, 2023

Nvm, I figured it out. The issue is we were still using jdbcToArrowSchema(ResultSetMetaData rsmd, Calendar calendar), which uses the default converter in the createPreparedStatement request. It seems that when jdbcToArrowSchema can't map a specific type, it simply excludes that Field in the Schema that it generates. Then, when you do getStreamPreparedStatement the conversion fails.

By passing the same custom JdbcToArrowConfig with our custom converter to both jdbcToArrowSchema and to sqlToArrowVectorIterator we're now getting the right results!! Now I just have to make sure we set the TZ correctly, which might require querying snowflake for the current session timezone... this might be tough, especially if a user changes the session timezone while talking to our service -.-

Would it make sense to make a bug report for the behavior discrepancy between jdbcToArrowSchema and sqlToArrowVectorIterator with regards to unmapped types? or to add support for TIMESTAMP_WITH_TIMEZONE in the default converter?

@lidavidm
Copy link
Member

lidavidm commented Aug 4, 2023

Yes, we should error instead of silently dropping a field; and we should support TIMESTAMP WITH TIMEZONE if possible, though I wonder if the semantics are consistent enough between vendors to allow this. (I ran into this with ADBC: you can't add your own converters, so I was considering forking arrow-jdbc for that...)

@aiguofer
Copy link
Contributor Author

aiguofer commented Aug 4, 2023

Cool that makes sense!

I've been digging more into this and I'm a bit confused. It seems like the default is to set the timezone associated with the config.calendar to regular TIMESTAMP w/o TZ.

Shouldn't the TZ be null for TIMESTAMP and then set the TZ based on the config.calendar like this (in Kotlin):

            Types.TIMESTAMP -> ArrowType.Timestamp(TimeUnit.MILLISECOND, null)
            Types.TIMESTAMP_WITH_TIMEZONE -> ArrowType.Timestamp(TimeUnit.MILLISECOND, calendar.timeZone.id)

I tried doing this but then I got an error in JdbcToArrowUtils.getConsumer because of:

      case Timestamp:
        if (config.getCalendar() == null) {
          return TimestampConsumer.createConsumer((TimeStampMilliVector) vector, columnIndex, nullable);
        } else {
          return TimestampTZConsumer.createConsumer((TimeStampMilliTZVector) vector, columnIndex, nullable, calendar);
        }

it seems to me like that should be

      case Timestamp:
        if (((ArrowType.Timestamp) arrowType).getTimezone() == null) {
          return TimestampConsumer.createConsumer((TimeStampMilliVector) vector, columnIndex, nullable);
        } else {
          return TimestampTZConsumer.createConsumer((TimeStampMilliTZVector) vector, columnIndex, nullable, calendar);
        }

It's a little weird because you can specify the converter but not the consumer, but the consumer seems to be coupled to the default converter.

I'd be happy to start a PR for either:

  • Making getConsumer pluggable like the converter
  • Applying the changes above
  • Applying some variation of the changes above to support TIMESTAMP_WITH_TIMEZONE

@lidavidm
Copy link
Member

lidavidm commented Aug 7, 2023

#36519 might be related? The timestamp handling is iffy in general.

Any/all of those 3 would be welcome.

@aiguofer
Copy link
Contributor Author

aiguofer commented Aug 9, 2023

Hmmm that's interesting! I think it's related, although that's about handling timestamps client-side on the driver. The issue here is in the conversion from a JDBC ResultSet. I'll put up a PR with what I think should be the correct behavior, but this might affect the other conversation since it affects what is returned to the driver if using this conversion on the server side.

lidavidm pushed a commit that referenced this issue Aug 10, 2023
### Rationale for this change

This was discussed in [this thread](#37021 (comment)). The `getConsumer` implementation depends on the implementation for `getArrowTypeFromJdbcType`, especially for timestamp objects. Since we can provide a different implementation for `JdbcToArrowTypeConverter` it follows we should be able to provide a different implementation for `JdbcConsumerGetter`

### What changes are included in this PR?

Adds a way to configure an alternate `JdbcToArrowUtils.getConsumer` function through `JdbcToArrowConfigBuilder.setJdbcConsumerGetter`.

It also throws a more helpful exception from the default implementations when a provided type is not mapped.

### Are these changes tested?

No, the default behavior remains unchanged.
* Related: #37021
* Closes: #37021

Authored-by: Diego Fernandez <aiguo.fernandez@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@lidavidm lidavidm added this to the 14.0.0 milestone Aug 10, 2023
@lidavidm
Copy link
Member

Ah - do you mind opening a separate issue for the other PR?

loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
### Rationale for this change

This was discussed in [this thread](apache#37021 (comment)). The `getConsumer` implementation depends on the implementation for `getArrowTypeFromJdbcType`, especially for timestamp objects. Since we can provide a different implementation for `JdbcToArrowTypeConverter` it follows we should be able to provide a different implementation for `JdbcConsumerGetter`

### What changes are included in this PR?

Adds a way to configure an alternate `JdbcToArrowUtils.getConsumer` function through `JdbcToArrowConfigBuilder.setJdbcConsumerGetter`.

It also throws a more helpful exception from the default implementations when a provided type is not mapped.

### Are these changes tested?

No, the default behavior remains unchanged.
* Related: apache#37021
* Closes: apache#37021

Authored-by: Diego Fernandez <aiguo.fernandez@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment