Throw a meaningful exception on unsupported data types in ORC #366

Merged
merged 1 commit into from
Jan 31, 2018

stefanobaghino
Contributor

I've encountered an error when moving data from a JDBCSource to an ORC-backed HiveSink.

I've manually created a Hive table to host the data contained in the original source. The semantics of bigint may differ between Teradata (DECIMAL(n, n)) and Hive/ORC (LONG); the actual problem, however, is that whenever a BigIntType is encountered a MatchError is thrown, because it is not among the data types supported for ORC.

I solved my problem by patching the code as in this PR. It's very rough and of course I'm available to improve on this, should this be of interest for a review.

It's worth mentioning that along the way I noticed that probably the right tool to use would have been an ad-hoc implementation of MetastoreSchemaHandler suited for my use case. It's the first time I used Eel and unfortunately this came up later during my experimentation.

If that's the case, feel free to close this.

@hannesmiller
Contributor

Hi Stefan,

It looks to me as though Teradata decimals should map to the JDBC decimal type (i.e. java.math.BigDecimal), so the value should flow into the HiveSink as a decimal. However, I don't think your Hive table should be defined as bigint, which is the same as a long in the EEL type system.

The BigIntType in the EEL type system is more like a BigDecimal without a scale.

Can you confirm what shows up as the JDBC type for the column in the JdbcSource, or just write a little JDBC test program with the same query and examine the types coming through in the debugger?

Alternatively you could set a breakpoint on the HiveSink at io.eels.component.hive.HiveSinkWriter#write to examine the EEL row as well.
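
The small JDBC test program suggested above could look roughly like the following sketch. It uses only the standard `java.sql` API; the object name `JdbcTypeProbe`, the `ColumnInfo` case class, and the `probe` entry point are hypothetical names chosen for illustration, and the connection URL and query are left to the caller. It reads the result set metadata rather than the rows, so the driver-reported type, precision, and scale of each column can be printed or inspected in a debugger.

```scala
import java.sql.{DriverManager, JDBCType, ResultSetMetaData}
import scala.util.Try

// Hypothetical helper, not part of Eel: reports what the JDBC driver says about each column.
object JdbcTypeProbe {

  // One column as described by the driver's metadata.
  final case class ColumnInfo(name: String, jdbcType: String, precision: Int, scale: Int)

  // Translates a java.sql.Types code into its symbolic name.
  // Vendor-specific codes that JDBCType does not know are reported as UNKNOWN(code).
  def typeName(code: Int): String =
    Try(JDBCType.valueOf(code).getName).getOrElse(s"UNKNOWN($code)")

  // Reads the column metadata eagerly; JDBC column indexes start at 1.
  def describe(meta: ResultSetMetaData): Seq[ColumnInfo] =
    (1 to meta.getColumnCount).map { i =>
      ColumnInfo(
        meta.getColumnName(i),
        typeName(meta.getColumnType(i)),
        meta.getPrecision(i),
        meta.getScale(i)
      )
    }

  // Assumption: the JDBC driver for `url` is on the classpath.
  def probe(url: String, query: String): Seq[ColumnInfo] = {
    val conn = DriverManager.getConnection(url)
    try {
      val stmt = conn.createStatement()
      try describe(stmt.executeQuery(query).getMetaData)
      finally stmt.close()
    } finally conn.close()
  }
}
```

Calling `JdbcTypeProbe.probe(url, query).foreach(println)` with the same query used by the JdbcSource should show whether the column arrives as a decimal or as a bigint.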

@stefanobaghino
Contributor Author

Yes, I probably just got the schema creation wrong.

Field([REDACTED],DecimalType(Precision(18),Scale(0)),false,false,None,false,null,Map(),None)

Still, do you think it would be a nice addition to show a more telling message rather than just having a MatchError thrown when a BigInt is encountered?

@hannesmiller
Contributor

Absolutely, we can add a sys.error with a more meaningful error message when nothing matches. This is very simple to fix.

I will keep this ticket open for this.
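
As a rough illustration of the idea discussed here, the sketch below shows a pattern match with a catch-all case that names the offending type instead of letting a bare MatchError escape. The data types are simplified, hypothetical stand-ins and not Eel's real definitions; `OrcTypeMapping`, `toOrcTypeName`, and the returned type strings are assumptions made for the example, and the merged change may look different.

```scala
// Hypothetical sketch: simplified stand-ins for a few data types, not Eel's actual classes.
object OrcTypeMapping {

  sealed trait DataType
  case object LongType extends DataType
  case object StringType extends DataType
  case object BigIntType extends DataType
  final case class DecimalType(precision: Int, scale: Int) extends DataType

  // Maps a data type to an illustrative ORC type name.
  // The final case replaces the generic MatchError with a message naming the type.
  def toOrcTypeName(dataType: DataType): String = dataType match {
    case LongType          => "bigint"
    case StringType        => "string"
    case DecimalType(p, s) => s"decimal($p,$s)"
    case other =>
      throw new UnsupportedOperationException(
        s"Data type $other is not supported by the ORC writer")
  }
}
```

With this shape, `toOrcTypeName(BigIntType)` fails with a message that mentions `BigIntType`. Using `sys.error(...)` in the last case, as mentioned above, would behave similarly but throw a plain RuntimeException.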

We might push this out early in an a11 release, though we are planning to roll out an official 1.3 release in the next few weeks if everything goes well.

@stefanobaghino
Contributor Author

Is there some kind of channel that can be used to get in touch with the development team? A mailing list or a Gitter channel or something like that? I like this library, perhaps there's something I can contribute.

@stefanobaghino stefanobaghino changed the title ORC BigInt support Unsupported data types in ORC cause a generic MatchError Jan 31, 2018
@stefanobaghino stefanobaghino changed the title Unsupported data types in ORC cause a generic MatchError Throw a meaningful exception on unsupported data types in ORC Jan 31, 2018
@stefanobaghino
Contributor Author

stefanobaghino commented Jan 31, 2018

I've reworked this PR to go in the direction we discussed, I hope it's helpful. Let me know if the project is open for contribution. 🙂


@garyfrost
Member

@stefanobaghino thanks for this contribution, will merge.

I think a Gitter channel is a great idea, I'll set it up.

There are a couple of issues and we'd love your contributions. We can chat more about it when I set up the Gitter channel, if you like.

Gary

@garyfrost garyfrost merged commit edcc353 into 51zero:master Jan 31, 2018
@garyfrost
Member

@stefanobaghino Gitter channel is up - https://gitter.im/eel-sdk/Lobby

@stefanobaghino
Contributor Author

Awesome, thanks for acting on my suggestion!
