PHOENIX-2288 Phoenix-Spark: PDecimal precision and scale aren't carried through to Spark DataFrame #124
Conversation
This looks great, @navis. The Spark portion looks fine. I'll leave the updates to ColumnInfo for @ravimagham @JamesRTaylor et al. to review.
```java
if (pColumn.getMaxLength() == null) {
    return new ColumnInfo(pColumn.toString(), sqlType);
}
if (sqlType == Types.CHAR || sqlType == Types.VARCHAR) {
```
Rather than check for particular types, it'd be more general to check for null like this:
```java
Integer maxLength = pColumn.getMaxLength();
Integer scale = pColumn.getScale();
return new ColumnInfo(pColumn.toString(), sqlType, maxLength, scale);
```
Then make sure that ColumnInfo handles a null maxLength and scale.
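For illustration, a minimal sketch of how ColumnInfo could tolerate a null maxLength and scale (the field layout and accessors here are assumptions for the sketch, not Phoenix's actual implementation):
```java
// Illustrative sketch only; Phoenix's real ColumnInfo differs in detail.
public class ColumnInfo {
    private final String columnName;
    private final int sqlType;
    private final Integer maxLength; // null when the type has no length (e.g. INTEGER)
    private final Integer scale;     // null when the type has no scale

    public ColumnInfo(String columnName, int sqlType, Integer maxLength, Integer scale) {
        this.columnName = columnName;
        this.sqlType = sqlType;
        this.maxLength = maxLength;
        this.scale = scale;
    }

    // Callers can treat null as "unspecified" rather than assuming a value is present.
    public Integer getMaxLength() { return maxLength; }
    public Integer getScale() { return scale; }
}
```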
@JamesRTaylor How about moving the logic above into PColumn? Then we could access the full column information from it.
ColumnInfo is a kind of lightweight transport class solely for passing the necessary column metadata to the MR and Spark integration. It's passed in through the config, so it has some simple to/from string methods - this prevents us from having to look up the metadata from Phoenix using the regular JDBC metadata APIs (which would be another option). Having this ColumnInfo class was deemed slightly easier.
PColumn has more information than we need and it'd be best to keep this as an internal/private class as much as possible. It's the object representation of our column metadata.
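To make the config round trip concrete, here is a rough sketch of the encode/decode pattern described above. The string format and helper method are purely hypothetical assumptions for illustration; check Phoenix's actual ColumnInfo for its real serialization:
```java
// Hypothetical sketch of passing column metadata through a job config as strings,
// so the MR/Spark side never has to call the JDBC metadata APIs.
import java.sql.Types;

public class ColumnInfoRoundTrip {
    // Assumed format: name:sqlType:maxLength:scale ("null" when a value is absent).
    static String encode(String name, int sqlType, Integer maxLength, Integer scale) {
        return name + ":" + sqlType + ":" + maxLength + ":" + scale;
    }

    public static void main(String[] args) {
        String encoded = encode("PRICE", Types.DECIMAL, 10, 2);
        System.out.println(encoded); // PRICE:3:10:2

        // Decode on the worker side from the config value alone.
        String[] parts = encoded.split(":");
        System.out.println("column=" + parts[0] + " sqlType=" + parts[1]);
    }
}
```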
Thanks for the pull request, @navis. A couple of minor comments, but overall it looks great. FYI, our PDataType class is stateless (it was an enum originally), so we currently access maxLength/precision and scale through the PDatum interface (from which PColumn and Expression are derived). Now that PDataType is no longer an enum, it might be nice to allow instantiation with maxLength and scale provided at construction time. Please file a JIRA.
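A minimal sketch of the access pattern described above, i.e. a stateless type singleton that reads length and scale from the datum carrying the column metadata. The interface and method names are simplified assumptions, not Phoenix's exact API:
```java
// Sketch: the type object holds no precision/scale state of its own;
// it reads them from the PDatum (e.g. a column or expression) at call time.
interface PDatum {
    Integer getMaxLength();
    Integer getScale();
}

final class StatelessDecimalType {
    static final StatelessDecimalType INSTANCE = new StatelessDecimalType(); // enum-like singleton
    private StatelessDecimalType() {}

    // Precision/scale come from the datum, not from the type itself.
    String describe(PDatum datum) {
        return "DECIMAL(" + datum.getMaxLength() + "," + datum.getScale() + ")";
    }
}
```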
How's this look, @JamesRTaylor / @ravimagham?
Already merged.
From the JIRA description:
It seemed enough just for the current usage in the Spark integration. But in the long term, PDataType should contain meta information like maxLength or precision, etc.