
PHOENIX-2288 Phoenix-Spark: PDecimal precision and scale aren't carried through to Spark DataFrame #124

Closed
wants to merge 3 commits

Conversation

@navis commented Oct 22, 2015

From the JIRA description:

When loading a Spark dataframe from a Phoenix table with a 'DECIMAL' type, the underlying precision and scale aren't carried forward to Spark.

The Spark catalyst schema converter should load these from the underlying column. These appear to be exposed in the ResultSetMetaData, but if there was a way to expose these somehow through ColumnInfo, it would be cleaner.

I'm not sure if Pig has the same issues or not, but I suspect it may.

This seemed sufficient for the current usage in the Spark integration. In the long term, though, PDataType should carry meta information such as maxLength, precision, etc.
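For reference, the declared precision and scale are already visible through plain JDBC, which is what the description refers to. A minimal sketch, assuming a local Phoenix JDBC URL and a hypothetical MY_TABLE with a DECIMAL(10,2) PRICE column:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class DecimalMetadataCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT PRICE FROM MY_TABLE LIMIT 1")) {
            ResultSetMetaData md = rs.getMetaData();
            // For a DECIMAL(10,2) column, JDBC reports the declared precision and scale,
            // which is the information that currently gets lost on the Spark side.
            System.out.println("precision = " + md.getPrecision(1));
            System.out.println("scale     = " + md.getScale(1));
        }
    }
}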

@jmahonin
Contributor

This looks great, @navis.

The Spark portion looks fine. I'll leave the updates to ColumnInfo for @ravimagham @JamesRTaylor et al. to review.

if (pColumn.getMaxLength() == null) {
    return new ColumnInfo(pColumn.toString(), sqlType);
}
if (sqlType == Types.CHAR || sqlType == Types.VARCHAR) {
Contributor

Rather than check for particular types, it'd be more general to check for null like this:

Integer maxLength = pColumn.getMaxLength();
Integer scale = pColumn.getScale();
return new ColumnInfo(pColumn.toString(), sqlType, maxLength, scale);

Then make sure that ColumnInfo handles a null maxLength and scale.
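A minimal sketch of the null handling being asked for here; this is an illustration only, not the actual Phoenix ColumnInfo class, and the field and method names are assumptions:

// Sketch only: tolerate a null maxLength/scale as suggested above.
public class ColumnInfoSketch {
    private final String columnName;
    private final int sqlType;
    private final Integer maxLength; // null when no length/precision was declared
    private final Integer scale;     // null when no scale was declared

    public ColumnInfoSketch(String columnName, int sqlType, Integer maxLength, Integer scale) {
        this.columnName = columnName;
        this.sqlType = sqlType;
        this.maxLength = maxLength;
        this.scale = scale;
    }

    @Override
    public String toString() {
        // Append "(maxLength)" or "(maxLength,scale)" only when the values are present,
        // so columns without a declared length still serialize cleanly.
        StringBuilder sb = new StringBuilder(columnName).append(' ').append(sqlType);
        if (maxLength != null) {
            sb.append('(').append(maxLength);
            if (scale != null) {
                sb.append(',').append(scale);
            }
            sb.append(')');
        }
        return sb.toString();
    }
}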

Author

@JamesRTaylor How about moving the logic above into PColumn? Then we could access the full information there.

Contributor

ColumnInfo is a kind of lightweight transport class solely for passing the column metadata needed for the MR and Spark integration to run. It's passed in through the config, so it has some simple to/from string methods - this saves us from having to look up the metadata from Phoenix using the regular JDBC metadata APIs (which would be another option). Having this ColumnInfo class was deemed slightly easier.

PColumn has more information than we need, and it'd be best to keep it as an internal/private class as much as possible. It's the object representation of our column metadata.
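As an illustration of that transport pattern, the metadata only needs a simple string round trip so it can ride along in the job configuration. The encoding and class name below are made up for illustration, not Phoenix's actual format:

// Hypothetical helpers illustrating the to/from-string round trip described above.
public class ColumnInfoCodecSketch {
    static String encode(String name, int sqlType, Integer maxLength, Integer scale) {
        return name + ":" + sqlType + ":" + maxLength + ":" + scale; // "null" marks absent values
    }

    static Object[] decode(String encoded) {
        String[] parts = encoded.split(":");
        return new Object[] { parts[0], Integer.valueOf(parts[1]),
                parseOrNull(parts[2]), parseOrNull(parts[3]) };
    }

    private static Integer parseOrNull(String s) {
        return "null".equals(s) ? null : Integer.valueOf(s);
    }
}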

@JamesRTaylor
Contributor

Thanks for the pull request, @navis. A couple of minor comments, but overall it looks great. FYI, our PDataType class is stateless (it was an enum originally), so we currently access maxLength/precision and scale through the PDatum interface (from which PColumn and Expression are derived). Now that PDataType is no longer an enum, it might be nice to allow instantiation with maxLength and scale provided at construction time. Please file a JIRA.
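A rough sketch of what construction-time maxLength/scale could look like; this is purely hypothetical, with invented names, and does not reflect the real PDataType hierarchy:

// Purely hypothetical: a type instance that carries its own maxLength/precision
// and scale instead of relying on PDatum lookups.
public class SizedDataTypeSketch {
    private final String sqlTypeName;
    private final Integer maxLength; // precision for DECIMAL, length for CHAR/VARCHAR
    private final Integer scale;

    public SizedDataTypeSketch(String sqlTypeName, Integer maxLength, Integer scale) {
        this.sqlTypeName = sqlTypeName;
        this.maxLength = maxLength;
        this.scale = scale;
    }

    public Integer getMaxLength() { return maxLength; }
    public Integer getScale() { return scale; }
}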

@jmahonin
Contributor

How's this look, @JamesRTaylor / @ravimagham?

@stoty
Contributor

stoty commented Aug 1, 2023

Already merged.

@stoty closed this Aug 1, 2023