PARQUET-484: Warn when Decimal is stored as INT64 while could be stored as INT32 #316
lw-lin wants to merge 2 commits into apache:master from
|
I had noticed this issue before but decided against adding the warning, because it is mostly useful to Parquet data model developers. With this patch, an end user of a Parquet data model that doesn't always use the optimal primitive type for decimals will see this warning every time, yet can do nothing about it without updating the data model itself. That makes the warning distracting rather than helpful. Just my two cents. |
|
@liancheng Thank you for the explanation, which I also find reasonable. |
|
I think this warning is valuable. In the case of Spark or Hive, it is for data model developers, but it would definitely help fix a problem introduced there by flagging that the underlying data isn't using an efficient representation. In the case of object models like Avro, the user has control over the underlying type and would benefit from knowing if they chose too wide of a type. |
This should be `static final Logger LOG`, and I would normally add an import for `LoggerFactory`.
|
Updated according to @rdblue's code comments. |
|
One thing worth noting: we'll warn when |
|
@lw-lin I think the logic here should match what was decided for the format (that PR has been merged). Is it the same in this PR? |
|
Yes, it is the same here.

What was decided in the merged PR: precision < 10 will produce a warning

What's implemented in this PR:

    // MAX_PRECISION_INT32 is actually 9
    if (meta.getPrecision() <= MAX_PRECISION_INT32) {
      warn...
    }

@rdblue thanks for the revisit! :-) |
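For readers skimming the thread, here is a minimal, self-contained sketch of the check described above. It is not the code from this PR: the class name `DecimalInt64Check`, the helpers `fitsInInt32` and `warnIfOversized`, and the message text are all hypothetical, and `java.util.logging` is used only so the example has no dependencies. The only things taken from the thread are the constant value 9 and the `<=` comparison.

```java
import java.util.logging.Logger;

public class DecimalInt64Check {
    // Assumption: java.util.logging stands in for whatever logger the patch uses.
    private static final Logger LOG =
        Logger.getLogger(DecimalInt64Check.class.getName());

    // From the thread: MAX_PRECISION_INT32 is 9.
    static final int MAX_PRECISION_INT32 = 9;

    // Hypothetical helper: true when an INT64-backed decimal of this
    // precision would also fit in INT32.
    static boolean fitsInInt32(int precision) {
        return precision <= MAX_PRECISION_INT32;
    }

    // Hypothetical helper: logs a warning for an INT64-backed decimal that
    // could be INT32, and reports whether the warning fired.
    static boolean warnIfOversized(String column, int precision) {
        if (fitsInInt32(precision)) {
            LOG.warning("Decimal column " + column + " with precision "
                + precision + " is stored as INT64 but could be stored as INT32");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        warnIfOversized("price", 9);   // precision <= 9: warns
        warnIfOversized("total", 10);  // precision >= 10: stays silent
    }
}
```

Since precision is an integer, `precision <= 9` and `precision < 10` select the same columns, which is why the implementation matches the wording agreed for the format.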
|
@rdblue @liancheng thank you all for the review & merging :-) |
…ed as INT32

Below is documented in [LogicalTypes.md](https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#decimal):

> int32: for 1 <= precision <= 9
>
> int64: for 1 <= precision <= 18; precision < 10 will produce a warning

This PR implements the `precision < 10 will produce a warning` part.

@rdblue @liancheng would you mind taking a look at this when you have time? It's a fairly small addition; cheers.

Author: Liwei Lin <proflin.me@gmail.com>
Author: proflin <proflin.me@gmail.com>

Closes apache#316 from lw-lin/P-484-2 and squashes the following commits:

207e509 [Liwei Lin] Address comments
b227484 [proflin] PARQUET-484: Warn when Decimal is stored as INT64 while could be stored as INT32
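As background on the precision bounds quoted from LogicalTypes.md, the sketch below shows one way to see where 9 and 18 come from: they are the largest digit counts whose maximum unscaled value (10^p - 1) still fits in a signed 32-bit or 64-bit integer. The class name `DecimalPrecisionLimits` and the method `maxPrecision` are hypothetical and are not part of Parquet.

```java
import java.math.BigInteger;

public class DecimalPrecisionLimits {
    // Largest precision p such that every p-digit unscaled value
    // (at most 10^p - 1) fits in a signed two's-complement integer
    // of the given bit width.
    static int maxPrecision(int bits) {
        BigInteger maxValue =
            BigInteger.ONE.shiftLeft(bits - 1).subtract(BigInteger.ONE);
        int precision = 0;
        BigInteger bound = BigInteger.TEN; // 10^(precision + 1)
        // Grow the precision while the largest value with one more digit still fits.
        while (bound.subtract(BigInteger.ONE).compareTo(maxValue) <= 0) {
            precision++;
            bound = bound.multiply(BigInteger.TEN);
        }
        return precision;
    }

    public static void main(String[] args) {
        System.out.println(maxPrecision(32)); // prints 9
        System.out.println(maxPrecision(64)); // prints 18
    }
}
```

For 32 bits, 999,999,999 fits under 2,147,483,647 but 9,999,999,999 does not, so any decimal with precision up to 9 can use INT32 and an INT64 column in that range is wider than it needs to be.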