Skip to content

PARQUET-484: Warn when Decimal is stored as INT64 while could be stored as INT32#316

Closed
lw-lin wants to merge 2 commits intoapache:masterfrom
lw-lin:P-484-2
Closed

PARQUET-484: Warn when Decimal is stored as INT64 while could be stored as INT32#316
lw-lin wants to merge 2 commits intoapache:masterfrom
lw-lin:P-484-2

Conversation

@lw-lin
Copy link
Copy Markdown
Contributor

@lw-lin lw-lin commented Jan 30, 2016

Below is documented in LogicalTypes.md:

int32: for 1 <= precision <= 9
int64: for 1 <= precision <= 18; precision < 10 will produce a warning

This PR implements the precision < 10 will produce a warning part.

@rdblue @liancheng would mind taking a look at this when you have time? It's a fairly small addition; cheers.

@liancheng
Copy link
Copy Markdown
Contributor

Had once noticed this issue but decided not to have this warning because this warning is mostly for Parquet data model developers. With this patch, when an end user uses a Parquet data model that doesn't always use the optimal primitive type for decimal, he/she may always see this warning, but can do nothing about it without updating the data model itself. This makes this warning quite disturbing and not helpful. Just my two cents.

@lw-lin
Copy link
Copy Markdown
Contributor Author

lw-lin commented Feb 2, 2016

@liancheng Thank your for the explanation, which is also reasonable I believe.
Is there some consensus of warning nor not reached so far in this community (I'm not aware of any)? I think it'd be good to make some changes either to the code or to the documentation, but this is not an urgent one after all.

@rdblue
Copy link
Copy Markdown
Contributor

rdblue commented Feb 3, 2016

I think this warning is valuable. In the case of Spark or Hive, it is for data model developers, but it would definitely help fix a problem introduced there by flagging that the underlying data isn't using an efficient representation. In the case of object models like Avro, the user has control over the underlying type and would benefit from knowing if they chose too wide of a type.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be static final Logger LOG and I would normally add an import for LoggerFactory.

@lw-lin
Copy link
Copy Markdown
Contributor Author

lw-lin commented Feb 4, 2016

Updated according to @rdblue 's code comments.

@lw-lin
Copy link
Copy Markdown
Contributor Author

lw-lin commented Feb 4, 2016

One thing worthy noticing is, we'll warn when precision <= 9 rather than when precision <= 10. This has been brought up in parquet-format PR #26.

@lw-lin lw-lin closed this Feb 4, 2016
@lw-lin lw-lin reopened this Feb 4, 2016
@rdblue
Copy link
Copy Markdown
Contributor

rdblue commented Apr 17, 2016

@lw-lin I think the logic here should match what was decided for the format (that PR has been merged). Is it the same in this PR?

@lw-lin
Copy link
Copy Markdown
Contributor Author

lw-lin commented Apr 17, 2016

Yes, it is the same here.

What was decided in the merged PR:

precision < 10 will produce a warning

What's implemented in this PR:

// MAX_PRECISION_INT32 is actually 9
if (meta.getPrecision() <= MAX_PRECISION_INT32) {
   warn...
}

@rdblue thanks for the revisit! :-)

@asfgit asfgit closed this in 82b8ecc Apr 19, 2016
@lw-lin
Copy link
Copy Markdown
Contributor Author

lw-lin commented Apr 20, 2016

@rdblue @liancheng thank you all for the review & merging :-)

@lw-lin lw-lin deleted the P-484-2 branch April 20, 2016 09:16
piyushnarang pushed a commit to piyushnarang/parquet-mr that referenced this pull request Jun 15, 2016
…ed as INT32

Below is documented in [LogicalTypes.md](https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#decimal):

> int32: for 1 <= precision <= 9
> int64: for 1 <= precision <= 18; precision < 10 will produce a warning

This PR implements the `precision < 10 will produce a warning` part.

@rdblue @liancheng would mind taking a look at this when you have time? It's a fairly small addition; cheers.

Author: Liwei Lin <proflin.me@gmail.com>
Author: proflin <proflin.me@gmail.com>

Closes apache#316 from lw-lin/P-484-2 and squashes the following commits:

207e509 [Liwei Lin] Address comments
b227484 [proflin] PARQUET-484: Warn when Decimal is stored as INT64 while could be stored as INT32
rdblue pushed a commit to rdblue/parquet-mr that referenced this pull request Jan 6, 2017
…ed as INT32

Below is documented in [LogicalTypes.md](https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#decimal):

> int32: for 1 <= precision <= 9
> int64: for 1 <= precision <= 18; precision < 10 will produce a warning

This PR implements the `precision < 10 will produce a warning` part.

@rdblue @liancheng would mind taking a look at this when you have time? It's a fairly small addition; cheers.

Author: Liwei Lin <proflin.me@gmail.com>
Author: proflin <proflin.me@gmail.com>

Closes apache#316 from lw-lin/P-484-2 and squashes the following commits:

207e509 [Liwei Lin] Address comments
b227484 [proflin] PARQUET-484: Warn when Decimal is stored as INT64 while could be stored as INT32
rdblue pushed a commit to rdblue/parquet-mr that referenced this pull request Jan 10, 2017
…ed as INT32

Below is documented in [LogicalTypes.md](https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#decimal):

> int32: for 1 <= precision <= 9
> int64: for 1 <= precision <= 18; precision < 10 will produce a warning

This PR implements the `precision < 10 will produce a warning` part.

@rdblue @liancheng would mind taking a look at this when you have time? It's a fairly small addition; cheers.

Author: Liwei Lin <proflin.me@gmail.com>
Author: proflin <proflin.me@gmail.com>

Closes apache#316 from lw-lin/P-484-2 and squashes the following commits:

207e509 [Liwei Lin] Address comments
b227484 [proflin] PARQUET-484: Warn when Decimal is stored as INT64 while could be stored as INT32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants