PARQUET-255: Fixes a typo in decimal type specification #26
Closed
liancheng wants to merge 1 commit into apache:master from
Contributor
Author
BTW, seems that this warning isn't implemented in parquet-mr yet.
log2(9999999999) is ~33.2 (does not fit in
Maybe we can get this one committed and closed?
Contributor
Author
I've merged this to master. Thanks for the review!
lekv pushed a commit to lekv/parquet-format that referenced this pull request on Jul 31, 2017

… than scalar

Column scanning and record reconstruction is independent of the Parquet file format and depends, among other things, on the data structures where the reconstructed data will end up. This is a work in progress, but the basic idea is:

- APIs for reading a batch of repetition levels (`ReadRepetitionLevels`) or definition levels (`ReadDefinitionLevels`) into a preallocated `int16_t*`
- APIs for reading arrays of decoded values into preallocated memory (`ReadValues`)

These methods are only able to read data within a particular data page. Once you exhaust the data available in the data page (`ReadValues` returns 0), you must call `ReadNewPage`, which returns `true` if there is more data available.

Separately, I added a simple `Scanner` class that emulates the scalar value iteration functionality that existed previously. I used this to reimplement the `DebugPrint` method in `parquet_scanner.cc`. This obviously only works currently for flat data.

I would like to keep the `ColumnReader` low level and primitive, concerned only with providing access to the raw data in a Parquet file as fast as possible. We can devise separate algorithms for inferring nested record structure by examining the arrays of decoded values and repetition/definition levels. The major benefit of separating raw data access from structure inference is that this can be pipelined with threads: one thread decompresses and decodes values and levels, and another thread can turn batches into a nested record- or column-oriented structure.

Author: Wes McKinney <wes@cloudera.com>

Closes apache#26 from wesm/PARQUET-435 and squashes the following commits:

4bf5cd4 [Wes McKinney] Fix cpplint
852f4ec [Wes McKinney] Address review comments, also be sure to use Scanner::HasNext
7ea261e [Wes McKinney] Add TODO comment
4999719 [Wes McKinney] Make ColumnReader::ReadNewPage private and call HasNext() in ReadBatch
0d2e111 [Wes McKinney] Fix function description. Change #define to constexpr
111ef13 [Wes McKinney] Incorporate review comments and add some better comments
e16f7fd [Wes McKinney] Typo
ef52404 [Wes McKinney] Fix function doc
5e95cda [Wes McKinney] Configurable scanner batch size. Do not use printf in DebugPrint
1b4eca0 [Wes McKinney] New batch read API which reads levels and values in one shot
de4d6b6 [Wes McKinney] Move column_* files into parquet/column folder
aad4a86 [Wes McKinney] Finish refactoring scanner API with shared pointers
4506748 [Wes McKinney] Refactoring, do not have shared_from_this working yet
6489b15 [Wes McKinney] Batch level/value read interface on ColumnReader. Add Scanner class for flat columns. Add a couple smoke unit tests
I believe the mentioned warning should be produced when decimal precision is less than (rather than less than or equal to) 10 when an int64 is used to represent a decimal.
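
For anyone who wants to check the boundary, here is a minimal sketch of the arithmetic behind "less than 10". The helper names (`max_unscaled`, `fits_in_int32`, `int64_is_oversized`) are made up for illustration and are not part of parquet-format or parquet-mr; the only assumption is that a decimal of precision p stores an unscaled value of at most 10^p - 1 in a signed two's-complement integer.

```python
import math

INT32_MAX = 2**31 - 1


def max_unscaled(precision: int) -> int:
    """Largest unscaled value a decimal with `precision` digits can hold."""
    return 10**precision - 1


def fits_in_int32(precision: int) -> bool:
    """True if every decimal of this precision fits in a signed 32-bit integer."""
    return max_unscaled(precision) <= INT32_MAX


def int64_is_oversized(precision: int) -> bool:
    """Hypothetical check: int64 is larger than needed when int32 would suffice."""
    return fits_in_int32(precision)


if __name__ == "__main__":
    for p in (9, 10):
        bits = math.log2(max_unscaled(p))
        print(f"precision={p}: log2(max)={bits:.1f}, fits in int32: {fits_in_int32(p)}")
```

Under that assumption, precision 9 still fits in an int32 while precision 10 does not, so a warning condition of "precision <= 10" would flag a precision-10 decimal that really does need an int64.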