PARQUET-255: Fixes a typo in decimal type specification #26

Closed
liancheng wants to merge 1 commit into apache:master from liancheng:fix-decimal-doc

@liancheng
Contributor

I believe the mentioned warning should be produced when decimal precision is less than (rather than less than or equal to) 10 when an int64 is used to represent a decimal.

@liancheng
Contributor Author

BTW, seems that this warning isn't implemented in parquet-mr yet.

@liancheng liancheng changed the title Fixes a typo in LogicalTypes.md PARQUET-255: Fixes a typo in decimal type specification Apr 15, 2015
@lw-lin

lw-lin commented Jan 30, 2016

log2(9999999999) is ~33.2 (it does not fit in an int32), so it seems proper to use int64 for precision = 10, i.e., we'd better not warn in this case. @rdblue What do you think?
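
The arithmetic above can be checked with a short C++ sketch. The names `MaxUnscaledValue`, `FitsInInt32`, and `ShouldWarnInt64` are hypothetical and exist only for illustration; they are not part of parquet-mr or parquet-format. The sketch assumes the warning is meant for decimals stored as int64 whose precision is small enough for int32.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest unscaled value with `precision` decimal digits: 10^precision - 1.
// Restricted to 1..18 digits so the result always fits in int64_t.
int64_t MaxUnscaledValue(int precision) {
  assert(precision >= 1 && precision <= 18);
  int64_t v = 1;
  for (int i = 0; i < precision; ++i) v *= 10;
  return v - 1;
}

// True if every decimal of this precision fits in a signed 32-bit integer.
bool FitsInInt32(int precision) {
  return MaxUnscaledValue(precision) <= std::numeric_limits<int32_t>::max();
}

// Hypothetical check: warn when an int64-backed decimal would also fit in
// int32. This is equivalent to precision < 10.
bool ShouldWarnInt64(int precision) { return FitsInInt32(precision); }
```

With these definitions, `ShouldWarnInt64(9)` is true and `ShouldWarnInt64(10)` is false, which matches the "less than 10" wording proposed in this pull request.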

@lw-lin

lw-lin commented Feb 20, 2016

Maybe we can get this one committed and closed?
@julienledem @rdblue @liancheng

@asfgit asfgit closed this in 6a1664b Feb 24, 2016
@liancheng
Contributor Author

I've merged this to master. Thanks for the review!

@liancheng liancheng deleted the fix-decimal-doc branch February 24, 2016 09:42
lekv pushed a commit to lekv/parquet-format that referenced this pull request Jul 31, 2017
… than scalar

Column scanning and record reconstruction are independent of the Parquet file format and depend, among other things, on the data structures where the reconstructed data will end up. This is a work in progress, but the basic idea is:

- APIs for reading a batch of repetition levels (`ReadRepetitionLevels`) or definition levels (`ReadDefinitionLevels`) into a preallocated `int16_t*` buffer
- APIs for reading arrays of decoded values into preallocated memory (`ReadValues`)

These methods can only read data within a single data page. Once you exhaust the data available in the data page (`ReadValues` returns 0), you must call `ReadNewPage`, which returns `true` if there is more data available.
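
The page-by-page read loop described above might look roughly like the following. `MockColumnReader` is a hypothetical in-memory stand-in, and the signatures given to `ReadValues` and `ReadNewPage` here are assumptions; the real `ColumnReader` interface may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column reader; each inner vector is one page.
class MockColumnReader {
 public:
  explicit MockColumnReader(std::vector<std::vector<int32_t>> pages)
      : pages_(std::move(pages)) {}

  // Copies up to batch_size values from the current page into out.
  // Returns the number copied; 0 means the current page is exhausted.
  size_t ReadValues(size_t batch_size, int32_t* out) {
    if (page_ >= pages_.size()) return 0;
    const std::vector<int32_t>& page = pages_[page_];
    const size_t n = std::min(batch_size, page.size() - pos_);
    std::copy(page.data() + pos_, page.data() + pos_ + n, out);
    pos_ += n;
    return n;
  }

  // Advances to the next page; returns true if one exists.
  bool ReadNewPage() {
    if (page_ + 1 >= pages_.size()) {
      page_ = pages_.size();
      return false;
    }
    ++page_;
    pos_ = 0;
    return true;
  }

 private:
  std::vector<std::vector<int32_t>> pages_;
  size_t page_ = 0;
  size_t pos_ = 0;
};

// Drains every page: read batches until the page is empty, then move on.
std::vector<int32_t> ReadAll(MockColumnReader* reader, size_t batch_size) {
  std::vector<int32_t> result;
  std::vector<int32_t> buffer(batch_size);
  do {
    size_t n;
    while ((n = reader->ReadValues(batch_size, buffer.data())) > 0) {
      result.insert(result.end(), buffer.begin(), buffer.begin() + n);
    }
  } while (reader->ReadNewPage());
  return result;
}
```

The inner loop never crosses a page boundary, so the caller stays in control of when the next page is loaded.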

Separately, I added a simple `Scanner` class that emulates the scalar value iteration functionality that existed previously. I used this to reimplement the `DebugPrint` method in `parquet_scanner.cc`. For now, this works only for flat data.
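
As a rough illustration of scalar iteration layered on batch reads, here is a minimal sketch. `FlatScanner`, `BatchSource`, and `MakeVectorSource` are hypothetical names, and `Next` is an assumed method; only `HasNext` and the configurable batch size are mentioned in this commit's history.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical batch source: writes up to batch_size values into out and
// returns how many were written; 0 signals that no data is left.
using BatchSource = std::function<size_t(size_t batch_size, int32_t* out)>;

// Scalar iteration over a batch source for a flat (non-nested) column.
class FlatScanner {
 public:
  FlatScanner(BatchSource source, size_t batch_size)
      : source_(std::move(source)), buffer_(batch_size) {}

  // Refills the buffer when it is drained; false once the source is empty.
  bool HasNext() {
    if (pos_ < count_) return true;
    count_ = source_(buffer_.size(), buffer_.data());
    pos_ = 0;
    return count_ > 0;
  }

  // Returns the next value; callers must check HasNext() first.
  int32_t Next() { return buffer_[pos_++]; }

 private:
  BatchSource source_;
  std::vector<int32_t> buffer_;
  size_t count_ = 0;
  size_t pos_ = 0;
};

// Test helper: a batch source that serves values from an in-memory vector.
BatchSource MakeVectorSource(std::vector<int32_t> values) {
  size_t offset = 0;
  return [values = std::move(values), offset](size_t batch_size,
                                              int32_t* out) mutable {
    const size_t n = std::min(batch_size, values.size() - offset);
    std::copy(values.data() + offset, values.data() + offset + n, out);
    offset += n;
    return n;
  };
}
```

A caller would loop with `while (scanner.HasNext()) { use(scanner.Next()); }`, so the batch size only affects how often the underlying source is invoked, not the values observed.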

I would like to keep the `ColumnReader` low level and primitive, concerned only with providing access to the raw data in a Parquet file as fast as possible. We can devise separate algorithms for inferring nested record structure by examining the arrays of decoded values and repetition/definition levels. The major benefit of separating raw data access from structure inference is that this can be pipelined with threads: one thread decompresses and decodes values and levels, and another thread can turn batches into a nested record- or column-oriented structure.
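
The threading idea can be sketched with a simple producer/consumer queue. Everything here (`Batch`, `BatchQueue`, `CountNonNullPipelined`) is hypothetical and not part of this change; the "structure inference" step is reduced to counting non-null values in a flat optional column, where a value is present when its definition level equals the maximum definition level.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical batch of decoded values plus their definition levels.
struct Batch {
  std::vector<int32_t> values;
  std::vector<int16_t> def_levels;
};

// Minimal unbounded queue shared by the decoding and assembling threads.
class BatchQueue {
 public:
  void Push(Batch batch) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push(std::move(batch));
    }
    cv_.notify_one();
  }

  // Signals that no more batches will be pushed.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until a batch arrives; false once the queue is closed and empty.
  bool Pop(Batch* out) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !queue_.empty() || closed_; });
    if (queue_.empty()) return false;
    *out = std::move(queue_.front());
    queue_.pop();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<Batch> queue_;
  bool closed_ = false;
};

// One thread hands over "decoded" batches while the caller's thread consumes
// them and counts entries whose definition level marks a present value.
int64_t CountNonNullPipelined(const std::vector<Batch>& decoded,
                              int16_t max_def_level) {
  BatchQueue queue;
  std::thread decoder([&queue, &decoded] {
    for (const Batch& batch : decoded) queue.Push(batch);
    queue.Close();
  });

  int64_t non_null = 0;
  Batch batch;
  while (queue.Pop(&batch)) {
    for (int16_t level : batch.def_levels) {
      if (level == max_def_level) ++non_null;
    }
  }
  decoder.join();
  return non_null;
}
```

A real pipeline would likely bound the queue so the decoder cannot run arbitrarily far ahead of the consumer; that is left out here to keep the sketch short.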

Author: Wes McKinney <wes@cloudera.com>

Closes apache#26 from wesm/PARQUET-435 and squashes the following commits:

4bf5cd4 [Wes McKinney] Fix cpplint
852f4ec [Wes McKinney] Address review comments, also be sure to use Scanner::HasNext
7ea261e [Wes McKinney] Add TODO comment
4999719 [Wes McKinney] Make ColumnReader::ReadNewPage private and call HasNext() in ReadBatch
0d2e111 [Wes McKinney] Fix function description. Change #define to constexpr
111ef13 [Wes McKinney] Incorporate review comments and add some better comments
e16f7fd [Wes McKinney] Typo
ef52404 [Wes McKinney] Fix function doc
5e95cda [Wes McKinney] Configurable scanner batch size. Do not use printf in DebugPrint
1b4eca0 [Wes McKinney] New batch read API which reads levels and values in one shot
de4d6b6 [Wes McKinney] Move column_* files into parquet/column folder
aad4a86 [Wes McKinney] Finish refactoring scanner API with shared pointers
4506748 [Wes McKinney] Refactoring, do not have shared_from_this working yet
6489b15 [Wes McKinney] Batch level/value read interface on ColumnReader. Add Scanner class for flat columns. Add a couple smoke unit tests