Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PARQUET-1701: [C++] Stream API: Add support for optional fields #5928

Conversation

gawain-bolton
Copy link

StreamReader and StreamWriter classes can now support schemata with
optional fields.

Additional input/output operators are available using the
arrow::util::optional class in order to indicate whether optional data
was present when reading, or is present when writing.

@github-actions
Copy link

github-actions bot commented Dec 1, 2019

@pitrou
Copy link
Member

pitrou commented Dec 4, 2019

@wesm

@emkornfield
Copy link
Contributor

will try to take a look at this within a week if nobody gets to it sooner.

Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, left an initial round of comments.

cpp/src/parquet/stream_reader_test.cc Outdated Show resolved Hide resolved
cpp/src/parquet/stream_reader_test.cc Outdated Show resolved Hide resolved
cpp/src/parquet/stream_reader_test.cc Outdated Show resolved Hide resolved
cpp/src/parquet/stream_reader_test.cc Show resolved Hide resolved
cpp/src/parquet/stream_writer.h Show resolved Hide resolved
cpp/src/parquet/stream_reader.h Outdated Show resolved Hide resolved
cpp/src/parquet/stream_reader.h Outdated Show resolved Hide resolved
cpp/src/parquet/stream_writer.cc Outdated Show resolved Hide resolved
cpp/src/parquet/stream_writer.h Outdated Show resolved Hide resolved
cpp/src/parquet/stream_writer.cc Outdated Show resolved Hide resolved
gawain.bolton added 3 commits January 4, 2020 22:26
StreamReader and StreamWriter classes can now support schemata with
optional fields.

Additional input/output operators are available using the
arrow::util::optional class in order to indicate whether optional data
was present when reading, or is present when writing.
@gawain-bolton gawain-bolton force-pushed the PARQUET-1701_stream_api_optional_fields branch from 4fe9e47 to 1efa7ff Compare January 6, 2020 22:43
///
class PARQUET_EXPORT StreamReader {
public:
template <typename T>
using optional = arrow::util::optional<T>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I thought I commented (and perhaps I missed the response) but I'm kind of against this alias.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the contrary this alias is useful to users of this class so that they can use the required optional type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the marginal convenience of it, I could go either way. I'm fine with leaving it

@emkornfield
Copy link
Contributor

Seems reasonable except for the documentation and 1 style (using an alias for optional). So I think a committer should take a look @wesm @xhochy @majetideepak

optional fields.

Add tests to ensure that required fields can be read using
operator>>(arrow::util::optional<T>).

Add tests to ensure that required fields can be written using
operator<<(arrow::util::optional<T>).
@wesm
Copy link
Member

wesm commented Jan 13, 2020

Will look. Are the CI failures legitimate?

@nealrichardson
Copy link
Contributor

The macOS failure is probably https://issues.apache.org/jira/browse/ARROW-7551 (I say clicking on it)

@emkornfield
Copy link
Contributor

The appveyor failure looks unrelated, trying to rebuild.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there's parts of the tests that could be simplified through type parameterization and TEST_P. I'm not sure that changing everything to EXPECT from ASSERT is a good idea since there may be cases where one of the objects is left in an invalid state once one of the calls fails. I don't see anything (aside potentially for CI issues) to stop this from being merged, further cleaning / improvement could be done in a follow up patch. @gawain-bolton do you plan to make any other changes?

///
class PARQUET_EXPORT StreamReader {
public:
template <typename T>
using optional = arrow::util::optional<T>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the marginal convenience of it, I could go either way. I'm fine with leaving it

cpp/src/parquet/stream_reader.h Show resolved Hide resolved
cpp/src/parquet/stream_reader_test.cc Show resolved Hide resolved
cpp/src/parquet/stream_reader_test.cc Show resolved Hide resolved
cpp/src/parquet/stream_writer_test.cc Outdated Show resolved Hide resolved
@gawain-bolton
Copy link
Author

@wesm : I do not plan on making any other changes to this pull request unless required.

All the functionality required to support optional fields is in place and I would very much like this functionality to be available in the next release of Arrow.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. The Appveyor build looks okay

@wesm wesm closed this in 90c0459 Jan 14, 2020
@wesm
Copy link
Member

wesm commented Jan 14, 2020

thanks @gawain-bolton!

maartenb pushed a commit to maartenb/arrow that referenced this pull request Jan 14, 2020
StreamReader and StreamWriter classes can now support schemata with
optional fields.

Additional input/output operators are available using the
arrow::util::optional class in order to indicate whether optional data
was present when reading, or is present when writing.

Closes apache#5928 from gawain-bolton/PARQUET-1701_stream_api_optional_fields and squashes the following commits:

e9d8001 <gawain.bolton> Rename tests to conform with codebase.
70d9b3b <gawain.bolton> Improve documentation on input/output operators for required and optional fields.
1efa7ff <gawain.bolton> Updates after code review by emkornfield on 20191229.
de82604 <gawain.bolton> Eliminate warning on Visual Studio 2017 build
70233da <gawain.bolton> PARQUET-1701:  Stream API: Add support for optional fields

Authored-by: gawain.bolton <gawain.bolton@cfm.fr>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants