PARQUET-807: Allow user to retain ownership of parquet::FileMetaData. #213

Closed
wants to merge 6 commits into
from

Projects

None yet

2 participants

@wesm
Member
wesm commented Jan 4, 2017 edited

Also implements PARQUET-808: opening file with existing metadata object.

This allows a user to create a reader only for the purposes of obtaining the metadata.

Do you all think it's worth having a convenience method for reading the metadata out of a file?

- // This is specified as seconds since the UNIX epoch
- // TODO: Converted type in Parquet?
- // logical_type = LogicalType::TIMESTAMP_MILLIS;
- break;
@wesm
wesm Jan 4, 2017 Member

I'm planning to remove this type code from Arrow for now.

@xhochy
xhochy Jan 4, 2017 Member

As long as Drill has no dependency on that, +1!

@xhochy
xhochy approved these changes Jan 4, 2017 View changes

+1, LGTM. We can keep that one function but I don't feel the need for it.

- // This is specified as seconds since the UNIX epoch
- // TODO: Converted type in Parquet?
- // logical_type = LogicalType::TIMESTAMP_MILLIS;
- break;
@xhochy
xhochy Jan 4, 2017 Member

As long as Drill has no dependency on that, +1!

@@ -323,9 +323,9 @@ class FileMetaData::FileMetaDataImpl {
FileMetaData::Version writer_version_;
};
-std::unique_ptr<FileMetaData> FileMetaData::Make(
+std::shared_ptr<FileMetaData> FileMetaData::Make(
@xhochy
xhochy Jan 4, 2017 Member

This function is now just an alias for make_shared. Do we really want to keep it?

@wesm
wesm Jan 4, 2017 Member

The FileMetaData ctor used here is private, I will add a comment.

@xhochy
Member
xhochy commented Jan 4, 2017

Do you all think it's worth having a convenience method for reading the metadata out of a file?

Definitely, especially as tools such as Spark provide files as _common_metadata and _metadata that are mainly useful for inferring the schema of all files in a directory.

@wesm
Member
wesm commented Jan 4, 2017

OK. Will add a method for reading the metadata only for a file.

In these special _metadata files, do you know if they have the PAR1 footer and metadata length, or are they only the serialized Thrift message? If the latter, then there should be an option whether the input is a full Parquet file or simply a metadata-only file

@wesm
Member
wesm commented Jan 4, 2017

I'll add whatever's needed to read serialized-metadata-only files in pyarrow, for now the convenience method can just read from a file footer.

@wesm
Member
wesm commented Jan 5, 2017

@xhochy I expanded the scope to include PARQUET-808. Could you review again? Thanks

@xhochy
xhochy approved these changes Jan 5, 2017 View changes

+1, LGTM

@asfgit asfgit pushed a commit that closed this pull request Jan 5, 2017
@wesm wesm PARQUET-807: Allow user to retain ownership of parquet::FileMetaData.
Also implements PARQUET-808: opening file with existing metadata object.

This allows a user to create a reader only for the purposes of obtaining the metadata.

Do you all think it's worth having a convenience method for reading the metadata out of a file?

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #213 from wesm/PARQUET-807 and squashes the following commits:

c1b5c7c [Wes McKinney] Use ReadMetaData function in test
d382cca [Wes McKinney] Add note about ARROW-455
05ecd37 [Wes McKinney] Implement/test opening with provided metadata. Do not close Arrow output files automatically
d790bb5 [Wes McKinney] Tweak
0dd4184 [Wes McKinney] Add ReadMetaData convenience method
97527ba [Wes McKinney] Change FileMetaData in ParquetFileReader to a shared_ptr so that ownership can be transferred away
378f335
@asfgit asfgit closed this in 378f335 Jan 5, 2017
@wesm wesm deleted the wesm:PARQUET-807 branch Jan 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment