Skip to content

ARROW-2763: [Python] Make _metadata file accessible in ParquetDataset#2195

Closed
rgruener wants to merge 1 commit intoapache:masterfrom
rgruener:metadata-file
Closed

ARROW-2763: [Python] Make _metadata file accessible in ParquetDataset#2195
rgruener wants to merge 1 commit intoapache:masterfrom
rgruener:metadata-file

Conversation

@rgruener
Copy link

No description provided.

with self.fs.open(self.metadata_path) as f:
self.metadata = ParquetFile(f).metadata
else:
self.metadata = metadata
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is potentially confusing since if you pass in metadata for schema validation the metadata_path could be pointing to a different file than the metadata object represents. I think it might be best to have the metadata passed into the constructor be strictly used for schema validation and not stored to represent the metadata object of the dataset as it seems like it wouldnt contain the correct row group information of the dataset.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we would go the route and only use the passed-in metadata for schema validation, we should probably call the parameter differently (i.e. deprecate the old, add the new one).

Copy link
Member

@xhochy xhochy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

if self.metadata is None and self.schema is None:
if self.common_metadata_path is not None:
self.schema = open_file(self.common_metadata_path).schema
if self.common_metadata is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

with self.fs.open(self.metadata_path) as f:
self.metadata = ParquetFile(f).metadata
else:
self.metadata = metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we would go the route and only use the passed-in metadata for schema validation, we should probably call the parameter differently (i.e. deprecate the old, add the new one).

@xhochy
Copy link
Member

xhochy commented Jun 30, 2018

Failure is the Plasma test that is fixed on master

@xhochy xhochy closed this in 88f6794 Jun 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants