ARROW-2763: [Python] Make _metadata file accessible in ParquetDataset #2195
rgruener wants to merge 1 commit into apache:master
        with self.fs.open(self.metadata_path) as f:
            self.metadata = ParquetFile(f).metadata
    else:
        self.metadata = metadata
This is potentially confusing: if you pass in metadata for schema validation, metadata_path could point to a different file than the one the metadata object represents. I think it would be best to use the metadata passed into the constructor strictly for schema validation and not store it as the dataset's metadata object, since it likely would not contain the correct row group information for the dataset.
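To make the suggestion concrete, here is a minimal sketch of that separation. It is not the pyarrow implementation: `DatasetSketch`, `validation_metadata`, and `load_metadata` are hypothetical names, and plain dictionaries stand in for Parquet metadata objects.

```python
class DatasetSketch:
    """Hypothetical stand-in for ParquetDataset; not the pyarrow API."""

    def __init__(self, metadata_path=None, validation_metadata=None,
                 load_metadata=None):
        self.metadata_path = metadata_path
        # Used only for schema validation; never stored as dataset metadata.
        self._validation_metadata = validation_metadata
        # self.metadata only ever reflects the file at metadata_path,
        # so the two attributes cannot disagree.
        if metadata_path is not None and load_metadata is not None:
            self.metadata = load_metadata(metadata_path)
        else:
            self.metadata = None

    def validation_schema(self):
        # Prefer the user-supplied metadata, fall back to the file's.
        source = self._validation_metadata
        if source is None:
            source = self.metadata
        return None if source is None else source["schema"]
```

With this layout, metadata supplied by the caller affects validation only, while `metadata` and `metadata_path` always describe the same file.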
If we go that route and use the passed-in metadata only for schema validation, we should probably rename the parameter (i.e. deprecate the old one and add a new one).
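A rough sketch of what such a rename could look like, using only the standard `warnings` module. The helper `resolve_validation_metadata` and the new parameter name `validation_metadata` are assumptions for illustration, not names taken from this PR or from pyarrow.

```python
import warnings


def resolve_validation_metadata(metadata=None, validation_metadata=None):
    """Hypothetical helper: map the old `metadata` argument onto a new one."""
    if metadata is not None:
        # Keep accepting the old name for now, but tell callers to migrate.
        warnings.warn(
            "'metadata' is deprecated; use 'validation_metadata' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # An explicitly passed new argument wins over the deprecated one.
        if validation_metadata is None:
            validation_metadata = metadata
    return validation_metadata
```

A constructor could call this helper once and keep the result purely for schema validation, so existing callers continue to work during the deprecation period.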
    if self.metadata is None and self.schema is None:
        if self.common_metadata_path is not None:
            self.schema = open_file(self.common_metadata_path).schema
        if self.common_metadata is not None:
The failure is the Plasma test, which is already fixed on master.