-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python][Parquet] Support hashing for FileMetaData
and ParquetSchema
#39780
Comments
pitrou
pushed a commit
that referenced
this issue
Feb 12, 2024
…uetSchema (#39781) I think the hash, especially for `FileMetaData` could be better, maybe just use return of `__repr__`, even though that won't include row group info? ### Rationale for this change Helpful for dependent projects. ### What changes are included in this PR? Impl `__hash__` for `ParquetSchema` and `FileMetaData` ### Are these changes tested? Yes ### Are there any user-facing changes? Supports hashing metadata: ```python In [1]: import pyarrow.parquet as pq In [2]: f = pq.ParquetFile('test.parquet') In [3]: hash(f.metadata) Out[3]: 4816453453708427907 In [4]: hash(f.metadata.schema) Out[4]: 2300988959078172540 ``` * Closes: #39780 Authored-by: Miles Granger <miles59923@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss
pushed a commit
to dgreiss/arrow
that referenced
this issue
Feb 19, 2024
…d ParquetSchema (apache#39781) I think the hash, especially for `FileMetaData` could be better, maybe just use return of `__repr__`, even though that won't include row group info? ### Rationale for this change Helpful for dependent projects. ### What changes are included in this PR? Impl `__hash__` for `ParquetSchema` and `FileMetaData` ### Are these changes tested? Yes ### Are there any user-facing changes? Supports hashing metadata: ```python In [1]: import pyarrow.parquet as pq In [2]: f = pq.ParquetFile('test.parquet') In [3]: hash(f.metadata) Out[3]: 4816453453708427907 In [4]: hash(f.metadata.schema) Out[4]: 2300988959078172540 ``` * Closes: apache#39780 Authored-by: Miles Granger <miles59923@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
zanmato1984
pushed a commit
to zanmato1984/arrow
that referenced
this issue
Feb 28, 2024
…d ParquetSchema (apache#39781) I think the hash, especially for `FileMetaData` could be better, maybe just use return of `__repr__`, even though that won't include row group info? ### Rationale for this change Helpful for dependent projects. ### What changes are included in this PR? Impl `__hash__` for `ParquetSchema` and `FileMetaData` ### Are these changes tested? Yes ### Are there any user-facing changes? Supports hashing metadata: ```python In [1]: import pyarrow.parquet as pq In [2]: f = pq.ParquetFile('test.parquet') In [3]: hash(f.metadata) Out[3]: 4816453453708427907 In [4]: hash(f.metadata.schema) Out[4]: 2300988959078172540 ``` * Closes: apache#39780 Authored-by: Miles Granger <miles59923@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
thisisnic
pushed a commit
to thisisnic/arrow
that referenced
this issue
Mar 8, 2024
…d ParquetSchema (apache#39781) I think the hash, especially for `FileMetaData` could be better, maybe just use return of `__repr__`, even though that won't include row group info? ### Rationale for this change Helpful for dependent projects. ### What changes are included in this PR? Impl `__hash__` for `ParquetSchema` and `FileMetaData` ### Are these changes tested? Yes ### Are there any user-facing changes? Supports hashing metadata: ```python In [1]: import pyarrow.parquet as pq In [2]: f = pq.ParquetFile('test.parquet') In [3]: hash(f.metadata) Out[3]: 4816453453708427907 In [4]: hash(f.metadata.schema) Out[4]: 2300988959078172540 ``` * Closes: apache#39780 Authored-by: Miles Granger <miles59923@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the enhancement requested
Being able to
hash(parquet_file.metadata)
and schema should be helpful for dask. (dask/dask-expr#703)Component(s)
Python
The text was updated successfully, but these errors were encountered: