ARROW-16113: [Python] Partitioning.dictionaries in case of a subset of fields are dictionary encoded #12791

Closed

@sanjibansg (Contributor) commented Apr 4, 2022

This PR changes `Partitioning.dictionaries` so that it has an entry for every partition field. With this change it returns a list that contains the dictionary values where they are available and `None` otherwise, so the list always has the same length as the number of partition fields.

Ref: #12530 (comment)
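The new contract can be modelled in a few lines of plain Python. This is a minimal sketch, assuming hypothetical partition fields `year`, `month` and `country` where only `country` has known dictionary values. `aligned_dictionaries` and `describe` are illustrative helpers, not pyarrow APIs, and pyarrow itself would return `pyarrow.Array` objects rather than Python lists.

```python
from typing import Dict, List, Optional, Sequence


# Hypothetical helper modelling the behavior described above; this is NOT
# pyarrow code. In pyarrow the non-None entries would be pyarrow.Array objects.
def aligned_dictionaries(
    field_names: Sequence[str],
    dictionaries: Dict[str, List[str]],
) -> List[Optional[List[str]]]:
    # One slot per partition field, in schema order; None where no
    # dictionary values are known for that field.
    return [dictionaries.get(name) for name in field_names]


def describe(
    field_names: Sequence[str],
    aligned: Sequence[Optional[List[str]]],
) -> List[str]:
    # Because the list is aligned with the fields, names and dictionaries
    # can be paired by position with zip().
    lines = []
    for name, values in zip(field_names, aligned):
        if values is None:
            lines.append(f"{name}: no dictionary")
        else:
            lines.append(f"{name}: {values}")
    return lines


if __name__ == "__main__":
    fields = ["year", "month", "country"]
    # Only "country" has dictionary values in this hypothetical example.
    result = aligned_dictionaries(fields, {"country": ["BE", "NL", "US"]})
    assert len(result) == len(fields)
    for line in describe(fields, result):
        print(line)
```

Running it prints `year: no dictionary`, `month: no dictionary` and `country: ['BE', 'NL', 'US']`. Keeping a `None` placeholder instead of dropping the missing entries is what makes the positional pairing with the field list safe.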

github-actions bot commented Apr 4, 2022

@westonpace (Member) left a comment

I agree, this is a better way to handle this, thanks for this.

@jorisvandenbossche (Member) left a comment

Thanks! Just a few tiny code comments

python/pyarrow/tests/test_dataset.py (outdated, resolved)
python/pyarrow/tests/test_dataset.py (resolved)

@jorisvandenbossche (Member) left a comment

Thanks!

@sanjibansg deleted the partitioning_dictionaries branch April 5, 2022 08:19

ursabot commented Apr 5, 2022

Benchmark runs are scheduled for baseline = 7616fba and contender = 85809a9. 85809a9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.29% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.04% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 85809a95 ec2-t3-xlarge-us-east-2: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/441
[Finished] 85809a95 test-mac-arm: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/426
[Failed] 85809a95 ursa-i9-9960x: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/427
[Finished] 85809a95 ursa-thinkcentre-m75q: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/436
[Finished] 7616fba5 ec2-t3-xlarge-us-east-2: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/440
[Failed] 7616fba5 test-mac-arm: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/425
[Finished] 7616fba5 ursa-i9-9960x: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/426
[Finished] 7616fba5 ursa-thinkcentre-m75q: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/435
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

jcralmeida pushed a commit to rafael-telles/arrow that referenced this pull request Apr 19, 2022
…f fields are dictionary encoded

This PR modifies the `dictionaries` method to have entries for all the fields. With this change, the method will return a list with the values if present, otherwise it shall contain `None`, thus returning a list of same length.

Ref: apache#12530 (comment)

Closes apache#12791 from sanjibansg/partitioning_dictionaries

Authored-by: Sanjiban Sengupta <sanjiban.sg@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

4 participants