GH-32320: [R] Docs unclear about potential disadvantages of dictionary encoding #50093

Closed
thisisnic wants to merge 1 commit into apache:main from thisisnic:GH-32320

Conversation

thisisnic (Member) commented Jun 4, 2026

Rationale for this change

The docs are unclear: dictionary encoding can slow things down, but that's not apparent.

What changes are included in this PR?

Talk about it

Are these changes tested?

Nah

Are there any user-facing changes?

Nope

Copilot AI review requested due to automatic review settings June 4, 2026 08:56
github-actions Bot commented Jun 4, 2026

⚠️ GitHub issue #32320 has been automatically assigned in GitHub to PR creator.

Copilot AI (Contributor) left a comment

Pull request overview

This PR clarifies Arrow’s Parquet writer documentation (R and Python) to note that dictionary encoding—while often beneficial—can be counterproductive for high-cardinality columns by increasing file size and reducing compression effectiveness.
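To make the high-cardinality point concrete, here is a rough back-of-the-envelope sketch. It is a toy size model, not Parquet's actual encoder: it assumes each string costs a 4-byte length prefix plus its UTF-8 bytes, that a dictionary stores each distinct value once, and that every row then needs one fixed-width index (no run-length encoding, no compression). The helper names `plain_size` and `dictionary_size` are made up for this illustration.

```python
def plain_size(values):
    """Toy PLAIN-style layout: 4-byte length prefix + UTF-8 bytes per value."""
    return sum(4 + len(v.encode("utf-8")) for v in values)


def dictionary_size(values):
    """Toy dictionary layout: each distinct value stored once,
    plus one fixed-width index per row (assumption: no RLE, no compression)."""
    distinct = set(values)
    dict_bytes = plain_size(distinct)
    # Bits needed to address every dictionary entry (at least 1 bit).
    bit_width = max(1, (len(distinct) - 1).bit_length())
    index_bytes = (len(values) * bit_width + 7) // 8  # round up to whole bytes
    return dict_bytes + index_bytes


n = 10_000
low_cardinality = [f"category-{i % 5}" for i in range(n)]   # 5 distinct values
high_cardinality = [f"id-{i:08d}" for i in range(n)]        # every value unique

for name, column in [("low", low_cardinality), ("high", high_cardinality)]:
    print(f"{name}: plain={plain_size(column)} dictionary={dictionary_size(column)}")
```

Under these assumptions the low-cardinality column shrinks a lot, while the all-unique column ends up larger than the plain layout, because the dictionary repeats every value once and the indices come on top. Real writers add compression and other encodings, so actual numbers will differ.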

Changes:

  • Expanded the R write_parquet() roxygen docs for use_dictionary with guidance about high-cardinality columns.
  • Expanded the Python pyarrow.parquet writer argument docs for use_dictionary with the same cautionary note.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| r/R/parquet.R | Adds an explicit warning in write_parquet() docs about dictionary encoding potentially worsening size/compression for many-unique-value columns. |
| python/pyarrow/parquet/core.py | Adds the same cautionary note to the Parquet writer argument documentation string under use_dictionary. |


@thisisnic thisisnic marked this pull request as draft June 4, 2026 09:00
thisisnic (Member, Author) commented

Closing this, as we have a fallback. On reflection, the original use case may have been a little niche, and adding this to the docs may confuse things further.

@thisisnic thisisnic closed this Jun 4, 2026
