Change Default Write Distribution Mode #6679

Closed
RussellSpitzer opened this issue Jan 27, 2023 · 12 comments · Fixed by #6828

@RussellSpitzer (Member) commented Jan 27, 2023

Feature Request / Improvement

Merge writes, as well as some inserts, end up generating many files with our default write distribution mode of None. While this is the cheapest method and matches our old default behavior, we now have several reasons to default to Range (or Hash):

  1. Spark AQE now has both skew handling and adaptive partition coalescing.
  2. With merge operations, None is never the correct mode to request since we are always shuffling anyway.
  3. More users are coming to Iceberg who don't understand how Spark partitioning works (an understanding that is required to get good performance with the default of None).

I suggest we change the default distribution mode to Range and add some documentation on configuring AQE to the Spark docs. I think this will be better behavior for most first-time users, and power users can still manually configure a different mode for their specific requirements.
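
For illustration, a minimal sketch of the knobs involved, assuming a hypothetical Iceberg table db.tbl (the AQE settings and the write.distribution-mode table property already exist; only the table name is made up):

-- AQE settings relevant to the proposed default (enabled by default in recent Spark releases)
SET spark.sql.adaptive.enabled = true;
SET spark.sql.adaptive.coalescePartitions.enabled = true;
SET spark.sql.adaptive.skewJoin.enabled = true;

-- Power users can still override the mode for a specific table
ALTER TABLE db.tbl SET TBLPROPERTIES ('write.distribution-mode' = 'none');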

Query engine

Spark

@RussellSpitzer (Member, Author) commented Jan 27, 2023

@aokolnychyi + @danielcweeks + @rdblue + @jackye1995 + @szehon-ho

Please ping anyone else who would have strong opinions about this change as well.

@dramaticlly (Contributor) commented:

Thank you @RussellSpitzer. I understand where this change is coming from, but some GDPR-like deletions on V1 tables benefit from the None write distribution mode (to avoid shuffle where possible). I am aware that we can currently configure this by setting table properties such as write.delete.distribution-mode or write.update.distribution-mode, but I am wondering whether there is any way to configure it at the per-Spark-job level (deletes are done via SQL only, which makes this harder).
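
As a concrete sketch of the per-table workaround mentioned above (the property names are existing Iceberg table properties; the table name db.tbl is hypothetical):

-- Keep deletes and updates shuffle-free on this table, independent of the global default
ALTER TABLE db.tbl SET TBLPROPERTIES (
  'write.delete.distribution-mode' = 'none',
  'write.update.distribution-mode' = 'none'
);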

@RussellSpitzer (Member, Author) commented:

The "none" mode in GDPR cases still only helps in case in which the data has already been aligned with the partitioning of the table. This is rarely the case in my experience.

@jackye1995 (Contributor) commented:

+1 for using range as the default. Overall, we probably need a dedicated section in the Iceberg Spark documentation about how to configure these parameters so people can make informed decisions.

@singhpk234 (Contributor) commented:

+1 on changing the default from none and having a dedicated doc section for configuring these. Happy to contribute to this if possible.


Side note: I also see write.merge.distribution-mode and write.update.distribution-mode missing from the table properties section of the docs: https://iceberg.apache.org/docs/latest/configuration/

@dramaticlly (Contributor) commented Jan 28, 2023

> Side note: I also see write.merge.distribution-mode and write.update.distribution-mode missing from the table properties section of the docs: https://iceberg.apache.org/docs/latest/configuration/

Yeah @singhpk234, I noticed that before and made an attempt to fix it in #5280, but I need some help with the merge case to provide a better narrative.

@rdblue (Contributor) commented Jan 28, 2023

+1 for range as the default.

@aokolnychyi (Contributor) commented:

I would be careful with range as it may cause performance regressions, especially for MERGE. The range distribution requires sampling, which leads to double scanning and re-evaluating particular nodes in the plan. This would cause the same issue we have today: a default that performs poorly.

The upcoming Spark 3.4 supports rebalancing partitions via AQE for hash distributions requested by v2 writes. That means we can request a hash distribution without worrying about having too much data per task and hitting OOMs. I'd rather switch to hash as the default and let users reconfigure if it fails. I don't know a single use case where the range distribution performs well in MERGE at any reasonable scale.
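
For reference, a sketch of what requesting a hash distribution for MERGE looks like today via table properties (write.merge.distribution-mode is an existing Iceberg table property; the table name db.tbl is hypothetical):

-- Request hash distribution for MERGE writes on this table;
-- with AQE, Spark can rebalance the requested distribution to avoid oversized tasks
ALTER TABLE db.tbl SET TBLPROPERTIES ('write.merge.distribution-mode' = 'hash');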

@aokolnychyi (Contributor) commented:

We have examples in TestSparkDistributionAndOrderingUtil that should become a section in the docs.

@RussellSpitzer (Member, Author) commented:

@dramaticlly Did you want to write up another issue for specifying the write distribution mode as a Spark SqlConf option?

@JunchengMa commented:

+1 on @dramaticlly's comment. Changing the write distribution mode affects Spark job performance (it causes heavy shuffle) when using Spark SQL such as

DELETE FROM db_name.tbl_name WHERE date < '20220801'

or

UPDATE db_name.tbl_name SET col_a = NULL WHERE date <= '20220801'

Setting write.delete.distribution-mode=none and write.update.distribution-mode=none as table properties would reduce shuffle, but could affect other normal jobs writing to the same table.
So having a per-job option for specifying the write distribution mode would be ideal.

@aokolnychyi (Contributor) commented:

I will submit a PR to change the default distribution modes for insert and merge. I'll also be happy to review a PR for #6741.
