[MINOR][SQL] Update the DataFrameWriter.bucketBy comment

### What changes were proposed in this pull request?

This PR updates the `DataFrameWriter.bucketBy` comment to state clearly that the bucketing scheme is Spark-specific. I saw user questions about the current bucketing compatibility with Hive in [SPARK-31162](https://issues.apache.org/jira/browse/SPARK-31162?focusedCommentId=17060408&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17060408) and [SPARK-17495](https://issues.apache.org/jira/browse/SPARK-17495?focusedCommentId=17059847&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17059847), and IMHO the current comment is confusing to users about that compatibility.

### Why are the changes needed?

To help users understand the bucketing behavior correctly.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A

Closes #27930 from maropu/UpdateBucketByComment.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
apache · Mar 17, 2020 · 124b4ce
1 parent 30d9535
Showing 1 changed file with 2 additions and 1 deletion.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -198,7 +198,8 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
 
   /**
    * Buckets the output by the given columns. If specified, the output is laid out on the file
-   * system similar to Hive's bucketing scheme.
+   * system similar to Hive's bucketing scheme, but with a different bucket hash function
+   * and is not compatible with Hive's bucketing.
    *
    * This is applicable for all file-based data sources (e.g. Parquet, JSON) starting with Spark
    * 2.1.0.