[SPARK-34760][EXAMPLES] Replace favorite_color with age in JavaSQLDataSourceExample #31851

Closed
zengruios wants to merge 2 commits

zengruios
Contributor

What changes were proposed in this pull request?

In JavaSQLDataSourceExample, executing 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");'
throws 'Exception in thread "main" org.apache.spark.sql.AnalysisException: partition column favorite_color is not defined in table people_partitioned_bucketed, defined table columns are: age, name;'
This PR changes the partition column from favorite_color to age.

Why are the changes needed?

So that the example runs successfully.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

Tested by running the example.

@maropu
Member

maropu commented Mar 16, 2021

ok to test

@maropu
Member

maropu commented Mar 16, 2021

Looks fine.

@maropu changed the title from "[BugFix]fix the bug in issue SPARK-34760." to "[MINOR] Correct an example error in JavaSQLDataSourceExample" on Mar 16, 2021
@AmplabJenkins

Can one of the admins verify this patch?

@dongjoon-hyun changed the title from "[MINOR] Correct an example error in JavaSQLDataSourceExample" to "[MINOR][EXAMPLES] Replace favorite_color with age in JavaSQLDataSourceExample" on Mar 16, 2021
Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for the fix, @zengruios.

However, this PR introduces an inconsistency with the Scala/Python examples.

Please see the Python example, which matches the Scala one:

    # $example on:write_partition_and_bucket$
    df = spark.read.parquet("examples/src/main/resources/users.parquet")
    (df
        .write
        .partitionBy("favorite_color")
        .bucketBy(42, "name")
        .saveAsTable("people_partitioned_bucketed"))
    # $example off:write_partition_and_bucket$

I guess we need to replace peopleDF with usersDF instead at line 207.
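
To illustrate why either change avoids the exception, here is a minimal sketch in plain Python. It is not Spark's implementation; it only mimics the shape of the check behind the AnalysisException quoted in the PR description. The function names, the `users_partitioned_bucketed` table name, and the column list for usersDF are assumptions made for illustration; the columns of peopleDF are taken from the error message.

```python
# Illustrative sketch only: NOT Spark's code, just the shape of the check
# that produces the AnalysisException quoted in the PR description.


def check_partition_columns(table_name, table_columns, partition_columns):
    """Raise ValueError if a partition column is absent from the table's columns."""
    for col in partition_columns:
        if col not in table_columns:
            raise ValueError(
                "partition column {} is not defined in table {}, "
                "defined table columns are: {}".format(
                    col, table_name, ", ".join(table_columns)
                )
            )
    return True


# Columns of peopleDF, as listed in the error message.
PEOPLE_COLUMNS = ["age", "name"]
# Assumed columns of usersDF (users.parquet); not stated in this thread.
USERS_COLUMNS = ["name", "favorite_color", "favorite_numbers"]


def try_write(table_name, table_columns, partition_column):
    """Return "ok" if the partition column is valid, else the error text."""
    try:
        check_partition_columns(table_name, table_columns, [partition_column])
        return "ok"
    except ValueError as err:
        return str(err)


if __name__ == "__main__":
    # Original Java example: peopleDF partitioned by a column it does not have.
    print(try_write("people_partitioned_bucketed", PEOPLE_COLUMNS, "favorite_color"))
    # Option 1 (this PR's first version): partition peopleDF by age.
    print(try_write("people_partitioned_bucketed", PEOPLE_COLUMNS, "age"))
    # Option 2 (suggested above): write usersDF, which has favorite_color.
    print(try_write("users_partitioned_bucketed", USERS_COLUMNS, "favorite_color"))
```

Both options pass the check. The second keeps the Java example aligned with the Scala/Python examples, which partition by favorite_color.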

@zengruios
Contributor Author

zengruios commented Mar 16, 2021

@dongjoon-hyun, thanks for your suggestion. Maybe the table name should also be changed to user_partitioned_bucketed. I will try to fix it that way.

@maropu changed the title from "[MINOR][EXAMPLES] Replace favorite_color with age in JavaSQLDataSourceExample" to "[SPARK-34760][EXAMPLES][MINOR] Replace favorite_color with age in JavaSQLDataSourceExample" on Mar 16, 2021
@maropu
Member

maropu commented Mar 16, 2021

Could you merge the #31852 fix into this PR? These issues are similar and minor, so merging them looks okay.

@zengruios
Contributor Author

@maropu, OK, I will merge them.

@maropu
Member

maropu commented Mar 16, 2021

Thank you, @zengruios

@zengruios
Contributor Author

@maropu, @dongjoon-hyun, I have updated it. Could you review it again? Thanks!

@HyukjinKwon
Member

@yaooqinn, could you try merging this as a brand new committer? :-)

@yaooqinn
Member

Thanks, @HyukjinKwon. I will merge this to master only. Is that okay?

@HyukjinKwon
Member

Improvements are generally not backported, but according to the JIRA this looks like a bug fix in the example, and those are usually backported. The JIRA lists 3.0.1 and 3.1.1 as the affected versions, so I would also merge this to branch-3.1 and branch-3.0.

@yaooqinn
Member

Yea, LGTM~

@yaooqinn changed the title from "[SPARK-34760][EXAMPLES][MINOR] Replace favorite_color with age in JavaSQLDataSourceExample" to "[SPARK-34760][EXAMPLES] Replace favorite_color with age in JavaSQLDataSourceExample" on Mar 18, 2021
@yaooqinn closed this in 5570f81 on Mar 18, 2021
@yaooqinn
Member

My network connection is poor at the moment. It took ages to fetch and push this PR to master :( Now it's struggling with branch-3.1.

yaooqinn pushed a commit that referenced this pull request Mar 18, 2021
…LDataSourceExample

### What changes were proposed in this pull request?
In JavaSQLDataSourceExample, executing 'peopleDF.write().partitionBy("favorite_color").bucketBy(42,"name").saveAsTable("people_partitioned_bucketed");'
throws 'Exception in thread "main" org.apache.spark.sql.AnalysisException: partition column favorite_color is not defined in table people_partitioned_bucketed, defined table columns are: age, name;'
This PR changes the partition column from favorite_color to age.

### Why are the changes needed?
So that the example runs successfully.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Tested by running the example.

Closes #31851 from zengruios/SPARK-34760.

Authored-by: zengruios <578395184@qq.com>
Signed-off-by: Kent Yao <yao@apache.org>
(cherry picked from commit 5570f81)
Signed-off-by: Kent Yao <yao@apache.org>
@yaooqinn
Member

@zengruios Thanks for your first contribution to Apache Spark.

I have added you as a contributor at the JIRA side, and SPARK-34760 has been assigned to you.

Thanks for the review, @dongjoon-hyun @maropu @HyukjinKwon

Merged to master/3.1/3.0.

@HyukjinKwon
Member

👏

@dongjoon-hyun
Member

Congratulations, @zengruios and @yaooqinn.

@maropu
Member

maropu commented Mar 18, 2021

late lgtm 👏

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021