Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command #24903

Closed
wants to merge 12 commits into from

Conversation

@sujith71955
Copy link
Contributor

commented Jun 18, 2019

What changes were proposed in this pull request?

LOAD DATA command resolves the partition column name as case sensitive manner,
where as in insert commandthe partition column name will be resolved using
the SQLConf resolver where the names will be resolved based on spark.sql.caseSensitive property. Same logic can be applied for resolving the partition column names in LOAD COMMAND.

Why are the changes needed?

It's to handle the partition column name correctly according to the configuration.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UT and manual testing.

@sujith71955 sujith71955 changed the title [SPARK-28084][SPARK-SQL]Resolving the partition column name based on the resolver which follows spark.sql.caseSensitive property configuration value. [SPARK-28084][SPARK-SQL]Resolving the partition column name based on the resolver in sql load command Jun 18, 2019
@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Jun 18, 2019

@dongjoon-hyun Please review and let me know for any clarifications.

@SparkQA

This comment has been minimized.

Copy link

commented Jun 18, 2019

Test build #106634 has finished for PR 24903 at commit b9962eb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28084][SPARK-SQL]Resolving the partition column name based on the resolver in sql load command [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command Jun 18, 2019
Copy link
Member

left a comment

Could you add a test case for LOAD DATA?

@dongjoon-hyun dongjoon-hyun added the SQL label Jun 18, 2019
@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Jun 19, 2019

@dongjoon-hyun sure, will update the changes based on the comments

@HyukjinKwon

This comment has been minimized.

Copy link
Member

commented Jun 20, 2019

Can we add a simple test?

@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Jun 21, 2019

sorry folks, stuck up with some personal work, will update the PR soon. thanks for all the suggestions.

@dongjoon-hyun

This comment has been minimized.

Copy link
Member

commented Jul 7, 2019

Gentle ping, @sujith71955 .

@sujith71955 sujith71955 force-pushed the sujith71955:master_paritionColName branch to b316b7b Jul 8, 2019
@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

@ALL : Handled the comments, please review and let me know for any clarifications/suggestions. thanks

@SparkQA

This comment has been minimized.

Copy link

commented Jul 8, 2019

Test build #107340 has finished for PR 24903 at commit b316b7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

This comment has been minimized.

Copy link

commented Jul 8, 2019

Test build #107339 has finished for PR 24903 at commit cde7b2e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Jul 15, 2019

@dongjoon-hyun Gentle ping, Let me know for any inputs. thanks all for the valuable inputs.

@dongjoon-hyun

This comment has been minimized.

Copy link
Member

commented Jul 16, 2019

Retest this please.

Copy link
Member

left a comment

I left a few comments, @sujith71955 . The UT is the most important comment. Could you update your PR, please?

@SparkQA

This comment has been minimized.

Copy link

commented Jul 16, 2019

Test build #107718 has finished for PR 24903 at commit b316b7b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@maropu

This comment has been minimized.

Copy link
Member

commented Jul 16, 2019

retest this please

@sujith71955 sujith71955 force-pushed the sujith71955:master_paritionColName branch from f941efd to 386343a Jul 16, 2019
@SparkQA

This comment has been minimized.

Copy link

commented Jul 16, 2019

Test build #107741 has finished for PR 24903 at commit b316b7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Jul 16, 2019

Can we add a simple test?

done

…the resolver which follows spark.sql.caseSensitive property configuration value.

What changes were proposed in this pull request?
LOAD DATA command resolves the partition column name as case sensitive manner, where as in insert command
the partition column name will be resolved using the SQLConf resolver where the names will be  resolved  based on
spark.sql.caseSensitive property. same logic can be applied for resolving the partition column names in LOAD COMMAND.

How was this patch tested?
Existing UT and manual testing.
@SparkQA

This comment has been minimized.

Copy link

commented Aug 27, 2019

Test build #109796 has finished for PR 24903 at commit 8c2a053.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Aug 27, 2019

@dongjoon-hyun @maropu @HyukjinKwon @dilipbiswal Updated the logic a bit, currently i am normalizing the paritionspec using an existing API PartitioningUtils.normalizePartitionSpec().
Please let me know for any suggestions.

@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Aug 27, 2019

@maropu
Moved the testcase to SQLQuerySuite.scala as DDLParserSuite basically does the verification of parsing and plan resolution, nowhere i can find a testcase with execution flow, In my case the resolution happens while Load command execution. Hope its fine, please suggest. Thanks

@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Aug 29, 2019

3 similar comments
@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Sep 4, 2019

@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Sep 9, 2019

@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Sep 11, 2019

@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Sep 13, 2019

@ALL please guide me if any gap is still present in this PR, i am ready to work.Thanks for all your review comments and valuable time.

@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Sep 17, 2019

gentle ping @dongjoon-hyun @maropu

@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Sep 18, 2019

@HyukjinKwon

This comment has been minimized.

Copy link
Member

commented Sep 29, 2019

retest this please

Copy link
Member

left a comment

Seems right to me. cc @viirya who wrote the codes too

@SparkQA

This comment has been minimized.

Copy link

commented Sep 29, 2019

Test build #111566 has finished for PR 24903 at commit 8c2a053.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@viirya
viirya approved these changes Sep 29, 2019
Copy link
Member

left a comment

Thanks @HyukjinKwon pinging me. Not aware of this PR. There were good reviews from other reviewers. Just few comments about style. Otherwise looks good.

@sujith71955 sujith71955 force-pushed the sujith71955:master_paritionColName branch from 81e07f7 to 7eb3e3c Oct 1, 2019
@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Oct 1, 2019

@viirya Fixed the comments. thanks for the suggestions.

@SparkQA

This comment has been minimized.

Copy link

commented Oct 1, 2019

Test build #111653 has finished for PR 24903 at commit 81e07f7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

This comment has been minimized.

Copy link

commented Oct 1, 2019

Test build #111654 has finished for PR 24903 at commit 7eb3e3c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
…the resolver which follows spark.sql.caseSensitive property configuration value.

What changes were proposed in this pull request?
LOAD DATA command resolves the partition column name as case sensitive manner, where as in insert command
the partition column name will be resolved using the SQLConf resolver where the names will be  resolved  based on
spark.sql.caseSensitive property. same logic can be applied for resolving the partition column names in LOAD COMMAND.

How was this patch tested?
Existing UT and manual testing.
@sujith71955 sujith71955 force-pushed the sujith71955:master_paritionColName branch from 7eb3e3c to ca4b85c Oct 2, 2019
@SparkQA

This comment has been minimized.

Copy link

commented Oct 2, 2019

Test build #111671 has finished for PR 24903 at commit ca4b85c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@viirya

This comment has been minimized.

Copy link
Member

commented Oct 2, 2019

retest this please.

@SparkQA

This comment has been minimized.

Copy link

commented Oct 2, 2019

Test build #111677 has finished for PR 24903 at commit ca4b85c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@viirya

This comment has been minimized.

Copy link
Member

commented Oct 2, 2019

Looks good to me now.

@maropu

This comment has been minimized.

Copy link
Member

commented Oct 2, 2019

The code itself looks good to me, too. Could you use the new PR template for better commit logs? https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
cc: @HyukjinKwon @dongjoon-hyun

Copy link
Member

left a comment

+1, LGTM.
Thank you so much, @sujith71955 , @maropu , @HyukjinKwon @viirya .
Sorry for being late. I updated the PR description, too.
Merged to master.

@sujith71955

This comment has been minimized.

Copy link
Contributor Author

commented Oct 3, 2019

+1, LGTM.
Thank you so much, @sujith71955 , @maropu , @HyukjinKwon @viirya .
Sorry for being late. I updated the PR description, too.
Merged to master.

Thanks all for your valuable time and suggestions .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.