[SPARK-26667][DOC] Add Scanning Input Table to Performance Tuning Guide #23593

Closed
Deegue wants to merge 4 commits into master from unknown repository

Conversation

@Deegue
Contributor

@Deegue Deegue commented Jan 19, 2019

What changes were proposed in this pull request?

We can use CombineTextInputFormat instead of TextInputFormat, and set its configurations, to speed up reading tables.

This needs no new Spark configurations, so the change adds it to the Performance Tuning guide instead.

This part of the document will look like this:
[image: screenshot of the new documentation section]

Linked to #23506

How was this patch tested?

Manually tested

@AmplabJenkins

Can one of the admins verify this patch?

Member

This at least needs some proofreading, like hadoop -> Hadoop, Max -> max, we scanning -> scanning.
This seems fairly niche and not so SQL related.

Contributor Author

Proofread. I added it to SQL Tuning because it can be set before executing a SQL query. I'd appreciate it if you could give more suggestions.

@dongjoon-hyun
Member

Could you attach the screenshot of newly added documentation part?

@srowen
Member

srowen commented Jan 19, 2019

I don't think a screenshot helps? It's just text, not a UI change.

@Deegue
Contributor Author

Deegue commented Jan 20, 2019

> Could you attach the screenshot of newly added documentation part?

Added a screenshot.

Member

I think this only applies when we read Hive tables from Spark. BTW, if so, is it something we should document on the Spark side?

Contributor Author

Done. I think we can optimize the input format on the Spark side (#23506). Maybe we can document these configurations for those who want to tune Spark SQL?

@dongjoon-hyun changed the title from "[SPARK-26667][DOC]Add Scanning Input Table to Performance Tuning Guide" to "[SPARK-26667][DOC] Add Scanning Input Table to Performance Tuning Guide" on Jul 7, 2019
@dongjoon-hyun
Member

Hi, @Deegue . Are you still working on this PR?

@Deegue
Contributor Author

Deegue commented Jul 8, 2019

> Hi, @Deegue . Are you still working on this PR?

Yes, I'm waiting for more reviews. Or maybe we could merge this one.

Member

@srowen srowen left a comment

I don't disagree with the general comments here, but I'm not sure it adds enough concise and precise information to be worthwhile. It would need to be significantly rewritten. It's also fairly specific to HDFS, right?

@Deegue
Contributor Author

Deegue commented Jul 8, 2019

> I don't disagree with the general comments here, but I'm not sure it adds enough concise and precise information to be worthwhile. It would need to be significantly rewritten. It's also fairly specific to HDFS, right?

Yes, you are right. I will rewrite the docs and make it concise and precise.

@Deegue
Contributor Author

Deegue commented Jul 19, 2019

I rewrote the doc and updated the screenshot. Could you please review again? @srowen @dongjoon-hyun
I think it's better than before.

@HyukjinKwon
Member

I am with @srowen. I feel this would be better placed in the HDFS documentation, and users should read that.

@srowen closed this on Aug 7, 2019