[SPARK-26667][DOC] Add `Scanning Input Table` to Performance Tuning Guide by Deegue · Pull Request #23593 · apache/spark

Deegue · 2019-01-19T10:13:42Z

What changes were proposed in this pull request?

We can use CombineTextInputFormat instead of TextInputFormat and set configurations to increase the speed when reading tables.
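As a rough illustration of why combining helps with many small files, here is a minimal sketch. It assumes a simplified greedy packing and is not Hadoop's actual algorithm, which also accounts for block size and node/rack locality. The function names and the 128 MiB limit are hypothetical, chosen only for this example.

```python
from typing import List

MIB = 1024 * 1024


def per_file_splits(file_sizes: List[int], max_split: int) -> int:
    """Count splits when each file is split on its own.

    Simplified model of a per-file input format: a split never spans
    two files, so every non-empty file costs at least one split.
    """
    # -(-a // b) is integer ceiling division.
    return sum(-(-size // max_split) for size in file_sizes if size > 0)


def combined_splits(file_sizes: List[int], max_split: int) -> int:
    """Count splits when small files may be packed into one split.

    Simplified model of a combining input format: bytes are accumulated
    across file boundaries and a split is closed once it is full.
    """
    splits = 0
    current = 0
    for size in file_sizes:
        current += size
        # Close out every full split accumulated so far.
        while current >= max_split:
            splits += 1
            current -= max_split
    # Any leftover bytes form one final, partially filled split.
    if current > 0:
        splits += 1
    return splits


# Hypothetical table: 1000 files of 1 MiB each, 128 MiB maximum split size.
sizes = [MIB] * 1000
print(per_file_splits(sizes, 128 * MIB), combined_splits(sizes, 128 * MIB))  # 1000 8
```

Under these assumptions the number of splits, and therefore of read tasks, drops from one per file to roughly the total input size divided by the maximum split size.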

No new Spark configurations are needed, so this is added to the Performance Tuning guide.

This part of the document will look like this:

Linked to #23506

How was this patch tested?

Manually tested

AmplabJenkins · 2019-01-19T10:19:38Z

Can one of the admins verify this patch?

srowen · 2019-01-19T15:52:35Z

docs/sql-performance-tuning.md

This at least needs some proofreading, like hadoop -> Hadoop, Max -> max, we scanning -> scanning.
This seems fairly niche and not so SQL related.

Deegue · 2019-01-19

Proofread. I added it to the SQL tuning guide because it can be set before executing a SQL query. I'd appreciate any further suggestions.

dongjoon-hyun · 2019-01-19T20:16:14Z

Could you attach a screenshot of the newly added documentation?

srowen · 2019-01-19T20:19:25Z

I don't think a screenshot helps? It's just text, not a UI change.

Deegue · 2019-01-20T02:57:33Z

> Could you attach a screenshot of the newly added documentation?

Added a screenshot.

HyukjinKwon · 2019-01-20T11:03:33Z

docs/sql-performance-tuning.md

I think this only applies when reading a Hive table from Spark. BTW, if so, is it something we should document on the Spark side?

Deegue · 2019-01-20

Done. I think we can optimize the input format on the Spark side (#23506). Maybe we can document these configurations for those who want to tune Spark SQL?

dongjoon-hyun · 2019-07-07T23:54:52Z

Hi, @Deegue . Are you still working on this PR?

Deegue · 2019-07-08T02:32:57Z

> Hi, @Deegue . Are you still working on this PR?

Yes, I'm waiting for more reviews, or maybe we could merge this one.

srowen

I don't disagree with the general comments here, but I'm not sure it adds enough concise and precise information to be worthwhile. It would need to be significantly rewritten. It's also fairly specific to HDFS, right?

Deegue · 2019-07-08T02:41:31Z

> I don't disagree with the general comments here, but I'm not sure it adds enough concise and precise information to be worthwhile. It would need to be significantly rewritten. It's also fairly specific to HDFS, right?

Yes, you are right. I will rewrite the docs and make it concise and precise.

Deegue · 2019-07-19T02:18:42Z

Rewrote the doc and updated the screenshot. Could you please review again? @srowen @dongjoon-hyun
I think it's better than before.

HyukjinKwon · 2019-07-25T13:23:42Z

I am with @srowen. I feel this belongs in the HDFS documentation, and users should read that.

Edit the performance doc

c676b3c

srowen reviewed Jan 19, 2019

Proofread

8ace451

HyukjinKwon reviewed Jan 20, 2019

Table -> Hive Table

b9d5c3a

dongjoon-hyun added the DOCUMENTATION label Jun 14, 2019

dongjoon-hyun changed the title from "[SPARK-26667][DOC]Add Scanning Input Table to Performance Tuning Guide" to "[SPARK-26667][DOC] Add Scanning Input Table to Performance Tuning Guide" Jul 7, 2019

srowen reviewed Jul 8, 2019

edit

93fd228

srowen closed this Aug 7, 2019
