Skip to content

[SPARK-28639][CORE][DOC] Configuration doc for Barrier Execution Mode#25370

Closed
sarutak wants to merge 4 commits intoapache:masterfrom
sarutak:barrier-exec-mode-conf-doc
Closed

[SPARK-28639][CORE][DOC] Configuration doc for Barrier Execution Mode#25370
sarutak wants to merge 4 commits intoapache:masterfrom
sarutak:barrier-exec-mode-conf-doc

Conversation

@sarutak
Copy link
Member

@sarutak sarutak commented Aug 6, 2019

What changes were proposed in this pull request?

SPARK-24817 and SPARK-24819 introduced new 3 non-internal properties for barrier-execution mode but they are not documented.
So I've added a section into configuration.md for barrier-mode execution.

How was this patch tested?

Built using jekyll and confirm the layout by browser.

</table>

### Barrier Mode Execution

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this section below the scheduling section because this is related to scheduling.

@sarutak sarutak changed the title [SPARK-28639][CORE] Configuration doc for Barrier Mode Execution [SPARK-28639][CORE] Configuration doc for Barrier Execution Mode Aug 6, 2019
@sarutak
Copy link
Member Author

sarutak commented Aug 6, 2019

CC: @jiangxb1987

@sarutak sarutak changed the title [SPARK-28639][CORE] Configuration doc for Barrier Execution Mode [SPARK-28639][CORE][DOC] Configuration doc for Barrier Execution Mode Aug 6, 2019
@SparkQA
Copy link

SparkQA commented Aug 6, 2019

Test build #108730 has finished for PR 25370 at commit b813d65.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably good, but are you sure they're meant to be public / documented?

@sarutak
Copy link
Member Author

sarutak commented Aug 7, 2019

I think so because private properties are marked as internal like

  private[spark] val DRIVER_RESOURCES_FILE =
    ConfigBuilder("spark.driver.resourcesFile")
      .internal()
      .doc("Path to a file containing the resources allocated to the driver. " +
        "The file should be formatted as a JSON array of ResourceAllocation objects. " +
        "Only used internally in standalone mode.")
      .stringConf
      .createOptional

but those properties are not marked.

In particular, spark.scheduler.barrier.maxConcurrentTasksCheck.maxFailures should be public for users who want jobs fail fast.

But I'd like to confirm to @jiangxb1987 who introduced those properties, whether he intended they are public, just in case.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jiangxb1987
Copy link
Contributor

Thanks for adding the document, I agree we shall allow end user to customize these config values, though I won't encourage changing the default value.

### Barrier Execution Mode

<table class="table">
<tr><th>Property Name></th><th>Default</th><th>Meaning</th></tr>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we remove the extra > at Property Name>?
If you don't mind, please generate the doc and post the screenshot into the PR description.
That is helpful to spot this kind of typo-related bugs and the broken tables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I've attached below.

<td><code>spark.barrier.sync.timeout</code></td>
<td>365d</td>
<td>
The timeout in seconds for each barrier() call from a barrier task. If the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<code>barrier()<code>?

The timeout in seconds for each barrier() call from a barrier task. If the
coordinator didn't receive all the sync messages from barrier tasks within the
configed time, throw a SparkException to fail all the tasks. The default value is set
to 31536000(3600 * 24 * 365) so the barrier() call shall wait for one year.
Copy link
Member

@dongjoon-hyun dongjoon-hyun Aug 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<code>barrier()<code>.

@sarutak
Copy link
Member Author

sarutak commented Aug 9, 2019

Screenshot 2019-08-09 14 19 07

@SparkQA
Copy link

SparkQA commented Aug 9, 2019

Test build #108865 has finished for PR 25370 at commit c987752.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

<td>
The timeout in seconds for each <code>barrier()</code> call from a barrier task. If the
coordinator didn't receive all the sync messages from barrier tasks within the
configed time, throw a SparkException to fail all the tasks. The default value is set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: configed -> configured

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Actually, this description is from here. So it might be better to modify it too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. This is because other sections use configured.

@SparkQA
Copy link

SparkQA commented Aug 9, 2019

Test build #108883 has finished for PR 25370 at commit 15d6909.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Aug 11, 2019

Merged to master

@srowen srowen closed this in 31ef268 Aug 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants