
(CDAP-17287) Collects Spark execution event logs #12782

Merged
merged 1 commit into develop from feature/CDAP-17287-collect-spark-history on Sep 24, 2020

Conversation

@chtyim (Contributor) commented Sep 23, 2020

This PR has three commits:

  1. Make the runtime system services shut down after the SparkExecutionContext has stopped. Currently they stop concurrently, which was not a problem before this PR since no important work was done at context stop. However, this PR introduces uploading the event logs at context stop, which needs to complete before the system services shut down and the process terminates.
  2. A refactoring of the SparkRuntimeEnv class to simplify the logic.
  3. Introduce Spark event logging and upload of the logs through the runtime service. Based on the cConf settings, event log collection is turned on or off regardless of the Spark job's spark.eventLog.* configurations.
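The cConf-driven override in item 3 can be sketched roughly as follows. This is an illustrative sketch, not the actual CDAP code: the class name, method, and the cConf flag are hypothetical, though `spark.eventLog.enabled` and `spark.eventLog.dir` are real Spark configuration properties.

```java
import java.util.HashMap;
import java.util.Map;

public class SparkEventLogConfig {

  /**
   * Applies the cConf decision on top of whatever spark.eventLog.*
   * settings the job itself supplied: the cConf flag always wins,
   * mirroring the behavior described in the PR.
   */
  public static Map<String, String> apply(boolean cConfEnabled,
                                          Map<String, String> jobConf,
                                          String eventLogDir) {
    Map<String, String> conf = new HashMap<>(jobConf);
    if (cConfEnabled) {
      // Force event logging on, regardless of what the job configured.
      conf.put("spark.eventLog.enabled", "true");
      conf.put("spark.eventLog.dir", eventLogDir);
    } else {
      // Force it off so no logs are produced or collected.
      conf.put("spark.eventLog.enabled", "false");
      conf.remove("spark.eventLog.dir");
    }
    return conf;
  }

  public static void main(String[] args) {
    Map<String, String> jobConf = new HashMap<>();
    jobConf.put("spark.eventLog.enabled", "false"); // the job tries to turn it off
    Map<String, String> effective = apply(true, jobConf, "/tmp/spark-events");
    System.out.println(effective.get("spark.eventLog.enabled")); // cConf wins: true
  }
}
```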

@dli357 (Contributor) left a comment:

Minor comments

throw new UnsupportedOperationException("Spark event logs collection is not enabled");
}

if (!"application/octet-stream".equals(request.headers().get(HttpHeaderNames.CONTENT_TYPE))) {
Review comment (Contributor):

Maybe we can use javax.ws.rs.core.MediaType.APPLICATION_OCTET_STREAM?
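The suggestion is to replace the string literal with the JAX-RS constant `javax.ws.rs.core.MediaType.APPLICATION_OCTET_STREAM`, whose value is exactly `"application/octet-stream"`. A minimal sketch of the check, with the constant defined locally so the snippet compiles without the JAX-RS dependency:

```java
public class ContentTypeCheck {

  // Same value as javax.ws.rs.core.MediaType.APPLICATION_OCTET_STREAM;
  // defined locally here only so the sketch has no JAX-RS jar dependency.
  static final String APPLICATION_OCTET_STREAM = "application/octet-stream";

  /** Returns true if the given Content-Type header value is an octet stream. */
  public static boolean isOctetStream(String contentType) {
    // Constant-first equals() also guards against a missing (null) header.
    return APPLICATION_OCTET_STREAM.equals(contentType);
  }

  public static void main(String[] args) {
    System.out.println(isOctetStream("application/octet-stream")); // true
    System.out.println(isOctetStream("text/plain"));               // false
    System.out.println(isOctetStream(null));                       // false, no NPE
  }
}
```

Using the shared constant avoids typos in the literal and keeps client and server agreeing on the exact media type string.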

HttpURLConnection urlConn = remoteClient.openConnection(HttpMethod.POST, path);
try {
urlConn.setChunkedStreamingMode(CHUNK_SIZE);
urlConn.setRequestProperty(HttpHeaders.CONTENT_TYPE, "application/octet-stream");
Review comment (Contributor):

Maybe we can use javax.ws.rs.core.MediaType.APPLICATION_OCTET_STREAM?
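The upload pattern in the snippet above (a chunked-streaming POST so the event log is never fully buffered in memory) can be sketched with only the JDK. The `/v1/spark/eventlogs` path, `CHUNK_SIZE` value, and the in-JVM receiver standing in for the runtime service are all illustrative assumptions, not the actual CDAP endpoint.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.concurrent.atomic.AtomicLong;

public class ChunkedUpload {

  static final int CHUNK_SIZE = 8192; // illustrative; the real value may differ

  /** POSTs the given bytes with chunked transfer encoding; returns the status code. */
  public static int upload(String url, byte[] body) throws Exception {
    HttpURLConnection urlConn = (HttpURLConnection) URI.create(url).toURL().openConnection();
    urlConn.setRequestMethod("POST");
    urlConn.setDoOutput(true);
    // Chunked streaming avoids buffering the whole event log in memory.
    urlConn.setChunkedStreamingMode(CHUNK_SIZE);
    urlConn.setRequestProperty("Content-Type", "application/octet-stream");
    try (OutputStream os = urlConn.getOutputStream()) {
      os.write(body);
    }
    int code = urlConn.getResponseCode();
    urlConn.disconnect();
    return code;
  }

  /** Starts a tiny local receiver, uploads a payload, returns {status, bytesReceived}. */
  public static long[] runDemo() throws Exception {
    AtomicLong received = new AtomicLong();
    HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
    server.createContext("/v1/spark/eventlogs", exchange -> {
      try (InputStream in = exchange.getRequestBody()) {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
          total += n;
        }
        received.set(total);
      }
      exchange.sendResponseHeaders(200, -1); // 200 with empty body
      exchange.close();
    });
    server.start();
    try {
      int port = server.getAddress().getPort();
      byte[] payload = new byte[CHUNK_SIZE * 3 + 17]; // spans several chunks
      int code = upload("http://localhost:" + port + "/v1/spark/eventlogs", payload);
      return new long[] {code, received.get()};
    } finally {
      server.stop(0);
    }
  }

  public static void main(String[] args) throws Exception {
    long[] result = runDemo();
    System.out.println(result[0] + " " + result[1]);
  }
}
```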

@chtyim (Contributor, Author) replied:

fixed

@dli357 (Contributor) left a comment:

LGTM after minor fix

@chtyim force-pushed the feature/CDAP-17287-collect-spark-history branch 2 times, most recently from 0277c88 to 83144b4, on September 23, 2020 22:34
- Introduce Spark event logging and upload of the logs through the runtime service.
  Based on the cConf settings, event log collection is turned on or off regardless
  of the Spark job's spark.eventLog.* configurations.
- Refactor the SparkRuntimeEnv class to simplify the logic by using CompletableFuture.
- Make the runtime system services shut down after the SparkExecutionContext has
  stopped. Currently they stop concurrently, which was not a problem before this PR
  since no important work was done at context stop. However, this PR introduces
  uploading the event logs at context stop, which needs to complete before the
  system services shut down and the process terminates.
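The shutdown-ordering point above can be sketched with CompletableFuture: the system services block on a future that the context completes only after the event-log upload is done. The class and method names here are illustrative, not the actual CDAP classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class ShutdownOrdering {

  // Completed once the (hypothetical) execution context has fully stopped,
  // including uploading the event logs.
  static final CompletableFuture<Void> contextStopped = new CompletableFuture<>();
  static final StringBuilder log = new StringBuilder();

  static void stopContext() {
    log.append("upload-event-logs;"); // must happen before services go down
    contextStopped.complete(null);
  }

  static void stopSystemServices() throws Exception {
    // Block until the context stop (and therefore the upload) has completed,
    // instead of stopping concurrently with it.
    contextStopped.get(10, TimeUnit.SECONDS);
    log.append("services-stopped;");
  }

  public static void main(String[] args) throws Exception {
    Thread services = new Thread(() -> {
      try {
        stopSystemServices();
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    });
    services.start();   // services begin shutting down...
    stopContext();      // ...but wait for the context to finish first
    services.join();
    System.out.println(log); // upload-event-logs;services-stopped;
  }
}
```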
@chtyim force-pushed the feature/CDAP-17287-collect-spark-history branch from 83144b4 to 63aa9a7, on September 23, 2020 23:20
@chtyim (Contributor, Author) commented Sep 24, 2020

@chtyim chtyim merged commit 3e8fd78 into develop Sep 24, 2020
@chtyim chtyim deleted the feature/CDAP-17287-collect-spark-history branch September 24, 2020 03:27