
[SPARK-27464][CORE] Added Constant instead of referring string literal used from many places #24368

Closed
wants to merge 3 commits

Conversation

shivusondur (Contributor)

What changes were proposed in this pull request?

Added a constant instead of referring to the same string literal "spark.buffer.pageSize" from many places.

How was this patch tested?

Ran the corresponding unit test cases manually.

@HyukjinKwon (Member)

ok to test

@@ -1303,4 +1303,9 @@ package object config {
.doc("Staging directory used while submitting applications.")
.stringConf
.createOptional

private[spark] val BUFFER_PAGESIZE = ConfigBuilder("spark.buffer.pageSize")
.bytesConf(ByteUnit.BYTE)
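The idea behind the diff can be sketched as follows. `ConfigEntry` here is a hypothetical stand-in for the result of Spark's internal `ConfigBuilder` (which is private to the `config` package object), so this is an illustrative shape rather than the real API:

```scala
// A minimal sketch (ConfigEntry is a stand-in, not Spark's real internal
// ConfigBuilder API): the key literal lives in exactly one named constant,
// and every call site references the constant instead of the raw string.
final case class ConfigEntry(key: String)

object ConfigSketch {
  val BUFFER_PAGESIZE: ConfigEntry = ConfigEntry("spark.buffer.pageSize")
}
```

Centralizing the literal means a typo in the key can only happen in one place, and an IDE can find every usage of the setting.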
Member

can you add a doc?

Contributor Author

@HyukjinKwon
Thanks for your time.
I have now added the doc and corrected the scalastyle issue.

Member

I was going to say this needs a default, but it looks like the default in non-test code is not a single value, and different tests want different defaults.

@SparkQA

SparkQA commented Apr 14, 2019

Test build #104570 has finished for PR 24368 at commit bc815de.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


@@ -255,7 +255,7 @@ private[spark] abstract class MemoryManager(
}
val size = ByteArrayMethods.nextPowerOf2(maxTungstenMemory / cores / safetyFactor)
val default = math.min(maxPageSize, math.max(minPageSize, size))
conf.getSizeAsBytes("spark.buffer.pageSize", default)
conf.getSizeAsBytes(BUFFER_PAGESIZE.key, default)
Member

If this is a byteConf, can this just use conf.get()?

Contributor Author

@srowen
Updated the code to use conf.get.
Initially I thought the same, but the SparkConf class has no method that supplies a default value here. Now I check whether the value is present and, if it is not, initialize it with the default.
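The workaround described above can be sketched like this (a plain `Map` stands in for `SparkConf`, and the names are illustrative, not Spark's actual internals): read the entry by its constant key and fall back to the default that `MemoryManager` computes from the available memory and core count.

```scala
// Hypothetical sketch of "check the value, fall back to a computed default".
// The real code reads a SparkConf; a Map[String, Long] stands in for it here.
final case class ConfigEntry(key: String)

val BUFFER_PAGESIZE: ConfigEntry = ConfigEntry("spark.buffer.pageSize")

def pageSize(conf: Map[String, Long], computedDefault: Long): Long =
  conf.getOrElse(BUFFER_PAGESIZE.key, computedDefault) // user setting wins
```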

@SparkQA

SparkQA commented Apr 14, 2019

Test build #104571 has finished for PR 24368 at commit eba8417.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 14, 2019

Test build #104577 has finished for PR 24368 at commit 8cf7d81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivusondur shivusondur changed the title [MINOR][CORE] Added Constant instead of referring string literal used from many places [SPARK-27464][CORE] Added Constant instead of referring string literal used from many places Apr 15, 2019
@attilapiros (Contributor) left a comment

Small question; otherwise LGTM.

@srowen (Member)

srowen commented Apr 16, 2019

Merged to master

@srowen srowen closed this in 88d9de2 Apr 16, 2019
@shivusondur (Contributor, Author)

@srowen
Thanks for merging.

dongjoon-hyun pushed a commit that referenced this pull request May 5, 2023
### What changes were proposed in this pull request?

This PR fixes a regression introduced by #36885, which broke JsonProtocol's ability to handle missing fields in the exception field. Old event logs missing a `Stack Trace` will raise an NPE.
As a result, the SHS misinterprets failed jobs/SQLs as `Active/Incomplete`.

This PR solves the problem by checking the JsonNode for null. If it is null, an empty array of `StackTraceElement`s is used instead.
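The null-check described above can be sketched as follows. `Option` stands in for Jackson's nullable `JsonNode` and `TraceElem` is an illustrative type, so this shows the shape of the fix rather than the actual `JsonProtocol` code:

```scala
// Hypothetical sketch: a missing "Stack Trace" field yields an empty array
// instead of dereferencing a null node (the cause of the NPE).
final case class TraceElem(cls: String, method: String, file: String, line: Int)

def stackTraceFromJson(node: Option[Seq[TraceElem]]): Array[StackTraceElement] =
  node match {
    case Some(elems) =>
      elems.map(e => new StackTraceElement(e.cls, e.method, e.file, e.line)).toArray
    case None =>
      Array.empty[StackTraceElement] // missing field => empty trace, no NPE
  }
```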

### Why are the changes needed?

Fixes a case that prevents the history server from identifying failed jobs when the stack trace was not set.

Example eventlog

```
{
   "Event":"SparkListenerJobEnd",
   "Job ID":31,
   "Completion Time":1616171909785,
   "Job Result":{
      "Result":"JobFailed",
      "Exception":{
         "Message":"Job aborted"
      }
   }
}
```

**Original behavior:**

The job is marked as `incomplete`

Error from the SHS logs:

```
23/05/01 21:57:16 INFO FsHistoryProvider: Parsing file:/tmp/nds_q86_fail_test to re-build UI...
23/05/01 21:57:17 ERROR ReplayListenerBus: Exception parsing Spark event log: file:/tmp/nds_q86_fail_test
java.lang.NullPointerException
    at org.apache.spark.util.JsonProtocol$JsonNodeImplicits.extractElements(JsonProtocol.scala:1589)
    at org.apache.spark.util.JsonProtocol$.stackTraceFromJson(JsonProtocol.scala:1558)
    at org.apache.spark.util.JsonProtocol$.exceptionFromJson(JsonProtocol.scala:1569)
    at org.apache.spark.util.JsonProtocol$.jobResultFromJson(JsonProtocol.scala:1423)
    at org.apache.spark.util.JsonProtocol$.jobEndFromJson(JsonProtocol.scala:967)
    at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:878)
    at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:865)
....
23/05/01 21:57:17 ERROR ReplayListenerBus: Malformed line #24368: {"Event":"SparkListenerJobEnd","Job ID":31,"Completion Time":1616171909785,"Job Result":{"Result":"JobFailed","Exception":
{"Message":"Job aborted"}
}}
```

**After the fix:**

Job 31 is marked as `failedJob`

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added new unit test in JsonProtocolSuite.

Closes #41050 from amahussein/aspark-43340-b.

Authored-by: Ahmed Hussein <ahussein@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request May 5, 2023
LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request May 10, 2023
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
GladwinLee pushed a commit to lyft/spark that referenced this pull request Oct 10, 2023
catalinii pushed a commit to lyft/spark that referenced this pull request Oct 10, 2023