Skip to content

Conversation

itholic
Copy link
Contributor

@itholic itholic commented Feb 21, 2025

What changes were proposed in this pull request?

This PR proposes to use appropriate structure of JSON format for PySparkLogger by updating JSONFormatter.

Why are the changes needed?

To align the log structure with JVM side. Currently the "stacktrace" does not align.

Does this PR introduce any user-facing change?

The output JSON structure will include "stacktrace" in correct format:

Before

{
  "ts": "2025-02-21 15:40:52,043",
  "level": "ERROR",
  "logger": "TestLogger",
  "msg": "Exception occurred",
  "context": {"user": "test_user_stacktrace"},
  "exception": {
    "class": "ValueError",
    "msg": "Test Exception",
    "stacktrace": [
      "Traceback (most recent call last):", "  File \"/spark/python/pyspark/logger/tests/test_logger.py\", line 115, in test_log_exception_with_stacktrace", "    raise ValueError(\"Test Exception\")", "ValueError: Test Exception"
    ]
  }
}

After

{
  "ts": "2025-02-21 15:40:52,043",
  "level": "ERROR",
  "logger": "TestLogger",
  "msg": "Exception occurred",
  "context": {"user": "test_user_stacktrace"},
  "exception": {
    "class": "ValueError",
    "msg": "Test Exception",
    "stacktrace": [
      {
        "class": null,
        "method": "test_log_exception_with_stacktrace",
        "file": "/spark/python/pyspark/logger/tests/test_logger.py",
        "line": "115"
      }
    ]
  }
}

How was this patch tested?

Added related UT

Was this patch authored or co-authored using generative AI tooling?

No

@itholic itholic requested a review from ueshin February 21, 2025 07:00
Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, LGTM, pending tests.

@itholic itholic closed this in 67a337e Feb 24, 2025
itholic added a commit that referenced this pull request Feb 24, 2025
…ySparkLogger`

### What changes were proposed in this pull request?

This PR proposes to use appropriate structure of JSON format for `PySparkLogger` by updating `JSONFormatter`.

### Why are the changes needed?

To align the log structure with JVM side. Currently the "stacktrace" does not align.

### Does this PR introduce _any_ user-facing change?

The output JSON structure will include "stacktrace" in correct format:

**Before**
```json
{
  "ts": "2025-02-21 15:40:52,043",
  "level": "ERROR",
  "logger": "TestLogger",
  "msg": "Exception occurred",
  "context": {"user": "test_user_stacktrace"},
  "exception": {
    "class": "ValueError",
    "msg": "Test Exception",
    "stacktrace": [
      "Traceback (most recent call last):", "  File \"/spark/python/pyspark/logger/tests/test_logger.py\", line 115, in test_log_exception_with_stacktrace", "    raise ValueError(\"Test Exception\")", "ValueError: Test Exception"
    ]
  }
}
```

**After**
```json
{
  "ts": "2025-02-21 15:40:52,043",
  "level": "ERROR",
  "logger": "TestLogger",
  "msg": "Exception occurred",
  "context": {"user": "test_user_stacktrace"},
  "exception": {
    "class": "ValueError",
    "msg": "Test Exception",
    "stacktrace": [
      {
        "class": null,
        "method": "test_log_exception_with_stacktrace",
        "file": "/spark/python/pyspark/logger/tests/test_logger.py",
        "line": "115"
      }
    ]
  }
}
```

### How was this patch tested?

Added related UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #50038 from itholic/fix_json_format.

Lead-authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Haejoon Lee <haejoon.lee@databricks.com>
(cherry picked from commit 67a337e)
Signed-off-by: Haejoon Lee <haejoon.lee@databricks.com>
@itholic
Copy link
Contributor Author

itholic commented Feb 24, 2025

Merged to master and branch-4.0.
Thanks @ueshin @HyukjinKwon for the review!

HyukjinKwon pushed a commit that referenced this pull request Feb 28, 2025
### What changes were proposed in this pull request?

This PR followups #50038 to update documentation following the fixed JSON format.

### Why are the changes needed?

To match the docs with real logging format

### Does this PR introduce _any_ user-facing change?

No API changes, but the user-facing documentation updated.

### How was this patch tested?

CI

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50100 from itholic/json_format_followup.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Feb 28, 2025
### What changes were proposed in this pull request?

This PR followups #50038 to update documentation following the fixed JSON format.

### Why are the changes needed?

To match the docs with real logging format

### Does this PR introduce _any_ user-facing change?

No API changes, but the user-facing documentation updated.

### How was this patch tested?

CI

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50100 from itholic/json_format_followup.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit b05f18b)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Pajaraja pushed a commit to Pajaraja/spark that referenced this pull request Mar 6, 2025
…ySparkLogger`

### What changes were proposed in this pull request?

This PR proposes to use appropriate structure of JSON format for `PySparkLogger` by updating `JSONFormatter`.

### Why are the changes needed?

To align the log structure with JVM side. Currently the "stacktrace" does not align.

### Does this PR introduce _any_ user-facing change?

The output JSON structure will include "stacktrace" in correct format:

**Before**
```json
{
  "ts": "2025-02-21 15:40:52,043",
  "level": "ERROR",
  "logger": "TestLogger",
  "msg": "Exception occurred",
  "context": {"user": "test_user_stacktrace"},
  "exception": {
    "class": "ValueError",
    "msg": "Test Exception",
    "stacktrace": [
      "Traceback (most recent call last):", "  File \"/spark/python/pyspark/logger/tests/test_logger.py\", line 115, in test_log_exception_with_stacktrace", "    raise ValueError(\"Test Exception\")", "ValueError: Test Exception"
    ]
  }
}
```

**After**
```json
{
  "ts": "2025-02-21 15:40:52,043",
  "level": "ERROR",
  "logger": "TestLogger",
  "msg": "Exception occurred",
  "context": {"user": "test_user_stacktrace"},
  "exception": {
    "class": "ValueError",
    "msg": "Test Exception",
    "stacktrace": [
      {
        "class": null,
        "method": "test_log_exception_with_stacktrace",
        "file": "/spark/python/pyspark/logger/tests/test_logger.py",
        "line": "115"
      }
    ]
  }
}
```

### How was this patch tested?

Added related UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#50038 from itholic/fix_json_format.

Lead-authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Haejoon Lee <haejoon.lee@databricks.com>
Pajaraja pushed a commit to Pajaraja/spark that referenced this pull request Mar 6, 2025
### What changes were proposed in this pull request?

This PR followups apache#50038 to update documentation following the fixed JSON format.

### Why are the changes needed?

To match the docs with real logging format

### Does this PR introduce _any_ user-facing change?

No API changes, but the user-facing documentation updated.

### How was this patch tested?

CI

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50100 from itholic/json_format_followup.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants