[SPARK-45423][SQL] Lower ParquetWriteSupport log level to debug #43230

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-45423

@dongjoon-hyun (Member) commented Oct 5, 2023

What changes were proposed in this pull request?

This PR aims to lower the ParquetWriteSupport log level from INFO to DEBUG.

Why are the changes needed?

Currently, ParquetWriteSupport is too verbose at INFO level because it dumps the Parquet file schema once per written file. Since this is the only log statement in ParquetWriteSupport, users who want to debug jobs can still see it by enabling DEBUG for this logger in log4j2.properties.

23/10/05 16:29:43 INFO ParquetOutputFormat: ParquetRecordWriter [block size: 134217728b, row group padding size: 8388608b, validating: false]
23/10/05 16:29:43 INFO ParquetWriteSupport: Initialized Parquet WriteSupport with Catalyst schema:
{
 "type" : "struct",
 "fields" : [ {
   "name" : "id",
   "type" : "long",
   "nullable" : false,
   "metadata" : { }
 } ]
}
and corresponding Parquet message type:
message spark_schema {
 required int64 id;
}

      
23/10/05 16:29:43 INFO MagicCommitTracker: ...
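
To see the schema dump again after this change, a logger entry along the following lines could be added to `log4j2.properties`. This is a minimal sketch, not part of the patch: it assumes the class lives at `org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport`, and the key `parquetWriteSupport` is an arbitrary identifier chosen for illustration.

```properties
# Hypothetical logger id "parquetWriteSupport"; any unique id works.
# Assumes the fully qualified class name below matches your Spark version.
logger.parquetWriteSupport.name = org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport
logger.parquetWriteSupport.level = debug
```

Scoping the level to this single logger keeps the rest of the job output at its configured level.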

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Oct 5, 2023
@dongjoon-hyun (Member, Author)

Could you review this PR, @viirya?

@viirya (Member) left a comment

Makes sense to me.

@dongjoon-hyun (Member, Author)

Thank you always, @viirya!

LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request Oct 7, 2023
Closes apache#43230 from dongjoon-hyun/SPARK-45423.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
viirya pushed a commit to viirya/spark-1 that referenced this pull request Oct 19, 2023
@dongjoon-hyun dongjoon-hyun deleted the SPARK-45423 branch November 4, 2023 23:55