
pkg/cli: add --log-format flag to debug zip for parquet output #162005

Draft
Abhinav1299 wants to merge 1 commit into cockroachdb:master from Abhinav1299:parquet-log-debug-zip

Conversation

Contributor

@Abhinav1299 Abhinav1299 commented Jan 29, 2026

Previously, `cockroach debug zip` collected log files in text format only. For large clusters, node logs can make up 60-80% of the total debug zip size. Text logs compress well with gzip (~9x), but the format cannot be queried directly, and its row-oriented layout limits the achievable compression ratio.

This change introduces an optional `--log-format` flag for the `cockroach debug zip` command that supports two values: "text" (default, preserving existing behavior) and "parquet". When parquet format is selected, log entries are written using Apache Parquet columnar storage with ZSTD compression. Parquet achieves ~92% compression on raw log data through columnar storage, dictionary encoding on repetitive fields (severity, channel, file paths), and delta encoding on timestamps. The resulting parquet files can be analyzed using SQL-based tools like DuckDB.
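
To illustrate why these two encodings help on log data, here is a minimal sketch in Go. It is not Parquet's on-disk format (Parquet's real dictionary and delta encodings add bit-packing and block structure), and the helper names `dictEncode` and `deltaEncode` as well as the sample values are made up for this example. It only shows the idea: repetitive strings collapse to small integer codes, and closely spaced timestamps collapse to small differences, both of which compress far better than the raw values.

```go
package main

import "fmt"

// dictEncode (hypothetical helper) replaces each string with an index into a
// table of the distinct values seen so far, in order of first appearance.
func dictEncode(values []string) (dict []string, codes []int) {
	index := make(map[string]int)
	for _, v := range values {
		code, ok := index[v]
		if !ok {
			code = len(dict)
			index[v] = code
			dict = append(dict, v)
		}
		codes = append(codes, code)
	}
	return dict, codes
}

// deltaEncode (hypothetical helper) keeps the first value as-is and stores
// every later value as the difference from its predecessor.
func deltaEncode(values []int64) []int64 {
	out := make([]int64, len(values))
	var prev int64
	for i, v := range values {
		out[i] = v - prev
		prev = v
	}
	return out
}

func main() {
	// Made-up sample columns; real log entries carry many more fields.
	severities := []string{"INFO", "INFO", "WARNING", "INFO", "ERROR", "INFO"}
	timestamps := []int64{1769700000000, 1769700000004, 1769700000009, 1769700000010}

	dict, codes := dictEncode(severities)
	fmt.Println("dictionary:", dict)
	fmt.Println("codes:     ", codes)
	fmt.Println("deltas:    ", deltaEncode(timestamps))
}
```

In a row-oriented text log the severity string and the full timestamp are repeated on every line, interleaved with the message, so a general-purpose compressor has to rediscover that redundancy. Storing each field as its own column lets encodings like these be applied per field before ZSTD runs.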

The implementation adds a new `logParquetWriter` that maps `logpb.Entry` fields to a 15-column parquet schema. The `getLogFiles` function in `zip_per_node.go` branches based on the format flag, calling either the existing `FormatLegacyEntry` for text or the new `writeLogEntriesAsParquet` for parquet output.
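
The flag validation and the branch described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: it uses the standard library `flag` package rather than the CLI framework `cockroach` actually uses, and the names `logFormat`, `entry`, `writeText`, `writeParquet`, and `writeLogs` are hypothetical stand-ins (for example, `entry` carries three fields, not the 15 columns mentioned above). The parquet path is left as a stub.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// logFormat is a hypothetical flag value type; the PR's actual type may differ.
type logFormat string

const (
	logFormatText    logFormat = "text"
	logFormatParquet logFormat = "parquet"
)

// String and Set implement the standard library's flag.Value interface.
func (f *logFormat) String() string {
	if f == nil {
		return ""
	}
	return string(*f)
}

// Set accepts only the two documented values and leaves f unchanged otherwise.
func (f *logFormat) Set(s string) error {
	switch v := logFormat(s); v {
	case logFormatText, logFormatParquet:
		*f = v
		return nil
	default:
		return fmt.Errorf("invalid log format %q: want \"text\" or \"parquet\"", s)
	}
}

// entry is a reduced stand-in for logpb.Entry.
type entry struct {
	Time     int64
	Severity string
	Message  string
}

// writeText is a placeholder for the existing text formatting path.
func writeText(w io.Writer, entries []entry) error {
	for _, e := range entries {
		if _, err := fmt.Fprintf(w, "%s %d %s\n", e.Severity, e.Time, e.Message); err != nil {
			return err
		}
	}
	return nil
}

var errParquetNotImplemented = errors.New("parquet writer omitted from this sketch")

// writeParquet is a stub; a real version would hand entries to a parquet writer.
func writeParquet(w io.Writer, entries []entry) error {
	return errParquetNotImplemented
}

// writeLogs picks the output path from the format flag, defaulting to text.
func writeLogs(w io.Writer, format logFormat, entries []entry) error {
	switch format {
	case logFormatParquet:
		return writeParquet(w, entries)
	default:
		return writeText(w, entries)
	}
}

func main() {
	format := logFormatText // default preserves existing behavior
	fs := flag.NewFlagSet("debug-zip-sketch", flag.ContinueOnError)
	fs.Var(&format, "log-format", `log output format: "text" or "parquet"`)
	if err := fs.Parse(os.Args[1:]); err != nil {
		os.Exit(2)
	}
	entries := []entry{{Time: 1, Severity: "I", Message: "node started"}}
	if err := writeLogs(os.Stdout, format, entries); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Validating in `Set` means an unsupported value is rejected while flags are parsed, before any logs are fetched, and keeping "text" as the zero-configuration default means callers that never pass the flag see no change.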

Part of: CRDB-59104
Epic: none
Release note (cli change): Added the `--log-format` flag to the `cockroach debug zip` command. Valid values are "text" (default) and "parquet". The parquet format uses columnar storage with ZSTD compression, reducing raw log size.



blathers-crl Bot commented Jan 29, 2026

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

