
GH-3572: Bump thrift to 0.23 #3589

Draft
steveloughran wants to merge 3 commits into apache:master from steveloughran:pr/GH-3572-thrift-update

@steveloughran (Contributor)

Rationale for this change

There's a new Thrift release out.

Changes include a fix for the security advisory GHSA-526f-jxpj-jmg2.
The vulnerability is server-side and only affects Thrift's JavaScript code. While Parquet is unaffected, security scanners aren't necessarily going to be that nuanced.

What changes are included in this PR?

  • Updated build files/scripts with Thrift version declarations.
  • Updated references in README.md.
  • Added instructions in README as to where to find the gpg/sha signatures and a link to the Thrift team KEYS file.
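The README now points at the gpg/sha signatures for the Thrift tarball. As a hedged illustration of the sha half of that check, here is a minimal Java sketch that streams a downloaded file through SHA-512 and compares it with a published digest. The class name, the file names in the usage message, and the assumption that the checksum file begins with the hex digest are all illustrative assumptions; the README's own instructions, and gpg verification against the KEYS file, remain the reference.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Minimal SHA-512 checksum check for a downloaded release artifact (illustrative sketch). */
public class Sha512Check {

  private static MessageDigest newSha512() {
    try {
      return MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-512 not available on this JVM", e);
    }
  }

  /** Lowercase hex encoding of a digest. */
  static String toHex(byte[] digest) {
    StringBuilder sb = new StringBuilder(digest.length * 2);
    for (byte b : digest) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  /** SHA-512 of an in-memory byte array, as lowercase hex. */
  static String sha512Hex(byte[] data) {
    return toHex(newSha512().digest(data));
  }

  /** SHA-512 of a file, streamed in chunks so a large tarball is never fully loaded. */
  static String sha512Hex(Path file) throws IOException {
    MessageDigest md = newSha512();
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
    }
    return toHex(md.digest());
  }

  /**
   * Compares a computed digest with a published checksum text.
   * Assumption: the published text starts with the hex digest, e.g. "<hex>  <filename>".
   */
  static boolean matches(String actualHex, String published) {
    String expected = published.trim().split("\\s+")[0];
    return actualHex.equalsIgnoreCase(expected);
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.out.println("usage: java Sha512Check <tarball> <checksum-file>");
      return;
    }
    String actual = sha512Hex(Paths.get(args[0]));
    String published =
        new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.US_ASCII);
    System.out.println(matches(actual, published) ? "checksum OK" : "checksum MISMATCH");
  }
}
```

A checksum only detects corruption or a swapped file on the mirror; authenticity still comes from checking the gpg signature against the keys in the Thrift KEYS file.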

Are these changes tested?

  • Expecting PR CI to do the tests.
  • It compiles!
  • I manually ran the modified wget command in the README to verify the path to the tarball is valid.

Are there any user-facing changes?

No

Closes #3572

Bump thrift to 0.23

Added instructions in docs as to where to find the
gpg/sha signatures and a link to the thrift team KEYS file.
steveloughran marked this pull request as draft May 27, 2026 11:59
@steveloughran (Contributor, Author)

The new Thrift release is triggering NPEs in tests. That is bad.

```
Error:  org.apache.parquet.hadoop.thrift.TestParquetToThriftReadWriteAndProjection.testPullInRequiredLists -- Time elapsed: 0.027 s <<< ERROR!
org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file file:/home/runner/work/parquet-java/parquet-java/parquet-thrift/target/test/TestParquetToThriftReadWriteAndProjection/file.parquet
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:280)
	at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
	at org.apache.parquet.hadoop.thrift.TestParquetToThriftReadWriteAndProjection.shouldDoProjection(TestParquetToThriftReadWriteAndProjection.java:375)
	at org.apache.parquet.hadoop.thrift.TestParquetToThriftReadWriteAndProjection.shouldDoProjectionWithThriftColumnFilter(TestParquetToThriftReadWriteAndProjection.java:337)
	at org.apache.parquet.hadoop.thrift.TestParquetToThriftReadWriteAndProjection.testPullInRequiredLists(TestParquetToThriftReadWriteAndProjection.java:301)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Caused by: java.lang.NullPointerException
	at org.apache.thrift.protocol.TProtocol.incrementRecursionDepth(TProtocol.java:59)
	at org.apache.parquet.thrift.test.RequiredListFixture$RequiredListFixtureStandardScheme.read(RequiredListFixture.java:414)
	at org.apache.parquet.thrift.test.RequiredListFixture$RequiredListFixtureStandardScheme.read(RequiredListFixture.java:410)
	at org.apache.parquet.thrift.test.RequiredListFixture.read(RequiredListFixture.java:345)
	at org.apache.parquet.thrift.TBaseRecordConverter$1.readOneRecord(TBaseRecordConverter.java:63)
	at org.apache.parquet.thrift.TBaseRecordConverter$1.readOneRecord(TBaseRecordConverter.java:58)
	at org.apache.parquet.thrift.ThriftRecordConverter.getCurrentRecord(ThriftRecordConverter.java:945)
	at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:439)
```

It means that if you don't supply a TTransport to the TProtocol constructor,
you hit an NPE during IO whenever the read-range and recursion-depth limits are validated.

Add a StubTTransport which makes the read-range check a no-op and
declares that there is no limit on recursion depth.
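The failure and the stub fix can be illustrated with a small, self-contained model. Config, Transport, Protocol and StubTransport below are hypothetical stand-ins written for this sketch, not Thrift's TConfiguration, TTransport or TProtocol; the real StubTTransport has to extend Thrift's abstract TTransport and implement its full method set. The sketch only shows the shape of the problem: a protocol that reads its limits through the transport dereferences a null transport, while a stub that no-ops the read-size check and reports an effectively unlimited depth lets the same code run. Using Integer.MAX_VALUE to mean "no depth limit" is an assumption of the sketch.

```java
/**
 * Simplified model of the NPE and the stub-transport fix.
 * Config, Transport, Protocol and StubTransport are hypothetical stand-ins,
 * NOT Thrift's TConfiguration, TTransport or TProtocol.
 */
public class StubTransportModel {

  /** Stand-in for a transport configuration holding a recursion limit. */
  static final class Config {
    final int recursionLimit;

    Config(int recursionLimit) {
      this.recursionLimit = recursionLimit;
    }
  }

  /** Stand-in for a transport: exposes its configuration and a read-size check. */
  interface Transport {
    Config getConfiguration();

    void checkReadBytesAvailable(long numBytes);
  }

  /** Stand-in for a protocol that validates depth through its transport. */
  static class Protocol {
    private final Transport trans;
    private int depth;

    Protocol(Transport trans) {
      this.trans = trans;
    }

    void incrementRecursionDepth() {
      // Reads the limit from the transport: with a null transport this line throws an NPE.
      if (++depth > trans.getConfiguration().recursionLimit) {
        throw new IllegalStateException("recursion depth " + depth + " exceeds limit");
      }
    }

    void decrementRecursionDepth() {
      depth--;
    }
  }

  /** Stub transport: read-size check is a no-op; depth limit is effectively unlimited. */
  static final class StubTransport implements Transport {
    // Immutable and shared, so one static instance is enough.
    // Assumption: Integer.MAX_VALUE stands for "no depth limit".
    private static final Config NO_LIMITS = new Config(Integer.MAX_VALUE);

    @Override
    public Config getConfiguration() {
      return NO_LIMITS;
    }

    @Override
    public void checkReadBytesAvailable(long numBytes) {
      // no-op: no read-size limit
    }
  }

  /** Returns true if a single depth increment throws a NullPointerException. */
  static boolean npeWith(Transport t) {
    try {
      new Protocol(t).incrementRecursionDepth();
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("null transport -> NPE: " + npeWith(null));
    System.out.println("stub transport -> NPE: " + npeWith(new StubTransport()));
  }
}
```

The stub's configuration is a single static field because it never changes; nothing in the stub depends on per-instance state.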
@steveloughran (Contributor, Author)

steveloughran commented May 27, 2026

The NPE is caused by THRIFT-5916, "Bring Java recursion limit support at parity with C++"
(apache/thrift#3287).

Commit #66dae3f is the "least change to existing code" patch, not necessarily the most elegant one. It provides just enough of a stub class to stop Thrift from blowing up.

The alternative is for ParquetProtocol and BufferedProtocolReadToWrite.NullProtocol to

  • Implement incrementRecursionDepth()/decrementRecursionDepth() as no-ops
  • Override all the implementations of checkReadBytesAvailable() to be no-ops.

Good: less stubbing of thrift internals
Bad:

  • duplicate implementations of the no-op methods. They're only no-op methods, though.
  • the transport is still null, so there's a risk of NPEs surfacing if new checks are added.
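For comparison, the alternative (no-op overrides in ParquetProtocol and BufferedProtocolReadToWrite.NullProtocol) can be sketched with hypothetical stand-in classes. BaseProtocol, NoLimitProtocol and readNested are illustrative names, not Thrift or Parquet APIs; only the hook names mirror the methods discussed in this PR. The sketch also shows the drawback listed above: the overrides cover only the hooks they name, and any check that is not overridden still dereferences the null transport.

```java
import java.util.Objects;

/**
 * Sketch of the alternative: protocol subclasses override the depth and read-size hooks
 * as no-ops. BaseProtocol is a hypothetical stand-in for Thrift's TProtocol.
 */
public class NoOpProtocolModel {

  /** Stand-in base protocol whose hooks consult the transport (null in the Parquet case). */
  abstract static class BaseProtocol {
    protected final Object trans;

    BaseProtocol(Object trans) {
      this.trans = trans;
    }

    void incrementRecursionDepth() {
      // Models reading the limit from the transport's configuration.
      Objects.requireNonNull(trans, "transport");
    }

    void decrementRecursionDepth() {}

    void checkReadBytesAvailable(long numBytes) {
      Objects.requireNonNull(trans, "transport");
    }
  }

  /** Hypothetical analogue of ParquetProtocol / NullProtocol with every hook a no-op. */
  static final class NoLimitProtocol extends BaseProtocol {
    NoLimitProtocol() {
      super(null); // the transport stays null: any hook not overridden here would still NPE
    }

    @Override
    void incrementRecursionDepth() {
      // no depth limit
    }

    @Override
    void decrementRecursionDepth() {
      // nothing to track
    }

    @Override
    void checkReadBytesAvailable(long numBytes) {
      // no read-size limit
    }
  }

  /** Simulates reading `levels` nested structs, calling the hooks a protocol would. */
  static void readNested(BaseProtocol p, int levels) {
    for (int i = 0; i < levels; i++) {
      p.incrementRecursionDepth();
      p.checkReadBytesAvailable(8);
    }
    for (int i = 0; i < levels; i++) {
      p.decrementRecursionDepth();
    }
  }

  public static void main(String[] args) {
    readNested(new NoLimitProtocol(), 1000);
    System.out.println("1000 levels read without touching the null transport");
  }
}
```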

Either way, there's a risk of regressions if Thrift adds more checks in future releases.

There's also the open question "should there be a depth limit?" After all, we're adding one when hardening variants, but there we can be fairly confident that no large datasets use the type.

If Thrift C++ enforces a depth of 64, then so will parquet-cpp, won't it? In that case, modifying this PR to use the default depth of 64 would be consistent.
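To make the depth-limit question concrete, here is a hypothetical model of a recursion-depth check using the default limit of 64. It assumes depth rises by one per nested struct and that reading fails once depth exceeds the limit; that is an illustrative assumption, not a transcription of Thrift's Java or C++ code. DepthLimitModel, Node and readable are names invented for this sketch.

```java
/**
 * Hypothetical model of a recursion-depth limit. Assumption: depth increases by one per
 * nested struct and reading fails when depth exceeds the limit.
 */
public class DepthLimitModel {
  static final int DEFAULT_RECURSION_DEPTH = 64;

  /** A struct that may contain one nested child struct. */
  static final class Node {
    final Node child;

    Node(Node child) {
      this.child = child;
    }
  }

  static final class DepthLimitExceeded extends RuntimeException {
    DepthLimitExceeded(int depth) {
      super("recursion depth " + depth + " exceeds limit");
    }
  }

  private final int limit;
  private int depth;

  DepthLimitModel(int limit) {
    this.limit = limit;
  }

  /** Recursively "reads" a struct, tracking depth as a protocol would. */
  void read(Node node) {
    if (++depth > limit) {
      throw new DepthLimitExceeded(depth);
    }
    try {
      if (node.child != null) {
        read(node.child);
      }
    } finally {
      depth--;
    }
  }

  /** Builds a chain of `levels` nested structs (levels >= 1). */
  static Node nested(int levels) {
    Node n = null;
    for (int i = 0; i < levels; i++) {
      n = new Node(n);
    }
    return n;
  }

  /** True if data nested `levels` deep can be read under `limit`. */
  static boolean readable(int levels, int limit) {
    try {
      new DepthLimitModel(limit).read(nested(levels));
      return true;
    } catch (DepthLimitExceeded e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("64 levels under default: " + readable(64, DEFAULT_RECURSION_DEPTH));
    System.out.println("65 levels under default: " + readable(65, DEFAULT_RECURSION_DEPTH));
  }
}
```

Under this model, running `main` prints `true` for 64 levels and `false` for 65, so adopting the default would reject only data nested more than 64 structs deep.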


Fokko left a comment


Thanks @steveloughran for fixing this 🙌

```java
/**
 * A stub transport which implements the minimum amount needed for range/depth
 * validation within the thrift library to succeed.
 */
public final class StubTTransport extends TTransport {
```

Should we make this package private?

```java
/**
 * There are no limits on recursion depth.
 */
private final TConfiguration conf = new TConfiguration(
```

We can make this one static

Fokko added this to the 1.18.0 milestone May 27, 2026


Development

Successfully merging this pull request may close these issues.

Update thrift to 0.23.0 to eliminate warnings about CVE-2026-43870
