Cache local JAR file in ProtoBufCodeGenMessageDecoder to eliminate redundant remote fetches by rseetham · Pull Request #18233 · apache/pinot

rseetham · 2026-04-16T04:54:06Z

Why

ProtoBufCodeGenMessageDecoder.init() runs once per consuming-segment creation: once per topic partition every time a segment rolls over, and once per partition on server restart. Each call unconditionally fetched the protobuf schema JAR from remote storage (S3, HDFS, etc.) via ProtoBufUtils.getFileCopiedToLocal(), which copies the JAR into a new timestamped temp directory every time. The JAR only changes when the table's decoder config is updated, so in normal operation every fetch after the first is unnecessary network I/O.
Additionally, when the JAR is fetched from an object store and that connection is broken, ingestion currently stops. With this fix, ingestion continues using the cached copy.

What
Introduce a JVM-level ConcurrentHashMap<String, CachedJar> keyed by topicName. CachedJar stores the remote JAR path and the local File it was copied to. On every init():

Cache hit (same jarPath): return the cached local file immediately — no network call. Codegen and Janino compilation still run fresh per init(), which is correct because fieldsToRead can differ between decoder instances for the same topic.
jarPath changed (config update): fetch the new JAR, replace the cache entry.
Fetch failure with a stale entry: log an error and return the previously cached local file so segment creation succeeds rather than failing on a transient network issue. The error log states explicitly that rows decoded during this window may have been decoded with a stale schema.

The URLClassLoader created to load the proto class is closed after the compiled Method is extracted, releasing the file handle immediately rather than accumulating them across segment rollovers.
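
The three cases above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `JarCacheSketch` and `JarFetcher` are hypothetical names, and `JarFetcher` stands in for ProtoBufUtils.getFileCopiedToLocal() so the example runs without Pinot on the classpath.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Sketch of a per-topic JAR cache with stale fallback; names are hypothetical. */
final class JarCacheSketch {
  /** Hypothetical stand-in for the remote fetch (ProtoBufUtils.getFileCopiedToLocal in the PR). */
  interface JarFetcher {
    File fetch(String jarPath) throws IOException;
  }

  /** Remote path the JAR was fetched from, plus the local copy. */
  static final class CachedJar {
    final String jarPath;
    final File localFile;

    CachedJar(String jarPath, File localFile) {
      this.jarPath = jarPath;
      this.localFile = localFile;
    }
  }

  private final ConcurrentMap<String, CachedJar> cache = new ConcurrentHashMap<>();
  private final JarFetcher fetcher;

  JarCacheSketch(JarFetcher fetcher) {
    this.fetcher = fetcher;
  }

  File resolveJar(String topicName, String jarPath) throws IOException {
    CachedJar cached = cache.get(topicName);
    // Cache hit: same remote path and the local copy is still on disk.
    if (cached != null && cached.jarPath.equals(jarPath) && cached.localFile.exists()) {
      return cached.localFile;
    }
    try {
      // Cache miss or changed path: fetch and replace the entry.
      File localFile = fetcher.fetch(jarPath);
      cache.put(topicName, new CachedJar(jarPath, localFile));
      return localFile;
    } catch (IOException e) {
      // Fetch failed: reuse the stale copy if one exists, otherwise propagate.
      if (cached != null && cached.localFile.exists()) {
        return cached.localFile;
      }
      throw e;
    }
  }
}
```

Injecting the fetcher is only a convenience for the sketch; it makes the cache-hit and fetch-failure paths easy to exercise without a remote store.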

How it behaves in each lifecycle event
Normal segment rollover: init() hits the cache, skips the remote fetch, runs codegen + Janino in memory (~ms), and returns. Each segment manager thread runs its own init() in parallel — no serialization across topics.

New table creation: First init() for that topicName — cache miss, JAR is fetched and cached. Subsequent segments for the same table hit the fast path.

Decoder config update (new JAR deployed): Next init() sees cached._jarPath != newJarPath, fetches the new JAR, replaces the cache entry.

Server restart: All cache entries are gone (JVM-level cache). Each partition's first init() after restart fetches the JAR once; subsequent rollovers hit the cache.

Tests

Existing behavioral tests are unchanged and continue to pass.
Added testCacheHit: two decoders initialized for the same topic and JAR both decode correctly, exercising the cache-hit path.
Added testStaleFallbackOnFetchFailure: a decoder initialized with an unreachable JAR path falls back to the previously cached local file and decodes correctly.

🤖 Generated with Claude Code


Copilot

Pull request overview

This PR optimizes protobuf decoder initialization in pinot-protobuf by caching the locally-copied schema JAR to avoid repeated remote fetches during consuming segment rollovers, while adding tests for cache hit and fetch-failure fallback behavior.

Changes:

Add a JVM-level JAR cache and a resolveJar() path in ProtoBufCodeGenMessageDecoder.init().
Close the per-init URLClassLoader after codegen/Janino compilation.
Add unit tests covering cache-hit behavior and stale fallback on fetch failure.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
`pinot-plugins/pinot-input-format/pinot-protobuf/src/main/java/org/apache/pinot/plugin/inputformat/protobuf/ProtoBufCodeGenMessageDecoder.java`	Introduces local JAR caching + fetch-failure fallback and classloader lifecycle updates.
`pinot-plugins/pinot-input-format/pinot-protobuf/src/test/java/org/apache/pinot/plugin/inputformat/protobuf/ProtoBufCodeGenMessageDecoderTest.java`	Adds tests for cache hit and stale fallback; clears cache between tests.
`pinot-plugins/pinot-input-format/pinot-protobuf/src/test/java/org/apache/pinot/plugin/inputformat/protobuf/ProtoBufUtilsTest.java`	Updates test classloading approach for descriptor lookup.

Copilot · 2026-04-16T05:39:34Z

    URL jarFile = getClass().getClassLoader().getResource("complex_types.jar");
-    ClassLoader clsLoader = ProtoBufCodeGenMessageDecoder.loadClass(jarFile.getPath());
+    ClassLoader clsLoader = new URLClassLoader(new URL[]{jarFile});
    Descriptors.Descriptor desc = ProtoBufCodeGenMessageDecoder.getDescriptorForProtoClass(clsLoader,
        "org.apache.pinot.plugin.inputformat.protobuf.ComplexTypes$TestMessage");


URLClassLoader implements Closeable, but this test never closes it. Use try-with-resources to avoid leaking an open JAR handle across the test suite (especially since this PR is explicitly addressing classloader/JAR handle churn).

Copilot · 2026-04-16T05:39:34Z

+    File localFile = resolveJar(topicName, jarPath);
+    URLClassLoader loader = new URLClassLoader(new URL[]{localFile.toURI().toURL()});
+    Descriptors.Descriptor descriptor = getDescriptorForProtoClass(loader, protoClassName);
    String codeGenCode = new MessageCodeGen().codegen(descriptor, fieldsToRead);
-    Class<?> recordExtractor = compileClass(protoMessageClsLoader,
+    Class<?> recordExtractor = compileClass(loader,
        MessageCodeGen.EXTRACTOR_PACKAGE_NAME + "." + MessageCodeGen.EXTRACTOR_CLASS_NAME, codeGenCode);
    _decodeMethod = recordExtractor.getMethod(MessageCodeGen.EXTRACTOR_METHOD_NAME, byte[].class, GenericRow.class);
+    loader.close();


URLClassLoader loader is only closed on the happy path. If getDescriptorForProtoClass(), codegen(), compileClass(), or getMethod() throws, the classloader (and underlying JAR handle) will leak. Use try-with-resources (or a finally) so the loader is always closed.
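
A minimal sketch of the try-with-resources shape suggested above. `LoaderCloseSketch` and `resolveMethod` are hypothetical names, and the example resolves a JDK method instead of the generated extractor so that it runs without Pinot or Janino.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

final class LoaderCloseSketch {
  /**
   * Resolves a method through a short-lived URLClassLoader. The loader is closed
   * when the try block exits, whether it returns normally or throws.
   */
  static Method resolveMethod(URL[] classpath, String className, String methodName, Class<?>... paramTypes)
      throws IOException, ReflectiveOperationException {
    try (URLClassLoader loader = new URLClassLoader(classpath)) {
      Class<?> clazz = loader.loadClass(className);
      return clazz.getMethod(methodName, paramTypes);
    }
  }
}
```

One caveat worth verifying in the real decoder: per the URLClassLoader.close() Javadoc, a closed loader can no longer load new classes or resources, while classes it has already loaded stay usable. Whether every class the generated extractor needs has been loaded before the close is an assumption this sketch does not check.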

Copilot · 2026-04-16T05:39:35Z

+  private static File resolveJar(String topicName, String jarPath)
+      throws Exception {
+    CachedJar cached = JAR_CACHE.get(topicName);
+    if (cached != null && cached._jarPath.equals(jarPath) && cached._localFile.exists()) {
+      return cached._localFile;


The cache is keyed only by topicName. Pinot can have multiple tables consuming the same topic (note other components key by (table, topic)), and in that case a fetch failure could cause this decoder to fall back to a cached JAR from a different table/config, risking silent schema mismatch and bad ingestion. Consider including at least protoClassName (and/or jarPath) in the cache key, or restricting stale fallback so it cannot reuse an entry from a different configuration scope.

rseetham · 2026-04-16

A given Kafka topic carries data in only one format, so the same JAR is used to read it regardless of the table. If the schemas of different tables differ, only the fields each table needs are extracted. Here we are only caching the JAR so that we don't fetch it every time.
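
For reference, if a wider cache scope were ever wanted, a composite key along the lines the reviewer describes might look like the sketch below. `JarCacheKey` is a hypothetical class; the PR as submitted keys the cache by topicName alone.

```java
import java.util.Objects;

/** Hypothetical composite cache key: topic plus proto class name. */
final class JarCacheKey {
  private final String topicName;
  private final String protoClassName;

  JarCacheKey(String topicName, String protoClassName) {
    this.topicName = Objects.requireNonNull(topicName);
    this.protoClassName = Objects.requireNonNull(protoClassName);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof JarCacheKey)) {
      return false;
    }
    JarCacheKey other = (JarCacheKey) o;
    return topicName.equals(other.topicName) && protoClassName.equals(other.protoClassName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(topicName, protoClassName);
  }

  @Override
  public String toString() {
    return topicName + "|" + protoClassName;
  }
}
```

With such a key, a stale fallback could only reuse an entry created for the same topic and proto class, at the cost of one cached JAR per distinct pair.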

Copilot · 2026-04-16T05:39:35Z

+    }
+    try {
+      File localFile = ProtoBufUtils.getFileCopiedToLocal(jarPath);
+      localFile.getParentFile().deleteOnExit();


localFile.getParentFile().deleteOnExit() won’t delete the temp directory at JVM exit because the directory is non-empty (the JAR file remains). If you want cleanup-on-exit, register localFile.deleteOnExit() first and then register the directory, or otherwise perform recursive cleanup.

Suggested change

-      localFile.getParentFile().deleteOnExit();
+      localFile.deleteOnExit();
+      File parentFile = localFile.getParentFile();
+      if (parentFile != null) {
+        parentFile.deleteOnExit();
+      }
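
One detail may be worth double-checking before applying the suggestion as written: the java.io.File.deleteOnExit() Javadoc states that files and directories are deleted in the reverse order that they are registered. Under that rule, registering the directory first and the file last is the order that empties the directory before its own deletion is attempted. The sketch below uses hypothetical names (`TempJarCleanupSketch`, `registerForCleanup`) and also demonstrates the underlying constraint that File.delete() fails on a non-empty directory.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

final class TempJarCleanupSketch {
  /**
   * Registers a file and its parent directory for deletion at JVM exit.
   * Entries are processed in reverse registration order, so the directory is
   * registered first and the file last: the file is deleted first, leaving
   * the directory empty when its turn comes.
   */
  static void registerForCleanup(File localFile) {
    File parentFile = localFile.getParentFile();
    if (parentFile != null) {
      parentFile.deleteOnExit();
    }
    localFile.deleteOnExit();
  }

  /** Shows that File.delete() refuses to remove a directory that still has a child. */
  static boolean[] deleteDirBeforeAndAfterChild() throws IOException {
    File dir = Files.createTempDirectory("jar-cache-sketch").toFile();
    File jar = new File(dir, "schema.jar");
    Files.write(jar.toPath(), new byte[] {1, 2, 3});
    boolean whileNonEmpty = dir.delete(); // directory still contains schema.jar
    boolean childDeleted = jar.delete();
    boolean afterEmpty = dir.delete(); // directory is now empty
    return new boolean[] {whileNonEmpty, childDeleted, afterEmpty};
  }
}
```

If the temp directory can ever hold more than the single JAR, an explicit recursive cleanup is likely the more robust choice.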

Copilot · 2026-04-16T05:39:35Z

+      if (cached != null && cached._localFile.exists()) {
+        LOGGER.error("Failed to fetch JAR for topic '{}' from '{}', reusing stale local copy from '{}'. "
+                + "Rows decoded with the stale schema may be incorrect if the schema has changed.",
+            topicName, jarPath, cached._jarPath, e);


The error log says it is "reusing stale local copy from '…'" but the value being logged is cached._jarPath (remote path), not the local file path. This makes troubleshooting difficult; log cached._localFile (and optionally also log the stale remote path separately).

Suggested change

-            topicName, jarPath, cached._jarPath, e);
+            topicName, jarPath, cached._localFile, e);

codecov-commenter · 2026-04-16T05:42:57Z

Codecov Report

❌ Patch coverage is 89.28571% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.36%. Comparing base (87e09fd) to head (e9f3f4f).

Files with missing lines	Patch %	Lines
...format/protobuf/ProtoBufCodeGenMessageDecoder.java	89.28%	1 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##             master   #18233   +/-   ##
=========================================
  Coverage     63.36%   63.36%           
  Complexity     1627     1627           
=========================================
  Files          3243     3243           
  Lines        197038   197054   +16     
  Branches      30466    30468    +2     
=========================================
+ Hits         124845   124856   +11     
- Misses        62195    62204    +9     
+ Partials       9998     9994    -4

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (ø)`
integration	`100.00% <ø> (ø)`
integration1	`100.00% <ø> (ø)`
integration2	`0.00% <ø> (ø)`
java-11	`63.32% <89.28%> (+0.02%)`	⬆️
java-21	`63.31% <89.28%> (-0.02%)`	⬇️
temurin	`63.36% <89.28%> (+<0.01%)`	⬆️
unittests	`63.35% <89.28%> (+<0.01%)`	⬆️
unittests1	`55.32% <ø> (+<0.01%)`	⬆️
unittests2	`34.94% <89.28%> (-0.01%)`	⬇️


xiangfu0

Found one high-signal ingestion correctness risk; see inline comment.

xiangfu0 · 2026-04-16T12:06:31Z

+  private static File resolveJar(String topicName, String jarPath)
+      throws Exception {
+    CachedJar cached = JAR_CACHE.get(topicName);
+    if (cached != null && cached._jarPath.equals(jarPath) && cached._localFile.exists()) {


This changes the rollout semantics from 're-fetch the protobuf JAR on every init' to 'reuse it indefinitely while the URI string stays the same'. Pinot deployments often replace the decoder JAR in place at the same S3/HDFS path during schema rollouts; after this cache hits once, long-lived servers will keep decoding with the old generated classes until restart, which can silently ingest rows with the wrong schema. We need either a freshness/version check here or an explicit versioned-URI contract before making the cached file authoritative.

rseetham · 2026-04-16

I experimented with a background job that re-fetches the JAR every hour. The issue is that the plugin module does not have access to the server executor service, so I would have to create and manage an executor here. I don't think that is a good solution.

Another option is a cache with a TTL of an hour or some configured value (a server property), which would also force a periodic fetch. The catch is that if the segment completion time matches the cache expiration, all segments complete at the same time and would wait for the JAR fetch anyway, so users would have to choose this value carefully. But it solves the problem. I'll add this and address the other, smaller comments that were brought up.

Still, the fundamental issue with both approaches is that, when the JAR has been changed in place, the only way to force a fetch is a restart. At the moment, a table force commit forces a fetch. During an incident, having to say that the change will take an hour to be picked up will be a problem.

Is there another solution you'd suggest?
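
The TTL idea discussed above could be sketched as a freshness check on the cache entry. This is an illustration only: `TtlJarEntry` and its fields are hypothetical, the TTL would come from whatever server property is chosen, and the current time is passed in so the check can be tested deterministically.

```java
import java.io.File;
import java.time.Duration;

/** Hypothetical cache entry that records when the JAR was fetched. */
final class TtlJarEntry {
  final String jarPath;
  final File localFile;
  final long fetchedAtMillis;

  TtlJarEntry(String jarPath, File localFile, long fetchedAtMillis) {
    this.jarPath = jarPath;
    this.localFile = localFile;
    this.fetchedAtMillis = fetchedAtMillis;
  }

  /**
   * True when the entry matches the requested remote path, is younger than
   * the TTL, and its local copy still exists on disk.
   */
  boolean isFresh(String requestedJarPath, Duration ttl, long nowMillis) {
    return jarPath.equals(requestedJarPath)
        && nowMillis - fetchedAtMillis < ttl.toMillis()
        && localFile.exists();
  }
}
```

A caller would treat a non-fresh entry like a cache miss and re-fetch, keeping the old entry around only as the stale fallback. This bounds how long an in-place JAR replacement goes unnoticed but does not provide the on-demand refresh raised in the last paragraph.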

Cache local JAR file in ProtoBufCodeGenMessageDecoder to avoid redund…

e9f3f4f

…ant remote fetches

xiangfu0 requested review from 9aman, Jackie-Jiang, Copilot, swaminathanmanish and xiangfu0 April 16, 2026 05:35

xiangfu0 added the ingestion (Related to data ingestion pipeline) and plugins (Related to the plugin system) labels Apr 16, 2026

Copilot started reviewing on behalf of xiangfu0 April 16, 2026 05:36

rseetham wants to merge 1 commit into apache:master from rseetham:protobuf-jar-cache


