OPENNLP-1816: Make ME classes thread-safe by eliminating shared mutable instance state#1003
Conversation
Force-pushed from 2a904dd to 729b9c1
|
There were 3 checkstyle violations - fixed those. |
|
@krickert Thanks for the PR! |
|
I've always been a fan of OpenNLP. What I love about finally contributing is that before this patch, I had to create pools of ME objects or create new ones every time; this change gets rid of all that scaffolding. If you would like me to create any more tests, let me know. I think the new tests cover the concurrency and recall use cases well, and the speed tests show there's no concern about performance. I was excited to see the > 1.5x speedup with POSTagger; it's the single reason I decided to work on this. |
|
Hi, Thanks for the contribution! Overall, I like the idea of looking into built-in thread safety rather than relying on ThreadLocal-based wrappers, which have known issues in Jakarta EE and other long-lived thread environments. A few concerns I'd like to discuss before this can move forward (imho):
1. The benchmarks are hand-rolled System.nanoTime() + ExecutorService loops. Without JMH, the results are susceptible to JIT warmup, GC pauses, and profile pollution, i.e. there's no fork isolation, no warmup iterations, and no statistical variance reporting. For a change that removes multiple caching layers, stronger evidence is needed.
2. Three layers of caching were removed as a shortcut to thread safety.
3. The regression benchmark reports "performance within noise," but without JMH-level statistical rigor that's hard to verify. More importantly, the benchmark uses a small set of short sentences: a benchmark against a real-world dataset (e.g., from the eval/test corpora: https://nightlies.apache.org/opennlp/) would be far more convincing, particularly for POS tagging, where the feature-generation cache had the most impact under larger workloads. A thread-safe alternative would be making the caches method-local rather than removing them entirely.
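The method-local alternative suggested here could look roughly like the following sketch. This is illustrative code, not OpenNLP source: the class and method names are hypothetical. The point is that the cache is allocated inside the call, so it is thread-safe by construction while still de-duplicating repeated tokens within a single sentence.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a method-local cache: no shared mutable state,
// so concurrent callers are isolated, yet repeated lookups within one
// call (e.g. one sentence) still hit the cache.
public class MethodLocalCacheSketch {

    // Stand-in for an expensive per-token feature computation.
    private static String computeFeatures(int token) {
        return "feat-" + token;
    }

    // Context generation over one "sentence": the cache lives and dies with the call.
    public static String[] contexts(int[] tokens) {
        Map<Integer, String> cache = new HashMap<>(); // method-local: thread-safe by construction
        String[] out = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = cache.computeIfAbsent(tokens[i], MethodLocalCacheSketch::computeFeatures);
        }
        return out;
    }
}
```

The trade-off versus an instance-level cache is that nothing is reused across calls, which is exactly why measuring the cache's real impact (as discussed above) matters.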
|
|
Regardless of my comment, I am going to trigger an Eval build for this: https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-configurable/39/ |
|
@rzo1 working on addressing all of your concerns right now - it'll be done in a moment. I'm restoring the caches and running tests with and without them, using proper benchmarks. All great points, and thanks for the feedback. |
Force-pushed from 729b9c1 to 94ca28d
|
I'm going to make the caches optional and configurable. This way we can run tests against all scenarios and come up with as many use cases as needed to measure the impact. The last commit was premature; I'm still working on this. |
@krickert Thanks, Kristian, for tackling this complex topic with so much energy! Much appreciated! Happy to review this PR more deeply, and especially looking forward to the JMH analyses. Richard has already given deep feedback in the first round; I'll share my two cents later on code-style nuances, seeking an optimal result from the devs' perspective. For the moment, completing the 3.0.0-M2 release process is on my list… |
|
@mawiesne no problem... I've been thinking about this for a while now. @rzo1 you were right about CachedFeatureGenerator: the data shows it clearly, and it helps. That particular cache brings a 1.6x boost in both the old and new instances. Combined with the thread-safety feature and instance reuse, we now see over a 2x increase. Thanks for pointing that out. But don't take my word for it; I'll update the tests shortly to show it (I would love to see it on another machine too). |
Force-pushed from bfc8fdf to d31aaa6
|
Thanks for the detailed feedback. We've addressed all four points made by @rzo1. Here's a summary of what changed and the JMH data behind each decision.
1. Benchmarks (JMH)
Replaced all hand-rolled System.nanoTime() loops with proper JMH benchmarks.
Also fixed the existing JMH profile - the annotation processor wasn't wired into the compiler plugin.
Approaches measured
JMH Results (32 threads, all cores)
Tokenizer and SentenceDetector: all approaches within error bars (lightweight constructors).
2. Caches
We restored all caches as ThreadLocal (per-thread, not shared). Same behavior as the originals in single-threaded use, safe under concurrency. We also added a JMH benchmark that measures each cache with the cache enabled and disabled.
Cache Impact Results (POSTagger, 32 threads)
This told us which caches matter and which don't:
Regarding the BeamSearch cache specifically
We restored it as ThreadLocal with per-thread buffers.
3. Thread-safety tests
Addressed all sub-points:
4. Missing ME classes
All 7 ME classes are now covered:
All 7 ME classes are annotated @ThreadSafe.
5. ThreadSafe*ME wrappers deprecated
Since the ME classes are now themselves thread-safe, the ThreadSafe*ME wrappers are deprecated.
We also replaced all internal usages of the ThreadSafe*ME wrappers with direct ME usage.
No internal code uses the wrappers anymore.
Open item
Agreed - this would strengthen the perf claims. The JMH benchmarks currently use the project's test data. Do you have any real-world dataset tests around that we can run it against quickly? It's the only way I'd feel confident as well. |
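The CyclicBarrier-based correctness tests mentioned in point 3 above follow a common shape: all worker threads are released at the same instant to maximize interleaving, and every concurrent result must match a single-threaded baseline. A minimal self-contained sketch (a stand-in component, not the actual ThreadSafetyBenchmarkTest code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hedged sketch of a barrier-aligned thread-safety test. The process()
// method stands in for a shared, supposedly thread-safe component
// (e.g. one ME instance shared by all threads).
public class BarrierConcurrencyTestSketch {

    static String process(String input) {
        return input.toUpperCase();
    }

    public static void main(String[] args) throws Exception {
        int threads = 8;
        String input = "some sentence";
        String baseline = process(input);            // single-threaded reference result

        CyclicBarrier barrier = new CyclicBarrier(threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Callable<String> task = () -> {
                barrier.await();                     // align all threads on the same start line
                return process(input);
            };
            results.add(pool.submit(task));
        }
        for (Future<String> f : results) {
            if (!baseline.equals(f.get())) {
                throw new AssertionError("concurrent result diverged from baseline");
            }
        }
        pool.shutdown();
        System.out.println("all " + threads + " concurrent results match baseline");
    }
}
```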
|
Summary since first review: Made all 7 ME classes thread-safe by eliminating shared mutable instance state, and deprecated the ThreadSafe*ME wrappers.
Motivation
ME classes were documented as not thread-safe due to mutable instance fields that corrupt under concurrent access. The workarounds were creating a new ME instance per call (expensive) or using the ThreadSafe*ME wrappers.
Approach
Mutable state moved to method-local variables or per-thread caches (ThreadLocal) at every layer:
Files changed (30 total)
Source (13 files): TokenizerME, SentenceDetectorME, POSTaggerME, LemmatizerME, ChunkerME, NameFinderME, LanguageDetectorME, BeamSearch, CachedFeatureGenerator, ConfigurablePOSContextGenerator, DefaultPOSContextGenerator, DefaultSDContextGenerator, SentenceContextGenerator (Thai)
Deprecated (7 files): ThreadSafeTokenizerME, ThreadSafeSentenceDetectorME, ThreadSafePOSTaggerME, ThreadSafeLemmatizerME, ThreadSafeChunkerME, ThreadSafeNameFinderME, ThreadSafeLanguageDetectorME
Internal usage swaps (3 files): Muc6NameSampleStreamFactory, TwentyNewsgroupSampleStreamFactory, POSTaggerMEIT - replaced ThreadSafe*ME usage with the direct ME classes
Tests/benchmarks (5 files): ThreadSafetyBenchmarkTest (8 JUnit tests), 3 JMH benchmarks, CachedFeatureGeneratorTest update
Build (1 file): pom.xml - fixed JMH annotation processor wiring |
Force-pushed from d31aaa6 to b02c2eb
|
@mawiesne - I pushed again to make the code match the style better. The problem I had was that your CI/CD failed linting and forced me to write 80-column code, which makes part of the code look ugly outside my IDE. Can you ease up on the linting to allow 120 or 140 columns, or is that too much? I don't care either way, it's just a setting in my IDE, but the codebase has 3000+ violations, so I don't suspect it's really been enforced for a long time. |
|
Note: You can use the OpenNLP Formatting XML, which is provided as a download. In addition, you only have a few fixes left: |
Oh cool! Thanks. I'll fix those today |
…le state

All 7 ME classes (TokenizerME, SentenceDetectorME, POSTaggerME, LemmatizerME, ChunkerME, NameFinderME, LanguageDetectorME) are now safe for concurrent use from multiple threads. The ThreadSafe*ME wrappers are deprecated — use the ME classes directly.

Thread-safety approach:
- ME instance fields (bestSequence, tokProbs, newTokens, sentProbs) changed to volatile with method-local processing, atomic swap at end
- BeamSearch: probs[] buffer and contextsCache moved to per-thread state via ThreadLocal
- CachedFeatureGenerator: cache moved to per-thread state via ThreadLocal (JMH confirms 1.62x benefit from this cache)
- ConfigurablePOSContextGenerator: cache moved to per-thread state via ThreadLocal
- DefaultSDContextGenerator: buf/collectFeats moved to method-local

JMH benchmark results (32 threads):
- POSTagger instancePerThread: 2.52x faster than newInstancePerCall
- POSTagger cache on vs off: no measurable difference for the context generator cache; CachedFeatureGenerator provides a 1.62x benefit
- Tokenizer/SentenceDetector: all approaches within error bars

API changes:
- All 7 ME classes annotated @ThreadSafe
- All 7 ThreadSafe*ME wrappers annotated @Deprecated(since = "3.0.0")
- POSTaggerME: added constructor with contextCacheSize parameter
- CachedFeatureGenerator: added DISABLE_CACHE_PROPERTY for benchmarking
- Internal usages of ThreadSafe*ME replaced with direct ME usage

Tests:
- ThreadSafetyBenchmarkTest: 8 JUnit tests with CyclicBarrier (all 7 ME classes + probs() concurrency test)
- JMH benchmarks for Tokenizer, SentenceDetector, POSTagger
- Fixed JMH annotation processor config in pom.xml
- All 680 runtime + 352 formats tests pass
Force-pushed from b02c2eb to 178386f
|
Fixed. Let me know if there are more tests you'd like me to run. I think the benchmarks, passing tests, and harness together make a strong case. |
|
@atarora @jzonthemtn @rzo1 @mawiesne Just curious: what are the next steps? I'm not in a rush, just not familiar with the review cadence for this repo. I'm most curious whether I ran enough tests to convince others of the advantages, or whether more testing is required to clarify the speedup factor and the correctness of the solution. Suggest anything to help ease the review. I'm also open to a Discord chat (or any other platform) if that makes it easier. Moving to a thread-safe approach should make the library a lot easier to code against: over the years I frequently forgot that it's not thread-safe and had to redo the same strategies, and seeing the speedup has made me excited to contribute more in the future. I'll also update the documentation once it's approved. |
The updated PR addresses my initial review concerns well. The ThreadLocal leak trade-off (1.) and the missing null guard in LemmatizerME (2.) are the most important items to fix. The rest are suggestions for polish from my side.
- ThreadLocal: The PR description mentions that ThreadSafe*ME wrappers "leak in Jakarta EE / long-lived thread environments" due to ThreadLocal. This is better now, but not leak-free: in container environments with classloader isolation, any ThreadLocal holding objects from the app classloader can pin the classloader. Worth a Javadoc note on cleanup expectations.
- LemmatizerME.predictSES() missing null guard: POSTaggerME.tag(), ChunkerME.chunk(), and NameFinderME.find() all add if (seq == null) guards after model.bestSequence(), but LemmatizerME.predictSES() does not. Should be consistent.
- DefaultSDContextGenerator.collectFeatures() signature change is API-breaking. The protected method collectFeatures() now takes two additional parameters (List collectFeats, StringBuilder buf). The SentenceContextGenerator subclass is updated, but any external subclass would break. Since this is targeting 3.0.0, this is probably acceptable, but worth calling out in migration notes.
- DefaultPOSContextGenerator cache removal: Unlike ConfigurablePOSContextGenerator which moved to ThreadLocal caches, DefaultPOSContextGenerator simply removes the cache entirely. Why?
- CachedFeatureGenerator.DISABLE_CACHE_PROPERTY: Using a system property (opennlp.cache.disabled) as a global toggle is a bit coarse. It's read once at construction time, so at least it's not checked per-call. Acceptable for benchmarking purposes, but the property name is generic: consider opennlp.featuregen.cache.disabled to avoid confusion with other caches in the system.
- Removed toString() from CachedFeatureGenerator: The old toString() included cache hit/miss statistics. Since stats are gone, toString() was removed entirely. This is fine, but if anyone was logging these instances, they'll now get the default Object.toString().
- The main() convenience runners in each benchmark class set .forks(0), which runs benchmarks in the same JVM (no fork isolation). The class-level @Fork(2) annotation is correct for mvn exec:java invocations, but someone running main() directly will get non-isolated results. Consider a comment explaining this is for quick iteration only.
- The current JMH benchmarks train on the small bundled test corpora, which is fine for correctness and relative comparison. I'd love to see some JMH numbers from a run against a real-world dataset (e.g., the pre-trained en models from the OpenNLP website) to get a sense of the absolute throughput characteristics and how the speedup scales with larger, production-representative models. Maybe you can post a related JMH result?
I have also triggered an eval build: https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-configurable/41/
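As background to the null-guard point above: the "method-local processing, publish at the end" pattern the ME classes use could be sketched like this. This is an illustrative stand-in, not the actual POSTaggerME code; the field and method names are hypothetical.

```java
import java.util.Arrays;

// Hedged sketch of the volatile publish pattern: all work happens on
// method-local state, the volatile field is written once at the end
// (last-writer-wins for the legacy probs() accessor), and a guard
// handles the empty/absent-result case consistently.
public class VolatilePublishSketch {

    private volatile double[] lastProbs;   // kept only for legacy probs() compatibility

    public String[] tag(String[] tokens) {
        double[] localProbs = new double[tokens.length];  // method-local working state
        String[] outcomes = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            localProbs[i] = 1.0;                          // stand-in for real decoding
            outcomes[i] = "TAG";
        }
        if (outcomes.length == 0) {                       // guard analogue to the null checks
            lastProbs = new double[0];
            return new String[0];
        }
        lastProbs = localProbs;                           // single atomic publish at the end
        return outcomes;
    }

    /** Legacy-style accessor: last-writer-wins under concurrency. */
    public double[] probs() {
        double[] p = lastProbs;                           // read the volatile once into a local
        return p == null ? new double[0] : Arrays.copyOf(p, p.length);
    }
}
```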
|
OK - the tests didn't catch it, but there was one more part that needed extra thread-safety work:
I'll post the fix and describe the solution. |
Fix description
The automated tests still looked green, but there was one more thread-safety gap we needed to close.
The fix has been pushed to the PR. The important part for performance: we did not want to pay the full ThreadLocal cost in the common single-threaded case. |
…Local

Track ownership by Thread#threadId() (long) instead of holding a volatile Thread reference, matching OwnerOrPerThreadState. Holding a strong reference to a worker thread in a long-lived component pins the thread's context classloader in container environments (Jakarta EE), exactly the leak this class is designed to avoid. Worst case with ID-based tracking is that a recycled-id thread sees a stale ownerValue from a previous owner instead of null, which is no worse than the documented contract for get().

Also adds focused unit tests for both LastResultOwnerOrThreadLocal and OwnerOrPerThreadState (owner fast path, second-thread isolation, clear-and-reclaim, one-way multi-threaded transition, concurrent stress) and expands the class Javadoc to document the three thread-safety strategies used across the seven ME classes.
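The ownership idea in that commit can be sketched as follows. This is an illustration under stated assumptions: the class and field names are hypothetical (not the actual LastResultOwnerOrThreadLocal code), and Thread.getId() stands in for Thread#threadId() so the sketch compiles on older JDKs; both return the same numeric id.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hedged sketch: remember only the owner thread's numeric id (a long),
// never a strong Thread reference, so a long-lived holder cannot pin a
// worker thread or its context classloader. The first thread to write
// claims the fast path; any later thread falls back to a ThreadLocal.
public class OwnerIdSketch<T> {

    private static final long NO_OWNER = -1L;
    private final AtomicLong ownerId = new AtomicLong(NO_OWNER);
    private volatile T ownerValue;                 // fast path for the single-thread case
    private final ThreadLocal<T> fallback = new ThreadLocal<>();

    public void set(T value) {
        long me = Thread.currentThread().getId();  // real code: Thread#threadId() (Java 19+)
        long owner = ownerId.get();
        if (owner == me || (owner == NO_OWNER && ownerId.compareAndSet(NO_OWNER, me))) {
            ownerValue = value;                    // owner fast path: no ThreadLocal entry at all
        } else {
            fallback.set(value);                   // second thread onward: per-thread isolation
        }
    }

    public T get() {
        return ownerId.get() == Thread.currentThread().getId() ? ownerValue : fallback.get();
    }
}
```

As the commit notes, id recycling means a new thread with a reused id may observe the previous owner's value; that matches the documented last-result contract.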
…olish

- BeamSearch CacheState now stashes a per-thread tempScores buffer (length numOutcomes) instead of allocating a fresh double[] inside the inner beam loop on every cache hit/miss. Functionally equivalent; removes one allocation per beam step per token.
- BeamSearch.close() Javadoc now spells out the per-thread cleanup contract: a single close() releases only the calling thread's CacheState, not every per-thread slot held by long-lived BeamSearch instances shared across pool threads.
- POSTaggerME class Javadoc clarifies that sharing one tagger saves both memory and model load time (the dominant startup cost).
- CachedFeatureGenerator class Javadoc calls out that the "is this still the same sentence?" check uses reference identity (tokens == prevTokens), so a freshly allocated String[] with the same contents is treated as a new sentence and triggers a cache miss + clear.
|
@mawiesne @rzo1 — pushed two review-follow-up commits; the branch is now at the new tip.
The full local test suite passes. @rzo1 — would you mind re-triggering eval-tests-configurable against the new tip? |
rzo1
left a comment
Please double check
opennlp-core/opennlp-runtime/src/main/java/opennlp/tools/util/featuregen/POSTaggerNameFeatureGenerator.java:40-69
private String[] cachedTokens; // plain instance fields – no volatile, no ThreadLocal
private String[] cachedTags;
public void createFeatures(...) {
  if (!Arrays.equals(this.cachedTokens, toks)) {
    this.cachedTokens = toks;
    this.cachedTags = this.posTagger.tag(toks);
  }
  feats.add("pos=" + this.cachedTags[index]); // race + potential AIOOBE
}
NameFinderME is annotated @ThreadSafe, but any model trained with the shipped feature template opennlp/tools/namefind/ner-pos-features.xml (and ner-pos-features-v15.xml, ner-en_pos-features.xml) pulls this generator into the pipeline. Under concurrent find() calls:
- Thread A: cache miss, writes cachedTokens=toksA, cachedTags=tagsA (len=N).
- Thread B preempts: writes cachedTokens=toksB, cachedTags=tagsB (len=M where M<N).
- Thread A resumes: reads cachedTags[index] with index>=M → ArrayIndexOutOfBoundsException or wrong tag.
Could you check on DictionaryFeatureGenerator as well? It's most likely write-once, so not an actual issue, I guess.
NameFinderME:85-89 stores a fresh AdditionalContextFeatureGenerator per thread inside NameFinderState. That AFG has its own ThreadLocal<String[][]> (AdditionalContextFeatureGenerator:34) — so each AFG instance is only ever used by one thread, making its inner ThreadLocal pure overhead. Functionally fine, but worth simplifying to a plain field.
Have triggered the eval build, but I think the first one needs a fix.
The cachedTokens / cachedTags fields were plain instance fields, which raced under concurrent NameFinderME.find() calls when the enclosing NameFinderME (now @ThreadSafe) was shared across threads. With models trained from the shipped feature templates (ner-pos-features.xml, ner-pos-features-v15.xml, ner-en_pos-features.xml) this generator is on the find() critical path, so the race could either return wrong tags or, on length-mismatched interleavings, throw ArrayIndexOutOfBoundsException (thread A stashes a longer cachedTags, thread B replaces it with a shorter one, thread A reads cachedTags[index] past the new bounds).

Fix:
- Move the per-sentence cache into a per-thread CacheState held in a ThreadLocal. Each thread now sees its own cachedTokens/cachedTags pair and indexes into a tag array that always belongs to the same sentence it just tagged.
- Annotate the class @ThreadSafe and document the per-thread cache so the reader can see the contract at a glance.
- Preserve the original Arrays.equals(cachedTokens, toks) cache-hit semantics; the only change is that the cache is now per-thread.

Test: testConcurrentCreateFeaturesIsThreadSafe stress-tests the original failure shape: many threads, sentences of differing lengths, hundreds of iterations on one shared generator. Verified that this test fails on the unfixed class (1 error: AIOOBE inside createFeatures) and passes after the fix.
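The fix shape described above can be reduced to a small sketch. This is a stand-in generator, not the actual POSTaggerNameFeatureGenerator; the tag() body is faked so the example stays self-contained.

```java
import java.util.Arrays;

// Hedged sketch: the per-sentence tokens/tags cache becomes per-thread
// state held in a ThreadLocal, so no thread can ever index into a tag
// array produced for a different thread's (differently sized) sentence.
public class PerThreadSentenceCacheSketch {

    private static final class CacheState {
        String[] cachedTokens;
        String[] cachedTags;
    }

    private final ThreadLocal<CacheState> state = ThreadLocal.withInitial(CacheState::new);

    // Stand-in for the expensive posTagger.tag(tokens) call.
    private String[] tag(String[] tokens) {
        String[] tags = new String[tokens.length];
        Arrays.fill(tags, "NN");
        return tags;
    }

    public String featureAt(String[] tokens, int index) {
        CacheState s = state.get();                       // this thread's cache only
        if (!Arrays.equals(s.cachedTokens, tokens)) {     // same cache-hit semantics as before
            s.cachedTokens = tokens;
            s.cachedTags = tag(tokens);
        }
        return "pos=" + s.cachedTags[index];              // tags always match this sentence
    }
}
```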
…safe

The isg field was a plain (non-final, non-volatile) reference. In normal use it is set once at construction time via setDictionary() and never replaced, but both the constructor write and any later setDictionary() write needed a synchronization edge to be visible to other threads on the createFeatures() read side.

- Mark isg volatile so both the one-shot constructor write and any later setDictionary() call publish safely.
- Annotate the class @ThreadSafe; the underlying InSpanGenerator is already @ThreadSafe, so the delegating createFeatures() is now concurrent-safe.
- createFeatures() reads isg into a local before delegating, so it is immune to a setDictionary() racing with an in-flight call (the call finishes against whichever dictionary it observed first).
- Document that setDictionary() is intended for setup time, not the hot path: it does not coordinate with in-flight reads beyond the volatile publish, so callers swapping dictionaries while createFeatures() runs on other threads may observe either the old or the new dictionary's features.
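The volatile-plus-local-read pattern from that commit, reduced to a minimal stand-alone sketch (the delegate type is a stand-in for InSpanGenerator; names are illustrative):

```java
import java.util.function.UnaryOperator;

// Hedged sketch: a rarely written configuration reference is volatile,
// and the hot-path read copies it into a local exactly once, so an
// in-flight call finishes against whichever delegate it observed even
// if a setter races with it on another thread.
public class VolatileDelegateSketch {

    private volatile UnaryOperator<String> delegate;   // stand-in for the isg field

    public void setDelegate(UnaryOperator<String> d) { // setup-time write, safely published
        this.delegate = d;
    }

    public String createFeatures(String token) {
        UnaryOperator<String> d = delegate;            // read the volatile once into a local
        return d == null ? "" : d.apply(token);        // immune to a concurrent swap mid-call
    }
}
```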
Previously NameFinderME stored a ThreadLocal<NameFinderState>, where each per-thread NameFinderState held its own freshly allocated AdditionalContextFeatureGenerator (AFG). AFG itself keeps the per-thread additional-context array via its own ThreadLocal<String[][]>, so each per-thread AFG instance was only ever touched by one thread, making the inner ThreadLocal pure overhead and the outer per-thread allocation redundant.

Refactor:
- Replace ThreadLocal<NameFinderState> with one shared AdditionalContextFeatureGenerator field on NameFinderME (one per instance, not per thread). The AFG's existing internal ThreadLocal handles per-thread context just as before, with no nesting.
- Replace the bestSequence slot in NameFinderState with LastResultOwnerOrThreadLocal<Sequence>, matching the pattern POSTaggerME / SentenceDetectorME / TokenizerME already use. This gives single-threaded, short-lived NameFinderME instances the owner fast path (no ThreadLocal map entry at all until a second thread shows up) and keeps multi-thread callers correct.
- Drop the anonymous AdaptiveFeatureGenerator wrapper that delegated each call to the per-thread AFG; the WindowFeatureGenerator now wraps the shared AFG directly.
- AdditionalContextFeatureGenerator: add clearForCurrentThread() so NameFinderME.clearThreadLocalState() can also release the AFG's per-thread slot, completing the per-thread cleanup contract used elsewhere in the PR.
- NameFinderME.clearThreadLocalState() Javadoc rewritten to spell out that this is a per-thread, not per-instance, operation (same lifecycle contract as on the other ME classes) and that it does not reach into BeamSearch or other feature-generator per-thread state.
|
@mawiesne @rzo1 — thanks for the review; all three concerns are addressed in three follow-up commits, and the branch tip has been updated.
|
rzo1
left a comment
I have triggered the eval build from the current branch head (again).
From my side, the changes look OK, although I think this cannot be backported to 2.x due to usage of Thread API introduced in Java 17+.
Since this is a substantial contribution, it might require an ICLA before we can move forward. WDYT @jzonthemtn ?
atarora
left a comment
Thanks @krickert for the great contribution and the thorough iteration, and to @rzo1 and @mawiesne for the detailed reviews that made this stronger.
Looks good to me too! :)
Before we merge:
- echoing @rzo1's ICLA question
- a doc update would be great, especially around the ThreadSafe*ME deprecation and the new shared-usage pattern
|
@atarora let me know where you'd want the documentation update - I can work on that too. |
Perhaps just update the existing docs within this PR - so we are complete :) |
- Replace the outdated "NameFinderME is not thread safe" paragraph with positive guidance for sharing a single instance across threads. - Add a one-line thread-safety note next to each *ME constructor in the per-component docs (POSTaggerME, ChunkerME, SentenceDetectorME, TokenizerME, LemmatizerME, NameFinderME), and note that the legacy ThreadSafe*ME wrappers are retained-but-deprecated. - Clarify that POSTaggerME.probs() and ChunkerME.probs() are now per-thread "last result" calls when the tagger is shared. - Bump the model-loading.xml note from "Java 17+" to "Java 21+ (the minimum supported version since OpenNLP 3.0.0)". - Add a short "Thread safety" subsection to the README's "Migrating from 2.x to 3.x" block.
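The "share a single instance across threads" guidance in that doc update boils down to the following shape. The Analyzer class here is a stand-in so the sketch stays self-contained; with OpenNLP 3.x the same pattern would apply to one shared POSTaggerME or TokenizerME.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hedged sketch of the documented usage pattern: construct one (now
// thread-safe) component up front and hand the same reference to every
// worker, instead of one instance per call or per thread.
public class SharedInstanceUsageSketch {

    static final class Analyzer {                        // stand-in for a thread-safe ME class
        String process(String sentence) { return sentence.trim().toLowerCase(); }
    }

    public static void main(String[] args) throws Exception {
        Analyzer shared = new Analyzer();                // built once, shared by all threads
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> futures = new ArrayList<>();
        for (String s : List.of(" One ", " Two ", " Three ")) {
            Callable<String> task = () -> shared.process(s); // every worker uses the same instance
            futures.add(pool.submit(task));
        }
        for (Future<String> f : futures) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```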
Updated the documentation and got a receipt for the ICLA. So pending further reviews, we're good to go. |
|
Thanks @krickert - this is now in good shape, and the 3.x line will include it. Happy to see additional contributions from your side! |
|
@rzo1 thanks!! So happy to see it made it! Can you get the ICLA request going for me? I would love to get my name on the contributors list that the ASF maintains. I'm certainly going to add some more bells and whistles. Looking forward to the next 3.x-M release. |
Summary
Make ME classes (TokenizerME, SentenceDetectorME, POSTaggerME, LemmatizerME) safe for concurrent use by eliminating shared mutable instance state. This enables reusing ME instances across threads instead of allocating a new instance per call, reducing allocation overhead in high-throughput pipelines.
The old pattern (new TokenizerME(model) per call) continues to work identically — zero regressions in correctness or performance.

Motivation
ME classes were documented as not thread-safe due to mutable instance fields (bestSequence, tokProbs, newTokens, sentProbs) that corrupt under concurrent access. The recommended workaround was either creating a new ME instance per call (expensive for high-throughput pipelines processing thousands of sentences in parallel) or using the ThreadSafe*ME wrappers (which use ThreadLocal and leak in Jakarta EE / long-running thread environments).

The root cause was mutable state at four layers:
- ME classes: instance fields (bestSequence, tokProbs, newTokens, sentProbs)
- Context generators: contextsCache, wordsKey, buf, collectFeats
- CachedFeatureGenerator: mutable prevTokens and cache
- BeamSearch: probs[] output buffer and a contextsCache that stored references to the reused buffer (cached values were always stale)

Approach
Move mutable state to method-local variables at every layer. ME instance fields are preserved as volatile for backward-compatible probs() access (last-writer-wins under concurrency). Caches are removed entirely — they were small (typically size 3), not thread-safe, and in BeamSearch's case, buggy.

Files changed (10 source, 5 test)
- BeamSearch.java: removed shared probs[] and buggy contextsCache; added @ThreadSafe
- DefaultSDContextGenerator.java: buf/collectFeats moved to method-local; collectFeatures() signature updated
- SentenceContextGenerator.java (Thai): updated collectFeatures() signature
- DefaultPOSContextGenerator.java: removed contextsCache and wordsKey
- ConfigurablePOSContextGenerator.java: removed contextsCache and wordsKey
- CachedFeatureGenerator.java: removed prevTokens, contextsCache, counters; delegates directly
- TokenizerME.java: newTokens/tokProbs volatile; tokenizePos() uses local lists
- SentenceDetectorME.java: sentProbs volatile; sentPosDetect() uses local list
- POSTaggerME.java: bestSequence volatile; tag() uses local var; added null guard
- LemmatizerME.java: bestSequence volatile; predictSES() uses local var

Backward compatibility
- Old usage pattern (new ME(model) per call) is unchanged — verified by regression benchmark
- probs() methods preserved (deprecated behavior under concurrency, correct single-threaded)
- cacheSize params accepted but ignored, marked @Deprecated(since = "3.0.0")

Test plan
- All tests pass (mvn test on opennlp-runtime)
- ThreadSafetyBenchmarkTest — JUnit correctness test: shared ME instances produce identical results to single-threaded baseline across all CPU cores
- RegressionBenchmark — head-to-head stock vs patched, new-instance-per-call only: zero mismatches, zero errors, performance within noise on both builds
- ThreadSafetyBenchmark — three-way comparison (new-instance-per-call / instance-per-thread / shared-single-instance)
- CachedFeatureGeneratorTest — updated for removed cache behavior
- mvn clean install at root (checkstyle must be skipped — 9,446 pre-existing violations on main)
Proves zero regression — stock vs patched, same API pattern:
Speedup benchmark results (32 threads, three-way comparison)
Approaches
The benchmark compares three strategies for using ME classes in a multi-threaded environment. All three produce identical output for a given input — the difference is how ME instances are allocated and shared.
1. new-instance-per-call: String[] tags = new POSTaggerME(model).tag(tokens);
2. instance-per-thread: POSTaggerME tagger = new POSTaggerME(model); for (String[] t : sentences) tagger.tag(t);
3. shared-single-instance: POSTaggerME shared = new POSTaggerME(model); // pass shared to all threads
POSTagger sees the largest gain because its constructor is the heaviest — it builds a BeamSearch, a ConfigurablePOSContextGenerator, and a full AdaptiveFeatureGenerator chain on every instantiation. Reusing one instance per thread eliminates that allocation on every call, yielding a 1.67x speedup with zero correctness impact.
Tokenizer and SentenceDetector constructors are lighter, so the per-call overhead is smaller and all three approaches perform similarly.
See opennlp-core/opennlp-runtime/BENCHMARKS.md for full benchmark instructions.

Thank you for contributing to Apache OpenNLP.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
https://issues.apache.org/jira/browse/OPENNLP-1816
Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically main)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.