[issue #11796] throw NPE when readEntry #11808
Closed
casuallc wants to merge 211 commits into apache:master from casuallc:branch-2.8
Conversation
#10861) This reverts commit 4264a67. ### Motivation The change #8796 has broken Pulsar Functions running on Kubernetes. The Pulsar Functions Kubernetes runtime generates a secret and mounts it with mode `256`, which means the secret is readable only by its owning user. The StatefulSet created by the Kubernetes runtime mounts the secrets under the `root` user, so only the root user can read the secret. As a result, any submitted function fails to read the authentication information. Because all the Kubernetes resources generated by the Kubernetes runtime are hardcoded, there is no easy way to change the security context for the function StatefulSets. Let's revert this change for 2.8.0 until we can address the issues in the Kubernetes runtime. (cherry picked from commit 4f556a2)
(cherry picked from commit d93b775)
…ss brokers (#10862) * Added metadata cache test to simulate a multi-broker cache * Fix create and delete ops on the cache 1. During create we should add a watch for the path in ZooKeeper. Without this we will not be notified if the znode is changed on another broker. 2. Similarly, when deleting, the cache should be invalidated, but we shouldn't add an entry to the cache: the entry could get added again on some other broker, in which case we need to fetch it from ZooKeeper, and an empty cached entry prevents that. * Address review comments. Also add a small delay in the test to allow notifications to propagate to the other caches; without it the tests occasionally fail. Co-authored-by: Surinder Singh <surinders@splunk.com> (cherry picked from commit 798b34f)
Fixes #10882 ### Motivation CmdSink and CmdSource use `gson` to parse the JSON configs from pulsar-admin, but most connectors use ObjectMapper to serde the config into the actual class. `gson` converts int/long values into floats by default, so ObjectMapper cannot parse the resulting float string back into an int/long correctly. ### Modifications Use ObjectMapper to parse the sink/source config. (cherry picked from commit 2c9ea81)
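The int-to-float coercion described in that fix can be reproduced without either library: gson's default untyped binding turns every JSON number into a `Double`, and the stringified `Double` then fails strict integer binding. A minimal sketch of the failure mode (the field value `100` is illustrative):

```java
public class NumberCoercionDemo {
    public static void main(String[] args) {
        // gson binds untyped JSON numbers to Double, so a config field
        // written as 100 in JSON comes back as the double 100.0
        double gsonStyleValue = 100.0;
        String reserialized = String.valueOf(gsonStyleValue); // "100.0"

        // Strict int/long binding (what ObjectMapper does for a long
        // field) rejects the float-formatted string
        boolean bindableAsLong;
        try {
            Long.parseLong(reserialized);
            bindableAsLong = true;
        } catch (NumberFormatException e) {
            bindableAsLong = false;
        }
        System.out.println(reserialized + " bindable as long: " + bindableAsLong);
    }
}
```

Parsing the config with ObjectMapper end to end avoids the intermediate `Double` representation entirely.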
*Motivation* Generating metrics is an expensive task and it can take more than 30 seconds if you have close to or more than a million topics. *Modification* This change provides a setting to adjust the async context timeout. (cherry picked from commit c75d45b)
(cherry picked from commit 3de089c)
(cherry picked from commit cf50da3)
Fixes #10906 Also addresses CVE-2021-28169 (cherry picked from commit 6c03154)
- Zookeeper 3.6.2 gets flagged as vulnerable https://ossindex.sonatype.org/component/pkg:maven/org.apache.zookeeper/zookeeper@3.6.2 because of using vulnerable Netty version (cherry picked from commit aa36eb8)
…r and make SchemaInfo an interface (#10878) ### Motivation The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code. The module should only have the following dependencies: 1. pulsar-io-core 2. pulsar-functions-api 3. pulsar-client-api 4. slf4j-api 5. log4j-slf4j-impl 6. log4j-api 7. log4j-core ### Modifications Change the dependency pulsar-client-original to pulsar-client-api. Slight changes in the top-level pom for what is included in all sub-modules, so that additional deps don't land in java-instance.jar. There is also a fix for an issue introduced by #9673: the thread context class loader was set incorrectly in ThreadRuntime. ### Future improvements 1. We should also add a test in the future to make sure external libraries don't get accidentally added to this module and java-instance.jar. 2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better; the current name can be confusing. (cherry picked from commit d81b5f8)
* Use brace initializer to initialize configurations * Add config tests
### Motivation When broker entry metadata is enabled, using pulsar-admin to peek messages will fail with ``` [pulsar-web-29-16] ERROR org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] Failed to peek message at position 1 from persistent://prop/ns-abc/topic-b604aad8ea8010af my-sub java.lang.IllegalArgumentException: Invalid unknonwn tag type: 6 at org.apache.pulsar.common.api.proto.LightProtoCodec.skipUnknownField(LightProtoCodec.java:270) ~[classes/:?] at org.apache.pulsar.common.api.proto.MessageMetadata.parseFrom(MessageMetadata.java:1370) ~[classes/:?] at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:426) ~[classes/:?] ``` ### Modifications Skip the broker entry metadata if exists in `generateResponseWithEntry` and add the test to verify it. (cherry picked from commit 7603a9d)
(cherry picked from commit 7417ca8)
fixes testExpiredLedgerDeletionAfterManagedLedgerRestart (cherry picked from commit 1623790)
When admin.topics().getDelayedDeliveryPolicy(topicName) is executed, topicPolicies may not be initialized yet. This problem occurs when only a single unit test is executed; when the entire suite runs it rarely appears, because other unit tests initialize the TopicPolicies under the namespace. (cherry picked from commit 96b293a)
* Do not expose meaningless stats for consumer. Currently we expose some meaningless consumer stats to users, such as ``` addressLength addressOffset connectedSinceOffset connectedSinceLength clientVersionOffset clientVersionLength stringBuffer ``` None of these stats are meant for users; they are only used internally. So remove them from the exposed consumer stats. * remove line (cherry picked from commit ae03e51)
## Motivation Fix a backlog issue with --precise-backlog=true. When `managedLedger` finishes creating a new `ledger`, if `markDelete` points at the `previousLedger`'s LAC, the `previousLedger` is deleted from the `managedLedger`. When computing the backlog we used a closed range for `getNumberOfEntries` minus 1; if the `previousLedger` no longer exists we get the wrong number. ## Implementation Use range.openClose() in `getBacklog`. ### Verifying this change Added tests for it. (cherry picked from commit e3a97ee)
### Motivation Support loading `awsCredentialPluginParam` from secrets for kinesis connectors to avoid leaking of sensitive information with best efforts. ### Modifications Load kinesis configs with `IOConfigUtils.loadWithSecrets()`. ### Verifying this change This change is already covered by existing tests and this PR also adds tests to confirm both loading credentials from plaintext config object and secrets work as expected. (cherry picked from commit cb79032)
### Motivation If we set enableRetry=true, the client will subscribe to the retry topic. If we set allowAutoTopicCreation=false, the client will receive `Topic dose not exists`, which means that the retry topic does not exist. This confuses users: my topic clearly exists, why am I still getting this prompt? Therefore, improve the message to help users troubleshoot. ### Modifications Add the topic name to the returned message. (cherry picked from commit e72a0d8)
Fixes # throwable exception not thrown. ### Motivation I found some throwable exceptions that are never actually thrown; this may be a bug. (cherry picked from commit e0988f7)
…ting the number of partitions (#10910) Fixes #10673 ### Motivation When updating the number of partitions, we need to update the data in two places in ZooKeeper: ``` /admin/partitioned-topics /managed-ledgers/ ``` Currently we only update the number of partitions in `/admin/partitioned-topics`, so if no Producer or Consumer is created, the data obtained in another cluster will be incorrect. ### Modifications 1) Try to create the managedLedger when updating the number of partitions. 2) Ensure that the number of partitions in `/admin/partitioned-topics` is updated every time. (cherry picked from commit 202da11)
If a topic has only non-durable subscriptions and no durable subscriptions, and the non-durable subscription reaches the end of the topic, we get an estimated backlog size of 0, so compaction will never be triggered. The expected behavior is that, with no durable subscriptions, we should use the total size to trigger compaction. (cherry picked from commit 797cb12)
#11691) * Fixed block-forever bug in Consumer.batchReceive. Ensure that all poll() calls to pendingBatchReceives are done within the pinnedInternalExecutor, to avoid a race condition where a peek and a subsequent poll get different pending receives. Moved the pinnedInternalExecutor into ConsumerBase, as both ConsumerImpl and MultiTopicsConsumerImpl require it. failingPendingReceive() now always submits its work to the internal executor, returning a CompletableFuture, and all callers treat it as an asynchronous operation. * Fix broken MultiTopicsConsumerImplTest: needed a real executor service to run the failPendingReceive() method. * Ensure all calls to messageReceived happen on the internal executor * Re-add missing return statement in ConsumerImpl.closeAsync() * Ensure correct usage of consumer internal executors: the externalPinnedExecutor is only used for user code and the internalPinnedExecutor for internal tasks. Some test refactoring to manage creation of executors. (cherry picked from commit bd942e1)
…11741) Currently, on an add-entry failure, producer.disconnect() is called multiple times while the producer is disconnecting, which adds many disconnect-producer tasks to the EventLoop. 1. Added an isDisconnecting state for the producer; if the producer is in the isDisconnecting state, skip the disconnect operation. 2. Create the new future list only if the topic has producers, to reduce heap allocation. Added a test covering disconnecting the producer multiple times while the EventLoop executes the disconnect only once. (cherry picked from commit 49c0796)
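The duplicate-disconnect suppression described above can be sketched with a compare-and-set guard. This is a minimal illustration, not Pulsar's actual producer API; the class and field names are hypothetical, and the counter stands in for `eventLoop.execute(...)`:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class DisconnectGuardDemo {
    private final AtomicBoolean disconnecting = new AtomicBoolean(false);
    final AtomicInteger tasksQueued = new AtomicInteger();

    void disconnect() {
        // Only the first caller wins the CAS; repeated add-entry failures
        // therefore queue a single disconnect task instead of many
        if (disconnecting.compareAndSet(false, true)) {
            tasksQueued.incrementAndGet(); // stand-in for eventLoop.execute(...)
        }
    }

    public static void main(String[] args) {
        DisconnectGuardDemo producer = new DisconnectGuardDemo();
        for (int i = 0; i < 5; i++) {
            producer.disconnect();       // five failures arrive in a burst
        }
        System.out.println(producer.tasksQueued.get()); // prints 1
    }
}
```

The flag would be reset once the disconnect completes, allowing a later reconnect/disconnect cycle.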
* Fix the topic stuck in fenced state and unable to recover. Here is the log when the issue happens. The producer continues to reconnect to the broker, but the fenced state of the topic remains true. ``` 19:01:42.351 [pulsar-io-4-1] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.24.34.151:48052][persistent://public/default/test-8] Creating producer. producerId=8 19:01:42.352 [Thread-174681] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.24.34.151:48052] persistent://public/default/test-8 configured with schema false 19:01:42.352 [Thread-174681] WARN org.apache.pulsar.broker.service.AbstractTopic - [persistent://public/default/test-8] Attempting to add producer to a fenced topic 19:01:42.352 [Thread-174681] ERROR org.apache.pulsar.broker.service.ServerCnx - [/10.24.34.151:48052] Failed to add producer to topic persistent://public/default/test-8: producerId=8, org.apache.pulsar.broker.service.BrokerServiceException$TopicFencedException: Topic is temporarily unavailable ``` After checking the broker's heap dump, `pendingWriteOps` is 5; this is why the topic cannot recover from the fenced state. The topic changes to unfenced only when `pendingWriteOps` reaches 0; details can be found at [PersistentTopic.decrementPendingWriteOpsAndCheck()](https://github.com/apache/pulsar/blob/794aa20d9f2a4e668cc36465362d22e042e6e536/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L463). But checking the ML state of the topic shows `pendingAddEntries` is 0, which does not equal the topic's `pendingWriteOps`.
The root cause is that we poll add-entry ops from `pendingAddEntries` in multiple threads: one is the ZK callback thread when ledger creation completes (https://github.com/apache/pulsar/blob/794aa20d9f2a4e668cc36465362d22e042e6e536/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1406, https://github.com/apache/pulsar/blob/794aa20d9f2a4e668cc36465362d22e042e6e536/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1669), another is the ML worker thread when the add-entry op completes (https://github.com/apache/pulsar/blob/794aa20d9f2a4e668cc36465362d22e042e6e536/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java#L181). After the ongoing add-entry op completes, its corresponding op might already have been polled by the `clearPendingAddEntries` method, so it polls another one; but because the polled op does not equal the current op, it never gets a chance to be failed, and `pendingWriteOps` never drops to 0. I have attached the complete logs for the topic. The fix is to complete the add-entry op with a ManagedLedgerException if the polled op is not equal to the current op. * Release buffer. * Revert (cherry picked from commit 1bcbab0)
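The fix described above can be sketched generically: when the completing thread polls the head of the pending queue and finds an op that is not its own (because `clearPendingAddEntries` already removed that op on another thread), the polled op must be failed rather than silently dropped, so the callback that decrements the pending-write counter still runs. This is an illustrative model, not Pulsar's actual `OpAddEntry` code:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PendingOpDemo {
    static class Op {
        boolean failed;
        // In the real code this would invoke the completion callback with a
        // ManagedLedgerException, decrementing the topic's pendingWriteOps
        void fail() { failed = true; }
    }

    public static void main(String[] args) {
        Queue<Op> pendingAddEntries = new ConcurrentLinkedQueue<>();
        Op current = new Op();   // the op this thread just completed
        Op other = new Op();     // a later op still in the queue
        // Simulate the race: `current` was already removed by another
        // thread, so only `other` is left at the head of the queue
        pendingAddEntries.add(other);

        Op polled = pendingAddEntries.poll();
        if (polled != null && polled != current) {
            // Previously the mismatched op was dropped here, leaving its
            // callback unfired; failing it lets the fenced topic recover
            polled.fail();
        }
        System.out.println(other.failed); // prints true
    }
}
```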
…ned by an existed topic as a part (#11686) Fixes #11685 validatePartitionTopicUpdate used contains() to check whether an existing topic conflicts, which makes the update fail whenever an existing topic name contains the new topic's prefix. For example, with the topics "persistent://public/default/113p-partition-0", "persistent://public/default/113p-partition-1", "persistent://public/default/113p-partition-2", and "persistent://public/default/3p-partition-0", updating topic 3p to more partitions fails because "persistent://public/default/113p-partition-0" contains "3p-partition". Modifications: use startsWith to check whether the same topic exists. * Fix the bug where an old topic containing the same substring prevented adding new partitions * Add a test for updating a partitioned topic whose name is contained in an old topic Co-authored-by: nicklixinyang <nicklixinyang@didiglobal.com> (cherry picked from commit 241de4b)
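The false conflict can be demonstrated directly with the topic names from the issue: `contains()` reports a match between 113p and 3p, while comparing against the full topic-name prefix does not. A minimal sketch (the exact strings checked by the real validator may differ):

```java
public class TopicPrefixCheckDemo {
    public static void main(String[] args) {
        String existing = "persistent://public/default/113p-partition-0";
        String updated = "persistent://public/default/3p";

        // The old check: any existing partition name containing the new
        // topic's partition prefix was treated as a conflict
        boolean oldCheck = existing.contains("3p-partition");

        // The fixed check: an existing partition conflicts only if its
        // name starts with the updated topic's full partition prefix
        boolean newCheck = existing.startsWith(updated + "-partition");

        System.out.println(oldCheck + " " + newCheck); // prints "true false"
    }
}
```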
### Motivation The producer doesn't release the memory when a message send times out. ### Modifications When a message times out, release its memory too. ### Documentation We don't need to update docs, because it's a bug fix. ### Reproducing the problem - Start the Pulsar server - Start a Pulsar producer that sends messages for 10 minutes (queueSize 0, memory limit 50MB) - Close the Pulsar server - Wait for the 10 minutes to pass - Inspect the heap dump: it shows pendingMessages size is 0, but memory used is 52429824 (cherry picked from commit 1399eeb)
### Motivation Under certain conditions applications using the multi-topic consumers might get the consumption stalled: The conditions to reproduce the issue are: * Consumer is subscribed to multiple topics, but only 1 topic has traffic * Messages are published in batches (no repro if no batches) * Receiver queue size == 1 (or small, in order to exercise race condition) The problem is that there is race condition between 2 threads when we're deciding to put one of the individual consumers in "paused" state, when the shared queue is full. What happens is that, just after we checked the conditions and we decide to mark the consumer as paused, the application has emptied the shared queue completely. From that point on, there is no re-attempt to check whether we need to unblock that consumer. ### Modification Instead of introducing a sync block (contended by many consumers), we just double check the state of the shared queue after marking the consumer as "paused". If the other thread has emptied the queue in the meantime, we'll be guaranteed to unblock the consumer. (cherry picked from commit f1d66d1)
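The double-check described above can be sketched as follows. The names are hypothetical (the real logic lives in the multi-topic consumer's receive path), and a capacity-1 queue stands in for the shared receiver queue:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class PauseDoubleCheckDemo {
    static final BlockingQueue<String> sharedQueue = new ArrayBlockingQueue<>(1);
    static final AtomicBoolean paused = new AtomicBoolean(false);

    static void maybePause() {
        if (sharedQueue.remainingCapacity() == 0) {
            paused.set(true);
            // Double check: the application thread may have drained the
            // queue between the check above and the pause; without this
            // re-check nothing would ever unpause the consumer and
            // consumption would stall
            if (sharedQueue.remainingCapacity() > 0) {
                paused.set(false);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        sharedQueue.put("msg");
        maybePause();
        System.out.println(paused.get()); // queue still full: prints true

        sharedQueue.clear();
        paused.set(false);
        maybePause();                      // queue has room: no pause
        System.out.println(paused.get()); // prints false
    }
}
```

The cheap re-check avoids a synchronized block that every individual consumer would otherwise contend on.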
Avoid issue due to async update by using Awaitility Co-authored-by: Surinder Singh <surinders@splunk.com> (cherry picked from commit 4cf1008)
…ug (#11769) ### Motivation The license check fails on branch-2.8. ``` com.squareup.okhttp-okhttp-2.7.4.jar unaccounted for in LICENSE It looks like there are issues with the LICENSE/NOTICE. ``` ### Modification Add com.squareup.okhttp-okhttp-2.7.4.jar to LICENSE.bin.txt
* [Security] Use ubuntu:20.04 base image for Pulsar docker images
- Ubuntu fixes critical and high security vulnerabilities.
- openjdk:11-jdk-slim/openjdk:11-jdk images are based on
Debian 10 which contains a lot of unfixed vulnerabilities.
- this causes the Pulsar docker images to get flagged by image vulnerability scanners such as Clair
* Install Ubuntu updates
* Set DEBIAN_FRONTEND=noninteractive so that dist-upgrade doesn't wait for input
* Set JAVA_HOME
* Fix configuring networkaddress.cache.ttl
- JAVA_HOME set by ENV isn't available for RUN commands
* Configure networkaddress.cache.ttl after installing OpenJDK
(cherry picked from commit f989512)
…r upgrade to 0.20 (#11792) ### Motivation Because aircompressor 0.19 can't work with heap buffers on JDK 1.8, #11594 copied data into a direct buffer to avoid a NoSuchMethodError exception. Now aircompressor has released 0.20 and #11790 has upgraded the aircompressor version to 0.20 to fix this issue, so we can revert #11594 and avoid copying data into a direct buffer, improving performance. ### Modification 1. Revert "Fix java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer when enabling topic metadata compression" (#11594), but keep the tests. (cherry picked from commit cc1b983)
Member
Thanks for your contribution. For this PR, do we need to update docs? (The PR template contains info about doc, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)
Contributor
Author
No need.
Contributor
Why do you try to merge to branch-2.8 and not master? A PR that is not merged to the master branch won't trigger any CI workflow.
BewareMyPower
requested changes
Aug 27, 2021
Contributor
BewareMyPower
left a comment
Block this PR first because it's merged to branch-2.8
#11796
Check if OpReadEntity is created successfully.