
[issue #11796] throw NPE when readEntry#11808

Closed
casuallc wants to merge 211 commits into apache:master from casuallc:branch-2.8

Conversation

@casuallc
Contributor

#11796
Check whether the OpReadEntry is created successfully before using it.
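
The actual patch isn't quoted in this thread, so here is a rough, hypothetical illustration of the idea only — the `ReadOp` type and method names below are invented, not the managed-ledger API: verify the read operation object was created before using it, and fail the read cleanly instead of letting a later dereference throw an NPE.

```java
import java.util.concurrent.CompletableFuture;

class ReadEntryGuardSketch {

    interface ReadOp {
        byte[] read();
    }

    static CompletableFuture<byte[]> readEntry(ReadOp op) {
        CompletableFuture<byte[]> future = new CompletableFuture<>();
        if (op == null) {
            // Fail fast with a descriptive error instead of an NPE deeper in the read path.
            future.completeExceptionally(
                    new IllegalStateException("Failed to create the read operation"));
            return future;
        }
        future.complete(op.read());
        return future;
    }
}
```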

codelipenghui and others added 30 commits June 7, 2021 14:10
…10849)

This reverts commit 6e747ec.

### Motivation
The problem in #10798  is more about "epoll" and not about tc-native.
With the adoptopenjdk/openjdk15:alpine-slim Docker image the problem does not reproduce, so it is better to stick to 2.0.38.Final

(cherry picked from commit 416ee72)
#10861)

This reverts commit 4264a67.

### Motivation

The change #8796 has broken the Pulsar
Functions running on Kubernetes.

The Pulsar Functions Kubernetes runtime generates a secret and mounts it
using mode `256`. That means the secret can only be read by its owner.
The StatefulSet created by the Kubernetes runtime mounts the secrets under the
`root` user, so only the root user is able to read the secret. As a result,
any submitted function fails to read the authentication
information.

Because all the Kubernetes resources generated by the Kubernetes runtime
are hardcoded, there is no easy way to change the security context for the
function StatefulSets.

Let's revert this change for 2.8.0, until we can address the issues in the Kubernetes runtime.

(cherry picked from commit 4f556a2)
…ss brokers (#10862)

* Added metadata cache test to simulate multi broker cache

* fix create and delete ops on cache

1. During create we should add a watch for the path in ZooKeeper. Without this
we will not be notified if the znode is changed on another broker.

2. Similarly, when deleting, the cache should be invalidated, but we shouldn't add an
entry to the cache. The entry could get added again on some other broker; in that
case we need to go and fetch the entry from ZooKeeper. Adding an empty
entry to the cache prevents that. (A minimal sketch of these rules follows below.)

* Address review comments

Also add a small delay in the test to allow notifications to propagate to other
caches. Without this the tests occasionally fail.
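
A minimal sketch of the two caching rules above, assuming a plain map in place of the real metadata cache and a `Runnable` in place of a ZooKeeper watch; the names are illustrative, not the actual Pulsar MetadataCache API.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class MetadataCacheSketch<T> {
    private final ConcurrentMap<String, T> cache = new ConcurrentHashMap<>();

    void onCreate(String path, T value, Runnable registerWatch) {
        cache.put(path, value);
        // Register a watch so changes made by another broker invalidate this entry.
        registerWatch.run();
    }

    void onDelete(String path) {
        // Only invalidate; do NOT cache an "empty" entry, because another broker may
        // recreate the znode and we would then need to re-fetch it from ZooKeeper.
        cache.remove(path);
    }

    Optional<T> get(String path) {
        return Optional.ofNullable(cache.get(path));
    }
}
```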

Co-authored-by: Surinder Singh <surinders@splunk.com>
(cherry picked from commit 798b34f)
…created (#10876)

* Always let system topic TRANSACTION_BUFFER_SNAPSHOT be auto created

* Remove unused import

* Add EVENTS_TOPIC_NAMES set; add unit tests

(cherry picked from commit b1b3b3c)
Fixes #10882

### Motivation

CmdSink and CmdSource use `gson` to parse the JSON configs from pulsar-admin, but most connectors use ObjectMapper to serde the config into the actual class. `gson` also converts int/long values into floats by default, so ObjectMapper cannot parse the resulting float strings back into int/long correctly.
 
### Modifications

use ObjectMapper to parse sink/source config.
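
A self-contained illustration of the motivation (not code from this PR): Gson turns JSON integers into `Double`s when deserializing into a generic `Map`, while Jackson keeps them as integers, so downstream int/long fields bind correctly.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.gson.Gson;
import java.util.Map;

public class ConfigParsingDemo {
    public static void main(String[] args) throws Exception {
        String json = "{\"batchSize\": 100}";

        Map<?, ?> viaGson = new Gson().fromJson(json, Map.class);
        System.out.println(viaGson.get("batchSize"));    // 100.0 (a Double)

        Map<?, ?> viaJackson = new ObjectMapper().readValue(json, Map.class);
        System.out.println(viaJackson.get("batchSize")); // 100 (an Integer)
    }
}
```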

(cherry picked from commit 2c9ea81)
*Motivation*

Generating metrics is an expensive task and it can take more than 30 seconds
if you have close to or more than a million topics.

*Modification*

This change provides a setting to adjust the async context timeout.
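
The name of the new setting is not quoted in this thread; conceptually, the metrics endpoint applies the configured value to its servlet `AsyncContext`, roughly as in this sketch (the surrounding class and parameter names are illustrative).

```java
import javax.servlet.AsyncContext;
import javax.servlet.http.HttpServletRequest;

class MetricsTimeoutSketch {
    void startAsyncWithTimeout(HttpServletRequest request, long configuredTimeoutMs) {
        AsyncContext ctx = request.startAsync();
        // The default 30s is not enough when generating metrics for ~1M topics,
        // so let operators raise the timeout via configuration.
        ctx.setTimeout(configuredTimeoutMs);
    }
}
```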

(cherry picked from commit c75d45b)
…0888)

* Make KeyValueSchema an interface visible in the public Schema API
- allow users of pulsar-client-api to use KeyValueSchema
- move KeyValueSchema implementation to KeyValueSchemaImpl
- introduce a new interface KeyValueSchema

(cherry picked from commit 18f2f4a)
Fixes #10906

Also addresses CVE-2021-28169

(cherry picked from commit 6c03154)
- Zookeeper 3.6.2 gets flagged as vulnerable
  https://ossindex.sonatype.org/component/pkg:maven/org.apache.zookeeper/zookeeper@3.6.2
  because of using vulnerable Netty version

(cherry picked from commit aa36eb8)
…r and make SchemaInfo an interface (#10878)

### Motivation

The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code. The module should only have the following dependencies
    1. pulsar-io-core
    2. pulsar-functions-api
    3. pulsar-client-api
    4. slf4j-api
    5. log4j-slf4j-impl
    6. log4j-api
    7. log4j-core


### Modifications

Change dep pulsar-client-original to pulsar-client-api

Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar

There is also a fix for an issue introduced by #9673. The thread context class loader was set incorrectly in ThreadRuntime.
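
The exact ThreadRuntime change is not shown here; the standard pattern it relies on is sketched below — set the function's class loader as the thread context class loader while invoking user code, and restore the previous one afterwards (names are illustrative).

```java
import java.util.function.Supplier;

final class ContextClassLoaderSketch {
    static <T> T withClassLoader(ClassLoader functionClassLoader, Supplier<T> userCode) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(functionClassLoader);
        try {
            return userCode.get();
        } finally {
            // Always restore, otherwise later tasks on this thread see the wrong loader.
            current.setContextClassLoader(previous);
        }
    }
}
```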

### Future improvements

1. We should also add a test in the future to make sure external libraries don't accidentally get added to this module and java-instance.jar

2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better.  The current name can be confusing


(cherry picked from commit d81b5f8)
* Use brace initializer to initialize configurations

* Add config tests
### Motivation

When broker entry metadata is enabled, using pulsar-admin to peek messages will fail with

```
[pulsar-web-29-16] ERROR org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] Failed to peek message at position 1 from persistent://prop/ns-abc/topic-b604aad8ea8010af my-sub
java.lang.IllegalArgumentException: Invalid unknonwn tag type: 6
	at org.apache.pulsar.common.api.proto.LightProtoCodec.skipUnknownField(LightProtoCodec.java:270) ~[classes/:?]
	at org.apache.pulsar.common.api.proto.MessageMetadata.parseFrom(MessageMetadata.java:1370) ~[classes/:?]
	at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:426) ~[classes/:?]
```

### Modifications

Skip the broker entry metadata if it exists in `generateResponseWithEntry` and add a test to verify it.
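
A rough sketch of the skip logic, assuming the broker entry metadata is framed as a magic marker followed by a length-prefixed block; the constant value and method name below are illustrative, not the exact helpers in `Commands` or `PersistentTopicsBase`.

```java
import io.netty.buffer.ByteBuf;

class PeekEntrySketch {
    // Illustrative marker; the real magic value is defined by the wire protocol.
    private static final short BROKER_ENTRY_METADATA_MAGIC = (short) 0x0e02;

    static void skipBrokerEntryMetadataIfPresent(ByteBuf metadataAndPayload) {
        metadataAndPayload.markReaderIndex();
        if (metadataAndPayload.readShort() == BROKER_ENTRY_METADATA_MAGIC) {
            int size = metadataAndPayload.readInt();
            metadataAndPayload.skipBytes(size); // jump over the broker entry metadata
        } else {
            metadataAndPayload.resetReaderIndex(); // no broker entry metadata present
        }
    }
}
```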

(cherry picked from commit 7603a9d)
fixes testExpiredLedgerDeletionAfterManagedLedgerRestart

(cherry picked from commit 1623790)
When admin.topics().getDelayedDeliveryPolicy(topicName) is executed, topicPolicies may not be initialized yet.

This problem occurs when only a single unit test is executed.
When the entire test suite is executed, it rarely appears, because other unit tests initialize the TopicPolicies under this namespace.

(cherry picked from commit 96b293a)
* Do not expose meaningless stats for consumer.

Currently we have exposed some meaningless consumer stats to users such as

```
addressLength
addressOffset
connectedSinceOffset
connectedSinceLength
clientVersionOffset
clientVersionLength
stringBuffer
```

None of these stats are meant for users; they are only used internally,
so remove them from the exposed consumer stats.
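
The exact mechanism used by this commit is not shown here; one common way to keep Jackson from exposing such internal getters is `@JsonIgnore`, sketched below with hypothetical field names.

```java
import com.fasterxml.jackson.annotation.JsonIgnore;

class ConsumerStatsView {
    public long msgRateOut;        // meaningful, stays in the exposed stats
    public String clientVersion;   // meaningful, stays in the exposed stats

    @JsonIgnore
    public int getClientVersionOffset() { // internal buffer offset, hidden from users
        return 0;
    }
}
```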

* remove line

(cherry picked from commit ae03e51)
## Motivation
fix backlog issue with --precise-backlog=true.
Currently, when `managedLedger` finishes creating a new `ledger`, and the `markDelete` position is the LAC of the `previousLedger`, the `previousLedger` is removed from the `managedLedger`. When computing the backlog we use a closed range to get `getNumberOfEntries` - 1, so if the `previousLedger` no longer exists we get the wrong number.

![image](https://user-images.githubusercontent.com/39078850/122502847-fce47800-d029-11eb-81b3-abc9e595d93e.png)

## Implementation
Use an open-closed range (`Range.openClosed()`) in `getBacklog`.
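
A self-contained illustration of the range semantics using Guava (not the managed-ledger code): a closed range starts at the mark-delete position itself, so the caller must subtract 1 and must still be able to resolve that position even if its ledger was removed; an open-closed range counts only the remaining (backlog) entries.

```java
import com.google.common.collect.ContiguousSet;
import com.google.common.collect.DiscreteDomain;
import com.google.common.collect.Range;

public class BacklogRangeDemo {
    public static void main(String[] args) {
        long markDeleteEntry = 9; // last acknowledged entry; its ledger may already be deleted
        long lastConfirmed = 12;  // last entry written to the topic

        // Closed range includes the mark-delete position, hence the extra "-1".
        long closedCount = ContiguousSet.create(
                Range.closed(markDeleteEntry, lastConfirmed), DiscreteDomain.longs()).size();

        // Open-closed range counts only the unacknowledged entries directly.
        long backlog = ContiguousSet.create(
                Range.openClosed(markDeleteEntry, lastConfirmed), DiscreteDomain.longs()).size();

        System.out.println((closedCount - 1) + " vs " + backlog); // 3 vs 3
    }
}
```
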
### Verifying this change
Add the tests for it

(cherry picked from commit e3a97ee)
### Motivation

Support loading `awsCredentialPluginParam` from secrets for kinesis connectors to avoid leaking of sensitive information with best efforts.

### Modifications

Load kinesis configs with `IOConfigUtils.loadWithSecrets()`.
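
A usage sketch of the approach, assuming a config class whose sensitive field is annotated with `@FieldDoc(sensitive = true)`; the class and field below are illustrative, not the actual Kinesis config, and Lombok's `@Data` is only used to generate accessors.

```java
import java.io.Serializable;
import java.util.Map;
import lombok.Data;
import org.apache.pulsar.io.common.IOConfigUtils;
import org.apache.pulsar.io.core.SinkContext;
import org.apache.pulsar.io.core.annotations.FieldDoc;

@Data
class KinesisLikeConfig implements Serializable {
    @FieldDoc(required = false, defaultValue = "", sensitive = true,
            help = "Credential parameters passed to the AWS credential plugin")
    private String awsCredentialPluginParam;

    static KinesisLikeConfig load(Map<String, Object> map, SinkContext ctx) {
        // Plaintext values still work; secret-backed values are resolved from ctx.
        return IOConfigUtils.loadWithSecrets(map, KinesisLikeConfig.class, ctx);
    }
}
```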

### Verifying this change

This change is already covered by existing tests and this PR also adds tests to confirm both loading credentials from plaintext config object and secrets work as expected.


(cherry picked from commit cb79032)
### Motivation
If we set enableRetry=true, the client will subscribe to the retry topic.
If we set allowAutoTopicCreation=false, the client will receive `Topic does not exist`, which actually means that the retry topic does not exist.
This confuses users: their topic clearly exists, so why does this error appear?
Therefore, the error message is improved to make it easier for users to troubleshoot the problem.

### Modifications
Add the topic name to the returned error message.
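
A minimal sketch of the improved message (the exact exception type and wording used by the client/broker may differ).

```java
import org.apache.pulsar.client.api.PulsarClientException;

class TopicNotFoundMessageSketch {
    static PulsarClientException notFound(String topicName) {
        // Before: "Topic does not exist"; after: the failing topic name is included,
        // so users can see it is the retry/dead-letter topic that is missing.
        return new PulsarClientException.TopicDoesNotExistException(
                "Topic " + topicName + " does not exist");
    }
}
```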

(cherry picked from commit e72a0d8)
Fixes # throwable exception not thrown.

### Motivation

I found some throwable exceptions that are never thrown; this may be a bug.

(cherry picked from commit e0988f7)
…ting the number of partitions (#10910)

Fixes #10673

### Motivation
When updating the number of partitions, we need to update the data in two places in zk:
```
/admin/partitioned-topics
/managed-ledgers/
```

Now we only update the number of partitions in `/admin/partitioned-topics`, so if we do not create a Producer or Consumer, the data obtained in another cluster will be incorrect

### Modifications
1) Try to create managedLedger when updating the number of partitions
2) Ensure that the number of partitions in `/admin/partitioned-topics` is updated every time

(cherry picked from commit 202da11)
If a topic has only non-durable subscriptions and no durable subscriptions,
and the non-durable subscriptions reach the end of the topic, we will get an estimated backlog size of 0,
so the compaction will never be triggered.

The expected behavior is that if there are no durable subscriptions, we should use the total size for triggering
the compaction.
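
A sketch of the intended estimate (not the actual broker code): with no durable subscriptions, fall back to the topic's total size so the compaction threshold check can still fire.

```java
class CompactionBacklogSketch {
    static long estimatedBacklogForCompaction(boolean hasDurableSubscriptions,
                                              long durableSubscriptionsBacklogSize,
                                              long totalTopicSize) {
        // Non-durable readers at the end of the topic must not report a 0 backlog
        // forever, otherwise compaction is never triggered.
        return hasDurableSubscriptions ? durableSubscriptionsBacklogSize : totalTopicSize;
    }
}
```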

(cherry picked from commit 797cb12)
gaoran10 and others added 20 commits August 25, 2021 12:18
…pics (#11731)

* Remove the subscription from the topic when closing Reader subscription.

* remove useless code

(cherry picked from commit 5ac38f1)
#11691)

* Fixed block forever bug in Consumer.batchReceive

Ensure that all poll() calls on pendingBatchReceives
are done within the pinnedInternalExecutor to avoid a
race condition where a peek and a subsequent poll get
different pending receives.

Moved the pinnedInternalExecutor into the ConsumerBase
as both ConsumerImpl and MultiTopicsConsumerImpl require it.

failingPendingReceive() now always submits its work to the
internal executor, returns a CompletableFuture, and all callers
treat it as an asynchronous operation. (A JDK-only sketch of this
pattern follows after these commit notes.)

* Fix broken MultiTopicsConsumerImplTest

Needed a real executor service to run the
failPendingReceive() method.

* Ensure all calls to messageReceived happen on internal executor

* Readd missing return statement in ConsumerImpl.closeAsync()

* Ensure correct usage of consumer internal executors

Ensure that the externalPinnedExecutor is only called for user
code and internalPinnedExecutor used for internal tasks.
Some test refactoring to manage creation of executors.
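
A JDK-only sketch of the pattern described above (not the ConsumerBase code): every peek/poll of the pending-batch-receive queue is confined to one single-threaded executor, so a concurrent caller cannot poll a different element between a peek and the subsequent poll.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PendingReceivesSketch<T> {
    // Accessed only from internalPinnedExecutor, so no extra synchronization is needed.
    private final Queue<CompletableFuture<T>> pendingBatchReceives = new ArrayDeque<>();
    private final ExecutorService internalPinnedExecutor = Executors.newSingleThreadExecutor();

    CompletableFuture<Void> failPendingReceives(Throwable cause) {
        // Always hop onto the pinned executor and report completion asynchronously.
        return CompletableFuture.runAsync(() -> {
            CompletableFuture<T> pending;
            while ((pending = pendingBatchReceives.poll()) != null) {
                pending.completeExceptionally(cause);
            }
        }, internalPinnedExecutor);
    }

    void completeNext(T result) {
        internalPinnedExecutor.execute(() -> {
            CompletableFuture<T> pending = pendingBatchReceives.poll();
            if (pending != null) {
                pending.complete(result);
            }
        });
    }
}
```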

(cherry picked from commit bd942e1)
…11741)

Currently, on an add-entry failure, producer.disconnect() is called multiple times while the producer is disconnecting,
which adds many disconnect-producer tasks to the EventLoop.

1. Added an isDisconnecting state to the producer; if the producer is in the isDisconnecting state, skip the disconnect operation
2. Create the new future list only when the topic has producers, to reduce heap allocation

Added a test to cover disconnecting the producer multiple times while the EventLoop executes the disconnect only once.
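
A JDK-only sketch of the guard described above (not the actual Producer class): an atomic "disconnecting" flag ensures only one disconnect task is queued on the event loop no matter how many add-entry failures arrive concurrently.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

class DisconnectOnceSketch {
    private final AtomicBoolean isDisconnecting = new AtomicBoolean(false);

    void disconnect(Executor eventLoop, Runnable closeConnection) {
        // Only the first caller wins; later callers skip scheduling another task.
        if (isDisconnecting.compareAndSet(false, true)) {
            eventLoop.execute(() -> {
                try {
                    closeConnection.run();
                } finally {
                    isDisconnecting.set(false); // allow a later disconnect once this completes
                }
            });
        }
    }
}
```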

(cherry picked from commit 49c0796)
Fixes #10937

*Motivation*

The previous BookKeeper version introduced an incompatibility
with BouncyCastle FIPS. BookKeeper has already fixed this and we
should upgrade the version to resolve the issue.

For more information: #10937

*Modifications*

Upgrade bk to 4.14.2

(cherry picked from commit e9292b3)
* Fix the topic being stuck in the fenced state and unable to recover.

Here is the log when the issue happens. The producer continues to reconnect to the broker, but the fenced state of the topic is always true.
```
19:01:42.351 [pulsar-io-4-1] INFO  org.apache.pulsar.broker.service.ServerCnx - [/10.24.34.151:48052][persistent://public/default/test-8] Creating producer. producerId=8
19:01:42.352 [Thread-174681] INFO  org.apache.pulsar.broker.service.ServerCnx - [/10.24.34.151:48052] persistent://public/default/test-8 configured with schema false
19:01:42.352 [Thread-174681] WARN  org.apache.pulsar.broker.service.AbstractTopic - [persistent://public/default/test-8] Attempting to add producer to a fenced topic
19:01:42.352 [Thread-174681] ERROR org.apache.pulsar.broker.service.ServerCnx - [/10.24.34.151:48052] Failed to add producer to topic persistent://public/default/test-8: producerId=8, org.apache.pulsar.broker.service.BrokerServiceException$TopicFencedException: Topic is temporarily unavailable
```

After checking the heap dump of the broker, the `pendingWriteOps` is 5; this is the reason why the topic cannot recover from the fenced state.

The topic only changes to unfenced when `pendingWriteOps` is 0; details can be found at [PersistentTopic.decrementPendingWriteOpsAndCheck()](https://github.com/apache/pulsar/blob/794aa20d9f2a4e668cc36465362d22e042e6e536/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L463)

But checking the ML state of the topic shows that `pendingAddEntries` is 0, which does not equal the `pendingWriteOps` of the topic.
The root cause is that we poll add-entry ops from `pendingAddEntries` in multiple threads: one is the ZK callback thread when ledger creation completes (https://github.com/apache/pulsar/blob/794aa20d9f2a4e668cc36465362d22e042e6e536/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1406, https://github.com/apache/pulsar/blob/794aa20d9f2a4e668cc36465362d22e042e6e536/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1669),
the other is the ML worker thread when the add-entry op completes (https://github.com/apache/pulsar/blob/794aa20d9f2a4e668cc36465362d22e042e6e536/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java#L181)

When the ongoing add-entry op completes, the corresponding op might already have been polled by the `clearPendingAddEntries` method. It then polls another op, but because that op does not equal the current op, the polled op never gets a chance to be failed, so `pendingWriteOps` never drops back to 0.

I have attached the complete logs for the topic.

The fix is to complete the add-entry op with a ManagedLedgerException if the polled op is not equal to the current op.
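
A simplified sketch of that fix (not the real OpAddEntry code): when the op polled from pendingAddEntries is not the op that just completed, fail the polled op explicitly so its pending-write counter is released and the topic can leave the fenced state.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class PendingAddEntriesSketch {
    interface AddOp {
        void failed(Exception reason);
        void succeeded();
    }

    private final Queue<AddOp> pendingAddEntries = new ConcurrentLinkedQueue<>();

    void onAddComplete(AddOp completedOp) {
        AddOp polled = pendingAddEntries.poll();
        if (polled != null && polled != completedOp) {
            // Another thread (e.g. clearPendingAddEntries) already removed completedOp;
            // do not silently drop the op we just polled -- fail it explicitly.
            polled.failed(new IllegalStateException("Op polled out of order"));
        }
        completedOp.succeeded();
    }
}
```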

* Release buffer.

* Revert

(cherry picked from commit 1bcbab0)
…ned by an existed topic as a part (#11686)

Fixes #11685

validatePartitionTopicUpdate uses contains() to check whether an existing topic conflicts, which makes the update fail when an existing topic name contains the new topic's prefix and we want to update the new topic's partitions;

Suppose we have these topics:
"persistent://public/default/113p-partition-0"
"persistent://public/default/113p-partition-1"
"persistent://public/default/113p-partition-2"
"persistent://public/default/3p-partition-0"

When we want to update topic 3p to more partitions, the update fails because "persistent://public/default/113p-partition-0" contains "3p-partition".

Modifications
Use startsWith() to check whether the same topic already exists (demonstrated below).

* Fix the bug where an existing topic containing the same substring prevented adding new partitions

* Add a test for updating a partitioned topic whose name is contained in an existing topic
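
A self-contained illustration of the check (not the actual validatePartitionTopicUpdate code): `contains()` wrongly flags "113p" as conflicting when updating "3p", while a prefix check on the new topic's partition prefix does not.

```java
public class PartitionPrefixCheckDemo {
    public static void main(String[] args) {
        String existing = "persistent://public/default/113p-partition-0";
        String newTopicPartitionPrefix = "persistent://public/default/3p-partition-";

        System.out.println(existing.contains("3p-partition"));            // true  -> spurious conflict
        System.out.println(existing.startsWith(newTopicPartitionPrefix)); // false -> no conflict
    }
}
```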

Co-authored-by: nicklixinyang <nicklixinyang@didiglobal.com>
(cherry picked from commit 241de4b)
### Motivation
The producer doesn't release the memory when a message send times out.

### Modifications
When a message times out, release the memory as well.
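
A JDK-only sketch of the idea (not the ProducerImpl code): whatever memory was reserved for a pending message must be released on the timeout path too, not only on a successful send or an explicit failure.

```java
import java.util.concurrent.atomic.AtomicLong;

class ProducerMemorySketch {
    private final AtomicLong reservedBytes = new AtomicLong();

    void onSend(long msgSize) {
        reservedBytes.addAndGet(msgSize);
    }

    void onSendCompleted(long msgSize) {
        reservedBytes.addAndGet(-msgSize);
    }

    void onSendTimeout(long msgSize) {
        // The missing piece before the fix: timed-out messages never returned their
        // reservation, so the client eventually hit the configured memory limit.
        reservedBytes.addAndGet(-msgSize);
    }
}
```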

### Documentation
We don't need to update docs, because it's a bug fix.

### Reproduce the problem
- start the Pulsar server
- start a Pulsar producer that sends messages for 10 minutes (queueSize 0, memory limit 50 MB)
- close the Pulsar server
- wait for the 10 minutes to pass
- inspect the heap dump: it shows pendingMessages size is 0, but memory used is 52429824

(cherry picked from commit 1399eeb)
### Motivation

Under certain conditions, applications using the multi-topic consumer might see consumption stall:

The conditions to reproduce the issue are:
 * Consumer is subscribed to multiple topics, but only 1 topic has traffic
 * Messages are published in batches (no repro if no batches)
 * Receiver queue size == 1 (or small, in order to exercise race condition)

The problem is that there is a race condition between two threads when we're deciding to put one of the individual consumers into the "paused" state, when the shared queue is full.

What happens is that, just after we check the conditions and decide to mark the consumer as paused, the application empties the shared queue completely. From that point on, there is no re-attempt to check whether we need to unblock that consumer.

### Modification

Instead of introducing a sync block (contended by many consumers), we just double check the state of the shared queue after marking the consumer as "paused". If the other thread has emptied the queue in the meantime, we'll be guaranteed to unblock the consumer.
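
A JDK-only sketch of that double check (not MultiTopicsConsumerImpl itself): after marking a sub-consumer paused because the shared queue looked full, re-check the queue; if the application drained it in the meantime, resume immediately.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class PausableConsumerSketch {
    private final BlockingQueue<Object> sharedQueue = new LinkedBlockingQueue<>(1);
    private final AtomicBoolean paused = new AtomicBoolean(false);

    void maybePauseReceiving() {
        if (sharedQueue.remainingCapacity() == 0) {
            paused.set(true);
            // Double check: the application may have drained the queue between the
            // capacity check above and setting the flag. Without this re-check there
            // is no later event that would unpause the consumer, stalling consumption.
            if (sharedQueue.isEmpty() && paused.compareAndSet(true, false)) {
                resumeReceiving();
            }
        }
    }

    private void resumeReceiving() {
        // In the real client this re-issues flow permits to the paused sub-consumer.
    }
}
```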

(cherry picked from commit f1d66d1)
Avoid issue due to async update by using Awaitility

Co-authored-by: Surinder Singh <surinders@splunk.com>
(cherry picked from commit 4cf1008)
…ug (#11769)

### Motivation
Running the license check on branch-2.8 failed:
```
com.squareup.okhttp-okhttp-2.7.4.jar unaccounted for in LICENSE

It looks like there are issues with the LICENSE/NOTICE.
```
### Modification
add com.squareup.okhttp-okhttp-2.7.4.jar into LICENSE.bin.txt
* [Security] Use ubuntu:20.04 base image for Pulsar docker images

- Ubuntu fixes critical and high security vulnerabilities.
  - openjdk:11-jdk-slim/openjdk:11-jdk images are based on
    Debian 10 which contains a lot of unfixed vulnerabilities.
    - this causes the Pulsar docker images to get flagged by Docker image
      vulnerability scanning tools such as Clair

* Install Ubuntu updates

* Set DEBIAN_FRONTEND=noninteractive so that dist-upgrade doesn't wait for input

* Set JAVA_HOME

* Fix configuring networkaddress.cache.ttl

- JAVA_HOME set by ENV isn't available for RUN commands

* Configure networkaddress.cache.ttl after installing OpenJDK

(cherry picked from commit f989512)
Fix #11789

### Modification
upgrade aircompressor from 0.19 to 0.20

(cherry picked from commit 051f52d)
…r upgrade to 0.20 (#11792)

### Motivation
Because aircompressor 0.19 can't work with heap buffers on JDK 1.8, #11594 copied data to a direct buffer to avoid a NoSuchMethodError exception. Now that aircompressor 0.20 has been released and #11790 has upgraded the aircompressor version to 0.20 to fix this issue, we can revert #11594 to avoid copying data to a direct buffer and improve performance.

### Modification
1. Revert "Fix java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer when enabling topic metadata compression" (#11594), but keep the tests.

(cherry picked from commit cc1b983)
@casuallc casuallc changed the title Branch 2.8 [issue #11796] throw NPE when readEntry Aug 27, 2021
@casuallc casuallc changed the base branch from master to branch-2.8 August 27, 2021 02:11
@Anonymitaet
Member

Thanks for your contribution. For this PR, do we need to update docs?

(The PR template contains info about doc, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)

@casuallc
Contributor Author

> Thanks for your contribution. For this PR, do we need to update docs?
>
> (The PR template contains info about doc, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)

No need.

@Anonymitaet Anonymitaet added the doc-not-needed Your PR changes do not impact docs label Aug 27, 2021
@BewareMyPower
Contributor

Why are you trying to merge into branch-2.8 instead of master? A PR that doesn't target the master branch won't trigger any CI workflow.

Contributor

@BewareMyPower BewareMyPower left a comment


Blocking this PR for now because it targets branch-2.8.

@casuallc casuallc changed the base branch from branch-2.8 to master August 27, 2021 07:41
@casuallc casuallc closed this Aug 27, 2021
@casuallc casuallc deleted the branch-2.8 branch August 27, 2021 07:43
