Fix the problem that an abnormal file causes the bookie GC to fail #3610
Closed
1559924775 wants to merge 82 commits into apache:branch-4.14 from
…ite2/** ### Motivation Gradle files weren't included in the source artifact of the release, while site2/ was included. ### Changes Added gradle to the include patterns. Added site2 to the exclude patterns. Reviewers: Enrico Olivelli <eolivelli@gmail.com> This closes apache#2714 from dlg99/gradle-release
### Motivation
After labels were added to Prometheus metrics by apache#2650, the Prometheus metric format check fails when no label is specified for a statsLogger. The resulting metric line looks like this:
```
replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN
```
### Modification
1. Add an empty-label check to `PrometheusTextFormatUtil`
2. Add test coverage for the label scope check
3. Add a Prometheus metric regex pattern check to the test cases

Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com>
This closes apache#2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits:
8590704 [hangc0276] format code
a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check
bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/**
732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration
b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14
8dc108b [Matteo Merli] y
73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable
034ef85 [Shoothzj] Fix logger member not correct; (apache#2605)
b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (apache#2658)
683ad45 [Matteo Merli] Allow to attach labels to metrics (apache#2650)
8091096 [Matteo Merli] Allow to bypass journal for writes (apache#2401)
63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (apache#2710)
87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (apache#2707)
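The malformed line above ends its label block with `, }` because an empty scope label set still contributed a separator. A minimal sketch of emitting a well-formed label block is shown below. `LabelFormatter` and its methods are hypothetical and are not `PrometheusTextFormatUtil`; the sketch also assumes label values need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not BookKeeper's PrometheusTextFormatUtil) that renders
// the {k="v",...} label block of one Prometheus text-format sample line.
// Assumes label values need no escaping.
final class LabelFormatter {
    private LabelFormatter() {}

    static String formatLabels(Map<String, String> labels) {
        if (labels.isEmpty()) {
            return ""; // no labels at all: omit the braces entirely
        }
        // StringJoiner inserts "," only between elements, so no trailing ", " appears
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : labels.entrySet()) {
            joiner.add(e.getKey() + "=\"" + e.getValue() + "\"");
        }
        return joiner.toString();
    }

    static String sampleLine(String metric, Map<String, String> labels, double value) {
        return metric + formatLabels(labels) + " " + value;
    }

    // Merges scope labels (possibly empty) with per-sample labels such as quantile.
    static Map<String, String> merge(Map<String, String> scope, Map<String, String> sample) {
        Map<String, String> all = new LinkedHashMap<>(scope);
        all.putAll(sample);
        return all;
    }
}
```

Merging before formatting means an empty scope simply contributes nothing, instead of needing a separate "are there scope labels?" branch at the point where separators are written.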
### Motivation 4.14.0 release ### Changes Followed https://bookkeeper.apache.org/community/release_guide/#verify-docker-image and used apache#2171 as an example. Reviewers: Enrico Olivelli <eolivelli@gmail.com> This closes apache#2719 from dlg99/bctests-4_14
* Docs and release notes update for BK 4.14.0 * Updated with recently merged PRs
…version in _config.yml (apache#2722) Finishing up the release
### Motivation
Bookie was previously a concrete class that was used and abused all over the place, especially in tests: a classic example of the God object antipattern. Its extensive use in tests resulted in test cases that spin up many instances of the whole system, which is very heavy and very slow, especially when trying to unit test a particular feature.
This change is the first step toward resolving this situation. Bookie is now an interface, implemented by BookieImpl. Subsequent changes will break out parts of the interface, clean up calls and add dependency injection.
Reviewers: Matteo Merli <mmerli@apache.org>, Andrey Yegorov, Henry Saputra <hsaputra@apache.org>, Enrico Olivelli <eolivelli@gmail.com>
This closes apache#2717 from pkumar-singh/merge_back_to_oss and squashes the following commits:
3edee49 [Prashant] Replace SettableFuture with CompletableFuture
db69026 [Ivan Kelly] Turn Bookie into an interface
### Motivation
BookKeeperClusterTestCase has historically exposed its members to all subclasses, which then manipulated them in many ways. There were arrays of objects for configurations, bookieServers and autorecovery, with implicit linking between the objects based on maps and indices. Individual subclasses manipulated these arrays.
This makes it hard to add any dependency injection on the objects managed by BookKeeperClusterTestCase as the objects. To add DI, we need each object to have a number of other objects associated with it. For example, for each Bookie, we need to create the Journal. Maintaining these in separate arrays would lead to fragile tests.
This change encapsulates all the testing objects in a per-bookie object, and only allows manipulation through methods. This will allow us to group the objects needed for DI clearly.
Disable testFollowBookieAddressChangeTrckingDisabled
Reviewers: Henry Saputra <hsaputra@apache.org>, Matteo Merli <mmerli@apache.org>
This closes apache#2723 from pkumar-singh/refactor_BookKeeperClusterTestCase and squashes the following commits:
47fb812 [Prashant] Addressed code review comments
7f410ce [Ivan Kelly] Encapulate members of BookKeeperClusterTestCase
c1847af [Prashant Kumar] Turn Bookie into an interface
9417b68 [Andrey Yegorov] site update for release 4.14, this time actually updating the latest version in _config.yml (apache#2722)
* Site updates and release notes for 4.14.1 * Site updates
### Motivation
While restoring a storage container, we fetch the checkpoint from the checkpoint store. Currently this checkpoint is never cleaned up: every time we restore the storage container on a pod, a new checkpoint is added. Over time, disk usage keeps growing, and eventually these stale checkpoints have to be deleted manually.
### Changes
With this change, we clean up the local storage for a storage container whenever we close the KVStore. This ensures that stale checkpoints are not left behind. The pod may restart before the cleanup can be done; to handle this, we also clean up the local storage for the storage container before restoring it.
Reviewers: Henry Saputra <hsaputra@apache.org>
This closes apache#2739 from sursingh/storage-cleanup and squashes the following commits:
6bd4776 [Surinder Singh] Clean local storage for storage containers
c746836 [Surinder Singh] Add test case for local storage cleanup
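The two cleanup points described above (on close, and again before restore in case the pod died before closing) can be sketched with plain `java.nio.file` calls. `LocalStorageCleaner`, `onClose` and `beforeRestore` are hypothetical names for illustration; the real change lives in the stream storage code and may be structured differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper illustrating "wipe the container's local dir on close
// and again before restore"; not the actual BookKeeper stream storage code.
final class LocalStorageCleaner {
    private LocalStorageCleaner() {}

    // Recursively deletes dir and everything below it; no-op if it is absent.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexical order visits children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Called when the KV store is closed: drop all local state for the container.
    static void onClose(Path containerDir) throws IOException {
        deleteRecursively(containerDir);
    }

    // Called before restore: a restart may have skipped onClose, so clear any
    // leftovers (e.g. stale checkpoints) and start from an empty directory.
    static void beforeRestore(Path containerDir) throws IOException {
        deleteRecursively(containerDir);
        Files.createDirectories(containerDir);
    }
}
```

Cleaning in both places makes the restore path idempotent: whatever state a crashed pod left behind, restore always begins from an empty directory.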
### Motivation `conif` seems to be a typo for `conf`. ### Changes Fixed the typo.
### Motivation While reading the code to see how the auto-recovery service works, I noticed that the method `printUsage` was never called, so the program doesn't print the usage even when `-h` is specified. ### Changes Print the usage of auto-recovery if `-h` is specified or unexpected additional arguments are provided when running the auto-recovery command.
Fixes apache#2733 ### Motivation The okhttp dependency version 2.7.4 is old and vulnerable. This dependency isn't needed and it causes Bookkeeper to be flagged for security vulnerabilities. ### Changes - exclude grpc-okhttp dependency which pulls in okhttp 2.7.4 - update license files
--- Fixes apache#2699 *Motivation* Add the missing configuration `tlsCertificatePath` to the configuration file and the website description
…sion (apache#2708)
### Motivation
We need a way to keep using old bookies that don't support metadata version 3 while upgrading bookkeeper-client to the latest version. Right now, bk-4.12 writes ledger metadata with version 3, which is not supported by the old bookie and replicator, so the replicator fails with the error below:
```
2021-05-07.00:43:53.115 [main-EventThread] ERROR o.a.b.meta.AbstractZkLedgerManager - Could not parse ledger metadata for ledger: 123456
java.io.IOException: Metadata version not compatible. Expected between 0 and 2, but got 3
    at org.apache.bookkeeper.client.LedgerMetadata.parseConfig(LedgerMetadata.java:465)
    at org.apache.bookkeeper.meta.AbstractZkLedgerManager$3.processResult(AbstractZkLedgerManager.java:414)
    at org.apache.bookkeeper.zookeeper.ZooKeeperClient$19$1.processResult(ZooKeeperClient.java:994)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:572)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
```
So, until the bookies are upgraded, allow the bk-client to use the appropriate metadata version.
) Fixes apache#2511 ### Motivation See apache#2511 The current vertx version is 3.5.3, which has a vulnerability, CVE-2018-12541. ### Changes - Upgrade [vertx version to 3.9.8](https://github.com/eclipse-vertx/vert.x/releases/tag/3.9.8) - Fix issue with deprecated API usage
Fixes apache#2732 ### Motivation - Freebuilder 1.14.9 contains an outdated jQuery JS file, which causes the library to be flagged as vulnerable with the highest threat level in the Sonatype IQ vulnerability scanner. This also flags Bookkeeper and Pulsar as vulnerable with the highest threat level, although it is a false positive and not an actual threat. - Freebuilder shouldn't be exposed as a transitive dependency; it's an annotation processor, which should be declared - [optional in maven](https://github.com/inferred/FreeBuilder#maven) - [compileOnly in gradle](https://github.com/inferred/FreeBuilder#gradle) ### Changes - upgrade [Freebuilder](https://github.com/inferred/FreeBuilder) from 1.14.9 to 2.7.0 - make dependency optional in maven pom.xml - use `compileOnly` instead of `implementation` in gradle build Reviewers: Sijie Guo <None> This closes apache#2734 from lhotari/lh-fix-freebuilder-dependency-issue
### Motivation
More details are provided in [Pulsar #10937](apache/pulsar#10937).
In apache#2631, the default BouncyCastle was changed from the non-FIPS to the FIPS version. But the default version of BouncyCastle in Pulsar is the [non-FIPS](https://github.com/apache/pulsar/blob/v2.8.0/pulsar-client/pom.xml#L56) one (chosen to stay compatible with older versions of Pulsar).
Bouncy Castle provides both FIPS and non-FIPS versions, but a JVM cannot include both versions at once: we have to exclude the current version before including the other. This makes backward compatibility somewhat difficult, and that's why Pulsar has a separate module for [Bouncy Castle](https://pulsar.apache.org/docs/en/security-bouncy-castle).
If we start BookKeeper with TLS enabled through Pulsar's binary, it fails with the following error:
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/jcajce/provider/BouncyCastleFipsProvider
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:315)
    at org.apache.bookkeeper.common.util.ReflectionUtils.forName(ReflectionUtils.java:49)
    at org.apache.bookkeeper.tls.SecurityProviderFactoryFactory.getSecurityProviderFactory(SecurityProviderFactoryFactory.java:39)
    at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:129)
    at org.apache.bookkeeper.server.service.BookieService.<init>(BookieService.java:52)
    at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:304)
    at org.apache.bookkeeper.server.Main.doMain(Main.java:226)
    at org.apache.bookkeeper.server.Main.main(Main.java:208)
Caused by: java.lang.ClassNotFoundException: org.bouncycastle.jcajce.provider.BouncyCastleFipsProvider
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 9 more
```
This fix uses reflection to detect the loaded BC version instead of relying on a hard-coded BC version.
### Changes
Use reflection to detect the loaded BC version instead of a hard-coded BC version.
Add a backward-compatibility test for the bc-non-fips version.
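One way to "use reflection to detect the loaded BC version" is to probe for each provider class without initializing it and take whichever one the class loader can see. The sketch below shows that idea; the two class names are Bouncy Castle's public provider classes, while `BouncyCastleDetector` and the FIPS-first probing order are assumptions for illustration, not BookKeeper's actual code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical detector: probes the classpath for a Bouncy Castle provider
// instead of hard-coding one flavor. Not BookKeeper's implementation.
final class BouncyCastleDetector {
    static final String FIPS_PROVIDER =
            "org.bouncycastle.jcajce.provider.BouncyCastleFipsProvider";
    static final String NON_FIPS_PROVIDER =
            "org.bouncycastle.jce.provider.BouncyCastleProvider";

    private BouncyCastleDetector() {}

    // Returns the first candidate class name that can be loaded, if any.
    static Optional<String> firstLoadable(List<String> candidates) {
        ClassLoader loader = BouncyCastleDetector.class.getClassLoader();
        for (String name : candidates) {
            try {
                // initialize=false: only check presence, don't run static initializers
                Class.forName(name, false, loader);
                return Optional.of(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // not usable from this class loader; try the next candidate
            }
        }
        return Optional.empty();
    }

    // Probing order (FIPS first) is an assumption; only one flavor is expected.
    static Optional<String> detectProvider() {
        return firstLoadable(Arrays.asList(FIPS_PROVIDER, NON_FIPS_PROVIDER));
    }
}
```

Because a JVM is expected to carry only one flavor, the probing order rarely matters in practice; the important part is that a missing FIPS class no longer surfaces as a fatal `NoClassDefFoundError` at bookie startup.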
…on FileDescriptor#fd (apache#2749)
### Motivation Fix issue [apache#2726](apache#2726) ### Changes Add double quotes around `${JAVA}` in `/bin/bookkeeper`. Reviewers: Yong Zhang <zhangyong1025.zy@gmail.com>, Andrey Yegorov <None>, Sijie Guo <None> This closes apache#2727 from Sunny-Island/fix-space-in-java-home, closes apache#2726
It looks like an obvious typo that `tslProvider` should be `tlsProvider`, and the configuration item in the source code is also `tlsProvider`: https://github.com/apache/bookkeeper/blob/31e8d1b44ffafd867d0eb2774085e4b1141a7acb/bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/AbstractConfiguration.java#L102
) --- Master Issue: apache#2752 *Motivation* As discussed at length in https://issues.apache.org/jira/browse/LEGAL-572, we found out that the chardet library used by the requests library is a mandatory dependency of requests, and since it has an LGPL licence, we should not release any Apache Software with it.
Fixes apache#2512 ### Motivation See apache#2512 The current libthrift version 0.12.0 has multiple vulnerabilities: - CVE-2019-0205, CVE-2019-0210, CVE-2020-13949 ### Changes - Upgrade libthrift version to 0.14.1 and fix compilation errors - exclude new transitive dependencies org.apache.tomcat.embed:tomcat-embed-core and javax.annotation:javax.annotation-api Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Andrey Yegorov <None> This closes apache#2695 from lhotari/lh-upgrade-libthrift
--- *Motivation* Site updates and release note for 4.14.2
Log the tuple (namespace id, stream id, stream name) in the RootStorageService getRange request. ### Motivation Server request metrics are labeled with the stream id, extracted from the routing header. The stream name (aka "table name") is not available there, but it is more useful. Rather than making a (cacheable) RPC request to fetch the id -> name mapping in the metrics, logging the information allows one to find the name without requiring admin access to the state store service. Reviewers: Ivan Kelly <ivank@apache.org>, Enrico Olivelli <eolivelli@gmail.com>, Henry Saputra <hsaputra@apache.org> This closes apache#2758 from mauricebarnum/log-stream-name and squashes the following commits: 4ef7ac0 [Maurice Barnum] cleanup: remove extraneous "final" declarations 284b643 [Maurice Barnum] state store: create and delete stream: log stream info
Descriptions of the changes in this PR: Updated py client's version to 4.14 ### Motivation Preparing for the release https://bookkeeper.apache.org/community/release_guide/#change-python-client-version
### Motivation
Added `TCP_USER_TIMEOUT` to the Epoll channel config to limit the time a connection is left sending keepalives to a non-responding Bookie.
### Changes
The original issue reported that in scenarios where Bookies may go down unexpectedly and change their IP (e.g., Kubernetes), the Bookkeeper client may keep attempting to connect to the old IP of the restarted Bookie for some time (see apache#2482 for details). To prevent this problem (in Epoll channels), we introduce the following changes:
- Epoll channels are now configured with `TCP_USER_TIMEOUT`. This parameter takes precedence over the underlying TCP keepalive configuration (see https://datatracker.ietf.org/doc/html/rfc5482), which, depending on the environment, may default to retrying for too long (e.g., 10-15 minutes in our experience).
- To avoid adding more configuration parameters, the existing `clientConnectTimeoutMillis` value in `ClientConfiguration` is used to set `TCP_USER_TIMEOUT`, given their similarity.
### Validation
We reproduced the original testing environment in which this problem appears consistently:
- A cluster with 4 Bookies and 3 Kubernetes nodes, plus https://pravega.io, which uses the Bookkeeper client.
- An application deployed to do IO to Pravega (and therefore, to Bookkeeper).
- A Kubernetes node periodically shut down, so the Bookkeeper pods on it are restarted as well.
With this test procedure and without the proposed PR, we consistently observe Bookkeeper clients getting stuck trying to contact old Bookie IPs. With this change, we confirmed via logs that the configuration change takes effect, and we have not been able to reproduce the original problem so far after multiple node reboots.
Master Issue: apache#2482
Reviewers: Flavio Junqueira <fpj@apache.org>, Enrico Olivelli <eolivelli@apache.org>
This closes apache#2761 from RaulGracia/issue-2482-close-idle-bookie-connection, closes apache#2482
Descriptions of the changes in this PR: ### Motivation Benchmark tests were failing due to - a missing runtime test dependency on MetricsCore - insufficient JVM memory. ### Changes - Include metricsCore as a testDependency - Increase the heap size to 4 GB. Master Issue: apache#2640 Reviewers: Henry Saputra <hsaputra@apache.org>, Enrico Olivelli <eolivelli@gmail.com> This closes apache#2777 from pkumar-singh/build_with_gradle
We have more than 100 labels, and the merge script fails to download the list of labels, which makes it impossible to merge PRs. Modifications: - download all the pages of labels - remove the Python2 script Reviewers: Anup Ghatage <ghatage@apache.org>, Henry Saputra <hsaputra@apache.org> This closes apache#2776 from eolivelli/fix/merge-script-pagination
…2833) * Eliminate direct ZK access in ScanAndCompareGarbageCollector * Removed unused imports * Fixed zk ACLs * Addressed comments * Fixed checkstyle
BP-44: USE metrics. A proposal for improving BookKeeper metrics so that operators can employ the USE method for diagnosing performance issues.
Reviewers: Henry Saputra <hsaputra@apache.org>, Andrey Yegorov <None>, Enrico Olivelli <eolivelli@gmail.com>
This closes apache#2835 from Vanlightly/BP-44-use-metrics and squashes the following commits:
8d9baab [Jack Vanlightly] Added link to USE method and listed each term of USE
5a0f67d [Jack Vanlightly] BP-44 USE metrics
a9b576d [Yunze Xu] Release semaphore when addEntry accepts the same entries (apache#2832)
148bf22 [Yun Tang] Ensure to release cache during KeyValueStorageRocksDB#closec (apache#2821)
4dc4260 [gaozhangmin] Heap memory leak problem when ledger replication failed (apache#2794)
a522fa3 [Raúl Gracia] Issue 2815: Upgrade to log4j2 to get rid of CVE-2019-17571 (apache#2816)
0465052 [Nicolò Boschi] Upgrade httpclient from 4.5.5 to 4.5.13 (apache#2793)
594a056 [Raúl Gracia] Issue 2795: Bookkeeper upgrade using Bookie ID may fail due to cookie mismatch (apache#2796)
354cf37 [Raúl Gracia] Upgraded dependencies with CVEs (apache#2792)
e413c70 [Raúl Gracia] Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option (apache#2779)
883231e [pradeepbn] Building bookkeeper with gradle on java11
…ache.bookkeeper.bookie. org.apache.bookkeeper.client org.apache.bookkeeper.replication org.apache.bookkeeper.tls. (apache#2812) Co-authored-by: Prashant Kumar <prashantk@splunk.com>
…ver (apache#2788) Error log: `16:21:20.140 [main] ERROR org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to initialize DNS Resolver org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping, used default subnet resolver : java.lang.RuntimeException: java.lang.NullPointerException java.lang.NullPointerException` `BookieAddressResolver` should be set before `((Configurable) dnsResolver).setConf(conf);` is called; otherwise an NPE is thrown when Pulsar's `ZkBookieRackAffinityMapping` invokes `getBookieAddressResolver`.
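The fix above is purely about initialization order. The sketch below models it with hypothetical stand-ins (`AddressResolver`, `RackMapping`, `PlacementPolicyInit`); they mimic the call sequence implied by the error log, not the real BookKeeper or Pulsar interfaces.

```java
// Hypothetical stand-ins that only model the call order; they are not the
// real BookieAddressResolver / ZkBookieRackAffinityMapping types.
interface AddressResolver {
    String resolve(String bookieId);
}

class RackMapping {
    private AddressResolver resolver; // expected to be injected before setConf
    private String resolvedAddress;

    void setBookieAddressResolver(AddressResolver resolver) {
        this.resolver = resolver;
    }

    // Modeled on the error log: configuring the mapping uses the resolver
    // immediately, so it throws a NullPointerException if none was set yet.
    void setConf(String bookieId) {
        resolvedAddress = resolver.resolve(bookieId);
    }

    String resolvedAddress() {
        return resolvedAddress;
    }
}

final class PlacementPolicyInit {
    private PlacementPolicyInit() {}

    // Correct order: inject the resolver first, then configure.
    static RackMapping initialize(AddressResolver resolver, String bookieId) {
        RackMapping mapping = new RackMapping();
        mapping.setBookieAddressResolver(resolver);
        mapping.setConf(bookieId);
        return mapping;
    }
}
```

Swapping the two calls in `initialize` reproduces the NPE from the log, which is why the resolver has to be set first.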
Co-authored-by: Prashant Kumar <prashantk@splunk.com>
### Motivation
- The issue is as described in [PR#2797](apache#2797).
> In one day, zookeepers became high cpu usage and disk full.
> The cause of this is bookie's gc of overreplicated ledgers.
> Gc created/deleted zk nodes under /ledgers/underreplication/locks very frequently and some bookies ran gc at same time.
> As a result, zookeepers created a lot of snapshots and became disk full.
- I want to reduce the number of lock node creations and deletions in ZK.
### Changes
- Add an ensemble check before creating the lock node, to reduce the number of lock node creations and deletions in ZK.
- ~~If [PR#2797](apache#2797) was merged, this PR needs to be fixed.~~
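The ensemble check is a cheap metadata test that runs before the expensive ZK lock. A ledger is only a candidate for over-replication GC if this bookie appears in none of its ensembles, so only those ledgers need a lock znode. This is a minimal sketch under that reading; ledger metadata is modeled as a list of ensembles of bookie ids, and `OverReplicatedCheck` is a hypothetical name, not BookKeeper's class.

```java
import java.util.List;

// Sketch of an "ensemble check before locking" for over-replicated ledger GC.
// Ensembles are modeled as lists of bookie ids; names are hypothetical.
final class OverReplicatedCheck {
    private OverReplicatedCheck() {}

    static boolean isInAnyEnsemble(List<List<String>> ensembles, String bookieId) {
        for (List<String> ensemble : ensembles) {
            if (ensemble.contains(bookieId)) {
                return true;
            }
        }
        return false;
    }

    // If this bookie is still in some ensemble, the ledger is not over-replicated
    // here, so skip creating (and later deleting) a lock znode for it.
    static boolean shouldAcquireLock(List<List<String>> ensembles, String bookieId) {
        return !isInAnyEnsemble(ensembles, bookieId);
    }
}
```

For most ledgers stored on a bookie, the bookie is still in the ensemble, so this check alone avoids the bulk of the lock node churn on `/ledgers/underreplication/locks`.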
* Forget to close preAllocator log on shutdown * Fix synchronize problem * handle InterruptedException
Co-authored-by: Prashant Kumar <prashantk@splunk.com>
* Remove direct ZK access for Auditor * Fixed unused imports * Fixed checkstyle * Fixed checkstyle in tests
…apache#2844)
### Motivation
For each ledger whose metadata is not in ZK, the following stack trace is logged:
```
15:30:17.925 [GarbageCollectorThread-11-1] ERROR o.a.b.b.ScanAndCompareGarbageCollector - Exception when iterating through the ledgers to check for over-replication
java.util.concurrent.ExecutionException: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.bookkeeper.bookie.ScanAndCompareGarbageCollector.removeOverReplicatedledgers(ScanAndCompareGarbageCollector.java:199)
    at org.apache.bookkeeper.bookie.ScanAndCompareGarbageCollector.gc(ScanAndCompareGarbageCollector.java:120)
    at org.apache.bookkeeper.bookie.GarbageCollectorThread.doGcLedgers(GarbageCollectorThread.java:372)
    at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:323)
    at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:301)
    at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists
    at org.apache.bookkeeper.meta.AbstractZkLedgerManager$3.processResult(AbstractZkLedgerManager.java:397)
    at org.apache.bookkeeper.zookeeper.ZooKeeperClient$19$1.processResult(ZooKeeperClient.java:994)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:575)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
```
It is noisy, inflates the log files and eventually causes an OOM during log rotation, so we should suppress the stack trace. (This problem was introduced by [apache#2813](apache#2813).)
### Changes
Add error handling to readLedgerMetadata in over-replicated ledger GC in order to suppress the stack trace.
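The trace shows the exception arriving as an `ExecutionException` from `CompletableFuture.get`, wrapping a "no such ledger" cause. A minimal sketch of the error handling: unwrap the cause, treat the expected case as a one-line log, and rethrow anything else. `NoSuchLedgerException` and `MetadataReads` are hypothetical stand-ins (the real cause type is `BKException.BKNoSuchLedgerExistsException`), and `java.util.logging` is used only to keep the sketch self-contained; BookKeeper itself logs through SLF4J.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.logging.Logger;

// Hypothetical stand-in for BKException.BKNoSuchLedgerExistsException.
class NoSuchLedgerException extends Exception {
    NoSuchLedgerException(String msg) {
        super(msg);
    }
}

final class MetadataReads {
    private static final Logger LOG = Logger.getLogger(MetadataReads.class.getName());

    private MetadataReads() {}

    // Returns the metadata, or null if the ledger no longer exists. The missing
    // ledger case is expected during GC, so it is logged as one line without a
    // stack trace; any other failure is still propagated to the caller.
    static <T> T readOrNull(long ledgerId, CompletableFuture<T> metadataFuture)
            throws ExecutionException, InterruptedException {
        try {
            return metadataFuture.get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof NoSuchLedgerException) {
                LOG.fine("Ledger " + ledgerId + " has no metadata, skipping");
                return null;
            }
            throw e;
        }
    }
}
```

Keeping the rethrow for unexpected causes matters: only the known-benign case is silenced, so real ZK failures still surface with their stack traces.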
…#2848) * Remove deprecated Jenkinsfile * Remove Jenkins job, Travis refs, update doc/website for contributions * readd newline
…Counter (apache#2839) * Addition of thread-scoped stats The Counter and OpStatsLogger have new variants that add threadPool and thread labels to their metrics. These new variants can be obtained via new methods in the StatsLogger interface.
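The idea of a thread-scoped counter is one series per thread, distinguished by a `thread` label. The sketch below keys a `LongAdder` per thread name. It is not BookKeeper's `StatsLogger` API; `ThreadScopedCounter`, the rendered format, and the omission of the `threadPool` label are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Minimal sketch of a thread-scoped counter: one series per thread, keyed by
// the thread name that would become the `thread` label. Hypothetical class,
// not the BookKeeper StatsLogger API; a threadPool label is omitted for brevity.
final class ThreadScopedCounter {
    private final String name;
    private final Map<String, LongAdder> perThread = new ConcurrentHashMap<>();

    ThreadScopedCounter(String name) {
        this.name = name;
    }

    // Increments the series belonging to the calling thread.
    void inc() {
        perThread.computeIfAbsent(Thread.currentThread().getName(), t -> new LongAdder())
                 .increment();
    }

    long get(String threadName) {
        LongAdder adder = perThread.get(threadName);
        return adder == null ? 0L : adder.sum();
    }

    // Renders one line per thread, sorted by thread name, e.g. name{thread="t"} 3
    String render() {
        StringBuilder sb = new StringBuilder();
        new TreeMap<>(perThread).forEach((thread, count) ->
                sb.append(name).append("{thread=\"").append(thread).append("\"} ")
                  .append(count.sum()).append('\n'));
        return sb.toString();
    }
}
```

Per-thread series make it possible to spot a single saturated thread in a pool, which an aggregate counter would hide.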
### Motivation
In order to complete migration to Gradle we must build all the subprojects.
### Changes
- Enabled `sh` integration tests with gradle, located in `tests/scripts/src/test/bash/gradle`
- Added these modules to the build
  - `bookkeeper-http:servlet-http-server`
  - `metadata-drivers:etcd`
  - `tests:backward-compat:*`
  - `tests:shaded:*`
  - `stream:bk-grpc-name-resolver`
- The DL shading process is now performed (previously it didn't build any jar)
- Groovy tests (`tests:backward-compat:*`) are now triggered by the build/tests itself; with Maven, there is a "runner" project (`tests/integration-tests-base-groovy`), which is unnecessary in Gradle, so it is skipped
### Test
- Both `bin/bookkeeper standalone` and `bin/bookkeeper_gradle standalone` work locally
- Tests are passing locally
Master Issue: apache#2849
Reviewers: Henry Saputra <hsaputra@apache.org>, Prashant Kumar <None>
This closes apache#2850 from nicoloboschi/fix/2849/gradle and squashes the following commits:
00b49f4 [Nicolò Boschi] Fix common_gradle.sh regex
bd739fd [Nicolò Boschi] fix sh tests
43230ba [Nicolò Boschi] revert sh files. Avoid to modify maven files, create gradle versions to faciltate migration
d1f95e4 [Nicolò Boschi] fix shaded deps
bcab40d [Nicolò Boschi] fix build
5fd0341 [Nicolò Boschi] fix build
0082e0e [Nicolò Boschi] fix build
2c32ac1 [Nicolò Boschi] fixes
3bc0b26 [Nicolò Boschi] bookkeeper-server-shaded-tests
ba89132 [Nicolò Boschi] shaded tests
6d39e33 [Nicolò Boschi] sh tests
e0032bc [Nicolò Boschi] actually run arquillian groovy tests
08dcc39 [Nicolò Boschi] backwards
2361f79 [Nicolò Boschi] hierarchical-ledger-manager
8388e11 [Nicolò Boschi] current-server-old-clients
6a24344 [Nicolò Boschi] bc-non-fips
2faca01 [Nicolò Boschi] bk-grpc-name-resolver
991bc11 [Nicolò Boschi] servlet-http-server
675ef7b [Nicolò Boschi] etcd
b1d5e14 [ZhangJian He] A empty implement in EtcdLedgerManagerFactory to let the project can compile (apache#2845)
bd5c50b [shustsud] Add error handling to readLedgerMetadata in over-replicated ledger GC (apache#2844)
746f9f6 [Matteo Merli] Remove direct ZK access for Auditor (apache#2842)
4117200 [ZhangJian He] the compare should be >= instead of > (apache#2782)
14ef56f [Prashant Kumar] BookieId can not be cast to BookieSocketAddress (apache#2843)
e10f3fe [ZhangJian He] Forget to close preAllocator log on shutdown (apache#2819)
53954ca [shustsud] Add ensemble check to over-replicated ledger GC (apache#2813)
919fdd3 [Prashant Kumar] Issue:2840 Create bookie shellscript for gradle (apache#2841)
031d168 [gaozhangmin] fix-npe-when-pulsar-ZkBookieRackAffinityMapping-getBookieAddressResolver (apache#2788)
3dd671c [Prashant Kumar] Migrate bookkeepr-server:test to gradle run unit tests excepts org.apache.bookkeeper.bookie. org.apache.bookkeeper.client org.apache.bookkeeper.replication org.apache.bookkeeper.tls. (apache#2812)
f6903b8 [Jack Vanlightly] BP-44 USE metrics
a4afaa4 [Matteo Merli] Eliminate direct ZK access in ScanAndCompareGarbageCollector (apache#2833)
a9b576d [Yunze Xu] Release semaphore when addEntry accepts the same entries (apache#2832)
148bf22 [Yun Tang] Ensure to release cache during KeyValueStorageRocksDB#closec (apache#2821)
4dc4260 [gaozhangmin] Heap memory leak problem when ledger replication failed (apache#2794)
a522fa3 [Raúl Gracia] Issue 2815: Upgrade to log4j2 to get rid of CVE-2019-17571 (apache#2816)
0465052 [Nicolò Boschi] Upgrade httpclient from 4.5.5 to 4.5.13 (apache#2793)
594a056 [Raúl Gracia] Issue 2795: Bookkeeper upgrade using Bookie ID may fail due to cookie mismatch (apache#2796)
354cf37 [Raúl Gracia] Upgraded dependencies with CVEs (apache#2792)
e413c70 [Raúl Gracia] Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option (apache#2779)
883231e [pradeepbn] Building bookkeeper with gradle on java11
…emoryCRC32Digest (apache#2847)
…plicationManager#acquireUnderreplicatedLedger (apache#2855)
…mand (apache#2870)
### Motivation
When we use the `bin/bookkeeper shell recover bookieId` command to recover a specific bookie's ledgers, the recovery process exits as soon as recovering one ledger fails.
In our production BookKeeper cluster, we found some ledgers that are in the OPEN state and have no entries. The `bin/bookkeeper shell recover bookieId` command traverses all the ledgers level by level and, for each ledger, calls the following code to recover it:
```Java
Processor<Long> ledgerProcessor = new Processor<Long>() {
    @Override
    public void process(Long ledgerId, AsyncCallback.VoidCallback iterCallback) {
        // Recover one ledger; iterCallback reports success or failure to the iterator
        recoverLedger(bookiesSrc, ledgerId, dryrun, skipOpenLedgers, skipUnrecoverableLedgers, iterCallback);
    }
};
```
In the `recoverLedger` method, if the ledger is in the `OPEN` state, it calls `asyncOpenLedgerNoRecovery` to open the ledger and get the LAC. For the `getLAC` request, if the requested ledger has no entries, it returns entry = -1 and an ERROR for this `getLAC` request.
https://github.com/apache/bookkeeper/blob/98ddf8149592572eebcfaf6bdd4916f295ffd9d7/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java#L756-L769
The `asyncOpenLedgerNoRecovery` callback then returns an error for this process, which stops the recovery of all the following ledgers. In the end, the recover command fails, and the remaining ledgers can't be recovered.
### Changes
We should expose a flag that lets the user decide whether to move on and recover the following ledgers when some ledgers fail to recover. So, I added the parameter `sku` to handle this case.
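The behavioral difference the flag introduces can be shown synchronously: either abort at the first failed ledger (the old behavior) or record the failure and continue. In the sketch below, `RecoverAll` is hypothetical, `recoverOne` stands in for the asynchronous per-ledger `recoverLedger` call, and `skipUnrecoverable` is only meant to mirror the `skipUnrecoverableLedgers` argument in the snippet above; it is not the actual CLI wiring for `sku`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Synchronous sketch of "keep going when one ledger fails to recover".
// recoverOne returns false on failure; all names here are hypothetical.
final class RecoverAll {
    private RecoverAll() {}

    // Returns the ids that failed. Without skipUnrecoverable, stops at the first
    // failure, mirroring the old behavior described above.
    static List<Long> recover(List<Long> ledgerIds, LongPredicate recoverOne,
                              boolean skipUnrecoverable) {
        List<Long> failed = new ArrayList<>();
        for (long id : ledgerIds) {
            if (!recoverOne.test(id)) {
                failed.add(id);
                if (!skipUnrecoverable) {
                    break; // old behavior: abort the whole command
                }
            }
        }
        return failed;
    }
}
```

Returning the list of failed ids, rather than a single boolean, lets the operator retry or inspect exactly the ledgers that were skipped.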
…urs (apache#2860) Return a "too many requests" error when an OperationRejectedException occurs because of internal resource saturation
To have DI on the bookie, it's better to have a single constructor that takes all the injected implementations. With a single constructor, there's only one place to modify in production code when we want to inject something, rather than having the bookie need to know how to create a "default" version (which often breaks encapsulation). If we need convenience constructors for tests, they should live in the tests. Co-authored-by: Ivan Kelly <ikelly@splunk.com>
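The single-constructor pattern can be sketched as follows: the production class takes every collaborator through one constructor, and the convenience wiring for tests sits in a separate test-side factory. `JournalSink`, `StorageSink`, `SimpleBookie` and `TestBookies` are hypothetical stand-ins, not BookKeeper's `Journal`, `LedgerStorage` or `BookieImpl` signatures.

```java
import java.util.List;

// Hypothetical collaborator interfaces standing in for the journal and storage.
interface JournalSink {
    void append(String entry);
}

interface StorageSink {
    void add(String entry);
}

final class SimpleBookie {
    private final JournalSink journal;
    private final StorageSink storage;

    // The only constructor: every collaborator is passed in, so injecting a new
    // dependency means changing exactly this one place.
    SimpleBookie(JournalSink journal, StorageSink storage) {
        this.journal = journal;
        this.storage = storage;
    }

    void addEntry(String entry) {
        journal.append(entry); // record in the journal first
        storage.add(entry);    // then in storage
    }
}

// Convenience wiring lives with the tests, not inside SimpleBookie.
final class TestBookies {
    private TestBookies() {}

    static SimpleBookie inMemory(List<String> journalLog, List<String> stored) {
        return new SimpleBookie(journalLog::add, stored::add);
    }
}
```

Because `SimpleBookie` never constructs its own defaults, a test can substitute in-memory collaborators without the production class needing to know they exist.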
--- *Motivation* Fix typo in DBLedgerStorage. `getLedgerSorage` -> `getLedgerStorage`
### Motivation Release note for 4.14.3
Descriptions of the changes in this PR:
### Motivation
When bookie GC encounters an entry log file with disordered data, an IllegalArgumentException is not caught, which kills the GC thread and ends the GC process.
The same exception is thrown again in the next GC run, so subsequent entry logs can never be garbage collected.
The problem described in issue #3604 causes entry log files to become disordered, which in turn triggers the IllegalArgumentException when the entry log file is parsed.
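One way to keep a GC pass alive is to treat an `IllegalArgumentException` from parsing a single entry log as a per-file failure, so it cannot escape the loop and kill the GC thread. The sketch below illustrates that idea only; it is not this PR's actual patch, and `EntryLogScan` and the `parseMetadata` function are hypothetical stand-ins for the entry log metadata extraction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: isolate a parse failure to one entry log instead of letting it end
// the whole GC pass. Names are hypothetical, not BookKeeper's GC classes.
final class EntryLogScan {
    private static final Logger LOG = Logger.getLogger(EntryLogScan.class.getName());

    private EntryLogScan() {}

    // Parses each entry log; returns the ids that were parsed successfully.
    static List<Long> scanAll(List<Long> entryLogIds, LongFunction<?> parseMetadata) {
        List<Long> parsed = new ArrayList<>();
        for (long logId : entryLogIds) {
            try {
                parseMetadata.apply(logId);
                parsed.add(logId);
            } catch (IllegalArgumentException e) {
                // Disordered or corrupt file: log it and continue with the next entry log.
                LOG.log(Level.WARNING, "Skipping unparsable entry log " + logId + ": " + e.getMessage());
            }
        }
        return parsed;
    }
}
```

Catching only `IllegalArgumentException`, rather than every `RuntimeException`, keeps the change scoped to the known failure mode from #3604 while leaving other bugs visible.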
Master Issue:
#3607: describes the problem solved by this PR.
#3604: describes the symptoms and causes of disordered entry log files.