Fix the problem that the abnormal file causes the bookie GC fail #3610

Closed
1559924775 wants to merge 82 commits into apache:branch-4.14 from 1559924775:fix_gc

@1559924775

Descriptions of the changes in this PR:

Motivation

When bookie GC encounters an entry log file with disordered data, an IllegalArgumentException is thrown and not caught, which kills the GC thread and ends the GC process.
The next GC run hits the same file and fails the same way, so the subsequent entry logs can never be garbage collected.
The problem described in issue #3604 causes the entry log file to become disordered, which in turn causes the IllegalArgumentException when the entry log file is parsed.
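
The failure mode above can be sketched as follows. This is only an illustration under stated assumptions: `GcLoopSketch`, `EntryLogScanner`, and `collect` are hypothetical names, not BookKeeper's actual GC classes. The point is that each entry log is scanned inside its own try/catch, so one disordered file is skipped instead of ending the whole pass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not BookKeeper's GarbageCollectorThread.
final class GcLoopSketch {

    /** Stand-in for whatever parses a single entry log file. */
    interface EntryLogScanner {
        // May throw IllegalArgumentException when the file content is disordered.
        void scan(long entryLogId);
    }

    /** Scans every entry log and returns the ids that could not be parsed. */
    static List<Long> collect(List<Long> entryLogIds, EntryLogScanner scanner) {
        List<Long> skipped = new ArrayList<>();
        for (long id : entryLogIds) {
            try {
                scanner.scan(id);
            } catch (IllegalArgumentException e) {
                // Without this catch the exception would propagate, end the pass,
                // and the same file would break every later pass as well.
                skipped.add(id);
            }
        }
        return skipped;
    }
}
```

Returning the skipped ids (rather than silently dropping them) leaves room for the caller to log or report the abnormal files.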

Master Issue:
#3607 : Describes the problem to be solved by this pr.
#3604 : Describes the phenomenon and causes of disordered entrylog files.

dlg99 and others added 30 commits May 9, 2021 18:33
…ite2/**

### Motivation

Gradle files weren't included in the source artifact of the release, but site2/ was.

### Changes

Added gradle into the include pattern.
Added site2 into the exclude patterns.



Reviewers: Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2714 from dlg99/gradle-release
### Motivation
After labels were added to Prometheus metrics by apache#2650, the Prometheus metric format check fails when no label is specified for a statsLogger. The resulting metric line looks as follows.
```
replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN
```

### Modification
1. add label empty check for `PrometheusTextFormatUtil`
2. add label scope check test cover
3. add prometheus metric regex pattern check in test case
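
A minimal sketch of the empty-label check, assuming a hypothetical helper `LabelFormatSketch.sample` (this is not the actual `PrometheusTextFormatUtil` code): the label block is only written when labels exist, and separators are only placed between labels, so a trailing `, ` like the one in the sample above cannot appear.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; not the actual PrometheusTextFormatUtil implementation.
final class LabelFormatSketch {

    /** Renders one sample line, omitting the label block when there are no labels. */
    static String sample(String name, Map<String, String> labels, String value) {
        StringBuilder sb = new StringBuilder(name);
        if (!labels.isEmpty()) {
            // StringJoiner only puts the separator between elements,
            // so no dangling comma is emitted after the last label.
            StringJoiner joiner = new StringJoiner(",", "{", "}");
            labels.forEach((k, v) -> joiner.add(k + "=\"" + v + "\""));
            sb.append(joiner);
        }
        return sb.append(' ').append(value).toString();
    }
}
```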

Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits:

8590704 [hangc0276] format code
a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check
bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/**
732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration
b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14
8dc108b [Matteo Merli] y
73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable
034ef85 [Shoothzj] Fix logger member not correct; (apache#2605)
b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (apache#2658)
683ad45 [Matteo Merli] Allow to attach labels to metrics (apache#2650)
8091096 [Matteo Merli] Allow to bypass journal for writes (apache#2401)
63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (apache#2710)
87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (apache#2707)
### Motivation

4.14.0 release 

### Changes

followed https://bookkeeper.apache.org/community/release_guide/#verify-docker-image and used apache#2171 as example



Reviewers: Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2719 from dlg99/bctests-4_14
* Docs and release notes update for BK 4.14.0

* Updated with recently merged PRs
### Motivation

Bookie was previously a concrete class that was used and abused all
over the place, especially in tests. A classic example of the God
object antipattern. The extensive use in tests resulted in test cases
that spin up many instances of the whole system, which is very heavy
and very slow, especially when trying to unit test a particular
feature.

This change is the first step to resolving this situation. Bookie is
now an interface, implemented by BookieImpl. Subsequent changes will
break out parts of the interface, cleanup calls and add dependency
injection.

Reviewers: Matteo Merli <mmerli@apache.org>, Andrey Yegorov, Henry Saputra <hsaputra@apache.org>, Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2717 from pkumar-singh/merge_back_to_oss and squashes the following commits:

3edee49 [Prashant] Replace SettableFuture  with CompletableFuture
db69026 [Ivan Kelly] Turn Bookie into an interface
### Motivation
BookKeeperClusterTestCase has historically exposed its members to all
subclasses, which would then manipulate them in many ways. There was
an array of objects for configurations, bookieServers, autorecovery,
with implicit linking between the objects based on maps and indices.
Individual subclasses manipulated these arrays.

This makes it hard to add any dependency injection on the objects
managed by BookKeeperClusterTestCase. To add DI, we
need each object to have a bunch of other objects associated with
it. For example, for each Bookie, we need to create the
Journal. Maintaining these in separate arrays will lead to fragile
tests.

This change encapsulates all the testing objects in a per bookie
object, and only allows manipulation through methods. This will allow
us to group the objects needed for DI clearly.

Disable testFollowBookieAddressChangeTrckingDisabled

Reviewers: Henry Saputra <hsaputra@apache.org>, Matteo Merli <mmerli@apache.org>

This closes apache#2723 from pkumar-singh/refactor_BookKeeperClusterTestCase and squashes the following commits:

47fb812 [Prashant] Addressed code review comments
7f410ce [Ivan Kelly] Encapulate members of BookKeeperClusterTestCase
c1847af [Prashant Kumar] Turn Bookie into an interface
9417b68 [Andrey Yegorov] site update for release 4.14, this time actually updating the latest version in _config.yml (apache#2722)
* Site updates and release notes for 4.14.1

* Site updates
### Motivation

While restoring a storage container, we fetch the checkpoint from the
checkpoint store. Currently this checkpoint never gets cleaned up. Every
time we restore the storage container on a pod, a new checkpoint is added.
Over time the disk usage keeps going up, and eventually we have to
manually delete these stale checkpoints.

### Changes

With this change, we clean up the local storage for a storage container
whenever we close the KVStore. This ensures that stale checkpoints are not
left behind. It is possible that the pod restarts before the cleanup can be
done. To avoid this, we also ensure that local storage for the storage
container is cleaned up before we restore the storage container.


Reviewers: Henry Saputra <hsaputra@apache.org>

This closes apache#2739 from sursingh/storage-cleanup and squashes the following commits:

6bd4776 [Surinder Singh] Clean local storage for storage containers
c746836 [Surinder Singh] Add test case for local storage cleanup
### Motivation

'conif' seems to be a typo of `conf`.

### Changes

Fixed the typo.
### Motivation

While reading the code to see how the auto-recovery service works, I noticed that the method `printUsage` is never called and the program doesn't print the usage even when '-h' is specified.

### Changes

Print the usage of auto-recovery if '-h' is specified or additional unexpected arguments are provided when running the auto-recovery command.
Fixes apache#2733

### Motivation

The okhttp dependency version 2.7.4 is old and vulnerable. This dependency isn't needed and it causes Bookkeeper to be flagged for security vulnerabilities.

### Changes

- exclude grpc-okhttp dependency which pulls in okhttp 2.7.4 
- update license files
---

Fixes apache#2699

*Motivation*

Add missing configuration `tlsCertificatePath` in the configuration file
and website description
…sion (apache#2708)

### Motivation

We need a way to keep using old bookies that don't support metadata version 3 while upgrading the bookkeeper-client to the latest version. Right now, bk-4.12 writes ledger metadata with version 3, which is not supported by the old bookie and replicator, and the replicator fails with the error below:
```
2021-05-07.00:43:53.115 [main-EventThread] ERROR
                o.a.b.meta.AbstractZkLedgerManager   - Could not parse ledger metadata for ledger: 123456
java.io.IOException: Metadata version not compatible. Expected between 0 and 2, but got 3
        at org.apache.bookkeeper.client.LedgerMetadata.parseConfig(LedgerMetadata.java:465)
        at org.apache.bookkeeper.meta.AbstractZkLedgerManager$3.processResult(AbstractZkLedgerManager.java:414)
        at org.apache.bookkeeper.zookeeper.ZooKeeperClient$19$1.processResult(ZooKeeperClient.java:994)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:572)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)

```
So, until one upgrades the bookie version, allow bk-client to use the appropriate metadata version.

Fixes apache#2511

### Motivation

See apache#2511

The current vertx version is 3.5.3, which has a vulnerability, CVE-2018-12541.

### Changes

- Upgrade [vertx version to 3.9.8](https://github.com/eclipse-vertx/vert.x/releases/tag/3.9.8)
- Fix issue with deprecated API usage
Fixes apache#2732

### Motivation

- Freebuilder 1.14.9 contains an outdated jQuery JS file, which causes the library to be flagged as vulnerable with the highest threat level in the Sonatype IQ vulnerability scanner. This also flags Bookkeeper and Pulsar as vulnerable with the highest threat level, although it is a false positive and not an actual threat.

- Freebuilder shouldn't be exposed as a transitive dependency
  - it's an annotation processor which should be defined
    - [optional in maven](https://github.com/inferred/FreeBuilder#maven)
    - [compileOnly in gradle](https://github.com/inferred/FreeBuilder#gradle)

### Changes

- upgrade [Freebuilder](https://github.com/inferred/FreeBuilder) from 1.14.9 to 2.7.0
- make dependency optional in maven pom.xml
- use `compileOnly` instead of `implementation` in gradle build

Reviewers: Sijie Guo <None>

This closes apache#2734 from lhotari/lh-fix-freebuilder-dependency-issue


### Motivation

More details are provided in [Pulsar # 10937](apache/pulsar#10937).

In apache#2631, the default BouncyCastle was changed from the non-FIPS to the FIPS version. But the default version of BouncyCastle in Pulsar is the [non-fips](https://github.com/apache/pulsar/blob/v2.8.0/pulsar-client/pom.xml#L56) one (chosen to stay compatible with older versions of Pulsar).

Bouncy Castle provides both FIPS and non-FIPS versions, but a JVM cannot include both at once, so the current version has to be excluded before the other one is included. This makes backward compatibility a little hard, and that's why Pulsar has a separate module for [Bouncy Castle](https://pulsar.apache.org/docs/en/security-bouncy-castle).

If we want to start BookKeeper with TLS enabled through Pulsar's binary, it fails with the following error:
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/jcajce/provider/BouncyCastleFipsProvider
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:315)
	at org.apache.bookkeeper.common.util.ReflectionUtils.forName(ReflectionUtils.java:49)
	at org.apache.bookkeeper.tls.SecurityProviderFactoryFactory.getSecurityProviderFactory(SecurityProviderFactoryFactory.java:39)
	at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:129)
	at org.apache.bookkeeper.server.service.BookieService.<init>(BookieService.java:52)
	at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:304)
	at org.apache.bookkeeper.server.Main.doMain(Main.java:226)
	at org.apache.bookkeeper.server.Main.main(Main.java:208)
Caused by: java.lang.ClassNotFoundException: org.bouncycastle.jcajce.provider.BouncyCastleFipsProvider
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	... 9 more
```

This fix uses reflection to detect the loaded BC version instead of hard-coding it.

### Changes

Use reflection to detect the loaded BC version instead of hard-coding it.
Add a backward-compatibility test for the bc-non-fips version.
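
The reflection-based detection could look roughly like the sketch below. It is an illustration, not the code from this change: `ProviderProbeSketch` and `firstLoadable` are hypothetical names, and the sketch simply returns the first candidate class name that can be loaded from the classpath.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper; not the actual SecurityProviderFactoryFactory logic.
final class ProviderProbeSketch {

    /** Returns the first class name in candidates that is loadable, if any. */
    static Optional<String> firstLoadable(List<String> candidates) {
        for (String name : candidates) {
            try {
                // initialize=false: only check that the class is present,
                // without running its static initializers.
                Class.forName(name, false, ProviderProbeSketch.class.getClassLoader());
                return Optional.of(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not available in this JVM; try the next candidate.
            }
        }
        return Optional.empty();
    }
}
```

A caller would pass the FIPS provider class from the stack trace above (`org.bouncycastle.jcajce.provider.BouncyCastleFipsProvider`) together with the non-FIPS provider class (`org.bouncycastle.jce.provider.BouncyCastleProvider`) and use whichever one is present.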
### Motivation

Fix issue [apache#2726](apache#2726)


### Changes

Add double quotes around `${JAVA}` in `/bin/bookkeeper`.


Reviewers: Yong Zhang <zhangyong1025.zy@gmail.com>, Andrey Yegorov <None>, Sijie Guo <None>

This closes apache#2727 from Sunny-Island/fix-space-in-java-home, closes apache#2726

---

Master Issue: apache#2752

*Motivation*

As discussed at length in https://issues.apache.org/jira/browse/LEGAL-572
we found out that the chardet library used by requests library was a
mandatory dependency to requests and since it has LGPL licence, we
should not release any Apache Software with it.
Fixes apache#2512

### Motivation

See apache#2512 

The current libthrift version 0.12.0 has multiple vulnerabilities:
  - CVE-2019-0205 , CVE-2019-0210 , CVE-2020-13949

### Changes

- Upgrade libthrift version to 0.14.1 and fix compilation errors
- exclude new transitive dependencies org.apache.tomcat.embed:tomcat-embed-core and javax.annotation:javax.annotation-api

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Andrey Yegorov <None>

This closes apache#2695 from lhotari/lh-upgrade-libthrift
---

*Motivation*

Site updates and release note for 4.14.2
Log the tuple (namespace id, stream id, stream name) in RootStorageService getRange request.

### Motivation

Server request metrics are labeled with the stream id, extracted from the routing header.  The stream name
(aka "table name") is not available but more useful.  Rather than making a (cacheable) RPC request to fetch
the id -> name mapping in the metrics, logging the information allows one to find the name without requiring
admin access to the state store service.



Reviewers: Ivan Kelly <ivank@apache.org>, Enrico Olivelli <eolivelli@gmail.com>, Henry Saputra <hsaputra@apache.org>

This closes apache#2758 from mauricebarnum/log-stream-name and squashes the following commits:

4ef7ac0 [Maurice Barnum] cleanup: remove extraneous "final" declarations
284b643 [Maurice Barnum] state store: create and delete stream: log stream info
Descriptions of the changes in this PR:

Updated py client's version to 4.14

### Motivation

Preparing for the release
https://bookkeeper.apache.org/community/release_guide/#change-python-client-version
### Motivation

Added `TCP_USER_TIMEOUT` in Epoll channel config to limit the time a connection is left sending keepalives to a non-responding Bookie.

### Changes

The original issue reported that in scenarios where Bookies may go down unexpectedly and change their IP (e.g., Kubernetes), the Bookkeeper client may be left for some time attempting to connect with the old IP of the restarted Bookie (see apache#2482 for details). To prevent this problem from happening (in Epoll channels), we introduce the following changes:
- Epoll channels are now configured with `TCP_USER_TIMEOUT`. This parameter takes precedence over the underlying TCP keepalive configuration (see https://datatracker.ietf.org/doc/html/rfc5482), which may default to retrying for too long depending on the environment (e.g., 10-15 minutes in our experience).
- To prevent adding more configuration parameters, the existing `clientConnectTimeoutMillis` value in `ClientConfiguration` is the one used to set `TCP_USER_TIMEOUT` due to its similarity.

### Validation

We have reproduced the original testing environment in which this problem appears consistently:
- Cluster with 4 Bookies and 3 Kubernetes nodes, in addition to https://pravega.io which uses the Bookkeeper client.
- Deployed an application to do IO to Pravega (and therefore, to Bookkeeper).
- Periodically shut down a Kubernetes node, so Bookkeeper pods on it are restarted as well.

Considering this test procedure, without the proposed PR we consistently observe Bookkeeper clients getting stuck trying to contact old IPs of Bookies. With this change, we confirmed via logs that the configuration change takes place, and we have not been able to reproduce the original problem so far after performing multiple node reboots.

Master Issue: apache#2482


Reviewers: Flavio Junqueira <fpj@apache.org>, Enrico Olivelli <eolivelli@apache.org>

This closes apache#2761 from RaulGracia/issue-2482-close-idle-bookie-connection, closes apache#2482
Descriptions of the changes in this PR:



### Motivation
Benchmark tests were failing due to 

- Missing runtime test dependency of MetricsCore
- Lack of enough JVM memory.

### Changes

- Include metricsCore as testDependency
- Increase heap size to 4 GB.


Master Issue: apache#2640



Reviewers: Henry Saputra <hsaputra@apache.org>, Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2777 from pkumar-singh/build_with_gradle
We have more than 100 labels and the merge script is not able to download the list of labels, resulting in the impossibility to merge PRs.

Modifications:
- download all the pages of labels
- remove the Python2 script

Reviewers: Anup Ghatage <ghatage@apache.org>, Henry Saputra <hsaputra@apache.org>

This closes apache#2776 from eolivelli/fix/merge-script-pagination
merlimat and others added 29 commits October 18, 2021 08:46
…2833)

* Eliminate direct ZK access in ScanAndCompareGarbageCollector

* Removed unused imports

* Fixed zk ACLs

* Addressed comments

* Fixed checkstyle
BP-44: USE metrics. A proposal for improving BookKeeper metrics so that operators can employ the USE method for diagnosing performance issues.


Reviewers: Henry Saputra <hsaputra@apache.org>, Andrey Yegorov <None>, Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2835 from Vanlightly/BP-44-use-metrics and squashes the following commits:

8d9baab [Jack Vanlightly] Added link to USE method and listed each term of USE
5a0f67d [Jack Vanlightly] BP-44 USE metrics
a9b576d [Yunze Xu] Release semaphore when addEntry accepts the same entries (apache#2832)
148bf22 [Yun Tang] Ensure to release cache during KeyValueStorageRocksDB#closec (apache#2821)
4dc4260 [gaozhangmin] Heap memory leak problem when ledger replication failed (apache#2794)
a522fa3 [Raúl Gracia] Issue 2815: Upgrade to log4j2 to get rid of CVE-2019-17571 (apache#2816)
0465052 [Nicolò Boschi] Upgrade httpclient from 4.5.5 to 4.5.13 (apache#2793)
594a056 [Raúl Gracia] Issue 2795: Bookkeeper upgrade using Bookie ID may fail due to cookie mismatch (apache#2796)
354cf37 [Raúl Gracia] Upgraded dependencies with CVEs (apache#2792)
e413c70 [Raúl Gracia] Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option (apache#2779)
883231e [pradeepbn] Building bookkeeper with gradle on java11
…ache.bookkeeper.bookie. org.apache.bookkeeper.client org.apache.bookkeeper.replication org.apache.bookkeeper.tls. (apache#2812)

Co-authored-by: Prashant Kumar <prashantk@splunk.com>
…ver (apache#2788)

Error log:

`16:21:20.140 [main] ERROR org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to initialize DNS Resolver org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping, used default subnet resolver : java.lang.RuntimeException: java.lang.NullPointerException java.lang.NullPointerException`

`BookieAddressResolver` should be set before `((Configurable) dnsResolver).setConf(conf);`

Otherwise an NPE is thrown when Pulsar's `ZkBookieRackAffinityMapping` invokes `getBookieAddressResolver`.
Co-authored-by: Prashant Kumar <prashantk@splunk.com>
### Motivation
- Issue is as described in [PR#2797](apache#2797).
> In one day, zookeepers became high cpu usage and disk full.
> The cause of this is bookie's gc of overreplicated ledgers.
> Gc created/deleted zk nodes under /ledgers/underreplication/locks very frequently and some bookies ran gc at same time.
> As a result, zookeepers created a lot of snapshots and became disk full.

- I want to reduce the number of lock node creations and deletions in ZK.

### Changes
- Add an ensemble check before creating the lock node.
This is to reduce the number of lock node creations and deletions in ZK.

- ~~If [PR#2797](apache#2797) was merged, this PR needs to be fixed.~~
* Forget to close preAllocator log on shutdown

* Fix synchronize problem

* handle InterruptedException
Co-authored-by: Prashant Kumar <prashantk@splunk.com>
* Remove direct ZK access for Auditor

* Fixed unused imports

* Fixed checkstyle

* Fixed checkstyle in tests
…apache#2844)

### Motivation
For each ledger whose metadata is not in ZK, the following stack trace is logged:

```
15:30:17.925 [GarbageCollectorThread-11-1] ERROR o.a.b.b.ScanAndCompareGarbageCollector - Exception when iterating through the ledgers to check for over-replication
java.util.concurrent.ExecutionException: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at org.apache.bookkeeper.bookie.ScanAndCompareGarbageCollector.removeOverReplicatedledgers(ScanAndCompareGarbageCollector.java:199)
        at org.apache.bookkeeper.bookie.ScanAndCompareGarbageCollector.gc(ScanAndCompareGarbageCollector.java:120)
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.doGcLedgers(GarbageCollectorThread.java:372)
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:323)
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:301)
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists
        at org.apache.bookkeeper.meta.AbstractZkLedgerManager$3.processResult(AbstractZkLedgerManager.java:397)
        at org.apache.bookkeeper.zookeeper.ZooKeeperClient$19$1.processResult(ZooKeeperClient.java:994)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:575)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
```

It is noisy, inflates the log files, and eventually causes OOM during log rotation.
So we should suppress the stack trace.

(This problem is due to [apache#2813](apache#2813).)

### Changes
Add error handling to readLedgerMetadata in over-replicated ledger GC in order to suppress the stacktrace.
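
One way to picture this kind of error handling is sketched below, under stated assumptions: `QuietLookupSketch`, `describe`, and the nested `NoSuchLedgerException` are hypothetical stand-ins (the real exception is `BKException.BKNoSuchLedgerExistsException`, and the real method is in `ScanAndCompareGarbageCollector`). The expected "no such ledger" cause is turned into a one-line message instead of surfacing as a full stack trace.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Hypothetical sketch; not the actual ScanAndCompareGarbageCollector code.
final class QuietLookupSketch {

    /** Stand-in for BKException.BKNoSuchLedgerExistsException. */
    static final class NoSuchLedgerException extends Exception {
        NoSuchLedgerException() {
            super("No such ledger exists");
        }
    }

    /** Resolves the metadata future and reports the expected failure as a single line. */
    static String describe(long ledgerId, CompletableFuture<String> metadata) {
        try {
            return "ledger " + ledgerId + ": " + metadata.get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof NoSuchLedgerException) {
                // Expected when the ledger metadata is already gone: no stack trace needed.
                return "ledger " + ledgerId + ": metadata missing, skipped";
            }
            return "ledger " + ledgerId + ": unexpected failure: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return "ledger " + ledgerId + ": interrupted";
        }
    }
}
```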
…#2848)

* Remove deprecated Jenkinsfile

* Remove Jenkins job, Travis refs, update doc/website for contributions

* readd newline
…Counter (apache#2839)

* Addition of thread-scoped stats

The Counter and OpStatsLogger have new variants that
add threadPool and thread labels to their metrics.
These new variants can be obtained via new methods
in the StatsLogger interface.
### Motivation

In order to complete migration to Gradle we must build all the subprojects.

### Changes

- Enabled `sh` integration tests with gradle, located in `tests/scripts/src/test/bash/gradle`
- Added these modules to the build
    - `bookkeeper-http:servlet-http-server` 
    - `metadata-drivers:etcd`
    - `tests:backward-compat:*`
    - `tests:shaded:*`
    - `stream:bk-grpc-name-resolver`
- DL shading process is now performed (before it didn't build any jar)
- Groovy tests (`tests:backward-compat:*`) are now triggered by the build/tests itself; with Maven, there is a "runner" project (`tests/integration-tests-base-groovy`), which is unnecessary in Gradle and therefore skipped


### Test

- Both `bin/bookkeeper standalone` and `bin/bookkeeper_gradle standalone` work locally
- Tests are passing locally  

Master Issue: apache#2849



Reviewers: Henry Saputra <hsaputra@apache.org>, Prashant Kumar <None>

This closes apache#2850 from nicoloboschi/fix/2849/gradle and squashes the following commits:

00b49f4 [Nicolò Boschi] Fix common_gradle.sh regex
bd739fd [Nicolò Boschi] fix sh tests
43230ba [Nicolò Boschi] revert sh files. Avoid to modify maven files, create gradle versions to faciltate migration
d1f95e4 [Nicolò Boschi] fix shaded deps
bcab40d [Nicolò Boschi] fix build
5fd0341 [Nicolò Boschi] fix build
0082e0e [Nicolò Boschi] fix build
2c32ac1 [Nicolò Boschi] fixes
3bc0b26 [Nicolò Boschi] bookkeeper-server-shaded-tests
ba89132 [Nicolò Boschi] shaded tests
6d39e33 [Nicolò Boschi] sh tests
e0032bc [Nicolò Boschi] actually run arquillian groovy tests
08dcc39 [Nicolò Boschi] backwards
2361f79 [Nicolò Boschi] hierarchical-ledger-manager
8388e11 [Nicolò Boschi] current-server-old-clients
6a24344 [Nicolò Boschi] bc-non-fips
2faca01 [Nicolò Boschi] bk-grpc-name-resolver
991bc11 [Nicolò Boschi] servlet-http-server
675ef7b [Nicolò Boschi] etcd
b1d5e14 [ZhangJian He] A empty implement in EtcdLedgerManagerFactory to let the project can compile (apache#2845)
bd5c50b [shustsud] Add error handling to readLedgerMetadata in over-replicated ledger GC (apache#2844)
746f9f6 [Matteo Merli] Remove direct ZK access for Auditor (apache#2842)
4117200 [ZhangJian He] the compare should be >= instead of > (apache#2782)
14ef56f [Prashant Kumar] BookieId can not be cast to BookieSocketAddress (apache#2843)
e10f3fe [ZhangJian He] Forget to close preAllocator log on shutdown (apache#2819)
53954ca [shustsud] Add ensemble check to over-replicated ledger GC (apache#2813)
919fdd3 [Prashant Kumar] Issue:2840 Create bookie shellscript for gradle (apache#2841)
031d168 [gaozhangmin] fix-npe-when-pulsar-ZkBookieRackAffinityMapping-getBookieAddressResolver (apache#2788)
3dd671c [Prashant Kumar] Migrate bookkeepr-server:test to gradle run unit tests excepts org.apache.bookkeeper.bookie. org.apache.bookkeeper.client org.apache.bookkeeper.replication org.apache.bookkeeper.tls. (apache#2812)
f6903b8 [Jack Vanlightly] BP-44 USE metrics
a4afaa4 [Matteo Merli] Eliminate direct ZK access in ScanAndCompareGarbageCollector (apache#2833)
a9b576d [Yunze Xu] Release semaphore when addEntry accepts the same entries (apache#2832)
148bf22 [Yun Tang] Ensure to release cache during KeyValueStorageRocksDB#closec (apache#2821)
4dc4260 [gaozhangmin] Heap memory leak problem when ledger replication failed (apache#2794)
a522fa3 [Raúl Gracia] Issue 2815: Upgrade to log4j2 to get rid of CVE-2019-17571 (apache#2816)
0465052 [Nicolò Boschi] Upgrade httpclient from 4.5.5 to 4.5.13 (apache#2793)
594a056 [Raúl Gracia] Issue 2795: Bookkeeper upgrade using Bookie ID may fail due to cookie mismatch (apache#2796)
354cf37 [Raúl Gracia] Upgraded dependencies with CVEs (apache#2792)
e413c70 [Raúl Gracia] Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option (apache#2779)
883231e [pradeepbn] Building bookkeeper with gradle on java11
…mand (apache#2870)

### Motivation
When we use the `bin/bookkeeper shell recover bookieId` command to recover a specific bookie's ledgers, the recovery process exits as soon as the recovery of one ledger fails.

In our production bookkeeper cluster, we found some ledgers that are in the OPEN state and have no entries. When we call `bin/bookkeeper shell recover bookieId`, it traverses all the ledgers level by level. In the end, for each ledger, it calls the following code to perform the recovery.
```Java
Processor<Long> ledgerProcessor = new Processor<Long>() {
    @Override
    public void process(Long ledgerId, AsyncCallback.VoidCallback iterCallback) {
        recoverLedger(bookiesSrc, ledgerId, dryrun, skipOpenLedgers, skipUnrecoverableLedgers, iterCallback);
    }
};
```

In the `recoverLedger` method, it calls `asyncOpenLedgerNoRecovery` to open the ledger and get the LAC if the ledger is in the `OPEN` state. For the `getLAC` request, if the requested ledger has no entry, it returns entry = -1 and returns ERROR for this `getLAC` request.
https://github.com/apache/bookkeeper/blob/98ddf8149592572eebcfaf6bdd4916f295ffd9d7/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java#L756-L769

The `asyncOpenLedgerNoRecovery` callback then reports an error for this ledger, which stops the recovery of all following ledgers.

As a result, the recover command fails and the following ledgers can't be recovered.

### Changes
We should expose a flag that lets the user decide whether to move on and recover the following ledgers when the recovery of some ledgers fails.

So, I provide the parameter `sku` to handle this case.
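
The intended behaviour of such a flag can be sketched as below. This is an assumption-laden illustration, not the `BookKeeperAdmin` implementation: `RecoverLoopSketch`, `recoverAll`, and the `recoverOne` callback are hypothetical, and only the name `skipUnrecoverableLedgers` is taken from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical sketch of "continue on failure"; not BookKeeperAdmin's actual code.
final class RecoverLoopSketch {

    /**
     * Tries to recover every ledger. recoverOne returns true on success.
     * With the flag off, the first failure aborts the run; with it on,
     * failed ledgers are collected and the loop moves on.
     */
    static List<Long> recoverAll(List<Long> ledgerIds, LongPredicate recoverOne,
                                 boolean skipUnrecoverableLedgers) {
        List<Long> failed = new ArrayList<>();
        for (long ledgerId : ledgerIds) {
            if (recoverOne.test(ledgerId)) {
                continue;
            }
            if (!skipUnrecoverableLedgers) {
                throw new IllegalStateException("recovery failed for ledger " + ledgerId);
            }
            failed.add(ledgerId);
        }
        return failed;
    }
}
```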
…urs (apache#2860)

Return too many requests error when there is OperationRejectedException which occurs because of internal resource saturation
To have DI on the bookie, it's better to have a single constructor
that takes all the injected implementations. With a single
constructor, there's only one place to modify in production code when we
want to inject something, rather than having the bookie need to know
how to create a "default" version (which often breaks encapsulation).

If we need convenience constructors for tests, they should live in the
tests.
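
As a rough illustration of this idea (all names here, `DiSketch`, `BookieSketch`, `Journal`, `LedgerStorage`, and `forTest`, are hypothetical and much simpler than the real `BookieImpl` dependencies), the production class has exactly one constructor that receives every collaborator, while the convenience factory with defaults sits on the test side.

```java
// Hypothetical sketch of single-constructor dependency injection.
final class DiSketch {

    interface Journal {
        String name();
    }

    interface LedgerStorage {
        String name();
    }

    /** Production-side class: one constructor, every dependency injected. */
    static final class BookieSketch {
        private final Journal journal;
        private final LedgerStorage storage;

        BookieSketch(Journal journal, LedgerStorage storage) {
            this.journal = journal;
            this.storage = storage;
        }

        String describe() {
            return journal.name() + "+" + storage.name();
        }
    }

    /** Test-side convenience factory; in a real project this would live in test code. */
    static BookieSketch forTest() {
        return new BookieSketch(() -> "in-memory-journal", () -> "in-memory-storage");
    }
}
```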

Co-authored-by: Ivan Kelly <ikelly@splunk.com>
---

*Motivation*

Fix typo in DBLedgerStorage. `getLedgerSorage` -> `getLedgerStorage`
### Motivation

Release note for 4.14.3
1559924775 closed this Nov 5, 2022