feat: recover from crash #107
Merged
superhx (Collaborator) commented on Sep 12, 2023
- Upload the remaining WAL data.
- Close the streams that were still open.
Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
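The two recovery steps in the description can be pictured with a minimal Java sketch. It is not the code from this PR: `WalRecord`, `Uploader`, `StreamCloser` and `recover` are hypothetical names, and the sketch assumes the WAL iterator yields only records that were written but never uploaded, and that the broker already knows which streams it had open before the crash. It only shows the order of operations: group the leftover WAL records by stream, upload them, then close each stream that was still open.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CrashRecoverySketch {
    // Hypothetical WAL entry: the stream it belongs to, its offset, and its payload.
    record WalRecord(long streamId, long offset, byte[] payload) {}

    // Hypothetical hook that uploads recovered records, grouped by stream, to object storage.
    interface Uploader {
        void upload(Map<Long, List<WalRecord>> recordsByStream);
    }

    // Hypothetical hook that asks the controller to close one stream.
    interface StreamCloser {
        void close(long streamId);
    }

    /**
     * Recovery after an unclean shutdown, in the order the PR lists the steps:
     * upload WAL data that never reached object storage, then close the streams
     * this broker still had open.
     */
    static void recover(Iterator<WalRecord> remaining, Set<Long> openingStreams,
                        Uploader uploader, StreamCloser closer) {
        // Group the leftover WAL records by stream, keeping WAL order within each stream.
        Map<Long, List<WalRecord>> byStream = new LinkedHashMap<>();
        while (remaining.hasNext()) {
            WalRecord record = remaining.next();
            byStream.computeIfAbsent(record.streamId(), id -> new ArrayList<>()).add(record);
        }
        // Step 1: upload the remaining data (skipped when the WAL held nothing).
        if (!byStream.isEmpty()) {
            uploader.upload(byStream);
        }
        // Step 2: close every stream that was open before the crash, in ascending id order.
        for (long streamId : new TreeSet<>(openingStreams)) {
            closer.close(streamId);
        }
    }

    public static void main(String[] args) {
        List<WalRecord> wal = List.of(
                new WalRecord(1L, 10L, new byte[] {1}),
                new WalRecord(2L, 0L, new byte[] {2}),
                new WalRecord(1L, 11L, new byte[] {3}));
        List<Long> closed = new ArrayList<>();
        recover(wal.iterator(), Set.of(3L, 1L, 2L),
                batch -> System.out.println("uploaded streams " + batch.keySet()),
                streamId -> closed.add(streamId));
        System.out.println("closed streams " + closed);
    }
}
```

Running `main` prints `uploaded streams [1, 2]` followed by `closed streams [1, 2, 3]`. Uploading before closing is an assumed rationale rather than something the PR states: closing a stream presumably fixes its end offset, so the recovered records should be durable in object storage first.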
superhx requested review from SCNieh, mooc9988, Chillax-0v0, TheR1sing3un and amos201600 on September 12, 2023 03:25
SCNieh approved these changes on Sep 12, 2023
daniel-y pushed a commit that referenced this pull request on Mar 5, 2024
* fix: fix checkstyle and add ci flow for checkstyle and spotbugs (#106) ci(s3): add checkstyle and spotbugs in ci 1. add checkstyle and spotbugs in ci 2. fix to pass checkstyle Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* fix: fix fetching problem brought by thread pool separating; fix style problems; add more thread for fetching and appending thread pool (#107)
  * fix: fix fetching problem brought by thread pool separating; fix style problems; add more thread for fetching and appending thread pool Signed-off-by: Curtis Wan <wcy9988@163.com>
  * remove 'SLOW_FETCH_TIMEOUT_MILLIS - 1' case in quickFetch test and change 2 -> 'SLOW_FETCH_TIMEOUT_MILLIS / 2' in slowFetch test Signed-off-by: Curtis Wan <wcy9988@163.com>
  * refactor: add more comments to make logic more clear Signed-off-by: Curtis Wan <wcy9988@163.com>
  --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
* refactor: close #105; add more threads for partition opening or closing (#109) Signed-off-by: Curtis Wan <wcy9988@163.com>
* feat(es): client factory SPI (#112) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: close #108; make sure topics are all cleaned at the end of the test (#115) Signed-off-by: Curtis Wan <wcy9988@163.com>
* fix: close #110; shutdown additional thread pools right now when broker is in shutdown (#111)
  * fix: close #110; shutdown additional thread pools right now when broker is in shutdown Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: move thread pools into AlwaysSuccessClient Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: add more logs; fix partition leading or following Signed-off-by: Curtis Wan <wcy9988@163.com>
  --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
* fix: close #117; fix return too early; add stack for log closing (#118) Signed-off-by: Curtis Wan <wcy9988@163.com>
* fix: close #114; use position rather than offset as nextOffset for indexes; close metastream when log is closing (#116)
  * wip: add more logs Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: add ExceptionUtil; use position rather than offset as nextOffset for indexes Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: fix style check Signed-off-by: Curtis Wan <wcy9988@163.com>
  --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
* fix: close #119; skip deleting segments when metaStream is closed (#120) Signed-off-by: Curtis Wan <wcy9988@163.com>
* feat: close #123; Handle no-retryable exceptions thrown by Elastic Stream SDK (#124)
  * feat: close #123; Handle no-retryable exceptions thrown by Elastic Stream SDK. The corresponding partitions will be offline; Add context info to pass unit tests for cases that fetching just follows appending. Signed-off-by: Curtis Wan <wcy9988@163.com>
  * feat: enhancement codes refered in discussions Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: style problem Signed-off-by: Curtis Wan <wcy9988@163.com>
  * test: wait for more time for quick/slow fetch tests Signed-off-by: Curtis Wan <wcy9988@163.com>
  * test: start fetching after appending finished Signed-off-by: Curtis Wan <wcy9988@163.com>
  --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
* test: add slowFetchDelay (#135) Signed-off-by: Curtis Wan <wcy9988@163.com>
* feat(core): Add implementation of AutoBalancer components and unit tests (#134)
  * feat(core): Add customized MetricsReporter and partition-level metrics - add partition level metrics for BytesInPerSec and BytesOutPerSec - implement CruiseControlMetrics to monitor and report interested Yammer metrics Closes #78 Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
  * feat(core): AutoBalancerMetricsReporter optimization - pre-aggregate for broker and partition level metrics - fill with empty value for probably missing metrics #78 Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
  * feat(core): Implement load retriever for auto balancer #77 Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
  * feat(core): Implement AutoBalancerManager and AnomalyDetector #77 Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
  * fix(core): Fix inconsistent in-flight request check Closes #121 Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
  * feat(core): Create metrics reporter topic on controller Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
  * fix(core): fix execute interval of ExecutionManager Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
  * fix: add esUnit tag to unit tests of autobalancer Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
  --------- Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
--------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com> Signed-off-by: Curtis Wan <wcy9988@163.com> Signed-off-by: Robin Han <hanxvdovehx@gmail.com> Signed-off-by: sc.nieh <s.c.ney516@gmail.com> Co-authored-by: TheR1sing3un <87409330+TheR1sing3un@users.noreply.github.com> Co-authored-by: Robin Han <hanxvdovehx@gmail.com> Co-authored-by: Shichao Nie <s.c.ney516@gmail.com>
daniel-y pushed a commit that referenced this pull request on Mar 5, 2024
* feat: get broker opening streams Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat: recover from crash (WAL buggy version) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
--------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
superhx added a commit that referenced this pull request on Mar 5, 2024
…branch of automq kafka (#879)
* test(s3): add commit wal object test 1. add commit wal object test Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* fix(s3): still commit valid stream in wal object 1. still commit valid stream in wal object Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): replay WalObjectRecord to advance range's end offset 1. replay WalObjectRecord to advance range's end offset Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): support new version stream open/close operations 1. support new version stream open/close operations Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* style(s3): pass checkstyle 1. pass checkstyle Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): support get streams' offset range 1. support get streams' offset range 2. remove useless CompactObject related request/response 3. refactor WalObjectRequest to contain compacted objects and stream objects 4. increase all stream related request/response apiKey Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): complete wal object commit process in controller 1. complete wal object commit process in controller Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): replay new added state in StreamImage 1. replay new added state in StreamImage Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): boostrap Kafka on S3 (#49) 1. boostrap with controller-metadata-manager Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat: merge ESK (#54)
  * fix: fix checkstyle and add ci flow for checkstyle and spotbugs (#106) ci(s3): add checkstyle and spotbugs in ci 1. add checkstyle and spotbugs in ci 2. fix to pass checkstyle Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * fix: fix fetching problem brought by thread pool separating; fix style problems; add more thread for fetching and appending thread pool (#107)
    * fix: fix fetching problem brought by thread pool separating; fix style problems; add more thread for fetching and appending thread pool Signed-off-by: Curtis Wan <wcy9988@163.com>
    * remove 'SLOW_FETCH_TIMEOUT_MILLIS - 1' case in quickFetch test and change 2 -> 'SLOW_FETCH_TIMEOUT_MILLIS / 2' in slowFetch test Signed-off-by: Curtis Wan <wcy9988@163.com>
    * refactor: add more comments to make logic more clear Signed-off-by: Curtis Wan <wcy9988@163.com>
    --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
  * refactor: close #105; add more threads for partition opening or closing (#109) Signed-off-by: Curtis Wan <wcy9988@163.com>
  * feat(es): client factory SPI (#112) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: close #108; make sure topics are all cleaned at the end of the test (#115) Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: close #110; shutdown additional thread pools right now when broker is in shutdown (#111)
    * fix: close #110; shutdown additional thread pools right now when broker is in shutdown Signed-off-by: Curtis Wan <wcy9988@163.com>
    * fix: move thread pools into AlwaysSuccessClient Signed-off-by: Curtis Wan <wcy9988@163.com>
    * fix: add more logs; fix partition leading or following Signed-off-by: Curtis Wan <wcy9988@163.com>
    --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: close #117; fix return too early; add stack for log closing (#118) Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: close #114; use position rather than offset as nextOffset for indexes; close metastream when log is closing (#116)
    * wip: add more logs Signed-off-by: Curtis Wan <wcy9988@163.com>
    * fix: add ExceptionUtil; use position rather than offset as nextOffset for indexes Signed-off-by: Curtis Wan <wcy9988@163.com>
    * fix: fix style check Signed-off-by: Curtis Wan <wcy9988@163.com>
    --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: close #119; skip deleting segments when metaStream is closed (#120) Signed-off-by: Curtis Wan <wcy9988@163.com>
  * feat: close #123; Handle no-retryable exceptions thrown by Elastic Stream SDK (#124)
    * feat: close #123; Handle no-retryable exceptions thrown by Elastic Stream SDK. The corresponding partitions will be offline; Add context info to pass unit tests for cases that fetching just follows appending. Signed-off-by: Curtis Wan <wcy9988@163.com>
    * feat: enhancement codes refered in discussions Signed-off-by: Curtis Wan <wcy9988@163.com>
    * fix: style problem Signed-off-by: Curtis Wan <wcy9988@163.com>
    * test: wait for more time for quick/slow fetch tests Signed-off-by: Curtis Wan <wcy9988@163.com>
    * test: start fetching after appending finished Signed-off-by: Curtis Wan <wcy9988@163.com>
    --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
  * test: add slowFetchDelay (#135) Signed-off-by: Curtis Wan <wcy9988@163.com>
  * feat(core): Add implementation of AutoBalancer components and unit tests (#134)
    * feat(core): Add customized MetricsReporter and partition-level metrics - add partition level metrics for BytesInPerSec and BytesOutPerSec - implement CruiseControlMetrics to monitor and report interested Yammer metrics Closes #78 Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
    * feat(core): AutoBalancerMetricsReporter optimization - pre-aggregate for broker and partition level metrics - fill with empty value for probably missing metrics #78 Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
    * feat(core): Implement load retriever for auto balancer #77 Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
    * feat(core): Implement AutoBalancerManager and AnomalyDetector #77 Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
    * fix(core): Fix inconsistent in-flight request check Closes #121 Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
    * feat(core): Create metrics reporter topic on controller Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
    * fix(core): fix execute interval of ExecutionManager Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
    * fix: add esUnit tag to unit tests of autobalancer Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
    --------- Signed-off-by: sc.nieh <s.c.ney516@gmail.com>
  --------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com> Signed-off-by: Curtis Wan <wcy9988@163.com> Signed-off-by: Robin Han <hanxvdovehx@gmail.com> Signed-off-by: sc.nieh <s.c.ney516@gmail.com> Co-authored-by: TheR1sing3un <87409330+TheR1sing3un@users.noreply.github.com> Co-authored-by: Robin Han <hanxvdovehx@gmail.com> Co-authored-by: Shichao Nie <s.c.ney516@gmail.com>
* feat(s3): support controller-kv client (#55) 1. support controller-kv client 2. change some logs' level from info to trace Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): replace object-id with order-id for WAL (#56) 1. replace object-id with order-id for WAL Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* refactor(s3): return first object id in preparedObjectResponse (#60) 1. return first object id in preparedObjectResponse Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(stream-client): optimize getting objects from `StreamMetadataManager` (#64)
  * refactor(s3): remove inflight wal objects 1. remove inflight wal objects 2. delete redundant classes Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * feat(s3): support blocking `getObjects` in `StreamMetadataManager` 1. support blocking `getObjects` in `StreamMetadataManager` 2. refactor `getObjects` 3. change the unit of `s3.cache.size` from `MB` to `B` Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * feat(s3): more suitable log level 1. more suitable log level Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * fix(s3): add more concurrent protection 1. add more concurrent protection Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  --------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): handle stream objects commit in controller (#71) 1. handle stream objects commit in controller Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): handle invalid commit request (#75) 1. handle invalid commit request Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat: get broker opening streams (#85)
  * feat: rename GetStreamsOffset to GetOpeningStreams Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * feat: get broker opening streams Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: convert brokerId to int32 (#90) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat(s3): unify S3 object metadata (#86)
  * feat(s3): unify S3 object metadata 1. unify S3 object metadata Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * fix(s3): fix after rebasing 1. fix after rebasing Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * feat(s3): log stream-objects' info when commit wal object 1. log stream-objects' info when commit wal object Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  --------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): support retry to-controller request (#87) 1. support retry to-controller request Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat: recover from crash (#107)
  * feat: get broker opening streams Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * feat: recover from crash (WAL buggy version) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: fix WAL iterator & data position (#113) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat: directly split one stream WAL (#122) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: fix controller check prepare object NPE (#124) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix(s3): fixed bugs on uploading WAL object during compaction (#130)
  * fix(s3): fixed bugs on uploading WAL object during compaction - eliminate race condition on uploading WAL object by using sequential writing - prevent blocking on multipart upload when part size is less than MIN_PART_SIZE Signed-off-by: Shichao Nie <niesc@automq.com>
  * fix(s3): used pooled buffer in DataBlockWriter; add newFixedThreadPool to Threads Signed-off-by: Shichao Nie <niesc@automq.com>
  --------- Signed-off-by: Shichao Nie <niesc@automq.com>
* feat(s3): support trim-stream operation (#137)
  * feat(s3): add trim-stream operation protocol 1. add trim-stream operation protocol Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * feat(s3): support trim-stream operation 1. support trim-stream operation Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * fix(s3): minor fix 1. minor fix Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  --------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): delete destroyed object in S3 (#161)
  * feat(s3): delete destroyed object in S3 1. delete destroyed object in S3 Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * feat(s3): configuration about mock S3 operator 1. configuration about mock S3 operator Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  --------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): support trim beyond end offset (#162) 1. support trim beyond end offset Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* feat(s3): support continuity check when commit wal object (#166)
  * feat(s3): support continuity check when commit wal object 1. support continuity check when commit wal object 2. when trim offset larger than current end offset, still keep current range's end offset Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * style(s3): fix suppress warning 1. fix suppress warning Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  --------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* fix: fix stream open (#176) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat(s3): support broker epoch (#179)
  * feat(s3): add broker epoch field in stream related request protocol 1. add `broker-epoch` field in stream related request protocol 2. replace `OpenStreamMetadata` with `StreamMetadata` and remove `OpenStreamMetadata.class` Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * feat(s3): send stream-related request with `broker-epoch` 1. send stream-related request with `broker-epoch` Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * feat(s3): check broker epoch 1. check broker epoch 2. fix wal object record replaying bug Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * feat(s3): handle response with broker epoch error 1. handle response with broker epoch error Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  --------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* refactor: extract stream part0 rename package (#186) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* refactor: extract s3stream to s3stream module (#192)
  * refactor: extract s3stream to s3stream module Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * refactor: fix netty conflict Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat(s3): support delete stream (#191)
  * feat(s3): support delete stream 1. support delete stream 2. simple fix unclosed stream when close partition Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * fix(s3): fix checkstyle 1. fix checkstyle Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  --------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* style(wal): Code beautification and formatting (#196)
  * refactor(wal): Make `capacity` a required config Signed-off-by: Ning Yu <ningyu@automq.com>
  * style(wal): Remove `Throwable` in try-catch Signed-off-by: Ning Yu <ningyu@automq.com>
  * style(wal): Standardize logs and error messages Signed-off-by: Ning Yu <ningyu@automq.com>
  * refactor(wal): Make `nextOffset` align to `BLOCK_SIZE` Signed-off-by: Ning Yu <ningyu@automq.com>
  * docs(wal): Replace all Chinese comments to English Signed-off-by: Ning Yu <ningyu@automq.com>
  * fix spotbugs Signed-off-by: Ning Yu <ningyu@automq.com>
  --------- Signed-off-by: Ning Yu <ningyu@automq.com>
* refactor: s3stream remove dependency to metadata (#198)
  * refactor: s3stream remove dependency to metadata Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: fix ut Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat: await partition close (#230) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix(s3Stream): make sure to read continuous data blocks when data is trimmed on compaction (#231) Signed-off-by: Shichao Nie <niesc@automq.com>
* feat(s3): support batch request (#233)
  * feat(s3): support batch request 1. support batch request Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * fix(s3): fix after merging 1. fix after merging Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  --------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* fix: fix potential infinite loop in logCache (#234)
  * fix(s3Stream): prevent infinite loop when lastBlockStreamStartOffset is less or equal than startOffset on LogCache#get0 Signed-off-by: Shichao Nie <niesc@automq.com>
  * fix(s3Stream): check and create auto balancer metrics topic before consumer start Signed-off-by: Shichao Nie <niesc@automq.com>
  --------- Signed-off-by: Shichao Nie <niesc@automq.com>
* feat: stream safe trim/close/destory (#235) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat(s3): add object retention delay time (#245) 1. add object retention delay time Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* fix(s3): fix objectPart not set null in some cases; add logIdent for writer (#255) Signed-off-by: Curtis Wan <wcy9988@163.com>
* feat(core): add step control to auto balancer (#254)
  * feat(core): add step control to auto balancer Signed-off-by: Shichao Nie <niesc@automq.com>
  * style(s3Stream): fix checkstyle Signed-off-by: Shichao Nie <niesc@automq.com>
  --------- Signed-off-by: Shichao Nie <niesc@automq.com>
* fix(core): prevent reassigning partition to inactive broker (#263)
  * fix(core): prevent reassign partition to inactive broker Signed-off-by: Shichao Nie <niesc@automq.com>
  * style(core): fix checkstyle Signed-off-by: Shichao Nie <niesc@automq.com>
  --------- Signed-off-by: Shichao Nie <niesc@automq.com>
* feat(s3): optimize batch request (#246)
  * feat(s3): temp commit 1. temp commit Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * feat(s3): optimize batch request handle process 1. optimize batch request handle process 2. support kv related batch request Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * fix(s3): minor fix 1. minor fix Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * fix(s3): minor fix 1. minor fix Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * fix(s3): fix checkstyle 1. fix checkstyle Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  * refactor(s3): remove `broker` concept in S3Stream, replace it with `node` 1. remove `broker` concept in S3Stream, replace it with `node` Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
  --------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com> Signed-off-by: Shichao Nie <niesc@automq.com> Co-authored-by: Shichao Nie <niesc@automq.com>
* feat: handle S3client lifecycle (#264)
  * feat(s3Stream): add lifecycle control for stream client Signed-off-by: Shichao Nie <niesc@automq.com>
  * style(s3Stream): optimize log position Signed-off-by: Shichao Nie <niesc@automq.com>
  --------- Signed-off-by: Shichao Nie <niesc@automq.com>
* feat(s3Stream): use dedicated S3Operator for Compaction (#265) Signed-off-by: Shichao Nie <niesc@automq.com>
* feat(s3Stream): add trace log for S3 object write event (#266) Signed-off-by: Shichao Nie <niesc@automq.com>
* feat: parallel partition op (#272)
  * feat: parallel partition operation Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: persist clean shutdown mark when shutdown Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat: async append (#275)
  * feat: async create segment Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * feat: async lazy stream Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: fix https://github.com/AutoMQ/kafka-on-s3/issues/276 Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: fix unit test async operation fail Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: fix bugspot Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: fix https://github.com/AutoMQ/kafka-on-s3/issues/277 Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat: pre-allocate log&time stream (#279) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat(s3): limit object count in one object-deletion request (#286) 1. limit object count in one object-deletion request Signed-off-by: TheR1sing3un <ther1sing3un@163.com>
* fix: E2E (#294)
  * fix: remove hard_shutdown Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: clean topic Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: disable e2e auto create topic Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: disable e2e auto create topic Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: skip manual create consume offset Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: suppress wait topic delete fail Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: halt on KafkaException Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: log cleaner (#297)
  * fix: fix log cleaner issues/296 Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * feat: disable E2E exit when fail Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: fix stream trim (#301) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: delete kv firstly when destroying topicPartition (#303)
  * fix: delete kv firstly when destroying topicPartition Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: roll back kos_test_suite.yml Signed-off-by: Curtis Wan <wcy9988@163.com>
  --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
* fix: fix log compaction read fail (#320)
  * fix: fix log compaction read fail Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: disable test_broker_failure unclean shutdown Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: avoid pre-load Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat: update s3stream to 0.1.0-SNAPSHOT (#322)
  * feat: update s3stream to 5.1.3-SNAPSHOT Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * chore: update s3stream to 0.1.0-SNAPSHOT Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: show last N segments (#321) Signed-off-by: Curtis Wan <wcy9988@163.com>
* feat: limit max retry delay (#323) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: init `CompactionManager` using an explicitly created `S3Operator` (#325) Signed-off-by: Ning Yu <ningyu@automq.com>
* fix: do not warn inactive brokers (#324) Signed-off-by: Curtis Wan <wcy9988@163.com>
* refactor: unite s3Operator for compaction tasks (#329) Signed-off-by: Curtis Wan <wcy9988@163.com>
* feat(s3Stream): implements S3Stream metrics interface with yammer metrics (#333)
  * feat(autobalancer): make auto balancer consumer backoff time configurable Signed-off-by: Shichao Nie <niesc@automq.com>
  * feat(s3Stream): implements S3Stream metrics interface with yammer metrics 1. introduce yammer metrics implementation for S3Stream 2. implements metrics reporter to log S3Stream metrics periodically Signed-off-by: Shichao Nie <niesc@automq.com>
  * refactor(s3Stream): change S3Stream dependency to 0.1.5 Signed-off-by: Shichao Nie <niesc@automq.com>
  * style(s3Stream): remove unused import Signed-off-by: Shichao Nie <niesc@automq.com>
  --------- Signed-off-by: Shichao Nie <niesc@automq.com>
* fix: temp fix time index out of bound (#332) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat(s3Stream): support log metrics delta in s3stream metrics reporter (#337) Signed-off-by: Shichao Nie <niesc@automq.com>
* fix: fix issues312 - change topic name to id (#348) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: issues/312 treat KEY_NOT_EXIST as delete success (#349) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: do not delete kv if log apply failed (#350)
  * fix: only delete kv if creation failed Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: do not delete kv if log apply failed Signed-off-by: Curtis Wan <wcy9988@163.com>
  --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
* fix: issue352, set stream slice end when roll segment (#353) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: shutdown order (#359)
  * feat: update s3stream to 0.1.8-SNAPSHOT Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: s3stream client shutdown order Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat(core): add configuration for compaction limitation (#356)
  * feat(core): add configuration for compaction limitation Signed-off-by: Shichao Nie <niesc@automq.com>
  * feat(core): change config name for compatibility Signed-off-by: Shichao Nie <niesc@automq.com>
  * build(core): upgrade s3stream dependency to 0.1.9 Signed-off-by: Shichao Nie <niesc@automq.com>
  --------- Signed-off-by: Shichao Nie <niesc@automq.com>
* feat(controller): check node match (#362)
  * feat(controller): commit WAL object check node match Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * chore: upgrade s3stream Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: fix unit test Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* chore: update s3stream to 0.1.11 (#365) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat: save timeindex last entry in meta-stream; meta-stream compact i… (#357)
  * feat: save timeindex last entry in meta-stream; meta-stream compact in close; fix producer-snapshot saving Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: import control Signed-off-by: Curtis Wan <wcy9988@163.com>
  * refactor: add more comments and logs Signed-off-by: Curtis Wan <wcy9988@163.com>
  * refactor: roll back info in ElasticLog Signed-off-by: Curtis Wan <wcy9988@163.com>
  * refactor: check log level; add logIdent Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: style problem Signed-off-by: Curtis Wan <wcy9988@163.com>
  --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
* fix: concurrency problem in fetch (#370) (#375) Signed-off-by: Curtis Wan <wcy9988@163.com>
* feat(core): implement getStreams interface for StreamManager (#376) Signed-off-by: Shichao Nie <niesc@automq.com>
* fix: build problem (#378) Signed-off-by: Curtis Wan <wcy9988@163.com>
* feat(s3stream): add network throttle config and upgrade to 0.1.16 (#379)
  * feat(s3stream): add network throttle config and upgrade to 0.1.16 Signed-off-by: Shichao Nie <niesc@automq.com>
  * build(s3stream): fix dependency typo Signed-off-by: Shichao Nie <niesc@automq.com>
  * fix(s3stream): fix compaction constructor Signed-off-by: Shichao Nie <niesc@automq.com>
  --------- Signed-off-by: Shichao Nie <niesc@automq.com>
* feat: index cache (#382)
  * feat(stream): add time index data cache Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * feat(stream): integrate file cache to time index Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: fix unit test Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * feat: add pid to cache file Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * feat: set cache path Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat: fetch async (#386)
  * refactor: reduce changes on raw codes Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: style problem Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix: style problem; fix LEO in elasticLog fetch Signed-off-by: Curtis Wan <wcy9988@163.com>
  --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
* fix(issu387): skip checkS3ObjectsLifecycle when not active (#388)
  * fix(issu387): skip checkS3ObjectsLifecycle when not active Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: fix unit test Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* chore(core): set default AutoBalancer network bandwidth to 100MB/s (#391) Signed-off-by: Shichao Nie <niesc@automq.com>
* feat: limit inflight append read request (#392) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: remove read limiter (#394) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* chore: update s3stream to 0.1.18 (#404) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix: retry request timeout (#408) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* chore(core): set default AutoBalancer report interval to 10s (#409)
  * chore(core): set default AutoBalancer report interval to 10s Signed-off-by: Shichao Nie <niesc@automq.com>
  * chore(core): set default s3 metrics reporter interval to 60s Signed-off-by: Shichao Nie <niesc@automq.com>
  --------- Signed-off-by: Shichao Nie <niesc@automq.com>
* fix(core): exclude broker from auto balancing when topic-partition is out of sync (#411) Signed-off-by: Shichao Nie <niesc@automq.com>
* fix: extending problem (#406) (#412)
  * fix: extending problem (#406) Signed-off-by: Curtis Wan <wcy9988@163.com>
  * refactor: add comments Signed-off-by: Curtis Wan <wcy9988@163.com>
  --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
* feat(s3stream): refine s3stream config (#414) Signed-off-by: Shichao Nie <niesc@automq.com>
* feat(s3stream): compatible with s3stream metrics name (#415) Signed-off-by: Shichao Nie <niesc@automq.com>
* refactor: remove slow fetch hint; rename esUnit test (#422) Signed-off-by: Curtis Wan <wcy9988@163.com>
* refactor(issue429): rename wal object to sst object (#430)
  * refactor(issue429): rename wal object to sst object Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * refactor: rename kafka config Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* refactor: rename SST to stream set object (#435)
  * refactor: adaptor s3stream 0.3.0 Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * refactor: rename config sst to stream set object Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * refactor: rename remaining sst to stream set object Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix(issue440): get lastOffst from records (#441)
  * fix(issue440): get lastOffst from records Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: unit test Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat(core): metrics refine & upgrade s3stream to 0.4.2-SNAPSHOT (#450) Signed-off-by: Shichao Nie <niesc@automq.com>
* feat(issus447): add failover controller (#454)
  * feat(issues447): add failover controller Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * feat(issues447): add failover context image Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * feat(issues447): add failover switch Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix(issues447): test and fix (#455) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat(s3stream): support thread pool status monitor (#456) Signed-off-by: Shichao Nie <niesc@automq.com>
* fix: inject ak and sk in ConfigUtils; adjust KAFKA_JDK_COMPATIBILITY_OPTS (#459) Signed-off-by: Curtis Wan <wcy9988@163.com>
* feat(issues447): integrate serverless (#463) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix(log): time metrics in nanos rather than millis (#464) Signed-off-by: Ning Yu <ningyu@automq.com>
* feat(s3stream): upgrade to 0.5.5-SNAPSHOT (#467) Signed-off-by: Shichao Nie <niesc@automq.com>
* feat(issues471): isolate fast / slow read (#472)
  * feat(issues471): isolate fast / slow read Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: keep the style Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  * fix: temp skip always success client test Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
  --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat(s3stream): update to 0.6.3 (#476) - fix(kafka_issues475): do not log when fast fail Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* feat(issues447): add more failover log and update s3stream to 0.6.4 (#486) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix(core): mark ElasticLogManager earlier to avoid local checkpoints (#494) Signed-off-by: Curtis Wan <wcy9988@163.com>
* fix(core): update HW with LEO after creating log (#497)
  * refactor: change enable method name Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix(core): update HW with LEO Signed-off-by: Curtis Wan <wcy9988@163.com>
  --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
* feat(issues500): fetch use direct bytebuf (#503) Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
* fix(core): avoid reading when checking shouldRoll (#504)
  * fix(core): avoid reading when checking shouldRoll Signed-off-by: Curtis Wan <wcy9988@163.com>
  * refactor(core): log stream id and epoch Signed-off-by: Curtis Wan <wcy9988@163.com>
  * fix(tests): rolling is based on wall clock now Signed-off-by: Curtis Wan <wcy9988@163.com>
  --------- Signed-off-by: Curtis Wan <wcy9988@163.com>
* fix(issues505): get first batch timestamp from meta (#506)
Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix(streamaspect): keep the rollingBaseTimestamp semantics (#511) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(s3): add s3 force style setting (#518) Signed-off-by: Curtis Wan <wcy9988@163.com> * fix: Duplicate configs of network bandwidth [issues 385] (#517) * fix(s3stream): fix network util when capacity set to zero (#524) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(s3stream): initialize broker capacity to ignored value (#526) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(issues529): reduce group records to PooledMemoryRecords memory usage (#530) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix: callback fail when partition closing (#536) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(issues543): use pooled bytebuf for PooledMemoryRecords (#545) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix(issuses550): PooledMemoryRecords memory leak (#551) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(issues542): optimize S3ObjectsDelta memory usage (#547) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat: await partition shutdown (#552) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix: fix response earlier release (#555) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix(core): solve NPE problem; avoid compact if replay failed (#561) Signed-off-by: Curtis Wan <wcy9988@163.com> * feat(s3stream): add OpenTelemetry support to s3stream metrics (#548) Signed-off-by: Shichao Nie <niesc@automq.com> * test(metadata): fix testDeleteTooManyOneRequest (#579) Signed-off-by: Curtis Wan <wcy9988@163.com> * feat: meta json backward compatibility (#587) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(issues525): accelerate recovery from unclean shutdown (#596) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix(issues598): fix test_replication_with_broker_failure fail (#599) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(core): introduce s3stream 
tracing (#610) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(issues602): compress stream set object data (#612) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(core): add telemetry docker compose scripts (#613) Signed-off-by: Shichao Nie <niesc@automq.com> * chore: bump s3stream to 0.11.0-SNAPSHOT (#614) Signed-off-by: Ning Yu <ningyu@automq.com> * feat(core): change autobalancer capacity unit to bytes (#623) Signed-off-by: Shichao Nie <niesc@automq.com> * docs(docker): refine README.md for telemetry (#626) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(s3stream): support config metrics level (#627) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(metadata): replace image map to delta map (#629) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(core): support transforming jmx metrics to OTLP (#632) Signed-off-by: Shichao Nie <niesc@automq.com> * feat: bump s3stream to 0.14.0 (#637) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(core): support independent OTLP endpoint for trace (#638) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(core): filter topics label for BrokerTopicMetrics (#640) Signed-off-by: Shichao Nie <niesc@automq.com> * feat: stream without async (#641) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix(s3stream): upgrade to 0.16.0-SNAPSHOT (#645) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(issues619): optimize checkpoint size (#646) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(telemetry): update grafana dashboard to fix label (#649) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(telemetry): fix mbeans match rules (#653) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(telemetry): support average stats for request time (#655) Signed-off-by: Shichao Nie <niesc@automq.com> * feat: optimize s3stream metadata image memory usage (#661) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(issues662): make default config adapt to 2c16g (#663) Signed-off-by: Robin Han 
<hanxvdovehx@gmail.com> * perf: return fast when the `fetchOffset` equals to the `confirmOffset` (#658) Signed-off-by: Ning Yu <ningyu@automq.com> * perf: check the last segment first to avoid calling `floorSegment` (#659) Signed-off-by: Ning Yu <ningyu@automq.com> * perf: allocate less in `readFromLocalLogV2` (#669) Signed-off-by: Ning Yu <ningyu@automq.com> * feat(tool): add admin tool to help user start AutoMQ easily (#670) Signed-off-by: KaimingWan <kaiming.wan@automq.com> * feat(core): remove consumer group management for ab consumer (#671) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(core): add telemetry to release package (#675) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(core): create internal topic on demand (#677) Closes #674 Signed-off-by: Shichao Nie <niesc@automq.com> * feat(log): force compact meta when close (#676) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * refactor(core): optimize scalability of autobalancer structure (#678) * refactor(core): optimize scalability of autobalancer structure Closes #667 Signed-off-by: Shichao Nie <niesc@automq.com> * style(core): remove unused imports Signed-off-by: Shichao Nie <niesc@automq.com> --------- Signed-off-by: Shichao Nie <niesc@automq.com> * feat(core): add unit test for AnomalyDetector (#682) Signed-off-by: Shichao Nie <niesc@automq.com> * feat: atomic failover feature (#693) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * chore: bump s3stream to 0.18.0 (#698) Signed-off-by: Ning Yu <ningyu@automq.com> * feat(core): add kafka request time max metrics (#699) Signed-off-by: Shichao Nie <niesc@automq.com> * refactor: rename module (#705) Signed-off-by: KaimingWan <kaiming.wan@automq.com> * feat(issues665): clean up scale-in nodes' objects (#703) * feat(issues665): clean up scale-in nodes' objects Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix: checkstyle Signed-off-by: Robin Han <hanxvdovehx@gmail.com> --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix(log): 
make txn read async to avoid deadlock (#708) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * refactor(core): simplify AutoBalancer goals (#712) - use network usage instead of utilization for balancing - remove capacity based balancing Signed-off-by: Shichao Nie <niesc@automq.com> * feat(core): represents raw metric types with single byte (#717) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(metrics): introduce group commit offset metrics (#719) * feat(metrics): introduce group commit offset metrics Signed-off-by: Shichao Nie <niesc@automq.com> * fix(metrics): remove commit offset metrics when group is dead Signed-off-by: Shichao Nie <niesc@automq.com> --------- Signed-off-by: Shichao Nie <niesc@automq.com> * feat(metrics): add more comprehensive jmx rules (#724) Signed-off-by: Shichao Nie <niesc@automq.com> * fix: force to check AutoMQCreateTopicPolicy before creating a topic (#726) Signed-off-by: Curtis Wan <wcy9988@163.com> * fix: let kshell init holder (#727) init holder in kshell * feat(metrics): support OTLP http exporter (#728) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(log): file cache (#729) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix(transaction): fix the async callback may cause missing abort txn (#734) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(e2e): add client tool core tests (#730) * feat(e2e): add client tool core tests Signed-off-by: Curtis Wan <wcy9988@163.com> * fix(e2e): hard bounce for transaction test Signed-off-by: Curtis Wan <wcy9988@163.com> --------- Signed-off-by: Curtis Wan <wcy9988@163.com> * fix: support ecs role (#733) fix: now we use credential provider holder Signed-off-by: KaimingWan <kaiming.wan@automq.com> Co-authored-by: shiguanxiong <guanxiong.shi@automq.com> * fix(core): initialize goals before optimization (#737) Signed-off-by: Shichao Nie <niesc@automq.com> * refactor(core): extract common methods to abstract class (#739) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(log): file 
cache support merge put (#740) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * refactor(core): support customized ClusterModelSnapshot (#741) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(log): add txn index cache (#743) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix(core): use partition metrics time as broker time (#744) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(log): timeindex api thread isolation (#745) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(core): refine exported metrics and dashboard (#748) - use job & instance labels to match cluster id, node type and instance id - append suffix to metrics name by default Signed-off-by: Shichao Nie <niesc@automq.com> * fix(metrics): fix network metrics label value (#749) Signed-off-by: Shichao Nie <niesc@automq.com> * fix: optimize and s3url and fix parse (#750) * fix: add deprecated tips Signed-off-by: KaimingWan <kaiming.wan@automq.com> * fix: make s3 ops bucket required Signed-off-by: KaimingWan <kaiming.wan@automq.com> * fix: remove useless field auth method Signed-off-by: KaimingWan <kaiming.wan@automq.com> * fix: rename parameter name Signed-off-by: KaimingWan <kaiming.wan@automq.com> * fix: fix parse s3url args issue when involve credential provider holder Signed-off-by: KaimingWan <kaiming.wan@automq.com> * fix: fix check style Signed-off-by: KaimingWan <kaiming.wan@automq.com> --------- Signed-off-by: KaimingWan <kaiming.wan@automq.com> * fix(issues754): fix consume aborted txn (#755) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix: don't deprecate s3.wal.path, plus a minor polish of s3url (#753) Signed-off-by: daniel-y <daniel@automq.com> * fix(log): unit test (#756) fix(log): keep log the same pattern as kafka Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(log): full checkpoint based on dirty bytes (#760) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * chore: convert license to bsl to accelerate open source innovation (#761) Signed-off-by: daniel-y 
<daniel@automq.com> * refactor(config): remove useless configs in wal, and add iops config (#762) refactor: remove useless configs in wal, and add iops config Signed-off-by: Ning Yu <ningyu@automq.com> * fix(metrics): refine grafana dashboards (#759) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(metrics): fix type (#763) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(isssue764): ControllerRequestSender stuck (#766) * fix(isssue764): ControllerRequestSender stuck Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix: unit test Signed-off-by: Robin Han <hanxvdovehx@gmail.com> --------- Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(metrics): change ot cardinality limit from 2000 -> 5000 (#767) * feat(metrics): change ot cardinality limit from 2000 -> 10000 Signed-off-by: Shichao Nie <niesc@automq.com> * style(metrics): remove unused imports Signed-off-by: Shichao Nie <niesc@automq.com> --------- Signed-off-by: Shichao Nie <niesc@automq.com> * feat(dashboard): refine dashboards (#768) Signed-off-by: Shichao Nie <niesc@automq.com> * fix: rename broker-address to broker-list (#775) Signed-off-by: daniel-y <daniel@automq.com> * chore: change STREAM_NOT_CLOSED log level to WARN (#776) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(telemetry): make metrics dashboard compatible with Aliyun (#778) feat(telemetry): make metrics dashboard compatible with Aliyun prometheus Signed-off-by: Shichao Nie <niesc@automq.com> * fix(telemetry): fix OTel collector http endpoint (#780) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(auth): throw an exception when failed to create a credential from env (#793) Signed-off-by: Ning Yu <ningyu@automq.com> * feat(core): verify stream epoch for stream object commit (#796) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(core): prevent generate stream object record for noop object id (#797) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(issues798): checkpoint NPE (#800) Signed-off-by: Robin Han 
<hanxvdovehx@gmail.com> * feat(issues801): stream trim only update stream metadata (#805) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(core): add metrics to monitor auto balancer metrics delay (#807) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(issues806): stream object leak (#808) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix: range end offset isn't revertable (#809) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * chore: rename s3ObjectRetention* to s3ObjectDeleteRetention for a more precise description (#810) Signed-off-by: daniel-y <daniel@automq.com> * fix: set destroyed object size (#811) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix(metrics): present metrics from active controller only (#815) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(core): prevent anomaly detect exit on inactive controller (#816) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(issues817): txn index fetch out of bound (#818) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(core): add metrics to monitor s3 objects (#823) * feat(core): add metrics to monitor s3 objects Signed-off-by: Shichao Nie <niesc@automq.com> * feat(core): add s3 object panels to grafana dashboard Signed-off-by: Shichao Nie <niesc@automq.com> --------- Signed-off-by: Shichao Nie <niesc@automq.com> * fix(core): record s3 object metrics on active controller only (#824) Signed-off-by: Shichao Nie <niesc@automq.com> * feat: add object ttl reach log (#825) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix(core): catch exceptions on replaying records (#836) * fix(core): catch exceptions on replaying records Signed-off-by: Shichao Nie <niesc@automq.com> * feat(core): refactor AutoBalancerManager and fix unit tests Signed-off-by: Shichao Nie <niesc@automq.com> --------- Signed-off-by: Shichao Nie <niesc@automq.com> * feat(core): refine grafana dashboards (#837) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(core): fix auto balancer metrics delay time calculation 
(#838) Signed-off-by: Shichao Nie <niesc@automq.com> * fix: log permanet fail (#839) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(core): redirect JUL log to sl4j and remove unused logging exporter (#843) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(core): fix node id regex in broker dashboard (#841) Signed-off-by: Shichao Nie <niesc@automq.com> * feat: record pooled record memory usage (#846) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * fix(metrics): add label 'version' to kafka.request.count (#847) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(telemetry): add host name to OTel resource (#849) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(metrics): metrics on fetch limiters and executors (#848) * feat(metrics): metrics on fetch limiters Signed-off-by: Ning Yu <ningyu@automq.com> * feat(metrics): metrics on fetch executors' queue size Signed-off-by: Ning Yu <ningyu@automq.com> * style: fix check style Signed-off-by: Ning Yu <ningyu@automq.com> --------- Signed-off-by: Ning Yu <ningyu@automq.com> * feat(metrics): add buffer and thread metrics (#851) * feat(telemetry): add direct memory panels (#853) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(telemetry): fix read ahead throughput panel unit (#854) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(metrics): rename DirectByteBufAlloc to ByteBufAlloc (#855) * fix(telemetry): fix memory allocation metrics name (#856) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(metrics): enable buffer pools metrics (#857) * fix(telemetry): fix jvm metrics (#859) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(telemetry): refine grafana dashboard (#860) - deduplication for consumer lag calculation on partition reassignment - add get/put object avg latency panels Signed-off-by: Shichao Nie <niesc@automq.com> * fix(telemetry): add missing percentile metrics (#865) * fix(telemetry): add missing percentile metrics Signed-off-by: Shichao Nie <niesc@automq.com> * fix(telemetry): 
increase metric expiration time for collector Signed-off-by: Shichao Nie <niesc@automq.com> --------- Signed-off-by: Shichao Nie <niesc@automq.com> * fix(telemetry): fix topic dashboard name (#868) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(issues871): log request info for stream object compaction (#872) Signed-off-by: Robin Han <hanxvdovehx@gmail.com> * feat(core): merge multiple reassignments for same partition (#873) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(dashboard): add node level network metrics (#874) Signed-off-by: Shichao Nie <niesc@automq.com> * feat(dashboard): set query interval to 30s to match default metrics reporter interval (#875) Signed-off-by: Shichao Nie <niesc@automq.com> * fix(core): support alter auto balancer topic partitions (#876) Signed-off-by: Shichao Nie <niesc@automq.com> --------- Signed-off-by: TheR1sing3un <ther1sing3un@163.com> Signed-off-by: Curtis Wan <wcy9988@163.com> Signed-off-by: Robin Han <hanxvdovehx@gmail.com> Signed-off-by: sc.nieh <s.c.ney516@gmail.com> Signed-off-by: Shichao Nie <niesc@automq.com> Signed-off-by: Ning Yu <ningyu@automq.com> Signed-off-by: KaimingWan <kaiming.wan@automq.com> Signed-off-by: daniel-y <daniel@automq.com> Co-authored-by: TheR1sing3un <ther1sing3un@163.com> Co-authored-by: Robin Han <hanxvdovehx@gmail.com> Co-authored-by: TheR1sing3un <87409330+TheR1sing3un@users.noreply.github.com> Co-authored-by: Curtis Wan <wcy9988@163.com> Co-authored-by: Shichao Nie <s.c.ney516@gmail.com> Co-authored-by: Shichao Nie <niesc@automq.com> Co-authored-by: Yu Ning <78631860+Chillax-0v0@users.noreply.github.com> Co-authored-by: zhouyou9505 <zhouyou9505@gmail.com> Co-authored-by: KamiWan <kaiming.wan@automq.com> Co-authored-by: shiguanxiong <guanxiong.shi@automq.com> Co-authored-by: SSpirits <admin@lv5.moe>