[WIP] KAFKA-7739: Tiered storage #7561
[WIP] This is the initial draft version of the KIP-405. It includes the initial set of changes required for plugging in a
KIP is located at https://s.apache.org/pk53b
This PR contains an HDFS implementation for discussion.
Summary:
1. Add a RemoteLogIndexEntry "constructor" so that the CRC is calculated inside the RemoteLogIndexEntry class.
2. Make the index entry length 16 bits.
3. Allow the RSM to read the content of the RDI.
Reviewers: harshach, satishd
Reviewed By: harshach, satishd
Subscribers: jenkins, #streaming_data
Differential Revision: https://code.uberinternal.com/D2911255

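The first two items of this summary can be illustrated with a minimal sketch. It assumes a simplified entry made of two offsets and an opaque RDI byte array; `IndexEntrySketch`, its field layout, and the `checksum` helper are hypothetical and are not the RemoteLogIndexEntry layout used in this PR. Only the 16-bit length and the idea of computing the CRC inside the class come from the commit message.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical, simplified entry: the field layout is an assumption,
// not the layout of RemoteLogIndexEntry in this PR.
final class IndexEntrySketch {
    final short length;      // 16-bit length of everything after the length field
    final int crc;           // always derived from the fields below
    final long firstOffset;
    final long lastOffset;
    final byte[] rdi;        // opaque remote data identifier

    private IndexEntrySketch(short length, int crc, long firstOffset, long lastOffset, byte[] rdi) {
        this.length = length;
        this.crc = crc;
        this.firstOffset = firstOffset;
        this.lastOffset = lastOffset;
        this.rdi = rdi;
    }

    // Factory "constructor": callers pass only the data, the CRC is computed here.
    static IndexEntrySketch create(long firstOffset, long lastOffset, byte[] rdi) {
        int length = 4 + 8 + 8 + 2 + rdi.length; // crc, two offsets, rdi length, rdi bytes
        if (length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("entry does not fit a 16-bit length: " + length);
        }
        int crc = checksum(firstOffset, lastOffset, rdi);
        return new IndexEntrySketch((short) length, crc, firstOffset, lastOffset, rdi.clone());
    }

    // CRC32 over the serialized offsets, RDI length, and RDI bytes.
    static int checksum(long firstOffset, long lastOffset, byte[] rdi) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 8 + 2 + rdi.length);
        buf.putLong(firstOffset).putLong(lastOffset).putShort((short) rdi.length).put(rdi);
        CRC32 crc32 = new CRC32();
        crc32.update(buf.array(), 0, buf.position());
        return (int) crc32.getValue();
    }
}
```

Keeping the CRC calculation behind the factory means a caller cannot construct an entry whose checksum disagrees with its contents.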
Summary:
1. Implement the HDFS RSM.
2. Change the RSM interface:
   2.1 Throw IOException in most methods.
   2.2 copyLogSegment does not need to return the RDI, because it is already in RemoteIndexEntry.
   2.3 The read method needs the RemoteIndexEntry in addition to the RDI.
3. Move RemoteLogIndex.parseEntry to RemoteLogIndexEntry.parseEntry. The HDFS RSM also needs this method.
Reviewers: harshach, satishd
Reviewed By: harshach
Subscribers: jenkins, #streaming_data
Differential Revision: https://code.uberinternal.com/D2960007

Summary: Added an initial implementation of the RLM follower.
- Refactored the existing indexes for the RLM follower.
- Follow-up PRs will address the remaining boundary cases of adding indexes for conflicting offsets, and scaling up with fine-grained locking.
Reviewers: harshach, yingz
Reviewed By: yingz
Subscribers: jenkins
Differential Revision: https://code.uberinternal.com/D2972601

Summary: Initial version of the fetch implementation for RemoteLogManager.
- It includes inter-broker fetch protocol changes for sending the local log offset when the requested offset is in the remote tier. This lets a follower broker know that the requested offset has moved to the remote tier, so that it can start fetching the messages available in the leader's local log.
- Added a test for the RLM fetch API.
- Fixed SpotBugs errors introduced by recent additions.
Reviewers: harshach, yingz
Reviewed By: harshach, yingz
Differential Revision: https://code.uberinternal.com/D3014447

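The follower behaviour described in this commit can be sketched as below. All names (`FetchReplySketch`, `LeaderSketch`, `FollowerSketch`, `offsetInRemoteTier`) are hypothetical and chosen for illustration; the actual protocol fields and classes changed by this PR may differ.

```java
// Hypothetical reply type; the real fetch response schema is not shown in this PR text.
final class FetchReplySketch {
    final boolean offsetInRemoteTier; // true when the requested offset is no longer local
    final long localLogStartOffset;   // first offset still available in the leader's local log

    FetchReplySketch(boolean offsetInRemoteTier, long localLogStartOffset) {
        this.offsetInRemoteTier = offsetInRemoteTier;
        this.localLogStartOffset = localLogStartOffset;
    }
}

final class LeaderSketch {
    private final long localLogStartOffset;

    LeaderSketch(long localLogStartOffset) {
        this.localLogStartOffset = localLogStartOffset;
    }

    // Offsets below the local log start offset are assumed to live in the remote tier.
    FetchReplySketch handleFetch(long fetchOffset) {
        return new FetchReplySketch(fetchOffset < localLogStartOffset, localLogStartOffset);
    }
}

final class FollowerSketch {
    long nextFetchOffset;

    FollowerSketch(long nextFetchOffset) {
        this.nextFetchOffset = nextFetchOffset;
    }

    // On a "moved to remote tier" reply, resume from the leader's local log start offset.
    void onReply(FetchReplySketch reply) {
        if (reply.offsetInRemoteTier) {
            nextFetchOffset = reply.localLogStartOffset;
        }
    }
}
```

With a leader whose local log starts at offset 500, a follower asking for offset 0 would move its next fetch offset to 500 and then fetch from the local log as usual.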
Summary:
1. Copy dependent jars in the same way as Kafka core and the other modules do.
2. Remove the dependency on slf4j-api, which is already included as a Kafka core dependency.
Reviewers: satishd, harshach
Reviewed By: satishd, harshach
Subscribers: #streaming_data_kafka
Differential Revision: https://code.uberinternal.com/D3035243

Summary: Local segments are scheduled for deletion once they are copied to remote tier storage.
- Fixed issues in finding the right local segments while copying to remote tier storage.
Reviewers: harshach, yingz
Reviewed By: harshach
Subscribers: jenkins
Differential Revision: https://code.uberinternal.com/D3046097

This commit adds a new variant of the listRemoteSegments method to RemoteStorageManager that takes a minBaseOffset parameter. This is useful for reducing listing time on the remote tier in some implementations (like S3).
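As an illustration of why a minBaseOffset parameter can shorten listing, here is a minimal in-memory sketch. It assumes segments are indexed by base offset in a sorted map; `SegmentStoreSketch` and its method signatures are hypothetical and do not reproduce the RemoteStorageManager interface in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical store keyed by segment base offset; not the RemoteStorageManager API.
final class SegmentStoreSketch {
    private final TreeMap<Long, String> segmentsByBaseOffset = new TreeMap<>();

    void add(long baseOffset, String segmentName) {
        segmentsByBaseOffset.put(baseOffset, segmentName);
    }

    // Original variant: walks every segment.
    List<String> listRemoteSegments() {
        return new ArrayList<>(segmentsByBaseOffset.values());
    }

    // New variant: starts at minBaseOffset, so earlier segments are never visited.
    List<String> listRemoteSegments(long minBaseOffset) {
        return new ArrayList<>(segmentsByBaseOffset.tailMap(minBaseOffset, true).values());
    }
}
```

A store whose keys sort by base offset can seek directly to the first relevant key, which is the kind of saving the commit message alludes to for backends where listing is expensive.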
Summary: RLM enhancements for copying segments, syncing indexes, and cleaning up expired remote log segments based on the retention period, using tasks that run at regular intervals.
Added configs for the task thread pool size and interval. Remote log start offset handling still needs to be added and will be done in a follow-up PR/diff.
Reviewers: harshach, yingz
Reviewed By: harshach, yingz
Subscribers: jenkins
Differential Revision: https://code.uberinternal.com/D3211435

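The retention-based cleanup run by periodic tasks can be sketched as follows. This is a simplified illustration that assumes each remote segment carries a creation timestamp; `RetentionCleanerSketch`, `poolSize`, and `intervalMs` are hypothetical names that stand in for the thread pool size and interval configs mentioned above, whose real names are not given here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical cleaner; tracks base offset -> creation time (ms) of remote segments.
final class RetentionCleanerSketch {
    private final ConcurrentSkipListMap<Long, Long> createdAtByBaseOffset = new ConcurrentSkipListMap<>();
    private final long retentionMs;

    RetentionCleanerSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    void addSegment(long baseOffset, long createdAtMs) {
        createdAtByBaseOffset.put(baseOffset, createdAtMs);
    }

    int segmentCount() {
        return createdAtByBaseOffset.size();
    }

    // Removes segments older than the retention period; returns their base offsets in order.
    List<Long> cleanupExpired(long nowMs) {
        List<Long> removed = new ArrayList<>();
        for (Map.Entry<Long, Long> e : createdAtByBaseOffset.entrySet()) {
            if (nowMs - e.getValue() > retentionMs
                    && createdAtByBaseOffset.remove(e.getKey(), e.getValue())) {
                removed.add(e.getKey());
            }
        }
        return removed;
    }

    // Runs the cleanup at a fixed delay on a pool of the configured size.
    ScheduledExecutorService start(int poolSize, long intervalMs) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        pool.scheduleWithFixedDelay(
                () -> cleanupExpired(System.currentTimeMillis()),
                intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        return pool;
    }
}
```

Separating `cleanupExpired` from the scheduling keeps the retention rule testable with a fixed clock value, independent of timer behaviour.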
- Added serdes for the messages stored in the remote log metadata topic with an internal protocol schema.
- Refactored RLMMWithTopicStorage to be more modular.

…e received messages from remote log metadata topic.
- Added tests for RLMM on leader and follower events from the remote log metadata topic.

- RemoteLogMetadataManager#putRemoteLogSegmentData to take a RemoteLogSegmentMetadata arg only.

- Add an option to deactivate actual deletion in the LocalRemoteStorageManager. This eases testing by allowing simulation of storage systems that are not strongly consistent and do not guarantee that a successful delete is visible to subsequent read or list operations.
- Add a visitor to traverse the locally emulated remote storage, to support test assertions.
- Add a local remote storage listener so that tests can be notified of modifications in the storage.
- Add snapshot support for the local remote storage.
- Add a waiter on the local remote storage so that tests can formulate expectations on asynchronously populated remote storage.
- Set readOffset's initial value to -1 to handle the edge case of a single record in the first segment of a topic-partition.
- Use the index size instead of the log segment size to enable the creation of a single-record log segment.
- Added another segment to be offloaded in the base integration test, plus a couple of fixes in test assertions.
- Add a test case for a segment with multiple records.

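The delete-deactivation option, the listener, and the waiter from the list above can be sketched together. This is an assumption-laden illustration: `LocalStorageSketch`, `StorageListenerSketch`, and `expectCopies` are hypothetical names, and the real LocalRemoteStorageManager in this PR stores files on disk rather than names in a set.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Hypothetical listener notified on each modification of the emulated storage.
interface StorageListenerSketch {
    void onSegmentCopied(String segment);
}

// Hypothetical in-memory stand-in for a locally emulated remote storage.
final class LocalStorageSketch {
    private final Set<String> segments = ConcurrentHashMap.newKeySet();
    private final List<StorageListenerSketch> listeners = new CopyOnWriteArrayList<>();
    private final boolean deleteEnabled;

    LocalStorageSketch(boolean deleteEnabled) {
        this.deleteEnabled = deleteEnabled;
    }

    void addListener(StorageListenerSketch listener) {
        listeners.add(listener);
    }

    void copy(String segment) {
        segments.add(segment);
        for (StorageListenerSketch listener : listeners) {
            listener.onSegmentCopied(segment);
        }
    }

    // With deletes deactivated the segment stays visible, imitating a store
    // where a successful delete is not immediately observable.
    boolean delete(String segment) {
        return deleteEnabled && segments.remove(segment);
    }

    boolean contains(String segment) {
        return segments.contains(segment);
    }

    // Waiter: the returned latch reaches zero once the expected number of copies happened.
    CountDownLatch expectCopies(int count) {
        CountDownLatch latch = new CountDownLatch(count);
        addListener(segment -> latch.countDown());
        return latch;
    }
}
```

A test would typically call `expectCopies` before triggering the offload and then await the latch with a timeout, instead of polling the storage.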
- Minor fixes in RLMMWithTopicStorage to call configure only once.
- Made the required fields volatile to avoid stale reads when they are updated and read in different threads.

…uce-action in tiered storage tests
- Support producing to a log segment that already contains records in produce-action.
- Generate basic test reports with one description per test action.
- Add a utility to dump files from local tiered storage.
- Fix the condition on local broker storage (filter out inactive and non-assigned brokers).
Reviewed-by: firstname.lastname@example.org

Add the ID of the broker that offloaded a given fileset to the metadata and filename, so that it appears in local tiered storage dumps.
Reviewed-by: email@example.com

* Support batch size > 1 in tiered storage integration tests.
* Enforce equality on the earliest offset in the log directory rather than a lower bound.
* Add all records and offsets to the tiered storage content dump.
* Make explicit the records to be found in remote log segments in integration tests.
Reviewed-by: Satish Duggana <firstname.lastname@example.org>