Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for remote WAL based on Kafka #2722

Closed
6 of 8 tasks
MichaelScofield opened this issue Nov 9, 2023 · 8 comments · Fixed by #2993
Closed
6 of 8 tasks

Tracking issue for remote WAL based on Kafka #2722

MichaelScofield opened this issue Nov 9, 2023 · 8 comments · Fixed by #2993
Assignees
Labels
C-enhancement Category Enhancements tracking-issue A tracking issue for a feature.
Milestone

Comments

@MichaelScofield
Copy link
Collaborator

MichaelScofield commented Nov 9, 2023

What type of enhancement is this?

Feature

What does the enhancement do?

Tracking tasks:

Implementation challenges

No response

@tisonkun
Copy link
Contributor

Roughtly share some thoughts:

  1. For the client library, you may prefer rdkafka which is endorsed by Confluent and features in sync with the maintained librdkafka.
  2. For the log store API, with a prototype I find the following topics to be determined:

First, Entry's ID should be generalized if we keep this interface:

/// Create a new `EntryStream` to asynchronously generates `Entry` with ids
    /// starting from `id`.
    async fn read(
        &self,
        ns: &Self::Namespace,
        id: Id,
    ) -> Result<SendableEntryStream<Self::Entry, Self::Error>, Self::Error>;

Cause Kafka's seek takes both partition and offset (at least two integeters).

Second, we should decide how to map Namespace to Kafka's topic. Simply, config a base topic for the log store (wal_dir?), and create a new topic for each namespace (format!("{wal_dir}-{ns}")).

The other topics would be how much options we should include in the first version, and if we just let message retention to GC.

BTW, this doesn't seems a "Tech debt reduction" but a new feature?

@tisonkun
Copy link
Contributor

With rdkafka you can simply implement append with FutureProducer like:

        let topic = format!("BASE-TOPIC-{}", e.ns);
        let (partition, offset) = self
            .producer
            .send(
                FutureRecord::to(&topic).payload(&e.data).key(&e.id),
                Duration::from_secs(0),
            )
            .await?;
        Ok(todo_partition_offset_to_id(partition, offset))

and read with StreamConsumer, namespace manipulation with AdminClient<DefaultClientContext>.

@MichaelScofield
Copy link
Collaborator Author

@tisonkun thx for your suggestions! rdkafka is looking great. However, for the usage of kafka as a remote wal service, we prefer rskafka. See the reasons here.

As to the kafka topic and partition design, we are taking a very special way. Will illustrate it in the following RFC, stay tune if you are interested!

It's kind of "tech debt reduction" because it's a long awaiting feature that's planned since early days.

@tisonkun
Copy link
Contributor

However, for the usage of kafka as a remote wal service, we prefer rskafka. See the reasons here.

This makes sense. Good to know :D

@tisonkun
Copy link
Contributor

@MichaelScofield I noticed that part of the tracking items are checked. Are they implemented in a personal branch or privately?

@niebayes
Copy link
Contributor

@tisonkun The remote wal project is split into several tasks. Most of them are implemented and are waiting to be merged.
See: #2753

@evenyag evenyag changed the title [Tracking] Remote WAL based on Kafka Tracking issue for remote WAL based on Kafka Nov 24, 2023
@evenyag
Copy link
Contributor

evenyag commented Nov 24, 2023

The remote wal project is split into several tasks. Most of them are implemented and are waiting to be merged. See: #2753

@niebayes Could you please link your pull requests to the tasks? I think we should mark the item as checked after the related pull request is merged otherwise it is confusing. e.g. I can't find KafkaLogStore in our codebase but it is already checked.

@evenyag evenyag added the tracking-issue A tracking issue for a feature. label Nov 24, 2023
@niebayes
Copy link
Contributor

@evenyag Unmarked and linked related issues or pull requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category Enhancements tracking-issue A tracking issue for a feature.
Projects
None yet
4 participants