Rust SDK #21089

Open
damccorm opened this issue Jun 4, 2022 · 43 comments

Comments

@damccorm
Contributor

damccorm commented Jun 4, 2022

It would be great to have Rust SDK in order to create very high-performant yet safe pipelines.

Imported from Jira BEAM-12658. Original Jira may contain additional context.
Reported by: ­­­.

@gauravchak

Thanks @bvolpato for pointing me to this #21089
This would be extremely useful for us at Discord. We use Rust heavily and we are using Beam (via the Python SDK) for data processing. A Rust SDK would be very beneficial for reducing training-serving skew.

@brucearctor
Contributor

brucearctor commented Oct 31, 2022

@gauravchak , any... Contributions welcome! Do advise if you need some resources to get started, for example a related talk: https://www.youtube.com/watch?v=VsGQ2LFeTHY

@nivaldoh
Contributor

nivaldoh commented Nov 1, 2022

Hi, I would like to express interest in working on the Rust SDK. I'll create an incubator fork soon.

@nivaldoh
Contributor

nivaldoh commented Nov 1, 2022

.take-issue

@nivaldoh
Contributor

Work is underway here. Progress may be slow, and early code will look quite rough. I'll be really happy to receive any feedback or collaboration opportunities.

@dofinn

dofinn commented Nov 12, 2022

@nivaldoh I've been looking for a reason to learn Rust. Happy to take on any housekeeping work that would otherwise slow you down.

@nivaldoh
Contributor

@dofinn I really appreciate the offer. Currently we have a few TODOs with improvement ideas which I'd be happy to describe in more detail, and we could also try to coordinate effort on the larger tasks as well (which I'm also planning to organize in the main README file) if you're interested. Feel free to open a PR directly in the fork or reach out to me by email (nivaldo.humbertoo@gmail.com) if you'd like.

@esadler-hbo

esadler-hbo commented Dec 23, 2022

@nivaldoh thanks for doing this!

I will be an early adopter when you are ready for that.

@nivaldoh
Contributor

@esadler-hbo thanks for the support!

There's still a lot of work to be done, and I try to keep an updated roadmap here if you or anyone else is interested in a quick overview of the current state of the implementation.

In particular, the user API is starting to take shape, and the snippet below (inspired by the new patterns set by the Typescript SDK) is now functional end-to-end:

let runner = DirectRunner::new();

// Impulse won't be exposed to the user, but it serves as a mock transform for now
let transform = Impulse::new();

runner.run(|root| root.apply(transform)).await;

To anyone interested, any early input on this format would be highly appreciated, as this is the current foundation that I'll be using for everything else. Any other contributions (including early code reviews, even if partial) would be awesome as well.
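For readers who want to experiment with the closure-based shape of the snippet above without checking out the fork, here is a minimal, synchronous mock. Every type in it (`PValue`, `PTransform`, `Root`, and this version of `DirectRunner`) is a hypothetical stand-in written for illustration, not the fork's actual definitions, and it drops the `.await` so that it runs without an async executor.

```rust
// Hypothetical, synchronous mock of the closure-based API shape.
// None of these definitions are taken from the fork.

/// A value flowing between transforms: a labelled list of byte-string elements.
#[derive(Debug, Clone, PartialEq)]
pub struct PValue {
    pub label: String,
    pub elements: Vec<Vec<u8>>,
}

/// A transform consumes one PValue and produces another.
pub trait PTransform {
    fn expand(&self, input: &PValue) -> PValue;
}

/// Mock of Impulse: ignores its input and emits a single empty byte string.
pub struct Impulse;

impl PTransform for Impulse {
    fn expand(&self, _input: &PValue) -> PValue {
        PValue {
            label: "impulse".to_string(),
            elements: vec![Vec::new()],
        }
    }
}

/// The pipeline root that the runner hands to the user's closure.
pub struct Root;

impl Root {
    /// Applies a transform to an empty root value.
    pub fn apply<T: PTransform>(&self, transform: T) -> PValue {
        let empty = PValue {
            label: "root".to_string(),
            elements: Vec::new(),
        };
        transform.expand(&empty)
    }
}

pub struct DirectRunner;

impl DirectRunner {
    pub fn new() -> Self {
        DirectRunner
    }

    /// Builds the pipeline by calling the user's closure with the root,
    /// then returns whatever the closure produced.
    pub fn run<F>(&self, build: F) -> PValue
    where
        F: FnOnce(&Root) -> PValue,
    {
        build(&Root)
    }
}
```

With these stand-ins, `DirectRunner::new().run(|root| root.apply(Impulse))` mirrors the call in the snippet. The point of the closure is that the runner, not the user, owns the root, so it controls when construction starts and ends.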

@TommyCpp

I'd also like to help! I have some experience with Rust but am pretty new to Beam and large-scale data processing frameworks.

@TommyCpp

TommyCpp commented Jan 1, 2023

@nivaldoh I think reading the doc will only get me so far and I should probably start working on some implementation and see how it goes. Is there any coder/transform or other stuff you want my help with?

@robertwb
Contributor

robertwb commented Jan 6, 2023

I just saw this. There's actually an effort from the Dataflow team to build a Rust SDK this week. What we have is at https://github.com/kennknowles/beam/tree/rust/sdks/rust ; it would be great to combine efforts, though @nivaldoh's fork looks much further along.

@brucearctor
Contributor

Awesome! 100% the right move to combine efforts.

@brucearctor
Contributor

@robertwb and @kennknowles -- I'm glad you're looking into this. I have advised Nivaldo on strategies, with an eye on getting this to be something useful enough to warrant being merged into the proper project as another sdk. Your experience/knowledge attending to this will go a long way!

Maybe you two can dig a little into https://github.com/nivaldoh/beam/tree/rust_sdk and @nivaldoh can look at https://github.com/kennknowles/beam/tree/rust/sdks/rust -- I'd suggest that we wind up working on only one or the other.

I wonder if it'd be easier to keep @kennknowles 's eyes on progress if developed in his repo? But if the other is much further along, might it be better to jump into that [ @nivaldoh -- I assume you can give @kennknowles , @robertwb , and others relevant merge/commit permissions in your repo -- in things Beam they know their stuff and should absolutely be trusted ]. Else, it might then depend on the merge/migration path to get relevant bits into https://github.com/kennknowles/beam/tree/rust/sdks/rust, which I also imagine @nivaldoh might be open to taking on, as that could help solidify understanding and implementation.

I'll try to have a look to compare the repos over the weekend or sometime next week.

@nivaldoh - please advise on your thoughts/inclinations [ I've been sorta speaking for you, based on my read of you, motivation, inclinations when we've connected ]

@robertwb
Contributor

robertwb commented Jan 6, 2023

IMHO, @nivaldoh's repo is further along, and better structured, so I think it makes sense to start there. In the next day or two we'll probably be pushing willy-nilly to the one at kennknowles, in the spirit of the hackathon to explore ideas, but next week I suggest we start creating pull requests to https://github.com/nivaldoh/beam/tree/rust_sdk to carry anything over that has value (and isn't already in the latter) and continue there.

@brucearctor
Contributor

Sounds like a plan!

@nivaldoh
Contributor

nivaldoh commented Jan 7, 2023

@TommyCpp Besides some of the smaller TODOs spread out around the codebase, adding new coders such as DoubleCoder (mirroring the Typescript SDK) could be a great option, since their structure is a bit more organized at the moment and they can be reliably tested in isolation. However, considering the current discussions, the Dataflow team is going to implement a lot of things in an upcoming hackathon and likely introduce different approaches, so it might be better to wait until then.
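To give a sense of the size of such a task, here is a rough sketch of what a DoubleCoder might look like. The `Coder` trait below is a hypothetical minimal interface invented for this example, not the trait used in the fork. The byte layout (8 bytes, big-endian IEEE 754) is what Beam's standard double coder specifies to the best of my understanding, so treat it as an assumption to verify against the Beam model definitions.

```rust
use std::io::{self, Read, Write};

/// Hypothetical minimal coder interface, invented for this sketch.
pub trait Coder {
    type Item;

    /// Writes the encoded form of `value` to `writer`.
    fn encode(&self, value: &Self::Item, writer: &mut dyn Write) -> io::Result<()>;

    /// Reads one encoded value back from `reader`.
    fn decode(&self, reader: &mut dyn Read) -> io::Result<Self::Item>;
}

/// Encodes an f64 as 8 big-endian bytes (assumed to match Beam's double coder).
pub struct DoubleCoder;

impl Coder for DoubleCoder {
    type Item = f64;

    fn encode(&self, value: &f64, writer: &mut dyn Write) -> io::Result<()> {
        writer.write_all(&value.to_be_bytes())
    }

    fn decode(&self, reader: &mut dyn Read) -> io::Result<f64> {
        let mut buf = [0u8; 8];
        // Fails with UnexpectedEof if fewer than 8 bytes remain.
        reader.read_exact(&mut buf)?;
        Ok(f64::from_be_bytes(buf))
    }
}
```

Because the coder only touches `Read` and `Write`, a round trip through a `Vec<u8>` is enough to test it in isolation, which is what makes coders a convenient first contribution.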

@brucearctor I agree with all your points. Additionally, @robertwb and @kennknowles need no introduction for me, so I've already sent both of them invites for collaborator access on my repo. What I really intend is to grant them owner permissions but I'm not completely familiar with this sort of thing on GitHub, so please let me know if any access is still missing after this (as well as anyone else who might require access).

I'm quite happy that we might be able to use what I've done so far as the base repo for the initial stages of the Rust SDK, but if it turns out to be a better idea to move the code there into https://github.com/kennknowles/beam/tree/rust/sdks/rust instead as @brucearctor mentioned, I'd be more than happy to make any adaptations necessary so that we may continue from there.

I'll also keep an eye on the progress there over the next few days to see if there's anything that could be changed in advance inside my repo. There are plenty of minor (such as the current module structure forcing me to import certain libraries in more than one Cargo file) and not so minor (such as downcasting coders from Any and using their literal TypeId as a key) things that need to be restructured soon over here, so I'll be looking for ideas to improve them and speed up the merge as well.
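For anyone unfamiliar with the pattern mentioned above, this is roughly what "downcasting coders from Any and using their TypeId as a key" looks like. It is a simplified sketch with made-up names (`CoderRegistry`, and the empty `BytesCoder` and `DoubleCoder` placeholders), not the code in the repo.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Hypothetical placeholder coder types, empty on purpose.
#[derive(Debug, PartialEq)]
pub struct BytesCoder;

#[derive(Debug, PartialEq)]
pub struct DoubleCoder;

/// Registry that erases coder types behind `dyn Any`,
/// keyed by each coder's `TypeId`.
#[derive(Default)]
pub struct CoderRegistry {
    coders: HashMap<TypeId, Box<dyn Any>>,
}

impl CoderRegistry {
    /// Stores a coder under the TypeId of its concrete type.
    pub fn register<C: Any>(&mut self, coder: C) {
        self.coders.insert(TypeId::of::<C>(), Box::new(coder));
    }

    /// Looks the coder up by TypeId and downcasts it back to `C`.
    /// Returns None if no coder of that type was registered.
    pub fn get<C: Any>(&self) -> Option<&C> {
        self.coders.get(&TypeId::of::<C>())?.downcast_ref::<C>()
    }
}
```

The weakness is visible in the signature of `get`: asking for a coder that was never registered compiles fine and only shows up as `None` at runtime, which is one reason to look for a more statically typed structure.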

@sjvanrossum
Contributor

@nivaldoh As promised on the dev thread I've just opened a PR at nivaldoh#20 with some worker code changes as well as a container and boot script based on the existing SDK containers.
I assumed that Rust pipelines would typically be statically compiled like Go pipelines, so the boot script only looks for a single artifact file at the moment. The binaries must match between the launcher and worker if we were to use serde_traitobject to serialize the DoFns. I've got some additional changes coming up to provide some scaffolding for that.
The user binary needs to be able to switch between pipeline construction and pipeline execution mode, so there's an init function much like the Go SDK requires to run soon after the binary is started. That init function needs to be in a different place, but that would require restructuring the crates a bit I think. Happy to sync on that at some point, I think most of the framework code could live in an apache-beam crate and optional features could live in separate crates e.g. apache-beam-io-gcp/aws/azure.
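As an illustration of the mode switch described above, here is a small sketch of how one binary could decide which role to play. The `--worker` flag name, the `Mode` enum, and `select_mode` are all assumptions made for this example; the actual init function in the PR may work differently.

```rust
/// The two roles a single pipeline binary can play.
#[derive(Debug, PartialEq)]
pub enum Mode {
    /// Build the pipeline graph and submit it to a runner.
    Construct,
    /// Run as an SDK worker that connects back to the runner.
    Execute { control_endpoint: String },
}

/// Hypothetical flag name; the real boot script may pass something else.
pub const WORKER_FLAG: &str = "--worker";

/// Scans the arguments for `--worker <endpoint>`.
/// Anything else, including a `--worker` with no endpoint after it,
/// falls back to construction mode.
pub fn select_mode<I, S>(args: I) -> Mode
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        if arg.as_ref() == WORKER_FLAG {
            if let Some(endpoint) = args.next() {
                return Mode::Execute {
                    control_endpoint: endpoint.as_ref().to_string(),
                };
            }
        }
    }
    Mode::Construct
}
```

A user's `main` would call something like `select_mode(std::env::args().skip(1))` first and only run its pipeline construction code in the `Mode::Construct` branch, which is why the call has to happen soon after the binary starts.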
The worker code I had started on uses a concurrent cache, such that we don't need to lock on the worker to interact with the caches and such that we can expire entries in the cache like the Java SDK does.
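To make the caching idea concrete, here is a standard-library-only sketch of a cache with expiring entries. It is not the worker code: it swaps in lock sharding (one small mutex per shard instead of one lock around the whole worker) as a stand-in for a real concurrent map, and the names `ShardedTtlCache`, `insert_at`, and `get_at` are invented for this example. The current time is passed in explicitly so the expiry logic can be tested deterministically.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Cache split into independently locked shards, with a fixed time-to-live.
pub struct ShardedTtlCache<K, V> {
    shards: Vec<Mutex<HashMap<K, (V, Instant)>>>,
    ttl: Duration,
}

impl<K: Hash + Eq, V: Clone> ShardedTtlCache<K, V> {
    /// Creates a cache with at least one shard.
    pub fn new(shard_count: usize, ttl: Duration) -> Self {
        let shards = (0..shard_count.max(1))
            .map(|_| Mutex::new(HashMap::new()))
            .collect();
        Self { shards, ttl }
    }

    /// Picks the shard for a key by hashing it.
    fn shard(&self, key: &K) -> &Mutex<HashMap<K, (V, Instant)>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    /// Inserts a value that expires `ttl` after `now`.
    pub fn insert_at(&self, key: K, value: V, now: Instant) {
        let expires = now + self.ttl;
        let shard = self.shard(&key);
        shard.lock().unwrap().insert(key, (value, expires));
    }

    /// Returns a clone of the value if it is present and not yet expired.
    /// An expired entry is evicted on the way out.
    pub fn get_at(&self, key: &K, now: Instant) -> Option<V> {
        let mut map = self.shard(key).lock().unwrap();
        let fresh = map
            .get(key)
            .filter(|(_, expires)| *expires > now)
            .map(|(value, _)| value.clone());
        if fresh.is_none() {
            // No-op when the key was never present.
            map.remove(key);
        }
        fresh
    }
}
```

Only the shard that owns a key is locked during a lookup, so threads working on keys in different shards do not wait on each other. Expiry here is lazy (checked on read); a production cache would likely also sweep entries in the background.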
Looking forward to continuing to work on this with you!

@nivaldoh
Contributor

nivaldoh commented Jan 8, 2023

@sjvanrossum Thanks a lot, the PR is now merged.

I had other issues with the current module/crate structure as I mentioned in my previous comment, and with the point you brought up about the binaries I think this is a good time to change that. I think the crate structure you proposed makes a lot of sense, so that's what I'll be aiming for.

I'll start looking into this in a more general manner to join all current modules into a single crate, but please let me know if this would disrupt or heavily overlap with the upcoming changes you mentioned.

Looking forward to continuing to work with you as well!

nivaldoh added a commit to nivaldoh/beam that referenced this issue Jan 8, 2023
This eliminates redundant imports and avoids need to create multiple binaries. Separate crates can be used for optional features such as apache-beam-io-[aws/azure/gcp], as suggested in apache#21089 (comment)
@Miuler
Contributor

Miuler commented Jan 26, 2023

I'm interested in helping. I'm still new to Rust, but I'm already making my first contributions to the Java SDK, and I wanted to do the same in Go. Seeing that you're getting started in Rust, I'd like to join you.

@laysakura
Contributor

laysakura commented Feb 6, 2023

@nivaldoh I'm also interested in using and contributing to https://github.com/nivaldoh/beam/tree/rust_sdk/sdks/rust.

I am the author of SpringQL, an in-memory, single-node stream processor written in Rust.
I'd like to support Beam as a programming model for a newer version of SpringQL.

I will try to integrate https://github.com/nivaldoh/beam/tree/rust_sdk/sdks/rust with SpringQL and make any necessary changes to both repositories.

@laysakura
Contributor

Unfortunately, it seems that @nivaldoh's repository is inactive as of February 1st, 2023. There are 5 pull requests that have not been reviewed or merged.

To address this issue, I have created a fork of the repository. In my fork, I have:

  • hand-merged a topic branch from @robertwb
  • (wip) stopped using Any, and instead used generics for PTransform in-out parameters
  • made many other refactorings to make the code more Rust-like
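As a rough illustration of the second item, here is what typing a PTransform by its input and output element types can look like. This is a simplified sketch with invented definitions (`PCollection`, `PTransform<In, Out>`, `MapElements`); the traits in the fork are more involved.

```rust
/// A typed collection; here just an in-memory Vec for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct PCollection<T> {
    pub elements: Vec<T>,
}

/// A transform whose input and output element types are part of its type.
pub trait PTransform<In, Out> {
    fn expand(&self, input: PCollection<In>) -> PCollection<Out>;
}

impl<T> PCollection<T> {
    /// Only accepts transforms whose input type matches this collection.
    pub fn apply<Out, P: PTransform<T, Out>>(self, transform: &P) -> PCollection<Out> {
        transform.expand(self)
    }
}

/// Element-wise map built from a plain function or closure.
pub struct MapElements<F> {
    f: F,
}

impl<F> MapElements<F> {
    pub fn new(f: F) -> Self {
        Self { f }
    }
}

impl<In, Out, F: Fn(In) -> Out> PTransform<In, Out> for MapElements<F> {
    fn expand(&self, input: PCollection<In>) -> PCollection<Out> {
        PCollection {
            elements: input.elements.into_iter().map(&self.f).collect(),
        }
    }
}
```

With this shape, applying a transform that expects `String` elements to a `PCollection<i32>` is rejected by the compiler, whereas an `Any`-based design would only fail when the downcast runs.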

I welcome any contributions to this repository.

@brucearctor
Contributor

@laysakura -- thanks for keeping this moving!

@brucearctor
Contributor

@nivaldoh - thanks for getting it started, and please continue to collaborate as makes sense

@dahlbaek

dahlbaek commented Apr 6, 2023

I'm interested in helping. I have some experience with Rust and the Beam Python SDK, along with previous experience with big data frameworks like Spark and Scalding.

It would be awesome to have some guidance on how and where to get started contributing 🤔 Should one just grep for TODOs in the fork by @laysakura and submit PRs for review? Or maintain one's own fork and submit PRs to the fork by @nivaldoh?

@brucearctor
Contributor

@dahlbaek -- officially your questions are outside the scope of Beam project governance, since happening outside of the organization/official-repos.

To try to help keep things moving --> based on the recent lack of activity from @nivaldoh , it seems more likely that development will happen in @laysakura 's fork. There are probably TODOs there, and/or maybe @laysakura will add some GH Issues or have other suggestions for concrete things that are bite-size enough for individuals to take on. In general, I believe the SDK will come together, so I would also imagine there is no shortage of things that could be accomplished around improved testing, automation, etc [ not to mention bug work, feature development, and more ]. I imagine PRs would be welcomed by @laysakura ... but there could always be conversations in issues in https://github.com/laysakura/beam/tree/rust_sdk ...

All: I'm far from much of a Rust developer, but am happy to do what I can to ensure smooth collaboration and that we can eventually get this merged and as a proper Beam Rust SDK!

@laysakura
Contributor

I'm happy to receive help from @dahlbaek. I'll start the conversation on laysakura#1.

@brucearctor, I appreciate your assistance. We will report our progress here. I also hope to collaborate with @nivaldoh again. If @nivaldoh becomes interested in Rust SDK again, I would be happy if you contacted me.

sjvanrossum pushed a commit to sjvanrossum/beam that referenced this issue Apr 12, 2023
This eliminates redundant imports and avoids need to create multiple binaries. Separate crates can be used for optional features such as apache-beam-io-[aws/azure/gcp], as suggested in apache#21089 (comment)
@Miuler
Contributor

Miuler commented Apr 13, 2023

Unfortunately, it seems that @nivaldoh's repository is inactive as of February 1st, 2023. There are 5 pull requests that have not been reviewed or merged.

To address this issue, I have created a fork of the repository. In my fork, I have:

  • hand-merged a topic branch from @robertwb
  • (wip) stopped using Any, and instead used generics for PTransform in-out parameters
  • made many other refactorings to make the code more Rust-like

I welcome any contributions to this repository.

OK, so I understand that it is all new, starting from the main project? Is there nothing from @nivaldoh's branch?

@Miuler
Contributor

Miuler commented Apr 13, 2023

What is the most fluid conversation channel? Telegram? matrix/element? discord? slack?

@laysakura
Contributor

laysakura commented Apr 13, 2023

@Miuler

OK, so I understand that it is all new, starting from the main project? Is there nothing from @nivaldoh's branch?

laysakura/beam's rust_sdk branch is a fork from nivaldoh/beam's rust_sdk.
I manually merged the following PRs created in nivaldoh/beam.

I'm sorry for not merging your nivaldoh#24 because nivaldoh#25 might make FIXMEs in nivaldoh#24 unnecessary (also described in nivaldoh#25).

You may check the git history in laysakura/beam by yourself.

What is the most fluid conversation channel? Telegram? matrix/element? discord? slack?

laysakura#1 is.

@robertwb
Contributor

Thank you @laysakura for rebooting this effort!

@laysakura
Contributor

@robertwb Thank you for creating the basic mechanisms of Beam, such as ParDo and GBK.

@dahlbaek and I are now working on creating a more Rust-like, statically-typed pipeline based on your work. We would also appreciate your contribution to laysakura/beam based on your extensive experience with the TypeScript SDK.

@sjvanrossum
Contributor

Oh, I seem to have missed some traffic on this issue. I received collaborator access to @nivaldoh's fork yesterday, but I'll move development over to @laysakura's fork. :)

I've got some work in progress for data channels on DataSource and DataSink and I'm currently drafting a change to serialization as mentioned on nivaldoh#22.

@laysakura
Contributor

@sjvanrossum Thank you so much! I sent an invitation to add you as a collaborator of laysakura/beam.

@laysakura
Contributor

I think it would be helpful to create design documents in order to align our goals and understanding. To start, I have written an initial version of a document titled "Custom Coders for the Beam Rust SDK".

A portion of the proposal outlined in the document has already been implemented and tested, as can be seen here: laysakura#30.

I would especially appreciate it if @sjvanrossum and @dahlbaek, who have recently collaborated with me on the laysakura/beam repository, could take a look and provide any comments or suggestions.

Of course, I welcome feedback from anyone.

@brucearctor
Contributor

@laysakura and all: a little note to see whether interest has dwindled, or just other priorities, etc. It seems this had been taking shape and would be great to eventually get it in a state where we can get this merged into Beam.

@laysakura
Contributor

@brucearctor For my part, I'm still interested, but I cannot prioritize Beam-related work at my company for a while 😞
It would be great if others could lead the Rust SDK's development.

@brucearctor
Contributor

@brucearctor For my part, I'm still interested, but I cannot prioritize Beam-related work at my company for a while 😞 It would be great if others could lead the Rust SDK's development.

Makes sense -- our ability to devote work time to various efforts does change over time.

Sounds like a call for Any/All who are interested to consider stepping in and helping/contributing!

@dahlbaek

From my side I'm still interested in contributing, but I do it on my own time, and haven't had much to spare lately.

@brucearctor
Contributor

From my side I'm still interested in contributing, but I do it on my own time, and haven't had much to spare lately.

@dahlbaek : Totally understandable, and one of the nice things about Open Source!

@sjvanrossum
Contributor

sjvanrossum commented Nov 28, 2023

@brucearctor I have Coder/DoFn serialization and authoring functionality in the works that I've only been able to make progress on intermittently, and I'm happy to support contributors if they wish to contribute PRs (not sure if my remote branch is fully up to date, but I'll take a peek). My situation matches that of the other contributors: I've unfortunately had to prioritize my core role over this for the past few months.

@brucearctor
Contributor

brucearctor commented Dec 30, 2023

Just found this --> https://github.com/swiftdiaries/beam-rust . I haven't dug deep, so it's unclear how much of what's inside might be usable here.

@sjvanrossum
Contributor

Seems like it didn't progress beyond "Hello, world!" unfortunately.
