
[Engine] Adopt Ray as orchestration backend #14

Closed · 1 of 2 tasks · Superskyyy opened this issue on Jul 29, 2022 · 4 comments
Labels: Core (Core functionality that impacts the engine design) · Engine (The work is on the engine side) · type: feature (A feature to be implemented)
Milestone: 0.1.0
Assignee: Superskyyy

Superskyyy (Member) commented on Jul 29, 2022

I've been experimenting with the pipeline using native multiprocessing and Redis Streams/RQ recently, and it quickly becomes messy when we spawn many processes.

So I'm evaluating Ray as the backend engine to orchestrate the streaming processing jobs while also supporting the batch learning that anomaly detection might utilize. So far, it looks promising.

The main benefits of Ray for us include:

  1. Worker management (Redis Streams will serve only as the IN/OUT data queue, no longer as a task queue); see the sketch after this list.
  2. It's much lighter than Spark/Flink.
  3. Autoscaling.
  4. A UI for monitoring critical system metrics.
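
To illustrate point 1, here's a minimal sketch of how Ray actors would replace our manual multiprocessing plumbing. The class and method names are hypothetical, not from our codebase:

```python
import ray

ray.init()  # starts a local cluster here; would connect to an existing one in production

# Hypothetical worker class (illustrative only).
# Ray owns the process lifecycle, replacing our manual multiprocessing bookkeeping.
@ray.remote
class Masker:
    def __init__(self):
        self.processed = 0

    def mask(self, log_line: str) -> str:
        self.processed += 1
        # Stand-in for the real preprocessing/masking step.
        return log_line.lower()

# Spawning N workers is one line; no Process/Queue plumbing needed.
maskers = [Masker.remote() for _ in range(4)]
print(ray.get([m.mask.remote("User 42 LOGGED IN") for m in maskers]))
```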

@Liangshumin @Fengrui-Liu FYI: there will be some changes to the existing designs that I communicated over chat. Please pay attention to the algorithm training part, as Ray offers many out-of-the-box ML features (see the Tune sketch below).

  • For logs (clustering)
  • For metrics
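
As one example of those out-of-the-box ML features, hyperparameter search via Ray Tune looks roughly like this. The objective function and search space are placeholders, not our actual detector:

```python
from ray import tune

# Hypothetical objective; the real anomaly-detection model would be trained here.
def train_detector(config):
    score = 1.0 / (1.0 + abs(config["threshold"] - 0.3))
    tune.report(score=score)

# Launch 8 trials over the (made-up) search space and pick the best config.
analysis = tune.run(
    train_detector,
    config={"threshold": tune.uniform(0.0, 1.0)},
    num_samples=8,
)
print(analysis.get_best_config(metric="score", mode="max"))
```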
@Superskyyy added the type: feature, Engine, and Core labels on Jul 29, 2022
@Superskyyy added this to the 0.1.0 milestone on Jul 29, 2022
@Superskyyy self-assigned this on Jul 29, 2022
Superskyyy (Member, Author) commented:

Things I've tested:

  1. Pure multiprocessing with a native queue - very low throughput.

  2. Redis Streams + multiprocessing - fast but complex; it cannot be scaled up or down easily.

  3. Redis task queue - high Redis overhead, and awkward for stream processing.

  4. Current plan for log data (see the consumer sketch after the diagram):

Source (OAP) ->
N * gRPC (Ingestors) ->
In Stream (Redis) ->
Ray Actors (Stream Consumers) -> Maskers (Preprocessors) ->
Ready Stream (Redis) ->
ML (Learners) ->
Out Stream (Redis) ->
Ray Actors (Exporters) ->
Destination (OAP)
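
A rough sketch of the "In Stream (Redis) -> Ray Actors (Stream Consumers)" hop, assuming redis-py consumer groups. Stream, group, and consumer names are made up for illustration, not the actual config:

```python
import ray
import redis

ray.init()

@ray.remote
class StreamConsumer:
    """Sketch of the In Stream (Redis) -> Ray Actor hop.
    Stream/group/consumer names here are illustrative."""

    def __init__(self, stream="in-stream", group="consumers", name="c-0"):
        self.r = redis.Redis()
        self.stream, self.group, self.name = stream, group, name
        try:
            self.r.xgroup_create(stream, group, id="0", mkstream=True)
        except redis.ResponseError:
            pass  # group already exists

    def poll(self, count=100):
        # Read up to `count` new entries for this consumer; block up to 1s if empty.
        entries = self.r.xreadgroup(
            self.group, self.name, {self.stream: ">"}, count=count, block=1000
        )
        batch = []
        for _stream, messages in entries:
            for msg_id, fields in messages:
                batch.append(fields)  # would be handed to the Maskers next
                self.r.xack(self.stream, self.group, msg_id)
        return batch

consumer = StreamConsumer.remote()
print(ray.get(consumer.poll.remote()))
```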

Superskyyy (Member, Author) commented:

I'll complete a prototype to showcase the flow over this weekend.

Superskyyy (Member, Author) commented:

POC: #23

Superskyyy (Member, Author) commented:

Closing in favor of moving to Flink. A new PoC has been implemented.
