
[Engine] Adopt Ray as orchestration backend #14

Closed · 1 of 2 tasks · Superskyyy opened this issue on Jul 29, 2022 · 4 comments
Labels: Core (Core functionality that impacts the engine design) · Engine (The work is on the engine side) · type: feature (A feature to be implemented)
Milestone: 0.1.0
Assignee: Superskyyy

Superskyyy (Member) commented on Jul 29, 2022

I've been experimenting with the pipeline using native multiprocessing and Redis Streams/RQ recently, and it quickly becomes messy when we spawn many processes.

So I'm evaluating Ray as the backend engine to orchestrate the streaming processing jobs while also supporting the batch learning that anomaly detection might utilize. So far, it looks promising.

The main benefits of Ray for us include:

  1. Worker management (Redis Streams will serve only as the IN/OUT data queue, no longer as a task queue); see the sketch after this list.
  2. It's much lighter than Spark/Flink.
  3. Autoscaling.
  4. A UI for monitoring critical system metrics.
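
To illustrate point 1, here's a minimal sketch of how Ray actors would replace our manual multiprocessing plumbing. The class and method names are hypothetical, not from our codebase:

```python
import ray

ray.init()  # starts a local cluster here; would connect to an existing one in production

# Hypothetical worker class (illustrative only).
# Ray owns the process lifecycle, replacing our manual multiprocessing bookkeeping.
@ray.remote
class Masker:
    def __init__(self):
        self.processed = 0

    def mask(self, log_line: str) -> str:
        self.processed += 1
        # Stand-in for the real preprocessing/masking step.
        return log_line.lower()

# Spawning N workers is one line; no Process/Queue plumbing needed.
maskers = [Masker.remote() for _ in range(4)]
print(ray.get([m.mask.remote("User 42 LOGGED IN") for m in maskers]))
```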

@Liangshumin @Fengrui-Liu FYI: there will be some changes to the existing designs that I communicated over chat. Please pay attention to the algorithm training part, as Ray offers many out-of-the-box ML features (see the Tune sketch below).

  • For logs (clustering)
  • For metrics
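
As one example of those out-of-the-box ML features, hyperparameter search via Ray Tune looks roughly like this. The objective function and search space are placeholders, not our actual detector:

```python
from ray import tune

# Hypothetical objective; the real anomaly-detection model would be trained here.
def train_detector(config):
    score = 1.0 / (1.0 + abs(config["threshold"] - 0.3))
    tune.report(score=score)

# Launch 8 trials over the (made-up) search space and pick the best config.
analysis = tune.run(
    train_detector,
    config={"threshold": tune.uniform(0.0, 1.0)},
    num_samples=8,
)
print(analysis.get_best_config(metric="score", mode="max"))
```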
@Superskyyy added the type: feature, Engine, and Core labels on Jul 29, 2022
@Superskyyy added this to the 0.1.0 milestone on Jul 29, 2022
@Superskyyy self-assigned this on Jul 29, 2022
Superskyyy (Member, Author) commented:

Things I've tested:

  1. Pure multiprocessing with a native queue - very low throughput.

  2. Redis Streams + multiprocessing - fast but complex; it cannot be scaled up or down easily.

  3. Redis task queue - high Redis overhead, and awkward for stream processing.

  4. Current plan for log data (see the consumer sketch after the diagram):

Source (OAP) ->
N * gRPC (Ingestors) ->
In Stream (Redis) ->
Ray Actors (Stream Consumers) -> Maskers (Preprocessors) ->
Ready Stream (Redis) ->
ML (Learners) ->
Out Stream (Redis) ->
Ray Actors (Exporters) ->
Destination (OAP)
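
A rough sketch of the "In Stream (Redis) -> Ray Actors (Stream Consumers)" hop, assuming redis-py consumer groups. Stream, group, and consumer names are made up for illustration, not the actual config:

```python
import ray
import redis

ray.init()

@ray.remote
class StreamConsumer:
    """Sketch of the In Stream (Redis) -> Ray Actor hop.
    Stream/group/consumer names here are illustrative."""

    def __init__(self, stream="in-stream", group="consumers", name="c-0"):
        self.r = redis.Redis()
        self.stream, self.group, self.name = stream, group, name
        try:
            self.r.xgroup_create(stream, group, id="0", mkstream=True)
        except redis.ResponseError:
            pass  # group already exists

    def poll(self, count=100):
        # Read up to `count` new entries for this consumer; block up to 1s if empty.
        entries = self.r.xreadgroup(
            self.group, self.name, {self.stream: ">"}, count=count, block=1000
        )
        batch = []
        for _stream, messages in entries:
            for msg_id, fields in messages:
                batch.append(fields)  # would be handed to the Maskers next
                self.r.xack(self.stream, self.group, msg_id)
        return batch

consumer = StreamConsumer.remote()
print(ray.get(consumer.poll.remote()))
```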

Superskyyy (Member, Author) commented:

I'll complete a prototype to showcase the flow over this weekend.

Superskyyy (Member, Author) commented:

POC: #23

Superskyyy (Member, Author) commented:

Closing in favor of moving to Flink. A new PoC has been implemented.
