
Some questions after reading the design ideas #2

Open · gly-hub opened this issue Mar 1, 2024 · 2 comments
Labels: good first issue (Good for newcomers)

gly-hub commented Mar 1, 2024

1. What is the lifecycle of each flow in a Kafka scenario? If a Function fails, should the data be discarded? If it should not be discarded, how should it be handled?

2. With many flows running in parallel, how do you solve the performance problem? I haven't figured out the right approach yet.

aceld (Owner) commented Mar 1, 2024

  1. For the first question: in a Kafka scenario, the lifecycle of data processing within a flow ends when all the Functions in that flow have finished executing; what happens to the result data is up to the developer. If a Function raises an exception, that signals an error in the data processing pipeline. The developer can control whether to proceed to the next Function in the flow by invoking flow.Next() or flow.Next(Abort) inside the Function; these actions are defined in the flow-control mechanism of kis-flow (see https://github.com/aceld/kis-flow/blob/master/kis/action.go, and the first sketch after this list). Alternatively, the Function where the error occurred can place the erroneous data on a delayed-processing message queue and a separate flow can be started to handle it. Which option to choose depends on the business logic of the streaming computation.

  2. Regarding the performance of running many flows in parallel, take Kafka as the data source. Each data source should correspond to one Kafka topic, and consumption throughput depends on the number of consumers: for example, if the kis-flow consumers are deployed on Kubernetes with 8 pods, the topic should be given 8 partitions. When data is produced to Kafka, it should be hashed on its primary key so that it is spread across the partitions while records with the same key stay in order (see the second sketch after this list).
    With this setup, a single flow is executed by 8 pods, each handling the data for a different set of primary keys while preserving per-key order. This effectively addresses the performance concern. I hope this answers your questions.
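
A minimal sketch of the error-handling pattern from point 1, assuming the plain `func(ctx, flow) error` FaaS handler signature and the `ActionAbort` option from `kis/action.go`; `validate` and `pushToRetryQueue` are hypothetical helpers for this example, not part of kis-flow:

```go
package main

import (
	"context"
	"log"

	"github.com/aceld/kis-flow/kis"
)

// FuncVerify is a kis-flow Function handler. On bad data it parks the
// record on a retry queue and aborts the flow instead of letting the
// record reach downstream Functions.
func FuncVerify(ctx context.Context, flow kis.Flow) error {
	for _, row := range flow.Input() {
		if err := validate(row); err != nil {
			// Hypothetical helper: hand the record to a delayed-processing
			// queue so another flow can pick it up and retry later.
			if qErr := pushToRetryQueue(row); qErr != nil {
				log.Printf("retry-queue enqueue failed: %v", qErr)
			}
			// Abort: the remaining Functions in this flow will not run.
			return flow.Next(kis.ActionAbort)
		}
	}
	// Data is fine: continue to the next Function in the flow.
	return flow.Next()
}

// validate is a stand-in for whatever schema or business check applies.
func validate(row interface{}) error { return nil }

// pushToRetryQueue is a stand-in for a producer to a delayed queue.
func pushToRetryQueue(row interface{}) error { return nil }
```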
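
And a sketch of the producer-side keying from point 2, using the segmentio/kafka-go client; the broker address, topic name, and order keys are placeholders. With a Hash balancer, every message carrying the same key lands on the same partition, so each of the 8 pods consumes an ordered stream for its share of the keys:

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// One topic per data source; partition count matches the number of
	// consumer pods (8 in the example above).
	w := &kafka.Writer{
		Addr:     kafka.TCP("localhost:9092"), // placeholder broker
		Topic:    "kisflow-source",            // placeholder topic with 8 partitions
		Balancer: &kafka.Hash{},               // partition = hash(key) mod #partitions
	}
	defer w.Close()

	// Keying on the record's primary key keeps all events for one key on
	// one partition, so their relative order is preserved end to end.
	err := w.WriteMessages(context.Background(),
		kafka.Message{Key: []byte("order-1001"), Value: []byte(`{"amount":30}`)},
		kafka.Message{Key: []byte("order-1001"), Value: []byte(`{"amount":45}`)},
		kafka.Message{Key: []byte("order-2002"), Value: []byte(`{"amount":12}`)},
	)
	if err != nil {
		log.Fatal(err)
	}
}
```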

gly-hub (Author) commented Mar 1, 2024

Thanks for your reply. I have a rough idea now, and I will keep following this project alongside some application scenarios of my own. Looking forward to your follow-up updates, and thank you for sharing.

gly-hub closed this as completed Mar 1, 2024
aceld added the good first issue (Good for newcomers) label Mar 1, 2024
aceld reopened this Mar 7, 2024