
Some questions after reading the design ideas #2

Open · gly-hub opened this issue Mar 1, 2024 · 2 comments
Labels: good first issue (Good for newcomers)

gly-hub commented Mar 1, 2024

1. What is the lifecycle of each flow in a Kafka scenario? If a Function fails, should the data be discarded? If it should not be discarded, how should it be handled?

2. With many flows running in parallel, how do you solve the performance problem? I haven't figured out the right approach yet.

aceld (Owner) commented Mar 1, 2024

  1. For the first question: in a Kafka scenario, the lifecycle of data processing within a flow ends when all the Functions in that flow have finished executing; what happens to the result data is up to the developer. If a Function raises an exception, that signals an error in the data processing pipeline. The developer can control whether to proceed to the next Function in the flow by invoking flow.Next() or flow.Next(Abort) inside the Function; these actions are defined in the flow-control mechanism of kis-flow (see https://github.com/aceld/kis-flow/blob/master/kis/action.go, and the first sketch after this list). Alternatively, the Function where the error occurred can place the erroneous data on a delayed-processing message queue and a separate flow can be started to handle it. Which option to choose depends on the business logic of the streaming computation.

  2. Regarding the performance of running many flows in parallel, take Kafka as the data source. Each data source should correspond to one Kafka topic, and consumption throughput depends on the number of consumers: for example, if the kis-flow consumers are deployed on Kubernetes with 8 pods, the topic should be given 8 partitions. When data is produced to Kafka, it should be hashed on its primary key so that it is spread across the partitions while records with the same key stay in order (see the second sketch after this list).
    With this setup, a single flow is executed by 8 pods, each handling the data for a different set of primary keys while preserving per-key order. This effectively addresses the performance concern. I hope this answers your questions.
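
A minimal sketch of the error-handling pattern from point 1, assuming the plain `func(ctx, flow) error` FaaS handler signature and the `ActionAbort` option from `kis/action.go`; `validate` and `pushToRetryQueue` are hypothetical helpers for this example, not part of kis-flow:

```go
package main

import (
	"context"
	"log"

	"github.com/aceld/kis-flow/kis"
)

// FuncVerify is a kis-flow Function handler. On bad data it parks the
// record on a retry queue and aborts the flow instead of letting the
// record reach downstream Functions.
func FuncVerify(ctx context.Context, flow kis.Flow) error {
	for _, row := range flow.Input() {
		if err := validate(row); err != nil {
			// Hypothetical helper: hand the record to a delayed-processing
			// queue so another flow can pick it up and retry later.
			if qErr := pushToRetryQueue(row); qErr != nil {
				log.Printf("retry-queue enqueue failed: %v", qErr)
			}
			// Abort: the remaining Functions in this flow will not run.
			return flow.Next(kis.ActionAbort)
		}
	}
	// Data is fine: continue to the next Function in the flow.
	return flow.Next()
}

// validate is a stand-in for whatever schema or business check applies.
func validate(row interface{}) error { return nil }

// pushToRetryQueue is a stand-in for a producer to a delayed queue.
func pushToRetryQueue(row interface{}) error { return nil }
```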
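
And a sketch of the producer-side keying from point 2, using the segmentio/kafka-go client; the broker address, topic name, and order keys are placeholders. With a Hash balancer, every message carrying the same key lands on the same partition, so each of the 8 pods consumes an ordered stream for its share of the keys:

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// One topic per data source; partition count matches the number of
	// consumer pods (8 in the example above).
	w := &kafka.Writer{
		Addr:     kafka.TCP("localhost:9092"), // placeholder broker
		Topic:    "kisflow-source",            // placeholder topic with 8 partitions
		Balancer: &kafka.Hash{},               // partition = hash(key) mod #partitions
	}
	defer w.Close()

	// Keying on the record's primary key keeps all events for one key on
	// one partition, so their relative order is preserved end to end.
	err := w.WriteMessages(context.Background(),
		kafka.Message{Key: []byte("order-1001"), Value: []byte(`{"amount":30}`)},
		kafka.Message{Key: []byte("order-1001"), Value: []byte(`{"amount":45}`)},
		kafka.Message{Key: []byte("order-2002"), Value: []byte(`{"amount":12}`)},
	)
	if err != nil {
		log.Fatal(err)
	}
}
```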

gly-hub (Author) commented Mar 1, 2024

Thanks for your reply. I have a rough idea now, and I will keep following this project alongside some application scenarios of my own. Looking forward to your follow-up updates, and thank you for sharing.

gly-hub closed this as completed Mar 1, 2024
aceld added the good first issue (Good for newcomers) label Mar 1, 2024
aceld reopened this Mar 7, 2024