Influx Sink can only handle primitive properties #2106
Comments
@bossenti Unfortunately, Influx only allows primitive data types, so I guess the original author who wrote that if statement did not expect array input either. To bypass this limitation we could, for example, define a simple protocol that uses strings to represent arrays: "isArray:1,val1,val2,val3,...". That said, I don't quite agree with this workaround, since there may be real strings (as opposed to strings used as arrays) that contain keywords such as "isArray". I think it is best to let users transform their data from arrays to other types and define their own protocol.
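To make the concern about the string protocol concrete, here is a minimal sketch of how such an encoding could look and where it becomes ambiguous. The class `ArrayStringProtocol` and its methods are hypothetical names invented for illustration; none of this is StreamPipes code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the "isArray:1,val1,val2,..." idea; not StreamPipes code.
final class ArrayStringProtocol {

  // Marker prefix taken from the protocol proposed in the comment above.
  static final String PREFIX = "isArray:1,";

  // Encodes a list as a single string so it fits into a primitive string field.
  static String encode(List<?> values) {
    return PREFIX + values.stream()
        .map(v -> String.valueOf(v))
        .collect(Collectors.joining(","));
  }

  // A reader can only guess from the prefix whether a string is an encoded array.
  static boolean looksLikeArray(String stored) {
    return stored.startsWith(PREFIX);
  }

  // Decodes back to string elements; the original element types are lost.
  static List<String> decode(String stored) {
    return Arrays.asList(stored.substring(PREFIX.length()).split(","));
  }

  public static void main(String[] args) {
    String encoded = encode(List.of(1, 2, 3));
    System.out.println(encoded);
    System.out.println(decode(encoded));

    // A genuine string that happens to start with the marker is misclassified.
    String plainText = "isArray:1,is set in the config";
    System.out.println(looksLikeArray(plainText));
  }
}
```

The last call shows the collision: an ordinary string value that starts with the marker is indistinguishable from an encoded array, and elements that themselves contain commas would be split incorrectly.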
Thanks for investigating, @muyangye. I would propose that we change the configuration of the InfluxSink so that it only expects primitive fields. In addition, we need to think about how this affects the automatic pipeline creation in case the user selects
Hi @muyangye, thank you for initiating work on this issue. The current challenge is that simply removing properties takes away the user's ability to persist them. On the other hand, using a special encoding for arrays and serializing them as strings raises concerns about the use of the data in downstream processes such as visualization. To move forward, it would be beneficial to collectively define the user's goal when dealing with array data. I see two potential options: Option 1: Require users to transform arrays into an alternative representation before storing the data. And as @bossenti stated, we need to consider the side effects on other functions (such as the automatic pipeline generation). I'm open to both options but would also appreciate any additional ideas or insights you might have.
Hi @tenthe @bossenti, thanks for your insights! In my opinion, option 2 is better because users should definitely be able to store arrays. I am willing to provide an implementation of array serialization/deserialization and modify downstream processes accordingly. However, this is a non-trivial task and may take some time. For now, we should let users know that InfluxDB can only store primitive types when they have non-primitive types and are connecting to DataLake (such as when creating a pipeline or selecting I will first implement the reminder. Then, we should further break down the task "support array in influx sink" into multiple tasks/issues, after figuring out which downstream processes need to be modified (we can collect them in this thread). Once those are clear, we can start implementing this important enhancement.
Sounds like a great plan! With respect to the reminder: Please be aware that sinks (and processing elements as well) can define a If you have any problems finding the specific functionality, feel free to reach out 🙂
Hey @muyangye, I really like your systematic approach. When considering the implementation's complexity, I'm uncertain if Option 1 is easier to implement due to its broader impact on components like UI and pipeline creation. It might be simpler to change only the Data Lake sink, serializing arrays to strings. This way, users won't need to modify the data stream or receive notifications. The only drawback is the inability to display this data in the data explorer and dashboard, which isn't currently feasible anyway. We could explore potential solutions for this in a separate issue. What are your thoughts on this?
@tenthe I see your points here, and I agree that your concerns about the impact of Option 1 (e.g. users being forced to modify the data stream) are totally valid. But if we don't give users some notification, wouldn't we be in the same situation as right now? Currently the user is unaware of the data loss, which is why @bossenti raised this issue. If I am not mistaken, you are suggesting that we start working directly on Option 2? Please let me know if I have misunderstood something, but if the worry is that Option 1 limits or decreases usability, I would suggest this: instead of making changes that impact usability such as modifying What do you think?
My suggestion was to work directly on option 2. I thought that it would be easier to implement than exception handling for array properties.
I see. I just realized that after serializing the array to a string, the Influx store will persist it and it will be displayed in the dashboard (as opposed to nothing), so the user will notice a difference compared to the current situation. Yes, I think that is definitely better. Thanks a lot for your suggestion!
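The approach agreed on here (keep primitive fields as they are and serialize everything else to a string before writing) could look roughly like the following. This is only a sketch under stated assumptions: `FieldSerializerSketch`, `isPrimitive`, `toJson` and `prepare` are hypothetical names, the hand-rolled JSON output is a stand-in for whatever serializer the actual change uses, and the code does not reproduce the `RawFieldSerializer` from #2196.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: primitives pass through, everything else becomes a string.
final class FieldSerializerSketch {

  // Treats numbers, booleans and strings as directly storable values.
  static boolean isPrimitive(Object value) {
    return value instanceof Number || value instanceof Boolean || value instanceof String;
  }

  // Minimal JSON-like rendering for lists, maps and scalars (assumption: no JSON library).
  static String toJson(Object value) {
    if (value == null) {
      return "null";
    }
    if (value instanceof Number || value instanceof Boolean) {
      return value.toString();
    }
    if (value instanceof List) {
      return ((List<?>) value).stream()
          .map(FieldSerializerSketch::toJson)
          .collect(Collectors.joining(",", "[", "]"));
    }
    if (value instanceof Map) {
      return ((Map<?, ?>) value).entrySet().stream()
          .map(e -> toJson(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()))
          .collect(Collectors.joining(",", "{", "}"));
    }
    // Strings and any other objects are quoted, escaping backslashes and quotes.
    String s = value.toString();
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  // Returns a copy of the event in which every non-primitive value is a string.
  static Map<String, Object> prepare(Map<String, ?> event) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, ?> e : event.entrySet()) {
      Object v = e.getValue();
      out.put(e.getKey(), isPrimitive(v) ? v : toJson(v));
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> event = new LinkedHashMap<>();
    event.put("temperature", 21.5);
    event.put("samples", List.of(1, 2, 3));
    System.out.println(prepare(event));
  }
}
```

With this shape, an array property ends up in the store as a plain string field, so it shows up in the dashboard as text instead of being dropped or causing an exception.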
…xDB (#2196) * Serialize non-primitive types and store in Influx * extract RawFieldSerializer * rename test * delete old
Closed by #2196
Apache StreamPipes version
dev (current development state)
Affected StreamPipes components
Processing Elements
What happened?
When trying to persist an event stream with an array as property, the influx sink throws an exception:
This is due to the fact that only the handling of primitive types is implemented:
streampipes/streampipes-data-explorer-commons/src/main/java/org/apache/streampipes/dataexplorer/commons/influx/InfluxStore.java
Line 129 in a34a4dd
How to reproduce?
Persist an event stream containing an array as property
Expected behavior
Storage can handle non-primitive types as well
Additional technical information
No response
Are you willing to submit a PR?
None