MINIFICPP-646 - Re-evaluate passing attributes. Potentially deprecate…#515
MINIFICPP-646 - Re-evaluate passing attributes. Potentially deprecate…#515arpadboda wants to merge 1 commit intoapache:masterfrom
Conversation
|
This is for 1.0, just coded quickly while I remember what I wanted to do. Can stay open for a while. |
… attribute usage functions
6d71438 to
1f31f30
Compare
| const char *key; | ||
| void *value; | ||
| size_t value_size; | ||
| const char *value; |
There was a problem hiding this comment.
I would probably leave the size because a) you don't know if the value is null terminated ( it may not be ) -- and b) as we move away from C++ we will likely manage the memory and update_attribute at that point could/may be a realloc.
There was a problem hiding this comment.
My idea here was to make sure that whatever we store is null-terminated.
We provide API to add non-terminated str with length, (something like strncpy), but inside store null-terminated things.
Two reasons behind:
-no need to store len in internal structures
-whatever we return is null-terminated, which makes it easy to copy/log
In my opinion this leads to a more compact and cleaner API, but I can be convinced about the opposite way, too.
There was a problem hiding this comment.
We provide API to add non-terminated str with length, (something like strncpy), but inside store null-terminated things.
We can not provide non-null-terminated API for null-terminated storage, because the former allows for null bytes in the payload while the latter does not.
+1 for keeping void* + size_t if the attribute value can ever contain a null byte as part of the payload, -1 if it is always a string that can be null-terminated.
edit: If this struct refers to flow file attributes, then I'm 100% for keeping the data as binary instead of null-terminated. I can think of flows with binary attributes as a valid use case regardless of whether they're currently supported or not. Building a null-terminated string API on top of a binary buffer is possible (see std::string) but not the other way around.
There was a problem hiding this comment.
Let me resurrect this PR! I would like to agree on a way we can proceed and do it as attribute handling is quite painful at the moment, and the more nanofi code we write the more painful it becomes to change it.
What I'm quite sure about:
-We need attributes as we somehow need to support routing flowfiles at destination (NiFI).
-Attributes should be strings as those are strings in MiNiFi and NiFi, too.
-The key and the value should be owned by the flowfile structure, so getters return const pointers, setters take const pointers and copy. This might have some minor performance drawbacks (every time a value is set/updated, reallocation is required), but at least the responsibility is well defined.
What I'm not sure about is the way of storing them: char array and len stored separately or a null-terminated string.
For the setter it doesn't matter as we can provide the following API:
setvalue(const char * key, const char * value, size_t value_len = 0)
Where value_len being 0 indicates that value is a 0 terminated string. The user can provide anyhow, we can store any way we want.
The reasons I would prefer to store them as null-terminated:
-No need to store the length separately
-Even if we want an API that returns the length separately, it's easy to do.
-Value can be used in printf/log statements easily. (this is the most important in my opinion)
-Key is also null-terminated, so at least it would be consistent.
@phrocker , @bakaid, @msharee9, could you share your opinion?
|
I agree that attributes should be owned by the flowfile structure, otherwise it would be very error-prone to handle lifetimes. It is more than worth the small overhead.
|
| * @param value location of value - expected to be null-terminated string | ||
| * @return 0 in case of success, -1 otherwise (already existed) | ||
| **/ | ||
| int8_t add_attribute_str(flow_file_record*, const char *key, const char *value); |
There was a problem hiding this comment.
So this functions as purely a convenience function?
|
@arpadboda Value needn't be null terminated, especially given the sink and source where nanofi may exist; however, I don't actually have a strong opinion on that for this PR -- especially if @bakaid and @msharee9 agree with that approach. NiFi is not the penultimate input/output of NanoFi code -- but as we evolve we can live in the confines of that decision for attributes. Think of the input/output of nanofi edge controller units as being site to site, kafka, mqtt, pipe, and file ( and maybe redis eventually ). In any of these cases attributes can be of an intermediate form. That means that attributes could carry binary data ( hence why they were initially not typed and instead a bag of keys and values -- I think this is how redis stores them internally ). I think you can draw a line in the sand that attributes are null terminated and require that another mechanism be built. This may alleviate your concerns about having non textual data in the attributes -- so then we can simply create a follow on to store intermediate data that are attributes of the data that are binary. The initial API facilitated C strings that were null terminated and arbitrary binary data. If you feel separating these concerns is important we can make a follow on ticket for the release that references 646. I can certainly see the value in isolating those APIs. I don't have a super strong opinion on how you structure this "struct" -- as long as we have a way to ultimately carry attributes that are not null terminated. if you think it helps to add API calls for something like "add_binary_attribute", then I'm okay with that. I don't really foresee a great deal of cost with ownership as you present it. Since we own that data structure and if it were opaque as @bakaid suggests, then we can realloc more safely. All code that uses the variants of the API defined in the last release should compile in this minor release. How everything is represented internally is up to you. |
|
@phrocker : what you say in aspect of Redis for eg. makes sense, however I think that's already messed up. To deal with binary formats I think I would prefer to offer base64 encoding for attribute settings and keep the internal representation (and network representation) as string. |
|
@arpadboda In my opinion Base64 encoding the underlying data is a great way to tackle the problem. As long as the API allows them to get/set these as binary data ( or handles the translation ). then I think that's a great idea. |
szaszm
left a comment
There was a problem hiding this comment.
+1 for making attribute an opaque struct, but I'd store the buffer as binary already, since it's not much more effort to implement. We should avoid base64 IMO, as it would add unnecessary complexity.
We need to sort out the ownership semantics of the struct and enforce them through the API. I'd provide binary and c-string getters and copying setters for now, optionally adopting setters now or in the future.
| const char * attr_value = "some value"; | ||
| attribute1.value = (void *)attr_value; | ||
| attribute1.value_size = strlen(attr_value); | ||
| attribute1.value = "some value"; |
There was a problem hiding this comment.
I've expected attribute to be an owning container of key/value. We really need to clarify ownership semantics and enforce them through the API.
I support the opaque struct proposal and would provide copying and optionally adopting setters for the struct.
|
Closing this because of inactivity |
… attribute usage functions
Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.