Skip to content

MINIFICPP-646 - Re-evaluate passing attributes. Potentially deprecate…#515

Closed
arpadboda wants to merge 1 commit intoapache:masterfrom
arpadboda:MINIFICPP-646
Closed

MINIFICPP-646 - Re-evaluate passing attributes. Potentially deprecate…#515
arpadboda wants to merge 1 commit intoapache:masterfrom
arpadboda:MINIFICPP-646

Conversation

@arpadboda
Copy link
Contributor

… attribute usage functions

Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file?
  • If applicable, have you updated the NOTICE file?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

@arpadboda
Copy link
Contributor Author

This is for 1.0, just coded quickly while I remember what I wanted to do. Can stay open for a while.

const char *key;
void *value;
size_t value_size;
const char *value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably leave the size because a) you don't know if the value is null terminated ( it may not be ) -- and b) as we move away from C++ we will likely manage the memory and update_attribute at that point could/may be a realloc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea here was to make sure that whatever we store is null-terminated.
We provide API to add non-terminated str with length, (something like strncpy), but inside store null-terminated things.

Two reasons behind:
-no need to store len in internal structures
-whatever we return is null-terminated, which makes it easy to copy/log

In my opinion this leads to a more compact and cleaner API, but I can be convinced about the opposite way, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would convince you?

Copy link
Member

@szaszm szaszm Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We provide API to add non-terminated str with length, (something like strncpy), but inside store null-terminated things.

We can not provide non-null-terminated API for null-terminated storage, because the former allows for null bytes in the payload while the latter does not.
+1 for keeping void* + size_t if the attribute value can ever contain a null byte as part of the payload, -1 if it is always a string that can be null-terminated.

edit: If this struct refers to flow file attributes, then I'm 100% for keeping the data as binary instead of null-terminated. I can think of flows with binary attributes as a valid use case regardless of whether they're currently supported or not. Building a null-terminated string API on top of a binary buffer is possible (see std::string) but not the other way around.

Copy link
Contributor Author

@arpadboda arpadboda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me resurrect this PR! I would like to agree on a way we can proceed and do it as attribute handling is quite painful at the moment, and the more nanofi code we write the more painful it becomes to change it.

What I'm quite sure about:
-We need attributes as we somehow need to support routing flowfiles at destination (NiFI).
-Attributes should be strings as those are strings in MiNiFi and NiFi, too.
-The key and the value should be owned by the flowfile structure, so getters return const pointers, setters take const pointers and copy. This might have some minor performance drawbacks (every time a value is set/updated, reallocation is required), but at least the responsibility is well defined.

What I'm not sure about is the way of storing them: char array and len stored separately or a null-terminated string.

For the setter it doesn't matter as we can provide the following API:

setvalue(const char * key, const char * value, size_t value_len = 0) 

Where value_len being 0 indicates that value is a 0 terminated string. The user can provide anyhow, we can store any way we want.

The reasons I would prefer to store them as null-terminated:
-No need to store the length separately
-Even if we want an API that returns the length separately, it's easy to do.
-Value can be used in printf/log statements easily. (this is the most important in my opinion)
-Key is also null-terminated, so at least it would be consistent.

@phrocker , @bakaid, @msharee9, could you share your opinion?

@bakaid
Copy link
Contributor

bakaid commented Jun 20, 2019

@arpadboda

I agree that attributes should be owned by the flowfile structure, otherwise it would be very error-prone to handle lifetimes. It is more than worth the small overhead.
Because of this, we must provide setters.
With getters we have a choice: either we can make the internal storage of the attributes part of the API, and we don't have to provide getters, or we make the attribute structure opaque and provide getters for the attributes (which can either copy the values, or return a const pointer to them, with the proper lifetime warning part of the specification of the getter).
This choice does not influence how we store the attributes, but it influences whether we can change how we store them without breaking the API.
From a setter standpoint I see the merits of having a buffer + size setter along with the C string setter.
From a getter standpoint I think it is easier to use the API if it returns a C string.
Therefore my preference would be to:

  • make the structure opaque
  • create getters
  • make getters return a C string
  • because we can now modify the internal struct it does not really matter how we store the data internally, but because we are returning a C string in the getter, if there are no other considerations it would be worth storing it that way for the time being.

* @param value location of value - expected to be null-terminated string
* @return 0 in case of success, -1 otherwise (already existed)
**/
int8_t add_attribute_str(flow_file_record*, const char *key, const char *value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this functions as purely a convenience function?

@phrocker
Copy link
Contributor

@arpadboda Value needn't be null terminated, especially given the sink and source where nanofi may exist; however, I don't actually have a strong opinion on that for this PR -- especially if @bakaid and @msharee9 agree with that approach. NiFi is not the penultimate input/output of NanoFi code -- but as we evolve we can live in the confines of that decision for attributes.

Think of the input/output of nanofi edge controller units as being site to site, kafka, mqtt, pipe, and file ( and maybe redis eventually ). In any of these cases attributes can be of an intermediate form. That means that attributes could carry binary data ( hence why they were initially not typed and instead a bag of keys and values -- I think this is how redis stores them internally ). I think you can draw a line in the sand that attributes are null terminated and require that another mechanism be built. This may alleviate your concerns about having non textual data in the attributes -- so then we can simply create a follow on to store intermediate data that are attributes of the data that are binary.

The initial API facilitated C strings that were null terminated and arbitrary binary data. If you feel separating these concerns is important we can make a follow on ticket for the release that references 646. I can certainly see the value in isolating those APIs.

I don't have a super strong opinion on how you structure this "struct" -- as long as we have a way to ultimately carry attributes that are not null terminated. if you think it helps to add API calls for something like "add_binary_attribute", then I'm okay with that.

I don't really foresee a great deal of cost with ownership as you present it. Since we own that data structure and if it were opaque as @bakaid suggests, then we can realloc more safely.

All code that uses the variants of the API defined in the last release should compile in this minor release. How everything is represented internally is up to you.

@arpadboda
Copy link
Contributor Author

@phrocker : what you say in aspect of Redis for eg. makes sense, however I think that's already messed up.
Attributes are handled as UTF strings on S2S, so when transferring them, we handle them as strings, which means cutting at the first 0 byte.
Even if we fix that I'm unsure whether we would achieve the behaviour we want or not given NiFi handles attributes as strings, too.

To deal with binary formats I think I would prefer to offer base64 encoding for attribute settings and keep the internal representation (and network representation) as string.

@phrocker
Copy link
Contributor

phrocker commented Jun 25, 2019

@arpadboda In my opinion Base64 encoding the underlying data is a great way to tackle the problem. As long as the API allows them to get/set these as binary data ( or handles the translation ). then I think that's a great idea.

Copy link
Member

@szaszm szaszm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for making attribute an opaque struct, but I'd store the buffer as binary already, since it's not much more effort to implement. We should avoid base64 IMO, as it would add unnecessary complexity.

We need to sort out the ownership semantics of the struct and enforce them through the API. I'd provide binary and c-string getters and copying setters for now, optionally adopting setters now or in the future.

const char * attr_value = "some value";
attribute1.value = (void *)attr_value;
attribute1.value_size = strlen(attr_value);
attribute1.value = "some value";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've expected attribute to be an owning container of key/value. We really need to clarify ownership semantics and enforce them through the API.
I support the opaque struct proposal and would provide copying and optionally adopting setters for the struct.

@szaszm
Copy link
Member

szaszm commented Dec 16, 2020

Closing this because of inactivity

@szaszm szaszm closed this Dec 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants