MINIFICPP-646 - Re-evaluate passing attributes. Potentially deprecate… by arpadboda · Pull Request #515 · apache/nifi-minifi-cpp

arpadboda · 2019-03-17T21:15:26Z

… attribute usage functions

Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?

For code changes:

If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
If applicable, have you updated the LICENSE file?
If applicable, have you updated the NOTICE file?

For documentation related changes:

Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

arpadboda · 2019-03-17T21:16:06Z

This is for 1.0, just coded quickly while I remember what I wanted to do. Can stay open for a while.

… attribute usage functions

phrocker · 2019-03-18T14:02:21Z

nanofi/include/core/cstructs.h

  const char *key;
-  void *value;
-  size_t value_size;
+  const char *value;


I would probably leave the size because a) you don't know if the value is null terminated ( it may not be ) -- and b) as we move away from C++ we will likely manage the memory and update_attribute at that point could/may be a realloc.

My idea here was to make sure that whatever we store is null-terminated.
We provide API to add non-terminated str with length, (something like strncpy), but inside store null-terminated things.

Two reasons behind:
-no need to store len in internal structures
-whatever we return is null-terminated, which makes it easy to copy/log

In my opinion this leads to a more compact and cleaner API, but I can be convinced about the opposite way, too.

What would convince you?

We provide API to add non-terminated str with length, (something like strncpy), but inside store null-terminated things.

We can not provide non-null-terminated API for null-terminated storage, because the former allows for null bytes in the payload while the latter does not.
+1 for keeping void* + size_t if the attribute value can ever contain a null byte as part of the payload, -1 if it is always a string that can be null-terminated.

edit: If this struct refers to flow file attributes, then I'm 100% for keeping the data as binary instead of null-terminated. I can think of flows with binary attributes as a valid use case regardless of whether they're currently supported or not. Building a null-terminated string API on top of a binary buffer is possible (see std::string) but not the other way around.

arpadboda

Let me resurrect this PR! I would like to agree on a way we can proceed and do it as attribute handling is quite painful at the moment, and the more nanofi code we write the more painful it becomes to change it.

What I'm quite sure about:
-We need attributes as we somehow need to support routing flowfiles at destination (NiFI).
-Attributes should be strings as those are strings in MiNiFi and NiFi, too.
-The key and the value should be owned by the flowfile structure, so getters return const pointers, setters take const pointers and copy. This might have some minor performance drawbacks (every time a value is set/updated, reallocation is required), but at least the responsibility is well defined.

What I'm not sure about is the way of storing them: char array and len stored separately or a null-terminated string.

For the setter it doesn't matter as we can provide the following API:

setvalue(const char * key, const char * value, size_t value_len = 0)

Where value_len being 0 indicates that value is a 0 terminated string. The user can provide anyhow, we can store any way we want.

The reasons I would prefer to store them as null-terminated:
-No need to store the length separately
-Even if we want an API that returns the length separately, it's easy to do.
-Value can be used in printf/log statements easily. (this is the most important in my opinion)
-Key is also null-terminated, so at least it would be consistent.

@phrocker , @bakaid, @msharee9, could you share your opinion?

bakaid · 2019-06-20T12:17:50Z

@arpadboda

I agree that attributes should be owned by the flowfile structure, otherwise it would be very error-prone to handle lifetimes. It is more than worth the small overhead.
Because of this, we must provide setters.
With getters we have a choice: either we can make the internal storage of the attributes part of the API, and we don't have to provide getters, or we make the attribute structure opaque and provide getters for the attributes (which can either copy the values, or return a const pointer to them, with the proper lifetime warning part of the specification of the getter).
This choice does not influence how we store the attributes, but it influences whether we can change how we store them without breaking the API.
From a setter standpoint I see the merits of having a buffer + size setter along with the C string setter.
From a getter standpoint I think it is easier to use the API if it returns a C string.
Therefore my preference would be to:

make the structure opaque
create getters
make getters return a C string
because we can now modify the internal struct it does not really matter how we store the data internally, but because we are returning a C string in the getter, if there are no other considerations it would be worth storing it that way for the time being.

phrocker · 2019-06-25T09:45:21Z

nanofi/include/api/nanofi.h

+ * @param value location of value - expected to be null-terminated string
+ * @return 0 in case of success, -1 otherwise (already existed)
+ **/
+int8_t add_attribute_str(flow_file_record*, const char *key, const char *value);


So this functions as purely a convenience function?

phrocker · 2019-06-25T10:17:18Z

@arpadboda Value needn't be null terminated, especially given the sink and source where nanofi may exist; however, I don't actually have a strong opinion on that for this PR -- especially if @bakaid and @msharee9 agree with that approach. NiFi is not the penultimate input/output of NanoFi code -- but as we evolve we can live in the confines of that decision for attributes.

Think of the input/output of nanofi edge controller units as being site to site, kafka, mqtt, pipe, and file ( and maybe redis eventually ). In any of these cases attributes can be of an intermediate form. That means that attributes could carry binary data ( hence why they were initially not typed and instead a bag of keys and values -- I think this is how redis stores them internally ). I think you can draw a line in the sand that attributes are null terminated and require that another mechanism be built. This may alleviate your concerns about having non textual data in the attributes -- so then we can simply create a follow on to store intermediate data that are attributes of the data that are binary.

The initial API facilitated C strings that were null terminated and arbitrary binary data. If you feel separating these concerns is important we can make a follow on ticket for the release that references 646. I can certainly see the value in isolating those APIs.

I don't have a super strong opinion on how you structure this "struct" -- as long as we have a way to ultimately carry attributes that are not null terminated. if you think it helps to add API calls for something like "add_binary_attribute", then I'm okay with that.

I don't really foresee a great deal of cost with ownership as you present it. Since we own that data structure and if it were opaque as @bakaid suggests, then we can realloc more safely.

All code that uses the variants of the API defined in the last release should compile in this minor release. How everything is represented internally is up to you.

arpadboda · 2019-06-25T15:07:51Z

@phrocker : what you say in aspect of Redis for eg. makes sense, however I think that's already messed up.
Attributes are handled as UTF strings on S2S, so when transferring them, we handle them as strings, which means cutting at the first 0 byte.
Even if we fix that I'm unsure whether we would achieve the behaviour we want or not given NiFi handles attributes as strings, too.

To deal with binary formats I think I would prefer to offer base64 encoding for attribute settings and keep the internal representation (and network representation) as string.

phrocker · 2019-06-25T15:47:11Z

@arpadboda In my opinion Base64 encoding the underlying data is a great way to tackle the problem. As long as the API allows them to get/set these as binary data ( or handles the translation ). then I think that's a great idea.

szaszm

+1 for making attribute an opaque struct, but I'd store the buffer as binary already, since it's not much more effort to implement. We should avoid base64 IMO, as it would add unnecessary complexity.

We need to sort out the ownership semantics of the struct and enforce them through the API. I'd provide binary and c-string getters and copying setters for now, optionally adopting setters now or in the future.

szaszm · 2020-02-17T13:21:33Z

nanofi/examples/transmit_payload.c

-  const char * attr_value = "some value";
-  attribute1.value = (void *)attr_value;
-  attribute1.value_size = strlen(attr_value);
+  attribute1.value = "some value";


I've expected attribute to be an owning container of key/value. We really need to clarify ownership semantics and enforce them through the API.
I support the opaque struct proposal and would provide copying and optionally adopting setters for the struct.

szaszm · 2020-12-16T15:26:39Z

Closing this because of inactivity

MINIFICPP-646 - Re-evaluate passing attributes. Potentially deprecate…

1f31f30

… attribute usage functions

arpadboda force-pushed the MINIFICPP-646 branch from 6d71438 to 1f31f30 Compare March 17, 2019 21:17

phrocker reviewed Mar 18, 2019

View reviewed changes

arpadboda commented Jun 18, 2019

View reviewed changes

phrocker reviewed Jun 25, 2019

View reviewed changes

szaszm requested changes Feb 17, 2020

View reviewed changes

szaszm closed this Dec 16, 2020

Conversation

arpadboda commented Mar 17, 2019

For all changes:

For code changes:

For documentation related changes:

Note:

Uh oh!

arpadboda commented Mar 17, 2019

Uh oh!

phrocker Mar 18, 2019

Choose a reason for hiding this comment

Uh oh!

arpadboda Mar 18, 2019

Choose a reason for hiding this comment

Uh oh!

phrocker Mar 28, 2019

Choose a reason for hiding this comment

Uh oh!

szaszm Feb 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arpadboda left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bakaid commented Jun 20, 2019

Uh oh!

phrocker Jun 25, 2019

Choose a reason for hiding this comment

Uh oh!

phrocker commented Jun 25, 2019

Uh oh!

arpadboda commented Jun 25, 2019

Uh oh!

phrocker commented Jun 25, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

szaszm left a comment

Choose a reason for hiding this comment

Uh oh!

szaszm Feb 17, 2020

Choose a reason for hiding this comment

Uh oh!

szaszm commented Dec 16, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

szaszm Feb 17, 2020 •

edited

Loading

arpadboda left a comment •

edited

Loading

phrocker commented Jun 25, 2019 •

edited

Loading