Add readdata() function for generic data input #2162

tzz · 2015-04-03T01:28:11Z

This PR has three components:

1. Unify CSV, JSON, and YAML parsing in one FnCallReadData function.
2. Fix a bug in CSV parsing where a null result from the parse could cause a segfault. The byte limit is also raised to 50 MB.
3. Introduce the function readdata(), which takes two parameters: the filename and the mode (auto, CSV, YAML, or JSON). In auto mode, the file extension is checked. Note that readdata() has no file size parameter.
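Here is a minimal C sketch of how the mode argument might be validated. It assumes only the four mode names listed above. DataMode and parse_data_mode() are hypothetical names, not CFEngine's internals, and case-sensitive matching is an assumption.

```c
#include <string.h>

/* Hypothetical enum: the four modes readdata() accepts per the PR description. */
typedef enum
{
    DATA_MODE_AUTO,
    DATA_MODE_CSV,
    DATA_MODE_YAML,
    DATA_MODE_JSON,
    DATA_MODE_INVALID
} DataMode;

/* Maps the mode argument to a DataMode.
 * Assumption: matching is exact and case-sensitive. */
static DataMode parse_data_mode(const char *mode)
{
    static const struct { const char *name; DataMode mode; } table[] = {
        { "auto", DATA_MODE_AUTO },
        { "CSV",  DATA_MODE_CSV  },
        { "YAML", DATA_MODE_YAML },
        { "JSON", DATA_MODE_JSON },
    };
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
    {
        if (strcmp(mode, table[i].name) == 0)
        {
            return table[i].mode;
        }
    }
    return DATA_MODE_INVALID; /* unknown mode: caller should report an error */
}
```

An unrecognized mode maps to DATA_MODE_INVALID. A caller would presumably report that as an error rather than guess a format.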

readdata() is handy for cases where the policy author wants to handle multiple types of files without special cases, like in cfengine/masterfiles#397 as @nickanderson will confirm.

If this function is acceptable, I will write docs and acceptance tests.

cfengine-review-bot · 2015-04-03T01:30:03Z

Can one of the admins verify this patch?

nickanderson · 2015-04-03T13:28:44Z

This is neat: it can make def.json support YAML, and I guess CSV, without any code differences.

So maxbytes for JSON and YAML is unlimited, but CSV is capped at 50MB? Is there a good reason for the difference? If there is no technical reason to limit CSV, I would suggest it operate the same as JSON and YAML. Now we are just missing XML ;-0

nickanderson · 2015-04-03T13:33:21Z

Also, does this mean we should deprecate readyaml, readjson, and readcsv?

tzz · 2015-04-03T13:35:19Z

The current readyaml, readcsv, and readjson functions have two reasons to stay. First, they take explicit size limits (except for readcsv). Second, using them gives you static validation that you'll read the right format, instead of deciding it dynamically.

I can make CSV unlimited too, I just thought 50MB was quite large.

I did think about XML! XML is much harder to read into a data container. The mapping is not one-to-one and CFEngine doesn't yet have the tools to extract both content and attributes if I mix them in the data (the mapjson() proposal would help). Also I don't know of anyone that needs it. But we do have libxml linkage... so maybe a future feature :)

I also thought about naming the function data_read. Not sure; what do you think?

nickanderson · 2015-04-03T13:38:32Z

For naming I prefer readdata or read_data. @estenberg, any opinion on the CSV data limit?

A 50MB CSV file is big and annoying, but so is knowing that CSV is a special case. So unless there is a technical reason to limit it, I would vote for consistency (yes, there is a hobgoblin in there).

ediosyncratic · 2015-04-07T11:13:29Z

It is only petty consistency that's a hobgoblin. Where consistency is actually useful, as here, there are no hobgoblins.

tzz · 2015-04-15T13:36:58Z

ping?

tzz · 2015-04-24T13:24:00Z

ping? This is a pretty safe isolated change IMO.

kacf · 2015-04-28T08:06:52Z

I agree that CSV should not be size limited, so that we are consistent with the others. Other than that this is great, no concerns from me!

What's missing is an acceptance test and some docs.

tzz · 2015-04-30T18:45:55Z

For the acceptance test, I'll wait for #1485 to get merged because it will make the acceptance test much easier.

tzz · 2015-04-30T20:14:15Z

Added examples. Docs in cfengine/documentation#1026

tzz · 2015-05-04T14:04:13Z

The auto-guessing feature didn't feel right so I've removed it. Now we simply try reading as JSON if the mode is auto and the extension is unknown.

(This is different from the inline JSON feature, which does auto-guess because the string is inline. Here we'd have to peek into the file, which is much more complicated and less useful.)
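The auto-mode behavior described above can be sketched in C: pick a parser by file extension, and otherwise try JSON. The recognized extensions (.csv, .yaml, .yml) and the case-insensitive comparison are assumptions. resolve_auto_format() and has_suffix_ci() are hypothetical names.

```c
#include <ctype.h>
#include <string.h>

typedef enum { FORMAT_CSV, FORMAT_YAML, FORMAT_JSON } DataFormat;

/* Case-insensitive check that name ends with suffix (assumption). */
static int has_suffix_ci(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    if (s > n)
    {
        return 0;
    }
    const char *tail = name + (n - s);
    for (size_t i = 0; i < s; i++)
    {
        if (tolower((unsigned char)tail[i]) != tolower((unsigned char)suffix[i]))
        {
            return 0;
        }
    }
    return 1;
}

/* Auto mode: choose a parser from the file extension only. Unknown
 * extensions fall back to JSON instead of peeking into the file. */
static DataFormat resolve_auto_format(const char *filename)
{
    if (has_suffix_ci(filename, ".csv"))
    {
        return FORMAT_CSV;
    }
    if (has_suffix_ci(filename, ".yaml") || has_suffix_ci(filename, ".yml"))
    {
        return FORMAT_YAML;
    }
    return FORMAT_JSON; /* covers ".json" and anything unrecognized */
}
```

Deciding from the name alone keeps this a pure string check. Nothing opens or inspects the file, which is the extra complexity the comment above chose to avoid.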

tzz · 2015-05-04T14:04:46Z

For JSON and YAML I had to set a max read size, and chose 50MB just like CSV. I think that's reasonable but we can raise it.
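Here is a sketch of reading a file with a fixed size cap. It assumes "50MB" means 50 MiB. It also assumes the caller is told about truncation rather than the read failing outright; the PR does not say which behavior CFEngine uses. read_bounded() and READDATA_MAX_BYTES are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumption: "50MB" taken as 50 MiB. */
#define READDATA_MAX_BYTES ((size_t)50 * 1024 * 1024)

/* Reads at most max_bytes from fp into a NUL-terminated heap buffer.
 * Sets *truncated to 1 if unread data remained past the limit.
 * Returns NULL on allocation failure; the caller frees the buffer. */
static char *read_bounded(FILE *fp, size_t max_bytes, size_t *len_out, int *truncated)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
    {
        return NULL;
    }
    *truncated = 0;
    for (;;)
    {
        if (len == max_bytes)
        {
            /* Limit reached: note whether anything is left unread. */
            *truncated = (fgetc(fp) != EOF);
            break;
        }
        if (len + 1 >= cap) /* keep room for the NUL terminator */
        {
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL)
            {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        size_t want = cap - 1 - len;
        if (want > max_bytes - len)
        {
            want = max_bytes - len;
        }
        size_t got = fread(buf + len, 1, want, fp);
        len += got;
        if (got < want)
        {
            break; /* EOF or read error */
        }
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}
```

The buffer grows in chunks, so small files do not allocate the full 50 MB up front.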

tzz · 2015-05-04T14:07:37Z

Acceptance test added. A small fix for bundlestate() in DefaultTemplateData() was needed to remove mangled var refs from the bundle state.

tzz · 2015-05-04T14:08:51Z

The state-based acceptance test functionality is really nice here: the test is almost entirely data-driven.

tzz · 2015-05-04T14:09:04Z

Docs updated as well.

kacf · 2015-05-05T06:41:34Z

libpromises/evalfunction.c

+                if (NULL == strchr(lval_key, '#')) // don't collect mangled refs
+                {
+                    JsonObjectAppendElement(scope_obj, lval_key, RvalToJson(var->rval));
+                }


I didn't quite understand this. Can you explain why this happens?

tzz · 2015-05-05

this#promise_filename shows up in the bundle's variables (it claims its scope is the bundle). I don't know why that happens, since I didn't write the variable-gathering code, but because variable names can't contain #, the fix is good regardless of the underlying issue.

kacf · 2015-05-05

I vaguely recall that there are some exceptions to this rule, and that '#' is used to mark those exceptions, but I don't remember any details (or even whether I remember correctly). I suppose we're OK if the tests pass, and this is limited to the template data anyway.

tzz · 2015-05-05

The # character is used to replace . when variables are "mangled." Run git grep -i mangle in core.git and prepare yourself :)

kacf · 2015-05-05

Haha, you're right, I wish I hadn't. :-)

I suppose this is appropriate then, since the data for the bundle is not supposed to include any variables from outside the bundle. The variable this#promise_filename is an exception merely because it is in fact local to that bundle.
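The mangling convention discussed in this thread can be shown with a small, hypothetical C sketch. mangle_ref() turns a dotted reference into its '#' form. should_collect() repeats the strchr() test from the diff above, and collect_bundle_vars() applies it to a list of keys. Replacing only the first '.' is an assumption, and none of these functions are CFEngine's real code.

```c
#include <string.h>

/* Hypothetical: rewrite a scoped reference such as "this.promise_filename"
 * into its mangled form "this#promise_filename". Assumption: only the
 * first '.' is replaced. */
static void mangle_ref(char *ref)
{
    char *dot = strchr(ref, '.');
    if (dot != NULL)
    {
        *dot = '#';
    }
}

/* Mirrors the check in the diff: a legal variable name never contains '#',
 * so any key that does is a mangled reference and is skipped. */
static int should_collect(const char *lval_key)
{
    return strchr(lval_key, '#') == NULL;
}

/* Copies the collectable keys from keys[] into out[]; returns how many. */
static size_t collect_bundle_vars(const char *const *keys, size_t n, const char **out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
    {
        if (should_collect(keys[i]))
        {
            out[kept++] = keys[i];
        }
    }
    return kept;
}
```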

kacf · 2015-05-05T06:42:31Z

I tried adding some test cases for failed parsing, like here: https://gist.github.com/kacfengine/0e0abe107a8b63061394, but this caused an infinite loop. Would be nice if you included it in your test.

kacf · 2015-05-05T06:43:12Z

And btw you're right, the state based testing is really nice!

tzz · 2015-05-05T13:33:53Z

OK; acceptance test expanded as you suggested. That exposed a pre-existing bug in the YAML parser: on malformed JSON input it didn't break out of its loop on YAML_NO_EVENT as it's supposed to, hence the infinite loop. Fixed; thanks for the suggestion.
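Here is a self-contained C mock of an event-driven parser loop that shows why a missing break on YAML_NO_EVENT produces an infinite loop. The event names only echo libyaml's event types; MockParser, next_event() and count_scalars() are hypothetical. The mock assumes that a parser which has given up on malformed input keeps returning "no event" instead of a stream end.

```c
#include <stddef.h>

/* Hypothetical mock: names echo libyaml's YAML_NO_EVENT, YAML_SCALAR_EVENT
 * and YAML_STREAM_END_EVENT, but nothing here calls libyaml. */
typedef enum
{
    EV_NO_EVENT,
    EV_STREAM_START,
    EV_SCALAR,
    EV_STREAM_END
} EventType;

typedef struct
{
    const EventType *events;
    size_t count;
    size_t pos;
} MockParser;

/* Yields the next scripted event. Once the script runs out (as with a
 * malformed document), it keeps yielding EV_NO_EVENT forever. */
static EventType next_event(MockParser *p)
{
    if (p->pos >= p->count)
    {
        return EV_NO_EVENT;
    }
    return p->events[p->pos++];
}

/* Counts scalar events; returns -1 if events stop before EV_STREAM_END.
 * A loop that only waited for EV_STREAM_END would spin forever here. */
static int count_scalars(MockParser *p)
{
    int scalars = 0;
    for (;;)
    {
        EventType t = next_event(p);
        if (t == EV_STREAM_END)
        {
            return scalars;
        }
        if (t == EV_NO_EVENT)
        {
            return -1; /* bail out instead of looping forever */
        }
        if (t == EV_SCALAR)
        {
            scalars++;
        }
    }
}
```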

tzz · 2015-05-05T13:35:44Z

I think it makes sense for a failed CSV parse to return the empty array [], but for a failed JSON or YAML parse to fail the variable promise.

The reason is that CSV has no structure: anything can be CSV, whereas JSON and YAML have to obey strict rules to be parseable.

If you disagree, I can change that behavior.

tzz · 2015-05-05T13:36:12Z

@nickanderson the data limit is 50MB for all formats now.

kacf · 2015-05-05T13:40:26Z

> I think it makes sense for a failed CSV parse to return the empty array [], but for a failed JSON or YAML parse to fail the variable promise.
>
> The reason is that CSV has no structure: anything can be CSV, whereas JSON and YAML have to obey strict rules to be parseable.
>
> If you disagree, I can change that behavior.

Nope, I agree as well. Empty CSV is still valid CSV, so the only possible error is being unable to open the file.
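The failure policy agreed on above can be written as a small C decision function. The enum and function names are hypothetical. Only the rule comes from the discussion: a failed CSV parse yields [], and a failed JSON or YAML parse fails the variable promise.

```c
#include <stddef.h>

/* Hypothetical names; only the policy itself comes from the discussion. */
typedef enum { FMT_CSV, FMT_JSON, FMT_YAML } DataFormat;

typedef enum
{
    OUTCOME_VALUE,          /* parse succeeded: use the parsed data */
    OUTCOME_EMPTY_ARRAY,    /* CSV fallback: the variable becomes [] */
    OUTCOME_PROMISE_FAILED  /* JSON/YAML: the variable promise fails */
} ReadOutcome;

/* parsed is whatever the format's parser returned; NULL means the parse
 * failed. Checking for NULL instead of dereferencing it is the kind of
 * guard that prevents a segfault on a null parse result. */
static ReadOutcome outcome_for(DataFormat fmt, const void *parsed)
{
    if (parsed != NULL)
    {
        return OUTCOME_VALUE;
    }
    /* CSV has no structure, so an empty result is still valid CSV. */
    return (fmt == FMT_CSV) ? OUTCOME_EMPTY_ARRAY : OUTCOME_PROMISE_FAILED;
}
```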

kacf · 2015-05-05T20:18:20Z

trigger build

tzz · 2015-05-06T09:10:20Z

Thank you!

tzz mentioned this pull request Apr 11, 2015

def.cf: use services/def.json to override some variables tagged "defvar" cfengine/masterfiles#397

Merged

kacf self-assigned this Apr 28, 2015

tzz mentioned this pull request Apr 30, 2015

Add readdata() function docs cfengine/documentation#1026

Merged

tzz added 2 commits May 4, 2015 09:29

Add readdata() function for generic data input

a33bbd7

Add readdata() function example

69de849

tzz force-pushed the feature/readdata branch from 45d1439 to 69de849 May 4, 2015 13:29

Add readdata() function acceptance test, size limit and small bundlestate() fix

d32196f

kacf reviewed May 5, 2015

Expand readdata() acceptance test and fix bug in YAML parser

01c0ac5

kacf added a commit that referenced this pull request May 6, 2015

Merge pull request #2162 from tzz/feature/readdata

8bba516

Add readdata() function for generic data input

kacf merged commit 8bba516 into cfengine:master May 6, 2015
