Add readdata() function for generic data input #2162
Can one of the admins verify this patch?
This is neat, it can make def.json support yaml and I guess csv without any code differences. So maxbytes for JSON and YAML is inf, but CSV is 50MB? Is there a good reason for the difference? If there is no technical reason to limit CSV then I would suggest it operates the same as JSON and YAML, now we are just missing XML ;-0
Also, does this mean we should deprecate readyaml, readjson, readcsv?
The current I can make CSV unlimited too, I just thought 50MB was quite large. I did think about XML! XML is much harder to read into a data container. The mapping is not one-to-one and CFEngine doesn't yet have the tools to extract both content and attributes if I mix them in the data (the I also thought about naming the function
For naming I prefer readdata or read_data. @estenberg opinion on CSV data limit? 50MB csv is big and annoying, but so is knowing that CSV is a special case. So unless there is a technical reason to limit it, I would vote for consistency (yes, there is a hobgoblin in there).
It is only petty consistency that's a hobgoblin. Where consistency is actually useful, as here, there are no hobgoblins.
ping?
ping? This is a pretty safe isolated change IMO.
I agree that CSV should not be size limited, so that we are consistent with the others. Other than that this is great, no concerns from me! What's missing is an acceptance test and some docs.
For the acceptance test, I'll wait for #1485 to get merged because it will make the acceptance test much easier.
Added examples. Docs in cfengine/documentation#1026
The auto-guessing feature didn't feel right so I've removed it. Now we simply try reading as JSON if the mode is (This is different from the inline JSON feature, which does auto-guess because the string is inline. Here we'd have to peek into the file, which is much more complicated and less useful.)
For JSON and YAML I had to set a max read size, and chose 50MB just like CSV. I think that's reasonable but we can raise it.
Acceptance test added. A small fix for
The state-based acceptance test functionality is really nice here: almost entirely data-driven.
Docs updated as well.
if (NULL == strchr(lval_key, '#')) // don't collect mangled refs
{
    JsonObjectAppendElement(scope_obj, lval_key, RvalToJson(var->rval));
}
I didn't quite understand this. Can you explain why this happens?
`this#promise_filename` shows up in the bundle's variables (claims its scope is the bundle). I don't know why it happens since I didn't write the variable-gathering code, but since we can't have variables with `#` in the name, the fix is good regardless of the underlying issue.
I vaguely recall that there are some exceptions to this rule, and '#' is used to mark those exceptions, but I don't remember any details (or even if I remember correctly..). I suppose we're ok if the tests pass, and this is anyway limited to the template data.
`#` is used to replace `.` when variables are "mangled." Do `git grep -i mangle` in core.git and prepare yourself :)
Haha, you're right, I wish I hadn't. :-)
I suppose this is appropriate then, since the data for the bundle is not supposed to include any variables from outside the bundle. The `this#promise_filename` is an exception merely because it is in fact local to that bundle.
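To make the filter discussed in this thread concrete, here is a minimal standalone sketch of the same check. It is not the code from core.git: `is_mangled_ref` and `collect_unmangled` are hypothetical names, and plain string arrays stand in for the real variable table and JSON object. The only thing taken from the diff is the test itself, `strchr(lval_key, '#')`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a key containing '#' is treated as a mangled
 * reference, mirroring the strchr() test in the reviewed diff. */
bool is_mangled_ref(const char *lval_key)
{
    assert(lval_key != NULL);
    return strchr(lval_key, '#') != NULL;
}

/* Hypothetical stand-in for the collection loop: copies every key that
 * is not mangled into out[] and returns how many were kept.
 * out[] must have room for at least n entries. */
size_t collect_unmangled(const char *const *keys, size_t n, const char **out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
    {
        if (!is_mangled_ref(keys[i]))
        {
            out[kept++] = keys[i];
        }
    }
    return kept;
}
```

With the keys `this#promise_filename` and `myvar`, only `myvar` would be collected, which matches the behavior the thread agrees on for the template data.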
I tried adding some test cases for failed parsing, like here: https://gist.github.com/kacfengine/0e0abe107a8b63061394, but this caused an infinite loop. Would be nice if you included it in your test.
And btw you're right, the state based testing is really nice!
OK; acceptance test expanded as you suggested. That exposed a pre-existing bug in the YAML parser: it wouldn't break on
I think it makes sense to have a failed CSV parse return the empty array The reason is that CSV has no structure: anything can be CSV. Whereas JSON and YAML have to obey strict rules to be parseable. If you disagree, I can change that behavior.
@nickanderson the data limit is 50MB for all formats now.
Nope, I agree as well. Empty CSV is still valid CSV, so the only possible error is being unable to open the file.
trigger build
Add readdata() function for generic data input
Thank you!
This PR has three components:
`FnCallReadData`
function `readdata()`, which takes two parameters: the filename and the mode (`auto` or `CSV` or `YAML` or `JSON`). In `auto` mode the file extension is checked. Note that `readdata()` has no file size parameter.
`readdata()` is handy for cases where the policy author wants to handle multiple types of files without special cases, like in cfengine/masterfiles#397 as @nickanderson will confirm.
If this function is acceptable, I will write docs and acceptance tests.
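The mode selection described above can be sketched roughly as follows. This is an illustration only, not the `FnCallReadData` implementation: `DataMode`, `has_suffix`, and `resolve_mode` are hypothetical names, the exact extensions checked (`.csv`, `.yaml`) and the case-sensitive match are assumptions, and falling back to JSON for anything else is inferred from the later comment in this conversation about trying JSON.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for the three parsers mentioned in the PR. */
typedef enum { DATA_MODE_JSON, DATA_MODE_YAML, DATA_MODE_CSV } DataMode;

/* True when s ends with suffix. Case-sensitive, which is an assumption. */
int has_suffix(const char *s, const char *suffix)
{
    size_t len = strlen(s);
    size_t slen = strlen(suffix);
    return len >= slen && strcmp(s + len - slen, suffix) == 0;
}

/* Pick a parser from the requested mode; any mode other than an explicit
 * CSV/YAML/JSON is treated as "auto" here (an assumption), in which case
 * the file extension decides and JSON is the fallback (also an assumption). */
DataMode resolve_mode(const char *filename, const char *requested)
{
    assert(filename != NULL && requested != NULL);

    if (strcmp(requested, "CSV") == 0)  { return DATA_MODE_CSV; }
    if (strcmp(requested, "YAML") == 0) { return DATA_MODE_YAML; }
    if (strcmp(requested, "JSON") == 0) { return DATA_MODE_JSON; }

    if (has_suffix(filename, ".csv"))  { return DATA_MODE_CSV; }
    if (has_suffix(filename, ".yaml")) { return DATA_MODE_YAML; }
    return DATA_MODE_JSON;
}
```

An explicit mode always wins over the extension in this sketch, so `resolve_mode("data.csv", "JSON")` selects the JSON parser.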