Added newline escapes (avoid multiline strings), content sorting, simple normalized output #32

Open. Wants to merge 99 commits into base: master

jimklimov

Added support for sorting contents of the JSON markup, for printing out "normalized JSON markup" (the top element only, without the bracketed header), for concatenating input string values with embedded newlines into single-line strings with embedded "\n". Overall with these changes it is possible to compare two JSON documents even if in original form they are sorted differently (but otherwise have equivalent contents).

@jimklimov
Author

PS: The project test suite passes with flying colors ;)
No tests were added for the new features, though... is that required?

@dominictarr
Owner

Sorry, I don't quite understand what you have added. Can you give me an example of how you would use this?

Yes, you should definitely add a test if you don't want your feature to get broken later.

@jimklimov
Author

See the two new commits for tests and small but important fixes to improve portable behavior across different OSes.

As for "why": in our project we produce JSON output from database queries, and as such the results come without guaranteed ordering (no ORDER BY/SORT is enforced, since generic clients don't care and it impacts performance). The problem was with automated testing: JSON lines with the same content in a different order are not equal for simple comparisons or for diff (e.g. when the brief JSON.sh output was tried, the same items had different array sequence numbers in the actual runtime output and in the saved expected output). With the exception of tuples, which are inherently structures as tacitly known to the server and client (and maybe a schema) but which use array syntax as far as simple parsers are concerned, sorting of content does not (or should not) compromise the validity and integrity of the markup.

So, to get reproducible strings for our test suite, whether valid or not, but the same plain text under the same conditions, I added content sorting; I added a way to simply receive the "normalized" view of the JSON markup, independent of variable whitespace (you already had all the logic for this in place); and I added concatenation of input strings that contain line breaks, so that their parts become separated by backslash+n rather than by an end of line. I also made it possible to mix these features, so you can have the normalized single-line huge string of JSON markup either with sorted content or in its original order (as may be needed for tuples, so this is generally more correct to store as a piece of data rather than as a piece for auto-comparison).

Now in our test-suite we can have "expected" markup and "runtime" markup both parsed by normalize+sort and then compared to catch regressions in the server (or peculiarities in cross-platform clients).
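
The comparison workflow described above can be sketched roughly as follows. This is only an illustration, not code from the PR: `json_equiv` and the `NORMALIZER` variable are hypothetical names, and the default `cat` is a stand-in so that the sketch runs without the patched script. With this branch one would presumably set it to something like `./JSON.sh -N=-n`, as used in the examples later in this thread.

```shell
#!/bin/sh
# Assumption: NORMALIZER is a filter command that prints a canonical form
# of a JSON document read from stdin. "cat" is only a stand-in here.
NORMALIZER=${NORMALIZER:-cat}

# Hypothetical helper: returns 0 when both files normalize to the same text,
# 1 when they differ, 2 when the normalizer itself fails.
json_equiv() {
    expected=$($NORMALIZER < "$1") || return 2
    actual=$($NORMALIZER < "$2") || return 2
    [ "$expected" = "$actual" ]
}
```

A test case could then call `json_equiv expected.json runtime.json` and report a regression on a non-zero exit status (both file names are placeholders).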

@jimklimov
Author

Added one more set of tests, for clarity - combination of normalized+sorted content as this is closest to what we will use in our project's test suite.
Thanks for your script, it was quite educational about less frequently used nuances, lots of smart moves here and there ;)

… place) and meaningless whitespace in markup between elements (should be ignored)
@dominictarr
Owner

Unescaped newlines inside strings are not valid JSON, so JSON.sh has no business parsing them. If your tool is outputting JSON with newlines in the keys, it is outputting invalid JSON.

The other thing which is weird here is that you are sorting arrays ([]) but not objects ({}). The ordering of keys in {} objects is not well defined in JavaScript: implementations may happen to be ordered, but it is not a part of the JavaScript spec.

It looks like you have done a pretty thorough job on this pull request, but it seems like these features are tightly coupled to your specific application and not generic, so I have trouble feeling that they should be merged into JSON.sh.

Do you really want to use JSON.sh here? It's probably the slowest JSON parser around, and your tests would probably run much faster if you used a JSON parser written in any other language.

What database are you getting the json from?

@jimklimov
Author

Our application is embedded, so we are limited in available interpreters: most JSON parsers are available in Perl, Python or Java, and none of these fits into our storage sizing ;( So we are limited to shell scripts and compiled binaries... and for testing, shell suffices despite any performance concerns, and is simpler to tweak and maintain ;)
Objects ({}) are sorted too, by key names and, if the same key names appear, by values; the algorithm is essentially the same as for arrays (that is, for the sorting meta-code it is an array of strings that happen to have the "key":"value" format).
I agree embedded unescaped newlines are invalid for JSON... just wanted to make sure they never ever appear because that would be toxic for line-based sorting (that's how explicit 'echo -E' got here, in particular - different OSes have different defaults about this ;) ).
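
The portability concern about `echo` mentioned above can also be sidestepped with `printf`, which never interprets backslashes in `%s` arguments. This is a general shell idiom rather than something this PR does; the helper name `emit` is made up for the illustration.

```shell
# Print one argument verbatim followed by a newline. Unlike echo, whose
# handling of backslash escapes differs between shells and OSes, printf
# leaves the contents of %s arguments untouched.
emit() {
    printf '%s\n' "$1"
}

emit 'string\nwith\nproper\\\nnewlines'
# prints: string\nwith\nproper\\\nnewlines
```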
The database at the moment is MySQL/MariaDB, but in general it would be anything that the project's persistence abstraction layer is coded to support (and that a customer has deployed and is comfortable maintaining). So we make no assumptions besides more or less basic SQL support, which suits our simple storage needs here sufficiently.
As for general applicability... at least, the code is now up on GitHub and linked via fork/PR, so even if you don't find it suitable for your master fork, at least others have an easy way to find it should they need to ;)

  • Still, the normalization mode produces valid JSON identical to that of your code, except that it outputs nothing else and omits the empty-brackets key.
  • The newline-escaping should never fire for proper inputs. Maybe an extra option to check for this mismatch and report an error rather than embedding "\n" strings would be useful to validate input markups.
  • Sorting (of leaf and/or normalized output) is indeed questionable, as depending on particular schema it may be or not be producing an output functionally equivalent to original markup. Its application is comparison of two markups for equivalence (if tuples may be considered arrays) so you can think of it as a form of hashing the source data into some reproducible cipher string to reliably compare, rather than a way to convert from markup conformant to user's unknown schema into some other possibly non-conformant JSON markup, if that description eases your mind ;)
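
The "hashing" analogy in the last point can be made concrete with a small sketch. It does not use JSON.sh at all: `fingerprint` is a hypothetical helper that sorts whatever lines it is given bytewise and reduces them to a POSIX `cksum` value, so two inputs with the same lines in a different order yield the same result.

```shell
# Hypothetical helper: reduce a stream of lines to an order-independent
# checksum by sorting it bytewise before hashing with POSIX cksum.
fingerprint() {
    LC_ALL=C sort | cksum
}

a=$(printf '%s\n' '"x":1' '"y":2' | fingerprint)
b=$(printf '%s\n' '"y":2' '"x":1' | fingerprint)
[ "$a" = "$b" ] && echo "equivalent"
# prints: equivalent
```

As with the sorted output itself, the checksum is only meaningful for comparison; it says nothing about whether reordering is legitimate for a given schema.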

Thanks,
Jim Klimov

@jimklimov
Author

New features:

  • option to "extract" just items whose jpath matches a requested regex;
  • display empty objects "{}" and arrays "[]" as leaf items if that is requested (honors the pruning request). The script still passes all defined self-tests on both Solaris and Linux ;)
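
To illustrate what matching on the jpath means, here is a rough stand-in built from standard tools. It is not the PR's implementation: `extract` and the `JPATH_RE` variable are hypothetical, and the sketch assumes input in the usual JSON.sh line format of a bracketed path, a tab, and the value.

```shell
# Hypothetical stand-in for the extractor: keep lines whose jpath (first
# tab-separated field, without the enclosing brackets) matches an ERE.
# Values in the second field are never looked at.
extract() {
    JPATH_RE="$1" awk -F'\t' '{
        path = $1
        sub(/^\[/, "", path)
        sub(/\]$/, "", path)
        if (path ~ ENVIRON["JPATH_RE"]) print
    }'
}

# Sample lines in "[jpath]<TAB>value" form.
sample() {
    printf '%s\t%s\n' \
        '["var1"]' '"val1"' \
        '["arrOfObjs",0,"var"]' '"val1"' \
        '["arrOfObjs",0]' '{"var":"val1"}'
}

sample | extract '^"var'     # only the ["var1"] line
sample | extract '"var"$'    # only the ["arrOfObjs",0,"var"] line
```

The third sample line is never selected by the second pattern even though its value contains "var", because only the path is tested.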

@jimklimov
Author

Looking at how other PRs have examples, I'll post a few. A picture is worth a thousand words, eh? Here are a few thousand words for you, to give a good display of the new features ;)

So here is a complicated value contrived just to show off. Per the above, it is arguable that raw newlines are invalid... but if, for example, the data is generated by some shell script (a copy of a multiline file or command output, etc.), then my new features in JSON.sh allow turning exactly that into (more) valid markup ;)

:; LINE='{"var1":"val1","split
key":"value","var0":"escaped \" quote","splitValue":"there
  are a newline and three spaces (one after \"there\" and two before \"are\")",
"array":["z","a","b",3,20,0,"","","\""
,"escaping\"several\"\"
quote\"s and
newlines"],"aNumber":1,"var8":"string\nwith\nproper\\\nnewlines",
"var38":"","emptyarr":[],"emptyobj":{},
"arrOfObjs":[{"var":"val1","str":"s"},{"var":"val30","str":"s"},
  {"var":"val2","str":"z"},{"var":"val2","str":"x"},
  {"var":"val1","str":"S"},{"var":"val1","str":"\""},
{"var":"val1","str":5},{"var":"val1","str":"5"}]}'

### Examples related to sorting will also need this to be reproducible:
:; LANG=C; LC_ALL=C; export LANG; export LC_ALL
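
Why the locale matters can be seen with plain `sort`, independent of JSON.sh. In the C locale the comparison is bytewise, so quoted strings (starting with `"`) come before bare numbers, and without `-n` the token 20 sorts before 3. The helper name `sort_tokens` is made up for this sketch.

```shell
# Sort one token per line bytewise, passing any extra options on to sort.
sort_tokens() {
    LC_ALL=C sort "$@"
}

printf '%s\n' '"z"' '"a"' 3 20 0 '""' | sort_tokens
# prints, one per line: ""  "a"  "z"  0  20  3

printf '%s\n' 3 20 0 | sort_tokens -n
# prints, one per line: 0  3  20
```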
  • Note the use of "echo -E" to avoid shell's processing of escaped characters (such as \n in var8):
:; echo -E "$LINE"
{"var1":"val1","split
key":"value","var0":"escaped \" quote","splitValue":"there
  are a newline and three spaces (one after \"there\" and two before \"are\")",
"array":["z","a","b",3,20,0,"","","\""
,"escaping\"several\"\"
quote\"s and
newlines"],"aNumber":1,"var8":"string\nwith\nproper\\\nnewlines",
"var38":"","emptyarr":[],"emptyobj":{},
"arrOfObjs":[{"var":"val1","str":"s"},{"var":"val30","str":"s"},
  {"var":"val2","str":"z"},{"var":"val2","str":"x"},
  {"var":"val1","str":"S"},{"var":"val1","str":"\""},
{"var":"val1","str":5},{"var":"val1","str":"5"}]}
  • There is a mode to detect invalid input due to newlines:
:; echo "$LINE" | ./JSON.sh --no-newline
Invalid JSON markup detected: newline in a string value: at line #1
EXPECTED value GOT EOF
  • Otherwise conversion takes place:
:;  echo -E "$LINE" | ./JSON.sh
["var1"]        "val1"
["split\nkey"]  "value"
["var0"]        "escaped \" quote"
["splitValue"]  "there\n  are a newline and three spaces (one after \"there\" and two before \"are\")"
["array",0]     "z"
["array",1]     "a"
["array",2]     "b"
["array",3]     3
["array",4]     20
["array",5]     0
["array",6]     ""
["array",7]     ""
["array",8]     "\""
["array",9]     "escaping\"several\"\"\nquote\"s and\nnewlines"
["array"]       ["z","a","b",3,20,0,"","","\"","escaping\"several\"\"\nquote\"s and\nnewlines"]
["aNumber"]     1
["var8"]        "string\nwith\nproper\\\nnewlines"
["var38"]       ""
["emptyarr"]    []
["emptyobj"]    {}
["arrOfObjs",0,"var"]   "val1"
["arrOfObjs",0,"str"]   "s"
["arrOfObjs",0] {"var":"val1","str":"s"}
["arrOfObjs",1,"var"]   "val30"
["arrOfObjs",1,"str"]   "s"
["arrOfObjs",1] {"var":"val30","str":"s"}
["arrOfObjs",2,"var"]   "val2"
["arrOfObjs",2,"str"]   "z"
["arrOfObjs",2] {"var":"val2","str":"z"}
["arrOfObjs",3,"var"]   "val2"
["arrOfObjs",3,"str"]   "x"
["arrOfObjs",3] {"var":"val2","str":"x"}
["arrOfObjs",4,"var"]   "val1"
["arrOfObjs",4,"str"]   "S"
["arrOfObjs",4] {"var":"val1","str":"S"}
["arrOfObjs",5,"var"]   "val1"
["arrOfObjs",5,"str"]   "\""
["arrOfObjs",5] {"var":"val1","str":"\""}
["arrOfObjs",6,"var"]   "val1"
["arrOfObjs",6,"str"]   5
["arrOfObjs",6] {"var":"val1","str":5}
["arrOfObjs",7,"var"]   "val1"
["arrOfObjs",7,"str"]   "5"
["arrOfObjs",7] {"var":"val1","str":"5"}
["arrOfObjs"]   [{"var":"val1","str":"s"},{"var":"val30","str":"s"},{"var":"val2","str":"z"},{"var":"val2","str":"x"},{"var":"val1","str":"S"},{"var":"val1","str":"\""},{"var":"val1","str":5},{"var":"val1","str":"5"}]
[]      {"var1":"val1","split\nkey":"value","var0":"escaped \" quote","splitValue":"there\n  are a newline and three spaces (one after \"there\" and two before \"are\")","array":["z","a","b",3,20,0,"","","\"","escaping\"several\"\"\nquote\"s and\nnewlines"],"aNumber":1,"var8":"string\nwith\nproper\\\nnewlines","var38":"","emptyarr":[],"emptyobj":{},"arrOfObjs":[{"var":"val1","str":"s"},{"var":"val30","str":"s"},{"var":"val2","str":"z"},{"var":"val2","str":"x"},{"var":"val1","str":"S"},{"var":"val1","str":"\""},{"var":"val1","str":5},{"var":"val1","str":"5"}]}
  • Returning a valid JSON markup string (i.e. using JSON.sh as a filter to turn scripted output into more valid JSON):
:; echo -E "$LINE" | ./JSON.sh -N
{"var1":"val1","split\nkey":"value","var0":"escaped \" quote","splitValue":"there\n  are a newline and three spaces (one after \"there\" and two before \"are\")","array":["z","a","b",3,20,0,"","","\"","escaping\"several\"\"\nquote\"s and\nnewlines"],"aNumber":1,"var8":"string\nwith\nproper\\\nnewlines","var38":"","emptyarr":[],"emptyobj":{},"arrOfObjs":[{"var":"val1","str":"s"},{"var":"val30","str":"s"},{"var":"val2","str":"z"},{"var":"val2","str":"x"},{"var":"val1","str":"S"},{"var":"val1","str":"\""},{"var":"val1","str":5},{"var":"val1","str":"5"}]}
  • Sorted output with defaults taken by "sort", i.e. alphabetic ("3" is greater than "20"), etc. and according to current locale/collation (influencing order of numbers over punctuation over letters etc.):
:; echo -E "$LINE" | ./JSON.sh -S
["aNumber"]     1
["arrOfObjs",0,"str"]   "5"
["arrOfObjs",0,"var"]   "val1"
["arrOfObjs",0] {"str":"5","var":"val1"}
["arrOfObjs",1,"str"]   "S"
["arrOfObjs",1,"var"]   "val1"
["arrOfObjs",1] {"str":"S","var":"val1"}
["arrOfObjs",2,"str"]   "\""
["arrOfObjs",2,"var"]   "val1"
["arrOfObjs",2] {"str":"\"","var":"val1"}
["arrOfObjs",3,"str"]   "s"
["arrOfObjs",3,"var"]   "val1"
["arrOfObjs",3] {"str":"s","var":"val1"}
["arrOfObjs",4,"str"]   "s"
["arrOfObjs",4,"var"]   "val30"
["arrOfObjs",4] {"str":"s","var":"val30"}
["arrOfObjs",5,"str"]   "x"
["arrOfObjs",5,"var"]   "val2"
["arrOfObjs",5] {"str":"x","var":"val2"}
["arrOfObjs",6,"str"]   "z"
["arrOfObjs",6,"var"]   "val2"
["arrOfObjs",6] {"str":"z","var":"val2"}
["arrOfObjs",7,"str"]   5
["arrOfObjs",7,"var"]   "val1"
["arrOfObjs",7] {"str":5,"var":"val1"}
["arrOfObjs"]   [{"str":"5","var":"val1"},{"str":"S","var":"val1"},{"str":"\"","var":"val1"},{"str":"s","var":"val1"},{"str":"s","var":"val30"},{"str":"x","var":"val2"},{"str":"z","var":"val2"},{"str":5,"var":"val1"}]
["array",0]     ""
["array",1]     ""
["array",2]     "\""
["array",3]     "a"
["array",4]     "b"
["array",5]     "escaping\"several\"\"\nquote\"s and\nnewlines"
["array",6]     "z"
["array",7]     0
["array",8]     20
["array",9]     3
["array"]       ["","","\"","a","b","escaping\"several\"\"\nquote\"s and\nnewlines","z",0,20,3]
["emptyarr"]    []
["emptyobj"]    {}
["splitValue"]  "there\n  are a newline and three spaces (one after \"there\" and two before \"are\")"
["split\nkey"]  "value"
["var0"]        "escaped \" quote"
["var1"]        "val1"
["var38"]       ""
["var8"]        "string\nwith\nproper\\\nnewlines"
[]      {"aNumber":1,"arrOfObjs":[{"str":"5","var":"val1"},{"str":"S","var":"val1"},{"str":"\"","var":"val1"},{"str":"s","var":"val1"},{"str":"s","var":"val30"},{"str":"x","var":"val2"},{"str":"z","var":"val2"},{"str":5,"var":"val1"}],"array":["","","\"","a","b","escaping\"several\"\"\nquote\"s and\nnewlines","z",0,20,3],"emptyarr":[],"emptyobj":{},"splitValue":"there\n  are a newline and three spaces (one after \"there\" and two before \"are\")","split\nkey":"value","var0":"escaped \" quote","var1":"val1","var38":"","var8":"string\nwith\nproper\\\nnewlines"}
  • Sorting with parameters, several can be passed - i.e. numeric (20 is greater than 3 for standalone number tokens) and reversed ("a" is after "z"):
:; echo -E "$LINE" | ./JSON.sh -S='-r -n'
["var8"]        "string\nwith\nproper\\\nnewlines"
["var38"]       ""
["var1"]        "val1"
["var0"]        "escaped \" quote"
["split\nkey"]  "value"
["splitValue"]  "there\n  are a newline and three spaces (one after \"there\" and two before \"are\")"
["emptyobj"]    {}
["emptyarr"]    []
["array",0]     20
["array",1]     3
["array",2]     0
["array",3]     "z"
["array",4]     "escaping\"several\"\"\nquote\"s and\nnewlines"
["array",5]     "b"
["array",6]     "a"
["array",7]     "\""
["array",8]     ""
["array",9]     ""
["array"]       [20,3,0,"z","escaping\"several\"\"\nquote\"s and\nnewlines","b","a","\"","",""]
["arrOfObjs",0,"var"]   "val30"
["arrOfObjs",0,"str"]   "s"
["arrOfObjs",0] {"var":"val30","str":"s"}
["arrOfObjs",1,"var"]   "val2"
["arrOfObjs",1,"str"]   "z"
["arrOfObjs",1] {"var":"val2","str":"z"}
["arrOfObjs",2,"var"]   "val2"
["arrOfObjs",2,"str"]   "x"
["arrOfObjs",2] {"var":"val2","str":"x"}
["arrOfObjs",3,"var"]   "val1"
["arrOfObjs",3,"str"]   5
["arrOfObjs",3] {"var":"val1","str":5}
["arrOfObjs",4,"var"]   "val1"
["arrOfObjs",4,"str"]   "s"
["arrOfObjs",4] {"var":"val1","str":"s"}
["arrOfObjs",5,"var"]   "val1"
["arrOfObjs",5,"str"]   "\""
["arrOfObjs",5] {"var":"val1","str":"\""}
["arrOfObjs",6,"var"]   "val1"
["arrOfObjs",6,"str"]   "S"
["arrOfObjs",6] {"var":"val1","str":"S"}
["arrOfObjs",7,"var"]   "val1"
["arrOfObjs",7,"str"]   "5"
["arrOfObjs",7] {"var":"val1","str":"5"}
["arrOfObjs"]   [{"var":"val30","str":"s"},{"var":"val2","str":"z"},{"var":"val2","str":"x"},{"var":"val1","str":5},{"var":"val1","str":"s"},{"var":"val1","str":"\""},{"var":"val1","str":"S"},{"var":"val1","str":"5"}]
["aNumber"]     1
[]      {"var8":"string\nwith\nproper\\\nnewlines","var38":"","var1":"val1","var0":"escaped \" quote","split\nkey":"value","splitValue":"there\n  are a newline and three spaces (one after \"there\" and two before \"are\")","emptyobj":{},"emptyarr":[],"array":[20,3,0,"z","escaping\"several\"\"\nquote\"s and\nnewlines","b","a","\"","",""],"arrOfObjs":[{"var":"val30","str":"s"},{"var":"val2","str":"z"},{"var":"val2","str":"x"},{"var":"val1","str":5},{"var":"val1","str":"s"},{"var":"val1","str":"\""},{"var":"val1","str":"S"},{"var":"val1","str":"5"}],"aNumber":1}
  • Normalized output can also be sorted, upon request:
:; echo -E "$LINE" | ./JSON.sh -N='-n'
{"aNumber":1,"arrOfObjs":[{"str":"5","var":"val1"},{"str":"S","var":"val1"},{"str":"\"","var":"val1"},{"str":"s","var":"val1"},{"str":"s","var":"val30"},{"str":"x","var":"val2"},{"str":"z","var":"val2"},{"str":5,"var":"val1"}],"array":["","","\"","a","b","escaping\"several\"\"\nquote\"s and\nnewlines","z",0,3,20],"emptyarr":[],"emptyobj":{},"splitValue":"there\n  are a newline and three spaces (one after \"there\" and two before \"are\")","split\nkey":"value","var0":"escaped \" quote","var1":"val1","var38":"","var8":"string\nwith\nproper\\\nnewlines"}

:; echo -E "$LINE" | ./JSON.sh -N=-r
{"var8":"string\nwith\nproper\\\nnewlines","var38":"","var1":"val1","var0":"escaped \" quote","split\nkey":"value","splitValue":"there\n  are a newline and three spaces (one after \"there\" and two before \"are\")","emptyobj":{},"emptyarr":[],"array":[3,20,0,"z","escaping\"several\"\"\nquote\"s and\nnewlines","b","a","\"","",""],"arrOfObjs":[{"var":"val30","str":"s"},{"var":"val2","str":"z"},{"var":"val2","str":"x"},{"var":"val1","str":5},{"var":"val1","str":"s"},{"var":"val1","str":"\""},{"var":"val1","str":"S"},{"var":"val1","str":"5"}],"aNumber":1}

:; echo -E "$LINE" | ./JSON.sh -N="-r -n"
{"var8":"string\nwith\nproper\\\nnewlines","var38":"","var1":"val1","var0":"escaped \" quote","split\nkey":"value","splitValue":"there\n  are a newline and three spaces (one after \"there\" and two before \"are\")","emptyobj":{},"emptyarr":[],"array":[20,3,0,"z","escaping\"several\"\"\nquote\"s and\nnewlines","b","a","\"","",""],"arrOfObjs":[{"var":"val30","str":"s"},{"var":"val2","str":"z"},{"var":"val2","str":"x"},{"var":"val1","str":5},{"var":"val1","str":"s"},{"var":"val1","str":"\""},{"var":"val1","str":"S"},{"var":"val1","str":"5"}],"aNumber":1}
  • Note that the normalized output returns the (maybe sorted) JSON markup of the top-level item without whitespace between syntactic elements, and other JSON.sh modifiers are essentially ignored (the new -x option is detailed below):
:; echo -E "$LINE" | ./JSON.sh -x 'empty' -N
{"var1":"val1","split\nkey":"value","var0":"escaped \" quote","splitValue":"there\n  are a newline and three spaces (one after \"there\" and two before \"are\")","array":["z","a","b",3,20,0,"","","\"","escaping\"several\"\"\nquote\"s and\nnewlines"],"aNumber":1,"var8":"string\nwith\nproper\\\nnewlines","var38":"","emptyarr":[],"emptyobj":{},"arrOfObjs":[{"var":"val1","str":"s"},{"var":"val30","str":"s"},{"var":"val2","str":"z"},{"var":"val2","str":"x"},{"var":"val1","str":"S"},{"var":"val1","str":"\""},{"var":"val1","str":5},{"var":"val1","str":"5"}]}

### Normalization mode can still be used for validation of input markup though:
:; echo -E "$LINE" | ./JSON.sh --no-newline -N
Invalid JSON markup detected: newline in a string value: at line #1
EXPECTED value GOT EOF
  • We can debug why something is or is not printed and thanks to which logical block (interesting excerpts copypasted):
:; echo -E "$LINE" | ./JSON.sh -p -l -d
# Leaf value printed
=== KEY='"var1"' VALUE='"val1"' B='1' isleaf='1'/L='1' isempty='0'/P='1': print='4'
["var1"]        "val1"
# Empty value pruned from line-by-line output (not from JSON markup, not from index numbering):
=== KEY='"array",6' VALUE='""' B='1' isleaf='1'/L='1' isempty='1'/P='1': print='0'
=== KEY='"array",7' VALUE='""' B='1' isleaf='1'/L='1' isempty='1'/P='1': print='0'
=== KEY='"var38"' VALUE='""' B='1' isleaf='1'/L='1' isempty='1'/P='1': print='0'
# Empty arrays and objects are NOW also pruned on request (this is different from brief mode which just does not output objects/arrays at all):
=== KEY='"emptyarr"' VALUE='[]' B='0' isleaf='0'/L='1' isempty='1'/P='1': print='0'
=== KEY='"emptyobj"' VALUE='{}' B='0' isleaf='0'/L='1' isempty='1'/P='1': print='0'
# Non-leaf items skipped from line-by-line printing:
=== KEY='"arrOfObjs",7' VALUE='{"var":"val1","str":"5"}' B='0' isleaf='0'/L='1' isempty='0'/P='1': print='0'
=== KEY='"arrOfObjs"' VALUE='[{"var":"val1","str":"s"},{"var":"val30","str":"s"},{"var":"val2","str":"z"},{"var":"val2","str":"x"},{"var":"val1","str":"S"},{"var":"val1","str":"\""},{"var":"val1","str":5},{"var":"val1","str":"5"}]' B='0' isleaf='0'/L='1' isempty='0'/P='1': print='0'
=== KEY='' VALUE='...' B='0' isleaf='0'/L='1' isempty='0'/P='1': print='0'
  • BTW, pruning of empty arrays/structs (from line-by-line output) is now supported, as it already was for empty strings:
:; ELINE='{"emptyarr":[],"emptyobj":{},"emptystr":""}'

# Here you see them...
:; echo -E "$ELINE" | ./JSON.sh
["emptyarr"]    []
["emptyobj"]    {}
["emptystr"]    ""
[]      {"emptyarr":[],"emptyobj":{},"emptystr":""}

# Here you don't ;)
:; echo -E "$ELINE" | ./JSON.sh -p
[]      {"emptyarr":[],"emptyobj":{},"emptystr":""}
  • Last but not least, we now have an extractor to simplify scripted requests to particular entries by their jpaths, which helps scripted interaction with the markup:
:; echo -E "$LINE" | ./JSON.sh -x 'empty'
["emptyarr"]    []
["emptyobj"]    {}

:;  echo -E "$LINE" | ./JSON.sh -x 'var'
["var1"]        "val1"
["var0"]        "escaped \" quote"
["var8"]        "string\nwith\nproper\\\nnewlines"
["var38"]       ""
["arrOfObjs",0,"var"]   "val1"
["arrOfObjs",1,"var"]   "val30"
["arrOfObjs",2,"var"]   "val2"
["arrOfObjs",3,"var"]   "val2"
["arrOfObjs",4,"var"]   "val1"
["arrOfObjs",5,"var"]   "val1"
["arrOfObjs",6,"var"]   "val1"
["arrOfObjs",7,"var"]   "val1"

# Regex can be used:
:;  echo -E "$LINE" | ./JSON.sh -x '^\"var'
["var1"]        "val1"
["var0"]        "escaped \" quote"
["var8"]        "string\nwith\nproper\\\nnewlines"
["var38"]       ""

:;  echo -E "$LINE" | ./JSON.sh -x 'var\"$'
["arrOfObjs",0,"var"]   "val1"
["arrOfObjs",1,"var"]   "val30"
["arrOfObjs",2,"var"]   "val2"
["arrOfObjs",3,"var"]   "val2"
["arrOfObjs",4,"var"]   "val1"
["arrOfObjs",5,"var"]   "val1"
["arrOfObjs",6,"var"]   "val1"
["arrOfObjs",7,"var"]   "val1"

# You can also pick array elements...
:; echo -E "$LINE" | ./JSON.sh -x 'arrOfObjs\",[0-9]*$'
["arrOfObjs",0] {"var":"val1","str":"s"}
["arrOfObjs",1] {"var":"val30","str":"s"}
["arrOfObjs",2] {"var":"val2","str":"z"}
["arrOfObjs",3] {"var":"val2","str":"x"}
["arrOfObjs",4] {"var":"val1","str":"S"}
["arrOfObjs",5] {"var":"val1","str":"\""}
["arrOfObjs",6] {"var":"val1","str":5}
["arrOfObjs",7] {"var":"val1","str":"5"}

#... unless of course you use leaf-only mode:
:; echo -E "$LINE" | ./JSON.sh -x 'arrOfObjs\",[0-9]*$' -l

#...or you can pick just the contents of the arrays:
:;  echo -E "$LINE" | ./JSON.sh -x 'arrOfObjs\",[0-9]+,.+$'
["arrOfObjs",0,"var"]   "val1"
["arrOfObjs",0,"str"]   "s"
["arrOfObjs",1,"var"]   "val30"
["arrOfObjs",1,"str"]   "s"
["arrOfObjs",2,"var"]   "val2"
["arrOfObjs",2,"str"]   "z"
["arrOfObjs",3,"var"]   "val2"
["arrOfObjs",3,"str"]   "x"
["arrOfObjs",4,"var"]   "val1"
["arrOfObjs",4,"str"]   "S"
["arrOfObjs",5,"var"]   "val1"
["arrOfObjs",5,"str"]   "\""
["arrOfObjs",6,"var"]   "val1"
["arrOfObjs",6,"str"]   5
["arrOfObjs",7,"var"]   "val1"
["arrOfObjs",7,"str"]   "5"

#...perhaps just the entries whose key starts with an "s":
:; echo -E "$LINE" | ./JSON.sh -x 'arrOfObjs\",[0-9]+,\"s.+$'
["arrOfObjs",0,"str"]   "s"
["arrOfObjs",1,"str"]   "s"
["arrOfObjs",2,"str"]   "z"
["arrOfObjs",3,"str"]   "x"
["arrOfObjs",4,"str"]   "S"
["arrOfObjs",5,"str"]   "\""
["arrOfObjs",6,"str"]   5
["arrOfObjs",7,"str"]   "5"

# Only jpaths (not contents) are matched:
:; echo -E "$LINE" | ./JSON.sh -x '\n' -l
["split\nkey"]  "value"
  • And another new feature to help scripting is "cooking" of input strings into escaped JSON that should be valid markup (with no trailing newline as well):
:; RAWLINE='[ This is text
It has
Several "lines"
maybe \n escaped \"
}'

:; ESCAPED="`echo -E "$RAWLINE" | ./JSON.sh -Q`"; echo -E "'$ESCAPED'"
'[ This is text\nIt has\nSeveral \"lines\"\nmaybe \\n escaped \\\"\n}'
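
For readers without this branch at hand, the effect of the "cooking" step can be approximated with standard tools. The sketch below is not the code behind `-Q`; `cook` is a hypothetical function that escapes backslashes and double quotes with `sed` and then joins the lines with a literal `\n` in `awk`, leaving no trailing newline.

```shell
# Hypothetical re-creation of the "cooking" step: escape backslashes first,
# then double quotes, then join all lines with a literal backslash+n.
cook() {
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
        awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

printf '%s\n' 'Several "lines"' 'maybe \n escaped \"' | cook; echo
# prints: Several \"lines\"\nmaybe \\n escaped \\\"
```

Escaping backslashes before quotes matters: doing it the other way round would double the backslashes that were just added in front of the quotes.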

@jimklimov
Author

Another example to show off increased usability in scripts: turn some text-file dumps into markup with escaped newlines:

:; ( echo '['; for F in /etc/motd /etc/release ; do printf '{"filename":"'"$F"'","contents":"%s"},\n' "`cat "$F"`"; done; echo '{}]' ) | ./JSON.sh -N
[{"filename":"/etc/motd","contents":"The Illumos Project        SunOS 5.11      illumos-ad69a33 January 2015"},{"filename":"/etc/release","contents":"             OpenIndiana Development oi_151.1.8 X86 (powered by illumos)\n        Copyright 2011 Oracle and/or its affiliates. All rights reserved.\n                        Use is subject to license terms.\n                           Assembled 19 February 2013"},{}]

Recent commits added tolerance for TAB characters in string contents (there are some in /etc/motd of this example), as well as their escaping during "cooking" of plaintext into single-line strings:

:; cat /etc/motd | ./JSON.sh -Q ; echo ""
The Illumos Project\tSunOS 5.11\tillumos-ad69a33\tJanuary 2015

This reverts commit 2aac0ec.
While "local" is unknown to "ksh", the "typeset" is unknown in "dash/ash"... bummer...
…ng) results so as to not choke on escaped chard we are testing