@davidrohr any objections to this? |
|
Hm, the current version works in the CI, and also on Anton's Mac which initially showed some issues. |
|
For me, given the following trivial csv: it fails with:
→ GPU/GPUTracking/Definitions/Parameters/csv_to_json.sh foo.csv
+ [[ -z foo.csv ]]
+ LANG=C
+ LC_ALL=C
+ DELIM=$'\377'
+ set -o pipefail
+ sed -E $':loop\n s/^(([^"]*"[^"]*")*[^"]*),/\\1\377/;\n t loop' foo.csv
+ awk $'-F\377' $'BEGIN {\n print "{"\n } {\n if (count == 0) {\n for (i = 1; i <= NF; i++) {\n names[i] = $i\n }\n } else if ($1 == "CORE:" || $1 == "LB:" || $1 == "PAR:") {\n if (paramprinted) print "\\n }"\n else if (lineprinted) print ""\n if (catprinted) print " },"\n lineprinted = 0\n paramprinted = 0\n catprinted = 1\n gsub(/:$/, "", $1)\n print " \\""$1"\\": {";\n } else if ($1 != "") {\n if (lineprinted) print ""\n if (paramprinted) print " },"\n lineprinted = 0\n paramprinted = 1\n print " \\""$1"\\": {";\n lineprinted = 0\n for (i=2; i<=NF; i++) {\n if ($i != "") {\n gsub(/^"/, "", $i)\n gsub(/"$/, "", $i)\n gsub(/""/, "\\"", $i)\n if (lineprinted) print ","\n lineprinted = 1\n printf(" \\"%s\\": %s", names[i], $i)\n }\n }\n }\n count++;\n } END {\n if (paramprinted) print "\\n }"\n if (catprinted) print " }"\n print "}"\n }'
sed: RE error: illegal byte sequence
{
} |
|
Hm, I don't understand. We tried it on Anton's Mac, which had exactly that failure before, and setting LANG and LC_ALL fixed it. |
|
This is the proper fix. There is no guarantee that LANG and LC_ALL are actually exported in the first place, and if they are not, sed misbehaves with the Unicode character. I still hold that using a Unicode character as the delimiter and setting the locale is asking for trouble. |
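To illustrate the export point, here is a minimal sketch. It assumes a POSIX shell, and `DEMO_LOCALE` is a made-up variable name used only so the demo does not touch the real locale settings. A plain assignment stays inside the shell, so child processes such as sed only see it once it is exported.

```shell
# DEMO_LOCALE is a hypothetical variable, used here only for illustration.
unset DEMO_LOCALE

# Plain assignment: a shell variable, not part of the child's environment.
DEMO_LOCALE=C
without_export=$(sh -c 'echo "${DEMO_LOCALE:-unset}"')

# After export, child processes inherit the value.
export DEMO_LOCALE
with_export=$(sh -c 'echo "${DEMO_LOCALE:-unset}"')

echo "without export: $without_export"
echo "with export:    $with_export"
```

If LANG or LC_ALL already happen to be exported by the calling environment, a plain assignment updates the exported value, which would explain why the old code worked on some machines and not on others.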
|
Could you try and check whether \x1C or \x1F works instead of \xFF? That might be safer, since both are in the lower 128 ASCII characters. It works for me on Linux, but I cannot check macOS. |
|
Yes, indeed \x1F works as well. Given it's "Unit Separator" I guess that's what it was intended for... |
|
You still do need the export, though; otherwise other UTF-8 chars in the input might confuse sed. |
|
Yes, sure, so let's make these two changes and hope that fixes it for good. |
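For reference, a rough sketch of what the two changes could look like together. This is not the actual csv_to_json.sh: `split_fields` is a hypothetical helper, and the delimiter is built with `printf` rather than `$'...'` only so the snippet also runs in a plain POSIX shell.

```shell
# Sketch only: split_fields is a hypothetical helper, not the real script.

# Export so that sed and awk really run in the C locale,
# even if the caller never exported LANG/LC_ALL.
export LANG=C
export LC_ALL=C

# ASCII Unit Separator (0x1F) as the internal field delimiter.
DELIM=$(printf '\037')

# Replace every comma outside double quotes with DELIM,
# one comma per loop iteration until none are left.
split_fields() {
  sed -E -e ':loop' \
         -e 's/^(([^"]*"[^"]*")*[^"]*),/\1'"${DELIM}"'/' \
         -e 't loop'
}

# The quoted comma survives, so awk sees three fields.
line='a,"b,c",d'
nfields=$(printf '%s\n' "$line" | split_fields | awk -F"$DELIM" '{ print NF }')
visible=$(printf '%s\n' "$line" | split_fields | tr '\037' '|')
echo "$nfields fields: $visible"
```

Since 0x1F is a single-byte ASCII control character, it is valid in both the C locale and UTF-8, which is presumably why it avoids the "illegal byte sequence" error that 0xFF triggers.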
Old code only works if the variables were already exported.