v.db.select: Handle all formats through option, add CSV #1121

wenzeslaus · 2020-11-25T03:38:22Z

Instead of separate flags for JSON and vertical output format, use format option
so -j is replaced by format=json, etc. The formats are no longer spread among
the other flags and a written command is more readble. It is also more explicit
since the default format is now represented in the interface and vertical output
clearly stands out as yet another alternative.

The default format is named plain, alternative names are be text, ascii and default.

The use of an option does not change anything major in the code in terms of output
except for use of ints/booleans instead of directly accessing the flags' answers.
The parameter handling changed significantly. The standard exclude function
cannot be used and custom checks and messages are needed.

The significant clean-up related changes to existing formats are:

The JSON output of NULL values is fixed, writing JSON literal null.
JSON outputs are now always Object (dictionary) so theoretically they can be combined or enhanced in the future.
Different default separator for different formats and outputs.
Separator for -e can be customized.
All required JSON escapes are escaped.
Tab is escaped when escaping is enabled (so you could write TSV with escapes using plain).
JSON escapes except for double quotes are applied when escaping is enabled for other formats.
Vertical output does not add a newline when newline is vertical separator (no more two newlines).

This adds another format, CSV, as format=csv, i.e., without need to add another
flag (e.g., -c and -v were already taken). Most of the code was already there
as basic structure was the original plain default output and quoting was already
done for the JSON output. The original format was not a proper CSV because it
lacked handling quoting of fields which for common CSV parsers takes care of new line
characters and separators in values/cells. It was not completely safe even with escaping
because it did does not escape to the separator (to be delimeter separated values format
with escapes).

Now in CSV, all non-numeric columns are quoted (with two quotes representing a quote in the text).
The quotes used are double quotes, the separator is derived by the horizontal/field
separator option. By default is is a comma (a different separator is set for each format by default).
CSV output is available even for the -r flag (minimal region/bbox).

wenzeslaus · 2020-11-25T04:00:29Z

@landam Can you please look at this? v.db.select seems like a great example of a reduction of multiple flags into a single option.

Anybody, let me know what do you think. The idea is that v.db.select supports 3 formats (4 with newly added real CSV), but instead of having a flag for each format, this PR uses only one option/parameter which can have different values making the selection more clear (you can choose only one and you see that they are all in one group). This issue is generally important because there is a lot of cases of multiple related and exclusive flags and many of those are different output formats like in this case.

Before:

  -v   Vertical output (instead of horizontal)
  -j   JSON output
  -c   CSV output [newly added]

After:

              format   Output format
                       options: plain,csv,json,vertical
                       default: plain

There are some additional issues in regard to how different formats interact with the other options and flags. For example, JSON ignores most of the other format-related options while CSV supports different separators, but ideally would have comma as default rather than pipe which is the current default. Another example is -r flag which gives only the bounding box (not the attributes) and always ignores the separator in the current code.

This PR is now in draft mode, but I want to get some early feedback. If you are viewing the diff right now, you may want to check Hide whitespace changes in settings. I will eventually put the code reformatting into a separate PR.

wenzeslaus · 2020-12-02T02:28:23Z

This is ready for general comments, but I'm especially interested in feedback regarding the replacement of -v -j (-c) by format=plain|csv|json|vertical.

(Reformatting done in #1138. This now contains only the actual changes.)

lucadelu · 2020-12-02T07:42:40Z

This is ready for general comments, but I'm especially interested in feedback regarding the replacement of -v -j (-c) by format=plain|csv|json|vertical.

(Reformatting done in #1138. This now contains only the actual changes.)

I support this change, I think is better an option than several flag

mlennert · 2020-12-02T11:13:46Z

I also support this change. Just out of curiosity: what is the difference between plain with sep=comma and csv ?

wenzeslaus · 2020-12-04T03:24:54Z

what is the difference between plain with sep=comma and csv ?

That's a good question. The current and future default output allows escaping of certain characters and does not quote any fields. The escaping there is not really the common CSV escaping while other things namely separator inside fields is not handled in any way.

The new CSV output quotes all text fields/cells allowing the separator to be in there while, similarly to JSON, doing its own escaping, namely doubling the quotation mark which is a common CSV practice. Now, in combination with a proper CSV parser, you can have separator character, quotation character, and newline in cells and it will work. Any separator is allowed and default is now unfortunately pipe, not comma, so that's something I would like to change, but I don't know how to express that in the interface since we have default separator pipe everywhere and it still makes sense for the default/plain output (I was thinking separator=default: use format's default separator). (One limitation still there is that if you use separator which can occur in the numeric field such as period, the resulting CSV will be invalid. Seem like an unlikely problem though.)

The default/plain output makes sense even in the light of the CSV if we want a human readable output for the command line. Quotes would make too much clutter, and you can make sense of some tricks such as separator or newline in the cell. Pipe makes sense as it makes more readable than a comma. Here comes a question whether we should document format=plain (or whatever is the name) as something stable and thus okay to parse (current state before JSON and CSV), or as something which is really just meant for human eyes and may change without warning (more like r.univar or r.info default outputs).

Instead of separate flags for JSON and vertical output format, use format option so -j is replaced by format=json, etc. The formats are no longer spread among the other flags and a written command is more readble. It is also more explicit since the default format is now represented in the interface and vertical output clearly stands out as yet another alternative. The default format is named plain, alternative names are be text, ascii and default. The user of an option does not change anything major in the code in terms of output except for use of ints/booleans instead of directly accessing the flags' answers. The parameter handling changed significatly. The standard exclude function cannot be used and custom checks and messages are needed. This adds another format, CSV, as format=csv, i.e., without need to add another flag (e.g., -c and -v were already taken). Most of the code was already there as basic structure was the original plain default output and quoting was already done for the JSON output. The original format was not a proper CSV because it lacked handling quoting of fields which for common CSV parsers takes care of new line characters and separators in values/cells. All non-numeric columns are quoted (with two quotes representing a quote in the text). The quotes used are double quotes, the separator is drived by the horizontal/field separator option which is unfortunate because it is not a comma by default. CSV output is available even for the -r flag (minimal region/bbox). The JSON output of NULL values is fixed, writing JSON literal null.

…llenging characters

* Full JSON escape. * Escape also tabs for plain, vertical, and CSV if escaping is requested. * Allow escaping for CSV.

…y (allows for additions to the file)

* Enable all escapes for plain text/CSV too except for quote for simplicity.

wenzeslaus · 2021-05-19T04:05:32Z

The four old and new formats are described in the documentation.

PR description updated to reflect the new state.

wenzeslaus · 2021-05-24T21:32:19Z

To make the review easier, this has tests which shows that the different forms work even with special characters and non-default separators in case of CSV.

vector/v.db.select/main.c

HuidaeCho · 2021-05-26T22:29:54Z

vector/v.db.select/main.c

+        fatal_error_option_value_excludes_option(options.format, options.vsep,
+                                                 _("Only vertical output can use vertical separator"));
+    }
+


if (csv) fatal_error_option_value_excludes_option(options.format, options.fsep, _("Separator is based on the format"));

CSV (comma-separated values) is only comma-separated. Right now, fsep is configurable with comma as default for CSV, but can we call it CSV when a non-comma separator is used?

Although comma is often expected for CSV, CSV is also used to refer to broader format group with same or roughly same syntax, but different separator as demonstrated by the csv package in Python. In some natural languages, comma is used as a decimal point which leads to semicolon being the most common separator (as comma may lead to confusion even when period is used as decimal point). This is visible even in some GRASS helper code. So, yes, we can call it CSV even if delimiter/separator is customizable.

Practically, allowing different separators gives users the choice to produce some flavors of TSV or DSV although the plain format may be more suitable for that.

vector/v.db.select/main.c

…s, fix usage of binary versus logical or

vector/v.db.select/main.c

wenzeslaus · 2021-06-11T02:29:48Z

This is ready to be merged, but now you can test it at https://mybinder.org/v2/gh/wenzeslaus/grass/v_db_select_formats_and_csv after opening New > Terminal and run:

grass data/grassdata/nc_basic_spm_grass7/PERMANENT --exec v.db.select points_of_interest format=json

wenzeslaus · 2021-06-19T04:23:18Z

@HuidaeCho unless you have further comment I will merge. The Windows tests fails due to failed g.download.location download and a parallel temporal test fails with unrelated race condition.

HuidaeCho · 2021-06-19T13:30:53Z

@wenzeslaus Everything looks good!

wenzeslaus · 2021-06-20T01:19:31Z

Thanks @HuidaeCho. Happy parsing everyone!

) Instead of separate flags for JSON and vertical output format, use format option, so -j is replaced by format=json, etc. The formats are no longer spread among the other flags and a written command is more readble. It is also more explicit since the default format is now represented in the interface and vertical output clearly stands out as yet another alternative. The default format is named plain, alternative names considered and rejected were text, ascii and default. It is meant to be read by humans, but now it can be also newly safely parsed when set as TSV, however, this feature is questionable and the idea is that plain is not used for parsing. The use of an option does not change anything major in the code in terms of output except for use of an enum instead of directly accessing the flags' answers. The parameter handling changed significantly. The standard exclude function cannot be used and custom checks and messages are needed to disallow combinations such as JSON with custom null value. The significant clean-up related changes to existing formats are: * The JSON output of NULL values is fixed, writing JSON literal null. * JSON outputs are now always Object (dictionary) so theoretically they can be combined or enhanced in the future. * Different default separator for different formats and outputs (pipe for plain, comma for CSV). * Separator for -e can now be customized. * All required JSON escapes are escaped. * Tab is escaped when escaping is enabled (so you could theoretically write TSV with escapes using plain). * For all formats, JSON-style escapes except for double quotes are applied when escaping is enabled. * Vertical output does not add a newline when newline is vertical separator (no more two newlines). This adds another format, CSV, as format=csv, i.e., without need to add another flag (e.g., -c and -v were already taken). Most of the code was already there as basic structure was the original plain default output and quoting was already done for the JSON output. The original format was not a proper CSV because it lacked handling quoting of fields which for common CSV parsers takes care of new line characters and separators in values/cells. It was not completely safe even with escaping because it did does not escape to the separator (to be delimiter separated values format with escapes). Delimiter for CSV can be customized, but the quoting cannot. Now in CSV, all non-numeric columns are quoted (with two quotes representing a quote in the text). The quotes used are double quotes, the separator is derived by the horizontal/field separator option. By default is is a comma (a different separator is set for each format by default). CSV output is available even for the -r flag (minimal region/bbox).

This was referenced Nov 25, 2020

wxGUI/dbmgr: Show proper error when table parsing failed #1112

Merged

[Feat] Use different output formats if needed, but do not misuse stderr #1129

Open

wenzeslaus force-pushed the v_db_select_formats_and_csv branch from db98de7 to c73047a Compare December 2, 2020 02:17

wenzeslaus marked this pull request as ready for review December 2, 2020 02:18

wenzeslaus mentioned this pull request Jan 8, 2021

r.geomorphon: implement a new "one-off" mode #1157

Merged

wenzeslaus added 3 commits May 14, 2021 14:51

Fix mistakes in JSON and CSV syntax

f6732b8

Add test with standard Python CSV and JSON loading which includes cha…

5d578cc

…llenging characters

wenzeslaus force-pushed the v_db_select_formats_and_csv branch from c73047a to 5d578cc Compare May 15, 2021 03:45

wenzeslaus added 8 commits May 18, 2021 14:19

Use different separators as defaults for different formats

52694aa

* Full JSON escape. * Escape also tabs for plain, vertical, and CSV if escaping is requested. * Allow escaping for CSV.

Extend the basic test, doc the basic JSON test

e7e094e

Nest the results to dictionary item in JSON for forwards compatibilit…

6db5897

…y (allows for additions to the file)

If vsep is newline, don't add it

4048898

Allow escape for CSV as an addition to double double quote.

09f8e0f

* Enable all escapes for plain text/CSV too except for quote for simplicity.

Add doc (desc and Python example)

04148f4

Update authors and year

9128caf

Document and Pylint clean the new test file

de3829c

wenzeslaus requested a review from HuidaeCho May 19, 2021 04:00

wenzeslaus added bug Something isn't working enhancement New feature or request vector Related to vector data processing labels May 19, 2021

HuidaeCho requested changes May 26, 2021

View reviewed changes

Replace multiple format vars by one enum, unify some of the condition…

d64c9df

…s, fix usage of binary versus logical or

Do not allow null value replacement to be used with JSON

334dacd

wenzeslaus requested a review from HuidaeCho June 6, 2021 20:13

nilason reviewed Jun 6, 2021

View reviewed changes

vector/v.db.select/main.c Outdated Show resolved Hide resolved

wenzeslaus added 4 commits June 6, 2021 21:03

Use stdbool bool instead of int/TRUE/FALSE

aec6e01

Use bool for all boolean vars

65ebe22

Merge master branch to get bool to git.h

ba12ce8

bool is now guaranteed by gis.h, so save one line

3fc02cf

HuidaeCho approved these changes Jun 19, 2021

View reviewed changes

wenzeslaus merged commit dfc9b05 into OSGeo:master Jun 20, 2021

wenzeslaus deleted the v_db_select_formats_and_csv branch June 20, 2021 01:15

neteler added this to the 8.0.0 milestone Dec 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v.db.select: Handle all formats through option, add CSV #1121

v.db.select: Handle all formats through option, add CSV #1121

wenzeslaus commented Nov 25, 2020 •

edited

Loading

wenzeslaus commented Nov 25, 2020 •

edited

Loading

wenzeslaus commented Dec 2, 2020

lucadelu commented Dec 2, 2020

mlennert commented Dec 2, 2020

wenzeslaus commented Dec 4, 2020

wenzeslaus commented May 19, 2021

wenzeslaus commented May 24, 2021

HuidaeCho May 26, 2021 •

edited

Loading

wenzeslaus Jun 6, 2021

wenzeslaus commented Jun 11, 2021

wenzeslaus commented Jun 19, 2021 •

edited

Loading

HuidaeCho commented Jun 19, 2021

wenzeslaus commented Jun 20, 2021

v.db.select: Handle all formats through option, add CSV #1121

v.db.select: Handle all formats through option, add CSV #1121

Conversation

wenzeslaus commented Nov 25, 2020 • edited Loading

wenzeslaus commented Nov 25, 2020 • edited Loading

wenzeslaus commented Dec 2, 2020

lucadelu commented Dec 2, 2020

mlennert commented Dec 2, 2020

wenzeslaus commented Dec 4, 2020

wenzeslaus commented May 19, 2021

wenzeslaus commented May 24, 2021

HuidaeCho May 26, 2021 • edited Loading

Choose a reason for hiding this comment

wenzeslaus Jun 6, 2021

Choose a reason for hiding this comment

wenzeslaus commented Jun 11, 2021

wenzeslaus commented Jun 19, 2021 • edited Loading

HuidaeCho commented Jun 19, 2021

wenzeslaus commented Jun 20, 2021

wenzeslaus commented Nov 25, 2020 •

edited

Loading

wenzeslaus commented Nov 25, 2020 •

edited

Loading

HuidaeCho May 26, 2021 •

edited

Loading

wenzeslaus commented Jun 19, 2021 •

edited

Loading