From 7c17cdb273f3a8e376bdface8e053b02a5f75642 Mon Sep 17 00:00:00 2001 From: Kale Kundert Date: Wed, 7 Oct 2020 16:08:04 -0400 Subject: [PATCH] doc: create "Why NestedText?" section The index page had a lot of good material, but it was getting so long that I worried people wouldn't read it. To help with that, I moved some of the content into a new section devoted to making the case for *NestedText*. This commit also adds a section on TOML to the "Alternatives" page. --- README.rst | 162 +++++-------------------- doc/alternatives.rst | 102 ++++++++++++++++ doc/examples.rst | 1 + doc/index.rst | 9 +- doc/{releases.rst => installation.rst} | 0 doc/philosophy.rst | 25 ++++ 6 files changed, 166 insertions(+), 133 deletions(-) create mode 100644 doc/alternatives.rst rename doc/{releases.rst => installation.rst} (100%) create mode 100644 doc/philosophy.rst diff --git a/README.rst b/README.rst index aae8e89..80ed640 100644 --- a/README.rst +++ b/README.rst @@ -82,141 +82,39 @@ The format holds dictionaries (ordered collections of name/value pairs), lists (ordered collections of values) and strings (text) organized hierarchically to any depth. Indentation is used to indicate the hierarchy of the data, and a simple natural syntax is used to distinguish the types of data in such -a manner that it is not easily confused. Specifically, lines that begin with -a word or words followed by a colon are dictionary items; a dash introduces list -items, and a leading greater-than symbol signifies a line in a multi-line -string. Dictionaries and lists are used for nesting and the leaf values are -always simple text, hence the name, *NestedText*. The top-level must be -a dictionary. +a manner that it is not easily confused. Specifically, lines that begin with a +word (or words) followed by a colon are dictionary items, lines that begin with +a dash are list items, and lines that begin with a greater-than sign are part +of a multi-line string. 
Dictionaries and lists can be nested arbitrarily, and +the leaf values are always text, hence the name *NestedText*. *NestedText* is somewhat unique in that the leaf values are always strings. Of course the values start off as strings in the input file, but alternatives like -JSON or YAML aggressively convert those values into the underlying data types -such as integers, floats, and Booleans. For example, a value like 2.10 would be -converted to a floating point number. But making the decision to do so is based -purely on the form of the value, not the context in which it is found. This can -lead to misinterpretations. For example, assume that this value is the software -version number two point ten. By converting it to a floating point number it -becomes two point one, which is wrong. There are many possible versions of this -basic issue. But there is also the inverse problem; values that should be -converted to particular data types but are not recognized. For example, a value -of $2.00 should be converted to a real number but would be a string instead. -There are simply too many values types for a general purpose solution that is -only looking at the values themselves to be able to interpret all of them. For -example, 12/10/09 is likely a date, but is it in MM/DD/YY, YY/MM/DD or DD/MM/YY -form? The fact is, the value alone is often insufficient to reliably determine -how to convert values into internal data types. *NestedText* avoids these -problems by leaving the values in their original form and allowing the decision -to be made by the end application where more context is available to help guide -the conversions. If a price is expected for a value, then $2.00 would be -checked and converted accordingly. Similarly, local conventions along with the -fact that a date is expected for a particular value allows 12/10/09 to be -correctly validated and converted. This process of validation and conversion is -referred to as applying a schema to the data. 
There are packages such as -`Voluptuous `_ and `Pydantic -`_ available that make this process easy -and reliable. - - -The Zen of *NestedText* ------------------------ - -*NestedText* aspires to be a simple dumb vessel that holds peoples' structured -data, and to do so in a way that allows people to easily interact with that -data. - -The desire to be simple is an attempt to minimize the effort required to learn -and use the language. Ideally people can understand it by looking at one or two -examples and they can use it without without needing to remember any arcane -rules and without relying on any of the knowledge that programmers accumulate -through years of experience. One source of simplicity is consistency. As such, -*NestedText* uses a small amount of rules that it applies with few exceptions. - -The desire to be dumb means that it tries not to transform the data in any -meaningful way. It allows you to recover the structure in your data without -doing anything that might change the interpretation of the data. Rather, it -tries to make it easy for you to interpret the data by managing the structure, -which allows you to analyze it in small easy to interpret pieces without making -any changes that would get in your way. - - -Alternatives ------------- - -There are no shortage of well established alternatives to *NestedText* for -storing data in a human-readable text file. Probably the most obvious are `json -`_ and `YAML -`_. Both are primarily intended to -be used as serialization languages. *NestedText* is not intended to be used as -a serialization language, rather it is more suitable for configuration and hand -generated and edited data files. In these applications, both *JSON* and *YAML* -have significant short comings. - - -JSON -"""" - -*JSON* is a subset of JavaScript suitable for holding data. Like *NestedText*, -it consists of a hierarchical collection of dictionaries, lists, and strings, -but also allows integers, floats, Booleans and nulls. 
The problem with *JSON* -for this application is that it is awkward. With all those data types it must -syntactically distinguish between them. For example, in *JSON* 32 is an -integer, 32.0 is the real version of 32, and "32" is the string version. These -distinctions are not meaningful and can be confusing to non-programmers. In -addition, in most datasets a majority of leaf values are strings and the -required quotes adds substantial visual clutter. *NestedText* avoids these -issues by keeping all leaf values as unmodified strings; no need for quoting or -escaping. It is up to the application that employs *NestedText* as an input -format to use context to check these strings and convert them to the right -datatype. - -*JSON* does not provide for multi-line strings and any special characters like -newlines are encoded with escape codes, which can make strings long and -difficult to interpret. Also, dictionary and list items must be separated with -commas, but a comma must not follow last item. All of this results in *JSON* -being a frustrating format for humans to read, enter or edit. - -*NestedText* has the following clear advantages over *JSON* as human readable -and writable data file format: - -- text does not require quotes -- data is left in its original form -- comments -- multiline strings -- special characters without escaping them -- commas are not used to separate dictionary and list items - - -YAML -"""" - -*YAML* is considered by many to be a human friendly alternative to *JSON*, but -over time it has accumulated too many data types and too many formats. To -distinguish between all the various types and formats, a complicated and -non-intuitive set of rules developed. *YAML* at first appears very appealing -when used with simple examples, but things can quickly become complicated or -provide unexpected results. A reaction to this is the use of *YAML* subsets, -such as `StrictYAML `_. 
However, the subsets -still try to maintain compatibility with *YAML* and so inherit much of its -complexity. For example, both *YAML* and *StrictYAML* support `nine different -ways of writing multi-line strings -`_. - -*YAML* avoids excessive quoting and supports comments and multiline strings, but -like *JSON* it converts data to the underlying data types as appropriate, but -unlike with *JSON*, the lack of quoting makes the format ambiguous, which means -it has to guess at times, and small seemingly insignificant details can affect -the result. - -*NestedText* was inspired by *YAML*, but eschews its complexity. It has the -following clear advantages over *YAML* as human readable and writable data file -format: - -- simple -- unambiguous (no implicit typing) -- data is left in its original form -- syntax is insensitive to special characters within text -- safe, no risk of malicious code execution +*YAML* or *TOML* aggressively convert those values into the underlying data +types such as integers, floats, and Booleans. For example, a value like 2.10 +would be converted to a floating point number. But making the decision to do so +is based purely on the form of the value, not the context in which it is found. +This can lead to misinterpretations. For example, assume that this value is +the software version number two point ten. By converting it to a floating point +number it becomes two point one, which is wrong. There are many possible +versions of this basic issue. But there is also the inverse problem: values +that should be converted to particular data types but are not recognized. For +example, a value of $2.00 should be converted to a real number but would be a +string instead. There are simply too many value types for a general purpose +solution that is only looking at the values themselves to be able to interpret +all of them. For example, 12/10/09 is likely a date, but is it in MM/DD/YY, +YY/MM/DD or DD/MM/YY form? 
The fact is, the value alone is often insufficient +to reliably determine how to convert values into internal data types. +*NestedText* avoids these problems by leaving the values in their original form +and allowing the decision to be made by the end application where more context +is available to help guide the conversions. If a price is expected for a +value, then $2.00 would be checked and converted accordingly. Similarly, local +conventions along with the fact that a date is expected for a particular value +allow 12/10/09 to be correctly validated and converted. This process of +validation and conversion is referred to as applying a schema to the data. +There are packages such as `Pydantic `_ +and `Voluptuous `_ available that +make this process easy and reliable. Issues diff --git a/doc/alternatives.rst b/doc/alternatives.rst new file mode 100644 index 0000000..69b3403 --- /dev/null +++ b/doc/alternatives.rst @@ -0,0 +1,102 @@ +************ +Alternatives +************ + +There is no shortage of well-established alternatives to *NestedText* for +storing configuration data in a human-readable text file. The features and +shortcomings of some of these alternatives are discussed below: + +JSON +==== + +JSON_ is a subset of JavaScript suitable for holding data. Like *NestedText*, +it consists of a hierarchical collection of dictionaries, lists, and strings, +but also allows integers, floats, Booleans and nulls. The fundamental problem +with *JSON* in this context is that it's meant for serializing and exchanging +data between programs; it's not meant for configuration files. Of course, it's +used for this purpose anyway, where it has a number of glaring shortcomings: + +To begin, it has an excessive amount of syntactic clutter. Dictionary keys and +strings both have to be quoted, commas are required between dictionary and list +items (but forbidden after the last item), braces are required around +dictionaries, etc. 
Features that would improve clarity are also lacking. +Comments are not allowed, multiline strings are not supported, and whitespace +is insignificant (leading to the possibility that the appearance of the data +may not match its true structure). More conceptually, it is the responsibility +of the user to provide data of the correct type (e.g. ``32`` vs. ``32.0`` vs. +``"32"``), even though the application already knows what type it expects. All +of this results in *JSON* being a frustrating format for humans to read, enter +or edit. + +*NestedText* has the following clear advantages over *JSON* as a human-readable +and writable data file format: + +- text does not require quotes +- data is left in its original form +- comments +- multiline strings +- special characters without escaping them +- commas are not used to separate dictionary and list items + +YAML +==== + +YAML_ is considered by many to be a human friendly alternative to *JSON*, but +over time it has accumulated too many data types and too many formats. To +distinguish between all the various types and formats, a complicated and +non-intuitive set of rules developed. *YAML* at first appears very appealing +when used with simple examples, but things can quickly become complicated or +provide unexpected results. A reaction to this is the use of *YAML* subsets, +such as StrictYAML_. However, the subsets still try to maintain compatibility +with *YAML* and so inherit much of its complexity. For example, both *YAML* and +*StrictYAML* support `nine different ways of writing multi-line strings +`_. + +*YAML* avoids excessive quoting and supports comments and multiline strings. +Like *JSON*, it converts data to the underlying data types as appropriate, but +unlike *JSON*, its lack of quoting makes the format ambiguous, which means +it has to guess at times, and small, seemingly insignificant details can affect +the result. + +*NestedText* was inspired by *YAML*, but eschews its complexity. 
It has the +following clear advantages over *YAML* as a human-readable and writable data file +format: + +- simple +- unambiguous (no implicit typing) +- data is left in its original form +- syntax is insensitive to special characters within text +- safe, no risk of malicious code execution + +TOML +==== + +TOML_ is a configuration file format inspired by the well-known *INI* syntax. +It supports a number of basic data types (notably including dates and times) +using syntax that is more similar to *JSON* (explicit but verbose) than to +*YAML* (succinct but confusing). As discussed previously, though, this makes +it the responsibility of the user to specify the correct type for each field, +when it should be the responsibility of the application to convert each field +to the correct type. + +Another flaw in TOML is that it is difficult to specify deeply nested +structures. The only way to specify a nested dictionary is to give the full +key to that dictionary, relative to the root of the entire hierarchy. This is +not much of a problem if the hierarchy only has 1-2 levels, but any more than that +and you find yourself typing the same long keys over and over. A corollary to +this is that TOML-based configurations do not scale well: increases in +complexity are often accompanied by disproportionate decreases in readability +and writability. + +*NestedText* has the following clear advantages over *TOML* as a human-readable +and writable data file format: + +- text does not require quotes +- data is left in its original form +- indentation used to succinctly represent nested data +- the structure of the file matches the structure of the data + +.. _json: https://www.json.org/json-en.html +.. _yaml: https://yaml.org/ +.. _strictyaml: + alternatives + .. 
toctree:: :caption: Getting started :maxdepth: 1 - releases + installation basic_syntax basic_use schemas diff --git a/doc/releases.rst b/doc/installation.rst similarity index 100% rename from doc/releases.rst rename to doc/installation.rst diff --git a/doc/philosophy.rst b/doc/philosophy.rst new file mode 100644 index 0000000..e02528c --- /dev/null +++ b/doc/philosophy.rst @@ -0,0 +1,25 @@ +*********************** +The Zen of *NestedText* +*********************** + +*NestedText* aspires to be a simple dumb vessel that holds people's structured +data, and does so in a way that allows people to easily interact with that +data. + +The desire to be simple is an attempt to minimize the effort required to learn +and use the language. Ideally people can understand it by looking at one or two +examples and they can use it without needing to remember any arcane +rules and without relying on any of the knowledge that programmers accumulate +through years of experience. One source of simplicity is consistency. As such, +*NestedText* uses a small number of rules that it applies with few exceptions. + +The desire to be dumb means that *NestedText* tries not to transform the data +in any meaningful way. It parses the structure of the data without doing +anything that might change how the data is interpreted. Instead, it aims to +make it easy for you to interpret the data yourself. After all, you understand +what the data is supposed to mean, so you are in the best position to interpret +it. There are also many powerful tools available to help with :doc:`this exact +task `. + + + 