doc: create "Why NestedText?" section

The index page had a lot of good material, but it was getting so long that I worried people wouldn't read it. To help with that, I moved some of the content into a new section devoted to making the case for *NestedText*. This commit also adds a section on TOML to the "Alternatives" page.
KenKundert · Oct 7, 2020 · 7c17cdb
1 parent ab737a0
commit 7c17cdb
Showing 6 changed files with 166 additions and 133 deletions.
diff --git a/README.rst b/README.rst
@@ -82,141 +82,39 @@ The format holds dictionaries (ordered collections of name/value pairs), lists
 (ordered collections of values) and strings (text) organized hierarchically to 
 any depth.  Indentation is used to indicate the hierarchy of the data, and 
 a simple natural syntax is used to distinguish the types of data in such 
-a manner that it is not easily confused.  Specifically, lines that begin with 
-a word or words followed by a colon are dictionary items; a dash introduces list 
-items, and a leading greater-than symbol signifies a line in a multi-line 
-string.  Dictionaries and lists are used for nesting and the leaf values are 
-always simple text, hence the name, *NestedText*.  The top-level must be 
-a dictionary.
+a manner that it is not easily confused.  Specifically, lines that begin with a 
+word (or words) followed by a colon are dictionary items, lines that begin with 
+a dash are list items, and lines that begin with a greater-than sign are part 
+of a multi-line string.  Dictionaries and lists can be nested arbitrarily, and 
+the leaf values are always text, hence the name *NestedText*.
 
 *NestedText* is somewhat unique in that the leaf values are always strings. Of 
 course the values start off as strings in the input file, but alternatives like 
-JSON or YAML aggressively convert those values into the underlying data types 
-such as integers, floats, and Booleans.  For example, a value like 2.10 would be 
-converted to a floating point number. But making the decision to do so is based 
-purely on the form of the value, not the context in which it is found.  This can 
-lead to misinterpretations.  For example, assume that this value is the software 
-version number two point ten. By converting it to a floating point number it 
-becomes two point one, which is wrong. There are many possible versions of this 
-basic issue. But there is also the inverse problem; values that should be 
-converted to particular data types but are not recognized. For example, a value 
-of $2.00 should be converted to a real number but would be a string instead.
-There are simply too many values types for a general purpose solution that is 
-only looking at the values themselves to be able to interpret all of them.  For 
-example, 12/10/09 is likely a date, but is it in MM/DD/YY, YY/MM/DD or DD/MM/YY 
-form?  The fact is, the value alone is often insufficient to reliably determine 
-how to convert values into internal data types.  *NestedText* avoids these 
-problems by leaving the values in their original form and allowing the decision 
-to be made by the end application where more context is available to help guide 
-the conversions.  If a price is expected for a value, then $2.00 would be 
-checked and converted accordingly. Similarly, local conventions along with the 
-fact that a date is expected for a particular value allows 12/10/09 to be 
-correctly validated and converted.  This process of validation and conversion is 
-referred to as applying a schema to the data. There are packages such as 
-`Voluptuous <https://github.com/alecthomas/voluptuous>`_ and `Pydantic 
-<https://pydantic-docs.helpmanual.io>`_ available that make this process easy 
-and reliable.
-
-
-The Zen of *NestedText*
------------------------
-
-*NestedText* aspires to be a simple dumb vessel that holds peoples' structured 
-data, and to do so in a way that allows people to easily interact with that 
-data.
-
-The desire to be simple is an attempt to minimize the effort required to learn 
-and use the language. Ideally people can understand it by looking at one or two 
-examples and they can use it without without needing to remember any arcane 
-rules and without relying on any of the knowledge that programmers accumulate 
-through years of experience.  One source of simplicity is consistency.  As such, 
-*NestedText* uses a small amount of rules that it applies with few exceptions.
-
-The desire to be dumb means that it tries not to transform the data in any 
-meaningful way. It allows you to recover the structure in your data without 
-doing anything that might change the interpretation of the data. Rather, it 
-tries to make it easy for you to interpret the data by managing the structure, 
-which allows you to analyze it in small easy to interpret pieces without making 
-any changes that would get in your way.
-
-
-Alternatives
-------------
-
-There are no shortage of well established alternatives to *NestedText* for 
-storing data in a human-readable text file. Probably the most obvious are `json 
-<https://docs.python.org/3/library/json.html>`_ and `YAML 
-<https://pyyaml.org/wiki/PyYAMLDocumentation>`_.  Both are primarily intended to 
-be used as serialization languages. *NestedText* is not intended to be used as 
-a serialization language, rather it is more suitable for configuration and hand 
-generated and edited data files.  In these applications, both *JSON* and *YAML* 
-have significant short comings.
-
-
-JSON
-""""
-
-*JSON* is a subset of JavaScript suitable for holding data. Like *NestedText*, 
-it consists of a hierarchical collection of dictionaries, lists, and strings, 
-but also allows integers, floats, Booleans and nulls.  The problem with *JSON* 
-for this application is that it is awkward.  With all those data types it must 
-syntactically distinguish between them.  For example, in *JSON* 32 is an 
-integer, 32.0 is the real version of 32, and "32" is the string version. These 
-distinctions are not meaningful and can be confusing to non-programmers. In 
-addition, in most datasets a majority of leaf values are strings and the 
-required quotes adds substantial visual clutter.  *NestedText* avoids these 
-issues by keeping all leaf values as unmodified strings; no need for quoting or 
-escaping.  It is up to the application that employs *NestedText* as an input 
-format to use context to check these strings and convert them to the right 
-datatype.
-
-*JSON* does not provide for multi-line strings and any special characters like 
-newlines are encoded with escape codes, which can make strings long and 
-difficult to interpret.  Also, dictionary and list items must be separated with 
-commas, but a comma must not follow last item.  All of this results in *JSON* 
-being a frustrating format for humans to read, enter or edit.
-
-*NestedText* has the following clear advantages over *JSON* as human readable 
-and writable data file format:
-
-- text does not require quotes
-- data is left in its original form
-- comments
-- multiline strings
-- special characters without escaping them
-- commas are not used to separate dictionary and list items
-
-
-YAML
-""""
-
-*YAML* is considered by many to be a human friendly alternative to *JSON*, but 
-over time it has accumulated too many data types and too many formats.  To 
-distinguish between all the various types and formats, a complicated and 
-non-intuitive set of rules developed.  *YAML* at first appears very appealing 
-when used with simple examples, but things can quickly become complicated or 
-provide unexpected results.  A reaction to this is the use of *YAML* subsets, 
-such as `StrictYAML <https://hitchdev.com/strictyaml>`_.  However, the subsets 
-still try to maintain compatibility with *YAML* and so inherit much of its 
-complexity. For example, both *YAML* and *StrictYAML* support `nine different 
-ways of writing multi-line strings 
-<http://stackoverflow.com/a/21699210/660921>`_.
-
-*YAML* avoids excessive quoting and supports comments and multiline strings, but 
-like *JSON* it converts data to the underlying data types as appropriate, but 
-unlike with *JSON*, the lack of quoting makes the format ambiguous, which means 
-it has to guess at times, and small seemingly insignificant details can affect 
-the result.
-
-*NestedText* was inspired by *YAML*, but eschews its complexity. It has the 
-following clear advantages over *YAML* as human readable and writable data file 
-format:
-
-- simple
-- unambiguous (no implicit typing)
-- data is left in its original form
-- syntax is insensitive to special characters within text
-- safe, no risk of malicious code execution
+*YAML* or *TOML* aggressively convert those values into the underlying data 
+types such as integers, floats, and Booleans.  For example, a value like 2.10 
+would be converted to a floating point number. But making the decision to do so 
+is based purely on the form of the value, not the context in which it is found.  
+This can lead to misinterpretations.  For example, assume that this value is 
+the software version number two point ten. By converting it to a floating point 
+number it becomes two point one, which is wrong. There are many possible 
+versions of this basic issue. But there is also the inverse problem; values 
+that should be converted to particular data types but are not recognized. For 
+example, a value of $2.00 should be converted to a real number but would be a 
+string instead.  There are simply too many value types for a general-purpose 
+solution that is only looking at the values themselves to be able to interpret 
+all of them.  For example, 12/10/09 is likely a date, but is it in MM/DD/YY, 
+YY/MM/DD or DD/MM/YY form?  The fact is, the value alone is often insufficient 
+to reliably determine how to convert values into internal data types.  
+*NestedText* avoids these problems by leaving the values in their original form 
+and allowing the decision to be made by the end application where more context 
+is available to help guide the conversions.  If a price is expected for a 
+value, then $2.00 would be checked and converted accordingly. Similarly, local 
+conventions along with the fact that a date is expected for a particular value 
+allow 12/10/09 to be correctly validated and converted.  This process of 
+validation and conversion is referred to as applying a schema to the data.  
+There are packages such as `Pydantic <https://pydantic-docs.helpmanual.io>`_ 
+and `Voluptuous <https://github.com/alecthomas/voluptuous>`_ available that 
+make this process easy and reliable.
 
 
 Issues
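
Commit comment: the two pitfalls described in the README paragraph above (a version number read as a float, and a date whose field order cannot be inferred) can be reproduced with the Python standard library alone. This is only a minimal sketch; the sample values come from the text, while the variable names are illustrative assumptions.

```python
import json
from datetime import datetime

# A parser that types values by their form turns version "2.10" into a float.
version = json.loads("2.10")
print(type(version).__name__, version)

# The same text is a valid date under three different field orders.
raw_date = "12/10/09"
readings = {
    fmt: datetime.strptime(raw_date, fmt).date().isoformat()
    for fmt in ("%m/%d/%y", "%d/%m/%y", "%y/%m/%d")
}
for fmt, iso_date in readings.items():
    print(fmt, "->", iso_date)
```

Each format string yields a different calendar date, so only the application, which knows the local convention, can choose among them.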

diff --git a/doc/alternatives.rst b/doc/alternatives.rst
@@ -0,0 +1,102 @@
+************
+Alternatives
+************
+
+There is no shortage of well-established alternatives to *NestedText* for 
+storing configuration data in a human-readable text file.  The features and 
+shortcomings of some of these alternatives are discussed below:
+
+JSON
+====
+
+JSON_ is a subset of JavaScript suitable for holding data.  Like *NestedText*, 
+it consists of a hierarchical collection of dictionaries, lists, and strings, 
+but also allows integers, floats, Booleans and nulls.  The fundamental problem 
+with *JSON* in this context is that it's meant for serializing and exchanging 
+data between programs; it's not meant for configuration files.  Of course, it's 
+used for this purpose anyway, where it has a number of glaring shortcomings.
+
+To begin, it has an excessive amount of syntactic clutter.  Dictionary keys and 
+strings both have to be quoted, commas are required between dictionary and list 
+items (but forbidden after the last item), braces are required around 
+dictionaries, etc.  Features that would improve clarity are also lacking.  
+Comments are not allowed, multiline strings are not supported, and whitespace 
+is insignificant (leading to the possibility that the appearance of the data 
+may not match its true structure).  More conceptually, it is the responsibility 
+of the user to provide data of the correct type (e.g. ``32`` vs. ``32.0`` vs.  
+``"32"``), even though the application already knows what type it expects.  All 
+of this results in *JSON* being a frustrating format for humans to read, enter 
+or edit.
+
+*NestedText* has the following clear advantages over *JSON* as a human-readable 
+and writable data file format:
+
+- text does not require quotes
+- data is left in its original form
+- comments
+- multiline strings
+- special characters without escaping them
+- commas are not used to separate dictionary and list items
+
+YAML
+====
+
+YAML_ is considered by many to be a human-friendly alternative to *JSON*, but 
+over time it has accumulated too many data types and too many formats.  To 
+distinguish between all the various types and formats, a complicated and 
+non-intuitive set of rules developed.  *YAML* at first appears very appealing 
+when used with simple examples, but things can quickly become complicated or 
+provide unexpected results.  A reaction to this is the use of *YAML* subsets, 
+such as StrictYAML_.  However, the subsets still try to maintain compatibility 
+with *YAML* and so inherit much of its complexity. For example, both *YAML* and 
+*StrictYAML* support `nine different ways of writing multi-line strings 
+<http://stackoverflow.com/a/21699210/660921>`_.
+
+*YAML* avoids excessive quoting and supports comments and multiline strings.  
+Like *JSON*, it converts data to the underlying data types as appropriate, but 
+unlike *JSON*, the lack of quoting makes the format ambiguous.  As a result, 
+it has to guess at times, and small, seemingly insignificant details can 
+affect the result.
+
+*NestedText* was inspired by *YAML*, but eschews its complexity. It has the 
+following clear advantages over *YAML* as a human-readable and writable data 
+file format:
+
+- simple
+- unambiguous (no implicit typing)
+- data is left in its original form
+- syntax is insensitive to special characters within text
+- safe, no risk of malicious code execution
+
+TOML
+====
+
+TOML_ is a configuration file format inspired by the well-known *INI* syntax.  
+It supports a number of basic data types (notably including dates and times) 
+using syntax that is more similar to *JSON* (explicit but verbose) than to 
+*YAML* (succinct but confusing).  As discussed previously, though, this makes 
+it the responsibility of the user to specify the correct type for each field, 
+when it should be the responsibility of the application to convert each field 
+to the correct type.
+
+Another flaw in TOML is that it is difficult to specify deeply nested 
+structures.  The only way to specify a nested dictionary is to give the full 
+key to that dictionary, relative to the root of the entire hierarchy.  This is 
+not much of a problem if the hierarchy only has 1-2 levels, but any more than that 
+and you find yourself typing the same long keys over and over.  A corollary to 
+this is that TOML-based configurations do not scale well: increases in 
+complexity are often accompanied by disproportionate decreases in readability 
+and writability.
+
+*NestedText* has the following clear advantages over *TOML* as a human-readable 
+and writable data file format:
+
+- text does not require quotes
+- data is left in its original form
+- indentation used to succinctly represent nested data
+- the structure of the file matches the structure of the data
+
+.. _json: https://www.json.org/json-en.html
+.. _yaml: https://yaml.org/
+.. _strictyaml: https://hitchdev.com/strictyaml
+.. _toml: https://toml.io/en/
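
Commit comment: the TOML section's point about repeated keys can be made concrete with a short sketch. The helper below, `toml_table_headers`, is hypothetical and not part of any TOML library. It only lists the table headers that a nested dictionary would need if every nested dictionary were written as its own table, and it assumes every key is a plain bare key that needs no quoting.

```python
def toml_table_headers(data, prefix=()):
    """Yield one TOML-style table header per nested dictionary.

    Hypothetical helper for illustration only; assumes bare keys.
    """
    for key, value in data.items():
        if isinstance(value, dict):
            path = prefix + (key,)
            # Each header spells out the full path from the root.
            yield "[" + ".".join(path) + "]"
            yield from toml_table_headers(value, path)


# Sample configuration, invented for this illustration.
config = {
    "server": {
        "logging": {
            "handlers": {
                "file": {"path": "/var/log/app.log"},
            },
        },
    },
}

headers = list(toml_table_headers(config))
for header in headers:
    print(header)
```

Every header repeats the whole path above it, whereas an indentation-based format names each key once.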
diff --git a/doc/examples.rst b/doc/examples.rst
@@ -127,5 +127,6 @@ And finally, the code:
 .. literalinclude:: ../examples/cryptocurrency
    :language: python
 
+
 .. _pydantic: https://pydantic-docs.helpmanual.io/
 .. _voluptuous: https://github.com/alecthomas/voluptuous
diff --git a/doc/index.rst b/doc/index.rst
@@ -1,10 +1,17 @@
 .. include:: ../README.rst
 
+.. toctree::
+   :caption: Why NestedText?
+   :maxdepth: 1
+
+   Philosophy <philosophy>
+   alternatives
+
 .. toctree::
    :caption: Getting started
    :maxdepth: 1
 
-   releases
+   installation
    basic_syntax
    basic_use
    schemas

diff --git a/doc/releases.rst → doc/installation.rst b/doc/releases.rst → doc/installation.rst
diff --git a/doc/philosophy.rst b/doc/philosophy.rst
@@ -0,0 +1,25 @@
+***********************
+The Zen of *NestedText*
+***********************
+
+*NestedText* aspires to be a simple dumb vessel that holds people's structured 
+data, and does so in a way that allows people to easily interact with that 
+data.
+
+The desire to be simple is an attempt to minimize the effort required to learn 
+and use the language. Ideally people can understand it by looking at one or two 
+examples and they can use it without needing to remember any arcane 
+rules and without relying on any of the knowledge that programmers accumulate 
+through years of experience.  One source of simplicity is consistency.  As such, 
+*NestedText* uses a small number of rules that it applies with few exceptions.
+
+The desire to be dumb means that *NestedText* tries not to transform the data 
+in any meaningful way.  It parses the structure of the data without doing 
+anything that might change how the data is interpreted.  Instead, it aims to 
+make it easy for you to interpret the data yourself.  After all, you understand 
+what the data is supposed to mean, so you are in the best position to interpret 
+it.  There are also many powerful tools available to help with :doc:`this exact 
+task <schemas>`.
+
+
+
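
Commit comment: the idea that the application is best placed to interpret its own data can be sketched as follows. This is a hand-rolled illustration under stated assumptions, not the *NestedText* API. The converter functions, the `SCHEMA` mapping, and the sample record are all hypothetical, and in practice a package such as Voluptuous or Pydantic would likely do this job.

```python
from decimal import Decimal


def to_price(text):
    """Convert a dollar amount such as '$2.00' to a Decimal."""
    if not text.startswith("$"):
        raise ValueError(f"expected a dollar amount, got {text!r}")
    return Decimal(text[1:])


def to_version(text):
    """Convert a dotted version such as '2.10' to a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# The application knows which field holds which kind of value.
SCHEMA = {"price": to_price, "version": to_version}

# Leaf values arrive as plain strings, as they would from a NestedText file.
raw = {"price": "$2.00", "version": "2.10"}
converted = {key: SCHEMA[key](value) for key, value in raw.items()}
print(converted)
```

Because the conversion is chosen by field name rather than by the form of the value, version two point ten stays distinct from two point one.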