/
unflatten.rst
1443 lines (974 loc) · 46.7 KB
/
unflatten.rst
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
Spreadsheet Designer's Guide
++++++++++++++++++++++++++++
In this guide you'll learn the various rules Flatten Tool uses to convert one
or more sheets in a spreadsheet into a JSON document. These rules are
documented with examples based around a cafe theme.
Once you've understood how Flatten Tool works you should be able to design your
own spreadsheet structures, debug problems in your spreadsheets and be able to
make use of Flatten Tool's more advanced features.
Before we get into too much detail though, let's start by looking
at the Command Line API for unflattening a spreadsheet.
Command Line API
================
To demonstrate the command line API you'll start with the simplest possible
example, a sheet listing Cafe names:
.. csv-table::
:file: ../examples/cafe/too-simple/data.csv
:header-rows: 1
We'd like Flatten Tool to convert it to the following JSON structure for an array
of cafes, with the cafe name being the only property we want for each cafe:
.. literalinclude:: ../examples/cafe/simple/expected.json
:language: json
Let's try converting the sheet to the JSON above.
.. literalinclude:: ../examples/cafe/too-simple/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/too-simple/expected.json
:language: json
That's not too far off what we wanted. You can see the array of cafes, but the
key is named ``main`` instead of ``cafe``. You can tell Flatten Tool that that the
rows in the spreadsheet are cafes and should come under a ``cafe`` key by
specifying a *root list path*, described next.
.. caution::
Older Python versions add a trailing space after ``,`` characters when
outputting indented JSON. This means that your output might have whitespace
differences compared to what is described here.
Root List Path
--------------
The *root list path* is the key under which Flatten Tool should add an array of
objects representing each row of the main sheet.
You specify the root list path with ``--root-list-path`` option. If you don't
specify it, ``main`` is used as the default as you saw in the last example.
Let's set ``--root-list-path`` to ``cafe`` so that our original input generates the
JSON we were expecting:
.. literalinclude:: ../examples/cafe/simple/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/simple/expected.json
:language: json
That's what we expected. Great.
.. note::
Although ``--root-list-path`` sounds like it accepts a path such as
``building/cafe``, it only accepts a single key.
The root is a list
------------------
You can also specify the data outputted is just a list, using the ``--root-is-list`` option.
.. literalinclude:: ../examples/cafe/root-is-list/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/root-is-list/expected.json
:language: json
Writing output to a file
------------------------
By default, Flatten Tool prints its output to standard output (your terminal).
If you want it to write its output to a file instead, you can use the
``--output-name`` option (or ``-o`` for short).
.. literalinclude:: ../examples/cafe/simple-file/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/simple-file/expected.json
:language: json
Base JSON
---------
If you want the resulting JSON to also include other keys that you know in
advance, you can specify them in a separate *base JSON* file and Flatten Tool
will merge the data from your spreadsheet into that file.
For example, if ``base.json`` looks like this:
.. literalinclude:: ../examples/cafe/simple-base-json/base.json
:language: json
and the data looks like this:
.. csv-table::
:file: ../examples/cafe/simple-base-json/data.csv
:header-rows: 1
you can run this command using the ``--base-json`` option to see the ``base.json``
data with the spreadsheet rows merged in:
.. literalinclude:: ../examples/cafe/simple-base-json/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/simple-base-json/expected.json
:language: json
.. warning::
If you give the base JSON the same key as you specify in ``--root-list-path``
then Flatten Tool will overwrite its value.
All unflatten options
---------------------
You can see all the options available for unflattening by running:
.. literalinclude:: ../examples/help/unflatten/cmd.txt
:language: bash
.. literalinclude:: ../examples/help/unflatten/expected.txt
:language: text
As you can see, some of the documentation is specific to two projects that
use Flatten Tool:
* OCDS - http://standard.open-contracting.org/validator/
* 360Giving - http://www.threesixtygiving.org/standard/reference/
Other options such as ``--cell-source-map`` and ``--heading-source-map`` will be
described in the Developer Guide once the features stabilise.
Understanding JSON Pointer and how Flatten Tool uses it
=======================================================
Let's consider this data again and explore the algorithm Flatten Tool
uses to make it work:
.. csv-table::
:file: ../examples/cafe/simple/data.csv
:header-rows: 1
Here's a command to unflatten it and the resulting JSON:
.. literalinclude:: ../examples/cafe/simple/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/simple/expected.json
:language: json
The key to understanding how Flatten Tool represents more complex examples in a
spreadsheet lies in knowing about the `JSON Pointer specification
<https://tools.ietf.org/html/rfc6901>`_. This specification describes a fairly
intuitive way to reference values in a JSON document.
To briefly describe how it works, each ``/`` character after the first one drills
down into a JSON structure. If the value after the ``/`` is a string, then a key
is looked up, if it is an integer then an array index is taken.
For example, the JSON pointer ``/cafe/0/name`` is equivalent to taking the
following value out of a JSON document named ``document``:
.. code-block:: python
>>> document['cafe'][0]['name']
In the JSON document above, the JSON pointer ``/cafe/0/name`` would return ``Healthy Cafe``.
.. note::
JSON pointer array indexes start at 0, just like lists in Python, hence the
first cafe is at index 0.
Whilst JSON pointer is designed as a way for getting data *out* of a JSON
document, Flatten Tool uses JSON Pointer as a way of describing how to move
values *into* a JSON document from a spreadsheet.
To do this, as it comes across JSON pointers, it automatically creates the
objects and arrays required.
You can think of Flatten Tool doing the following as it parses a sheet:
* Load the base JSON or use an empty JSON object
* For each row:
* Convert each column heading to a JSON pointer by removing whitespace and
prepending with ``/cafe/``, then adding the row index and another ``/`` to the
front
* Take the value in each column and associate it with the JSON pointer
(treating any numbers as array indexes, and overwriting existing JSON pointer
values for that row if necessary)
* Write the value into the position in the JSON object being specified by the
JSON pointer, creating more structures as you go
In this example there is only one sheet, and only one row, so when parsing that
first row, ``/cafe/0/`` is appended to ``name`` to give the JSON pointer
``/cafe/0/name``. Flatten Tool then writes ``Healthy Cafe`` in the correct position.
Index behaviour
---------------
There is one subtlety you need to be aware of though before you see some examples.
Although Flatten Tool always uses strings in a JSON pointer as object keys, it
only takes numbers it comes across as an *indication* of the array position.
For example, if you gave it the JSON pointer ``/cafe/1503/name``, there is no
guarantee that the ``name`` would be placed in an object at index position 1503.
Instead Flatten Tool uses numbers in the same sheet that are at the same parent
JSON pointer path (``/cafe/`` in this case), as being the sort order the child
objects should appear in, but not the literal index positions.
If two objects use the same index at the same base JSON pointer path, Flatten
Tool will keep both but the one it comes across first will come before the
other.
This behaviour has two advantages:
* data won't be lost if for some reason the index wasn't specified correctly
* the data in the generated JSON will be in the same order as it was specified
in the sheets which is likely to be what the person putting data into the
spreadsheet would expect
This behaviour is also important when you learn about Lists of Objects (without
IDs) later.
.. tip::
You'll see later in the relationships section, that special ``id`` values can
alter the index behavior described here and allow Flatten Tool to merge rows
from multiple sheets.
Multiple rows
-------------
Let's look at a multi-row example:
.. csv-table::
:file: ../examples/cafe/simple-row/data.csv
:header-rows: 1
This time ``Healthy Cafe`` would be placed at ``/cafe/0/name`` and ``Vegetarian
Cafe`` at ``/cafe/1/name`` producing this:
.. literalinclude:: ../examples/cafe/simple-row/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/simple-row/expected.json
:language: json
Although both ``Healthy Cafe`` and ``Vegetarian Cafe`` are under a column that
resolves to ``/cafe/0/name``, the rules described in the previous section explain
why both are present in the output and why ``Healthy Cafe`` comes before
``Vegetarian Cafe``.
Multiple columns
----------------
Let's add the cafe address to the spreadsheet:
.. csv-table::
:header-rows: 1
:file: ../examples/cafe/simple-col/data.csv
.. note::
CSV files require cells containing ``,`` characters to be escaped by wrapping
them in double quotes. That's why if you look at the source CSV, the addresses
are escaped with ``"`` characters.
This time ``Healthy Cafe`` is placed at ``/cafe/0/name`` as before, ``London`` is
placed at ``/cafe/0/address``. ``Vegetarian Cafe`` at ``/cafe/1/name`` as before and
``Bristol`` is at ``/cafe/1/address``.
The result is:
.. literalinclude:: ../examples/cafe/simple-col/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/simple-col/expected.json
:language: json
Multiple sheets
---------------
So far, all the examples have just used one sheet. When multiple sheets are
involved, the behaviour isn't much different.
In effect, all Flatten Tool does is:
* take the JSON structure produced after processing the previous sheets and use
it as the base JSON for processing the next sheet
* keep track of the index numbers of existing objects and generate JSON
pointers that point to the next free index at any existing locations (with the
effect of having new objects appended to any existing ones at the same
location)
Once all the sheets have been processed the resulting JSON is returned.
.. note::
The CSV specification doesn't support multiple sheets. To work around this,
Flatten Tool treats a directory of CSV files as a single spreadsheet with
multiple sheets - one for each file.
This is why all the CSV file examples given so far have been written to a
file in an empty directory and why only the directory name was needed in
the ``flatten-tool`` commands.
Here's a simple two-sheet example where the headings are the same in both
sheets:
.. csv-table:: sheet: data
:file: ../examples/cafe/multiple/data.csv
:header-rows: 1
.. csv-table:: sheet: other
:file: ../examples/cafe/multiple/other.csv
:header-rows: 1
When you run the example you get this:
.. literalinclude:: ../examples/cafe/multiple/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/multiple/expected.json
:language: json
The order is because the ``data`` sheet was processed before the ``other`` sheet.
.. tip::
CSV file sheets are processed in the order returned by ``os.listdir()`` so
you should name them in the order you would like them processed.
Objects
=======
Now you know that the column headings are really just a JSON Pointer
specification, and the index values are only treated as indicators of the
presence of arrays you can write some more sophisticated examples.
Rather than have the address just as string, we could represent it as an
object. For example, imagine you'd like out output JSON in this structure:
.. literalinclude:: ../examples/cafe/object/expected.json
:language: json
You can do this by knowing that the JSON Pointer to "123 City Street" would be
``/cafe/0/address/street`` so that we would need to name the street column
``address/street``.
Here's the data:
.. csv-table::
:file: ../examples/cafe/object/data.csv
:header-rows: 1
Let's try it:
.. literalinclude:: ../examples/cafe/object/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/object/expected.json
:language: json
Lists of Objects (without IDs)
==============================
The cafe's that have made up our examples so far also have tables, and the
tables have a table number so that the waiters know where the food has to be
taken to.
Each cafe has many tables, so this is an example of a one-to-many relationship
if you are used to working with relational databases.
You can represent the table information in JSON as a array of objects, where each
object represents a table, and each table has a ``number`` key. Let's imagine the
``Healthy Cafe`` has three tables numbered 1, 2 and 3. We'd like to produce this
structure:
.. literalinclude:: ../examples/cafe/list-of-objects/expected.json
:language: json
In the relationships section later, we'll see other (often better) ways of
arranging this data using *identifiers*, but for now we'll demonstrate an
approach that puts all the table information in the same row as the cafe
itself.
For example, consider this spreadsheet data:
.. csv-table::
:file: ../examples/cafe/one-table/data.csv
:header-rows: 1
.. literalinclude:: ../examples/cafe/list-of-objects/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/list-of-objects/expected.json
:language: json
We'll use this example of tables (of the furniture variety) in subsequent
examples.
Index behaviour
---------------
Just as in the multiple sheets example earlier, the exact numbers at the table
index positions aren't too important to Flatten Tool. They just tell Flatten
Tool that the value in the cell is part of an object in an array.
In this particular case though, Flatten Tool will keep columns in order implied
by the indexes.
For example here the index values are such that the lowest number comes last:
.. csv-table::
:file: ../examples/cafe/tables-index/data.csv
:header-rows: 1
We'd still expect 3 tables in the output, but we expect Flatten Tool to
re-order the columns so that table 3 comes first, then 2, then 1:
.. literalinclude:: ../examples/cafe/tables-index/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/tables-index/expected.json
:language: json
Child objects like these tables can, of course have more than one key. Let's
add a ``reserved`` key to table number 1 but to try to confuse Flatten Tool,
we'll specify it at the end:
.. csv-table::
:file: ../examples/cafe/tables-index-reserved/data.csv
:header-rows: 1
.. literalinclude:: ../examples/cafe/tables-index-reserved/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/tables-index-reserved/expected.json
:language: json
Notice that Flatten Tool correctly associated the ``reserved`` key with table 1
because of the index numbered ``30``, even though the columns weren't next to
each other.
For a much richer way of organising arrays of objects, see the Relationships
section.
Plain Lists (Unsupported)
-------------------------
Flatten Tool doesn't recognise arrays of JSON values other than objects (just
described in the previous section) unless a schema is used.
Heading names such as ``tag/0`` and ``tag/1`` would be ignored and an
empty array would be put into the JSON.
Here's some example data:
.. csv-table::
:file: ../examples/cafe/plain-list/data.csv
:header-rows: 1
And the result:
.. literalinclude:: ../examples/cafe/plain-list/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/plain-list/expected.json
:language: json
However, an array of tags in the following format (semi-colon separated) can be handled if a schema is passed to Flatten Tool specifying the array type of the field.
.. csv-table::
:file: ../examples/cafe/plain-list-schema/data.csv
:header-rows: 1
The schema we'll pass is:
.. literalinclude:: ../examples/cafe/plain-list-schema/tagsArraySchema.json
:language: json
And the result:
.. literalinclude:: ../examples/cafe/plain-list-schema/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/plain-list-schema/expected.json
:language: json
Read on for more about typed fields and use of schemas.
Typed fields
============
In the table examples you've seen so far, the table numbers are produced as
strings in the JSON. The JSON Pointer specification doesn't provide any way of
telling you what type the value being pointed to is, so we can't get the
information from the column headings.
There are two places we can get it from though:
* The spreadsheet cell (if the underlying spreadsheet type supports it, e.g.
CSV doesn't but XLSX does)
* An external JSON Schema describing the data
If we can't get any type information we fall back to assuming strings.
Here is the sample data we'll use for the examples in the next two sections:
.. csv-table::
:file: ../examples/cafe/tables-typed-schema/data.csv
:header-rows: 1
Using spreadsheet cell formatting
---------------------------------
CSV files only support string values, so the easiest way to get the example
above to use integers would be to use a spreadsheet format such as XLSX that
supported integers and make sure the cell type was number. Flatten Tool would
pass the cell value through to the JSON as a number in that case.
.. note::
Make sure you specify the correct format ``-f=xlsx`` on the command line if
you want to use an XLSX file.
.. literalinclude:: ../examples/cafe/tables-typed-xlsx/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/tables-typed-xlsx/expected.json
:language: json
.. caution::
Number formats in spreadsheets are ignored in Python 2.7 so this
example won't work. It does work in Python 3.4 and above though.
If you look at Flatten Tool's source code you'll see the in ``test_docs.py``
that the above example is skipped on older Python versions.
Using a JSON Schema with types
------------------------------
Here's an example of a JSON Schema that can provide the typing information:
.. literalinclude:: ../examples/cafe/tables-typed-schema/cafe.schema
:language: json
.. literalinclude:: ../examples/cafe/tables-typed-schema/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/tables-typed-schema/expected.json
:language: json
.. tip::
Although this example is too simple to demonstrate it, Flatten Tool ignores
the order of individual properties in a JSON schema when producing JSON output,
and instead follows the order of the columns in the sheets.
Human-friendly headings using a JSON Schema with titles
=======================================================
Let's take a closer look at the array of objects example from earlier again:
.. csv-table::
:file: ../examples/cafe/list-of-objects/data.csv
:header-rows: 1
The column headings ``table/0/number``, ``table/1/number`` and ``table/2/number``
aren't very human readable, wouldn't it be great if we could use headings like
this:
.. csv-table::
:file: ../examples/cafe/tables-human-1/data.csv
:header-rows: 1
Flatten Tool supports this if you do the following:
* Write a JSON Schema specifying the titles being used and
specify it with the ``--schema`` option
* Use ``:`` characters instead of ``/`` characters in the headings
* Specify the ``--convert-titles`` option on the command line
.. caution::
If you forget any of these, Flatten Tool might produce incorrect JSON rather
than failing.
Here's a new JSON schema for this example:
.. literalinclude:: ../examples/cafe/tables-human-1/cafe.schema
:language: json
Notice that both ``Table`` and ``Number`` are specified as titles.
Here's what we get when we run it:
.. literalinclude:: ../examples/cafe/tables-human-1/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/tables-human-1/expected.json
:language: json
Optional array indexes
----------------------
Looking at the JSON Schema from the last example again you'll see that ``table``
is specified as an array type:
.. literalinclude:: ../examples/cafe/tables-human-2/cafe.schema
:language: json
This means that Flatten Tool can work out that any names specified in that
column are part of that array. If you had an example with just one column
representing each level of the tree, you could miss out the index in the
heading when using ``--schema`` and ``--convert-titles``.
Here's a similar example, but with just one rolled up column:
.. csv-table::
:header-rows: 1
:file: ../examples/cafe/tables-human-2/data.csv
Here's what we get when we run this new data with this schema:
.. literalinclude:: ../examples/cafe/tables-human-2/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/tables-human-2/expected.json
:language: json
Relationships using Identifiers
===============================
So far, all the examples you've seen have served to demonstrate how Flatten
Tool works, but probably wouldn't be particularly useful in real life, simply
because they require everything related to be on the same row.
In this section you'll learn how identifiers work and that will allow you much
more freedom in designing different spreadsheet layouts that produce the same
JSON.
In Flatten Tool, any field named ``id`` is considered special. Flatten Tool knows
that any objects with the same ``id`` at the same level are the same object and
that their values should be merged.
ID-based object merge behaviour
-------------------------------
The merge behaviour happens whether the two IDs are specified in:
* different rows in the same sheet
* two rows in two different sheets
Basically, any time Flatten Tool comes across a row with an ``id`` in it, it will
lookup any other objects in the array to see if that ``id`` is already used and if
it is, it will merge it. If not, it will just append a new object to the array.
.. caution::
It is important to make sure your ``id`` values really are unique. If you
accidentally use the same ``id`` for two different objects, Flatten Tool
will think they are the same and merge them.
Flatten Tool will merge an existing and new object as follows:
* Any fields in new object that are missing in the existing one are added
* Any fields in the existing object that aren't in the new one are left as
they are
* If there are fields that are in both that have the same value, that value
is kept
* If there are fields that are in both with different values, the existing
values are kept and conflict warnings issued
This means that values in later rows do not overwrite existing conflicting
values.
Let's have a look at these rules in action in the next two sections with an
example from a single sheet, and one from multiple sheets.
ID-based object merge in a single sheet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here's an example that demonstrates these rules:
.. csv-table::
:file: ../examples/cafe/relationship-merge-single/data.csv
:header-rows: 1
Let's run it:
.. literalinclude:: ../examples/cafe/relationship-merge-single/cmd.txt
:language: bash
Notice the warnings above about values being over-written:
.. literalinclude:: ../examples/cafe/relationship-merge-single/expected_stderr.json
The actual JSON contains a single Cafe with ``id`` value ``CAFE-HEALTH`` and all
the values merged in:
.. literalinclude:: ../examples/cafe/relationship-merge-single/expected.json
:language: json
ID-based object merge in multiple sheets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here's an example that uses the same data as the single sheet example above,
but spreads the rows over four sheets named ``a``, ``b``, ``c`` and ``d``:
.. csv-table:: sheet: a
:file: ../examples/cafe/relationship-merge-multiple/a.csv
:header-rows: 1
.. csv-table:: sheet: b
:file: ../examples/cafe/relationship-merge-multiple/b.csv
:header-rows: 1
.. csv-table:: sheet: c
:file: ../examples/cafe/relationship-merge-multiple/c.csv
:header-rows: 1
.. csv-table:: sheet: d
:file: ../examples/cafe/relationship-merge-multiple/d.csv
:header-rows: 1
Let's run it:
.. literalinclude:: ../examples/cafe/relationship-merge-multiple/cmd.txt
:language: bash
Notice the warnings above about values being over-written:
.. literalinclude:: ../examples/cafe/relationship-merge-multiple/expected_stderr.json
And the rest of the output:
.. literalinclude:: ../examples/cafe/relationship-merge-multiple/expected.json
:language: json
The result is the same as before.
Parent-child relationships (arrays of objects)
----------------------------------------------
Things get much more interesting when you start dealing with arrays of objects
whose parents have an ``id``. This enables you to split the parents and children
up into multiple sheets rather than requiring everything sits one the same row.
As an example, let's imagine that ``Vegetarian Cafe`` is arranged having two
tables numbered ``16`` and ``17`` because they are share tables with another
restaurant next door.
.. literalinclude:: ../examples/cafe/relationship-lists-of-objects-simple/expected.json
:language: json
From the knowledge you gained when learning about arrays of objects without IDs
earlier, you know that you can produce the correct structure with a CSV file
like this:
.. csv-table:: sheet: cafes
:file: ../examples/cafe/relationship-lists-of-objects-simple/data.csv
:header-rows: 1
This time, we'll give both the Cafe's IDs and move the tables into a separate
sheet:
.. csv-table:: sheet: cafes
:file: ../examples/cafe/relationship-lists-of-objects/cafes.csv
:header-rows: 1
.. csv-table:: sheet: tables
:file: ../examples/cafe/relationship-lists-of-objects/tables.csv
:header-rows: 1
By having the tables in a separate sheet, you can now support cafe's with as
many tables as you like, just by adding more rows and making sure the ``id``
column for the table matches the ``id`` value for the cafe.
Let's run this example:
.. literalinclude:: ../examples/cafe/relationship-lists-of-objects/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/relationship-lists-of-objects/expected.json
:language: json
By specifying an ID, the values in the tables sheet can be associated with the
correct part of the tree created by the cafes sheet.
Index behaviour
---------------
Within the array of tables for each cafe, you might have noticed that each table
number has a JSON Pointer that ends in with ``/0/number``. Since they all have the
same index, they are simply ordered within each cafe in the order of the rows
in the sheet.
Grandchild relationships
------------------------
In future we might like to extend this example so that we can track the dishes
ordered by each table so we can generate a bill.
Let's take the case of dishes served at tables and imagine that ``Healthy Cafe``
has its own health ``fish and chips`` dish. Now let's also imagine that the dish
is ordered at tables 1 and 3.
If you are used to thinking about relational database you would probably think
about having a new sheet called ``dishes`` with a two columns, one for an ``id``
and one for the ``name`` of the dish. You would then create a sheet to represent
a join table called ``table_dishes`` that contained the ID of the table and of
the dish.
The problem with this approach is that the output is actually a tree, and not a
normalised relational model. Have a think about how you would write the
``table_dishes`` sheet. You'd need to write something like this:
.. csv-table::
:header-rows: 1
table/0/id,dish/0/id
TABLE-1,DISH-fish-and-chips
TABLE-3,DISH-fish-and-chips
The problem is that ``dish/0/id`` is really a JSON Pointer to ``/cafe/0/dish/0/id``
and so would try to create a new ``dish`` key under each *cafe*, not a ``dish`` key
under each *table*.
You can't do it this way. Instead you have to design you ``dish`` sheet to
specify both the ID of the cafe and the ID of the table as well as the name of
the dish. If a dish is used in multiple tables, you will have multiple rows,
each with the same name in the name column. In this each way row contains the
entire path to its position in the tree.
Since nothing depends on the dishes yet, they don't have to have an ID
themselves, they just need to reference their parent IDs:
.. csv-table:: sheet: cafes
:file: ../examples/cafe/relationship-multiple/cafes.csv
:header-rows: 1
.. csv-table:: sheet: tables
:file: ../examples/cafe/relationship-multiple/tables.csv
:header-rows: 1
.. csv-table:: sheet: dishes
:file: ../examples/cafe/relationship-multiple/dishes.csv
:header-rows: 1
Here are the results:
.. literalinclude:: ../examples/cafe/relationship-multiple/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/relationship-multiple/expected.json
:language: json
Notice the ordering in this example. Because ``dishes`` is processed before
``tables``, ``TABLE-3`` gets defined before ``TABLE-2``, and ``dish`` gets added as a
key before ``tables``.
If the sheets were processed the other way around the data would be the same,
but the ordering different.
.. tip::
Flatten Tool supports producing JSON hierarchies of arbitrary depth, not just
the parent-child and parent-child-grandchild relationships you've seen in the
examples so far. Just make sure that however deep an object is, it always has
the IDs of *all* of its parents in the same row as it, as the tables and dishes
sheets do.
Arbitrary-depth in a single table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can also structure all the data into a single table. It is only
recommended to do this if you have a very simple data structure where there is
only one object at each part of the hierarchy.
In this example we'll use a JSON Schema to infer the structure, allowing us to
use human-readable column titles.
Here's the data:
.. csv-table:: sheet: dishes
:file: ../examples/cafe/relationship-grandchild/cafes.csv
:header-rows: 1
Let's unflatten this table:
.. literalinclude:: ../examples/cafe/relationship-grandchild/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/relationship-grandchild/expected.json
:language: json
If you'd like to explore this example yourself, here's the schema used in the
example above:
.. literalinclude:: ../examples/receipt/cafe.schema
:language: json
Missing IDs
-----------
You might be wondering what happens if IDs are accidentally missing. There are
two cases where this can happen:
* The ID is missing but no child objects reference it anyway
* The ID is missing and so children can't be added
To demonstrate both of these in one example consider the following example. In particular notice that:
* ``CAFE-VEG`` is missing from the ``cafes`` sheet
* ``CAFE-VEG`` is missing from the last row in the ``tables`` sheet
.. csv-table:: sheet: cafes
:file: ../examples/cafe/relationship-missing-ids/cafes.csv
:header-rows: 1
.. csv-table:: sheet: tables
:file: ../examples/cafe/relationship-missing-ids/tables.csv
:header-rows: 1
Let's run this example:
.. literalinclude:: ../examples/cafe/relationship-missing-ids/cmd.txt
:language: bash
.. literalinclude:: ../examples/cafe/relationship-missing-ids/expected.json
:language: json
You'll notice that all the data and tables for ``CAFE-HEALTH`` are output
correctly in the first object. This is what we'd expect because all the IDs
were present.
.. code-block:: text
{
"id": "CAFE-HEALTH",
"name": "Healthy Cafe",
"address": "123 City Street, London",
"table": [
{
"number": "1"
},
{
"number": "2"
},
{
"number": "3"
}
]
},
Next is this cafe:
.. code-block:: text
{
"name": "Vegetarian Cafe",
"address": "42 Town Road, Bristol"
},