One of the great features of the [R](https://www.r-project.org) [cdata](https://github.com/WinVector/cdata) and Python [data_algebra](https://github.com/WinVector/data_algebra) data wrangling systems is that they can print what a transform is going to do.  This makes reasoning about data transforms *much* easier.  Let's rework a small [R cdata example](https://github.com/WinVector/cdata/blob/master/vignettes/control_table_keys.Rmd), using the Python package [data_algebra](https://github.com/WinVector/data_algebra).

First we import some modules and packages, and type in some notional data.

In [1]:
import pandas
import yaml

import data_algebra.cdata
import data_algebra.cdata_impl
import data_algebra.yaml

# ask YAML to write simpler structures
data_algebra.yaml.fix_ordered_dict_yaml_rep()

iris = pandas.read_csv('iris_small.csv')
iris

Unnamed: 0,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species,id
0,5.1,3.5,1.4,0.2,setosa,0
1,4.9,3.0,1.4,0.2,setosa,1
2,4.7,3.2,1.3,0.2,setosa,2


Our goal is to move from this normalized or wide form into a tall form, where information that currently sits in multiple columns of a single row is spread over many rows with descriptive row keys.

Concretely, we want our data to look like the following.

In [2]:
answer = pandas.read_csv("answer.csv")
answer

Unnamed: 0,id,Species,Part,Measure,Value
0,0,setosa,Petal,Length,1.4
1,0,setosa,Petal,Width,0.2
2,0,setosa,Sepal,Length,5.1
3,0,setosa,Sepal,Width,3.5
4,1,setosa,Petal,Length,1.4
5,1,setosa,Petal,Width,0.2
6,1,setosa,Sepal,Length,4.9
7,1,setosa,Sepal,Width,3.0
8,2,setosa,Petal,Length,1.3
9,2,setosa,Petal,Width,0.2


First we build a structure describing what we think a data record looks like.  The simplest data records are exactly rows, but often meaningful records span many rows.  So let's describe the record structure we want.

In [3]:
control_table = pandas.DataFrame({
    'Part': ["Sepal", "Sepal", "Petal", "Petal"],
    'Measure': ["Length", "Width", "Length", "Width"],
    'Value': ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]
})
control_table

Unnamed: 0,Part,Measure,Value
0,Sepal,Length,Sepal.Length
1,Sepal,Width,Sepal.Width
2,Petal,Length,Petal.Length
3,Petal,Width,Petal.Width


Notice the above is literally an example of the desired record layout.  We then add a specification of which parts of the
record are keys (they tell us which row is which), which are values (to be filled in by the transform), and how
we tell which rows are in the same record (the `record_keys`).  This is shown below.

In [4]:
record_spec = data_algebra.cdata.RecordSpecification(
    control_table,
    control_table_keys = ['Part', 'Measure'],
    record_keys = ['id', 'Species']
    )
record_spec

RecordSpecification
   record_keys: ['id', 'Species']
   control_table_keys: ['Part', 'Measure']
   control_table:
       Part Measure         Value
   0  Sepal  Length  Sepal.Length
   1  Sepal   Width   Sepal.Width
   2  Petal  Length  Petal.Length
   3  Petal   Width   Petal.Width

The above is saying: we want each data record to be 4 rows, internally keyed by the `Part` and `Measure` columns, and we expect the rows of a larger data frame that belong to the same record to be identified by the key columns `id` and `Species`.  The "A.B" entries are stand-ins showing where we expect values to be placed.
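One way to read this specification is as a recipe: for every incoming row and every row of the control table, emit one output row that carries the record keys, the control-table keys, and the value found in the column the stand-in names. The plain-pandas sketch below illustrates that reading only. The helper `rows_to_blocks_sketch` is hypothetical, it is not how data_algebra implements the transform, and the data frame is typed in by hand from the table shown earlier.

```python
import pandas

# hand-typed copy of the small iris table shown earlier
iris = pandas.DataFrame({
    'Sepal.Length': [5.1, 4.9, 4.7],
    'Sepal.Width': [3.5, 3.0, 3.2],
    'Petal.Length': [1.4, 1.4, 1.3],
    'Petal.Width': [0.2, 0.2, 0.2],
    'Species': ['setosa', 'setosa', 'setosa'],
    'id': [0, 1, 2],
})

control_table = pandas.DataFrame({
    'Part': ["Sepal", "Sepal", "Petal", "Petal"],
    'Measure': ["Length", "Width", "Length", "Width"],
    'Value': ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"],
})


def rows_to_blocks_sketch(d, control_table, control_table_keys, record_keys):
    """Hypothetical helper: one output row per (input row, control-table row)."""
    # control-table columns that are not keys hold the stand-in column names
    value_columns = [c for c in control_table.columns
                     if c not in control_table_keys]
    pieces = []
    for _, ct_row in control_table.iterrows():
        piece = d[record_keys].copy()        # carry the record keys along
        for k in control_table_keys:
            piece[k] = ct_row[k]             # label the row inside the block
        for v in value_columns:
            piece[v] = d[ct_row[v]].values   # look up the named source column
        pieces.append(piece)
    res = pandas.concat(pieces, ignore_index=True)
    return res.sort_values(record_keys + control_table_keys).reset_index(drop=True)


blocks_sketch = rows_to_blocks_sketch(
    iris, control_table,
    control_table_keys=['Part', 'Measure'],
    record_keys=['id', 'Species'])
print(blocks_sketch)
```

Each of the 3 input rows becomes a 4-row block, so the sketch yields 12 rows in total.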

Now we can transform our original row-record oriented data into general block records.  To do this we specify a `RecordMap` using our record specification to describe the outgoing record structure. The incoming record structure is implicitly assumed to be single-row records, unless we specify otherwise (using the `blocks_in` argument).

In [5]:
mp_to_blocks = data_algebra.cdata_impl.RecordMap(blocks_out=record_spec)
print(str(mp_to_blocks))

Transform row records of the form:
  record_keys: ['id', 'Species']
 ['id', 'Species', 'Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width']
to block records of structure:
RecordSpecification
   record_keys: ['id', 'Species']
   control_table_keys: ['Part', 'Measure']
   control_table:
       Part Measure         Value
   0  Sepal  Length  Sepal.Length
   1  Sepal   Width   Sepal.Width
   2  Petal  Length  Petal.Length
   3  Petal   Width   Petal.Width



Entries in the `RecordSpecification` control table that are not in the columns named by `control_table_keys` are stand-in values that show where real values will later map. This is easiest to see by continuing the example.

So let's apply our specified transform.

In [6]:
arranged_blocks = mp_to_blocks.transform(iris)
arranged_blocks

Unnamed: 0,id,Species,Part,Measure,Value
0,0,setosa,Petal,Length,1.4
1,0,setosa,Petal,Width,0.2
2,0,setosa,Sepal,Length,5.1
3,0,setosa,Sepal,Width,3.5
4,1,setosa,Petal,Length,1.4
5,1,setosa,Petal,Width,0.2
6,1,setosa,Sepal,Length,4.9
7,1,setosa,Sepal,Width,3.0
8,2,setosa,Petal,Length,1.3
9,2,setosa,Petal,Width,0.2


We see the operation has been performed for us. Notice we specify the transform *declaratively*, with data structures carrying descriptions of what we want, instead of having to build a sequence of verbs that realize the transformation.
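For contrast, a verb-based version of the same reshaping in plain pandas might look like the sketch below. It is only an illustration: it assumes every measurement column is named `Part.Measure` so the name can be split on the dot, an assumption the control table never needs. The variable names and the hand-typed data are illustrative.

```python
import pandas

# hand-typed copy of the small iris table shown earlier
iris = pandas.DataFrame({
    'Sepal.Length': [5.1, 4.9, 4.7],
    'Sepal.Width': [3.5, 3.0, 3.2],
    'Petal.Length': [1.4, 1.4, 1.3],
    'Petal.Width': [0.2, 0.2, 0.2],
    'Species': ['setosa', 'setosa', 'setosa'],
    'id': [0, 1, 2],
})

# verb 1: unpivot the four measurement columns into (column, Value) pairs
tall = iris.melt(
    id_vars=['id', 'Species'],
    value_vars=['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width'],
    var_name='column',
    value_name='Value')

# verb 2: split "Part.Measure" column names at the first literal dot
parts = tall['column'].str.partition('.')
tall['Part'] = parts[0]
tall['Measure'] = parts[2]

# verb 3: select, order, and sort to match the layout of the target table
tall = (
    tall[['id', 'Species', 'Part', 'Measure', 'Value']]
    .sort_values(['id', 'Species', 'Part', 'Measure'])
    .reset_index(drop=True)
)
print(tall)
```

The steps work, but the intended record shape is spread across three operations rather than stated once in a table that can be printed and inspected.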

An inverse transform is simply expressed by reversing the roles of the `blocks_out` and `blocks_in` arguments. In this case the output is row-records, as we didn't specify an outgoing block structure with `blocks_out`.

In [7]:
mp_to_rows = data_algebra.cdata_impl.RecordMap(blocks_in=record_spec)
print(str(mp_to_rows))

Transform block records of structure:
RecordSpecification
   record_keys: ['id', 'Species']
   control_table_keys: ['Part', 'Measure']
   control_table:
       Part Measure         Value
   0  Sepal  Length  Sepal.Length
   1  Sepal   Width   Sepal.Width
   2  Petal  Length  Petal.Length
   3  Petal   Width   Petal.Width
to row records of the form:
  record_keys: ['id', 'Species']
 ['id', 'Species', 'Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width']



In [8]:
arranged_rows = mp_to_rows.transform(arranged_blocks)
arranged_rows

Unnamed: 0,id,Species,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
0,0,setosa,5.1,3.5,1.4,0.2
1,1,setosa,4.9,3.0,1.4,0.2
2,2,setosa,4.7,3.2,1.3,0.2
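
The inverse direction can be read off the same control table: for each control-table row, pick the block rows whose keys match, and attach their values to the record under the stand-in column name. The sketch below shows this in plain pandas. The helper `blocks_to_rows_sketch` is hypothetical and is not the data_algebra implementation; the block data is typed in by hand from the tables above.

```python
import pandas

# hand-typed block records for the three iris rows used in this example
blocks = pandas.DataFrame({
    'id': [0] * 4 + [1] * 4 + [2] * 4,
    'Species': ['setosa'] * 12,
    'Part': ['Petal', 'Petal', 'Sepal', 'Sepal'] * 3,
    'Measure': ['Length', 'Width'] * 6,
    'Value': [1.4, 0.2, 5.1, 3.5,
              1.4, 0.2, 4.9, 3.0,
              1.3, 0.2, 4.7, 3.2],
})

control_table = pandas.DataFrame({
    'Part': ["Sepal", "Sepal", "Petal", "Petal"],
    'Measure': ["Length", "Width", "Length", "Width"],
    'Value': ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"],
})


def blocks_to_rows_sketch(d, control_table, control_table_keys, record_keys):
    """Hypothetical helper: collapse each block of rows into a single row."""
    value_columns = [c for c in control_table.columns
                     if c not in control_table_keys]
    # start with one row per record
    res = d[record_keys].drop_duplicates().reset_index(drop=True)
    for _, ct_row in control_table.iterrows():
        # block rows whose keys match this control-table row
        mask = pandas.Series(True, index=d.index)
        for k in control_table_keys:
            mask = mask & (d[k] == ct_row[k])
        for v in value_columns:
            # rename the value column to the stand-in name, then attach it
            piece = d.loc[mask, record_keys + [v]].rename(
                columns={v: ct_row[v]})
            res = res.merge(piece, on=record_keys, how='left')
    return res


rows_sketch = blocks_to_rows_sketch(
    blocks, control_table,
    control_table_keys=['Part', 'Measure'],
    record_keys=['id', 'Species'])
print(rows_sketch)
```

The value columns come out in control-table order, which matches the column order of the row records printed above.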


Arbitrary record to record transforms can be specified by setting both `blocks_in` (to describe incoming structure) and `blocks_out` (to describe outgoing structure) at the same time.  data_algebra also implements all the transform steps in databases using `SQL` (via `data_algebra.db_model.row_recs_to_blocks_query()` and `data_algebra.db_model.blocks_to_row_recs_query()`).