A minor issue with the usual manner of using the data algebra
is that one has to specify the table twice: once to get the description
of the table, and once to trigger execution. This is by design.
Separating the operator specification from its use is a major performance
improvement, as it allows the system to know whether we are composing
operations or sequencing them. It also allows re-use
of operator pipelines on related tables, and makes pipelines
much easier to export to SQL.
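To make the separation concrete, here is a toy sketch in plain pandas. The class `DeferredPipeline` and its methods are hypothetical and are not part of data_algebra. Specifying the pipeline only records operations against column names. Nothing touches data until `transform()` is called, so the same specification can be applied to any table with compatible columns. The sketch does not attempt the composition optimizations mentioned above; it only shows the specify-then-execute split.

```python
import pandas


class DeferredPipeline:
    """Hypothetical toy pipeline: records operations now, runs them later."""

    def __init__(self, column_names, steps=()):
        # The "description": column names available at this point in the pipeline.
        self.column_names = list(column_names)
        # Recorded operations; nothing runs at specification time.
        self.steps = list(steps)

    def drop_columns(self, cols):
        to_drop = list(cols)
        # Validate against the description alone; no data is needed here.
        unknown = set(to_drop) - set(self.column_names)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        kept = [c for c in self.column_names if c not in to_drop]
        # Composition: return a new pipeline with one more recorded step.
        return DeferredPipeline(kept, self.steps + [lambda t: t.drop(columns=to_drop)])

    def transform(self, table):
        # Execution: apply the recorded steps, in order, to the given table.
        for step in self.steps:
            table = step(table)
        return table


# Specify once, from column names only ...
ops = DeferredPipeline(['x', 'y', 'z']).drop_columns(['z'])

# ... then apply the same pipeline to two related tables.
d1 = pandas.DataFrame({'x': [1, 1, 2], 'y': [5, 4, 3], 'z': [6, 7, 8]})
d2 = pandas.DataFrame({'x': [10, 20], 'y': [0, 1], 'z': [-1, -2]})
r1 = ops.transform(d1)
r2 = ops.transform(d2)
```

Because `DataFrame.drop` returns a new frame, the input tables are left unchanged and the pipeline can be re-run as often as needed.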

Let's look at this issue.

The usual way of using the data algebra is as follows:

In [1]:
import pandas
from data_algebra.data_ops import *

d = pandas.DataFrame({
  'x': [1, 1, 2],
  'y': [5, 4, 3],
  'z': [6, 7, 8],
})

d

   x  y  z
0  1  5  6
1  1  4  7
2  2  3  8


In [2]:
ops = describe_table(d, table_name='d', keep_all=True). \
    drop_columns(['z'])

res_1 = ops.transform(d)

res_1

   x  y
0  1  5
1  1  4
2  2  3


The point is that some users may be dissatisfied with
having to specify the table twice: once in `describe_table()`
and once in `.transform()`.

Although this separation is intentional, the repetition can be avoided. With `keep_all=True`, the operator platform
captures a copy of the original table. A pipeline augmented this way supports an `.ex()` call, which
executes the pipeline on the captured table copies.



In [3]:
ops.ex()

   x  y
0  1  5
1  1  4
2  2  3
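The capture-and-execute idea behind `keep_all=True` and `.ex()` can be sketched the same way. This is again a hypothetical toy, not data_algebra's implementation: `CapturingPipeline`, its `describe()` class method, and its `keep_all` flag only mirror the names used above by analogy. When asked, the description keeps a copy of the source table, each derived pipeline carries that copy along, and `ex()` runs the recorded steps on it.

```python
import pandas


class CapturingPipeline:
    """Hypothetical toy: a deferred pipeline that can optionally keep its source table."""

    def __init__(self, column_names, steps=(), captured=None):
        self.column_names = list(column_names)
        self.steps = list(steps)
        self.captured = captured  # a copy of the source table, or None

    @classmethod
    def describe(cls, table, keep_all=False):
        # Always record column names; with keep_all=True also keep a copy of the data.
        return cls(table.columns, captured=table.copy() if keep_all else None)

    def drop_columns(self, cols):
        to_drop = list(cols)
        kept = [c for c in self.column_names if c not in to_drop]
        # The captured table travels along with each derived pipeline.
        return CapturingPipeline(kept, self.steps + [lambda t: t.drop(columns=to_drop)], self.captured)

    def transform(self, table):
        # Apply the recorded steps, in order, to an explicitly supplied table.
        for step in self.steps:
            table = step(table)
        return table

    def ex(self):
        # Execute on the captured copy, so the table need not be passed again.
        if self.captured is None:
            raise ValueError("no captured table; build the pipeline with keep_all=True")
        return self.transform(self.captured)


d = pandas.DataFrame({'x': [1, 1, 2], 'y': [5, 4, 3], 'z': [6, 7, 8]})
ops = CapturingPipeline.describe(d, keep_all=True).drop_columns(['z'])
res = ops.ex()
```

In this sketch the captured copy costs memory proportional to the table, so leaving capture off unless requested seems a reasonable default.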
