# StaticFrame from the Ground Up: Getting Started with Immutable DataFrames
### Christopher Ariza

# Introduction

Back in 2017 I found myself frequently asking: "is Pandas a suitable foundation for production library code?" While Pandas is well-known for its utility in data science, I consistently found its flexibility and implicit behaviors a detriment in building library code for production systems.

This led me to create StaticFrame, an alternative dataframe library built on an immutable data model. After years of development and use, I am confident that StaticFrame reduces opportunities for error and leads to more maintainable code. While not yet always more efficient than Pandas, in some areas StaticFrame offers very significant improvements in run-time and memory usage. Beyond common functionality, StaticFrame offers a more explicit and consistent API, novel multi-Frame containers and processors, and support for high-performance serialization through the NPZ format.

This notebook is designed to provide a rapid, breadth-first survey of StaticFrame, highlighting how StaticFrame relates to Pandas.

# What is a DataFrame?
* A 2D table with labelled axes (rows, columns)
    * Labels stay with data after selection
    * Operations align on labels
    * Can reindex axis based on labels
* Distinct from a simple 2D array
    * Labels can be any (hashable) type
    * Types can be heterogeneous by column
* Just like a 2D array, a dataframe supports binary operators and broadcasting
    * Can multiply a dataframe by a constant, 1D, or 2D container
    * All operations align on labels, not order
* A high-level language (Python) can be used to implement dataframe functionality over a low-level, high-performance array library (NumPy)
    * A dataframe manages underlying arrays segmented by dtype
    * Index objects assigned to each axis translate labels to array positions
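
The label-to-position translation and label-based alignment described above can be sketched in a few lines. This is only an illustration of the idea, not how StaticFrame or Pandas implement it: the `LabelIndex` class and the `aligned_multiply()` function are hypothetical names invented here, and for brevity only labels shared by both operands are kept.

```python
import numpy as np


class LabelIndex:
    """Hypothetical, minimal index: maps unique hashable labels to positions."""

    def __init__(self, labels):
        self.labels = tuple(labels)
        self._positions = {label: pos for pos, label in enumerate(self.labels)}
        if len(self._positions) != len(self.labels):
            raise ValueError('labels must be unique')

    def __contains__(self, label):
        return label in self._positions

    def position(self, label):
        return self._positions[label]


def aligned_multiply(values_a, index_a, values_b, index_b):
    """Multiply two labelled 1D arrays, matching elements by label, not order."""
    shared = [label for label in index_a.labels if label in index_b]
    pos_a = [index_a.position(label) for label in shared]
    pos_b = [index_b.position(label) for label in shared]
    return shared, values_a[pos_a] * values_b[pos_b]


labels, product = aligned_multiply(
    np.array([1, 2, 3]), LabelIndex(('x', 'y', 'z')),
    np.array([10, 100, 1000]), LabelIndex(('z', 'y', 'x')),
)
print(labels, product.tolist())  # ['x', 'y', 'z'] [1000, 200, 30]
```

Although the second operand is stored in reverse order, each element is paired with the element carrying the same label.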

# A Brief History of DataFrames

* 1991: earliest implementation of a dataframe in the S language
* 2009: Pandas 0.1 released
* 2018: StaticFrame 0.1 released
* There are presently a number of dataframe libraries in Python and other languages


# Why Not Just Use Pandas?

* Pandas supports in-place mutation
* Pandas API has many inconsistencies and non-orthogonal parameters
* Pandas only optionally enforces unique index labels (`verify_integrity` defaults to `False`)
* Pandas does not support all NumPy types (Unicode, `datetime64`)
* Pandas removed multi-frame containers (i.e., the `pd.Panel`)

* See also: https://dev.to/flexatone/ten-reasons-to-use-staticframe-instead-of-pandas-4aad

# Learning StaticFrame from Pandas

* Nearly everything you can do with Pandas you can do with StaticFrame
* There are few things Pandas does that StaticFrame does not do
    * No internal graphing / plotting support
    * Few internal implementations of calculations available elsewhere (NumPy, SciPy)
* Much of what you already know from Pandas will directly translate
    * Many interfaces and methods are identical
    * StaticFrame has more numerous, narrower interfaces with keyword-only arguments
    * StaticFrame follows hierarchical naming
* You can go back and forth
    * `sf.Frame.to_pandas()`
    * `sf.Frame.from_pandas()`

# Learning StaticFrame from Examples
* Examples used here are intentionally compact
* Examples mostly on `sf.Frame`: interfaces on `sf.Series` are often identical

# StaticFrame Development

* Development
    * Code contributions from a small pool of developers
    * Feature and design contributions from multiple internal teams
    * New contributors are welcome!
* Quality & Test
    * 100% test coverage
    * Robust CI/CD with MyPy, Pylint, and multiplatform test
* Documentation
    * Fully code-generated API documentation (https://static-frame.readthedocs.io)
    * Every object exposes API via `interface` attribute
* Very Few Core Dependencies
    * NumPy
    * Team-maintained CPython extension libraries: `automap`, `arraykit`
    * Additional, optional dependencies provide support for different serialization formats
* Releases
    * Stable API within each minor release (i.e., 0.9 may introduce backward incompatibilities relative to 0.8)
    * 1.0 Pending `arraykit` implementation of delimited file readers to fix known issues, maybe by end of 2022


# Installing & Importing

* Available via pip, conda-forge
* `import static_frame as sf`


In [1]:
import static_frame as sf
import numpy as np

# The `sf.Frame` & the `sf.Series`
* A `sf.Series` is a 1D array (of a single dtype) with labels 
* A `sf.Frame` is a 2D container (of one or more columnar dtypes) with row and column labels
* When extracting a row or column from a `sf.Frame`, we get a `sf.Series`.
* Support for higher-dimensional data
    * Use hierarchical indices on a 2D container (i.e., the `sf.IndexHierarchy`)
    * Use multi-`sf.Frame` containers (i.e., the `sf.Bus`)

# Anatomy of a Frame

* A `sf.Frame` wraps 1D and 2D NumPy arrays
* NumPy dtypes are partitioned by column
* Each axis is labelled with an `sf.Index` (or subclass)
    * Row labels available via `sf.Frame.index`
    * Column labels available via `sf.Frame.columns`
    * An `sf.IndexAutoFactory` can be used to create integer labels
* Hashable metadata via `name` attributes on all containers
    * `sf.Frame.name` (StaticFrame only)
    * `sf.Frame.index.name`
    * `sf.Frame.columns.name`

# Getting Data In & Out: Constructors & Exporters

* Constructors always live on containers (i.e., `sf.Frame`)
    * `pd.read_csv()`, `pd.DataFrame.from_records()`
    * `sf.Frame.from_csv()`, `sf.Frame.from_records()`
* Explicit constructors with narrow functionality
    * `pd.DataFrame()` supports a single element, or a column of elements
    * `sf.Frame.from_element()`, `sf.Frame.from_elements()`
* Support for common serialization formats
    * `pd.read_excel()`, `pd.read_csv()`, `pd.read_parquet()`
    * `sf.Frame.from_xlsx()`, `sf.Frame.from_csv()`, `sf.Frame.from_parquet()`
* Serialization methods exclusive to StaticFrame
    * NPZ and NPY formats faster than Parquet with comparable file sizes
    * Encodes all `sf.Frame` characteristics
    * NPY supports memory mapping out-of-core data
    * `sf.Frame.to_npz()`, `sf.Frame.from_npz()`

In [2]:
# Creating a Frame from row iterables
f = sf.Frame.from_records(((True, 20, '1954-11-02'), 
                           (False, 30, '2020-04-28')))
# Force a string representation 
print(str(f))

<Frame>
<Index> 0      1       2          <int64>
<Index>
0       True   20      1954-11-02
1       False  30      2020-04-28
<int64> <bool> <int64> <<U10>


# String Representations

* `sf.Frame.__repr__()` provides more information than `pd.DataFrame.__repr__()`
* Shows types and `name` of `Frame`, `.index`, and `.columns`
* Shows NumPy dtypes for each column, `.index`, and `.columns`
* In terminal environments can use colors for types, dtypes
* Completely configurable with `sf.DisplayConfig`

In [3]:
# Creating a Frame with Frame subclass, Index subclasses, name attributes
f = sf.FrameGO.from_records(((True, 20, '1954-11'), (False, 30, '2020-04')), 
        index=sf.IndexYear(('1954', '2020'), name='year'),
        columns=('A', 'B', 'C'),
        name='records', 
        )
print(str(f))

<FrameGO: records>
<IndexGO>          A      B       C       <<U1>
<IndexYear: year>
1954               True   20      1954-11
2020               False  30      2020-04
<datetime64[Y]>    <bool> <int64> <<U7>


# Representation in Jupyter Notebooks

* Default is an HTML table representation
* `name` attributes, type, and dtype information are hidden by default

In [4]:
f1 = sf.Frame.from_records(((True, 20, '1954-11-02'), (False, 30, '2020-04-28')), 
                            index=tuple('xy'), columns=tuple('ABC'))
f1

Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


# Finding All Constructors

* Every StaticFrame container has an `.interface` attribute
* `.interface` returns a `sf.Frame` of the complete API
* The same representation is used to populate API overview: https://static-frame.readthedocs.io/en/latest/api_overview/frame.html


In [5]:
# Using the interface attribute to show the signature of all constructors
f = sf.Frame.interface
f.loc[f['group'] == 'Constructor'].head(4)

Unnamed: 0,cls_name,group,doc
"__init__(data, *, index, columns, ...)",Frame,Constructor,Initializer. Args: data: Default Frame initialization requires typed data such a...
"from_arrow(value, *, index_depth, index_name_depth_level, ...)",Frame,Constructor,Realize a Frame from an Arrow Table. Args: value: A pyarrow.Table instance. inde...
"from_clipboard(*, delimiter, index_depth, index_column_first, ...)",Frame,Constructor,Create a Frame from the contents of the clipboard (assuming a table is stored as...
"from_concat(frames, *, axis, union, ...)",Frame,Constructor,Concatenate multiple Frames into a new Frame. If index or columns are provided a...


# Constructors Are Class Methods
* Pandas places some constructors on the `pd` name space
* All StaticFrame constructors are class methods on classes
* Creating a Frame from concatenation 
    * Pandas: `pd.concat()`
    * StaticFrame: `sf.Frame.from_concat()`, `sf.Frame.from_concat_items()`
* Creating a Frame from other `sf.Frame`s by overlaying on missing values
    * Pandas offers an instance method: `pd.DataFrame.combine_first()`
    * StaticFrame offers a class method for combining one or more `sf.Frame`: `sf.Frame.from_overlay()`

# `sf.Frame` and `sf.Series` Are Like Python `dict`
* Both containers have `items()` and `keys()` methods
* Both containers have a `values` attribute (not a method like on `dict`)
* `dict`-like Interfaces are almost the same in Pandas and StaticFrame
* `sf.Frame`:
    * `keys()` returns column labels
    * `items()` returns pairs of label, column `sf.Series`
    * `values` returns a homogenized array (often requiring dtype casting)
    * `iter(sf.Frame)` iterates `keys()`
* `sf.Series`:
    * `keys()` returns labels
    * `items()` returns pairs of label, element values
    * `values` returns the 1D immutable array (a no-copy operation)
    * `iter(sf.Series)` iterates `keys()`
* Deviation from Pandas:
    * With Pandas, `iter(pd.Series)` iterates `values`, not `keys()`
    * Pandas deviates from the expected `dict`-like interface

In [6]:
# sf.Frame.keys() iterates column labels
display(f1)
tuple(f1.keys())

Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


('A', 'B', 'C')

In [7]:
# sf.Frame.items() iterates pairs of label, sf.Series
tuple(f1.items())

(('A',
  <Series: A>
  <Index>
  x           True
  y           False
  <<U1>       <bool>),
 ('B',
  <Series: B>
  <Index>
  x           20
  y           30
  <<U1>       <int64>),
 ('C',
  <Series: C>
  <Index>
  x           1954-11-02
  y           2020-04-28
  <<U1>       <<U10>))

In [8]:
# sf.Frame.values is an array representation of the sf.Frame
f1.values

array([[True, 20, '1954-11-02'],
       [False, 30, '2020-04-28']], dtype=object)

# Selection Interfaces
* StaticFrame implements all NumPy and Pandas-style selection routines
* Selection interfaces
    * `loc[]`: use labels
    * `iloc[]`: use integer position (from zero)
    * `bloc[]`: use Boolean indicator (StaticFrame only)
* NumPy-style per-axis selection values 
    * A single label (a tuple is a single label)
    * A list of labels (must be a list to distinguish from a tuple label)
    * A slice of labels
    * A 1D Boolean array


# Selection Interfaces on `sf.Frame`
    
* `sf.Frame[]`: root `__getitem__()` selection 
    * `pd.DataFrame[]` selects by column labels, or row and column labels, or by 2D Boolean array
    * `sf.Frame[]` is exclusively column selection
* `sf.Frame.loc[]`: select rows, optionally columns, by label (same as Pandas)
* `sf.Frame.iloc[]`: select rows, optionally columns, by integer position (same as Pandas)
* `sf.Frame.bloc[]`: select with a 2D Boolean array (StaticFrame only)

In [9]:
f1 = sf.Frame.from_records(((True, 20, '1954-11-02'), (False, 30, '2020-04-28')), 
                            index=tuple('xy'), columns=tuple('ABC'))
display(f1)
f1['B'] # Select a column with a single label

Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


0,1
x,20
y,30


In [10]:
display(f1.columns == 'C')
# Select columns with a Boolean indicator
f1[f1.columns == 'C'] 

array([False, False,  True])

Unnamed: 0,C
x,1954-11-02
y,2020-04-28


In [11]:
f1.loc['y':, ['A', 'C']] # Select a row with a slice and list of labels

Unnamed: 0,A,C
y,False,2020-04-28


In [12]:
f1.iloc[-1, -1] # Select an element with integer positions

'2020-04-28'

In [13]:
display(f1.isin([20, '2020-04-28']))
f1.bloc[f1.isin([20, '2020-04-28'])] # Selecting non-contiguous values

Unnamed: 0,A,B,C
x,False,True,False
y,False,False,True


0,1
"('x', 'B')",20
"('y', 'C')",2020-04-28


# Mixing `loc` and `iloc` Selection

* `sf.ILoc` (StaticFrame only) permits embedding `iloc`-style selection in a `loc` selection
* There exists a related interface for embedding hierarchical selection in `loc` selection
    * `sf.HLoc` (similar to `pd.IndexSlice`)
    * For use on axis with `sf.IndexHierarchy`

In [14]:
display(f1)
f1.loc[sf.ILoc[-1], ['A', 'C']] # Get the last row, columns A and C

Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


0,1
A,False
C,2020-04-28


# Dropping
* The inverse of selection is dropping 
* `sf.Frame.drop` interface exposes the same selection interfaces as attributes
    * `sf.Frame.drop[]`
    * `sf.Frame.drop.loc[]`
    * `sf.Frame.drop.iloc[]`
* This approach of interfaces that expose sub-component interfaces is common in StaticFrame

In [15]:
# Drop two columns
display(f1)
f1.drop[['A', 'C']]

Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


Unnamed: 0,B
x,20
y,30


In [16]:
# Drop the last row and drop column "A"
f1.drop.loc[sf.ILoc[-1], 'A']

Unnamed: 0,B,C
x,20,1954-11-02


# Handling Missing Values
* Missing values are `None` and `np.nan` (same as Pandas)
* Boolean indicators (same as Pandas)
    * `sf.Frame.isna()`
    * `sf.Frame.notna()`
* Replacing missing values with new containers (same as Pandas)
    * `sf.Frame.dropna()`
    * `sf.Frame.fillna()`

# Handling Falsy Values
* Sometimes we want to treat `0` or `''` or `()` as missing
* Functions corresponding to `*na` functions (StaticFrame only)
    * `sf.Frame.isfalsy()`
    * `sf.Frame.notfalsy()`
    * `sf.Frame.dropfalsy()`
    * `sf.Frame.fillfalsy()`

# Fill Missing Values Along an Axis
* Fill the first or last non-missing observation up to the `limit` parameter.
    * Related functionality provided in `pd.DataFrame.fillna()`
    * `sf.Frame.fillna_forward()`
    * `sf.Frame.fillna_backward()`
* Fill the leading or trailing missing values with a provided value
    * StaticFrame only
    * `sf.Frame.fillna_leading()`
    * `sf.Frame.fillna_trailing()`

# Fill Falsy Values Along an Axis
* StaticFrame only
* Fill the first or last non-falsy observation up to the `limit` parameter.
    * `sf.Frame.fillfalsy_forward()`
    * `sf.Frame.fillfalsy_backward()`
* Fill the leading or trailing falsy values with a provided value
    * `sf.Frame.fillfalsy_leading()`
    * `sf.Frame.fillfalsy_trailing()`

# Immutability and "No-Copy" Operations
* Immutability reduces opportunities for errors 
* NumPy provides no-copy "views" of array data when possible
* With immutable arrays, we can pass around views without making defensive copies
* Examples:
    * Renaming an `sf.Frame` is no-copy
    * Relabelling `index` or `columns` does not copy underlying arrays
    * Horizontal concatenation of same-index components is no-copy
* Pandas support for mutation, combined with NumPy views, leads to the commonly observed Pandas `SettingWithCopyWarning`

# Assignment with Immutable Frames
* Pandas permits in-place assignment and mutation to all types of selections
    * `pd.DataFrame.loc['x', 'B':] = 1.0`
* StaticFrame offers an `assign` interface that defines a selection that is then called with a value to assign
* The value to assign can be an element or labelled data (i.e., `sf.Series`, `sf.Frame`)
* Example: `sf.Frame.assign.loc['x', 'B':](1.0)`
    * Returns a new container
    * Unchanged columns will be views and re-used (no-copy)

In [17]:
# Assigning a value to a slice in a single row
display(f1)
f1.assign.loc['x', 'B':](-1)

Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


Unnamed: 0,A,B,C
x,True,-1,-1
y,False,30,2020-04-28


In [18]:
# Assigning a Series to a column, matching on label
f1.assign['B'](sf.Series(('y', 'x'), index=('y', 'x')))

Unnamed: 0,A,B,C
x,True,x,1954-11-02
y,False,y,2020-04-28


# Grow-Only Mutation
* Pandas permits growing a DataFrame by columns (efficient) and rows (very inefficient)
* The `sf.FrameGO` permits grow-only column addition or whole-`sf.Frame` extension
* While the `sf.FrameGO` container is mutable, underlying array data always remains immutable
    * Going from an `sf.Frame` to an `sf.FrameGO` is a no-copy operation
    * `sf.FrameGO` are often used within a narrow scope (i.e., a single function)
* Growing rows is never permitted (use `sf.Frame.from_concat()` with collected rows)

In [19]:
# Adding a column to a FrameGO; column arrays in f1 are not copied
display(f1)
f2 = f1.to_frame_go()
f2['D'] = (34, 87)
f2

Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


Unnamed: 0,A,B,C,D
x,True,20,1954-11-02,34
y,False,30,2020-04-28,87


In [20]:
# Extending a FrameGO with another Frame
# On aligned indices this is a no-copy operation from f2, f3
f3 = (f1[['A', 'B']] * 100).relabel(columns=lambda l: l.lower())
f2.extend(f3)
f2

Unnamed: 0,A,B,C,D,a,b
x,True,20,1954-11-02,34,100,2000
y,False,30,2020-04-28,87,0,3000


# A Family of `sf.Frame`

* Pandas has only one `DataFrame` class
* StaticFrame has a family
    * `sf.Frame`
    * `sf.FrameGO`: a grow-only `sf.Frame`
    * `sf.FrameHE`: a hashable `sf.Frame`
        * HE for `__hash__` and `__eq__`, the methods implemented to support hashability
        * Some hashing scenarios may require a full values comparison for lookup
* Methods exist to easily convert between all three (always a no-copy operation)
    * `sf.Frame.to_frame_go()`
    * `sf.Frame.to_frame_he()`
    * `sf.FrameGO.to_frame()`
    * `sf.FrameGO.to_frame_he()`
    * `sf.FrameHE.to_frame()`
    * `sf.FrameHE.to_frame_go()`


In [21]:
# A FrameHE as a key in a dictionary
f = sf.Frame(np.arange(4).reshape(2, 2)).to_frame_he()
d = {f: True} 
f in d

True

# Changing Columnar dtypes

* `sf.Frame.astype()` can be used to retype an entire Frame (same as Pandas)
* Can use column selection to isolate targets
    * `sf.Frame.astype[['A', 'B']](float)`
    * Similar to `sf.Frame.drop`, `sf.Frame.assign` interfaces
* Returns a new `sf.Frame`
* Unaffected columns will not be copied

In [22]:
# Converting two columns to a float dtype
display(f1)
f1.astype[['A', 'B']](float)

Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


Unnamed: 0,A,B,C
x,1.0,20.0,1954-11-02
y,0.0,30.0,2020-04-28


# Full Support for All NumPy dtypes
* NumPy is the foundation of StaticFrame and Pandas
* Pandas only supports a subset of NumPy dtypes; StaticFrame supports all
* NumPy's fixed-size Unicode arrays
    * Optimal when strings are diverse and of similar size
    * Pandas always converts these to object arrays of Python strings
* NumPy's `datetime64` type
    * Fast datetime representation with units defining resolution (from year to attosecond)
    * Pandas coerces any `datetime64` to nanosecond units
    * StaticFrame permits using year, date, or any `datetime64` unit
    * See also: https://www.youtube.com/watch?v=jdnr7sgxCQI

In [23]:
# By default, StaticFrame always shows all types and dtypes
print(str(f1))
# Can get a Series of dtypes labelled by column
f1.dtypes

<Frame>
<Index> A      B       C          <<U1>
<Index>
x       True   20      1954-11-02
y       False  30      2020-04-28
<<U1>   <bool> <int64> <<U10>


0,1
A,bool
B,int64
C,<U10


In [24]:
# Can convert Unicode dtypes to Python string object (as Pandas)
print(str(f1.astype['C'](object)))

<Frame>
<Index> A      B       C          <<U1>
<Index>
x       True   20      1954-11-02
y       False  30      2020-04-28
<<U1>   <bool> <int64> <object>


In [25]:
# Can convert strings to NumPy datetime64 date objects
print(str(f1.astype['C'](np.datetime64)))

<Frame>
<Index> A      B       C               <<U1>
<Index>
x       True   20      1954-11-02
y       False  30      2020-04-28
<<U1>   <bool> <int64> <datetime64[D]>


# A Family of `sf.Index`

* To use `datetime64` as index labels, use a `datetime64` `sf.Index` subclass
    * `sf.IndexDate`, `sf.IndexYearMonth`, `sf.IndexYear`, etc.
    * Provides robust translation from Python date / datetime objects
    * Provides partial selection with less granular date units
    * Provides alternative constructors to build date ranges
* Hierarchical indices with `sf.IndexHierarchy`
* Many interfaces expose `index_constructor` arguments to specify what kind of index to make.
    

In [26]:
# Transfer a column to an index
f4 = f1.set_index('C', drop=True, index_constructor=sf.IndexDate)
print(str(f4))

<Frame>
<Index>         A      B       <<U1>
<IndexDate: C>
1954-11-02      True   20
2020-04-28      False  30
<datetime64[D]> <bool> <int64>


In [27]:
# Selection with a less granular unit (year)
f4.loc['2020']

Unnamed: 0,A,B
2020-04-28,False,30


In [28]:
# sf.IndexDate understands Python datetime / date objects
import datetime
f4.loc[datetime.date(1954, 11, 2)]

0,1
A,True
B,20


In [29]:
# Removing an index, similar to Pandas pd.DataFrame.reset_index()
print(str(f4.unset_index()))

<Frame>
<Index> C               A      B       <<U1>
<Index>
0       1954-11-02      True   20
1       2020-04-28      False  30
<int64> <datetime64[D]> <bool> <int64>


# Rename, Reindex, Relabel
* Changing "outer" attributes of a `sf.Frame`
* Will always try to reuse as much data as possible
* `rename()` sets the `name` attribute on all containers
    * `pd.DataFrame.rename()` relabels the axis, `pd.Series.rename()` sets the name of the container
    * `sf.Frame.rename()`, `sf.Series.rename()` all do the same thing
    * Always a no-copy operation
* `reindex()` applies new index, aligning to the previous index
    * Can transform the shape of the `sf.Frame`
    * Similar to `pd.DataFrame.reindex()`
    * Conform the old index to the new index, where matching labels retain their data
    * New labels will introduce missing values (provided with a `fill_value`)
* `relabel()` applies a new index, regardless of alignment to previous index
    * Cannot change the shape of the `sf.Frame`
    * Three values accepted per axis
        * Can map old to new with `dict`
        * Can process old to new with a function
        * Can replace with a new `sf.Index` or iterable
    * Always a no-copy operation

In [30]:
# Renaming the sf.Frame, the index, and the columns
print(f1.rename('p', index='q', columns='r'))

<Frame: p>
<Index: r> A      B       C          <<U1>
<Index: q>
x          True   20      1954-11-02
y          False  30      2020-04-28
<<U1>      <bool> <int64> <<U10>


In [31]:
# Reindexing index and columns, filling new values with ""
f1.reindex(index=tuple('yz'), columns=tuple('ACD'), fill_value='')

Unnamed: 0,A,C,D
y,False,2020-04-28,
z,,,


In [32]:
# Relabelling index by assignment, columns by function (can also use a mapping)
f1.relabel(index=(-1, -2), columns=lambda l: l.lower())

Unnamed: 0,a,b,c
-1,True,20,1954-11-02
-2,False,30,2020-04-28


# Iterating Components of an `sf.Frame`
* Iterating elements: `Frame.iter_element()`
* Iterating rows or columns:
    * Specify axis, 1 for by row, 0 for column
    * Distinct methods determine what container to return
        * `Frame.iter_series()`
        * `Frame.iter_tuple()`
        * `Frame.iter_array()`
    * The "lighter" the container, the better the performance

In [33]:
# Create an sf.FrameGO and add a column
f5 = sf.FrameGO(np.arange(18).reshape(6,3), columns=tuple('ABC'))
f5['D'] = tuple('abbaca')
f5

Unnamed: 0,A,B,C,D
0,0,1,2,a
1,3,4,5,b
2,6,7,8,b
3,9,10,11,a
4,12,13,14,c
5,15,16,17,a


In [34]:
# Iteration of elements proceed row-wise (by default)
tuple(f5.iter_element())[:18]

(0, 1, 2, 'a', 3, 4, 5, 'b', 6, 7, 8, 'b', 9, 10, 11, 'a', 12, 13)

In [35]:
# Iterating sf.Series or array by axis 1 iterates rows; next() gets the first
display(next(iter(f5.iter_series(axis=1))))
next(iter(f5.iter_array(axis=1)))

0,1
A,0
B,1
C,2
D,a


array([0, 1, 2, 'a'], dtype=object)

In [36]:
# Iterating Series or array by axis 0 iterates columns, next() gets the first
display(next(iter(f5.iter_series(axis=0))))
next(iter(f5.iter_array(axis=0)))

0,1
0,0
1,3
2,6
3,9
4,12
5,15


array([ 0,  3,  6,  9, 12, 15])

# Applying Functions & Maps
* Function application implies iteration
* Choose what you want to iterate on and call `apply()`
* Can multi-process / thread with `apply_pool()`
* Can iterate through results with `apply_iter()`
* Can apply a map (i.e., `dict` or `sf.Series`) instead of function
    * `map_all()`: if value not mappable, raise
    * `map_any()`: map what you can, leave the rest unchanged
    * `map_fill()`: map what you can, provide `fill_value` for others

In [37]:
# Using iter_element.apply(), we get back a same-shaped container
f5.iter_element().apply(lambda e: f'--{e}--')

Unnamed: 0,A,B,C,D
0,--0--,--1--,--2--,--a--
1,--3--,--4--,--5--,--b--
2,--6--,--7--,--8--,--b--
3,--9--,--10--,--11--,--a--
4,--12--,--13--,--14--,--c--
5,--15--,--16--,--17--,--a--


In [38]:
# Replacing values with map_any()
f5.iter_element().map_any({0:'', 4:'', 8:'', 'a':''})

Unnamed: 0,A,B,C,D
0,,1.0,2.0,
1,3.0,,5.0,b
2,6.0,7.0,,b
3,9.0,10.0,11.0,
4,12.0,13.0,14.0,c
5,15.0,16.0,17.0,


In [39]:
# Apply a sf.Series-consuming function to each row
# Returns a sf.Series labelled by index
display(f5)
f5[:'C'].iter_series(axis=1).apply(lambda s: s['A'] / s['C'])

Unnamed: 0,A,B,C,D
0,0,1,2,a
1,3,4,5,b
2,6,7,8,b
3,9,10,11,a
4,12,13,14,c
5,15,16,17,a


0,1
0,0.0
1,0.6
2,0.75
3,0.8181818181818182
4,0.8571428571428571
5,0.8823529411764706


In [40]:
# Apply a sf.Series-consuming function to each column
# Returns a sf.Series labelled by columns
display(f5)
f5[:'C'].iter_series(axis=0).apply(lambda s: s[3] / s[5])

Unnamed: 0,A,B,C,D
0,0,1,2,a
1,3,4,5,b
2,6,7,8,b
3,9,10,11,a
4,12,13,14,c
5,15,16,17,a


0,1
A,0.6
B,0.625
C,0.6470588235294118


In [41]:
# Choose the container to iterate depending on your needs
# Iterating arrays is always faster
f5[:'C'].iter_array(axis=0).apply(lambda a: a[3] / a[5])

0,1
A,0.6
B,0.625
C,0.6470588235294118


# Grouping & Windowing

* Grouping and windowing are two different ways of collecting sub-`sf.Frame`s from an `sf.Frame`
* Just another type of `sf.Frame` iterator
* `sf.Frame.iter_group()`
    * Group by unique values in one or more columns (axis 0) or rows (axis 1)
    * Can use `apply()` if reducing to an `sf.Series`
    * `sf.Frame.iter_group_items()` returns pairs of group label, sub-`sf.Frame`
    * Can use an `sf.Batch` for performing operations on sub-Frames like `pd.DataFrameGroupBy`
* `sf.Frame.iter_window()`
    * `sf.Frame.iter_window_items()` returns pairs of window label, sub-`sf.Frame`
    * Can use an `sf.Batch` for performing operations on sub Frames like `pd.Rolling`
    * `sf.Frame.iter_window_array()` available when only array data is needed

In [42]:
f5

Unnamed: 0,A,B,C,D
0,0,1,2,a
1,3,4,5,b
2,6,7,8,b
3,9,10,11,a
4,12,13,14,c
5,15,16,17,a


In [43]:
# Iterate the groups formed by unique values found in column 'D'
it = iter(f5.iter_group('D'))
display(next(it))
display(next(it))
display(next(it))

Unnamed: 0,A,B,C,D
0,0,1,2,a
3,9,10,11,a
5,15,16,17,a


Unnamed: 0,A,B,C,D
1,3,4,5,b
2,6,7,8,b


Unnamed: 0,A,B,C,D
4,12,13,14,c


In [44]:
# Applying a function returns a sf.Series labelled by group 
# drop removes the column or columns used in grouping
f5.iter_group('D', drop=True).apply(lambda f: f.size)

0,1
a,9
b,6
c,3


In [45]:
# Can do operations on all groups (each sf.Frame) with a sf.Batch
# Need to deliver to sf.Batch pairs of label, sf.Frame
sf.Batch(
    f5.iter_group_items('D')
).loc[sf.ILoc[-1], ['A', 'C']].sum().to_frame()

Unnamed: 0,None
a,32
b,14
c,26


In [46]:
# Windowing axis 0 collects rows; many parameters to configure window shape, size, and step
it = iter(f5.iter_window(size=2, step=2, axis=0))
display(next(it))
display(next(it))
display(next(it))

Unnamed: 0,A,B,C,D
0,0,1,2,a
1,3,4,5,b


Unnamed: 0,A,B,C,D
2,6,7,8,b
3,9,10,11,a


Unnamed: 0,A,B,C,D
4,12,13,14,c
5,15,16,17,a


In [47]:
# Windowing axis 1 collects columns
it = iter(f5.iter_window(size=3, step=1, axis=1))
display(next(it))
display(next(it))

Unnamed: 0,A,B,C
0,0,1,2
1,3,4,5
2,6,7,8
3,9,10,11
4,12,13,14
5,15,16,17


Unnamed: 0,B,C,D
0,1,2,a
1,4,5,b
2,7,8,b
3,10,11,a
4,13,14,c
5,16,17,a


In [48]:
# Better performance available by iterating arrays
it = iter(f5.iter_window_array(size=2, step=2))
display(next(it))
display(next(it))
display(next(it))

array([[0, 1, 2, 'a'],
       [3, 4, 5, 'b']], dtype=object)

array([[6, 7, 8, 'b'],
       [9, 10, 11, 'a']], dtype=object)

array([[12, 13, 14, 'c'],
       [15, 16, 17, 'a']], dtype=object)

In [49]:
# Processing window items with an sf.Batch
# Need to deliver to sf.Batch pairs of label, sf.Frame
sf.Batch(f5.iter_window_items(size=2, step=2))[['A', 'C']].mean().to_frame()

Unnamed: 0,A,C
1,1.5,3.5
3,7.5,9.5
5,13.5,15.5


# Processing Collections of Frames
* `sf.Batch` is a general-purpose interface for processing collections of `sf.Frame`
* Consumes an iterator of label, `sf.Frame`
* Methods and selection called on `sf.Batch` are called on each `sf.Frame` in the iterator
* Chained operations permit function pipelining
* Each step can use multi-processing / threading to process each `sf.Frame` 
* More here: https://static-frame.readthedocs.io/en/latest/articles/uhoc.html


# Working with Collections of Frames
* Pandas deprecated the `pd.Panel` for 3D data
* Hierarchical indices incur overhead and force loading all data at once
* The `sf.Bus` provides a novel alternative
    * Offers a `sf.Series`-like interface to collections of `sf.Frame`s
    * Can read from and write to multi-table storage formats
        * XLSX, HDF5, SQLite
            * XLSX authoring similar to Pandas `pd.ExcelWriter`
            * HDF5 authoring similar to Pandas `pd.HDFStore`
        * Zipped archives of CSV, TSV, Parquet, and NPZ
    * When reading from disk, loads lazily, only when data is accessed
    * Optionally unloads eagerly with `max_persist` argument
* A family of higher-order containers
    * The `sf.Yarn` lazily links `sf.Bus`
    * The `sf.Quilt` is a virtual concatenation of the contents of a `sf.Bus`
* More here: https://static-frame.readthedocs.io/en/latest/articles/uhoc.html

In [50]:
# Creating a sf.Bus from an iterable of label, sf.Frame pairs
b = sf.Bus.from_items((('f1', f1), ('f3', f3), ('f5', f5.to_frame())))
b

<Bus>
<Index>
f1      Frame
f3      Frame
f5      Frame
<<U2>   <object>

In [51]:
# When reading from a file store, loading is lazy
# When creating a sf.Bus from in-memory sf.Frame, all are loaded
b.status

Unnamed: 0,loaded,size,nbytes,shape
f1,True,6.0,98.0,"(2, 3)"
f3,True,4.0,32.0,"(2, 2)"
f5,True,24.0,168.0,"(6, 4)"


In [52]:
# Accessing a single element provides an sf.Frame
b['f3']

Unnamed: 0,a,b
x,100,2000
y,0,3000


In [53]:
# Using a sf.Batch, all sf.Frame in a sf.Bus can be processed and/or combined
sf.Batch(b.items()).to_frame(fill_value='')

Unnamed: 0,Unnamed: 1,A,B,C,D,a,b
f1,x,True,20.0,1954-11-02,,,
f1,y,False,30.0,2020-04-28,,,
f3,x,,,,,100.0,2000.0
f3,y,,,,,0.0,3000.0
f5,0,0,1.0,2,a,,
f5,1,3,4.0,5,b,,
f5,2,6,7.0,8,b,,
f5,3,9,10.0,11,a,,
f5,4,12,13.0,14,c,,
f5,5,15,16.0,17,a,,


# By Way of `via`

* Alternate interfaces for "viewing" a container (or its elements) differently
* Provides a hierarchical interface
* Available on `sf.Frame`, `sf.Series`, and `sf.Index`
* Some have corresponding Pandas interfaces, some are StaticFrame only

# Interfaces for Working with Strings
* `sf.Frame.via_str`, similar to `pd.Series.str`
* Exposes Python string interfaces for application to all elements
* https://static-frame.readthedocs.io/en/latest/api_overview/frame.html#frame-accessor-string

In [54]:
display(f1)
f1.via_str.upper()

Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


In [55]:
f1.via_str.replace('0', '+')

Unnamed: 0,A,B,C
x,True,2+,1954-11-+2
y,False,3+,2+2+-+4-28
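
As a rough mental model, an element-wise string interface converts every element to a string and then calls the same `str` method on each. The plain-Python sketch below reproduces the `replace('0', '+')` result shown above; the nested-dict layout and the use of `str()` for conversion are assumptions of the sketch, not a description of StaticFrame internals.

```python
# Row label -> column label -> value, mirroring f1
f1_rows = {
    'x': {'A': True, 'B': 20, 'C': '1954-11-02'},
    'y': {'A': False, 'B': 30, 'C': '2020-04-28'},
}

# Convert each element to str, then apply the same str method everywhere
replaced = {
    label: {col: str(value).replace('0', '+') for col, value in row.items()}
    for label, row in f1_rows.items()
}
```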


# Interfaces for Working with Dates
* `sf.Frame.via_dt`, similar to `pd.Series.dt`
* Exposes the Python `date` and `datetime` interfaces for application to `np.datetime64` types
* https://static-frame.readthedocs.io/en/latest/api_overview/frame.html#frame-accessor-datetime

In [56]:
display(f1)
f1['C'].astype(np.datetime64).via_dt.month

Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


0,1
x,11
y,4


In [57]:
f1['C'].astype(np.datetime64).via_dt.year

0,1
x,1954
y,2020


In [58]:
f1['C'].astype(np.datetime64).via_dt.weekday()

0,1
x,1
y,1
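
The attributes and methods used above are the familiar ones from the standard library's `datetime.date`. A minimal sketch with plain Python objects, using the two dates from column `C`, gives the same month, year, and weekday values; the dict layout is an assumption of the sketch.

```python
from datetime import date

values = {'x': '1954-11-02', 'y': '2020-04-28'}  # column C of f1 as ISO strings

# Parse once, then read the same attribute (or call the same method) per element
dates = {label: date.fromisoformat(v) for label, v in values.items()}
months = {label: d.month for label, d in dates.items()}
years = {label: d.year for label, d in dates.items()}
weekdays = {label: d.weekday() for label, d in dates.items()}  # Monday is 0
```

Both dates fall on a Tuesday, which is why `weekday()` returns 1 for each.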


# Interfaces for Applying Regular Expressions
* `sf.Frame.via_re` 
* Similar to `pd.Series.str.extract()`, but provides full interface from `re` module
* https://static-frame.readthedocs.io/en/latest/api_overview/frame.html#frame-accessor-regular-expression

In [59]:
display(f1)
# Match any element with "2" or "a"
f1.via_re('[2a]').search()

Unnamed: 0,A,B,C
x,True,20,1954-11-02
y,False,30,2020-04-28


Unnamed: 0,A,B,C
x,False,True,True
y,True,False,True
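
The Boolean result above can be checked by hand with the `re` module. This sketch assumes each element is converted with `str()` before the pattern is applied, and it uses a nested dict in place of a frame.

```python
import re

f1_rows = {
    'x': {'A': True, 'B': 20, 'C': '1954-11-02'},
    'y': {'A': False, 'B': 30, 'C': '2020-04-28'},
}
pattern = re.compile('[2a]')

# True wherever the pattern is found anywhere in the element's string form
matched = {
    label: {col: pattern.search(str(value)) is not None for col, value in row.items()}
    for label, row in f1_rows.items()
}
```

`'False'` matches because it contains an "a", while `'True'` and `'30'` contain neither character.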


# Configuring `fill_value` in Operator Application

* Binary operators on labelled containers may force reindexing
* By default, operator syntax offers no way to specify a fill value
* `sf.Frame.via_fill_value()` permits providing a fill value
* Pandas offers related functionality through the `fill_value` parameter of `pd.DataFrame.add()`, `pd.DataFrame.sub()`, `pd.DataFrame.mul()`, and similar methods.

In [60]:
display(f1['B'])
# Default binary operator application takes the union index and uses `nan` as a fill value
f1['B'] * sf.Series((1000, 1, .001), index=tuple('zyx'))

0,1
x,20
y,30


0,1
x,0.02
y,30.0
z,


In [61]:
# Using via_fill_value() a fill value can be specified
f1['B'].via_fill_value(0) * sf.Series((1000, 1, .001), index=tuple('zyx'))

0,1
x,0.02
y,30.0
z,0.0
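
The alignment step behind these two results can be sketched with plain dicts: take the union of the labels, substitute the fill value wherever a label is missing, then multiply. `multiply_aligned` is a hypothetical helper written for this illustration, and sorting the union is a simplification.

```python
import math


def multiply_aligned(left, right, fill_value=math.nan):
    """Multiply two label -> value mappings over the union of their labels."""
    labels = sorted(set(left) | set(right))
    return {
        label: left.get(label, fill_value) * right.get(label, fill_value)
        for label in labels
    }


b = {'x': 20, 'y': 30}
scale = {'z': 1000, 'y': 1, 'x': 0.001}

default = multiply_aligned(b, scale)               # 'z' is missing on the left: nan
filled = multiply_aligned(b, scale, fill_value=0)  # 'z' is treated as 0 on the left
```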


# Virtual Transposition in Operator Application
* By default, applying a 1D container to a 2D container works row-wise: the 1D labels are aligned with the column labels
* `sf.Frame.via_T` presents 2D containers "virtually" transposed
* Useful for applying a 1D container to the columns (not rows) of a 2D container
* Pandas offers related functionality through the `axis` parameter of `pd.DataFrame.add()`, `pd.DataFrame.sub()`, `pd.DataFrame.mul()`, and similar methods.

In [62]:
# 2D to 1D assumes row-wise application
display(sf.Frame(np.arange(8).reshape(2, 4), index=tuple('xy')))
display(f1['B'])
# The columns of the sf.Frame and the index of the Series have no labels in common 
sf.Frame(np.arange(8).reshape(2, 4), index=tuple('xy')) * f1['B']

Unnamed: 0,0,1,2,3
x,0,1,2,3
y,4,5,6,7


0,1
x,20
y,30


Unnamed: 0,0,1,2,3,y,x
x,,,,,,
y,,,,,,


In [63]:
# Using via_T, can perform column-wise application
display(sf.Frame(np.arange(8).reshape(2, 4), index=tuple('xy')))
display(f1['B'])
sf.Frame(np.arange(8).reshape(2, 4), index=tuple('xy')).via_T * f1['B']

Unnamed: 0,0,1,2,3
x,0,1,2,3
y,4,5,6,7


0,1
x,20
y,30


Unnamed: 0,0,1,2,3
x,0,20,40,60
y,120,150,180,210
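
The difference between the two results comes down to which axis the 1D labels are matched against. A plain-Python sketch, with the hypothetical helpers `mul_row_wise` and `mul_column_wise` operating on nested dicts, shows both cases.

```python
import math

# Index label -> column label -> value, matching the 2x4 frame above
frame = {
    'x': {0: 0, 1: 1, 2: 2, 3: 3},
    'y': {0: 4, 1: 5, 2: 6, 3: 7},
}
series = {'x': 20, 'y': 30}


def mul_row_wise(frame, series):
    # Default: series labels are matched against column labels
    columns = list(next(iter(frame.values())))
    columns += [label for label in series if label not in columns]
    return {
        idx: {c: row.get(c, math.nan) * series.get(c, math.nan) for c in columns}
        for idx, row in frame.items()
    }


def mul_column_wise(frame, series):
    # Transposed view: series labels are matched against index labels
    # (assumes every index label is present in the series)
    return {
        idx: {c: v * series[idx] for c, v in row.items()}
        for idx, row in frame.items()
    }


row_wise = mul_row_wise(frame, series)        # no shared labels: every cell is nan
column_wise = mul_column_wise(frame, series)  # row 'x' scaled by 20, row 'y' by 30
```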


# All the Rest

* Complete API best viewed through docs: https://static-frame.readthedocs.io/en/latest/api_overview/frame.html
* Can summarize groupings to help see similarities and differences from Pandas

# All the Rest: NumPy-Style Interfaces

* StaticFrame supports common NumPy interfaces and methods, as does Pandas
* Attributes:
    * `sf.Frame.shape`
    * `sf.Frame.ndim`
    * `sf.Frame.size`
    * `sf.Frame.nbytes`
    * `sf.Frame.T`
* Logical operations (by axis):
    * `sf.Frame.all()`
    * `sf.Frame.any()`
* Mathematical operations (by axis):
    * `sf.Frame.sum()`
    * `sf.Frame.min()`
    * `sf.Frame.max()`
    * `sf.Frame.mean()`
    * `sf.Frame.median()`
    * `sf.Frame.std()`
    * `sf.Frame.var()`
    * `sf.Frame.prod()`
    * `sf.Frame.cumsum()`
    * `sf.Frame.cumprod()`
    

# All the Rest: Common Interfaces with Pandas
* `sf.Frame.isin()`
* `sf.Frame.head()`, `sf.Frame.tail()`
* `sf.Frame.cov()`, `sf.Frame.var()`
* `sf.Frame.clip()`
* `sf.Frame.count()`
* `sf.Frame.equals()`
* `sf.Frame.sample()`
* `sf.Frame.sort_values()`, `sf.Frame.sort_index()`

# All the Rest: Handling Duplicated Values


* Pandas: `pd.DataFrame.duplicated()`, `pd.DataFrame.drop_duplicates()`
* StaticFrame: `sf.Frame.duplicated()`, `sf.Frame.drop_duplicated()`

# All the Rest: Joins
* Pandas: `pd.DataFrame.join()` with a `how` parameter (‘left’, ‘right’, ‘outer’, ‘inner’)
* StaticFrame:
    * `sf.Frame.join_left()`
    * `sf.Frame.join_right()`
    * `sf.Frame.join_outer()`
    * `sf.Frame.join_inner()`    
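
The design difference is one method with a string argument versus one method per join type. The sketch below illustrates that contrast on plain dicts; `join` and the four wrappers are hypothetical helpers for this illustration and say nothing about the real signatures in either library.

```python
from functools import partial


def join(left, right, how):
    """Join two label -> value mappings on their labels; gaps become None."""
    if how == 'left':
        labels = list(left)
    elif how == 'right':
        labels = list(right)
    elif how == 'inner':
        labels = [k for k in left if k in right]
    elif how == 'outer':
        labels = list(left) + [k for k in right if k not in left]
    else:
        raise ValueError(f'unknown join type: {how!r}')
    return {k: (left.get(k), right.get(k)) for k in labels}


# One callable per join type: the join type is part of the name, not a string argument
join_left = partial(join, how='left')
join_right = partial(join, how='right')
join_inner = partial(join, how='inner')
join_outer = partial(join, how='outer')

prices = {'a': 1.5, 'b': 2.0}
stock = {'b': 10, 'c': 0}

left_result = join_left(prices, stock)
inner_result = join_inner(prices, stock)
outer_result = join_outer(prices, stock)
```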

# All the Rest: Ranking
* Pandas: `pd.DataFrame.rank()` with a `method` parameter of (‘average’, ‘min’, ‘max’, ‘first’, ‘dense’)
* StaticFrame:
    * `sf.Frame.rank_mean()`
    * `sf.Frame.rank_min()`
    * `sf.Frame.rank_max()`
    * `sf.Frame.rank_ordinal()`
    * `sf.Frame.rank_dense()`
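
The five variants differ only in how ties are resolved. A minimal pure-Python sketch makes the differences concrete; the `rank` function and its `start` parameter are hypothetical, written for this illustration, and `'mean'` and `'ordinal'` play the roles that Pandas names `'average'` and `'first'`.

```python
def rank(values, method, start=0):
    """Rank values in ascending order; `method` decides how ties are resolved."""
    # Ordinal ranks: position in a stable sort, so ties keep their original order
    order = sorted(range(len(values)), key=lambda i: values[i])
    ordinal = [0] * len(values)
    for position, i in enumerate(order):
        ordinal[i] = position
    if method == 'ordinal':
        return [r + start for r in ordinal]
    if method == 'dense':
        # Tied values share a rank and no ranks are skipped
        distinct = {v: n for n, v in enumerate(sorted(set(values)))}
        return [distinct[v] + start for v in values]
    # min, max, mean: summarize the ordinal ranks within each group of ties
    groups = {}
    for v, r in zip(values, ordinal):
        groups.setdefault(v, []).append(r)
    reducers = {'min': min, 'max': max, 'mean': lambda g: sum(g) / len(g)}
    reduce_ties = reducers[method]
    return [reduce_ties(groups[v]) + start for v in values]


values = [30, 10, 30, 20]
ranks = {m: rank(values, m) for m in ('mean', 'min', 'max', 'ordinal', 'dense')}
```

For the two tied values of 30, `ordinal` assigns 2 and 3, `min` assigns 2 to both, `max` assigns 3 to both, `mean` assigns 2.5 to both, and `dense` assigns 2 to both.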

# All the Rest: Pivoting
* Pivoting
    * Pandas: `pd.DataFrame.pivot()`, `pd.DataFrame.pivot_table()`
    * StaticFrame: `sf.Frame.pivot()`
* Stacking & unstacking are types of pivots
    * Pandas: `pd.DataFrame.stack()`, `pd.DataFrame.unstack()`
    * StaticFrame: `sf.Frame.pivot_stack()`, `sf.Frame.pivot_unstack()`


# Performance
* In many situations StaticFrame can lead to more efficient systems
* Code can be more efficient with memory
    * Many operations naturally reuse immutable views
    * No need for implementing defensive copies
* Pandas still outperforms StaticFrame in some areas
    * Some types of pivots
    * Joining
    * Windowing
* Focus of current development is performance
    * Profiling with `cProfile`, `pyinstrument`, `line-profiler` and `gprof2dot` (for call graph analysis)
    * C-extensions in ArrayKit, AutoMap
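
Of the profiling tools listed, `cProfile` ships with the standard library. A minimal sketch of profiling one call and printing the most expensive entries follows; `workload` is a stand-in function, not one of the StaticFrame benchmarks.

```python
import cProfile
import io
import pstats


def workload():
    # Stand-in for the operation being measured
    return sum(i * i for i in range(50_000))


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Collect the report in memory, sorted by cumulative time
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats('cumulative')
stats.print_stats(5)  # limit the report to the five most expensive entries
report = buffer.getvalue()
```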
    

# Performance: Sample Measures

* 49 current metrics under study
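* Native is StaticFrame, Reference is Pandas; both columns are run times, so lower is better
* `n/r` is Native divided by Reference (below 1 means StaticFrame is faster); `r/n` is the inverse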
* Out of 49 metrics, StaticFrame out-performs in 33
* When StaticFrame is faster, it tends to be a lot faster

#### python: 3.8.12 | numpy: 1.17.4 | pandas: 1.3.5 | static_frame: 0.8.38

|name                                                             |iterations |Native |Reference |n/r    |r/n     |win                 |
|-----------------------------------------------------------------|-----------|-------|----------|-------|--------|--------------------|
|IndexIterLabelApply.index_int                     |200.0      |0.0236 |0.0475    |0.497  |2.0121  |True   |
|IndexIterLabelApply.index_int_dtype               |200.0      |0.011  |0.0471    |0.2337 |4.2786  |True   |
|SeriesIsNa.bool_index_auto                        |10000.0    |0.0405 |0.443     |0.0915 |10.9286 |True   |
|SeriesIsNa.float_index_auto                       |10000.0    |0.0328 |0.445     |0.0738 |13.5509 |True   |
|SeriesIsNa.object_index_auto                      |10000.0    |0.7724 |0.9618    |0.803  |1.2453  |True   |
|SeriesDropNa.bool_index_auto                      |200.0      |0.0005 |0.0062    |0.0739 |13.5382 |True   |
|SeriesDropNa.bool_index_str                       |200.0      |0.0004 |0.0221    |0.0171 |58.5127 |True   |
|SeriesDropNa.float_index_auto                     |200.0      |1.055  |0.404     |2.6115 |0.3829  |False|
|SeriesDropNa.float_index_str                      |200.0      |2.3476 |1.0857    |2.1623 |0.4625  |False|
|SeriesDropNa.object_index_auto                    |200.0      |2.4041 |1.3349    |1.801  |0.5552  |False|
|SeriesDropNa.object_index_str                     |200.0      |4.0834 |2.1502    |1.8991 |0.5266  |False|
|SeriesFillNa.float_index_str                      |100.0      |0.0191 |0.0319    |0.597  |1.6751  |True   |
|SeriesFillNa.object_index_str                     |100.0      |0.7616 |0.4732    |1.6096 |0.6213  |False|
|SeriesDropDuplicated.bool_index_str               |500.0      |0.0265 |0.0352    |0.7549 |1.3247  |True   |
|SeriesDropDuplicated.float_index_str              |500.0      |0.0904 |0.062     |1.4596 |0.6851  |False|
|SeriesDropDuplicated.object_index_str             |500.0      |0.1367 |0.4813    |0.284  |3.5217  |True   |
|SeriesIterElementApply.bool_index_str             |500.0      |0.3783 |0.1443    |2.6212 |0.3815  |False|
|SeriesIterElementApply.float_index_str            |500.0      |0.3589 |0.2617    |1.3714 |0.7292  |False|
|SeriesIterElementApply.object_index_str           |500.0      |0.312  |0.2367    |1.3185 |0.7584  |False|
|FrameDropNa.float_index_auto_column               |100.0      |0.0157 |0.1131    |0.1391 |7.19    |True   |
|FrameDropNa.float_index_auto_row                  |100.0      |0.0079 |0.0751    |0.1048 |9.5454  |True   |
|FrameDropNa.float_index_str_column                |100.0      |0.0162 |0.1063    |0.1521 |6.5734  |True   |
|FrameDropNa.float_index_str_row                   |100.0      |0.0084 |0.0745    |0.1126 |8.8802  |True   |
|FrameILoc.element_index_auto                      |100000.0   |0.1948 |2.0305    |0.0959 |10.4257 |True   |
|FrameILoc.element_index_str                       |100000.0   |0.1809 |1.9821    |0.0913 |10.957  |True   |
|FrameLoc.element_index_auto                       |100000.0   |0.2879 |0.587     |0.4905 |2.0388  |True   |
|FrameLoc.element_index_str                        |100000.0   |0.4145 |0.581     |0.7134 |1.4017  |True   |
|FrameIterSeriesApply.float_index_str_column       |50.0       |2.1359 |4.3977    |0.4857 |2.059   |True   |
|FrameIterSeriesApply.float_index_str_column_dtype |50.0       |2.2175 |4.413     |0.5025 |1.9901  |True   |
|FrameIterSeriesApply.float_index_str_row          |50.0       |2.1727 |3.1413    |0.6917 |1.4458  |True   |
|FrameIterSeriesApply.float_index_str_row_dtype    |50.0       |2.0905 |3.1324    |0.6674 |1.4983  |True   |
|FrameIterSeriesApply.mixed_index_str_column       |50.0       |0.1451 |1.1056    |0.1312 |7.6211  |True   |
|FrameIterSeriesApply.mixed_index_str_column_dtype |50.0       |0.1718 |1.0997    |0.1563 |6.3991  |True   |
|FrameIterSeriesApply.mixed_index_str_row          |50.0       |2.4029 |1.958     |1.2272 |0.8148  |False|
|FrameIterSeriesApply.mixed_index_str_row_dtype    |50.0       |2.4298 |1.9693    |1.2338 |0.8105  |False|
|FrameIterGroupApply.int_index_str_double          |1000.0     |0.8679 |0.8418    |1.031  |0.9699  |False|
|FrameIterGroupApply.int_index_str_single          |1000.0     |0.4314 |0.4868    |0.8863 |1.1282  |True   |
|FrameIterGroupApply.str_index_str_double          |1000.0     |0.8398 |0.9179    |0.915  |1.0929  |True   |
|FrameIterGroupApply.str_index_str_single          |1000.0     |0.4614 |0.6049    |0.7629 |1.3109  |True   |
|Pivot.index1_columns0_data2                       |150.0      |0.336  |0.8242    |0.4076 |2.4534  |True   |
|Pivot.index1_columns1_data1                       |150.0      |1.54   |1.1434    |1.3468 |0.7425  |False|
|Pivot.index1_columns1_data3                       |150.0      |1.4955 |1.7098    |0.8747 |1.1433  |True   |
|Pivot.index2_columns0_data1                       |150.0      |1.9755 |0.9986    |1.9782 |0.5055  |False|
|FrameToParquet.write_tall_mixed_index_str         |4.0        |0.0465 |0.0406    |1.1464 |0.8723  |False|
|FrameToParquet.write_wide_mixed_index_str         |4.0        |2.0851 |2.7063    |0.7705 |1.2979  |True   |
|Group.tall_group_100                              |200.0      |3.519  |0.9934    |3.5422 |0.2823  |False|
|Group.wide_group_2                                |200.0      |1.9207 |2.7234    |0.7053 |1.4179  |True   |
|FrameFromConcat.tall_mixed_20                     |50.0       |0.333  |0.9868    |0.3375 |2.9631  |True   |
|FrameFromConcat.tall_uniform_20                   |50.0       |0.1357 |0.177     |0.7668 |1.3042  |True   |
|min                                               |           |0.0004 |0.0062    |0.0171 |0.2823  |                    |
|max                                               |           |4.0834 |4.413     |3.5422 |58.5127 |                    |
|mean                                              |           |0.8926 |1.0326    |0.8724 |4.3434  |                    |
|median                                            |           |0.3589 |0.6049    |0.7134 |1.4017  |                    |
|std                                               |           |1.0535 |1.1147    |0.7791 |8.7447  |                    |
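
The ratio and win columns follow directly from the two timing columns. The sketch below recomputes them for two rows; `compare` is a hypothetical helper, and treating "win" as Native being faster than Reference is inferred from the table. Small differences from the printed ratios are expected, since the timings shown are rounded.

```python
def compare(native, reference):
    """Return (n/r, r/n, win) for a pair of timings."""
    n_over_r = native / reference
    r_over_n = reference / native
    return n_over_r, r_over_n, n_over_r < 1


# Timings copied from two rows of the table
iloc = compare(0.1948, 2.0305)   # FrameILoc.element_index_auto
group = compare(3.519, 0.9934)   # Group.tall_group_100
```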


# Conclusion

* Some applications will benefit from more-specialized DataFrame libraries
* Stricter interfaces lead to more maintainable code
* An immutable data model reduces opportunities for error and permits more efficient memory usage
* Higher-order Frame containers offer an efficient alternative to representing N-dimensional data with hierarchical indices