[WIP] A first proposal for a Table object #199

Closed
99 changes: 99 additions & 0 deletions blaze/objects/table.py
@@ -0,0 +1,99 @@
"""
This file defines the Table class.

A Table is constructed from Array objects which are columns. Hence the
data layout is columnar, and columns can be added and removed
efficiently. It is also meant to provide easy filtering based on column
conditions.
"""

from __future__ import absolute_import, division, print_function

import datashape
from .array import Array


class Table(object):
    """
    Table(columns, labels=None, **kwargs)

    Create a new Table from `columns` with optional `labels`.

    Parameters
    ----------
    columns : tuple or list of column objects
        The list of column data to build the Table object. This list would
        typically be made of Blaze Array objects, but DyND or NumPy arrays
        are accepted too. A list of lists or tuples is also valid, as
        long as they can be converted into barray objects.
Review comment (Contributor):
I think this would be better described as "objects convertible to Blaze Arrays".

    labels : list of strings
        The list of names for the columns. The names in this list have
        to be specified in the same order as the `columns`. If not passed,
        the names will be chosen as 'f0' for the first column, 'f1' for the
        second and so on (NumPy convention).
    kwargs : list of parameters or dictionary
        Allows passing additional arguments supported by Array
        constructors in case new columns need to be built.
Review comment (Contributor):
I don't understand this. Maybe leave it out until an example shows it's needed?


    Notes
    -----
    Columns passed as Array objects are not copied, so their settings
    will stay the same, even if you pass additional arguments.

    """
    def __init__(self, columns, labels=None, **kwargs):
Review comment (Contributor):
kwargs isn't being used, definitely remove it until it is needed.

        arr_cols = {}
        for i, column in enumerate(columns):
            if isinstance(column, Array):
                arr_cols['f%d' % i] = column
            else:
                try:
                    # `array` is presumably the Blaze array constructor;
                    # it is not imported in this module.
                    arr_cols['f%d' % i] = array(column)
                except Exception:
                    raise TypeError(
                        ('Constructing a blaze table directly '
                         'requires columns that can be converted '
                         'to Blaze arrays, got %s') % (type(column),))
Review comment (Contributor):
Always producing the 'f#' names isn't a good idea, in my opinion. I think a dual representation with an array of columns indexed by integers, and a dictionary of columns indexed by label would be better.
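
A minimal sketch of the dual representation suggested above, using plain Python lists in place of Blaze arrays. The class name `ColumnStore` and its `column` method are hypothetical illustrations, not part of Blaze or of this pull request:

```python
class ColumnStore(object):
    """Hypothetical container: columns reachable by position and by label."""

    def __init__(self, columns, labels=None):
        columns = list(columns)
        if labels is None:
            # Fall back to NumPy-style default names only when none are given
            labels = ['f%d' % i for i in range(len(columns))]
        labels = list(labels)
        if len(labels) != len(columns):
            raise ValueError('expected %d labels, got %d'
                             % (len(columns), len(labels)))
        if len(set(labels)) != len(labels):
            raise ValueError('column labels must be unique')
        self._cols = columns                         # indexed by integer position
        self._by_label = dict(zip(labels, columns))  # indexed by label
        self.labels = labels                         # preserves column order

    def column(self, key):
        """Return a column by integer position or by string label."""
        if isinstance(key, int):
            return self._cols[key]
        return self._by_label[key]


store = ColumnStore([[1, 2, 3], [4.0, 5.0, 6.0]], labels=['id', 'price'])
assert store.column(0) is store.column('id')
```

Both lookups return the same underlying object, so no column data is duplicated. Keeping `labels` as a list preserves column order independently of the dictionary.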

        self._cols = arr_cols
        # Build default labels in column order; dict key order is not
        # guaranteed to match the insertion order.
        self.labels = labels or ['f%d' % i for i in range(len(arr_cols))]


    @property
    def dshape(self):
        # Build a dshape out of the columns and labels
        return XXX

    @property
    def deferred(self):
        return XXX

    def __array__(self):
        import numpy as np

        # Expose PEP-3118 buffer interface for columns
        return XXX
Review comment (Contributor):
I think this interface should be excluded; the table seems sufficiently different that requiring an explicit conversion to array would be better.


    def __iter__(self):
        return XXX

    def __getitem__(self, key):
        # Columns are stored in self._cols (set in __init__) and are
        # already Array objects, so return them directly.
        return self._cols[key]

    def __setitem__(self, key, val):
        self._cols[key] = val

    def __len__(self):
        shape = self.dshape.shape
        if shape:
            return shape[0]
        raise IndexError('Scalar blaze arrays have no length')
Review comment (Contributor):
Tables always look "2D", so no need to check for a scalar case.
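
One way to read the comment above is that `__len__` could simply report the number of rows. A rough sketch under the assumption that all columns have equal length, with plain lists standing in for Blaze arrays and a hypothetical class name `RowCounted`:

```python
class RowCounted(object):
    """Hypothetical stand-in for Table, showing only the length logic."""

    def __init__(self, columns):
        self._cols = list(columns)

    def __len__(self):
        # A table is always two-dimensional, so there is no scalar case:
        # the length is the row count of any column, or 0 with no columns.
        if not self._cols:
            return 0
        return len(self._cols[0])


table = RowCounted([[1, 2, 3], [4.0, 5.0, 6.0]])
assert len(table) == 3
```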


    def __str__(self):
        if hasattr(self._data, '_printer'):
            return XXX
        return XXX

    def __repr__(self):
        if hasattr(self._data, "_printer_repr"):
            return XXX
        return XXX