[WIP] A first proposal for a Table object #199
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
""" | ||
This file defines the Table class. | ||
|
||
A Table is constructed from Array objects which are columns. Hence the | ||
data layout is columnar, and columns can be added and removed | ||
efficiently. It is also meant to provide easy filtering based on column | ||
conditions. | ||
""" | ||
|
||
from __future__ import absolute_import, division, print_function | ||
|
||
import datashape | ||
from .array import Array | ||
|
||
|
||
class Table(object): | ||
""" | ||
Table(columns, labels=None, **kwargs) | ||
|
||
Create a new Table from `columns` with optional `labels`. | ||
|
||
Parameters | ||
---------- | ||
columns : tuple or list of column objects | ||
The list of column data to build the Table object. This list would | ||
typically be made of Blaze Array objects, but could also understand | ||
DyND or NumPy arrays. A list of lists or tuples is valid too, as | ||
long as they can be converted into Blaze Array objects. | ||
labels : list of strings | ||
The list of names for the columns. The names in this list have | ||
to be specified in the same order as the `columns`. If not passed, the | ||
names will be chosen as 'f0' for the first column, 'f1' for the | ||
second and so on (NumPy convention). | ||
kwargs : list of parameters or dictionary | ||
Allows passing additional arguments supported by Array | ||
constructors in case new columns need to be built. | ||
I don't understand this. Maybe leave it out until an example shows it's needed? |
||
|
||
Notes | ||
----- | ||
Columns passed as Array objects are not copied, so their settings | ||
will stay the same, even if you pass additional arguments. | ||
|
||
""" | ||
def __init__(self, columns, labels=None, **kwargs): | ||
kwargs isn't being used, definitely remove it until it is needed. |
||
arr_cols = {} | ||
for i, column in enumerate(columns): | ||
if isinstance(column, Array): | ||
arr_cols['f%d'%i] = column | ||
else: | ||
try: | ||
arr_cols['f%d'%i] = array(column) | ||
except Exception: | ||
raise TypeError( | ||
('Constructing a blaze table directly ' | ||
'requires columns that can be converted ' | ||
'to Blaze arrays, got %s') % (type(column),)) | ||
Always producing the 'f#' names isn't a good idea, in my opinion. I think a dual representation with an array of columns indexed by integers, and a dictionary of columns indexed by label would be better. |
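One possible reading of this suggestion, as a minimal sketch rather than the PR's actual code: keep the columns in a list for positional access and in a dictionary for access by label, and fall back to the 'f#' names only when no labels are passed. The class name `DualTable` and the `column` method are hypothetical, and plain Python lists stand in for Blaze Array objects.

```python
class DualTable(object):
    """Hypothetical sketch: columns stored both by position and by label."""

    def __init__(self, columns, labels=None):
        columns = list(columns)
        if labels is None:
            # Use the NumPy-style default names only when no labels are given.
            labels = ['f%d' % i for i in range(len(columns))]
        labels = list(labels)
        if len(labels) != len(columns):
            raise ValueError('expected %d labels, got %d'
                             % (len(columns), len(labels)))
        if len(set(labels)) != len(labels):
            raise ValueError('labels must be unique')
        self._cols = columns                          # indexed by integer
        self._by_label = dict(zip(labels, columns))   # indexed by label
        self.labels = labels

    def column(self, key):
        """Look up a column by integer position or by label."""
        if isinstance(key, int):
            return self._cols[key]
        return self._by_label[key]
```

Both lookups return the same underlying column object, so no data is copied, and the user-supplied labels are preserved in their original order.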
||
self._cols = arr_cols | ||
self.labels = labels or arr_cols.keys() | ||
|
||
|
||
@property | ||
def dshape(self): | ||
# Build a dshape out of the columns and labels | ||
return XXX | ||
|
||
@property | ||
def deferred(self): | ||
return XXX | ||
|
||
def __array__(self): | ||
import numpy as np | ||
|
||
# Expose PEP-3118 buffer interface for columns | ||
return XXX | ||
I think this interface should be excluded, the table seems sufficiently different that requiring an explicit conversion to array would be better. |
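To illustrate what an explicit conversion could look like, here is a small sketch under stated assumptions: the class deliberately defines no `__array__`, and callers opt in through a named method instead. `ColumnarTable` and `to_rows` are hypothetical names, and plain lists stand in for Blaze Array columns; the real conversion target would presumably be a Blaze Array.

```python
class ColumnarTable(object):
    """Hypothetical stand-in for Table; deliberately defines no __array__."""

    def __init__(self, columns, labels):
        self._cols = [list(c) for c in columns]
        self.labels = list(labels)

    def to_rows(self):
        """Explicit, opt-in conversion to a row-oriented list of tuples."""
        # zip(*columns) transposes the columnar layout into rows.
        return list(zip(*self._cols))
```

With this shape, code that wants array-like data has to ask for it by name, so a table is never silently coerced by a library that probes for `__array__`.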
||
|
||
def __iter__(self): | ||
return XXX | ||
|
||
def __getitem__(self, key): | ||
return Array(self._data.__getitem__(key)) | ||
|
||
def __setitem__(self, key, val): | ||
self._data.__setitem__(key, val) | ||
|
||
def __len__(self): | ||
shape = self.dshape.shape | ||
if shape: | ||
return shape[0] | ||
raise IndexError('Scalar blaze arrays have no length') | ||
Tables always look "2D", so no need to check for a scalar case. |
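A minimal sketch of `__len__` without the scalar branch, assuming the table holds equal-length columns in a list. `RowCountTable` is a hypothetical name, and plain lists stand in for Blaze Array columns; treating a table with no columns as having zero rows is also an assumption, not something the PR specifies.

```python
class RowCountTable(object):
    """Hypothetical stand-in for Table, reduced to what __len__ needs."""

    def __init__(self, columns):
        self._cols = [list(c) for c in columns]
        lengths = set(len(c) for c in self._cols)
        if len(lengths) > 1:
            raise ValueError('all columns must have the same length')

    def __len__(self):
        # A table always has a row dimension, so there is no scalar case:
        # the length is the row count, or 0 when there are no columns.
        return len(self._cols[0]) if self._cols else 0
```

Checking the column lengths once in the constructor is what makes it safe to read the row count from the first column alone.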
||
|
||
def __str__(self): | ||
if hasattr(self._data, '_printer'): | ||
return XXX | ||
return XXX | ||
|
||
def __repr__(self): | ||
if hasattr(self._data, "_printer_repr"): | ||
return XXX | ||
return XXX |
I think this would be better described as "objects convertible to Blaze Arrays"