New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

`compute` API changes #1417

Merged
merged 25 commits into from Feb 24, 2016

Conversation

Projects
None yet
3 participants
@kwmsmith
Member

kwmsmith commented Feb 15, 2016

@postelrich @llllllllll here's my proof-of-concept for the compute() API changes. It's messy, and some of the imports will have to change, and we'll need some small refactoring, but I like that it doesn't require huge code changes, and fits in with the existing compute implementation easily.

Example usage:

In [26]: s = bz.symbol('s', 'var * {name: string, id: int}')

In [27]: t = bz.symbol('t', 'var * {name: string, a: int}')

In [28]: sdata = [['a', 1], ['b', 2]]

In [29]: tdata = [['a', 5], ['c', 7]]

The default gives back a list as one would expect:

In [30]: bz.compute(bz.join(s, t, 'name', how='outer'), {s: sdata, t: tdata})
Out[30]: [('a', 1, 5), ('c', None, 7), ('b', 2, None)]

And the return type can be easily overridden:

In [31]: bz.compute(bz.join(s, t, 'name', how='outer'), {s: sdata, t: tdata}, return_type=pd.DataFrame)
Out[31]:
  name  id   a
0    a   1   5
1    c NaN   7
2    b   2 NaN

In [32]: bz.compute(bz.join(s, t, 'name', how='outer'), {s: sdata, t: tdata}, return_type=np.ndarray)
Out[32]:
array([('a', 1.0, 5.0), ('c', nan, 7.0), ('b', 2.0, nan)],
      dtype=[('name', 'O'), ('id', '<f4'), ('a', '<f4')])

Richard Postelnik and others added some commits Feb 11, 2016

POC: proposed API for top-level `compute`
Adds `return_type` kwarg that can be set to:
* 'core' (current default) -- odos return value into a core backend
* 'native' -- leaves return value alone, doesn't modify at all
* user supplied type -- passed on to internal `odo` call.

@kwmsmith kwmsmith added the api design label Feb 15, 2016

@kwmsmith kwmsmith added this to the 0.10 milestone Feb 15, 2016

@@ -45,10 +45,13 @@

This comment has been minimized.

@llllllllll

llllllllll Feb 15, 2016

Member

do we need the changes in here?

result = into(pd.DataFrame, result, dshape=expr.dshape)
# TODO: detect single-column case, and odo into a pd.Series.
elif isinstance(return_type, type):
from blaze import into

This comment has been minimized.

@llllllllll

llllllllll Feb 15, 2016

Member

can we move these imports to module scope?

@@ -407,6 +407,29 @@ def compute(expr, d, **kwargs):
if post_compute_:
result = post_compute_(expr3, result, scope=d4)
if return_type == 'native':
# NOP, but we check for this case anyway.
result = result

This comment has been minimized.

@llllllllll

llllllllll Feb 15, 2016

Member

this should just be pass, or the very last else could be elif return_type != 'native'

@kwmsmith

This comment has been minimized.

Member

kwmsmith commented Feb 15, 2016

@llllllllll agreed on your comments, and if we go with this approach, we'll definitely make those improvements. This is just a quick and dirty POC to demonstrate the API I'm thinking of, and how it can be implemented without a major rename or big refactoring.

What are your thoughts on the API?

@llllllllll

This comment has been minimized.

Member

llllllllll commented Feb 15, 2016

I like the API. I would like to be able to register new return types like discussed in the other issue but for a first pass this looks great.

@sandhujasmine

This comment has been minimized.

Contributor

sandhujasmine commented Feb 16, 2016

+1, it looks good.

  • returning results is a more intuitive response from compute()
  • list output by default makes sense for SQLA backends
  • being able to override output return_type is a good idea

@kwmsmith kwmsmith changed the title from POC `compute` API changes to `compute` API changes Feb 23, 2016

@kwmsmith

This comment has been minimized.

Member

kwmsmith commented Feb 23, 2016

@llllllllll will be merging this in the next day or so, last call.

@llllllllll

This comment has been minimized.

Member

llllllllll commented Feb 24, 2016

No other comments, thanks!

kwmsmith added a commit that referenced this pull request Feb 24, 2016

@kwmsmith kwmsmith merged commit bd4bd78 into blaze:master Feb 24, 2016

1 of 2 checks passed

coverage/coveralls Coverage decreased (-0.02%) to 89.923%
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@kwmsmith kwmsmith deleted the kwmsmith:feature/compute-api branch Feb 24, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment