New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

map/apply support in client/server #1504

Merged
merged 19 commits into from May 18, 2016

Conversation

Projects
None yet
3 participants
@kwmsmith
Member

kwmsmith commented May 16, 2016

This picks up PR #1497 from @sandhujasmine:

  • creates the *_trusted variants for SerializationFormat instances.
  • Tests updated.
  • object_hook.py, json_dumps.py, and their _trusted variants created as well.
  • all _trusted dispatching is implemented as a superset of their underlying untrusted counterparts to keep it DRY.

kwmsmith and others added some commits Apr 27, 2016

Add serialization for builtins. Tests in PR 1493 pass
- added json_dumps and object_hook for python builtin functions
- data_loads is not consistent for all serializations. For fastmsgpack,
  it uses pandas to decode to a Series; while other methods decode to a
  list.
TST: Separate 'map' test for fastmsgpack since assertion is different
- add most_formats frozen set to Serialization which includes all
  Serialization types except fastmsgpack. 'fastmsgpack' serializes/loads
  to a Series object as opposed to a list.

- break apart test_map_client_server() so all Serialization types that
  return a list are tested in one function. 'fastmsgpack' Serialzation
  is tested in test_map_client_server_fastmsgpack() since its assertion
  has to be for all() elements.
Serialize pandas/numpy functions for map operation
- checks the function given to map is pandas or numpy function. Raises
  NotImplementedError otherwise

- added simple tests for mapping numpy and pandas functions
Raise standard HTTP (403, 501) exceptions
- raise http exceptions if incoming json contains 'eval' or 'exec' which
  we don't support executing on server (ie 503 Forbidden)
- similarly if json contains a function not in pandas/numpy namespace,
  raise HTTP 501 Not Implemented exception

- don't limit what json_dumps can do since that is used for testing
- added tests but marked as xfail since pickle serialization is
  currently not raising these exceptions - need to ask/update
Trusted object_hook and json_dumps implementations.
Create `*_trusted` versions of standard `SerializationFormat`s as well.

@kwmsmith kwmsmith added this to the 0.11 milestone May 16, 2016

@kwmsmith

This comment has been minimized.

Member

kwmsmith commented May 17, 2016

@llllllllll @sandhujasmine you mind taking a once-over?

@llllllllll

This comment has been minimized.

Member

llllllllll commented May 17, 2016

I can take a look a little later today

from .json_dumps import json_dumps
json_dumps_trusted_ns = dict()

This comment has been minimized.

@llllllllll

llllllllll May 17, 2016

Member

instead of the loop below this might be more clear as json_dumps_trusted_ns = jsun_dumps.funcs.copy()

This comment has been minimized.

@kwmsmith

kwmsmith May 17, 2016

Member

Yes, I tried that initially, but the dispatching doesn't work in that case.

This comment has been minimized.

@llllllllll

llllllllll May 17, 2016

Member

how so, iterating over the dict and updating each k/v pair should be functionally equivalent? At the end of this module would the func maps be different?

This comment has been minimized.

@kwmsmith

kwmsmith May 18, 2016

Member

Using json_dumps.funcs.copy() still retains the json_dumps function name, which ultimately triggers an exception in multipledispatch's inner dispatching logic; I have to do json_dumps_trusted.register(types)(func) to ensure the json_dumps_trusted function name is updated.

json_dumps_trusted_ns = dict()
dispatch = partial(dispatch, namespace=json_dumps_trusted_ns)
@dispatch(types.BuiltinFunctionType)

This comment has been minimized.

@llllllllll

llllllllll May 17, 2016

Member

why is this different from the callable case?

This comment has been minimized.

@kwmsmith

kwmsmith May 17, 2016

Member

It isn't, I removed this case and collapsed it into the Callable dispatch.

from datashape import dshape
import numpy as np
# these are used throughout blaze, don't remove them

This comment has been minimized.

@llllllllll

llllllllll May 17, 2016

Member

this comment is for utils

for types, func in json_dumps.funcs.items():
json_dumps_trusted.register(types)(func)
print(json_dumps_trusted.funcs)

This comment has been minimized.

@llllllllll

llllllllll May 17, 2016

Member

dev cruft

del _converters_trusted
@object_hook_trusted.register('builtin_function')

This comment has been minimized.

@llllllllll

llllllllll May 17, 2016

Member

this will miss any C(ython) functions written in nump/pandas.

In [1]: type(np.array)
Out[1]: builtin_function_or_method
)
most_formats = frozenset(g for _, g in globals().items()

This comment has been minimized.

@llllllllll

llllllllll May 17, 2016

Member

can we give this a more descriptive name? what would the user want to use this set for? It might also be easier to write this as a set-comprehension over the all_formats

@object_hook_trusted.register('builtin_function')
def builtins_function_from_str(f):
if f in ("eval", "exec"):

This comment has been minimized.

@llllllllll

llllllllll May 17, 2016

Member

if the user is trusted should we bother blocking these?

@llllllllll

This comment has been minimized.

Member

llllllllll commented May 17, 2016

Can we move the serialization code to a new package: blaze.server.serialization? I think this makes it easier to navigate to this code.

from .dispatch import dispatch
from functools import partial, wraps

This comment has been minimized.

@sandhujasmine

sandhujasmine May 18, 2016

Contributor

wraps appears to be an used import

@@ -9,7 +9,6 @@
from datashape import dshape
import numpy as np
# these are used throughout blaze, don't remove them
import pandas as pd

This comment has been minimized.

@sandhujasmine

sandhujasmine May 18, 2016

Contributor

Unused imports below builtins, reduce

Question: So object_hook no longer supports map/apply - Is that right? Looks like the json for following objects correctly converts them back to python objects but these wouldn't be used by map/apply

In [2]: object_hook._converters
Out[2]: 
{'__!bytes': <function blaze.object_hook._read_bytes>,
 '__!datashape': <function datashape.util.dshape>,
 '__!datetime': pandas.tslib.Timestamp,
 '__!frozenset': frozenset,
 '__!mono': <function blaze.object_hook._read_mono>,
 '__!timedelta': <function blaze.object_hook._read_timedelta>}
@kwmsmith

This comment has been minimized.

Member

kwmsmith commented May 18, 2016

Can we move the serialization code to a new package: blaze.server.serialization? I think this makes it easier to navigate to this code.

Good idea -- done.

@kwmsmith kwmsmith changed the title from WIP: map/apply support in client/server to map/apply support in client/server May 18, 2016

@kwmsmith kwmsmith removed the wip label May 18, 2016

@kwmsmith

This comment has been minimized.

Member

kwmsmith commented May 18, 2016

@llllllllll @sandhujasmine I think this one is ready to merge.

@llllllllll

This comment has been minimized.

Member

llllllllll commented May 18, 2016

lgtm. Thanks!

@kwmsmith kwmsmith merged commit 85dbb05 into blaze:master May 18, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.1%) to 89.26%
Details

@kwmsmith kwmsmith deleted the kwmsmith:map-apply-client-server branch May 18, 2016

@kwmsmith kwmsmith modified the milestones: 0.11, 0.10.2 Jun 8, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment