Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canonicalize blaze server dshape formatting. #1361

Merged
merged 2 commits into from
Dec 16, 2015

Conversation

kwmsmith
Copy link
Member

Non-Python clients (like JS) can get tripped up with different
formatting in dshapes.

For instance, small record dshapes, like:

var * {a: int32, b: float64}

are stringified by default on one line, but longer dshapes, like:

var * {
    sepal_length: float64,
    sepal_width: float64,
    petal_length: float64,
    petal_width: float64,
    species: ?string
    }

are stringified onto several lines.

This commit ensures that all datashapes are stringified in a consistent
way, regardless of the number of components in the dshape or the
dshape's overall string length. Again, this is motivated by non-Python
clients that expect a consistent formatting for dshapes.

This has little effect for Python clients, as they can simply re-hydrate
the dshape on the client side and interact with the DataShape object
API.

The server tests were modified to not compare the string form of dshapes
directly; instead, they use datashape.util.testing.assert_dshape_equal().

The long-term solution to this is to separate the serialization of
dshapes from their string representation. This would involve defining
something like to_tree() and from_tree() for DataShape objects, and
allow clients to represent the serialized dshape contents however it
chooses.

Non-Python clients (like JS) can get tripped up with different
formatting in dshapes.

For instance, small record dshapes, like:

    var * {a: int32, b: float64}

are stringified by default on one line, but longer dshapes, like:

    var * {
        sepal_length: float64,
        sepal_width: float64,
        petal_length: float64,
        petal_width: float64,
        species: ?string
        }

are stringified onto several lines.

This commit ensures that all datashapes are stringified in a consistent
way, regardless of the number of components in the dshape or the
dshape's overall string length.  Again, this is motivated by non-Python
clients that expect a consistent formatting for dshapes.

This has little effect for Python clients, as they can simply re-hydrate
the dshape on the client side and interact with the DataShape object
API.

The server tests were modified to not compare the string form of dshapes
directly; instead, they use `datashape.util.testing.assert_dshape_equal()`.

The long-term solution to this is to separate the serialization of
dshapes from their string representation.  This would involve defining
something like `to_tree()` and `from_tree()` for DataShape objects, and
allow clients to represent the serialized dshape contents however it
chooses.
@llllllllll
Copy link
Member

I feel like the string representation of the dshape is the best serialization format. Maybe what we need is a javascript implementation of the datashape parser. I am okay with this change for now though.

@kwmsmith
Copy link
Member Author

I am okay with this change for now though.

OK, merging.

I feel like the string representation of the dshape is the best serialization format. Maybe what we need is a javascript implementation of the datashape parser.

See blaze/datashape#204.

kwmsmith added a commit that referenced this pull request Dec 16, 2015
Canonicalize blaze server dshape formatting.
@kwmsmith kwmsmith merged commit 3b30902 into blaze:master Dec 16, 2015
@kwmsmith kwmsmith deleted the bugfix/server-dshape-formatting branch December 16, 2015 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants