Path towards better repr behavior #1395
Comments
I will dump some thoughts I have here: For the repr of expressions, I think the minimum should be the type and the dshape. The dshape is what tells me what methods and functions I can call on this. Just knowing

def __repr__(self):
    name = self._name
    return '<%s %s:: %.200s>' % (
        type(self).__name__,
        name + ' ' if name is not None else '',
        self.dshape,
    )

This generates reprs like:

In [1]: s = bz.symbol('s', 'var * {a: int32, b: float64}')
In [2]: s
Out[2]: <Symbol s :: var * {a: int32, b: float64}>
In [3]: s.a
Out[3]: <Field a :: var * int32>
In [4]: s.a + 1
Out[4]: <Add a :: var * int64>
In [5]: s.a + 1 - 2
Out[5]: <Sub a :: var * int64>
In [6]: (s.a + 1).mean()
Out[6]: <mean a_mean :: float64>
In [7]: (s.a + 1).mean() + 1
Out[7]: <Add a_mean :: float64>
In [8]: bz.Data([1, 2, 3])
Out[8]: <InteractiveSymbol :: 3 * int64>

I like this because it won't grow with the size of the expression, and we already have the
I don't love the idea of having the global setting to turn on reprs. How would you feel about making it default to
I think we should have a

In [1]: ds = bz.Data([1, 2, 3])
In [2]: ds.min().preview()
Out[2]: 1
In [3]: _2 + 5
Out[3]: 6

Also noting that we will need to remove all of these:

# in blaze.interactive
Expr.__repr__ = expr_repr
Expr._repr_html_ = lambda x: to_html(x)
Expr.__len__ = table_length
Expr.__array__ = intonumpy
Expr.__int__ = lambda x: convert_base(int, x)
Expr.__float__ = lambda x: convert_base(float, x)
Expr.__complex__ = lambda x: convert_base(complex, x)
Expr.__bool__ = lambda x: convert_base(bool, x)
Expr.__nonzero__ = lambda x: convert_base(bool, x)
Expr.__iter__ = into(Iterator)
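A minimal, self-contained sketch of the repr proposed in this comment. `ToyExpr` is a hypothetical stand-in for a Blaze expression (it is not Blaze's `Expr`), and the dshape is held as a plain string. Only the format string is taken from the comment; everything else is an assumption made for illustration.

```python
class ToyExpr:
    """Hypothetical stand-in for a Blaze expression; not blaze.expr.Expr."""

    def __init__(self, dshape, name=None):
        self.dshape = dshape  # a plain string here; a real dshape in Blaze
        self._name = name

    def __repr__(self):
        name = self._name
        # %.200s truncates the dshape to at most 200 characters, so the
        # repr stays bounded however large the expression's dshape is.
        return '<%s %s:: %.200s>' % (
            type(self).__name__,
            name + ' ' if name is not None else '',
            self.dshape,
        )


class Symbol(ToyExpr):
    """Hypothetical named leaf, only here to show the type name changing."""


named = Symbol('var * {a: int32, b: float64}', name='s')
unnamed = ToyExpr('3 * int64')
wide = ToyExpr('x' * 500)  # an artificially long dshape string

print(repr(named))    # <Symbol s :: var * {a: int32, b: float64}>
print(repr(unnamed))  # <ToyExpr :: 3 * int64>
print(len(repr(wide)))
```

Nothing in this repr touches the underlying data, which is the property the thread is after. The hard 200-character cut is the part the follow-up comments question.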
Agreed that the dshape is useful when small, and I agree with you if we can find out something to do when the dshape is large. By "large", I mean use cases I've seen where there are several joins and the resulting dshape has a dozen or more fields. How do you propose to handle things in that case? I'd be in favor of not including the dshape in the repr after some (arbitrary) size cutoff. Perhaps the cutoff could be based on whether the dshape can fit comfortably on one line, but I'm interested to know your thoughts here.
I'm -1 on multi-line reprs, regardless of what's being repr'd, especially since the repr of an object is used in many contexts outside the REPL.
I like your
One question I have is with unnamed expressions like
Or similar. For your other examples, like
What if we're combining multiple symbols, like:
I also have to think about the Again, my preference in these cases would be to take the simpler road of something like:
I'm with you -- I don't want global state in Blaze if we can avoid it. The global setting was my attempt to provide user control of the repr behavior that's backwards compatible. This flag would be there for the full 0.10 release and then, in 0.11, we'd remove the setting altogether and switch over to the new repr behavior entirely. I admit this is heavyweight, but it's provided as a courtesy to end users. If you think that we should just switch over to the new repr behavior entirely, then I'm willing to give that thought.
Yes perhaps; either way I'd want the global setting to be around only for one minor release cycle, and after that we'd have the new behavior only.
Yes, I'm willing to do that.
With a warning and calling it out clearly in the documentation that "before blaze version 0.10, repr behaved like ...", I think we'll have done due diligence.
+1. What about the case when calling
I think preview should just be listed as one of three options for explicit computations.
The original format I have for dshape cuts it off after it gets too long with
The
Thanks for pointing that out -- I missed that when glancing at your code originally. I agree that if we include the dshape, we need some sort of summarization for dshapes once they are beyond a certain size. My preference would be for the summarized dshape to include the top-most levels, while the nested levels are collapsed with ellipses or something:

# original full dshape
var * {foobar: var * {col0: int32, col1: float, col3: string}, bar: int32, foo: (string, int, string)}

# First level of summarization
var * {foobar: var * {...}, bar: int32, foo: (...)}

# Second level of summarization
var * {foobar: ..., bar: ..., foo: (...)}

Etc. That way the user can see the top-level field names and is able to explore to the next level down.
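One way to get the first level of summarization described above is to collapse everything nested deeper than a chosen bracket depth. The sketch below works on the dshape's string form only; `summarize_dshape` and `keep_depth` are hypothetical names, and it does not use the datashape library's own API. It reproduces the "first level" example, but not the "second level" one, which collapses field types rather than bracket contents.

```python
# Bracket pairs that introduce a nested level in a dshape string.
OPENERS = {'{': '}', '(': ')'}
CLOSERS = set(OPENERS.values())


def summarize_dshape(text, keep_depth=1):
    """Replace the contents of brackets nested deeper than keep_depth with '...'."""
    out = []
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
            if depth <= keep_depth + 1:
                out.append(ch)
            if depth == keep_depth + 1:
                out.append('...')  # placeholder for the collapsed level
        elif ch in CLOSERS:
            if depth <= keep_depth + 1:
                out.append(ch)
            depth -= 1
        elif depth <= keep_depth:
            out.append(ch)  # ordinary character at a level we keep
    return ''.join(out)


full = ('var * {foobar: var * {col0: int32, col1: float, col3: string}, '
        'bar: int32, foo: (string, int, string)}')

print(summarize_dshape(full, keep_depth=1))
# var * {foobar: var * {...}, bar: int32, foo: (...)}
print(summarize_dshape(full, keep_depth=0))
# var * {...}
```

A repr could try increasing values of `keep_depth` and keep the deepest summary that still fits on one line, which would also address the single-line concern raised earlier in the thread.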
I'd make the case that taking the
It also breaks down as soon as you combine symbols:

In [11]: x = bz.symbol('x', 'int')
In [12]: y = bz.symbol('y', 'int')
In [13]: (x + 1)._name
Out[13]: 'x'
In [14]: print (x + 1)._name
x
In [15]: print (x + y)._name
None

Automatically assigning meaningful names to expressions is a hard problem, probably intractable. We can always generate names like we do automatically with interactive symbols, but since that's an internal thing, there's more opportunity to confuse rather than help users. We're starting to talk about how to allow users to give names to expressions explicitly -- in that case, using the user-provided expression name would be great. Thoughts?
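The behavior in that session can be mimicked with a toy model, which also shows why an inferred name is fragile. This is only an illustration under assumed rules (an expression inherits a name when all of its leaves share one), not Blaze's actual implementation. `Node`, `Leaf`, `Add`, and `with_name` are hypothetical names.

```python
class Node:
    """Hypothetical expression base class; not part of Blaze."""

    def __add__(self, other):
        return Add(self, other)


class Leaf(Node):
    """A named symbol."""

    def __init__(self, name):
        self._name = name

    def leaves(self):
        return [self]


class Add(Node):
    def __init__(self, lhs, rhs, name=None):
        self.lhs = lhs
        self.rhs = rhs
        self._explicit_name = name  # user-supplied name, if any

    def leaves(self):
        found = []
        for side in (self.lhs, self.rhs):
            if isinstance(side, Node):  # plain scalars contribute no leaves
                found.extend(side.leaves())
        return found

    def with_name(self, name):
        # Explicit naming, as floated at the end of the comment above.
        return Add(self.lhs, self.rhs, name=name)

    @property
    def _name(self):
        if self._explicit_name is not None:
            return self._explicit_name
        names = {leaf._name for leaf in self.leaves()}
        # Inference only works when exactly one leaf name is involved.
        return names.pop() if len(names) == 1 else None


x = Leaf('x')
y = Leaf('y')

print((x + 1)._name)                       # x
print((x + y)._name)                       # None
print((x + y).with_name('total')._name)    # total
```

Under these assumptions the repr can show a name only when the user supplied one or when inference is unambiguous, and should fall back to the bare type and dshape otherwise.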
Closed by #1414. The deprecation introduced in that PR needs to be fully removed in version 0.11.
There's widespread agreement regarding removing the implicit computation behavior in blaze when repr'ing interactive expressions. Related to #1304 and #1356.
This thread is to discuss how to get there, given that this is a significant API change to Blaze.
One proposal:
blaze.evaluate_in_repr flag (open to better names for the flag) that's True by default to preserve current behavior. If set to False, then Blaze's interactive expression repr does not evaluate or compute anything, instead returning a more standard representation.
compute().
<blaze.expr.arithmetic.Add object at 0xdeadbeef> style reprs.
str(expr) produces.
I favor a simple repr, for the following reasons:
preview() function / method could be provided that does a pandas-like compute(expr.head(10))
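A rough sketch of how the flag and preview() could fit together, using a toy container instead of Blaze itself. `settings`, `ToyData`, and the warning text are assumptions for illustration. Only the flag name `evaluate_in_repr`, its `True` default, and the `compute(expr.head(10))` idea come from the proposal.

```python
import warnings


class settings:
    """Hypothetical holder for the flag proposed as blaze.evaluate_in_repr."""
    evaluate_in_repr = True  # default preserves the current behavior


class ToyData:
    """Hypothetical stand-in for an interactive Blaze expression."""

    def __init__(self, values):
        self.values = list(values)

    def head(self, n=10):
        return ToyData(self.values[:n])

    def compute(self):
        return list(self.values)

    def preview(self, n=10):
        # Explicit and bounded: equivalent to compute(expr.head(n)).
        return self.head(n).compute()

    def __repr__(self):
        if settings.evaluate_in_repr:
            # Old behavior: compute inside repr, but warn that it will go away.
            warnings.warn(
                'evaluating in repr is deprecated; call preview() or compute()',
                DeprecationWarning,
                stacklevel=2,
            )
            return repr(self.preview())
        # New behavior: a plain object repr that computes nothing.
        return object.__repr__(self)


data = ToyData(range(25))
print(data.preview())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

settings.evaluate_in_repr = False
print(repr(data))      # plain object repr, no computation
```

Keeping the flag on a single settings object makes the later removal a small change: delete the flag and the first branch of `__repr__`.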