-
-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add __format__ method to Interval (improvement) #76
Comments
Yes, but I'm pretty sure we can figure out something. For the record, my first motivation to subclass Antoine |
If we can parse format specifier then we could try to apply it to infinities as well. But I don't see how currently :-) help is welcome on this. |
Not sure there is a unique / perfect solution. Probably some design choice has to be made. Not sure either that it is mandatory to try to implement all formating directives (e.g. alignment). I see them as indications subject to best effort, when it make sense, that can even be dropped. For instance if a string is to long for a length indication, it is not truncated. Another example is datetime formating that simply doesn't support padding or alignment directives. So, how the My motivation with datetime is that having (especially non atomic) intervals with the default datetime representation is just unreadable. def __format__(self, spec):
return P.to_string(self, lambda v: ('{:'+spec+'}').format(v)) The previous proposal is simple and fullfills this purpose. If one wants to deal with aligments in a consistent way, one must find a way to split the
where For instance: def __format__(self, spec):
# FIXME: handle null spec
# FIXME: handle exception because __format__ might not be defined for v
parts = spec[1:].split(spec[0])
pinf = '+inf'.__format__(parts[1]) if len(parts) > 1 else '+pinf'
ninf = '-inf'.__format__(parts[1]) if len(parts) > 1 else '-pinf'
result = P.to_string(self,
lambda v: v.__format__(parts[0]),
pinf=pinf,
ninf=ninf)
if len(parts) > 2:
result = result.__format__(parts[2])
return result Here, the separator is chosen by the user (first character after the semicolon). Another choice would be to use a predefined separator in which case it should not appear after the first semicolon. This would be slightly more readable, The drawback is that it can not be part of the spec for sub parts. Best regards, Antoine |
I don't really like the idea of using a separator. Formatting a string is already something sometimes complicated (because one has to remember the specifiers) and if we're going to support I like the simplicity of your first proposal:
What would be nice is to be allow to extract all parts of |
This could be a starting point: |
Yes, but it seems to be impossible to have a spec that could apply at the same time to strings (infinities) and objects (of unknown type) in the interval. See I can' say I like the separator idea... quick and dirty solution. I might use the separator in my subclasses as a convenience (I didn't so far). Maybe the reasonable choice is to do nothing (as you did) or simply provide the first proposal I made (with a warning about the limits en the doc). Best regards, Antoine |
I'll have a look at how |
I'm affraid there is no way to have a universal specifier, as specifiers (and how they are interpreted) depends on objet they are used with. Of course Anyway the idea of separator that came up because of this thread will solve (even if not perfect) some cases I have in a much more compact way. I will implement it in some of my subclasses of intervals! Best regards. |
Ideed, but the PEP defines a kind of "minimal set of specifiers" ( Notice that I discussed this morning with another Pythonista, and he told me something interesting: none of the object containers in Python (e.g. list, dict, etc.) actually supports defining the format of nested objects through |
Isn't the PEP you're referring to the one that introduced Formater class and already is in the language (python 2.6 and 3.0)? Since meaning of format specifier depends on the object it is applied to, I really can't see how a unique specifier could be processed in order to apply some (relevant) parts of it to infinities (str), and other parts to be applied to unknown objects. Consider some align requirements and intervals with datetimes: how is it possible to guess that you must delete the align spec when applying to a value, and keep for infinities. You can only do it if you have the knowledge of what format specifier is supported by non infinities. Here is a refined idea with separator, taking advantage of
Simple, powerfull enough for most use cases... Much more compact and handy than using Some examples: >>> i=closed(1,2)
>>> j=closed(1,inf)
>>> k=closed(datetime(2000,1,1), datetime.now())
>>> i
[1,2]
>>> j
[1,+inf)
>>> k
[datetime.datetime(2000, 1, 1, 0, 0),datetime.datetime(2023, 1, 30, 17, 55, 55, 608774)]
>>> f"{i:08d}"
'[00000001,00000002]'
>>> f"{j:08d}"
'[00000001,+inf)'
>>> f"{i:#/08d//^30}"
' [00000001,00000002] '
>>> f"{j:#/08d/^8/^30}"
' [00000001, +inf ) '
>>> f"{k:#/ %Y-%m-%d //^30}"
' [ 2000-01-01 , 2023-01-30 ] '
>>> f"{i:#/#010x}"
'[0x00000001,0x00000002]' Here is the code that would do it (not heavily tested). Anybody shall feel free to reuse it: class I(P.Interval):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# Can owerwrite on a per class (or even instance) basis
DISJ_STR = " | "
SEP_STR = ","
LEFT_OPEN_STR = "("
LEFT_CLOSED_STR = "["
RIGHT_OPEN_STR = ")"
RIGHT_CLOSED_STR = "]"
PINF_STR = "+inf" # "+oo"
NINF_STR = "-inf" # "-oo"
EMPTY_STR = "()"
DEFAULT_CONV = str
def to_str(self,
conv: Callable = repr,
*,
disj: Optional[str] = None,
sep: Optional[str] = None,
left_open: Optional[str] = None,
left_closed: Optional[str] = None,
right_open: Optional[str] = None,
right_closed: Optional[str] = None,
pinf: Optional[str] = None,
ninf: Optional[str] = None,
empty: Optional[str] = None) -> str:
"""
"""
left_open = self.LEFT_OPEN_STR if left_open is None else left_open
right_open = self.RIGHT_OPEN_STR if right_open is None else right_open
if self.empty:
return (empty if empty is not None
else self.EMPTY_STR if self.EMPTY_STR
else left_open + right_open)
left_closed = self.LEFT_CLOSED_STR if left_closed is None else left_closed
right_closed = self.RIGHT_CLOSED_STR if right_closed is None else right_closed
sep = self.SEP_STR if sep is None else sep
pinf = self.PINF_STR if pinf is None else pinf
ninf = self.NINF_STR if ninf is None else ninf
result = []
for i in self._intervals:
if i.lower == i.upper:
result.append(left_closed + conv(i.lower) + right_closed)
else:
result.append(
(left_open if i.left == P.OPEN else left_closed)
+ (ninf if i.lower == -P.inf else conv(i.lower))
+ sep
+ (pinf if i.upper == P.inf else conv(i.upper))
+ (right_open if i.right == P.OPEN else right_closed)
)
return (disj if disj is not None else self.DISJ_STR).join(result)
def __format__(self, spec: str) -> str:
"""
Default object formatter.
The format specifier is applied to lower and upper bound values,
provided they are not infinities. Since their is not known controled
by the Interval class, it is up to the user to provide a compatible
format specifier.
An alternate specifier form (starting with '#') can be used in order
to control how infinities and the overall result are formated.
The '#' opening char must be followed by a (user chosen) separator.
Every string between separators is then a format specifier that apply
to the lower and upper bound values (first specifier), infinities
(second specifier applicable to str) and the overall result (third
specifier, applicable to str). Second and third specifiers are
optional.
Examples:
>>> i=closed(1,2)
>>> i
[1,2]
>>> j=closed(1,inf)
>>> j
[1,+inf)
>>> k=closed(datetime(2000,1,1), datetime.now())
>>> k
[datetime.datetime(2000, 1, 1, 0, 0),datetime.datetime(2023, 1, 30,
17, 55, 55, 608774)]
>>> f"{i:08d}"
'[00000001,00000002]'
>>> f"{j:08d}"
'[00000001,+inf)'
>>> f"{i:#/08d/^8/^30}"
' [00000001,00000002] '
>>> f"{j:#/08d/^8/^30}"
' [00000001, +inf ) '
>>> f"{k:#/ %Y-%m-%d //^30}"
' [ 2000-01-01 , 2023-01-30 ] '
>>> f"{i:#/#010x}"
'[0x00000001,0x00000002]'
"""
pinf = self.PINF_STR
ninf = self.NINF_STR
# Format specifier applies to values, but not to infinities
if not spec.startswith('#'):
spec = '{:'+spec+'}'
return self.to_str(conv = lambda v: spec.format(v))
if len(spec) == 1:
raise ValueError("Invalid format specifier (missing separator)")
# Alternate form: multiple format specifier that apply to values,
# infinities and the whole result. First char after '#' is a separator
# used to split spec into sub specs. For instance '#/8.2f/^8'
subspec = spec[2:].split(spec[1])
if subspec[0]:
if subspec[0][0] == '!':
value_fmt = '{'+subspec[0]+'}'
else:
value_fmt = '{:'+subspec[0]+'}'
conv = lambda v: value_fmt.format(v)
else:
conv = repr
if len(subspec) > 1 and subspec[1]:
pinf = ('{:'+subspec[1]+'}').format(pinf)
ninf = ('{:'+subspec[1]+'}').format(ninf)
result = self.to_str(conv=conv, ninf=ninf, pinf=pinf)
if len(subspec) > 2 and subspec[2]:
return ('{:'+subspec[2]+'}').format(result)
return result
def __str__(self):
return self.to_str(conv=self.DEFAULT_CONV)
def __repr__(self):
# EQUIV TO return self.to_str(conv=repr) but without flexibility
# and all constants hardcoded. Faster.
if self.empty:
return "()"
return " | ".join(
"["+repr(i.lower)+"]" if i.lower == i.upper
else (("(" if i.left == P.OPEN else "[")
+ repr(i.lower)
+ ","
+ repr(i.upper)
+ (")" if i.right == P.OPEN else "]"))
for i in self._intervals
) |
And a few speed tests... # Create Interval with a given number of atomic intervals
def make_interval(cls=P.Interval, n=10):
return cls(*(P.closed(k, k+1) for k in range(0, 2*n, 2)))
# Original class
i0=make_interval(n=0)
i10=make_interval(n=10)
i100=make_interval(n=100)
i1000=make_interval(n=1000)
i10000=make_interval(n=10000)
# Derived class
j0=make_interval(cls=I,n=0)
j10=make_interval(cls=I,n=10)
j100=make_interval(cls=I,n=100)
j1000=make_interval(cls=I,n=1000)
j10000=make_interval(cls=I,n=10000) Comparing >>> %timeit repr(i0)
... 192 ns ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit repr(j0)
... 189 ns ± 3.33 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit repr(i10)
... 8.31 µs ± 126 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>>> %timeit repr(j10)
... 6.07 µs ± 273 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>>> %timeit repr(i100)
... 77.7 µs ± 519 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit repr(j100)
... 53 µs ± 311 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit repr(i10000)
... 8.07 ms ± 496 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit repr(j10000)
... 5.4 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Comparing >>> %timeit P.to_string(i0)
... 560 ns ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit P.to_string(i10)
... 55.8 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit P.to_string(i100)
... 535 µs ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>> %timeit P.to_string(i1000)
... 5.4 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit P.to_string(i10000)
... 57.8 ms ± 4.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit j0.to_str()
... 418 ns ± 3.87 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit j10.to_str()
... 10.8 µs ± 471 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>>> %timeit j100.to_str()
... 98.9 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit j1000.to_str()
... 1.01 ms ± 25 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>> %timeit j10000.to_str()
... 9.82 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Best regards, Antoine |
Thanks for sharing the snippets and for these benchmarks. That's interesting, even if I'm not surprised that calling a method is faster than calling the You said:
I agree :) My point was rather to try to extract from a specifier what can be applied to infinities, and to use it as-is for the object. This is: use the specifier as-is for the bounds that are not infinities, and extract what can be applied to infinities (e.g. padding, alignment, etc.) and apply it on infinities. Anyway, given the potential confusion that could arise because of I've added the "information" label to this issue since I believe it will be helpful for people looking something similar. I'll definitely link to this issue in the future "Recipes" section of the documentation :-) |
I've changed our |
Hi Alexandre, Glad you found it usefull. Something puzzles me however. You find it usefull to get rid of unnecessary But you discard as almost irrelevant a 7 times (!) speedud just saying it was expected. Of course it was expected, but 7 times speedup...! where there is absolutely no benefit for the user to stick to the "functionnal style" (simply have Best regards, Antoine |
Hi, The speed-up for You said "interesting case of confirmation bias". I don't know if that's the case or not, but I always try to weigh the pros and cons of every decision I make about |
Sure, Alexandre: your code, your design, your choices, your decisions... And I would even say, and I truly mean it (trust me!), the best decisions overall are the one you make :-) But yet... you have 25 lines of code in io.py to implement Now as a library user, the implicit contract is that a call to Now, it is obvious that the difference between the 2 is that one of them involves many unnecessary object instanciations (hence being far more slower) while the other do not. And that any two of them could be a call to the other one (or, better, they both should call a unique method). So, there is not need to change the API. And the speedup also comes for free. Having said that, I don't see any benefit to keep 2 implementations, and to keep the slowest one. There is probably one and I am missing something. But what? This is just curiosity. And still, your code, your decision! And thanks for it! Best regards, Antoine |
Hi Alexandre,
I would suggest to add a
__format__
method to Interval, such as (or something equivalent, idea is easy to grasp):Best regards,
Antoine
The text was updated successfully, but these errors were encountered: