Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very poor performance with Angle.__str__ and Angle._repr_latex_ #6995

Closed
astrofrog opened this issue Dec 15, 2017 · 7 comments · Fixed by #7004
Closed

Very poor performance with Angle.__str__ and Angle._repr_latex_ #6995

astrofrog opened this issue Dec 15, 2017 · 7 comments · Fixed by #7004

Comments

@astrofrog
Copy link
Member

astrofrog commented Dec 15, 2017

The following illustrates very poor performance with Angle.__str__ and Angle._repr_latex_ with arrays:

In [15]: a = Angle(np.ones(1000000), u.deg)

In [16]: %time a.__repr__()
CPU times: user 324 µs, sys: 23 µs, total: 347 µs
Wall time: 332 µs
Out[16]: '<Angle [ 1., 1., 1.,...,  1., 1., 1.] deg>'

In [17]: %time a.__str__()
CPU times: user 22.4 s, sys: 341 ms, total: 22.8 s
Wall time: 23.4 s
Out[17]: "['1d00m00s' '1d00m00s' '1d00m00s' ..., '1d00m00s' '1d00m00s' '1d00m00s']"

In [18]: %time a._repr_latex_()
CPU times: user 22.6 s, sys: 436 ms, total: 23.1 s
Wall time: 23.7 s
Out[18]: '[$1^\\circ00{}^\\prime00{}^{\\prime\\prime}$\n $1^\\circ00{}^\\prime00{}^{\\prime\\prime}$\n $1^\\circ00{}^\\prime00{}^{\\prime\\prime}$ ...,\n $1^\\circ00{}^\\prime00{}^{\\prime\\prime}$\n $1^\\circ00{}^\\prime00{}^{\\prime\\prime}$\n $1^\\circ00{}^\\prime00{}^{\\prime\\prime}$]'

This poor performance is unnecessary since only a few items are returned in the strings, and the rest is ...

This is not an issue with Quantity or Distance, but does percolate up to e.g. SkyCoord I think.

@mhvk
Copy link
Contributor

mhvk commented Dec 15, 2017

It seems clear that it is representing all the items -- the solution likely is to pass on our own formatter to arrayprint - then the numpy machinery will take care of calling it only for those elements that are actually shown.

@vermashresth
Copy link
Contributor

vermashresth commented Dec 16, 2017

@astrofrog @mhvk I would like to work on this issue. I noticed that the np.vectorize call applies the formatter on the whole set of values in angles.py.
So would the solution be to just pass the defined do_format function as the parameter formatter to np.set_printoptions and return values? I did check that it gives faster performance.

@mhvk
Copy link
Contributor

mhvk commented Dec 16, 2017

@vermashresth - yes, that was the idea - really mostly you'll need to check that the output looks right!

@vermashresth
Copy link
Contributor

@mhvk I have a doubt. Presently, the to_string() method of Angle returns a numpy array of strings if Angle is an array too. When returning this, it applies formatter to all the values in the array and that is what takes time. But if numpy machinary is set to handle this, this function would return a string representation of array which would no longer be iterable on seperate Angle values although the print output would be fast and correct. How to go about it then?

@mhvk
Copy link
Contributor

mhvk commented Dec 18, 2017

@vermashresth - we should not change to_string(), but rather how it is used. So, __str__ might become:

def __str__(self):
    return np.array2string(a, formatter={'all': lambda x: x.to_string()})

I have not checked at all to what extend this slows down the more typical case, where there are not so many items.

@vermashresth
Copy link
Contributor

vermashresth commented Dec 18, 2017

@mhvk Oh, earlier I was only trying it inside the to_string() method, doing it in __str__ works now. I checked the execution time and the difference is of the order 10^-4sec for small inputs of size less than 10 and about 10^-2sec for a size of 999 after which it starts showing summary. Will that work?
Also, are tests needed for this code change?

@mhvk
Copy link
Contributor

mhvk commented Dec 18, 2017

Great! I think no additional tests are needed - this would be a performance label. We do want this in the benchmarks, but that is a different repo (maybe @astrofrog can advice)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants