New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graphics: Storing _points
for Smooth*
implementations result in a performance drop even on non-smooth drawings
#8664
Comments
@DexerBR (with no hurry) as you are the hero who recently touched these lines, do you have any suggestion on how to improve this specific area? |
Good catch @misl6! I can also confirm the difference in performance you mentioned. I think an alternative implementation using Cython objects can significantly improve the performance of this code. Take, for example, the changes marked as From my tests, the difference in performance between using this alternative implementation and not using it seems to be almost imperceptible (feel free to test it), unlike the performance difference you mentioned (current implementation using Python objects). # NOTE: alternative code
cdef struct ListPoints:
float x
float y
cdef class Ellipse(Rectangle):
'''A 2D ellipse.
:Parameters:
`segments`: int, the default value is calculated from the range between angle.
Define how many segments are needed for drawing the ellipse.
The ellipse drawing will be smoother if you have many segments,
however you can also use this property to create polygons with 3 or more sides.
`angle_start`: float, defaults to 0.0
Specifies the starting angle, in degrees, of the disk portion.
`angle_end`: float, defaults to 360.0
Specifies the ending angle, in degrees, of the disk portion.
.. versionchanged:: 1.0.7
Added angle_start and angle_end.
.. versionchanged:: 2.2.0
The default number of segments is no longer 180, it is now calculated
according to the angle range, as this is a more efficient approach.
'''
cdef int _segments
cdef float _angle_start
cdef float _angle_end
# NOTE: alternative code
cdef ListPoints* _list_points
def __init__(self, *args, **kwargs):
Rectangle.__init__(self, **kwargs)
self.batch.set_mode('triangle_fan')
self._segments = kwargs.get('segments') or 0
self._angle_start = kwargs.get('angle_start') or 0.0
self._angle_end = kwargs.get('angle_end') or 360.0
cdef void build(self):
cdef float *tc = self._tex_coords
cdef int i, angle_dir
cdef double angle_start, angle_end, angle_range
cdef double x, y, angle, rx, ry, ttx, tty, tx, ty, tw, th
cdef double cx, cy, tangential_factor, radial_factor, fx, fy
cdef vertex_t *vertices = NULL
cdef unsigned short *indices = NULL
cdef int segments = self._segments
# reset points
# self._points = []
if self.w == 0 or self.h == 0:
return
if segments == 0 or segments < 3:
if segments != 0:
Logger.warning('Ellipse: A minimum of 3 segments is required. The default value will be used instead.')
segments = max(1, int(abs(self._angle_end - self._angle_start) / 2))
# NOTE: alternative code
free(self._list_points)
self._list_points = <ListPoints*>malloc((segments + 1) * sizeof(ListPoints))
if self._list_points == NULL:
raise MemoryError("Failed to allocate memory for points")
tx = tc[0]
ty = tc[1]
tw = tc[4] - tx
th = tc[5] - ty
angle = 0.0
rx = 0.5 * self.w
ry = 0.5 * self.h
vertices = <vertex_t *>malloc((segments + 2) * sizeof(vertex_t))
if vertices == NULL:
raise MemoryError('vertices')
indices = <unsigned short *>malloc((segments + 2) * sizeof(unsigned short))
if indices == NULL:
free(vertices)
raise MemoryError('indices')
# calculate the start/end angle in radians, and adapt the range
if self._angle_end > self._angle_start:
angle_dir = 1
else:
angle_dir = -1
# rad = deg * (pi / 180), where pi / 180 = 0.0174...
angle_start = self._angle_start * 0.017453292519943295
angle_end = self._angle_end * 0.017453292519943295
angle_range = -1 * (angle_end - angle_start) / segments
# add start vertex in the middle
x = self.x + rx
y = self.y + ry
ttx = ((x - self.x) / self.w) * tw + tx
tty = ((y - self.y) / self.h) * th + ty
vertices[0].x = <float>(self.x + rx)
vertices[0].y = <float>(self.y + ry)
vertices[0].s0 = <float>ttx
vertices[0].t0 = <float>tty
indices[0] = 0
# super fast ellipse drawing
# credit goes to: http://slabode.exofire.net/circle_draw.shtml
tangential_factor = tan(angle_range)
radial_factor = cos(angle_range)
# Calculate the coordinates for a circle with radius 0.5 about
# the point (0.5, 0.5). Only stretch to an ellipse later.
cx = 0.5
cy = 0.5
r = 0.5
x = r * sin(angle_start)
y = r * cos(angle_start)
for i in range(1, segments + 2):
ttx = (cx + x) * tw + tx
tty = (cy + y) * th + ty
real_x = self.x + (cx + x) * self.w
real_y = self.y + (cy + y) * self.h
vertices[i].x = <float>real_x
vertices[i].y = <float>real_y
vertices[i].s0 = <float>ttx
vertices[i].t0 = <float>tty
indices[i] = i
fx = -y
fy = x
x += fx * tangential_factor
y += fy * tangential_factor
x *= radial_factor
y *= radial_factor
# self._points.extend([real_x, real_y])
# NOTE: alternative code
self._list_points[i - 1].x = <float>real_x
self._list_points[i - 1].y = <float>real_y
self.batch.set_data(vertices, segments + 2, indices, segments + 2)
free(vertices)
free(indices) All codes related to |
Did not tried it on runtime, but at first glance looks faster. What about doing the following instead (fixes both this issue and possibly other performance issues due to the memory allocation like what discussed in #8662 ):
How does this plan sound? |
I find this approach interesting, although the impact on memory needs to be measured to be sure that memory consumption would not increase significantly for apps that use many primitives. I mean, memory consumption will obviously increase, but we would have to have a metric to define to what extent this increase would be acceptable. Another possibility would be to have a kind of option that lets the user define what type of approach they want when rendering the primitives, for example:
Interestingly, however, there would have to be a mapping to define what the external points of the primitive are. For example, to draw an ellipse, the interior vertices are discarded (360 degrees). The same logic applies to other primitives. Nothing that seems too complex, but it is something that needs to be taken into account when "re-using" vertices to draw Another option would also be to keep the memory allocated until it is used by the |
Yeah, I agree. Better to fix out the
I like this approach, I guess we can do the following:
Do you think it can work? Please remember we're in |
Yes, I think it could work! After implementing this it will be very simple to test the persistent allocation system (during the life cycle of the instruction). |
Yep, separating things into smaller methods makes (almost) everything simpler. Marking it to "Ready" as we now have a plan. |
Hello guys, I've been exploring ways to enhance the performance of the graphics vertex instructions module, particularly for operations involving large sets of points. By leveraging numpy arrays instead of standard Python lists, we can achieve significant improvements in both memory usage and processing speed. This change primarily affects the way we store and manipulate point data in vertex classes, allowing us to utilize numpy's efficient array operations. Here's an example modification for the I believe this enhancement would be beneficial for Kivy users who deal with complex graphical applications and would appreciate your consideration for merging these improvements. Enhanced Performance: Numpy arrays provide faster operations due to their optimized numerical computation capabilities, especially useful when dealing with large sets of data. Best regards, Example: |
After careful consideration of the proposal to include the NumPy library to optimize parts of Kivy, I've realized that this approach might present significant challenges, especially related to response times and the implications of introducing a new external dependency. Given this context, it seems more prudent to explore other alternatives that do not require such deep changes to our user base and infrastructure. Some innovative solutions have already been discussed here, and I would like to complement those ideas with additional viable options that can be implemented using our existing resources and optimization techniques in Cython. I believe these additions could offer significant performance improvements without compromising the maintainability or accessibility of the project. I will be sharing these suggestions in detail in the coming days so that we can discuss them more thoroughly. I am excited to hear your thoughts and collaborate in finding the best strategies to enhance our work on Kivy. Thank you for your dedication and ongoing collaboration. |
Hi @tuliogv, Yes, adding NumPy as a new external dependency to optimize some code has the con of an increased App size. |
Hey everyone, I've been following the discussions on refactoring Python arrays for better memory management, especially considering the proposed solutions and the recent implementation for the Ellipse() class. Building on this momentum, I've suggested a solution for the Rectangle() class, complete with benchmarking results. I believe it would be beneficial for us to systematically address each class, marking them for optimization and gradually improving them one by one. By doing so, we can work towards achieving the optimal solution for memory efficiency across the board. Looking forward to collaborating on this optimization journey! Best regards, |
Software Versions
During tests that later pointed out an issue regarding
GL_TRIANGLE_FAN
(See PR #8662), Cython HTML annotation and perf tests made quite clear that storing_points
forSmooth
implementations as a pythonlist
leads to a noticeable performance drop.We should investigate alternatives for storing these values, work on "faster Kivy graphics", and avoid calls when unneeded.
For testing this issue (and for PR #8662 ) the following script has been used (it's just bad code meant for stressing out our
Ellipse
implementation) by continuously scrolling up and down a huge amount of ellipses.Test configuration: Ubuntu 22.04 + i7-7700HQ CPU + Intel(R) HD Graphics 630
Test results with
master
branch: ~44 fpsTest results with
master
branch, by disabling theself._points.extend([real_x, real_y])
line on theEllipse
implementation: ~60 fpsI did not tested this specific fix on other SW/HW configurations, but I expect something similar, as only the CPU is involved here.
The text was updated successfully, but these errors were encountered: