Cache locations to improve list_images performance #544
relaxdiego wants to merge 1 commit into apache:trunk
Conversation
This change caches the locations in a private attribute to avoid repeated calls for the same result set. The number of calls made is directly proportional to the number of images returned. For a 223-image result set, this can take as long as 99.990 seconds. Caching the locations reduces it to around 2.001 seconds. The risk of stale location data during execution is negligible.
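The caching idea described above can be sketched roughly as follows. This is a minimal illustration of memoizing the location list in a private attribute, not the actual DimensionData driver code; all class, method, and field names here are hypothetical, and `fetch_count` exists only to make the effect visible.

```python
# Minimal sketch of the caching idea in this PR: keep the location list
# in a private attribute so list_images() fetches it at most once.
# All names here are illustrative, not actual libcloud driver code.

class CachingDriver:
    def __init__(self):
        self._cached_locations = None
        self.fetch_count = 0  # instrumentation for this example only

    def _fetch_locations(self):
        # Stand-in for the network call that, in the unpatched driver,
        # is repeated once per image returned.
        self.fetch_count += 1
        return {"NA1": "US - East", "EU1": "Europe - West"}

    def _locations(self):
        # Fetch the location mapping only once per driver instance.
        if self._cached_locations is None:
            self._cached_locations = self._fetch_locations()
        return self._cached_locations

    def list_images(self, raw_images):
        locations = self._locations()
        # Each image resolves its location from the cached mapping
        # instead of triggering another request.
        return [(img["name"], locations[img["location_id"]])
                for img in raw_images]
```

With 223 raw images, `_fetch_locations` runs once instead of 223 times, which is where the reported speedup would come from under these assumptions.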
cProfile output of the code:

import cProfile
from libcloud.compute.providers import get_driver
from libcloud.compute.types import Provider

Driver = get_driver(Provider.DIMENSIONDATA)
conn = Driver(username, password)
cProfile.run('conn.list_images()')

BEFORE: [profiler output not preserved in this text]
AFTER: [profiler output not preserved in this text]
Sorry for the delay. There was a discussion about caching in Libcloud in the past, but we decided not to implement it. It's simply too complex and there are too many edge cases (cache invalidation for long-running processes, support for multiple caching backends, making sure we don't use too much memory, LRU eviction, etc.), so it's better to leave caching up to the user. I will try to find some time to add a documentation section on caching: why we don't do it, and other ways users can speed up requests (putting a caching proxy in front, etc.).
I can modify this so that it only caches the data for each function call. The problem I'm trying to solve here can't be solved by the client caching the data, because the data is repeatedly downloaded within a loop inside an internal method. Furthermore, a caching proxy in front is not always feasible (e.g. when libcloud is used as part of a CLI tool).
"I can modify this so that it only caches the data for each function call only." I take that back. The implementation will be hacky at best. The only other way I can think of is for this part of the driver[1] to avoid eager loading the location object. cc @tonybaloney [1] |
I have updated the methods to get the list of locations before transforming the responses, then use a lambda over the in-memory list, removing the requirement for a cache. This is now implemented in PR 587, so this PR can be closed.
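The approach described above (fetch the locations once, then resolve each image's location with a lambda over the in-memory list) might be sketched like this. The function and field names are hypothetical; this is not the code from PR 587.

```python
# Rough sketch: one locations call up front, then a lambda lookup over
# the in-memory list while transforming the raw image responses.
# Names are hypothetical, not the actual PR 587 code.

def list_images(fetch_locations, fetch_raw_images):
    locations = fetch_locations()  # single call, before any transform
    find_location = lambda loc_id: next(
        loc for loc in locations if loc["id"] == loc_id)
    return [{"name": img["name"],
             "location": find_location(img["location_id"])}
            for img in fetch_raw_images()]
```

Compared with caching on the driver, this keeps the location list scoped to a single call, so there is no stale state to invalidate between calls.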