BiG-CZ: Cache CUAHSI Requests #2260
The suds library interacts with the CUAHSI SOAP API by reading their WSDL and creating dynamic classes on the fly for requests and responses. Since these classes are created at runtime, they are not serializable, since pickle (or json or anything else) will not know how to deserialize them, as their class definitions will not be available. In preparation for caching these values, they must be made serializable. To this end, we convert the results into Python Dicts, which are serializable. This uses the suds `asdict` method, which converts a suds Object to a Python Dict. Arrays are converted to Lists, literals used immediately, and sub-Objects are converted recursively to Dicts.
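The recursive conversion described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `FakeSudsObject` and `to_plain` are hypothetical names, and the stand-in class only imitates a runtime-generated suds Object so the example runs without suds installed. With real suds objects, the object branch would presumably call `suds.sudsobject.asdict` instead of `vars`.

```python
import json


class FakeSudsObject(object):
    """Hypothetical stand-in for a class suds builds at runtime from the WSDL."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


def to_plain(value):
    """Recursively convert objects to dicts and arrays to lists."""
    if isinstance(value, FakeSudsObject):
        # Sub-objects become dicts, converted recursively.
        return {key: to_plain(item) for key, item in vars(value).items()}
    if isinstance(value, (list, tuple)):
        # Arrays become lists.
        return [to_plain(item) for item in value]
    # Literals are used as they are.
    return value


response = FakeSudsObject(
    name='example',
    sites=[FakeSudsObject(code='A'), FakeSudsObject(code='B')],
)
plain = to_plain(response)
serialized = json.dumps(plain, sort_keys=True)
```

Once the result consists only of dicts, lists, and literals, any serializer the cache backend uses can round-trip it.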
A CUAHSI search request uses two endpoints: one for fetching services, and the other for fetching series. The services endpoint has values which rarely change, so we cache them for a week. The series endpoint has more frequently changing values, so those are cached for 5 minutes.
Tested, and seems to be working well.
| | Initial | Cached |
|---|---|---|
| No filter | 1.92s | 297.97ms |
| Date filter | 959.02s | 662.26ms |
| Gridded services | 2.93s | 636.92ms |
    @@ -228,6 +237,7 @@ def get_series_catalog_in_box(box, from_date, to_date, networkIDs):
         to_date = to_date or DATE_MAX

         result = make_request(client.service.GetSeriesCatalogForBox2,
Can you speculate on the size of these responses so we have an idea of the impact to the cache storage we have available?
vagrant@services:~$ redis-cli -n 1 GET ":1:bigcz_GetServicesInBox2_-536472219463548920" > GetServicesInBox2.txt
vagrant@services:~$ redis-cli -n 1 GET ":1:bigcz_GetSeriesCatalogForBox2_6025281192626754923" > GetSeriesCatalogForBox2.txt
vagrant@services:~$ du -sh *.txt
392K GetSeriesCatalogForBox2.txt # Cached for 5 minutes
144K GetServicesInBox2.txt # Cached for 1 week
This is for the Philadelphia HUC-12 for a search query of "water" with 214 results.
It should be noted that the original issue only asked to cache `GetServicesInBox2`. I'm also caching `GetSeriesCatalogForBox2` to make interacting with it faster.
I was skeptical of this approach since it requires the bbox (i.e., the AOI shape) to be the same in order to get the effect. However, tweaking the filters resulted in noticeably faster searches once the initial services were cached. It's too bad we can't cache the services based on a spatial index, which would let us cache things at a larger geographic level.
Thanks for taking a look!
Overview
CUAHSI searches use two endpoints: one for fetching services, and another for fetching series within them. The services rarely update, so we cache them for a week. The series update more frequently, so we cache them for 5 minutes. Each cache key is composed of `bigcz_{name of request method}_{sorted hashset of arguments}`, so new requests should work as expected, and existing requests should be cached for the given TTL.

This work complements the filters PR #2258, which triggers searches on click. This should make enabling and disabling filters a lot quicker.
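To illustrate the key scheme and the two TTLs, here is a minimal sketch under stated assumptions. It is not the code from this PR: `cache_key`, `cached_request`, and the in-memory `_cache` dict are hypothetical, the dict stands in for the real cache backend, and a SHA-1 digest of the sorted arguments is swapped in for whatever hash the PR actually computes.

```python
import hashlib
import json
import time

WEEK = 60 * 60 * 24 * 7  # TTL for the rarely changing services
FIVE_MINUTES = 60 * 5    # TTL for the more frequently changing series

_cache = {}  # key -> (expires_at, value); stand-in for the real cache backend


def cache_key(method_name, **kwargs):
    """Build a key that does not depend on keyword argument order."""
    payload = json.dumps(sorted(kwargs.items()), default=str)
    digest = hashlib.sha1(payload.encode('utf-8')).hexdigest()
    return 'bigcz_{}_{}'.format(method_name, digest)


def cached_request(method_name, fetch, ttl, now=time.time, **kwargs):
    """Return a cached value if it is still fresh, else fetch and store it."""
    key = cache_key(method_name, **kwargs)
    hit = _cache.get(key)
    if hit is not None and hit[0] > now():
        return hit[1]
    value = fetch(**kwargs)
    _cache[key] = (now() + ttl, value)
    return value
```

A services lookup would then pass `WEEK` as the TTL and a series lookup `FIVE_MINUTES`. Because the arguments are part of the key, a request with a different bounding box or date range misses the cache and goes to the API.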
Connects #1932
Demo
Testing Instructions