
BiG-CZ: Cache CUAHSI Requests #2260

Merged: 2 commits into develop from tt/bigcz-cache-cuahsi-requests, Sep 20, 2017

Conversation

rajadain (Member)

Overview

CUAHSI searches use two endpoints: one for fetching services, and another for fetching series within them. The services rarely update, so we cache them for a week. The series update more frequently, so we cache them for 5 minutes. Each cache key is composed of `bigcz_{name of request method}_{hash of the sorted arguments}`, so new requests work as expected and repeated requests are served from the cache for the given TTL.
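As a rough illustration of this scheme, a keyed cache wrapper using Django's cache API might look like the sketch below. The `cached_request` name, its signature, and the hashing details are assumptions for illustration, not the exact code in this PR:

```python
from django.core.cache import cache

CACHE_TTL_SERVICES = 60 * 60 * 24 * 7  # services rarely change: cache for a week
CACHE_TTL_SERIES = 60 * 5              # series change more often: cache for 5 minutes


def cached_request(name, request, expiry, **kwargs):
    """Call request(**kwargs), memoizing the result in the Django cache.

    `name` is the CUAHSI method name (e.g. 'GetServicesInBox2'); the key
    suffix hashes the sorted keyword arguments so identical requests map
    to the same entry. (Python 3 randomizes string hashes per process, so
    a persistent cache would want a stable digest from hashlib instead.)
    """
    key = 'bigcz_{}_{}'.format(name, hash(frozenset(sorted(kwargs.items()))))

    result = cache.get(key)
    if result is None:
        result = request(**kwargs)
        cache.set(key, result, timeout=expiry)

    return result
```

A call site would then pick the TTL to match the endpoint, e.g. `CACHE_TTL_SERVICES` for `GetServicesInBox2` and `CACHE_TTL_SERIES` for `GetSeriesCatalogForBox2`.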

This work complements the filters PR #2258, which triggers searches on click; caching should make enabling and disabling filters a lot quicker.

Connects #1932

Demo

(Demo screen recording: 2017-09-15 11 54 47)

Testing Instructions

  • Check out this branch and go to :8000/?bigcz
  • Run a WDC search. It'll take its time the first time.
  • Try running the same search again. It should be noticeably faster.
  • Ensure the results are the same as before.

The suds library interacts with the CUAHSI SOAP API by reading
its WSDL and creating dynamic classes on the fly for requests
and responses. Because these classes are created at runtime, they
are not serializable: pickle (or json, or anything else) will not
know how to deserialize them, as their class definitions will not
be available.

In preparation for caching these values, they must be made
serializable. To this end, we convert the results into Python
Dicts, which are serializable. This uses the suds `asdict` helper,
which converts a suds Object into a Python Dict. Arrays are
converted to Lists, literals are used as-is, and sub-Objects
are converted recursively to Dicts.
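A minimal sketch of that conversion, following the common recursive-`asdict` recipe (the helper name is illustrative; the PR's implementation may differ in detail):

```python
from suds.sudsobject import asdict


def recursive_asdict(sobject):
    """Convert a suds Object into a plain, serializable Python dict."""
    out = {}
    for key, value in asdict(sobject).items():
        if hasattr(value, '__keylist__'):
            # Nested suds Object: convert recursively
            out[key] = recursive_asdict(value)
        elif isinstance(value, list):
            # suds Array: convert each element, recursing into sub-Objects
            out[key] = [recursive_asdict(item)
                        if hasattr(item, '__keylist__') else item
                        for item in value]
        else:
            # Literal value (string, number, date): keep as-is
            out[key] = value
    return out
```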
A CUAHSI search request uses two endpoints: one for fetching
services, and the other for fetching series. The services endpoint
has values which rarely change, so we cache them for a week. The
series endpoint has more frequently changing values, so those are
cached for 5 minutes.
@arottersman left a comment:

Tested, and seems to be working well.

|                  | Initial | Cached   |
|------------------|---------|----------|
| No filter        | 1.92s   | 297.97ms |
| Date filter      | 959.02s | 662.26ms |
| Gridded services | 2.93s   | 636.92ms |

@arottersman assigned rajadain and unassigned arottersman on Sep 19, 2017
```diff
@@ -228,6 +237,7 @@ def get_series_catalog_in_box(box, from_date, to_date, networkIDs):
to_date = to_date or DATE_MAX

result = make_request(client.service.GetSeriesCatalogForBox2,
```
Contributor commented:

Can you speculate on the size of these responses so we have an idea of the impact to the cache storage we have available?

@rajadain (Member, Author) commented on Sep 19, 2017:

```
vagrant@services:~$ redis-cli -n 1 GET ":1:bigcz_GetServicesInBox2_-536472219463548920" > GetServicesInBox2.txt
vagrant@services:~$ redis-cli -n 1 GET ":1:bigcz_GetSeriesCatalogForBox2_6025281192626754923" > GetSeriesCatalogForBox2.txt
vagrant@services:~$ du -sh *.txt
392K	GetSeriesCatalogForBox2.txt  # Cached for 5 minutes
144K	GetServicesInBox2.txt        # Cached for 1 week
```

This is for the Philadelphia HUC-12 for a search query of "water" with 214 results.

@rajadain (Member, Author) commented:

It should be noted that the original issue only asked to cache GetServicesInBox2. I'm also caching the other one to make interacting with it faster.

@mmcfarland (Contributor) left a comment:

I was skeptical of this approach since it requires the bbox (i.e., the AoI shape) to be the same in order to get the effect. However, tweaking the filters resulted in noticeably faster searches once the initial services were cached. It's too bad we can't cache the services based on a spatial index, which would allow us to cache things at a larger geographic level.

@mmcfarland removed their assignment on Sep 20, 2017
@rajadain (Member, Author) commented:

Thanks for taking a look!

@rajadain merged commit fe10d23 into develop on Sep 20, 2017
@rajadain deleted the tt/bigcz-cache-cuahsi-requests branch on September 20, 2017 19:41
@rajadain mentioned this pull request on Oct 16, 2017