Skip to content

Refetch DSS API definition weekly (#79)#139

Merged
calvinnhieu merged 3 commits intomasterfrom
calvinnhieu-refetch-API-defn
Jul 23, 2018
Merged

Refetch DSS API definition weekly (#79)#139
calvinnhieu merged 3 commits intomasterfrom
calvinnhieu-refetch-API-defn

Conversation

@calvinnhieu
Copy link
Copy Markdown

@calvinnhieu calvinnhieu commented Jul 20, 2018

Currently, the DCP CLI fetches the DSS API definition once the first time it is run after installation and never updates after that. This change implements a strategy to refetch the DSS API definition after 7 days since the last update determined by the cached API definition's (~/.config/hca/<base-64-encoded-swagger-url>.json) last modified date. The refetch frequency is contestable if you guys have other ideas.

One minor improvement we are considering is to enable content encoding on the DSS API to enable serving a compressed version of the API definition in order to minimize HCA CLI start up times. Granted, with an update policy of 1 week, this fetch would occur relatively infrequently and may go unnoticed. Further, the performance tradeoff between reducing payload size and adding a decompression step may not be significant. This change would require a small API Gateway configuration change, be applied to the entire DSS API and the compression of response payloads would be selective and invoked only by including an appropriate Accept-Enconding header on the API request.

For more details:
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-gzip-compression-decompression.html

Fixes #79

@calvinnhieu calvinnhieu requested review from kislyuk and mweiden July 20, 2018 19:18
@calvinnhieu
Copy link
Copy Markdown
Author

calvinnhieu commented Jul 20, 2018

Build errors re: test_get_retry (test.test_dss_api_retry.TestDssApiRetry) are occurring across other unrelated, branches and is unreproducible locally.

https://travis-ci.org/HumanCellAtlas/dcp-cli/builds

@calvinnhieu calvinnhieu force-pushed the calvinnhieu-refetch-API-defn branch from fd3706c to 73a15e6 Compare July 20, 2018 20:33
Copy link
Copy Markdown
Member

@kislyuk kislyuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but:

  • sorry the unrelated test broke, but we need to fix them. Can you take a look?
  • It would be nice to double check that this works with unreadable or read-only home directory locations. No need to write a test for it but just check manually.

Comment thread hca/util/__init__.py Outdated
swagger_filename = os.path.join(self.config.user_config_dir, swagger_filename)
if not os.path.exists(swagger_filename):
is_cached = os.path.exists(swagger_filename)
if not is_cached or (is_cached and
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear here what the priority is. Please parenthesize (not is_cached).

Comment thread test/test_dss_swagger_spec.py Outdated
self.dummy_response.status_code = 200

def test_get_swagger_spec_new(self):
with mock.patch('os.path.exists') as mock_exists:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this form might be more readable:

with x as y, \
    a as b, \
    c as d:
    ...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're already overriding the location of the filename, maybe you don't need to mock things like os.makedirs?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using os.makedirs mock to verify function calls. Mocks for os.path.exists and load_swagger_json could be replaced by creating temp dirs/files, however, I would opt in favor of using mocks so as to avoid littering temp dirs/files. Plus, your suggestion for readability helps a lot.

@kislyuk kislyuk added BUG and removed BUG labels Jul 20, 2018
@calvinnhieu calvinnhieu force-pushed the calvinnhieu-refetch-API-defn branch from 73a15e6 to 801efd9 Compare July 23, 2018 15:32
@calvinnhieu
Copy link
Copy Markdown
Author

calvinnhieu commented Jul 23, 2018

Verified os.path.getmtime works for both read-only (444) and unreadable (000) files.

Addressing broken test

@calvinnhieu calvinnhieu force-pushed the calvinnhieu-refetch-API-defn branch from b2553b9 to 946f34b Compare July 23, 2018 16:16
@codecov-io
Copy link
Copy Markdown

codecov-io commented Jul 23, 2018

Codecov Report

Merging #139 into master will decrease coverage by 0.02%.
The diff coverage is 62.5%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #139      +/-   ##
==========================================
- Coverage   61.37%   61.35%   -0.03%     
==========================================
  Files          31       31              
  Lines        1178     1185       +7     
==========================================
+ Hits          723      727       +4     
- Misses        455      458       +3
Impacted Files Coverage Δ
hca/util/__init__.py 91.81% <62.5%> (-0.89%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6ac14e6...e85f706. Read the comment docs.

@calvinnhieu calvinnhieu force-pushed the calvinnhieu-refetch-API-defn branch from 946f34b to e85f706 Compare July 23, 2018 17:18
@calvinnhieu calvinnhieu merged commit 1f6ae3f into master Jul 23, 2018
@ttung
Copy link
Copy Markdown
Member

ttung commented Jul 23, 2018

Can this be done with a conditional GET (if-none-match)?

I would also consider adding a cmd-line option to force a refresh of the spec.

You could make the refresh even less obtrusive by daemonizing the process, and then refresh the spec after you return control to the shell. This may not work for Windows, though.

Last, the cache for the swagger spec is not race-condition safe. Either we should save it with some checksum, or we should save it to a tempfile, and then os.rename it to its final location (os.rename is atomic while file writes are not).

@calvinnhieu
Copy link
Copy Markdown
Author

calvinnhieu commented Jul 23, 2018

A conditional GET would work, however, since (from my understanding) the CLI is stateless between commands, it would invoke an HTTP request (to compare ETags) for each command which I imagine is more overhead than checking a file's mtime (as these changes propose).

Daemonizing could resolve this performance issue and would be a requirement to using a conditional GET. Will briefly look into running daemons on Windows. Another point to consider re: asynchronously updating the cache, with the way the update policy is written now (update cache on swagger_spec access), is that the first command run in a batch of commands will most likely run with a stale version of the spec. To avoid this, we would have to update the cache synchronously, incurring the performance hit the daemon was introduced to avoid. @kislyuk thoughts?

Re: CLI command to force refresh spec + use race-condition safe operations, sounds great! I'll add those today.

@ttung
Copy link
Copy Markdown
Member

ttung commented Jul 23, 2018

A conditional GET would work, however, since (from my understanding) the CLI is stateless between commands, it would invoke an HTTP request (to compare ETags) for each command which I imagine is more overhead than checking a file's mtime (as these changes propose).

If you save the ETag of the cached entry, you can present it to the server. It's probably a relatively minor optimization, but if more tooling is built like this, it would be a good example to set.

Daemonizing

I suspect UX improvement is probably not worth the complexity, but just throwing it out there. :)

@calvinnhieu
Copy link
Copy Markdown
Author

Yeah totally, appreciate the insight @ttung. What do you mean by "present it to the server"? Wouldn't that still require an HTTP request?

@ttung
Copy link
Copy Markdown
Member

ttung commented Jul 23, 2018

What do you mean by "present it to the server"? Wouldn't that still require an HTTP request?

yes, but we would get a smaller response if there's no change.

@calvinnhieu calvinnhieu deleted the calvinnhieu-refetch-API-defn branch August 27, 2018 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants