Skip to content

Refactor CLI client#77

Merged
kislyuk merged 15 commits intomasterfrom
akislyuk-client-refactor
Dec 4, 2017
Merged

Refactor CLI client#77
kislyuk merged 15 commits intomasterfrom
akislyuk-client-refactor

Conversation

@kislyuk
Copy link
Copy Markdown
Member

@kislyuk kislyuk commented Nov 20, 2017

Sorry about the giant diff.

Refactor the library.

  • Introduce a new class, hca.util.SwaggerClient, that manages a
    Swagger API connection client and dynamically generates the client
    bindings and CLI parser for any Swagger API given to it. The only
    user of this class is currently hca.dss.DSSClient.

  • The class takes only the Swagger definition URL as its
    configuration, and caches the API definition in the user config
    directory after the first run. This allows dynamic configuration of
    API endpoints and other API parameters. Any higher level commands
    that should be added on top of the API bindings can be defined as
    methods in hca.util.SwaggerClient subclasses, and added to the
    commands attribute.

  • In the new class, enable a new transport subroutine. It is a wrapper
    around Requests, as before, but now it allows explicit, idiomatic
    control over pagination, streaming, and authentication.

  • The new class enables better autodoc generation. Run make docs;
    open _build/html/index.html to see the updated autodocs.

  • Remove code generation machinery.

Connects to HumanCellAtlas/data-store#468

Connects to HumanCellAtlas/data-store#531

Fixes #36, #56, #60

@ghost ghost assigned kislyuk Nov 20, 2017
@ghost ghost added the code review label Nov 20, 2017
@codecov-io
Copy link
Copy Markdown

codecov-io commented Nov 20, 2017

Codecov Report

Merging #77 into master will increase coverage by 11.38%.
The diff coverage is 87.84%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #77       +/-   ##
===========================================
+ Coverage   76.46%   87.84%   +11.38%     
===========================================
  Files          37       27       -10     
  Lines        1266      897      -369     
===========================================
- Hits          968      788      -180     
+ Misses        298      109      -189
Impacted Files Coverage Δ
hca/upload/cli/forget_command.py 100% <ø> (ø) ⬆️
hca/upload/cli/upload_command.py 84.37% <ø> (ø) ⬆️
hca/upload/cli/list_area_command.py 100% <ø> (ø) ⬆️
hca/__init__.py 100% <100%> (ø) ⬆️
hca/util/_docs.py 100% <100%> (ø)
hca/upload/upload_config.py 94.28% <100%> (-0.16%) ⬇️
hca/upload/cli/list_areas_command.py 100% <100%> (ø) ⬆️
hca/util/compat.py 100% <100%> (ø)
hca/upload/cli/select_command.py 100% <100%> (ø) ⬆️
hca/version.py 100% <100%> (ø)
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3dbf5b7...feca6e4. Read the comment docs.

@kislyuk kislyuk force-pushed the akislyuk-client-refactor branch 7 times, most recently from 287dbd8 to c939f0b Compare November 20, 2017 16:16
Comment thread hca/dss/__init__.py
finally:
response.close()
session.stream = False
return {}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this return value?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's preferable if all commands return JSON objects on their standard output by default.

Comment thread hca/dss/__init__.py Outdated
logger.info("Uploading %i files from %s to %s", len(files_to_upload), src_dir, staging_bucket)
file_uuids, uploaded_keys = upload_to_cloud(files_to_upload, staging_bucket=staging_bucket, replica=replica,
from_cloud=False)
for file_handle in filter(lambda x: not isinstance(x, str), files_to_upload):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is files_to_upload ever a string?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to be that this could be an S3 URL prefix, but with this refactor it's no longer possible to pass that in. Thanks for the catch, I've removed the filter.

Comment thread hca/util/__init__.py
@@ -0,0 +1,332 @@
"""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module contains the core of the entire CLI, yet has barebones documentation. I don't think this is in a state where other people would be able to come in and be able to work on this codebase.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, sorry about that! I've added a docstring, please take a look.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added documentation is a great step, but mostly for someone who wants to understand what this module does. It's not really that useful for understanding how it works.

If I were to come in and try to work on this, I'd probably have to drop print(..) statements everywhere to understand that.

Copy link
Copy Markdown
Collaborator

@Bento007 Bento007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there testcases for search using the paginator?

Comment thread hca/dss/__init__.py
"""
Client for the Data Storage Service API.
"""
UPLOAD_BACKOFF_FACTOR = 1.618
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity where does this magic number come from?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the golden ratio :)

Copy link
Copy Markdown
Member Author

@kislyuk kislyuk Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really matter, the exponential backoff factor for HTTP request backoffs can be any number not too far from 2.

(Also, all of this code is a refactoring of the existing upload/download code, so I'm not the original author.)

@kislyuk kislyuk force-pushed the akislyuk-client-refactor branch 2 times, most recently from bf0da4f to 6f237b4 Compare November 29, 2017 21:27
@kislyuk kislyuk force-pushed the akislyuk-client-refactor branch from 6f237b4 to ada4ad9 Compare November 30, 2017 18:58
It was meant to filter out S3 URLs, but we no longer handle those here.
@kislyuk
Copy link
Copy Markdown
Member Author

kislyuk commented Nov 30, 2017

@Bento007, I have added a test case for the paginator.

Comment thread hca/__init__.py
return os.path.join(self._user_config_home, self._name)


_config = None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just:

_config = HCAConfig(__name__)

I don't think you get much from lazy-initializing it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it's beneficial to lazy-initialize it, to avoid import-time side effects that include reading a bunch of files and registering an atexit handler that will write a bunch of files.

Comment thread hca/util/exceptions.py
def __init__(self, *args, **kwargs):
super(SwaggerAPIException, self).__init__(*args, **kwargs)

self.code = self.response.status_code
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a standardized template for responses that comes as part of the json. https://github.com/HumanCellAtlas/data-store/blob/b9471df71eeba3074a43ca50caee5c0e144f0f15/dss-api.yml#L998

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added error response fields to exception.

Comment thread hca/cli.py Outdated
parser = HCAArgumentParser(description=__doc__, formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument("--version", action="version", version="%(prog)s {version}".format(version=__version__))
parser.add_argument("--log-level", default=get_config().get("log_level"),
help=str([logging.getLevelName(i) for i in range(0, 60, 10)]),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there really a need for NOTSET?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, removed.

Comment thread hca/util/__init__.py
containing the Swagger API definition that the client is providing an
interface for. On first use, this API definition will be downloaded
and saved into the user config directory (for example,
/Users/Alice/.config/hca) with a name determined by the base64
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an update strategy?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not yet an update strategy. I imagine an update strategy could involve re-downloading the Swagger API definition when the hca package version has changed with respect to what's recorded in the user-specific config file, or if the cached API definition file's mtime is older than a certain time cutoff. But I'd like to land that in a separate PR. Filed #79 to keep track of this.

Comment thread hca/util/__init__.py
@@ -0,0 +1,332 @@
"""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added documentation is a great step, but mostly for someone who wants to understand what this module does. It's not really that useful for understanding how it works.

If I were to come in and try to work on this, I'd probably have to drop print(..) statements everywhere to understand that.

Copy link
Copy Markdown
Member

@ttung ttung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems mostly fine, but see the comments. I'd consider improving the documentation of hca/util/__init__.py since it seems so central to everything.

@kislyuk kislyuk force-pushed the akislyuk-client-refactor branch from 8785095 to 44d08aa Compare December 4, 2017 19:23
@kislyuk kislyuk merged commit 22f9189 into master Dec 4, 2017
@ghost ghost removed the code review label Dec 4, 2017
@kislyuk kislyuk deleted the akislyuk-client-refactor branch December 4, 2017 19:56
list_area_parser = staging_subparsers.add_parser(
'list', description="List contents of currently selected upload area.")
'list', description=cls.__doc__, help=cls.__doc__)
list_area_parser.set_defaults(func=ListAreaCommand)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hca upload list is currently broken. It just prints help. Is that because this needs to be changed to parser.set_defaults(entry_point= like you have done for the other parsers?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed now. I forgot to update all the locations in the upload source tree to change func to entry_point. I'll add a CLI test to make sure this code path is asserted on.

@@ -4,12 +4,15 @@


class ListAreasCommand:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hca upload areas now prints some weird return value:

scripts/hca upload areas
aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee <- selected
deadbeef-dead-dead-dead-beeeeeeeeeef
0e2ea682-c9d0-49ad-8fa0-b9d8ab2f7418
340ff122-308c-47cd-afa2-f324ded875b6
"<hca.upload.cli.list_areas_command.ListAreasCommand instance at 0x10dcd9050>"

Copy link
Copy Markdown
Member Author

@kislyuk kislyuk Dec 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed now. In the new CLI entry point dispatch convention, the return value of the call to entry_point is printed on stdout if it's a string or bytes, or printed as JSON if it's a mapping. The upload commands are using a class constructor call (__new__/__init__), which returns an instance of the command class, which is not something that should be printed. I special-cased this in the entry point dispatcher to ignore the return value in this case. Alternatively, we could change the Command classes to move the command handler logic from __init__ to __call__, and change the entry points to be instances of the Command class instead of the classes themselves.

Comment thread hca/__init__.py

class HCAConfig(tweak.Config):
default_config_file = os.path.join(os.path.dirname(__file__), "default_config.json")
@property
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line above

Comment thread hca/__init__.py


_config = None
def get_config():
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 blank lines before method

Comment thread hca/cli.py
import os
from __future__ import absolute_import, division, print_function, unicode_literals

import os, sys, argparse, logging, json, datetime, traceback
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep getting dinged in code reviews for not splitting my imports onto individual lines.
Seems like you're going in the opposite direction.

kislyuk added a commit to HumanCellAtlas/data-store that referenced this pull request Dec 7, 2017
CLI updates in HumanCellAtlas/dcp-cli#77
and subsequent commits, hca PyPI release v3.0.5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants