Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pycloudlib configuration file (SC-519) #164

Merged
merged 2 commits into from
Dec 2, 2021
Merged

Conversation

TheRealFalcon
Copy link
Member

@TheRealFalcon TheRealFalcon commented Oct 22, 2021

Add a pycloudlib configuration file to obtain cloud-specific
credentials rather than embedding credential details in the API.

This is currently be backwards compatible, so credentials can still be
passed via API, and those passed via the API will take precedence
over credentials specified in the configuration file.

This allows us to pull individual cloud configuration out of the API. There's a few reasons I want to do this:

  1. There have often been "works on my machine" problems when it comes to credentials. One person might have a newer version of one of the cli tools that expects things in different locations. Moving credentials into a separate configuration file (ideally) means there's no more wondering what needs to be specified to launch an instance on particular cloud.
  2. Pycloudlib currently needs to be configured per project. E.g., UA tests are already work on certain jenkins machines, and re-setting things up so they work with cloud-init is a waste of time.
  3. It's easier to follow the liskov substitution principle. E.g., openstack currently requires a 'network' parameter that other clouds don't require. This means we can't make a generic get_cloud().launch() kind of API, because different clouds require different arguments based on their auth.
  4. I'd eventually like to build a cli (how I'm currently testing this branch) on top of pycloudlib, and this makes it a whole lot easier.

I haven't updated any tests or documentation as I would like to get buy-in on what's here first. I also realize that this config references other configs. Most of them can be pulled out into a single config, but we can iterate on that later, and I'd rather not make this PR even bigger.

@TheRealFalcon TheRealFalcon changed the title Add pycloudlib configuration file WIP: Add pycloudlib configuration file Oct 22, 2021
@TheRealFalcon TheRealFalcon changed the title WIP: Add pycloudlib configuration file WIP: Add pycloudlib configuration file (SC-519) Oct 22, 2021
Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested, but I think this makes sense.

Copy link
Contributor

@orndorffgrant orndorffgrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan of this idea and I'm a fan of the cli too!

Haven't actually tested yet, but looks good overall - just a couple questions/comments

@@ -0,0 +1,59 @@
############### pycloudlib.toml.template #####################################
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toml +1

Comment on lines 2 to 3
# Copy this file to /etc/pycloudlib.toml or ~/.config/pycloudlib.toml and
# fill in the values appropriately.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it also check in the current directory (wherever it was run) for a pycloudlib.toml?

That could make per-project configuration easier, but also brings accidental git commiting risk, as described below

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to be able to specify the config file path via a PYCLOUDLIB_CONFIG environment variable. This plays well in Jenkins, where a credentials file can be added to the Jenkins creds storage, and then Jenkins can create it in a random location when a job runs, exporting the path via a custom env var.

Comment on lines +8 to +12
# If a key is uncommented, it is required to launch an instance on that cloud.
# Commented keys aren't required, but allow further customization for
# settings in which the defaults don't work for you. If a key has a value,
# that represents the default for that cloud.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it.

@@ -25,10 +28,17 @@ def __init__(self, tag, timestamp_suffix=True):
timestamp_suffic: Append a timestamped suffix to the tag string.
"""
self._log = logging.getLogger(__name__)
self.config = parse_config()[self._type]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there also be a way to pass in a config as a parameter to this constructor?

Comment on lines +55 to +58
elif 'GOOGLE_APPLICATION_CREDENTIALS' in os.environ:
self.credentials_path = os.environ[
'GOOGLE_APPLICATION_CREDENTIALS']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought is that I'd expect the envvar to have higher precedence than the config file, but not a particularly strong opinion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally I'd also expect the env vars to have precedence over the config file.

@paride
Copy link
Collaborator

paride commented Oct 29, 2021

I think this is great! It will really make it easier to run tests in the clouds, both manually and in Jenkins. I left a couple of inline comments, but overall this LGTM. The branch needs fixing to make CI pass, but I didn't spot anything fundamental there.

@TheRealFalcon TheRealFalcon force-pushed the config-file branch 4 times, most recently from 1bb8de4 to 3909b6a Compare November 17, 2021 19:08
@TheRealFalcon TheRealFalcon removed the wip label Nov 18, 2021
@TheRealFalcon TheRealFalcon changed the title WIP: Add pycloudlib configuration file (SC-519) Add pycloudlib configuration file (SC-519) Nov 18, 2021
@TheRealFalcon
Copy link
Member Author

45 files changed!? Over half of those are docs changes and removal of a directory we don't need (make build from the docs directory if you don't believe me). I intend to keep that as a separate commit.

There were a lot of small required test changes, so a lot of files got touched, but the non-test code changes are almost identical to the first round of reviews. I tested using the config file on all of the pycloudlib supported clouds.

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good, and I think a strict approach to defining a single conf entrypoint makes the most sense and avoids some of pitfalls we've seen with trying to be "too flexible" in supporting each cloud's means of obtaining credentials. One conf to rule them all, and proper pycloudlib docs to allow pycloudlib users to query the cloud to get those details where possible.

Comment on lines 17 to 20
clientId = ""
clientSecret = ""
subscriptionId = ""
tenantId = ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that Azure naming is camelCase, but it feels a bit strange that we have our generlized snake_case params mixed with camelCase. Is this PR an opportunity to standardize on snake_case and just do the proper camelCase mapping internal to pycloudlib when invoking the SDK?

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have a potential of people providing unaltered or incorrect toml config with empty string as values. I think we probably want to handle "empty" values and ignore them properly on each cloud.

Ec2 currently fails in an unfriendly manner because pycloudlib.ec2.util _get_session passes the empty strings for secet_access_key, access_key_id and region into boto3.Session, which attempts to use the empty string instead of sourcing your local .aws/config files.

    self.client = session.client('ec2')
  File "/home/csmith/src/pycloudlib/.tox/pylint/lib/python3.8/site-packages/boto3/session.py", line 258, in client
    return self._session.create_client(
  File "/home/csmith/src/pycloudlib/.tox/pylint/lib/python3.8/site-packages/botocore/session.py", line 847, in create_client
    client = client_creator.create_client(
  File "/home/csmith/src/pycloudlib/.tox/pylint/lib/python3.8/site-packages/botocore/client.py", line 86, in create_client
    client_args = self._get_client_args(
  File "/home/csmith/src/pycloudlib/.tox/pylint/lib/python3.8/site-packages/botocore/client.py", line 355, in _get_client_args
    return args_creator.get_client_args(
  File "/home/csmith/src/pycloudlib/.tox/pylint/lib/python3.8/site-packages/botocore/args.py", line 99, in get_client_args
    endpoint = endpoint_creator.create_endpoint(
  File "/home/csmith/src/pycloudlib/.tox/pylint/lib/python3.8/site-packages/botocore/endpoint.py", line 287, in create_endpoint
    raise ValueError("Invalid endpoint: %s" % endpoint_url)
ValueError: Invalid endpoint: https://ec2..amazonaws.com
csmith@uptown:~/src/pycloudlib$ 

Should we treat toml empty string values as None in pycloudlib before invoking cloud-specific libraries? Or should we just ensure ec2, gce, openstack oci adopt the right fallback when provided with an empty string value.

For ec2 specifically, the following retains fallback behavior to config in .aws/config:

diff --git a/pycloudlib/ec2/util.py b/pycloudlib/ec2/util.py
index 4c1a9ca..81d3a4f 100644
--- a/pycloudlib/ec2/util.py
+++ b/pycloudlib/ec2/util.py
@@ -66,7 +66,7 @@ def _get_session(access_key_id, secret_access_key, region):
                     _decode_console_output_as_bytes)
     return boto3.Session(
         botocore_session=mysess,
-        aws_access_key_id=access_key_id,
-        aws_secret_access_key=secret_access_key,
-        region_name=region
+        aws_access_key_id=access_key_id or None,
+        aws_secret_access_key=secret_access_key or None,
+        region_name=region or None
     )

possible_configs.extend(CONFIG_PATHS)
for path in possible_configs:
try:
return toml.load(path, _dict=Config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we print or log the config file we successfully sourced?

@TheRealFalcon
Copy link
Member Author

Re: empty or bad value checks, I think I'd like to wait and iterate on that. Especially if we decide that configuration file will be the only valid way to pass credentials to pycloudlib, it will be a lot easier to do validation of required values.

Copy link
Collaborator

@paride paride left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally managed to try test more things and what I tried worked smoothly. There is probably room for improvement in how some errors/failures are presented, but that only happens if the conf file is broken. We can have more iterations on this if needed, no need to block landing. Overall this is a very big improvement on the usability of the library!

credentials rather than embedding credential details in the API.

This is currently backwards compatible, so credentials can still be
passed via API, and those passed via the API will take precedence
over credentials specified in the configuration file.

This allows us to pull individual cloud configuration out of the API.
There's a few reasons to do this:

* There have often been "works on my machine" problems when it comes
to credentials. One person might have a newer version of one of the
cli tools that expects things in different locations. Moving
credentials into a separate configuration file (ideally) means there's
no more wondering what needs to be specified to launch an instance on
particular cloud.
* Pycloudlib currently needs to be configured per project. E.g., UA
tests already works on certain jenkins machines, and re-setting
things up so they work with cloud-init is a waste of time.
* It's easier to follow the liskov substitution principle. E.g.,
openstack currently requires a 'network' parameter that other clouds
don't require. This means we can't make a generic get_cloud().launch()
kind of API, because different clouds require different arguments
based on their auth.
* If we eventually want a cli on top of pycloudlib, this makes it a
whole lot easier.
It gets regenerated every doc build
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants