Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

End to end exhibitor lockdown #5985

Merged

Conversation

jr0d
Copy link
Contributor

@jr0d jr0d commented Jul 29, 2019

High-level description

Automated lock down of the exhibitor service via mutual TLS.

https://docs.mesosphere.com/1.13/security/ent/tls-ssl/exhibitor/

  • Adds the dcoscertstrap package which provides secure PKI bootstrap capabilities to the exhibitor service.
  • Orchestration for PKI initialization has been added to gen
  • start_exhibitor.py has been modified to generate end entity private keys and submit CSRs to the transient bootstrap CA service.

Corresponding DC/OS tickets (required)

  • DCOS-56036 Update installer for automated exhibitor lock down
  • DCOS-56038 Add Exhibitor lock down automation to exhibitor systemd unit

Related tickets (optional)

  • DCOS-55992 Develop install time CA CLI and service

Checklist for component/package updates:

If you are changing components or packages in DC/OS (e.g. you are bumping the sha or ref of anything underneath packages), then in addition to the above please also include:

  • Change log from the last version integrated (this should be a link to commits for easy verification and review): example
  • Included a test which will fail if code is reverted but test is not. If there is no test please explain.
  • Test Results: [link to CI job test results for component]
  • Code Coverage (if available): [link to code coverage report]

@d2iq-mergebot
Copy link
Collaborator

This repo has @mesosphere-mergebot integration. You can perform the following commands by submitting a comment. Submit a comment with content "@mesosphere-mergebot help" to view more detailed help text and examples. Be sure the have a look at the mergebot documentation, too.

@mesosphere-mergebot label [Request For Comment|Holding|Work In Progress|Ship It|merge-by-mergebot|Ready For Review] 
@mesosphere-mergebot changelog-not-required reason 
@mesosphere-mergebot sync  
@mesosphere-mergebot bump-ee  
@mesosphere-mergebot request-review  
@mesosphere-mergebot override-status pr-status-check jira-url 
@mesosphere-mergebot merge-it  
  • PR creators can apply one of [Ready For Review|Work In Progress]. Owners can apply any label.

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump PR: mesosphere/dcos-enterprise/pull/6299

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

@jr0d jr0d force-pushed the jr/exhibitor-lockdown-automation-future branch from 142ac50 to c517762 Compare July 31, 2019 01:05
@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/6299 updated.

@jr0d jr0d force-pushed the jr/exhibitor-lockdown-automation-future branch from c517762 to 24a9141 Compare July 31, 2019 12:07
@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/6299 updated.

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/6299 updated.

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

@jr0d jr0d force-pushed the jr/exhibitor-lockdown-automation-future branch from 05577b2 to c671585 Compare August 2, 2019 02:14
@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/6299 updated.

@d2iq-mergebot
Copy link
Collaborator

@mesosphere-mergebot bump-ee

echo
fi

exit 0
Copy link
Contributor

@jgehrcke jgehrcke Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this patch we had exit $? (returning the exit status of docker run...?). Now we have exit 0. What's the rationale? Does not need to be answered before merging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the script is running with set -e it only gets here if the exit code is 0

@@ -50,4 +55,16 @@ else
trap 'docker kill {genconf_tar}' HUP QUIT INT TERM
docker run --rm --name=$DCOS_INSTALLER_CONTAINER_NAME -i -p $PORT:9000 -v $(pwd)/genconf/:/genconf {docker_image_name} "$@"
fi
exit $?

if [ -d genconf/ca ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Would be good to have a code comment here in which case genconf/ca is expected to be present, and in which case it's not expected to be present.

if exhibitor_bootstrap_ca_url == '':
return

assert exhibitor_bootstrap_ca_url[-1] != '/', "Must not end in a '/'"
Copy link
Contributor

@jgehrcke jgehrcke Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.endswith() :D

@@ -1262,6 +1287,12 @@ def calculate_fault_domain_detect_contents(fault_domain_detect_filename):
'superuser_service_account_uid': '',
'superuser_service_account_public_key': '',
'_superuser_service_account_public_key_json': calculate__superuser_service_account_public_key_json,
# This is a private variable that is not supposed to be used by cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This" is easy to misunderstand. Instead of "This" let's write the name of the variable in the code comment. Can be done after merge.

'CA init in gen is only supported when using a remote'
' bootstrap node', False),
(lambda: 'exhibitor_bootstrap_ca_url' in config,
'Custom CA url has been specified, not initializing CA', False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely, thanks for putting error messages in a central place.

# Exhibitor TLS is required, fail hard
if final_arguments.get('exhibitor_tls_required') == 'true' and hard_failure:
raise ExhibitorTLSBootstrapError(errors=reasons)
print('[{}] not bootstrapping exhibitor CA: {}'.format(
Copy link
Contributor

@jgehrcke jgehrcke Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We should capitalize Exhibitor everywhere in docs and in error messages and in log messages, etc. Can do after merge.

final_arguments['cluster_packages']) + '.tar.xz'
package_path = Path('/artifacts/packages') / PACKAGE_NAME / package_filename

ca_alternative_names = ['127.0.0.1', 'localhost',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should have a code comment for why each of the values is needed. Can do after merge.

ip = subprocess.check_output(
['/opt/mesosphere/bin/detect_ip']).strip().decode('utf-8')
except subprocess.CalledProcessError as e:
print("check_output exited with {}".format(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: will this really be a useful error message? Will it say something about the program that was executed? Can be tuned after merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shamelessly copied this from somewhere else in the code base. I meant to reflector it :)

except socket.error as e:
print(
"inet_aton exited with {}. {} is not a valid IPv4 address".format(e, ip))
sys.exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that validation that we need here? Isn't that done elsewhere?


def test_connection(ca_url):
s = socket.socket()
s.settimeout(5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the connect() or the recv() timeout, or both?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, does this really set the connect() timeout, which is what we seem to care about here to not hang forever...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're good:

The connect() operation is also subject to the timeout setting, and in general it is recommended to call settimeout() before calling connect()

https://docs.python.org/3/library/socket.html#timeouts-and-the-connect-method

print('could not connect to DC/OS bootstrap CA: {}'.format(e))
return False
finally:
s.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely.

print('connection to {}:{} successful'.format(host, port))
return True
except Exception as e:
print('could not connect to DC/OS bootstrap CA: {}'.format(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decent error handling, thanks!

if not test_connection(ca_url):
_fail_and_calculate_status('connection failed')

print('Bootstrapping exhibitor TLS')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again: let's capitalize.... (can do after merge)

# Clean up any partially written artifacts
_remove_artifact_dir()
_fail_and_calculate_status(
'error generating artifacts code: {} stdout: {} stderr: {}'.format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After merge, let's polish:

Error generating Exhibitor TLS artifacts. Exit code: ...

if __name__ == '__main__':
main()
# Always flush stdout buffer when exiting the script
sys.stdout.flush()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mhm if this has been a problem I recommend starting the Python interperter with the environment variable PYTHONIOBUFFERING set so that std stream buffering gets disabled completely.

Copy link
Contributor

@jgehrcke jgehrcke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely work, nice code base. Let's get this merged and iterate on it thereafter.

Copy link
Contributor

@jongiddy jongiddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks everyone. Let's merge and fix the nits later.

@jgehrcke
Copy link
Contributor

@mesosphere-mergebot changelog-not-required This is covered in the downstream changelog

@jgehrcke
Copy link
Contributor

Applying the ship-it label manually and merging right away, this is highly unlikely to be in logical conflict with the changes to master from the last two days.

@jgehrcke
Copy link
Contributor

@mesosphere-mergebot label ship-it

@jgehrcke
Copy link
Contributor

@mesosphere-mergebot merge-it

@d2iq-mergebot
Copy link
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

@jgehrcke
Copy link
Contributor

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

I thought merge-it allows humans to be smarter than machines. Okay, I am not pulling off the completely manual merge now, so let's wait for a train to kick in.

d2iq-mergebot added a commit to mesosphere/dcos that referenced this pull request Aug 15, 2019
d2iq-mergebot added a commit to mesosphere/dcos that referenced this pull request Aug 15, 2019
d2iq-mergebot added a commit to mesosphere/dcos that referenced this pull request Aug 15, 2019
@d2iq-mergebot d2iq-mergebot merged commit 87aeb7e into dcos:master Aug 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants