New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
End to end exhibitor lockdown #5985
End to end exhibitor lockdown #5985
Conversation
This repo has @mesosphere-mergebot integration. You can perform the following commands by submitting a comment. Submit a comment with content "@mesosphere-mergebot help" to view more detailed help text and examples. Be sure the have a look at the mergebot documentation, too.
|
@mesosphere-mergebot bump-ee |
Enterprise Bump PR: mesosphere/dcos-enterprise/pull/6299 |
@mesosphere-mergebot bump-ee |
Your pull request's branch is not based on the most recent version of |
@mesosphere-mergebot bump-ee |
Your pull request's branch is not based on the most recent version of |
@mesosphere-mergebot bump-ee |
Your pull request's branch is not based on the most recent version of |
@mesosphere-mergebot bump-ee |
Your pull request's branch is not based on the most recent version of |
@mesosphere-mergebot bump-ee |
Your pull request's branch is not based on the most recent version of |
@mesosphere-mergebot bump-ee |
Your pull request's branch is not based on the most recent version of |
142ac50
to
c517762
Compare
@mesosphere-mergebot bump-ee |
Enterprise Bump mesosphere/dcos-enterprise/pull/6299 updated. |
c517762
to
24a9141
Compare
@mesosphere-mergebot bump-ee |
Enterprise Bump mesosphere/dcos-enterprise/pull/6299 updated. |
@mesosphere-mergebot bump-ee |
Enterprise Bump mesosphere/dcos-enterprise/pull/6299 updated. |
@mesosphere-mergebot bump-ee |
Your pull request's branch is not based on the most recent version of |
05577b2
to
c671585
Compare
@mesosphere-mergebot bump-ee |
Enterprise Bump mesosphere/dcos-enterprise/pull/6299 updated. |
@mesosphere-mergebot bump-ee |
echo | ||
fi | ||
|
||
exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this patch we had exit $?
(returning the exit status of docker run...
?). Now we have exit 0
. What's the rationale? Does not need to be answered before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the script is running with set -e
it only gets here if the exit code is 0
@@ -50,4 +55,16 @@ else | |||
trap 'docker kill {genconf_tar}' HUP QUIT INT TERM | |||
docker run --rm --name=$DCOS_INSTALLER_CONTAINER_NAME -i -p $PORT:9000 -v $(pwd)/genconf/:/genconf {docker_image_name} "$@" | |||
fi | |||
exit $? | |||
|
|||
if [ -d genconf/ca ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Would be good to have a code comment here in which case genconf/ca
is expected to be present, and in which case it's not expected to be present.
if exhibitor_bootstrap_ca_url == '': | ||
return | ||
|
||
assert exhibitor_bootstrap_ca_url[-1] != '/', "Must not end in a '/'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.endswith()
:D
@@ -1262,6 +1287,12 @@ def calculate_fault_domain_detect_contents(fault_domain_detect_filename): | |||
'superuser_service_account_uid': '', | |||
'superuser_service_account_public_key': '', | |||
'_superuser_service_account_public_key_json': calculate__superuser_service_account_public_key_json, | |||
# This is a private variable that is not supposed to be used by cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This" is easy to misunderstand. Instead of "This" let's write the name of the variable in the code comment. Can be done after merge.
'CA init in gen is only supported when using a remote' | ||
' bootstrap node', False), | ||
(lambda: 'exhibitor_bootstrap_ca_url' in config, | ||
'Custom CA url has been specified, not initializing CA', False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely, thanks for putting error messages in a central place.
# Exhibitor TLS is required, fail hard | ||
if final_arguments.get('exhibitor_tls_required') == 'true' and hard_failure: | ||
raise ExhibitorTLSBootstrapError(errors=reasons) | ||
print('[{}] not bootstrapping exhibitor CA: {}'.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We should capitalize Exhibitor everywhere in docs and in error messages and in log messages, etc. Can do after merge.
final_arguments['cluster_packages']) + '.tar.xz' | ||
package_path = Path('/artifacts/packages') / PACKAGE_NAME / package_filename | ||
|
||
ca_alternative_names = ['127.0.0.1', 'localhost', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should have a code comment for why each of the values is needed. Can do after merge.
ip = subprocess.check_output( | ||
['/opt/mesosphere/bin/detect_ip']).strip().decode('utf-8') | ||
except subprocess.CalledProcessError as e: | ||
print("check_output exited with {}".format(e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: will this really be a useful error message? Will it say something about the program that was executed? Can be tuned after merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I shamelessly copied this from somewhere else in the code base. I meant to reflector it :)
except socket.error as e: | ||
print( | ||
"inet_aton exited with {}. {} is not a valid IPv4 address".format(e, ip)) | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that validation that we need here? Isn't that done elsewhere?
|
||
def test_connection(ca_url): | ||
s = socket.socket() | ||
s.settimeout(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the connect() or the recv() timeout, or both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, does this really set the connect() timeout, which is what we seem to care about here to not hang forever...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're good:
The connect() operation is also subject to the timeout setting, and in general it is recommended to call settimeout() before calling connect()
https://docs.python.org/3/library/socket.html#timeouts-and-the-connect-method
print('could not connect to DC/OS bootstrap CA: {}'.format(e)) | ||
return False | ||
finally: | ||
s.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely.
print('connection to {}:{} successful'.format(host, port)) | ||
return True | ||
except Exception as e: | ||
print('could not connect to DC/OS bootstrap CA: {}'.format(e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decent error handling, thanks!
if not test_connection(ca_url): | ||
_fail_and_calculate_status('connection failed') | ||
|
||
print('Bootstrapping exhibitor TLS') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again: let's capitalize.... (can do after merge)
# Clean up any partially written artifacts | ||
_remove_artifact_dir() | ||
_fail_and_calculate_status( | ||
'error generating artifacts code: {} stdout: {} stderr: {}'.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After merge, let's polish:
Error generating Exhibitor TLS artifacts. Exit code: ...
if __name__ == '__main__': | ||
main() | ||
# Always flush stdout buffer when exiting the script | ||
sys.stdout.flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mhm if this has been a problem I recommend starting the Python interperter with the environment variable PYTHONIOBUFFERING set so that std stream buffering gets disabled completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely work, nice code base. Let's get this merged and iterate on it thereafter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks everyone. Let's merge and fix the nits later.
@mesosphere-mergebot changelog-not-required This is covered in the downstream changelog |
Applying the ship-it label manually and merging right away, this is highly unlikely to be in logical conflict with the changes to |
@mesosphere-mergebot label ship-it |
@mesosphere-mergebot merge-it |
Your pull request's branch is not based on the most recent version of |
I thought |
High-level description
Automated lock down of the exhibitor service via mutual TLS.
https://docs.mesosphere.com/1.13/security/ent/tls-ssl/exhibitor/
Corresponding DC/OS tickets (required)
Related tickets (optional)
Checklist for component/package updates:
If you are changing components or packages in DC/OS (e.g. you are bumping the sha or ref of anything underneath
packages
), then in addition to the above please also include: