-
Notifications
You must be signed in to change notification settings - Fork 9
Ignore SSL protocol violation during certificate replacement #171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2c7934d to
2c95f18
Compare
plugins/modules/certificate.py
Outdated
| ) | ||
| sleep(2) | ||
| continue | ||
| except errors.ScaleComputingError as ex: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add this to client.py in order to avoid string-to-string comparison?
If we catch and raise it in client we can just catch it again inside the module without the string-to-string comparison.
I think the exception is called SSLEOFError.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is called that way. But I have no way to test that code will still work after change. Replacing cert in loop, and in parallel accessing /rest/v1/ping shows ConnectionRefused and ConnectionReset, but I never got SSL error.
And this is problem - we refactor client.py to intercept/reraise one extra exception, and side effect might be this fix will no longer work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
- crafted SSL server, just to close TCP/SSL connection in middle.
- or tcpkill
- or - suggest something pls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should just remove the raise Scale error part of the client.
We've had this issue a couple of times now, where we don't exactly know which exception we are trying to handle.
And that part of the code does not seem to add much value ...
For the testing part, would it be too much effort on our part to build and kill an SSL connection?
Maybe we don't even need to setup a server and just close the SSL/VPN connection to the actual cluster as the data is being sent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just remove the raise Scale error
Maybe. We would need to review all other except.
to build and kill an SSL connection
Worth to try. But I expect a blind tcpkill will most likely break opened TCP connection before or after SSL handshake, before or after full record (record as in https://msatechnosoft.in/blog/secure-socket-layer-ssl-work-accomplish-task-ssl-services/). And that SSL will not complain - it already got all bytes of previous record, and is waiting on 1st byte of next record.
Test:
# server
# openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -nodes
openssl s_server -key key.pem -cert cert.pem -accept 44330 -www
# client
while [ 1 ]; do curl -k https://localhost:44330/ 1>out 2>/dev/null; echo RET=$?; done # silent
while [ 1 ]; do curl -k https://localhost:44330/ 1>out ; echo RET=$?; done # verbose
sudo tcpkill -i lo port 44330
I get error curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to localhost:44330 all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to reproduce that EOF occurred in violation of protocol (_ssl.c:997)>:
---
# ansi.yml
- name: test SSL EOF
hosts: all
connection: local
gather_facts: false
tasks:
- uri:
url: https://127.0.0.1:44331
validate_certs: false
# terminal 1
nc -l 127.0.0.1 44331
# terminal 2
ansible-playbook -i localhost, ansi.yml -v
# terminal 1
Ctrl+D or Ctrl+C
# terminal 2
TASK [uri] **********************************************************************************************************
fatal: [localhost]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": false, "elapsed": 2, "msg": "Status code was -1 and not [200]: Request failed: <urlopen error EOF occurred in violation of protocol (_ssl.c:997)>", "redirected": false, "status": -1, "url": "https://127.0.0.1:44331"}
No need to interrupt SSL handshake in middle. Just do not send any byte over wire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! 👍
2c95f18 to
ba2b111
Compare
Signed-off-by: Justin Cinkelj <justin.cinkelj@xlab.si>
Signed-off-by: Justin Cinkelj <justin.cinkelj@xlab.si>
It has MIT license, so we can include it into our repo (GPL) Signed-off-by: Justin Cinkelj <justin.cinkelj@xlab.si>
ba2b111 to
748195e
Compare
black and flake8 Signed-off-by: Justin Cinkelj <justin.cinkelj@xlab.si>
Signed-off-by: Justin Cinkelj <justin.cinkelj@xlab.si>
748195e to
d932557
Compare
Signed-off-by: Justin Cinkelj <justin.cinkelj@xlab.si>
sudo is used when testing locally Signed-off-by: Justin Cinkelj <justin.cinkelj@xlab.si>
Signed-off-by: Justin Cinkelj <justin.cinkelj@xlab.si>
19d8d03 to
5c27392
Compare
Signed-off-by: Justin Cinkelj <justin.cinkelj@xlab.si>
|
Refactored exception handling. We expect about 3 different ssl errors. Integ test passed in https://github.com/ScaleComputing/HyperCoreAnsibleCollection/actions/runs/4479886132/jobs/7874449568 |
domendobnikar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you. 👍
@ddemlow had problem replacing certificate, and this happened only it server was access via VPN. Likely VPN resulted in just correct timing to raise unexpected SSL error - in python code we got "EOF occurred in violation of protocol (_ssl.c:997)".
PR catches ScaleComputingError exception, and checks if message had this particular message inside.
I didn't try to refactor the ScaleComputingError to have a dedicated exception - I'm not able to reproduce the problem, so it is hard to test any code modification. For same reason, it is hard to add any integration test for this.
The extra
module.warnwere added, to help debugging if anything similar happens later. I hope warn message is kind enough ("... ignore and continue") - I do not want to make anyone nervous.