
Broken micro services after deployment in K8 #11378

Closed
todor-ivanov opened this issue Nov 29, 2022 · 11 comments · Fixed by dmwm/deployment#1222
@todor-ivanov
Contributor

Impact of the bug
All micro services

Describe the bug
During the validation process for HG2212b with WMCore v2.1.5rc2 and MicroServices v1.1.5.rc2, we found that all the microservices fail to initialize with the following error [1]. It could be due to the recent splitting of the various microservices into separate packages... but this is just speculation at this stage.

How to reproduce it
Just deploy the above tag and try to run the microservices in Kubernetes.

Expected behavior
A clear and concise description of what you expected to happen.

Additional context and error message
[1]

[29/Nov/2022:18:26:08]  MicroService REST configuration subset:
data.manager = 'WMCore.MicroService.MSManager.MSManager'
data.rucioAuthUrl = 'https://cms-rucio-auth.cern.ch'
data.reqmgr2Url = 'https://cmsweb-testbed.cern.ch/reqmgr2'
data.services = ['monitor']
data.object = 'WMCore.MicroService.Service.RestApiHub.RestApiHub'
data.interval = 600
data.rucioAccount = 'wmcore_transferor'
data.rucioUrl = 'http://cms-rucio.cern.ch'
data.couch_wmstats_db = 'wmstats'
data.couch_host = 'https://cmsweb-testbed.cern.ch/couchdb'
data.enableStatusTransition = True
data.verbose = True

ERROR initializing MicroService REST module.
Traceback (most recent call last):
  File "/data/srv/HG2212b/sw/slc7_amd64_gcc630/cms/reqmgr2ms/1.1.5.rc2/lib/python3.8/site-packages/WMCore/MicroService/Service/Data.py", line 57, in __init__
    module = importlib.import_module('.'.join(arr[:-1]))
  File "/data/srv/HG2212b/sw/slc7_amd64_gcc630/external/python3/3.8.2-comp/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'WMCore.MicroService.MSManager'
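
For context, the failure comes from the dynamic import of the class path configured in data.manager (Data.py line 57 in the traceback). Below is a minimal sketch of that mechanism, illustrative only (the variable names are mine, not the actual Data.py code):

import importlib

manager = 'WMCore.MicroService.MSManager.MSManager'  # value seen in the configuration above
arr = manager.split('.')
# import the module part ('WMCore.MicroService.MSManager') and look up the class by name;
# with the stale path above the import raises the ModuleNotFoundError from the traceback
module = importlib.import_module('.'.join(arr[:-1]))
cls = getattr(module, arr[-1])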

@amaltaro
Contributor

@todor-ivanov please make sure to highlight that this affects only the recently deployed services in TESTBED.
Given that it's testbed, I don't think it deserves the "Highest priority" label. But yes, we need to fix it as soon as possible (<24h).

You might want to delete the text provided by the GH template as well, even if you do not have anything else to replace it with (e.g. "Expected behavior").

@amaltaro
Contributor

I don't know which microservice the log snippet above (in your original post) belongs to, but here is the problem:
https://gitlab.cern.ch/cmsweb-k8s/services_config/-/blob/preprod/reqmgr2ms-transferor/config-transferor.py#L67

In plain English: for some reason, the microservices are not using the correct configuration file (which is supposed to have some recent changes made by Erik). Configuration files seem to be okay in the preprod branch...

@todor-ivanov
Contributor Author

Hi @amaltaro, the "Highest priority" label was a minor blunder; I fixed it about a minute after creating the ticket.

but here is the problem:

Thanks for noticing that. To me this line looks good in both cases, prod and preprod (at least from what I see in the services_config repository).

In both cases the correct MSManager (from its new location) has been exported:

data.manager = 'WMCore.MicroService.MSCore.MSManager.MSManager'

But indeed, what is inside the pod at K8 is something completely different:

data.manager = 'WMCore.MicroService.MSManager.MSManager'
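
A quick way to double-check which of the two module locations actually exists in the pod's installed WMCore release; this is only a sketch, assuming a Python 3 shell inside the pod:

import importlib

# new location (as in services_config) vs. old location (as seen in the pod's config)
for path in ('WMCore.MicroService.MSCore.MSManager',
             'WMCore.MicroService.MSManager'):
    try:
        importlib.import_module(path)
        print(path, '-> importable')
    except ModuleNotFoundError:
        print(path, '-> not found')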

@muhammadimranfarooqi can you please take a quick look at why the microservices are configured with old configuration files, and also merge the rest of the already prepared MRs for this deployment? I have pasted a short list in the GitLab request. Thank you in advance!

@muhammadimranfarooqi

Hi @amaltaro @todor-ivanov

I tried to redeploy those services' configuration, and all microservices still have errors.

@muhammadimranfarooqi

The following error is common to all service logs:

    raise RuntimeError(
RuntimeError: You are linking against OpenSSL 1.0.2, which is no longer supported by the OpenSSL project. To use this version of cryptography you need to upgrade to a newer version of OpenSSL. For this version only you can also set the environment variable CRYPTOGRAPHY_ALLOW_OPENSSL_102 to allow OpenSSL 1.0.2.

@todor-ivanov
Contributor Author

todor-ivanov commented Nov 30, 2022

Hi @muhammadimranfarooqi @amaltaro ,
I think I have found the problem, but the solution I can think of is not the best one.

Even though I cannot reproduce it in my VM, I am now pretty sure this [1] upgrade of the rucio-clients RPM package also brings in a hard requirement on the obsolete OpenSSL library 1.0.2. The chain is the following: rucio-clients v1.29.10 depends on cryptography v3.2.1, and cryptography uses OpenSSL as a backend (which is also confirmed by the log messages a few lines above, where the backend library is loaded: [2]).

And here is what they say about using this OpenSSL version as a backend: [3].

As for why this fails with the RPM-based installation but not with the PyPI-based deployments: I think it is because in the RPM case we link against the OS OpenSSL, and since that comes with the base image's OS version, cc7 [4], we always end up with the obsolete version. Checking another K8 pod with a container that is not breaking (because it does not explicitly import rucio-clients), I can confirm the library version in question is the one in use:

[_reqmgr2@reqmgr2-85568b67fd-9r7p9 data]$ openssl version 
OpenSSL 1.0.2k-fips  26 Jan 2017
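
For completeness, here is a minimal Python check (a sketch, assuming a Python 3 shell inside the affected container) of the OpenSSL version the interpreter is linked against, and of the cryptography refusal shown in [2]:

import ssl

# the interpreter's own OpenSSL; on the cc7 base image this should report 1.0.2k-fips
print(ssl.OPENSSL_VERSION)

try:
    # the version check from [2] runs at import time of the OpenSSL bindings
    import cryptography.hazmat.bindings.openssl.binding  # noqa: F401
    print('cryptography bindings loaded fine')
except RuntimeError as exc:
    print('cryptography refused the linked OpenSSL:', exc)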

So the options to proceed here would be:

  • As a temporary workaround, set the proper environment variable through run.sh as follows:
export CRYPTOGRAPHY_ALLOW_OPENSSL_102=1

@muhammadimranfarooqi @arooshap, can you please give it a try manually in one of the containers? (A quick Python-shell check of this workaround is sketched right after this list.)

  • The other option would be to downgrade the rucio-clients RPM package and all its dependencies, then rebuild && redeploy.
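
In case it helps with that manual check, a minimal sketch (again assuming a Python 3 shell inside one of the affected containers) of verifying that the environment variable makes the cryptography import succeed despite OpenSSL 1.0.2:

import os

# must be set before cryptography's OpenSSL bindings are imported for the first time
os.environ['CRYPTOGRAPHY_ALLOW_OPENSSL_102'] = '1'

import cryptography.hazmat.bindings.openssl.binding  # noqa: F401
print('cryptography bindings loaded with OpenSSL 1.0.2 allowed')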

[1]
cms-sw/cmsdist@bc19b20

[2]

  File "/data/srv/HG2212b/sw/slc7_amd64_gcc630/external/py3-cryptography/3.2.1-comp2/lib/python3.8/site-packages/cryptography/hazmat/backends/openssl/__init__.py", line 7, in <module>
    from cryptography.hazmat.backends.openssl.backend import backend
  File "/data/srv/HG2212b/sw/slc7_amd64_gcc630/external/py3-cryptography/3.2.1-comp2/lib/python3.8/site-packages/cryptography/hazmat/backends/openssl/backend.py", line 117, in <module>
    from cryptography.hazmat.bindings.openssl import binding
  File "/data/srv/HG2212b/sw/slc7_amd64_gcc630/external/py3-cryptography/3.2.1-comp2/lib/python3.8/site-packages/cryptography/hazmat/bindings/openssl/binding.py", line 222, in <module>
    _verify_openssl_version(Binding.lib)
  File "/data/srv/HG2212b/sw/slc7_amd64_gcc630/external/py3-cryptography/3.2.1-comp2/lib/python3.8/site-packages/cryptography/hazmat/bindings/openssl/binding.py", line 182, in _verify_openssl_version
    raise RuntimeError(
RuntimeError: You are linking against OpenSSL 1.0.2, which is no longer supported by the OpenSSL project. To use this version of cryptography you need to upgrade to a newer version of OpenSSL. For this version only you can also set the environment variable CRYPTOGRAPHY_ALLOW_OPENSSL_102 to allow OpenSSL 1.0.2.

[3]
https://cryptography.io/en/3.2/faq/#importing-cryptography-causes-a-runtimeerror-about-openssl-1-0-2

[4]
https://github.com/dmwm/CMSKubernetes/blob/05922fd0b039d505f70d133f6eae0e20fb6fe651/docker/cmsweb/Dockerfile#L5

@todor-ivanov
Contributor Author

Hi @belforte @mapellidario ,

Tagging you here: even though the issues discussed are actually a longer chain, the bit you might be interested in or affected by is the obsolete OpenSSL version, explained in my previous comment here [1]. This is supposed to be a known issue, but we just stumbled on a hard dependency we did not expect. Hope this helps you avoid it. If you are using a newer OpenSSL version or a Docker image with a newer OS then you shouldn't be affected, but just in case...

[1]
#11378 (comment)

@mapellidario
Member

Thanks @todor-ivanov, in TW/Publisher we noticed this around one week ago, "discussed" it here dmwm/CRABServer#7475, implemented a temporary workaround, and forgot about it :)

In the crabserver rest we noticed it when we migrated to py3 and put a workaround here

So, thanks for the heads up; we had simply thought it was only our problem :)

@todor-ivanov
Contributor Author

👍

@todor-ivanov
Contributor Author

And we just noticed GlobalWorkQueue is suffering from the same issue as well.
Here [1] is the fix for it.

[1]
dmwm/deployment#1223

@todor-ivanov
Contributor Author

todor-ivanov commented Dec 7, 2022

This one obviously did not get closed automatically when the relevant PRs were merged in the deployment repository [1]. I am closing it with the current comment.

[1]
dmwm/deployment#1222
dmwm/deployment#1223
