19.0.0, jobs fail with: Socket path does not exist: /var/run/receptor/receptor.sock #9967

Closed
gmtime opened this issue Apr 21, 2021 · 9 comments

@gmtime

gmtime commented Apr 21, 2021

ISSUE TYPE

  • Bug Report

SUMMARY

After upgrading to 19.0.0 jobs consistently fail with "Socket path does not exist: /var/run/receptor/receptor.sock"

ENVIRONMENT

  • AWX version: 19.0.0
  • AWX install method: docker on linux
  • Operating System: Any
  • Web Browser: Any

STEPS TO REPRODUCE

  1. Go to Resources->Projects
  2. Go to Demo Project
  3. Click on Sync

EXPECTED RESULTS

Job should execute and Demo Project should sync correctly

ACTUAL RESULTS

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 1397, in run
    res = receptor_job.run()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 2957, in run
    return self._run_internal(receptor_ctl)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 2973, in _run_internal
    result = receptor_ctl.submit_work(worktype=self.work_type, payload=sockout.makefile('rb'), params=self.receptor_params)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/receptorctl/socket_interface.py", line 144, in submit_work
    self.connect()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/receptorctl/socket_interface.py", line 86, in connect
    raise ValueError(f"Socket path does not exist: {path}")
ValueError: Socket path does not exist: /var/run/receptor/receptor.sock
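
For anyone trying to narrow this down, here is a minimal diagnostic sketch that can be run inside the task container. It is not part of AWX or receptorctl; the helper name `describe_socket_path` is hypothetical, and it only distinguishes a missing parent directory, a missing socket file, and a path that exists but is not a socket.

```python
import os
import stat


def describe_socket_path(path):
    """Return a short diagnosis for a Unix socket path (hypothetical helper)."""
    parent = os.path.dirname(path)
    # The directory has to exist before any process can create the socket in it.
    if not os.path.isdir(parent):
        return f"parent directory missing: {parent}"
    # This is the condition behind the "Socket path does not exist" message above.
    if not os.path.exists(path):
        return f"socket file missing: {path}"
    # A regular file at that path would also be unusable.
    if not stat.S_ISSOCK(os.stat(path).st_mode):
        return f"not a socket: {path}"
    return "ok"


if __name__ == "__main__":
    # Path taken from the error message in this report.
    print(describe_socket_path("/var/run/receptor/receptor.sock"))
```

A "parent directory missing" result would suggest the directory was never created or mounted, while "socket file missing" would suggest the receptor process is not running or is listening somewhere else.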

ADDITIONAL INFORMATION
@shanemcd
Member

Which instructions did you follow to deploy AWX?

@oyoyo14

oyoyo14 commented Apr 23, 2021

Hi there,

I think I have the same issue.

SUMMARY

Error with awx 19.0.0, when changing the tower_ee_images.

ENVIRONMENT

  • AWX version: 19.0.0
  • Installation method: awx-operator with custom awx-ee (quay.io/ansible/awx-ee:0.2.0)
  • Operating System: N/A
  • Web browser: N/A

STEPS TO REPRODUCE

  • Install AWX with the awx-operator, with the following values:
 tower_ee_images:
  - image: quay.io/ansible/awx-ee:0.2.0
    name: AWX EE 0.2.0
  • Check that synchronizing the Demo Project fails
  • Re-install without the tower_ee_images variable
  • Check that synchronizing the Demo Project succeeds

EXPECTED RESULTS

tower_ee_images can be used to configure a custom awx-ee image.

ADDITIONAL INFORMATION

When synchronizing the Demo Project, the job fails with this message:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 1397, in run
    res = receptor_job.run()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 2957, in run
    return self._run_internal(receptor_ctl)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 3008, in _run_internal
    raise RuntimeError(detail)
RuntimeError: exit status 0

In the awx-ee container, we get the following errors:

ERROR 2021/04/23 08:28:14 Read error in control service: read unix /var/run/receptor/receptor.sock->@: use of closed network connection
INFO 2021/04/23 08:28:14 Client disconnected from control service
ERROR 2021/04/23 08:28:14 Error closing connection: close unix /var/run/receptor/receptor.sock->@: use of closed network connection

AWX is installed with the awx-operator.
manifest file:

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  tower_admin_user: XXXXX
  tower_admin_password_secret: XXXXX
  tower_postgres_configuration_secret: awx-postgres-configuration
  tower_ee_images:
  - image: quay.io/ansible/awx-ee:0.2.0
    name: AWX EE 0.2.0
  tower_hostname: awx.example.com
  tower_ingress_type: Ingress

But when the installation is done without tower_ee_images (so the awx-ee image is quay.io/ansible/awx-ee:0.1.1), the Demo Project synchronizes correctly.

Hope it can help,
thanks

@oyoyo14

oyoyo14 commented Apr 23, 2021

After further tests, the error log does not seem to be part of the issue: I also get it with awx-ee:0.1.1, where jobs work.

@ilijamt
Contributor

ilijamt commented Apr 24, 2021

I have the same issue after adding:

  tower_ee_images:
    - name: AWX EE 0.2.0
      image: quay.io/ansible/awx-ee:0.2.0

After that, nothing in the system works anymore. In the UI I can see both EEs.

The awx-task logs then show errors like this:

Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/__init__.py", line 147, in manage
    prepare_env()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/__init__.py", line 102, in prepare_env
    if not settings.DEBUG:  # pragma: no cover
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/conf/__init__.py", line 79, in __getattr__
    self._setup(name)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/conf/__init__.py", line 66, in _setup
    self._wrapped = Settings(settings_module)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/conf/__init__.py", line 157, in __init__
    mod = importlib.import_module(self.SETTINGS_MODULE)
  File "/usr/lib64/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/settings/production.py", line 65, in <module>
    include(settings_file, optional(settings_files), scope=locals())
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/split_settings/tools.py", line 104, in include
    compiled_code = compile(  # noqa: WPS421
  File "/etc/tower/conf.d/execution_environments.py", line 3
    {'name': 'AWX EE 0.2.0' , 'image': 'quay.io/ansible/awx-ee:0.2.0'}
    ^
SyntaxError: invalid syntax

In awx-web

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/bin/daphne", line 8, in <module>
    sys.exit(CommandLineInterface.entrypoint())
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/daphne/cli.py", line 191, in entrypoint
    cls().run(sys.argv[1:])
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/daphne/cli.py", line 252, in run
    application = import_by_path(args.application)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/daphne/utils.py", line 12, in import_by_path
    target = importlib.import_module(module_path)
  File "/usr/lib64/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/asgi.py", line 12, in <module>
    prepare_env()  # NOQA
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/__init__.py", line 102, in prepare_env
    if not settings.DEBUG:  # pragma: no cover
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/conf/__init__.py", line 79, in __getattr__
    self._setup(name)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/conf/__init__.py", line 66, in _setup
    self._wrapped = Settings(settings_module)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/conf/__init__.py", line 157, in __init__
    mod = importlib.import_module(self.SETTINGS_MODULE)
  File "/usr/lib64/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/settings/production.py", line 65, in <module>
    include(settings_file, optional(settings_files), scope=locals())
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/split_settings/tools.py", line 104, in include
    compiled_code = compile(  # noqa: WPS421
  File "/etc/tower/conf.d/execution_environments.py", line 3
    {'name': 'AWX EE 0.2.0' , 'image': 'quay.io/ansible/awx-ee:0.2.0'}
    ^

This is what I found in /etc/tower/conf.d/execution_environments.py:

DEFAULT_EXECUTION_ENVIRONMENTS = [
    {'name': 'AWX EE 0.1.1' , 'image': 'quay.io/ansible/awx-ee:0.1.1'}
    {'name': 'AWX EE 0.2.0' , 'image': 'quay.io/ansible/awx-ee:0.2.0'}
]

The issue is that the execution_environments.py file is generated incorrectly: it is missing a comma between the list entries.
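
To show why the missing comma breaks everything, here is a small standalone sketch that compiles the generated file content with Python's built-in `compile()`. The helper name `settings_compile_error` is hypothetical, and the two strings simply copy the file above with and without commas.

```python
def settings_compile_error(source, filename="execution_environments.py"):
    """Return the SyntaxError raised when compiling source, or None if it compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc
    return None


# The file content as found in the container: no comma between the two dicts.
BROKEN = """DEFAULT_EXECUTION_ENVIRONMENTS = [
    {'name': 'AWX EE 0.1.1' , 'image': 'quay.io/ansible/awx-ee:0.1.1'}
    {'name': 'AWX EE 0.2.0' , 'image': 'quay.io/ansible/awx-ee:0.2.0'}
]
"""

# The same content with a comma after each entry (a trailing comma is valid Python).
FIXED = """DEFAULT_EXECUTION_ENVIRONMENTS = [
    {'name': 'AWX EE 0.1.1' , 'image': 'quay.io/ansible/awx-ee:0.1.1'},
    {'name': 'AWX EE 0.2.0' , 'image': 'quay.io/ansible/awx-ee:0.2.0'},
]
"""

if __name__ == "__main__":
    print("broken:", settings_compile_error(BROKEN))
    print("fixed:", settings_compile_error(FIXED))
```

Because the settings module is imported by both awx-task and awx-web at startup, a single syntax error in this file stops both services, which matches the two tracebacks above.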

If you have SSH access to the nodes, find the node the awx-operator is running on; you can temporarily fix it with:

docker exec -u 0 -it k8s_awx-operator_awx-operator-f768499d-s97hr_default_c893fdaa-b40c-4600-924a-01e6c37996cb_0 /bin/bash
cat <<EOF > /opt/ansible/roles/installer/templates/execution_environments.py.j2
DEFAULT_EXECUTION_ENVIRONMENTS = [
{% for item in tower_ee_images %}
    {'name': '{{ item.name }}' , 'image': '{{ item.image }}'},
{% endfor %}
]
EOF
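
As a rough illustration of what the corrected template produces, here is a pure-Python stand-in for the Jinja loop. It is not the operator's code; `render_execution_environments` is a hypothetical name, and using `repr()` for each entry is my own choice so that names containing quotes still yield valid Python.

```python
def render_execution_environments(ee_images):
    """Render a settings snippet with one dict per line, each followed by a comma."""
    lines = ["DEFAULT_EXECUTION_ENVIRONMENTS = ["]
    for item in ee_images:
        entry = {"name": item["name"], "image": item["image"]}
        # The trailing comma after every entry is the actual fix.
        lines.append(f"    {entry!r},")
    lines.append("]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    images = [
        {"name": "AWX EE 0.1.1", "image": "quay.io/ansible/awx-ee:0.1.1"},
        {"name": "AWX EE 0.2.0", "image": "quay.io/ansible/awx-ee:0.2.0"},
    ]
    rendered = render_execution_environments(images)
    print(rendered)
    # Executing the rendered text confirms it is valid Python and round-trips.
    namespace = {}
    exec(rendered, namespace)
    assert namespace["DEFAULT_EXECUTION_ENVIRONMENTS"] == images
```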

After that, delete the pods so the new configuration is loaded. Don't forget to edit and update the secret, or rerun the operator.

kubectl -n awx delete pod -l app.kubernetes.io/component=awx

All jobs work again afterwards.

@AlanCoding
Member

In the updated images, this should be at /var/run/awx-receptor/receptor.sock instead of /var/run/receptor/receptor.sock.

I'm attempting to track down a similar issue, but I'm looking for reports of the "Socket path does not exist" error specifically from the most recent AWX release.

@pru-anixe

@AlanCoding here it is from the latest release:

ERROR 2021/11/16 06:31:31 Read error in control service: read unix /var/run/receptor/receptor.sock->@: use of closed network connection
INFO 2021/11/16 06:31:31 Client disconnected from control service
ERROR 2021/11/16 06:31:31 Error closing connection: close unix /var/run/receptor/receptor.sock->@: use of closed network connection

@roxsross

roxsross commented Dec 6, 2021

@AlanCoding, here it is from the latest version:

ERROR 2021/11/16 06:31:31 Read error in control service: read unix /var/run/receptor/receptor.sock->@: use of closed network connection
INFO 2021/11/16 06:31:31 Client disconnected from control service
ERROR 2021/11/16 06:31:31 Error closing connection: close unix /var/run/receptor/receptor.sock->@: use of closed network connection

Hello, is this error still occurring?

@AlanCoding
Member

I wonder if this is a simple problem where we need to pre-create the directory, because this line in the Dockerfile creates a directory different from the one the awx-operator uses.
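
A minimal sketch of the "pre-create the directory" idea, in Python for illustration only. The helper name `ensure_socket_dir` and the default mode are assumptions, and the thread does not confirm which of the two directories mentioned here the operator actually expects.

```python
import os


def ensure_socket_dir(socket_path, mode=0o755):
    """Create the parent directory of socket_path if it is missing and return it."""
    directory = os.path.dirname(socket_path)
    # exist_ok=True makes the call safe to repeat on every container start.
    os.makedirs(directory, mode=mode, exist_ok=True)
    return directory
```

For example, `ensure_socket_dir("/var/run/awx-receptor/receptor.sock")` would create `/var/run/awx-receptor` if it is absent. It only creates the directory; the socket itself still has to be created by the receptor process.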

@hesmithrh

hesmithrh commented Jun 10, 2022

Hi!

Thank you very much for your submission to AWX. It means a lot to us that you have taken time to contribute.

On this issue, changes were requested but it has been some time since then. At this time we are closing your issue. If you get time to reproduce or revisit you are welcome to open another issue or we can reopen this issue upon request if you contact us by using any of the communication methods listed in the page below:

https://github.com/ansible/awx/#get-involved
Thank you once again for this and your interest in AWX!
