tower-processes:dispatcher continuously crashes with psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "main_instance_ip_address_key" #6750

Closed
ilijamt opened this issue Apr 19, 2020 · 3 comments

@ilijamt
Contributor

ilijamt commented Apr 19, 2020

ISSUE TYPE
  • Bug Report
SUMMARY

The dispatcher process in the awx-task container continually crashes with:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "main_instance_ip_address_key"
DETAIL:  Key (ip_address)=(10.42.240.2) already exists.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 152, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 55, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/reaper.py", line 38, in reap
    (changed, me) = Instance.objects.get_or_register()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 142, in get_or_register
    return self.register(ip_address=pod_ip)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 129, in register
    instance.save(update_fields=['ip_address'])
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/models/ha.py", line 40, in save
    super(BaseModel, self).save(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/base.py", line 741, in save
    force_update=force_update, update_fields=update_fields)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/base.py", line 779, in save_base
    force_update, using, update_fields,
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/base.py", line 851, in _save_table
    forced_update)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/base.py", line 900, in _do_update
    return filtered._update(values) > 0
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 760, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1469, in execute_sql
    cursor = super().execute_sql(result_type)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1140, in execute_sql
    cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.IntegrityError: duplicate key value violates unique constraint "main_instance_ip_address_key"
DETAIL:  Key (ip_address)=(10.42.240.2) already exists.

Logging into the container and checking the supervisor status shows:

bash-4.4$ supervisorctl -c /supervisor_task.conf status all
awx-config-watcher                  RUNNING   pid 99, uptime 0:15:46
tower-processes:callback-receiver   RUNNING   pid 101, uptime 0:15:46
tower-processes:dispatcher          STARTING  

tower-processes:dispatcher never leaves the STARTING state; it continually crashes with the error above.

ENVIRONMENT
  • AWX version: 11.0.0 (upgrade from 10.0.0)
  • AWX install method: k8s
ADDITIONAL INFORMATION
awx=> select * from main_instance;
 id |                 uuid                 |       hostname       |            created            |           modified            | capacity | version | last_isolated_check | capacity_adjustment | cpu | memory | cpu_capacity | mem_capacity | enabled | managed_by_policy | ip_address  
----+--------------------------------------+----------------------+-------------------------------+-------------------------------+----------+---------+---------------------+---------------------+-----+--------+--------------+--------------+---------+-------------------+-------------
  4 | 0a2640f6-7eee-4253-a609-fe37881229af | awx-7586cffcfb-76g4t | 2020-04-19 17:44:02.996354+00 | 2020-04-19 17:45:07.938583+00 |       20 | 11.0.0  |                     |                1.00 |   0 |      0 |            6 |           20 | t       | t                 | 10.42.128.1
  5 | fca90a8c-7bba-49d7-a123-37e298ce6aa2 | awx-7586cffcfb-w84hg | 2020-04-19 17:45:50.45218+00  | 2020-04-19 17:45:55.683659+00 |       20 | 11.0.0  |                     |                1.00 |   0 |      0 |            6 |           20 | t       | t                 | 10.42.240.2
  6 | a187ea32-95a7-4c7e-baa2-d952bf169482 | awx-7586cffcfb-dqdnz | 2020-04-19 17:46:29.799294+00 | 2020-04-19 17:46:29.799321+00 |        0 |         |                     |                1.00 |   0 |      0 |            0 |            0 | t       | t                 | 
  7 | af573bc6-6216-484c-b935-5147ffaee13d | awx-7586cffcfb-xjhzx | 2020-04-19 19:14:45.289215+00 | 2020-04-19 19:14:45.289241+00 |        0 |         |                     |                1.00 |   0 |      0 |            0 |            0 | t       | t                 | 
  8 | f74638fb-340e-46d0-bbf2-26e7910c5b5a | awx-7586cffcfb-zdnm9 | 2020-04-19 19:15:29.573238+00 | 2020-04-19 19:15:29.573303+00 |        0 |         |                     |                1.00 |   0 |      0 |            0 |            0 | t       | t                 | 
awx=> select * from main_instancegroup_instances;
 id | instancegroup_id | instance_id 
----+------------------+-------------
  4 |                1 |           4
  5 |                1 |           5
(2 rows)

Running

DELETE FROM main_instancegroup_instances WHERE id=5;
DELETE FROM main_instance where id = 5;

allows the dispatcher to start, after which AWX is responsive.
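
The same cleanup can be keyed on the IP from the error's DETAIL line instead of a hard-coded id. What follows is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the AWX PostgreSQL schema, models just the columns needed here, and `remove_instance_holding_ip` and `build_demo_db` are hypothetical helpers that are not part of AWX.

```python
import sqlite3


def remove_instance_holding_ip(conn, ip_address):
    """Delete the instance row that holds ip_address, plus its group links.

    Hypothetical helper (not AWX code). Returns the deleted instance id,
    or None if no row holds that IP.
    """
    row = conn.execute(
        "SELECT id FROM main_instance WHERE ip_address = ?", (ip_address,)
    ).fetchone()
    if row is None:
        return None
    instance_id = row[0]
    with conn:  # commit both deletes together, roll back on error
        # Remove the group membership first, as in the manual workaround.
        conn.execute(
            "DELETE FROM main_instancegroup_instances WHERE instance_id = ?",
            (instance_id,),
        )
        conn.execute("DELETE FROM main_instance WHERE id = ?", (instance_id,))
    return instance_id


def build_demo_db():
    """Build a cut-down copy of the two tables shown in this report."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE main_instance ("
        "id INTEGER PRIMARY KEY, hostname TEXT, ip_address TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE main_instancegroup_instances ("
        "id INTEGER PRIMARY KEY, instancegroup_id INTEGER, instance_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO main_instance VALUES (?, ?, ?)",
        [
            (4, "awx-7586cffcfb-76g4t", "10.42.128.1"),
            (5, "awx-7586cffcfb-w84hg", "10.42.240.2"),
            # Assumption: the blank ip_address cells in the psql output are NULL.
            (6, "awx-7586cffcfb-dqdnz", None),
        ],
    )
    conn.executemany(
        "INSERT INTO main_instancegroup_instances VALUES (?, ?, ?)",
        [(4, 1, 4), (5, 1, 5)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo_conn = build_demo_db()
    print(remove_instance_holding_ip(demo_conn, "10.42.240.2"))
```

Against the real database the equivalent would be the two DELETE statements above with the id looked up from `ip_address`; whether deleting the row is safe depends on the old pod really being gone.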

@ryanpetrello
Contributor

ryanpetrello commented Apr 20, 2020

cc @shanemcd @chrismeyersfsu

(It looks like IPs get reused when pods spin down and back up. If we find a conflict in the .register() method, maybe we should assume the prior instance is gone, since the two obviously can't both have the IP assigned. I expect this is probably just a race between pod spin-up and our deprovisioning grace period.)
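
One way to read that suggestion, sketched under stated assumptions: before writing the new IP, release it from any other row that still holds it. This is not the AWX implementation. It uses an in-memory SQLite table as a stand-in for `main_instance`, the `register` function here is a hypothetical free function rather than the real manager method, and clearing the old row's `ip_address` (instead of deleting the row) is a choice made for illustration. Which of the new pods actually received 10.42.240.2 is not stated in the report, so the demo assumes `awx-7586cffcfb-dqdnz`.

```python
import sqlite3


def register(conn, hostname, ip_address):
    """Record ip_address for hostname, taking it over from any stale row.

    Hypothetical stand-in for the .register() idea above, not AWX code.
    Returns the hostnames whose ip_address was cleared.
    """
    stale = [
        row[0]
        for row in conn.execute(
            "SELECT hostname FROM main_instance "
            "WHERE ip_address = ? AND hostname != ?",
            (ip_address, hostname),
        )
    ]
    with conn:  # both updates commit together
        # Two pods cannot hold the same IP, so treat the old holder as gone.
        conn.execute(
            "UPDATE main_instance SET ip_address = NULL "
            "WHERE ip_address = ? AND hostname != ?",
            (ip_address, hostname),
        )
        conn.execute(
            "UPDATE main_instance SET ip_address = ? WHERE hostname = ?",
            (ip_address, hostname),
        )
    return stale


def build_demo_db():
    """Two rows from the report: the old holder of the IP and a new pod."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE main_instance ("
        "id INTEGER PRIMARY KEY, hostname TEXT UNIQUE, ip_address TEXT UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO main_instance VALUES (?, ?, ?)",
        [
            (5, "awx-7586cffcfb-w84hg", "10.42.240.2"),
            # Assumption: this is the pod that was handed the reused IP.
            (6, "awx-7586cffcfb-dqdnz", None),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo_conn = build_demo_db()
    print(register(demo_conn, "awx-7586cffcfb-dqdnz", "10.42.240.2"))
```

Without the first UPDATE, the second one fails on the unique constraint, which is the crash in the traceback above. Clearing the column keeps the old row around for the normal deprovisioning path to remove.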

@kdelee
Member

kdelee commented Apr 29, 2020

Is this applicable to an OpenShift install as well as AWX on k8s?

@kdelee
Member

kdelee commented May 1, 2020

After discussion w/ devs, I'm going to say "good 'nuf" because it would require deleting the node itself that a pod is running on, which is not something I want to do to our OpenShift. Thanks @fosterseth

@kdelee kdelee closed this as completed May 1, 2020
@kdelee kdelee self-assigned this May 1, 2020