"ERROR: HTTPConnectionPool(host='127.0.0.1', port=9101)" with gunicorn #106

Closed
arm4b opened this issue Jan 25, 2016 · 12 comments
arm4b commented Jan 25, 2016


For some still-unknown reason there is a ghost bug that appears rarely: with the recent gunicorn change, st2api is sometimes not reachable on its port, even though the service starts OK.

Actually, we're not even sure whether st2auth is reachable at all, because there were no tests.
We have the tests now.
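
For reference, a minimal sketch of the kind of port-reachability check such a test can do. This is not the actual test code; the bind addresses (127.0.0.1:9101 for st2api, 127.0.0.1:9100 for st2auth) are the defaults assumed here:

```python
# Illustrative only -- not the actual test suite.
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Poll a TCP port until it accepts connections or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            sock = socket.create_connection((host, port), timeout=interval)
            sock.close()
            return True
        except (socket.error, socket.timeout):
            time.sleep(interval)
    return False

for name, port in (('st2api', 9101), ('st2auth', 9100)):
    status = 'reachable' if wait_for_port('127.0.0.1', port) else 'NOT reachable'
    print('%s on port %d: %s' % (name, port, status))
```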

arm4b commented Jan 27, 2016

Stats:

arm4b commented Jan 27, 2016

https://circleci.com/gh/StackStorm/st2-packages/291
On el6 we had some logs:

last 20 lines from log file of service st2api
>>>
2016-01-27 15:35:19,160 140432054114048 ERROR log [-] Conflict while trying to save in DB.
Traceback (most recent call last):
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/persistence/base.py", line 155, in add_or_update
    model_object = cls._get_impl().add_or_update(model_object)
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/models/db/__init__.py", line 174, in add_or_update
    instance.save()
  File "/usr/share/python/st2/lib/python2.7/site-packages/mongoengine/document.py", line 286, in save
    raise NotUniqueError(message % unicode(err))
NotUniqueError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1  dup key: { : "core", : "st2.sensor.process_exit" })
2016-01-27 15:35:19,170 140432054114048 ERROR log [-] Shadow trigger creation of "core.st2.sensor.process_exit" failed with uniqueness conflict.
Traceback (most recent call last):
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/triggers.py", line 40, in _register_internal_trigger_type
    trigger_db = create_shadow_trigger(trigger_type_db)
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/services/triggers.py", line 253, in create_shadow_trigger
    return create_or_update_trigger_db(trigger)
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/services/triggers.py", line 154, in create_or_update_trigger_db
    trigger_db = Trigger.add_or_update(trigger_db)
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/persistence/base.py", line 167, in add_or_update
    model_object=model_object)
StackStormDBObjectConflictError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1  dup key: { : "core", : "st2.sensor.process_exit" })

That is when st2api failed to start. Not sure if it's the real reason; it's probably another ghost bug. Will investigate more.

arm4b commented Jan 27, 2016

Ok, here: https://circleci.com/gh/armab/st2-packages/442 st2auth is reported as not listening. It's wheezy.

I was lucky to catch it via CircleCI SSH debug.

Service st2auth reports that everything is running:

root@06c3f4732296:/var/log# service st2auth status
[ ok ] st2auth is running.

But there are actually no processes for the st2auth service at all:

root@06c3f4732296:/var/log# ps -aux | grep st2auth
warning: bad ps syntax, perhaps a bogus '-'?
See http://gitorious.org/procps/procps/blobs/master/Documentation/FAQ
root       1459  0.0  0.0   6308   596 ?        S+   18:00   0:00 grep st2auth

/var/log/st2auth.log:

2016-01-27 17:45:23,660 139834305980160 INFO log [-] Connecting to database "st2" @ "172.17.0.1:27017" as user "None".
2016-01-27 17:45:23,666 139834305980160 DEBUG log [-] Ensuring database indexes...
2016-01-27 17:45:23,766 139834305980160 DEBUG log [-] Ensuring indexes for model "UserDB"...
2016-01-27 17:45:23,785 139834305980160 DEBUG log [-] Ensuring indexes for model "TokenDB"...
2016-01-27 17:45:23,787 139834305980160 DEBUG log [-] Ensuring indexes for model "ApiKeyDB"...
2016-01-27 17:45:23,790 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionDB"...
2016-01-27 17:45:23,807 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionDB"...
2016-01-27 17:45:23,845 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionStateDB"...
2016-01-27 17:45:23,854 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionAliasDB"...
2016-01-27 17:45:23,863 139834305980160 DEBUG log [-] Ensuring indexes for model "LiveActionDB"...
2016-01-27 17:45:23,878 139834305980160 DEBUG log [-] Ensuring indexes for model "RunnerTypeDB"...
2016-01-27 17:45:23,881 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionAliasDB"...
2016-01-27 17:45:23,882 139834305980160 DEBUG log [-] Ensuring indexes for model "KeyValuePairDB"...
2016-01-27 17:45:23,897 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionDB"...
2016-01-27 17:45:23,898 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionStateDB"...
2016-01-27 17:45:23,898 139834305980160 DEBUG log [-] Ensuring indexes for model "LiveActionDB"...
2016-01-27 17:45:23,898 139834305980160 DEBUG log [-] Ensuring indexes for model "PolicyTypeDB"...
2016-01-27 17:45:23,904 139834305980160 DEBUG log [-] Ensuring indexes for model "PolicyDB"...
2016-01-27 17:45:23,909 139834305980160 DEBUG log [-] Ensuring indexes for model "RuleDB"...
2016-01-27 17:45:23,924 139834305980160 DEBUG log [-] Ensuring indexes for model "RunnerTypeDB"...
2016-01-27 17:45:23,925 139834305980160 DEBUG log [-] Ensuring indexes for model "SensorTypeDB"...
2016-01-27 17:45:23,934 139834305980160 DEBUG log [-] Ensuring indexes for model "TriggerTypeDB"...
2016-01-27 17:45:23,953 139834305980160 DEBUG log [-] Ensuring indexes for model "TriggerDB"...
2016-01-27 17:45:23,954 139834305980160 DEBUG log [-] Ensuring indexes for model "TriggerInstanceDB"...
2016-01-27 17:45:23,956 139834305980160 DEBUG log [-] Registering exchanges...
2016-01-27 17:45:23,984 139834305980160 DEBUG connection [-] Start from server, version: 0.9, properties: {u'information': u'Licensed under the MPL.  See http://www.rabbitmq.com/', u'product': u'RabbitMQ', u'copyright': u'Copyright
2016-01-27 17:45:23,986 139834305980160 DEBUG connection [-] Open OK!
2016-01-27 17:45:23,986 139834305980160 DEBUG channel [-] using channel_id: 1
2016-01-27 17:45:23,989 139834305980160 DEBUG channel [-] Channel open
2016-01-27 17:45:23,990 139834305980160 DEBUG log [-] registered exchange st2.execution.
2016-01-27 17:45:23,992 139834305980160 DEBUG log [-] registered exchange st2.liveaction.
2016-01-27 17:45:23,994 139834305980160 DEBUG log [-] registered exchange st2.trigger.
2016-01-27 17:45:23,996 139834305980160 DEBUG log [-] registered exchange st2.trigger_instances_dispatch.
2016-01-27 17:45:23,997 139834305980160 DEBUG log [-] registered exchange st2.sensor.
2016-01-27 17:45:24,006 139834305980160 DEBUG channel [-] Closed channel #1
2016-01-27 17:45:24,007 139834305980160 DEBUG log [-] Registering internal trigger: st2.generic.actiontrigger
2016-01-27 17:45:24,020 139834305980160 DEBUG log [-] verified trigger and formulated TriggerDB=TriggerTypeDB(description="Trigger encapsulating the completion of an action execution.", id=None, name="st2.generic.actiontrigger", pac
2016-01-27 17:45:24,027 139834305980160 INFO log [-] Registered trigger: st2.generic.actiontrigger.
2016-01-27 17:45:24,128 139834305980160 ERROR log [-] Conflict while trying to save in DB.
Traceback (most recent call last):
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/persistence/base.py", line 155, in add_or_update
    model_object = cls._get_impl().add_or_update(model_object)
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/models/db/__init__.py", line 174, in add_or_update
    instance.save()
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/mongoengine/document.py", line 286, in save
    raise NotUniqueError(message % unicode(err))
NotUniqueError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1  dup key: { : "core", : "st2.generic.actiontrigger" })
2016-01-27 17:45:24,131 139834305980160 ERROR log [-] Shadow trigger creation of "core.st2.generic.actiontrigger" failed with uniqueness conflict.
Traceback (most recent call last):
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/triggers.py", line 40, in _register_internal_trigger_type
    trigger_db = create_shadow_trigger(trigger_type_db)
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/services/triggers.py", line 253, in create_shadow_trigger
    return create_or_update_trigger_db(trigger)
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/services/triggers.py", line 154, in create_or_update_trigger_db
    trigger_db = Trigger.add_or_update(trigger_db)
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/persistence/base.py", line 167, in add_or_update
    model_object=model_object)
StackStormDBObjectConflictError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1  dup key: { : "core", : "st2.generic.actiontrigger" })

There is an error in the logs; it happens. So st2auth died on this.
It's interesting why service st2auth status still reports it as running,
and why the startup service couldn't restart st2auth + gunicorn.


Now I know; I'll test locally by manually killing those pink unicorns and see whether they can still revive.

arm4b commented Jan 27, 2016

Yeah, so those startup scripts need adjustments too. If the child process dies, the service is still considered OK (running), and the dead process is never restarted/revived, which is wrong.

$ service st2api status
[ ok ] st2api is running.

$ ps -aux | grep st2api
st2        1691  0.6  0.0 121416 40868 ?        S    19:56   0:00 /usr/share/python/st2/bin/python /usr/share/python/st2/bin/gunicorn_pecan /usr/share/python/st2/lib/python2.7/site-packages/st2api/gunicorn_config.py -k eventlet -b 127.0.0.1:9101 --threads 10 --workers 1
st2        1715  0.3  0.0 138244 45080 ?        S    19:56   0:00 /usr/share/python/st2/bin/python /usr/share/python/st2/bin/gunicorn_pecan /usr/share/python/st2/lib/python2.7/site-packages/st2api/gunicorn_config.py -k eventlet -b 127.0.0.1:9101 --threads 10 --workers 1
root       1775  0.0  0.0  11112   712 ?        S+   19:57   0:00 grep st2api

$ kill 1691

# no processes running for service
$ ps -aux | grep st2api

# service status is OK
$ service st2api status
[ ok ] st2api is running.
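
That matches a status check that only looks at the pidfile, not at whether the PID is actually alive. A minimal sketch of the difference (illustrative only; the pidfile path is a hypothetical placeholder, not the real init-script logic):

```python
# Illustrative sketch; PIDFILE is a hypothetical path used only here.
import errno
import os

PIDFILE = '/var/run/st2api.pid'

def pid_from_file(path):
    with open(path) as f:
        return int(f.read().strip())

def pid_is_alive(pid):
    """Signal 0 performs no action; it only checks whether the PID exists."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        if e.errno == errno.ESRCH:   # no such process
            return False
        if e.errno == errno.EPERM:   # process exists but belongs to another user
            return True
        raise
    return True

# A status check that stops at os.path.exists(PIDFILE) keeps reporting
# "running" after the gunicorn master is killed; checking the PID does not.
alive = os.path.exists(PIDFILE) and pid_is_alive(pid_from_file(PIDFILE))
print('st2api alive: %s' % alive)
```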

Will look tomorrow.

@dennybaa (Contributor) commented:

I confirm the same error on local test runs: https://gist.github.com/16950b151fc4f85a85ad

StackStormDBObjectConflictError: Tried to save duplicate unique keys (E11000 duplicate key error collection: st2.trigger_d_b index: pack_1_name_1 dup key: { : "core", : "st2.key_value_pair.value_change" })

arm4b assigned dennybaa and unassigned arm4b Jan 28, 2016
arm4b commented Jan 28, 2016

FYI, it's also good to know that after a service restart the DB error is gone, so the service starts normally on the 2nd run (probably because the DB error appears randomly; I don't really know).

It's clear then why we hit this error on wheezy and jessie but never on trusty: trusty uses upstart with respawn, which automatically restarts the process when it goes down.
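
For context, a rough sketch of what respawn semantics give us and the plain sysvinit scripts currently do not: if the supervised process exits, start it again. This is not how upstart is implemented; the command line is illustrative, borrowed from the ps listing earlier in this thread:

```python
# Rough sketch of "respawn" behaviour, illustrative only.
import subprocess
import time

CMD = [
    '/usr/share/python/st2/bin/gunicorn_pecan',
    '/usr/share/python/st2/lib/python2.7/site-packages/st2api/gunicorn_config.py',
    '-k', 'eventlet', '-b', '127.0.0.1:9101', '--workers', '1',
]

def supervise(cmd, backoff=2.0):
    while True:
        proc = subprocess.Popen(cmd)
        proc.wait()                          # block until the process exits
        print('exited with code %s, respawning' % proc.returncode)
        time.sleep(backoff)                  # avoid a tight restart loop
```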

@lakshmi-kannan (Contributor) commented:

StackStorm/st2#2435 Let's see if that does it.

@lakshmi-kannan (Contributor) commented:

We (@lakshmi-kannan and @enykeev) couldn't repro this in the last few builds, so we'll call this done until we can repro it again.

arm4b commented Jan 29, 2016

Yesterday @dennybaa reproduced it, not on CircleCI but locally, and got the same DB error, so it's not an infra problem.
The bug is still there.

We can't close it, because it fails the builds. We could theoretically revert all gunicorn changes related to st2auth and st2api just to see whether the problem still persists (as part of the investigation).
It started to appear only after moving to gunicorn.


Answering questions from Slack: the DB is clean every time a CircleCI build starts; everything is fresh.

I'll implement #104 later, which will help us investigate this problem better.

arm4b commented Jan 29, 2016

Debugging help

@dennybaa @lakshmi-kannan I've implemented #104, so now you can see the logs via CircleCI Artifacts.

That helped to further debug our problem. You can find the Artifacts section in the CircleCI UI right next to the Debug SSH button.

The upstream st2 fix

@lakshmi-kannan @manasdk https://github.com/StackStorm/st2/pull/2435/files I can confirm that the fix localized the problem, but the root issue is still there.

Because of that upstream st2 fix, the process is not dying anymore. See the log (it's from the CircleCI artifacts):

So the process never dies (the service keeps running), but it reloads whenever that DB error is caught.
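
For anyone following along: the duplicate-key error itself looks like a get-or-create race. Two processes (or workers) both see that the trigger doesn't exist yet, both try to insert it, and the unique index on (pack, name) rejects the loser with E11000. A minimal mongoengine sketch of the pattern and the usual mitigation; this is not the actual st2 code:

```python
# Not the actual st2 code -- a minimal sketch of the get-or-create race
# behind the NotUniqueError, and the usual mitigation.
import mongoengine as me

# me.connect('st2')  # connection setup omitted in this sketch

class TriggerDB(me.Document):
    pack = me.StringField(required=True)
    name = me.StringField(required=True)
    meta = {'indexes': [{'fields': ['pack', 'name'], 'unique': True}]}

def create_or_get_trigger(pack, name):
    try:
        # Both workers can pass an "exists?" check and reach this save();
        # the unique index then rejects one of them with NotUniqueError.
        return TriggerDB(pack=pack, name=name).save()
    except me.NotUniqueError:
        # Another process won the race; the document exists now, so fetch it
        # instead of failing.
        return TriggerDB.objects.get(pack=pack, name=name)
```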

Questions

It's good that we now have green builds and the services are running. But to clarify what else we should do here, a few questions:

  • Is it OK for the API service to be reloaded every few cycles?
  • Are we going to leave it as is and close the issue?
  • Are we going to investigate it further?

Just FYI: it appeared only after moving to gunicorn.
Did you see that error before? Make sure you don't confuse it with the different DEBUG errors that have always been there, like: dup key: { : "admin" }).

arm4b commented Feb 1, 2016

Upstream st2 fix: StackStorm/st2#2437 in WIP

dennybaa added this to the v1.3 milestone Feb 2, 2016
arm4b commented Feb 3, 2016

Closed by the StackStorm/st2#2437 gunicorn fix, which has been backported into st2 version 1.3.

@enykeev FYI as well: interestingly, the bug we experienced turned out to be an upstream st2 bug.
