Create a new session on 404 when refreshing (closes #5) #27

Merged
merged 1 commit into from
Sep 17, 2016


berardino
Contributor

@juanjovazquez

These are the changes I have implemented so far (including the configuration changes).
This setup seems to be holding up much better: I have had a three-node cluster running on AWS
for a day without experiencing any downtime.

The Consul agent can report a false negative serfHealth status.
When that happens, the Consul server removes the member it considers faulty.
This has two consequences:
1 - all the sessions associated with the node are deleted
2 - the creation of a new session might still fail with
500,Internal Server Error, Check 'serfHealth' is in critical state

This commit is an attempt to be more resilient in these cases and keep
trying to create a new session. If it succeeds, the system will
keep working as if nothing happened; otherwise, after the maximum number of retries,
the constructr machine will terminate the Akka system.
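
The retry behaviour described above can be sketched roughly as follows. This is only an illustration and not the code in this commit: `SessionRetry`, `createWithRetry`, `SessionCreated` and `GaveUp` are hypothetical names, the `createSession` function stands in for the real Consul call, and the blocking `Thread.sleep` is a simplification of what an actor-based state machine would do with timers.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object SessionRetry {
  // Hypothetical result type for this sketch; not a ConstructR type.
  sealed trait Outcome
  final case class SessionCreated(id: String, attempts: Int) extends Outcome
  final case class GaveUp(attempts: Int, lastError: Throwable) extends Outcome

  /** Calls `createSession` until it succeeds or `nrOfRetries` retries are used up.
    * The total number of tries is therefore at most `nrOfRetries + 1`.
    */
  def createWithRetry(nrOfRetries: Int, retryDelayMillis: Long)(
      createSession: () => Try[String]): Outcome = {
    @tailrec
    def loop(attempt: Int): Outcome =
      createSession() match {
        case Success(id) =>
          SessionCreated(id, attempt)
        case Failure(e) if attempt > nrOfRetries =>
          // Retries exhausted: the caller would terminate the actor system here.
          GaveUp(attempt, e)
        case Failure(_) =>
          // E.g. "Check 'serfHealth' is in critical state": wait, then try again.
          Thread.sleep(retryDelayMillis)
          loop(attempt + 1)
      }
    loop(attempt = 1)
  }
}
```

With `nrOfRetries = 10` and a call that fails twice before succeeding, the sketch returns `SessionCreated` on the third attempt. If every call fails, it returns `GaveUp` after eleven tries.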

The default constructr machine configuration doesn't give Consul enough
time to recover under these circumstances. I suggest changing
the configuration for production environments to something like:

coordination-timeout = 10 seconds
nr-of-retries = 10
refresh-interval = 60 seconds
retry-delay = 10 seconds
ttl-factor = 5.0
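
As a sanity check on these values, here is a small sketch of the timing budget they imply. It assumes, as I recall from the comments in ConstructR's reference.conf, that the TTL is `refresh-interval * ttl-factor` and that `ttl-factor` should be at least `1 + (coordination-timeout * (1 + nr-of-retries) + retry-delay * nr-of-retries) / refresh-interval`. Please verify both against the version you run. `TimingBudget` and `Settings` are hypothetical names used only for this example.

```scala
import scala.concurrent.duration._

object TimingBudget {
  // Hypothetical holder for the five settings suggested above.
  final case class Settings(
      coordinationTimeout: FiniteDuration,
      nrOfRetries: Int,
      refreshInterval: FiniteDuration,
      retryDelay: FiniteDuration,
      ttlFactor: Double)

  // Worst case for one operation: every try times out and every retry waits for the delay.
  def worstCase(s: Settings): FiniteDuration =
    s.coordinationTimeout * (1 + s.nrOfRetries).toLong + s.retryDelay * s.nrOfRetries.toLong

  // Assumed lower bound on ttl-factor (see the lead-in above).
  def minTtlFactor(s: Settings): Double =
    1 + worstCase(s) / s.refreshInterval

  // Assumed TTL: refresh-interval * ttl-factor.
  def ttl(s: Settings): FiniteDuration =
    (s.refreshInterval.toMillis * s.ttlFactor).toLong.millis

  val suggested: Settings = Settings(10.seconds, 10, 60.seconds, 10.seconds, 5.0)
}
```

Under those assumptions the suggested values give a worst case of 210 seconds per operation, a minimum `ttl-factor` of 4.5 (so 5.0 leaves some margin), and a TTL of 300 seconds.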

@berardino
Contributor Author

@gerson24 I have rebased my PR on master

@juanjovazquez juanjovazquez merged commit 0cb7d4c into Tecsisa:master Sep 17, 2016