Server Outage Journal

Danny B edited this page Apr 15, 2017 · 7 revisions

See our public Pingdom page, courtesy of Chesti.

Every now and then I have to restart ZKLS or Springie because it stops working for one reason or another. To get a better picture of how stable the server is and what usually causes outages, I've created this journal.

Please add an entry here for every unplanned server outage, with the date, the duration, and a short description of the known or suspected causes.

-- ikinz

2015-05-30 (2 hours)

The infamous black hole incident, http://zero-k.info/Forum/Post/129897.

2015-06-06 (30 minutes)

A repeat of the black hole, http://zero-k.info/Forum/Post/130408#130408.

2015-04-16 (4 hours)

Apparently a Windows update, http://zero-k.info/Forum/Thread/14061

2015-12-03 (1 hour)

ZKLS went down and worked again after a site restart. http://zero-k.info/Forum/Thread/20689

2015-12-14 (30 minutes)

Springie died, had to be restarted. http://zero-k.info/Forum/Thread/20736

2015-12-26 (intermittent failures throughout the day)

A faulty change pushed to live spammed errors into the log continuously, causing the database to grow uncontrollably until it took up all available disk space.

2016-02-15 (20 minutes)

Springie crashed with an OutOfMemory exception.

2016-03-05 (about 5 hours)

The whole machine went down and wasn't responding to RDP. Eventually it came back up again by itself.

2016-03-13 (about 1.5 hours)

ZKLS stopped sending the initial info on login. Instead, it would slowly send 3-5 User {} messages and then close the connection. Restarting the site fixed it. http://zero-k.info/Forum/Thread/22166

2016-03-28 (20 minutes)

Windows update. http://zero-k.info/Forum/Thread/22218

2016-05-07 (2 hours, then another 10 minutes)

http://zero-k.info/Forum/Thread/22412

2016-09-14 (2 hours)