Current unstableness, the "Offline - Reconnect" issue #91

Closed
mawson94 opened this Issue · 40 comments
@mawson94

I just found out about this site and made an account today. I love the idea so far, but after logging out it seems to be impossible to log back in.
The two main problems are:
1.) The log in button is greyed out (I've seen an issue post for this already) and the word "disconnected" appears next to my username in red.
2.) The message "offline - reconnect" appears at the top of the page, and upon clicking "reconnect" the word "offline" is shown briefly before it returns to the original message.

I signed up using my email and not Facebook if that makes a difference?

@Mnnwbdd

I had this exact same experience.

@lefnire
Owner

@mawson94 a lot of people are chiming in with this issue, I'm going to direct everyone here as the main discussion. First off, the login button being greyed out is a different issue (#89).

Offline - Reconnect. The server is hitting this error (cause unknown, still diagnosing), and that is what triggers the message. It seems to be highly exacerbated by load balancing issues, because it was quite rare up until now (I had the pleasure of being Reddited this morning). I've upped the dynos, but that doesn't seem to fully fix it. I'm going to be working on this all day. Any JavaScript developers out there?

@lefnire
Owner

@bsideballer said

I also noticed that a TON of XHR requests are being sent on page load/task submit when the page loads correctly (read: hundreds of requests). These issues could be related?

I'll keep this in my back pocket as I'm diagnosing the issues. Heroku doesn't support websockets, so Derby is falling back on XHR-polling. I thought that would actually scale better since it doesn't keep open connections (memory pooling), but maybe not - I'm going to try to set up AWS today

@lefnire
Owner

I'm working on a static page for new visitors right now, that should reduce #80

@racuna

Something like your video on kickstarter as a tour for visitors (hosted in youtube) would be nice.

@lefnire
Owner

brilliant, that'll keep them occupied too :)

@lefnire
Owner

Found it. I need to migrate to a different PaaS.

  1. Racer is not multi-process, but this is what Joseph is going to be dedicated to when he starts soon. Using multiple processes is not going to work correctly in lots of ways right now. We should have something workable in a couple of months.
  2. We just switched to XHR-polling for everything, because we were running into some issues with proxying websockets through node-http-proxy. Websockets should have lower overhead and be able to scale up to more active connections, but in practice it is pretty buggy still.
@lefnire
Owner

I'm currently testing AppFog; if that fails I'll likely self-host on AWS. I'll check out dotCloud too

@ghost

Go with app fog, I have heard nothing but good things about them

@BrianTolman

Set up an AWS load balancer, then put your db on a separate db server, then expand the app servers behind it. The problem will solve itself, and then you can add/subtract app servers based upon load.

@Many

I agree with @Neohuman, you should use AppFog, it's pretty nice

@lefnire
Owner

can't get it to launch, getting some funky errors. i'll keep trying

@eduardocruz

Not sure if this can help in any way, but here it goes ... https://github.com/lloyd/node-toobusy
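
For readers who haven't seen this kind of tool: the linked library, as far as I understand it, sheds load by watching how far the Node event loop is lagging and answering with a 503 while the lag is too high. Below is a minimal self-contained sketch of that idea, not the library's actual API. The names `createLagMonitor`, `shedLoad`, `intervalMs` and `maxLagMs` are made up for illustration, and the thresholds are arbitrary assumptions.

```javascript
// Sketch of event-loop-lag load shedding (hypothetical names, not node-toobusy's API).
function createLagMonitor({ intervalMs = 500, maxLagMs = 70 } = {}) {
  let lagMs = 0;
  let last = Date.now();
  const timer = setInterval(() => {
    const now = Date.now();
    // A timer that fires late means the event loop was blocked by other work.
    lagMs = Math.max(0, now - last - intervalMs);
    last = now;
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for monitoring
  return {
    lag: () => lagMs,
    tooBusy: () => lagMs > maxLagMs,
    stop: () => clearInterval(timer),
  };
}

// Express-style middleware shape: (req, res, next).
function shedLoad(monitor) {
  return function (req, res, next) {
    if (monitor.tooBusy()) {
      // Fail fast instead of queueing more work on an overloaded process.
      res.statusCode = 503;
      res.end("Server is too busy, please retry shortly.");
      return;
    }
    next();
  };
}
```

Something like `app.use(shedLoad(createLagMonitor()))` would be the usual way to wire it in, assuming an Express-style `app`. It wouldn't fix the underlying bug, it would only keep an overloaded process from falling over completely.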

@JamPol

If you just changed something, it worked. HabitRPG just started working really well for me!

@sigboe

JamPol, not feeling the same. Erratic offline/reconnecting. Haven't tried clicking any + or - buttons as I need to do something good first haha. Workout session in a few hours so gonna try then xD

@SamPearson

I appreciate the hard work and can't wait to continue using the site.

@lefnire
Owner

we don't happen to have any baller sysadmins on this thread do we?

@JamPol

Appreciate the hard work, but unfortunately I'm not a baller sysadmin. Best advice is to post the issue on http://www.reddit.com/r/programming/ or http://www.reddit.com/r/gamedev/. They're both filled with hundreds of knowledgeable folk, and even at this very moment both have 200+ users online. Someone should be able to help.

@JamPol

Just wanted to let you know it flip-flopped. Right now it's displaying all the info I entered this morning when I posted that it was working really well and was running smoothly. Then it reverted to my old list for a while which didn't work very well, now it's back to smooth sailing.

Pure speculation, but maybe there's two copies fighting over the server?

@dmuth

I don't know if I'm a baller sysadmin, but I have a fair bit of sysadmin experience and an AWS background. I also do a lot of backend node.js stuff for a living. Feel free to shoot me a private email if you'd like to pick my brain.

@lefnire
Owner

we're almost up on liquid web (@alanthing helping out, he's a ninja!) - more soon

@lefnire
Owner

ok! we are migrated, and should be mostly in the clear. there are still some issues to iron out, follow #80 for the biggest bug specifically.

My suggestion for now - you're good to play, but if you see the "offline" message, refresh before making changes. This will be fixed in #80

@JamPol

Thanks for the hard work, so psyched now.

@eduardocruz

FYI -> "Internal Server Error" for the last two hours.

@SamPearson

Still getting "offline - reconnect" frequently; have to make changes and refresh multiple times to get them to stick.

@theogeer

I'm still getting offline - reconnect every 3-5 minutes as well

@lefnire
Owner

Yep, still there. Better than before (trust me), but still there. It's #80. Any node devs out there check it out if you get the chance, that's where I'll be spending my day.

@yeahilift

do you have any estimate of the time until it's fixed? i know a lot of my students/friends are anxious to try this out

@lefnire
Owner

no ETA yet unfortunately, I keep thinking it's just around the corner. I have a lead on the fix, I'm going to try that out - if it works, a few hours. if not, next rabbit hole :) Here's the thing about Derby: unfortunately it's very difficult to test this issue locally. I'm trying to devise a way...

@hrj

The later you fix, the more points you will get
:wink:

@lefnire
Owner

@hrj ooh i need that laugh :D

@lefnire
Owner

Status:
Talked to Brian (Racer dev) about #80 (the king error); he said to just up the timeout. I followed this advice and upped the amount. I still kept getting some timeout errors reporting 1000ms (even though I set it to 7000ms), but less often (good). Maybe that configuration isn't respected somewhere. In the end, we still need a model.update operation to get past this database-DDoSing (locking); as I pointed out elsewhere, I think that will be the ultimate fix.
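
To make the locking point concrete, here is a rough sketch of why one combined update could hit the database less than a series of per-field writes. This is only an illustration under assumptions: `FakeStore`, `setFieldsOneByOne` and `updateOnce` are hypothetical names, the store is an in-memory stand-in, and nothing here is Racer's actual API or a claim about how a `model.update` would be implemented.

```javascript
// Hypothetical in-memory store that counts how often a document gets locked.
class FakeStore {
  constructor() {
    this.docs = new Map();
    this.lockCount = 0;
  }
  // Assumption: every write takes the document lock once,
  // standing in for one round trip / transaction against a real database.
  write(id, changes) {
    this.lockCount += 1;
    const doc = this.docs.get(id) || {};
    this.docs.set(id, Object.assign(doc, changes));
  }
}

// One write per changed field: N fields means N lock acquisitions.
function setFieldsOneByOne(store, id, changes) {
  for (const [key, value] of Object.entries(changes)) {
    store.write(id, { [key]: value });
  }
}

// One combined write: the lock is taken once, however many fields change.
function updateOnce(store, id, changes) {
  store.write(id, changes);
}
```

Scoring a single task might touch several fields on the user at once (say experience, gold and health, as an assumed example), so the per-field version multiplies the number of lock acquisitions per click, while the combined version keeps it at one.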

Luckily, we're crashing a lot less frequently. Stay tuned.

@lefnire
Owner

also, special thanks to @Andrew565 - he's been helping a lot, and learned the codebase fast. With any luck we should be stable today. If there are any other JavaScript devs that want to help, please jump on #80

@lefnire
Owner

alright! how we doing? I think we fixed it guys

@theogeer

It's been doing pretty good for me the last couple hours. I'm super excited

@ghost
@lefnire
Owner

@Neohuman is it #89 the greyed out button?

@ghost
@lefnire lefnire referenced this issue
Closed

Can't login #89

@lefnire
Owner

kk, i'm gonna close this ticket but keep a keen eye on it. will re-open if it comes back. i'll start working on #89

@lefnire lefnire closed this