regex queries on registration too expensive (was: Registration issues/Unable to access website (503 errors)) #4737

Closed
kimboslice44 opened this issue Feb 23, 2015 · 25 comments

Comments

@kimboslice44

Edit from Repo Admin: There is no need to add new comments to this issue. If you would like to receive email updates about the progress of the issue, please press the subscribe button on the right.

What to do if you died due to this outage:

On the habitrpg.com website, go to Settings -> Site and click on the Fix Character Values button. You can then increase your level by 1, increase your Experience and Gold to approximately what they used to be, and also add some extra Gold to buy back the equipment piece that you lost.

If your Dailies have lost streaks, you can edit each Daily and reset the streak in the Advanced Options. If you are not sure what the streaks were, you can go to Data -> Data Display Tool and use the "Dailies History" section to estimate them. Ask in the Newbies guild if you need help with that (Help -> Ask a Question)!


I can't register for this site. I can click on play, and then register and fill in all the boxes, but when I hit submit, it waits for a while and then gives me the 503 error. This also happens when I try to register with Facebook, and on both Chrome and Firefox.

@crookedneighbor
Contributor

@kimboslice44 Thanks for reporting it. We got a recent influx of traffic from imgur, and we're trying to track down the issue that is preventing the site from loading. We'll report back here when everything is back to normal.

@crookedneighbor changed the title from "Registration issues" to "Registration issues (503 errors)" on Feb 23, 2015
@LlamaHobbit

Same issue except I don't need to register. I've had an account I use daily for months, but as of about 3/3:30pm CST it gives me the 503 error when I try logging in on both the main site and on the backup herokuapp site.

@crookedneighbor changed the title from "Registration issues (503 errors)" to "Registration issues/Unable to access website (503 errors)" on Feb 23, 2015
@simkimsia

Same situation with me on Android and main website

@Jadziyah

Same here. Cannot access anything on the website, via browser or app. We're not trying to register - we are current users. Simply, the site won't load from any platform. Glad it's being looked into.

@kimboslice44
Author

So I had an interesting development. I received the "Welcome to Habitica" email, even though I never actually successfully registered (that I know of).

@Dranian

Dranian commented Feb 24, 2015

I'm still experiencing the 503 error as of 20/20:30 EST.

@bobdownsinfo

Same issue happening for me while trying to Register on the website using my Win7 Laptop with Chrome. Tried registering with Facebook and by filling in the fields manually.

@lemoness
Contributor

Unfortunately, we're still having a lot of trouble with the site :( It's probably best not to register today, but to try again tomorrow. We're very sorry!

@bobdownsinfo

No worries. Just wanted to add my two cents to help make sure you know where it's coming from... if you didn't already. I've subscribed to this thread, so if you post when it's fixed, I'll see that and register then. Thanks! And, good luck!

@ghost

ghost commented Feb 24, 2015

Unable to access as of 22:50 cst

@PWoF

PWoF commented Feb 24, 2015

I don't know if this helps or not. I didn't really experience the 503 error until between 7 and 8 pm Central Time.

@bobdownsinfo

Well, something changed at least. At 8:40 PM MST, I received an email from HabitRPG with a Subject of "Welcome to Habitica!" This suggested to me that at least one of my earlier Registration attempts succeeded. So, I clicked on its link and tried clicking on the "Use Facebook to Login" button. It thought about it for a while, but then gave me a new error message rather than logging me in:

Application Error

An error occurred in the application and your page could not be served. Please try again in a few moments.

If you are the application owner, check your logs for details.

This is fine and I imagine it's still being worked on. In case it might help, I thought I'd share this update here. Good night!

@crookedneighbor
Contributor

Hey y'all, thanks for the updates. We're still sorting out why the site isn't working. There's no need to add additional comments at this time. We'll post here and on Twitter as soon as the site is back up and working normally again.

If you've commented already, you should be automatically subscribed to the issue. If you're new, press the subscribe button to the right. As soon as we have new info, we'll update this ticket.

@crookedneighbor
Contributor

At this point we think it's something to do with how the Node app is interacting with Heroku. If there are any Heroku Doctors in the house, please come forward. For reference, here's the repo for the Heroku buildpack for Node.

edit: see next comment

@crookedneighbor
Contributor

UPDATE

We don't think it's node vs heroku anymore.

On a hunch, we spun up a new database for beta.habitrpg.com, and the application errors ceased. This confirmed that the issue was with our database connection. If you tried logging into beta recently and it told you your account could not be found, that is why.

We're sorting out the rogue processes in the database, stand by!

@NovemberEcho

Site seems to be working great now, other than the fact that I died due to the connection problems. Can that be undone?

@lemoness
Contributor

It can! Go to Settings > Site > Fix Character Values and restore your stats :) Sorry about that!

@lefnire
Contributor

lefnire commented Feb 24, 2015

Post Mortem (@lemoness I'ma splice your comment here and expand).

1. MongoDB $regex queries are very expensive

Even on indexed fields. They use indexes, and you can improve perf with a prefix expression - but they still peg CPU much more than a standard query. The culprit was here - that's registration code, meaning we were brought down because everyone was registering, rather than simply using the site. I'm working on ways to reduce $regex use (eg, 02901e5) but there's no quick-fix*. In order to get by for now, MongoLab has upgraded us from an M2 to M3 cluster, which doubles our CPU.
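
For anyone following along, here is a rough sketch of what a case-insensitive username lookup via $regex can look like. It is not the actual registration code: the helper names `escapeRegex` and `usernameRegexQuery` are made up for illustration, and only the `auth.local.username` field comes from this thread. The pattern is anchored at both ends and the input is escaped so it is matched literally. How much the index still helps once the `i` option is involved is worth checking with `explain()` rather than taking on faith.

```javascript
// Escape characters that have special meaning in a regular expression,
// so that user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical query builder (not the project's real code): a fully anchored,
// case-insensitive match on the username field mentioned in this thread.
function usernameRegexQuery(username) {
  return {
    "auth.local.username": {
      $regex: "^" + escapeRegex(username) + "$",
      $options: "i",
    },
  };
}

// Build the filter document that would be passed to a find/findOne call.
const query = usernameRegexQuery("Lef.nire");
console.log(JSON.stringify(query));
```

Every registration attempt that runs a query like this makes the database evaluate a regular expression instead of doing a plain equality lookup, which is why a burst of sign-ups hurts more than a burst of ordinary traffic.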

(TL;DR on the "no $regex quickfix")
The reason we're using regex in the first place is that our auth code is custom - I built it before there was a proper passport local auth system available. One very poor original oversight: unames & emails weren't compared case-insensitively when accounts were created. Because of that, we indeed have a non-trivial number of accounts whose username or email address duplicates another's apart from case. Were that not the case, we could run a migration against all usernames/emails to downcase them, and use a standard {'auth.local.username':username} query going forward. (Or alternatively, to maintain caps, use a "shadow" version like {uname:"Lefnire", downcase:"lefnire"}.) But we don't have that luxury.
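
To make the "shadow" idea concrete, here is a minimal sketch, assuming the `uname`/`downcase` field names from the example above. The function names and the sample usernames are hypothetical. It shows the exact-match query that would replace the $regex, plus a check for names that collide once case is ignored, which is the situation that blocks a simple downcase migration.

```javascript
// Hypothetical helper: keep the display name and a lowercased "shadow" copy.
function withShadowUsername(username) {
  return { uname: username, downcase: username.toLowerCase() };
}

// Exact-match filter on the shadow field: a plain equality lookup, no $regex.
function shadowQuery(username) {
  return { downcase: username.toLowerCase() };
}

// Group usernames that become identical once case is ignored.
// Any group with more than one member would need manual resolution
// before a downcase migration could run safely.
function findCaseCollisions(usernames) {
  const groups = new Map();
  for (const name of usernames) {
    const key = name.toLowerCase();
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(name);
  }
  return [...groups.values()].filter((names) => names.length > 1);
}

// Made-up sample data: one pair differs only by case.
const sample = ["Lefnire", "lefnire", "Alys", "paglias"];
console.log(findCaseCollisions(sample)); // one group: "Lefnire" and "lefnire"
```

If the collision list were empty, every account could be given a `downcase` value and lookups could switch to `shadowQuery`. With real collisions, someone has to decide which account keeps the name first.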

2. Index those queries

About 4 or 5 of our frequent mongo queries weren't indexed at all, mostly email stuff. A situation like this causes Dynos to wait for the query to resolve, blocking that Dyno from further traffic. We only get 4-13 Dynos at a time (they scale up and down with traffic), so blocking all 13 puts us on our knees. From Heroku:

Our router will drop a long-running request after 30 seconds, but the dyno behind it will continue processing the request until completion. Our router is unaware of it, though, so it'll dispatch new requests to that busy dyno. This effect tends to compound, and you'll eventually see H12 errors even for unrelated URLs, such as static assets. H13 errors are similar in what causes them, but are primarily related to concurrent web servers.

You'll need to use a tool like New Relic to gain visibility into queueing in your app.

If your app is using ExpressJS, you will also want to install something like timeout, which will ensure that a long-running request is dropped at the dyno level as well. Specifically, timeout raises a Response timeout exception when that happens.

With that in place, the compound effect is less likely to occur, but long-running actions still need to be addressed. Again, New Relic is a great tool to provide the visibility into your app to identify the long-running actions. You can then optimize them and make sure they're able to finish within a reasonable time, we suggest keeping all requests under 500ms. If they're performing any inherently long tasks, you should try to offload those to a background worker.
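
As a rough illustration of the dyno-level timeout Heroku describes, here is a dependency-free sketch written in the Express middleware shape `(req, res, next)`. It is not the timeout package itself. The name `requestTimeout`, the `req.timedout` flag and the 503 status on the error are assumptions made for this example.

```javascript
// Sketch of a per-request timeout in the Express middleware shape.
// If the response has not finished within `ms`, pass a 503 error to next().
function requestTimeout(ms) {
  return function (req, res, next) {
    const timer = setTimeout(() => {
      req.timedout = true; // later handlers can check this and stop working
      const err = new Error("Response timeout");
      err.status = 503;
      next(err);
    }, ms);

    // Cancel the timer once the response is sent or the connection closes.
    const clear = () => clearTimeout(timer);
    res.on("finish", clear);
    res.on("close", clear);

    next(); // continue to the real route handlers immediately
  };
}
```

In an Express app this would be mounted ahead of the routes with a limit below the router's 30-second cutoff, followed by an error handler that actually sends the 503. It only frees the request slot. A slow database query keeps running in the background, so the underlying query still has to be fixed.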

The fix was simple: add the indexes. For our devs, be sure to add db indexes for new queries. Or, you can see bottlenecks at mongolab.com > Slow Queries. Eg, here's an index that needs building:

[image: slow]

Simply click the "Build This Index" button and you're gold. Don't get trigger happy, though: here's an index that doesn't need building, because it comes from a manual query I ran in the mongo shell:

[image: dont bother]

The difference is the "query count" field.
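
For devs who would rather declare indexes in code than click through the MongoLab UI, here is a small sketch under stated assumptions. The `users` collection name, the `auth.local.email` field and the `ensureUserIndexes` helper are hypothetical; only `auth.local.username` appears in this thread. It relies on `collection(name).createIndex(keys)` from the MongoDB Node.js driver and works with any object of that shape.

```javascript
// Hypothetical list of fields that frequent queries filter on.
const INDEX_SPECS = [
  { "auth.local.email": 1 },
  { "auth.local.username": 1 },
];

// Create each index in turn on the (assumed) "users" collection.
// `db` is expected to look like the Node.js driver's Db object.
async function ensureUserIndexes(db, specs = INDEX_SPECS) {
  const users = db.collection("users");
  const created = [];
  for (const keys of specs) {
    // createIndex resolves to the name of the index.
    created.push(await users.createIndex(keys));
  }
  return created;
}
```

Creating an index that already exists with the same keys is a no-op, so a helper like this can run at deploy time. The "query count" caveat above still applies: only index fields that real traffic queries on.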


Anyway, we're back online. Note, we still have that one $regex so we're not in the clear yet, we're just much smoother than before. I'll keep an eye on the system tomorrow.

@lefnire
Contributor

lefnire commented Feb 24, 2015

Oh man... just look at those $regex queries' average times:

[image: time]

We're gonna need to solve that ASAP

@StanLindsey
Contributor

<3

@Ulugbeg

Ulugbeg commented Feb 24, 2015

This might not be the right spot for it, for which I then apologize, but: one party member had her cron go off 3 times during this, killing me in the process (do you guys want screenshots or something for that? Our group-id is a7375dd2-8b1c-4f1b-8586-a10fb411bee6). While this is a pity, I can and will restore my stats. BUT I lost my limited edition 2014-2015 ice spike (for my character's left hand). I cannot rebuy that one. My userID is 3ae12e97-6fca-484e-9c92-a308e39afbb8

@Alys
Contributor

Alys commented Feb 24, 2015

@Ulugbeg Have you reloaded your tasks page? The lost equipment should be showing up there under Rewards.

@Ulugbeg

Ulugbeg commented Feb 24, 2015

Yes, indeed, I apologize for cluttering up this bug, thank you!

@deilann changed the title from "Registration issues/Unable to access website (503 errors)" to "regex queries on registration too expensive (was: Registration issues/Unable to access website (503 errors))" on Mar 5, 2015
@nivl4
Contributor

nivl4 commented Oct 27, 2015

It seems @paglias fixed this in fc9d077 and e75d354. Close this issue, or is more work needed?

@paglias
Contributor

paglias commented Oct 27, 2015

@nivl4 yes, it's fixed!

@paglias closed this as completed on Oct 27, 2015