Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Massive query count on forum index #139

Closed
unbolt opened this issue Jun 14, 2016 · 30 comments
Closed

Massive query count on forum index #139

unbolt opened this issue Jun 14, 2016 · 30 comments

Comments

@unbolt
Copy link

unbolt commented Jun 14, 2016

I'm just getting to grips with the codebase at the moment, and have written a set of importers to import our current forum from phpBB.

We have about 2000 threads and 20,000 posts - I have successfully imported our users separately, and then populated forum_categories, forum_threads and forum_posts and everything appears to be working the only issue at the moment is the massive number of queries that occurs on the forum index.

I am currently getting 4082 queries on forum index, which is 9 categories (under 3 title categories) - which seems way too much.

Taking a look at laravel debugbar I am seeing this - http://i.imgur.com/SCw1DSa.png

It repeats alternating between querying users and forum_posts for the majority of the 4000 queries.

So, two questions I suppose;

  • Is there another table I should be populating with additional information? I thought perhaps threads read might be the issue, perhaps?
  • Any ideas what else could be causing this? Have you tested the forums with a large amount of threads and posts?

Cheers

@Riari
Copy link
Collaborator

Riari commented Jun 14, 2016

Ouch - yeah, I'm aware of the excessive querying but never looked at it in detail. Certainly not for large amounts of data. I've been wanting to optimise Eloquent calls throughout, not just to reduce the number of queries but to make select statements more specific instead of doing SELECT * everywhere.

I'll try to prioritise some changes that should improve this. Thanks for the heads up.

Regarding the threads read table, you might want to see if you can populate that based on your phpBB data (otherwise every thread will appear as unread for everyone) but it won't help with the queries unfortunately.

@unbolt
Copy link
Author

unbolt commented Jun 14, 2016

I don't mind contributing and/or doing it myself if you can point me in the right direction as to what is happening? I am starting to review the code, but obviously there's a lot to get caught up on.

@unbolt
Copy link
Author

unbolt commented Jun 14, 2016

Additionally I'll provide my importers when they're clean, incase anyone else wants to use them.

@Riari
Copy link
Collaborator

Riari commented Jun 14, 2016

Of course, any contributions would be very welcome. The main place to look would be the API controllers. The views may be worth looking at as well, only because some of the properties/attributes being called on threads/posts in those may be triggering queries that aren't covered by eager loading.

There are probably a few methods in the API that could do with having some ->with() calls added.

@Riari
Copy link
Collaborator

Riari commented Jul 6, 2016

Hey,

If you're still using this, I've just pushed some changes to this and laravel-forum-frontend that reduce the query count overall. It still needs work, but the forum index should be less query-heavy now. If you've overridden the views and want to see the difference in category and thread views, you'll need to make the following replacements:

  • $category->threadsPaginated to $threads in the category.show view
  • $thread->postsPaginated to $posts in the thread.show view

I would be very grateful for some feedback if you get a chance to try it and I'll see if I can make some more optimisations before creating a new release.

Thanks

@Gummibeer
Copy link

Gummibeer commented Jul 7, 2016

@unbolt have you tried to debug it with clockwork and/or the debugbar package? It would be cool to have a list of repeating queries. Something I noticed that can massivly increase the query count is an uncached authorization system. Cause how I know laravel doesn't cache the results for a user, so if there is a DB request inside a policy it will query the DB everytime it calls the policy.

@Riari
Copy link
Collaborator

Riari commented Jul 7, 2016

Most of the queries being duplicated are specific to the package models. Unfortunately part of it is due to the Forum::route() helper method, where relationships are called to construct the route parameters and some of them point to models that are already available in the view. I've yet to figure out a workaround for that as it's not a simple matter of eager loading, and I want to avoid having to pass more than one model to the method.

The N+1 problem with categories is also a culprit, particularly with the breadcrumbs.

@Gummibeer
Copy link

Will caching be a solution? I don't mean the laravel caching for time but a custom caching for the single request. For example cache keys like: category_1 thread_5 and so on and before querying the DB it can look in the cache. This will prevent things like: thread 5 is put in as an extra model but also requested as a attributerelation from category 1. I created something like this for mutated attributes in an app. This could be put in an extra Cache class and should get implemented in all/most eloquent methods (what could be hard) - just an idea.

@Gummibeer
Copy link

Gummibeer commented Jul 7, 2016

class Cache
{
    protected $cache = [];

    protected function get($key, $default = null)
    {
        return array_get($this->cache, $key, $default);
    }

    protected function set($key, $value)
    {
        $this->cache[$key] = $value;
    }

    protected function remember($key, \Closure $callback)
    {
        if (!is_null($value = $this->get($key))) {
            return $value;
        }
        $this->set($key, $value = $callback());
        return $value;
    }
}

@Riari
Copy link
Collaborator

Riari commented Jul 7, 2016

Yeah, caching queries for the duration of the request would make a lot of sense. The only issue there is that some attributes are cached already using the Cache facade, which is obviously a different thing serving a slightly different purpose. I'll need to figure out suitable naming to avoid confusion.

@Gummibeer
Copy link

@Riari: What about Remember, Rememberer or something like this!?

@unbolt
Copy link
Author

unbolt commented Jul 7, 2016

I'll install the package again and run my import and let you know how it's looking, I took it out as I was considering using some old forum code I wrote instead. Will update this weekend.

@unbolt
Copy link
Author

unbolt commented Jul 7, 2016

The very first load did over 4k queries, now it's doing approx 250 for the page index. It's a hell of a lot better than it was with this amount of posts and replies anyway! I'll definitely need to look into some kind of view cache going on, the forums are super active (20k posts over the past 7 years or so) so I can probably get away with flushing caches on people making new posts.

@unbolt
Copy link
Author

unbolt commented Jul 7, 2016

I take it back, it doesn't seem to cache that for too long. Is there any information I can output for you to help in sorting this?

@Gummibeer
Copy link

Gummibeer commented Jul 7, 2016

I think it will help if you can put in the list of (all) queries that are done. The debugbar will give a great list for this including the place where the queries are requested. Probably as pastebin or something to don't flood this issue. ;)

And in addition to caching: I don't like to cache something over one request - there are so much places where it can get changed (counts, text and so on) caching them in a cache driver and refreshing them if needed will just blow up your logic and just produce a lot of requests against the cache driver.
I'm better with request caching to prevent requesting the same entity multiple times.

@Riari
Copy link
Collaborator

Riari commented Jul 7, 2016

Yes, a query log would be very useful, thank you. Odd that the initial load still resulted in so many queries as the changes I made should reduce them regardless of anything being cached.

@Gummibeer
Copy link

@Riari I take a look in the repo and can't find tests and also no factories!? It will help to have factories and/or seeder to solve this issue - if there are some I can setup a clean instance and fill it with thousands/millions of entries and can take a look in the queries to find the place where it creates all the too much ones.

@unbolt
Copy link
Author

unbolt commented Jul 7, 2016

It's too big for pastebin and too long to post here. So I've attached a text doc with the query log in it... If there's a better way to export this than c/p let me know.

querylog.txt

@Riari
Copy link
Collaborator

Riari commented Jul 7, 2016

@Gummibeer I agree, they would be useful. I wasn't very confident in my test-writing abilities when I originally forked this package, but 4.0 (which I started here: https://github.com/Team-Tea-Time/illuminate-forum) will have test coverage and model factories to support it. That said, I think it's still fairly easy to examine an issue like this with a much smaller set of data, especially as debugbar gives you some indication of duplicate queries.

@unbolt thanks :)

@unbolt
Copy link
Author

unbolt commented Jul 7, 2016

And a comparison on the second load if that helps at all.

query_2nd.txt

@Gummibeer
Copy link

Issues I noticed in the Query Log:

  • Queries categories minimum 2 times
  • Queries threads 2 times
  • Queries alot of Users after every Post Query

So creating some Kind of request Caching will help here, also if the real Cache is disabled, like on my installations.

@unbolt
Copy link
Author

unbolt commented Jul 8, 2016

I haven't looked into it too much, but it sounds as though a lot of additional information is being queried on many of the requests which isn't needed.

Perhaps something like the eventbrite API where you can optionally extend certain pieces of information would be useful on the API end so that you can grab simple versions of the category, for example, and optionally expand to get the last poster.

Just spit balling really, query caching would be one thing, but it doesn't really get to the root of the issue if that queried information isn't needed at all.

@unbolt
Copy link
Author

unbolt commented Jul 9, 2016

If I edit the routes on the laravel-forum-frontend here to point the index as my own controller somewhere, would that make any sense for me to generate my own version of the forum index?

I'm running out of time slightly, I wanted to get the site online by Legion pre-patch which is probably the 19th given the current rumours, so if I have to bodge together my own forum index using my own queries to get it workable I can do that.

Ran the code on the Pi2 instead of my MBP and it won't finish loading the page in under a minute, so probably won't be workable on the Digital Ocean node we currently run the site on. This would allow me to write something quickly that only uses the bare minimum query and then go from there.

Just wanted to check editing in that particular spot makes most sense before I do it?

@Riari
Copy link
Collaborator

Riari commented Jul 9, 2016

You can do that, or you can extend the controllers somewhere and change the namespace in the config: https://github.com/Riari/laravel-forum-frontend/blob/1.0/config/frontend.php#L41

@unbolt
Copy link
Author

unbolt commented Jul 9, 2016

Cool, thanks.

Minor thought - we don't appear to be saving post and thread count anywhere. Is it querying every thread in the forum to get the post count and total post count?

We could store this somewhere, either database level against the forum and thread (threads and reply count respectively) or cache somewhere and increment as the figures change.

Thoughts?

@Riari
Copy link
Collaborator

Riari commented Jul 9, 2016

They're cached at 5 minute intervals by default, but are otherwise not stored anywhere. Could you open an issue for that? Or if you're up to it, maybe do a PR to change the current caching of those attributes? I think doing it at the database level and incrementing/decrementing via the ThreadObserver and PostObserver classes would make the most sense.

@unbolt
Copy link
Author

unbolt commented Jul 9, 2016

That would explain the initial load and then no load for a little while. I'll open an issue for it and work on it and provide my code, but I don't think I feel confident enough in my gitting abilities to do a pull request just yet.

@Riari
Copy link
Collaborator

Riari commented Jul 10, 2016

With the changes from #142, this is certainly a lot better. All that leaves is to address the duplicated category and thread/post author queries.

I might move the category depth and deepestChild attributes to the DB as well, which probably means the caching (along with forum.preferences.cache_lifetimes) will be removed completely.

I'll keep this open until I get a chance to make more optimisations. Thanks for the help!

@anolek
Copy link

anolek commented Aug 30, 2016

@Riari Is it possible to make a release with the latest changes ? Thanks !

@Riari
Copy link
Collaborator

Riari commented Sep 6, 2016

New release is out now. Thanks for your patience!

@Riari Riari closed this as completed Sep 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants