Make the back end crawler configurable #6495
Conversation
Thanks for working on this! Remarks: I think part of the reason this was not tackled until now is that making only the max depth configurable isn't really going to fix the entire problem. The most important factor is concurrency, because that affects speed. And if people still need to adjust the configuration, they likely won't do it (as they don't know this is even possible). So effectively, nobody will change anything and it will still be slow for people in 5.3. I think what would be way smarter would be two configuration options:
The minimum can be hardcoded, as this is imho always the case. These should also be considered in the configuration. So I would also like to have the concurrency selectable in the back end. You may ask why anybody would ever want less concurrency than the configured maximum? That's a reasonable question. I just think that there's a ton of users out there who use Contao as is and would never reconfigure things. Hence, imho two drop-downs would be super nice, also indicating to the users that there is actually a possibility to change that (and if people want more than 10 they will ask and will find that there's an option for it). The best case for UX would include remembering the last used options in the back end session, so I don't have to re-configure every time I log back in.
We already discussed this and decided against adding a select menu for the concurrency (see #3809 (comment)). Should we discuss it again?
We don't have to, I'm just looking for a way to inform the users about the concurrency that's being used. Maybe a simple hint in the back end when the crawler is running would be enough? What do you think about always showing this:
Judging from the situations in the community, I think crawling is mostly slow because too many URLs are being crawled unnecessarily: pagination and filter query parameter URLs increase the number of URLs exponentially (when using an unbounded/high max depth), not because the concurrency setting is too low. That being said, I am not against adding the concurrency setting (or info about it) to the back end.
Yes, I know. Which is why my text included a note about how you can skip those :)
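To illustrate the blow-up described above, here is a short sketch (not Contao code; the function, URLs, and parameter names are made up for illustration) showing how pagination combined with filter query parameters multiplies the number of distinct URLs a crawler can discover from a single listing page:

```python
# Sketch: multiplicative URL growth from pagination + filter parameters.
from itertools import product


def url_variants(base: str, pages: int, filters: dict[str, list[str]]) -> list[str]:
    """Enumerate every combination of page number and filter values."""
    urls = []
    keys = sorted(filters)
    for page in range(1, pages + 1):
        for combo in product(*(filters[k] for k in keys)):
            params = [f"page={page}"] + [f"{k}={v}" for k, v in zip(keys, combo)]
            urls.append(f"{base}?{'&'.join(params)}")
    return urls


# 20 pages x 5 categories x 4 sort orders = 400 URLs for one listing page.
urls = url_variants(
    "https://example.com/news",
    pages=20,
    filters={"category": ["a", "b", "c", "d", "e"],
             "sort": ["date", "title", "author", "relevance"]},
)
print(len(urls))  # 400
```

Each additional filter parameter multiplies the total again, which is why limiting the max depth (or skipping such URLs entirely) tends to help far more than raising the concurrency.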
I would output the text only when the crawler is running - otherwise it bloats the maintenance screen :)
Would need to be added, yes.
Yeah, with the real values of course :D And I think I would leave the hint about the docs. I will extend them.
Added in bdfc1b2. We can add the docs hint at any time, as soon as the docs have been updated.
You can already add it, I'm writing the docs as we speak. We can merge it only once the docs PR is merged :)
Here we go with the docs: contao/docs#1260
Implements #3809
As discussed, all parameters can now be configured in the container configuration and the maximum depth can also be selected in a drop-down menu.
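A container configuration for this could look roughly like the sketch below. Note that the option names shown here are hypothetical and chosen for illustration only; the actual keys are defined by the bundle, so check the Contao documentation for the real configuration reference.

```yaml
# config/config.yaml — illustrative sketch, not the real option names.
contao:
    crawl:
        # Upper bound offered in the back end max-depth drop-down
        # (hypothetical key).
        max_depth: 10
        # Number of concurrent requests the crawler may issue
        # (hypothetical key).
        concurrency: 5
```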