
Improve performance by making database queries async / figure out better postgres caching. #606

Closed
Nutomic opened this issue Mar 20, 2020 · 12 comments · Fixed by #651
Labels: area: performance, enhancement (New feature or request)

Comments

@Nutomic
Member

Nutomic commented Mar 20, 2020

This is not important now because CPU load is very low on the server, but something to keep in mind for the future.

Make your Perform trait async (look into the async-trait crate), and put web::block(move || {}) around each of your r2d2 accesses.

It's a little unfortunate that y'all pulled your r2d2 .get()s out to the top level of each route. That's definitely going to be a bottleneck for simultaneous requests, and putting something synchronous at that level basically removes the opportunity to do asynchronous operations inside the routes.

Maybe instead of guarding the route with a pool.get(), you should just pass the pool into the route so you can .get() when you need it.
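
A rough sketch of what that could look like in an actix-web handler (assuming actix-web 2.x-era APIs; the Post type, load_post helper, and route are illustrative stand-ins, not Lemmy's actual code):

```rust
use actix_web::{web, HttpResponse, Responder};
use diesel::prelude::*;
use diesel::r2d2::{ConnectionManager, Pool};
use serde::Serialize;

type DbPool = Pool<ConnectionManager<PgConnection>>;

#[derive(Serialize)]
struct Post {
    id: i32,
    name: String,
}

// Hypothetical blocking query helper; assumes diesel's generated schema module.
fn load_post(conn: &PgConnection, post_id: i32) -> Result<Post, diesel::result::Error> {
    use crate::schema::post::dsl;
    let (id, name) = dsl::post
        .find(post_id)
        .select((dsl::id, dsl::name))
        .first::<(i32, String)>(conn)?;
    Ok(Post { id, name })
}

// The pool itself is passed into the route, and .get() only happens inside
// web::block, so both the r2d2 checkout and the query run on the blocking
// thread pool instead of stalling the async executor.
async fn get_post(pool: web::Data<DbPool>, path: web::Path<i32>) -> impl Responder {
    let post_id = path.into_inner();
    let res = web::block(move || {
        let conn = pool.get().map_err(|e| e.to_string())?;
        load_post(&conn, post_id).map_err(|e| e.to_string())
    })
    .await;

    match res {
        Ok(p) => HttpResponse::Ok().json(p),
        Err(_) => HttpResponse::InternalServerError().finish(),
    }
}
```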

@Nutomic Nutomic added the enhancement (New feature or request) and Component: Backend labels Mar 20, 2020
@dessalines
Member

This is a good idea. I've noticed that basically all DB actions (within a single websocket connection) are synchronous, so longer operations (like voting) can temporarily block even things like loading a post.

@asonix
Collaborator

asonix commented Mar 20, 2020

From the masto conversation:
The websocket part is particularly bad, because y'all are using a single async actor to handle all connections, and that actor executes blocking operations for every request.

An easy path forward for that would be wrapping your shared state in Arc<RwLock<Whatever>> and then making the websocket actor a SyncActor.

Actually, I'm not sure if websockets can do that; maybe look into spinning up a separate SyncActor to manage shared state. Once you have a SyncActor, though, you can say "spin up this many threads of SyncActors" and actix will go ahead and do that.
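
Roughly, that pattern could look like this (ChatState, DbExecutor, and CountUsers are made-up names for illustration, not Lemmy's actual types):

```rust
use std::sync::{Arc, RwLock};

use actix::prelude::*;

// Hypothetical shared state, wrapped so every SyncActor thread can see it.
#[derive(Default)]
struct ChatState {
    online_users: usize,
}

// The actor that holds a handle to the shared state and does the blocking work.
struct DbExecutor {
    state: Arc<RwLock<ChatState>>,
}

impl Actor for DbExecutor {
    // SyncContext is what makes this a "sync" actor run on dedicated threads.
    type Context = SyncContext<Self>;
}

#[derive(Message)]
#[rtype(result = "usize")]
struct CountUsers;

impl Handler<CountUsers> for DbExecutor {
    type Result = usize;

    fn handle(&mut self, _msg: CountUsers, _ctx: &mut SyncContext<Self>) -> usize {
        // Blocking work (DB queries, etc.) is fine here; it only ties up one
        // of the SyncArbiter's threads, not the async executor.
        self.state.read().unwrap().online_users
    }
}

// Call this from inside a running actix System (e.g. in main).
fn start_executors() -> Addr<DbExecutor> {
    let state = Arc::new(RwLock::new(ChatState::default()));
    // "Spin up this many threads of SyncActors":
    SyncArbiter::start(4, move || DbExecutor { state: state.clone() })
}
```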

@asonix
Collaborator

asonix commented Mar 20, 2020

Also mentioned in the masto thread:
You should also either be using reqwest's async methods or Actix Web's built-in async HTTP client. Blocking the current async thread with a web request is just as bad as blocking it with a DB request.
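
For reference, the non-blocking version with reqwest's async client looks roughly like this (the RemoteActor type and Accept header are just illustrative):

```rust
use serde::Deserialize;

// Hypothetical shape of a remote document that might get fetched over HTTP.
#[derive(Deserialize)]
struct RemoteActor {
    id: String,
}

// Awaiting the request yields control back to the executor instead of
// blocking the thread the way a synchronous client call would.
async fn fetch_remote_actor(url: &str) -> Result<RemoteActor, reqwest::Error> {
    reqwest::Client::new()
        .get(url)
        .header("Accept", "application/activity+json")
        .send()
        .await?
        .json::<RemoteActor>()
        .await
}
```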

@dessalines
Member

I'm pretty bad at async programming, especially with Rust. I could definitely use some help with this. I could use web::block with the simple HTTP endpoints, but it's really complicated with the websocket actors.

@dessalines
Member

dessalines commented Apr 16, 2020

I'm really stuck on this. I wasted a day trying to use SyncArbiter with a shared Mutex around ChatAppState, and failed badly.

I opened a ticket on actix to see if anyone can help out there: actix/actix-web#1455. Lemmy very closely follows their websocket_chat example, so this is an issue affecting everyone using that as a basis.

@dessalines
Member

Unfortunately adding web::block to the websocket actor still did nothing.

@dessalines dessalines changed the title from "Improve performance by making database queries async" to "Improve performance by making database queries async / figure out better postgres caching." Apr 22, 2020
@dessalines
Member

dessalines commented Apr 22, 2020

@asonix @Nutomic Okay, so I finally figured out the blocking issue, and I can't blame Rust or async, but Postgres materialized views. As usual it's one of the two hard problems in computer science: caching.

All the data in Lemmy is stored in very normalized tables but fetched using views that often have to do some pretty slow queries... sometimes 500+ ms to get things with advanced sorting like the hot sort.

So a few months ago I did a lot of work to create materialized views to match every view in Lemmy, i.e. there's a post_view view, a post_mview materialized view, and DB triggers on each of the tables to run functions that refresh the materialized views concurrently. So when you make a comment on a post, it has to update several materialized views: the post_mview (for a comment count aggregate), the comment_mview, etc.

Using Postgres's REFRESH MATERIALIZED VIEW CONCURRENTLY means it won't lock out / cancel current readers of the view, but it will stall any readers until the refresh finishes. This is where the locking is happening. I'd thought it would just immediately return the old data to current readers, but unfortunately it doesn't; it stalls them.

This also means that pretty much any write action locks out all reads until the refresh finishes, which is not good.

Then again, using the materialized views in the first place was meant to give Lemmy really fast read times, but as a consequence it has slowed down both writes and reads (while a materialized view is being refreshed).

I'm not really sure of the best solution, because using a caching method outside of Postgres adds a lot of complication and just pushes the same problem elsewhere... if I like a comment, I want to see that result immediately.

I don't know all my options, but the candidates I'm considering are:

  • Don't use DB triggers to update the mviews. When you do an action, return the uncached version as a live message to the user and other websocket users. Then every few minutes or so do a manual refresh of all the views. I still don't like this, because if you were to like a post, then refresh the page, it wouldn't show it.

  • Stop using materialized views altogether, switch back to views. This would mean reads (when not rebuilding the views) slow down a lot, but writes and reads would speed up as a whole.

edit: I found an article that might help: https://hashrocket.com/blog/posts/materialized-view-strategies-using-postgresql

Instead of a trigger refreshing the whole view, I might be able to do it more atomically.
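
For reference, the current setup described above is roughly this shape: a trigger on each base table whose function re-runs REFRESH MATERIALIZED VIEW CONCURRENTLY. In Lemmy the real SQL lives in the diesel migrations; this is just a simplified sketch (table and view names are illustrative), expressed through diesel's batch_execute so it's concrete:

```rust
use diesel::connection::SimpleConnection;
use diesel::pg::PgConnection;
use diesel::QueryResult;

// Simplified sketch of the existing pattern: every write to a base table
// re-runs a full concurrent refresh of the materialized view. The refresh
// doesn't cancel current readers, but it stalls them until it finishes.
fn create_refresh_trigger(conn: &PgConnection) -> QueryResult<()> {
    conn.batch_execute(
        r#"
        CREATE OR REPLACE FUNCTION refresh_post_mview()
        RETURNS trigger LANGUAGE plpgsql AS $$
        BEGIN
          REFRESH MATERIALIZED VIEW CONCURRENTLY post_mview;
          RETURN NULL;
        END $$;

        CREATE TRIGGER refresh_post_mview
        AFTER INSERT OR UPDATE OR DELETE ON comment
        FOR EACH STATEMENT EXECUTE PROCEDURE refresh_post_mview();
        "#,
    )
}
```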

@Nutomic
Member Author

Nutomic commented Apr 22, 2020

This would mean reads slow down a lot, but writes and reads would speed up.

Typo I guess? Can't have reads both slow down and speed up.

@dessalines
Member

Kinda; I corrected it a little. Reading right now is super fast, but refreshing those materialized views stalls all current readers. So removing the materialized views would slow down reads overall, but it would still be faster than the current "stalled out" behavior.

I think the solution might not be as difficult as I thought: don't have the triggers run REFRESH MATERIALIZED VIEW CONCURRENTLY, but have them change only the specific data that the CRUD action altered. It's actually kinda impressive that the current mviews are able to refresh huge tables with advanced sorting in a second or two... any time you make a comment or post, it's doing just that.

@pioneer
Contributor

pioneer commented Apr 23, 2020

My little bit to add: Raddle shows a small "in progress" indicator when you press the up/downvote buttons. If the average time for a vote to be recorded in the DB isn't too long, this approach could be used; that way the UI behaves both honestly (you see exactly what's happening, with no lie about your vote) and responsively. Not sure how users would like this, though. AFAIK the UI is doing a background update anyway, correct me if I'm wrong.

How long does it take for a vote to finally get recorded in the DB and for the counts to be sent back to the frontend?

@dessalines
Member

I used to show vote progress indicators just like the comment one, but then I made the vote actions semi-async, i.e. the UI doesn't wait for the result. Right now running all the mview refreshes takes about 2 seconds, but after I make these changes it'll probably be really fast, and I can go back to the indicators.

Either way I'm not gonna do this until after the federation merge, because it involves heavy DB changes.

@dessalines
Member

Turns out the best way to do this is to just use "fast" tables, which are built from the views and indexed; triggers on the base table inserts then fetch rows from the views to fill the fast tables.
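
A very rough sketch of that shape (table and column names are illustrative, not Lemmy's actual schema): the fast table mirrors the slow view, and a row-level trigger re-copies just the affected row instead of rebuilding everything.

```rust
use diesel::connection::SimpleConnection;
use diesel::pg::PgConnection;
use diesel::QueryResult;

// Sketch of the "fast table" pattern: an indexed regular table built from the
// (slow) view, plus a row-level trigger that replaces only the affected row.
fn create_fast_table(conn: &PgConnection) -> QueryResult<()> {
    conn.batch_execute(
        r#"
        -- Build the fast table once from the existing view and index it.
        CREATE TABLE post_fast AS SELECT * FROM post_view;
        CREATE UNIQUE INDEX idx_post_fast_id ON post_fast (id);

        -- On each new comment, re-copy just the one affected post row from
        -- the view, instead of refreshing a whole materialized view.
        CREATE OR REPLACE FUNCTION refresh_post_fast()
        RETURNS trigger LANGUAGE plpgsql AS $$
        BEGIN
          DELETE FROM post_fast WHERE id = NEW.post_id;
          INSERT INTO post_fast SELECT * FROM post_view WHERE id = NEW.post_id;
          RETURN NULL;
        END $$;

        CREATE TRIGGER refresh_post_fast
        AFTER INSERT OR UPDATE ON comment
        FOR EACH ROW EXECUTE PROCEDURE refresh_post_fast();
        "#,
    )
}
```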
