
Improve performance by making database queries async / figure out better postgres caching. #606

Closed
Nutomic opened this issue Mar 20, 2020 · 12 comments · Fixed by #651
Labels: area: performance, enhancement (New feature or request)

Comments

@Nutomic
Member

Nutomic commented Mar 20, 2020

This is not important now because CPU load is very low on the server, but something to keep in mind for the future.

Make your Perform trait async (look into the async-trait crate), and put web::block(move || {}) around each of your r2d2 accesses.

It's a little unfortunate that y'all pulled your r2d2 .get()s out to the top level of each route. That's definitely going to be a bottleneck for simultaneous requests, and putting something synchronous at that level basically removes the opportunity to do asynchronous operations inside the routes.

Maybe instead of guarding the route with a pool.get(), you should just pass the pool into the route so you can .get() when you need it.
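
A rough sketch of what that could look like in an actix-web handler (assuming actix-web 2.x-era APIs; the Post type, load_post helper, and route are illustrative stand-ins, not Lemmy's actual code):

```rust
use actix_web::{web, HttpResponse, Responder};
use diesel::prelude::*;
use diesel::r2d2::{ConnectionManager, Pool};
use serde::Serialize;

type DbPool = Pool<ConnectionManager<PgConnection>>;

#[derive(Serialize)]
struct Post {
    id: i32,
    name: String,
}

// Hypothetical blocking query helper; assumes diesel's generated schema module.
fn load_post(conn: &PgConnection, post_id: i32) -> Result<Post, diesel::result::Error> {
    use crate::schema::post::dsl;
    let (id, name) = dsl::post
        .find(post_id)
        .select((dsl::id, dsl::name))
        .first::<(i32, String)>(conn)?;
    Ok(Post { id, name })
}

// The pool itself is passed into the route, and .get() only happens inside
// web::block, so both the r2d2 checkout and the query run on the blocking
// thread pool instead of stalling the async executor.
async fn get_post(pool: web::Data<DbPool>, path: web::Path<i32>) -> impl Responder {
    let post_id = path.into_inner();
    let res = web::block(move || {
        let conn = pool.get().map_err(|e| e.to_string())?;
        load_post(&conn, post_id).map_err(|e| e.to_string())
    })
    .await;

    match res {
        Ok(p) => HttpResponse::Ok().json(p),
        Err(_) => HttpResponse::InternalServerError().finish(),
    }
}
```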

@Nutomic Nutomic added the enhancement (New feature or request) and Component: Backend labels Mar 20, 2020
@dessalines
Member

This is a good idea. I've noticed that basically all DB actions (within a single websocket connection) are synchronous, so longer operations (like voting) can temporarily block even things like loading a post.

@asonix
Collaborator

asonix commented Mar 20, 2020

From the masto conversation:
The websocket part is particularly bad, because y'all are using a single async actor to handle all connections, and that actor executes blocking operations for every request.

An easy path forward for that would be wrapping your shared state in Arc<RwLock<Whatever>> and then making the websocket actor a SyncActor.

Actually, I'm not sure if websockets can do that; maybe look into spinning up a separate SyncActor to manage shared state. Once you have a SyncActor, though, you can say "spin up this many threads of SyncActors" and actix will go ahead and do that.
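
Roughly, that pattern could look like this (ChatState, DbExecutor, and CountUsers are made-up names for illustration, not Lemmy's actual types):

```rust
use std::sync::{Arc, RwLock};

use actix::prelude::*;

// Hypothetical shared state, wrapped so every SyncActor thread can see it.
#[derive(Default)]
struct ChatState {
    online_users: usize,
}

// The actor that holds a handle to the shared state and does the blocking work.
struct DbExecutor {
    state: Arc<RwLock<ChatState>>,
}

impl Actor for DbExecutor {
    // SyncContext is what makes this a "sync" actor run on dedicated threads.
    type Context = SyncContext<Self>;
}

#[derive(Message)]
#[rtype(result = "usize")]
struct CountUsers;

impl Handler<CountUsers> for DbExecutor {
    type Result = usize;

    fn handle(&mut self, _msg: CountUsers, _ctx: &mut SyncContext<Self>) -> usize {
        // Blocking work (DB queries, etc.) is fine here; it only ties up one
        // of the SyncArbiter's threads, not the async executor.
        self.state.read().unwrap().online_users
    }
}

// Call this from inside a running actix System (e.g. in main).
fn start_executors() -> Addr<DbExecutor> {
    let state = Arc::new(RwLock::new(ChatState::default()));
    // "Spin up this many threads of SyncActors":
    SyncArbiter::start(4, move || DbExecutor { state: state.clone() })
}
```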

@asonix
Collaborator

asonix commented Mar 20, 2020

Also mentioned in the masto thread:
You should also either be using reqwest's async methods or Actix Web's built-in async HTTP client. Blocking the current async thread with a web request is just as bad as blocking it with a DB request.
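
For reference, the non-blocking version with reqwest's async client looks roughly like this (the RemoteActor type and Accept header are just illustrative):

```rust
use serde::Deserialize;

// Hypothetical shape of a remote document that might get fetched over HTTP.
#[derive(Deserialize)]
struct RemoteActor {
    id: String,
}

// Awaiting the request yields control back to the executor instead of
// blocking the thread the way a synchronous client call would.
async fn fetch_remote_actor(url: &str) -> Result<RemoteActor, reqwest::Error> {
    reqwest::Client::new()
        .get(url)
        .header("Accept", "application/activity+json")
        .send()
        .await?
        .json::<RemoteActor>()
        .await
}
```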

@dessalines
Member

I'm pretty bad at async programming, especially with Rust. I could definitely use some help with this. I could use web::block with the simple HTTP endpoints, but it's really complicated with the websocket actors.

@dessalines
Member

dessalines commented Apr 16, 2020

I'm really stuck on this. I wasted a day trying to use SyncArbiter with a shared Mutex around ChatAppState, and failed badly.

I opened a ticket on actix to see if anyone can help out there: actix/actix-web#1455. Lemmy very closely follows their websocket_chat example, so this is an issue affecting everyone using that as a basis.

@dessalines
Member

Unfortunately adding web::block to the websocket actor still did nothing.

@dessalines dessalines changed the title from "Improve performance by making database queries async" to "Improve performance by making database queries async / figure out better postgres caching." Apr 22, 2020
@dessalines
Member

dessalines commented Apr 22, 2020

@asonix @Nutomic Okay, so I finally figured out the blocking issue, and I can't blame Rust or async, but Postgres materialized views. As usual it's one of the two hard problems in computer science: caching.

All the data in Lemmy is stored in very normalized tables but fetched using views that often have to do some pretty slow queries... sometimes 500+ ms to get things with advanced sorting like the hot sort.

So a few months ago I did a lot of work to create materialized views to match every view in Lemmy, i.e. there's a post_view view, a post_mview materialized view, and DB triggers on each of the tables to run functions that refresh the materialized views concurrently. So when you make a comment on a post, it has to update several materialized views: the post_mview (for a comment count aggregate), the comment_mview, etc.

Using Postgres's REFRESH MATERIALIZED VIEW CONCURRENTLY means it won't lock out / cancel current readers of the view, but it will stall any readers until the refresh finishes. This is where the locking is happening. I'd thought it would just immediately return the old data to current readers, but unfortunately it doesn't; it stalls them.

This also means that pretty much any write action locks out all reads until the refresh finishes, which is not good.

Then again, using the materialized views in the first place was meant to give Lemmy really fast read times, but as a consequence it has slowed down both writes and reads (while a materialized view is being refreshed).

I'm not really sure of the best solution, because using a caching method outside of Postgres adds a lot of complication and just pushes the same problem elsewhere... if I like a comment, I want to see that result immediately.

I don't know all my options, but the candidates I'm considering are:

  • Don't use DB triggers to update the mviews. When you do an action, return the uncached version as a live message to the user and other websocket users. Then every few minutes or so do a manual refresh of all the views. I still don't like this, because if you were to like a post, then refresh the page, it wouldn't show it.

  • Stop using materialized views altogether, switch back to views. This would mean reads (when not rebuilding the views) slow down a lot, but writes and reads would speed up as a whole.

edit: I found an article that might help: https://hashrocket.com/blog/posts/materialized-view-strategies-using-postgresql

Instead of a trigger refreshing the whole view, I might be able to do it more atomically.
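
For reference, the current setup described above is roughly this shape: a trigger on each base table whose function re-runs REFRESH MATERIALIZED VIEW CONCURRENTLY. In Lemmy the real SQL lives in the diesel migrations; this is just a simplified sketch (table and view names are illustrative), expressed through diesel's batch_execute so it's concrete:

```rust
use diesel::connection::SimpleConnection;
use diesel::pg::PgConnection;
use diesel::QueryResult;

// Simplified sketch of the existing pattern: every write to a base table
// re-runs a full concurrent refresh of the materialized view. The refresh
// doesn't cancel current readers, but it stalls them until it finishes.
fn create_refresh_trigger(conn: &PgConnection) -> QueryResult<()> {
    conn.batch_execute(
        r#"
        CREATE OR REPLACE FUNCTION refresh_post_mview()
        RETURNS trigger LANGUAGE plpgsql AS $$
        BEGIN
          REFRESH MATERIALIZED VIEW CONCURRENTLY post_mview;
          RETURN NULL;
        END $$;

        CREATE TRIGGER refresh_post_mview
        AFTER INSERT OR UPDATE OR DELETE ON comment
        FOR EACH STATEMENT EXECUTE PROCEDURE refresh_post_mview();
        "#,
    )
}
```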

@Nutomic
Member Author

Nutomic commented Apr 22, 2020

This would mean reads slow down a lot, but writes and reads would speed up.

Typo I guess? Can't have reads both slow down and speed up.

@dessalines
Member

Kinda; I corrected it a little. Reading right now is super fast, but refreshing those materialized views stalls all current readers. So removing the materialized views would slow down reads overall, but it would still be faster than the current "stalled out" behavior.

I think the solution might not be as difficult as I thought: don't have the triggers run REFRESH MATERIALIZED VIEW CONCURRENTLY, but have them change only the specific data that the CRUD action altered. It's actually kinda impressive that the current mviews are able to refresh huge tables with advanced sorting in a second or two... any time you make a comment or post, it's doing just that.

@pioneer
Contributor

pioneer commented Apr 23, 2020

My little bit to add: Raddle shows a small "in progress" indicator when you press the up/downvote buttons. If the average time for a vote to be recorded in the DB isn't too long, this approach could be used; that way the UI behaves both honestly (you see exactly what's happening, with no lie about your vote) and responsively. Not sure how users would like this, though. AFAIK the UI is doing a background update anyway, correct me if I'm wrong.

How long does it take for a vote to finally get recorded in the DB and for the counts to be sent back to the frontend?

@dessalines
Member

I used to show vote progress indicators just like the comment one, but then I made the vote actions semi-async, i.e. the UI doesn't wait for the result. Right now running all the mview refreshes takes about 2 seconds, but after I make these changes it'll probably be really fast, and I can go back to the indicators.

Either way I'm not gonna do this until after the federation merge, because it involves heavy DB changes.

@dessalines
Member

Turns out the best way to do this is to just use "fast" tables, which are built from the views and indexed; triggers on the base table inserts then fetch rows from the views to fill the fast tables.
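
A very rough sketch of that shape (table and column names are illustrative, not Lemmy's actual schema): the fast table mirrors the slow view, and a row-level trigger re-copies just the affected row instead of rebuilding everything.

```rust
use diesel::connection::SimpleConnection;
use diesel::pg::PgConnection;
use diesel::QueryResult;

// Sketch of the "fast table" pattern: an indexed regular table built from the
// (slow) view, plus a row-level trigger that replaces only the affected row.
fn create_fast_table(conn: &PgConnection) -> QueryResult<()> {
    conn.batch_execute(
        r#"
        -- Build the fast table once from the existing view and index it.
        CREATE TABLE post_fast AS SELECT * FROM post_view;
        CREATE UNIQUE INDEX idx_post_fast_id ON post_fast (id);

        -- On each new comment, re-copy just the one affected post row from
        -- the view, instead of refreshing a whole materialized view.
        CREATE OR REPLACE FUNCTION refresh_post_fast()
        RETURNS trigger LANGUAGE plpgsql AS $$
        BEGIN
          DELETE FROM post_fast WHERE id = NEW.post_id;
          INSERT INTO post_fast SELECT * FROM post_view WHERE id = NEW.post_id;
          RETURN NULL;
        END $$;

        CREATE TRIGGER refresh_post_fast
        AFTER INSERT OR UPDATE ON comment
        FOR EACH ROW EXECUTE PROCEDURE refresh_post_fast();
        "#,
    )
}
```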
