Outgoing federation problem #4288
Comments
That's odd. Since the federation queue state says they are up to date, that (I think) narrows it down to one of a few possibilities.
Could you check the sent_activity table with a query for those activities?
That query doesn't return anything. It seems that the activity wasn't submitted to the queue at all. That would also explain why it's missing from all other instances.
I've updated the query, it was a bit broken - try again? If it's not in there, then that's a different issue, but @sunaurus reports having the issue for things that are in the table.
Can you increase the log level of the federation process to debug?
Right, I also forgot that. 4xx errors for activity sending are logged here. I changed the log level to debug for that crate and grepped for "was rejected". There are countless messages, but nothing that seems relevant.
And here are the logs for lemmy.world; they look normal.
I'm still wary of them not being in sent_activity. Maybe the above query is still wrong, because if it is not in that table, then there are two very different issues - it not being inserted would point to something closer to the code that triggers sending rather than to the federation queue itself.
I think it's really a problem with DB inserts (or an error being thrown at some earlier point). Just now I performed some federated actions like following remote communities and voting on remote posts, yet none of it is reflected in the sent_activity table.
Huh, that's interesting then and should be "easy" to debug. Maybe you can set the log level of the main process to info to see if this log happens for those activities: lemmy/crates/apub/src/activities/mod.rs, line 206 (at dcb89f5).
Maybe the issue then would be somewhere in the match_outgoing_activities channel thing (https://github.com/LemmyNet/lemmy/blob/main/crates/apub/src/activities/mod.rs), because we also changed that code.
There's definitely an issue there: lemmy/crates/apub/src/activities/mod.rs, lines 228 to 233 (at dcb89f5).
If there's an error anywhere in there, then that loop permanently ends, and the error isn't even logged (until server shutdown), because the task is tokio::spawned and only joined at the end of the process.
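To make the failure mode concrete, here is a minimal, self-contained sketch of that pattern - it is not the actual code from match_outgoing_activities; the channel setup, Activity, and save_and_send are made-up stand-ins:

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
struct Activity(String);

// Stand-in for the DB insert + federation send; always fails to show the effect.
async fn save_and_send(activity: &Activity) -> Result<(), String> {
    Err(format!("db insert failed for {:?}", activity))
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel::<Activity>();

    // The receiving side is spawned once and expected to run for the whole
    // lifetime of the process.
    let handle = tokio::spawn(async move {
        while let Some(activity) = rx.recv().await {
            // With `?` (the buggy shape), the first error returns from the
            // async block, the loop ends, and no later activities are ever
            // persisted or sent.
            save_and_send(&activity).await?;

            // A more resilient shape logs and continues instead, e.g.:
            // if let Err(e) = save_and_send(&activity).await {
            //     eprintln!("error while saving outgoing activity: {e}");
            // }
        }
        Ok::<(), String>(())
    });

    tx.send(Activity("like".into())).unwrap();
    tx.send(Activity("comment".into())).unwrap();
    drop(tx);

    // The error only becomes visible here, when the task is joined - which in
    // the real process happens at shutdown.
    println!("federation send task ended with: {:?}", handle.await.unwrap());
}
```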
PR to fix that: #4295 (not sure if it fixes this issue though)
Version 0.19.1-rc.1 is now available, which should hopefully fix this issue.
Let me know if you see any log messages with "error while saving outgoing activity". Also, sunaurus reported broken federation for some events which were present in the sent_activity table, which can't be something fixed by this.
I'm not sure if it is the same issue, but I don't receive any posts on my Mastodon server. Edit: I don't own that Lemmy instance, so I can't provide logs directly.
I believe there is still a federation bug. The response from walden can be seen here: https://sub.wetshaving.social/comment/967575 - I gave it some time in case it was a queue type of thing, but it was posted over 30 minutes ago, so it should have gone through by now.
That instance does not seem to currently have working federation due to something else; see https://phiresky.github.io/lemmy-federation-state/site?domain=sub.wetshaving.social - their federation process seems to not have been running since yesterday (for whatever reason of a different shape than this issue).
feddit.de is also running into outgoing federation problems even though it's updated to 0.19.1: https://feddit.de/comment/5822060 - this comment cannot be viewed from other instances. I have tried lemmy.one, lemmy.world and lemmy.dbzer0. From the Lemmy federation state site, it seems that feddit.de is seeing almost all other instances as dead: https://phiresky.github.io/lemmy-federation-state/site?domain=feddit.de
My comments from my personal instance also only started federating after I restarted the Docker container. I think this bug is still present. You should keep an eye on it.
I restarted that instance, and now all of the other instances that I linked to have received the post. I'll keep an eye on https://phiresky.github.io/lemmy-federation-state/site?domain=sub.wetshaving.social and see how things go. Thanks.
lm.korako.me also has similar problems.
My personal instance (0.19.1, https://phiresky.github.io/lemmy-federation-state/site?domain=biglemmowski.win) also showed no up-to-date instances (and most as lagging); after restarting, it took around 30 minutes before the lagging and up-to-date numbers switched. Could this be just a caching issue? Edit: Now it actually shows them as lagging again, by ~30 minutes. In the logs I see:
I kept an eye on it today using https://phiresky.github.io/lemmy-federation-state/site?domain=sub.wetshaving.social and federation seemed to work for about 8 hours after a restart before failing. I just restarted it again and changed the logging to debug.
Outgoing federation is still not working on my server. I will look at the logs later. (Sorry, I don't speak English, so I'm using machine translation.)
If most instances show as dead, then that's #4039. If the instances show as lagging, then it's a new bug where the queue creation loop somehow stops.
I've set a cronjob to restart the Lemmy container every 6 hours, and so far so good. Lemmy-ui, PostgreSQL, etc. keep chugging along. Instances show as lagging a lot of the time, but it always goes back to "up to date" eventually. https://phiresky.github.io/lemmy-federation-state/site?domain=sub.wetshaving.social
Thanks, I will follow that and make sure the admins of feddit.de do the same.
I changed it, but federation is still not working on lm.korako.me. Is there anywhere else I should look? One additional note: I use Cloudflare Tunnel - is this relevant?
What do those messages ("Federation state as of XXX: xxx others up to date, XX instances behind") look like over time? What do they look like after federation stops working?
For me it looks like this: https://termbin.com/omf8 - right now https://phiresky.github.io/lemmy-federation-state/site?domain=biglemmowski.win shows all 1865 instances as lagging behind (even with dev tools open and "disable cache" ticked in Firefox), but the last log looks like this:
@wereii thanks for the logs. Do the logs really end at 13:15Z? That's weird then, because the last successful send seems to be from a different time.
I might have found the issue. #4330 should fix it. The underlying cause would still be pool timeout errors though, which are caused by a too-small pool or underpowered hardware. Under normal operation the pool should never really time out.
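For context on why a pool timeout can take the whole queue down: if that error escapes the long-running loop, the loop ends just like in the sketch above. A generic retry-with-backoff guard is one way to absorb such transient failures; this is only an illustration (with_retries and PoolTimeout are made-up names), not necessarily what #4330 does:

```rust
use std::time::Duration;
use tokio::time::sleep;

// Hypothetical stand-in for a connection-pool timeout error.
#[derive(Debug)]
struct PoolTimeout;

// Retry an async operation a few times with a growing backoff, so that a
// transient pool timeout doesn't bubble up and permanently end the caller's
// send loop.
async fn with_retries<T, F, Fut>(mut op: F, attempts: u32) -> Result<T, PoolTimeout>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, PoolTimeout>>,
{
    let mut delay = Duration::from_millis(500);
    for attempt in 1..=attempts {
        match op().await {
            Ok(v) => return Ok(v),
            Err(e) => {
                eprintln!("attempt {attempt}/{attempts} failed: {e:?}, retrying");
                sleep(delay).await;
                delay *= 2; // simple exponential backoff
            }
        }
    }
    Err(PoolTimeout)
}

#[tokio::main]
async fn main() {
    // Simulate a DB read that times out twice before succeeding.
    let mut calls = 0u32;
    let result = with_retries(
        || {
            calls += 1;
            let fail = calls <= 2;
            async move {
                if fail {
                    Err(PoolTimeout)
                } else {
                    Ok("next batch of activities")
                }
            }
        },
        5,
    )
    .await;
    println!("result: {result:?}");
}
```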
Regarding the first comment, here it is:
And about the performance bit: the container is not restricted, so it has 64 GB of RAM (~40 free) and all twelve threads of a Ryzen 5 3600 to use. I don't have graphs/history, but the processor mostly idles even with other services running there, so in my case it shouldn't be because of resource starvation. EDIT2: This is after 4 days of uptime.
The "Federation state" message is output every minute, but there seems to be nothing wrong with it. I have been experimenting and have found that only Mastodon is not federating well, while Misskey and Akkoma are federating correctly. When I follow a Lemmy community or user from Mastodon, it remains in request status. (Sorry, I don't speak English, so I'm using DeepL translation) |
Looks worse for me. If I can provide any debug info, let me know...
Not a solution, but a workaround that I just shared on Lemmy: https://sub.wetshaving.social/post/487989 - adding the following to the crontab keeps federation chugging along, and by not restarting at midnight it helps avoid this bug:
Was wondering why my comments suddenly stopped being engaged with!! lol. I can confirm that this is still an issue; restarting did help. Going to look into doing something like @etymotic's workaround.
If you want, you can shorten this to:
Looks a bit cleaner, so you don't have to have 4 commands in there. :) I had that command in my crontab for a while, but sometimes Lemmy would fail to start. It seems to work better to take down the entire stack and sleep in the middle:
lemmy.today fixed its federation problem by simply updating their instances table in the DB: https://lemmy.today/comment/4405283 - perhaps other instances are also having similar issues, especially if they have been upgrading from version to version and something happened along the way. Maybe it's not the Lemmy software that has a bug, but rather the 0.19 upgrade triggered the federation failure somehow? Edit: No, it's not fixed, just improved in some ways. Federation seems to stop after a while and not return until a restart, and even a restart doesn't federate certain posts to certain servers.
@sunaurus That's really good, and I feel like this should go out in a minor release as soon as possible, since it's a critical bug to fix.
We'll try to get one out shortly.
@dessalines @Nutomic I think this issue can be closed now. Federation seems to work normally again on 0.19.2 and above. Thank you!
Summary
There seems to be a problem with federating outgoing activities in 0.19. For example, in !lemmy_support the post "Is federation not working on lemmy.ml?" is (ironically) not making it to any other instances:
Also, the mod log for /c/memes hasn't federated to lemmy.ml in five days:
The federation queue shows all these instances as up to date. Server logs don't show anything relevant.
Edit: Also discussed here
Version
0.19.0
Lemmy Instance URL
lemmy.ml