Implement user data import/export #3976
Conversation
(Branch force-pushed from 295a8c6 to 970af5a.)
I'd rename this to UserSettingsBackup, because UserData seems like #506, which means not just settings but all user data: posts, comments, votes, saves, etc. Many of those things would be impossible to import, because it's historical data.
Lots of somewhat overlapping issues this could close: https://github.com/LemmyNet/lemmy/issues?q=is%3Aissue+is%3Aopen+export
(Branch force-pushed from cc12d14 to 834c13a.)
Backing up comments could also be useful (e.g. if I want to refer to something I wrote before when writing a new comment). I realize this might overload the server, but this seems like a classic case where you could offer paid features (there is a strong economic justification for having these features paid), or just have instance admins manually approve it (or stop the process if needed).
src/api_routes_http.rs (outdated)
@@ -291,7 +292,14 @@ pub fn config(cfg: &mut web::ServiceConfig, rate_limit: &RateLimitCell) {
       .route("/verify_email", web::post().to(verify_email))
       .route("/leave_admin", web::post().to(leave_admin))
       .route("/totp/generate", web::post().to(generate_totp_secret))
-      .route("/totp/update", web::post().to(update_totp)),
+      .route("/totp/update", web::post().to(update_totp))
+      .route("/export", web::get().to(export_user_backup)),
Should there be a stricter rate limit for export too?
Yeah, probably a good idea; the strictest we have.
Okay, I applied the new import rate limit to export as well (max once per day).
@wiki-me This is meant for data which can be exported from one account and then imported into another one. With comments that's not really possible, unless the import would repost all the comments again. That would result in countless duplicates. So if you want to back up everything you ever posted, it should be handled by an external tool using the API imo.
src/api_routes_http.rs (outdated)
          .route(web::get().to(import_user_backup)),
        web::scope("/user")
          .wrap(rate_limit.import_user_settings())
          .route("/export", web::get().to(export_user_backup))
Also, remember to rename these to export_settings, and the structs to UserSettingsBackup, in case in the future we want to export all user data, not just settings.
done
src/api_routes_http.rs (outdated)
@@ -297,8 +297,8 @@ pub fn config(cfg: &mut web::ServiceConfig, rate_limit: &RateLimitCell) {
       .service(
         web::scope("/user")
           .wrap(rate_limit.import_user_settings())
-          .route("/export", web::get().to(export_user_backup))
-          .route("/import", web::get().to(import_user_backup)),
+          .route("/export", web::get().to(export_settings))
The route URLs should maybe change too, in case we later add a user/export which includes everything.
Like how?
/export_settings and import_settings
done
Could you just not send it to other servers? That way a user can check out another user's top posts (for example), or controversial posts (if that is currently implemented for user profiles), in case you are considering whether to invite them to some private community or do some form of collaboration with them. But I agree there are more important things to implement; maybe you want to "strike while the iron is hot".
Will a user be able to export account settings after they were banned?
@kovalensky No, that would be a separate issue.
@Nutomic Is it necessary to open one?
@kovalensky Not sure, use the search function.
spawn_try_task(async move {
  let person_id = local_user_view.person.id;
  try_join_all(data.followed_communities.iter().map(|followed| async {
I don't think try_join_all is a good method to use here, for two reasons:

- If one of the follows fails, the others should still go through. For example, if a user has followed communities on instances that no longer exist, then the import will fail at a random point and leave the import partially done. (From the docs: "If any future returns an error then all other futures will be canceled and an error will be returned immediately.") Instead, a method should be used where each error is handled separately, with the main import continuing.
- There should be a limit on the concurrency. If the user follows 500 communities, we don't want the server to fetch and subscribe to all 500 of those at once, because rate limits might cause that to fail. It should either be fully sequential or have a fixed limit on concurrency. This should be possible by converting the iterator into a stream and calling .buffered(10) (see https://stackoverflow.com/a/70871743/2639190).
If these links are to be believed, then join_all runs futures concurrently, not in parallel, though I haven't seen a clear confirmation for that in the docs.

- https://stackoverflow.com/questions/63756169/rust-futuresfuturejoin-all-await-runs-futures-sequentially-but-parall
- https://www.reddit.com/r/rust/comments/dt6u0s/joined_futures_will_run_concurrently_not_in/

If this is correct, it would be enough to switch from try_join_all to join_all and collect the errors manually. I also tried to switch to the method you suggested, but it's not working because the compiler throws an error about context getting moved.
futures::stream::iter(data.followed_communities.clone().into_iter().map(
  |followed| async move {
    // need to reset outgoing request count to avoid running into limit
    let context_ = context_.reset_request_count();
    let community = followed.dereference(&context_).await?;
    let form = CommunityFollowerForm {
      person_id,
      community_id: community.id,
      pending: true,
    };
    CommunityFollower::follow(&mut context_.pool(), &form).await?;
    LemmyResult::Ok(())
  },
))
.buffer_unordered(10)
.collect::<Vec<_>>()
.await;
Okay, I added error handling for remote fetches now. I also tried to return a list of failed items in the API response, but couldn't get that working. For now it's only being logged.
> concurrently, not in parallel

Makes sense, but that'll still mean it will send out 100 requests at almost the same time (since it'll mainly be waiting on the HTTP I/O), which will potentially cause issues.
Got it working now. The code is rather verbose and repetitive. It could potentially be simplified by moving the iter logic into a helper function, but that gets very complex with lots of generics and lambdas.
ADD COLUMN import_user_settings int NOT NULL DEFAULT 1;

ALTER TABLE local_site_rate_limit
ADD COLUMN import_user_settings_per_second int NOT NULL DEFAULT 86400;
I'm a bit confused by this naming. It sounds like it allows 86400 imports per second, but what it really means is that it allows only one import per 86400 seconds. So maybe it should be called something like import_user_settings_rate_limit_seconds?
Yes, all the rate limits have these confusing names, so at least it's consistent. It would be nice to rename them in the future to be clearer.
lgtm
Has the Lemmy UI been given a button to allow the user to interact with this new feature yet, or is it just going to be through API interaction?
Not yet, no; you can track it over on that repo.
This adds new endpoints which let users export their data (community follows, blocklists, profile settings), and import it again. It can be used to migrate between instances more easily, but also as a backup in case the user's own instance goes down unexpectedly or the user gets banned: the user can create a new account on another instance and import the latest backup.