
Implement user data import/export #3976

Merged · 16 commits into main · Oct 11, 2023

Conversation

@Nutomic (Member) commented Sep 21, 2023

This adds new endpoints which let users export their data (community follows, blocklists, profile settings) and import it again. It can be used to migrate between instances more easily, but also for backups in case the user's own instance goes down unexpectedly or the user gets banned. The user can create a new account on another instance and import the latest backup.

@dessalines (Member) left a comment

I'd rename this to UserSettingsBackup, because UserData sounds like #506, which means not just settings but all user data: posts, comments, votes, saves, etc. Many of those things would be impossible to import, because it's historical data.

Lots of somewhat overlapping issues this could close: https://github.com/LemmyNet/lemmy/issues?q=is%3Aissue+is%3Aopen+export

src/api_routes_http.rs (outdated; resolved)
crates/db_schema/src/impls/local_user.rs (resolved)
@Nutomic Nutomic changed the title Implement endpoints for user data import/export Implement user data import/export Sep 27, 2023
@Nutomic Nutomic marked this pull request as ready for review September 27, 2023 15:02
@wiki-me (Contributor) commented Sep 28, 2023

Backing up comments could also be useful (e.g. if I want to refer to something I wrote before when writing a new comment). I realize this might overload the server, but this seems like a classic case where you could offer paid features (and there is a strong economic justification for making those features paid), or just have an instance admin manually approve it (or stop the process if needed).

@@ -291,7 +292,14 @@ pub fn config(cfg: &mut web::ServiceConfig, rate_limit: &RateLimitCell) {
       .route("/verify_email", web::post().to(verify_email))
       .route("/leave_admin", web::post().to(leave_admin))
       .route("/totp/generate", web::post().to(generate_totp_secret))
-      .route("/totp/update", web::post().to(update_totp)),
+      .route("/totp/update", web::post().to(update_totp))
+      .route("/export", web::get().to(export_user_backup)),
Member Author

Should there be a stricter rate limit for export too?

Member

Yeah, probably a good idea; the strictest we have.

Member Author

Okay, I applied the new import rate limit to export as well (max once per day).

@Nutomic (Member Author) commented Sep 28, 2023

@wiki-me This is meant for data which can be exported from one account and then imported into another one. With comments that's not really possible, unless the import reposted all the comments again. That would result in countless duplicates. So if you want to back up everything you ever posted, it should be handled by an external tool using the API, imo.

@dessalines (Member)

@wiki-me That's the issue I linked above, #506, which would export all account data, not just settings like this one does.

.route(web::get().to(import_user_backup)),
web::scope("/user")
.wrap(rate_limit.import_user_settings())
.route("/export", web::get().to(export_user_backup))
Member

Also, remember to rename these to export_settings, and the structs to UserSettingsBackup. In case in the future we want to export all user data, not just settings.

Member Author

done

@@ -297,8 +297,8 @@ pub fn config(cfg: &mut web::ServiceConfig, rate_limit: &RateLimitCell) {
       .service(
         web::scope("/user")
           .wrap(rate_limit.import_user_settings())
-          .route("/export", web::get().to(export_user_backup))
-          .route("/import", web::get().to(import_user_backup)),
+          .route("/export", web::get().to(export_settings))
Member

The route urls should maybe change too, in case we add a user/export which includes everything.

Member Author

Like how?

Member

/export_settings and import_settings

Member Author

done

@wiki-me (Contributor) commented Sep 30, 2023

This is meant for data which can be exported from one account and then imported into another one. With comments thats not really possible, unless the import would repost all the comments again.

Could you just not send it to other servers? That way a user can check out another user's top posts, for example, or controversial posts (if that is currently implemented for user profiles), in case you are considering inviting them to some private community or doing some form of collaboration with them.

But I agree there are more important things to implement; then again, maybe you want to "strike while the iron is hot".

@kovalensky

Will users be able to export their account settings after they have been banned?

@Nutomic (Member Author) commented Oct 2, 2023

@kovalensky No, that would be a separate issue.

@Nutomic Nutomic mentioned this pull request Oct 2, 2023
4 tasks
@kovalensky

@Nutomic Is it necessary to open one?

@Nutomic (Member Author) commented Oct 2, 2023

@kovalensky Not sure, use the search function.


spawn_try_task(async move {
let person_id = local_user_view.person.id;
try_join_all(data.followed_communities.iter().map(|followed| async {
Collaborator

I don't think try_join_all is a good method to use here, for two reasons:

  1. If one of the follows fails, the others should still go through. For example, if a user has followed communities on instances that no longer exist, the import will fail at a random point and leave the import partially done. (From the docs: "If any future returns an error then all other futures will be canceled and an error will be returned immediately.")

     Instead, a method should be used where each error is handled separately, with the main import continuing.

  2. There should be a limit on the concurrency. If the user follows 500 communities, we don't want the server to fetch and subscribe to all 500 of those communities at once, because rate limits might cause that to fail. It should either be fully sequential or have a fixed limit on concurrency.

     This should be possible by converting the iterator into a stream and calling .buffered(10) (see https://stackoverflow.com/a/70871743/2639190).
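The first point can be sketched in plain Rust. This is a minimal, sequential stand-in (the PR ultimately used an async stream with bounded concurrency instead); `follow`, `import_follows`, and the community names are hypothetical placeholders, not the actual Lemmy code:

```rust
// Hypothetical stand-in for following one community; the real code
// dereferences the remote community over ActivityPub and may fail per item.
fn follow(community: &str) -> Result<(), String> {
    if community.contains("dead") {
        Err(format!("instance for {community} unreachable"))
    } else {
        Ok(())
    }
}

// Each failure is recorded and the import keeps going, instead of
// aborting on the first error the way try_join_all would.
fn import_follows(communities: &[&str]) -> Vec<String> {
    let mut failures = Vec::new();
    for c in communities {
        if let Err(e) = follow(c) {
            failures.push(e);
        }
    }
    failures
}

fn main() {
    let failed = import_follows(&["a@ok.example", "b@dead.example", "c@ok.example"]);
    println!("{} of 3 follows failed", failed.len());
}
```

The key property is that an unreachable instance produces one logged failure rather than cancelling the rest of the import.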

Member Author

If these links are to be believed, then join_all runs futures concurrently, not in parallel, though I haven't seen a clear confirmation of that in the docs.

If this is correct, it would be enough to switch from try_join_all to join_all and collect the errors manually. I also tried to switch to the method you suggested, but it's not working because the compiler throws an error about context getting moved.


    futures::stream::iter(data.followed_communities.clone().into_iter().map(
      |followed| async move {
        // need to reset outgoing request count to avoid running into limit
        let context_ = context_.reset_request_count();
        let community = followed.dereference(&context_).await?;
        let form = CommunityFollowerForm {
          person_id,
          community_id: community.id,
          pending: true,
        };
        CommunityFollower::follow(&mut context_.pool(), &form).await?;
        LemmyResult::Ok(())
      },
    ))
    .buffer_unordered(10)
    .collect::<Vec<_>>()
    .await;

Member Author

Okay, I added error handling for remote fetches now. I also tried to return a list of failed items in the API response, but couldn't get that working; for now it's only being logged.

Collaborator

concurrently, not in parallel

Makes sense, but that'll still mean it sends out 100 requests at almost the same time (since it'll mainly be waiting on the HTTP I/O), which could potentially cause issues.

Member Author

Got it working now. The code is rather verbose and repetitive; it could potentially be simplified by moving the iteration logic into a helper function, but that gets very complex, with lots of generics and lambdas.

ADD COLUMN import_user_settings int NOT NULL DEFAULT 1;

ALTER TABLE local_site_rate_limit
ADD COLUMN import_user_settings_per_second int NOT NULL DEFAULT 86400;
Collaborator

I'm a bit confused by this naming: it sounds like it allows 86400 imports per second, but what it really means is that it allows only one import per 86400 seconds. So maybe it should be called something like import_user_settings_rate_limit_seconds?

Member Author

Yes, all the rate limits have these confusing names, so at least it's consistent. It would be nice to rename them in the future to be clearer.
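As a sanity check on the semantics discussed in this thread (assumed here: the count column says how many actions are allowed, and the `*_per_second` column is actually the window length in seconds), the effective daily allowance works out as:

```rust
// Assumed semantics, per this thread: `allowed` actions are permitted
// per window of `window_secs` seconds (despite the `_per_second` name).
fn actions_per_day(allowed: u32, window_secs: u32) -> f64 {
    allowed as f64 * 86_400.0 / window_secs as f64
}

fn main() {
    // import_user_settings = 1, import_user_settings_per_second = 86400
    // => one settings import per day.
    println!("{}", actions_per_day(1, 86_400));
}
```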

@phiresky (Collaborator) left a comment

lgtm

@Nutomic Nutomic merged commit 6d7b38f into main Oct 11, 2023
2 checks passed
@K4LCIFER

Has the Lemmy UI been given a button to allow the user to interact with this new feature yet, or is it just going to be through API interaction?

@dessalines (Member)

Not yet, no; you can track it over on that repo.
