Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check that username contains no uppercase letters #2000

Closed
wants to merge 16 commits into from
Closed

Check that username contains no uppercase letters #2000

wants to merge 16 commits into from

Conversation

Kradyz
Copy link
Contributor

@Kradyz Kradyz commented Dec 15, 2021

Addresses #1955 and #1986 for future users

Removed uppercase letters from the names passed to test_valid_actor_name()
@dessalines
Copy link
Member

We've been wanting to add lowercase-forced names for a while, so this might as well do the rest too.

This will also need a migration to:

  • Delete any to_lowercase dupes for the person.name column, keeping the first one. There are examples of this in the migration folder.
  • Convert all existing names to lower case.
  • Check if any actor_ids contain upper case chars ( i don't think they do, but it'd be good to check )
  • Add a lowercase enforcement check to the name column.

@Kradyz
Copy link
Contributor Author

Kradyz commented Dec 16, 2021

  • Adapted the remove duplicates code from this migration. Since "actor_id" now contains the information about both username and instance, I think that group by lower(actor_id) should be enough to identify duplicates.

  • Implemented by transforming person.name, person.actor_id, and person.inbox_url to lowercase, as these three columns contain the username. I also created "old_variable" columns so that down.sql command is able to recover the old names. I am not sure about how to undo the "remove duplicates" command with down.sq.

  • person.actor_ids do contain upper case chars when the username does, as the actor_id contains the username.

  • Adding this lowercase enforcement check will block new entries if they contain uppercase characters, rather than re-writing. A post from another instance is not fetched if the name of the author is capitalized in the other instance, so this will have backwards-compatibility issues with instances that have not migrated.

  • activity.data contains the username under the "actor" key, so it might be necessary to update this one too. However, the data also contains some text that should be capitalized, so applying lower(data) is probably not the right solution.

@dessalines
Copy link
Member

Great, this is looking good so far. A few things:

You need to generate this folder using diesel migration generate ... , and test that it works by doing:

export DATABASE_URL=postgres://lemmy:password@localhost:5432/lemmy
diesel migration run
diesel migration redo

Run redo a few times.

Comment on lines 34 to 36
alter table person add constraint person_name_lowercase_ck check (name = lower(name));
alter table person add constraint person_actor_id_lowercase_ck check (actor_id = lower(actor_id));
alter table person add constraint person_inbox_url_lowercase_ck check (inbox_url = lower(inbox_url));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preface these all with idx_ like the other indexes / constraints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. But shouldn't the prefix be ck_ or chk_, since this is a check constraint instead of a unique index constraints?

-- Add a lowecase enforcement check to these three columns

alter table person add constraint person_name_lowercase_ck check (name = lower(name));
alter table person add constraint person_actor_id_lowercase_ck check (actor_id = lower(actor_id));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a idx_group lower index, but you should probably remove that old one, as this new one is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, there is a lower_actor_id unique index constraint.

Indexes:
"person__pkey" PRIMARY KEY, btree (id)
"idx_person_actor_id" UNIQUE CONSTRAINT, btree (actor_id)
"idx_person_inbox_url" UNIQUE CONSTRAINT, btree (inbox_url)
"idx_person_lower_actor_id" UNIQUE, btree (lower(actor_id::text))
"idx_person_published" btree (published DESC)

Comment on lines 6 to 12
delete
from person
where id not in (
select min(id)
from person
group by lower(actor_id)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to learn the hard way, that doing an in query can take days sometimes, especially on large tables.

Use this one for dupes: https://github.com/LemmyNet/lemmy/blob/main/migrations/2021-11-22-135324_add_activity_ap_id_index/up.sql#L9

Comment on lines 14 to 22
-- Store the old values so that down.sql can recover them
alter table person add column old_name varchar(255);
update person set old_name = name;

alter table person add column old_actor_id varchar(255);
update person set old_actor_id = actor_id;

alter table person add column old_inbox_url varchar(255);
update person set old_inbox_url = inbox_url;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't worry about saving these, and it also might mess up some things with diesel if we add columns but don't do it also in the code. Don't worry, before we merge this and run it in production, I will test with our lemmy.ml data to make sure it works correctly. But thanks for thinking of this.

Comment on lines 4 to 12
-- Recover the old values
update person set name = old_name;
update person set actor_id = old_actor_id;
update person set inbox_url = old_inbox_url;

-- Delete the columns with the "old" values
alter table person drop column old_name;
alter table person drop column old_actor_id;
alter table person drop column old_inbox_url;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see below.

@Kradyz
Copy link
Contributor Author

Kradyz commented Dec 18, 2021

I have implemented the changes and tested the migration using

diesel migration run
diesel migration redo

I had a bit of trouble implementing the exact same syntax for the deletion of dupes, so I used what I think would be the equivalent minimal syntax from this postgresql tutorial. This works with my small database so hopefully it will be efficient with your large database.

The lowercase check will make it such that a migrated instance won't be able to fetch data from capitalized usernames in non-migrated instances. I think that it should be possible to make this migration backwards-compatible by applying to_lowercase() when processing the person information in lemmy/crates/apub/src/objects/person.rs. I tried to implement this myself as a test, but I don't quite understand how to transform the "inbox_url.clone().into()" to lowercase.

@Nutomic
Copy link
Member

Nutomic commented Dec 18, 2021

The lowercase check will make it such that a migrated instance won't be able to fetch data from capitalized usernames in non-migrated instances. I think that it should be possible to make this migration backwards-compatible by applying to_lowercase() when processing the person information in lemmy/crates/apub/src/objects/person.rs. I tried to implement this myself as a test, but I don't quite understand how to transform the "inbox_url.clone().into()" to lowercase.

That sounds like it would not allow Lemmy to federate properly with other software which allows uppercase in usernames. Not sure if there is such software as usernames seem to be generally lowercase, but anyway it might not be good to make the db constraints too strict.

Also, did you check if the same problem happens with uppercase community names?

@Kradyz
Copy link
Contributor Author

Kradyz commented Dec 19, 2021

After diving a bit deeper into it...

To address #1955 and #1986, it is enough to change the webfinger regex in lemmy/crates/utils/src/settings/mod.rs, which currently excludes A-Z.

static WEBFINGER_REGEX: Lazy<Regex> = Lazy::new(|| {
  Regex::new(&format!(
    "^acct:([a-z0-9_]{{3,}})@{}$",
    Settings::get().hostname
  ))
  .expect("compile webfinger regex")
});

I have applied this change to my instance, and you can now access the user URL without problem:
http://lemmy.ml/u/Sal@mander.xyz

I also tested Mastodon, and the user is also fetched without issue.

Is using lowercase names actually necessary? If other projects prefer interacting with lowercase actor_id, why not just apply a to_lowercase() function call in generate_local_apub_endpoint() such that the actor_ids and the inbox_urls are set to lowercase, while the username remains catpitalized?

@Nutomic With the current webfinger regex, uppercase community names would also be rejected.

@Nutomic
Copy link
Member

Nutomic commented Dec 19, 2021

Good find, fixing the webfinger regex definitely seems like the better solution then. And as you say that Mastodon handles uppercase without problems, I dont see any reason for us to convert to lowercase.

Honestly this simply looks like an oversight because we only ever tested with lowercase names because its easier to type. While fixing this, you could change a username in the api_tests folder to contain uppercase, so that this gets tested for (but not sure if any of them use webfinger).

@dessalines
Copy link
Member

dessalines commented Dec 19, 2021

Along with the other PR, I'd still like to keep / do this one, and enforce lowercase lemmy usernames. Its something we've needed to do for a while, and its worthwhile to not have usernames like MyUser, myUser, etc. Holidays got me busy so it might be a bit before I can test this one.

@Nutomic
Copy link
Member

Nutomic commented Dec 19, 2021

@dessalines For that it would be enough to add a constraint that lowercase(username) is unique.

@Kradyz
Copy link
Contributor Author

Kradyz commented Dec 19, 2021

@Nutomic The unique index for lower(actor_id) called "idx_person_lower_actor_id" is already enforcing that no two people can have the same lower(username) in an instance. I think that this issue was dealt with during this migration, and the username duplicates were removed.

If you would like enforce lowercase usernames, I would suggest performing the migration and enforcing the creation of new lowercase usernames via the UI and via the VALID_ACTOR_NAME_REGEX, but not enforcing the lowercase database checks, as those checks will prevent interacting with uppercase usernames in other projects.

Alternatively, db checks can be enforced and external usernames could be changed to lowercase when they interact with lemmy.

@dessalines
Copy link
Member

The other reason I can think of, to enforce lowercase usernames: people would stop getting confused when trying to log in and not using the correct capital letters.

@dessalines
Copy link
Member

@Nutomic what do you think? Should we disregard this now that we support uppercase user names?

@Nutomic
Copy link
Member

Nutomic commented Jan 4, 2022

I dont think there is any reason to prevent uppercase in usernames. Only argument I could think of would be confusion between some characters like I / l, but thats something admins can take care of if necessary.

@dessalines
Copy link
Member

In that case I'll close, @Kradyz other merged PR should be good enough.

@dessalines dessalines closed this Jan 6, 2022
@Kradyz
Copy link
Contributor Author

Kradyz commented Jan 6, 2022

@dessalines Ok! Just a reminder that LemmyNet/lemmy-ui/#525 was merged, preventing users from inputting uppercase usernames when creating a user. You might want to revert this too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants