New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check that username contains no uppercase letters #2000
Conversation
Removed uppercase letters from the names passed to test_valid_actor_name()
We've been wanting to add lowercase-forced names for a while, so this might as well do the rest too. This will also need a migration to:
|
Password test needs lowercase user
|
Great, this is looking good so far. A few things: You need to generate this folder using
Run redo a few times. |
alter table person add constraint person_name_lowercase_ck check (name = lower(name)); | ||
alter table person add constraint person_actor_id_lowercase_ck check (actor_id = lower(actor_id)); | ||
alter table person add constraint person_inbox_url_lowercase_ck check (inbox_url = lower(inbox_url)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preface these all with idx_
like the other indexes / constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. But shouldn't the prefix be ck_ or chk_, since this is a check constraint instead of a unique index constraints?
-- Add a lowecase enforcement check to these three columns | ||
|
||
alter table person add constraint person_name_lowercase_ck check (name = lower(name)); | ||
alter table person add constraint person_actor_id_lowercase_ck check (actor_id = lower(actor_id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a idx_group lower
index, but you should probably remove that old one, as this new one is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, there is a lower_actor_id unique index constraint.
Indexes:
"person__pkey" PRIMARY KEY, btree (id)
"idx_person_actor_id" UNIQUE CONSTRAINT, btree (actor_id)
"idx_person_inbox_url" UNIQUE CONSTRAINT, btree (inbox_url)
"idx_person_lower_actor_id" UNIQUE, btree (lower(actor_id::text))
"idx_person_published" btree (published DESC)
delete | ||
from person | ||
where id not in ( | ||
select min(id) | ||
from person | ||
group by lower(actor_id) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to learn the hard way, that doing an in query can take days sometimes, especially on large tables.
Use this one for dupes: https://github.com/LemmyNet/lemmy/blob/main/migrations/2021-11-22-135324_add_activity_ap_id_index/up.sql#L9
-- Store the old values so that down.sql can recover them | ||
alter table person add column old_name varchar(255); | ||
update person set old_name = name; | ||
|
||
alter table person add column old_actor_id varchar(255); | ||
update person set old_actor_id = actor_id; | ||
|
||
alter table person add column old_inbox_url varchar(255); | ||
update person set old_inbox_url = inbox_url; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't worry about saving these, and it also might mess up some things with diesel if we add columns but don't do it also in the code. Don't worry, before we merge this and run it in production, I will test with our lemmy.ml data to make sure it works correctly. But thanks for thinking of this.
-- Recover the old values | ||
update person set name = old_name; | ||
update person set actor_id = old_actor_id; | ||
update person set inbox_url = old_inbox_url; | ||
|
||
-- Delete the columns with the "old" values | ||
alter table person drop column old_name; | ||
alter table person drop column old_actor_id; | ||
alter table person drop column old_inbox_url; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see below.
I have implemented the changes and tested the migration using
I had a bit of trouble implementing the exact same syntax for the deletion of dupes, so I used what I think would be the equivalent minimal syntax from this postgresql tutorial. This works with my small database so hopefully it will be efficient with your large database. The lowercase check will make it such that a migrated instance won't be able to fetch data from capitalized usernames in non-migrated instances. I think that it should be possible to make this migration backwards-compatible by applying to_lowercase() when processing the person information in lemmy/crates/apub/src/objects/person.rs. I tried to implement this myself as a test, but I don't quite understand how to transform the "inbox_url.clone().into()" to lowercase. |
That sounds like it would not allow Lemmy to federate properly with other software which allows uppercase in usernames. Not sure if there is such software as usernames seem to be generally lowercase, but anyway it might not be good to make the db constraints too strict. Also, did you check if the same problem happens with uppercase community names? |
After diving a bit deeper into it... To address #1955 and #1986, it is enough to change the webfinger regex in lemmy/crates/utils/src/settings/mod.rs, which currently excludes A-Z.
I have applied this change to my instance, and you can now access the user URL without problem: I also tested Mastodon, and the user is also fetched without issue. Is using lowercase names actually necessary? If other projects prefer interacting with lowercase actor_id, why not just apply a to_lowercase() function call in generate_local_apub_endpoint() such that the actor_ids and the inbox_urls are set to lowercase, while the username remains catpitalized? @Nutomic With the current webfinger regex, uppercase community names would also be rejected. |
Good find, fixing the webfinger regex definitely seems like the better solution then. And as you say that Mastodon handles uppercase without problems, I dont see any reason for us to convert to lowercase. Honestly this simply looks like an oversight because we only ever tested with lowercase names because its easier to type. While fixing this, you could change a username in the api_tests folder to contain uppercase, so that this gets tested for (but not sure if any of them use webfinger). |
Along with the other PR, I'd still like to keep / do this one, and enforce lowercase lemmy usernames. Its something we've needed to do for a while, and its worthwhile to not have usernames like |
@dessalines For that it would be enough to add a constraint that |
@Nutomic The unique index for lower(actor_id) called "idx_person_lower_actor_id" is already enforcing that no two people can have the same lower(username) in an instance. I think that this issue was dealt with during this migration, and the username duplicates were removed. If you would like enforce lowercase usernames, I would suggest performing the migration and enforcing the creation of new lowercase usernames via the UI and via the VALID_ACTOR_NAME_REGEX, but not enforcing the lowercase database checks, as those checks will prevent interacting with uppercase usernames in other projects. Alternatively, db checks can be enforced and external usernames could be changed to lowercase when they interact with lemmy. |
The other reason I can think of, to enforce lowercase usernames: people would stop getting confused when trying to log in and not using the correct capital letters. |
@Nutomic what do you think? Should we disregard this now that we support uppercase user names? |
I dont think there is any reason to prevent uppercase in usernames. Only argument I could think of would be confusion between some characters like |
In that case I'll close, @Kradyz other merged PR should be good enough. |
@dessalines Ok! Just a reminder that LemmyNet/lemmy-ui/#525 was merged, preventing users from inputting uppercase usernames when creating a user. You might want to revert this too. |
Addresses #1955 and #1986 for future users