Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add more neopronouns #48

Merged
merged 1 commit into from
Apr 5, 2023
Merged

Conversation

bbrk24
Copy link
Contributor

@bbrk24 bbrk24 commented Apr 5, 2023

In my experience (in general, not just on stackexchange), xe/xem and fae/faer are the most widely-used neopronouns. However, they aren't recognized by the pronoun assistant userscript, despite the fact that some other neopronouns are. This PR adds them.

@Glorfindel83
Copy link
Owner

I'll admit I have no experience at all, besides what I see on Stack Exchange ...

@Glorfindel83 Glorfindel83 merged commit e25c235 into Glorfindel83:master Apr 5, 2023
@bbrk24
Copy link
Contributor Author

bbrk24 commented Apr 6, 2023

image

What happened here? Do I have to put 'xe' after 'xem' in the list?

@makyen
Copy link
Contributor

makyen commented Apr 6, 2023

Yes and no. The list is concatenated into a regular expression of the form '\W*((pronoun1|pronoun2|...)(\s*/\s*(pronoun1|pronoun2|...))+)\W*. When using alternates like (alt1|alt2|alt3|...), regular expressions will use the first alternative that matches at the current character. So, the regex (xey?|xem) by itself can never match the entirety of "xem", because the "xe" in "xem" will always be matched by xey? as just "xe". Using the regex (xem|xey?) will work correctly (i.e. reversing the order in the list), but there's a larger issue which should be resolved which also resolves this issue.

Larger issue: missing \b before and after the pronouns

However, there's a larger issue that the overall regular expression isn't bracketed by \b (word boundary) both prior to and after the list of pronouns. Without a \b before and after (or other means to guarantee the regex is matching the entire word), it results in her? matching words like "here" or "help" and xey? matching things like "xerox". If the regular expression is \b(pronoun1|pronoun2|...)\b, such matches would be prevented. It also has the effect that \b(xey?|xem)\b must match the complete word, not a partial word, so xey? won't match "xem", because there's no word character to non-word character boundary after the "xe", resulting in xem matching "xem".1

Overall, the line:

let pronounListRegex = new RegExp('\\W*((' + allPronouns + ')(\\s*/\\s*(' + allPronouns + '))+)\\W*', 'i');

should change to something like:

const pronounListRegex = new RegExp('\\b((' + allPronouns + ')(\\s*/\\s*(' + allPronouns + '))+)\\b', 'i');

Note: the above assumes that none of the pronouns that are being looked for begin with a non-word character or end with a non-word character. If they do, then things get a bit more complicated, both due to needing to account for the transition from the pronoun ending with a non-word character to the next character, which is very likely to also be a non-word character and due to how \b is defined in some regular expression engines. If the possible pronouns include ones with Unicode characters outside the ASCII word characters, [A-Za-z0-9_] then it's, potentially, even more complex as you may need to account for using those characters. While this is a potential issue, it's easier to ignore it until there's a pronoun that begins or ends with a non-ASCII word character.

regex.lastIndex issue

In addition, given that these are regular expressions which are being created and reused, prior to using each individual regex on new text, that regex's .lastIndex should be set to 0. This is a foible of how JavaScript implements RegExp. While it will usually work to not set <you regex>.lastIndex = 0 when starting a comparison against a different string, it will fail to operate properly if the strings happen to have the same content, because JavaScript checks to see if the text you're testing against now is the same content (i.e. not the same identical String) as the text the regex was last used against in order to choose to automatically set .lastIndex = 0. If the current text is the same as the previous text, then the regex engine will begin the check from where it left off of the last check. Overall, best practice is just to always manually set .lastIndex = 0 when you're reusing a regex and starting a comparison against a string.


  1. Given that a "/" is required between the detected words, it requires text that's a bit more convoluted to have a full match, but still possible (e.g. "other/xerox" would currently be recognized as "her/xe"). The more pronouns which are added, the more possible combinations there are with real words.

@bbrk24
Copy link
Contributor Author

bbrk24 commented Apr 6, 2023

missing \b before and after the pronouns

This seems like an easy fix; I could get a PR up for this later today.

regex's .lastIndex should be set to 0.

I think I understand why, but I haven't looked at the script thoroughly enough to know where to do this. Doesn't the /y flag affect this in some way, too?

@bbrk24 bbrk24 mentioned this pull request Apr 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants