add more neopronouns #48

bbrk24 · 2023-04-05T03:21:20Z

In my experience (in general, not just on Stack Exchange), xe/xem and fae/faer are the most widely used neopronouns. However, they aren't recognized by the pronoun assistant userscript, even though some other neopronouns are. This PR adds them.

Glorfindel83 · 2023-04-05T18:48:13Z

I'll admit I have no experience at all, besides what I see on Stack Exchange ...

bbrk24 · 2023-04-06T18:47:41Z

What happened here? Do I have to put 'xe' after 'xem' in the list?

makyen · 2023-04-06T19:47:28Z

Yes and no. The list is concatenated into a regular expression of the form '\W*((pronoun1|pronoun2|...)(\s*/\s*(pronoun1|pronoun2|...))+)\W*'. When using alternatives like (alt1|alt2|alt3|...), the regex engine tries them in order and uses the first alternative that matches at the current character. So, the regex (xey?|xem) by itself can never match the entirety of "xem", because the "xe" in "xem" will always be matched by xey? as just "xe". Using the regex (xem|xey?) (i.e. reversing the order in the list) will work correctly, but there's a larger issue which should be resolved, and resolving it also fixes this one.
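As a minimal sketch of the ordering behavior described above (the two-entry pronoun list is a hypothetical stand-in, not the script's actual list):

```javascript
// Hypothetical two-entry lists, differing only in order.
const shortFirst = 'xey?|xem';
const longFirst = 'xem|xey?';

// The first alternative that matches at the current position wins.
const shortFirstMatch = 'xem'.match(new RegExp('(' + shortFirst + ')', 'i'))[0];
const longFirstMatch = 'xem'.match(new RegExp('(' + longFirst + ')', 'i'))[0];

// Same shape as the script's regex, built from the short-first list.
const fullRegex = new RegExp(
  '\\W*((' + shortFirst + ')(\\s*/\\s*(' + shortFirst + '))+)\\W*', 'i');
const fullMatch = fullRegex.exec('xe/xem')[1];

console.log(shortFirstMatch); // "xe"
console.log(longFirstMatch);  // "xem"
console.log(fullMatch);       // "xe/xe" (the trailing "m" is left out)
```

Because the overall match already succeeds with "xe", the engine never backtracks to try xem.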

Larger issue: missing `\b` before and after the pronouns

The larger issue is that the overall regular expression isn't bracketed by \b (word boundary) both before and after the list of pronouns. Without a \b before and after (or some other means of guaranteeing that the regex matches an entire word), her? matches inside words like "here" or "help", and xey? matches inside things like "xerox". If the regular expression is \b(pronoun1|pronoun2|...)\b, such matches are prevented. It also means that \b(xey?|xem)\b must match a complete word, not part of one: xey? can't match just the "xe" in "xem", because there's no boundary between a word character and a non-word character after the "xe", so the engine falls through to xem, which matches "xem".¹
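A small sketch of the difference \b makes on single words (the regexes here are cut-down illustrations, not the script's full expression):

```javascript
// Without word boundaries, a pronoun pattern matches inside longer words.
const herLoose = /her?/i;
const xeLoose = /xey?/i;

// With \b on both sides, the pattern has to cover a whole word.
const herStrict = /\bher?\b/i;
const xeStrict = /\b(xey?|xem)\b/i;

const looseResults = [herLoose.test('here'), herLoose.test('help'), xeLoose.test('xerox')];
const strictResults = [herStrict.test('here'), herStrict.test('help'), xeStrict.test('xerox')];

// Even with xey? listed first, the boundary forces the engine on to xem.
const xemMatch = xeStrict.exec('xem')[0];

console.log(looseResults);  // [ true, true, true ]
console.log(strictResults); // [ false, false, false ]
console.log(xemMatch);      // "xem"
```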

Overall, the line:

let pronounListRegex = new RegExp('\\W*((' + allPronouns + ')(\\s*/\\s*(' + allPronouns + '))+)\\W*', 'i');

should change to something like:

const pronounListRegex = new RegExp('\\b((' + allPronouns + ')(\\s*/\\s*(' + allPronouns + '))+)\\b', 'i');

Note: the above assumes that none of the pronouns being looked for begins or ends with a non-word character. If one does, things get a bit more complicated, both because you need to account for the transition from a pronoun that ends with a non-word character to the next character, which is very likely to also be a non-word character, and because of how \b is defined in some regular expression engines. If the possible pronouns include ones with Unicode characters outside the ASCII word characters, [A-Za-z0-9_], then it's potentially even more complex, as you may need to account for those characters yourself. While this is a potential issue, it's easier to ignore it until there's a pronoun that begins or ends with a character outside the ASCII word characters.
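To compare the two versions of the line side by side, here is a rough sketch. The three-entry allPronouns value and the firstPronounList helper are hypothetical, made up for this illustration; the script's real list is longer.

```javascript
// Hypothetical, shortened pronoun list (the short forms deliberately come first).
const allPronouns = ['her?', 'xey?', 'xem'].join('|');

// Current form: optional non-word characters around the list.
const unbounded = new RegExp(
  '\\W*((' + allPronouns + ')(\\s*/\\s*(' + allPronouns + '))+)\\W*', 'i');

// Proposed form: word boundaries around the list.
const bounded = new RegExp(
  '\\b((' + allPronouns + ')(\\s*/\\s*(' + allPronouns + '))+)\\b', 'i');

// Hypothetical helper: returns the matched "a/b" text, or null if none.
function firstPronounList(regex, text) {
  const match = regex.exec(text);
  return match ? match[1] : null;
}

console.log(firstPronounList(unbounded, 'other/xerox')); // "her/xe"
console.log(firstPronounList(bounded, 'other/xerox'));   // null
console.log(firstPronounList(unbounded, 'xe/xem'));      // "xe/xe"
console.log(firstPronounList(bounded, 'xe/xem'));        // "xe/xem"
```

With the boundaries in place, the order of xey? and xem in the list no longer affects the result for these inputs.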

regex.lastIndex issue

In addition, given that these are regular expressions which are being created and reused, each regex's .lastIndex should be set to 0 before it's used on new text. This is a foible of how JavaScript implements RegExp, and it matters for regexes that have the g (global) or y (sticky) flag: for those, exec() and test() begin matching at .lastIndex, move it to the end of each successful match, and reset it to 0 only when a match fails. If the previous use ended on a successful match, the next check begins where that one left off, even against a different string, and can miss a match near the start of the new text. Regexes without g or y ignore .lastIndex, so resetting it there is harmless. Overall, best practice is just to always manually set <your regex>.lastIndex = 0 when you're reusing a regex and starting a comparison against a new string.
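A short sketch of the stale .lastIndex problem. It assumes a regex created with the g flag, since .lastIndex only comes into play with g or y; pronounRegex and testFromStart are hypothetical names used only for this demo.

```javascript
// Hypothetical reused regex; the 'g' flag is an assumption for this demo.
const pronounRegex = /\b(xem|xey?)\b/gi;

// First use succeeds and leaves lastIndex at the end of the match.
const firstResult = pronounRegex.test('xe/xem');
const staleIndex = pronounRegex.lastIndex;

// Second use, on a different string, starts at the stale index and misses.
const missedResult = pronounRegex.test('xe');

// Hypothetical helper: always start the comparison from the beginning.
function testFromStart(regex, text) {
  regex.lastIndex = 0;
  return regex.test(text);
}

const resetFirst = testFromStart(pronounRegex, 'xe/xem');
const resetSecond = testFromStart(pronounRegex, 'xe');

console.log(firstResult, staleIndex); // true 2
console.log(missedResult);            // false
console.log(resetFirst, resetSecond); // true true
```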

¹ Given that a "/" is required between the detected words, it takes somewhat more convoluted text to produce a full match, but it's still possible (e.g. "other/xerox" would currently be recognized as "her/xe"). The more pronouns that are added, the more possible combinations there are with real words.

bbrk24 · 2023-04-06T23:15:36Z

> missing \b before and after the pronouns

This seems like an easy fix; I could get a PR up for this later today.

> regex's .lastIndex should be set to 0.

I think I understand why, but I haven't looked at the script thoroughly enough to know where to do this. Doesn't the /y flag affect this in some way, too?

add more neopronouns

06f74cd

Glorfindel83 merged commit e25c235 into Glorfindel83:master Apr 5, 2023

bbrk24 mentioned this pull request Apr 7, 2023

Fix pronoun RegExps #49

Merged
