Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factoids Rewrite #36

Open
wants to merge 30 commits into
base: master
from

Conversation

@brigand
Copy link
Owner

commented Oct 6, 2019

This is initial work for the reimplementation of factoids so we can sunset ecmabot in the near future (a bot we can't update isn't healthy). Please open issues or message me if other things need to be done before then.

Future plans (maybe tomorrow):

  • proper alias support
  • access control (users can propose factoids, elevated users can directly set factoids or approve proposals)
  • inline factoid usage, e.g. We recommend {!airbnb}

In case it's confusing, the .persistent.js file is immune to HMR, while .internal.js can be evicted from the cache.

Edit: will add some tests tomorrow.

const [, key, value] = learnMatch;
if (key.length > 50) {
msg.respondWithMention(
`Is anyone going to remember a ${key.length} character trigger? Try something shorter (max 50)`,

This comment has been minimized.

Copy link
@brigand

brigand Oct 6, 2019

Author Owner

Maybe a less sassy message; I dunno.

let i = 0;
i < MAX_ALIAS_DEPTH && cursor && cursor.type === 'alias';
i += 1
) {

This comment has been minimized.

Copy link
@brigand

brigand Oct 6, 2019

Author Owner

Please recommend a better way to do this while loop with an iteration limit.

This comment has been minimized.

Copy link
@caub

caub Oct 6, 2019

Contributor

would it be possible to resolve aliases at save-time, so read is always O(1)? a command could have an array of aliases

This comment has been minimized.

Copy link
@brigand

brigand Oct 6, 2019

Author Owner

You can update the thing the alias points to.

me: !learn alias foo = bar
me: !learn bar = new value
me: !foo
jelllbot: new value

We could do depth and circular reference checking at !learn time, though.

This comment has been minimized.

Copy link
@caub

caub Oct 7, 2019

Contributor

What does this output?

me: !learn alias foo = bar
me: !foo

I think it's not necessary to store aliases apart, other than an array around the command, and the above !foo command can either fail, or succeed and create a bar command with aliases: ["foo"], and empty content (so no output by jellobot, until someone adds content to bar), both choices make sense
I think that's simpler no? in term of storage structure

Or even if aliases are stored apart, it's possible to resolves them at save time, what I mean is for example:

!learn foo=bar
!learn alias foo1=foo
!learn alias foo2=foo1
!learn alias foo3=foo2

then foo3 should directly point to foo

This comment has been minimized.

Copy link
@brigand

brigand Oct 7, 2019

Author Owner

If we wanted O(1) lookup, we'd need to maintain a mapping of aliases to factoids in memory, which is extra synchronization work. I'm not worried about the cost of resolving a few aliases on factoid usage; especially considering the cost is only paid when it is an actual alias.

Storing aliases on the thing they point to is much more expensive, and the data model would allow creating the same alias twice without special checks for it existing anywhere else, as well as !learn conflicting with an alias. The way both the previous and new file format deal with aliases seems to solve all of these problems fairly well.

I think I'd like to prevent a dangling alias simply because it would typically be a mistake, and make it easier to end up with a circular alias.

Limiting alias depth also limits complexity (to users of the bot), and I'm not convinced we even need more than one level of aliases (i.e. it might be fine to forbid an alias to an alias).

);
}

if (value.length > 400) {

This comment has been minimized.

Copy link
@brigand

brigand Oct 6, 2019

Author Owner

In IRC, what is the message limit actually measured in? Is it bytes? Might need to correct this.

This comment has been minimized.

Copy link
@ljharb

ljharb Oct 6, 2019

is it hardcoded in the protocol, or determined by the server?

This comment has been minimized.

Copy link
@kirjavascript

kirjavascript Oct 10, 2019

in node-irc and splitlong.pl, the max length is calculated as

497 - nick length - hostmask length

the protocol gives a limit of 510 /bytes/ but things like PRIVMSG decrease this number

you can see how 497 is calculated here: https://github.com/irssi/scripts.irssi.org/blob/master/scripts/splitlong.pl#L35

This comment has been minimized.

Copy link
@brigand

brigand Oct 10, 2019

Author Owner

Thanks for clarifying. In that case, we should do utf-16 string -> utf8 byte length and check, right?

This comment has been minimized.

Copy link
@kirjavascript

kirjavascript Oct 11, 2019

ah, hadn't realised this - have just been checking the JS string length

I doubt the edge cases that affect it will crop up in factoids, but it definitely makes sense to do so.

using this to calculate it now;

function byteCount(s) {
    return encodeURI(s).split(/%..|./).length - 1;
}
this._dirty = true;
}

async writeToDisk() {

This comment has been minimized.

Copy link
@brigand

brigand Oct 6, 2019

Author Owner

Need a sanity check here. The goal is to not have two of these running concurrently.

return !this._loaded;
}

async loadFromDisk() {

This comment has been minimized.

Copy link
@brigand

brigand Oct 6, 2019

Author Owner

Need a sanity check here. The goal is to not have two of these running concurrently.

@brigand

This comment has been minimized.

Copy link
Owner Author

commented Oct 6, 2019

@ljharb @caub please review this if you have time.

Changes without the extra lint fixes

brigand added 3 commits Oct 6, 2019
@ljharb

This comment has been minimized.

Copy link

commented Oct 6, 2019

What's the overarching goal here? What's wrong with the current factoids system and format that needs changing?

@brigand

This comment has been minimized.

Copy link
Owner Author

commented Oct 6, 2019

I would like to work on the new features mentioned above, and be able to add new features in the future.

I actually intend to change the format a little bit more. It'll be simpler if even the first !learn is in the changes array (with previous: null). The first item in the array is "most recent" and the last item is "the initial value".

This allows implementing access control + personal factoids with changes.find(item => item.editor === msg.from || item.global), where global is set by an elevated user on !learn, or on someone else's edit via a to-be-implemented command.

There's quite a bit of duplication in the current format, but only after the first edit. It's just pretty weird how the data is laid out. It would be quite awkward to implement various things in the current format.

@ljharb

This comment has been minimized.

Copy link

commented Oct 6, 2019

I'm quite interested in access controls.

When you say "personal factoids", however, that seems rife for abuse. I actively would not want individuals to be able to store their own private factoids in the bot.

@brigand

This comment has been minimized.

Copy link
Owner Author

commented Oct 6, 2019

Sure, makes sense. Then the check for retrieving a factoid is reduced to changes.find(item => item.global), and the pending changes will only appear in the factoid file, waiting to be approved.

}

if (cursor && cursor.type === 'alias') {
throw new Error(`Alias depth exceeded when looking up a factoid.`);

This comment has been minimized.

Copy link
@ljharb

ljharb Oct 6, 2019

why is there a depth limit?

This comment has been minimized.

Copy link
@brigand

brigand Oct 6, 2019

Author Owner

I can replace this with circular alias checking at insertion time.

const entries = JSON.parse(await readFile(FILE_PATH, 'utf-8'));
for (const key of Object.keys(entries)) {
map.set(key, entries[key]);
}

This comment has been minimized.

Copy link
@ljharb

ljharb Oct 6, 2019

const map = new Map(Object.entries(entries));
@brigand

This comment has been minimized.

Copy link
Owner Author

commented Oct 7, 2019

Some updates...

The config file now has an entry for factoid moderators.

{
  "plugins": {
    "jsEval": { "timeout": 5000 },
    "factoid": {
      "moderators": ["GreenJello", "Other", "Names", "2B", "Added"]
    }

Anyone can !learn and they get a different message depending on if they're a moderator or not.

NormalUser: !learn foo = bar
bot: NormalUser, change proposed to "foo"

Moderator: !learn foo = bar
bot: Moderator, got it. I'll remember this for when "!foo" is used.

Moderator: !publish foo
bot: Moderator, done. Everyone will now see the previous draft when "!foo" is used.

The JSON file format has been reduced to the following:

  "classes": {
    "type": "factoid",
    "popularity": 5,
    "editors": [
      "ljharb",
      "some-user"
    ],
    "changes": [
      {
        "date": "2019-10-07T01:38:06.850Z",
        "editor": "some-user",
        "value": "is this java?",
        "live": false
      },
      {
        "date": "2019-01-29T00:18:51.526Z",
        "editor": "ljharb",
        "value": "Class hierarchies? Don't do that! http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html (See also, !inheritance)",
        "live": true
      },
      {
        "editor": "mjcd",
        "date": "2019-01-28T12:43:13.327Z",
        "value": "Class hierarchies? Don't do that! http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html (See also, !inheritance)",
        "live": true
      }
    ]
  },

I might also get rid of "editors".

Given that config (note the first item is a proposed change, i.e. live: false):

Anyone: !classes
bot: Anyone, Class hierarchies? Don't do that! http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html (See also, !inheritance)

Moderator: !publish classes
bot: Moderator, done. Everyone will now see the previous draft when "!classes" is used.

Anyone: !classes
bot: Anyone, is this java?

Also, !forget is implemented for both moderators and other users. In the store it simply adds a change where the value is null.

I still need to work out the best way to review factoids others have proposed. Suggestions welcome on that. Maybe we just take the most recent proposals and put them in a markdown file in a gist.

@ljharb

This comment has been minimized.

Copy link

commented Oct 7, 2019

@brigand moderator-only text should only appear in PM

@brigand

This comment has been minimized.

Copy link
Owner Author

commented Oct 7, 2019

You're free to run the commands in PM. Is that insufficient?

@ljharb

This comment has been minimized.

Copy link

commented Oct 7, 2019

Yes - it would be pretty bad if "being a moderator" meant that factoids i triggered (or that were triggered pointing at me) contained noise that was irrelevant to the majority of the people that saw it.

People without permissions should ideally not even know that those commands exist.

@brigand

This comment has been minimized.

Copy link
Owner Author

commented Oct 7, 2019

Triggering factoids doesn't consider the moderator status of anyone; only !learn and !publish consider this. You're always free to execute those in private messages, and if you do then no one will even know you're a moderator.

For those commands, I think it's important that we signal to users that e.g. !learn foo = bar won't be immediately visible when !foo is used, and !publish should present an error, even if only for the cases where a human that is a moderator is presenting with a nick not in the moderator list.

We could require these to be run in private messages, but I think we can leave the agency of where to invoke them in the hands of individual users.

Edit: to clarify my previous message, "Anyone" means any user on IRC, in the moderator list or not.

@ljharb

This comment has been minimized.

Copy link

commented Oct 7, 2019

If I wanted to truly hide being a moderator, I'd have to use !learn in a channel and not have the moderator-specific commands show up.

@brigand

This comment has been minimized.

Copy link
Owner Author

commented Oct 7, 2019

That actually doesn't really solve it, because there will be a difference if you do !learn foo = bar in a channel and someone immediately runs !foo. It'll either return the old value if you're not a moderator, or the new value if you are.

The only way around that is for the bot to make your edit live at a random time, simulating an approval from a human. I don't think we want to go that route.

What is your cause and level of concern with people knowing someone is a moderator?

@ljharb

This comment has been minimized.

Copy link

commented Oct 7, 2019

I suspect it will immediately and frequently generate complaints about "i want to be a moderator", "why is that person a moderator", etc; it will also generate confusion and frequent re-explanation about how permissions work.

@brigand

This comment has been minimized.

Copy link
Owner Author

commented Oct 7, 2019

I think you're right on that, but I'm not clear on what the solution is. Should we only allow commands like !learn in private messages?

@ljharb

This comment has been minimized.

Copy link

commented Oct 8, 2019

Maybe even moderator-created learn commands should require a PM-only publish command?

@brigand

This comment has been minimized.

Copy link
Owner Author

commented Oct 8, 2019

I like that idea. I'll implement it that way.


const moderators = (msg.selfConfig && msg.selfConfig.moderators) || [];
console.log({ self: msg.selfConfig });
const isModerator = moderators.includes(msg.from);

This comment has been minimized.

Copy link
@kirjavascript

kirjavascript Oct 13, 2019

is this the only check done? people can circumvent this by changing their nicks to a mod's nick when the mod is not online

checking if the user is identified can resolve this. on freenode you can send ACC username to NickServ to see if they are. most other services / servers use STATUS username instead

There isnt a standard for this but using ACC or STATUS has convered every server I've tried it on

This comment has been minimized.

Copy link
@brigand

brigand Oct 13, 2019

Author Owner

Yeah, would this work?

  • cache mapping msg.from (user sending the message) to account name
  • invalidate cache for a user on part, quit, nick change
  • use ACC or STATUS (based on config) as needed to populate the cache
  • failure of ACC/STATUS causes permission to be denied

"part" is overly aggressive invalidation as they may be in multiple channels, but I don't know how to work around that.

This comment has been minimized.

Copy link
@kirjavascript

kirjavascript Oct 13, 2019

not sure if the cache will buy you much at the cost of complexity and runtime cost

In my implementation I simply check on every elevated request. The difference is basically unnoticable so I had no desire to try and improve the speed.

That said, if you wanted to add caching I cant think of a way to circumvent what you've outlined.

This comment has been minimized.

Copy link
@ljharb

ljharb Oct 14, 2019

What happens if the bot misses a part, quit, or nick change?

This comment has been minimized.

Copy link
@brigand

brigand Oct 14, 2019

Author Owner

How about this?

  • no cache (for now)
  • check username on !learn
    • if not logged in, reject the !learn
    • otherwise, store the proposed change with the account name rather than nick
  • check the username on !publish
    • if not logged in, reject the publish
    • if not in moderator list, reject the publish
    • otherwise make the change live

This comment has been minimized.

Copy link
@ljharb

ljharb Oct 15, 2019

why reject the learn from an anon user? we don't get that problem with ecmabot, and anyone on freenode could edit the factoids directly in a PM, auth or not.

This comment has been minimized.

Copy link
@brigand

brigand Oct 16, 2019

Author Owner

I would like to track the account name in the proposals to have a more reliable history of changes. That requires the user to be logged in.

We could alternatively allow anonymous !learn and do something having an editorNick, editorAccount, with the latter potentially being null.

This comment has been minimized.

Copy link
@ljharb

ljharb Oct 16, 2019

Tracking both nick and account seems important regardless; it seems useful to then allow account to be null?

We can always lock it down further later if it becomes a problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.