Argument commands #17

Open
larsgw opened this issue Nov 16, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@larsgw
Member

larsgw commented Nov 16, 2020

The citationjs parser needs to support more kinds of commands, mostly commands that take arguments. Arguments always seem to be treated the same way: a command takes either a braced block or the first character of the following text. The exception is math: \url takes the dollar sign verbatim while \emph does not.

@larsgw added the enhancement label on Nov 16, 2020
@retorquere
Contributor

That's more a matter of whether a command parses its argument in verbatim mode. \url expects one argument and parses it in verbatim mode; \href expects two arguments, but parses the first in verbatim mode and the second in normal mode. \begin{verbatim} ...\end{verbatim} parses everything in that environment verbatim. \verb parses everything verbatim until the end of the block it's in.

There's simply no math in verbatim environments, because the $ is just a character there.
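The distinction above can be sketched as a per-command table of argument modes. This is only an illustration: `argumentModes`, `readBraced`, `parseNormal` and `parseArguments` are hypothetical names, not part of Citation.js, and `parseNormal` is a stand-in that only marks `$...$` as math.

```javascript
// Hypothetical table: the mode in which each argument of a command is parsed.
const argumentModes = {
  url: ['verbatim'],
  href: ['verbatim', 'normal'],
  emph: ['normal']
}

// Read one braced group starting at `pos`. Returns the raw inner text and
// the position just after the closing brace. Nested braces are balanced.
function readBraced (input, pos) {
  if (input[pos] !== '{') throw new SyntaxError(`expected "{" at ${pos}`)
  let depth = 0
  for (let i = pos; i < input.length; i++) {
    if (input[i] === '{') depth++
    else if (input[i] === '}' && --depth === 0) {
      return { raw: input.slice(pos + 1, i), end: i + 1 }
    }
  }
  throw new SyntaxError('unbalanced braces')
}

// Stand-in for normal-mode parsing (assumption): only $...$ is interpreted.
function parseNormal (raw) {
  return raw.replace(/\$([^$]*)\$/g, '<math>$1</math>')
}

// Parse as many arguments as the command declares, each in its own mode.
function parseArguments (command, input, pos = 0) {
  const args = []
  for (const mode of argumentModes[command] || []) {
    while (/\s/.test(input[pos])) pos++ // optional whitespace before an argument
    const { raw, end } = readBraced(input, pos)
    // In verbatim mode "$" is just a character, so the raw text is kept as-is.
    args.push(mode === 'verbatim' ? raw : parseNormal(raw))
    pos = end
  }
  return args
}

console.log(parseArguments('url', '{a$b$c}'))
console.log(parseArguments('emph', '{a$b$c}'))
```

With this shape, whether `$` starts math is decided per argument, not per command.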

@larsgw
Member Author

larsgw commented Nov 16, 2020

That's a bit annoying; I was planning to do something like the following:

// constants.js
export const argumentCommands = {
  href (url, text) { return text === url ? text : `${text} (${url})` }
}

// value.js (grammar)
const grammar = new Grammar({
  // ...

  Command () {
    const command = this.consumeToken('command').value

    if (command in constants.argumentCommands) {
      const func = constants.argumentCommands[command]
      const args = []
      let arity = func.length // fun thing: Function.length is the number of declared parameters

      while (arity-- > 0) {
        this.consumeToken('whitespace', /* optional: */ true)
        args.push(this.consumeRule('Argument'))
      }

      return func(...args)
    } // else...
  },

  // ...
})
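One caveat with deriving the arity from `func.length`, as the sketch above does: in JavaScript, `Function.length` only counts parameters before the first default value or rest parameter. The functions below are made up for illustration; `explicitHref` shows one possible alternative (an assumption, not something proposed in this thread) where the arity is stated explicitly.

```javascript
// Two plain parameters: length is 2, as the Command rule expects.
const href = function (url, text) { return text === url ? text : `${text} (${url})` }

// A default value stops the count: length is 1, so only one argument would be parsed.
const hrefWithDefault = function (url, text = url) { return text === url ? text : `${text} (${url})` }

// A rest parameter is not counted at all: length is 0.
const joinAll = function (...parts) { return parts.join(' ') }

console.log(href.length, hrefWithDefault.length, joinAll.length)

// Hypothetical alternative: declare the arity next to the function.
const explicitHref = { arity: 2, apply: hrefWithDefault }
console.log(explicitHref.arity)
```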

@retorquere
Contributor

If you keep the original input text attached to the tokens while tokenizing, it's possible to decide during this phase how you want to handle the input. Basically, in normal mode you process the tokens according to their semantic meaning, and in verbatim mode you take the original text attached to the tokens and string it together.

Don't forget that commands can have arguments in square brackets. I simply ignore them, but for that I do have to parse them.
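A minimal sketch of that approach, assuming a simple regex tokenizer (this is not the tokenizer of either project, and `tokenize`, `readGroup` and `readArgument` are hypothetical names). Every token keeps its source text in `raw`, verbatim mode joins those strings, and optional `[...]` arguments are parsed only so they can be skipped.

```javascript
// Every character of the input ends up in exactly one token.
const pattern = /(?<command>\\(?:[a-zA-Z]+|.?))|(?<symbol>[{}[\]$])|(?<whitespace>\s+)|(?<text>[^\\{}[\]$\s]+)/gs

function tokenize (input) {
  const tokens = []
  for (const match of input.matchAll(pattern)) {
    // The token type is the name of the group that matched.
    const [type] = Object.entries(match.groups).find(([, value]) => value !== undefined)
    tokens.push({ type, raw: match[0] })
  }
  return tokens
}

// Collect the tokens inside one balanced group starting at `index`,
// or return null if no group opens there.
function readGroup (tokens, index, open = '{', close = '}') {
  if (!tokens[index] || tokens[index].raw !== open) return null
  let depth = 0
  for (let i = index; i < tokens.length; i++) {
    if (tokens[i].type !== 'symbol') continue
    if (tokens[i].raw === open) depth++
    else if (tokens[i].raw === close && --depth === 0) {
      return { inner: tokens.slice(index + 1, i), next: i + 1 }
    }
  }
  throw new SyntaxError(`unbalanced "${open}"`)
}

function readArgument (tokens, index, mode) {
  // Optional [...] arguments are parsed, then ignored.
  let optional
  while ((optional = readGroup(tokens, index, '[', ']'))) index = optional.next

  const group = readGroup(tokens, index)
  if (!group) throw new SyntaxError('expected a braced argument')

  // Verbatim: string the original text back together.
  // Normal: hand the tokens on to the regular rules (not shown here).
  const value = mode === 'verbatim'
    ? group.inner.map(token => token.raw).join('')
    : group.inner
  return { value, next: group.next }
}

const tokens = tokenize('[opt]{a$b$\\_c}')
console.log(readArgument(tokens, 0, 'verbatim').value)
```

Because no character is dropped during tokenizing, joining every `raw` reproduces the input exactly, which is what makes the late choice between the two modes possible.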

@larsgw
Member Author

larsgw commented Nov 22, 2020

I think I might just let the command functions be called as if they're rules in the grammar, i.e. they can decide themselves how to parse their arguments. Perhaps a bit similar to what you're doing, based on what I saw. It feels a bit weird to make it that customisable but I don't think it can lead to code injection or the like.
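As a rough sketch of that idea, each command could be a function that is called with the parser as `this` and consumes its own arguments. `MiniParser` and `commandRules` are hypothetical and much simpler than the actual Grammar class in the snippet earlier in this thread; the normal-mode handling is a stand-in that only strips nested braces.

```javascript
// Hypothetical mini-parser, for illustration only.
class MiniParser {
  constructor (input) {
    this.input = input
    this.pos = 0
  }

  skipWhitespace () {
    while (this.pos < this.input.length && /\s/.test(this.input[this.pos])) this.pos++
  }

  // Read one braced argument. With `verbatim`, the raw text is returned untouched.
  consumeArgument ({ verbatim = false } = {}) {
    this.skipWhitespace()
    if (this.input[this.pos] !== '{') throw new SyntaxError(`expected "{" at ${this.pos}`)
    const start = this.pos + 1
    let depth = 0
    for (; this.pos < this.input.length; this.pos++) {
      const char = this.input[this.pos]
      if (char === '{') depth++
      else if (char === '}' && --depth === 0) {
        const raw = this.input.slice(start, this.pos)
        this.pos++ // step past the closing brace
        return verbatim ? raw : raw.replace(/[{}]/g, '')
      }
    }
    throw new SyntaxError('unbalanced braces')
  }

  // Read a command name and let the matching rule parse its own arguments.
  consumeCommand () {
    const match = /^\\([a-zA-Z]+)/.exec(this.input.slice(this.pos))
    if (!match) throw new SyntaxError(`expected a command at ${this.pos}`)
    this.pos += match[0].length
    const rule = commandRules[match[1]]
    if (!rule) throw new SyntaxError(`unknown command: ${match[1]}`)
    return rule.call(this)
  }
}

// Each command decides how many arguments it takes and in which mode.
const commandRules = {
  url () { return this.consumeArgument({ verbatim: true }) },
  href () {
    const url = this.consumeArgument({ verbatim: true })
    const text = this.consumeArgument()
    return text === url ? text : `${text} (${url})`
  }
}

console.log(new MiniParser('\\href{http://a.b/$x}{docs}').consumeCommand())
```

The commands only ever call parser methods on input that is already being parsed, which fits the remark that this kind of customisation should not open the door to code injection.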

By the way, I am working on a prototype plugin for @citation-js/plugin-bibtex that extends unicode support with your unicode2latex tables. I don't really want to put an additional 400KB in the default browser bundle so I think an optional plugin to the plugin could work well. I am still working out how to add things like {\\'{}I} but that might be helped by the changes mentioned above.

@retorquere
Contributor

From my pov you're making astounding progress.
