Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper Variable Expansion #56

Closed
CosmicToast opened this issue Nov 12, 2015 · 27 comments
Closed

Proper Variable Expansion #56

CosmicToast opened this issue Nov 12, 2015 · 27 comments

Comments

@CosmicToast
Copy link

Currently, dotbot uses os.path.expandvars.
However, in some cases, it may be desireable to follow proper shell parameter substitution. (see: the related manual)
Here is an example usage: I want my zshrc to work. Some systems may define $ZDOTDIR to be $XDG_DATA_HOME/zsh, some may leave it undefined.
The typical solution would be to use ${ZDOTDIR:-$HOME}/.zshrc as the install location. However, without parameter substitution, there is no way to achieve this.

@anishathalye
Copy link
Owner

Great idea! I can certainly see how this can be useful.

We probably want to follow the Open Group standard rather than the Bash one.

Is there a Python builtin similar to os.path.expandvars that does this? Or will we have to write something ourselves?

@CosmicToast
Copy link
Author

I was meaning to link the Open Group standard, but I was at work (office) and had to find it using my phone, so I just got the one I found first.

Unfortunately I couldn't find a Python builtin that achieve this... Or a package (had a quick look through pypi), so we'll probably have to write one ourselves (maybe make it into a package for others to use?)

The standard is fairly well defined. I see four possible solutions:

  • delegate the expansion to /bin/sh : should be simplest to implement, but also the slowest
  • copy /bin/sh (probably bash or mksh) implementation, make it into a separate utility, call that : medium time to implement, still fairly slow due to external exec
  • copy /bin/sh implementation, make into cython packge... probably a while to implement (I haven't had any experience with cython, have you?)
  • write an implementation from scratch in pure python, either taking inspiration from bash/mksh or simply following the standard : medium time to implement, most portable (no external deps)

Technically, wordexp(3) can be useful here, but it can be... interesting to work with. (OpenBSD does not implement it, despite being a POSIX standard, on Apple it essentially delegates the expansion to perl... it's a fairly large mess on some systems, as I found out when trying to fix neovim's broken globber).

However, a (VERY) minimalistic implementation of basic expansion using it (no error checking etc) is effectively 10 lines of code.

Fairly good question on how to approach this.

@anishathalye
Copy link
Owner

I'd prefer to write an implementation from scratch in pure Python.

For reference, here's the implementation of posixpath.expandvars from Python 3.4:

# Expand paths containing shell variable substitutions.
# This expands the forms $variable and ${variable} only.
# Non-existent variables are left unchanged.

_varprog = None
_varprogb = None

def expandvars(path):
    """Expand shell variables of form $var and ${var}.  Unknown variables
    are left unchanged."""
    global _varprog, _varprogb
    if isinstance(path, bytes):
        if b'$' not in path:
            return path
        if not _varprogb:
            import re
            _varprogb = re.compile(br'\$(\w+|\{[^}]*\})', re.ASCII)
        search = _varprogb.search
        start = b'{'
        end = b'}'
        environ = getattr(os, 'environb', None)
    else:
        if '$' not in path:
            return path
        if not _varprog:
            import re
            _varprog = re.compile(r'\$(\w+|\{[^}]*\})', re.ASCII)
        search = _varprog.search
        start = '{'
        end = '}'
        environ = os.environ
    i = 0
    while True:
        m = search(path, i)
        if not m:
            break
        i, j = m.span(0)
        name = m.group(1)
        if name.startswith(start) and name.endswith(end):
            name = name[1:-1]
        try:
            if environ is None:
                value = os.fsencode(os.environ[os.fsdecode(name)])
            else:
                value = environ[name]
        except KeyError:
            i = j
        else:
            tail = path[j:]
            path = path[:i] + value
            i = len(path)
            path += tail
    return path

So it's not way too complicated, but it's not super simple either.

@CosmicToast
Copy link
Author

Maybe we can wrap around this?
Detect any time there is ${c VARIABLE c} (c being different control sequences, as per standard), and use posixpath.expandvars for the actual expansion. Since it does not modify non-existent variables, we can detect when substitution is going to be needed.

@anishathalye
Copy link
Owner

Yeah, that could save some effort.

@CosmicToast
Copy link
Author

The Open Group spec specifies different cases for when a parameter is set, but null (e.g export abcd). From my quick tests, however, python does not seem to be aware of that situation. Can you look around and see if there's a way to detect those?

@CosmicToast
Copy link
Author

Current state: I got (as far as I can tell, for now) basic detection going (including depth-first searches so we expand the innermost things first, to avoid collisions). What I'm going to do (for now) is I'll make null variants work the same as the non-null ones (e.g something-orother will do the same thing as something:-orother) until a decision is made.

Has support for: $VAR, ${VAR}, $(COMMAND), ${STUFF:-SUB}, ${STUFF-SUB}.
Planning to add: the rest of param expansions (a small while), '~' expansion (fairly simple) .
Not currently planned (but could be done?): arithmetic expansion.

@CosmicToast
Copy link
Author

Did some testing with various shells, and there seems to be disagreement on what constitutes a "Null Parameter".
According to zsh, export test makes test set but null. According to bash, ksh and mksh, that does not mean it's set, and to make it set but null, one must export test=''.
This might be a zsh bug. I couldn't find a spec from Open Group...

The script I used to test can be found here.
Tested with the following shells:

  • mksh
  • zsh
  • ksh
  • bash

All the shells except zsh had the same output.
Zsh only differs from the others on one line: it also outputs "tdef is set".

Any insight? Could default to python's detection (which is consistent with the other shells, '' is a value, but without an = it is not).

@anishathalye
Copy link
Owner

Nice! You're also supporting command substitution? That's cool.

Depth-first searches? Does the standard allow for nested expansion (I don't know)?

Defaulting to Python's detection (using what's in os.environ) seems reasonable.

@CosmicToast
Copy link
Author

The standard does not seem to require it, but every single implementation I see does. Try this, for example ${A:+${PATH:=path is totally defined}}.

Command substitution is fairly trivial to achieve, but it might cause side effects if it directly replaces posix.expandvars. I'm probably going to make command substitution optional (linked to a variable that defaults to false) and essentially pull a subprocess.run (essentially say that if you use this, you must be the one to sanitize it).

Not entirely sure what to do if the variable is false: either keep the value (e.g $(echo hi) would expand to $(echo hi)) or substitute null by default (e.g ab$(echo hi)cd would expand to abcd)

@anishathalye
Copy link
Owner

Oh I see. For our purposes, it may be okay to skip recursive substitution if it's more effort.

What's the issue with command substitution and variable expansion? One uses $(...) and the other uses ${...} (or just $...), so they don't conflict.

Btw, we might want to skip implementing ${parameter:=[word]} (assigning default values), because it's kinda weird if we're dependent on execution order.

@CosmicToast
Copy link
Author

I already have recursive substitution, it's very easy to implement.

The issue with $(...) is that it has to call something external (very often even a shell), and could be ran on an untrusted string.

Not entirely sure how we're dependent on execution order... the typical use case is for stuff like XDG_CONFIG_HOME, where you go ${XDG_CONFIG_HOME:=$HOME/.config}, that way if it is set, use it, and if it isn't, default to the spec.

@anishathalye
Copy link
Owner

I don't think we need to worry about what's in a $(...). Just like Dotbot can run shell commands directly with the shell directive. What's in the install.conf.yaml is trusted.

That being said, I don't know if we want to support $(...) in links. It might be overkill.

Using default values is fine, we just shouldn't assign default values.

@CosmicToast
Copy link
Author

If the standard specifies assigning default values (note: this is only with := and =, which have to be explicitly used by the user) there shouldn't be a problem.
Another tidbit: even if we assign values, they will only propagate to our process' children (so e.g if we assign to XDG_CONFIG_HOME, the next shell directives will be able to use it, but it won't affect the outside environment).

@anishathalye
Copy link
Owner

Right. But if there's something like this for example:

(suppose $parameter is unset before executing)

- link:
    "a": "~/${parameter:=[a]}"
    "b": "~/${parameter:=[b]}"

Then the results will be dependent on the order in which these are executed. And we don't guarantee a specific order of execution.

@CosmicToast
Copy link
Author

This use case makes no sense, however.
If $parameter is set, then the two link to the same place.
If it is, I have no clue what the user is expecting by trying to assign different values to the same thing.

Essentially, I don't see a reason not to implement it, especially considering I already have support for it. (I'm planning to clean up a bit and upload an initial version for sanity testing before Monday).

@anishathalye
Copy link
Owner

Okay, sure. I guess there's no harm in having it implemented.

@CosmicToast
Copy link
Author

I still want to implement %, %%, # and ##, but I need to understand them better first... not entirely sure how I'd make them.

This is because after some thought, I realized I want this to be where my zshrc links: ${ZDOTDIR:-${XDG_CONFIG_HOME:-$HOME}/${${XDG_CONFIG_HOME:-.}%${XDG_CONFIG_HOME}}zsh} (it's quite something, yes).,, I'll probably just define ZDOTDIR for now... Complexity of the parser is increasing slightly (partially because of special cases I hadn't thought of, but I guess I'll show off what I've got done up to now).

Current status: here.

I'm taking a small break (some personal life issues arose), so please accumulate comments and suggestions here until then :)

@anishathalye
Copy link
Owner

Whoa progress looks good!

Yeah, take your time, best of luck with everything that's going on.

@CosmicToast
Copy link
Author

If you could test / read / give out comments (RFC style), that'd be really appreciated :)
I'm probably not going to be able to work on this for a week or two, so the idea would be to accumulate ideas in that time so I can get to implementing / improving.

The short version of how it functions (to save you time if you do choose to go through the code):

  • it loops character by character and pushes the status of the string into a stack
  • see line 121 to 131 for details on what stack statuses do
  • the stack entry is actually a tuple: tuple[0] is the character responsible for the change in status, tuple[1] is the new status
  • -2 means that the word has been modified, so parsing has to start from the beginning (non-optimal, but shouldn't be a big deal imo, my rpi2 handles it about as fast as my desktop does)

os_* are the direct "modify os settings" functions
* are the wrapper functions that call the os_* functions if needed.

the param_lambda stuff essentially implement all of 2.6.2 that's currently implemented (see lines 61-64ish, it's essentially just a behavior table)

@anishathalye
Copy link
Owner

Ok, I'll take a look soon. Things are a bit crazy right now with projects and finals at university, so it may be a week or two…

@anishathalye
Copy link
Owner

Random thought that I wanted to write down so I don't forget -- we need to make sure that variable expansion works as expected on both POSIX systems as well as Windows. Python automatically imports the correct thing for os.path depending on the system, loading either posixpath or ntpath.

@CosmicToast
Copy link
Author

Well, this would be a replacement for posixpath.
We can simply have a wrapper as os.path that would choose between our posixpath and ntpath.

@anishathalye
Copy link
Owner

👍

I just added that comment to make sure that we don't forget :)

@simonvanderveldt
Copy link

simonvanderveldt commented Jul 11, 2016

FYI there would definitely be more projects that could use a way to use a POSIX compatible variable substitution, most of the time the main reason is being able to define a default value.
Example of a project that's looking for a solution to this docker/compose#1377

@choman
Copy link

choman commented Oct 19, 2016

Possible use case

So I have some systems that I would preferr alternative configs.
Is there a way to say:
bash_aliases
.bash_aliases.$(HOSTNAME) or bash_aliases.$(MAC)
so take bash_aliases unless HOSTNAME or MAC or etc exist

likewise ignore, use .tmux.conf every where expect on A B C machines.
bad example. maybe its an app i'm installing via dotbot

@anishathalye
Copy link
Owner

I think it's too much effort to get this right / enough of an edge case that it's not worth the effort. For the original task:

The typical solution would be to use ${ZDOTDIR:-$HOME}/.zshrc as the install location. However, without parameter substitution, there is no way to achieve this.

There's a workaround by adding ZDOTDIR=${ZDOTDIR:-$HOME} to the ./install script itself. Not pretty (doesn't go in the config file), but it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants