Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import directive should not import dotfiles #5048

Closed
cincodenada opened this issue Sep 17, 2022 · 16 comments
Closed

import directive should not import dotfiles #5048

cincodenada opened this issue Sep 17, 2022 · 16 comments
Labels
bug 🐞 Something isn't working good first issue 🐤 Good for newcomers
Milestone

Comments

@cincodenada
Copy link

cincodenada commented Sep 17, 2022

I'm attempting to import all files in a directory, as directed in the documentation:

import sites-enabled/*

But I'm running into errors because it's trying to include my editor's dotfiles, which are also in that directory.

Obviously I can clean up the dotfiles, or add a suffix to all the files I do want to include or what have you, and you can argue about whether editing files in-place is good practice. But I would consider this a bug - standard shell globbing doesn't include dotfiles, so it is surprising that Caddy's import statement does, and I don't see any way to avoid including dotfiles - these are globs, not regexes.

I'm not super familiar with Go, but I looked briefly at the code hoping that there was a flag or something, but it seems not. Python, for instance, has glob.glob() for shell-like file globbing (which doesn't include dotfiles) and pathlib.Path.glob() which does include dotfiles. I don't know if Go has anything similar, or if we'd just have to manually exclude any files that start with . after the fact.

In any case, I think globs in an include spec should be changed to not include dotfiles, you could include a config flag for backwards compatibility if desired, although I would be surprised if anyone is intentionally including dotfiles.

Note this is a totally separate concern from issues like #2286, which are about serving dotfiles - this is about importing dotfiles as part of config parsing.

@cincodenada cincodenada changed the title import directive should not include dotfiles import directive should not import dotfiles Sep 17, 2022
@francislavoie
Copy link
Member

I'd suggest using an extension like .conf or something for your actual config files in that case, then importing with import sites-enabled/*.conf. I'm not sure special logic here is warranted, it might break a legitimate use for someone else.

@cincodenada
Copy link
Author

cincodenada commented Sep 18, 2022

Thank you for considering, and a quick response! I appreciate the workaround but as I noted in my post, I've already considered such options, and if that's all I was here for I wouldn't have opened the issue 🙂

I opened this issue because from my perspective the globbing behavior is a misfeature at best: as I stated above, standard shell globbing doesn't include dotfiles (they're Unix's "hidden files", after all), and most include specs I've used follow suit. I would guess this is unexpected behavior to most folks who are familiar with globbing. I would be surprised if someone was depending on including dotfiles for their config, but if we're concerned about that, it could be behind a flag and a major version change.

I just ran a test to double-check and confirmed that both Apache and Nginx ignore dotfiles when gathering includes in their config files. I think Caddy should follow suit, in the interest of following the Principle of Least Astonishment. It's quite surprising to find that Caddy acts differently than the two other predominant Linux webservers in how it handles hidden files and globbing. It's quite common for editors to leave hidden files alongside files they're editing, and it's frustrating to have this behavior suddenly break my webserver in unexpected ways!

Additionally, with the current behavior, I have no way of achieving what I want, other than workarounds that require me to name my files differently than I planned to. I understand not wanting to potentially break existing usage and prioritizing that to an extent, but it's also breaking my use that is legitimate in basically every other Unix application I've used, and not at all hypothetical.

Whereas in the reverse situation - if Caddy ignored dotfiles by default, but a user really wanted to include dotfiles for whatever reason, there's a much cleaner workaround readily available. They could do so easily with two includes without changing their file structure at all:

import dir/.*
import dir/*

A little funky-looking, sure, but you're using dotfiles as config files, you're already doing funky things. There's not even much concern around ordering, because dotfiles naturally come before all the other files anyway, unless they start config files with quotes or spaces...which I would be really surprised if folks were doing intentionally.

@francislavoie
Copy link
Member

Well argued. Fair enough. And sorry for missing that you mentioned the suffix idea already 👍

@francislavoie francislavoie added the bug 🐞 Something isn't working label Sep 18, 2022
@francislavoie francislavoie added this to the 2.x milestone Sep 18, 2022
@francislavoie francislavoie added the good first issue 🐤 Good for newcomers label Sep 18, 2022
@abheekda1
Copy link

I'd like to give implementing this a shot, where could I find the path selector in this case?

@mohammed90
Copy link
Member

I'd like to give implementing this a shot, where could I find the path selector in this case?

You'll be working in this range:

// make path relative to the file of the _token_ being processed rather
// than current working directory (issue #867) and then use glob to get
// list of matching filenames
absFile, err := filepath.Abs(p.Dispenser.File())
if err != nil {
return p.Errf("Failed to get absolute path of file: %s: %v", p.Dispenser.File(), err)
}
var matches []string
var globPattern string
if !filepath.IsAbs(importPattern) {
globPattern = filepath.Join(filepath.Dir(absFile), importPattern)
} else {
globPattern = importPattern
}
if strings.Count(globPattern, "*") > 1 || strings.Count(globPattern, "?") > 1 ||
(strings.Contains(globPattern, "[") && strings.Contains(globPattern, "]")) {
// See issue #2096 - a pattern with many glob expansions can hang for too long
return p.Errf("Glob pattern may only contain one wildcard (*), but has others: %s", globPattern)
}
matches, err = filepath.Glob(globPattern)
if err != nil {
return p.Errf("Failed to use import pattern %s: %v", importPattern, err)
}
if len(matches) == 0 {
if strings.ContainsAny(globPattern, "*?[]") {
caddy.Log().Warn("No files matching import glob pattern", zap.String("pattern", importPattern))
} else {
return p.Errf("File to import not found: %s", importPattern)
}
}
// collect all the imported tokens
for _, importFile := range matches {
newTokens, err := p.doSingleImport(importFile)
if err != nil {
return err
}
importedTokens = append(importedTokens, newTokens...)
}
nodes = matches

@abheekda1
Copy link

Alright, so would it make sense to just check for whether there's a . at the beginning of importPattern and if not then add a [!.] at the beginning to exclude hidden files and/or directories?

@abheekda1
Copy link

// ignore hidden files if not explicity specified
if !strings.HasPrefix(filepath.Base(importPattern), ".") || filepath.Base(importPattern) == "." || filepath.Base(importPattern) == ".." {
	globPattern = "[!.]" + globPattern
}

I'm thinking something like this should go after line 383.

@cincodenada
Copy link
Author

cincodenada commented Sep 21, 2022

Hi @ADawesomeguy thanks for digging in here! I'm not a maintainer here but as the one who opened the ticket, I've thought through it a bit, and have a couple suggestions here, hopefully they're helpful!


Looking at the docs for filepath.Glob() which uses the same rules as filepath.Match(), I don't see an exclamation point in that spec, but it may be an undocumented alias for ^? But reading the code, it appears that globPattern is an absolute path, so I don't think simply prefixing the pattern will work. You could split off the basename, prefix it, and put it back together...

...however, I am wary of modifying the pattern at all. I'm wary of attempting to modify user input in general, because it creates many possible edge cases. For example, suppose the user put something like import subdir/??? to import only files with three characters. Even if we handled the path correctly, prefixing would result in import subdir/[!.]???, which would match only files with four characters, not what the user wanted at all!

This is just an example off the top of my head - the general gist is, trying to anticipate and adjust for every possible thing a user might do like this is not a situation I like to be in as a programmer, it's too much trouble :)


Instead, I would suggest leaving the user input as-is, let Glob() return whatever it finds, and then exclude any dotfiles after the fact. There are a couple places I see that would make sense to do this filtering - we could filter matches immediately after the Glob() call, or we could wait until we're trying to import things in doSingleImport() and skip dotfiles there.

I'm not sure which is best, the Caddy maintainers may have opinions one way or the other. Pro/cons I see:

  • Doing it right after making the list avoids any awkwardness deciding how to handle the no-matches case (what if we match only dotfiles?)
  • doSingleImport() already has several similar checks (we ignore and log if a directory is matched, for instance), so it makes some sense logically to do this check in the same place.

A couple other brief notes from thinking through this approach:

  • You would still need to split off and inspect the user's pattern, to make sure it didn't start with a dot
    • Users should still be able to explicitly import dotfiles with e.g. import subdir/.*
  • I think it is sufficient to just check if the pattern portion starts with a dot and disable the filtering if it does, but I haven't thought through this thoroughly

Some further thoughts - these all optional I think, depends on what the maintainers think. Most of them could be added separately as enhancements, so if it seems like a lot you can probably ignore them!

  • We could emit a log message of some kind to tell the user if we found dotfiles that we're not including
    • I'm not sure what kind of version checking exists, but it might be possible to do this only on the first run after the release that includes this, since if people are relying on the existing behavior it would be nice to notify them...possibly too complicated though.
  • It might be useful to some sort of configuration option to restore the original behavior
    • Alternately, it could be off by default with an option to enable, maybe until the next major version?
  • We could add a special message in the zero-matches case where we matched (but excluded) dotfiles

@we-sell-bags
Copy link

You might also want to consider "~" as well as "." ,
which I have seen some platforms use to mark windows files as deleted in some file systems.

@abheekda1
Copy link

Hi @ADawesomeguy thanks for digging in here! I'm not a maintainer here but as the one who opened the ticket, I've thought through it a bit, and have a couple suggestions here, hopefully they're helpful!


Looking at the docs for filepath.Glob() which uses the same rules as filepath.Match(), I don't see an exclamation point in that spec, but it may be an undocumented alias for ^? But reading the code, it appears that globPattern is an absolute path, so I don't think simply prefixing the pattern will work. You could split off the basename, prefix it, and put it back together...

...however, I am wary of modifying the pattern at all. I'm wary of attempting to modify user input in general, because it creates many possible edge cases. For example, suppose the user put something like import subdir/??? to import only files with three characters. Even if we handled the path correctly, prefixing would result in import subdir/[!.]???, which would match only files with four characters, not what the user wanted at all!

This is just an example off the top of my head - the general gist is, trying to anticipate and adjust for every possible thing a user might do like this is not a situation I like to be in as a programmer, it's too much trouble :)


Instead, I would suggest leaving the user input as-is, let Glob() return whatever it finds, and then exclude any dotfiles after the fact. There are a couple places I see that would make sense to do this filtering - we could filter matches immediately after the Glob() call, or we could wait until we're trying to import things in doSingleImport() and skip dotfiles there.

I'm not sure which is best, the Caddy maintainers may have opinions one way or the other. Pro/cons I see:

  • Doing it right after making the list avoids any awkwardness deciding how to handle the no-matches case (what if we match only dotfiles?)

  • doSingleImport() already has several similar checks (we ignore and log if a directory is matched, for instance), so it makes some sense logically to do this check in the same place.

A couple other brief notes from thinking through this approach:

  • You would still need to split off and inspect the user's pattern, to make sure it didn't start with a dot

    • Users should still be able to explicitly import dotfiles with e.g. import subdir/.*
  • I think it is sufficient to just check if the pattern portion starts with a dot and disable the filtering if it does, but I haven't thought through this thoroughly


Some further thoughts - these all optional I think, depends on what the maintainers think. Most of them could be added separately as enhancements, so if it seems like a lot you can probably ignore them!

  • We could emit a log message of some kind to tell the user if we found dotfiles that we're not including

    • I'm not sure what kind of version checking exists, but it might be possible to do this only on the first run after the release that includes this, since if people are relying on the existing behavior it would be nice to notify them...possibly too complicated though.
  • It might be useful to some sort of configuration option to restore the original behavior

    • Alternately, it could be off by default with an option to enable, maybe until the next major version?
  • We could add a special message in the zero-matches case where we matched (but excluded) dotfiles

@cincodenada thanks for the very good points! Seems very doable let me go back through and make that change.

@mholt
Copy link
Member

mholt commented Sep 21, 2022

@we-sell-bags Generally we try to avoid bash expansion in Caddy config, since (1) we're not bash, and (2) it tends to be surprising or lead to misconfigurations / vulnerabilities. (for example, Caddy often executes as a different user than the one who wrote the config)

@cincodenada
Copy link
Author

cincodenada commented Sep 21, 2022

@mholt I think @we-sell-bags wasn't talking about the "home" tilde, but rather a similar convention from Windows (see a couple related issues on Microsoft's support site).

I was not familiar with this convention, I'm not very familiar with Windows and certainly not running webservers on it, so I can't really speak to how universal it is. From the support links above, it looks like this convention is generally used in combination with the actual "hidden" flag, and that unhidden tilde-files are not treated specially by Windows Explorer (at least in recent versions of Windows). So, to me it seems less universal than dotfiles, which essentially are Unix's hidden flag, and probably don't make sense to treat specially, but again I am not very familiar with it, so I'll defer to folks who are.

That brings up another question though, which is, what is pathlib.Glob()'s behavior in Windows? Does it include files with the hidden attribute or not? I tried to research this briefly but couldn't find any documentation either way, in pathlib or afero.

Note: I initially thought @we-sell-bags was talking about a third kind of file, editor temporary files that end with a tilde, but after some research concluded I was assuming incorrectly. View the version from 2:31 PDT in the edit history on this post (that's just before the first 7-minute gap) if you care to see my argument about why we shouldn't do anything special for those 😅

@mholt
Copy link
Member

mholt commented Sep 21, 2022

Oh, I see. That's very unusual. 😛

Let's start with dotfiles, then if anyone is actually hitting a snag with tilde files, we can address that after getting their report/use case.

@we-sell-bags
Copy link

it is more MS deliberate corruption of the standards.

you don't need to be running this on windows...... you only need to have windows kit exposed the directory. with R/W for that user.... also i have seen windows spam all over a linux directory tree....

Personally I avoid it where possible , but running linux kit back end.. you get to see these things...

@simon04
Copy link
Contributor

simon04 commented Oct 2, 2022

For a related change in Go see golang/go@36dbf7f: //go:embed excludes hidden files unless the pattern starts with all:

@francislavoie
Copy link
Member

This is done as of #5320, this didn't get closed for some reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐞 Something isn't working good first issue 🐤 Good for newcomers
Projects
None yet
Development

No branches or pull requests

7 participants