Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beesd script is not accepting options with arguments #205

Closed
JaviVilarroig opened this issue Nov 21, 2021 · 9 comments
Closed

Beesd script is not accepting options with arguments #205

JaviVilarroig opened this issue Nov 21, 2021 · 9 comments

Comments

@JaviVilarroig
Copy link
Contributor

I have been trying to use beesd with the parameter --thread-count 2 to limit the CPU but fails. After some testing I found that it's taking the 2 as a parameter. Then it consider that is an unsupported parameter and bailing out to show the help message and stop.

I wonder if I'm missing something on how the parameters must be given to beesd.

If not I can voluteer to prepare a PR wit an improved script.

@kakra
Copy link
Contributor

kakra commented Nov 21, 2021

Use --thread-count=2

BTW: Options with arguments are supposed to be used with =, it may be that some utilities break that convention. Only single letter options allow to directly use the argument (without space) behind it, unless followed by other single letter options.

@JaviVilarroig
Copy link
Contributor Author

Sorry, I was not detailed enough.
I have already tried --tread-count=2, -c 2 and -c=2. And none of them works for me.

@eightys3v3n
Copy link

Added echo "${ARGUMENTS[@]}" to the second last line in beesd.

Running with the arguments --thread-count X --loadavg-target Y prints --thread-count --loadavg-target.
Running with the arguments --thread-count=X --loadavg-target=Y fails because the loop that checks if an argument is supported checks if the entire argument is equal to a known argument. So it checks if "--thread-count=X == --thread-count"

The argument parser needs to be changed, I think, so it either accepts arguments to options, or checks if an option is supported by ignoring anything after an = sign.

@kakra
Copy link
Contributor

kakra commented Nov 23, 2021

Yeah I think there's a bug, I just looked at my old configuration: I passed the options to bees via OPTIONS= and that worked. Meanwhile, I don't use the script at all, I use the bees daemon directly which just works.

The script has multiple problems anyways, i.e. it expects .beeshome to be a subvolume even when not stored on a btrfs volume.

The problem with the arguments is that it parses the output from bees --help to figure out the supported options, and then just splits the word by word into two array: supported and unsupported options. It'll then assume that all unsupported options are a UUIDs to mount and pass to bees.

That's problematic from two perspectives: bees only supports a single filesystem only for some while now, there's no need to multiple path arguments, and it assumes everything that's not a word from the help to be a separate argument even when it doesn't match a UUID format. It should probably reverse the logic and look for arguments that look like a path or UUID, and just extract those, and pass the rest as-is.

@JaviVilarroig
Copy link
Contributor Author

I can rewrite the script to use getopt to grab options.

Any issue with that approach?

@kakra
Copy link
Contributor

kakra commented Nov 25, 2021

Personally, I'd prefer if bees itself would do some of the things right within the C++ code (creating and size adjustments of the hash file), and only mounting deferred to outside of the process. But I think @Zygo as a different idea of how to split that due to reducing dependencies, and that is valid.

If you ask me, starting bees right within the correct directory, it should figure out automatically most things, then we could delegate mounting to the systemd process with maybe a small pre-exec script which adjusts the environment. The bash script should not bother with any bees arguments: If it sees a path or uuid, let it extract that, the rest should be passed as-is to bees. Ideally, it doesn't need any getopt at all. No fancy parsing of the bees --help output, no getopt.

Some ideas:

Just go through the arguments and either put them in a pass list, or if it starts with / or is in UUID format, handle that and put the converted argument to the pass list. Everything else would be duplicated effort because you'd need to maintain getopt formats in two places.

I could imagine that bees could have a "secret" argument like --check-params and if passed, it just checks all parameters for sanity and then exists with either 0 or 1, so the starter script would now early if the command line works.

For systemd mode, a ExecStartPre script could parse the configuration files and export the proper environment variables, then just exit, and let bees run directly. bees may internally need some additional support for environment variables but that should be all. That's similar to how ElasticSearch works setting up its environment.

What do you think?

@kakra
Copy link
Contributor

kakra commented Nov 25, 2021

BTW: There was an attempt by me to add configuration file support into bees natively, but I lost track of that during some personal live changes. The idea has not died yet but the branch is very outdated and I'd need to recap what I did there. Maybe you can find it useful: https://github.com/kakra/bees/commits/feature/configuration-files

@JaviVilarroig
Copy link
Contributor Author

Well. I just have a problem with CPU consumption ant trying to fix it, a shell script fix is fast and easy. I'm a pragmatic person an look for direct solutions to my problems. 😄

That said. I don't like too much the current bees aproach to configuration and systemd interaction. Having to type the UUID of the drive every time I want to start or stop bees, for example.

I agree that adding using getopt is adding an additional dependency. But I don't know any distribution that does not packaged it by default 😉 There's also the option to forget about long arguments, limit it to short options an then use the bash built-in getopts to parse the option. This my approach at work when server writing scripts (among other things right now I work as a sysadmin).

Having a configuration file per btrfs set I think is OK. I don't want to give the same treatment to each of my btrf sets. But I agree that it must be bees itself reading the configuration file. I fact, I will put all the options in the configuration file. Having to modify the systemd unit file look ugly for me.

Still I consider that beesd is buggy right now and having a fast and working fix to it is a must. If not, the final users like me will end having search for their own solution. And this is bad.

So I propose to fix beesd script to work as documented and later on look for improving it.

Later on I can take a look at the bees code itself and have it looking for the configuration file and taking the configuration from there. I just need to find some time for that.

Expect to have two PR's coming from me. One for the script fix, and later on one for bees to read the configuration from the config file.

Can you buy this? 😉

@Zygo
Copy link
Owner

Zygo commented Nov 26, 2021

Personally, I'd prefer if bees itself would do some of the things right within the C++ code (creating and size adjustments of the hash file), and only mounting deferred to outside of the process. But I think @Zygo as a different idea of how to split that due to reducing dependencies, and that is valid.

I might have had ideas but they were a long time ago...

I got tired of dealing with multiple incompatible implementations of libuuid so I ripped out the code in bees that handles uuids. I guess we only need the uuid-to-string function from that library, so we could just import it instead of trying to sync up with a lib that users have the other versions of.

For configuration, we need a solution that scales up to a lot of options. I'm thinking of having a simple 'name = value' inheritable dictionary-based configuration format. The names might follow a vaguely hierarchical convention like hash-algorithm and hash-bits.

  1. We put the defaults in a text file, turn them into C code, and compile them into the binary to set the defaults. That same text file can provide the help message text. Or alternatively we write it in C and then have a binary that dumps it out in different formats once compiled. Either way, there should be only one source for these that generates the other.
  2. Default-constructing a BeesConfig gives you a pile of filled-in configuration variables from the defaults. Also tests the parser for the non-trivial option value formats.
  3. We read the command line looking for --config file and --option name=value options, and set those options. We make the variable currently known as $BEESHOME one of those options. All the existing command line options also map to "bees options," and there are no new command-line options after that.
  4. We read /etc/bees/bees.conf (a file named by config.global) for more options
  5. We read $BEESHOME/bees.conf (a file named by config.local) for more options
  6. We hand the parsed options object to BeesContext's constructor, and bees runs.

Order and precedence gets a little wonky because we have to physically process item 3 before item 5, but we want the stuff in item 5 to be overridden by item 3. Not hard to solve with a priority system.

The stuff in the first 100 lines of src/bees.h becomes an option, and there's a clear path for adding more options, which we'll need if we start doing things like excluding subvols by path regexp, or specifying in fine detail how to slice and dice btrfs csums into bees hashes (that's 6 to 8 parameters by itself) or what sizes to use for extent size tiering (that's up to a 5x4 table of parameters describing a Venn diagram of what to include and exclude and what order to do things that are included).

So after we read all that, we have all the stuff in one place that is currently in three places:

  • environment variables BEESHOME and BEESSTATUS
  • command-line options parsed by getopt
  • src/bees.h
  • places beesd stores stuff outside of bees

With a parsed config we would have a specification for a hash table format. Then we look at the hash table that's on disk, and if it's different (or missing), we either abort with an error, or we convert the data from the format it's in to the format specified in the configuration. A config option tells us which of those to do.

Anyway that's all future stuff. For now, hack and slash the shell script until it works, if not properly, then at least usably.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants