Explore supporting external compaction processes in accumulo-cluster script #2138
Rather than create new hosts files, I would like to see the hosts files consolidated into a single accumulo-cluster.conf file which includes the hosts for all the services the accumulo-cluster script manages (there may be an existing ticket for this... I didn't check before writing this). If host configuration were streamlined into a single config file, it would be easier to specify the startup of additional services, perhaps even with separate config for each host. |
That sounds interesting. Any thoughts on the structure and how compactor+queue info might fit into that? For compactors I was thinking of the following possibilities, had not considered a single file.
(comment edited by @ctubbsii to view format of option 3 better, and to number the options, to make it easier to reference) |
Of the three, option 3 is closest to what I was thinking. However, if it was done in a single file, it would strongly depend on the format for that file. If it were YAML, it might look something like:
managers:
  - host1
gc:
  - host1
tservers:
  - host2
  - host3
compaction:
  coordinator: host4
  queues:
    -
      name: queue1
      workers: [ host5, host6 ]
    -
      name: queue2
      workers:
        - host7
        - host8
        - host9
|
Would you expect that |
I have not done anything w/ yaml using command line tools. I wonder if there are tools that would make it possible to convert the existing files to yaml in a few lines of bash. If so the script could detect the old files and just print an error message w/ the suggested commands to run to convert them. |
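One way the detection described above might look, as a rough sketch only. The legacy file names (`tservers`, `managers`, `gc`, `monitor`) and the function name `check_legacy_hosts` are assumptions for illustration, and the message deliberately leaves out the actual conversion commands, since none have been settled on here.

```bash
#!/usr/bin/env bash
# Sketch only: warn when old-style per-service hosts files are still present.
# The file names in the loop below are assumptions for illustration.
check_legacy_hosts() {
  local conf_dir="$1"
  local found=()
  local name
  for name in tservers managers gc monitor; do
    if [[ -f "$conf_dir/$name" ]]; then
      found+=("$name")
    fi
  done
  if (( ${#found[@]} > 0 )); then
    # Report on stderr and signal failure so the caller can stop early.
    echo "Found old hosts files in $conf_dir: ${found[*]}" >&2
    echo "Convert them to the single cluster config file before continuing." >&2
    return 1
  fi
  return 0
}
```

A caller could run `check_legacy_hosts "$conf" || exit 1` before starting any services, so an old layout fails loudly instead of being silently ignored.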
Install
So, for this file:
These commands work:
|
I've never used these. I'm not sure what we'd use to parse. However, I don't think it'd be unreasonable to add a yaml parser dependency for the |
I am concerned about adding |
It looks like to install |
If you do We could also do something in Java. We already have commons-configuration2 on the classpath, and it supports YAML. We already call Java in the accumulo-cluster script for ZooZap. It wouldn't be hard to write something small that parses our config file for us and extracts the relevant bits. |
JSON would work too. Unfortunately, JSON does not allow comments in files, so you would need two different types of files depending on whether or not you are using external compactions. The examples below are with and without external compactions.
|
JSON is uglier, but more ubiquitous. Ultimately, I think I'd prefer YAML, but if there are things we can do to make it depend on less (such as having a bit of java code using one of our existing dependencies) to make the bar lower, that would be nice. If not, I think I'd still rather users get Could also do simple properties files as well, but the schema would be a little bit frustrating. Commons-configuration2 could easily manage it, though... but if we're going that route, it'd be better to just use YAML, since commons-configuration2 can handle that too.
|
One approach to supporting yaml is that we write a Java program to parse the yaml and emit text that is easy to deal with in bash. This avoids the dependency on accumulo org.apache.accumulo.core.cluster.YamlExtract <cluster conf file> The java program could extract specific information based on arguments OR convert the yaml to CSV like the following that is easy to process in bash.
Not sure if it's better to make the Java program smart w/ lots of options, or make it dumb and spit out something like CSV, moving more logic to bash. Making the Java program smarter would look like the following, where the list of tservers is extracted from the yaml file.
$ accumulo org.apache.accumulo.core.cluster.YamlExtract --server-type tserver <cluster conf file>
host1
host2
host3 |
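To show how little bash is needed when the Java side emits one host per line, here is a hedged sketch. `extract_hosts` is a stand-in for the proposed `YamlExtract` invocation (it only prints sample hosts so the snippet runs on its own), and `start_on_hosts` is a hypothetical helper that echoes what it would do rather than using ssh.

```bash
#!/usr/bin/env bash
# Stand-in for the proposed extractor; prints sample data so the sketch runs
# without Accumulo. Real usage would call the Java program instead.
extract_hosts() {
  local server_type="$1"
  case "$server_type" in
    tserver) printf '%s\n' host1 host2 host3 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: read one host per line and act on each.
start_on_hosts() {
  local server_type="$1"
  local hosts host
  # Assign separately from 'local' so the extractor's exit status is kept.
  hosts="$(extract_hosts "$server_type")" || {
    echo "No hosts found for $server_type" >&2
    return 1
  }
  while IFS= read -r host; do
    [[ -z "$host" ]] && continue
    echo "would start $server_type on $host"
  done <<< "$hosts"
}

start_on_hosts tserver
```

With the extractor doing the parsing, the script side stays a plain read loop, which is the appeal of the "smart Java, dumb bash" split.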
@keith-turner that's basically what I was thinking, since we probably have everything we need on the class path already in |
I didn't see your earlier message; I see it now. It seems like doing the heavy lifting in Java is the best way to go, especially if we already have the needed deps on the classpath.
I think I do too. I also agree that we want to avoid creating something like yq or shyaml in Java. The Java code and options should be tailored to the accumulo cluster file format. |
Modified the cluster start/stop scripts to use a new file (cluster.yml) for defining the hosts that will run the different server components. Added a class (and test) that parses the yaml file into a form that is usable by the scripts. Added ability to specify and start/stop the external compaction server processes. Closes apache#2138
@keith-turner @ctubbsii - take a look at the draft PR, let me know if this is what you had in mind. I have not tested it yet. |
Modified the cluster start/stop scripts to use a new file (cluster.yaml) for defining the hosts that will run the different server components. Added a class (and test) that parses the yaml file into a form that is usable by the scripts. Added ability to specify and start/stop the external compaction server processes. Closes #2138
Co-authored-by: Keith Turner <kturner@apache.org>
The accumulo-cluster script could be modified to support starting coordinator and compactor processes for external compactions if the corresponding files exist. Not sure if this is a good idea in general, but thought it was worth considering. To help determine whether this is worth pursuing, I came up with the following list of pros and cons. What other things are there to consider?
Pros
Cons