-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #842 Support multiple ways for running Fluo applications #883
Conversation
When reviewing this pull request, take a look at the following repos & pull requests: |
;; | ||
list) | ||
check_hadoop | ||
java org.apache.fluo.cluster.command.FluoCommand $FLUO_HOME $HADOOP_PREFIX $1 app ${*:2} | ||
if [ -f "$conf/connection.properties" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of checking for non-existence, could check that connection.properties has no props set. This would allow old code that only modifies fluo.props to work w/o having to delete this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In f89f6eb, I changed the name of fluo.properties
to fluo.properties.deprecated
. If users want to use the old config file, they just need to change the name. Existence of both fluo.properties
and connection.properties
will cause an error in scripts.
@@ -57,44 +57,44 @@ extra) | |||
download aopalliance:aopalliance:jar:1.0 | |||
download ch.qos.logback:logback-classic:jar:1.1.3 ./logback | |||
download ch.qos.logback:logback-core:jar:1.1.3 ./logback | |||
download com.101tec:zkclient:jar:0.3 | |||
download com.101tec:zkclient:jar:0.3 ./twill |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could group all of these twill jars together in the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f89f6eb
docs/install.md
Outdated
| [connection.properties] | Configures connection to Fluo. Required for all commands. | | ||
| [application.properties] | Configuration passed to `fluo setup` when initializing Fluo application. | | ||
| [log4j.properties] | Configures logging | | ||
| [fluo.properties] | Deprecated Fluo configuration file | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could mention what replaces it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in da8968e
docs/install.md
Outdated
|
||
## Start your application | ||
cp conf/application.properties conf/myapp.properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should advocate copying to conf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
* | ||
* @param observerJarsUrl URL to observer jars directory | ||
*/ | ||
public FluoConfiguration setObserverJarsUrl(String observerJarsUrl) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
public String getAccumuloClasspath() { | ||
return getString(ADMIN_ACCUMULO_CLASSPATH_PROP, ADMIN_ACCUMULO_CLASSPATH_DEFAULT); | ||
} | ||
|
||
public FluoConfiguration setAccumuloJars(String path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -186,6 +166,30 @@ public String getString(String key, String defaultValue) { | |||
return internalConfig.getString(key, defaultValue); | |||
} | |||
|
|||
public void load(InputStream in) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since tag? javadoc? need to document if the load merges or clears before loading. Also need to document what the behavior is if multiple loads have the same prop, does the last one loaded always win?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
## Accumulo properties | ||
## ------------------- | ||
## Accumulo instance to connect to | ||
fluo.accumulo.instance=myInstance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was surprised to see the accumulo props here instead of connection props. I had expected to see them in connections props. But I don't have a good reason for where they should go. I need to work out the pros and cons of each location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With #893 I think storing the accumulo credentials in ZK is a good way to go
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express | ||
# or implied. See the License for the specific language governing permissions and limitations under | ||
# the License. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should document that this file is deprecated and only used if conn.props is not present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 241f237
@@ -47,7 +74,12 @@ setupClasspathFromSystem() | |||
test -z "$ACCUMULO_HOME" && ACCUMULO_HOME=/path/to/accumulo | |||
test -z "$ZOOKEEPER_HOME" && ZOOKEEPER_HOME=/path/to/zookeeper | |||
|
|||
CLASSPATH="$FLUO_HOME/lib/*:$FLUO_HOME/lib/logback/*" | |||
CLASSPATH="$lib/*" | |||
if [ -f "$conf/connection.properties" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs some comments explaining why it exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in d464bd9
echo " wait <app> Waits until all notifications are processed for <app>" | ||
echo " exec <app> <class> {<arg>} Executes <class> with <args> using classpath for <app>"; | ||
|
||
echo -e "\nDeprecated commands:\n" | ||
echo " new <app> Creates configuration for new application in apps/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably annotate deprecated commands in this output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in d464bd9
import org.apache.log4j.Level; | ||
import org.apache.log4j.Logger; | ||
|
||
public class FluoScan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should share code with the deprecated scan command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Is there a command to print the props stored in zookeeper? |
echo " classpath Prints the classpath setup in fluo-env.sh" | ||
echo " list Lists all Fluo applications in Fluo instance" | ||
echo " scan <app> Prints snapshot of data in Fluo <app>" | ||
echo " setup <appProps> {<arg>} Sets up Fluo application using <appProps>. Run with -h to see additional args." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried running the following. Would it make sense to not require the props file for -h?
$ fluo setup -h
Application properties file '-h' does not exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fixed now.. Command is now init
rather than setup
## Application properties | ||
## ---------------------- | ||
## Fluo application name | ||
fluo.connection.application.name= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be fluo.application.name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Properties cannot be named starting with fluo.app
due to prefix being used for user-specific properties
#fluo.observer.provider=com.foo.AppObserverProvider | ||
## Observer jars in this directory are stored away in HDFS during initialization | ||
#fluo.observer.init.dir=/path/to/observer/jars/ | ||
## Properties with a prefix of fluo.app.* are stored in zookeeper at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now everything is stored in zookeeper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
. "$bin"/impl/config.sh | ||
if [ -f "$conf/fluo-env.sh" ]; then | ||
. "$conf"/fluo-env.sh | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The help for script should print something about being deprecated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
## Zookeeper connection string specifying host and chroot where Fluo stores data. | ||
## A chroot directory suffix must be specified but doesn't need to be named | ||
## '/fluo'. If not specified, a Fluo application cannot be initialized. | ||
#fluo.connection.zookeepers=localhost/fluo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the reason for changing the prefix on these props from fluo.client
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FluoFactory can create many other things (not just FluoClient)...
* Split fluo.properties into connection.properties and application.properties * Renamed several properties to support split * Support loading FluoConfiguration from multiple files * Created main methods commands: FluoOracle, FluoWorker, FluoGetJars, FluoSetup, FluoList, FluoWait, FluoScan * Updated documentation to reflect changes
* Deprecated all classes in fluo-cluster module
Add 'fluo config' command in 96d1fbe |
@@ -134,6 +132,8 @@ void initialize(InitializationOptions opts) throws AlreadyInitializedException, | |||
*/ | |||
void updateSharedConfig(); | |||
|
|||
FluoConfiguration getSharedConfig(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs javadoc and since tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could have two methods here. One that returns exactly what config was passed to the factory and another that returns what config is stored in zookeeper. We could create another issue to support merging one simple config into another. This would be a good starter issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #894
docs/install.md
Outdated
|------------------------------|----------------------------------------------------------------------------------------------| | ||
| [fluo-env.sh] | Configures classpath for `fluo` script. Required for all commands. | | ||
| [fluo-conn.properties] | Configures connection to Fluo. Required for all commands. | | ||
| [fluo-app.properties] | Configuration passed to `fluo init` when initializing Fluo application. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could mention that this is a template for the file that will be passed to init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -50,24 +50,21 @@ After you obtain a Fluo distribution tarball, follow these steps to install Fluo | |||
1. Choose a directory with plenty of space and untar the distribution: | |||
|
|||
tar -xvzf fluo-1.1.0-incubating-bin.tar.gz | |||
cd fluo-1.1.0-incubating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very beginning of this file YARN is mentioned, seems like it should not be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
docs/install.md
Outdated
be in the `worker_*.log` file for each of your worker containers. | ||
The commands will retrieve your application configuration and observer jars (using your | ||
application name) before starting the oracle or worker process. The commands are designed | ||
to be run on a cluster to distribute the oracle and workers across multiple nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the wording of this is a bit confusing. For the phrase distribute the oracle and workers across multiple nodes
what is being distributed? The Fluo software? May be more clear to write After installing the Fluo software on each node, ....
Below it mentions installing software on each node, so maybe just the order the information is introduced needs to be reworked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
docs/install.md
Outdated
your YARN configuration `yarn-site.xml` (see [yarn-default.xml] for more info about this property). | ||
If you are running Fluo on single machine, this method works well. If you are running Fluo on a | ||
cluster, your containers and their logs will be spread across the cluster. | ||
The oracle & worker logs can be found in the `logs/` directory of your Fluo installation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the log files are currently prefixed with the app name. I was thinking we could put the logs in a dir that uses the app name instead. A dir like logs/apps/<app name>/
or logs/<app name>
. Could do this as a follow on issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
FluoConfiguration fluoConfig = new FluoConfiguration(connectionPropsFile); | ||
fluoConfig.setApplicationName(applicationName); | ||
CommandUtil.verifyAppInitialized(fluoConfig); | ||
fluoConfig = FluoAdminImpl.mergeZookeeperConfig(fluoConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this merge should be done. I think it would be less confusing to inject connections props + app name, the minimum needed to create a client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the merge because there are classes (specifically OptimizeTable used by phrasecount) that are written for the fluo exec
command and depend on the FluoConfiguration
that is passed in to have all configuration set. If we don't do the merge, we need to update these classes to grab all configuration from Zookeeper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I understand why the merge is needed now, trying to maintain the behavior of passing the $FLUO_HOME/apps/app/conf/fluo.properties
file. This is kinda unfortunate because it means we now have to connect to zookeeper before running the user code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All that is advertised in the docs is that this config can be used to create a client. I think it would be nice for the old config file to just pass it through and for the new config file only set the app name.
Need to do some sort of upgrade testing. This could be a follow on task/issue. |
As for upgrade testing, I was able to run phrasecount (using master branch) using a Fluo tarball produced PR. I had to delete |
} | ||
|
||
if (!accumuloClasspath.isEmpty()) { | ||
String contextName = "fluo-" + config.getApplicationName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice. I wonder if we need to be careful and check what characters are in the app name
/** | ||
* @since 1.2.0 | ||
*/ | ||
public String getHdfsRoot() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to drop the H and name the method getDfsRoot()
. Currently we only support Hadoop, but in the future we could support URIs for other dfs types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
try { | ||
fs.copyFromLocalFile(new Path(jarPath), new Path(hdfsDestDir)); | ||
} catch (IOException e) { | ||
logger.error("Failed to copy file {} to HDFS directory {}", jarPath, hdfsDestDir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not throw an exception when this fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
## implements org.apache.fluo.api.observer.ObserverProvider. | ||
#fluo.observer.provider=com.foo.AppObserverProvider | ||
## Observer jars in this directory are stored away in HDFS during initialization | ||
#fluo.observer.init.dir=/path/to/observer/jars/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to this prop, it seems like the new accumulo jars prop should be documented somewhere in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
String accumuloClasspath; | ||
if (!accumuloJars.isEmpty()) { | ||
accumuloClasspath = copyJarsToHdfs(accumuloJars, "lib/accumulo"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice change, putting the accumulo jars in a per app dfs dir.
conn.tableOperations().setProperty(config.getAccumuloTable(), | ||
AccumuloProps.TABLE_CLASSPATH, contextName); | ||
} | ||
|
||
if (config.getObserverJarsUrl().isEmpty() && !config.getObserverInitDir().trim().isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any test for diff combos of these configs? Combos of accumulo jars, init dir, observers jars, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added check to FluoAdminImpl.initialized()
Objects.requireNonNull(connConfig); | ||
Preconditions.checkArgument(connConfig.hasRequiredConnectionProps(), | ||
"missing required connection properties"); | ||
config = FluoAdminImpl.mergeZookeeperConfig(connConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this call creates a curator and closes it. I suspect later in this method a curator is created, could possibly refactor code to avoid creating multiple connections to ZK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #896
config.setProperty(prop, sharedProps.getProperty(prop)); | ||
tmpConfig.setProperty(prop, sharedProps.getProperty(prop)); | ||
} | ||
config = FluoAdminImpl.mergeZookeeperConfig(config); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could curator be passed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #896
echo " wait <app> Waits until all notifications are processed" | ||
echo " version Prints the version of Fluo" | ||
echo " exec <app> <class> {<argument>}"; | ||
echo " init <appProps> {<arg>} Initializes Fluo application using <appProps>. Run with -h to see additional args." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a thought, not sure if it will work. If this were of the form init <app name> <appProps> {args}
then it would be more consistent with other commands and the fluo-app.props file would not require the name prop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
export FLUO_LOG_ID="${cmd}_$(hostname)_$(date +%s)" | ||
|
||
JAVA_OPTS=("-Dlog4j.configuration=file:${FLUO_LOG4J_CONFIG}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like these JAVA_OPTS are only used for services? If so, I think the name should changes. At first I thought maybe all of the commands would use this (like fluo init
). Also, if all of the logging stuff is only used for services it would be nice to group all of the service config into a section (delimited by the ############
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.. changed JAVA_OPTS
to SERVICE_OPTS
@@ -41,14 +41,14 @@ | |||
|
|||
@Test | |||
public void testDefaults() { | |||
Assert.assertEquals(FluoConfiguration.CLIENT_ZOOKEEPER_CONNECT_DEFAULT, | |||
Assert.assertEquals(FluoConfiguration.CONNECTION_ZOOKEEPERS_DEFAULT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try running this test with code coverage? If not I can try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I didn't
@mikewalch re upgrade testing. I am thinking of initializing and starting an app with 1.0 or 1.1, stopping the app, and then trying to start it with 1.2.0-SNAPSHOT. |
################################################################## | ||
##################################################################### | ||
# Build SERVICE_OPTS variable. Defaults below work but can be edited. | ||
##################################################################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should all of the logging stuff move to this new section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in bebb493. Let me know what you think.
Made this change online using the github editor
application.properties
FluoGetJars, FluoSetup, FluoList, FluoWait, FluoScan