Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #842 Support multiple ways for running Fluo applications #883

Merged
merged 25 commits into from
Aug 4, 2017

Conversation

mikewalch
Copy link
Member

  • Split fluo.properties into connection.properties and
    application.properties
  • Renamed several properties to support split
  • Support loading FluoConfiguration from multiple files
  • Created main methods commands: FluoOracle, FluoWorker,
    FluoGetJars, FluoSetup, FluoList, FluoWait, FluoScan
  • Updated documentation to reflect changes

@mikewalch
Copy link
Member Author

When reviewing this pull request, take a look at the following repos & pull requests:

;;
list)
check_hadoop
java org.apache.fluo.cluster.command.FluoCommand $FLUO_HOME $HADOOP_PREFIX $1 app ${*:2}
if [ -f "$conf/connection.properties" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking for non-existence, could check that connection.properties has no props set. This would allow old code that only modifies fluo.props to work w/o having to delete this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In f89f6eb, I changed the name of fluo.properties to fluo.properties.deprecated. If users want to use the old config file, they just need to change the name. Existence of both fluo.properties and connection.properties will cause an error in scripts.

@@ -57,44 +57,44 @@ extra)
download aopalliance:aopalliance:jar:1.0
download ch.qos.logback:logback-classic:jar:1.1.3 ./logback
download ch.qos.logback:logback-core:jar:1.1.3 ./logback
download com.101tec:zkclient:jar:0.3
download com.101tec:zkclient:jar:0.3 ./twill
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could group all of these twill jars together in the file

Copy link
Member Author

@mikewalch mikewalch Jul 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f89f6eb

docs/install.md Outdated
| [connection.properties] | Configures connection to Fluo. Required for all commands. |
| [application.properties] | Configuration passed to `fluo setup` when initializing Fluo application. |
| [log4j.properties] | Configures logging |
| [fluo.properties] | Deprecated Fluo configuration file |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could mention what replaces it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in da8968e

docs/install.md Outdated

## Start your application
cp conf/application.properties conf/myapp.properties
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should advocate copying to conf.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

*
* @param observerJarsUrl URL to observer jars directory
*/
public FluoConfiguration setObserverJarsUrl(String observerJarsUrl) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since tag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

public String getAccumuloClasspath() {
return getString(ADMIN_ACCUMULO_CLASSPATH_PROP, ADMIN_ACCUMULO_CLASSPATH_DEFAULT);
}

public FluoConfiguration setAccumuloJars(String path) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since tag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -186,6 +166,30 @@ public String getString(String key, String defaultValue) {
return internalConfig.getString(key, defaultValue);
}

public void load(InputStream in) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since tag? javadoc? need to document if the load merges or clears before loading. Also need to document what the behavior is if multiple loads have the same prop, does the last one loaded always win?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

## Accumulo properties
## -------------------
## Accumulo instance to connect to
fluo.accumulo.instance=myInstance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised to see the accumulo props here instead of connection props. I had expected to see them in connections props. But I don't have a good reason for where they should go. I need to work out the pros and cons of each location.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With #893 I think storing the accumulo credentials in ZK is a good way to go

# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing permissions and limitations under
# the License.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should document that this file is deprecated and only used if conn.props is not present.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 241f237

@@ -47,7 +74,12 @@ setupClasspathFromSystem()
test -z "$ACCUMULO_HOME" && ACCUMULO_HOME=/path/to/accumulo
test -z "$ZOOKEEPER_HOME" && ZOOKEEPER_HOME=/path/to/zookeeper

CLASSPATH="$FLUO_HOME/lib/*:$FLUO_HOME/lib/logback/*"
CLASSPATH="$lib/*"
if [ -f "$conf/connection.properties" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs some comments explaining why it exists

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in d464bd9

echo " wait <app> Waits until all notifications are processed for <app>"
echo " exec <app> <class> {<arg>} Executes <class> with <args> using classpath for <app>";

echo -e "\nDeprecated commands:\n"
echo " new <app> Creates configuration for new application in apps/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably annotate deprecated commands in this output

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in d464bd9

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class FluoScan {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should share code with the deprecated scan command

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@keith-turner
Copy link
Contributor

Is there a command to print the props stored in zookeeper?

echo " classpath Prints the classpath setup in fluo-env.sh"
echo " list Lists all Fluo applications in Fluo instance"
echo " scan <app> Prints snapshot of data in Fluo <app>"
echo " setup <appProps> {<arg>} Sets up Fluo application using <appProps>. Run with -h to see additional args."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried running the following. Would it make sense to not require the props file for -h?

$ fluo setup -h
Application properties file '-h' does not exist

Copy link
Member Author

@mikewalch mikewalch Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed now.. Command is now init rather than setup

## Application properties
## ----------------------
## Fluo application name
fluo.connection.application.name=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be fluo.application.name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Properties cannot be named starting with fluo.app due to prefix being used for user-specific properties

#fluo.observer.provider=com.foo.AppObserverProvider
## Observer jars in this directory are stored away in HDFS during initialization
#fluo.observer.init.dir=/path/to/observer/jars/
## Properties with a prefix of fluo.app.* are stored in zookeeper at
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now everything is stored in zookeeper.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

. "$bin"/impl/config.sh
if [ -f "$conf/fluo-env.sh" ]; then
. "$conf"/fluo-env.sh
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The help for script should print something about being deprecated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

## Zookeeper connection string specifying host and chroot where Fluo stores data.
## A chroot directory suffix must be specified but doesn't need to be named
## '/fluo'. If not specified, a Fluo application cannot be initialized.
#fluo.connection.zookeepers=localhost/fluo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason for changing the prefix on these props from fluo.client?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* Split fluo.properties into connection.properties and
  application.properties
* Renamed several properties to support split
* Support loading FluoConfiguration from multiple files
* Created main methods commands: FluoOracle, FluoWorker,
  FluoGetJars, FluoSetup, FluoList, FluoWait, FluoScan
* Updated documentation to reflect changes
* Deprecated all classes in fluo-cluster module
@mikewalch
Copy link
Member Author

Is there a command to print the props stored in zookeeper?

Add 'fluo config' command in 96d1fbe

@@ -134,6 +132,8 @@ void initialize(InitializationOptions opts) throws AlreadyInitializedException,
*/
void updateSharedConfig();

FluoConfiguration getSharedConfig();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs javadoc and since tag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could have two methods here. One that returns exactly what config was passed to the factory and another that returns what config is stored in zookeeper. We could create another issue to support merging one simple config into another. This would be a good starter issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #894

docs/install.md Outdated
|------------------------------|----------------------------------------------------------------------------------------------|
| [fluo-env.sh] | Configures classpath for `fluo` script. Required for all commands. |
| [fluo-conn.properties] | Configures connection to Fluo. Required for all commands. |
| [fluo-app.properties] | Configuration passed to `fluo init` when initializing Fluo application. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could mention that this is a template for the file that will be passed to init

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -50,24 +50,21 @@ After you obtain a Fluo distribution tarball, follow these steps to install Fluo
1. Choose a directory with plenty of space and untar the distribution:

tar -xvzf fluo-1.1.0-incubating-bin.tar.gz
cd fluo-1.1.0-incubating
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very beginning of this file YARN is mentioned, seems like it should not be

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

docs/install.md Outdated
be in the `worker_*.log` file for each of your worker containers.
The commands will retrieve your application configuration and observer jars (using your
application name) before starting the oracle or worker process. The commands are designed
to be run on a cluster to distribute the oracle and workers across multiple nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the wording of this is a bit confusing. For the phrase distribute the oracle and workers across multiple nodes what is being distributed? The Fluo software? May be more clear to write After installing the Fluo software on each node, .... Below it mentions installing software on each node, so maybe just the order the information is introduced needs to be reworked?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

docs/install.md Outdated
your YARN configuration `yarn-site.xml` (see [yarn-default.xml] for more info about this property).
If you are running Fluo on single machine, this method works well. If you are running Fluo on a
cluster, your containers and their logs will be spread across the cluster.
The oracle & worker logs can be found in the `logs/` directory of your Fluo installation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the log files are currently prefixed with the app name. I was thinking we could put the logs in a dir that uses the app name instead. A dir like logs/apps/<app name>/ or logs/<app name> . Could do this as a follow on issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

FluoConfiguration fluoConfig = new FluoConfiguration(connectionPropsFile);
fluoConfig.setApplicationName(applicationName);
CommandUtil.verifyAppInitialized(fluoConfig);
fluoConfig = FluoAdminImpl.mergeZookeeperConfig(fluoConfig);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this merge should be done. I think it would be less confusing to inject connections props + app name, the minimum needed to create a client.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the merge because there are classes (specifically OptimizeTable used by phrasecount) that are written for the fluo exec command and depend on the FluoConfiguration that is passed in to have all configuration set. If we don't do the merge, we need to update these classes to grab all configuration from Zookeeper.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand why the merge is needed now, trying to maintain the behavior of passing the $FLUO_HOME/apps/app/conf/fluo.properties file. This is kinda unfortunate because it means we now have to connect to zookeeper before running the user code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All that is advertised in the docs is that this config can be used to create a client. I think it would be nice for the old config file to just pass it through and for the new config file only set the app name.

@keith-turner
Copy link
Contributor

Need to do some sort of upgrade testing. This could be a follow on task/issue.

@mikewalch
Copy link
Member Author

As for upgrade testing, I was able to run phrasecount (using master branch) using a Fluo tarball produced PR. I had to delete fluo-conn.properties (as expected) but everything else worked out of the box.

}

if (!accumuloClasspath.isEmpty()) {
String contextName = "fluo-" + config.getApplicationName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice. I wonder if we need to be careful and check what characters are in the app name

/**
* @since 1.2.0
*/
public String getHdfsRoot() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to drop the H and name the method getDfsRoot(). Currently we only support Hadoop, but in the future we could support URIs for other dfs types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

try {
fs.copyFromLocalFile(new Path(jarPath), new Path(hdfsDestDir));
} catch (IOException e) {
logger.error("Failed to copy file {} to HDFS directory {}", jarPath, hdfsDestDir);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not throw an exception when this fails?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

## implements org.apache.fluo.api.observer.ObserverProvider.
#fluo.observer.provider=com.foo.AppObserverProvider
## Observer jars in this directory are stored away in HDFS during initialization
#fluo.observer.init.dir=/path/to/observer/jars/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to this prop, it seems like the new accumulo jars prop should be documented somewhere in this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


String accumuloClasspath;
if (!accumuloJars.isEmpty()) {
accumuloClasspath = copyJarsToHdfs(accumuloJars, "lib/accumulo");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice change, putting the accumulo jars in a per app dfs dir.

conn.tableOperations().setProperty(config.getAccumuloTable(),
AccumuloProps.TABLE_CLASSPATH, contextName);
}

if (config.getObserverJarsUrl().isEmpty() && !config.getObserverInitDir().trim().isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any test for diff combos of these configs? Combos of accumulo jars, init dir, observers jars, etc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added check to FluoAdminImpl.initialized()

Objects.requireNonNull(connConfig);
Preconditions.checkArgument(connConfig.hasRequiredConnectionProps(),
"missing required connection properties");
config = FluoAdminImpl.mergeZookeeperConfig(connConfig);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this call creates a curator and closes it. I suspect later in this method a curator is created, could possibly refactor code to avoid creating multiple connections to ZK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #896

config.setProperty(prop, sharedProps.getProperty(prop));
tmpConfig.setProperty(prop, sharedProps.getProperty(prop));
}
config = FluoAdminImpl.mergeZookeeperConfig(config);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could curator be passed here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #896

echo " wait <app> Waits until all notifications are processed"
echo " version Prints the version of Fluo"
echo " exec <app> <class> {<argument>}";
echo " init <appProps> {<arg>} Initializes Fluo application using <appProps>. Run with -h to see additional args."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a thought, not sure if it will work. If this were of the form init <app name> <appProps> {args} then it would be more consistent with other commands and the fluo-app.props file would not require the name prop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


export FLUO_LOG_ID="${cmd}_$(hostname)_$(date +%s)"

JAVA_OPTS=("-Dlog4j.configuration=file:${FLUO_LOG4J_CONFIG}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like these JAVA_OPTS are only used for services? If so, I think the name should changes. At first I thought maybe all of the commands would use this (like fluo init). Also, if all of the logging stuff is only used for services it would be nice to group all of the service config into a section (delimited by the ############)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.. changed JAVA_OPTS to SERVICE_OPTS

@@ -41,14 +41,14 @@

@Test
public void testDefaults() {
Assert.assertEquals(FluoConfiguration.CLIENT_ZOOKEEPER_CONNECT_DEFAULT,
Assert.assertEquals(FluoConfiguration.CONNECTION_ZOOKEEPERS_DEFAULT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try running this test with code coverage? If not I can try that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I didn't

@keith-turner
Copy link
Contributor

@mikewalch re upgrade testing. I am thinking of initializing and starting an app with 1.0 or 1.1, stopping the app, and then trying to start it with 1.2.0-SNAPSHOT.

##################################################################
#####################################################################
# Build SERVICE_OPTS variable. Defaults below work but can be edited.
#####################################################################
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should all of the logging stuff move to this new section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bebb493. Let me know what you think.

mikewalch and others added 2 commits August 3, 2017 13:31
Made this change online using the github editor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants