
[SPARK-2913] Place our log4j.properties at the head of the classpath. #1844

Closed · wants to merge 1 commit · 7 participants
@JoshRosen (Contributor) commented on Aug 8, 2014

This addresses SPARK-2913, an issue where changes to log4j.properties didn't take effect in our default Spark EC2 configuration because Hadoop's configuration directory would end up ahead of Spark's on the computed CLASSPATH.
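The idea can be sketched in shell terms as prepending Spark's configuration directory to the classpath, so its log4j.properties shadows copies in later entries. The paths below are illustrative placeholders, not the actual contents of compute-classpath.sh:

```shell
#!/bin/sh
# Sketch only: put Spark's conf directory at the head of the classpath so
# its log4j.properties is found before copies in later entries (such as
# Hadoop's conf dir). Directory names are hypothetical.
SPARK_CONF_DIR="/opt/spark/conf"
HADOOP_CONF_DIR="/etc/hadoop/conf"
CLASSPATH="$SPARK_CONF_DIR:$HADOOP_CONF_DIR:$CLASSPATH"

# The JVM searches classpath entries left to right, so the first entry wins:
echo "$CLASSPATH" | cut -d: -f1   # → /opt/spark/conf
```

Because resource lookup stops at the first match, whichever conf directory comes first on the classpath decides which log4j.properties actually loads.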

@SparkQA commented on Aug 8, 2014

QA tests have started for PR 1844. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18161/consoleFull

@SparkQA commented on Aug 8, 2014

QA results for PR 1844:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18161/consoleFull

@aarondav (Contributor) commented on Aug 8, 2014

There was also this PR I made approximately forever ago to accomplish a similar goal, but just for spark-ec2: https://github.com/mesos/spark-ec2/pull/56/files

@JoshRosen (Contributor, author) commented on Aug 8, 2014

Any thoughts on this approach of doing it in Spark vs spark-ec2? The current configuration docs just tell users "edit Spark's log4j.properties" without any classpath-ordering caveats. I think it will be really confusing to users if these instructions don't work out of the box in the most common configurations.

The only downside that I see here is that users might want to place some other log4j.properties further ahead on the classpath. In that case, they could just symlink the other file into Spark's log4j.properties location.
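The symlink workaround described above might look like the following sketch, where every path is a made-up placeholder:

```shell
#!/bin/sh
# Hypothetical workaround sketch: symlink the user's own log4j.properties
# into Spark's conf location, so that file is the one found first on the
# classpath. All directory names below are invented for this demo.
mkdir -p /tmp/demo/spark-conf /tmp/demo/myapp
printf 'log4j.rootCategory=WARN, console\n' > /tmp/demo/myapp/log4j.properties
ln -sf /tmp/demo/myapp/log4j.properties /tmp/demo/spark-conf/log4j.properties

# Spark would now pick up the user's settings from its own conf dir:
cat /tmp/demo/spark-conf/log4j.properties
# → log4j.rootCategory=WARN, console
```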

@pwendell (Contributor) commented on Aug 8, 2014

I think it's reasonable to handle this specially in Spark itself. @aarondav, what do you think?

@shivaram (Contributor) commented on Aug 8, 2014

One thing that's good to keep in mind is that applications written on top of Spark often have their own log4j.properties, core-site.xml, etc. So putting Spark's conf ahead of everything else could make it hard for users to override.

Changing this in spark-ec2, however, should be fine, as that is a custom setup just for trying out Spark.

@JoshRosen (Contributor, author) commented on Aug 8, 2014

@shivaram Good points. We probably want Spark's log4j.properties to be used for interactive shells and when launching Spark standalone masters and workers, but I can see cases where the user might prefer if spark-submit used the log4j.properties file from their application JAR.

Even if we decide to only change the code in spark-ec2, I think we should still update the documentation to be clearer about possible sources of log4j.properties (and maybe explain how to debug these issues with SPARK_PRINT_LAUNCH_COMMAND=1).

I can look into this, but do you know whether the application JAR takes precedence over the other classpath entries so that its log4j.properties file will always be preferred?
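As a rough illustration of the precedence question: since the class loader serves the first log4j.properties it finds, a small script can predict which copy wins just by walking the classpath left to right. All paths and the helper name below are invented for this sketch:

```shell
#!/bin/sh
# Demo: walk a classpath left to right and report the first entry that
# contains log4j.properties -- the copy the class loader would serve.
# first_log4j and all paths are hypothetical, made up for illustration.
first_log4j() {
    old_ifs=$IFS
    IFS=:
    for entry in $1; do
        if [ -f "$entry/log4j.properties" ]; then
            IFS=$old_ifs
            echo "$entry/log4j.properties"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

mkdir -p /tmp/cpdemo/hadoop-conf /tmp/cpdemo/spark-conf
touch /tmp/cpdemo/spark-conf/log4j.properties

first_log4j "/tmp/cpdemo/hadoop-conf:/tmp/cpdemo/spark-conf"
# prints /tmp/cpdemo/spark-conf/log4j.properties
```

Here only the second entry carries the file, so it wins; if both entries carried one, the earlier entry would shadow the later one, which is exactly the SPARK-2913 symptom.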

@shivaram (Contributor) commented on Aug 8, 2014

Agree about updating the documentation.

I don't know if application jars take precedence when you use SparkSubmit. On the executor side, I think the JVMs are always launched using spark-class, which means they should use compute-classpath.sh. Also, my guess is that since logging is initialized very early on, it might be too late if the application's log4j.properties is only added to the class loader via addJar or something like that.

@aarondav @pwendell could probably throw more light on this.

@sryza (Contributor) commented on Aug 8, 2014

On the executor side, framework jars come first unless spark.files.userClassPathFirst is set to true.
At least for Spark on YARN, executors are not launched with spark-class.
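For reference, the switch mentioned above is set like any other Spark conf property. This is a non-runnable sketch: the application class and jar are hypothetical placeholders, and the flag's exact name and availability vary by Spark version:

```shell
# Sketch only: ask executors to prefer user jars over framework jars.
# com.example.MyApp and myapp.jar are hypothetical placeholders.
./bin/spark-submit \
  --conf spark.files.userClassPathFirst=true \
  --class com.example.MyApp \
  myapp.jar
```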

@JoshRosen (Contributor, author) commented on Aug 20, 2014

I'm going to close this for now and revisit it later; there's some additional complexity regarding which log4j.properties file takes precedence when launching workers.

@JoshRosen closed this on Aug 20, 2014

@mammothcm commented on commit de5bf18 on Nov 21, 2014

I think this solution is a little tricky. Why not just remove the log4j12Initialized check in Logging.scala?

Owner (author) replied on Nov 23, 2014

@mammothcm Do you mind posting your suggestion over at the SPARK-2913 JIRA, since it will be more likely to be discussed over there?
