Duplicated jars in classpath when submitting a spark job with Elasticsearch to YARN #1097

Closed

eubnara opened this issue Jan 21, 2018 · 2 comments

eubnara commented Jan 21, 2018

What kind of issue is this?

  • Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
    The easier it is to track down the bug, the faster it is solved.
  • Feature Request. Start by telling us what problem you’re trying to solve.
    Often a solution already exists! Don’t send pull requests to implement new features without
    first getting our support. Sometimes we leave features out on purpose to keep the project small.

Issue description

Description

I use Spark on YARN. A localized jar file is added to the classpath twice when its path goes through a symbolic link.
For example, if /some/__app__.jar is localized after submitting a Spark job and /some is a symbolic link to /parent/some, the final classpath contains both /parent/some/__app__.jar and /some/__app__.jar.
This happens when I use spark-submit with the --packages or --jars arguments.

I used a couple of workarounds to avoid this problem:

  • I localized the elasticsearch-hadoop jar using --files.
  • I added ./* to the classpath manually.

I read a similar issue (#579).
I think it would be more reasonable to normalize the jar paths, for example by identifying their inode; a rough sketch of that idea follows.
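
This is not ES-Hadoop's code, only a minimal JDK 8 sketch of what inode-based normalization could look like; the JarDedupe class and the jar paths are hypothetical. On POSIX file systems, java.nio's BasicFileAttributes.fileKey() wraps the device id and inode, so classpath entries that reach the same jar through different symlinks share a key and can be collapsed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JarDedupe {

    // Collapse classpath entries that point at the same physical file.
    // fileKey() follows symlinks, so two paths that reach one jar through
    // different links resolve to the same device/inode key.
    static List<Path> dedupeByInode(List<Path> jars) throws IOException {
        Map<Object, Path> seen = new LinkedHashMap<>();
        for (Path jar : jars) {
            Object key = Files.readAttributes(jar, BasicFileAttributes.class).fileKey();
            // Fall back to the path itself if the file system exposes no file key.
            seen.putIfAbsent(key != null ? key : jar, jar);
        }
        return new ArrayList<>(seen.values());
    }

    public static void main(String[] args) throws IOException {
        List<Path> jars = Arrays.asList(
                Paths.get("/some/__app__.jar"),
                Paths.get("/parent/some/__app__.jar"));
        // With /some -> /parent/some, only one of the two entries survives.
        System.out.println(dedupeByInode(jars));
    }
}
```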

Steps to reproduce

  1. The NodeManager's localization directory path contains a symbolic link.
  2. Use spark-submit and pass jars through the --packages or --jars arguments (e.g. --packages org.elasticsearch:elasticsearch-spark-20_2.10:6.1.1).
  3. The job fails with a Multiple ES-Hadoop versions detected in the classpath ... error.

The error looks like:

java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/some/__app__.jar
jar:file:/parent/some/__app__.jar

Version Info

OS: CentOS 7
JVM: JDK 1.8.0_65
Hadoop/Spark: Spark 2.2.1
ES-Hadoop: elasticsearch-spark-20_2.10
ES: 6.1.1

eubnara added a commit to eubnara/elasticsearch-hadoop that referenced this issue Jan 21, 2018
eubnara added a commit to eubnara/elasticsearch-hadoop that referenced this issue Jan 28, 2018
eubnara added a commit to eubnara/elasticsearch-hadoop that referenced this issue Feb 4, 2018

eubnara commented Feb 4, 2018

I'm trying another approach: instead of identifying inodes, using the getCanonicalPath() API.
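
A minimal sketch of that approach (not the code from the commits referenced above; the class name is only illustrative): getCanonicalPath() resolves symbolic links as well as "." and ".." segments, so the two duplicated entries from the error message collapse into one.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

public class CanonicalPathDedupe {
    public static void main(String[] args) throws IOException {
        // The two duplicated classpath entries from the error message above.
        String[] entries = { "/some/__app__.jar", "/parent/some/__app__.jar" };

        Set<String> canonical = new LinkedHashSet<>();
        for (String entry : entries) {
            // With /some -> /parent/some, both entries canonicalize to the
            // same string, so the set keeps a single element.
            canonical.add(new File(entry).getCanonicalPath());
        }
        System.out.println(canonical);
    }
}
```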


jbaiera commented Nov 30, 2018

Fixed with #1216
