sparkexjar always empty #15

Closed · sciaba opened this issue Sep 25, 2018 · 23 comments

Comments

sciaba commented Sep 25, 2018

In

https://github.com/vkuznet/CMSSpark/blob/7e8a241a9801e77346f5d86443fde4eeaa7688e4/bin/run_spark#L34

the sparkexjar variable always gets an empty value, because the path it uses does not exist on LXPLUS7. Is this line of code still relevant?

vkuznet (Collaborator) commented Sep 27, 2018 via email

sciaba (Author) commented Sep 27, 2018

If so, how can I tell whether not finding that code on lxplus is a problem for my job? What would the symptoms be?

vkuznet (Collaborator) commented Sep 27, 2018 via email

vkuznet (Collaborator) commented Sep 27, 2018 via email

sciaba (Author) commented Sep 27, 2018

The second commit just removes the trailing comma when sparkexjar is empty, right? I guess it doesn't change anything in practice.
The only remaining issue is purely cosmetic: the output still prints

ls: cannot access /usr/lib/spark/examples/lib/spark-examples*: No such file or directory

so I'd add a 2> /dev/null...

vkuznet (Collaborator) commented Sep 27, 2018 via email

sciaba (Author) commented Oct 4, 2018

There is a fundamental problem. If sparkexjar is empty, there is no way for

https://github.com/vkuznet/CMSSpark/blob/d42692cd75c20227b2988c033cec99788170da9c/src/python/CMSSpark/spark_utils.py#L419

to work because

aconv="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter"

will not be available on the classpath.
I believe this is why I cannot get CMSSpark to work from ithdp-client01.cern.ch when I do

source hadoop-setconf.sh analytix

which is equivalent to

source /cvmfs/sft.cern.ch/lcg/etc/hadoop-confext/hadoop-setconf.sh analytix

Both setups use the latest version of Spark, which no longer distributes the examples, so I cannot even suggest a fix.
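
For context, PySpark code that relies on this converter typically follows the pattern of Spark's own examples/src/main/python/avro_inputformat.py, which is why the examples jar has to be on the classpath. A minimal sketch of that pattern (the path and application name are placeholders, and this is not necessarily identical to what spark_utils.py does):

# Minimal sketch of an Avro read that needs the examples jar, modeled on
# Spark's avro_inputformat.py example; the HDFS path below is a placeholder.
from pyspark import SparkContext

sc = SparkContext(appName="avro_read_sketch")
avro_rdd = sc.newAPIHadoopFile(
    "hdfs:///path/to/avro/files",
    "org.apache.avro.mapreduce.AvroKeyInputFormat",
    "org.apache.avro.mapred.AvroKey",
    "org.apache.hadoop.io.NullWritable",
    keyConverter="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter")
# Each record is a (datum, None) pair; the datum is a Python dict.
print(avro_rdd.map(lambda rec: rec[0]).first())

Without the examples jar, this fails on the JVM side with a ClassNotFoundException for the converter class.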

sciaba (Author) commented Oct 4, 2018

vkuznet (Collaborator) commented Oct 4, 2018 via email

sciaba (Author) commented Oct 5, 2018

Hi Valentin,
thank you! Could you take a look at the SNOW ticket I created? Zbigniew is willing to distribute the examples jar, and you may want to agree with him on how and where.
I don't know if I'll have time to try your fix very soon, but I'll do my best.

sciaba (Author) commented Oct 5, 2018

Just by looking at the code I noticed that you are pointing to the old version of the jar; Zbigniew made the latest version, compatible with Spark 2.3.2, available on it-hadoop-client under
/usr/hdp/spark-2/examples/jars/spark-examples_2.11-2.3.2.jar

vkuznet (Collaborator) commented Oct 5, 2018 via email

sciaba (Author) commented Oct 16, 2018

The jar file has disappeared, so now I'm completely stuck. Are you able to use CMSSpark at all?

vkuznet (Collaborator) commented Oct 16, 2018 via email

sciaba (Author) commented Oct 16, 2018 via email

vkuznet (Collaborator) commented Oct 16, 2018 via email

vkuznet (Collaborator) commented Oct 16, 2018 via email

vkuznet (Collaborator) commented Oct 16, 2018 via email

sciaba (Author) commented Oct 16, 2018

It worked for me too from lxplus7, but only after replacing

.write.format("com.databricks.spark.csv")\

with

.write.format("csv")\

Otherwise I was getting

pyspark.sql.utils.AnalysisException: u'path hdfs://analytix/cms/users/asciaba/prova2/2018/10/15 already exists.;'

even though prova2 did not exist beforehand.
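
For reference, the change simply switches from the external Databricks CSV data source to the CSV source built into Spark 2.x; a minimal sketch (the DataFrame df and the output path are placeholders):

# Minimal sketch, assuming an existing DataFrame `df` and Spark >= 2.0,
# where the CSV data source is built in and the external
# "com.databricks.spark.csv" package is no longer needed.
df.write \
    .format("csv") \
    .option("header", "true") \
    .save("hdfs:///path/to/output")  # placeholder output path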

sciaba (Author) commented Oct 16, 2018

Could you try the same test with code that reads Avro files (i.e., code that calls jm_tables or cmssw_tables)? That would check whether the Avro jar works with Spark 2.3.
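
As a side note, and separate from the examples-jar converter path that CMSSpark uses, Avro reading with Spark 2.3 can also be exercised through the Databricks spark-avro data source, assuming the job is submitted with --packages com.databricks:spark-avro_2.11:4.0.0; a minimal sketch:

# Minimal sketch, assuming `spark` is an existing SparkSession and the
# com.databricks:spark-avro_2.11:4.0.0 package was added at submit time;
# the HDFS path is a placeholder.
df = spark.read.format("com.databricks.spark.avro").load("hdfs:///path/to/avro/files")
df.printSchema()
print(df.count())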

vkuznet (Collaborator) commented Oct 16, 2018 via email

vkuznet (Collaborator) commented Oct 16, 2018 via email

sciaba (Author) commented Oct 17, 2018

I was also able to run my code (which uses Avro files) successfully, both with and without YARN. I suppose we can close this issue.

vkuznet closed this as completed Oct 26, 2018