sparkexjar always empty #15
This is relevant since we *do* use the code on non-lxplus machines where we load
libraries from local areas. But, as you pointed out, the code may exist on EOS
now. The problem is to adapt to different configurations on nodes, e.g. where
no EOS is present.
…Andrea Sciaba wrote:
> In
> https://github.com/vkuznet/CMSSpark/blob/7e8a241a9801e77346f5d86443fde4eeaa7688e4/bin/run_spark#L34
> the variable always gets an empty value, as the path used does not exist on LXPLUS7. Is this line of code still relevant?
If so, how can I know if the fact of not having found the code on lxplus is a problem for my job? What would the symptoms be?
You don't need to do anything; I'll refactor the code to adapt to different
scenarios. Once committed, I'll update the ticket.
The second commit just removes the trailing comma if sparkexjar is empty, right? I guess it doesn't change anything from a practical point of view. The only remaining issue is purely cosmetic: the output still produces
ls: cannot access /usr/lib/spark/examples/lib/spark-examples*: No such file or directory
so I'd add a 2> /dev/null...
Yes, if sparkexjar is empty it is not appended to the jars list.
And yes, your observation about the error is correct; I added your suggestion
in HEAD.
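The two fixes discussed above can be sketched roughly like this (a hypothetical reconstruction, not the actual run_spark code; the jar path in `jars` is a placeholder):

```shell
# Hypothetical sketch of the run_spark logic discussed above: silence the
# ls error with 2> /dev/null, and only append sparkexjar to the jar list
# when it was actually found, so no trailing comma is produced.
sparkexjar=$(ls /usr/lib/spark/examples/lib/spark-examples* 2> /dev/null | tail -1)
jars="/path/to/avro-mapred.jar"   # placeholder for the other required jars
if [ -n "$sparkexjar" ]; then
    jars="$jars,$sparkexjar"      # appended only when non-empty
fi
```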
There is a fundamental problem. If sparkexjar is empty, there is no way for
https://github.com/vkuznet/CMSSpark/blob/d42692cd75c20227b2988c033cec99788170da9c/src/python/CMSSpark/spark_utils.py#L419
to work, because
aconv="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter"
won't exist. I believe this is why I cannot get CMSSpark to work from ithdp-client01.cern.ch when I do
source hadoop-setconf.sh analytix
which is equivalent to
source /cvmfs/sft.cern.ch/lcg/etc/hadoop-confext/hadoop-setconf.sh analytix
In both cases you use the latest version of Spark, which doesn't distribute the examples. So, I cannot even suggest a fix.
I have submitted a ticket: https://cern.service-now.com/service-portal/view-request.do?n=RQF1130097
Ahh, I see, so we need to bug CERN IT to provide it in the new Spark setup.
Meanwhile, the fix is easy :) As with other missing jars, I put this one into my
public area and load it from there if it is not present on the system.
The fix is committed in c58772d
Please try out the master HEAD and let me know if it works.
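The fallback described here might look roughly like this (a sketch, not the committed code; the AFS filename is taken from the listing later in this thread and may change):

```shell
# Hypothetical sketch of the fallback: use the system examples jar when
# present, otherwise load the copy kept in a public AFS area.
sparkexjar=$(ls /usr/lib/spark/examples/lib/spark-examples* 2> /dev/null | tail -1)
if [ -z "$sparkexjar" ]; then
    sparkexjar=/afs/cern.ch/user/v/valya/public/spark/spark-examples-1.6.0-cdh5.15.1-hadoop2.6.0-cdh5.15.1.jar
fi
```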
Hi Valentin,
just by looking at the code I noticed that you are pointing to the old version of the jar; Zbigniew made available the latest version, compatible with 2.3.2, on it-hadoop-client under
/usr/hdp/spark-2/examples/jars/spark-examples_2.11-2.3.2.jar
Well, it is not a generic solution since it is only available on the it-hadoop-client
node but not on lxplus. Please ask him to put it on lxplus as well; otherwise we still
need to put it in someone's public area. I can't comment on a SNOW ticket, but I
added myself as a watcher.
The jar file disappeared. Now I'm 100% stuck. Do you manage to use CMSSpark at all?
Could you please post details/log?
I didn't touch the jar files and I can still see them:
ls -al /afs/cern.ch/user/v/valya/public/spark/spark-csv-assembly-1.4.0.jar
-rw-r--r--. 1 valya zh 7467651 Dec 12 2016 /afs/cern.ch/user/v/valya/public/spark/spark-csv-assembly-1.4.0.jar
ls -al /afs/cern.ch/user/v/valya/public/spark/avro-mapred-1.7.6-cdh5.7.6.jar
-rw-r--r--. 1 valya zh 181060 Apr 5 2018 /afs/cern.ch/user/v/valya/public/spark/avro-mapred-1.7.6-cdh5.7.6.jar
ls -al /afs/cern.ch/user/v/valya/public/spark/spark-examples-1.6.0-cdh5.15.1-hadoop2.6.0-cdh5.15.1.jar
-rw-r--r--. 1 valya zh 19914970 Oct 4 19:22 /afs/cern.ch/user/v/valya/public/spark/spark-examples-1.6.0-cdh5.15.1-hadoop2.6.0-cdh5.15.1.jar
So, it would be nice if you could provide the exact command used to invoke run_spark.
I run CMSSpark all the time on our wmarchive and cms popularity nodes, but I
don't run it on lxplus.
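A quick sanity check along these lines could verify from the client node that the shared jars are actually readable before invoking run_spark (a sketch; the jar list is illustrative):

```shell
# Sketch: report any shared jars that are missing or unreadable from
# this node, e.g. because AFS is not mounted or permissions changed.
missing=0
for jar in \
    /afs/cern.ch/user/v/valya/public/spark/spark-csv-assembly-1.4.0.jar \
    /afs/cern.ch/user/v/valya/public/spark/avro-mapred-1.7.6-cdh5.7.6.jar
do
    if [ ! -r "$jar" ]; then
        echo "missing: $jar" >&2
        missing=1
    fi
done
```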
Hi Valentin,
for me lxplus7 (or at most ithdp-client) is the only option. Are you
using Spark 1.6 or 2.3 on your nodes? I need 2.3 because this is now the
only version available on the public nodes.
Anyway, I get all varieties of error messages and I am losing all hope
of using CMSSpark.
For example, on ithdp-client, submission dies straight away with
18/10/16 15:43:16 ERROR SparkContext: Error initializing SparkContext.
org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit
application_1539347604788_2729 to YARN : Failed to renew token: Kind:
HDFS_DELEGATION_TOKEN, Service: ha-hdfs:analytix, Ident: (token for
asciaba: HDFS_DELEGATION_TOKEN owner=asciaba@CERN.CH, renewer=nobody,
realUser=, issueDate=1539697383192, maxDate=1540302183192,
sequenceNumber=4619426, masterKeyId=2069)
while on lxplus7 I get a ludicrous error, for which I submitted a ticket:
https://cern.service-now.com/service-portal/view-incident.do?n=INC1816438
Both seem to be at a more fundamental level than CMSSpark.
Even assuming I solve these, I will still need the examples jar file,
which is now at
/usr/hdp/spark-2.3/examples/jars/spark-examples_2.11-2.3.2.jar
on ithdp-client (they relocated it, for some reason).
It's a mess...
Andrea
Andrea,
on our nodes we still have the 1.6 version of Spark, and I have to admit that I
haven't had time to test 2.3.
I have now set up the CMSSpark environment and will see what I can do today. Unfortunately
tomorrow I'm heading to CERN, which means I can't work on it for a couple of days.
I'll keep you posted on what I find and will try to make CMSSpark work with 2.3 on
lxplus7.
Best,
Valentin.
Andrea,
I was able to reproduce the error you reported in the SNOW ticket and I privately
contacted the CERN Spark people. Let's see what they will do.
I've seen this token issue in the past, and it should be resolved by them.
And I asked for a permanent location for the jar files.
You don't need to tell me about the mess; I know it, and I have experienced much
more headache with running Spark jobs than you have :) That's why I decided
to write CMSSpark in the first place. With the upgrade they haven't looked
at all the pieces we need, and I'll try to push them to fix that.
Best,
Valentin.
The good news is that I was able to run a CMSSpark job without YARN, and it
completed successfully.
Here is how I did it:
run_spark phedex.py --fout=hdfs:///cms/tmp/phedex --date=20181015
and it didn't complain about missing jars, tokens, etc.
It would be nice if you tried it too (with a different fout argument).
It means that the actual problem lies with the YARN submission, which lost files,
and not with the CMSSpark codebase per se.
Best,
Valentin.
It worked also for me from lxplus7, but only after replacing
.write.format("com.databricks.spark.csv")\
with
.write.format("csv")\
Otherwise I was getting
pyspark.sql.utils.AnalysisException: u'path hdfs://analytix/cms/users/asciaba/prova2/2018/10/15 already exists.;'
even if prova2 didn't exist before.
Could you try to do the same test but with code that reads Avro files (that is, that calls jm_tables or cmssw_tables)? This is to test whether the Avro jar works with Spark 2.3.
I think it is a different issue and, if so, it should be reported in a separate ticket.
That said, I doubt the path problem is caused by the namespace used for the csv format.
Since I don't know how you run it, I can only guess that you
pass --fout=hdfs://analytix/cms/users/asciaba/prova2,
and in this case, if prova2 didn't exist, the entire path didn't exist.
I think what is missing is logic to test the fout directory first, not
the specification of the namespace for the csv format.
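The missing check might be sketched like this (hypothetical; check_fout is not part of CMSSpark, and it assumes the hdfs CLI is available on the node):

```shell
# Hypothetical sketch: fail early when the --fout directory already exists
# on HDFS, instead of letting Spark abort with "path ... already exists".
check_fout() {
    fout=$1
    if hdfs dfs -test -d "$fout" 2> /dev/null; then
        echo "output path already exists: $fout" >&2
        return 1
    fi
    return 0
}
```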
Yes, it works. I run it as follows:
run_spark jm.py --fout=hdfs:///cms/tmp/dbs_jm --date=20181015 2>&1 1>& jm.log
where you can find jm.py here:
/afs/cern.ch/user/v/valya/public/CMSSpark/jm.py
It reads some attributes of jm_tables.
I could also run my code (which uses Avro files) successfully, with and without YARN. I suppose we can close this issue.