"TypeError: 'int' object is not iterable" cause the application to abort #2
Hi, "map_test.py split" cause the application to abort apparently due to a TypeError.
Is this tool supposed to work with python 3.5 or 3.6?
Anyway, here's the output:
```
2018-05-14 11:51:45,168 INFO Splitting started
/home/aml/mlolUR/ur-analysis-tools/report.py:15: DeprecationWarning: Call to deprecated function remove_sheet (Use wb.remove(worksheet) or del wb[sheetname]).
wb.remove_sheet(wb.active)
2018-05-14 11:51:45,170 INFO Spark initialization
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/05/14 11:51:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/14 11:51:46 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
2018-05-14 11:51:47,467 INFO Source file reading
2018-05-14 11:51:53,761 INFO Filter users with small number of events
2018-05-14 11:51:54,122 INFO Split data into train and test
[Stage 3:> (0 + 2) / 200]
18/05/14 11:52:08 WARN TaskSetManager: Lost task 1.0 in stage 3.0 (TID 19, 172.31.70.53, executor 0): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 163, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 54, in read_command
command = serializer._read_with_length(file)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 169, in _read_with_length
return self.loads(obj)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 451, in loads
return pickle.loads(obj, encoding=encoding)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 783, in _make_skel_func
closure = _reconstruct_closure(closures) if closures else None
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 775, in _reconstruct_closure
return tuple([_make_cell(v) for v in values])
TypeError: 'int' object is not iterable
18/05/14 11:52:08 ERROR TaskSetManager: Task 3 in stage 3.0 failed 4 times; aborting job
18/05/14 11:52:08 WARN TaskSetManager: Lost task 1.2 in stage 3.0 (TID 30, 172.31.70.53, executor 0): TaskKilled (killed intentionally)
18/05/14 11:52:08 WARN TaskSetManager: Lost task 2.2 in stage 3.0 (TID 29, 172.31.70.53, executor 0): TaskKilled (killed intentionally)
Traceback (most recent call last):
File "map_test.py", line 647, in
root()
File "/home/aml/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in call
return self.main(*args, **kwargs)
File "/home/aml/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/aml/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/aml/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/aml/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "map_test.py", line 122, in split
train_df, test_df = split_data(df)
File "map_test.py", line 63, in split_data
split_date = get_split_date(df, cfg.splitting.split_event, cfg.splitting.train_ratio)
File "map_test.py", line 51, in get_split_date
total_primary_events = date_rdd.count()
File "/home/aml/anaconda3/lib/python3.6/site-packages/pyspark/rdd.py", line 1056, in count
return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
File "/home/aml/anaconda3/lib/python3.6/site-packages/pyspark/rdd.py", line 1047, in sum
return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
File "/home/aml/anaconda3/lib/python3.6/site-packages/pyspark/rdd.py", line 921, in fold
vals = self.mapPartitions(func).collect()
File "/home/aml/anaconda3/lib/python3.6/site-packages/pyspark/rdd.py", line 824, in collect
port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in call
File "/home/aml/anaconda3/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 3.0 failed 4 times, most recent failure: Lost task 3.3 in stage 3.0 (TID 28, 172.31.70.53, executor 0): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 163, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 54, in read_command
command = serializer._read_with_length(file)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 169, in _read_with_length
return self.loads(obj)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 451, in loads
return pickle.loads(obj, encoding=encoding)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 783, in _make_skel_func
closure = _reconstruct_closure(closures) if closures else None
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 775, in _reconstruct_closure
return tuple([_make_cell(v) for v in values])
TypeError: 'int' object is not iterable
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 163, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 54, in read_command
command = serializer._read_with_length(file)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 169, in _read_with_length
return self.loads(obj)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 451, in loads
return pickle.loads(obj, encoding=encoding)
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 783, in _make_skel_func
closure = _reconstruct_closure(closures) if closures else None
File "/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 775, in _reconstruct_closure
return tuple([_make_cell(v) for v in values])
TypeError: 'int' object is not iterable
```
Comments

Never mind, the problem was solved with the correct HDFS configuration and settings in spark-env.sh. The split works now.

Running into essentially the same issue. Do you recall which configuration needed adjustment?

Hi, it's been a while and honestly I don't recall precisely what caused the issue. Anyway, it's still working, so here are my Python packages (numpy, scipy, pandas, ml_metrics, predictionio, tqdm, click, openpyxl, pyspark), the Python (3.5) part of my .bashrc, and my spark-env.sh (Spark 2.1.1):

.bashrc:

spark-env.sh:

This is how it worked for me.

Thanks for this! It was exactly the issue I was facing. For me, the only part of your bash setup I needed to get mine working was `export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH`. Your mileage may vary, but I can now use pyspark locally in unit tests.
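A note for anyone who lands here later: the traceback above appears to mix two PySpark installs. The driver frames come from /home/aml/anaconda3/lib/python3.6/site-packages/pyspark, while the executor frames come from the pyspark.zip bundled with Spark 2.1.1, and cloudpickle's pickle format differs between PySpark versions. That kind of driver/executor mismatch is exactly what tends to surface as `TypeError: 'int' object is not iterable` inside `_reconstruct_closure`. A minimal spark-env.sh sketch in the spirit of the fix above, keeping the driver and executors on one Python and one PySpark, might look like this (the paths are illustrative assumptions taken from the log, not the original poster's actual settings):

```sh
# spark-env.sh (sketch; paths are illustrative assumptions, adjust to your layout)

# Point everything at a single Spark distribution.
export SPARK_HOME=/home/aml/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6

# Make the driver import Spark's own bundled PySpark and py4j instead of a
# pip/conda-installed pyspark, so driver and executors agree on cloudpickle's format.
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH

# Use the same Python interpreter on the driver and on every executor.
export PYSPARK_PYTHON=/home/aml/anaconda3/bin/python
export PYSPARK_DRIVER_PYTHON=/home/aml/anaconda3/bin/python
```

Alternatively, uninstalling the pip/conda pyspark package (or pinning it to exactly the cluster's Spark version) should remove the mismatch without any PYTHONPATH changes.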