
error Dbscan.train on 9_1M.csv (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.alitouka.spark.dbscan.spatial.Box.adjacentBoxes of type scala.collection.immutable.List in instance of org.alitouka.spark.dbscan.spatial.Box) #22

Open
ttpro1995 opened this issue Nov 10, 2017 · 12 comments

ttpro1995 commented Nov 10, 2017

Zeppelin notebook export (JSON): https://gist.github.com/0f067d6ff2239500ca8eed7d38b5872b
Built from commit d3b085286ccb16b146e7bb5234765cbc23e11c66

val data_path2 = "hdfs://127.0.0.1:9000/data/9_1M.csv"
val dataset2 = IOHelper.readDataset(sc, data_path2)
val settings = new DbscanSettings ().withEpsilon (0.8).withNumberOfPoints (4).withTreatBorderPointsAsNoise(true)
val clusteringResult = Dbscan.train (dataset2, settings)

Error log: https://gist.github.com/ttpro1995/7437b1f3b1f944fd26daf2ef4ba73efe

build.sbt

name := "spark_dbscan"

organization := "org.alitouka"

version := "0.0.4"

scalaVersion := "2.11.7"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0" % "provided"

libraryDependencies += "org.scalatest" % "scalatest_2.11" % "2.1.3" % "test"

libraryDependencies += "org.apache.commons" % "commons-math3" % "3.2"

// https://mvnrepository.com/artifact/com.github.scopt/scopt_2.10
libraryDependencies += "com.github.scopt" % "scopt_2.11" % "3.7.0"

valera7979 (Contributor) commented Jan 22, 2018

I have the same problem. It occurs with Spark 1.6 and higher. It works if the number of points is reduced below org.alitouka.spark.dbscan.spatial.rdd.PartitioningSettings.DefaultNumberOfPointsInBox, but in that case the whole calculation is performed in one container.
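
For illustration, a minimal sketch of that workaround, assuming Dbscan.train accepts an optional PartitioningSettings argument (an assumption; check the signature in your build). The 10,000,000 cap is a hypothetical value chosen only to exceed the row count of 9_1M.csv:

import org.alitouka.spark.dbscan.{Dbscan, DbscanSettings, IOHelper}
import org.alitouka.spark.dbscan.spatial.rdd.PartitioningSettings

val dataset2 = IOHelper.readDataset(sc, "hdfs://127.0.0.1:9000/data/9_1M.csv")
val settings = new DbscanSettings ().withEpsilon (0.8).withNumberOfPoints (4)

// Keep the whole dataset in a single box: with no adjacent boxes, the
// failing Box.adjacentBoxes serialization is never exercised, but all
// clustering work then runs in one container.
val singleBox = new PartitioningSettings (numberOfPointsInBox = 10000000)
val clusteringResult = Dbscan.train (dataset2, settings, singleBox)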

valera7979 (Contributor) commented Feb 16, 2018

I found a solution: I reduced the Scala version to 2.10, because at higher versions some methods became deprecated, and it worked.
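
For reference, a hedged sketch of what that downgrade can look like in the build.sbt above (the versions are assumptions, not a confirmed working set; Spark 2.2.x was the last release line published for Scala 2.10):

scalaVersion := "2.10.6"

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.2.0" % "provided"

libraryDependencies += "org.scalatest" % "scalatest_2.10" % "2.1.3" % "test"

libraryDependencies += "com.github.scopt" % "scopt_2.10" % "3.7.0"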

lccmpn commented Mar 15, 2018

I tried compiling the library with Scala 2.10.6 and Spark 2.1.0, but still no luck; it throws the same exception. @valera7979, could you explain how you managed to make it work?

valera7979 (Contributor) commented:

I made a pull request; see #24.
Use this code: https://github.com/valera7979/spark_dbscan/tree/rise_Spark

shuangyumo commented Jun 7, 2018

I get the same error, but my friend runs the same code with no error, which is very confusing.
Can you help me solve the problem?

sfdan473414 commented:
I get the same error, but I solved the problem with the following change.
Before:
val partitioningSettings = new PartitioningSettings (numberOfPointsInBox = argsParser.args.numberOfPoints)
After:
val partitioningSettings = new PartitioningSettings ()

It works well now.

Benji81 commented Feb 14, 2019

> I get the same error, but I solved the problem with the following change.
> Before:
> val partitioningSettings = new PartitioningSettings (numberOfPointsInBox = argsParser.args.numberOfPoints)
> After:
> val partitioningSettings = new PartitioningSettings ()
> It works well now.

@sfdan473414, could you give the filename(s) and line number, please?

Benji81 commented Feb 15, 2019

> I found a solution: I reduced the Scala version to 2.10, because at higher versions some methods became deprecated, and it worked.

@valera7979, do you know which parts are deprecated?

sfdan473414 commented:
> @sfdan473414, could you give the filename(s) and line number, please?

The filename is DbscanDriver, but the problem occurs again when the dataset is large (100M_4d). When I run it on a small dataset (150 records), it works well.

sfdan473414 commented:
My Scala version is 2.11.12 and my Spark version is 2.0.0.cloudera2 (Spark 2.x).
The key info is in the exception: "cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.alitouka.spark.dbscan.spatial.Box.adjacentBoxes". That is to say, the field org.alitouka.spark.dbscan.spatial.Box.adjacentBoxes fails serialization, or can't be serialized at all, just like the SparkContext class, so you should add the @transient annotation to it.

Class name: org.alitouka.spark.dbscan.spatial.Box

private [dbscan] class Box (val bounds: Array[BoundsInOneDimension], val boxId: BoxId = 0, val partitionId: Int = -1, @transient var adjacentBoxes: List[Box] = Nil)

Now the program runs well on both big and small datasets.
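
One caveat when applying this: a @transient field is skipped by Java serialization, so on the executor side it is restored as null rather than Nil, and any code that reads adjacentBoxes after a shuffle has to tolerate that. A minimal sketch of the patched class (the extends Serializable clause is assumed, since Box instances are shipped to executors; safeAdjacentBoxes is a hypothetical helper, not part of the library):

private [dbscan] class Box (val bounds: Array[BoundsInOneDimension],
    val boxId: BoxId = 0,
    val partitionId: Int = -1,
    @transient var adjacentBoxes: List[Box] = Nil) extends Serializable {

  // After deserialization the @transient var is null, not Nil;
  // normalize it before use (hypothetical helper).
  def safeAdjacentBoxes: List[Box] = Option(adjacentBoxes).getOrElse(Nil)
}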

DanyYan commented Apr 24, 2019

> ...the field org.alitouka.spark.dbscan.spatial.Box.adjacentBoxes fails serialization... so you should add the @transient annotation to it.

Does the code support high-dimensional data?

laksheenmendis commented:
> ...so you should add the @transient annotation to it. ... Now the program runs well on both big and small datasets.

Thank you very much, @sfdan473414. I had many challenges, but with your suggestion I was able to run this on a Spark 2.2.1 cluster with Hadoop 2.7.3.
