Babylon distributed image compile error #80

Closed
geoHeil opened this issue Apr 23, 2017 · 17 comments

@geoHeil
commented Apr 23, 2017

I get the following compile error when Babylon distributed images are used:

overloaded method value SaveAsFile with alternatives:
[error]   (x$1: java.util.List[String],x$2: String,x$3: org.datasyslab.babylon.utils.ImageType)Boolean <and>
[error]   (x$1: java.awt.image.BufferedImage,x$2: String,x$3: org.datasyslab.babylon.utils.ImageType)Boolean <and>
[error]   (x$1: org.apache.spark.api.java.JavaPairRDD,x$2: String,x$3: org.datasyslab.babylon.utils.ImageType)Boolean
[error]  cannot be applied to (org.apache.spark.api.java.JavaPairRDD[Integer,String], String, org.datasyslab.babylon.utils.ImageType)
[error]     imageGenerator.SaveAsFile(visualizationOperator.distributedVectorImage, outputPath, ImageType.SVG)

By contrast, the following version, which uses a plain vector image, compiles fine:

def buildScatterPlot(outputPath: String, spatialRDD: SpatialRDD): Boolean = {
    val envelope = spatialRDD.boundaryEnvelope
    val s = spatialRDD.getRawSpatialRDD.rdd.sparkContext
    val visualizationOperator = new ScatterPlot(7000, 4900, envelope, false, -1, -1, true, true)
    visualizationOperator.CustomizeColor(255, 255, 255, 255, Color.GREEN, true)
    visualizationOperator.Visualize(s, spatialRDD)
    import org.datasyslab.babylon.utils.ImageType
    imageGenerator.SaveAsFile(visualizationOperator.vectorImage, outputPath, ImageType.SVG)
  }

@jiayuasu could you please explain whether distributedVectorImage is required when the boolean option for parallel image rendering / filtering is selected?

@geoHeil

Author

commented Apr 24, 2017

I also get null pointer exceptions with the regular rasterImage when I use it as outlined here: https://gist.github.com/geoHeil/dbef714e2254956840832ebaabf12a07

@jiayuasu

Member

commented Apr 25, 2017

@geoHeil

new ScatterPlot(7000, 4900, envelope, false, -1, -1, false, true)

Since you set the last parameter, "generateVectorImage", to true, you have to use vectorImage and store it in SVG format.

imageGenerator.SaveAsFile(visualizationOperator.vectorImage, outputPath, ImageType.SVG)

In addition, you don't need a super high resolution to generate a raster image; 1000*600 will be good enough.

@jiayuasu

Member

commented Apr 25, 2017

@geoHeil I know the Babylon APIs are complicated. I am trying to figure out a better API structure.

@geoHeil

Author

commented Apr 25, 2017

To clarify:

def parallelFilterRenderStitch(outputPath: String, spatialRDD: SpatialRDD): Boolean = {
    val s = spatialRDD.getRawSpatialRDD.rdd.sparkContext
    val visualizationOperator = new HeatMap(1000, 600, spatialRDD.boundaryEnvelope, false, 2, -1, -1, false, true)
    visualizationOperator.Visualize(s, spatialRDD)
    visualizationOperator.stitchImagePartitions
    imageGenerator.SaveAsFile(visualizationOperator.distributedRasterImage, outputPath, ImageType.PNG)
  }

fails with the following compile error:

overloaded method value SaveAsFile with alternatives:
[error]   (x$1: java.util.List[String],x$2: String,x$3: org.datasyslab.babylon.utils.ImageType)Boolean <and>
[error]   (x$1: java.awt.image.BufferedImage,x$2: String,x$3: org.datasyslab.babylon.utils.ImageType)Boolean <and>
[error]   (x$1: org.apache.spark.api.java.JavaPairRDD,x$2: String,x$3: org.datasyslab.babylon.utils.ImageType)Boolean
[error]  cannot be applied to (org.apache.spark.api.java.JavaPairRDD[Integer,org.datasyslab.babylon.core.ImageSerializableWrapper], String, org.datasyslab.babylon.utils.ImageType)
[error]     imageGenerator.SaveAsFile(visualizationOperator.distributedRasterImage, outputPath, ImageType.PNG)

Could you please explain the last couple of parameters (-1, -1, false, false)? Shouldn't the number of partitions be inferred automatically? Is it correct / mandatory to set the last ones to true (assuming the compile error is fixed) in order to perform distributed rendering for a speed increase?

@jiayuasu

Member

commented Apr 25, 2017

@geoHeil, the partition counts on X and Y for parallel filtering and parallel rendering must both be > 0 (e.g., 2, 2) if you set the corresponding two booleans to true.

This will generate a distributed raster image. You can use the Spark image generator to store it on HDFS/S3 or other Spark-friendly storage. You can also use NativeJavaImageGenerator to store the distributed raster image on the local file system; it will generate a bunch of image tiles. If you choose to stitch the tiles, the stitch function will generate a single raster image and store it as rasterImage.
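The partition rule above can be sketched as a small pre-flight check. This is only an illustration: `RenderSettingsCheck`, `RenderSettings`, and `validate` are hypothetical names that do not exist in Babylon, and the sketch assumes the rule applies as soon as either parallel option is enabled.

```scala
// Hypothetical helper, not part of the Babylon API: it only encodes the rule
// that partition counts must be positive once a parallel option is enabled.
object RenderSettingsCheck {
  final case class RenderSettings(
      partitionX: Int,
      partitionY: Int,
      parallelFilter: Boolean,
      parallelRender: Boolean
  ) {
    // True when either parallel option is switched on.
    def usesParallelism: Boolean = parallelFilter || parallelRender
  }

  // Returns Right(settings) when the combination is consistent,
  // Left(message) when a parallel option is set without positive partitions.
  def validate(s: RenderSettings): Either[String, RenderSettings] =
    if (s.usesParallelism && (s.partitionX <= 0 || s.partitionY <= 0))
      Left(
        "partitionX and partitionY must be > 0 for parallel filtering/rendering, " +
          s"got (${s.partitionX}, ${s.partitionY})"
      )
    else
      Right(s)
}
```

With the values from the failing HeatMap call (-1, -1, false, true), `validate` returns a `Left`; with (2, 2, true, true) it returns a `Right`.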

I suggest that you:

  1. Start with my Babylon runnable demo. You can clone the GeoSpark repository and run that Java file directly. Later on you can switch to Scala.

  2. Then play with RasterImage / distributed RasterImage first. Use the simplest Scatter Plot and store the images on the local file system.

  3. Then try the HeatMap raster image and the Choropleth Map raster image. Please use the PNG format in all cases; I have noticed that GIF sometimes does not work well.

  4. Finally, try the vector image format and store it on the local file system. Note that the vector image is currently only available for Scatter Plot and Choropleth Map.

In addition, the difference between the Spark image generator and the Java image generator is:

  1. The former can store a distributed rasterImageRDD/vectorImageRDD on Spark-friendly storage such as S3/HDFS/local file system in a binary format. This is scalable, but you may not be able to read the persisted image outside Spark because it is in a Spark binary format.
  2. The latter can only store a distributed or single raster/vector image on your local file system, but you can view the image with a regular image viewer.

@geoHeil


Author

commented Apr 26, 2017

@jiayuasu, I created the babylon example in scala here as well:
https://github.com/geoHeil/geoSparkScalaSample/blob/master/src/main/scala/myOrg/visualization/VisualizationGeosparkLocalRaster.scala

With the visualization implementation https://github.com/geoHeil/geoSparkScalaSample/blob/master/src/main/scala/myOrg/visualization/Vis.scala

As you will see at https://github.com/geoHeil/geoSparkScalaSample/blob/master/src/main/scala/myOrg/visualization/Vis.scala#L42-L61, the compile error "overloaded method value SaveAsFile with alternatives" is still there.

Unfortunately, your sample data from src/test/resources/ triggers an IllegalArgumentException: Points of LinearRing do not form a closed linestring. To reproduce, simply run:

git clone https://github.com/geoHeil/geoSparkScalaSample.git
cd geoSparkScalaSample
sbt run
# when prompted to choose among multiple main classes, select 2

I will try to ask some Scala experts regarding the compile issue. Maybe you could have a look at the input files.

@geoHeil

Author

commented Apr 26, 2017

@jiayuasu: Unfortunately, I think there is a bug in the latest GeoSpark version that causes "Points of LinearRing do not form a closed linestring" for a dataset which worked fine with older versions.

@geoHeil

Author

commented Apr 26, 2017

@jiayuasu regarding the original problem, see http://stackoverflow.com/questions/43626048/convert-java-to-scala-code-change-of-method-signatures/43630473#43630473: you are using raw types. Do you have any plans to use regular generics, as suggested in the answer?

@jiayuasu

Member

commented Apr 26, 2017

@geoHeil, thanks for the great information! Regarding the two bugs you mentioned: 1. How can I reproduce the first one (do not form a closed linestring)? It is weird, and the GeoSpark unit tests probably don't cover it. 2. I intentionally made JavaPairRDD not type safe for some reasons, but now I realize that was not wise.

I will release a patch to fix the second bug very soon. Also, please tell me how to reproduce the "LinearRing not closed" bug.

To summarize, this Babylon bug happens when users call the Babylon API from Scala to visualize a distributed image RDD.

@jiayuasu jiayuasu self-assigned this Apr 26, 2017

@jiayuasu jiayuasu added the bug label Apr 26, 2017

@geoHeil

Author

commented Apr 27, 2017

@jiayuasu thanks. I will create a separate issue for the line string problem. #83

@jiayuasu

Member

commented Apr 28, 2017

Hi @geoHeil ,

This issue should be solved in Babylon 0.1.2-snapshot. I have deprecated all old image generators and added a new "BabylonImageGenerator" to replace all old generator APIs. The new "BabylonImageGenerator" has new APIs which are easier to understand.

Please refer to the latest Babylon Java example and try it out in Scala. I think you just need to replace the old imageGenerator part.

@geoHeil

Author

commented Apr 28, 2017

Hi @jiayuasu ,
thanks for the quick response. The new API is much nicer. It would be great, though, if you allowed overwriting existing output for distributed images, similar to df.write.mode(SaveMode.Overwrite).parquet(path).
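A minimal sketch of what such an overwrite switch could look like for a single local file. `LocalImageWriter` and its `write` method are hypothetical and not part of BabylonImageGenerator; the sketch only relies on the standard `java.nio.file` API and does not cover distributed (multi-tile) output.

```scala
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical helper, not part of BabylonImageGenerator.
object LocalImageWriter {
  // Writes `bytes` to `target`.
  // overwrite = true  : an existing file is truncated and replaced.
  // overwrite = false : an existing file makes the call fail with
  //                     java.nio.file.FileAlreadyExistsException.
  def write(target: Path, bytes: Array[Byte], overwrite: Boolean): Path =
    if (overwrite)
      Files.write(
        target,
        bytes,
        StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING,
        StandardOpenOption.WRITE
      )
    else
      Files.write(target, bytes, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)
}
```

Failing by default and replacing only on request mirrors the default behaviour of Spark's DataFrameWriter, which is what the request above alludes to.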

However, I still get an IllegalArgumentException: image == null! for a distributed raster image when saving to the local file system:

val vDistributedRaster = new ScatterPlot(1000, 600, USMainLandBoundary, false, 2, 2, true, false)
vDistributedRaster.CustomizeColor(255, 255, 255, 255, Color.GREEN, true)
vDistributedRaster.Visualize(spark.sparkContext, spatialRDD)
val imageGenerator = new BabylonImageGenerator()
imageGenerator.SaveRasterImageAsLocalFile(vDistributedRaster.distributedRasterImage, scatterPlotOutputPath + "distributedRaster", ImageType.PNG)

java.lang.IllegalArgumentException: image == null!
  at javax.imageio.ImageTypeSpecifier.createFromRenderedImage(ImageTypeSpecifier.java:925)
  at javax.imageio.ImageIO.getWriter(ImageIO.java:1592)
  at javax.imageio.ImageIO.write(ImageIO.java:1520)
  at org.datasyslab.babylon.extension.imageGenerator.BabylonImageGenerator.SaveRasterImageAsLocalFile(BabylonImageGenerator.java:35)
  at org.datasyslab.babylon.core.AbstractImageGenerator.SaveRasterImageAsLocalFile(AbstractImageGenerator.java:59)
  ... 42 elided

@geoHeil

Author

commented May 1, 2017

I could track down the problem:
new HeatMap(7000, 4900, envelope, false, 1, 2, 2, true, true) should render a distributed image but fails with the image == null exception shown above, whereas new HeatMap(7000, 4900, envelope, false, 2) works just fine.

Some more context is provided here. The commented-out lines are the ones that trigger the exception.

def buildHeatMap(outputPath: String, spatialRDD: SpatialRDD, envelope: Envelope): Boolean = {
    val s = spatialRDD.getRawSpatialRDD.rdd.sparkContext
    // TODO strange overhead for distributed image rendering. No task scheduled for 2min before something happens.
    // val visualizationOperator = new HeatMap(7000, 4900, envelope, false, 1, 2, 2, true, true)
    val visualizationOperator = new HeatMap(7000, 4900, envelope, false, 2)
    visualizationOperator.Visualize(s, spatialRDD)
    // val imageGenerator = new BabylonImageGenerator
    imageGenerator.SaveRasterImageAsLocalFile(visualizationOperator.rasterImage, outputPath, ImageType.PNG)
    // imageGenerator.SaveRasterImageAsLocalFile(visualizationOperator.distributedRasterImage, outputPath, ImageType.PNG)
  }

@geoHeil

Author

commented May 1, 2017

Still, it seems to work in https://github.com/DataSystemsLab/GeoSpark/blob/master/babylon/src/main/scala/org/datasyslab/geospark/showcase/ScalaExample.scala, and I am not sure what is different. As it must be an issue on my side, I will close the issue.

@geoHeil geoHeil closed this May 1, 2017

@jiayuasu

Member

commented May 1, 2017

@geoHeil OK, thanks. I will investigate more and optimize Babylon performance.
