# The Onion Partitioning

The idea is to load raw 3D data from a file and repartition the space according to the distance to its center.
Let's first load the dependencies needed for this exercise.

In [1]:
// Package to read data from FITS file
%AddDeps com.github.astrolabsoftware spark-fits_2.11 0.6.0

// Smile provides visualisation tools
%AddDeps com.github.haifengl smile-plot 1.5.1
%AddDeps com.github.haifengl smile-math 1.5.1
%AddDeps com.github.haifengl smile-core 1.5.1
%AddDeps com.github.haifengl smile-scala_2.11 1.5.1

// Contains extensions to the Swing GUI toolkit
%AddDeps org.swinglabs swingx 1.6.1

// Add the spark3d JAR. To generate it, run `sbt ++2.11.8 package` at the root of the package
%AddJar file:/Users/julien/Documents/workspace/myrepos/spark3D/target/scala-2.11/spark3d_2.11-0.2.2.jar

// Add healpix JAR
%AddJar file:/Users/julien/Documents/workspace/myrepos/spark3D/lib/jhealpix.jar

Marking com.github.astrolabsoftware:spark-fits_2.11:0.6.0 for download
Preparing to fetch from:
-> file:/var/folders/my/lfvl285927q2hzk545f39sy40000gn/T/toree_add_deps7604199992315755517/
-> https://repo1.maven.org/maven2
-> New file at /var/folders/my/lfvl285927q2hzk545f39sy40000gn/T/toree_add_deps7604199992315755517/https/repo1.maven.org/maven2/com/github/astrolabsoftware/spark-fits_2.11/0.6.0/spark-fits_2.11-0.6.0.jar
Marking com.github.haifengl:smile-plot:1.5.1 for download
Preparing to fetch from:
-> file:/var/folders/my/lfvl285927q2hzk545f39sy40000gn/T/toree_add_deps7604199992315755517/
-> https://repo1.maven.org/maven2
-> New file at /var/folders/my/lfvl285927q2hzk545f39sy40000gn/T/toree_add_deps7604199992315755517/https/repo1.maven.org/maven2/com/github/haifengl/smile-plot/1.5.1/smile-plot-1.5.1.jar
Marking com.github.haifengl:smile-math:1.5.1 for download
Preparing to fetch from:
-> file:/var/folders/my/lfvl285927q2hzk545f39sy40000gn/T/toree_add_deps7604199992315755517/
-> htt

# From raw data RDD to Point3D RDD

Load the data from the test file provided in the spark3d repo.
Our raw data contains points with 3D coordinates (spherical: r, theta, phi). Let's transform it into a Point3D RDD.

In [2]:
import com.astrolabsoftware.spark3d.spatial3DRDD._
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("OnionSpace").getOrCreate()

val fn = "../../src/test/resources/astro_obs.fits"
val columns = "Z_COSMO,RA,DEC"
val spherical = true
val options = Map("hdu" -> "1")

// Load the data
val pointRDD = new Point3DRDD(spark, fn, columns, spherical, "fits", options)
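
Since the points are given in spherical coordinates, it can help to recall how they map to Cartesian ones. The following is a minimal sketch in plain Scala, not the spark3D implementation: the `Cartesian` case class and `toCartesian` function are hypothetical names, and the convention (theta as polar angle from the z-axis, phi as azimuthal angle, both in radians) is an assumption that may differ from the one used by the library or by the RA/DEC columns.

```scala
// Minimal sketch: convert spherical coordinates to Cartesian ones.
// Assumed convention (not taken from spark3D): theta is the polar angle
// measured from the z-axis, phi is the azimuthal angle, both in radians.
final case class Cartesian(x: Double, y: Double, z: Double)

def toCartesian(r: Double, theta: Double, phi: Double): Cartesian =
  Cartesian(
    r * math.sin(theta) * math.cos(phi),
    r * math.sin(theta) * math.sin(phi),
    r * math.cos(theta)
  )

// A point at distance 2 on the z-axis (theta = 0).
val onAxis = toCartesian(2.0, 0.0, 0.0)
println(onAxis)

// A point at distance 1 in the equatorial plane, along the y-axis.
val onEquator = toCartesian(1.0, math.Pi / 2, math.Pi / 2)
println(onEquator)
```

Whatever the angular convention, the radial coordinate r is the distance to the center, which is the only quantity the onion partitioning depends on.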

# Repartitioning of the space

By default, the pointRDD is partitioned arbitrarily (i.e. Spark creates the partitions without regard to the content of the file).
Let's repartition our data based on the distance to the center (Onion Space).

In [3]:
import com.astrolabsoftware.spark3d.utils.GridType

// As we are in local mode, and the file is very small, the RDD pointRDD has only 1 partition.
// For the sake of this example, let's increase the number of partitions to 5.
val pointRDD_part = pointRDD.spatialPartitioning(GridType.LINEARONIONGRID, 5)
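
To give an intuition of what partitioning by distance to the center means, here is a minimal sketch in plain Scala. It assumes concentric shells of equal thickness in r, which is what the name "linear" suggests; the actual logic behind `GridType.LINEARONIONGRID` may differ. The names `shellIndex`, `rMax` and `nShells` are hypothetical and are not part of the spark3D API.

```scala
// Minimal sketch of radial ("onion") partitioning: space is cut into
// concentric shells of equal thickness, and a point goes to the shell
// that contains its distance to the center.
// shellIndex, rMax and nShells are hypothetical names, not spark3D API.
def shellIndex(r: Double, rMax: Double, nShells: Int): Int = {
  require(rMax > 0.0 && nShells > 0, "rMax and nShells must be positive")
  val idx = math.floor(r / rMax * nShells).toInt
  // Clamp so that r == rMax (or slightly beyond) falls in the last shell.
  math.min(math.max(idx, 0), nShells - 1)
}

// Distances to the center for a few illustrative points.
val radii = Seq(0.05, 0.30, 0.55, 0.99, 1.0)
val shells = radii.map(r => shellIndex(r, rMax = 1.0, nShells = 5))
println(shells)
```

With this scheme, points at similar distances from the center end up in the same partition, regardless of their angular position.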

Let's see how our space is now partitioned:

In [4]:
val partitionsAfter = pointRDD_part.mapPartitions(
    iter => Array(iter.size).iterator, true)

// This is the number of objects per partition. 
println(partitionsAfter.collect().toList)

List(4142, 3959, 4053, 3860, 3986)
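
A quick way to judge how balanced these partitions are is to compare each size with the mean. The sketch below is plain Scala working on the counts printed above (no Spark needed); the variable names are only illustrative.

```scala
// Partition sizes copied from the output above.
val counts = List(4142, 3959, 4053, 3860, 3986)

val total = counts.sum
val mean = total.toDouble / counts.size
// Largest relative deviation from the mean partition size.
val maxRelDev = counts.map(c => math.abs(c - mean) / mean).max

println(s"total = $total, mean = $mean, max relative deviation = $maxRelDev")
```

Here the 20000 objects are spread over 5 partitions with a mean of 4000 objects each, and no partition deviates from that mean by more than about 3.6%.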


# Visualize the partitioning

Let's plot the partitioning!

In [5]:
import smile.plot._
import java.awt.Color
import java.awt.{GridLayout, Dimension}

import javax.swing.JFrame
import javax.swing.JPanel

import com.astrolabsoftware.spark3d.utils.Utils.sphericalToCartesian
import org.apache.spark.rdd.RDD
import com.astrolabsoftware.spark3d.geometryObjects._

/** 
  * Define palette of colors 
  *
  * @return (Array[java.awt.Color]) Colors for each partition
  */
def colors : Array[java.awt.Color] = {
    Array(Color.BLACK, Color.RED, Color.GREEN, Color.BLUE,
          Color.ORANGE, Color.YELLOW, Color.DARK_GRAY, Color.PINK,
          Color.MAGENTA, Color.CYAN)
}

/** 
  * Format the data for smile.
  * The data for ScatterPlot must be Array[Array[Double]] (= Array[point3d]).
  * We add one more dimension, which is the partition.
  *
  * @param rdd : (RDD[Point3D])
  *   RDD whose elements are Point3D instances.
  * @return (Array[Array[Array[Double]]]) data as partitions -> points -> point -> coordinate 
  * 
  */
def format_data_for_smile(rdd: RDD[Point3D]) : Array[Array[Array[Double]]] = {
    rdd.map(
        x=> sphericalToCartesian(x).center.getCoordinate.toArray)
    .glom.collect().toArray
}

/** 
  * Show or save the results.
  * 
  * @param display : (String)
  *   Either show or save. If save, extension will be given in the outname.
  * @param rdd : (RDD[Point3D])
  *   RDD whose elements are instances of Point3D
  * @param outname : (String)
  *   If save mode, name (incl. extension) for the out file.
  * @param title : (String)
  *   Title of the window.
  *
  */
def MyScatterPlot(display: String, rdd: RDD[Point3D], 
                outname: String, title: String) : Unit = {
    
    // Rearrange the data for plotting
    val data = format_data_for_smile(rdd)
    
    // Plot the results
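    val window = ScatterPlot.plot(data(0), '.', colors(0))
    // Cycle through the palette so that more partitions than colors does not overflow it
    for (part <- 1 until data.size) {
      window.points(data(part), '.', colors(part % colors.length))
    }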
    display match {
      case "show" => {
        val partFrame = new JFrame(title)
        partFrame.setLocationRelativeTo(null)
        partFrame.getContentPane().add(window)
        partFrame.setVisible(true)
        partFrame.setSize(new Dimension(500, 500))
      }
      case "save" => {
        val partHeadless = new Headless(window)
        partHeadless.pack()
        partHeadless.setVisible(true)
        partHeadless.setSize(new Dimension(500, 500))
        window.save(new java.io.File(outname))
      }
      case _ => throw new AssertionError("""
        I do not understand the kind of display you want.
        Choose between "show" and "save".
        """)
    }
}

// Set to "show" or "save"
val display = "show"

// Display the result
MyScatterPlot(display, pointRDD.rawRDD.repartition(5), "myOnionFigRaw.png", "Raw data")
MyScatterPlot(display, pointRDD_part, "myOnionFig.png", "Partitioned data")


Raw partitioning             |  Onion Partitioning
:-------------------------:|:-------------------------:
![title](images/myOnionFigRaw.png)  |   ![title](images/myOnionFig.png)

Et voilà!