# The Onion Partitioning

The idea is to load raw 3D data from a file and re-partition the space according to each point's distance to the center of the space.
Let's first load the dependencies needed for this exercise.

In [1]:
// Package to read data from FITS file
%AddDeps com.github.JulienPeloton spark-fits_2.11 0.3.0

// Smile provides visualisation tools
%AddDeps com.github.haifengl smile-plot 1.5.1
%AddDeps com.github.haifengl smile-math 1.5.1
%AddDeps com.github.haifengl smile-core 1.5.1
%AddDeps com.github.haifengl smile-scala_2.11 1.5.1

// Contains extensions to the Swing GUI toolkit
%AddDeps org.swinglabs swingx 1.6.1

// Add the spark3d JAR. To generate it, run `sbt ++2.11.8 package` at the root of the package
%AddJar file:/Users/julien/Documents/workspace/myrepos/spark3D/target/scala-2.11/spark3d_2.11-0.1.0.jar

// Add healpix JAR
%AddJar file:/Users/julien/Documents/workspace/myrepos/spark3D/lib/jhealpix.jar

Marking com.github.JulienPeloton:spark-fits_2.11:0.3.0 for download
Preparing to fetch from:
-> file:/var/folders/my/lfvl285927q2hzk545f39sy40000gn/T/toree_add_deps5357257577699850268/
-> https://repo1.maven.org/maven2
-> New file at /var/folders/my/lfvl285927q2hzk545f39sy40000gn/T/toree_add_deps5357257577699850268/https/repo1.maven.org/maven2/com/github/JulienPeloton/spark-fits_2.11/0.3.0/spark-fits_2.11-0.3.0.jar
Marking com.github.haifengl:smile-plot:1.5.1 for download
Preparing to fetch from:
-> file:/var/folders/my/lfvl285927q2hzk545f39sy40000gn/T/toree_add_deps5357257577699850268/
-> https://repo1.maven.org/maven2
-> New file at /var/folders/my/lfvl285927q2hzk545f39sy40000gn/T/toree_add_deps5357257577699850268/https/repo1.maven.org/maven2/com/github/haifengl/smile-plot/1.5.1/smile-plot-1.5.1.jar
Marking com.github.haifengl:smile-math:1.5.1 for download
Preparing to fetch from:
-> file:/var/folders/my/lfvl285927q2hzk545f39sy40000gn/T/toree_add_deps5357257577699850268/
-> https://r

# From raw data RDD to Point3D RDD

Load data from the test file provided in the spark3d repo.
Our raw data contains points with spherical 3D coordinates (r, theta, phi). Let's transform it into a Point3D RDD.

In [2]:
import com.spark3d.spatial3DRDD._
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("OnionSpace").getOrCreate()

val fn = "../../src/test/resources/astro_obs.fits"
val hdu = 1
val columns = "Z_COSMO,RA,DEC"
val spherical = true

// Load the data
val pointRDD = new Point3DRDDFromFITS(spark, fn, hdu, columns, spherical)
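
Since the points are stored in spherical coordinates, it can help to keep the conversion to Cartesian coordinates in mind. The snippet below is a minimal sketch in plain Scala; `toCartesian` is a hypothetical helper (not part of spark3D), and it assumes that theta is the polar angle measured from the z-axis and phi the azimuthal angle, both in radians. The convention used by your data may differ.

```scala
// Hypothetical helper, not part of spark3D.
// Assumed convention: theta = polar angle from the z-axis, phi = azimuth, both in radians.
def toCartesian(r: Double, theta: Double, phi: Double): (Double, Double, Double) = {
  val x = r * math.sin(theta) * math.cos(phi)
  val y = r * math.sin(theta) * math.sin(phi)
  val z = r * math.cos(theta)
  (x, y, z)
}

// A point at distance 2 from the center, in the equatorial plane (theta = pi/2),
// along the x-axis (phi = 0): x is 2 and y, z are 0 up to rounding error.
val (x, y, z) = toCartesian(2.0, math.Pi / 2, 0.0)
```

Note that the distance to the center is simply r, so the onion partitioning never needs this conversion; it only matters for plotting.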

# Repartitioning of the space

By default, the pointRDD is partitioned randomly (i.e. Spark creates the partitions regardless of the content of the file).
Let's repartition our data based on the distance to the center (Onion Space).

In [3]:
import com.spark3d.utils.GridType

// As we are in local mode, and the file is very small, the RDD pointRDD has only 1 partition.
// For the sake of this example, let's increase the number of partitions to 10.
val pointRDD_part = pointRDD.spatialPartitioning(GridType.LINEARONIONGRID, 10)
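
To make the idea of partitioning by distance to the center concrete, here is a minimal sketch in plain Scala. It is an illustration under stated assumptions, not the spark3D implementation: `shellIndex` is a hypothetical function, the shells are assumed to have equal radial width between 0 and a maximum radius `rMax`, and one extra index is assumed to collect points at or beyond `rMax`.

```scala
// Hypothetical sketch, not the spark3D implementation.
// nShells shells of equal radial width cover [0, rMax);
// index nShells is an extra partition for points at or beyond rMax.
def shellIndex(r: Double, rMax: Double, nShells: Int): Int = {
  require(r >= 0.0 && rMax > 0.0 && nShells > 0)
  val width = rMax / nShells
  math.min((r / width).toInt, nShells)
}

// Five radial distances, partitioned into 10 shells of width 0.1
val radii = Seq(0.05, 0.12, 0.49, 0.99, 1.3)
val indices = radii.map(r => shellIndex(r, 1.0, 10))
// indices: List(0, 1, 4, 9, 10); the last point lies outside rMax
```

With this scheme, all points in a given partition lie in the same spherical shell, which is what gives the partitioning its onion-like structure.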

Let's see how our space is now partitioned:

In [4]:
val partitionsAfter = pointRDD_part.mapPartitions(iter => Array(iter.size).iterator, true).collect()

// This is the number of objects per partition.
// The last partition is empty (0): it is reserved for points outside the grid (will change in the next release).
println(partitionsAfter.toList)

List(2104, 2038, 1985, 1974, 2027, 2026, 1898, 1962, 1974, 2012, 0)
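
As a quick sanity check on the output above, the following plain-Scala sketch summarizes how evenly the objects are spread over the partitions. The numbers are copied from the printed list; the variable names are illustrative only.

```scala
// Counts per partition, copied from the output above
val counts = List(2104, 2038, 1985, 1974, 2027, 2026, 1898, 1962, 1974, 2012, 0)

// Drop the trailing empty partition before computing statistics
val shells = counts.init

val total = shells.sum                      // 20000 objects in total
val mean = total.toDouble / shells.size     // 2000.0 objects per partition on average

// Largest relative deviation from the mean, about 0.052 (5.2%)
val maxDeviation = shells.map(c => math.abs(c - mean)).max / mean
```

For this test file, the ten non-empty partitions each hold within roughly 5% of the average number of objects.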


# Visualize the partitioning

Let's plot the partitioning!

In [16]:
import smile.plot._
import java.awt.Color
import java.awt.{GridLayout, Dimension}

import javax.swing.JFrame
import javax.swing.JPanel

import com.spark3d.utils.Utils.sphericalToEuclidean

// Set to "show" or "save"
val display = "show"

// Define palette of colors
val colors = Array(Color.BLACK, Color.RED, Color.GREEN, Color.BLUE,
  Color.PINK, Color.YELLOW, Color.DARK_GRAY, Color.ORANGE,
  Color.MAGENTA, Color.CYAN)

// Re-arrange the data for plotting: one array of Euclidean coordinates per partition
val rawData = pointRDD.rawRDD.repartition(10).map(
  x => sphericalToEuclidean(x).center.getCoordinate.toArray).glom.collect().toArray
val partData = pointRDD_part.map(
  x => sphericalToEuclidean(x).center.getCoordinate.toArray).glom.collect().toArray

// Plot the results, one color per partition
val rawWindow = ScatterPlot.plot(rawData(0), '.', colors(0))
for (part <- 1 until rawData.length) {
  rawWindow.points(rawData(part), '.', colors(part))
}
// The last partition of partData is the empty one, so it is skipped
val partWindow = ScatterPlot.plot(partData(0), '.', colors(0))
for (part <- 1 to partData.length - 2) {
  partWindow.points(partData(part), '.', colors(part))
}

// Display the result
display match {
  case "show" => {
    val partFrame = new JFrame("Partitioned data")
    partFrame.setLocationRelativeTo(null)
    partFrame.getContentPane().add(partWindow)
    partFrame.setVisible(true)
    partFrame.setSize(new Dimension(500, 500))
      
    val rawFrame = new JFrame("Raw data")
    rawFrame.setLocationRelativeTo(null)
    rawFrame.getContentPane().add(rawWindow)
    rawFrame.setVisible(true)
    rawFrame.setSize(new Dimension(500, 500))
  }
  case "save" => {
    val partHeadless = new Headless(partWindow)
    partHeadless.pack()
    partHeadless.setVisible(true)
    partHeadless.setSize(new Dimension(500, 500))
    partWindow.save(new java.io.File("myOnionFig.png"))

    val rawHeadless = new Headless(rawWindow)
    rawHeadless.pack()
    rawHeadless.setVisible(true)
    rawHeadless.setSize(new Dimension(500, 500))
    rawWindow.save(new java.io.File("myOnionFigRaw.png"))
  }
  case _ => throw new AssertionError("""
    I do not understand the kind of display you want.
    Choose between "show" and "save".
    """)
}


Raw partitioning             |  Onion Partitioning
:-------------------------:|:-------------------------:
![title](myOnionFigRaw.png)  |   ![title](myOnionFig.png)

Et voilà!