# K-means

# Image analysis

To carry out image analysis, it is recommended to convert the usual color formats (e.g. `RGB`, `CMYK`) to the `L*u*v*` color space, because nearby values in Luv space correspond more closely to visually perceived color proximity, and to add the column and row indices $(x, y)$ of each pixel.
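As a rough illustration of this conversion, the sketch below turns an sRGB pixel into Luv in plain Scala. It assumes sRGB input, the D65 reference white, and XYZ scaled so that white has $Y = 1$. The object and function names are hypothetical, and a library such as Sanselan may use a different scaling or slightly different constants, so the values need not match exactly.

```scala
// Minimal sketch of sRGB -> XYZ -> CIE L*u*v* (assumes D65 white, Y of white = 1).
object LuvSketch {
  // D65 reference white (assumption of this sketch)
  val Xn = 0.95047
  val Yn = 1.0
  val Zn = 1.08883

  // Undo the sRGB gamma curve for one 8-bit channel.
  def srgbToLinear(c: Int): Double = {
    val s = c / 255.0
    if (s <= 0.04045) s / 12.92 else math.pow((s + 0.055) / 1.055, 2.4)
  }

  // Linear RGB to XYZ with the standard sRGB/D65 matrix.
  def rgbToXyz(r: Int, g: Int, b: Int): (Double, Double, Double) = {
    val rl = srgbToLinear(r)
    val gl = srgbToLinear(g)
    val bl = srgbToLinear(b)
    (0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl,
     0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl,
     0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl)
  }

  // XYZ to L*u*v* relative to the reference white.
  def xyzToLuv(x: Double, y: Double, z: Double): (Double, Double, Double) = {
    val yr = y / Yn
    val l =
      if (yr > 216.0 / 24389.0) 116.0 * math.cbrt(yr) - 16.0
      else (24389.0 / 27.0) * yr
    val d  = x + 15.0 * y + 3.0 * z
    val dn = Xn + 15.0 * Yn + 3.0 * Zn
    // Chromaticity of the pixel (guarding against black, where d == 0)
    val up = if (d == 0.0) 0.0 else 4.0 * x / d
    val vp = if (d == 0.0) 0.0 else 9.0 * y / d
    // Chromaticity of the reference white
    val un = 4.0 * Xn / dn
    val vn = 9.0 * Yn / dn
    (l, 13.0 * l * (up - un), 13.0 * l * (vp - vn))
  }

  def rgbToLuv(r: Int, g: Int, b: Int): (Double, Double, Double) = {
    val (x, y, z) = rgbToXyz(r, g, b)
    xyzToLuv(x, y, z)
  }
}
```

Calling `LuvSketch.rgbToLuv(r, g, b)` on two pixels and comparing the Euclidean distance of the results gives a measure that tracks perceived color difference better than the same distance computed on raw RGB values.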

Each pixel is transformed to a 5-dimensional vector $(x, y, L, u, v)$. In this notebook the $(L, u, v)$ components are then fed to the k-means clustering, while $(x, y)$ are kept to plot the result.
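To make the clustering step concrete, here is a minimal sketch of plain Lloyd-style k-means iterations on such feature vectors. It is not how Spark implements `KMeans` (Spark also uses a different initialisation by default); the object name, the function names, and the fixed iteration count are assumptions made for illustration only.

```scala
// Minimal sketch of Lloyd's k-means iterations on dense feature vectors.
object KMeansSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

  // Index of the center closest to p (ties go to the lowest index).
  def nearest(p: Point, centers: Array[Point]): Int =
    centers.indices.minBy(j => sqDist(p, centers(j)))

  // Alternate assignment and mean-update steps for a fixed number of iterations.
  def lloyd(points: Array[Point], init: Array[Point], iterations: Int): Array[Point] = {
    var centers = init.map(_.clone)
    for (_ <- 0 until iterations) {
      // Assignment step: group points by their nearest center.
      val groups = points.groupBy(p => nearest(p, centers))
      // Update step: move each center to the mean of its group;
      // a center with no assigned points is left where it is.
      centers = centers.indices.map { j =>
        groups.get(j) match {
          case Some(members) =>
            Array.tabulate(members.head.length) { d =>
              members.map(_(d)).sum / members.length
            }
          case None => centers(j)
        }
      }.toArray
    }
    centers
  }
}
```

With pixels encoded as `Array(L, u, v)`, `KMeansSketch.nearest(pixel, centers)` plays the role of the `prediction` column that the Spark model produces further down.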

The example uses [color image #124084](https://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/BSDS300/html/dataset/images/color/124084.html) from the [Berkeley Segmentation Dataset and Benchmark repository](https://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/).

<img src="https://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/BSDS300/html/images/plain/normal/color/124084.jpg"/>

//%%bash
//wget https://sites.google.com/site/lebbah/datatp/124084-orig.jpg -O /tmp/124084-orig.jpg



In [None]:
%classpath add mvn com.github.haifengl smile-scala_2.11 1.5.3

In [None]:
%classpath add mvn org.apache.sanselan sanselan 0.97-incubator

In [None]:
%classpath add mvn org.apache.spark spark-mllib_2.11 2.4.4

In [None]:
%classpath add mvn org.apache.spark spark-sql_2.11 2.4.4
org.apache.log4j.Logger.getRootLogger().setLevel(org.apache.log4j.Level.ERROR);

In [None]:
import smile._
import smile.util._
import smile.math._
import smile.math.distance._
import smile.math.kernel._
import smile.math.matrix._
import smile.stat.distribution._
import smile.data._
import smile.interpolation._
import smile.validation._
import smile.association._
import smile.regression._
import smile.classification._
import smile.feature._
import smile.clustering._
import smile.vq._
import smile.manifold._
import smile.mds._
import smile.sequence._
import smile.projection._
import smile.nlp._
import smile.plot._
import java.awt.Color
import smile.wavelet._

In [None]:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
                        .appName("Simple Application")
                        .master("local[4]")
                        .config("spark.ui.enabled", "false")
                        .getOrCreate()
val sc = spark.sparkContext

In [None]:
import java.awt.image.BufferedImage
import org.apache.sanselan.color.ColorConversions._
import org.apache.sanselan.ImageInfo

In [None]:
def toLUV(bni:java.awt.image.BufferedImage, x:Int, y:Int) = {
  //val xyz = org.apache.commons.imaging.color.ColorConversions.convertRGBtoXYZ(bni.getRGB(x,y))
  val xyz = convertRGBtoXYZ(bni.getRGB(x,y))
  val luv = convertXYZtoCIELuv(xyz)
  (x+1, y+1, luv.L, luv.u, luv.v)
}
def toRGB(L:Double, u:Double, v:Double) = {
  val xyz = convertCIELuvtoXYZ(L, u, v)
  val rgb = convertXYZtoRGB(xyz)
  // added to fix the function: wrap the packed ARGB int in a java.awt.Color
  val javaRGB = new java.awt.Color(rgb,true)
  javaRGB
}

In [None]:
toRGB(22.37602962827482,55.415292981346056,16.03484906671305)

In [None]:
convertCIELuvtoXYZ(22.37602962827482,55.415292981346056,16.03484906671305)

In [None]:
convertRGBtoXYZ(-14603485)

## Build data <small>(optional)</small>

The model must be trained on data in the _Luv_ color space. A common way to store such data is a CSV file whose columns are $X$, $Y$, $L$, $u$, $v$.
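A minimal sketch of that CSV layout, assuming lower-case column names `x,y,l,u,v` (the names used for the DataFrame below) and a hypothetical `LuvCsv` helper object, could look like this:

```scala
// Minimal sketch: serialise (x, y, L, u, v) tuples to CSV text and read them back.
object LuvCsv {
  type Pixel = (Int, Int, Double, Double, Double)

  // Header row; the column names are an assumption of this sketch.
  val header = "x,y,l,u,v"

  // One line per pixel, preceded by the header.
  def toCsv(rows: Seq[Pixel]): String =
    (header +: rows.map { case (x, y, l, u, v) => s"$x,$y,$l,$u,$v" }).mkString("\n")

  // Parse the text produced by toCsv, skipping the header row.
  def fromCsv(text: String): Seq[Pixel] =
    text.split("\n").toSeq.drop(1).map { line =>
      val f = line.split(",")
      (f(0).toInt, f(1).toInt, f(2).toDouble, f(3).toDouble, f(4).toDouble)
    }
}
```

The tuples produced by `toLUV` have this same shape, so a sequence of them can be passed to `LuvCsv.toCsv` directly.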

In [None]:
// set this to the local path of the downloaded image
val imgpath = "124084-orig.jpg"

In [None]:
val bni = javax.imageio.ImageIO.read(new java.io.File(imgpath))

In [None]:
val h = bni.getHeight
val w = bni.getWidth
(h, w)

In [None]:
val luvs = for {
  r <- 0 until h
  c <- 0 until w
} yield toLUV(bni, c, r)

In [None]:
import org.apache.spark._
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

In [None]:
import spark.implicits._

In [None]:
val luvsDF = luvs.toVector.toSeq.toDF("x","y","l","u","v")

In [None]:
luvsDF.show

# K-Means

In [None]:
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.feature.StringIndexer

val assembler = new VectorAssembler()
  .setInputCols(Array("l", "u", "v"))
  .setOutputCol("features")

val output = assembler.transform(luvsDF)
// show the assembled (l, u, v) feature vector next to the pixel coordinates
output.select("features","x","y").show(false)

In [None]:
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator

In [None]:
val kmeans = new KMeans().setK(5).setSeed(1L)
val model = kmeans.fit(output)
val predictions = model.transform(output)
val ListLabel =predictions.select("prediction").map(f=>f.getInt(0))
                 .collect.toArray[Int]

In [None]:
model.clusterCenters

In [None]:
val df = predictions.select("x","y")
df.show

In [None]:
val assembler = new VectorAssembler()
  .setInputCols(Array("x", "y"))
  .setOutputCol("coord")

val output = assembler.transform(df)
// show the pixel coordinates assembled into the 'coord' vector column
output.show(false)

In [None]:
val df1 = output.select("coord")
df1.show

In [None]:
// read each 'coord' vector directly instead of parsing its string form
val dfPlot = df1.collect.map(_.getAs[org.apache.spark.ml.linalg.Vector]("coord").toArray)

In [None]:
val pl = plot(dfPlot, ListLabel, '.', Palette.COLORS).canvas

In [None]:
// test: convert the cluster center at index 1 back to an RGB color
val truc = toRGB(model.clusterCenters(1)(0),model.clusterCenters(1)(1),model.clusterCenters(1)(2))

In [None]:
// build a java.awt.Color palette with one color per cluster center
val customPalette = model.clusterCenters.map(c => toRGB(c(0), c(1), c(2)))
customPalette

In [None]:
val canvasfinal = plot(dfPlot, ListLabel, '.',customPalette).canvas