# Image Classification with DJL Spark Support

In this example, we will use Jupyter Notebook to run image classification with the DJL Spark extension in Scala. To run this notebook, you need to install [Almond](https://almond.sh/), a Scala kernel for Jupyter Notebook. Almond provides extensive functionality for Scala and Spark applications.

[Almond installation instructions](https://almond.sh/docs/quick-start-install) (Note: only Scala 2.12 has been tested)

After that, you can start with DJL's Scala notebook.


## Import dependencies

First, let's import the dependencies we need. We use DJL PyTorch as the backend engine. You can also switch to MXNet by replacing the two PyTorch dependencies with the corresponding MXNet ones.

In [None]:
import $ivy.`org.apache.spark::spark-sql:3.3.2`
import $ivy.`ai.djl:api:0.27.0`
import $ivy.`ai.djl.spark:spark_2.12:0.27.0`
import $ivy.`ai.djl.pytorch:pytorch-model-zoo:0.27.0`
import $ivy.`ai.djl.pytorch:pytorch-native-cpu-precxx11:2.1.1`


Then we can import the packages we need. The last two lines turn off logging for the `org` and `ai` packages (which cover Spark and DJL) so that log messages do not clutter the cell output.

In [None]:
import org.apache.spark.sql.NotebookSparkSession
import ai.djl.spark.task.vision.ImageClassifier
import org.apache.spark.sql.SparkSession

import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.OFF) // silence Spark and other org.* loggers
Logger.getLogger("ai").setLevel(Level.OFF) // silence DJL (ai.djl.*) loggers

## Start Spark application

We can create a `NotebookSparkSession` through the Almond Spark plugin. Internally, it makes all necessary jars available to each worker node.

In [None]:
// Create Spark session
val spark = {
  NotebookSparkSession.builder()
    .master("local[*]")
    .getOrCreate()
}

// Use the PyTorch precxx11 native library
System.setProperty("PYTORCH_PRECXX11", "true")
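Before loading any data, it can help to confirm that the session is configured as expected. The cell below is a minimal sketch that assumes the `spark` value created above is in scope; it only reads back settings and changes nothing.

```scala
// Read back basic facts about the running session (assumes `spark` from the previous cell)
println(s"Spark version: ${spark.version}")
println(s"Master: ${spark.sparkContext.master}")
println(s"Default parallelism: ${spark.sparkContext.defaultParallelism}")

// Confirm that the system property set above is visible to the JVM
println(s"PYTORCH_PRECXX11 = ${System.getProperty("PYTORCH_PRECXX11")}")
```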

Let's load the images from a local folder using Spark's `image` data source:

In [None]:
var df = spark.read.format("image").option("dropInvalid", true).load("../image-classification/images")
df.printSchema()
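The image data source produces a single `image` struct column with the fields `origin`, `height`, `width`, `nChannels`, `mode`, and `data`. As a rough sketch, assuming the `df` loaded above is in scope, the following cell previews the metadata fields (leaving out the raw bytes in `data`) and counts images per channel count, which shows how many rows a later RGB-only filter would keep.

```scala
// Preview image metadata without printing the raw pixel bytes
df.select("image.origin", "image.height", "image.width", "image.nChannels")
  .show(5, false) // first 5 rows, do not truncate long paths

// Count images by number of channels (3 = RGB, 1 = grayscale, 4 = with alpha)
df.groupBy("image.nChannels").count().show()
```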

Then we can run inference on these images. All we need to do is create an `ImageClassifier` and run inference with DJL.

In [None]:
df = df.select("image.*").filter("nChannels=3") // The model expects RGB images
val classifier = new ImageClassifier()
  .setInputCols(Array("origin", "height", "width", "nChannels", "mode", "data"))
  .setOutputCol("prediction")
  .setEngine("PyTorch")
  .setModelUrl("djl://ai.djl.pytorch/resnet")
  .setTopK(2)
val outputDf = classifier.classify(df) // never reassigned, so a val is sufficient
outputDf.select("origin", "prediction.top_k").show(truncate=false)
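To see everything the classifier returns, not just `top_k`, you can inspect the output schema and expand the `prediction` struct. This is a minimal sketch that assumes `outputDf` from the cell above is in scope; the exact fields inside `prediction` depend on the DJL version, so the code prints them rather than assuming their names.

```scala
// Show the full output schema, including the fields nested under `prediction`
outputDf.printSchema()

// Expand the prediction struct into top-level columns for a few rows.
// Note: Spark evaluates lazily, so this runs inference again for the rows shown.
outputDf.select("prediction.*").show(3, false)
```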