# Basic Spark example

Almond comes with a Spark integration module based on [ammonite-spark](https://github.com/alexarchambault/ammonite-spark).
To use it, we have to import the *almond-spark* dependency as well as Spark 2.x itself.

*ammonite-spark* loads Spark itself and does not rely on a specific Spark distribution.
As a result, you can load any Spark 2.x version you like. The only requirement is that Spark and the running Almond kernel are built for the same Scala version.
For Scala 2.12, at least Spark 2.4.0 is required.
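
To check which Spark build matches your kernel, you can print the kernel's Scala version. The snippet below is a small sketch that derives the binary version suffix Spark artifacts are published under, such as `_2.12`. The names `scalaBinaryVersion` and `sparkSqlArtifact` are only for illustration. With a double colon (`org.apache.spark::spark-sql:...`), `import $ivy` appends this suffix automatically.

```scala
// Sketch: derive the Scala binary version of the running kernel,
// e.g. "2.12" from "2.12.8". The `::` in `import $ivy` appends this suffix,
// so `org.apache.spark::spark-sql:2.4.0` resolves to `spark-sql_2.12` on Scala 2.12.
val scalaVersion: String = scala.util.Properties.versionNumberString
val scalaBinaryVersion: String = scalaVersion.split('.').take(2).mkString(".")

// Illustrative artifact name only; this line does not download anything.
val sparkSqlArtifact: String = s"spark-sql_$scalaBinaryVersion"
println(s"Scala $scalaVersion -> Spark artifact $sparkSqlArtifact")
```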

For more information, see the [README](https://github.com/alexarchambault/ammonite-spark/blob/master/README.md) of ammonite-spark.

```scala
import $ivy.`org.apache.spark::spark-sql:2.4.0` // Or use any other 2.x version here
import $ivy.`sh.almond::almond-spark:0.2.3-SNAPSHOT` // Usually the same version as the running Almond kernel
```

```scala
// Silence logging from all loggers under "org" (this includes Spark's)
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.OFF)
```

```scala
import org.apache.spark.sql._
```

Now we can create a `SparkSession` using the builder provided by *almond-spark*.

```scala
val spark = {
  NotebookSparkSession.builder()
    .master("local[*]")
    .getOrCreate()
}
```

You can also connect to a real cluster instead of running Spark locally. *ammonite-spark* currently supports standalone and YARN clusters. See its [README](https://github.com/alexarchambault/ammonite-spark/blob/master/README.md) for details.

Now we can get a `SparkContext` from our `SparkSession`.

```scala
def sc = spark.sparkContext
```

Then we can create an `RDD`, split here into 100 partitions, and run some calculations on it.

```scala
val rdd = sc.parallelize(1 to 100000000, 100)
```
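
The second argument to `parallelize` is the number of slices (partitions). As a rough plain-Scala model, assuming Spark splits the range evenly into contiguous slices (this sketch is not Spark's own code, and `sliceBounds` is a hypothetical helper), each of the 100 partitions receives one million elements:

```scala
// Hypothetical helper: split `length` elements into `numSlices` contiguous
// slices, each described as (inclusive start index, exclusive end index).
def sliceBounds(length: Long, numSlices: Int): Seq[(Long, Long)] =
  (0 until numSlices).map { i =>
    val start = i * length / numSlices     // Int * Long is computed as Long
    val end = (i + 1) * length / numSlices
    (start, end)
  }

val bounds = sliceBounds(100000000L, 100)
val sizes = bounds.map { case (start, end) => end - start }
println(s"${bounds.size} partitions, sizes from ${sizes.min} to ${sizes.max}")
```

On a real `RDD`, `rdd.getNumPartitions` reports the actual partition count.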

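`parallelize` is lazy, so no Spark job has run yet. When you run the next cell, which calls the `sum()` action, you should see a progress bar showing the progress of the running Spark job.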

```scala
// sum() on an RDD of numbers returns a Double
val n = rdd.map(_ + 1).sum()
```
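
To sanity-check the result, you can compute the expected value in closed form. The sum of `i + 1` for `i` from 1 to `n` is `n(n + 1)/2 + n`. The sketch below (with a hypothetical `expectedSum` helper) checks that formula by brute force on a small range and then evaluates it for the full range. The value is below 2^53, so the `Double` returned by `sum()` can hold it exactly.

```scala
// Hypothetical helper: closed-form value of rdd.map(_ + 1).sum() when rdd holds 1 to n.
def expectedSum(n: Long): Long = n * (n + 1) / 2 + n

// Brute-force check of the formula on a small range.
val bruteForce: Long = (1 to 1000).map(_ + 1L).sum
assert(bruteForce == expectedSum(1000))

println(expectedSum(100000000L)) // 5000000150000000
```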

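```scala
// Sum the values per last digit; per-key totals exceed Int.MaxValue, so sum as Long
val sumsByKey = rdd.map(x => (x % 10, x.toLong)).reduceByKey(_ + _).collect()
```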
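
`reduceByKey` groups the pairs by key and combines the values for each key with the given function. The plain-Scala model below approximates this on a small local range using `groupBy`. It also shows why the values should be summed as `Long`: for the full range, key 0 collects 10, 20, ..., 100000000, and their total is far larger than `Int.MaxValue`. The names `pairs`, `localSums`, and `key0Total` are illustrative.

```scala
// Plain-Scala model of reduceByKey(_ + _) on a small range:
// group the (key, value) pairs by key, then add up the values for each key.
val pairs: Seq[(Int, Long)] = (1 to 1000).map(x => (x % 10, x.toLong))
val localSums: Map[Int, Long] =
  pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2).sum }

// Full range 1 to 100000000: key 0 holds 10 * k for k = 1..m, with m = 10000000,
// so its total is 10 * m * (m + 1) / 2, which only fits in a Long.
val m = 10000000L
val key0Total: Long = 10L * m * (m + 1) / 2
println(s"key 0 total: $key0Total (Int.MaxValue = ${Int.MaxValue})")
```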