Links:
- The Scala home page
- Scala documentation
- Spark documentation
- Maven repository for Spark
- Scala tour webpage
Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It smoothly integrates features of object-oriented and functional languages.
- Scala code compiles to Java bytecode that can run on any Java Virtual Machine (JVM)
- Scala supports object-oriented and functional programming
- Scala is statically typed -- that is, the compiler performs data type checking (unlike dynamically typed languages like Python)
- Created by Martin Odersky and first released in 2004
Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
- Spark is written in Scala
- Spark is designed to cover a wide range of workloads such as batch applications, iterative algorithms, interactive queries, and streaming
- Spark is designed to be fast and general-purpose
- Spark is designed to run on a cluster of machines and can be accessed through its own shell
To work with Scala and Spark, check the Spark latest version and Scala version compatibility. At the moment of writing this document, the latest Spark
version is 3.5.1
and it is compatible with Scala
2.12 and/or 2.13 and Java
8/11/17.
To work with Scala and Spark, you can use SBT
as a build tool. SBT is a build tool for Scala, Java, and more.
Once created a new project, add the following dependencies to build.sbt
file:
val sparkVersion = "3.5.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
To run Spark
with Java
17 you need to add the following VM options:
--add-exports java.base/sun.nio.ch=ALL-UNNAMED
Add it to the configuration in the IDE.
Download Stock Market dataset from Kaggle.