In [1]:
println("Hello World") // make sure we're in a spark kernel

Starting Spark application


ID,YARN Application ID,Kind,State,Spark UI,Driver log,Current session?
2,application_1551205501504_0003,spark,idle,Link,Link,✔


SparkSession available as 'spark'.
Hello World


# Scala for Spark - Assignment

Learning Scala the hard way.

## Part 1: The Basics

In [2]:
/*
  Try the REPL

  Scala has a tool called the REPL (Read-Eval-Print Loop) that is analogous to
  command-line interpreters in many other languages. You may type any Scala
  expression, and the result will be evaluated and printed.

  The REPL is a very handy tool to test and verify code. Use it as you read
  this tutorial to quickly explore concepts on your own.
*/

In [3]:
// single line comments start with two forward slashes

/*
Multi-line comments look like this
*/

In [4]:
// printing and forcing a new line
println("Hello world")
println(10)

Hello world
10


In [5]:
// printing on the same line
print("Hello world")
print(10)

Hello world10

In [6]:
/*
Scala is a statically typed language, yet the examples above never state a type.
This is due to a language feature called type inference: in most cases the Scala
compiler can work out the type of a value, so you don't have to write it every time.
You can also declare the type of a variable explicitly, like so:
*/
val z: Int = 10
val a: Double = 1.0

// notice the automatic conversion from Int to Double: the result is 10.0, not 10
val b: Double = 10

z: Int = 10
a: Double = 1.0
b: Double = 10.0
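The comment above contrasts inferred and explicit types. The following is a minimal sketch of which types the compiler infers for plain literals, and of the asymmetry between widening (automatic) and narrowing (explicit). The variable names are illustrative only.

```scala
// Types the compiler infers when no annotation is given
val inferredInt = 10        // Int
val inferredDouble = 1.0    // Double
val inferredText = "ten"    // String

// An Int value widens to Double automatically...
val widened: Double = inferredInt   // 10.0

// ...but narrowing must be explicit: `val n: Int = 1.5` does not compile
val narrowed: Int = 2.9.toInt       // toInt truncates toward zero, giving 2
```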


In [7]:
// boolean values
true
false

// boolean operations
!true
!false
true == false
10 > 5

res18: Boolean = true
res19: Boolean = false
res22: Boolean = false
res23: Boolean = true
res24: Boolean = false
res25: Boolean = true
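Besides `!` and `==`, booleans are usually combined with `&&` (and) and `||` (or). A small sketch, assuming a hypothetical `sideEffect()` helper that records whether it was called, shows that both operators evaluate their right operand only when the result is not already decided:

```scala
// Tracks whether sideEffect() has run (illustrative helper, not a library call)
var evaluated = false
def sideEffect(): Boolean = { evaluated = true; true }

val andResult = false && sideEffect()  // false: right side skipped
val orResult = true || sideEffect()    // true: right side skipped
val evaluatedAfterShortCircuit = evaluated  // still false

val both = true && sideEffect()        // right side needed, so it runs
```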


In [8]:
// math operations
1 + 1 // add
2 - 1 // subtract
5 * 3 // multiply
6 / 2 // integer division (any fractional part is discarded)
6.0 / 4 // floating-point division

// Evaluating an expression in the REPL gives you the type and value of the result

1 + 7

/* The above line results in:

  scala> 1 + 7
  res29: Int = 8

  This means the result of evaluating 1 + 7 is an object of type Int with a
  value of 8
*/

res27: Int = 2
res28: Int = 1
res29: Int = 15
res30: Int = 3
res31: Double = 1.5
res35: Int = 8
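The split between integer and floating-point division is easy to trip over, especially with negative numbers. A short sketch (names are illustrative) of how `/` and `%` behave on the JVM:

```scala
// Integer division truncates toward zero; % keeps the sign of the dividend
val q = 7 / 2        // 3
val rem = 7 % 2      // 1
val negQ = -7 / 2    // -3 (truncation, not floor)
val negR = -7 % 2    // -1

// If either operand is a Double, the division is floating-point
val exact = 7 / 2.0                 // 3.5
val viaConversion = 7.toDouble / 2  // 3.5
```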


In [9]:
// strings
"Scala strings are surrounded by double quotes"
'a' // a single scala char
// 'Single quote strings don't exist' <- gives an error

// strings have the usual Java methods defined on them
"hello world".length
"hello world".substring(2,6)
"hello world".replace("C", "3")

// they also have extra Scala-specific methods
"hello world".take(5)
"hello world".drop(5)

// string interpolation, notice the prefix 's'
val n = 45
s"We have $n apples" // -> We have 45 apples

// expressions inside interpolated strings are also possible
s"Power of 2: ${math.pow(2,2)}"

// some characters need to be "escaped", e.g. a double quote inside a string
"They stood outside the \"Rose and Crown\""

res40: String = Scala strings are surrounded by double quotes
res41: Char = a
res45: Int = 11
res46: String = "llo "
res47: String = hello world
res50: String = hello
res51: String = " world"
n: Int = 45
res54: String = We have 45 apples
res57: String = Power of 2: 4.0
res60: String = They stood outside the "Rose and Crown"
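Besides `s`, the standard library provides the `f` (printf-style) and `raw` interpolators. A minimal sketch with illustrative values; the `%.2f` output assumes an English-style default locale, since `f` formats with the JVM's default locale:

```scala
val item = "apples"
val price = 1.5

// s: plain substitution
val plain = s"$item cost $price each"            // apples cost 1.5 each

// f: printf-style format specifiers, type-checked at compile time
val formatted = f"$item%s cost $price%.2f each"  // apples cost 1.50 each

// raw: like s, but escape sequences such as \n are kept literally
val notEscaped = raw"line1\nline2"               // backslash and 'n', no newline
```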


## Part 2: Functions

In [15]:
/*
Functions are defined like so:

def functionName(args ...): ReturnType = {body...}

If you come from more traditional programming languages, notice the omission of the
return keyword. In Scala, the last expression in the function body
is the return value.
*/
def sumOfSquares(x: Int, y: Int): Int = {
    val x2 = x * x
    val y2 = y * y
    x2 + y2
}

// the { } can be omitted if the function body is a single expression
def sumOfSquaresShort(x: Int, y: Int): Int = x * x + y * y

// the syntax for calling functions is familiar
sumOfSquares(3, 4)

// you can use parameter names to specify them in a different order
def subtract(x: Int, y: Int): Int = x - y

subtract(10, 3)
subtract(y=10, x=3)

/*
In most cases (recursive functions being the most notable exception), the return
type of a function can be omitted. The same type inference we saw with variables
also works for function return values.
*/
def sq(x: Int) = x * x

// functions can have default parameters
def addWithDefault(x: Int, y: Int = 5) = x + y
addWithDefault(1, 2)
addWithDefault(1)

// anonymous functions look like this
(x: Int) => x * x

/*
If each argument in an anonymous function is used only once,
Scala gives you an even shorter way to define it: each argument
is replaced by an underscore. These anonymous functions turn out
to be extremely common, as the Functional Programming section shows.
*/
val addOne: Int => Int = _ + 1
val weirdSum: (Int, Int) => Int = (_ * 2 + _ * 3)

addOne(5)
weirdSum(2, 4)

sumOfSquares: (x: Int, y: Int)Int
sumOfSquaresShort: (x: Int, y: Int)Int
res160: Int = 25
subtract: (x: Int, y: Int)Int
res164: Int = 7
res165: Int = -7
sq: (x: Int)Int
addWithDefault: (x: Int, y: Int)Int
res170: Int = 3
res171: Int = 6
res174: Int => Int = <function1>
addOne: Int => Int = <function1>
weirdSum: (Int, Int) => Int = <function2>
res178: Int = 6
res179: Int = 16
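The note above says recursive functions are the main case where the return type cannot be omitted. A minimal sketch (the `factorial` and `greet` names are illustrative) of an explicitly typed recursive method, plus default and named arguments used together:

```scala
// A recursive method must declare its return type; without `: BigInt` the
// compiler rejects it with an error saying the recursive method needs a result type
def factorial(n: Int): BigInt =
  if (n <= 1) BigInt(1) else BigInt(n) * factorial(n - 1)

// Default and named arguments can be combined
def greet(name: String, greeting: String = "Hello") = s"$greeting, $name!"
val defaultGreeting = greet("Dom")                     // Hello, Dom!
val namedGreeting = greet(greeting = "Hi", name = "Bob") // Hi, Bob!
```

BigInt is used so that larger inputs such as `factorial(20)` do not overflow `Int`.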


## Part 3: Flow Control

In [19]:
1 to 5
val r = 1 to 5
r.foreach(println)

// NB: Scala is quite lenient when it comes to dots and brackets; study
// the rules separately. This flexibility helps write DSLs and APIs that read like English

// Why doesn't println need any parameters here?
// Stay tuned for first-class functions in the Functional Programming section below
(5 to 1 by -1) foreach (println)

// recursion is the idiomatic way of repeating an action in Scala (as in most
// other functional languages).
// recursive functions need an explicit return type because the compiler can't infer it;
// here it's Unit, which is analogous to a 'void' return type in Java
def showNumbersInRange(a: Int, b: Int): Unit = {
    print(a)
    if (a < b)
        showNumbersInRange(a + 1, b)
}
showNumbersInRange(1, 14)

// conditionals
val x = 10

if (x == 1) println("yeah")
if (x == 10) println("yeah")
if (x == 11) println("yeah")
if (x == 11) println("yeah") else println("nay")

println(if (x == 10) "yeah" else "nope")
val text = if (x == 10) "yeah" else "nope"

res206: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5)
r: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5)
1
2
3
4
5
5
4
3
2
1
showNumbersInRange: (a: Int, b: Int)Unit
1234567891011121314x: Int = 10
yeah
nay
yeah
text: String = yeah
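Since recursion is the idiomatic way to repeat an action, deep recursion can become a concern (each call uses a stack frame). One common remedy, sketched below under the assumption that an accumulator-style rewrite fits your problem, is a tail-recursive helper checked with `@tailrec`; `sumRange` and `loop` are illustrative names:

```scala
import scala.annotation.tailrec

// The recursive call to loop is the last thing evaluated, so the compiler can
// turn it into a loop; @tailrec makes compilation fail if that is not possible
def sumRange(a: Int, b: Int): Long = {
  @tailrec
  def loop(i: Int, acc: Long): Long =
    if (i > b) acc else loop(i + 1, acc + i)
  loop(a, 0L)
}

val bigSum = sumRange(1, 1000000)   // one million iterations, constant stack depth

// `if` is an expression, so its result can be assigned directly
val parity = if (bigSum % 2 == 0) "even" else "odd"
```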


## Part 4: Data Structures

In [26]:
// arrays
val a = Array(1, 2, 3, 5, 8, 13)
a(0)
a(3)
// a(21) // throws an exception

// sets
val s = Set(1, 3, 7)
s(0) // false: applying a Set tests membership, and 0 is not in s
s(1) // true

// tuples
(1, 2)
(4, 3, 2)
(1, 2, "three")
(a, 2, "three")

// function that divides two integers and returns both the quotient and the remainder
val divideInts = (x: Int, y: Int) => (x / y, x % y)

// to access the elements of a tuple, use ._n, where n is the 1-based
// index of the element
val d = divideInts(10, 3)

d._1
d._2

// alternatively, you can destructure the tuple into multiple variables,
// which is more convenient and readable in many cases
val (div, mod) = divideInts(10, 3)

div
mod

a: Array[Int] = Array(1, 2, 3, 5, 8, 13)
res311: Int = 1
res312: Int = 5
s: scala.collection.immutable.Set[Int] = Set(1, 3, 7)
res316: Boolean = false
res317: Boolean = true
res320: (Int, Int) = (1,2)
res321: (Int, Int, Int) = (4,3,2)
res322: (Int, Int, String) = (1,2,three)
res323: (Array[Int], Int, String) = (Array(1, 2, 3, 5, 8, 13),2,three)
divideInts: (Int, Int) => (Int, Int) = <function2>
d: (Int, Int) = (3,1)
res330: Int = 3
res331: Int = 1
div: Int = 3
mod: Int = 1
res336: Int = 3
res337: Int = 1
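The cell above notes that `a(21)` throws and that tuples can be destructured. A small sketch (illustrative names) of turning an out-of-range access into a value with `scala.util.Try`, of `contains` as the explicit form of `Set` application, and of taking a tuple apart with pattern matching:

```scala
import scala.util.Try

val fib = Array(1, 2, 3, 5, 8, 13)

// fib(21) throws ArrayIndexOutOfBoundsException; Try captures it as a Failure
val outOfRange = Try(fib(21)).toOption   // None
val inRange = Try(fib(3)).toOption       // Some(5)

// Set application and contains are equivalent membership tests
val primes = Set(2, 3, 5, 7)
val hasFive = primes.contains(5)         // true, same as primes(5)

// Tuples can also be taken apart with pattern matching
val described = (17 / 5, 17 % 5) match {
  case (q, 0) => s"divides evenly, quotient $q"
  case (q, r) => s"quotient $q, remainder $r"
}
```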


## Part 7: Functional Programming

In [28]:
// Scala allows methods and functions to return, or take as parameters, other
// functions or methods

val add10: Int => Int = _ + 10 // a function taking in an int and returning an int
List(1, 2, 3) map add10 // add ten to each element

// anonymous functions can be used in place of named functions
List(1, 2, 3) map (x => x + 10)

// and the underscore can stand in for the argument when the anonymous
// function takes one argument and uses it once
List(1, 2, 3) map (_ + 10)

// if the anonymous block AND the function you are applying both take one
// argument, you can even omit the underscore
List("Dom", "Bob", "Natalia") foreach println

add10: Int => Int = <function1>
res345: List[Int] = List(11, 12, 13)
res348: List[Int] = List(11, 12, 13)
res352: List[Int] = List(11, 12, 13)
Dom
Bob
Natalia
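The examples above pass existing functions to `map` and `foreach`. The same mechanism works for your own methods. A minimal sketch with hypothetical `applyTwice` and `adder` helpers, one taking a function and one returning a function:

```scala
// A method that takes a function as a parameter
def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

// A method that returns a function
def adder(n: Int): Int => Int = (x: Int) => x + n

val twiceAdded = applyTwice(_ + 10, 1)            // 21
val add5 = adder(5)
val shifted = List(1, 2, 3) map add5              // List(6, 7, 8)

// Function values can be chained with andThen
val composed = (adder(1) andThen adder(2))(0)     // 3
```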


In [34]:
// combinators (using s and sq from above)
s.map(sq)

val sSquared = s.map(sq)
sSquared.filter(_ < 10)
sSquared.reduce(_ + _)

// the filter function takes a predicate (a function from A => Boolean) and
// selects all elements which satisfy the predicate
List(1, 2, 3) filter (_ > 2)

case class Person(name: String, age: Int)
List(
    Person(name = "Dom", age = 23),
    Person(name = "Bob", age = 30)
).filter(_.age > 25)

// certain collections (such as List) in Scala have a foreach method,
// which takes as an argument a function returning Unit (like a void method in Java)
val aListOfNumbers = List(1, 2, 3, 4, 10, 20, 100)
aListOfNumbers foreach (x => println(x))
aListOfNumbers foreach println

res408: scala.collection.immutable.Set[Int] = Set(1, 9, 49)
sSquared: scala.collection.immutable.Set[Int] = Set(1, 9, 49)
res410: scala.collection.immutable.Set[Int] = Set(1, 9)
res411: Int = 59
res415: List[Int] = List(3)
defined class Person
res417: List[Person] = List(Person(Bob,30))
aListOfNumbers: List[Int] = List(1, 2, 3, 4, 10, 20, 100)
1
2
3
4
10
20
100
1
2
3
4
10
20
100
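One caveat with `reduce`, as used on `sSquared` above: it needs at least one element and throws `UnsupportedOperationException` on an empty collection. A sketch comparing it with `foldLeft`, which starts from an explicit initial value, and chaining combinators over a hypothetical list of `Person` values:

```scala
case class Person(name: String, age: Int)

val people = List(Person("Dom", 23), Person("Bob", 30), Person("Natalia", 27))

// foldLeft takes a starting value, so it also works on empty collections
val totalAge = people.map(_.age).foldLeft(0)(_ + _)        // 80
val noAges = List.empty[Int].foldLeft(0)(_ + _)            // 0
// by contrast, List.empty[Int].reduce(_ + _) throws at runtime

// Chaining combinators: names of everyone over 25, sorted alphabetically
val over25 = people.filter(_.age > 25).map(_.name).sorted  // List(Bob, Natalia)
```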


## Part 9: Misc

In [35]:
// importing things
import scala.collection.immutable.List

// import every member of a package with the _ wildcard
import scala.collection.immutable._

// import multiple classes in one statement
import scala.collection.immutable.{List, Map}

// rename and import using =>
import scala.collection.immutable.{List => ImmutableList}

import scala.collection.immutable.List
import scala.collection.immutable._
import scala.collection.immutable.{List, Map}
import scala.collection.immutable.{List=>ImmutableList}
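Renaming on import is mostly useful when two packages export the same name. A sketch, assuming you want a mutable map alongside the default immutable `Map`; the `MutableMap` alias and the word-count data are illustrative:

```scala
// Rename on import to avoid a clash with the default (immutable) Map
import scala.collection.mutable.{Map => MutableMap}

val counts = MutableMap[String, Int]()
for (word <- List("spark", "scala", "spark")) {
  counts(word) = counts.getOrElse(word, 0) + 1   // calls counts.update(word, ...)
}

// The unqualified name Map still refers to the immutable Map from Predef
val frozen: Map[String, Int] = counts.toMap
```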
