# References

* [opinionated scala](https://github.com/ghik/opinionated-scala)
* [small bites](http://matt.might.net/articles/learning-scala-in-small-bites/)
* [twitter scala school](https://twitter.github.io/scala_school/)     
* [twitter effective scala](http://twitter.github.io/effectivescala/)

# Types

__Symbols__

In [3]:
/*
Symbols are useful when you have a closed set of identifiers that you want to compare quickly (e.g. with eq).

Two String instances are not guaranteed to be interned[1], so comparing them often means checking their
lengths and then their contents character by character.  [1] Interning is a process whereby, when you create
an object, you check whether an equal one already exists and use that one if it does.

Symbol instances are interned, so comparing them is a simple reference check (eq in Scala, == in Java), which takes constant time, i.e. O(1).
*/
val aSymbol = 'foo
val aSecSymbol = 'bar
aSymbol == aSecSymbol

aSymbol: Symbol = 'foo
aSecSymbol: Symbol = 'bar
res2: Boolean = false
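
The interning claim can be checked directly. The sketch below is an illustration rather than part of the original notebook; it uses `Symbol("foo")` instead of the `'foo` literal only so that it also compiles on newer Scala versions, and the names `s1`, `s2`, `t1`, `t2` are made up for the example.

```scala
// Symbol.apply interns its argument, so equal names share one instance.
val s1 = Symbol("foo")
val s2 = Symbol("foo")
println(s1 eq s2)   // true: same interned instance

// Strings built at runtime are distinct objects even when their contents match.
val t1 = new String("foo")
val t2 = new String("foo")
println(t1 eq t2)   // false: two separate objects
println(t1 == t2)   // true: == compares contents
```

Here `eq` is reference equality, while `==` on strings falls back to comparing contents.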


In [7]:
val aString = "foo"
val aSecString = "bar"
aString == aSecString

aString: String = foo
aSecString: String = bar
res4: Boolean = false


In [8]:
val aChar = 'f'
val aString = "f"

aChar: Char = f
aString: String = f


In [9]:
val a = "a|b|c"
// split takes a regex, so the pipe must be escaped; the two calls below are equivalent
a.split(raw"\|")
a.split("\\|")

a: String = a|b|c
res5: Array[String] = Array(a, b, c)


__Ranges__

In [13]:
val aRange1 = 1 to 5
val aRange2 = 1 until 5
val aRange3 = 1 to 5 by 2
val aRange4 = 'a' to 'c'

aRange1: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5)
aRange2: scala.collection.immutable.Range = Range(1, 2, 3, 4)
aRange3: scala.collection.immutable.Range = Range(1, 3, 5)
aRange4: scala.collection.immutable.NumericRange.Inclusive[Char] = NumericRange(a, b, c)


In [17]:
println(  (1 to 10).toList  )
println(  List.range(1,10)  )

List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
List(1, 2, 3, 4, 5, 6, 7, 8, 9)


__Functions__

In [51]:
// method with an inferred (implicit) result type
def incImplicit(x : Int ) = x + 1
// anonymous
val incAnonymous = (x : Int) => x + 1
// class with apply method:
class Identity {
  def apply(x : Int) = x + 1
}
val myId = new Identity
myId(123)

incImplicit: (x: Int)Int
incAnonymous: Int => Int = <function1>
defined class Identity
myId: Identity = Identity@440cdc92
res29: Int = 124


In [67]:
// Multi-argument functions:
def h(x : Int, y : Int) : Int = x + y
// A Curried multi-argument function:
def hC (x : Int) (y : Int) : Int = x + y

// Wrong: hC 3 4
hC (3) (4)
// Wrong: hC (3)
hC (3) _
// Wrong: hC _ (4)
hC (_:Int) (4)

val plus3 = hC (_:Int) (3) 
val plus_3 = hC (3) _
println(plus3(10)) 

13


h: (x: Int, y: Int)Int
hC: (x: Int)(y: Int)Int
plus3: Int => Int = <function1>
plus_3: Int => Int = <function1>


In [68]:
//Take any function of multiple arguments and curry it
val h_curry = (h _).curried
val h_addTwo = h_curry(2)
println(h_addTwo(3))

5


h_curry: Int => (Int => Int) = <function1>
h_addTwo: Int => Int = <function1>
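
One place where curried methods pay off is with higher-order functions such as `map`. The following is a minimal sketch; `scale` and `scale2` are hypothetical names introduced here, and it assumes Scala 2 eta-expansion rules.

```scala
// Hypothetical curried helper: the first list configures, the second supplies data.
def scale(factor: Int)(x: Int): Int = factor * x

// Supplying only the first argument list where a function is expected
// yields an Int => Int (eta-expansion happens automatically here).
val doubled = List(1, 2, 3).map(scale(2))
println(doubled)   // List(2, 4, 6)

// Function.uncurried goes the other way, back to a two-argument function.
val scale2: (Int, Int) => Int = Function.uncurried(scale _)
println(scale2(3, 4))   // 12
```

This is the mirror image of `(h _).curried` above: `curried` splits one argument list into two, and `Function.uncurried` merges them again.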


In [54]:
// A procedure:
// Procedure syntax (a body with no `=`) compiles to a method that returns Unit.
// It is deprecated since Scala 2.13; prefer `def proc(a: Int): Unit = { ... }`.
def proc(a : Int) { // Implicitly : Unit
  println("I'm a procedure.")
}
proc(10)

// A parameterless method:
def argless : Unit = println("argless got called!")
argless

// Lazy fields are argless functions that cache their result:
class LazyClass {
  lazy val x = { println("Evaluating x") ; 3 }
}
val lc = new LazyClass
println(lc.x)
println(lc.x)
println(lc.x)

I'm a procedure.
argless got called!
Evaluating x
3
3
3


proc: (a: Int)Unit
argless: Unit
defined class LazyClass
lc: LazyClass = LazyClass@11878370


In [69]:
//Variable-length parameters
def capitalizeAll(args: String*) = {
  args.map { arg =>
    arg.capitalize
  }
}
println( capitalizeAll("one","two","three") )

ArrayBuffer(One, Two, Three)


capitalizeAll: (args: String*)Seq[String]
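
A varargs method can also be called with an existing collection. The sketch below repeats `capitalizeAll` so that it stands alone; the `words` and `result` names are made up for the example, and the `: _*` ascription is Scala 2 syntax.

```scala
def capitalizeAll(args: String*): Seq[String] = args.map(_.capitalize)

val words = List("one", "two", "three")
// `: _*` tells the compiler to pass the list elements as the varargs.
val result = capitalizeAll(words: _*)
println(result.mkString(", "))   // One, Two, Three
```

Without the ascription, `capitalizeAll(words)` would not compile, because a `List[String]` is not a `String`.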


__Objects and apply method__

[ref: stackoverflow](https://stackoverflow.com/questions/9737352/what-is-the-apply-function-in-scala)

Every function in Scala can be treated as an object, and it works the other way too: any object can be called like a function, provided it has an `apply` method.

In [65]:
// define a function in scala
(x:Int) => x + 1
// assign an object representing the function to a variable
val f = (x:Int) => x + 1

// Since every value in Scala is an object, f is a reference to a Function1[Int,Int] object.
// For example, we can call the toString method inherited from Any, which would be impossible
// for a pure function, because functions don't have methods:
f.toString

// Or we can define another Function1[Int,Int] object by calling the compose method on f,
// chaining two functions together:
val f2 = f.compose((x:Int) => x - 1)

// To actually execute the function, or as mathematicians say "apply a function to its arguments",
// we call the apply method on the Function1[Int,Int] object:
f2.apply(2)

// The Scala compiler lets us omit the apply call:
f2(2)

f: Int => Int = <function1>
f2: Int => Int = <function1>
res39: Int = 2


In [None]:
// There are many cases where we want to treat an object as a function. The most common is the factory pattern.
List(1,2,3) // same as List.apply(1,2,3), but with less clutter and in functional notation

// The equivalent factory call in a typical OOP language might look like this
// (hypothetical method, not part of Scala's List):
// List.instanceOf(1,2,3)
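
The same factory idea can be applied to your own types by putting `apply` on a companion object. This is only a sketch: `Temperature` and `fromFahrenheit` are hypothetical names, and in a notebook or REPL the class and its companion need to be defined in the same cell.

```scala
// Hypothetical class; the private constructor forces callers through the factory.
class Temperature private (val celsius: Double)

object Temperature {
  // Temperature(21.5) expands to Temperature.apply(21.5)
  def apply(celsius: Double): Temperature = new Temperature(celsius)
  def fromFahrenheit(f: Double): Temperature = new Temperature((f - 32) * 5 / 9)
}

val t = Temperature(21.5)
println(t.celsius)                               // 21.5
println(Temperature.fromFahrenheit(212).celsius) // 100.0
```

Case classes get a companion `apply` like this generated for free, which is why `Pair(42, "foo")` needs no `new`.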

__Empty__

[ref article](http://oldfashionedsoftware.com/2008/08/20/a-post-about-nothing/)

In [64]:
// **** NEVER USE NULL ****
//use this link for different approaches
//   https://alvinalexander.com/scala/scala-null-values-option-uninitialized-variables

In [24]:
// Null, null
//Null is a type and null is its only instance (the value of a reference that does not refer to any object)
def tryit(x:Null):Unit = {println("it worked!")}
tryit(null)

it worked!


tryit: (x: Null)Unit


In [26]:
// Nil
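//an object that extends List[Nothing]
Nil.length
//Note: + with a String is string concatenation via an implicit conversion (deprecated since Scala 2.13),
//not a list append; use :+ or :: to add elements
Nil + "1,2,3"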

res16: String = List()1,2,3


In [33]:
// Nothing
/*Nothing is another type. It extends Any, the root type of the entire Scala type system.
**There are no instances of Nothing, but (here’s the tricky bit) Nothing is a subtype of everything.
**Nothing is a subtype of List, it’s a subtype of String, it’s a subtype of Int, it’s a subtype of YourOwnCustomClass.

So Nothing is useful for defining base cases for collections
*/
val emptyStringList: List[String] = List[Nothing]()
emptyStringList + "sfd"

emptyStringList: List[String] = List()
res22: String = List()sfd
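
Besides empty collections, Nothing also shows up as the result type of code that never returns normally. The sketch below illustrates both uses; `fail` and `parsePositive` are hypothetical helpers introduced for the example.

```scala
// A method that never returns normally can declare Nothing as its result type.
def fail(msg: String): Nothing = throw new IllegalArgumentException(msg)

// Both branches must conform to Int; Nothing does, because it is a subtype of Int.
def parsePositive(n: Int): Int = if (n > 0) n else fail("not positive: " + n)

println(parsePositive(5))   // 5

// Nil is a List[Nothing], so it can stand in for an empty list of any element type.
val noStrings: List[String] = Nil
val noInts: List[Int] = Nil
println(noStrings.isEmpty && noInts.isEmpty)   // true
```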


In [34]:
//It fails because "abc" is a String, not a Nothing: Nothing is a subtype of everything, but it is not a supertype of anything and it has no instances
val emptyStringList: List[String] = List[Nothing]("abc")

<console>: 24: error: type mismatch;

In [40]:
// None
/*What do you do when you have no useful value to return?  Scala has a built-in solution to this problem.
**If you want to return a String, for example, but you know that you may not be able to return a sensible value,
**you can return an Option[String].
**
**Option is an abstract class with exactly two subclasses, class Some and object None. Those are the only two
**ways to instantiate an Option. So getStringMaybe returns either a Some[String] or None. Some is a case class
**and None is a case object, so you can use the handy match/case construct to handle the result. None is the
**object that signifies no result from the method.
*/
def getStringMaybe(num:Int):Option[String] = {
    // if/else is an expression, so no explicit return is needed
    if (num > 100) Some("got a big number")
    else None
}
def printResult(num:Int) = {
    getStringMaybe(num) match {
        case Some(str) => println(str)
        case None => println("did not get a big number!")
    }
}
printResult(10)

did not get a big number!


getStringMaybe: (num: Int)Option[String]
printResult: (num: Int)Unit
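
Pattern matching is not the only way to consume an Option. As a small sketch (the `shouted` and `message` names are made up, and `getStringMaybe` is repeated so the block stands alone), the standard `map` and `getOrElse` methods cover the common cases:

```scala
def getStringMaybe(num: Int): Option[String] =
  if (num > 100) Some("got a big number") else None

// map transforms the value only when it is present.
val shouted: Option[String] = getStringMaybe(500).map(_.toUpperCase)
println(shouted)   // Some(GOT A BIG NUMBER)

// getOrElse supplies a fallback for the None case.
val message: String = getStringMaybe(10).getOrElse("did not get a big number!")
println(message)   // did not get a big number!
```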


In [44]:
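//If the method simply returned null, you would match on null instead, as sketched here.
//That code isn't any worse than the Option and match approach, but you would have to read the Javadoc to know the check was needed.
//Note: getStringMaybe above still returns an Option, so `case null` never matches here and the wildcard case prints None.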
def printResult(num:Int) = {
    val str = getStringMaybe(num) 
    str match {
        case null => println("did not get a big number!")
        case _ => println(str)
    }
}
printResult(10)

None


printResult: (num: Int)Unit


In [48]:
//A handy use case: flatMap drops the None results, so only the parseable strings are summed.
//If you need access to the exception (to discover why parsing failed), use Either with Left and Right instead.
def toInt(in: String): Option[Int] = {
    try {
        Some(Integer.parseInt(in.trim))
    } catch {
        case e: NumberFormatException => None
    }
}
val bag = List("1", "2", "foo", "3", "bar")
val sum = bag.flatMap(toInt).sum

toInt: (in: String)Option[Int]
bag: List[String] = List(1, 2, foo, 3, bar)
sum: Int = 6
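
The Either variant mentioned in the comment could look roughly like this. It is a sketch, not the notebook author's code: `toIntEither`, `parsed`, `total` and `errors` are hypothetical names, and `collect` is used so the example does not depend on Either being right-biased (which only holds from Scala 2.12).

```scala
// Left carries the failure reason, Right carries the parsed value.
def toIntEither(in: String): Either[String, Int] =
  try Right(Integer.parseInt(in.trim))
  catch { case e: NumberFormatException => Left(e.getMessage) }

val bag = List("1", "2", "foo", "3", "bar")
val parsed = bag.map(toIntEither)

val total = parsed.collect { case Right(n) => n }.sum
val errors = parsed.collect { case Left(msg) => msg }

println(total)         // 6
println(errors.size)   // 2
```

Unlike the Option version, the two failures are kept in `errors` rather than silently dropped.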


In [41]:
// Unit
//Unit is the type of a method that doesn’t return a value of any sort. Sound familiar? It’s like a void return type in Java. 
def doThreeTimes(fn:Int=>Unit) = {
    fn(1); fn(2); fn(3);
}
doThreeTimes(println)

1
2
3


doThreeTimes: (fn: Int => Unit)Unit


# Case Matching

In [55]:
val y : Any = 10 
y match {
  case _ : String => println("It's a string.")
  case _ : Int => println("It's an integer.")
}

It's an integer.


y: Any = 10


In [56]:
// Case classes are matchable:
case class Pair(x : Int, y : String) // case class parameters are vals by default
val p = Pair(42,"foo")
p match {
  case Pair(43,"foo") => println("Not me.")
  case Pair(42,s) => println("It's " + s + ".") 
}

It's foo.


defined class Pair
p: Pair = Pair(42,foo)


In [70]:
//Matching on values
val times = 1
times match {
  case 1 => "one"
  case 2 => "two"
  case _ => "some other number"
}
//Matching with guards
times match {
  case i if i == 1 => "one"
  case i if i == 2 => "two"
  case _ => "some other number"
}

times: Int = 1
res44: String = one


In [None]:
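// The next cell is dense: it builds a pattern-matchable list from scratch.
// A method whose name ends in a colon is right-associative, so 3 :*: MyNil calls MyNil.:*:(3).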

In [57]:
// Pattern-matchable lists can be created from scratch:
abstract class MyList[+A] {
  def :*: [B >: A] (head : B) = new `:*:`(head,this)
}
case class :*:[A](val head : A, val tail : MyList[A]) extends MyList[A] 
case object MyNil extends MyList[Nothing] 
val l : MyList[Int] = 3 :*: 4 :*: 5 :*: MyNil
l match {
  case hd :*: tl => println(hd)
}

3


defined class MyList
defined class $colon$times$colon
defined object MyNil
l: MyList[Int] = :*:(3,:*:(4,:*:(5,MyNil)))
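
The built-in List works the same way as `MyList`, with `::` and `Nil` playing the roles of `:*:` and `MyNil`. A minimal sketch of a recursive function over it (the `sum` helper is hypothetical and not tail-recursive, so it is only meant for short lists):

```scala
// Recursively sum a list by matching on its two shapes: empty or head :: tail.
def sum(xs: List[Int]): Int = xs match {
  case Nil          => 0
  case head :: tail => head + sum(tail)
}

println(sum(3 :: 4 :: 5 :: Nil))   // 12
```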


# SortedMap Orderings

In [60]:
/*Sorted maps need a way to compare keys.  Sorted data structures take an implicit Ordering[K] parameter
**and use the one in scope.  If none is in scope, an Ordering must be passed explicitly.
*/
import scala.collection.immutable.{SortedMap,TreeMap}
case class Person(val ssn : Int, val name : String) 

import scala.collection.immutable.{SortedMap, TreeMap}
defined class Person


In [61]:
val db1: SortedMap[Person,Symbol] = TreeMap[Person,Symbol]()

<console>: 27: error: No implicit Ordering defined for Person.

In [62]:
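//Defines an implicit Ordering[Person] that compares by SSN; TreeMap picks it up automatically.
implicit object OrderingBySSN extends Ordering[Person] {
  // Integer.compare avoids the overflow that p1.ssn - p2.ssn can hit for large values
  def compare (p1: Person, p2: Person): Int = Integer.compare(p1.ssn, p2.ssn)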
}
val db1: SortedMap[Person,Symbol] = TreeMap[Person,Symbol]()
val db2 = db1 + ((Person(1,"Matt")) -> 'Chicken)
val db3 = db2 + ((Person(2,"Matt")) -> 'Mouse)
println(db3)

Map(Person(1,Matt) -> 'Chicken, Person(2,Matt) -> 'Mouse)


defined object OrderingBySSN
db1: scala.collection.immutable.SortedMap[Person,Symbol] = Map()
db2: scala.collection.immutable.SortedMap[Person,Symbol] = Map(Person(1,Matt) -> 'Chicken)
db3: scala.collection.immutable.SortedMap[Person,Symbol] = Map(Person(1,Matt) -> 'Chicken, Person(2,Matt) -> 'Mouse)
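
For simple key-based comparisons, the standard `Ordering.by` factory saves writing `compare` by hand. The following is a sketch under assumptions: `Employee`, `byId` and `staff` are hypothetical names chosen so the example does not clash with `Person` above.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical record type, separate from Person above.
case class Employee(id: Int, name: String)

// Ordering.by derives an Ordering[Employee] from the Ordering of the extracted key.
implicit val byId: Ordering[Employee] = Ordering.by((e: Employee) => e.id)

val staff = TreeMap(Employee(2, "Ann") -> "ops", Employee(1, "Bob") -> "dev")
println(staff.toList.map(_._1.id))   // List(1, 2)
```

The entries come back sorted by `id` regardless of insertion order, because the TreeMap resolves `byId` implicitly.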


In [63]:
//Passes an Ordering[Person] explicitly, using names to compare.
//Both people are named Matt, so they compare as equal and the second entry replaces the first.
object OrderingByName extends Ordering[Person] {
  def compare (p1 : Person, p2 : Person) : Int = p1.name compare p2.name
}
val dbX: SortedMap[Person,Symbol] = TreeMap[Person,Symbol]()(OrderingByName)
val dbY = dbX + ((Person(1,"Matt")) -> 'Chicken)
val dbZ = dbY + ((Person(2,"Matt")) -> 'Mouse)
println(dbZ)

Map(Person(2,Matt) -> 'Mouse)


defined object OrderingByName
dbX: scala.collection.immutable.SortedMap[Person,Symbol] = Map()
dbY: scala.collection.immutable.SortedMap[Person,Symbol] = Map(Person(1,Matt) -> 'Chicken)
dbZ: scala.collection.immutable.SortedMap[Person,Symbol] = Map(Person(2,Matt) -> 'Mouse)


# Applications

In [59]:
/* An object with a main method is a program. */
object DemoApplication2 {
  def main (args : Array[String]): Unit = {
    println("Hello, World!") 
  }
}

defined object DemoApplication2


In [None]:
/* If an object extends App, then the body of the object is
   effectively a script. */
object DemoApplication extends App {
  /*
   The older Application trait (deprecated in Scala 2.9 and later
   removed) ran the entire program in the constructor for the object,
   which prevented the JVM from performing JIT optimizations.

   App does not have that problem, but for large applications an
   explicit main() method remains a safe choice.
  */
  println("Hello, World!")
}

# Collections and FP Combinators

In [86]:
Array(1,2,3)  //mutable (fixed size)

res59: Array[Int] = Array(1, 2, 3)


In [87]:
//In Java terms, Scala's Seq would be Java's List, and Scala's List would be Java's LinkedList
//List is fundamental for FP
List(1,2,3)   //immutable
 1 :: 2 :: 3 :: Nil

res60: List[Int] = List(1, 2, 3)


In [88]:
//Prefer Seq as the parameter type for a method or class that works on sequences in general
//(general in the collection type, not necessarily generic in the element type).
//That way you can pass any sequence (such as a Vector or a List) to mySort.
/*
def mySort[T](seq: Seq[T]) = ...
case class Wrapper[T](seq: Seq[T]) 
implicit class RichSeq[T](seq: Seq[T]) { def mySort = ...}
*/
collection.mutable.Seq(1,2,3)
collection.immutable.Seq(1,2,3)  //a plain Seq(1,2,3) also builds an immutable List by default
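
The commented-out `mySort` signature is left elided above. One possible way to fill it in, shown here only as an assumption about what was intended, is to delegate to the standard `sorted` method and ask for an `Ordering[T]` through a context bound:

```scala
// Accepts any Seq; the context bound asks for an Ordering[T] to sort with.
def mySort[T: Ordering](seq: Seq[T]): Seq[T] = seq.sorted

println(mySort(List(3, 1, 2)))         // List(1, 2, 3)
println(mySort(Vector("b", "c", "a"))) // Vector(a, b, c)
```

Both calls compile against the same signature, and the concrete collection type of the argument is preserved at runtime.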

In [89]:
Vector(1,2,3)   //immutable, with fast random access; splits efficiently, so it suits parallel programming

res62: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3)


In [79]:
(1, 5)   //tuple

res53: (Int, Int) = (1,5)


In [80]:
1 -> 5   //tuple

res54: (Int, Int) = (1,5)


In [83]:
val num = Map(1->5, 2->6)

num: scala.collection.immutable.Map[Int,Int] = Map(1 -> 5, 2 -> 6)


In [84]:
num.get(3)   //returns Option[T]

res57: Option[Int] = None


# More Functions: Partial Functions and Composition

In [119]:
//Compose methods
def f(s:String) = "f(" + s + ")"
def g(s:String) = "g(" + s + ")"

f: (s: String)String
g: (s: String)String


In [120]:
val fComposeG = f _ compose g _

fComposeG: String => String = <function1>


In [121]:
// similar syntax
(f _).compose(g)

res82: String => String = <function1>


In [122]:
fComposeG("Yay")

res83: String = f(g(Yay))


In [107]:
//Compose functions... which have methods
val f = (s:String) => "f(" + s + ")"
val g = (s:String) => "g(" + s + ")"

f: String => String = <function1>
g: String => String = <function1>


In [109]:
f.andThen(g)   // applies f first, then g: g(f(x))

res74: String => String = <function1>
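
The difference between `compose` and `andThen` is only the order of application. A short side-by-side sketch, reusing the same `f` and `g` definitions so it stands alone:

```scala
val f = (s: String) => "f(" + s + ")"
val g = (s: String) => "g(" + s + ")"

// compose applies its argument first: (f compose g)(x) == f(g(x))
println((f compose g)("x"))   // f(g(x))
// andThen applies the receiver first: (f andThen g)(x) == g(f(x))
println((f andThen g)("x"))   // g(f(x))
```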


In [113]:
//Case statements
/* A block of case clauses can be used as a PartialFunction, which is a subtype of Function1.
** A block with several case clauses behaves like several PartialFunctions composed together.
** isDefinedAt is a method on PartialFunction that tells you whether it will accept a given argument.
** Note that a PartialFunction is unrelated to a partially applied function.
*/
val one: PartialFunction[Int, String] = {case 1 => "one"}
println( one.isDefinedAt(1) )
println( one.isDefinedAt(2) )

true
false


one: PartialFunction[Int,String] = <function1>


In [114]:
one(1)

res78: String = one


In [116]:
val two: PartialFunction[Int, String] = { case 2 => "two" }
val partial = one orElse two
partial(2)

two: PartialFunction[Int,String] = <function1>
partial: PartialFunction[Int,String] = <function1>
res79: String = two
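
Calling `partial(3)` would throw a MatchError, since neither case is defined there. Two standard ways to use a PartialFunction safely are `collect` and `lift`; the sketch below repeats the definitions so it stands alone, and `safe` is a name made up for the example.

```scala
val one: PartialFunction[Int, String] = { case 1 => "one" }
val two: PartialFunction[Int, String] = { case 2 => "two" }
val partial = one orElse two

// collect keeps only the elements where the partial function is defined.
println(List(1, 2, 3).collect(partial))   // List(one, two)

// lift turns the partial function into a total Int => Option[String].
val safe = partial.lift
println(safe(2))   // Some(two)
println(safe(3))   // None
```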


In [117]:
//filter takes a function. In this case a predicate function of (PhoneExt) => Boolean.
//A PartialFunction is a subtype of Function so filter can also take a PartialFunction!
case class PhoneExt(name: String, ext: Int)
val extensions = List(PhoneExt("steve", 100), PhoneExt("robey", 200))
extensions.filter { case PhoneExt(name, extension) => extension < 200 }

defined class PhoneExt
extensions: List[PhoneExt] = List(PhoneExt(steve,100), PhoneExt(robey,200))
res80: List[PhoneExt] = List(PhoneExt(steve,100))


Using spylon-kernel

ERROR: https://github.com/jupyter-scala/jupyter-scala/issues/191

In [None]:


import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
.appName("Spark example")
.master("local[*]")
.config("option", "some-value")
.getOrCreate()

In [33]:
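// Fails with "not found: value SparkSession" because org.apache.spark.sql.SparkSession was never imported.
// With spylon-kernel a session is already provided as `spark`, as the next cell shows.
val spark = SparkSession.builder()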
//.config("spark.serializer",classOf[KryoSerializer].getName)
//.config("spark.kryo.registrator", classOf[GeoSparkKryoRegistrator].getName)
.master("local[*]")
.appName("Bitre")
.getOrCreate()

<console>: 24: error: not found: value SparkSession

In [1]:
spark

Intitializing Scala interpreter ...

Spark Web UI available at http://f652f65de160:4040
SparkContext available as 'sc' (version = 2.3.1, master = local[*], app id = local-1535721809862)
SparkSession available as 'spark'


2018-08-31 13:23:21 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


res0: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@668644d


In [2]:
spark.version

res1: String = 2.3.1


In [3]:
val name = "Spark"
println(s"Hello, ${name}")

Hello, Spark


name: String = Spark


In [4]:
val data = Seq("a","b","c","d").zip(0 to 4)   // zip stops at the shorter side, so index 4 is dropped

data: Seq[(String, Int)] = List((a,0), (b,1), (c,2), (d,3))


In [5]:
println(data.getClass)
println(data(0).getClass)
println(data(0)._1.getClass)

class scala.collection.immutable.$colon$colon
class scala.Tuple2
class java.lang.String


In [6]:
val ds = spark.createDataset(data)

ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]


In [7]:
case class DataRow(name:String, value:Int)

defined class DataRow


In [25]:
val ds2 = ds.map{ x:(String, Int) => DataRow(x._1,x._2)}
//val ds2 = ds.map{ case (a,b) => DataRow(a,b) }

ds2: org.apache.spark.sql.Dataset[DataRow] = [name: string, value: int]


In [9]:
println(ds2.getClass)
println(ds2.select("name").getClass)

class org.apache.spark.sql.Dataset
class org.apache.spark.sql.Dataset


In [10]:
ds2.printSchema

root
 |-- name: string (nullable = true)
 |-- value: integer (nullable = false)



In [11]:
ds2.createOrReplaceTempView("table")
val ds3 = spark.sql("SELECT * FROM table")

ds3: org.apache.spark.sql.DataFrame = [name: string, value: int]


In [27]:
//spark.sql("SELECT * FROM table").show()
ds2.toDF.createOrReplaceTempView("table2")

In [29]:
spark.sql("SELECT * FROM table2").show()

2018-08-31 13:34:30 ERROR Executor:91 - Exception in task 0.0 in stage 10.0 (TID 10)
java.lang.ClassCastException: $line12.$read$$iw$$iw$DataRow cannot be cast to $line12.$read$$iw$$iw$DataRow
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.mapelements_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.deserializetoobject_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
	at org.apache.spark.rdd.RDD

org.apache.spark.SparkException:  Job aborted due to stage failure: Task 0 in stage 10.0 failed 1 times, most recent failure: Lost task 0.0 in stage 10.0 (TID 10, localhost, executor driver): java.lang.ClassCastException: DataRow cannot be cast to DataRow