# Scala
(The documentation on Scala in this notebook is taken from https://docs.scala-lang.org/overviews/scala-book/functional-programming.html and https://learnxinyminutes.com/docs/scala/)

Scala lets you write code in an object-oriented programming (OOP) style, a functional programming (FP) style, and even in a hybrid style, using both approaches in combination.


# Functional Programming
Functional programming is a style of programming that emphasizes writing applications using only pure functions and immutable values. 

## Pure Functions
A function is a pure function if - 
 - The function’s output depends only on its input variables
 - It doesn’t mutate any hidden state
 - It doesn’t have any “back doors”: it doesn’t read data from the outside world (including the console, web services, databases, files, etc.) or write data to the outside world

In other words - 
A pure function is a function that depends only on its declared inputs and its internal algorithm to produce its output. It does not read any other values from “the outside world” — the world outside of the function’s scope — and it does not modify any values in the outside world.


For example, math functions such as Math.sqrt are pure:

In [1]:
val r = Math.sqrt(4)

Intitializing Scala interpreter ...

Spark Web UI available at http://zipcodes-mbp-4.lan:4041
SparkContext available as 'sc' (version = 2.4.5, master = local[*], app id = local-1595711919186)
SparkSession available as 'spark'


r: Double = 2.0


## Impure Functions
Impure functions do one or more of these things:
 - Read hidden inputs, i.e., they access variables and data not explicitly passed into the function as input parameters
 - Write hidden outputs
 - Mutate the parameters they are given
 - Perform some sort of I/O with the outside world
 
For example, the "foreach" method on collections is impure: it is called only for its side effects, such as printing, and its return type is "Unit". More generally, a method that returns "Unit" produces no value, so it can only be useful through a side effect and is therefore impure.
 
Impure functions are also required in programming. A common recommendation is to write the core of your application using pure functions, and then to use impure functions to communicate with the outside world.
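
The recommendation above can be sketched as follows. The names totalWithTax and printReceipt, and the numbers passed to them, are hypothetical and chosen only for illustration: the calculation is a pure function, and the single side effect (printing) lives in a thin impure wrapper.

```scala
// Pure: the result depends only on the arguments; nothing outside is read or changed.
def totalWithTax(prices: List[Double], taxRate: Double): Double =
  prices.sum * (1 + taxRate)

// Impure: returns Unit and exists only for its side effect (writing to the console).
def printReceipt(prices: List[Double], taxRate: Double): Unit =
  println(s"Total: ${totalWithTax(prices, taxRate)}")

printReceipt(List(10.0, 20.0), 0.5)
```

Because totalWithTax is pure, it can be tested by comparing its return value for given inputs, without capturing console output.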

## REPL
REPL stands for Read-Eval-Print Loop.
It is a command-line interface where you can type any Scala expression and have it evaluated and its result printed. It is a good tool for quickly trying out and experimenting with Scala expressions.

### Print
To print, you can use println or print.
println ends its output with a newline, so the next print starts on a new line, whereas print does not add a newline.

In [18]:
// Printing, and forcing a new line on the next print
println("Line 1")
println("Line 2 printed on a new line")

// Printing, without forcing a new line on next print
print("Line 1 ")
print("Line 2 printed on the same line")


Line 1
Line 2 printed on a new line
Line 1 Line 2 printed on the same line

### Comments
A single line can be commented out with a double slash //

Multiple lines can be commented out by enclosing them in /* and */, as in Java
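
A minimal sketch of both comment styles; the variable names are hypothetical and serve only to give the comments something to annotate.

```scala
// A single-line comment runs to the end of the line.
val answer = 42 // it can also follow code on the same line

/* A multi-line comment
   can span several lines. */
val greeting = "hello"
```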

### Declare Variables
In Scala, variables can be declared as val or var

 - variables declared with val are immutable
 - variables declared with var are mutable

In [21]:
//var variables can be changed
var x = 5
println(x)
x = 6
print(x)

5
6

x: Int = 6
x: Int = 6


In [22]:
//changing a val variable will result in an error
val x = 5
println(x)
x = 6
println(x)

<console>: 28: error: reassignment to val

### Data Types and Structures


Scala is a statically typed language, yet we usually do not need to write types explicitly. This is due to a language feature called type inference: in most cases the Scala compiler can work out the type of a variable from the value assigned to it, so you don't have to write it every time.
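
A small sketch of type inference next to an explicit annotation. The variable names are hypothetical; the types in the comments are what the compiler infers for these literals.

```scala
// Types inferred by the compiler from the right-hand side.
val count = 10      // Int
val ratio = 2.5     // Double
val name = "Scala"  // String

// The same kind of declaration with an explicit type annotation.
val total: Int = count * 2

// Static typing still applies: uncommenting the next line is a compile error,
// because a String cannot be assigned to an Int.
// val wrong: Int = "ten"
```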


Common data types include
 - Int
 - Double
 - String (strings are surrounded by double quotes, or triple quotes for multi-line strings)
 - Boolean (the values true and false)
 
Data Structures are 
 - Array
 - Tuple
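
The types and structures listed above can be sketched as follows. All variable names are hypothetical; the player name and percentage are borrowed from the tennis example later in this notebook.

```scala
val flag: Boolean = true

// Triple quotes allow a string to span lines; stripMargin removes the leading '|'.
val multiline: String =
  """first line
    |second line""".stripMargin

// Array: elements of one type, indexed from 0 using parentheses.
val scores = Array(3, 1, 2)
scores(0) = 10 // elements can be updated in place even though scores is a val

// Tuple: a fixed number of values that may have different types; fields are _1, _2, ...
val player = ("Nadal R.", 90.93)

println(scores.sum)
println(player._1)
```

Note that val prevents scores from being reassigned to another array, but it does not make the array's contents immutable.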

## Functions
Scala has a return keyword, but it is rarely needed: the value of the last expression in a function body is the function's result.
Functions are defined as -

### def functionname (arguments) : returntype = {body of the function}

We can omit the {} if the body of the function is a single expression.
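
A short sketch of both forms, using hypothetical function names: one body written as a single expression without braces, and one multi-line body whose last expression is the result.

```scala
// Body is a single expression, so the braces are omitted.
def triple(x: Int): Int = x * 3

// With braces, the value of the last expression is the result.
def sumOfSquares(a: Int, b: Int): Int = {
  val a2 = a * a
  val b2 = b * b
  a2 + b2
}
```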

In [26]:
def doubler(x: Int): Int = {
    x * 2
}

doubler: (x: Int)Int


In [27]:
doubler(2)

res13: Int = 4


In [1]:
//simple code Hello World
println("Hello World")


Hello World


In [14]:
"hello world".drop(5)

res2: String = " world"


In [13]:
"hello world".take(5)


res1: String = hello


In [54]:
"hello world".length

res35: Int = 11


## Flow Control

In [56]:
val r = 1 to 5
r.foreach(println)

1
2
3
4
5


r: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5)


### do while loop

In [59]:
var i = 0
do {
  println("i is still less than 10")
  i += 1
} while (i < 10)

i is still less than 10
i is still less than 10
i is still less than 10
i is still less than 10
i is still less than 10
i is still less than 10
i is still less than 10
i is still less than 10
i is still less than 10
i is still less than 10


i: Int = 10


## Classes

Classes in Scala are similar to classes in other languages. Constructor arguments are declared after the class name, and initialization is done in the class body.

In [55]:
class Dog(br: String) {
  // Constructor code here
  var breed: String = br

  // Define a method called bark, returning a String
  def bark = "Woof, woof!"

}

val mydog = new Dog("greyhound")
println(mydog.breed) // => "greyhound"
println(mydog.bark)  // => "Woof, woof!"

greyhound
Woof, woof!


defined class Dog
mydog: Dog = Dog@7c0a6075


## Objects
An object is a class that has exactly one instance.
An object and a class can have the same name.
### Companion Object
A companion object in Scala is an object that’s declared in the same file as a class, and has the same name as the class.
This has several benefits. The main one is that a companion object and its class can access each other’s private members (fields and methods).


In [None]:
// Object Example 
object Dog {
  def allKnownBreeds = List("pitbull", "shepherd", "retriever")
  def createDog(breed: String) = new Dog(breed)
}

In [None]:
//Companion Object Example
class SomeClass {
    def printFilename() = {
        println(SomeClass.HiddenFilename)
    }
}

object SomeClass {
    private val HiddenFilename = "/tmp/foo.bar"
}

## Case Classes
Case classes are classes that have extra functionality built in.
The primary purpose of case classes is to hold immutable data. They often have few methods, and the methods rarely have side-effects.

### Example

In [None]:
case class Person(name: String, phoneNumber: String)

// Create a new instance. Note that case classes don't need "new"
val george = Person("George", "1234")
val kate = Person("Kate", "4567")

// With case classes, you get a few perks for free, like getters:
george.phoneNumber  // => "1234"

// Per field equality (no need to override .equals)
Person("George", "1234") == Person("Kate", "1236")  // => false

// Easy way to copy
// otherGeorge == Person("George", "9876")
val otherGeorge = george.copy(phoneNumber = "9876")


## Traits
Traits are used to share interfaces and fields between classes. They are similar to Java interfaces: a trait defines a type and a set of method signatures. A trait may also implement some of its methods and leave the rest abstract. In Scala 2, traits cannot take constructor parameters. Traits can extend other traits, or classes whose constructors take no parameters.

Example -

In [28]:
trait Dog {
    def breed: String
    def color: String
    def bark: Boolean = true
    def bite: Boolean
}
class SaintBernard extends Dog {
    val breed = "Saint Bernard"
    val color = "brown"
    def bite = false
}  

defined trait Dog
defined class SaintBernard
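
To show how a partially implemented trait is used, here is a separate minimal sketch. Shape and Square are hypothetical names, chosen so they do not clash with the Dog definitions above.

```scala
trait Shape {
  def name: String   // abstract: each class must supply it
  def area: Double   // abstract
  def describe: String = s"$name with area $area" // concrete, inherited as-is
}

class Square(side: Double) extends Shape {
  val name = "square"
  def area: Double = side * side
}

// A trait is also a type, so a Square can be used wherever a Shape is expected.
val shape: Shape = new Square(2.0)
println(shape.describe)
```

Square only fills in the abstract members; describe comes from the trait unchanged.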


# Example - Tennis Best Players on Clay Surface

In [30]:
/*
 * Load the CSV file into a DataFrame using Scala.
 * Options used:
 *   inferSchema = true: Spark detects the data type of each column automatically
 *   delimiter   = ",":  fields are separated by commas
 *   header      = true: the first line of the file supplies the column names
 * Comparable command in pandas --> data = pd.read_csv(file_name, encoding = 'ISO-8859-1')
 */
val dfatpclay = spark.read.options(Map("inferSchema"->"true","delimiter"->",","header"->"true"))
  .csv("/Users/psehgal/dev/airflow_home/Tennis_Data_Pipeline_Airflow_Project/images_for_reports/topclaycsv.csv")


dfatpclay: org.apache.spark.sql.DataFrame = [_c0: int, Surface: string ... 5 more fields]


In [38]:
//print the schema of the dataframe
dfatpclay.printSchema()

root
 |-- _c0: integer (nullable = true)
 |-- Surface: string (nullable = true)
 |-- Player: string (nullable = true)
 |-- Count_Win: integer (nullable = true)
 |-- Count_Lose: integer (nullable = true)
 |-- total_play: integer (nullable = true)
 |-- perc_win: double (nullable = true)



In [39]:
//Rename column name. It will create a new dataframe
val dfatpclay1= dfatpclay.withColumnRenamed("total_play", "total_games")


dfatpclay1: org.apache.spark.sql.DataFrame = [_c0: int, Surface: string ... 5 more fields]


In [40]:
//print the schema of the new dataframe
dfatpclay1.printSchema()

root
 |-- _c0: integer (nullable = true)
 |-- Surface: string (nullable = true)
 |-- Player: string (nullable = true)
 |-- Count_Win: integer (nullable = true)
 |-- Count_Lose: integer (nullable = true)
 |-- total_games: integer (nullable = true)
 |-- perc_win: double (nullable = true)



In [43]:
//withColumnRenamed returns a new dataframe; if the result is not assigned, the existing dataframe is left unchanged
dfatpclay.withColumnRenamed("total_play", "total_games")


res24: org.apache.spark.sql.DataFrame = [_c0: int, Surface: string ... 5 more fields]


In [44]:
//Print the schema of the existing dataframe: the column names did not change
dfatpclay.printSchema()

root
 |-- _c0: integer (nullable = true)
 |-- Surface: string (nullable = true)
 |-- Player: string (nullable = true)
 |-- Count_Win: integer (nullable = true)
 |-- Count_Lose: integer (nullable = true)
 |-- total_play: integer (nullable = true)
 |-- perc_win: double (nullable = true)



In [45]:
// print the contents of dataframe
//comparable command in pandas --> dfatpclay1
dfatpclay1.collect.foreach(println)

[396,Clay,Nadal R.,351,35,386,90.93]
[135,Clay,Djokovic N.,169,39,208,81.25]
[165,Clay,Federer R.,203,60,263,77.19]
[304,Clay,Kuerten G.,105,40,145,72.41]
[169,Clay,Ferrero J.C.,221,86,307,71.99]
[405,Clay,Nishikori K.,60,24,84,71.43]
[550,Clay,Thiem D.,59,24,83,71.08]
[100,Clay,Coria G.,129,53,182,70.88]
[168,Clay,Ferrer D.,293,122,415,70.6]
[389,Clay,Moya C.,203,85,288,70.49]


In [51]:
import org.apache.spark.sql.functions.col // col() builds a Column from a column name
dfatpclay1.sort(col("perc_win").desc).show(true)

+---+-------+------------+---------+----------+-----------+--------+
|_c0|Surface|      Player|Count_Win|Count_Lose|total_games|perc_win|
+---+-------+------------+---------+----------+-----------+--------+
|396|   Clay|    Nadal R.|      351|        35|        386|   90.93|
|135|   Clay| Djokovic N.|      169|        39|        208|   81.25|
|165|   Clay|  Federer R.|      203|        60|        263|   77.19|
|304|   Clay|  Kuerten G.|      105|        40|        145|   72.41|
|169|   Clay|Ferrero J.C.|      221|        86|        307|   71.99|
|405|   Clay|Nishikori K.|       60|        24|         84|   71.43|
|550|   Clay|    Thiem D.|       59|        24|         83|   71.08|
|100|   Clay|    Coria G.|      129|        53|        182|   70.88|
|168|   Clay|   Ferrer D.|      293|       122|        415|    70.6|
|389|   Clay|     Moya C.|      203|        85|        288|   70.49|
+---+-------+------------+---------+----------+-----------+--------+



## Show Top 3 players on Clay Surface

In [53]:
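// Sort by win percentage first so that the top 3 rows do not depend on the file's row order
dfatpclay1.sort(dfatpclay1("perc_win").desc).show(3)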

+---+-------+-----------+---------+----------+-----------+--------+
|_c0|Surface|     Player|Count_Win|Count_Lose|total_games|perc_win|
+---+-------+-----------+---------+----------+-----------+--------+
|396|   Clay|   Nadal R.|      351|        35|        386|   90.93|
|135|   Clay|Djokovic N.|      169|        39|        208|   81.25|
|165|   Clay| Federer R.|      203|        60|        263|   77.19|
+---+-------+-----------+---------+----------+-----------+--------+
only showing top 3 rows

