# Scala Overview for Python Programmers (Draft)

This is a gentle overview of [Scala](https://www.scala-lang.org) for people with a [Python](https://www.python.org) background. It started as a collection of notes and is by no means exhaustive, nor does it claim any authority whatsoever. It assumes a decent knowledge of Python (independent of version 2 or 3) and aims to provide side-by-side comparisons of Scala and Python source code. Any feedback or improvements are highly welcome!

## Motivation

Scala is becoming ever more popular as a successor to Java which, in turn, started out as a better C/C++ and popularized the mantra of "write once, run anywhere" on the basis of its JVM technology. Today Scala flourishes also because Java is perceived by many as an inflexible behemoth with an uncertain future in the hands of a single company with unclear interest in Open Source technology. Java is also perceived as no longer fitting the bill as an end-user language in today's world of big data and map-reduce paradigms. Scala adds the functional features fitting these new paradigms while removing much of the annoying Java syntax. As a result it is very popular in the big data and machine learning world of today.

Many authors on the web emphasize how much Scala is, in fact, *unlike* Java, referring mostly to syntax and functional features. Bruce Eckel regards Scala as a ["static language that feels dynamic"](http://www.artima.com/weblogs/viewpost.jsp?thread=328540) and even "Pythonic". And David Mertz has pointed out how much is already possible in terms of [functional programming in Python](http://www.oreilly.com/programming/free/files/functional-programming-python.pdf). All this makes it interesting to look more deeply at Scala from a Python rather than the usual Java perspective.

## About this Notebook

There are many sources on the web for learning about Scala. But few allow you to play with a programming language as interactively and joyfully, in an explorative way, as [Jupyter](http://jupyter.org) *notebooks*. Originally something like a [REPL](https://en.wikipedia.org/wiki/Read–eval–print_loop) on steroids for Python only (and much inspired by *Mathematica* and MATLAB), Jupyter today supports about 40 languages and has become the de facto interactive, explorative programming environment for data scientists and other professionals. Paying tribute to this fact, the Apache foundation has started developing a Scala kernel extension for Jupyter named [Apache Toree](https://github.com/apache/incubator-toree), which is used in this notebook.

The REPL included with Scala is a useful command-line interpreter, much like the standard Python interpreter shell, but without many bells and whistles. For the more graphically minded users some Scala IDEs provide the concept of a *worksheet* in which you write code in one column and see its output and type information in a second column. Jupyter notebooks go way beyond that, blending interactive code, visualisations and text cells like in Donald E. Knuth's famous concept of [Literate Programming](https://en.wikipedia.org/wiki/Literate_programming), with tons of plugins provided by third parties.

The easiest way to interactively run this notebook (named "scala_for_pythoneers.ipynb") is in an [online version of Jupyter](https://try.jupyter.org). Just create a new session by opening the page in a browser, click on the white "Upload" button, select this notebook's file name on your local computer, click on the blue "Upload" button in the then newly added row at the top for this file, and finally click its name in the list of all available files. Then you can edit and execute cells and experiment with the notebook. But beware that any changes will be lost after you disconnect from this session! An alternative way of running this locally is given in an appendix on [Local Installation](#Appendix:-Local-Installation).

## Comments

Scala has single-line and multi-line comments. The latter don't exist in Python, but they are often emulated using multi-line strings in triple quotes, either single or double ones.

The expressions in the cells below (here the literal number `42`) need to be there only to please the Scala kernel for Jupyter. Otherwise it will create an error for cells that contain a comment only. This is likely a buglet that will disappear in future versions.

In [1]:
42 // inline one-line comment

42

In [2]:
// a one-line comment
42

42

In [3]:
/* a multi-
   line
   comment */
42

42

## Print

Like any other programming language, Scala has a function to print something somewhere: in a shell, on the screen, etc. This is `print`, with a variation, `println`, that adds a trailing newline. These are not used a lot in this notebook because inside a cell an expression's result is evaluated and printed automatically:

In [4]:
40 + 2

42

In [5]:
print(40 + 2)

42

In [6]:
println(40 + 2)

42


Beware that the Scala REPL provides more detailed output than Apache Toree (especially about types), like this, but maybe the latter will catch up at some point:

```
scala> 40 + 2
res12: Int = 42
```

## Semicolons

Scala, like Python, doesn't require semicolons at the end of lines, but you can use them between statements if you absolutely feel like it:

In [7]:
println(40)
println(42)

40
42


In [8]:
println(40);
println(42);

40
42


In [9]:
println(40); println(42)

40
42


## Literals

This is a short overview of Scala datastructures available as literals. All of them are treated in more detail in the section on [Basic Datatypes](#Basic-Datatypes).

In [10]:
true

true

In [11]:
false

false

In [12]:
42

42

In [13]:
3.14

3.14

In [14]:
"some string"

some string

In [15]:
"""
multi-line
string
"""

"
multi-line
string
"

In [16]:
"""
multi-line
string with ümläüts 
"""

"
multi-line
string with ümläüts
"

In [17]:
'A'

A

In [18]:
("my tuple", true, 42)

(my tuple,true,42)

Scala provides explicit symbols for "interning" strings, basically for accelerating string comparisons. Python does that partly implicitly for short strings and partly on demand with a function named `intern()` (in Python 3: `sys.intern()`).

In [19]:
'symbol

'symbol

In [20]:
val sym = Symbol("hello world")

In [21]:
sym

'hello world

Scala has XML support built into the language, even as literals. Yes, that's right. You can happily type XML content in arbitrary tags:

In [22]:
val para = <p>Hello, XML world!</p>

In [23]:
para

<p>Hello, XML world!</p>

Functions without a name can be considered function literals. See more in the section on [Anonymous Functions](#Anonymous-Functions).

In [24]:
(x: Int) => x + 1

<function1>

## Variables

Variables come in mutable and immutable versions, meaning one can either reassign their values or not, respectively. They are declared with `var` and `val`, respectively (use this mnemonic device: VARiable versus VALue). So, variables with a `var` declarator are mutable:

In [25]:
var x = 42

In [26]:
x = 43

In [27]:
x

43

But those declared as `val` ("values") are immutable, i.e. constant:

In [28]:
val x = 42

In [29]:
x = 43

Name: Compile Error
Message: <console>:19: error: reassignment to val
       x = 43
         ^
StackTrace: 

Scala infers types whenever it can, but, sure enough, one can provide an explicit type where desired or needed:

In [30]:
val x: Int = 42

In [31]:
x

42

In [32]:
val x: Double = 42

In [33]:
x

42.0

## Blocks

Unlike in Python, blocks in Scala have a value, defined by the last expression inside the block (this is also why no `return` statement is needed in Scala):

In [34]:
{
  var x = 42
  var y = 23
  x + y
}

65
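
As a small illustration of the point above, a block's value can be assigned directly, and a function body is just such a block. The names `total` and `area` are made up for this sketch:

```scala
// A block is an expression: its value is that of its last expression.
val total = {
  val x = 42
  val y = 23
  x + y
}
// total == 65; x and y are not visible outside the block

// A function body is a block too, so no `return` is needed.
def area(width: Int, height: Int): Int = {
  val result = width * height
  result
}
```

In Python the function would need an explicit `return result`, and there is no direct counterpart to the first form.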

## Basic Datatypes

Scala has limited support for basic datatypes that it can express as literals. These are Booleans, numbers, strings, symbols, tuples, but also XML and functions. Other types like lists, sets or maps (dictionaries in Python) need to be created using their respective names (or are the result of other operations or methods), see section [Additional Basic Datatypes](#Additional-Basic-Datatypes) below.

### Booleans

In [35]:
true

true

In [36]:
false

false

In [37]:
true || false

true

In [38]:
true && false

false

### Numbers

In [39]:
42

42

In [40]:
3.14

3.14

### Strings

In [41]:
"ABC"

ABC

In [42]:
// single characters
'A'

A

In [43]:
"Missisippi".distinct

Misp

### Tuples

In [44]:
(0, 1, 2)

(0,1,2)

In [45]:
// tuple unpacking
var (x, y, z) = (0, 1, 2)

In [46]:
(x, y, z)

(0,1,2)

### Ranges

In [47]:
0 to 5

Range(0, 1, 2, 3, 4, 5)

### XML

On these XML objects you can perform searching, filtering and all kinds of other things. You can also blend XML and Scala code directly in the language, e.g. in order to create XML code dynamically from Scala objects. In Python one would use templating and XML processing packages like `jinja2` and `lxml`. This feature is beyond the scope of this notebook, though. See the [References](#References) section for more information.

## String Interpolation

Scala, like Python, provides several mechanisms for string interpolation (or formatting). The basic, printf-like one uses an `f` prefix and variable names followed by printf-style format strings like `%s`:

In [48]:
val name = "Alice"
val age = 11
val str = f"$name%s is $age%d years old."

In [49]:
str

Alice is 11 years old.

Scala also has a more convenient string formatting mechanism using an `s` prefix. Python 3.6 offers the same under the name [Literal String Interpolation](https://www.python.org/dev/peps/pep-0498/), with a similar syntax, but with `f` as the string prefix and curly braces denoting the references to other variables.

In [50]:
val name = "Alice"
val age = 11
val str = s"$name is $age years old."

In [51]:
str

Alice is 11 years old.

There is also a raw interpolator that performs no escaping of literals within the string, similar to raw strings in Python with prefix `r`:

In [52]:
raw"a\nb"

a\nb

In [53]:
"a\nb"

a
b
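
The `s` interpolator is not limited to plain variable names. The following sketch (with made-up values) shows an embedded expression and an escaped dollar sign; the Python 3.6 counterpart of the first string would be `f"{name} will be {age + 1} next year."`:

```scala
val name = "Alice"
val age = 11

// Arbitrary expressions go inside ${...}
val nextYear = s"$name will be ${age + 1} next year."
// nextYear == "Alice will be 12 next year."

// A literal dollar sign is written as $$
val price = s"The book costs $$$age."
// price == "The book costs $11."
```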

## Package Imports

In Scala many datatypes are accessible only by using their textual names and/or after importing them from its standard library. 

In [54]:
// wildcard import
import scala.collection._
// selective import
import scala.collection.immutable.Vector
import scala.collection.{Seq, Map}
// renaming import
import scala.collection.immutable.{Vector => Vec28}
// import all from java.util except Date
// import java.util.{Date => _, _}
// declare a package (only valid in a source file, not in a notebook cell or the REPL):
// package pkg          // at the start of the file
// package pkg { ... }  // or as a block
// specify package root to avoid collisions
import _root_.scala.math._

## Additional Basic Datatypes

### List, Array, Vector

Lists are immutable (Arrays are mutable).

In [55]:
var xs = List(0, 1, 2)

In [56]:
xs

List(0, 1, 2)

In [57]:
xs(1)

1

In [58]:
// prepend an element (`::` is the "cons" operator)
1 :: List(2, 3)

List(1, 2, 3)
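
To make the difference between the three types in this section's title more tangible, here is a minimal sketch with made-up values. Operations on a `List` or `Vector` return new collections, while an `Array` can be updated in place, much like a Python list:

```scala
val xs = List(1, 2, 3)
val ys = 0 :: xs               // prepend: List(0, 1, 2, 3)
val zs = xs ::: List(4, 5)     // concatenate: List(1, 2, 3, 4, 5)
// xs itself is unchanged: List(1, 2, 3)

val arr = Array(0, 1, 2)
arr(1) = 42                    // in-place update; arr is now Array(0, 42, 2)

val vec = Vector(0, 1, 2)
val vec2 = vec.updated(1, 42)  // a new Vector(0, 42, 2); vec is unchanged
```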

### Set

In [59]:
// import scala.collection.Set
var s = Set(0, 1, 2)

In [60]:
s

Set(0, 1, 2)

In [61]:
s += 3

In [62]:
s -= 2

In [63]:
s

Set(0, 1, 3)

### Maps

These are what Python calls dictionaries:

In [65]:
Map('a' -> 1, 'b' -> 2) // scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2)

Map(a -> 1, b -> 2)

In [66]:
Map('a' -> 1, 'b' -> 2, 9 -> 'y') // scala.collection.immutable.Map[AnyVal,AnyVal] = Map(a -> 1, b -> 2, 9 -> y)

Map(a -> 1, b -> 2, 9 -> y)
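
Looking up values works much like with a Python dictionary. The sketch below uses a hypothetical `ages` map and shows the most common access patterns, with the rough Python equivalents in the comments:

```scala
val ages = Map("Alice" -> 11, "Bob" -> 12)

val alice = ages("Alice")               // 11; a missing key throws NoSuchElementException
val maybeCarol = ages.get("Carol")      // None (an Option, no exception)
val carol = ages.getOrElse("Carol", 0)  // 0, like Python's ages.get("Carol", 0)
val hasBob = ages.contains("Bob")       // true, like Python's "Bob" in ages

// The default Map is immutable: + returns a new map
val more = ages + ("Carol" -> 13)
```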

### BigInt

In [67]:
import scala.math.BigInt

In [68]:
val x = BigInt(1024)

In [69]:
x * x * x * x * x

1125899906842624

### BigDecimal

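Scala's `BigDecimal` type corresponds roughly to Python's `decimal.Decimal` class. Note that the argument in the cell below is a `Double` literal, so its precision is already limited before `BigDecimal` sees it, which is why the printed result has so few significant digits: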

In [70]:
val amount = BigDecimal(12345670000000000000000000.89)

In [71]:
print(amount * amount)

1.524155677489E+50
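
To keep all digits, the value can be passed as a string instead of a `Double` literal, just like with `Decimal("0.1")` in Python. A minimal sketch (the variable names are made up):

```scala
// Doubles carry binary rounding errors...
val doubleSum = 0.1 + 0.2                             // 0.30000000000000004
// ...while BigDecimal values built from strings are exact decimals.
val exactSum = BigDecimal("0.1") + BigDecimal("0.2")  // 0.3

// Passing the big number as a string keeps all of its digits.
val amount = BigDecimal("12345670000000000000000000.89")
```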

### Complex Numbers

Complex numbers seem to be, well, [lacking](https://www.scala-lang.org/api/current/?search=complex) from Scala's (and Java's) standard library, unlike in Python, which even has literals for them, like `2+3j`. This may be one reason why sample implementations of complex numbers are so popular in Scala tutorials.

## Conditionals

In [72]:
var one = 1
var two = 2
var ten = 10

In [73]:
if (one < two) "right"

right

In [74]:
if (one > two) "right" else "wrong"

wrong

In [75]:
if (one > two) "right"

()

The Scala `if` expressions above return the value of the branch that is taken (or `()`, the unit value, if the condition is false and there is no `else` branch). A Python `if` statement, in contrast, cannot return a result. In Scala one can therefore write very short assignments based on a conditional expression:

In [76]:
var result = if (one > two) "right" else "wrong"

The equivalent in Python goes like this: `result = "right" if (one > two) else "wrong"`. 

Unlike Python, Scala does not accept chained comparison operators like `one < two < ten` in one expression (one would write `one < two && two < ten` instead):

In [77]:
if (one < two < ten) "right" else "wrong"

Name: Compile Error
Message: <console>:31: error: type mismatch;
 found   : Int
 required: Boolean
       if (one < two < ten) "right" else "wrong"
                       ^
StackTrace: 

In [78]:
if (one < two < ten)
  "right"
else
  "wrong"

Name: Compile Error
Message: <console>:31: error: type mismatch;
 found   : Int
 required: Boolean
       if (one < two < ten)
                       ^
StackTrace: 
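
A version of the failing comparison that does compile combines the two comparisons explicitly with `&&`. This is only a sketch that reuses the variable names from the cells above:

```scala
val one = 1
val two = 2
val ten = 10

// Python: one < two < ten
val inOrder = one < two && two < ten            // true
val answer = if (inOrder) "right" else "wrong"  // "right"
```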

Scala's syntax for multi-branch `if` statements is a little bit cumbersome for people used to Python's `if-elif-else`:  

In [79]:
val x = ten
if (x == one){
  println("Value of X is 1");
} else if (x == two){
  println("Value of X is 2");
} else{
  println("This is the else statement");
}

This is the else statement


The advantage is that one can put an entire such statement on a single line (shortened here slightly):

In [80]:
val x = ten
if (x == one) 1 else if (x == two) 2 else "something else"

something else

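## Loops

Scala's `for` loops draw their values from *generators* like `i <- 0 to 9`, which can be filtered with *guards* (`if` conditions). Adding `yield` turns the loop into an expression that returns a new collection (a `Vector` when iterating over a range):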

In [81]:
for (i <- 0 to 9) println(i)

0
1
2
3
4
5
6
7
8
9


In [82]:
var count = 5
while (count >= 0) { println(s"Counting... $count"); count -= 1 }

Counting... 5
Counting... 4
Counting... 3
Counting... 2
Counting... 1
Counting... 0
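
A `for` loop with a guard and `yield` plays the role of a Python list comprehension. The following sketch uses made-up names:

```scala
// A guard filters the generated values; yield collects the results.
val evenSquares = for (i <- 0 to 9 if i % 2 == 0) yield i * i
// Vector(0, 4, 16, 36, 64)
// Python: [i * i for i in range(10) if i % 2 == 0]

// Several generators act like nested loops.
val pairs = for (x <- 1 to 2; y <- List("a", "b")) yield (x, y)
// Vector((1,a), (1,b), (2,a), (2,b))
```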


## Exceptions

In [83]:
throw new Exception("RTFM")

Name: java.lang.Exception
Message: RTFM
StackTrace: 

In [84]:
try {
  1 / 0
} catch {
  case e: Exception => println("exception caught: " + e)
}

exception caught: java.lang.ArithmeticException: / by zero


()

In [85]:
try {throw new Exception("RTFM") } catch { case e: Exception => println("exception caught: " + e)}

exception caught: java.lang.Exception: RTFM


In [86]:
try {
  throw new Exception("RTFM")
} catch {
  case e: Exception => println("exception caught: " + e)
}

exception caught: java.lang.Exception: RTFM


In [87]:
// try {throw new Exception("RTFM")}
catch {case e: Exception => println("exception caught: " + e)}

Name: Unknown Error
Message: <console>:2: error: illegal start of definition
catch {case e: Exception => println("exception caught: " + e)}
^
StackTrace: 
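
Like `if`, a `try`/`catch` is an expression with a value, and an optional `finally` block corresponds to Python's `finally` clause. The function `safeDivide` below is a hypothetical helper written for this sketch:

```scala
// Each `case` matches one exception type, much like an `except` clause in Python.
def safeDivide(a: Int, b: Int): Option[Int] =
  try {
    Some(a / b)
  } catch {
    case e: ArithmeticException => None  // e.g. division by zero
  } finally {
    println("done dividing")             // always runs; its value is discarded
  }

val ok = safeDivide(6, 3)      // Some(2)
val failed = safeDivide(1, 0)  // None
```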

## Functions

In [88]:
def sqr(x: Int) = x * x

In [89]:
sqr(2)

4

In [90]:
def sqr(x: Int) = { x * x }

In [91]:
sqr(2)

4

Scala distinguishes between functions with parameters and functions without. Those without parameters can be called without parentheses, unlike in Python, where empty parentheses are mandatory:

In [92]:
def foo() = "bar"

In [93]:
foo

bar

In [94]:
def bar = "foo"

In [95]:
bar

foo

Named and default parameter values:

In [96]:
def decorate(str: String, left: String = "[", right: String = "]") =
    left + str + right

In [97]:
decorate("Hello", right="]<<<")

[Hello]<<<

Variable number of arguments:

In [98]:
def sum(args: Int*) = {
    var result = 0
    for (a <- args) result += a
    result
}

In [99]:
sum(1, 2, 3, 4, 5)

15

In [100]:
sum(1 to 5)

Name: Unknown Error
Message: <console>:33: error: type mismatch;
 found   : scala.collection.immutable.Range.Inclusive
 required: Int
       sum(1 to 5)
             ^
StackTrace: 

In [101]:
sum(1 to 5: _*)

15

## Anonymous Functions

Scala provides anonymous functions like those defined with Python's `lambda` expressions:

In [102]:
(x: Int) => x + 1

<function1>

In [103]:
for (i <- 0 to 3) yield ((x: Int) => x + 1)(i)

Vector(1, 2, 3, 4)

In [104]:
val succ = (x: Int) => x + 1

In [105]:
succ(42)

43
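
Anonymous functions are most often passed directly to collection methods, where Python would use `map`, `filter` or a comprehension. A short sketch with made-up values, including the `_` placeholder shorthand:

```scala
val nums = List(1, 2, 3, 4)

val plusOne = nums.map((x: Int) => x + 1)  // List(2, 3, 4, 5)
val doubled = nums.map(x => x * 2)         // the type of x is inferred
val evens = nums.filter(_ % 2 == 0)        // _ is a placeholder for the argument
val total = nums.reduce(_ + _)             // 10
```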

## Built-in Functions

- no equivalent for Python's `id()` function

In [106]:
/* for Python 3.6:
   abs, all, any, ascii, bin, callable, chr, compile, delattr, dir, divmod, eval, exec,
   format, getattr, globals, hasattr, hash, hex, id, input, isinstance, issubclass, 
   iter, len, locals, max, min, next, oct, ord, pow, print, repr, round, setattr, sorted,
   sum, vars, open */

Name: Syntax Error.
Message: 
StackTrace: 

In [107]:
// abs chr divmod has hex len max min oct ord pow round sorted sum

Name: Syntax Error.
Message: 
StackTrace: 
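
Many of the Python built-ins listed above exist in Scala as methods on the respective objects or as functions in `scala.math`. The following is a non-exhaustive sketch of some rough counterparts, with the Python call given in each comment:

```scala
val xs = List(3, 1, 2)

val absolute = math.abs(-5)   // abs(-5)      -> 5
val length = xs.length        // len(xs)      -> 3
val largest = xs.max          // max(xs)      -> 3
val smallest = xs.min         // min(xs)      -> 1
val total = xs.sum            // sum(xs)      -> 6
val ordered = xs.sorted       // sorted(xs)   -> List(1, 2, 3)
val code = 'A'.toInt          // ord('A')     -> 65
val letter = 65.toChar        // chr(65)      -> 'A'
val hex = 255.toHexString     // hex(255)     -> "ff" (without the "0x" prefix)
val quotRem = (7 / 2, 7 % 2)  // divmod(7, 2) -> (3, 1)
```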

## Annotations

Along the lines of Python decorators, but rather less powerful...

## Classes

In [108]:
// public
class C(val x: Int)
var c = new C(4)
c.x

4

In [109]:
// private
class C(x: Int) // roughly like class C(private val x: Int)
var c = new C(4)
c.x

Name: Unknown Error
Message: <console>:28: error: value x is not a member of C
       c.x
         ^
StackTrace: 
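
Classes can of course also have methods and mutable state. The `Counter` class below is a hypothetical example written for this sketch, combining a public `val`, a private `var` with a default value, and two methods:

```scala
class Counter(val name: String, private var count: Int = 0) {
  // A method with side effects, declared and called with parentheses
  def increment(): Int = {
    count += 1
    count
  }
  // A parameterless accessor, called without parentheses
  def current: Int = count
}

val clicks = new Counter("clicks")
clicks.increment()
clicks.increment()
// clicks.name == "clicks", clicks.current == 2
// clicks.count would not compile because count is private
```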

## Obtaining Type Information

In the Scala REPL one can use the `:type` command to find out about the type of an object.

```
$ scala
Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121).
Type in expressions for evaluation. Or try :help.

scala> :type 42
Int
```

At run-time one can use the `.getClass` method. See a few examples below: 

In [110]:
"ABC".getClass

class java.lang.String

In [111]:
'A'.getClass

char

In [112]:
42.getClass

int

In [113]:
(40 + 2).getClass

int

In [114]:
(5 until 9).getClass

class scala.collection.immutable.Range
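
Where Python code would test types with `isinstance()`, Scala offers `isInstanceOf` and, more idiomatically, pattern matching on types. The function `describe` is a made-up example:

```scala
// Dispatch on the run-time type of a value
def describe(x: Any): String = x match {
  case i: Int    => s"an Int: $i"
  case s: String => s"a String: $s"
  case _         => "something else"
}

val isText = "ABC".isInstanceOf[String]  // true, like isinstance("ABC", str)
```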

## Little Pitfalls

These are a few potentially unexpected situations you might run into as somebody exposed for some time to a Python habitat.

In [115]:
"A" + "B" + "C"

ABC

In [116]:
'A' + 'B' + 'C'

198

In [117]:
'Æ'.toInt

198

In [118]:
'A'.toInt

65

In [119]:
"ABC".sum

Æ

For some operators there is no literal infix notation like the `**` exponentiation operator in Python. Here one needs to use imported functions like `scala.math.pow()` in this case (which is also available in Python as `math.pow`): 

In [120]:
scala.math.pow(2, 3)

8.0
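
Note that `scala.math.pow` always returns a `Double`, whereas Python's `**` keeps integers exact. As a sketch, these are two ways to get an integer result, the second using the `pow` method of `BigInt`:

```scala
val asDouble = math.pow(2, 10)     // 1024.0 (always a Double)
val asInt = math.pow(2, 10).toInt  // 1024
val big = BigInt(2).pow(100)       // exact: 1267650600228229401496703205376
```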

In Scala there is often more than one way to do it, unlike in Python, which claims that there should be only one obvious way (with a few notable exceptions, though):

In [121]:
6 * 7

42

In [122]:
42.0

42.0

In [123]:
42f

42.0

In [124]:
42d

42.0

In [125]:
6. * 7

Name: Unknown Error
Message: <console>:1: error: ';' expected but integer literal found.
6. * 7
     ^
StackTrace: 

In [126]:
6 * 7.

Name: Syntax Error.
Message: 
StackTrace: 

In [127]:
6. * (7)

42

In [128]:
6.*(7) // * is actually a method!

42

In [129]:
6.*(7+0)

42

In [130]:
Range(1, 11)

Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

In [131]:
1 until 11

Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

In [132]:
1 to 10

Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

In [133]:
1.to(10)

Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

In [134]:
import scala.language.postfixOps // enable postfix operators
1 to 10 map(sqr) sum

385

In [135]:
sqr(1 to 10 sum)

3025

In [136]:
1 to 10 sum sqr

Name: Unknown Error
Message: <console>:32: error: missing argument list for method sqr
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `sqr _` or `sqr(_)` instead of `sqr`.
       1 to 10 sum sqr
                   ^
StackTrace: 

In [137]:
{}.getClass

void

In Scala you can try to execute something, while it's not mandatory to catch any errors. This means trying is identical to doing it:

In [138]:
try {print(42)}

42

In [139]:
print(42)

42

In [140]:
var v = try {1} finally {2}

In [141]:
v

1

One can sometimes argue whether some methods return what one would expect. The `String.distinct` method, for example, returns a string with characters in the same order as in the input string, while one could also expect an array or even a set:

In [142]:
"Missisippi".split("")

Array(M, i, s, s, i, s, i, p, p, i)

In [143]:
"Missisippi".distinct

Misp

In [144]:
"Missisippi".distinct.getClass

class java.lang.String

## Comparison

Similarities between Python and Scala:

- high code expressiveness
- use of semicolons at the end of lines is optional
- line breaks are significant (they can end a statement), although, unlike in Python, indentation is not
- similar exception handling
- usage of triple quotes for multi-line strings
- mix of imperative, object-oriented and functional programming paradigms
- Scala's type declarations seem very similar to those of Python 3 which can be statically checked using [MyPy](http://mypy-lang.org) 

Differences (Scala-centric):

- Scala uses curly braces for blocks (and therefore needs no "empty" statement like Python's `pass`)
- Scala recommends two blanks for indented code while Python recommends four
- Scala's style guidelines emphasize camel case more often than Python's
- Scala offers fewer datatypes available as literals
- Scala does not allow chained comparison operators (like `a < b < c`)
- Scala sports more numeric types: Int, Float, Double, BigInt, BigDecimal
- Scala doesn't follow Python's "only one way to do it" philosophy
- Scala appears to have a cleaner top-level namespace (Python's contains e.g. built-in functions like `abs`, `divmod`, `max`, `min`, `pow`, `round`, and `sum` which are clearly math-related)
- Scala offers access to everything JVM, Python does only via [Jython](http://www.jython.org) which seems to have stalled in development
- Scala's standard library is much smaller, e.g.:
    - no complex numbers (seem to be missing in Java, too)
    - no JSON support, see http://json4s.org

## Conclusions

- Scala is certainly a great improvement over Java
- it adds a widely accepted/needed functional programming style
- its syntax is both more compact and more expressive than Java's
- it has a rich standard library regarding collection classes
- it has a poor standard library support for any wider set of applications 
- it has syntax features of questionable, if any, value
- …

## References

- [Scala for the Impatient](https://www.safaribooksonline.com/library/view/scala-for-the/9780134510613/SCLA_02_01.html) (O'Reilly video course)
- https://gist.github.com/stantonk/6773672
- https://bugra.github.io/work/notes/2014-10-18/scala-basics-for-python-developers/
- http://scala-docs-sphinx.readthedocs.io/en/latest/cheatsheet.html
- http://alvinalexander.com/scala/scala-xml-examples-xml-literals-source-code-searching-xpath

## Appendix: Local Installation

This is an attempt to describe a local installation of several components needed to run Jupyter with the Apache Toree Scala kernel on a local computer. As some of these components can be installed in various ways the following gives only one example and only for OS X (after installing [brew](https://brew.sh) first!):

```
brew install python
pip install jupyter
brew install apache-spark
pip install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
jupyter toree install --spark_home=/usr/local/Cellar/apache-spark/2.1.0/libexec
jupyter notebook
```

## TODO

- Types: Array, List, Map
- introspection
- documentation
- testing
- more details on functions and classes
- Null, Nil, Void
- comprehensions
- context managers?
- warnings?
- logging?