# Kotlin for Jupyter Notebooks

Over the last several years, as data science has grown in importance, [Jupyter notebooks](https://jupyter.org/) have become a de facto standard tool for data scientists and for others who work with data. Their interactivity makes them very convenient for transforming, visualizing, and presenting data. Because Jupyter is extensible and open source, a large data science ecosystem has grown around it, and it has been integrated into many other data-related tools.

Just as Jupyter has become a de facto standard tool for data scientists, Python has become their de facto standard language. Beyond the language itself, Python has given rise to a wide and rich ecosystem of data science tools, frameworks, and libraries.

Although Python is the language of choice for most Jupyter users, Jupyter is in fact a polyglot tool whose language support can be extended. To support a language, one must implement a Jupyter kernel (a kind of backend plugin) responsible for executing cells and rendering their results. The community has implemented Jupyter kernels for Julia, R, and [dozens of other languages](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels), including [Kotlin](https://kotlinlang.org/).

Why Kotlin? First, Kotlin is a modern JVM language with fast-growing mindshare. It would be a pity if Kotlin users could not use it with Jupyter. Second, over the last few years, Kotlin has outgrown JVM development and is ready to expand to other platforms. By design, its major traits are conciseness, safety, and interoperability. These fundamental traits make it a great fit for a wide variety of tasks and platforms, and data science is certainly one of them. The great news is that the community has already started adopting Kotlin for data science. We highly recommend watching a [talk](https://www.youtube.com/watch?v=yjVW6uCmVBA) by Holger Brandl (the creator of [krangl](https://github.com/holgerbrandl/krangl), a Kotlin analog of Python’s pandas) or another [talk](https://www.youtube.com/watch?v=-zTqtEcnM7A&feature=youtu.be) by Thomas Nield (the creator of [kotlin-statistics](https://github.com/thomasnield/kotlin-statistics)), or reading his [article](https://towardsdatascience.com/introduction-to-kotlin-statistics-cdad3be88b5).

### Running cells

Here's a simple example with Kotlin code:

In [13]:
```kotlin
class Greeter(val name: String) {
    fun greet() {
        println("Hello, $name!")
    }
}
```

In [25]:
```kotlin
Greeter("Jupyter").greet() // Run me
```

```
Hello, Jupyter!
```


### Configuring Maven dependencies

Here's another example, courtesy of [thomasnield/kotlin-statistics](https://github.com/thomasnield/kotlin-statistics), showing how to load additional dependencies into the notebook from Maven repositories:

In [21]:
```kotlin
@file:Repository("https://repo1.maven.org/maven2")
@file:DependsOn("org.nield:kotlin-statistics:1.2.1")
```
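
The string passed to `@file:DependsOn` uses the usual Maven `group:artifact:version` coordinates. As a rough illustration of what a resolver looks up, the sketch below derives the JAR location under the standard Maven repository layout. It is not code taken from the kernel, and `jarUrl` is a hypothetical helper introduced only for this example:

```kotlin
// Hypothetical helper, for illustration only: maps Maven coordinates to the
// JAR location under the standard Maven repository layout.
fun jarUrl(repository: String, coordinates: String): String {
    val parts = coordinates.split(":")
    require(parts.size == 3) { "Expected group:artifact:version, got '$coordinates'" }
    val (group, artifact, version) = parts
    // The group ID becomes a directory path: org.nield -> org/nield
    val groupPath = group.replace('.', '/')
    return "${repository.trimEnd('/')}/$groupPath/$artifact/$version/$artifact-${version}.jar"
}

fun main() {
    // Prints:
    // https://repo1.maven.org/maven2/org/nield/kotlin-statistics/1.2.1/kotlin-statistics-1.2.1.jar
    println(jarUrl("https://repo1.maven.org/maven2", "org.nield:kotlin-statistics:1.2.1"))
}
```

A real resolver would also read the artifact's POM to fetch transitive dependencies, which this sketch leaves out.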

In [18]:
```kotlin
import java.time.LocalDate
import java.time.temporal.ChronoUnit
import org.nield.kotlinstatistics.*

data class Patient(val firstName: String,
                   val lastName: String,
                   val gender: Gender,
                   val birthday: LocalDate,
                   val whiteBloodCellCount: Int) {

    val age = ChronoUnit.YEARS.between(birthday, LocalDate.now())
}

val patients = listOf(
        Patient("John", "Simone", Gender.MALE, LocalDate.of(1989, 1, 7), 4500),
        Patient("Sarah", "Marley", Gender.FEMALE, LocalDate.of(1970, 2, 5), 6700),
        Patient("Jessica", "Arnold", Gender.FEMALE, LocalDate.of(1980, 3, 9), 3400),
        Patient("Sam", "Beasley", Gender.MALE, LocalDate.of(1981, 4, 17), 8800),
        Patient("Dan", "Forney", Gender.MALE, LocalDate.of(1985, 9, 13), 5400),
        Patient("Lauren", "Michaels", Gender.FEMALE, LocalDate.of(1975, 8, 21), 5000),
        Patient("Michael", "Erlich", Gender.MALE, LocalDate.of(1985, 12, 17), 4100),
        Patient("Jason", "Miles", Gender.MALE, LocalDate.of(1991, 11, 1), 3900),
        Patient("Rebekah", "Earley", Gender.FEMALE, LocalDate.of(1985, 2, 18), 4600),
        Patient("James", "Larson", Gender.MALE, LocalDate.of(1974, 4, 10), 5100),
        Patient("Dan", "Ulrech", Gender.MALE, LocalDate.of(1991, 7, 11), 6000),
        Patient("Heather", "Eisner", Gender.FEMALE, LocalDate.of(1994, 3, 6), 6000),
        Patient("Jasper", "Martin", Gender.MALE, LocalDate.of(1971, 7, 1), 6000)
)

enum class Gender {
    MALE,
    FEMALE
}

val clusters = patients.multiKMeansCluster(k = 3,
        maxIterations = 10000,
        trialCount = 50,
        xSelector = { it.age.toDouble() },
        ySelector = { it.whiteBloodCellCount.toDouble() }
)
```

In [19]:
```kotlin
clusters.forEachIndexed { index, item ->
    println("CENTROID: $index")
    item.points.forEach {
        println("\t$it")
    }
}
```

```
CENTROID: 0
	Patient(firstName=Sarah, lastName=Marley, gender=FEMALE, birthday=1970-02-05, whiteBloodCellCount=6700)
	Patient(firstName=Sam, lastName=Beasley, gender=MALE, birthday=1981-04-17, whiteBloodCellCount=8800)
CENTROID: 1
	Patient(firstName=John, lastName=Simone, gender=MALE, birthday=1989-01-07, whiteBloodCellCount=4500)
	Patient(firstName=Jessica, lastName=Arnold, gender=FEMALE, birthday=1980-03-09, whiteBloodCellCount=3400)
	Patient(firstName=Michael, lastName=Erlich, gender=MALE, birthday=1985-12-17, whiteBloodCellCount=4100)
	Patient(firstName=Jason, lastName=Miles, gender=MALE, birthday=1991-11-01, whiteBloodCellCount=3900)
	Patient(firstName=Rebekah, lastName=Earley, gender=FEMALE, birthday=1985-02-18, whiteBloodCellCount=4600)
CENTROID: 2
	Patient(firstName=Dan, lastName=Forney, gender=MALE, birthday=1985-09-13, whiteBloodCellCount=5400)
	Patient(firstName=Lauren, lastName=Michaels, gender=FEMALE, birthday=1975-08-21, whiteBloodCellCount=5000)
	Patient(firstName=James,
```

### Configuring built-in libraries via magics

For a more straightforward setup, the Kotlin kernel pre-configures certain libraries and lets the notebook user load them via special commands, also known as [magics](https://ipython.readthedocs.io/en/stable/interactive/magics.html). To load a pre-configured library in a notebook, use its name prepended with `%%`. Here's how it works:

In [24]:
```
%%kotlin-statistics
```

When such a cell is executed, the kernel makes sure that the corresponding Maven repository is configured, the library is loaded, the necessary import statements are added (in this case, for example, `import org.nield.kotlinstatistics.*` is no longer needed), and the necessary renderers are configured. The currently supported magics are [`%%kotlin-statistics`](https://github.com/thomasnield/kotlin-statistics), [`%%klaxon`](https://github.com/cbeust/klaxon), [`%%krangl`](https://github.com/holgerbrandl/krangl), [`%%kravis`](https://github.com/holgerbrandl/kravis), and [`%%ggplot`](https://github.com/jetbrains/datalore-plot).
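
Conceptually, a magic is shorthand for the preamble one would otherwise write by hand. The sketch below is not the kernel's implementation: `LibraryDescriptor`, `builtIns`, and `expandMagic` are hypothetical names, and the single registry entry simply reuses the repository, coordinates, and import from the kotlin-statistics example in the previous section. Renderer configuration is left out.

```kotlin
// Hypothetical model of a pre-configured library; not the kernel's real data structure.
data class LibraryDescriptor(
    val repositories: List<String>,
    val dependencies: List<String>,
    val imports: List<String>
)

// Illustrative registry with one entry. The values are copied from the
// kotlin-statistics example; the version the kernel actually ships may differ.
val builtIns = mapOf(
    "kotlin-statistics" to LibraryDescriptor(
        repositories = listOf("https://repo1.maven.org/maven2"),
        dependencies = listOf("org.nield:kotlin-statistics:1.2.1"),
        imports = listOf("org.nield.kotlinstatistics.*")
    )
)

// Expands a magic line such as "%%kotlin-statistics" into the equivalent preamble.
// Returns null when the line is not a known magic.
fun expandMagic(line: String): String? {
    val trimmed = line.trim()
    if (!trimmed.startsWith("%%")) return null
    val library = builtIns[trimmed.removePrefix("%%")] ?: return null
    val preamble = library.repositories.map { "@file:Repository(\"$it\")" } +
        library.dependencies.map { "@file:DependsOn(\"$it\")" } +
        library.imports.map { "import $it" }
    return preamble.joinToString("\n")
}

fun main() {
    // Prints the repository, dependency, and import lines that the magic stands for.
    println(expandMagic("%%kotlin-statistics"))
}
```

Keeping the registry as plain data means that, in a design like this one, supporting another library would only require a new map entry rather than new code.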

Here's another example, showcasing the [`%%krangl`](https://github.com/holgerbrandl/krangl) and [`%%ggplot`](https://github.com/jetbrains/datalore-plot) libraries:

In [66]:
```
%%ggplot
%%krangl
```

In [43]:
```kotlin
val df = DataFrame.readCSV("data/iris.csv")
df.head()
```

A DataFrame: 5 x 5

| sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [77]:
```kotlin
df.groupBy("species").count()
```

A DataFrame: 3 x 2

| species | n |
|---|---|
| Iris-setosa | 50 |
| Iris-versicolor | 50 |
| Iris-virginica | 50 |
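
For readers new to krangl, `groupBy("species").count()` is a group-and-count aggregation. A rough standard-library equivalent might look like the sketch below. It uses a small made-up list in place of the `species` column (the values are illustrative, not the iris data), and `countByValue` is a hypothetical helper name:

```kotlin
// Counts how many times each value occurs, keeping keys in first-seen order.
fun countByValue(values: List<String>): Map<String, Int> =
    values.groupingBy { it }.eachCount()

fun main() {
    // Made-up sample standing in for the "species" column.
    val species = listOf("Iris-setosa", "Iris-virginica", "Iris-setosa", "Iris-versicolor")
    // Prints one "name,count" line per distinct value.
    countByValue(species).forEach { (name, n) -> println("$name,$n") }
}
```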


In [105]:
```kotlin
val points = geom_point(
    data = mapOf(
        "x" to df["sepal_length"].asDoubles().toList(),
        "y" to df["sepal_width"].asDoubles().toList(),
        "color" to df["species"].asStrings().toList()
    ),
    alpha = 1.0
) {
    x = "x"
    y = "y"
    color = "color"
}

ggplot() + points
```

```
Plot(data=null, mapping=jetbrains.datalorePlot.intern.Options@49c28e24, features=[jetbrains.datalorePlot.geom.geom_point@13626996])
```

### Installation

Currently, the Kotlin Jupyter kernel can be installed only via [conda](https://conda.io/en/latest/):

```bash
conda install kotlin-jupyter-kernel -c jetbrains
```

Later, it will also be possible to install it via `pip install`.

Note that the Kotlin Jupyter kernel requires Java 8 to be installed. On Debian or Ubuntu, for example:

```bash
apt-get install openjdk-8-jre
```

Once these requirements are satisfied, run `jupyter notebook` and switch to the `Kotlin` kernel.
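
To confirm which Java runtime the kernel is actually running on, the standard `java.version` system property can be inspected. The following is a minimal sketch: `javaMajorVersion` is a hypothetical helper that handles both the legacy `1.8.0_292` style and the newer `11.0.2` style of version strings.

```kotlin
// Hypothetical helper: extracts the major Java version from a version string.
fun javaMajorVersion(version: String): Int {
    val parts = version.split('.', '-', '_', '+')
    // Java 8 and earlier report "1.x"; later releases put the major version first.
    return if (parts[0] == "1" && parts.size > 1) parts[1].toInt() else parts[0].toInt()
}

fun main() {
    val version = System.getProperty("java.version")
    println("Java $version (major ${javaMajorVersion(version)})")
}
```

In a notebook cell, the body of `main` can be run directly as top-level statements.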

### Other useful information

To be added (on limitations, sources, roadmap, contribution, library integration, and other useful links)...