<em><sub>This page is available as an executable or viewable <strong>Jupyter Notebook</strong>:</sub></em>
<br/><br/>
<a href="https://mybinder.org/v2/gh/JetBrains/lets-plot-kotlin/master?filepath=docs%2Fguide%2Fuser_guide.ipynb"
   target="_parent"> 
   <img align="left" 
        src="https://mybinder.org/badge_logo.svg">
</a>
<a href="https://nbviewer.jupyter.org/github/JetBrains/lets-plot-kotlin/blob/master/docs/guide/user_guide.ipynb" 
   target="_parent"> 
   <img align="right" 
        src="https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.png" 
        width="109" height="20">
</a>
<br/>
<br/>

# Lets-Plot Usage Guide

<a href="https://opensource.org/licenses/MIT">
<img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="Couldn't load MIT license svg"/>
</a>


- [System requirements](#sys)
- [Installation](#install)
- [Understanding architecture](#implementation)
- [Learning API](#api)
- [Getting started](#gsg)


**Lets-Plot** is an open-source plotting library for statistical data. It is implemented in 
[Kotlin](https://kotlinlang.org/), a multiplatform programming language.
That is why Lets-Plot can provide its plotting functionality 
as a JavaScript library, a JVM library, and a native Python extension.

The design of the Lets-Plot library is heavily influenced by the [ggplot2](https://ggplot2.tidyverse.org) library.

<a name="Installation" id="install"></a>
## Installation

The library is distributed via a [Maven repository](https://bintray.com/jetbrains/lets-plot-maven/lets-plot-kotlin-api-jars). You can include it in your project using Maven or Gradle configuration files, or include it in a script via the `@file:DependsOn()` annotation.

<a name="Implementation" id="implementation"></a>
## Understanding Lets-Plot architecture
In `lets-plot`, a **plot** consists of at least one
**layer**. It can be built on a default dataset, with aesthetic mappings, a set of scales, and additional 
features applied.

The **Layer** is responsible for creating the objects painted on the ‘canvas’ and it contains the following elements:
- **Data** - the set of data specified either once for all layers or on a per layer basis.
One plot can combine multiple different datasets (one per layer).
- **Aesthetic mapping** - describes how variables in the dataset are mapped to the visual properties of the layer, such as color, shape, size, or position.
- **Geometric object** - a geometric object that represents a particular type of chart.
- **Statistical transformation** - computes some kind of statistical summary on the raw input data. 
For example, the `bin` stat is used for histograms and the `smooth` stat is used for regression lines. 
Most stats take additional parameters to specify details of the statistical transformation of data.
- **Position adjustment** - a method used to compute the final coordinates of geometry. 
Used to build variants of the same `geom` object or to avoid overplotting.

![layer diagram](images/layer-small.png)
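
The relationship between a plot and its layers can be sketched in plain Kotlin. This is only an illustrative model: `PlotSpec`, `LayerSpec`, and `Dataset` are hypothetical names, not Lets-Plot classes, and the rule that layer mappings override plot mappings is an assumption borrowed from ggplot2-style libraries.

```kotlin
// Hypothetical model types for illustration only; these are NOT Lets-Plot classes.
typealias Dataset = Map<String, List<Any>>

data class LayerSpec(
    val geom: String,                               // e.g. "point", "line"
    val data: Dataset? = null,                      // null means "use the plot's default dataset"
    val mapping: Map<String, String> = emptyMap(),  // aesthetic name -> column name
    val stat: String = "identity",
    val position: String = "identity"
)

data class PlotSpec(
    val data: Dataset,
    val mapping: Map<String, String> = emptyMap(),
    val layers: List<LayerSpec> = emptyList()
) {
    // Adding a layer returns a new plot, mirroring the `p + geom_...()` style.
    operator fun plus(layer: LayerSpec) = copy(layers = layers + layer)

    // A layer's own dataset wins; otherwise the plot-level dataset is used.
    fun dataFor(layer: LayerSpec): Dataset = layer.data ?: data

    // Assumption: layer mappings are merged over plot mappings.
    fun mappingFor(layer: LayerSpec): Map<String, String> = mapping + layer.mapping
}

fun main() {
    val cars: Dataset = mapOf("displ" to listOf(1.8, 2.0), "hwy" to listOf(29, 31))
    val trend: Dataset = mapOf("displ" to listOf(1.8, 2.0), "hwy" to listOf(30, 30))

    val plot = PlotSpec(cars, mapOf("x" to "displ", "y" to "hwy")) +
        LayerSpec(geom = "point", mapping = mapOf("color" to "hwy")) +
        LayerSpec(geom = "line", data = trend)

    for (layer in plot.layers) {
        println("${layer.geom}: data=${plot.dataFor(layer)}, mapping=${plot.mappingFor(layer)}")
    }
}
```

In this sketch the first layer inherits the default dataset, while the second supplies its own, which is how one plot can combine several datasets.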

<a name="API" id="api"></a>
## Learning API
The typical code fragment that plots a Lets-Plot chart looks as follows:

```
import jetbrains.letsPlot.*
import jetbrains.letsPlot.geom.*
import jetbrains.letsPlot.stat.*

val p = lets_plot(<dataframe>)
p + geom_<chart_type>(stat=<stat>, position=<adjustment>)
```

### Geometric objects `geom`

Every graphics object is represented by one of two abstract classes: `Layer` and `LayerBase`, which inherits from `Layer`. All other classes are direct descendants of `LayerBase` and are described below. You can add such an object to a plot created with `lets_plot`:

```
val p = lets_plot(data = df)
p + geom_point()
```

The following charts are [supported](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/index.html):

- Area chart: [`geom_area()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_area/index.html)
- Bar chart: [`geom_bar()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_bar/index.html)
- Boxplot chart: [`geom_boxplot()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_boxplot/index.html)
- Contour chart: [`geom_contour()`, `geom_contourf()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_contour/index.html)
- Density chart: [`geom_density()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_density/index.html)
  and [`geom_density2d()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_density2d/index.html)
- Error bar chart: [`geom_errorbar()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_errorbar/index.html)
- Histogram: [`geom_histogram()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_histogram/index.html)
- Line chart: [`geom_line()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_line/index.html)
- Scatter chart: [`geom_point()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_point/index.html)
- Polygon chart: [`geom_polygon()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_polygon/index.html)
- Rectangle chart, Tile chart: [`geom_rect()`, `geom_tile()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_rect/index.html)
- Image plot: [`geom_image()`](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/geom_image/index.html)

See the [geom reference](../../plot-api/docs/plot-api/jetbrains.lets-plot.geom/index.html) for more information about the supported
geometric methods, their arguments, and default values.

### Stat `stat`

Add `stat` as an argument to `layer` or `geom` methods to define data transformations:

`geom_point(stat=Stat.count())`

Supported transformations:

- `identity`: leaves the data unchanged
- `count`: counts the number of points with the same x-axis coordinate
- `bin`: counts the number of points whose x-axis coordinates fall into the same bin
- `boxplot`: calculates the components of a box plot
- `density`: performs a kernel density estimation
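
To make the difference between `count` and `bin` concrete, here is a minimal plain-Kotlin sketch of what each transformation computes. The function names `countStat` and `binStat` and the `bins` parameter are hypothetical, and the equal-width binning rule is an assumption made for illustration. This is not the Lets-Plot implementation.

```kotlin
// Plain-Kotlin sketches of two stats; NOT the Lets-Plot implementation.

// `count`: number of observations per distinct x value, ordered by x.
fun countStat(xs: List<Double>): Map<Double, Int> =
    xs.groupingBy { it }.eachCount().toSortedMap()

// `bin`: number of observations per equal-width interval of x.
// Returns (bin center, count) pairs.
fun binStat(xs: List<Double>, bins: Int): List<Pair<Double, Int>> {
    require(bins > 0 && xs.isNotEmpty()) { "need at least one bin and one value" }
    val min = xs.minOrNull()!!
    val max = xs.maxOrNull()!!
    // A degenerate range (all values equal) falls back to a width of 1.
    val width = if (max > min) (max - min) / bins else 1.0
    val counts = IntArray(bins)
    for (x in xs) {
        // The maximum value is placed in the last bin instead of opening a new one.
        val i = ((x - min) / width).toInt().coerceAtMost(bins - 1)
        counts[i]++
    }
    return counts.mapIndexed { i, c -> (min + (i + 0.5) * width) to c }
}

fun main() {
    val xs = listOf(0.0, 1.0, 1.0, 2.0, 3.0, 4.0)
    println(countStat(xs))
    println(binStat(xs, bins = 2))
}
```

With `count`, each distinct x value gets its own tally, whereas `bin` merges neighboring values into intervals, which is why histograms use `bin`.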

### Aesthetic mappings `mapping`
With mappings, you can define how variables in the dataset are mapped to the visual elements of the chart.
Add a `{x=< >; y=< >; ...}` block to the `geom` call, where
- `x`: the dataframe column to map to the x axis. 
- `y`: the dataframe column to map to the y axis.
- `...`: other visual properties of the chart, such as color, shape, size, or position.

`geom_point(){x = "cty"; y = "hwy"; color="cyl"}`
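
The `{x = ...; y = ...}` syntax is an ordinary Kotlin lambda with a receiver: the block runs against an object whose properties are the aesthetics. The sketch below shows the general mechanism only. `MappingSketch` and `geomPointSketch` are hypothetical stand-ins, not the classes or functions Lets-Plot uses internally.

```kotlin
// Hypothetical stand-in for the object that receives the `{ x = ...; y = ... }` block.
class MappingSketch {
    var x: String? = null
    var y: String? = null
    var color: String? = null

    // Collects only the aesthetics that were actually assigned.
    fun toMap(): Map<String, String> =
        listOf("x" to x, "y" to y, "color" to color)
            .mapNotNull { (aes, column) -> column?.let { aes to it } }
            .toMap()
}

// Hypothetical geom function: the trailing lambda runs with a MappingSketch as `this`,
// so `x = "cty"` inside the block assigns MappingSketch.x.
fun geomPointSketch(mapping: MappingSketch.() -> Unit = {}): Map<String, String> =
    MappingSketch().apply(mapping).toMap()

fun main() {
    println(geomPointSketch { x = "cty"; y = "hwy"; color = "cyl" })
    // prints {x=cty, y=hwy, color=cyl}
}
```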

### Position `position`

All layers have a position adjustment that resolves overlapping geoms. 
Override the default settings by using the `position` argument in the `geom` methods:

`geom_bar(position=position(PosKind.DODGE))`

Available adjustments:
- `dodge` (`position(PosKind.DODGE)`)
- `jitter` (`position(PosKind.JITTER)`)
- `jitterdodge` (`position(PosKind.JITTER_DODGE)`)
- `nudge` (`position(PosKind.NUDGE)`)

See the [position reference](../../plot-api/docs/plot-api/jetbrains.lets-plot.intern/-pos-kind) for more information about position adjustments.
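
As a rough intuition for what these adjustments do to coordinates, the following plain-Kotlin sketch applies `nudge`, `jitter`, and `dodge` to x positions. The function signatures, the default seed, and the exact offset formulas are assumptions for illustration and are not taken from the Lets-Plot source.

```kotlin
import kotlin.random.Random

// Sketches of what each adjustment does to x positions; NOT the Lets-Plot implementation.

// nudge: shift every point by a fixed offset.
fun nudge(xs: List<Double>, dx: Double): List<Double> = xs.map { it + dx }

// jitter: add a small random offset in [-width, width) to every point.
fun jitter(xs: List<Double>, width: Double, rnd: Random = Random(0)): List<Double> =
    xs.map { it + (rnd.nextDouble() * 2 - 1) * width }

// dodge: split a total `width` centered on x into `groups` side-by-side slots
// and return the center of the slot for the given zero-based group index.
fun dodge(x: Double, groupIndex: Int, groups: Int, width: Double): Double {
    require(groups > 0 && groupIndex in 0 until groups)
    val slot = width / groups
    return x - width / 2 + slot * (groupIndex + 0.5)
}

fun main() {
    println(nudge(listOf(1.0, 2.0), dx = 0.5))
    println(jitter(listOf(1.0, 1.0, 1.0), width = 0.1))
    // Two bars sharing x = 1.0 end up next to each other instead of overlapping.
    println(listOf(0, 1).map { dodge(1.0, it, groups = 2, width = 0.8) })
}
```

`jitterdodge` would combine the last two: first dodge each group into its slot, then jitter points within that slot.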

### Features affecting the entire plot

#### Scales

Scales enable tailored adjustment of data values to visual properties and aesthetics. Override default scales to tweak 
details like the axis labels or legend keys, or to use a completely different translation from data to aesthetic.
For example, to override the default color scale on a histogram:

`p + geom_histogram() + scale_color_continuous("red", "green")`

See the list of the available `scale` methods in the [scale reference](../../plot-api/docs/plot-api/jetbrains.lets-plot.scale/index.html).
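
To illustrate what "translation from data to aesthetic" means for a continuous color scale, here is a minimal sketch that maps a data value to a color between a low and a high color. `Rgb` and `continuousColor` are hypothetical names, and linear interpolation in RGB space is an assumption; the interpolation Lets-Plot actually uses may differ.

```kotlin
import kotlin.math.roundToInt

// Sketch of a two-color continuous scale: data value -> RGB color.
// Linear interpolation in RGB is an assumption made for illustration.
data class Rgb(val r: Int, val g: Int, val b: Int)

fun continuousColor(value: Double, min: Double, max: Double, low: Rgb, high: Rgb): Rgb {
    // Normalize the value to [0, 1]; a degenerate domain maps everything to `low`.
    val t = if (max > min) ((value - min) / (max - min)).coerceIn(0.0, 1.0) else 0.0
    fun mix(a: Int, b: Int): Int = (a + (b - a) * t).roundToInt()
    return Rgb(mix(low.r, high.r), mix(low.g, high.g), mix(low.b, high.b))
}

fun main() {
    val low = Rgb(200, 0, 0)
    val high = Rgb(0, 100, 0)
    // Values from a domain of 4..8 are spread between the two colors.
    for (v in listOf(4.0, 6.0, 8.0)) {
        println("$v -> ${continuousColor(v, 4.0, 8.0, low, high)}")
    }
}
```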

#### Legend
The axes and legends help users interpret plots.
Use the `guide` methods or the `guide` argument of the `scale` method to customize the legend.
For example, to define the number of columns in the legend:

`p + scale_color_discrete(guide=guide_legend(ncol=2))`

See more information in the [guide reference](../../plot-api/docs/plot-api/jetbrains.lets-plot.intern/-pos-kind).

#### Sampling

Sampling is a data transformation technique built into Lets-Plot that is applied after the stat transformation.
Sampling helps prevent UI freezes and out-of-memory crashes when attempting to plot an excessively large number of geometries.
By default, sampling is applied automatically when the data volume exceeds a certain threshold.
The `none` value disables any sampling for the given layer. Sampling methods can be chained together using the `+` operator.

Available methods:
- `sampling_random_stratified`: randomly selects points from each group proportionally to the group size but also ensures 
that each group is represented by at least a specified minimum number of points.
- `sampling_random`: selects data points at randomly chosen indices without replacement.
- `sampling_pick`: analyses X-values and selects all points whose X-values are among the first `n` X-values found in the population.
- `sampling_systematic`: selects data points at evenly distributed indices.
- `sampling_vertex_dp`, `sampling_vertex_vw`: simplifies plotting of polygons. 
There is a choice of two implementation algorithms: Douglas-Peucker (`_dp`) and 
Visvalingam-Whyatt (`_vw`).

For more details, see the [sampling reference](../../plot-api/docs/plot-api/jetbrains.lets-plot.intern/-pos-kind).
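
The selection rules described above can be sketched in plain Kotlin. The functions below are hypothetical illustrations of systematic, random, and random stratified sampling. Their names, parameters (`minPerGroup`, `seed`, `groupOf`), and rounding choices are assumptions, not the signatures or behavior of the Lets-Plot `sampling_*` methods.

```kotlin
import kotlin.random.Random

// Illustrative sketches only; NOT the Lets-Plot implementation.

// systematic: keep points at evenly spaced indices.
fun <T> samplingSystematic(points: List<T>, n: Int): List<T> {
    require(n > 0)
    if (points.size <= n) return points
    val step = points.size.toDouble() / n
    return List(n) { i -> points[(i * step).toInt()] }
}

// random: pick n distinct indices (no replacement), then keep the original order.
fun <T> samplingRandom(points: List<T>, n: Int, seed: Int = 0): List<T> {
    require(n > 0)
    if (points.size <= n) return points
    val keep = points.indices.shuffled(Random(seed)).take(n).sorted()
    return keep.map { points[it] }
}

// random stratified: sample each group proportionally to its size,
// but never take fewer than `minPerGroup` points from a group.
fun <T, K> samplingRandomStratified(
    points: List<T>,
    n: Int,
    minPerGroup: Int = 2,
    seed: Int = 0,
    groupOf: (T) -> K
): List<T> {
    if (points.size <= n) return points
    return points.groupBy(groupOf).values.flatMap { group ->
        val share = (n.toLong() * group.size / points.size).toInt()
        samplingRandom(group, maxOf(share, minPerGroup, 1), seed)
    }
}

fun main() {
    println(samplingSystematic((0..9).toList(), 5))
    println(samplingRandom((0..99).toList(), 5))
    val points = List(90) { "a" } + List(10) { "b" }
    println(samplingRandomStratified(points, 20) { it }.groupingBy { it }.eachCount())
}
```

The minimum per group is what keeps small groups visible: with purely proportional sampling, a rare group could be dropped entirely.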

<a name="GSG" id="gsg"></a>
## Getting started

Let's plot a point chart built using the mpg dataset.

Create the `DataFrame` object and retrieve the data.

```
%use lets-plot
@file:DependsOn("com.github.doyaaaaaken:kotlin-csv-jvm:0.7.3")

import com.github.doyaaaaaken.kotlincsv.client.*

val csvData = java.io.File("mpg.csv")
val mpg: List<Map<String, String>> = CsvReader().readAllWithHeader(csvData)

fun colAsInt(col: String): List<Int> {
    return mpg.map{it[col]!!.toInt()}
}

fun colAsDouble(col: String): List<Double> {
    return mpg.map{it[col]!!.toDouble()}
}

fun colAsString(col: String): List<String> {
    return mpg.map{it[col]!!}
}

val df = mapOf(
    "displ" to colAsDouble("displ"),
    "hwy" to colAsInt("hwy"),
    "cyl" to colAsInt("cyl"),
    "number" to colAsInt(""),
    "usage" to colAsInt("cty"),
    "drv" to colAsString("drv"),
    "year" to colAsInt("year")
)
```
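
The column helpers above fail with an exception if a cell is missing or does not parse. As a hedged alternative, the sketch below converts row-oriented records into a column-oriented map while tolerating such cells. `toColumns` is a hypothetical helper that uses only the Kotlin standard library, and whether a given plotting call accepts `null` or mixed-type columns is not something this sketch checks.

```kotlin
// Hypothetical helper: turns row-oriented records into a column-oriented map.
// Column names are taken from the first row.
fun toColumns(rows: List<Map<String, String>>): Map<String, List<Any?>> {
    val names = rows.firstOrNull()?.keys ?: emptySet()
    return names.associateWith { name ->
        rows.map { row ->
            val cell = row[name]
            // Prefer Int, then Double, and fall back to the raw string (or null if absent).
            val value: Any? = cell?.toIntOrNull() ?: cell?.toDoubleOrNull() ?: cell
            value
        }
    }
}

fun main() {
    // Inline sample rows stand in for the parsed CSV file.
    val rows = listOf(
        mapOf("displ" to "1.8", "cyl" to "4", "drv" to "f"),
        mapOf("displ" to "2.0", "cyl" to "6", "drv" to "r")
    )
    println(toColumns(rows))
}
```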

Plot the basic point chart.

```
val p = lets_plot(df) {x = "number"; y = "usage"}
p + geom_point()
```



Perform the following aesthetic mappings:
 - `x` = displ (the **displ** column of the dataframe)
 - `y` = hwy  (the **hwy** column of the dataframe)
 - `color` = cyl (the **cyl** column of the dataframe)

```
// Mapping
lets_plot(df) {x = "displ"; y = "hwy"; color = "cyl"} + geom_point(df)
```



Apply statistical data transformation to count the number of cases at each x position.

```
val p = lets_plot(df) {x = "displ"; y = "hwy"; color = "cyl"}
p + geom_point(df, stat = Stat.count())
```



Change the palette and the legend, and add a title.

```
import jetbrains.letsPlot.scale.*

val p = lets_plot(df) {x = "displ"; y = "hwy"; color = "cyl"}
p + 
    geom_point(df, position = Pos.nudge) + 
    scale_color_continuous("red", "green", guide = guide_legend(ncol=2)) + 
    ggtitle("Displacement by horsepower")
```



Apply random stratified sampling to select points from each group proportionally 
to the group size.

```
import jetbrains.letsPlot.scale.*
import jetbrains.letsPlot.intern.*
import jetbrains.letsPlot.sampling.*

val p = lets_plot(df) {x = "displ"; y = "hwy"; color = "cyl"}
p + geom_point(
        data = df, position = Pos.nudge,
        sampling = sampling_random_stratified(40)
    ) + scale_color_continuous(
        "blue", "pink",
        guide = guide_legend(ncol = 2)
    )
```

