# Lets-Plot Usage Guide

<a href="https://opensource.org/licenses/MIT">
<img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="Couldn't load MIT license svg"/>
</a>

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/JetBrains/lets-plot-kotlin/add_docs_ggtitle_sampling?filepath=docs%2Fkotlin-lets-plot-documentation.ipynb)

- [System requirements](#sys)
- [Installation](#install)
- [Understanding architecture](#implementation)
- [Learning API](#api)
- [Getting started](#gsg)


**Lets-Plot** is an open-source plotting library for statistical data. It is implemented in the 
[Kotlin programming language](https://kotlinlang.org/), which is multi-platform.
As a result, Lets-Plot packages its plotting functionality 
as a JavaScript library, a JVM library, and a native Python extension.

The design of the Lets-Plot library is heavily influenced by Leland Wilkinson's work 
[The Grammar of Graphics](https://www.goodreads.com/book/show/2549408.The_Grammar_of_Graphics) that describes the features 
that underlie all statistical graphics.

<a name="SystemRequirementa" id="sys"></a>
## System requirements
When installing the Lets-Plot library, consider the following requirements.

Supported operating systems:
- macOS
- Linux

<a name="Installation" id="install"></a>
## Installation

The library is distributed on Bintray in the form of JAR archives.

<a name="Implementation" id="implementation"></a>
## Understanding Lets-Plot architecture
In `lets-plot`, a **plot** consists of at least one
**layer**. A plot can be built from the default dataset with aesthetic mappings, a set of scales, or additional 
features applied.

The **Layer** is responsible for creating the objects painted on the ‘canvas’ and it contains the following elements:
- **Data** - the set of data specified either once for all layers or on a per layer basis.
One plot can combine multiple different datasets (one per layer).
- **Aesthetic mapping** - describes how variables in the dataset are mapped to the visual properties of the layer, such as color, shape, size, or position.
- **Geometric object** - a geometric object that represents a particular type of chart.
- **Statistical transformation** - computes some kind of statistical summary on the raw input data. 
For example, the `bin` stat is used for histograms and `smooth` is used for regression lines. 
Most stats take additional parameters to specify details of the statistical transformation of data.
- **Position adjustment** - a method used to compute the final coordinates of geometry. 
Used to build variants of the same `geom` object or to avoid overplotting.

![layer diagram](layer-small.png)
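
The relationship between a plot and its layers can be sketched in plain Python. This is only an illustration of the structure described above, not the library's implementation: the `Layer` and `Plot` classes and the `data_for` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Dataset = Dict[str, list]  # column name -> column values


@dataclass
class Layer:
    """Hypothetical container mirroring the five layer elements listed above."""
    geom: str                                   # geometric object, e.g. "point"
    data: Optional[Dataset] = None              # per-layer data; None means "use the plot's data"
    mapping: Dict[str, str] = field(default_factory=dict)  # aesthetic -> column name
    stat: str = "identity"                      # statistical transformation
    position: str = "identity"                  # position adjustment


@dataclass
class Plot:
    """Hypothetical plot: a default dataset plus one or more layers."""
    data: Optional[Dataset] = None
    layers: List[Layer] = field(default_factory=list)

    def data_for(self, layer: Layer) -> Dataset:
        # A layer's own dataset wins; otherwise fall back to the plot-level default.
        resolved = layer.data if layer.data is not None else self.data
        if resolved is None:
            raise ValueError("no data for layer with geom=%r" % layer.geom)
        return resolved


cars = {"displ": [1.8, 2.0, 2.8], "hwy": [29, 31, 26]}
trend = {"displ": [1.8, 2.8], "hwy": [30, 26]}

plot = Plot(data=cars, layers=[
    Layer(geom="point", mapping={"x": "displ", "y": "hwy"}),
    Layer(geom="line", data=trend, mapping={"x": "displ", "y": "hwy"}),
])
```

Here the first layer inherits the plot's dataset, while the second supplies its own, which is how one plot can combine several datasets.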

<a name="API" id="api"></a>
## Learning API
The typical code fragment that plots a Lets-Plot chart looks as follows:

```
from lets_plot import *
p = ggplot(<dataframe>) 
p + geom_<chart_type>(stat=<stat>, position=<adjustment>)
```

### Geometric objects `geom`

You can add a new geometric object by specifying an argument in `layer(geom='<type of chart>')`, for example, `layer(geom='point')`.
Alternatively, you can add the `geom_xxx` object to `ggplot`:

```
p = ggplot(data=df)
p + geom_point()
```

The following charts are supported:

- Area chart: [`geom_area()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_area-geom_ribbon)
- Bar chart: [`geom_bar()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_bar--bar-chart)
- Boxplot chart: [`geom_boxplot()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_boxplot)
- Contour chart: [`geom_contour(), geom_contourf()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_contour-geom_contourf)
- Density chart: [`geom_density()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_density)
  and [`geom_density2d()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_density2d-geom_density2df)
- Error bar chart: [`geom_errorbar()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_errorbar)
- Histogram: [`geom_histogram()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_freqpoly-geom_histogram)
- Line chart: [`geom_line()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_path-geom_line-geom_step)
- Scatter chart: [`geom_point()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_point)
- Polygon chart: [`geom_polygon()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_polygon)
- Rectangle chart, Tile chart: [`geom_rect()`, `geom_tile()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_raster-geom_tile-geom_rect)
- Image plot: [`geom_image()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_image)

See the [geom reference](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md) for more information about the supported
geometric methods, their arguments, and default values.

### Stat `stat`

Add `stat` as an argument to `layer` or `geom` methods to define data transformations:

`layer(stat="count")`

Supported transformations:

- `identity`:  leaves the data unchanged
- `count`:  calculates the number of points with the same x-axis coordinate
- `bin`:  calculates the number of points whose x-axis coordinate falls into the same bin
- `smooth`:  performs smoothing
- `contour`: calculates contours of 3D data
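
To make the difference between `count` and `bin` concrete, here is a minimal pure-Python sketch of what each transformation computes. The function names `stat_count` and `stat_bin` and the equal-width binning rule are assumptions made for illustration. The library's own binning may differ in details such as bin boundaries.

```python
from collections import Counter
from typing import Dict, List, Sequence


def stat_count(xs: Sequence) -> Dict:
    """Number of points sharing each distinct x value."""
    return dict(Counter(xs))


def stat_bin(xs: Sequence[float], bins: int) -> List[int]:
    """Number of points in each of `bins` equal-width intervals over [min(xs), max(xs)]."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * bins
    for x in xs:
        # The maximum value is placed in the last bin instead of opening a new one.
        index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    return counts


cyl = [4, 4, 4, 4, 6]
hwy = [29, 29, 31, 30, 26]

count_result = stat_count(cyl)      # one entry per distinct value
bin_result = stat_bin(hwy, bins=5)  # one entry per interval of width 1.0
```

`count` suits discrete variables such as the number of cylinders, while `bin` groups a continuous variable into ranges first.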

### Aesthetic mappings `mapping`
With mappings, you can define how variables in the dataset are mapped to the visual elements of the chart.
Add the `aes(x, y, other)` method to `layer` or `geom`, where
- `x`: the dataframe column to map to the x axis. 
- `y`: the dataframe column to map to the y axis.
- `other`: other visual properties of the chart, such as color, shape, size, or position.

For example:

`geom_bar(aes(x="cty", y="hwy", color="cyl"))`

You can also use a simplified form that passes `x` and `y` positionally:

`geom_bar(aes("cty", "hwy", color="cyl"))`
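
Conceptually, a mapping is a lookup from aesthetic names to dataset columns. The sketch below shows that lookup in plain Python. The `resolve_aesthetics` helper is hypothetical, and the rule that layer-level mappings override plot-level ones is an assumption borrowed from the ggplot2 convention rather than something stated in this guide.

```python
from typing import Dict

Dataset = Dict[str, list]


def resolve_aesthetics(data: Dataset, plot_mapping: Dict[str, str],
                       layer_mapping: Dict[str, str]) -> Dict[str, list]:
    """Return aesthetic name -> column values; layer mappings override plot mappings."""
    merged = {**plot_mapping, **layer_mapping}
    missing = [col for col in merged.values() if col not in data]
    if missing:
        raise KeyError("columns not found in data: %s" % missing)
    return {aes_name: data[col] for aes_name, col in merged.items()}


mpg = {"cty": [18, 21, 20], "hwy": [29, 29, 31], "cyl": [4, 4, 6]}

resolved = resolve_aesthetics(
    mpg,
    plot_mapping={"x": "cty", "y": "hwy"},
    layer_mapping={"color": "cyl"},
)
```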

### Position `position`

All layers have a position adjustment that resolves overlapping geoms. 
Override the default settings by using the `position` argument in the `geom` methods:

`geom_bar(position='dodge')`

Available adjustments:
- `dodge` (`position(PosKind.DODGE)`)
- `jitter` (`position(PosKind.JITTER)`)
- `jitterdodge` (`position(PosKind.JITTER_DODGE)`)
- `nudge` (`position(PosKind.NUDGE)`)

See the [position reference](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/positions.md) for more information about position adjustments.
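
As a rough intuition for what these adjustments do to coordinates, the following sketch implements simplified versions of `nudge`, `jitter`, and `dodge`. The function signatures, the default `width` of 0.9, and the uniform noise in `jitter` are assumptions chosen for illustration. They are not taken from the library.

```python
import random
from typing import List


def nudge(xs: List[float], dx: float) -> List[float]:
    """Shift every x by a fixed offset."""
    return [x + dx for x in xs]


def jitter(xs: List[float], amount: float, rng: random.Random) -> List[float]:
    """Add uniform noise in [-amount, amount] to every x to reduce overplotting."""
    return [x + rng.uniform(-amount, amount) for x in xs]


def dodge(x: float, n_groups: int, width: float = 0.9) -> List[float]:
    """Centers of `n_groups` side-by-side slots that share the total `width` around `x`."""
    slot = width / n_groups
    left_edge = x - width / 2
    return [left_edge + slot * (i + 0.5) for i in range(n_groups)]


nudged = nudge([1.0, 2.0], 0.5)
dodged = dodge(1.0, n_groups=2, width=1.0)
jittered = jitter([1.0] * 50, 0.25, random.Random(1))
```

`dodge` keeps overlapping geoms distinguishable by placing them side by side, whereas `jitter` trades exact positions for visibility of dense point clouds.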

### Features affecting the entire plot

#### Scales

Scales enable tailored adjustment of data values to visual properties and aesthetics. Override default scales to tweak 
details like the axis labels or legend keys, or to use a completely different translation from data to aesthetic.
For example, to override the fill color on the histogram:

`p + geom_histogram() + scale_color_continuous("red", "green")`

See the list of the available `scale` methods in the [scale reference](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/scales.md).
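
A continuous color scale can be thought of as a function from data values to colors. The sketch below assumes plain linear interpolation between two RGB endpoints. Both the `continuous_color_scale` helper and the choice of RGB interpolation are illustrative assumptions, and the library may interpolate differently.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def continuous_color_scale(values: Sequence[float], low: RGB, high: RGB) -> List[RGB]:
    """Map each value to a color linearly interpolated between `low` and `high`."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # all-equal input maps every value to `low`
    colors = []
    for v in values:
        t = (v - lo) / span  # 0.0 at the minimum, 1.0 at the maximum
        colors.append(tuple(round(a + (b - a) * t) for a, b in zip(low, high)))
    return colors


DARK_RED, DARK_GREEN = (200, 0, 0), (0, 100, 0)
colors = continuous_color_scale([4, 6, 8], low=DARK_RED, high=DARK_GREEN)
```

The smallest value receives the `low` color, the largest receives the `high` color, and values in between are blended.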

#### Legend
The axes and legends help users interpret plots.
Use the `guide` methods or the `guide` argument of the `scale` method to customize the legend.
For example, to define the number of columns in the legend:

`p + scale_color_discrete(guide=guide_legend(ncol=2))`

See more information in the [guide reference](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/guide.md).

#### Sampling

Sampling is a data transformation technique built into Lets-Plot. It is applied after the stat transformation.
Sampling helps prevent UI freezes and out-of-memory crashes when attempting to plot an excessively large number of geometries.
By default, sampling is applied automatically when the data volume exceeds a certain threshold.
The `none` value disables any sampling for the given layer. Sampling methods can be chained together using the `+` operator.

Available methods:
- `sampling_random_stratified`: randomly selects points from each group proportionally to the group size but also ensures 
that each group is represented by at least a specified minimum number of points.
- `sampling_random`: selects data points at randomly chosen indices without replacement.
- `sampling_pick`: analyzes X-values and selects all points whose X-values are among the first `n` X-values found in the population.
- `sampling_systematic`: selects data points at evenly distributed indices.
- `sampling_vertex_dp`, `sampling_vertex_vw`: simplifies plotting of polygons. 
There is a choice of two implementation algorithms: Douglas-Peucker (`_dp`) and 
Visvalingam-Whyatt (`_vw`).

For more details, see the [sampling reference](https://github.com/JetBrains/lets-plot/blob/master/docs/sampling_python.md).
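
The ideas behind systematic and random stratified sampling can be sketched in a few lines of Python. The helpers `sample_systematic` and `sample_random_stratified`, their parameters (including `min_per_group`), and the rounding rules are assumptions made for illustration and do not reproduce the library's algorithms.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def sample_systematic(n_rows: int, n: int) -> List[int]:
    """Row indices taken at a fixed stride so that at most `n` rows remain."""
    if n_rows <= n:
        return list(range(n_rows))
    step = -(-n_rows // n)  # ceiling division
    return list(range(0, n_rows, step))


def sample_random_stratified(groups: Sequence[Hashable], n: int,
                             min_per_group: int = 2, seed: int = 0) -> List[int]:
    """Pick about `n` row indices proportionally to group size, keeping at least
    `min_per_group` rows (or the whole group, if smaller) from every group."""
    rng = random.Random(seed)
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)

    total = len(groups)
    picked: List[int] = []
    for indices in by_group.values():
        quota = max(min_per_group, round(n * len(indices) / total))
        quota = min(quota, len(indices))
        picked.extend(rng.sample(indices, quota))  # without replacement
    return sorted(picked)


drv = ["f"] * 90 + ["r"] * 10
systematic = sample_systematic(n_rows=10, n=5)
stratified = sample_random_stratified(drv, n=20)
```

Because of the per-group minimum, small groups stay visible in the plot even when a purely proportional sample would drop them.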

<a name="GSG" id="gsg"></a>
## Getting started

Let's plot a point chart built using the mpg dataset.

Create the `DataFrame` object and retrieve the data.

```
%use lets-plot,spark

val mpg = spark
            .read()
            .option("header", "true")
            .csv("mpg.csv")

mpg.show(5, true)
```

```
+---+------------+-----+-----+----+---+----------+---+---+---+---+-------+
|_c0|manufacturer|model|displ|year|cyl|     trans|drv|cty|hwy| fl|  class|
+---+------------+-----+-----+----+---+----------+---+---+---+---+-------+
|  1|        audi|   a4|  1.8|1999|  4|  auto(l5)|  f| 18| 29|  p|compact|
|  2|        audi|   a4|  1.8|1999|  4|manual(m5)|  f| 21| 29|  p|compact|
|  3|        audi|   a4|    2|2008|  4|manual(m6)|  f| 20| 31|  p|compact|
|  4|        audi|   a4|    2|2008|  4|  auto(av)|  f| 21| 30|  p|compact|
|  5|        audi|   a4|  2.8|1999|  6|  auto(l5)|  f| 16| 26|  p|compact|
+---+------------+-----+-----+----+---+----------+---+---+---+---+-------+
only showing top 5 rows
```



Plot the basic point chart.

```
// Spark reads every CSV column as a string here (no schema inference),
// so each helper converts the values to the type the plot needs.
fun colAsInt(col: String): List<Int> {
    return mpg.select(col).collectAsList().map{(it[0] as String).toInt()}
}

fun colAsDouble(col: String): List<Double> {
    return mpg.select(col).collectAsList().map{(it[0] as String).toDouble()}
}

fun colAsString(col: String): List<String> {
    return mpg.select(col).collectAsList().map{it[0] as String}
}

// Plot data: a map from column name to the list of column values.
val df = mapOf(
    "displ" to colAsDouble("displ"),
    "hwy" to colAsInt("hwy"),
    "cyl" to colAsInt("cyl"),
    "number" to colAsInt("_c0"),
    "usage" to colAsInt("cty"),
    "drv" to colAsString("drv"),
    "year" to colAsInt("year")
)

val p = lets_plot(df) {x = "number"; y = "usage"}
p + geom_point()
```



Perform the following aesthetic mappings:
 - `x` = displ (the **displ** column of the dataframe)
 - `y` = hwy  (the **hwy** column of the dataframe)
 - `color` = cyl (the **cyl** column of the dataframe)

```
// Mapping
lets_plot(df) {x = "displ"; y = "hwy"; color = "cyl"} + geom_point(df)
```



Apply statistical data transformation to count the number of cases at each x position.

```
val p = lets_plot(df) {x = "displ"; y = "hwy"; color = "cyl"}
p + geom_point(df, stat = Stat.count())
```



Change the palette and the legend, and add a title.

```
import jetbrains.letsPlot.scale.*

val p = lets_plot(df) {x = "displ"; y = "hwy"; color = "cyl"}
p + 
    geom_point(df, position = Pos.nudge) + 
    scale_color_continuous("red", "green", guide = guide_legend(ncol=2)) + 
    ggtitle("Displacement by horsepower")
```



Apply random stratified sampling to select points from each group proportionally 
to the group size.

```
import jetbrains.letsPlot.scale.*
import jetbrains.letsPlot.intern.*
import jetbrains.letsPlot.sampling.*

val p = lets_plot(df) {x = "displ"; y = "hwy"; color = "cyl"}
p + geom_point(
        data = df, position = Pos.nudge,
        sampling = sampling_random_stratified(40)
    ) + scale_color_continuous(
        "blue", "pink",
        guide = guide_legend(ncol = 2)
    )
```

