# Lets-Plot Usage Guide

<a href="https://opensource.org/licenses/MIT">
<img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="Couldn't load MIT license svg"/>
</a>

- [System requirements](#sys)
- [Installation](#install)
- [Understanding architecture](#implementation)
- [Learning API](#api)
- [Getting started](#gsg)


**Lets-Plot** is an open-source plotting library for statistical data. It is implemented in the 
[Kotlin programming language](https://kotlinlang.org/), which is multi-platform by design.
That is why Lets-Plot can be packaged as a JavaScript library, a JVM library, and a native Python extension.

The design of the Lets-Plot library is heavily influenced by Leland Wilkinson's work 
[The Grammar of Graphics](https://www.goodreads.com/book/show/2549408.The_Grammar_of_Graphics), which describes the deep features 
that underlie all statistical graphics.

<a name="SystemRequirementa" id="sys"></a>
## System requirements

Supported operating systems:
- macOS
- Linux

Supported Python versions:
- 3.7
- 3.8

<a name="Installation" id="install"></a>
## Installation

The `lets-plot` package is available in the [pypi.org](https://pypi.org/project/lets-plot/) repository.
Execute the following command to add the `lets-plot` package to your Python interpreter:

`pip install lets-plot`
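
To confirm that the package is visible to the interpreter you installed it into, a quick check like the one below can help. This is a minimal sketch that assumes Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("lets-plot"))
```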

<a name="Implementation" id="implementation"></a>
## Understanding Lets-Plot architecture
In `lets-plot`, a **plot** is represented by at least one
**layer**. It can be built on a default dataset, with aesthetic mappings, a set of scales, or additional 
features applied.

The **layer** is responsible for creating the objects painted on the ‘canvas’ and contains the following elements:
- **Data** - the set of data specified either once for all layers or on a per-layer basis.
One plot can combine multiple different datasets (one per layer).
- **Aesthetic mapping** - describes how variables in the dataset are mapped to visual properties of the layer, such as color, shape, size, or position.
- **Geometric object** - a geometric object used to draw observations.
- **Statistical transformation** - computes some kind of statistical summary on the raw input data. 
For example, the `bin` stat is used for histograms and the `smooth` stat is used for regression lines. 
Most stats take additional parameters to specify details of the statistical transformation of data.
- **Position adjustment** - the method used to compute the final coordinates of a geometry. 
Used to build variants of the same geom object or to avoid overplotting.

![layer diagram](layer-small.png)

<a name="API" id="api"></a>
## Learning API

```
p = ggplot(data=df) 
p + layer(geom='point', stat='identity', mapping=aes('x', 'y', size='x'), position='identity')
```

### Geometric objects `geom`

You can add a new geometric object by specifying the `geom` argument in `layer(geom='<type of chart>')`, for example, `layer(geom='point')`.
Alternatively, you can add the `geom_xxx` object to `ggplot`:

```
p = ggplot(data=df)
p + geom_point()
```

The following charts are supported:

- Area chart: [`geom_area()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_area-geom_ribbon)
- Bar chart: [`geom_bar()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_bar--bar-chart)
- Boxplot chart: [`geom_boxplot()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_boxplot)
- Contour chart: [`geom_contour(), geom_contourf()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_contour-geom_contourf)
- Density chart: [`geom_density()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_density)
  and [`geom_density2d()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_density2d-geom_density2df)
- Error bar chart: [`geom_errorbar()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_errorbar)
- Histogram: [`geom_histogram()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_freqpoly-geom_histogram)
- Line chart: [`geom_line()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_path-geom_line-geom_step)
- Scatter chart: [`geom_point()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_point)
- Polygon chart: [`geom_polygon()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_polygon)
- Rectangle chart, Tile chart: [`geom_rect()`, `geom_tile()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_raster-geom_tile-geom_rect)
- Image plot: [`geom_image()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md#geom_image)

See the [geom reference](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/geoms.md) for more information about the supported
geometric methods, their arguments, and default values.

### Stat `stat`

Add `stat` as an argument to `layer` or `geom` methods to define data transformations:

`layer(stat='count')`

Supported transformations:

- `identity`:  leaves the data unchanged
- `count`:  calculates the number of points with the same x-axis coordinate
- `bin`:  calculates the number of points whose x-axis coordinates fall in the same bin
- `smooth`:  performs smoothing
- `contour`: calculates contours of 3D data
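
To make the difference between `count` and `bin` concrete, here is a minimal plain-Python sketch of what each summary computes. It is not the library's implementation: the function names are hypothetical, the input values are made up, and the binning rule (fixed-width bins starting at multiples of the bin width) is an assumption chosen for simplicity.

```python
from collections import Counter


def count_stat(xs):
    """Number of observations for each distinct x value."""
    return dict(Counter(xs))


def bin_stat(xs, bin_width):
    """Number of observations per bin; bin k covers [k * w, (k + 1) * w)."""
    counts = Counter(int(x // bin_width) for x in xs)
    # Key each bin by its left edge.
    return {k * bin_width: n for k, n in sorted(counts.items())}


x = [1.8, 1.8, 2.0, 2.0, 2.8, 3.1]
print(count_stat(x))               # one entry per distinct value
print(bin_stat(x, bin_width=1.0))  # one entry per bin
```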

### Aesthetic mappings `mapping`
With mappings, you can define how variables in the dataset are mapped to the visual elements of the chart.
Pass the `aes(x, y, other)` mapping to the `layer` or `geom` methods, where
- `x`: the dataframe column to render on the x axis. 
- `y`: the dataframe column to render on the y axis.
- `other`: other visual properties of the chart, such as color, shape, size, or position.

For example:

`geom_point(aes(x='cty', y='hwy', color='cyl'))`

Because `x` and `y` are the first two arguments of `aes`, you can use a simplified form:

`geom_point(aes('cty', 'hwy', color='cyl'))`

### Position `position`

All layers have a position adjustment that resolves overlapping geoms. 
Override the default settings by using the `position` argument in the `geom` methods:

`geom_bar(position='dodge')`

Available adjustments:
- `dodge` ([`position_dodge()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/positions.md#position_dodge))
- `jitter` ([`position_jitter()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/positions.md#position_jitter))
- `jitterdodge` ([`position_jitterdodge()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/positions.md#position_jitterdodge))
- `nudge` ([`position_nudge()`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/positions.md#position_nudge))

See the [position reference](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/positions.md) for more information about position adjustments.
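
As a rough intuition for what these adjustments do to coordinates, consider the plain-Python sketch below. It is only an illustration under simplifying assumptions: the function names, default widths, and exact formulas are made up and are not taken from the library.

```python
import random


def nudge(points, dx=0.0, dy=0.0):
    """Shift every point by a fixed offset."""
    return [(x + dx, y + dy) for x, y in points]


def jitter(points, width=0.4, height=0.4, seed=None):
    """Add a small random offset to every point to reduce overplotting."""
    rng = random.Random(seed)
    return [(x + rng.uniform(-width, width), y + rng.uniform(-height, height))
            for x, y in points]


def dodge(x, group_index, n_groups, width=0.9):
    """Place group `group_index` side by side with the others around x."""
    slot = width / n_groups
    return x - width / 2 + slot * (group_index + 0.5)


points = [(1, 2), (1, 2), (2, 3)]
print(nudge(points, dx=0.5))
print(jitter(points, seed=0))
print([dodge(1, g, n_groups=2) for g in range(2)])
```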

### Features affecting the entire plot

#### Scales

Scales enable tailored adjustment of data values to visual properties and aesthetics. Override default scales to tweak 
details like the axis labels or legend keys, or to use a completely different translation from data to aesthetic.
For example, to override the fill color on the histogram:

`p + geom_histogram() + scale_fill_brewer(name="Trend", palette="RdPu")`

See the list of the available `scale` methods in the [scale reference](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/scales.md).

#### Coordinate system

The coordinate system determines how the x and y aesthetics combine to position elements in the plot. 
To override the default X and Y ratio:

`p + coord_fixed(ratio=2)`

See the list of the available methods in the [coordinates reference](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/coordinates.md).

#### Legend
The axes and legends help users interpret plots.
Use the `guide` methods or the `guide` argument of the `scale` method to customize the legend.
For example, to define the number of columns in the legend:

`p + scale_color_discrete(guide=guide_legend(ncol=2))`

See more information in the [guide reference](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/guide.md).

#### Sampling

Sampling is a special data transformation technique built into Lets-Plot; it is applied after the stat transformation.
Sampling helps prevent UI freezes and out-of-memory crashes when attempting to plot an excessively large number of geometries.
By default, sampling is applied automatically when the data volume exceeds a certain threshold.
The `none` value of the `sampling` argument disables any sampling for the given layer. Sampling methods can be chained together using the `+` operator.

Available methods:
- `sampling_random_stratified`: randomly selects points from each group proportionally to the group size but also ensures 
that each group is represented by at least specified minimal number of points.
- `sampling_random`: selects data points at randomly chosen indices without replacement.
- `sampling_pick`: analyzes X-values and selects all points whose X-values are among the first `n` distinct X-values found in the population.
- `sampling_systematic`: selects data points at evenly distributed indices.
- `sampling_vertex_dp`, `sampling_vertex_vw`: simplifies plotting of polygons. 
There is a choice of two implementation algorithms: Douglas-Peucker (`_dp`) and 
Visvalingam-Whyatt (`_vw`).

For more details, see the [sampling reference](https://github.com/JetBrains/lets-plot/blob/master/docs/sampling_python.md).
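
The index-based methods listed above can be pictured with a few lines of plain Python. The sketch below is illustrative only: it is not the library's implementation, the function names are made up, and details such as rounding and ordering are assumptions.

```python
import random


def systematic_sample(rows, n):
    """Select about n rows at evenly spaced indices."""
    if n <= 0 or not rows:
        return []
    if n >= len(rows):
        return list(rows)
    step = len(rows) / n
    return [rows[int(i * step)] for i in range(n)]


def random_sample(rows, n, seed=None):
    """Select n rows at random indices without replacement, keeping row order."""
    rng = random.Random(seed)
    k = min(n, len(rows))
    indices = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in indices]


def pick_sample(rows, n, x_of=lambda row: row[0]):
    """Keep every row whose x value is among the first n distinct x values."""
    first_x = []
    for row in rows:
        x = x_of(row)
        if x not in first_x and len(first_x) < n:
            first_x.append(x)
    return [row for row in rows if x_of(row) in first_x]


print(systematic_sample(list(range(10)), 5))
print(random_sample(list(range(10)), 5, seed=1))
print(pick_sample([("a", 1), ("b", 2), ("a", 3), ("c", 4)], 2))
```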

<a name="GSG" id="gsg"></a>
## Getting started

Let's plot a point chart built using the mpg dataset.

Create the `DataFrame` object and retrieve the data.

```
# Data set

import pandas as pd
from lets_plot import *
mpg = pd.read_csv('http://jetbrains.bintray.com/datalore-plot/mpg.csv')
mpg.head()
```

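| Unnamed: 0.1 | Unnamed: 0 | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
| 1 | 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
| 2 | 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
| 3 | 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
| 4 | 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |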


Plot the basic point chart.

```
# Basic plotting

p = ggplot(mpg)
p + geom_point()
```

Perform the following aesthetic mappings:
 - `x` = displ (the Displacement column)
 - `y` = hwy  (the highway fuel economy column, in miles per gallon)
 - `color` = cyl (the Cylinder column)

```
# Mapping

p + geom_point(aes('displ', 'hwy', color='cyl'))
```

Apply a statistical data transformation to count the number of cases at each x position.

```
p + geom_point(aes('displ', 'hwy', color='cyl'), stat='count')
```

Change the palette and the legend, and add a title.

```
p + geom_point(aes('displ', 'hwy', color='cyl'), position='nudge') + \
    scale_color_brewer(palette="RdPu", guide=guide_legend(ncol=2)) + \
    ggtitle('Highway mileage by displacement')
```

Apply the randomly stratified sampling to select points from each group proportionally 
to the group size.

```
p + geom_point(aes('displ', 'hwy', color='cyl'), position='nudge',
               sampling=sampling_random_stratified(40)) + \
    scale_color_brewer(palette="RdPu", guide=guide_legend(ncol=2))
```
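
The idea behind stratified sampling can be sketched in plain Python as follows. This is a conceptual illustration, not the library's algorithm: the function name, the `min_per_group` default, and the rounding rule are assumptions, and because of the per-group minimum the result may contain slightly more than `n` rows.

```python
import random
from collections import defaultdict


def stratified_sample(rows, n, group_of, min_per_group=2, seed=None):
    """Sample each group in proportion to its size, with a per-group minimum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[group_of(row)].append(row)

    total = len(rows)
    sample = []
    for members in groups.values():
        # Proportional share, raised to the minimum, capped at the group size.
        quota = max(min_per_group, round(n * len(members) / total))
        quota = min(quota, len(members))
        sample.extend(rng.sample(members, quota))
    return sample


# Made-up data: a large group "a" and a small group "b".
rows = [("a", i) for i in range(90)] + [("b", i) for i in range(10)]
picked = stratified_sample(rows, 20, group_of=lambda row: row[0], seed=0)
print(sum(1 for g, _ in picked if g == "a"), sum(1 for g, _ in picked if g == "b"))
```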
  

Plot a collection of charts using the [`GGBunch`](https://github.com/JetBrains/lets-plot/blob/master/docs/ref/python/ggbunch.md) object. 

```
bunch = GGBunch()
# Point chart
p_point = p + geom_point(aes('displ', 'hwy', color='cyl'))
bunch.add_plot(p_point, 0, 0, 600, 200)
# Area chart
p_area = p + geom_area(aes('displ', 'hwy', color='drv'))
bunch.add_plot(p_area, 600, 0, 600, 200)
# Render both charts together
bunch.show()
```