# Lets-Plot Usage Guide

<a href="https://opensource.org/licenses/MIT">
   <img align="left" 
        src="https://img.shields.io/badge/License-MIT-yellow.svg" 
        alt="Couldn't load MIT license svg"/>
</a>
<br/>
<br/>


- [System requirements](#sys)
- [Installation](#install)
- [Understanding architecture](#implementation)
- [Learning API](#api)
- [Getting started](#gsg)


**Lets-Plot** is an open-source plotting library for statistical data. It is implemented in 
[Kotlin](https://kotlinlang.org/), a multiplatform programming language.
That is why Lets-Plot is packaged as a JavaScript library, a JVM library, and a native Python extension.

The design of the Lets-Plot library is heavily influenced by the 
[ggplot2](https://ggplot2.tidyverse.org) library.

<a name="SystemRequirementa" id="sys"></a>
## System requirements
When installing the Lets-Plot library, consider the following requirements.

Supported operating systems:
- macOS
- Linux
- Windows

Supported Python versions:
- 3.8
- 3.9
- 3.10
- 3.11
- 3.12

<a name="Installation" id="install"></a>
## Installation

The `lets-plot` package is available in the [pypi.org](https://pypi.org/project/lets-plot/) repository.
Run the following command to install the `lets-plot` package for your Python interpreter:

`pip install lets-plot`

<a name="Implementation" id="implementation"></a>
## Understanding Lets-Plot architecture
In `lets-plot`, a **plot** consists of at least one **layer**.
It can be built from a default dataset with aesthetic mappings, a set of scales, or additional 
features applied.

A **layer** is responsible for creating the objects painted on the ‘canvas’. It contains the following elements:
- **Data** - the set of data specified either once for all layers or on a per-layer basis.
One plot can combine multiple different datasets (one per layer).
- **Aesthetic mapping** - describes how variables in the dataset are mapped to the visual properties of the layer, such as color, shape, size, or position.
- **Geometric object** - a geometric object that represents a particular type of plot.
- **Statistical transformation** - computes a statistical summary of the raw input data. 
For example, the `bin` stat is used for histograms and `smooth` is used for regression lines. 
Most stats take additional parameters that specify the details of the transformation.
- **Position adjustment** - a method used to compute the final coordinates of the geometry. 
It is used to build variants of the same `geom` object or to avoid overplotting.

<img src="images/layer.png" width="628" height="636" />

<a name="API" id="api"></a>
## Learning API
The typical code fragment that renders a plot looks as follows:

```
from lets_plot import *
p = ggplot(<dataf>) 
p + geom_<plot_type>(mapping=aes('x', 'y', <other>='<data column name>'), stat=<stat>, position=<adjustment>)
```

### Geometric objects `geom`

You can add a new geometric object (or plot layer) by creating it using the `geom_xxx()` function and then adding this object to `ggplot`:

```
p = ggplot(data=df)
p + geom_point()
```

The following plots are supported:

- Area plot: [`geom_area()`](https://lets-plot.org/python/pages/api/lets_plot.geom_area.html)
- Discrete plot: [`geom_bar()`](https://lets-plot.org/python/pages/api/lets_plot.geom_bar.html), [`geom_pie()`](https://lets-plot.org/python/pages/api/lets_plot.geom_pie.html), [`geom_lollipop()`](https://lets-plot.org/python/pages/api/lets_plot.geom_lollipop.html), [`geom_count()`](https://lets-plot.org/python/pages/api/lets_plot.geom_count.html), [`stat_sum()`](https://lets-plot.org/python/pages/api/lets_plot.stat_sum.html)
- Boxplot: [`geom_boxplot()`](https://lets-plot.org/python/pages/api/lets_plot.geom_boxplot.html)
- Contours: [`geom_contour()`](https://lets-plot.org/python/pages/api/lets_plot.geom_contour.html), [`geom_contourf()`](https://lets-plot.org/python/pages/api/lets_plot.geom_contourf.html)
- Connectors: [`geom_path()`](https://lets-plot.org/python/pages/api/lets_plot.geom_path.html), [`geom_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_line.html), [`geom_segment()`](https://lets-plot.org/python/pages/api/lets_plot.geom_segment.html), [`geom_curve()`](https://lets-plot.org/python/pages/api/lets_plot.geom_curve.html), [`geom_spoke()`](https://lets-plot.org/python/pages/api/lets_plot.geom_spoke.html), [`geom_step()`](https://lets-plot.org/python/pages/api/lets_plot.geom_step.html)
- Density plot: [`geom_density()`](https://lets-plot.org/python/pages/api/lets_plot.geom_density.html), [`geom_area_ridges()`](https://lets-plot.org/python/pages/api/lets_plot.geom_area_ridges.html), [`geom_violin()`](https://lets-plot.org/python/pages/api/lets_plot.geom_violin.html)
  and [`geom_density2d()`](https://lets-plot.org/python/pages/api/lets_plot.geom_density2d.html), [`geom_density2df()`](https://lets-plot.org/python/pages/api/lets_plot.geom_density2df.html)
- Error-bar plot: [`geom_errorbar()`](https://lets-plot.org/python/pages/api/lets_plot.geom_errorbar.html), [`geom_crossbar()`](https://lets-plot.org/python/pages/api/lets_plot.geom_crossbar.html), [`geom_linerange()`](https://lets-plot.org/python/pages/api/lets_plot.geom_linerange.html), [`geom_pointrange()`](https://lets-plot.org/python/pages/api/lets_plot.geom_pointrange.html)
- Histogram: [`geom_freqpoly()`](https://lets-plot.org/python/pages/api/lets_plot.geom_freqpoly.html), [`geom_histogram()`](https://lets-plot.org/python/pages/api/lets_plot.geom_histogram.html) and [`geom_bin2d()`](https://lets-plot.org/python/pages/api/lets_plot.geom_bin2d.html)
- Jitter plot: [`geom_jitter()`](https://lets-plot.org/python/pages/api/lets_plot.geom_jitter.html)
- Line plot: [`geom_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_line.html)
- Reference lines: [`geom_abline()`](https://lets-plot.org/python/pages/api/lets_plot.geom_abline.html), [`geom_hline()`](https://lets-plot.org/python/pages/api/lets_plot.geom_hline.html), [`geom_vline()`](https://lets-plot.org/python/pages/api/lets_plot.geom_vline.html)
- Polygons: [`geom_polygon()`](https://lets-plot.org/python/pages/api/lets_plot.geom_polygon.html)
- Rectangles, Tiles, Raster: [`geom_rect()`](https://lets-plot.org/python/pages/api/lets_plot.geom_rect.html), [`geom_tile()`](https://lets-plot.org/python/pages/api/lets_plot.geom_tile.html), [`geom_raster()`](https://lets-plot.org/python/pages/api/lets_plot.geom_raster.html)
- Ribbons: [`geom_ribbon()`](https://lets-plot.org/python/pages/api/lets_plot.geom_ribbon.html)
- Scatter plot: [`geom_point()`](https://lets-plot.org/python/pages/api/lets_plot.geom_point.html)
- Dot plot: [`geom_dotplot()`](http://lets-plot.org/python/pages/api/lets_plot.geom_dotplot.html), [`geom_ydotplot()`](http://lets-plot.org/python/pages/api/lets_plot.geom_ydotplot.html)
- Regression lines: [`geom_smooth()`](https://lets-plot.org/python/pages/api/lets_plot.geom_smooth.html)
- Q-Q plot: [`geom_qq()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq.html), [`geom_qq_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq_line.html), [`geom_qq2()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq2.html), [`geom_qq2_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq2_line.html)
- ECDF plot: [`stat_ecdf()`](https://lets-plot.org/python/pages/api/lets_plot.stat_ecdf.html)
- Summary: [`stat_summary()`](https://lets-plot.org/python/pages/api/lets_plot.stat_summary.html), [`stat_summary_bin()`](https://lets-plot.org/python/pages/api/lets_plot.stat_summary_bin.html)
- Function plot: [`geom_function()`](https://lets-plot.org/python/pages/api/lets_plot.geom_function.html)
- Text: [`geom_text()`](https://lets-plot.org/python/pages/api/lets_plot.geom_text.html), [`geom_label()`](https://lets-plot.org/python/pages/api/lets_plot.geom_label.html)
- Map: [`geom_map()`](https://lets-plot.org/python/pages/api/lets_plot.geom_map.html)
- Image: [`geom_imshow()`](https://lets-plot.org/python/pages/api/lets_plot.geom_imshow.html)

See the [geom reference](https://lets-plot.org/python/pages/charts.html) for more information about the supported
geometric methods, their arguments, and default values.

### Collections of plots
With a [`GGBunch()`](https://lets-plot.org/python/pages/api/lets_plot.GGBunch.html) object, you can 
render a collection of plots. 
Use the `add_plot()` method to add a plot to the bunch and to set an arbitrary location and size for it inside the grid:

```
bunch = GGBunch()
bunch.add_plot(plot1, 0, 0)
bunch.add_plot(plot2, 0, 200)
```

See the [GGBunch](https://nbviewer.jupyter.org/github/JetBrains/lets-plot-docs/blob/master/source/examples/cookbook/ggbunch.ipynb) example for more information.

### Stat `stat`

Add `stat` as an argument to the `geom_xxx()` function to define a statistical data transformation:

`geom_point(stat='count')`

Supported transformations:

- `identity`: leave the data unchanged
- `count`: calculate the number of points with the same x-axis coordinate
- `bin`: calculate the number of points falling in each of the adjacent, equally sized ranges along the x-axis
- `bin2d`: calculate the number of points falling in each of the adjacent, equally sized rectangles on the plot plane
- `smooth`: perform smoothing
- `contour`, `contourf`: calculate contours of 3D data
- `boxplot`: calculate components of a box plot
- `density`, `density2d`, `density2df`: perform a kernel density estimation for 1D and 2D data
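
To make `count` and `bin` more tangible, here is a plain-Python sketch of what each one computes. It illustrates the idea only and is not how Lets-Plot implements these stats. The function names, the sample values, and the choice of bin boundaries (equal-width ranges between the minimum and the maximum) are assumptions.

```python
from collections import Counter


def count_stat(xs):
    """Number of points per distinct x value (the idea behind `count`)."""
    return dict(Counter(xs))


def bin_stat(xs, bins):
    """Number of points in each of `bins` equal-width ranges (the idea behind `bin`).

    Assumes `xs` holds at least two distinct values.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        # The maximum value goes into the last bin instead of opening a new one.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    return counts


# Made-up engine displacement values.
displ = [1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8]

per_value = count_stat(displ)    # {1.8: 3, 2.0: 2, 2.8: 2, 3.1: 1}
per_bin = bin_stat(displ, bins=2)  # [5, 3]
```

With `count`, every distinct x value gets its own result row. With `bin`, nearby values are pooled, which is why histograms use it.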

### Aesthetic mappings `mapping`
With mappings, you can define how variables in the dataset are mapped to the visual elements of the plot.
Pass the result of the `aes(x, y, other)` function to `geom`, where:
- `x`: the dataframe column to map to the x axis. 
- `y`: the dataframe column to map to the y axis.
- `other`: other visual properties of the plot, such as color, shape, size, or position.

For example:

`geom_bar(aes(x='cty', y='hwy', color='cyl'))`

You can also use a simplified form:

`geom_bar(aes('cty', 'hwy', color='cyl'))`

### Position adjustment `position`

All layers have a position adjustment that computes the final coordinates of geometry. 
Position adjustment is used to build variants of the same plot and to resolve overlapping. 
Override the default settings by using the `position` argument in the `geom` functions:

`geom_bar(position='dodge')`

Available adjustments:
- `dodge`
- `jitter`
- `jitterdodge`
- `nudge`
- `identity`
- `fill`
- `stack`

See the [position reference](https://lets-plot.org/python/pages/api.html#positions) for more information about position adjustments.
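
To show how `stack` and `fill` differ, here is a plain-Python sketch of the arithmetic for several bars that share one x position. It illustrates the idea only and is not Lets-Plot's implementation. The function names are hypothetical, and the stacking order (first group at the bottom) is an arbitrary choice.

```python
def stack(heights):
    """Return a (ymin, ymax) pair per bar so that the bars sit on top of each other."""
    spans, base = [], 0
    for h in heights:
        spans.append((base, base + h))
        base += h
    return spans


def fill(heights):
    """Like stack, but rescaled so that the whole stack has height 1."""
    total = sum(heights)
    return [(lo / total, hi / total) for lo, hi in stack(heights)]


heights = [2, 6, 2]        # three groups sharing the same x position
stacked = stack(heights)   # [(0, 2), (2, 8), (8, 10)]
filled = fill(heights)     # [(0.0, 0.2), (0.2, 0.8), (0.8, 1.0)]
```

With `identity`, all three bars would start at 0 and overlap. `stack` preserves absolute heights, while `fill` shows only each group's share.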

### Features affecting the entire plot

#### Scales

Lets-Plot chooses a reasonable scale for each mapped variable depending on the variable's attributes. Override the default scales to tweak 
details like the axis labels or legend keys, or to use a completely different translation from data to aesthetic.
For example, to override the fill color on the histogram:

`p + geom_histogram() + scale_fill_brewer(name="Trend", palette="RdPu")`

See the list of available `scale` methods in the [scale reference](https://lets-plot.org/python/pages/api.html#scales).

#### Coordinate system

The coordinate system determines how the x and y aesthetics combine to position elements in the plot. 
For example, to override the default X and Y ratio:

`p + coord_fixed(ratio=2)`

See the list of available methods in the [coordinates reference](https://lets-plot.org/python/pages/api.html#coordinates).

#### Legend
The axes and legends help users interpret plots.
Use the `guide` methods or the `guide` argument of the `scale` method to customize the legend.
For example, to define the number of columns in the legend:

`p + scale_color_discrete(guide=guide_legend(ncol=2))`

For more information, see the [guide reference](https://lets-plot.org/python/pages/api.html#scale-guides).

Adjust the legend location on the plot using the `legend_position`, `legend_justification`, and `legend_direction` parameters of `theme`, see:
[TBD]


#### Sampling

Sampling is a data transformation technique built into Lets-Plot. It is applied after the stat transformation.
Sampling helps prevent UI freezes and out-of-memory crashes when attempting to plot an excessively large number of geometries.
By default, sampling is applied automatically when the data volume exceeds a certain threshold.
Passing the `none` value to the `sampling` parameter disables sampling for the given layer. Sampling methods can be chained together using the `+` operator.

Available methods:
- `sampling_random_stratified`: randomly selects points from each group proportionally to the group size but also ensures 
that each group is represented by at least a specified minimum number of points.
- `sampling_random`: selects data points at randomly chosen indices without replacement.
- `sampling_pick`: analyzes X-values and selects all points whose X-values are among the first `n` X-values found in the population.
- `sampling_systematic`: selects data points at evenly distributed indices.
- `sampling_vertex_dp`, `sampling_vertex_vw`: simplifies plotting of polygons. 
There is a choice of two implementation algorithms: Douglas-Peucker (`_dp`) and 
Visvalingam-Whyatt (`_vw`).

For more details, see the [sampling reference](https://lets-plot.org/python/pages/sampling.html).
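
The two simplest strategies, systematic sampling and picking, can be sketched in plain Python as follows. This is a conceptual illustration, not Lets-Plot's implementation: the function names are hypothetical, and the exact indices Lets-Plot selects may differ.

```python
def systematic_indices(size, n):
    """Choose n evenly spaced indices out of range(size)."""
    if n >= size:
        return list(range(size))
    step = size / n
    return [int(i * step) for i in range(n)]


def pick(points, n):
    """Keep every point whose x is among the first n distinct x values seen."""
    kept_xs = []
    for x, _ in points:
        if x not in kept_xs and len(kept_xs) < n:
            kept_xs.append(x)
    return [(x, y) for x, y in points if x in kept_xs]


evenly_spaced = systematic_indices(10, 5)   # [0, 2, 4, 6, 8]

# Made-up (x, y) points with a categorical x.
points = [('a', 1), ('b', 2), ('a', 3), ('c', 4), ('b', 5)]
picked = pick(points, 2)   # [('a', 1), ('b', 2), ('a', 3), ('b', 5)]
```

Note that picking limits the number of distinct X-values, not the number of points: every point belonging to a kept X-value survives.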

<a name="GSG" id="gsg"></a>
## Getting started

Let's build a point plot from the mpg dataset.

Create the `DataFrame` object and retrieve the data.

```
# Data set
import pandas as pd

mpg = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv")
mpg.head()
```

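| Unnamed: 0.1 | Unnamed: 0 | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
| 1 | 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
| 2 | 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
| 3 | 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
| 4 | 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |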


Plot the basic point plot.

```
# Basic plotting
from lets_plot import *

# Load Lets-Plot JS library
LetsPlot.setup_html()
```

Perform the following aesthetic mappings:
 - `x` = displ (the **displ** column of the dataframe)
 - `y` = hwy  (the **hwy** column of the dataframe)
 - `color` = cyl (the **cyl** column of the dataframe)

```
p = ggplot(mpg)

p + geom_point(aes('displ', 'hwy', color='cyl'))
```

Apply statistical data transformation to count the number of cases at each x position.

```
p + geom_point(aes('displ', size='..count..', color='..count..'), stat='count')
```

Change the palette and the legend, and add a title.

```
p += scale_color_continuous(low="blue", high="pink", guide=guide_legend(ncol=2)) \
     + ggtitle('Highway MPG by displacement')
p + geom_point(aes('displ', 'hwy', color='cyl'), position=position_jitter(seed=42))
```

Apply random stratified sampling to select points from each group proportionally 
to the group size.

```
p + geom_point(
    aes('displ', 'hwy', color='cyl'),
    position=position_jitter(seed=42),
    sampling=sampling_random_stratified(40))
```