In [1]:
%useLatestDescriptors
%use ggdsl(0.2.3-dev-11)

# Basics

## Data

The main data model for plotting is "named data", or a "dataframe": a set of named value columns of the same size. At the moment, the library does not support nullable values, so the input data must have the form `Map<String, List<Any>>`.

In [2]:
val dataset = mapOf<String, List<Any>>(
    "time, ms" to listOf(12, 87, 130, 149, 200, 221, 250),
    "relativeHumidity" to listOf(0.45, 0.3, 0.21, 0.15, 0.22, 0.36, 0.8),
    "flowOn" to listOf(true, true, false, false, true, false, false)
)
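The "same size" requirement can be checked before plotting. The following is a minimal plain-Kotlin sketch; `requireSameSize` is a hypothetical helper introduced here for illustration and is not part of ggdsl.

```kotlin
// Hypothetical helper, not part of ggdsl: checks that every column has the same number of rows.
fun requireSameSize(data: Map<String, List<Any>>) {
    val sizes = data.mapValues { (_, column) -> column.size }
    require(sizes.values.distinct().size <= 1) {
        "Columns differ in size: $sizes"
    }
}

fun main() {
    val dataset = mapOf<String, List<Any>>(
        "time, ms" to listOf(12, 87, 130),
        "relativeHumidity" to listOf(0.45, 0.3, 0.21),
    )
    requireSameSize(dataset) // passes: both columns have 3 values

    val broken = mapOf<String, List<Any>>(
        "a" to listOf(1, 2, 3),
        "b" to listOf(1.0),
    )
    // runCatching captures the IllegalArgumentException thrown by require
    println(runCatching { requireSameSize(broken) }.isFailure) // prints "true"
}
```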

Later we will need to refer to the columns of our dataset. To do this, we create a `ColumnPointer` for each one.

In [3]:
// 1. Using the `columnPointer()` function. We need to specify the column type and its name in the dataset:
val timeMs = columnPointer<Int>("time, ms")
// 2. The String API is similar to the previous one, but uses invocation of a `String`:
val humidity = "relativeHumidity"<Double>()
// 3. Using delegation of an unnamed column pointer; its name is taken from the name of the variable:
val flowOn by columnPointer<Boolean>()
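The third form relies on Kotlin property delegation, where a delegate can read the name of the property it is attached to. The sketch below shows the general mechanism in plain Kotlin; `Pointer` and `pointer()` are hypothetical stand-ins, and ggdsl's actual implementation may differ.

```kotlin
import kotlin.properties.ReadOnlyProperty

// Hypothetical stand-in for a typed column reference; not the ggdsl class.
data class Pointer<T>(val name: String)

// Returns a delegate that names the pointer after the delegated property.
fun <T> pointer(): ReadOnlyProperty<Any?, Pointer<T>> =
    ReadOnlyProperty { _, property -> Pointer<T>(property.name) }

val flowOn by pointer<Boolean>()

fun main() {
    println(flowOn.name) // prints "flowOn"
}
```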

## Plot creation

To create a plot, call the `plot()` function, passing a dataset as an argument. This function creates a context in which you can add layers. A *layer* is a set of mappings from the data to the graph's visual parameters. Here's an example of a graph with one simple layer:

In [4]:
plot(dataset) {
    point {
        // maps values from "time, ms" to X
        x(timeMs)
        // maps values from "relativeHumidity" to Y
        y(humidity)
        // set size of points to 4.5
        size(4.5)
    }
}

### Layers, aesthetics, mappings and scales

Each layer is characterized by its *geometrical entity*, or just *geom*. Each geom has its own set of *aesthetic attributes*, or simply *aesthetics* or *aes*. Aesthetics can be *positional* (e.g. `x`, `y`, `yMin`, `yMax`, `middle`) or *non-positional* (such as `color`, `size`, `width`). Non-positional attributes are characterized by a type (for example, `size` is associated with `Double`, and `color` with a special type `Color`).
An aes value can be assigned in two ways: by setting and by mapping.

*Setting* simply assigns a constant value:
```
x(12.0f)
size(5.0)
color(Color.RED)
```

*Mapping* binds a data column to the values of the aesthetic attribute:
```
x(timeMs)
size(humidity)
color(flowOn)
```

The function that performs this mapping is called a *scale*. Scales play a key role in data visualization. In the examples above, the mappings use default scales, but scales can also be specified explicitly. There are two types of scales, *categorical* (or discrete) and *continuous*, depending on the type of the domain and range. For a continuous scale, the domain and range are set using limits; for a categorical scale, the domain is a list of categories and the range is a list of the values corresponding to them. Scales can also be either positional or non-positional, depending on which aesthetic attributes they are applied to. Refined scales (those with an explicit domain or range) are typed. Continuous scales have a `transform` parameter, which defines the function type (linear by default). Scales can be created with special functions:

In [5]:
// Non-positional unspecified categorical scale.
val nonPosCatUnspec = categorical()
// Non-positional unspecified continuous scale.
val nonPosContUnspec = continuous(Transformation.LOG10)
// Positional unspecified categorical scale.
val posCatUnspec = categoricalPos()
// Positional unspecified continuous scale.
val posContUnspec = continuousPos()

// Non-positional categorical scale.
val nonPosCat = categorical(listOf(true, false), listOf(Color.RED, Color.BLUE)) //types are inferred
// Non-positional continuous scale.
val nonPosCont = continuous<Double, Double>(rangeLimits = 8.0 to 17.0) // specify range only, need to specify types
// Positional categorical scale.
val posCat = categoricalPos(listOf(1, 2, 4, 8, 16))
// Positional continuous scale.
val posCont = continuousPos(0 to 260, transform = Transformation.REVERSE)
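To make the difference between the two scale kinds concrete, here is a small conceptual model in plain Kotlin. `LinearScale` and `CategoricalScale` are hypothetical classes written for this illustration only; they are not ggdsl types and say nothing about how the library implements scales internally.

```kotlin
// Illustrative model only; these classes are hypothetical and are not ggdsl types.

// Continuous scale: linearly interpolates a value from the domain limits to the range limits.
class LinearScale(private val domain: Pair<Double, Double>, private val range: Pair<Double, Double>) {
    operator fun invoke(value: Double): Double {
        val t = (value - domain.first) / (domain.second - domain.first)
        return range.first + t * (range.second - range.first)
    }
}

// Categorical scale: pairs each category with the value at the same index.
class CategoricalScale<D, R>(categories: List<D>, values: List<R>) {
    private val table: Map<D, R> = categories.zip(values).toMap()
    operator fun invoke(category: D): R = table.getValue(category)
}

fun main() {
    val size = LinearScale(domain = 0.0 to 1.0, range = 8.0 to 17.0)
    println(size(0.5)) // prints "12.5"

    val color = CategoricalScale(listOf(true, false), listOf("red", "blue"))
    println(color(false)) // prints "blue"
}
```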

To apply a scale to a column, use the column's `.scaled()` extension function and pass your scale as an argument. Note that for a refined scale, its `DomainType` must match the type of the column.

In [6]:
val flowOnToColor = flowOn.scaled(nonPosCat)

The last thing left is to create a mapping, which is done in the same way as without explicit scaling, by aesthetic invocation:

In [7]:
plot(dataset) {
    point {
        x(timeMs.scaled(posContUnspec))
        y(humidity.scaled(continuousPos(0.0 to 1.0)))
        size(humidity.scaled(nonPosCont))
        color(flowOnToColor)
        symbol(flowOn.scaled(categorical()))
    }
}

### Scale parameters: axis and legend

The most important aids for reading a chart are its *guides*. They are essentially mini-charts of scales. The guides of positional scales are *axes*, and the guides of non-positional ones are *legends*. Every applied scale has its own default guide. You can customize it using `with`:

In [8]:
// TODO change `with` name/API
// TODO 2 better breaks API
plot(dataset) {
    point {
        x(timeMs).with {
            axis.name = "Time from start of counting,\n milliseconds"
        }
        y(humidity.scaled(continuousPos(0.0 to 1.0))).with {
            axis {
                name = "Relative humidity"
                breaks = listOf(0.0, 0.3, 0.6, 0.9)
                labels = listOf("0%", "30%", "60%", "90%")
            }
        }
        size(12.0)
        color(humidity.scaled(continuous())).with {
            legend {
                name = ""
                type = colorBar(40.0, 190.0, 15)
                format = "e"
            }
        }
    }
}
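In the axis settings above, `breaks` and `labels` appear to pair up by position, so writing both lists by hand risks a mismatch. A possible way to derive the labels from the breaks in plain Kotlin is sketched below; `percentLabels` is a hypothetical helper, not a ggdsl function.

```kotlin
import kotlin.math.roundToInt

// Hypothetical helper: formats fractional break positions as percentage labels.
fun percentLabels(breaks: List<Double>): List<String> =
    breaks.map { "${(it * 100).roundToInt()}%" }

fun main() {
    val breaks = listOf(0.0, 0.3, 0.6, 0.9)
    println(percentLabels(breaks)) // prints "[0%, 30%, 60%, 90%]"
}
```

The resulting list could then be assigned to `labels` next to the same `breaks` list.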

## Special aes types

In addition to the standard types of aesthetic attributes, there are a few with special restrictions. For example, so-called "sub-positional" aesthetic attributes can be mapped, but only with a regular column pointer, not a scaled one (these attributes are a sub-part of another one, and their mappings share the scale of the "parent" aesthetic). Also, some aesthetics support only setting and cannot be mapped.

In [9]:
plot(
    mapOf(
        "x" to listOf("a", "b", "c"),
        "min" to listOf(0.8, 0.4, 0.6),
        "lower" to listOf(0.9, 1.4, 0.8),
        "middle" to listOf(1.5, 2.4, 1.6),
        "upper" to listOf(1.9, 3.4, 1.7),
        "max" to listOf(3.1, 4.4, 2.6),
    )
) {
    boxplot {
        x("x"())
        // sub-y aesthetics:
        yMin("min"())
        lower("lower"())
        middle("middle"())
        upper("upper"())
        yMax("max"())
        // `fatten` can only be set:
        fatten(4.5)
    }
}

## Grouping

This part of the API allows you to work with grouped data. You can either provide grouped data as the dataset or perform grouping inside the plot DSL:

In [10]:
val dataset = NamedData(
    mapOf(
        "timeG" to listOf(1.0, 2.2, 3.4, 6.6, 2.1, 4.4, 6.0, 1.5, 4.7, 6.7),
        "value" to listOf(112.0, 147.3, 111.1, 200.6, 90.8, 110.2, 130.4, 100.1, 90.0, 121.8),
        "c-type" to listOf("A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
    )
)

val timeG by columnPointer<Double>()
val value by columnPointer<Double>()
val cType = columnPointer<String>("c-type")

In [11]:
plot(dataset.groupBy(cType)) {
    line {
        x(timeG)
        y(value)
    }
}

In [12]:
plot(dataset) {
    groupBy(cType) {
        line {
            x(timeG)
            y(value)
        }
    }
}
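Conceptually, grouping splits the rows of the dataset by the value of the key column, and each group is then drawn separately (here, as its own line). The plain-Kotlin sketch below illustrates that split on a `Map`-based dataset; `groupRows` is a hypothetical helper and does not use or mirror ggdsl internals.

```kotlin
// Plain-Kotlin illustration of grouping rows by a key column; it does not use ggdsl.
fun groupRows(data: Map<String, List<Any>>, key: String): Map<Any, Map<String, List<Any>>> {
    val keys = data.getValue(key)
    // Row indices belonging to each distinct key value, in order of first appearance.
    val indicesByKey = keys.indices.groupBy { keys[it] }
    return indicesByKey.mapValues { (_, rows) ->
        data.mapValues { (_, column) -> rows.map { column[it] } }
    }
}

fun main() {
    val data = mapOf<String, List<Any>>(
        "timeG" to listOf(1.0, 2.2, 2.1, 1.5),
        "c-type" to listOf("A", "A", "B", "C"),
    )
    val groups = groupRows(data, "c-type")
    println(groups.keys)                   // prints "[A, B, C]"
    println(groups.getValue("A")["timeG"]) // prints "[1.0, 2.2]"
}
```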

When working with grouped data, non-positional mappings (such as `color` below) are allowed only for columns that are grouping keys:

In [13]:
plot(dataset) {
    groupBy(cType) {
        line {
            x(timeG)
            y(value)
            color(cType)
            width(4.0)
        }
    }
}

## Iterable API

Instead of using dataframes and column pointers, you can simply use iterables as a data source. You can map them to aesthetics just like pointers to columns, via invocation, and apply scales in the same way (using the `.scaled()` extension):

In [14]:
val month = listOf("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
val numberOfDays = listOf(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
val season = listOf("winter", "winter", "spring", "spring", "spring", "summer", "summer", "summer", "autumn", "autumn", "autumn", "winter")

plot {
    x((month to "month").scaled(categoricalPos()))
    bar {
        y(numberOfDays to "number of days")
        color((season to "season").scaled(categorical(
            listOf("winter", "spring", "summer", "autumn"),
            listOf(Color.BLUE, Color.GREEN, Color.RED, Color.ORANGE)
        )))
    }
}

# DataFrame API

# Statistics

Rather than applying statistical pre-transformations to your dataset, you can calculate statistics inside the DSL. The "stat" family of functions is used for this purpose. These functions convert the original data into a new dataset with calculated statistics. The set of statistics is defined by the function. Within a stat context, the statistics can be accessed through the `Stat` field. These statistics act as pointers to the columns in the new dataset, and you can do the same things with them as with a regular `ColumnPointer`: they can be mapped to aesthetics, scaled, used in tooltips, etc.

In [16]:
import kotlin.random.Random

val observations = List(1000) { Random.nextDouble() }
val observationsDataset = mapOf(
    "observations" to observations
)
val obs = columnPointer<Double>("observations")

In [17]:
plot(observationsDataset) {
    statBin(obs) {
        bar {
            // simple mapping
            x(Stat.BINS)
            // mapping with scale
            y(Stat.COUNT.scaled(continuousPos(0 to 100, transform = Transformation.REVERSE)))

            alpha(0.5)

            // format of the stat value
            tooltips(statFormats = mapOf(
                Stat.COUNT to "d"
            )) {
                // line with the name of stat (i.e. "count") on the left side and its value on the right side
                line(Stat.COUNT)
            }
        }

        path {
            x(Stat.BINS)
            y(Stat.COUNT)

            width(2.5)
            color(Color.RED)
        }
    }
}
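For intuition, a "bin" statistic turns a list of raw values into a small table of bin centers and counts. The plain-Kotlin sketch below shows one simple way to compute such a table with equal-width bins. `Bin` and `binCounts` are hypothetical names, and the binning rules (bin edges, handling of the upper limit) are assumptions that need not match what `statBin` does.

```kotlin
// Plain-Kotlin sketch of a "bin" statistic; it does not reproduce ggdsl's exact binning rules.
data class Bin(val center: Double, val count: Int)

fun binCounts(values: List<Double>, binCount: Int, min: Double, max: Double): List<Bin> {
    require(binCount > 0 && max > min)
    val width = (max - min) / binCount
    val counts = IntArray(binCount)
    for (v in values) {
        if (v < min || v > max) continue // values outside the limits are ignored
        // The upper limit itself is assigned to the last bin.
        val index = ((v - min) / width).toInt().coerceAtMost(binCount - 1)
        counts[index]++
    }
    return counts.mapIndexed { i, c -> Bin(center = min + (i + 0.5) * width, count = c) }
}

fun main() {
    val bins = binCounts(listOf(0.1, 0.2, 0.6, 0.9, 1.0), binCount = 2, min = 0.0, max = 1.0)
    println(bins) // prints "[Bin(center=0.25, count=2), Bin(center=0.75, count=3)]"
}
```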

In addition, for basic statistical charts there is a simpler API that combines the computation of a statistic and the creation of a layer in one function. For example, a histogram is nothing more than the computation of the "bin" statistic plus a bar chart that has bin values on X and count values on Y.

In [18]:
val histPlot = plot(observationsDataset) {
    histogram(obs)

    layout.title = "`histogram`"
}
histPlot

You can compare it to a bar chart with the calculation of bins stat:

In [19]:
val binBarPlot = plot(observationsDataset) {
    statBin(obs) {
        bar {
            x(Stat.BINS)
            y(Stat.COUNT)
        }
    }
    layout.title = "`statBin` + `bar`"
}

plotGrid(listOf(histPlot, binBarPlot), 2, 800, 600)

The `histogram` function also opens a context in which you can create bindings for the aesthetic attributes of the bar. In addition, the "stat-bin" stats are defined in this context, allowing you to map them to aesthetics. You can also override the default mappings to coordinates (to display relative values instead of absolute ones, for example).

In [20]:
plot(observationsDataset) {
    histogram(obs, Bins.byWidth(0.05), BinXPos.center(0.5)) {
        y(Stat.DENSITY)

        color(Stat.COUNT.scaled(continuous(rangeLimits = Color.GREEN to Color.RED)))

        borderLine {
            color(Color.BLACK)
            width(0.3)
        }

        tooltips(title = "${value(Stat.BINS)} ± 0.025") {
            line(Stat.DENSITY)
        }
    }
}
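The text does not define how the density statistic is computed. Under the usual histogram convention, which is assumed here, each bin's density is its count divided by the total count times the bin width, so that the bar areas sum to 1. A plain-Kotlin sketch of that convention (`densities` is a hypothetical helper):

```kotlin
// Usual histogram density (an assumption about Stat.DENSITY, not confirmed by the source):
// density = count / (total * binWidth), so the bar areas sum to 1.
fun densities(counts: List<Int>, binWidth: Double): List<Double> {
    val total = counts.sum()
    return counts.map { it / (total * binWidth) }
}

fun main() {
    val binWidth = 0.5
    val d = densities(listOf(1, 2, 1), binWidth)
    println(d)                  // prints "[0.5, 1.0, 0.5]"
    println(d.sum() * binWidth) // total area; prints "1.0"
}
```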

The Stats API works with `Iterable` as well:

In [21]:
plotGrid(listOf(
    plot {
        statBin(observations) {
            point {
                x(Stat.DENSITY)
                y(Stat.BINS)
            }
        }
    },
    plot {
        histogram(observations)
    }
), 2, 800, 600)

# Series

In addition to the classic GoG-like approach, you can also use a more familiar one, the so-called *series* approach (used in matplotlib, plotly and many other plotting libraries). That is, you create a plot by defining series rather than layers. Series are similar to layers, but data in a series can only be mapped to coordinates (non-positional attributes can still be set). Every series has its own label. Under the hood, mappings are created from the set of series labels in the plot to those non-positional attributes that have been set within the series (note the limitation: settings for the same attributes must be made within the series; TODO: autocomplete?).

In [22]:
import kotlin.random.Random

val time = List(15) {it}
val valueA = List(15) { Random.nextDouble() }
val valueB = List(15) { Random.nextDouble() }

//TODO
val dataSeries = NamedData(mapOf(
    "time" to time,
    "valueA" to valueA,
    "valueB" to valueB
))

val timeSrc = columnPointer<Int>("time")
// valueA and valueB hold Double values, so their pointers are typed as Double
val valASrc = columnPointer<Double>("valueA")
val valBSrc = columnPointer<Double>("valueB")

A series plot can be created using the corresponding functions. For example, `linePlot`:

In [23]:
linePlot(dataSeries) {
    x(timeSrc)
    series("A1") {
        y(valASrc)
        color(Color.RED)
        type(LineType.DASHED)
    }
    series("B2+") {
        y(valBSrc)
        color(Color.ORANGE)
        type(LineType.SOLID)
    }
}

Or simply use `Iterable`:

In [24]:
barPlot(position = Position.Dodge(0.8)) {
    x(time)
    // general setting for all series
    width(0.5)
    series("A1") {
        y(valueA)
        color(Color.RED)
    }
    series("B2+") {
        y(valueB)
        color(Color.ORANGE)
    }
}
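The "under the hood" mapping from series labels to non-positional attributes can be pictured as a categorical lookup table per attribute. The plain-Kotlin sketch below models this; `Series` and `impliedScale` are hypothetical and are not ggdsl types. The check that every series sets the attribute is one reading of the limitation noted earlier and is an assumption.

```kotlin
// Hypothetical model of a series; not a ggdsl type.
data class Series(val label: String, val settings: Map<String, String>)

// For one attribute, builds the label -> value table that the series settings imply.
fun impliedScale(series: List<Series>, attribute: String): Map<String, String> {
    // Assumed reading of the limitation in the text: every series must set the attribute.
    require(series.all { attribute in it.settings }) {
        "Attribute '$attribute' must be set in every series"
    }
    return series.associate { it.label to it.settings.getValue(attribute) }
}

fun main() {
    val series = listOf(
        Series("A1", mapOf("color" to "red", "type" to "dashed")),
        Series("B2+", mapOf("color" to "orange", "type" to "solid")),
    )
    println(impliedScale(series, "color")) // prints "{A1=red, B2+=orange}"
}
```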

A series plot can also be created with the DataFrame API:

In [25]:
val seriesDF = dataFrameOf(
    "time" to time,
    "aVal" to valueA,
    "bVal" to valueB,
)

In [26]:
seriesDF.create { pointPlot {
   size(5.6)
   series("A") {
       x(time)
       y(aVal)

       symbol(Symbol.ASTERIX)
       color(Color.PEACH)
   }
   series("B") {
       x(time)
       y(bVal)

       symbol(Symbol.DIAMOND)
       color(Color.BLUE)
   }
} }

# Themes

Themes allow you to customize all the graphical elements of the layout: styles of lines, backgrounds, text, etc. You can create your own theme or use one of the predefined ones.

In [27]:
val mpgDF = DataFrame.readCSV("https://raw.githubusercontent.com/JetBrains/lets-plot-kotlin/master/docs/examples/data/mpg.csv")
mpgDF.head(3)

To apply a theme, use the `theme()` method of `layout`:

In [28]:
mpgDF.create { plot {
    point {
        x(cty)
        y(hwy)
    }
    layout.theme(Theme.Classic)
} }

In [29]:
import org.jetbrains.kotlinx.ggdsl.ir.Plot
fun plotWithTheme(theme: Theme? = null, title: String? = null): Plot {
    return mpgDF.create { plot {
        point {
            x(cty)
            y(hwy)
        }
        layout {
            theme?.let {
                theme(it)
            }
            this.title = title
        }
    } }
}

In [30]:
plotGrid(listOf(
    plotWithTheme(Theme.Classic, "\"classic\" theme"),
    plotWithTheme(Theme.Grey, "\"grey\" theme"),
    plotWithTheme(Theme.Light, "\"light\" theme"),
    plotWithTheme(Theme.Minimal, "\"minimal\" theme"),
    plotWithTheme(Theme.Minimal2, "\"minimal2\" theme (by default)"),
    plotWithTheme(Theme.None, "\"none\" theme"),
), 2, 600, 400)

## Custom themes

There is a DSL for creating custom themes. The main part of this DSL is setting parameters of type "line", "text" and "background". These parameters can be created separately and then applied, or set up in place. Each of them has a `blank` parameter of `Boolean` type. If you set it to `true`, the item will not be displayed.

In [31]:
val redLine = LayoutParameters.line(Color.RED)

val simpleCustomTheme = theme {
    // use previously created parameters
    xAxis.line(redLine)
    // set up parameters
    yAxis.line {
        color = Color.RED
        width = 0.3
    }
    // remove ticks on both axes
    axis.ticks {
        blank = true
    }
}

plotWithTheme(simpleCustomTheme)

In [32]:
// blanking all details on axes:
val blankAxesTheme = theme {
    blankAxes()
}
plotWithTheme(blankAxesTheme)