### Stratified sampling

In a large dataset, a relatively small group of points can be overplotted by the dominant group. In this case, **stratified** sampling can help: it samples within each group, so small groups stay represented.

In [1]:
import numpy as np
import pandas as pd
from datalore_plot import *

In [2]:
N = 5000
small_group = 3
large_group = N - small_group

np.random.seed(123)
data = dict(
    x = np.random.normal(0, 1, N),
    y = np.random.normal(0, 1, N),
    cond = ['A'] * large_group + ['B'] * small_group
)

In [3]:
# Data points in group 'B' (small group) are overplotted by dominant group 'A'.
p = ggplot(data, aes('x','y',color='cond'))
p + geom_point(size=5, alpha=.2)

In [4]:
# With 'random' sampling (the default for point layers), group 'B' is lost altogether.
p + geom_point(size=5, sampling=sampling_random(50, seed=2))
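
To see why group 'B' tends to disappear, it helps to estimate how likely a plain random sample is to miss it entirely. The sketch below assumes the sample is drawn uniformly without replacement, which is an assumption about `sampling_random` and not something confirmed here. The numbers are the ones used in this notebook.

```python
from math import comb

N = 5000          # total number of points, as defined earlier in the notebook
small_group = 3   # number of points in group 'B'
n = 50            # sample size passed to the sampling function

# Assumption: a uniform sample without replacement.
# The chance of missing 'B' is the number of samples drawn only from 'A',
# divided by the number of all possible samples.
p_miss = comb(N - small_group, n) / comb(N, n)
print(f"P(no 'B' point in the sample) = {p_miss:.4f}")
```

Under that assumption the probability is roughly 0.97, so losing group 'B' is the expected outcome for almost any seed, not only for `seed=2`.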

In [5]:
# Stratified sampling ensures that group 'B' is represented.
p + geom_point(size=5, sampling=sampling_random_stratified(50, seed=2))
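
The idea behind stratified sampling can be sketched in plain pandas. This is only an illustration, not the library's implementation: the helper `stratified_sample` and its `min_subsample` floor are hypothetical names introduced here. Each group gets a share of the sample proportional to its size, but never fewer than a minimum number of rows.

```python
import numpy as np
import pandas as pd

def stratified_sample(df, by, n, min_subsample=2, seed=None):
    """Hypothetical helper: proportional per-group sampling with a floor."""
    rng = np.random.default_rng(seed)
    parts = []
    for _, group in df.groupby(by):
        # Proportional share of the n requested rows, raised to at least
        # min_subsample and capped at the size of the group.
        k = max(min_subsample, round(n * len(group) / len(df)))
        k = min(k, len(group))
        idx = rng.choice(len(group), size=k, replace=False)
        parts.append(group.iloc[idx])
    return pd.concat(parts)

# Same shape of data as in the notebook: 4997 points in 'A', 3 in 'B'.
N, small_group = 5000, 3
rng = np.random.default_rng(123)
df = pd.DataFrame({
    'x': rng.normal(0, 1, N),
    'y': rng.normal(0, 1, N),
    'cond': ['A'] * (N - small_group) + ['B'] * small_group,
})

sample = stratified_sample(df, 'cond', n=50, seed=2)
print(sample['cond'].value_counts())
```

Because of the floor, the result can be slightly larger than `n`: here group 'A' contributes 50 rows and group 'B' contributes 2, so the small group stays visible.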